Training a neural network means finding (optimizing) a function that maps inputs to desired outputs. For example, the trained function takes an image of a cat as input and outputs the label "cat." To get there, the network learns patterns and relationships in the training data so that its predictions generalize to inputs it has not seen before.

In [8]:
import numpy as np  # NumPy is a library for numerical computing, imported here under the
                    # conventional alias np. It is widely used in data science, machine
                    # learning, and scientific computing, and provides large multi-dimensional
                    # arrays and matrices plus mathematical functions that operate on them efficiently.
import os  # The os module is part of Python's standard library. It is the bridge between
           # a Python program and the operating system: file and directory manipulation,
           # environment variables, and process management. Here it is used to check
           # whether the dataset file already exists.
import urllib.request  # urllib.request is part of the standard library's urllib package,
                       # which provides tools for working with URLs and HTTP requests,
                       # such as downloading files or fetching data from APIs.
                       # Here it is used to download the dataset.
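# import npzviewer  # Optional third-party package that is not used anywhere below; the import
#                   # fails if it is not installed. The contents of mnist.npz are inspected
#                   # with np.load and data.files instead.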
                      

In [9]:
# Download MNIST dataset (mnist.npz) if not already available
if not os.path.exists("mnist.npz"):
    print("Downloading mnist.npz...")
    url = "https://s3.amazonaws.com/img-datasets/mnist.npz"
    urllib.request.urlretrieve(url, "mnist.npz")
    # This code snippet checks if a file named `mnist.npz` exists in the current working directory. 
    # If the file does not exist, it downloads the file from a specified URL. Here's a breakdown of the logic:
    # 1. **File Existence Check**: The condition `if not os.path.exists("mnist.npz")` uses the `os.path.exists` 
    #    function to determine whether the file `mnist.npz` is present. If the file does not exist, the condition 
    #    evaluates to `True`, and the code inside the `if` block is executed.
    # 2. **User Notification**: If the file is missing, the `print` function outputs the message `"Downloading mnist.npz..."` 
    #    to inform the user that the file is being downloaded.
    # 3. **URL Definition**: The variable `url` is assigned the string `"https://s3.amazonaws.com/img-datasets/mnist.npz"`, 
    #    which is the web address where the file can be downloaded. This URL points to a hosted version of the MNIST dataset, 
    #    a popular dataset used for training and testing machine learning models.
    # 4. **File Download**: The `urllib.request.urlretrieve` function is called with two arguments: the `url` and the local 
    #    filename `"mnist.npz"`. This function downloads the file from the specified URL and saves it to the current working 
    #    directory with the name `mnist.npz`.
    # This code is a common pattern for ensuring that required resources, such as datasets, are available before proceeding 
    # with further computations. It avoids redundant downloads by checking for the file's existence first, which can save 
    # time and bandwidth.

In [10]:
# Load data from the .npz file
data = np.load("mnist.npz")
x_train = data["x_train"]
y_train = data["y_train"]
x_test = data["x_test"]
y_test = data["y_test"]

# Inspect the contents of the mnist.npz file
print("Keys in the .npz file:", data.files)
for key in data.files:
    print(f"{key}: shape {data[key].shape}, dtype {data[key].dtype}")

# Preprocess data
# Flatten the images and normalize pixel values to [0, 1]
x_train = x_train.reshape(-1, 28 * 28).astype(np.float32) / 255.0
x_test = x_test.reshape(-1, 28 * 28).astype(np.float32) / 255.0
# This code snippet preprocesses the MNIST dataset by flattening the images and normalizing their pixel values. 
# Preprocessing is an essential step in preparing data for machine learning models, as it ensures the data is in a format suitable for training.
# 1. **Flattening the Images**:  
#    The `reshape(-1, 28 * 28)` operation transforms each image from its original 2D shape of `28x28` pixels into a 1D array of `784` pixels. 
#    The `-1` in the first dimension allows NumPy to automatically infer the number of samples based on the total size of the array. 
#    Flattening is necessary because many machine learning models, such as fully connected neural networks, expect input data to be in a 1D format rather than 2D.
# 2. **Normalizing Pixel Values**:  
#    The pixel values in the MNIST dataset typically range from `0` to `255`, representing grayscale intensity. 
#    Dividing by `255.0` scales these values to the range `[0, 1]`. Normalization is important because it ensures that all input features have a consistent scale, 
#    which can improve the convergence and stability of machine learning algorithms.
# 3. **Data Type Conversion**:  
#    The `astype(np.float32)` method converts the pixel values to the `float32` data type. 
#    This is necessary because many machine learning frameworks and models require input data to be in a floating-point format for numerical computations.
# 4. **Separate Processing for Training and Testing Data**:  
#    The code applies the same preprocessing steps to both the training data (`x_train`) and the testing data (`x_test`). 
#    This ensures consistency between the datasets, which is crucial for evaluating the model's performance accurately.
# In summary, this preprocessing step prepares the MNIST images for input into a machine learning model by flattening them into 1D arrays, 
# normalizing their pixel values to a range of `[0, 1]`, and ensuring the data type is compatible with numerical computations. 
# These transformations are standard practices in image-based machine learning workflows.

# One-hot encode the labels
def one_hot(y, num_classes=10):
    return np.eye(num_classes)[y]

y_train_oh = one_hot(y_train, 10)
y_test_oh = one_hot(y_test, 10)
# Imagine you have a list of categories, like fruit types (apple, banana, orange).
# One-hot encoding is a way to convert these categorical labels into a numerical format
# that machine learning algorithms can understand. Instead of using a single number to represent each category,
# we create a vector of zeros with a single '1' in the position corresponding to that category.
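# np.eye(num_classes): This creates an identity matrix. For num_classes=10, it generates a 10x10 matrix.
# np.eye(num_classes)[y]: Indexing the identity matrix with the label array selects one row per label.
# Row k of the identity matrix has a single 1 in position k, so it is exactly the one-hot vector for class k.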
# y_train_oh = one_hot(y_train, 10): This encodes the training labels (y_train) into a one-hot format.
# y_test_oh = one_hot(y_test, 10): This encodes the testing labels (y_test) similarly.
# The resulting y_train_oh and y_test_oh are 2D arrays where each row corresponds to the one-hot encoded representation of a label.

Keys in the .npz file: ['x_test', 'x_train', 'y_train', 'y_test']
x_test: shape (10000, 28, 28), dtype uint8
x_train: shape (60000, 28, 28), dtype uint8
y_train: shape (60000,), dtype uint8
y_test: shape (10000,), dtype uint8
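
To see the preprocessing step from the cell above in isolation, here is a minimal sketch that applies the same `reshape`, `astype`, and division to a tiny made-up array. The name `fake_images` and its pixel values are assumptions chosen for illustration; they are not part of MNIST.

```python
import numpy as np

# Stand-in for two 28x28 grayscale images (illustrative values, not real MNIST data)
fake_images = np.zeros((2, 28, 28), dtype=np.uint8)
fake_images[0, 0, 0] = 255     # brightest possible pixel in the first image
fake_images[1, 27, 27] = 128   # mid-gray pixel in the last position of the second image

# Same three operations as in the notebook: flatten, convert to float32, scale to [0, 1]
flat = fake_images.reshape(-1, 28 * 28).astype(np.float32) / 255.0

print("shape:", flat.shape)    # one row of 784 values per image
print("dtype:", flat.dtype)
print("min and max:", flat.min(), flat.max())
```

The `-1` lets NumPy work out the number of rows (here 2) from the total number of elements, which is why the same line works for both the 60000 training images and the 10000 test images.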


This section is a side note and not part of the network itself. It is included to give a better understanding of one-hot encoding, using a smaller example with 5 classes.

In [14]:
def one_hot_explanation(y, num_classes=5):
    # Create the identity matrix
    identity = np.eye(num_classes)
    
    # Select rows based on labels
    one_hot_encoded = identity[y]
    
    return one_hot_encoded

# Example
labels = [2, 3, 1]
result = one_hot_explanation(labels)
print(result)

[[0. 0. 1. 0. 0.]
 [0. 0. 0. 1. 0.]
 [0. 1. 0. 0. 0.]]
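
One-hot encoding can also be undone. As a small sketch, `np.argmax` along `axis=1` returns the position of the largest value in each row, which for a one-hot row is the original label. The evaluation code at the end of this notebook uses the same call to turn softmax outputs into predicted digits. The variable names `encoded` and `decoded` are chosen here for illustration.

```python
import numpy as np

labels = np.array([2, 3, 1])
encoded = np.eye(5)[labels]            # same construction as one_hot_explanation
decoded = np.argmax(encoded, axis=1)   # index of the single 1 in each row

print(decoded)  # [2 3 1]
```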


### Gradient Descent Explained with f(x) = x² + 3

#### Understanding the Function
The function `f(x) = x² + 3` represents a parabola (a U-shaped curve).  
- The lowest point of this parabola is the minimum value of the function.  
- In this simple equation, the minimum occurs at `x = 0`, where `f(x) = 3`.  
- Gradient descent helps computers find the minimum of more complex functions, where it cannot simply be read off the formula.

#### The Gradient (Slope)
To find the slope, we calculate the derivative of the function:  
- The derivative of `f(x) = x² + 3` is `f'(x) = 2x`.  
- This derivative, `2x`, is the gradient. It tells us the slope of the curve at any point `x`.
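
The derivative can be checked numerically. The sketch below compares `2x` with a central-difference estimate of the slope; the helper name `numerical_gradient` and the step size `h` are assumptions made for this example.

```python
def f(x):
    return x**2 + 3

def numerical_gradient(func, x, h=1e-5):
    # Central difference: slope of the line through two points just left and right of x
    return (func(x + h) - func(x - h)) / (2 * h)

for x in [-1.0, 0.0, 2.0]:
    estimate = numerical_gradient(f, x)
    print(f"x = {x:5.1f}   estimated slope = {estimate:.6f}   2x = {2 * x:.6f}")
```

The two columns should agree to many decimal places, which is a quick way to confirm a hand-computed derivative before using it in gradient descent.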

#### The Gradient Descent Process
1. **Start with a Guess**  
    - Let's start with `x = 2`.  
    - At this point, `f(2) = 2² + 3 = 7`.

2. **Calculate the Gradient**  
    - The gradient at `x = 2` is `f'(2) = 2 * 2 = 4`.  
    - Since the slope is positive, the function rises as `x` increases. To go downhill, we move in the opposite direction, toward smaller `x`.

3. **Take a Step**  
    - Use the formula:  
      `x_new = x_old - (learning_rate * gradient)`  
    - With a learning rate of `0.1`:  
      `x_new = 2 - (0.1 * 4) = 2 - 0.4 = 1.6`.

4. **Repeat**  
    - Now, repeat the process with `x = 1.6`:  
      - `f'(1.6) = 2 * 1.6 = 3.2`  
      - `x_new = 1.6 - (0.1 * 3.2) = 1.6 - 0.32 = 1.28`  
    - Each iteration brings `x` closer to `0`, and `f(x)` closer to `3`.
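
The four steps above can be sketched as a short loop. This is a minimal illustration using the same starting point `x = 2` and learning rate `0.1`; the number of steps (50) and the `history` list are choices made for this example.

```python
def f(x):
    return x**2 + 3

def f_prime(x):
    return 2 * x

x = 2.0               # starting guess
learning_rate = 0.1
history = [x]         # keep every value of x so the path can be inspected

for step in range(50):
    gradient = f_prime(x)
    x = x - learning_rate * gradient   # move against the slope
    history.append(x)

print(f"after 1 step:   x = {history[1]:.4f}")
print(f"after 2 steps:  x = {history[2]:.4f}")
print(f"after 50 steps: x = {x:.6f}, f(x) = {f(x):.6f}")
```

The first two values reproduce the hand calculation (`1.6`, then `1.28`). With this learning rate every step multiplies `x` by `0.8`, so `x` shrinks steadily toward `0`.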

#### Key Points
- **Learning Rate**:  
  - If the learning rate is too large, we might overshoot the minimum.  
  - If it's too small, convergence will take a long time.  
- **Minimizing**:  
  - Gradient descent finds the `x` value that minimizes `f(x)`.  
  - In this case, the minimum is at `x = 0`.
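
To make the learning-rate point concrete, here is a small sketch that runs the same update on `f(x) = x² + 3` with three different learning rates. The helper `gradient_descent` and the specific rates are assumptions picked for illustration.

```python
def gradient_descent(x0, learning_rate, steps):
    x = x0
    for _ in range(steps):
        x = x - learning_rate * (2 * x)   # f'(x) = 2x
    return x

for lr in [0.01, 0.1, 1.1]:
    final_x = gradient_descent(2.0, lr, 20)
    print(f"learning_rate = {lr:<5} x after 20 steps = {final_x:.4f}")
```

For this function each update multiplies `x` by `(1 - 2 * learning_rate)`. A rate of `0.01` shrinks `x` slowly, `0.1` shrinks it quickly, and `1.1` makes the factor `-1.2`, so `x` jumps back and forth across the minimum and grows with every step.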


In [12]:
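# Activation functions and their derivatives
def relu(x):
    # ReLU: keep positive values, replace negative values with 0
    return np.maximum(0, x)

def relu_derivative(x):
    # Slope of ReLU: 1 where x > 0, otherwise 0
    return (x > 0).astype(np.float32)

def softmax(x):
    # Turn each row of scores into probabilities that sum to 1.
    # Subtracting the row maximum first avoids overflow in np.exp
    # and does not change the result.
    exps = np.exp(x - np.max(x, axis=1, keepdims=True))
    return exps / np.sum(exps, axis=1, keepdims=True)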

# Hyperparameters
input_size = 28 * 28
hidden_size = 128
output_size = 10
learning_rate = 0.1
epochs = 10
batch_size = 128

# Initialize weights and biases with He initialization for ReLU layers
W1 = np.random.randn(input_size, hidden_size) * np.sqrt(2.0 / input_size)
b1 = np.zeros((1, hidden_size))
W2 = np.random.randn(hidden_size, output_size) * np.sqrt(2.0 / hidden_size)
b2 = np.zeros((1, output_size))

# Training loop
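num_samples = x_train.shape[0]
# Integer division keeps only full batches: 60000 // 128 = 468, so the remaining
# 96 samples are skipped in each epoch (a different 96 each time, because of shuffling).
num_batches = num_samples // batch_size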

for epoch in range(epochs):
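    # Shuffle the training data at the start of each epoch.
    # Images and one-hot labels are reordered with the same indices so they stay aligned.
    # The integer labels in y_train are NOT reordered, so they no longer match x_train afterwards.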
    indices = np.arange(num_samples)
    np.random.shuffle(indices)
    x_train = x_train[indices]
    y_train_oh = y_train_oh[indices]
    
    epoch_loss = 0.0
    for i in range(num_batches):
        start = i * batch_size
        end = start + batch_size
        x_batch = x_train[start:end]
        y_batch = y_train_oh[start:end]
        
        # Forward pass
        z1 = np.dot(x_batch, W1) + b1          # Linear transformation for hidden layer
        a1 = relu(z1)                          # ReLU activation
        z2 = np.dot(a1, W2) + b2               # Linear transformation for output layer
        a2 = softmax(z2)                       # Softmax activation for probabilities
        
        # Compute cross-entropy loss, averaged over the batch (1e-8 guards against log(0))
        loss = -np.sum(y_batch * np.log(a2 + 1e-8)) / batch_size
        epoch_loss += loss
        
        # Backward pass (gradient computation)
        dz2 = a2 - y_batch                     # Derivative of loss w.r.t. z2
        dW2 = np.dot(a1.T, dz2) / batch_size
        db2 = np.sum(dz2, axis=0, keepdims=True) / batch_size
        
        da1 = np.dot(dz2, W2.T)
        dz1 = da1 * relu_derivative(z1)        # Backprop through ReLU
        dW1 = np.dot(x_batch.T, dz1) / batch_size
        db1 = np.sum(dz1, axis=0, keepdims=True) / batch_size
        
        # Update weights and biases using gradient descent
        W2 -= learning_rate * dW2
        b2 -= learning_rate * db2
        W1 -= learning_rate * dW1
        b1 -= learning_rate * db1

    avg_loss = epoch_loss / num_batches
    print(f"Epoch {epoch+1}/{epochs}, Loss: {avg_loss:.4f}")

# Evaluation on test set
z1_test = np.dot(x_test, W1) + b1
a1_test = relu(z1_test)
z2_test = np.dot(a1_test, W2) + b2
a2_test = softmax(z2_test)
predictions = np.argmax(a2_test, axis=1)
accuracy = np.mean(predictions == y_test)
print("Test accuracy: {:.2f}%".format(accuracy * 100))


Epoch 1/10, Loss: 0.4511
Epoch 2/10, Loss: 0.2528
Epoch 3/10, Loss: 0.2043
Epoch 4/10, Loss: 0.1734
Epoch 5/10, Loss: 0.1511
Epoch 6/10, Loss: 0.1339
Epoch 7/10, Loss: 0.1214
Epoch 8/10, Loss: 0.1104
Epoch 9/10, Loss: 0.1014
Epoch 10/10, Loss: 0.0936
Test accuracy: 96.89%
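
The backward pass uses `dz2 = a2 - y_batch` as the gradient of the loss with respect to `z2` without deriving it. As a hedged sanity check, the sketch below compares that formula with a finite-difference estimate on a small random batch. The helper `cross_entropy`, the batch of 4 samples, and the seed are assumptions made for this example. The `1e-8` term from the training loop is left out here because the random scores are small enough that `log(0)` cannot occur.

```python
import numpy as np

def softmax(x):
    exps = np.exp(x - np.max(x, axis=1, keepdims=True))
    return exps / np.sum(exps, axis=1, keepdims=True)

def cross_entropy(z, y):
    # Cross-entropy summed over the batch (the training loop divides by batch_size later)
    return -np.sum(y * np.log(softmax(z)))

rng = np.random.default_rng(0)
z = rng.normal(size=(4, 10))                      # made-up scores for 4 samples, 10 classes
y = np.eye(10)[rng.integers(0, 10, size=4)]       # random one-hot labels

analytic = softmax(z) - y                         # formula used in the backward pass

# Finite-difference estimate: nudge one score at a time and measure the change in loss
numeric = np.zeros_like(z)
h = 1e-5
for i in range(z.shape[0]):
    for j in range(z.shape[1]):
        z_plus, z_minus = z.copy(), z.copy()
        z_plus[i, j] += h
        z_minus[i, j] -= h
        numeric[i, j] = (cross_entropy(z_plus, y) - cross_entropy(z_minus, y)) / (2 * h)

max_difference = np.max(np.abs(analytic - numeric))
print("largest difference between formula and estimate:", max_difference)
```

A very small difference indicates that the shortcut `a2 - y_batch` matches the true gradient of softmax followed by cross-entropy. The same check can be extended to `dW1`, `dW2`, `db1`, and `db2` when modifying the network.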
