# Lecture 22: Understanding Neural Network Architecture

Backpropagation is a cornerstone algorithm for training modern neural networks. At its core, it is an efficient method for computing the gradient of the loss function with respect to every weight in the network. The process involves two main steps: a forward pass and a backward pass.

In the forward pass, data moves through the network, and the network's output is compared to the expected output to calculate the loss. The backward pass then propagates this error backward through the network, from the output layer to the input layer, using the chain rule to calculate the gradient of the loss function with respect to each weight. The gradients are then used to update the weights in the direction that reduces the loss, typically through gradient descent or one of its variants.
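The two passes can be sketched in plain NumPy. This is a minimal illustration, not the implementation scikit-learn uses: the toy data, the layer sizes, the mean squared error loss, and the learning rate are all assumptions chosen to keep the example small.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data (assumed for illustration): 8 samples, 4 features, 1 target each
X = rng.normal(size=(8, 4))
y = rng.normal(size=(8, 1))

# Parameters of a small network: 4 inputs -> 5 ReLU units -> 1 linear output
W1 = rng.normal(scale=0.5, size=(4, 5))
b1 = np.zeros(5)
W2 = rng.normal(scale=0.5, size=(5, 1))
b2 = np.zeros(1)


def forward(X, W1, b1, W2, b2):
    """Forward pass: return the prediction plus the values the backward pass needs."""
    z1 = X @ W1 + b1          # hidden-layer pre-activation
    a1 = np.maximum(z1, 0.0)  # ReLU activation
    y_hat = a1 @ W2 + b2      # linear output layer
    return y_hat, z1, a1


def loss_fn(y_hat, y):
    """Mean squared error between predictions and targets."""
    return np.mean((y_hat - y) ** 2)


def backward(X, y, y_hat, z1, a1, W2):
    """Backward pass: chain rule applied from the output layer back to the input."""
    d_y_hat = 2.0 * (y_hat - y) / y.size  # dL/dy_hat
    dW2 = a1.T @ d_y_hat                  # dL/dW2
    db2 = d_y_hat.sum(axis=0)             # dL/db2
    d_a1 = d_y_hat @ W2.T                 # dL/da1
    d_z1 = d_a1 * (z1 > 0)                # ReLU derivative is 1 where z1 > 0, else 0
    dW1 = X.T @ d_z1                      # dL/dW1
    db1 = d_z1.sum(axis=0)                # dL/db1
    return dW1, db1, dW2, db2


learning_rate = 0.05  # assumed step size
losses = []
for step in range(200):
    y_hat, z1, a1 = forward(X, W1, b1, W2, b2)
    losses.append(loss_fn(y_hat, y))
    dW1, db1, dW2, db2 = backward(X, y, y_hat, z1, a1, W2)
    # Gradient descent: move each parameter against its gradient
    W1 -= learning_rate * dW1
    b1 -= learning_rate * db1
    W2 -= learning_rate * dW2
    b2 -= learning_rate * db2

print(f"loss before training: {losses[0]:.4f}")
print(f"loss after training:  {losses[-1]:.4f}")
```

Each line of `backward` is one application of the chain rule, reusing the gradient computed for the layer after it. That reuse is what makes backpropagation efficient.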

Understanding backpropagation is crucial because it underpins the learning process of deep learning models, allowing them to adjust their parameters and improve their predictions based on the feedback from the loss function.


### Set up imports

In [None]:
import pickle
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score
from skimage.color import rgb2gray
from skimage.filters import sobel

### Load images from CIFAR-10 Dataset

The CIFAR-10 dataset is one of the most widely used image datasets in machine learning and computer vision research. It contains 60,000 32x32 color images in 10 classes, with 6,000 images per class. [Learn more about CIFAR-10](https://www.cs.toronto.edu/~kriz/cifar.html).

In [None]:
# use this function to load the CIFAR-10 dataset from the data folder
def load_cifar_batch(filename):
    """Load a single batch of CIFAR-10."""
    with open(filename, 'rb') as file:
        # The encoding 'bytes' is required for Python 3 compatibility
        batch = pickle.load(file, encoding='bytes')
        images = batch[b'data']
        labels = batch[b'labels']
        # Reshape the images: the dataset is flattened, so you need to reshape it to 32x32x3
        images = images.reshape((len(images), 3, 32, 32)).transpose(0, 2, 3, 1)
        labels = np.array(labels)
        return images, labels
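
To see why the loader reshapes to `(N, 3, 32, 32)` and then transposes, it can help to trace one synthetic row. Each CIFAR-10 row stores 1024 red values, then 1024 green, then 1024 blue. The constant pixel values below are made up for illustration.

```python
import numpy as np

# Hypothetical stand-in for one row of batch[b'data']:
# 1024 red values, then 1024 green, then 1024 blue
red = np.full(1024, 10, dtype=np.uint8)
green = np.full(1024, 20, dtype=np.uint8)
blue = np.full(1024, 30, dtype=np.uint8)
flat = np.concatenate([red, green, blue])[np.newaxis, :]  # shape (1, 3072)

# Step 1: split each row into 3 channel planes of 32x32 (channels first)
channels_first = flat.reshape((len(flat), 3, 32, 32))

# Step 2: move the channel axis last, the layout plt.imshow expects
channels_last = channels_first.transpose(0, 2, 3, 1)

print(channels_first.shape)     # (1, 3, 32, 32)
print(channels_last.shape)      # (1, 32, 32, 3)
print(channels_last[0, 0, 0])   # [10 20 30], the RGB triple of the top-left pixel
```

Reshaping directly to `(N, 32, 32, 3)` would run without error but would interleave values from the same channel into each pixel, producing scrambled images.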

In [None]:
# load batch 1 of CIFAR-10
file_name = 'data/batch_1'

images, labels = load_cifar_batch(file_name)


### Inspect images and labels

Understanding the structure and format of our dataset is crucial. Let’s start by examining the lengths of images and labels to get a sense of the dataset's size.

In [None]:
# look at length of images and labels
print(len(images))
print(len(labels))

### Inspect the first three images

Visualizing our data is just as important as understanding its structure. Let’s display the first few images from our dataset along with their corresponding labels to see what we’re working with.


In [None]:
# Function to show an image
def show_image(img, label):
    plt.imshow(img)
    plt.title(label)
    plt.show()

# Show the first three images
for index in range(3):
    print(index)
    show_image(images[index], labels[index])

### Define labels
Each image in CIFAR-10 is associated with a label from 10 classes. Here, we define a list of label names to make our data more understandable.


In [None]:
label_names = ['airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']
label_names


### Inspect the first label

To further familiarize ourselves with the dataset, let's inspect the label of the first image. This step helps us connect an image with its categorical representation.


In [None]:
# inspect first label
first_label = labels[0]

label_names[first_label]

In [None]:
# look at first image data
images[0]

### Preprocess data

Data preprocessing is a critical step in any machine learning workflow. Here, we'll normalize pixel values to improve our model's convergence during training. We'll also reshape the data to fit our model's input requirements.


In [None]:
# Normalize pixel values to be between 0 and 1
images_normalized = images / 255.0

images_normalized[0]


In [None]:
# inspect the shape of the images
images_normalized.shape


In [None]:
# Flatten the images
images_flattened = images_normalized.reshape(images_normalized.shape[0], -1)

In [None]:
# inspect flattened images
images_flattened.shape

In [None]:
32 * 32 * 3

### Splitting Dataset into Training and Test Sets

In [None]:
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(images_flattened, labels, test_size=0.2, random_state=42)

In [None]:
# check that it worked
X_train.shape

### Training a Machine Learning Model
This time, use an MLPClassifier (a neural network) with 2 hidden layers of 16 nodes each. Use the 'relu' activation function and a maximum of 200 iterations.

In [None]:
# Initialize the model
model = MLPClassifier(
    hidden_layer_sizes=(16, 16),
    activation='relu',
    max_iter=200,
    random_state=42,
    verbose=True
)

# Train the model
model.fit(X_train, y_train)

### Inspect the model

In [None]:
# look at all the weights in the model using .coefs_


In [None]:
# Look at the shape of the coefficients (weights) in first layer 


In [None]:
# Look at the shape of the coefficients (weights) in second layer


In [None]:
# print the shape of each layer of weights (coefficients)


In [None]:
# print the shape of each layer of biases (intercepts)
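
One possible way to approach the inspection cells above is sketched below. So that it runs on its own, it fits a separate `demo_model` on random stand-in data with the same input width as the flattened images (3072) and 10 classes. `X_demo`, `y_demo`, and `demo_model` are hypothetical names, and `max_iter=5` is chosen only to keep it fast. In the notebook, the same attributes can be read from `model` directly.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)

# Random stand-in data (assumption): 200 samples, 3072 features, 10 classes
X_demo = rng.random((200, 3072))
y_demo = np.arange(200) % 10

# Same architecture as the lecture model; few iterations to keep the demo quick
# (scikit-learn will emit a ConvergenceWarning, which is expected here)
demo_model = MLPClassifier(hidden_layer_sizes=(16, 16), activation='relu',
                           max_iter=5, random_state=42)
demo_model.fit(X_demo, y_demo)

# coefs_ is a list with one weight matrix per layer,
# intercepts_ is a list with one bias vector per layer
for i, (weights, biases) in enumerate(zip(demo_model.coefs_, demo_model.intercepts_)):
    print(f"layer {i}: weights {weights.shape}, biases {biases.shape}")
```

Each weight matrix has shape `(units in previous layer, units in this layer)`, and each bias vector has one entry per unit in its layer.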


### Plot distributions of weights and biases in layer

In [None]:
# Plot distributions of weights and biases in layer


### Activity 1
How does neural network shape affect the number of parameters?

In [None]:
# Determine the total number of parameters (weights plus biases) in a model with 
# two hidden layers of size 16 and 16, respectively, and an output layer of size 10

In [None]:
# Determine the total number of parameters (weights plus biases) in a model with 
# two hidden layers of size 32 and 32, respectively, and an output layer of size 10

In [None]:
# Write a function that takes in a list of hidden layer sizes and 
# returns the total number of parameters in the model
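
A minimal sketch of such a function follows. It assumes a fully connected network in which every layer has one weight per input-output pair plus one bias per unit. The name `count_parameters` and the defaults of 3072 inputs (32 * 32 * 3) and 10 outputs are assumptions matching this notebook's data.

```python
def count_parameters(hidden_layer_sizes, n_inputs=3072, n_outputs=10):
    """Count weights plus biases in a fully connected network."""
    layer_sizes = [n_inputs, *hidden_layer_sizes, n_outputs]
    total = 0
    # Walk over consecutive layer pairs: (inputs, hidden1), (hidden1, hidden2), ...
    for fan_in, fan_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        total += fan_in * fan_out  # one weight per connection
        total += fan_out           # one bias per unit in the receiving layer
    return total


print(count_parameters([16, 16]))  # 49610
print(count_parameters([32, 32]))  # 99722
```

Doubling the hidden layer width roughly doubles the total here because the first layer, with 3072 inputs, dominates the count.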


In [None]:
# Use a loop to fill a dataframe with the number of parameters for models with
# two hidden layers with sizes ranging from 1 to 100, in increments of 10

In [None]:
# Plot the number of parameters as a function of the size of the hidden layers

### Activity 2
Tracking the value of a single parameter over many iterations

In [None]:
# for the model with two hidden layers of size 16 and 16, trained for 150 iterations
# find the first bias value in the first hidden layer


In [None]:
# write a function that takes in a number of iterations,
# and returns the value of the first bias in the first hidden layer
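
One way this could look is sketched below, under stated assumptions: it retrains a fresh model for each iteration count with a fixed `random_state`, and it uses small random stand-in data (`X_demo`, `y_demo`) so that it runs on its own. The function name `first_bias_after` is hypothetical. In the notebook, `X_train` and `y_train` would be passed in instead.

```python
import warnings

import numpy as np
from sklearn.exceptions import ConvergenceWarning
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)

# Random stand-in data (assumption): 200 samples, 64 features, 10 classes
X_demo = rng.random((200, 64))
y_demo = np.arange(200) % 10


def first_bias_after(n_iterations, X=X_demo, y=y_demo):
    """Train a fresh (16, 16) model and return the first bias of the first hidden layer."""
    model = MLPClassifier(hidden_layer_sizes=(16, 16), activation='relu',
                          max_iter=n_iterations, random_state=42)
    with warnings.catch_warnings():
        # Stopping before convergence is intentional here, so silence the warning
        warnings.simplefilter("ignore", category=ConvergenceWarning)
        model.fit(X, y)
    return model.intercepts_[0][0]


for n in range(1, 42, 10):
    print(n, first_bias_after(n))
```

Fixing `random_state` gives every run the same initial weights and the same shuffling, so the values returned for different iteration counts behave like snapshots of a single training run. If training converges before `max_iter` is reached, larger counts return the same value.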

In [None]:
# write a loop that adds the value of the first bias in the first hidden layer
# for models trained for 1 to 150 iterations, in increments of 10

In [None]:
# plot the value of the first bias in the first hidden layer 
# as a function of the number of iterations