# MLF Week 3: Neural Networks Part 1 - Foundations

This notebook accompanies the **MLF Week 3 Slides** and focuses on the foundations of neural networks, including **perceptrons and single-layer networks**, **activation functions**, **forward propagation**, and **loss functions**, together with their implementation in PyTorch.

**You will:**
- Define simple PyTorch models using **torch.nn.Linear** and activation functions (**torch.nn.ReLU**, **torch.nn.Tanh**)
- Manually compute forward passes
- Implement a simple 2-hidden-layer network to solve a basic classification task
- Plot decision boundaries and predictions
- Build and train an MLP for classification using the MNIST dataset 

In [None]:
#Install the necessary libraries (scikit-learn is used to generate the toy dataset in Exercise 3)
%pip install torch
%pip install torchvision
%pip install matplotlib
%pip install scikit-learn

In [None]:
#Import the necessary libraries that were installed 
import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
import matplotlib.pyplot as plt

# Building Neural Networks

Using PyTorch, we can build neural networks in a layer-by-layer manner. These layers each perform operations on our input data. The **torch.nn** package provides all the necessary building blocks to build any neural network.

We define the neural network as a **Python class** that subclasses **nn.Module**. For a simple feed-forward neural network (FFNN), we initialize the layers in **`__init__`** and define a method called **forward** to compute the forward pass.

Note: the method that computes the forward pass **must be named `forward`**. Calling `model(x)` runs `nn.Module.__call__`, which in turn calls `self.forward(x)`; a method with any other name will not be used for the forward pass.

The following is an example of a simple neural network that consists of 1 input layer, 1 hidden layer, and 1 output layer.

In [None]:
class simpleNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.layer1 = nn.Linear(in_features=2, out_features=5) #input_layer dim = 2, hidden_layer dim = 5
        self.relu = nn.ReLU() #ReLU activation function (the parentheses create a module instance)
        self.layer2 = nn.Linear(in_features=5, out_features=1) #hidden_layer dim = 5, output_layer dim = 1
    
    def forward(self, x):
        x = self.layer1(x) #data passes from input to hidden layer
        x = self.relu(x) #relu activation function is applied to output of hidden layer
        x = self.layer2(x) #data passes from hidden layer to output layer
        return x #output from the output layer
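As a quick sanity check, the sketch below builds a network with the same 2 → 5 → 1 layout (named `TinyNet` here purely for illustration), passes a batch of 4 samples through it, and rebuilds the same architecture with `nn.Sequential` to count its trainable parameters. It assumes only `torch` is installed.

```python
import torch
import torch.nn as nn

class TinyNet(nn.Module):  # hypothetical name; same layout as simpleNN above
    def __init__(self):
        super().__init__()
        self.layer1 = nn.Linear(2, 5)
        self.relu = nn.ReLU()  # an instance of the module, not the class itself
        self.layer2 = nn.Linear(5, 1)

    def forward(self, x):
        return self.layer2(self.relu(self.layer1(x)))

torch.manual_seed(0)
tiny = TinyNet()
x = torch.randn(4, 2)   # batch of 4 samples with 2 features each
out = tiny(x)           # tiny(x) calls nn.Module.__call__, which runs forward(x)
print(out.shape)        # torch.Size([4, 1])

# The same architecture built with nn.Sequential
seq = nn.Sequential(nn.Linear(2, 5), nn.ReLU(), nn.Linear(5, 1))
n_params = sum(p.numel() for p in seq.parameters())
print(n_params)         # (2*5 + 5) + (5*1 + 1) = 21
```

`nn.Sequential` is convenient when data simply flows through the layers in order; subclassing `nn.Module` becomes necessary once `forward` needs custom logic.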

## Exercise 1: Now it's your turn!

TODO: Define and build a simple PyTorch neural network model using torch.nn.Linear, torch.nn.ReLU, and torch.nn.Tanh.


In [None]:
#A sample solution could be:
class NN(nn.Module):
    def __init__(self):
        super().__init__()
        self.layer1 = nn.Linear(in_features=2, out_features=8) 
        self.relu = nn.ReLU()
        self.layer2 = nn.Linear(in_features=8, out_features=16) 
        self.layer3 = nn.Linear(in_features=16, out_features=1)
        self.tanh = nn.Tanh()
    
    def forward(self, x):
        x = self.layer1(x)
        x = self.relu(x)
        x = self.layer2(x)
        x = self.relu(x)
        x = self.layer3(x)
        x = self.tanh(x)
        return x
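The sample solution mixes `nn.ReLU` and `nn.Tanh`. To see what each activation does on its own, the short sketch below applies them, plus `nn.Sigmoid` (used later in Exercise 3), to a handful of hand-picked values.

```python
import torch
import torch.nn as nn

z = torch.tensor([-2.0, -0.5, 0.0, 0.5, 2.0])

relu_out = nn.ReLU()(z)         # max(0, z), elementwise
tanh_out = nn.Tanh()(z)         # squashes values into (-1, 1), symmetric around 0
sigmoid_out = nn.Sigmoid()(z)   # squashes values into (0, 1), equals 0.5 at z = 0

print(relu_out.tolist())        # [0.0, 0.0, 0.0, 0.5, 2.0]
print(tanh_out)
print(sigmoid_out)
```

Because `tanh` keeps outputs in (-1, 1), using it as the last layer (as in the sample solution) bounds the model's output to that range.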

## Forward propagation

As a reminder, forward propagation refers to how input data flows through the network and is transformed by each layer it passes through.

Forward propagation at a given layer $l$ can be represented using the following formula:
\begin{equation}
\mathbf{z}^{[l]} = \phi^{[l]}\left(\mathbf{W}^{[l]} \mathbf{x}^{[l-1]} + \mathbf{b}^{[l]}  \right)
\end{equation}
Where:
- $\mathbf{x}^{[l-1]}$ is the input to layer $l$, i.e. the post-activation output of the previous layer $l-1$
- $\mathbf{W}^{[l]}$ and $\mathbf{b}^{[l]}$ are the weights and biases at layer $l$
- $\phi^{[l]}$ is the activation function at layer $l$
- $\mathbf{z}^{[l]}$ is the post-activation output of layer $l$, which becomes the input to layer $l+1$
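As a worked example of the formula, the sketch below evaluates one layer by hand with small, arbitrarily chosen numbers (ReLU as $\phi^{[l]}$) and checks the result against `nn.Linear`, which stores $\mathbf{W}^{[l]}$ with shape `(out_features, in_features)` and computes $\mathbf{x}\mathbf{W}^\top + \mathbf{b}$.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

W = torch.tensor([[1.0, -1.0],
                  [2.0,  0.5]])   # W^[l], shape (out_features=2, in_features=2)
b = torch.tensor([0.5, -1.0])     # b^[l]
x = torch.tensor([1.0, 2.0])      # x^[l-1]

pre = W @ x + b                   # [1 - 2 + 0.5, 2 + 1 - 1] = [-0.5, 2.0]
z = F.relu(pre)                   # phi = ReLU  ->  z^[l] = [0.0, 2.0]
print(z.tolist())                 # [0.0, 2.0]

# nn.Linear holds the same W and b and gives the same result
layer = nn.Linear(2, 2)
with torch.no_grad():
    layer.weight.copy_(W)
    layer.bias.copy_(b)
z_linear = F.relu(layer(x))
```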



## Exercise 2: Manually compute a forward pass for a simple neural network

TODO: Complete the function below so that it manually computes the forward pass of a simple neural network with 1 hidden layer. Use ReLU as the hidden-layer activation function, and do not apply an activation function to the output.

In [None]:
def manual_forward_pass(x, w1, w2, b1, b2):
    #TODO: Complete the function
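    z1 = w1 @ x + b1    #hidden layer pre-activation: W1 x + b1
    h1 = F.relu(z1)     #ReLU activation on the hidden layer
    z2 = w2 @ h1 + b2   #output layer: W2 h1 + b2
    y = z2              #no activation function on the output
    return y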
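One way to check a manual forward pass without the grading helper is to compare it against the same computation performed by PyTorch layers. The sketch below assumes a randomly initialized network with 3 inputs, 4 hidden units, and 2 outputs, copies its weights into a reference implementation (named `reference_forward_pass` here, a hypothetical helper, so your own `manual_forward_pass` is not overwritten), and compares the two outputs.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def reference_forward_pass(x, w1, w2, b1, b2):  # hypothetical helper for checking
    h1 = F.relu(w1 @ x + b1)   # hidden layer with ReLU
    return w2 @ h1 + b2        # linear output, no activation

torch.manual_seed(0)
net = nn.Sequential(nn.Linear(3, 4), nn.ReLU(), nn.Linear(4, 2))
x = torch.randn(3)

with torch.no_grad():
    expected = net(x)
    ours = reference_forward_pass(x, net[0].weight, net[2].weight, net[0].bias, net[2].bias)

print(torch.allclose(ours, expected, atol=1e-6))  # True
```

Calling your own `manual_forward_pass` with the same arguments should match `expected` as well.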

To check whether your implementation of `manual_forward_pass` is correct, run the following cell, which grades your implementation.

In [None]:
from utils import test_exercise_2, show_result

res = test_exercise_2(manual_forward_pass)
show_result("Exercise 2", res)

## Exercise 3: Basic classification task

In this exercise you will build a simple 2-hidden-layer neural network to solve a basic classification task. The classification task will be to classify data points that are grouped into two concentric circles, similar to what you have seen in this week's presentation.

## 3.1 Generate the dataset

The code below generates the dataset and visualizes it.

In [None]:
#Run the following cell to create the dataset

from sklearn.datasets import make_circles

X, y = make_circles(n_samples=2000, factor=0.2, noise=0.1, random_state=42)
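Before training, it can help to check what `make_circles` returns. The sketch below regenerates the same data under the names `X_demo`/`y_demo` (chosen so the notebook's `X` and `y` are left untouched) and shows that the class depends on the distance from the origin, a boundary that no single straight line can draw, which is why hidden layers with nonlinear activations are needed here.

```python
import numpy as np
from sklearn.datasets import make_circles

X_demo, y_demo = make_circles(n_samples=2000, factor=0.2, noise=0.1, random_state=42)
print(X_demo.shape, y_demo.shape)   # (2000, 2) (2000,)
print(np.bincount(y_demo))          # [1000 1000]: the two classes are balanced

# Distance of each point from the origin: label 0 is the outer circle, label 1 the inner one
radii = np.linalg.norm(X_demo, axis=1)
print(radii[y_demo == 0].mean(), radii[y_demo == 1].mean())
```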

In [None]:
#Run the following cell to visualize the data

X_class0 = X[y == 0]
X_class1 = X[y == 1]
plt.figure(figsize=(6, 6))
plt.scatter(X_class0[:, 0], X_class0[:, 1], color='blue', label='Class 0', edgecolor='k', s=40)
plt.scatter(X_class1[:, 0], X_class1[:, 1], color='red', label='Class 1', edgecolor='k', s=40)
plt.title("2D Circle Classification Data")
plt.xlabel("Feature 1")
plt.ylabel("Feature 2")
plt.grid(True)
plt.legend()
plt.show()

## 3.2 Perform a train-test split

In [None]:
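#Split the data into train and test sets, then standardize the features

from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

y = y.reshape(-1, 1)  #column vector, so labels match the model's (N, 1) output shape

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

#fit the scaler on the training set only, then apply it to both sets (avoids leaking test statistics)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

X_train = torch.tensor(X_train, dtype=torch.float32)
y_train = torch.tensor(y_train, dtype=torch.float32)
X_test = torch.tensor(X_test, dtype=torch.float32)
y_test = torch.tensor(y_test, dtype=torch.float32)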

## 3.3 Design Neural Network

TODO: Design a neural network called FFNN with 2 hidden layers that performs binary classification on the dataset we created earlier.

Note: Since we are performing a binary classification task, the output layer should use a sigmoid activation function, so that the model outputs a probability between 0 and 1 of belonging to class 1.
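Because the sigmoid is monotonic and $\sigma(0) = 0.5$, thresholding the predicted probability at 0.5 gives the same decision as checking whether the final layer's raw output (the logit) is non-negative. The sketch below illustrates this on three made-up logits.

```python
import torch

logits = torch.tensor([[-3.0], [0.0], [2.5]])   # made-up raw outputs of the last nn.Linear, shape (N, 1)
probs = torch.sigmoid(logits)                   # 1 / (1 + exp(-logit)), always in (0, 1)
preds = (probs >= 0.5).int()                    # the 0.5 threshold used in the training loop

print(probs.squeeze().tolist())
print(preds.squeeze().tolist())                 # [0, 1, 1]
print(torch.equal(preds, (logits >= 0).int()))  # True: same decision as a threshold of 0 on the logit
```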

In [None]:
class FFNN(nn.Module):
    #TODO: Build a model that can perform this classification task
    def __init__(self):
        super().__init__()
        #An example solution could be
        self.layer1 = nn.Linear(2, 16)
        self.relu = nn.ReLU()
        self.layer2 = nn.Linear(16, 32)
        self.layer3 = nn.Linear(32, 1)
        self.sigmoid = nn.Sigmoid()
    
    def forward(self, x):
        x = self.layer1(x)
        x = self.relu(x)
        x = self.layer2(x)
        x = self.relu(x)
        x = self.layer3(x)
        x = self.sigmoid(x)
        return x

## 3.4 Instantiate and train the model

The code below instantiates and trains the model. Since we are performing binary classification, the loss function we use should be binary cross-entropy loss. In addition, we are also using Adam as an optimizer to learn the neural network weights and biases. 

Feel free to change the number of epochs to see how it changes the model accuracy.

Do not worry if you are struggling to understand this code; you will learn how neural networks learn their weights and biases through backpropagation in next week's session.
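For reference, binary cross-entropy averages $-[y \log p + (1-y)\log(1-p)]$ over the samples, where $p$ is the sigmoid output and $y \in \{0, 1\}$ is the label. The sketch below computes it by hand on three made-up predictions and compares the result with `nn.BCELoss`, which expects predictions and targets of the same shape.

```python
import torch
import torch.nn as nn

probs = torch.tensor([[0.9], [0.2], [0.6]])     # made-up model outputs after the sigmoid, shape (N, 1)
targets = torch.tensor([[1.0], [0.0], [0.0]])   # labels with the same (N, 1) shape

# BCE = -mean( y * log(p) + (1 - y) * log(1 - p) )
manual = -(targets * torch.log(probs) + (1 - targets) * torch.log(1 - probs)).mean()
builtin = nn.BCELoss()(probs, targets)
print(manual.item(), builtin.item())
```

The third sample (predicted 0.6 for a true label of 0) contributes the largest term, which is how the loss penalizes confident mistakes more than small ones.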

In [None]:
model = FFNN()
criterion = nn.BCELoss()
optimizer = optim.Adam(model.parameters(), lr=0.01)

epochs = 15
for epoch in range(epochs):
    model.train()
    optimizer.zero_grad()
    outputs = model(X_train)
    loss = criterion(outputs, y_train)
    loss.backward()
    optimizer.step()
    with torch.no_grad():
        preds = (outputs >= 0.5).int()  #threshold the predicted probabilities at 0.5
        correct = (preds == y_train).sum().item()
        total = y_train.size(0)
        accuracy = correct / total

    print(f"{epoch+1}/{epochs}, Loss: {loss.item():.4f}, Accuracy: {accuracy:.4f}")
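The loop above only reports training accuracy. The self-contained sketch below retrains a network with the same 2 → 16 → 32 → 1 layout on the circles data for more epochs (200, an arbitrary choice) and then measures accuracy on the held-out test set in `eval` mode under `torch.no_grad()`. The variable names (`circle_model`, `Xc`, `yc`, ...) are chosen so the notebook's own variables are not overwritten, and exact numbers will vary with the seed and epoch count.

```python
import torch
import torch.nn as nn
import torch.optim as optim
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

torch.manual_seed(0)
Xc, yc = make_circles(n_samples=2000, factor=0.2, noise=0.1, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(Xc, yc.reshape(-1, 1), test_size=0.2, random_state=42)
scaler_c = StandardScaler().fit(X_tr)   # statistics from the training split only
X_tr = torch.tensor(scaler_c.transform(X_tr), dtype=torch.float32)
X_te = torch.tensor(scaler_c.transform(X_te), dtype=torch.float32)
y_tr = torch.tensor(y_tr, dtype=torch.float32)
y_te = torch.tensor(y_te, dtype=torch.float32)

# Same layout as FFNN, written with nn.Sequential
circle_model = nn.Sequential(
    nn.Linear(2, 16), nn.ReLU(),
    nn.Linear(16, 32), nn.ReLU(),
    nn.Linear(32, 1), nn.Sigmoid(),
)
criterion_c = nn.BCELoss()
optimizer_c = optim.Adam(circle_model.parameters(), lr=0.01)

for epoch in range(200):   # full-batch training, as in the loop above
    circle_model.train()
    optimizer_c.zero_grad()
    loss_c = criterion_c(circle_model(X_tr), y_tr)
    loss_c.backward()
    optimizer_c.step()

circle_model.eval()
with torch.no_grad():
    test_preds_c = (circle_model(X_te) >= 0.5).float()
    test_acc = (test_preds_c == y_te).float().mean().item()
print(f"Final train loss: {loss_c.item():.4f}, test accuracy: {test_acc:.4f}")
```

Comparing training and test accuracy like this is a simple way to spot overfitting when you change the number of epochs.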

## 3.5 Visualizing decision boundary and predictions

Run the cell below to see how the neural network classifies the data points in the test set and to visualize its decision boundary.

In [None]:
import numpy as np

#NumPy copies of the test data for plotting and comparisons
X_test_np = X_test.numpy()
y_test_np = y_test.squeeze().numpy().astype(int)

#build a 300 x 300 grid covering the test data, with a 0.5 margin on each side
x_min, x_max = X_test_np[:, 0].min() - 0.5, X_test_np[:, 0].max() + 0.5
y_min, y_max = X_test_np[:, 1].min() - 0.5, X_test_np[:, 1].max() + 0.5
xx, yy = np.meshgrid(np.linspace(x_min, x_max, 300),
                     np.linspace(y_min, y_max, 300))

#flatten and create grid tensor
grid = np.c_[xx.ravel(), yy.ravel()]
grid_tensor = torch.FloatTensor(grid)

#model predicts the probability of class 1 at every grid point, based on what it has learnt
model.eval()
with torch.no_grad():
    probs = model(grid_tensor).reshape(xx.shape).numpy()

#model runs predictions on test data
with torch.no_grad():
    test_probs = model(X_test)
    test_preds = (test_probs >= 0.5).int().squeeze().numpy()
correct = (test_preds == y_test_np)

#plot decision boundary: regions where the predicted probability is below / above 0.5
plt.figure(figsize=(8, 6))
plt.contourf(xx, yy, probs, levels=[0, 0.5, 1], cmap='RdBu', alpha=0.6)

#plot correct predictions (green), incorrect (red)
plt.scatter(X_test_np[correct, 0], X_test_np[correct, 1], c='green', label='Correct', edgecolors='k', s=40)
plt.scatter(X_test_np[~correct, 0], X_test_np[~correct, 1], c='red', label='Incorrect', edgecolors='k', s=40)

plt.title("Decision Boundary with Test Predictions")
plt.xlabel("Feature 1")
plt.ylabel("Feature 2")
plt.legend()
plt.grid(True)
plt.show()
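The decision-boundary plot relies on a common NumPy pattern: build a 2D grid with `np.meshgrid`, flatten it into an `(N, 2)` array of points for the model, then reshape the per-point outputs back to the grid shape for `plt.contourf`. The toy sketch below shows the shapes involved on a 3 × 3 grid, using `x + y` as a stand-in for the model's output.

```python
import numpy as np

xs = np.array([0.0, 1.0, 2.0])
ys = np.array([10.0, 20.0, 30.0])
xx, yy = np.meshgrid(xs, ys)           # both (3, 3); xx varies along columns, yy along rows

grid = np.c_[xx.ravel(), yy.ravel()]   # one (x, y) row per grid point, shape (9, 2)
print(grid.shape)                      # (9, 2)
print(grid[:4].tolist())               # [[0.0, 10.0], [1.0, 10.0], [2.0, 10.0], [0.0, 20.0]]

# Stand-in for model(grid_tensor): any per-point score, here x + y
scores = grid[:, 0] + grid[:, 1]
scores_2d = scores.reshape(xx.shape)   # back to (3, 3), aligned with xx and yy
```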

## Exercise 4: Build and train an MLP for the MNIST dataset

The MNIST dataset consists of 28×28 grayscale images (i.e., 2D tensors) of handwritten digits from 0 to 9. In this exercise, you will build and train a neural network that performs multi-class classification on this larger and more challenging dataset. Feel free to experiment with any number of hidden layers and nodes in your implementation.

Note: Since each input image is 2D, you will need to flatten it into a 1D vector of 28 × 28 = 784 values before feeding it into the neural network model.
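With `transforms.ToTensor()`, each grayscale MNIST image becomes a tensor of shape `(1, 28, 28)`, so a batch has shape `(batch, 1, 28, 28)`. The sketch below uses random tensors as a stand-in for real images to show what `nn.Flatten` does to such a batch.

```python
import torch
import torch.nn as nn

images = torch.rand(32, 1, 28, 28)   # stand-in batch: (batch, channels, height, width)
flat = nn.Flatten()(images)          # flattens every dimension except the batch dimension
print(flat.shape)                    # torch.Size([32, 784])

# Equivalent reshape: keep the batch size and infer the rest
same = images.view(images.size(0), -1)
```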

In [None]:
#Import libraries for the dataset
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

## 4.1 Prepare the MNIST data by creating the dataloaders

In [None]:
#Load the predefined MNIST training and test sets and wrap them in DataLoaders (batches of 1024 images)
transform = transforms.ToTensor()
train_dataset = datasets.MNIST(root='./data', train=True, download=True, transform=transform)
test_dataset = datasets.MNIST(root='./data', train=False, download=True, transform=transform)
train_loader = DataLoader(train_dataset, batch_size=1024, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=1024, shuffle=False)
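With `batch_size=1024`, each pass over a `DataLoader` yields mini-batches of `(images, labels)`, and the last batch holds whatever samples are left over. Downloading MNIST is not needed to see this, so the sketch below uses a small synthetic `TensorDataset` (3,000 random 1 × 28 × 28 tensors with random labels, an assumption made purely for illustration).

```python
import math
import torch
from torch.utils.data import DataLoader, TensorDataset

# Synthetic stand-in: 3000 fake "images" with integer labels 0-9
fake_images = torch.rand(3000, 1, 28, 28)
fake_labels = torch.randint(0, 10, (3000,))
dataset = TensorDataset(fake_images, fake_labels)

loader = DataLoader(dataset, batch_size=1024, shuffle=True)
print(len(loader), math.ceil(3000 / 1024))   # 3 3

batch_sizes = [images.size(0) for images, labels in loader]
print(batch_sizes)                           # [1024, 1024, 952]
```

The same arithmetic applies to the real training set: `len(train_loader)` is the number of batches per epoch, which is why the training loop divides the summed loss by it.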
num_output_classes = 10

## 4.2 Designing the neural network model

TODO: Design a neural network model to perform multi-class classification

In [None]:
class MNISTNN(nn.Module):
    #TODO: Define the MLP Model
    def __init__(self):
        super().__init__()
        #An example solution could be:
        self.flatten = nn.Flatten() 
        self.layer1 = nn.Linear(28*28, 128)
        self.layer2 = nn.Linear(128, 64)
        self.layer3 = nn.Linear(64, num_output_classes)
    
    def forward(self, x):
        x = self.flatten(x)
        x = F.relu(self.layer1(x))
        x = F.relu(self.layer2(x))
        x = self.layer3(x)
        return x
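To get a feel for the size of the example solution, the sketch below rebuilds the same layer sizes with `nn.Sequential`, checks the output shape on a stand-in batch of random images, and counts the trainable parameters. The output has one raw score (logit) per class and no softmax, which is what `nn.CrossEntropyLoss` expects.

```python
import torch
import torch.nn as nn

num_output_classes = 10

# Same layer sizes as MNISTNN, expressed with nn.Sequential
mlp = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 128), nn.ReLU(),
    nn.Linear(128, 64), nn.ReLU(),
    nn.Linear(64, num_output_classes),
)

logits = mlp(torch.rand(8, 1, 28, 28))   # stand-in batch of 8 images
print(logits.shape)                      # torch.Size([8, 10])

n_params = sum(p.numel() for p in mlp.parameters())
print(n_params)   # (784*128 + 128) + (128*64 + 64) + (64*10 + 10) = 109386
```

Most of the parameters sit in the first layer, since every one of the 784 input pixels connects to each of the 128 hidden units.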

## 4.3 Instantiate and train the model

Training is faster on a GPU, so you may prefer to run this exercise in Google Colab, which offers GPU runtimes.


In [None]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(device)

In [None]:
model = MNISTNN().to(device)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=1e-3)

epochs = 10
for epoch in range(epochs):
    model.train()
    train_loss = 0
    correct = 0
    total = 0
    for data, target in train_loader:
        data, target = data.to(device), target.to(device)
        optimizer.zero_grad()
        output = model(data)
        loss = criterion(output, target)
        loss.backward()
        optimizer.step()

        train_loss += loss.item()
        _, predicted = torch.max(output, 1)
        total += target.size(0)
        correct += (predicted == target).sum().item()
    accuracy = correct / total
    print(f"{epoch+1}/{epochs}, Loss: {train_loss/len(train_loader):.4f}, Accuracy: {accuracy*100:.2f}%")
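Unlike the binary case, `nn.CrossEntropyLoss` takes the raw logits from the last `nn.Linear` layer (no softmax inside the model) together with integer class labels. It combines log-softmax with the negative log-likelihood loss; the sketch below reproduces that on two made-up samples with three classes and shows how `torch.max(output, 1)` picks the predicted class.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

logits = torch.tensor([[2.0, 0.5, -1.0],
                       [0.1, 0.2, 3.0]])   # 2 samples, 3 classes, made-up raw model outputs
targets = torch.tensor([0, 2])             # integer class indices, not one-hot vectors

ce = nn.CrossEntropyLoss()(logits, targets)

# Same value by hand: mean negative log-probability of the correct class
log_probs = F.log_softmax(logits, dim=1)
manual = -log_probs[torch.arange(2), targets].mean()

_, predicted = torch.max(logits, 1)        # index of the largest logit in each row
print(ce.item(), manual.item(), predicted.tolist())   # predicted is [0, 2]
```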

## 4.4 Evaluate the model


In [None]:
model.eval()
correct = 0
total = 0
with torch.no_grad():
    for data, target in test_loader:
        data, target = data.to(device), target.to(device)
        outputs = model(data)
        _, predicted = torch.max(outputs, 1)
        total += target.size(0)
        correct += (predicted == target).sum().item()

print(f"Test accuracy: {100 * correct/total:.2f}%")

## 5 Conclusion

You now have:
- Built a simple neural network using torch.nn.Linear layers and various activation functions
- Manually computed forward passes between layers of a neural network
- Used neural networks to perform classification tasks

**Great job on completing Week 3!**