## Implementation of a Feed-Forward Neural Network

### Task (a): Constructing a Custom Feed-Forward Neural Network

In this section, we develop a feed-forward neural network (FNN) with three hidden layers using PyTorch. The network follows the given specifications:

- **Input Layer**: Accepts a flattened 28×28 FashionMNIST image (784 input features).
- **Hidden Layers**:
  - **Layer 1**: 512 neurons
  - **Layer 2**: 256 neurons
  - **Layer 3**: 128 neurons
- **Output Layer**: 10 neurons (corresponding to 10 FashionMNIST classes).
- **Activation Function**: Applied after each hidden layer.
- **Dropout**: Used to improve generalization and prevent overfitting.
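
To make the dropout bullet concrete, here is a minimal pure-Python sketch of inverted dropout, the scheme `nn.Dropout` follows: during training each value is zeroed with probability `p` and the survivors are scaled by `1 / (1 - p)`, while in evaluation mode the input passes through unchanged. The helper `inverted_dropout` and the variable names are illustrative assumptions, not part of the notebook.

```python
import random

def inverted_dropout(values, p, training, rng):
    """Illustrative inverted dropout on a list of floats (hypothetical helper)."""
    if not training or p == 0.0:
        # In evaluation mode dropout is the identity.
        return list(values)
    keep_scale = 1.0 / (1.0 - p)
    # Each element is zeroed with probability p; survivors are rescaled
    # so the expected value of every element stays the same.
    return [0.0 if rng.random() < p else v * keep_scale for v in values]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
hidden_values = [1.0] * 10
train_out = inverted_dropout(hidden_values, p=0.2, training=True, rng=rng)
eval_out = inverted_dropout(hidden_values, p=0.2, training=False, rng=rng)
print(train_out)  # every entry is either 0.0 or 1.25
print(eval_out)   # identical to hidden_values
```

This is why the training code calls `model.train()` before each epoch and `model.eval()` before testing: the same layer behaves differently in the two modes.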

### Implementation Steps:
1. **Define the `CustomFeedForwardNN` class**:
   - Uses `nn.ModuleList` to handle multiple hidden layers dynamically.
   - Applies the activation function and dropout after each hidden layer.
   - Uses a final linear layer to produce class logits.

2. **Instantiate the Model**:
   - Input size is 784 (28×28).
   - Uses ReLU as the activation function (which can be varied later).
   - Applies a dropout rate of 0.2.

3. **Verify Model Summary**:
   - Uses `torchinfo.summary` to inspect the architecture and parameter count.
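
As a cross-check on the parameter counts that `torchinfo.summary` reports, the numbers can be worked out by hand from the layer widths. The sketch below assumes only the architecture listed above; `linear_param_count` is a hypothetical helper name.

```python
# Layer widths from the specification: input, three hidden layers, output.
layer_sizes = [784, 512, 256, 128, 10]

def linear_param_count(fan_in, fan_out):
    # A fully connected layer has one weight per input/output pair
    # plus one bias per output unit.
    return fan_in * fan_out + fan_out

per_layer = [
    linear_param_count(fan_in, fan_out)
    for fan_in, fan_out in zip(layer_sizes[:-1], layer_sizes[1:])
]
total_params = sum(per_layer)

print(per_layer)     # [401920, 131328, 32896, 1290]
print(total_params)  # 567434
```

The first three values agree with the `Linear` rows in the summary output shown later in this section; ReLU and dropout contribute no trainable parameters.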


In [5]:
import torch.nn as nn

class CustomFeedForwardNN(nn.Module):

  def __init__(self, input_size, num_classes, hidden_dims, dropout, activation_fn):
    super().__init__()

    # Ensure that hidden_dims is a non-empty list
    assert isinstance(hidden_dims, list) and len(hidden_dims) > 0

    # Initialize a ModuleList to store the hidden layers
    self.hidden_layers = nn.ModuleList()

    # Input layer to first hidden layer
    self.hidden_layers.append(nn.Linear(input_size, hidden_dims[0]))

    # Subsequent hidden layers
    for i in range(1, len(hidden_dims)):
      self.hidden_layers.append(nn.Linear(hidden_dims[i-1], hidden_dims[i]))

    # Set up the nonlinearity to use between layers.
    self.nonlinearity = activation_fn

    # Set up the dropout layer.
    self.dropout = nn.Dropout(dropout)

    # Set up the final linear projection to class logits
    # (nn.CrossEntropyLoss applies log-softmax internally, so no softmax here).
    self.output_projection = nn.Linear(hidden_dims[-1], num_classes)



  def forward(self, x):
    
    # Apply the hidden layers, nonlinearity, and dropout.
    for hidden_layer in self.hidden_layers:
      x = hidden_layer(x)
      x = self.nonlinearity(x)
      x = self.dropout(x)
      
    # Output logits
    out = self.output_projection(x)

    return out

In [6]:
import torch.nn as nn

# Define the model parameters
input_size = 28 * 28  # For 28x28 pixel images
hidden_dims = [512, 256, 128]
num_classes = 10  # Number of output classes in FashionMNIST
dropout_rate = 0.2
activation_fn = nn.ReLU()  # Example activation function

# Instantiate the model
model = CustomFeedForwardNN(input_size, num_classes, hidden_dims, dropout_rate, activation_fn)


In [7]:
# Verify the model architecture and parameter count

from torchinfo import summary
summary(model, input_size=(1, input_size))

Layer (type:depth-idx)                   Output Shape              Param #
CustomFeedForwardNN                      [1, 10]                   --
├─ModuleList: 1-7                        --                        (recursive)
│    └─Linear: 2-1                       [1, 512]                  401,920
├─ReLU: 1-2                              [1, 512]                  --
├─Dropout: 1-3                           [1, 512]                  --
├─ModuleList: 1-7                        --                        (recursive)
│    └─Linear: 2-2                       [1, 256]                  131,328
├─ReLU: 1-5                              [1, 256]                  --
├─Dropout: 1-6                           [1, 256]                  --
├─ModuleList: 1-7                        --                        (recursive)
│    └─Linear: 2-3                       [1, 128]                  32,896
├─ReLU: 1-8                              [1, 128]                  --
├─Dropout: 1-9                           [1,

## Experimenting with Activation Functions and Optimizers

### Task (b): Evaluating Model Performance with Different Configurations

In this section, we experiment with various activation functions and optimizers to evaluate their impact on model performance. The goal is to determine which combination yields the best test accuracy on the FashionMNIST dataset.

### Experimental Setup:
1. **Dataset Loading**:
   - The **FashionMNIST** dataset is loaded and transformed into tensors.
   - Training and testing data are handled using PyTorch `DataLoader` with a batch size of **64**.

2. **Activation Functions Tested**:
   - **ReLU** (Rectified Linear Unit)
   - **Sigmoid**
   - **Tanh**

3. **Optimizers Tested**:
   - **SGD (Stochastic Gradient Descent)** with momentum 0.9
   - **Adam (Adaptive Moment Estimation)**

4. **Training and Evaluation**:
   - Each model is trained for **5 epochs** with a learning rate of **0.001**, using a **cross-entropy loss function**.
   - Training accuracy and loss are monitored per epoch.
   - After training, the model is evaluated on the test set, and the test accuracy is recorded.

5. **Result Storage**:
   - The results (activation function, optimizer type, and test accuracy) are stored in a Pandas DataFrame for comparison.
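
The setup above amounts to a small grid search. A minimal sketch of how the configurations enumerate, using only the activation and optimizer names listed in this section (the variable names are illustrative):

```python
from itertools import product

activation_names = ["ReLU", "Sigmoid", "Tanh"]
optimizer_names = ["SGD", "Adam"]

# Every activation is paired with every optimizer: 3 x 2 = 6 training runs.
configurations = list(product(activation_names, optimizer_names))
for activation_name, optimizer_name in configurations:
    print(f"{activation_name} + {optimizer_name}")

epochs_per_run = 5
print(len(configurations))                   # 6
print(len(configurations) * epochs_per_run)  # 30 epochs of training in total
```

Each run starts from a freshly initialized model, so the comparison is between configurations rather than between stages of one training run.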


In [17]:
import torch
from torchvision import transforms
from torchvision.datasets import FashionMNIST
from torch.utils.data import DataLoader

# Load the FashionMNIST data

# Define Transformation
transform = transforms.ToTensor()

train_dataset = FashionMNIST(root='./torchvision-data',
                             train=True,
                             transform=transform,
                             download=True)

test_dataset = FashionMNIST(root='./torchvision-data', 
                            train=False,
                            transform=transform,
                            download=True)

# Create DataLoaders
train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=64, shuffle=False)

# Simple GPU check
using_GPU = torch.cuda.is_available()
print("Using GPU?", using_GPU)

Using GPU? False
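
For orientation, the number of batches each loader yields per pass can be worked out from the dataset sizes. This sketch assumes the standard FashionMNIST split (60,000 training and 10,000 test images) and the `DataLoader` default of keeping the final partial batch.

```python
import math

train_size, test_size = 60_000, 10_000  # standard FashionMNIST split sizes
batch_size = 64

# With drop_last=False (the DataLoader default) the last partial batch is kept,
# so the batch count is the ceiling of size / batch_size.
train_batches = math.ceil(train_size / batch_size)
test_batches = math.ceil(test_size / batch_size)
last_train_batch = train_size - (train_batches - 1) * batch_size

print(train_batches, test_batches, last_train_batch)  # 938 157 32
```

Because the last training batch holds only 32 images, dividing the summed batch losses by `len(train_loader)` gives a close approximation of the per-sample mean loss rather than the exact value.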


In [14]:
# Activation Functions
data = torch.randn(2, 3)
print(data)

# Nonlinearities are layers too!
relu = nn.ReLU()
print(relu)
print(relu(data))

tanh = nn.Tanh()
print(tanh)
print(tanh(data))

sigmoid = nn.Sigmoid()
print(sigmoid)
print(sigmoid(data))


activations = [nn.ReLU(), nn.Sigmoid(), nn.Tanh()]
optimizers = ["SGD", "Adam"]


tensor([[ 0.7628,  1.3272, -0.0509],
        [ 0.0763,  0.2536,  0.2860]])
ReLU()
tensor([[0.7628, 1.3272, 0.0000],
        [0.0763, 0.2536, 0.2860]])
Tanh()
tensor([[ 0.6427,  0.8686, -0.0509],
        [ 0.0761,  0.2483,  0.2784]])
Sigmoid()
tensor([[0.6820, 0.7904, 0.4873],
        [0.5191, 0.5631, 0.5710]])
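
The printed values can be checked by hand against the definitions of the three functions. The sketch below uses only the standard library and the first row of the tensor printed above; since those inputs are rounded to four decimals, the results may differ from the PyTorch output in the last digit.

```python
import math

def relu(x):
    # max(0, x): negative inputs are clamped to zero.
    return max(0.0, x)

def sigmoid(x):
    # Squashes any real number into the open interval (0, 1).
    return 1.0 / (1.0 + math.exp(-x))

# First row of the random tensor shown in the cell output.
row = [0.7628, 1.3272, -0.0509]

relu_row = [relu(x) for x in row]
tanh_row = [math.tanh(x) for x in row]  # output range is (-1, 1)
sigmoid_row = [sigmoid(x) for x in row]

print([round(v, 4) for v in relu_row])
print([round(v, 4) for v in tanh_row])
print([round(v, 4) for v in sigmoid_row])
```

Note that only ReLU is unbounded above, and only tanh is zero-centered.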


In [18]:
import torch
import torch.optim as optim

def train_and_evaluate_model(activation_fn, optimizer_type, learning_rate=0.001, num_epochs=5):
    """
    Trains and evaluates the model using the given activation function and optimizer.
    """
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    
    # Instantiate the model
    model = CustomFeedForwardNN(input_size=784, num_classes=10, hidden_dims=[512, 256, 128], dropout=0.2, activation_fn=activation_fn)
    model.to(device)

    # Loss Function
    criterion = nn.CrossEntropyLoss()

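    # Optimizer Selection
    if optimizer_type == "SGD":
        optimizer = optim.SGD(model.parameters(), lr=learning_rate, momentum=0.9)
    elif optimizer_type == "Adam":
        optimizer = optim.Adam(model.parameters(), lr=learning_rate)
    else:
        # Fail early instead of hitting a NameError on `optimizer` later.
        raise ValueError(f"Unsupported optimizer_type: {optimizer_type!r}")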

    # Training Loop
    for epoch in range(num_epochs):
        model.train()
        running_loss = 0.0
        correct = 0
        total = 0
        
        for images, labels in train_loader:
            images = images.view(images.shape[0], -1).to(device)  # Flatten images
            labels = labels.to(device)

            optimizer.zero_grad()
            outputs = model(images)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()

            running_loss += loss.item()
            _, predicted = outputs.max(1)
            correct += (predicted == labels).sum().item()
            total += labels.size(0)

        train_accuracy = 100 * correct / total
        print(f"Epoch [{epoch+1}/{num_epochs}], Loss: {running_loss/len(train_loader):.4f}, Accuracy: {train_accuracy:.2f}%")

    # Evaluate Model
    model.eval()
    correct = 0
    total = 0

    with torch.no_grad():
        for images, labels in test_loader:
            images = images.view(images.shape[0], -1).to(device)
            labels = labels.to(device)
            outputs = model(images)
            _, predicted = outputs.max(1)
            correct += (predicted == labels).sum().item()
            total += labels.size(0)

    test_accuracy = 100 * correct / total
    print(f"Test Accuracy: {test_accuracy:.2f}%")

    return test_accuracy
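
To illustrate what the loss and the `outputs.max(1)` prediction step compute for a single sample, here is a pure-Python sketch of cross-entropy on raw logits. The three-class logits are made-up values for illustration, and `cross_entropy_from_logits` is a hypothetical helper, not the PyTorch implementation.

```python
import math

def cross_entropy_from_logits(logits, target):
    # Numerically stable log-sum-exp: subtract the max before exponentiating.
    max_logit = max(logits)
    log_sum_exp = max_logit + math.log(sum(math.exp(z - max_logit) for z in logits))
    # Negative log-probability of the target class under softmax.
    return log_sum_exp - logits[target]

logits = [2.0, 1.0, 0.1]  # hypothetical scores for a 3-class example

# The predicted class is the index of the largest logit (an argmax).
predicted = max(range(len(logits)), key=lambda i: logits[i])

loss_if_correct = cross_entropy_from_logits(logits, target=0)
loss_if_wrong = cross_entropy_from_logits(logits, target=2)
print(predicted, round(loss_if_correct, 4), round(loss_if_wrong, 4))

# A model that scores all 10 classes equally has loss ln(10), about 2.30.
uniform_loss = cross_entropy_from_logits([0.0] * 10, target=3)
print(round(uniform_loss, 4))
```

The uniform-logits value is a useful baseline when reading training logs: a 10-class loss near 2.30 means the model is still guessing.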

In [None]:

activations = {
    "ReLU": nn.ReLU(),
    "Sigmoid": nn.Sigmoid(),
    "Tanh": nn.Tanh()
}

optimizers = ["SGD", "Adam"]

# Optional sanity check on a single configuration before running all combinations
# (the output below comes from such a run):
# train_and_evaluate_model(activation_fn=nn.ReLU(), optimizer_type="SGD")

Epoch [1/5], Loss: 1.8716, Accuracy: 33.16%
Epoch [2/5], Loss: 0.9570, Accuracy: 62.92%
Epoch [3/5], Loss: 0.7659, Accuracy: 71.23%
Epoch [4/5], Loss: 0.6811, Accuracy: 75.46%
Epoch [5/5], Loss: 0.6209, Accuracy: 78.14%
Test Accuracy: 79.55%


79.55

In [None]:
import pandas as pd

# Collect one row per configuration and build the DataFrame once at the end.
# Repeated pd.concat onto an empty DataFrame is slower and can trigger a
# FutureWarning in recent pandas versions.
results = []

for act_name, act_fn in activations.items():
    for opt in optimizers:
        print(f"\n===== Running: Activation={act_name}, Optimizer={opt} =====")
        acc = train_and_evaluate_model(activation_fn=act_fn, optimizer_type=opt)

        results.append({
            "Activation": act_name,
            "Optimizer": opt,
            "Test Accuracy (%)": acc
        })

results_df = pd.DataFrame(results, columns=["Activation", "Optimizer", "Test Accuracy (%)"])

In [None]:
print(results_df)

### Task (c): Improving the Model with a Learning Rate Scheduler

In [25]:
import torch.optim as optim

def train_with_scheduler(activation_fn, optimizer_type, learning_rate=0.001, num_epochs=15):
    """
    Trains the model with a StepLR learning rate schedule and returns the trained model.
    Test-set evaluation is not performed inside this function.
    """
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    model = CustomFeedForwardNN(input_size=784, num_classes=10, hidden_dims=[512, 256, 128], dropout=0.2, activation_fn=activation_fn)
    model.to(device)

    criterion = nn.CrossEntropyLoss()
    
    # Optimizer Selection
    if optimizer_type == "SGD":
        optimizer = optim.SGD(model.parameters(), lr=learning_rate, momentum=0.9)
    elif optimizer_type == "Adam":
        optimizer = optim.Adam(model.parameters(), lr=learning_rate)
    else:
        # Fail early instead of hitting a NameError on `optimizer` later.
        raise ValueError(f"Unsupported optimizer_type: {optimizer_type!r}")

    # Learning Rate Scheduler
    scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=5, gamma=0.5)  # Halve the LR every 5 epochs

    for epoch in range(num_epochs):
        model.train()
        running_loss = 0.0
        correct = 0
        total = 0

        for images, labels in train_loader:
            images = images.view(images.shape[0], -1).to(device)
            labels = labels.to(device)

            optimizer.zero_grad()
            outputs = model(images)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()

            running_loss += loss.item()
            _, predicted = outputs.max(1)
            correct += (predicted == labels).sum().item()
            total += labels.size(0)

        train_accuracy = 100 * correct / total
        scheduler.step()  # Adjust learning rate

        print(f"Epoch [{epoch+1}/{num_epochs}], Loss: {running_loss/len(train_loader):.4f}, Accuracy: {train_accuracy:.2f}%")

    return model
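
The effect of `StepLR(step_size=5, gamma=0.5)` over the 15 epochs can be tabulated in closed form. This is a sketch of the resulting schedule under the settings above (initial learning rate 0.001, one `scheduler.step()` per epoch); `step_lr` is a hypothetical helper, not a PyTorch function.

```python
def step_lr(initial_lr, epoch, step_size=5, gamma=0.5):
    # Step decay: multiply by gamma once for every completed block of
    # step_size epochs (epoch is 0-based, as in the training loop).
    return initial_lr * gamma ** (epoch // step_size)

schedule = [step_lr(0.001, epoch) for epoch in range(15)]
for epoch, lr in enumerate(schedule, start=1):
    print(f"Epoch {epoch:2d}: lr = {lr:.6f}")
```

Epochs 1 to 5 train at 0.001, epochs 6 to 10 at 0.0005, and epochs 11 to 15 at 0.00025, so the later epochs take progressively smaller steps.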


In [None]:
# Train the improved model
print("Training Improved Model (LR Scheduler, 15 Epochs)")
activation_fn = nn.ReLU()
optimizer_type = "Adam"  # a name string, not an optimizer object

improved_model = train_with_scheduler(activation_fn, optimizer_type, num_epochs=15)


Training Improved Model (LR Scheduler, 15 Epochs)
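
Since `train_with_scheduler` returns the trained model without scoring it, one way to obtain a test accuracy is a small standalone evaluation helper that mirrors the evaluation loop used in Task (b). The sketch below is an assumption about how this could be done: `evaluate_accuracy` is a hypothetical helper, and the random stand-in data and linear model exist only so the block runs on its own.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

def evaluate_accuracy(model, data_loader, device):
    """Return top-1 accuracy (%) of `model` on `data_loader` (hypothetical helper)."""
    model.eval()  # disable dropout for evaluation
    correct, total = 0, 0
    with torch.no_grad():
        for images, labels in data_loader:
            # Flatten each image to a (batch, features) matrix.
            images = images.view(images.shape[0], -1).to(device)
            labels = labels.to(device)
            predicted = model(images).argmax(dim=1)
            correct += (predicted == labels).sum().item()
            total += labels.size(0)
    return 100.0 * correct / total

# Stand-in data and model so the sketch is self-contained; in the notebook,
# pass `improved_model` and `test_loader` instead.
device = torch.device("cpu")
dummy_loader = DataLoader(
    TensorDataset(torch.randn(32, 1, 28, 28), torch.randint(0, 10, (32,))),
    batch_size=8,
)
dummy_model = nn.Linear(28 * 28, 10)
accuracy = evaluate_accuracy(dummy_model, dummy_loader, device)
print(f"Accuracy on stand-in data: {accuracy:.2f}%")
```

In the notebook this would be called as `evaluate_accuracy(improved_model, test_loader, device)`, with `device` chosen the same way as inside the training functions.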
