<a href="https://colab.research.google.com/github/BillyWong2755/BillyWong2755-DataScience-GenAI-Submissions/blob/main/Assignment_1/6_02_DNN_101_COMPLETED.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

![](https://drive.google.com/uc?export=view&id=1xqQczl0FG-qtNA2_WQYuWePW9oU8irqJ)

# 6.02 Dense Neural Network (with PyTorch)
This will expand on our logistic regression example and take us through building our first neural network. If you haven't already, check your runtime and, if necessary, switch to GPU processing by clicking Runtime > Change runtime type and selecting GPU. We can test that this has worked with the following code:

In [None]:
import torch

# Check for GPU availability
print("Num GPUs Available: ", torch.cuda.device_count())

Num GPUs Available:  1
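On a GPU runtime the count above should be 1. Later cells pick a `device` once and move tensors and the model to it, falling back to the CPU when CUDA is unavailable. A minimal sketch of that pattern (the `demo_tensor` name is purely illustrative):

```python
import torch

# Pick the GPU if CUDA is available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print("Using device:", device)

if device.type == "cuda":
    # Name of the first visible GPU (e.g. the one Colab assigns)
    print("GPU name:", torch.cuda.get_device_name(0))

# Tensors (and later, models) can be created on, or moved to, the chosen device
demo_tensor = torch.ones(3, device=device)
print(demo_tensor.device)
```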


Hopefully your code shows you have 1 GPU available! Next, let's get some data. We'll start with another in-built dataset:

In [None]:
# load an in-built Python (OK, semi-in-built: it ships with scikit-learn) dataset
from sklearn.datasets import load_diabetes

import pandas as pd
import numpy as np

# import the data
data = load_diabetes()
data

{'data': array([[ 0.03807591,  0.05068012,  0.06169621, ..., -0.00259226,
          0.01990749, -0.01764613],
        [-0.00188202, -0.04464164, -0.05147406, ..., -0.03949338,
         -0.06833155, -0.09220405],
        [ 0.08529891,  0.05068012,  0.04445121, ..., -0.00259226,
          0.00286131, -0.02593034],
        ...,
        [ 0.04170844,  0.05068012, -0.01590626, ..., -0.01107952,
         -0.04688253,  0.01549073],
        [-0.04547248, -0.04464164,  0.03906215, ...,  0.02655962,
          0.04452873, -0.02593034],
        [-0.04547248, -0.04464164, -0.0730303 , ..., -0.03949338,
         -0.00422151,  0.00306441]]),
 'target': array([151.,  75., 141., 206., 135.,  97., 138.,  63., 110., 310., 101.,
         69., 179., 185., 118., 171., 166., 144.,  97., 168.,  68.,  49.,
         68., 245., 184., 202., 137.,  85., 131., 283., 129.,  59., 341.,
         87.,  65., 102., 265., 276., 252.,  90., 100.,  55.,  61.,  92.,
        259.,  53., 190., 142.,  75., 142., 155., 225.,  59
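The printed dictionary is hard to read. Loading the same data as a pandas DataFrame shows the feature names and lets us check that the features are already normalised. A short sketch; according to the scikit-learn documentation, each feature is mean-centred and scaled so that its sum of squares is 1, while the target is left on its original scale:

```python
import numpy as np
from sklearn.datasets import load_diabetes

# as_frame=True returns the features as a pandas DataFrame and the target as a Series
diabetes = load_diabetes(as_frame=True)
features, target = diabetes.data, diabetes.target

print(features.shape)           # 442 patients x 10 features
print(list(features.columns))   # age, sex, bmi, bp, s1 ... s6
print(target.describe())        # the target is NOT rescaled

# Each feature column is mean-centred and scaled so its sum of squares is 1
print("Largest |column mean|:", features.mean().abs().max())
print("Column sums of squares:", (features ** 2).sum().round(4).tolist())
```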

We are working on a regression problem with "structured" data whose features have already been cleaned and normalised, so we can skip the usual cleaning/engineering steps. However, we do need to get the data into PyTorch:

In [None]:
# Convert data to PyTorch tensors
X = torch.tensor(data.data, dtype=torch.float32)
y = torch.tensor(data.target, dtype=torch.float32).reshape(-1, 1) # Reshape y to be a column vector

Now that our data is stored in tensors, we can do the train/test split as before (in fact, we can use sklearn just as before):

In [None]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
print(X_train.shape, y_train.shape)
print(X_test.shape, y_test.shape)

torch.Size([353, 10]) torch.Size([353, 1])
torch.Size([89, 10]) torch.Size([89, 1])


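Now we can set up our batches for training. The training set has 353 samples, so batches of 50 give 8 batches per epoch (seven full batches of 50 plus a final batch of 3). We'll wrap the features and labels together in a `TensorDataset` so that each batch yields matching inputs and targets: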

In [None]:
from torch.utils.data import TensorDataset, DataLoader

# Create TensorDatasets and DataLoaders
train_dataset = TensorDataset(X_train, y_train)
train_loader = DataLoader(train_dataset, batch_size=50, shuffle=True)

test_dataset = TensorDataset(X_test, y_test)
test_loader = DataLoader(test_dataset, batch_size=50, shuffle=False)
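The split sizes and batch counts above follow from simple arithmetic: `train_test_split` rounds the test share up, and a `DataLoader` (with its default `drop_last=False`) keeps the final partial batch. A small sketch that checks this using dummy zero tensors of the same shape as the training data (the `demo_` names are illustrative only):

```python
import math
import torch
from torch.utils.data import TensorDataset, DataLoader

n_samples, test_size, batch_size = 442, 0.2, 50

# Test share is rounded up: ceil(0.2 * 442) = ceil(88.4) = 89
n_test = math.ceil(test_size * n_samples)
n_train = n_samples - n_test
print(n_train, n_test)  # 353 89

# Dummy tensors with the training set's shape; shuffling does not change batch sizes
demo_loader = DataLoader(TensorDataset(torch.zeros(n_train, 10), torch.zeros(n_train, 1)),
                         batch_size=batch_size, shuffle=True)
demo_batch_sizes = [xb.shape[0] for xb, _ in demo_loader]
print(len(demo_loader), demo_batch_sizes)  # 8 [50, 50, 50, 50, 50, 50, 50, 3]
```

So every epoch ends on a batch of just 3 samples, which matters when reading the per-batch losses printed during training.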

Now it's time to build our model. We'll keep it simple: a model with an input layer of 10 features, then 2x _Dense_ (fully connected) layers, each with 5 neurons and ReLU activation. Our output layer will have size 1, since this is a regression problem and we want a single value output per prediction.

This will be easier to understand if you have read through the logistic regression tutorial.
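Written out as equations, the network described above maps one input row $x \in \mathbb{R}^{10}$ to a prediction $\hat{y}$ as follows (standard notation; $W_i$ and $b_i$ are the weight matrix and bias of the $i$-th `nn.Linear` layer, which PyTorch stores with shape `(out_features, in_features)`):

$$
\begin{aligned}
h_1 &= \mathrm{ReLU}(W_1 x + b_1), & W_1 &\in \mathbb{R}^{5 \times 10},\ b_1 \in \mathbb{R}^{5} \\
h_2 &= \mathrm{ReLU}(W_2 h_1 + b_2), & W_2 &\in \mathbb{R}^{5 \times 5},\ b_2 \in \mathbb{R}^{5} \\
\hat{y} &= W_3 h_2 + b_3, & W_3 &\in \mathbb{R}^{1 \times 5},\ b_3 \in \mathbb{R}
\end{aligned}
$$

Here $\mathrm{ReLU}(z) = \max(0, z)$ is applied element-wise. Counting entries gives $(50 + 5) + (25 + 5) + (5 + 1) = 91$ trainable parameters.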

In [None]:
import torch
import torch.nn as nn

# Define the model
class DiabetesModel(nn.Module):
    def __init__(self):
        super(DiabetesModel, self).__init__()
        # we'll set up the layers as a sequence using nn.Sequential
        self.layers = nn.Sequential(

            # first layer will be a linear layer that has 5x neurons
            # (5x sets of linear regression)
            # the layer takes the 10 features as input (i.e. 10, 5)
            nn.Linear(10, 5),

            nn.ReLU(), # ReLU activation

            # second linear layer again has 5 neurons
            # this time taking the input as the output of the last layer
            # (which had 5x neurons)
            nn.Linear(5, 5),

            nn.ReLU(), # ReLU again

            # last linear layer takes the output from the previous 5 neurons
            # this time it's a single output with no activation
            # i.e. these are the predictions (regression)
            nn.Linear(5, 1)
        )

    def forward(self, x):
        return self.layers(x) # pass the data through the layers
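A quick way to sanity-check an architecture is to count its parameters and push a dummy batch through it. This sketch rebuilds the same layers as `DiabetesModel` directly with `nn.Sequential`, so it runs on its own, and uses random inputs in place of real data:

```python
import torch
import torch.nn as nn

# The same layer stack as DiabetesModel, built directly with nn.Sequential
demo_layers = nn.Sequential(
    nn.Linear(10, 5), nn.ReLU(),
    nn.Linear(5, 5), nn.ReLU(),
    nn.Linear(5, 1),
)

# Each Linear layer has a weight matrix (out x in) plus one bias per output neuron
for name, param in demo_layers.named_parameters():
    print(name, tuple(param.shape))

n_params = sum(p.numel() for p in demo_layers.parameters())
print("Trainable parameters:", n_params)  # 91

# A batch of 4 rows with 10 features gives 4 predictions, one per row
demo_out = demo_layers(torch.randn(4, 10))
print(demo_out.shape)  # torch.Size([4, 1])
```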

As before, we need to create a model object and specify the loss (criterion) and an optimiser (which we cover next week):

In [None]:
import torch.optim as optim

# Initialize the model, loss function, and optimizer
model = DiabetesModel()
criterion = nn.MSELoss() # MSE loss function
optimiser = optim.Adam(model.parameters(), lr=0.001)
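By default, `nn.MSELoss` averages the squared difference between predictions and targets over every element in the batch, so its values are in squared target units (which is why the losses printed during training run into the thousands). A tiny worked example with made-up numbers:

```python
import torch
import torch.nn as nn

demo_criterion = nn.MSELoss()  # default reduction="mean"

demo_preds = torch.tensor([[1.0], [2.0], [3.0]])
demo_targets = torch.tensor([[1.0], [3.0], [5.0]])

# Errors are 0, -1, -2, so squared errors are 0, 1, 4 and their mean is 5/3
demo_loss = demo_criterion(demo_preds, demo_targets)
demo_manual = ((demo_preds - demo_targets) ** 2).mean()
print(demo_loss.item(), demo_manual.item())  # both about 1.6667
```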

Now we can train the model. Again, the logistic regression tutorial (6.01) may help you understand this:

In [None]:
# Use the GPU if available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)  # move the model to the GPU once, before training

# Training loop (example - you'll likely want to add more epochs)
epochs = 100 # 100 epochs

for epoch in range(epochs):
    # use the train_loader to pass the inputs (x) and targets (y)
    for inputs, targets in train_loader:
        # pass to the GPU (hopefully)
        inputs, targets = inputs.to(device), targets.to(device)

        model.train() # put the model object in train mode
        optimiser.zero_grad() # reset the gradients
        outputs = model(inputs) # create outputs
        loss = criterion(outputs, targets) # compare with Y to get loss
        loss.backward() # backpropagate the loss (next week)
        optimiser.step() # update the parameters based on this round of training

        # every 10 epochs, print the loss of each mini-batch
        # (this sits inside the batch loop, so it prints 8 lines per epoch)
        if (epoch+1) % 10 == 0: # modular arithmetic
            print(f'Epoch [{epoch+1}/{epochs}], Loss: {round(loss.item(), 4)}')

Epoch [10/100], Loss: 27792.6836
Epoch [10/100], Loss: 29229.0
Epoch [10/100], Loss: 33445.2578
Epoch [10/100], Loss: 34277.4727
Epoch [10/100], Loss: 30698.2168
Epoch [10/100], Loss: 29240.0117
Epoch [10/100], Loss: 23086.3223
Epoch [10/100], Loss: 41652.0742
Epoch [20/100], Loss: 36997.7031
Epoch [20/100], Loss: 31123.291
Epoch [20/100], Loss: 37788.7305
Epoch [20/100], Loss: 24601.3066
Epoch [20/100], Loss: 25494.0449
Epoch [20/100], Loss: 26740.2871
Epoch [20/100], Loss: 23668.1992
Epoch [20/100], Loss: 49332.1055
Epoch [30/100], Loss: 23400.6035
Epoch [30/100], Loss: 28949.0312
Epoch [30/100], Loss: 28942.0488
Epoch [30/100], Loss: 33954.5898
Epoch [30/100], Loss: 34329.9961
Epoch [30/100], Loss: 27848.959
Epoch [30/100], Loss: 27293.4375
Epoch [30/100], Loss: 51907.5312
Epoch [40/100], Loss: 29354.3652
Epoch [40/100], Loss: 26514.3477
Epoch [40/100], Loss: 39807.8398
Epoch [40/100], Loss: 24769.0312
Epoch [40/100], Loss: 31261.3535
Epoch [40/100], Loss: 24624.8711
Epoch [40/100],

We can see the loss is significantly lower at the end than it was at the start. However, it is still bouncing around, which suggests the model needs more training (100 epochs is not a lot in deep learning terms). For now, let's evaluate as before:

In [None]:
# Evaluation (example)
model.eval() # testing mode
mse_values = [] # collect the MSE scores

with torch.no_grad():
    for inputs, targets in test_loader:
        inputs, targets = inputs.to(device), targets.to(device)
        outputs = model(inputs) # predict the test data

        # Calculate Mean Squared Error
        mse = criterion(outputs, targets) # calculate MSE for the batch
        mse_values.append(mse.item()) # add to the list of MSE values

# Calculate and print the average of the batch MSEs
# (an unweighted mean, so the 50- and 39-sample batches count equally)
avg_mse = np.mean(mse_values)
print(f"Average MSE on test set: {avg_mse}")

Average MSE on test set: 19474.7724609375


The test MSE is in line with the training loss, so there is no obvious sign of overfitting. However, we can probably get better results with tuning and more epochs.
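Because the test set has 89 rows, `test_loader` yields one batch of 50 and one of 39, and `np.mean(mse_values)` gives both batches equal weight. That is close to, but not exactly, the MSE over all 89 test rows. Below is a sketch of a sample-weighted alternative; `exact_mse` is a hypothetical helper (not part of PyTorch), demonstrated on made-up numbers where the two approaches visibly differ. Also note that the square root of an MSE (the RMSE) is in the target's own units: for the 19,475 printed above it is roughly 140, which is large next to target values such as 151, 75 and 206.

```python
import numpy as np
import torch
import torch.nn as nn
from torch.utils.data import TensorDataset, DataLoader

def exact_mse(model, loader, device="cpu"):
    """Hypothetical helper: MSE over every sample in `loader`, not a mean of batch MSEs."""
    sum_criterion = nn.MSELoss(reduction="sum")  # sum of squared errors per batch
    model.eval()
    total_sq_error, n_values = 0.0, 0
    with torch.no_grad():
        for inputs, targets in loader:
            inputs, targets = inputs.to(device), targets.to(device)
            total_sq_error += sum_criterion(model(inputs), targets).item()
            n_values += targets.numel()
    return total_sq_error / n_values

# Made-up data: an "always predict 0" model (nn.Identity on zero inputs),
# with an error of 1 for the first 50 rows and 2 for the remaining 39
demo_inputs = torch.zeros(89, 1)
demo_targets = torch.cat([torch.ones(50, 1), 2 * torch.ones(39, 1)])
demo_loader = DataLoader(TensorDataset(demo_inputs, demo_targets), batch_size=50, shuffle=False)
demo_model = nn.Identity()

# Unweighted mean of batch MSEs (as in the cell above): (1 + 4) / 2 = 2.5
batch_mses = [nn.MSELoss()(demo_model(xb), yb).item() for xb, yb in demo_loader]
print(np.mean(batch_mses))
# Sample-weighted MSE: (50 * 1 + 39 * 4) / 89 = 206 / 89, about 2.3146
print(exact_mse(demo_model, demo_loader))
```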

Let's run the loop again, a little differently, to collect the predicted values (y_hat) and actuals (y) and put them in a DataFrame for comparison:

In [None]:
# Evaluation
model.eval()
predictions = []
actuals = []

with torch.no_grad():
    for inputs, targets in test_loader:
        inputs, targets = inputs.to(device), targets.to(device)
        outputs = model(inputs)
        predictions.extend(outputs.cpu().numpy())
        actuals.extend(targets.cpu().numpy())

# Create DataFrame
results_df = pd.DataFrame({'Predicted': np.array(predictions).flatten(), 'Actual': np.array(actuals).flatten()})
results_df

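|     | Predicted | Actual |
|----:|----------:|-------:|
| 0   | 26.645344 | 219.0  |
| 1   | 25.103045 | 70.0   |
| 2   | 26.455063 | 202.0  |
| 3   | 32.560081 | 230.0  |
| 4   | 25.477730 | 111.0  |
| ... | ...       | ...    |
| 84  | 22.497879 | 153.0  |
| 85  | 20.658676 | 98.0   |
| 86  | 18.699120 | 37.0   |
| 87  | 19.472189 | 63.0   |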


Side-by-side, they don't look great. Can you improve them?
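To put a number on "don't look great", scikit-learn's `r2_score` and `mean_absolute_error` can be applied directly to the two columns, e.g. `r2_score(results_df['Actual'], results_df['Predicted'])`. An $R^2$ of 1 is a perfect fit, 0 is no better than always predicting the mean, and it can go negative. The sketch below uses a tiny made-up DataFrame with the same column names so it runs on its own:

```python
import pandas as pd
from sklearn.metrics import mean_absolute_error, r2_score

# Made-up values, using the same column names as results_df
demo_df = pd.DataFrame({'Predicted': [2.5, 0.0, 2.0, 8.0],
                        'Actual':    [3.0, -0.5, 2.0, 7.0]})

# sklearn metrics take (y_true, y_pred) in that order
r2 = r2_score(demo_df['Actual'], demo_df['Predicted'])
mae = mean_absolute_error(demo_df['Actual'], demo_df['Predicted'])
print(f"R^2: {r2:.4f}, MAE: {mae:.4f}")  # R^2: 0.9486, MAE: 0.5000
```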

<br><br>

## EXERCISE #1
Try increasing the number of epochs to 1,000 (once the model is fairly well trained, the loss printed every 10 epochs should be fairly stable and not change much). Does this give better results?

<br><br>

## EXERCISE #2 (optional)
Try experimenting with the architecture (number of neurons and/or number of layers). Can we reach an optimal architecture?

## EXERCISE #1 (answer)

In [None]:
# Use the GPU if available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)  # move the model to the GPU once, before training

# Training loop: 1,000 epochs. The model is not re-created here, so training
# continues from where the 100-epoch run above left off.
epochs = 1000 # 1000 epochs

for epoch in range(epochs):
    # use the train_loader to pass the inputs (x) and targets (y)
    for inputs, targets in train_loader:
        # pass to the GPU (hopefully)
        inputs, targets = inputs.to(device), targets.to(device)

        model.train() # put the model object in train mode
        optimiser.zero_grad() # reset the gradients
        outputs = model(inputs) # create outputs
        loss = criterion(outputs, targets) # compare with Y to get loss
        loss.backward() # backpropagate the loss (next week)
        optimiser.step() # update the parameters based on this round of training

        # every 10 epochs, print the loss of each mini-batch
        # (this sits inside the batch loop, so it prints 8 lines per epoch)
        if (epoch+1) % 10 == 0: # modular arithmetic
            print(f'Epoch [{epoch+1}/{epochs}], Loss: {round(loss.item(), 4)}')

Epoch [10/1000], Loss: 2831.1997
Epoch [10/1000], Loss: 4241.2793
Epoch [10/1000], Loss: 2508.1794
Epoch [10/1000], Loss: 3424.5417
Epoch [10/1000], Loss: 2878.22
Epoch [10/1000], Loss: 2453.1174
Epoch [10/1000], Loss: 2537.0288
Epoch [10/1000], Loss: 2320.3032
Epoch [20/1000], Loss: 3399.8606
Epoch [20/1000], Loss: 2994.2737
Epoch [20/1000], Loss: 2764.938
Epoch [20/1000], Loss: 2788.2737
Epoch [20/1000], Loss: 3271.467
Epoch [20/1000], Loss: 3175.3723
Epoch [20/1000], Loss: 2555.5181
Epoch [20/1000], Loss: 602.6763
Epoch [30/1000], Loss: 2751.5481
Epoch [30/1000], Loss: 2244.5798
Epoch [30/1000], Loss: 3756.0793
Epoch [30/1000], Loss: 3050.4077
Epoch [30/1000], Loss: 3115.4746
Epoch [30/1000], Loss: 2633.3606
Epoch [30/1000], Loss: 3365.3438
Epoch [30/1000], Loss: 1041.2585
Epoch [40/1000], Loss: 2729.0117
Epoch [40/1000], Loss: 2859.6768
Epoch [40/1000], Loss: 3021.7949
Epoch [40/1000], Loss: 2720.7773
Epoch [40/1000], Loss: 2875.0876
Epoch [40/1000], Loss: 2518.5581
Epoch [40/1000]

In [None]:
# Use the GPU if available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)  # move the model to the GPU once, before training

# Training loop (again continues training the same model object)
epochs = 1000  # 1000 epochs

for epoch in range(epochs):
    for inputs, targets in train_loader:  # Loop through each mini-batch
        inputs, targets = inputs.to(device), targets.to(device)

        model.train()  # Set the model to training mode
        optimiser.zero_grad()  # Reset the gradients
        outputs = model(inputs)  # Create outputs
        loss = criterion(outputs, targets)  # Calculate loss
        loss.backward()  # Backpropagate the loss
        optimiser.step()  # Update model parameters

    # This print now sits outside the inner loop, so it runs once per epoch.
    # `loss` is the loss of the epoch's last mini-batch (only 3 samples).
    if (epoch + 1) % 10 == 0:  # Print every 10 epochs
        print(f'Epoch [{epoch + 1}/{epochs}], Loss: {round(loss.item(), 4)}')

# I think this is what we want to see, rather than a loss result for every mini-batch.

Epoch [10/1000], Loss: 6434.3159
Epoch [20/1000], Loss: 607.5897
Epoch [30/1000], Loss: 1833.1444
Epoch [40/1000], Loss: 4755.5967
Epoch [50/1000], Loss: 7128.689
Epoch [60/1000], Loss: 2974.7104
Epoch [70/1000], Loss: 4553.4272
Epoch [80/1000], Loss: 5294.9897
Epoch [90/1000], Loss: 1623.6699
Epoch [100/1000], Loss: 1183.1851
Epoch [110/1000], Loss: 4874.0273
Epoch [120/1000], Loss: 11091.3184
Epoch [130/1000], Loss: 1224.8403
Epoch [140/1000], Loss: 608.1396
Epoch [150/1000], Loss: 2473.0676
Epoch [160/1000], Loss: 6082.0186
Epoch [170/1000], Loss: 4935.4443
Epoch [180/1000], Loss: 4512.9033
Epoch [190/1000], Loss: 926.8905
Epoch [200/1000], Loss: 2647.1196
Epoch [210/1000], Loss: 5413.8213
Epoch [220/1000], Loss: 562.4753
Epoch [230/1000], Loss: 1737.8423
Epoch [240/1000], Loss: 4754.376
Epoch [250/1000], Loss: 2340.3015
Epoch [260/1000], Loss: 5145.7295
Epoch [270/1000], Loss: 1918.2134
Epoch [280/1000], Loss: 4178.7485
Epoch [290/1000], Loss: 1381.9252
Epoch [300/1000], Loss: 1955
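The loss printed above comes from the last mini-batch of each epoch, which (with 353 training rows and a batch size of 50) holds only 3 samples, so it swings a lot from one print to the next. A steadier signal is the sample-weighted average loss over the whole epoch. A minimal sketch; `train_one_epoch` is a hypothetical helper, and the demo uses synthetic linear data rather than the diabetes set:

```python
import torch
import torch.nn as nn
from torch.utils.data import TensorDataset, DataLoader

def train_one_epoch(model, loader, criterion, optimiser, device="cpu"):
    """Hypothetical helper: train for one epoch, return the sample-weighted mean batch loss."""
    model.train()
    total_loss, n_samples = 0.0, 0
    for inputs, targets in loader:
        inputs, targets = inputs.to(device), targets.to(device)
        optimiser.zero_grad()
        loss = criterion(model(inputs), targets)  # mean loss over this batch
        loss.backward()
        optimiser.step()
        batch_size = targets.shape[0]
        total_loss += loss.item() * batch_size  # turn the batch mean back into a sum
        n_samples += batch_size
    return total_loss / n_samples

# Synthetic demo: 353 rows of a noisy linear relationship (not the diabetes data)
torch.manual_seed(0)
demo_X = torch.randn(353, 3)
demo_y = demo_X @ torch.tensor([[1.0], [-2.0], [0.5]]) + 0.1 * torch.randn(353, 1)
demo_loader = DataLoader(TensorDataset(demo_X, demo_y), batch_size=50, shuffle=True)
demo_model = nn.Linear(3, 1)
demo_optimiser = torch.optim.SGD(demo_model.parameters(), lr=0.1)

demo_losses = []
for demo_epoch in range(20):
    demo_losses.append(train_one_epoch(demo_model, demo_loader, nn.MSELoss(), demo_optimiser))
    if (demo_epoch + 1) % 5 == 0:
        print(f"Epoch [{demo_epoch + 1}/20], mean loss: {demo_losses[-1]:.4f}")
```

In the notebook it could be called as `train_one_epoch(model, train_loader, criterion, optimiser, device)`. Because each batch loss is recorded just before its update, the epoch average mixes slightly different parameter values, which is usually fine for monitoring.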

In [None]:
# Evaluation (example)
model.eval() # testing mode
mse_values = [] # collect the MSE scores

with torch.no_grad():
    for inputs, targets in test_loader:
        inputs, targets = inputs.to(device), targets.to(device)
        outputs = model(inputs) # predict the test data

        # Calculate Mean Squared Error
        mse = criterion(outputs, targets) # calculate MSE for the batch
        mse_values.append(mse.item()) # add to the list of MSE values

# Calculate and print the average of the batch MSEs
# (an unweighted mean, so the 50- and 39-sample batches count equally)
avg_mse = np.mean(mse_values)
print(f"Average MSE on test set: {avg_mse}")

Average MSE on test set: 2872.356201171875


In [None]:
# Evaluation
model.eval()
predictions = []
actuals = []

with torch.no_grad():
    for inputs, targets in test_loader:
        inputs, targets = inputs.to(device), targets.to(device)
        outputs = model(inputs)
        predictions.extend(outputs.cpu().numpy())
        actuals.extend(targets.cpu().numpy())

# Create DataFrame
results_df = pd.DataFrame({'Predicted': np.array(predictions).flatten(), 'Actual': np.array(actuals).flatten()})
results_df

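|     | Predicted  | Actual |
|----:|-----------:|-------:|
| 0   | 139.645920 | 219.0  |
| 1   | 181.327118 | 70.0   |
| 2   | 139.051834 | 202.0  |
| 3   | 294.347931 | 230.0  |
| 4   | 120.210449 | 111.0  |
| ... | ...        | ...    |
| 84  | 111.432915 | 153.0  |
| 85  | 85.554550  | 98.0   |
| 86  | 80.706291  | 37.0   |
| 87  | 62.741089  | 63.0   |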


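Yes, the results are much better: the average test MSE falls from about 19,475 to about 2,872, and the predictions are now in the same range as the actual values. Note that neither 1,000-epoch cell re-creates the model, so if the cells are run in order, these results come from a model trained for 100 + 1,000 + 1,000 epochs in total.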
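The slow start is consistent with a scale mismatch: after 100 epochs the predictions were around 20 to 30, while the actual values are mostly in the low hundreds, and the network needed many more epochs before its predictions reached the right range. A common remedy, not used in this notebook, is to standardise the target with the training set's mean and standard deviation, train on the scaled target, and then convert predictions back. A sketch of that idea; `TargetScaler` is a hypothetical helper (scikit-learn's `StandardScaler` does much the same job for NumPy arrays):

```python
import torch

class TargetScaler:
    """Hypothetical helper: standardise a target tensor using training-set statistics."""

    def fit(self, y_train):
        self.mean = y_train.mean()
        self.std = y_train.std()  # sample standard deviation (torch default)
        return self

    def transform(self, y):
        return (y - self.mean) / self.std

    def inverse_transform(self, y_scaled):
        return y_scaled * self.std + self.mean

# Made-up targets on a similar scale to the diabetes target
demo_y_train = torch.tensor([[100.0], [150.0], [200.0]])
demo_scaler = TargetScaler().fit(demo_y_train)
demo_scaled = demo_scaler.transform(demo_y_train)
print(demo_scaled.flatten())                                # scaled values: -1, 0, 1
print(demo_scaler.inverse_transform(demo_scaled).flatten()) # back to 100, 150, 200
```

In the notebook this would mean fitting on `y_train` only (so no test information leaks in), training against `scaler.transform(targets)`, and applying `inverse_transform` to the model's outputs before comparing them with the actual values.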

## EXERCISE #2 (answer)

In [None]:
import torch
import torch.nn as nn
import torch.optim as optim
from sklearn.model_selection import train_test_split

# Define the model
class DiabetesModel(nn.Module):
    def __init__(self, num_neurons):
        super(DiabetesModel, self).__init__()
        self.layers = nn.Sequential(
            nn.Linear(10, num_neurons),  # First layer with specified number of neurons
            nn.ReLU(),
            nn.Linear(num_neurons, num_neurons),  # Second layer with specified number of neurons
            nn.ReLU(),
            nn.Linear(num_neurons, 1)  # Output layer
        )

    def forward(self, x):
        return self.layers(x)

# Function to train and evaluate the model
def train_and_evaluate(num_neurons, X_train, y_train, X_val, y_val, epochs=100):
    model = DiabetesModel(num_neurons)
    criterion = nn.MSELoss()  # Mean Squared Error for regression
    optimizer = optim.Adam(model.parameters(), lr=0.01)  # Adam optimizer

    # Training loop (full batch: each epoch is one gradient step on all of X_train)
    for epoch in range(epochs):
        model.train()
        optimizer.zero_grad()  # Reset gradients
        outputs = model(X_train)  # Forward pass
        loss = criterion(outputs, y_train)  # Compute loss
        loss.backward()  # Backpropagation
        optimizer.step()  # Update weights

    # Evaluate on validation data
    model.eval()
    with torch.no_grad():
        val_outputs = model(X_val)
        val_loss = criterion(val_outputs, y_val)  # Validation loss

    return val_loss.item()  # Return the validation loss value

# X and y are the full feature and target tensors created at the start of the notebook
# (shapes [442, 10] and [442, 1])

# Split dataset into training and validation sets.
# With the same test_size and random_state as before, X_val/y_val are the same
# 89 rows as X_test/y_test, so this "validation" loss is measured on the test set.
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

# Define hyperparameter grid
param_grid = {
    'num_neurons': [5, 10, 20]  # Try different numbers of neurons
}

# Store results
results = []

# Grid Search
for num_neurons in param_grid['num_neurons']:
    print(f'Training with {num_neurons} neurons...')
    val_loss = train_and_evaluate(num_neurons, X_train, y_train, X_val, y_val)
    results.append((num_neurons, val_loss))
    print(f'Validation Loss: {val_loss:.4f}')

# Display results
for num_neurons, val_loss in results:
    print(f'Neuron count: {num_neurons}, Validation Loss: {val_loss:.4f}')

print('\n')

# Identify and print the best configuration
best_configuration = min(results, key=lambda x: x[1])  # Get configuration with the lowest loss
best_neurons, best_val_loss = best_configuration
print(f'Best Configuration: {best_neurons} neurons with Validation Loss: {best_val_loss:.4f}')

Training with 5 neurons...
Validation Loss: 20826.4043
Training with 10 neurons...
Validation Loss: 4761.3345
Training with 20 neurons...
Validation Loss: 3326.6748
Neuron count: 5, Validation Loss: 20826.4043
Neuron count: 10, Validation Loss: 4761.3345
Neuron count: 20, Validation Loss: 3326.6748


Best Configuration: 20 neurons with Validation Loss: 3326.6748
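Exercise 2 also asks about the number of layers, which this grid search keeps fixed at two hidden layers. One way to vary depth as well as width is to build the layer stack from a list of hidden sizes. A sketch; `make_mlp` and `count_params` are hypothetical helpers, and to plug them in, `train_and_evaluate` would need to build `make_mlp(10, hidden_sizes)` instead of `DiabetesModel(num_neurons)`. Since each configuration above is trained only once from a random start, calling `torch.manual_seed(...)` before each run (or averaging over several seeds) would make the comparison fairer.

```python
import torch
import torch.nn as nn

def make_mlp(in_features, hidden_sizes):
    """Hypothetical helper: Linear+ReLU for each hidden size, then a single-output Linear layer."""
    layers = []
    prev = in_features
    for size in hidden_sizes:
        layers += [nn.Linear(prev, size), nn.ReLU()]
        prev = size
    layers.append(nn.Linear(prev, 1))  # regression output, no activation
    return nn.Sequential(*layers)

def count_params(model):
    return sum(p.numel() for p in model.parameters())

# Candidate architectures: both width and depth vary
demo_grid = [(5, 5), (20,), (20, 20), (20, 20, 20)]
for hidden_sizes in demo_grid:
    demo_mlp = make_mlp(10, hidden_sizes)
    print(hidden_sizes, "->", count_params(demo_mlp), "parameters")
```

`make_mlp(10, (5, 5))` reproduces the original 91-parameter model, so the existing architecture stays in the grid as a baseline.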


In [None]:
import torch
import torch.nn as nn
import torch.optim as optim

# Define the model
class DiabetesModel(nn.Module):
    def __init__(self, num_neurons):
        super(DiabetesModel, self).__init__()
        self.layers = nn.Sequential(
            nn.Linear(10, num_neurons),  # First layer with specified number of neurons
            nn.ReLU(),
            nn.Linear(num_neurons, num_neurons),  # Second layer
            nn.ReLU(),
            nn.Linear(num_neurons, 1)  # Output layer
        )

    def forward(self, x):
        return self.layers(x)

# Initialize the model, loss function, and optimizer
model = DiabetesModel(num_neurons=10)  # 10 neurons here; the grid search above scored 20 best, so that is also worth trying
criterion = nn.MSELoss()  # Mean Squared Error loss function
optimiser = optim.Adam(model.parameters(), lr=0.001)  # Optimizer

In [None]:
# Use the GPU if available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)  # move the model to the GPU once, before training

# Training loop (trains the freshly created 10-neuron model from the cell above)
epochs = 1000  # 1000 epochs

for epoch in range(epochs):
    for inputs, targets in train_loader:  # Loop through each mini-batch
        inputs, targets = inputs.to(device), targets.to(device)

        model.train()  # Set the model to training mode
        optimiser.zero_grad()  # Reset the gradients
        outputs = model(inputs)  # Create outputs
        loss = criterion(outputs, targets)  # Calculate loss
        loss.backward()  # Backpropagate the loss
        optimiser.step()  # Update model parameters

    # Print once per epoch (outside the inner loop);
    # `loss` is the loss of the epoch's last mini-batch (only 3 samples).
    if (epoch + 1) % 10 == 0:  # Print every 10 epochs
        print(f'Epoch [{epoch + 1}/{epochs}], Loss: {round(loss.item(), 4)}')

Epoch [10/1000], Loss: 29092.8203
Epoch [20/1000], Loss: 20807.7227
Epoch [30/1000], Loss: 69729.6641
Epoch [40/1000], Loss: 23701.7246
Epoch [50/1000], Loss: 36085.6719
Epoch [60/1000], Loss: 44277.4961
Epoch [70/1000], Loss: 4569.9287
Epoch [80/1000], Loss: 17589.1035
Epoch [90/1000], Loss: 29442.3828
Epoch [100/1000], Loss: 2877.0623
Epoch [110/1000], Loss: 15457.1543
Epoch [120/1000], Loss: 11190.1514
Epoch [130/1000], Loss: 4006.5928
Epoch [140/1000], Loss: 1163.1003
Epoch [150/1000], Loss: 3519.437
Epoch [160/1000], Loss: 1173.3394
Epoch [170/1000], Loss: 2044.0442
Epoch [180/1000], Loss: 1468.3828
Epoch [190/1000], Loss: 5716.2617
Epoch [200/1000], Loss: 5538.5674
Epoch [210/1000], Loss: 1888.0559
Epoch [220/1000], Loss: 3542.5496
Epoch [230/1000], Loss: 4783.582
Epoch [240/1000], Loss: 2970.1084
Epoch [250/1000], Loss: 6371.5449
Epoch [260/1000], Loss: 4765.249
Epoch [270/1000], Loss: 3613.3901
Epoch [280/1000], Loss: 1409.1812
Epoch [290/1000], Loss: 6490.644
Epoch [300/1000],

In [None]:
# Evaluation (example)
model.eval() # testing mode
mse_values = [] # collect the MSE scores

with torch.no_grad():
    for inputs, targets in test_loader:
        inputs, targets = inputs.to(device), targets.to(device)
        outputs = model(inputs) # predict the test data

        # Calculate Mean Squared Error
        mse = criterion(outputs, targets) # calculate MSE for the batch
        mse_values.append(mse.item()) # add to the list of MSE values

# Calculate and print the average of the batch MSEs
# (an unweighted mean, so the 50- and 39-sample batches count equally)
avg_mse = np.mean(mse_values)
print(f"Average MSE on test set: {avg_mse}")

Average MSE on test set: 2851.9542236328125


In [None]:
# Evaluation
model.eval()
predictions = []
actuals = []

with torch.no_grad():
    for inputs, targets in test_loader:
        inputs, targets = inputs.to(device), targets.to(device)
        outputs = model(inputs)
        predictions.extend(outputs.cpu().numpy())
        actuals.extend(targets.cpu().numpy())

# Create DataFrame
results_df = pd.DataFrame({'Predicted': np.array(predictions).flatten(), 'Actual': np.array(actuals).flatten()})
results_df

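|     | Predicted  | Actual |
|----:|-----------:|-------:|
| 0   | 144.537506 | 219.0  |
| 1   | 178.024811 | 70.0   |
| 2   | 140.150604 | 202.0  |
| 3   | 296.237030 | 230.0  |
| 4   | 125.762863 | 111.0  |
| ... | ...        | ...    |
| 84  | 115.660217 | 153.0  |
| 85  | 88.124664  | 98.0   |
| 86  | 77.564072  | 37.0   |
| 87  | 65.011871  | 63.0   |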
