<a href="https://colab.research.google.com/github/AmberMynott/AmberMynott-DataScience-GenAI-Submissions/blob/main/Assignment_5/6_02_DNN_101_COMPLETED.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

![](https://drive.google.com/uc?export=view&id=1xqQczl0FG-qtNA2_WQYuWePW9oU8irqJ)

# 6.02 Dense Neural Network (with PyTorch)
This will expand on our logistic regression example and take us through building our first neural network. If you haven't already, be sure to check that you are using GPU processing (and, if necessary, switch to it) by clicking Runtime > Change runtime type and selecting GPU. We can test that this has worked with the following code:

In [1]:
import torch

# Check for GPU availability
print("Num GPUs Available: ", torch.cuda.device_count())

Num GPUs Available:  1


Hopefully your code shows you have 1 GPU available! Next let's get some data. We'll start with another in-built dataset:

In [2]:
# load a built-in (OK, semi-built-in) dataset from scikit-learn
from sklearn.datasets import load_diabetes

import pandas as pd
import numpy as np

# import the data
data = load_diabetes()
data

{'data': array([[ 0.03807591,  0.05068012,  0.06169621, ..., -0.00259226,
          0.01990749, -0.01764613],
        [-0.00188202, -0.04464164, -0.05147406, ..., -0.03949338,
         -0.06833155, -0.09220405],
        [ 0.08529891,  0.05068012,  0.04445121, ..., -0.00259226,
          0.00286131, -0.02593034],
        ...,
        [ 0.04170844,  0.05068012, -0.01590626, ..., -0.01107952,
         -0.04688253,  0.01549073],
        [-0.04547248, -0.04464164,  0.03906215, ...,  0.02655962,
          0.04452873, -0.02593034],
        [-0.04547248, -0.04464164, -0.0730303 , ..., -0.03949338,
         -0.00422151,  0.00306441]]),
 'target': array([151.,  75., 141., 206., 135.,  97., 138.,  63., 110., 310., 101.,
         69., 179., 185., 118., 171., 166., 144.,  97., 168.,  68.,  49.,
         68., 245., 184., 202., 137.,  85., 131., 283., 129.,  59., 341.,
         87.,  65., 102., 265., 276., 252.,  90., 100.,  55.,  61.,  92.,
        259.,  53., 190., 142.,  75., 142., 155., 225.,  59

We are working on a regression problem, with "structured" data which has already been cleaned and normalised. We can skip the usual cleaning/engineering steps. However, we do need to get the data into PyTorch:

In [3]:
# Convert data to PyTorch tensors
X = torch.tensor(data.data, dtype=torch.float32)
y = torch.tensor(data.target, dtype=torch.float32).reshape(-1, 1) # Reshape y to be a column vector

Now our data is stored in tensors we can do train/test splitting as before (in fact we can use sklearn as before):

In [4]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
print(X_train.shape, y_train.shape)
print(X_test.shape, y_test.shape)

torch.Size([353, 10]) torch.Size([353, 1])
torch.Size([89, 10]) torch.Size([89, 1])


Now we can set up our batches for training. The training set has 353 rows, so batches of 50 give us 8 batches per epoch (seven full batches and a final batch of 3). We'll also separate the features and labels:

In [5]:
from torch.utils.data import TensorDataset, DataLoader

# Create TensorDatasets and DataLoaders
train_dataset = TensorDataset(X_train, y_train)
train_loader = DataLoader(train_dataset, batch_size=50, shuffle=True)

test_dataset = TensorDataset(X_test, y_test)
test_loader = DataLoader(test_dataset, batch_size=50, shuffle=False)
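The split and batch sizes can be checked by hand. The sketch below is only arithmetic; it assumes the 442 rows of the diabetes dataset, the 20% test split and the batch size of 50 used above, and that scikit-learn rounds a fractional test size up (which matches the shapes printed earlier). The helper name `batch_layout` is made up for this illustration.

```python
import math

n_samples = 442    # rows in load_diabetes()
test_size = 0.2    # fraction passed to train_test_split
batch_size = 50    # batch size passed to DataLoader

# a fractional test size is rounded up; the remainder is the training set
n_test = math.ceil(test_size * n_samples)
n_train = n_samples - n_test

def batch_layout(n_rows, batch_size):
    """Return (number of batches, size of the final batch)."""
    n_batches = math.ceil(n_rows / batch_size)
    last_batch = n_rows - (n_batches - 1) * batch_size
    return n_batches, last_batch

train_batches, train_last = batch_layout(n_train, batch_size)
test_batches, test_last = batch_layout(n_test, batch_size)

print(n_train, n_test)            # 353 89
print(train_batches, train_last)  # 8 3
print(test_batches, test_last)    # 2 39
```

The final training batch holds only 3 rows, so its loss will tend to be noisier than the loss of the full batches.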

Now it's time to build our model. We'll keep it simple: a model with an input layer of 10 features and then 2x _Dense_ (fully connected) layers, each with 5 neurons and ReLU activation. Our output layer will be size=1, given this is a regression problem and we want a single value output per prediction.

This will be easier to understand if you have read through the logistic regression tutorial.

In [6]:
import torch
import torch.nn as nn

# Define the model
class DiabetesModel(nn.Module):
    def __init__(self):
        super(DiabetesModel, self).__init__()
        # we'll set up the layers as a sequence using nn.Sequential
        self.layers = nn.Sequential(

            # first layer will be a linear layer that has 5x neurons
            # (5x sets of linear regression)
            # the layer takes the 10 features as input (i.e. 10, 5)
            nn.Linear(10, 5),

            nn.ReLU(), # ReLU activation

            # second linear layer again has 5 neurons
            # this time taking the input as the output of the last layer
            # (which had 5x neurons)
            nn.Linear(5, 5),

            nn.ReLU(), # ReLU again

            # last linear layer takes the output from the previous 5 neurons
            # this time it's a single output with no activation
            # i.e. these are the predictions (regression)
            nn.Linear(5, 1)
        )

    def forward(self, x):
        return self.layers(x) # pass the data through the layers
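To get a feel for how small this network is, we can count its trainable parameters. Each linear layer has one weight per input-output pair plus one bias per output. The sketch below does the arithmetic in plain Python; the list `layer_sizes` and the helper `count_parameters` are names made up for this illustration.

```python
# layer widths of the model above: 10 inputs -> 5 -> 5 -> 1 output
layer_sizes = [10, 5, 5, 1]

def count_parameters(sizes):
    """Weights (n_in * n_out) plus biases (n_out) for each linear layer."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))

total = count_parameters(layer_sizes)
print(total)  # 91  -> (10*5 + 5) + (5*5 + 5) + (5*1 + 1)
```

The ReLU activations add no parameters of their own.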

As before we need to create a model object, specify the loss (criterion) and an optimiser (which we cover next week):

In [7]:
import torch.optim as optim

# Initialize the model, loss function, and optimizer
model = DiabetesModel()
criterion = nn.MSELoss() # MSE loss function
optimiser = optim.Adam(model.parameters(), lr=0.001)
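As a quick sanity check on what the criterion computes, here is a minimal sketch with two made-up predictions and targets (the numbers are purely illustrative). With its default settings, nn.MSELoss returns the mean of the squared differences, which we can reproduce by hand.

```python
import torch
import torch.nn as nn

criterion = nn.MSELoss()

# made-up values for illustration only
predictions = torch.tensor([[2.0], [4.0]])
targets = torch.tensor([[1.0], [2.0]])

loss = criterion(predictions, targets)

# the same calculation written out: mean of squared errors
manual = ((predictions - targets) ** 2).mean()

print(loss.item(), manual.item())  # 2.5 2.5  -> (1**2 + 2**2) / 2
```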

Now we can train the model. Again, the logistic regression tutorial (6.01) may help you understand this:

In [8]:
# Move model to GPU if available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Training loop (example - you'll likely want to add more epochs)
epochs = 100 # 100 epochs

for epoch in range(epochs):
  # use the train_loader to pass the inputs (x) and targets (y)
  for inputs, targets in train_loader:
    # pass to the GPU (hopefully)
    inputs, targets = inputs.to(device), targets.to(device)

    # pass model to GPU as well
    model.to(device)

    model.train() # put the model object in train mode
    optimiser.zero_grad() # reset the gradients
    outputs = model(inputs) # create outputs
    loss = criterion(outputs, targets) # compare with Y to get loss
    loss.backward() # backpropagate the loss (next week)
    optimiser.step() # update the parameters based on this round of training

    # on every 10th epoch, print the loss for each batch
    # (this check sits inside the batch loop, so it prints once per batch)
    if (epoch+1) % 10 == 0: # modular arithmetic
        print(f'Epoch [{epoch+1}/{epochs}], Loss: {round(loss.item(), 4)}')

Epoch [10/100], Loss: 34702.3203
Epoch [10/100], Loss: 27381.416
Epoch [10/100], Loss: 32135.9395
Epoch [10/100], Loss: 28475.0664
Epoch [10/100], Loss: 27145.125
Epoch [10/100], Loss: 22990.7871
Epoch [10/100], Loss: 34069.5508
Epoch [10/100], Loss: 51140.8906
Epoch [20/100], Loss: 26486.2988
Epoch [20/100], Loss: 37021.1953
Epoch [20/100], Loss: 26186.2988
Epoch [20/100], Loss: 27901.7188
Epoch [20/100], Loss: 29693.7227
Epoch [20/100], Loss: 34835.0195
Epoch [20/100], Loss: 26120.9941
Epoch [20/100], Loss: 15953.6826
Epoch [30/100], Loss: 29584.8535
Epoch [30/100], Loss: 32580.4863
Epoch [30/100], Loss: 25158.5938
Epoch [30/100], Loss: 30366.209
Epoch [30/100], Loss: 36888.7188
Epoch [30/100], Loss: 24940.4746
Epoch [30/100], Loss: 26601.1328
Epoch [30/100], Loss: 31364.0059
Epoch [40/100], Loss: 25749.8516
Epoch [40/100], Loss: 26061.2949
Epoch [40/100], Loss: 33702.2891
Epoch [40/100], Loss: 29322.3398
Epoch [40/100], Loss: 29819.5352
Epoch [40/100], Loss: 32677.2988
Epoch [40/100
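The output above shows one line per batch, which makes the trend hard to read because individual batch losses are noisy. A possible alternative, sketched below, is to average the loss over each epoch and print a single line every 10 epochs. This is a variation, not the loop used in this notebook: it runs on synthetic stand-in data (an assumption, so it is self-contained), moves the model to the device once before the loop, and uses a larger learning rate and fewer epochs to keep it quick.

```python
import torch
import torch.nn as nn
from torch.utils.data import TensorDataset, DataLoader

torch.manual_seed(0)

# synthetic stand-in data (assumption): 353 rows, 10 features, noisy linear target
X = torch.randn(353, 10)
y = X @ torch.randn(10, 1) + 0.1 * torch.randn(353, 1)
loader = DataLoader(TensorDataset(X, y), batch_size=50, shuffle=True)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# same layer layout as DiabetesModel, moved to the device once
model = nn.Sequential(
    nn.Linear(10, 5), nn.ReLU(),
    nn.Linear(5, 5), nn.ReLU(),
    nn.Linear(5, 1),
).to(device)
criterion = nn.MSELoss()
optimiser = torch.optim.Adam(model.parameters(), lr=0.01)

epochs = 30
epoch_losses = []  # one average loss per epoch

for epoch in range(epochs):
    model.train()
    running_loss, n_seen = 0.0, 0
    for inputs, targets in loader:
        inputs, targets = inputs.to(device), targets.to(device)
        optimiser.zero_grad()
        loss = criterion(model(inputs), targets)
        loss.backward()
        optimiser.step()
        # weight each batch by its size so the small final batch counts less
        running_loss += loss.item() * inputs.size(0)
        n_seen += inputs.size(0)

    epoch_losses.append(running_loss / n_seen)

    # this check is outside the batch loop: one line per 10 epochs
    if (epoch + 1) % 10 == 0:
        print(f"Epoch [{epoch + 1}/{epochs}], mean loss: {epoch_losses[-1]:.4f}")
```

Keeping the per-epoch averages in a list also makes it easy to plot the training curve afterwards.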

We can see the loss is significantly lower at the end than it was at the start. It is still bouncing around a little, though, which suggests the model needs more training (100 epochs is not a lot in deep learning terms). For now, let's evaluate as before:

In [9]:
# Evaluation (example)
model.eval() # testing mode
mse_values = [] # collect the MSE scores

with torch.no_grad():
    for inputs, targets in test_loader:
        inputs, targets = inputs.to(device), targets.to(device)
        outputs = model(inputs) # predict the test data

        # Calculate Mean Squared Error
        mse = criterion(outputs, targets) # calculate mse for the batch
        mse_values.append(mse.item()) # add to the list of MSE values

# Calculate and print the average MSE
avg_mse = np.mean(mse_values)
print(f"Average MSE on test set: {avg_mse}")

Average MSE on test set: 21168.029296875
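MSE is in squared units, which makes it hard to relate to the target values. Taking the square root gives a root mean squared error (RMSE) in the same units as the target. The sketch below simply applies this to the value printed above; because that value is an average of per-batch MSEs, treat the result as an approximation.

```python
import math

avg_mse = 21168.029296875  # value printed by the evaluation cell above

# RMSE: typical size of an error, in the units of the target
rmse = math.sqrt(avg_mse)
print(round(rmse, 1))  # 145.5
```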


The test MSE is in line with the training loss (no obvious sign of overfitting). However, we can probably get better results with tuning and more epochs.

Let's run the loop again a little differently to collect the predicted values (y_hat) and actuals (y) and add them to a dataset for comparison:

In [10]:
# Evaluation
model.eval()
predictions = []
actuals = []

with torch.no_grad():
    for inputs, targets in test_loader:
        inputs, targets = inputs.to(device), targets.to(device)
        outputs = model(inputs)
        predictions.extend(outputs.cpu().numpy())
        actuals.extend(targets.cpu().numpy())

# Create DataFrame
results_df = pd.DataFrame({'Predicted': np.array(predictions).flatten(), 'Actual': np.array(actuals).flatten()})
results_df

| | Predicted | Actual |
|---|---|---|
| 0 | 19.434847 | 219.0 |
| 1 | 18.456427 | 70.0 |
| 2 | 19.085600 | 202.0 |
| 3 | 23.223705 | 230.0 |
| 4 | 18.548630 | 111.0 |
| ... | ... | ... |
| 84 | 16.609005 | 153.0 |
| 85 | 15.480692 | 98.0 |
| 86 | 14.078079 | 37.0 |
| 87 | 14.644762 | 63.0 |
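One way to put a number on the side-by-side comparison is to add an absolute error column and average it. The sketch below uses only the first five rows shown above (copied by hand into a small DataFrame called `sample`, a name made up here), so the result describes those five rows, not the whole test set.

```python
import pandas as pd

# first five rows of the results table above, copied by hand
sample = pd.DataFrame({
    "Predicted": [19.434847, 18.456427, 19.085600, 23.223705, 18.548630],
    "Actual": [219.0, 70.0, 202.0, 230.0, 111.0],
})

# absolute error per row, then the mean absolute error (MAE)
sample["AbsError"] = (sample["Predicted"] - sample["Actual"]).abs()
mae = sample["AbsError"].mean()

print(sample)
print(round(mae, 2))  # 146.65
```

The same two lines would work on the full results_df to get the MAE over all test rows.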


Side-by-side, they don't look great. Can you improve them?

<br><br>

## EXERCISE #1
Try increasing the number of epochs to 1,000 (once the model is fairly well trained, the loss printed every 10 epochs will be fairly stable and not change much). Does this give better results?

<br><br>



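Firstly, I'm going to retrain the model, but with epochs = 1000 instead of epochs = 100 as before. Note that the cell below does not create a new model object, so unless the model is re-initialised first, training continues from the weights learned above.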

In [12]:
# Move model to GPU if available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Training loop (example - you'll likely want to add more epochs)
epochs = 1000 # 1000 epochs

for epoch in range(epochs):
  # use the train_loader to pass the inputs (x) and targets (y)
  for inputs, targets in train_loader:
    # pass to the GPU (hopefully)
    inputs, targets = inputs.to(device), targets.to(device)

    # pass model to GPU as well
    model.to(device)

    model.train() # put the model object in train mode
    optimiser.zero_grad() # reset the gradients
    outputs = model(inputs) # create outputs
    loss = criterion(outputs, targets) # compare with Y to get loss
    loss.backward() # backpropagate the loss (next week)
    optimiser.step() # update the parameters based on this round of training

    # on every 10th epoch, print the loss for each batch
    # (this check sits inside the batch loop, so it prints once per batch)
    if (epoch+1) % 10 == 0: # modular arithmetic
        print(f'Epoch [{epoch+1}/{epochs}], Loss: {round(loss.item(), 4)}')

Epoch [10/1000], Loss: 6945.0659
Epoch [10/1000], Loss: 7791.3438
Epoch [10/1000], Loss: 5701.9312
Epoch [10/1000], Loss: 7085.4507
Epoch [10/1000], Loss: 6884.3848
Epoch [10/1000], Loss: 7287.3286
Epoch [10/1000], Loss: 6859.7075
Epoch [10/1000], Loss: 3291.5115
Epoch [20/1000], Loss: 7115.7163
Epoch [20/1000], Loss: 5325.5547
Epoch [20/1000], Loss: 6805.6167
Epoch [20/1000], Loss: 6431.2847
Epoch [20/1000], Loss: 7104.2637
Epoch [20/1000], Loss: 3902.4119
Epoch [20/1000], Loss: 5570.001
Epoch [20/1000], Loss: 1104.7549
Epoch [30/1000], Loss: 7301.665
Epoch [30/1000], Loss: 3685.4348
Epoch [30/1000], Loss: 5434.646
Epoch [30/1000], Loss: 4899.7295
Epoch [30/1000], Loss: 6334.002
Epoch [30/1000], Loss: 4392.8955
Epoch [30/1000], Loss: 5244.3213
Epoch [30/1000], Loss: 6598.1279
Epoch [40/1000], Loss: 4027.6719
Epoch [40/1000], Loss: 5510.0859
Epoch [40/1000], Loss: 5032.4785
Epoch [40/1000], Loss: 3650.4297
Epoch [40/1000], Loss: 6815.8311
Epoch [40/1000], Loss: 4800.1377
Epoch [40/1000

While the loss still bounces around a bit, it's now reaching lower values, with a final loss of 699 compared to 36023 before. This implies that increasing the number of epochs has improved our model. To test this further, I'm going to evaluate the model as before.

In [13]:
# Evaluation (example)
model.eval() # testing mode
mse_values = [] # collect the MSE scores

with torch.no_grad():
    for inputs, targets in test_loader:
        inputs, targets = inputs.to(device), targets.to(device)
        outputs = model(inputs) # predict the test data

        # Calculate Mean Squared Error
        mse = criterion(outputs, targets) # calculate mse for the batch
        mse_values.append(mse.item()) # add to the list of MSE values

# Calculate and print the average MSE
avg_mse = np.mean(mse_values)
print(f"Average MSE on test set: {avg_mse}")

Average MSE on test set: 2877.7706298828125


Our MSE has also decreased significantly; in fact, it has fallen by over 80%, from 21168 to 2878. This further supports the idea that the model has improved.
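The size of the improvement can be worked out directly from the two printed values. A minimal sketch (the variable names are made up for this illustration):

```python
mse_100_epochs = 21168.029296875      # test MSE after the first training run
mse_1000_epochs = 2877.7706298828125  # test MSE after the longer run

# relative reduction in MSE
reduction = (mse_100_epochs - mse_1000_epochs) / mse_100_epochs
print(f"{reduction:.1%}")  # 86.4%
```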

Finally, I am going to repeat what we did earlier: collect the predicted values (y_hat) and actuals (y) and add them to a dataset for comparison.

In [14]:
# Evaluation
model.eval()
predictions = []
actuals = []

with torch.no_grad():
    for inputs, targets in test_loader:
        inputs, targets = inputs.to(device), targets.to(device)
        outputs = model(inputs)
        predictions.extend(outputs.cpu().numpy())
        actuals.extend(targets.cpu().numpy())

# Create DataFrame
results_df = pd.DataFrame({'Predicted': np.array(predictions).flatten(), 'Actual': np.array(actuals).flatten()})
results_df

| | Predicted | Actual |
|---|---|---|
| 0 | 148.347992 | 219.0 |
| 1 | 173.067734 | 70.0 |
| 2 | 143.428619 | 202.0 |
| 3 | 290.197235 | 230.0 |
| 4 | 130.204163 | 111.0 |
| ... | ... | ... |
| 84 | 116.300079 | 153.0 |
| 85 | 88.164093 | 98.0 |
| 86 | 74.084595 | 37.0 |
| 87 | 66.077423 | 63.0 |


While there is still error in these predictions, they are much more accurate than the model with only 100 epochs.

## EXERCISE #2 (optional)
Try experimenting with the architecture (number of neurons and/or number of layers). Can we reach an optimal architecture?

Continuing with 1000 epochs from the exercise above, I'm now going to experiment by increasing the number of neurons.

In [15]:
import torch
import torch.nn as nn

# Define the model with increased neurons
class DiabetesModel(nn.Module):
    def __init__(self):
        super(DiabetesModel, self).__init__()
        self.layers = nn.Sequential(
            nn.Linear(10, 10), # Increased from 5 neurons
            nn.ReLU(),
            nn.Linear(10, 10), # Increased from 5 neurons
            nn.ReLU(),
            nn.Linear(10, 1) # Output layer remains 1
        )

    def forward(self, x):
        return self.layers(x)

print("DiabetesModel class updated with 10 neurons in hidden layers.")

DiabetesModel class updated with 10 neurons in hidden layers.
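When trying several architectures, it can be tedious to rewrite the class each time. One possible approach, sketched below, is a small helper that builds the same kind of network from a list of hidden layer widths. The function names `make_mlp` and `count_parameters` are hypothetical (not part of PyTorch or this course), and this is just one way to organise the experiment.

```python
import torch
import torch.nn as nn

def make_mlp(in_features, hidden_sizes, out_features=1):
    """Build Linear + ReLU blocks for each hidden width, then a linear output."""
    layers = []
    previous = in_features
    for width in hidden_sizes:
        layers.append(nn.Linear(previous, width))
        layers.append(nn.ReLU())
        previous = width
    layers.append(nn.Linear(previous, out_features))  # no activation: regression
    return nn.Sequential(*layers)

def count_parameters(model):
    """Total number of weights and biases in the model."""
    return sum(p.numel() for p in model.parameters())

small = make_mlp(10, [5, 5])    # same layout as the original model
wide = make_mlp(10, [10, 10])   # same layout as the updated model

print(count_parameters(small), count_parameters(wide))  # 91 231

# a batch of 4 rows with 10 features gives 4 single-value predictions
output_shape = wide(torch.randn(4, 10)).shape
print(output_shape)  # torch.Size([4, 1])
```

Doubling the hidden width here more than doubles the parameter count, which is worth keeping in mind on a dataset with only a few hundred rows.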


Now I'll once again train and evaluate the model to see if this has made an improvement.

First I need to retrain the model.

In [16]:
import torch.optim as optim

# Initialize the model, loss function, and optimizer with the updated architecture
model = DiabetesModel()
criterion = nn.MSELoss()
optimiser = optim.Adam(model.parameters(), lr=0.001)

# Move model to GPU if available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Training loop with 1000 epochs
epochs = 1000

for epoch in range(epochs):
  for inputs, targets in train_loader:
    inputs, targets = inputs.to(device), targets.to(device)
    model.to(device)

    model.train()
    optimiser.zero_grad()
    outputs = model(inputs)
    loss = criterion(outputs, targets)
    loss.backward()
    optimiser.step()

    if (epoch+1) % 10 == 0:
        print(f'Epoch [{epoch+1}/{epochs}], Loss: {round(loss.item(), 4)}')


Epoch [10/1000], Loss: 33904.2148
Epoch [10/1000], Loss: 22684.0293
Epoch [10/1000], Loss: 30570.6914
Epoch [10/1000], Loss: 34103.1641
Epoch [10/1000], Loss: 29310.0703
Epoch [10/1000], Loss: 24896.3867
Epoch [10/1000], Loss: 33524.4688
Epoch [10/1000], Loss: 9720.6787
Epoch [20/1000], Loss: 31786.9844
Epoch [20/1000], Loss: 33524.3008
Epoch [20/1000], Loss: 28248.3887
Epoch [20/1000], Loss: 21912.3789
Epoch [20/1000], Loss: 30960.6836
Epoch [20/1000], Loss: 32924.0156
Epoch [20/1000], Loss: 27962.9902
Epoch [20/1000], Loss: 6245.9116
Epoch [30/1000], Loss: 25008.5898
Epoch [30/1000], Loss: 30736.3516
Epoch [30/1000], Loss: 24757.4844
Epoch [30/1000], Loss: 28279.709
Epoch [30/1000], Loss: 29293.1914
Epoch [30/1000], Loss: 32153.6797
Epoch [30/1000], Loss: 30231.7461
Epoch [30/1000], Loss: 51778.043
Epoch [40/1000], Loss: 29835.0098
Epoch [40/1000], Loss: 19475.0312
Epoch [40/1000], Loss: 28935.6035
Epoch [40/1000], Loss: 30665.2598
Epoch [40/1000], Loss: 24075.6562
Epoch [40/1000], L

Interestingly, with the number of neurons increased to 10, the early losses are actually higher than before. Bear in mind that this model was created afresh, so it starts training from randomly initialised weights. And while the final loss previously was only 699, on average the losses here are fairly similar, if not slightly lower. I'm going to calculate the MSE to investigate this further.

In [17]:
# Evaluation (example)
model.eval() # testing mode
mse_values = [] # collect the MSE scores

with torch.no_grad():
    for inputs, targets in test_loader:
        inputs, targets = inputs.to(device), targets.to(device)
        outputs = model(inputs) # predict the test data

        # Calculate Mean Squared Error
        mse = criterion(outputs, targets) # calculate mse for the batch
        mse_values.append(mse.item()) # add to the list of MSE values

# Calculate and print the average MSE
avg_mse = np.mean(mse_values)
print(f"Average MSE on test set: {avg_mse}")

Average MSE on test set: 2853.203125


The MSE has reduced slightly (by about 25, from 2878 to 2853).
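A difference this small between two architectures can be hard to separate from the effect of random weight initialisation and batch shuffling, both of which change from run to run. A minimal sketch of one way to make runs comparable is to fix the random seed before creating the model; the helper `init_layer` is a name made up for this illustration.

```python
import torch
import torch.nn as nn

def init_layer(seed):
    """Create a linear layer after fixing PyTorch's random seed."""
    torch.manual_seed(seed)
    return nn.Linear(10, 10)

a = init_layer(0)
b = init_layer(0)  # same seed as a
c = init_layer(1)  # different seed

same_seed_equal = torch.equal(a.weight, b.weight)
diff_seed_equal = torch.equal(a.weight, c.weight)

print(same_seed_equal)  # True: identical starting weights
print(diff_seed_equal)  # False: different starting weights
```

Repeating the training with a few different seeds and comparing the spread of test MSE values would give a better sense of whether the wider model is genuinely better.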