<a href="https://colab.research.google.com/github/Mannat5649144/DataScience-GenAI-Submissions-/blob/main/Week%206_6_02_DNN_101_COMPLETED.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

![](https://drive.google.com/uc?export=view&id=1xqQczl0FG-qtNA2_WQYuWePW9oU8irqJ)

# 6.02 Dense Neural Network (with PyTorch)
This notebook expands on our logistic regression example and takes us through building our first neural network. If you haven't already, check whether you are using GPU processing and, if necessary, switch to it by clicking Runtime > Change runtime type and selecting GPU. We can test this has worked with the following code:

In [26]:
import torch

# Check for GPU availability
print("Num GPUs Available: ", torch.cuda.device_count())

Num GPUs Available:  1


Hopefully your code shows you have 1 GPU available! Next let's get some data. We'll start with another in-built dataset:

In [27]:
# load a built-in (OK, semi-built-in) dataset from scikit-learn
from sklearn.datasets import load_diabetes

import pandas as pd
import numpy as np

# import the data
data = load_diabetes()
data

{'data': array([[ 0.03807591,  0.05068012,  0.06169621, ..., -0.00259226,
          0.01990749, -0.01764613],
        [-0.00188202, -0.04464164, -0.05147406, ..., -0.03949338,
         -0.06833155, -0.09220405],
        [ 0.08529891,  0.05068012,  0.04445121, ..., -0.00259226,
          0.00286131, -0.02593034],
        ...,
        [ 0.04170844,  0.05068012, -0.01590626, ..., -0.01107952,
         -0.04688253,  0.01549073],
        [-0.04547248, -0.04464164,  0.03906215, ...,  0.02655962,
          0.04452873, -0.02593034],
        [-0.04547248, -0.04464164, -0.0730303 , ..., -0.03949338,
         -0.00422151,  0.00306441]]),
 'target': array([151.,  75., 141., 206., 135.,  97., 138.,  63., 110., 310., 101.,
         69., 179., 185., 118., 171., 166., 144.,  97., 168.,  68.,  49.,
         68., 245., 184., 202., 137.,  85., 131., 283., 129.,  59., 341.,
         87.,  65., 102., 265., 276., 252.,  90., 100.,  55.,  61.,  92.,
        259.,  53., 190., 142.,  75., 142., 155., 225.,  59

We are working on a regression problem, with "structured" data which has already been cleaned and normalised. We can skip the usual cleaning/engineering steps. However, we do need to get the data into PyTorch:

In [28]:
# Convert data to PyTorch tensors
X = torch.tensor(data.data, dtype=torch.float32)
y = torch.tensor(data.target, dtype=torch.float32).reshape(-1, 1) # Reshape y to be a column vector
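
The reshape matters because a model with a single output neuron returns one column per row, i.e. shape `(N, 1)`. As a minimal sketch, using made-up numbers rather than the diabetes data, this is what broadcasting does if the target is left as a flat vector:

```python
import torch

# Toy values for illustration only (not taken from the diabetes data)
outputs = torch.tensor([[1.0], [2.0], [3.0]])   # shape (3, 1), like a model output
targets_flat = torch.tensor([1.0, 2.0, 3.0])    # shape (3,), a flat target

# (3, 1) - (3,) broadcasts to a (3, 3) grid: every output against every target
wrong_mse = ((outputs - targets_flat) ** 2).mean()

# Reshaping the target to a column vector lines the rows up one-to-one
targets_col = targets_flat.reshape(-1, 1)       # shape (3, 1)
right_mse = ((outputs - targets_col) ** 2).mean()

print((outputs - targets_flat).shape)  # torch.Size([3, 3])
print(wrong_mse.item(), right_mse.item())
```

The predictions here are perfect, so the correctly shaped error is zero, while the broadcast version reports a non-zero error that has nothing to do with model quality.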

Now that our data is stored in tensors, we can do train/test splitting as before (in fact, we can use sklearn as before):

In [29]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
print(X_train.shape, y_train.shape)
print(X_test.shape, y_test.shape)

torch.Size([353, 10]) torch.Size([353, 1])
torch.Size([89, 10]) torch.Size([89, 1])


Now we can set up our batches for training. The training set has 353 rows, so batches of 50 give 8 batches per epoch (seven full batches plus a final batch of 3). We'll also wrap the features and labels together in a `TensorDataset`, so each batch yields both:
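
The batch arithmetic can be checked by hand. The sketch below assumes the test share is rounded up (which is consistent with the 353/89 shapes printed above) and that the loader keeps the final, smaller batch, which is the `DataLoader` default:

```python
import math

n_samples = 442      # 353 + 89 rows, from the shapes printed above
test_size = 0.2
batch_size = 50

# Assumption: the test share is rounded up, which matches the printed shapes
n_test = math.ceil(n_samples * test_size)
n_train = n_samples - n_test

# The final, smaller batch is kept unless drop_last=True is passed to DataLoader
n_batches = math.ceil(n_train / batch_size)
last_batch = n_train - (n_batches - 1) * batch_size

print(n_train, n_test)        # 353 89
print(n_batches, last_batch)  # 8 3
```

A final batch of only 3 rows produces a much noisier loss than a batch of 50, which is worth remembering when reading per-batch losses later on.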

In [30]:
from torch.utils.data import TensorDataset, DataLoader

# Create TensorDatasets and DataLoaders
train_dataset = TensorDataset(X_train, y_train)
train_loader = DataLoader(train_dataset, batch_size=50, shuffle=True)

test_dataset = TensorDataset(X_test, y_test)
test_loader = DataLoader(test_dataset, batch_size=50, shuffle=False)

Now it's time to build our model. We'll keep it simple: a model with an input layer of 10 features, followed by two _Dense_ (fully connected) layers, each with 5 neurons and ReLU activation. Our output layer will be size=1 because this is a regression problem and we want a single output value per prediction.

This will be easier to understand if you have read through the logistic regression tutorial.
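
To get a feel for how small this network is, we can count its trainable parameters by hand. The helper name `count_dense_parameters` is made up for this sketch; it simply adds one weight per input-output pair and one bias per neuron for each fully connected layer:

```python
def count_dense_parameters(layer_sizes):
    """Count weights and biases for a stack of fully connected layers."""
    total = 0
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        total += n_in * n_out  # one weight per input-output pair
        total += n_out         # one bias per neuron
    return total

# 10 input features -> 5 neurons -> 5 neurons -> 1 output
sizes = [10, 5, 5, 1]
n_params = count_dense_parameters(sizes)
print(n_params)  # 91  (55 + 30 + 6)
```

ReLU layers add no parameters, so only the three linear layers contribute.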

In [31]:
import torch
import torch.nn as nn

# Define the model
class DiabetesModel(nn.Module):
    def __init__(self):
        super(DiabetesModel, self).__init__()
        # we'll set up the layers as a sequence using nn.Sequential
        self.layers = nn.Sequential(

            # first layer will be a linear layer that has 5x neurons
            # (5x sets of linear regression)
            # the layer takes the 10 features as input (i.e. 10, 5)
            nn.Linear(10, 5),

            nn.ReLU(), # ReLU activation

            # second linear layer again has 5 neurons
            # this time taking the input as the output of the last layer
            # (which had 5x neurons)
            nn.Linear(5, 5),

            nn.ReLU(), # ReLU again

            # last linear layer takes the output from the previous 5 neurons
            # this time it's a single output with no activation
            # i.e. these are the predictions (regression)
            nn.Linear(5, 1)
        )

    def forward(self, x):
        return self.layers(x) # pass the data through the layers

As before we need to create a model object, specify the loss (criterion) and an optimiser (which we cover next week):
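
As a reminder of what the MSE criterion measures, here is a hand-rolled version in plain Python. The numbers are invented for illustration, and `mean_squared_error` is a local helper rather than the PyTorch implementation:

```python
def mean_squared_error(predictions, targets):
    """Average of the squared differences between predictions and targets."""
    squared_errors = [(p - t) ** 2 for p, t in zip(predictions, targets)]
    return sum(squared_errors) / len(squared_errors)

# Made-up values on a similar scale to the diabetes target
predictions = [100.0, 150.0, 200.0]
targets = [110.0, 140.0, 230.0]

mse = mean_squared_error(predictions, targets)  # (100 + 100 + 900) / 3
print(round(mse, 2))
```

Because the errors are squared, a single miss of 30 contributes 900 to the sum. This is why losses in the thousands are normal when the target values are in the hundreds.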

In [32]:
import torch.optim as optim

# Initialize the model, loss function, and optimizer
model = DiabetesModel()
criterion = nn.MSELoss() # MSE loss function
optimiser = optim.Adam(model.parameters(), lr=0.001)

Now we can train the model. Again, the logistic regression tutorial (6.01) may help you understand this:

In [33]:
# Move model to GPU if available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Training loop (example - you'll likely want to add more epochs)
epochs = 100 # 100 epochs

for epoch in range(epochs):
  # use the train_loader to pass the inputs (x) and targets (y)
  for inputs, targets in train_loader:
    # pass to the GPU (hopefully)
    inputs, targets = inputs.to(device), targets.to(device)

    # pass model to GPU as well
    model.to(device)

    model.train() # put the model object in train mode
    optimiser.zero_grad() # reset the gradients
    outputs = model(inputs) # create outputs
    loss = criterion(outputs, targets) # compare with y to get loss
    loss.backward() # backpropagate the loss (next week)
    optimiser.step() # update the parameters based on this round of training

    # on every 10th epoch, print the current loss
    # (this check sits inside the batch loop, so it prints once per batch)
    if (epoch+1) % 10 == 0: # modular arithmetic
        print(f'Epoch [{epoch+1}/{epochs}], Loss: {round(loss.item(), 4)}')

Epoch [10/100], Loss: 33178.4531
Epoch [10/100], Loss: 29347.9551
Epoch [10/100], Loss: 29577.2695
Epoch [10/100], Loss: 33449.4648
Epoch [10/100], Loss: 28005.0938
Epoch [10/100], Loss: 25836.7324
Epoch [10/100], Loss: 26826.4336
Epoch [10/100], Loss: 56940.1641
Epoch [20/100], Loss: 31163.1523
Epoch [20/100], Loss: 21224.2695
Epoch [20/100], Loss: 30665.7441
Epoch [20/100], Loss: 31530.5469
Epoch [20/100], Loss: 33302.0352
Epoch [20/100], Loss: 27244.3789
Epoch [20/100], Loss: 30352.9238
Epoch [20/100], Loss: 52511.2969
Epoch [30/100], Loss: 30939.6777
Epoch [30/100], Loss: 27470.207
Epoch [30/100], Loss: 29212.1523
Epoch [30/100], Loss: 24753.6211
Epoch [30/100], Loss: 30049.7402
Epoch [30/100], Loss: 35816.543
Epoch [30/100], Loss: 27536.75
Epoch [30/100], Loss: 13526.4492
Epoch [40/100], Loss: 26086.2012
Epoch [40/100], Loss: 27212.3652
Epoch [40/100], Loss: 29363.1152
Epoch [40/100], Loss: 25104.4668
Epoch [40/100], Loss: 31665.8594
Epoch [40/100], Loss: 33502.9805
Epoch [40/100]

We can see the loss is significantly lower at the end than it was at the start. It is still bouncing around a little, though, which suggests the model needs more training (100 epochs is not a lot in deep learning terms). For now, let's evaluate as before:
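
Part of the bouncing comes from printing the loss of individual batches. One alternative, sketched below on synthetic data (not the diabetes set), is to record a sample-weighted mean loss per epoch. The `train_one_epoch` helper and the toy model are assumptions made for this example, and the model is moved to the device once, before the loop:

```python
import torch
import torch.nn as nn
from torch.utils.data import TensorDataset, DataLoader

torch.manual_seed(0)  # make the toy run repeatable

# Synthetic stand-in data (an assumption, not the diabetes set): y = 3x + 2
X_toy = torch.linspace(-1, 1, 53).reshape(-1, 1)
y_toy = 3 * X_toy + 2
toy_loader = DataLoader(TensorDataset(X_toy, y_toy), batch_size=10, shuffle=True)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
toy_model = nn.Linear(1, 1).to(device)  # move the model once, before the loop
criterion = nn.MSELoss()
optimiser = torch.optim.Adam(toy_model.parameters(), lr=0.05)

def train_one_epoch(model, loader):
    """Hypothetical helper: train for one epoch, return the mean loss per sample."""
    model.train()
    total_loss, total_rows = 0.0, 0
    for inputs, targets in loader:
        inputs, targets = inputs.to(device), targets.to(device)
        optimiser.zero_grad()
        loss = criterion(model(inputs), targets)
        loss.backward()
        optimiser.step()
        # weight each batch by its size so the small final batch counts less
        total_loss += loss.item() * inputs.size(0)
        total_rows += inputs.size(0)
    return total_loss / total_rows

history = [train_one_epoch(toy_model, toy_loader) for _ in range(100)]
print(f"first epoch: {history[0]:.4f}, last epoch: {history[-1]:.4f}")
```

Plotting or printing `history` gives one value per epoch, which is usually easier to read than eight per-batch values.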

In [34]:
# Evaluation (example)
model.eval() # testing mode
mse_values = [] # collect the MSE scores

with torch.no_grad():
    for inputs, targets in test_loader:
        inputs, targets = inputs.to(device), targets.to(device)
        outputs = model(inputs) # predict the test data

        # Calculate Mean Squared Error
        mse = criterion(outputs, targets) # calculate mse for the batch
        mse_values.append(mse.item()) # add to the list of MSE values

# Calculate and print the average MSE
avg_mse = np.mean(mse_values)
print(f"Average MSE on test set: {avg_mse}")

Average MSE on test set: 14579.125


The test MSE is in line with the training loss (no obvious sign of overfitting). However, we can probably get better results with tuning and more epochs.
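
One detail about the average above: the 89 test rows are split into batches of 50 and 39, so a plain mean of the two batch MSEs weights the smaller batch slightly too heavily. A sample-weighted mean gives the MSE over all rows. The batch MSE values in this sketch are placeholders, not results from this notebook:

```python
def weighted_mean(batch_values, batch_sizes):
    """Average per-batch values, weighting each by the number of rows in the batch."""
    total = sum(v * n for v, n in zip(batch_values, batch_sizes))
    return total / sum(batch_sizes)

batch_sizes = [50, 39]            # 89 test rows in batches of 50
batch_mses = [15000.0, 14000.0]   # placeholder values, NOT results from this notebook

simple_mean = sum(batch_mses) / len(batch_mses)
sample_weighted = weighted_mean(batch_mses, batch_sizes)

print(simple_mean)
print(round(sample_weighted, 2))
```

The difference is small here, but it grows when the final batch is much smaller than the others.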

Let's run the loop again a little differently to collect the predicted values (y_hat) and actuals (y) and add them to a DataFrame for comparison:

In [35]:
# Evaluation
model.eval()
predictions = []
actuals = []

with torch.no_grad():
    for inputs, targets in test_loader:
        inputs, targets = inputs.to(device), targets.to(device)
        outputs = model(inputs)
        predictions.extend(outputs.cpu().numpy())
        actuals.extend(targets.cpu().numpy())

# Create DataFrame
results_df = pd.DataFrame({'Predicted': np.array(predictions).flatten(), 'Actual': np.array(actuals).flatten()})
results_df

| | Predicted | Actual |
|---|---|---|
| 0 | 50.333576 | 219.0 |
| 1 | 46.941250 | 70.0 |
| 2 | 49.453487 | 202.0 |
| 3 | 60.749218 | 230.0 |
| 4 | 47.552822 | 111.0 |
| ... | ... | ... |
| 84 | 42.435783 | 153.0 |
| 85 | 39.490826 | 98.0 |
| 86 | 35.042961 | 37.0 |
| 87 | 36.664616 | 63.0 |


Side-by-side, they don't look great. Can you improve them?
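
To put a number on "don't look great", the sketch below takes only the nine rows visible in the truncated table above (rows 0 to 4 and 84 to 87) and compares their ranges and mean absolute error. It is a rough check on a handful of rows, not a full evaluation:

```python
# The nine rows visible in the truncated table above (rows 0-4 and 84-87)
predicted = [50.333576, 46.941250, 49.453487, 60.749218, 47.552822,
             42.435783, 39.490826, 35.042961, 36.664616]
actual = [219.0, 70.0, 202.0, 230.0, 111.0, 153.0, 98.0, 37.0, 63.0]

# Mean absolute error: the average size of the miss, in target units
mae = sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)
mean_predicted = sum(predicted) / len(predicted)
mean_actual = sum(actual) / len(actual)

print(f"MAE on visible rows: {mae:.1f}")
print(f"predicted range: {min(predicted):.1f} to {max(predicted):.1f}")
print(f"actual range:    {min(actual):.1f} to {max(actual):.1f}")
print(f"mean predicted {mean_predicted:.1f} vs mean actual {mean_actual:.1f}")
```

On these rows the predictions sit in a narrow band (roughly 35 to 61) while the actual values span 37 to 230, so the model is systematically under-predicting rather than just being noisy.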

<br><br>

## EXERCISE #1
Try increasing the number of epochs to 1,000 (once the model is fairly well trained, the loss printed every 10 epochs will be fairly stable and not change much). Does this give better results?


In [36]:
import torch.optim as optim

# Re-initialize the model, loss function, and optimizer for a fresh start
# This ensures the model begins training from scratch with the new epoch count.
model = DiabetesModel()
criterion = nn.MSELoss() # MSE loss function
optimiser = optim.Adam(model.parameters(), lr=0.001)

# Move model to GPU if available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Training loop with 1000 epochs
epochs = 1000 # Increased to 1000 epochs as requested

print(f"Starting training for {epochs} epochs...")

for epoch in range(epochs):
  # use the train_loader to pass the inputs (x) and targets (y)
  for inputs, targets in train_loader:
    # pass to the GPU (hopefully)
    inputs, targets = inputs.to(device), targets.to(device)

    # pass model to GPU as well
    model.to(device)

    model.train() # put the model object in train mode
    optimiser.zero_grad() # reset the gradients
    outputs = model(inputs) # create outputs
    loss = criterion(outputs, targets) # compare with Y to get loss
    loss.backward() # backpropagate the loss
    optimiser.step() # update the parameters based on this round of training

  # every 100 epochs we print the current loss (less often, for brevity with 1000 epochs)
  if (epoch+1) % 100 == 0: # Print less frequently for more epochs
      print(f'Epoch [{epoch+1}/{epochs}], Loss: {round(loss.item(), 4)}')

Starting training for 1000 epochs...
Epoch [100/1000], Loss: 23954.2559
Epoch [200/1000], Loss: 13286.502
Epoch [300/1000], Loss: 9895.5273
Epoch [400/1000], Loss: 5388.1621
Epoch [500/1000], Loss: 6208.6914
Epoch [600/1000], Loss: 4684.9414
Epoch [700/1000], Loss: 2873.5476
Epoch [800/1000], Loss: 4262.5732
Epoch [900/1000], Loss: 1844.1047
Epoch [1000/1000], Loss: 1058.116


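Increasing the number of epochs to 1,000 has reduced the final printed training loss from approximately 3117 to 1058. Each printed value is the loss on the last batch of that epoch only, so it is a noisy indicator (note the rises at epochs 500 and 800), but the overall downward trend suggests the model is learning better. To confirm whether this leads to better generalization, let's re-evaluate the model's performance on the test set by calculating the average Mean Squared Error (MSE).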



In [37]:
# Evaluation (example)
model.eval() # testing mode
mse_values = [] # collect the MSE scores

with torch.no_grad():
    for inputs, targets in test_loader:
        inputs, targets = inputs.to(device), targets.to(device)
        outputs = model(inputs) # predict the test data

        # Calculate Mean Squared Error
        mse = criterion(outputs, targets) # calculate mse for the batch
        mse_values.append(mse.item()) # add to the list of MSE values

# Calculate and print the average MSE
avg_mse = np.mean(mse_values)
print(f"Average MSE on test set after 1000 epochs: {avg_mse}")

Average MSE on test set after 1000 epochs: 2965.6083984375


The average MSE on the test set after 1000 epochs is approximately 2965.61. This is a significant improvement compared to the previous average MSE of 14579.13 after only 100 epochs. This indicates that increasing the number of training epochs has allowed the model to learn better and generalize more effectively to unseen data.
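
MSE is in squared units, which makes it hard to relate to the target values. Taking the square root (RMSE) puts the two reported results back on the scale of the target, as this small sketch shows:

```python
import math

mse_100_epochs = 14579.125          # test MSE reported after 100 epochs
mse_1000_epochs = 2965.6083984375   # test MSE reported after 1,000 epochs

# RMSE is the square root of MSE, so it is in the same units as the target
rmse_100 = math.sqrt(mse_100_epochs)
rmse_1000 = math.sqrt(mse_1000_epochs)

print(f"RMSE after 100 epochs:  {rmse_100:.1f}")
print(f"RMSE after 1000 epochs: {rmse_1000:.1f}")
```

So the typical miss drops from roughly 121 to roughly 54 target units: a clear improvement, though still large next to actual values such as 37 or 63 in the results table.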


## EXERCISE #2 (optional)
Try experimenting with the architecture (number of neurons and/or number of layers). Can we reach an optimal architecture?
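
When trying several architectures, it can help to build the layers from a list of hidden sizes instead of writing a new class each time. The `build_mlp` and `count_parameters` helpers below are hypothetical names for this sketch; they are not part of PyTorch or of the earlier cells:

```python
import torch
import torch.nn as nn

def build_mlp(n_inputs, hidden_sizes, n_outputs=1):
    """Hypothetical helper: stack Linear + ReLU blocks, then a linear output layer."""
    layers = []
    previous = n_inputs
    for width in hidden_sizes:
        layers.append(nn.Linear(previous, width))
        layers.append(nn.ReLU())
        previous = width
    layers.append(nn.Linear(previous, n_outputs))  # no activation: regression output
    return nn.Sequential(*layers)

def count_parameters(model):
    """Total number of elements across all parameter tensors."""
    return sum(p.numel() for p in model.parameters())

baseline = build_mlp(10, [5, 5])      # same layer sizes as DiabetesModel
wider = build_mlp(10, [20, 10, 5])    # one possible variation to try

print(count_parameters(baseline))  # 91
print(count_parameters(wider))     # 491
```

With only 353 training rows, a larger parameter count is not automatically better, so it is worth comparing test MSE for each variation rather than training loss alone.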

In [40]:
import torch
import torch.nn as nn

# Define the original model
class DiabetesModel(nn.Module):
    def __init__(self):
        super(DiabetesModel, self).__init__()
        self.layers = nn.Sequential(
            nn.Linear(10, 5),
            nn.ReLU(),
            nn.Linear(5, 5),
            nn.ReLU(),
            nn.Linear(5, 1)
        )

    def forward(self, x):
        return self.layers(x)

# Define the new model architecture as DiabetesModelV2
class DiabetesModelV2(nn.Module):
    def __init__(self):
        super(DiabetesModelV2, self).__init__()
        self.layers = nn.Sequential(
            nn.Linear(10, 20), # Increased neurons from 5 to 20
            nn.ReLU(),
            nn.Linear(20, 10), # Added another hidden layer with 10 neurons
            nn.ReLU(),
            nn.Linear(10, 5),  # Original second layer, now third hidden layer
            nn.ReLU(),
            nn.Linear(5, 1) # Output layer
        )

    def forward(self, x):
        return self.layers(x)

print("Defined DiabetesModel and DiabetesModelV2 classes.")

Defined DiabetesModel and DiabetesModelV2 classes.


Next we experiment with the new `DiabetesModelV2` architecture. The cell below instantiates `DiabetesModelV2`, defines the loss function and optimizer, and then runs the training loop with the new model so we can observe its performance.



In [42]:
import torch.optim as optim

# Initialize the new model (DiabetesModelV2), loss function, and optimizer
model_v2 = DiabetesModelV2() # Instantiate the new model
criterion_v2 = nn.MSELoss() # MSE loss function
optimiser_v2 = optim.Adam(model_v2.parameters(), lr=0.001)

# Move model to GPU if available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Training loop with 1000 epochs for the new model
epochs = 1000

print(f"Starting training for DiabetesModelV2 for {epochs} epochs...")

for epoch in range(epochs):
  for inputs, targets in train_loader:
    inputs, targets = inputs.to(device), targets.to(device)
    model_v2.to(device)

    model_v2.train()
    optimiser_v2.zero_grad()
    outputs = model_v2(inputs)
    loss = criterion_v2(outputs, targets)
    loss.backward()
    optimiser_v2.step()

  if (epoch+1) % 100 == 0:
      print(f'Epoch [{epoch+1}/{epochs}], Loss: {round(loss.item(), 4)}')

Starting training for DiabetesModelV2 for 1000 epochs...
Epoch [100/1000], Loss: 5560.5347
Epoch [200/1000], Loss: 1547.8363
Epoch [300/1000], Loss: 5308.1353
Epoch [400/1000], Loss: 1712.9739
Epoch [500/1000], Loss: 3154.0017
Epoch [600/1000], Loss: 1067.9788
Epoch [700/1000], Loss: 1569.2338
Epoch [800/1000], Loss: 2569.1626
Epoch [900/1000], Loss: 2123.1909
Epoch [1000/1000], Loss: 709.7121


After training `DiabetesModelV2`, the next step is to evaluate it on the test set, so we can compare it with the original model and see whether the architecture changes led to an improvement. This involves calculating the average MSE on the test data.



In [44]:
model_v2.eval() # Set the model to evaluation mode
mse_values_v2 = [] # Collect the MSE scores for DiabetesModelV2

with torch.no_grad():
    for inputs, targets in test_loader:
        inputs, targets = inputs.to(device), targets.to(device)
        outputs = model_v2(inputs) # Predict on the test data using model_v2

        # Calculate Mean Squared Error
        mse = criterion_v2(outputs, targets) # Calculate MSE for the batch
        mse_values_v2.append(mse.item()) # Add to the list of MSE values

# Calculate and print the average MSE for DiabetesModelV2
avg_mse_v2 = np.mean(mse_values_v2)
print(f"Average MSE on test set for DiabetesModelV2 after 1000 epochs: {avg_mse_v2}")

Average MSE on test set for DiabetesModelV2 after 1000 epochs: 2860.659423828125


Finally, we compare the MSE of the old and new models and display the predicted vs actual values for the new model. The cell below prints both MSEs and builds a DataFrame for a visual comparison of the new model's predictions.



In [46]:
print(f"Average MSE on test set (Original Model, 1000 epochs): {avg_mse}")
print(f"Average MSE on test set (New Model V2, 1000 epochs): {avg_mse_v2}")

# Evaluation for DiabetesModelV2 to collect predictions and actuals
model_v2.eval() # Set the model to evaluation mode
predictions_v2 = []
actuals_v2 = []

with torch.no_grad():
    for inputs, targets in test_loader:
        inputs, targets = inputs.to(device), targets.to(device)
        outputs = model_v2(inputs)
        predictions_v2.extend(outputs.cpu().numpy().flatten())
        actuals_v2.extend(targets.cpu().numpy().flatten())

# Create DataFrame for DiabetesModelV2 results
results_df_v2 = pd.DataFrame({'Predicted': predictions_v2, 'Actual': actuals_v2})
print("\nPredicted vs Actual values for DiabetesModelV2:")
results_df_v2

Average MSE on test set (Original Model, 1000 epochs): 2965.6083984375
Average MSE on test set (New Model V2, 1000 epochs): 2860.659423828125

Predicted vs Actual values for DiabetesModelV2:


| | Predicted | Actual |
|---|---|---|
| 0 | 139.583282 | 219.0 |
| 1 | 180.706604 | 70.0 |
| 2 | 137.841568 | 202.0 |
| 3 | 298.141327 | 230.0 |
| 4 | 119.275070 | 111.0 |
| ... | ... | ... |
| 84 | 111.106834 | 153.0 |
| 85 | 86.222908 | 98.0 |
| 86 | 80.254005 | 37.0 |
| 87 | 64.403595 | 63.0 |



*   The reduction in test MSE from about 2965.61 to 2860.66 (roughly 3.5%) suggests that increasing model complexity can offer marginal performance gains. Each figure comes from a single training run, so further analysis is needed to determine whether this improvement is statistically significant or practically meaningful.
*   Experiment with different architectural changes, such as varying activation functions, regularization techniques, or exploring different optimizers, to identify configurations that lead to more substantial improvements in model performance and generalization.
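
As one starting point for the second suggestion, the sketch below adds dropout to the original layer sizes and an L2 penalty through Adam's `weight_decay` argument. The class name `DiabetesModelDropout`, the dropout rate and the weight decay value are all untuned assumptions for illustration, not recommended settings:

```python
import torch
import torch.nn as nn
import torch.optim as optim

class DiabetesModelDropout(nn.Module):
    """Hypothetical variant: the original layer sizes plus one dropout layer."""
    def __init__(self, dropout_rate=0.2):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(10, 5),
            nn.ReLU(),
            nn.Dropout(dropout_rate),  # randomly zeroes activations in train mode
            nn.Linear(5, 5),
            nn.ReLU(),
            nn.Linear(5, 1),
        )

    def forward(self, x):
        return self.layers(x)

model_reg = DiabetesModelDropout()

# weight_decay adds an L2 penalty; 1e-4 is an untuned placeholder value
optimiser_reg = optim.Adam(model_reg.parameters(), lr=0.001, weight_decay=1e-4)

# Dropout is only active in train mode, so switch to eval mode before predicting
model_reg.eval()
with torch.no_grad():
    sample = torch.zeros(4, 10)
    first = model_reg(sample)
    second = model_reg(sample)

print(torch.equal(first, second))  # True: eval mode is deterministic
```

The same training and evaluation loops used earlier should work with `model_reg` and `optimiser_reg` swapped in, which makes it straightforward to compare test MSE against the two models above.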
