# Predicting age from brain structures

<img src="https://drive.google.com/uc?export=view&id=1bSgR27zLYH_eW0VD4jjGYF2uLHQJWvJh" width = "400" style="float: right;">

In this tutorial, we will take the concepts we have learned so far and apply them to our case study, which involves predicting a baby's age based on the volumes of their brain structures. We will learn how to tune the hyperparameters of the network, such as the learning rate and the number of epochs, and devise a non-linear solution.

To begin, we will load our dataset, which contains the volumes of **86 brain structures** for each baby. Afterward, we'll transform both the feature matrix and target vector into tensors. Run the cells below to load the dataset with 86 structures.

In [None]:
# only do this if you work on Google Colab
# run the cell
# then upload file 'GA-brain-volumes-86-features.csv'

from google.colab import files
files.upload()

In [None]:
import torch
import pandas as pd
import numpy as np
from sklearn.preprocessing import StandardScaler

import warnings
warnings.filterwarnings("ignore", category=DeprecationWarning)

def CreateFeaturesTargets(filename):

    df = pd.read_csv(filename,header=None)

    # convert from 'DataFrame' to numpy array
    data = df.values

    # Features are in columns one to end
    X = data[:,1:]

    # Scale features
    X = StandardScaler().fit_transform(X)

    # Labels are in the column zero
    y = data[:,0].reshape(-1,1)

    # return Features and Labels
    return X, y

X,y = CreateFeaturesTargets('GA-brain-volumes-86-features.csv')

# perform scaling of the target values to support better convergence
target_scaler = StandardScaler()
y = target_scaler.fit_transform(y)

print('Number of samples is', X.shape[0])
print('Number of features is', X.shape[1])

# convert to tensors
X = torch.from_numpy(X).float()
y = torch.from_numpy(y).float()
print('X shape: ', X.shape)
print('y shape: ', y.shape)
print('X type: ', X.type())
print('y type: ', y.type())

Observation 1: Note that we have reshaped our target values into a two-dimensional array with a single column. Both the feature matrix and target vector are PyTorch tensors of type `float`, as PyTorch requires this format.
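To illustrate why the shape and type matter, here is a minimal sketch with made-up numbers (the arrays `y_np` and `pred` are hypothetical and not part of the tutorial data). NumPy arrays are `float64` by default, while PyTorch layers hold `float32` parameters by default, hence the call to `.float()`. If the target were left one-dimensional, subtracting it from a prediction of shape `(N, 1)` would broadcast to an `(N, N)` matrix instead of comparing sample with sample.

```python
import numpy as np
import torch

# Hypothetical targets; NumPy defaults to float64
y_np = np.array([30.0, 35.0, 40.0, 28.0])

y_flat = torch.from_numpy(y_np).float()                 # shape (4,)
y_col = torch.from_numpy(y_np.reshape(-1, 1)).float()   # shape (4, 1)

# Stand-in for a network output: one prediction per sample, shape (4, 1)
pred = torch.zeros(4, 1)

print(torch.from_numpy(y_np).dtype)  # dtype before .float()
print(y_col.dtype)                   # dtype after .float()
print((pred - y_col).shape)          # element-wise difference, as intended
print((pred - y_flat).shape)         # broadcast to a 4 x 4 matrix
```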

Observation 2: Notably, we've also scaled the target values, which enhances the convergence of stochastic gradient descent. This is a change from previous exercises, where we used regression techniques that either relied on analytical solutions or alternative optimisers.
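Because the targets are standardised, the MSE loss is expressed in scaled units rather than in weeks. The sketch below uses hypothetical ages and predictions to show that multiplying the scaled MSE by the variance of the original targets (the quantity `StandardScaler` stores in `var_`) and taking the square root recovers the RMSE in the original units.

```python
import numpy as np

# Hypothetical ages in weeks and hypothetical predictions
age = np.array([28.0, 32.0, 36.0, 40.0, 44.0])
age_pred = np.array([29.0, 31.0, 37.0, 39.0, 45.0])

# Standardise with the mean and (population) variance of the true targets,
# the same quantities StandardScaler stores as mean_ and var_
mean, var = age.mean(), age.var()
age_s = (age - mean) / np.sqrt(var)
age_pred_s = (age_pred - mean) / np.sqrt(var)

# MSE in scaled units, then converted back to weeks
mse_scaled = np.mean((age_pred_s - age_s) ** 2)
rmse_weeks = np.sqrt(mse_scaled * var)

# RMSE computed directly in weeks, for comparison
rmse_direct = np.sqrt(np.mean((age_pred - age) ** 2))

print(rmse_weeks, rmse_direct)
```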

The function `PlotTargets` displayed below has been used earlier to illustrate both the actual and predicted target values. Please review the function and execute the following cell.

In [None]:
import matplotlib.pyplot as plt
def PlotTargets(y_pred,y, label = 'Target values', plot_line=True):
    if plot_line:
        # raw string so that the backslash in \hat is not treated as an escape
        plt.plot([-3,3],[-3,3],'r', label = r'$y=\hat{y}$')
    plt.plot(y,y_pred,'o', label = label)

    plt.xlabel('Expected target values')
    plt.ylabel('Predicted target values')
    plt.legend()

## Exercise 4

In this exercise you will train and evaluate a single layer perceptron to predict the age of a baby from the volumes of 86 brain structures. First we will split the dataset into **training, validation and test sets**.

This is different from what we have done before, but cross-validation is rarely used in deep learning, due to long training times. You will see later in this exercise how these three sets are used.

But don't worry! As we navigate through this exercise together, you'll get a better understanding of how we utilize these three sets. Ready to jump in?
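As a rough sketch of a three-way split, the example below applies two successive 15% hold-outs to hypothetical data (`X_demo` and `y_demo` are made up for illustration). The continuous target is rounded into coarse bins so that it can be passed to `stratify`. Two 15% hold-outs leave roughly 72% of the samples for training, 13% for validation and 15% for testing.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical data: 200 samples, 3 features, standardised-looking targets
y_demo = np.linspace(-3, 3, 200).reshape(-1, 1)
X_demo = np.random.default_rng(0).normal(size=(200, 3))

# Bin the continuous target so that it can be used for stratification
groups = np.round(y_demo / 3).ravel()

# First hold out 15% of all samples as a test set
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.15, random_state=42, stratify=groups)

# Then hold out 15% of the remainder as a validation set
groups_tr = np.round(y_tr / 3).ravel()
X_tr, X_va, y_tr, y_va = train_test_split(
    X_tr, y_tr, test_size=0.15, random_state=42, stratify=groups_tr)

print(len(y_tr), len(y_va), len(y_te))
```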

In [None]:
from sklearn.model_selection import train_test_split

# extract test set
groups = np.round(y/3)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.15, random_state=42, stratify=groups)

# extract validation set
groups_val = np.round(y_train/3)
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.15, random_state=42, stratify=groups_val)

# display info
print('Training samples: ', y_train.shape[0])
print('Validation samples: ', y_val.shape[0])
print('Test samples: ', y_test.shape[0])

In order to simplify the code and keep things organized, we've provided you with three handy functions below:
* `train` will perform one training epoch and return the current loss value
* `validate` will return the loss value without performing any training.
* `RMSE` will calculate the Root Mean Squared Error (RMSE) for the trained network and the specific dataset you've selected. An additional feature is that it considers the scaling of the target values, providing the result in weeks GA (Gestational Age).

Feel free to take a moment to familiarize yourself with these functions, then go ahead and run the code.

In [None]:
# performs one training epoch
# returns MSE loss
def train(net,X,y):
    # 1. Clear gradients
    optimizer.zero_grad()
    # 2. Forward pass
    prediction = net(X)
    # 3. Compute loss
    loss = loss_function(prediction, y)
    # 4. Calculate gradients
    loss.backward()
    # 5. Update network parameters
    optimizer.step()
    # return MSE loss
    return loss.data # we want only value, not gradients

# calculates and returns the loss without any training
def validate(net,X,y):
    with torch.no_grad(): # no need to calculate gradients
        # Forward pass
        prediction = net(X)
        # Calculate loss
        loss = loss_function(prediction, y)
        # return MSE loss
        return loss

# Calculates RMSE in weeks GA
def RMSE(net,X,y):
    loss = validate(net,X,y).numpy()
    # undo the target scaling: MSE in scaled units times the target variance
    rmse = np.sqrt(loss*target_scaler.var_[0])
    return rmse

You might recall that we previously used a linear regression model in Notebook 9.1 to predict brain volume from age. Now, we'll be taking things a step further. However, for our present dataset, we need to make some adjustments to the existing code. Let's go through these modifications one-by-one:

**Task 4.1:** Adjust the **architecture** of the network so that it can be used to predict age from the volumes of 86 structures.
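As a hint, the input size of a `Linear` layer has to match the number of columns in the feature matrix. The sketch below uses a hypothetical matrix with 5 features (not the tutorial data) and reads the input size from the data instead of hard-coding it.

```python
import torch
import torch.nn as nn

# Hypothetical feature matrix: 10 samples, 5 features
X_demo = torch.randn(10, 5)

# The layer's input size is read from the data rather than hard-coded
n_features = X_demo.shape[1]
layer = nn.Linear(n_features, 1)

out = layer(X_demo)
n_params = sum(p.numel() for p in layer.parameters())

print(out.shape)   # one prediction per sample
print(n_params)    # one weight per feature plus one bias
```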

**Task 4.2:** Don't be disheartened if the network does not train properly at first. An increasing training loss often indicates that the **learning rate** is too high, and testing smaller learning rates may solve the issue. Our aim is to select the highest learning rate that does not cause the training loss to increase.
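The effect of a learning rate that is too high can be seen on a toy problem. The sketch below is not the tutorial network: it minimises the simple loss f(w) = w² with plain gradient descent, and the function name `gradient_descent` is made up for illustration. With a large learning rate each step overshoots the minimum and the loss grows, while a smaller one lets the loss shrink.

```python
def gradient_descent(lr, steps=10, w=1.0):
    """Minimise the toy loss f(w) = w**2 and return the loss after each step."""
    losses = []
    for _ in range(steps):
        grad = 2 * w          # derivative of w**2
        w = w - lr * grad     # gradient descent update
        losses.append(w ** 2)
    return losses

too_high = gradient_descent(lr=1.1)   # each step overshoots the minimum
moderate = gradient_descent(lr=0.1)   # each step moves closer to the minimum

print('lr=1.1:', too_high[0], '->', too_high[-1])
print('lr=0.1:', moderate[0], '->', moderate[-1])
```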

**Task 4.3:** After tuning your learning rate, you might notice that the training loss continues to decrease quite significantly. This might mean that the network hasn't yet converged to an optimal solution. Try increasing the number of **epochs** to 1000 and see what happens. Can you figure out how many epochs were needed for the network to converge by looking at the MSE loss plot?

**Task 4.4:** The number of epochs we've set may seem a bit arbitrary, and it might leave you wondering if it's either too few (with the network not having converged) or too many (resulting in the network overfitting). You may also be curious about the role of the **validation set**. Well, this set is used to monitor the performance of the network during training. In this task, you'll implement a monitoring system during epochs using the validation set. Here's how you can do it:

- First, create a variable `val_losses` that will store the validation loss at each epoch. Initialise it before the `for` loop, just like `train_losses`.

- At each epoch, call the function `validate` to calculate the loss on the validation set `X_val`, `y_val`. The validation loss returned by `validate` should then be appended to `val_losses`.

- On the subplot `133`, plot the validation loss along with the training loss.

- If required, adjust the number of epochs to 10000 to identify when the validation loss begins to rise.
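The monitoring pattern described above can be sketched on synthetic data as follows. This is an illustration under assumptions, not the solution for the brain volume dataset: the data is randomly generated, and the names `model`, `loss_fn` and `opt` are hypothetical stand-ins chosen so that they do not clash with the variables used in this notebook.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical data: 40 training and 20 validation samples, 3 features
X_tr, X_va = torch.randn(40, 3), torch.randn(20, 3)
true_w = torch.tensor([[1.0], [-2.0], [0.5]])
y_tr = X_tr @ true_w + 0.1 * torch.randn(40, 1)
y_va = X_va @ true_w + 0.1 * torch.randn(20, 1)

model = nn.Linear(3, 1)
loss_fn = nn.MSELoss()
opt = torch.optim.SGD(model.parameters(), lr=0.05)

train_losses, val_losses = [], []
for epoch in range(100):
    # one training step on the full training set
    opt.zero_grad()
    loss = loss_fn(model(X_tr), y_tr)
    loss.backward()
    opt.step()
    train_losses.append(loss.item())

    # validation loss: forward pass only, no parameter update
    with torch.no_grad():
        val_losses.append(loss_fn(model(X_va), y_va).item())

print(len(train_losses), len(val_losses))
```

Both lists have one entry per epoch, so they can be drawn on the same axes with two calls to `plt.plot`.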

**Task 4.5:** Naturally, we would like to select the model that performs best on the validation set as our final trained model. This means we continue training as long as the validation loss is decreasing and stop as soon as it starts increasing, to prevent overfitting. This strategy is known as **early stopping** and serves as a form of regularisation. To implement early stopping, we need to `break` the `for` loop as soon as the validation loss starts rising. To do this, add the following code at the end of the `for` loop:

    if i > 1:
        if val_losses[i-1] > val_losses[i-2]:
            print('Final iteration: ', i)
            break
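To see when this check fires, the sketch below replays it on a hypothetical list of validation losses. The helper `stopping_iteration` is made up for illustration, and it assumes that the validation loss for epoch `i` is appended to `val_losses` before the check runs. Note that the check compares the two losses before the current one, so the loop breaks shortly after the minimum rather than exactly at it.

```python
def stopping_iteration(losses_per_epoch):
    """Return the loop index at which the early-stopping check breaks, or None."""
    val_losses = []
    for i, loss in enumerate(losses_per_epoch):
        val_losses.append(loss)  # assumed to happen earlier in the loop body
        if i > 1:
            if val_losses[i-1] > val_losses[i-2]:
                return i
    return None

# Hypothetical validation losses: the minimum is at index 2
demo_losses = [1.00, 0.80, 0.70, 0.75, 0.90]
stop = stopping_iteration(demo_losses)

print('Loop breaks at i =', stop)
print('Lowest validation loss at epoch', demo_losses.index(min(demo_losses)))
```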


**NOTE:** Don't be surprised if not all runs perform equally well. Because we use gradient descent and the weights of the network are initialised to random values, the fit might not always converge to an optimal solution. However, some runs will yield a good solution, similar to those obtained with the penalised regression techniques that we discussed earlier in this module. Happy coding!
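If you would like runs to be repeatable while you experiment, one option is to fix PyTorch's random seed before creating the network. The sketch below is an optional aside using a hypothetical helper `make_layer`; it shows that the same seed gives the same initial weights, while a different seed gives different ones.

```python
import torch
import torch.nn as nn

def make_layer(seed):
    """Create a small linear layer after fixing PyTorch's random seed."""
    torch.manual_seed(seed)
    return nn.Linear(3, 1)

a = make_layer(0)
b = make_layer(0)   # same seed as a
c = make_layer(1)   # different seed

print(torch.equal(a.weight, b.weight))  # compares a and b
print(torch.equal(a.weight, c.weight))  # compares a and c
```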

In [None]:
import torch.nn as nn
class ANRegressor(nn.Module):
    def __init__(self):
        super(ANRegressor, self).__init__()
        self.layer = nn.Linear(1, 1)

    def forward(self, x):
        x = self.layer(x)
        return x

# create network
net = ANRegressor()
print(net)

# mean squared error loss
loss_function = nn.MSELoss()

# stochastic gradient descent optimiser
optimizer = torch.optim.SGD(net.parameters(), lr=0.2)

# train
train_losses=[]
for i in range(10):
    loss = train(net, X_train, y_train)
    train_losses.append(loss) # we save losses to display them at the end

# calculate training, validation and test performance
rmse_train = RMSE(net,X_train,y_train)
print('Training RMSE: ', rmse_train)
rmse_val = RMSE(net,X_val,y_val)
print('Validation RMSE: ', rmse_val)
rmse_test = RMSE(net,X_test,y_test)
print('Test RMSE: ', rmse_test)

# display results
plt.figure(figsize=(14,4))

# plot training set predictions
plt.subplot(131)
PlotTargets(net(X_train).data,y_train)
plt.title('Training set')

# plot validation and test set predictions
plt.subplot(132)
PlotTargets(net(X_val).data, y_val, label = 'val targets')
PlotTargets(net(X_test).data,y_test, label = 'test targets', plot_line=False)
plt.title('Validation and Test set')

# plot training and validation loss
plt.subplot(133)
plt.plot(train_losses,label='training loss')
plt.title('MSE loss')
plt.legend()

## Exercise 5 (optional)

Do this exercise if you finished early and have time to play with a more complex neural network. We will now tune a multi-layer perceptron to predict age based on the volumes of six distinct brain regions.

First, we will load the dataset with 6 brain structures. Please note that executing the code below will replace the previous dataset.

**Task 5.1:** There are a few missing parts in the code that need to be filled in to convert the feature matrix and target vector from numpy arrays to a format that's suitable for PyTorch training. Can you spot and fill them? Your journey into the deeper realms of neural networks begins now!

In [None]:
# only do this if you work on Google Colab
# run the cell
# then upload file 'GA-brain-volumes-6-features.csv'

from google.colab import files
files.upload()

In [None]:
X,y = CreateFeaturesTargets('GA-brain-volumes-6-features.csv')

# perform scaling of the target values to support better convergence
target_scaler = StandardScaler()
y = target_scaler.fit_transform(y)

print('Number of samples is', X.shape[0])
print('Number of features is', X.shape[1])

# convert to tensors
X = None
y = None
print('X shape: ', X.shape)
print('y shape: ', y.shape)
print('X type: ', X.type())
print('y type: ', y.type())

**Task 5.2:** Create training, validation and test sets in the same way as in Exercise 4.

In [None]:
# extract test set


# extract validation set


# display info
print('Training samples: ', y_train.shape[0])
print('Validation samples: ', y_val.shape[0])
print('Test samples: ', y_test.shape[0])

**Task 5.3:** Time for some fun with training and performance evaluation! Use the same code you developed in Exercise 4. All you need to change is the architecture, adapting it to take in 6 input features. Everything else should work as is. Fiddle around with the learning rate to achieve the best performance.

**Task 5.4:** Implement a multi-layer perceptron architecture with the following specs:

* The first `Linear` layer should have 6 inputs and 6 outputs
* Apply `ReLU` activation
* Follow this up with a second `Linear` layer, with 6 inputs and 1 output.

Does this non-linear network give you better results? Dive in and discover for yourself! Good luck!
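To see why the `ReLU` activation matters, the sketch below uses hypothetical layer sizes (3 inputs, 4 hidden units) rather than the architecture from Task 5.4. Without an activation, two `Linear` layers applied one after the other are equivalent to a single linear map, so stacking them adds no modelling power. Inserting `ReLU` between them breaks this equivalence.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(8, 3)            # hypothetical inputs: 8 samples, 3 features

first = nn.Linear(3, 4)
second = nn.Linear(4, 1)

with torch.no_grad():
    # Two linear layers applied back to back...
    stacked = second(first(x))

    # ...equal one linear map with combined weights and bias
    W = second.weight @ first.weight                  # shape (1, 3)
    b = second.weight @ first.bias + second.bias      # shape (1,)
    collapsed = x @ W.T + b

    # With ReLU in between, the output is computed differently
    with_relu = second(torch.relu(first(x)))

print(torch.allclose(stacked, collapsed, atol=1e-5))    # stacked vs single map
print(torch.allclose(with_relu, collapsed, atol=1e-5))  # ReLU version vs single map
```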