# Graded Lab 02

<a target="_blank" href="https://colab.research.google.com/github/andrew-nash/CS6421-labs-2026/blob/main/Lab02.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>


This lab will be the first graded assessment of the course. It focuses on applying the principles of modularity, and implementing the mathematics that you have seen in class to build simple neural networks in PyTorch from scratch.

Helpful material worth referencing: https://docs.pytorch.org/tutorials/beginner/nn_tutorial.html

**IMPORTANT NOTICE** Where you are asked to define a certain module, class or function, do not change the names of the class or its functions. Your assignment will be auto-graded, and this requires the names to match the expected names exactly.

If you have questions during the week, you may contact me at a.nash@cs.ucc.ie

In [1]:
import torch
import numpy as np
import pandas as pd
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets
from torchvision.transforms import ToTensor

## Data (Already Completed, No Marks)

We will re-use the data loading from the previous lab (modified so that it returns the features as Tensors rather than np.arrays). This will not require modification by you, and is not a graded component of the lab. However, you may alter this code if you wish.

The task here is to predict the probability of a passenger surviving the Titanic sinking, given a set of features consisting of

1. Gender
2. Where they embarked
3. Their Ticket Class
4. Their Age
5. The price of their ticket

In [2]:
!wget https://github.com/andrew-nash/CS6421-labs-2026/raw/refs/heads/main/titanic_test.csv
!wget https://github.com/andrew-nash/CS6421-labs-2026/raw/refs/heads/main/titanic_train.csv

--2026-02-09 21:10:34--  https://github.com/andrew-nash/CS6421-labs-2026/raw/refs/heads/main/titanic_test.csv
Resolving github.com (github.com)... 20.205.243.166
Connecting to github.com (github.com)|20.205.243.166|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://raw.githubusercontent.com/andrew-nash/CS6421-labs-2026/refs/heads/main/titanic_test.csv [following]
--2026-02-09 21:10:35--  https://raw.githubusercontent.com/andrew-nash/CS6421-labs-2026/refs/heads/main/titanic_test.csv
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 28629 (28K) [text/plain]
Saving to: ‘titanic_test.csv’


2026-02-09 21:10:35 (138 MB/s) - ‘titanic_test.csv’ saved [28629/28629]

--2026-02-09 21:10:35--  https://github.com/andrew-nash/CS6421-labs

In [3]:
class TitanicDataset (torch.utils.data.Dataset):
    # the Train argument defines whether the dataset is being queried for train or test data
    # In practice, you would likely be handling separate datasets for each
    def __init__(self, file_name, Train=True):
        self.dataframe = pd.read_csv(file_name)
        #print(self.dataframe.head())
        self.dataframe = self.dataframe.drop(['PassengerId', 'Name', 'Ticket', 'Cabin'], axis=1)
        self.dataframe = self.dataframe.drop(['SibSp', 'Parch'], axis=1)

        self.dataframe = self.dataframe.dropna(subset=['Age', 'Embarked', 'Sex', 'Pclass', 'Fare'])

        # Instead of using strings ("Male" and "Female"), we need to convert these to numerical values -- in this
        # case 1 for male, 0 for female
        self.dataframe['Male'] = np.where(self.dataframe['Sex'] == 'male', 1, 0)

        # Manual one-hot encoding for Embarked, using np.where

        # Embarked locations are: C = Cherbourg, Q = Queenstown, S = Southampton
        # Embarked_C = 1, if embarked from Cherbourg, 0 otherwise
        # Embarked_S = 1, if embarked from Southampton, 0 otherwise
        # Embarked_C = 0 and Embarked_S = 0, if embarked from Queenstown (now Cobh ...)
        self.dataframe['Embarked_C'] = np.where(self.dataframe['Embarked'] == 'C', 1, 0)
        self.dataframe['Embarked_S'] = np.where(self.dataframe['Embarked'] == 'S', 1, 0)

        # Remove original Sex and Embarked columns
        self.dataframe = self.dataframe.drop(['Sex', 'Embarked'], axis=1)

        # We can achieve the same one-hot encoding for Pclass using Pandas get_dummies function, instead of the
        # manual np.where approach above
        self.dataframe[['Pclass_1', 'Pclass_2']] = pd.get_dummies(self.dataframe['Pclass'], prefix='Pclass').iloc[:, :2].astype(int)
        self.dataframe = self.dataframe.drop(['Pclass'], axis=1)


        # Normalisation
        self.dataframe['Age_N'] = self.dataframe['Age']/self.dataframe['Age'].max()

        # An example of a log transform
        self.dataframe['log_Fare'] = np.log10(self.dataframe['Fare'] + 1)
        self.dataframe = self.dataframe.drop(['Age', 'Fare'], axis=1)

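        # reset_index returns a new DataFrame, so the result must be assigned back;
        # drop=True avoids adding the old index as an extra feature column
        self.dataframe = self.dataframe.reset_index(drop=True)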
        self.Train = Train

    def __len__(self):
        return self.dataframe.shape[0]

    def __getitem__(self, idx):
        if torch.is_tensor(idx):
            idx = idx.tolist()
        if self.Train :
            survived = self.dataframe['Survived']
            survived = torch.tensor(np.array(survived)[idx], dtype=torch.float)

        features = pd.DataFrame(columns=('Male',  'Embarked_C', 'Embarked_S', 'Pclass_1', 'Pclass_2', 'Age_N', 'log_Fare'))

        # Bear in mind that the test dataset will not have a Survived column
        if self.Train:
            features = self.dataframe.iloc[idx,1:]
        else:
            features = self.dataframe.iloc[idx,:]

        features = torch.tensor(features.values, dtype=torch.float)
        if self.Train:
            sample = (features, survived)
        else:
            sample = features
        return sample

In [4]:
training_data = TitanicDataset('titanic_train.csv')
testing_data = TitanicDataset('titanic_test.csv', Train=False)

sub_train_dataset, val_dataset = torch.utils.data.random_split(training_data, [0.8,0.2])
train_dataloader = DataLoader(sub_train_dataset, batch_size=64)
val_dataloader = DataLoader(val_dataset, batch_size=64)
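
To see what the split and the loaders produce without needing the CSV files, the following is a minimal sketch on synthetic data. The `TensorDataset` of random values is a hypothetical stand-in for `TitanicDataset`, and the sizes (100 samples, 7 features) are assumptions chosen only for illustration.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, random_split

# Synthetic stand-in for the Titanic data: 100 samples, 7 features, binary labels
features = torch.randn(100, 7)
labels = torch.randint(0, 2, (100,)).float()
toy_dataset = TensorDataset(features, labels)

# Split into 80 training and 20 validation samples
train_part, val_part = random_split(toy_dataset, [80, 20])

# Each batch is a (features, labels) pair; the last batch may be smaller
loader = DataLoader(train_part, batch_size=64)
batch_shapes = [(xb.shape, yb.shape) for xb, yb in loader]
print(batch_shapes)
```

With 80 training samples and a batch size of 64, the loader yields one full batch and one batch of the remaining 16 samples.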

# Modularity

We are now going to create the various modules needed to build a basic feedforward neural network.

In [5]:
from torch import nn
import torch

## Defining a Simple Feedforward Layer

The following is a sample module (subclassing PyTorch's nn.Module) that implements a DNN layer with a single weight matrix and no bias vector.

nn.Parameter is used in Torch to define any parameter (typically, but not exclusively, weights and biases) that should be updated by backpropagation.
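
As a small illustration of what wrapping a tensor in nn.Parameter changes, the sketch below compares a parameter with a plain tensor attribute. `TinyModule` and its attribute names are hypothetical and exist only for this example.

```python
import torch
from torch import nn

class TinyModule(nn.Module):  # hypothetical module, for illustration only
    def __init__(self):
        super().__init__()
        # Wrapped in nn.Parameter: registered with the module and trainable
        self.trainable = nn.Parameter(torch.randn(3, 2))
        # Plain tensor attribute: not registered, so parameters() never returns it
        self.constant = torch.randn(3, 2)

tiny = TinyModule()
param_names = [name for name, _ in tiny.named_parameters()]
print(param_names)
print(tiny.trainable.requires_grad)
```

Only `trainable` appears among the module's parameters, which is why a training loop that iterates over `parameters()` would update it and leave `constant` untouched.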



In [6]:
class SampleFeedForwardLayer(nn.Module):
    def __init__(self, input_size, output_size):
        super().__init__()

        # torch.randn (https://docs.pytorch.org/docs/stable/generated/torch.randn.html)
        # creates a Tensor with random values drawn from a standard normal
        # distribution
        # this will be a matrix of dimension: input_size x output_size
        initial_weights_values = torch.randn(input_size, output_size)

        # create a trainable weights matrix from these initial values
        self.weights = nn.Parameter(initial_weights_values)

    def forward(self, x):
        '''
        Compare this to the lecture notes - remember that @ corresponds to matrix
        (or tensor) multiplication. So this completes the forward pass of this layer.

        Observe that we do x@self.weights, and not self.weights@x.
        The reason lies in the fact that a 1-D vector/flat Tensor (of shape (input_size))
        in PyTorch corresponds to a linear-algebraic ROW VECTOR.

        If you are unsure of this, consult the lecture notes on Mathematics, and the PDF
        document provided on GitHub (https://github.com/andrew-nash/CS6421-labs-2026/blob/main/Lab02-RowVsColVectors.pdf)
        '''
        return x@self.weights
        # equivalently, torch.matmul(x, self.weights)

Verifying that this works (if you are working on the Jupyter server, or your own GPU-equipped machine or Colab instance, you may change 'cpu' to 'cuda'):

In [7]:
tL = SampleFeedForwardLayer(3, 2)
tL.to('cpu')
test_xdata = torch.tensor([1,2,3], dtype=torch.float)
tL(test_xdata)

tensor([ 1.6878, -0.1121], grad_fn=<SqueezeBackward4>)
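
The row-vector convention also explains why the same layer works unchanged on a whole batch. The following is a minimal sketch of the shapes involved, using a plain random matrix in place of the layer's weights (the sizes are arbitrary assumptions).

```python
import torch

weights = torch.randn(3, 2)                  # (input_size, output_size)

single = torch.tensor([1.0, 2.0, 3.0])       # shape (3,): treated as one row vector
batch = torch.randn(5, 3)                    # shape (5, 3): five row vectors stacked

single_out = single @ weights                # shape (2,)
batch_out = batch @ weights                  # shape (5, 2)
print(single_out.shape, batch_out.shape)

# Each row of the batched result matches the result for that row on its own
row_matches = torch.allclose(batch_out[0], batch[0] @ weights, atol=1e-6)
print(row_matches)
```

Writing the product as `x @ weights` means each sample stays a row, so stacking samples into a matrix needs no change to the layer.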

### Graded Task 1

Define a FeedForwardHeLayer that

1) Includes and uses a bias vector, as well as a weights matrix. The biases should be initialised as 0s. Hint: https://docs.pytorch.org/docs/stable/generated/torch.zeros.html
2) Improves the weight initialisation to use He initialisation (you are **not** allowed to use Torch's nn.init; you must implement this from scratch)


#### He initialisation

The standard normal initialisation given above is based on setting each weight as a sample from $N(\mu=0,\sigma=1)$. He initialisation sets weights based on samples from $N\left(\mu=0, \sigma=\sqrt{\frac{2}{n}}\right)$, where $n$ is the number of inputs to the layer.

Hint: is there a mathematical operation you can do to samples from $N(\mu=0,\sigma=1)$, to transform them into samples from $N\left(\mu=0, \sigma=\sqrt{\frac{2}{n}}\right)$?


In [44]:
class FeedForwardHeLayer(nn.Module):
    def __init__(self, input_size, output_size):
        super().__init__()

        # He initialisation: scale standard normal samples by sqrt(2 / n),
        # where n is the number of inputs to the layer
        he_std = torch.sqrt(torch.tensor(2.0/input_size))
        initial_weights_values = torch.randn(input_size, output_size) * he_std

        # create a trainable weights matrix from these initial values
        self.weights = nn.Parameter(initial_weights_values)

        initial_bias_values = torch.zeros(output_size)
        self.biases = nn.Parameter(initial_bias_values) # add a bias vector

    def forward(self, x):
        return x @ self.weights + self.biases #Solve
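
One way to sanity-check the scaling used above is to measure the standard deviation of the scaled samples and compare it with $\sqrt{2/n}$. This is a rough empirical sketch; the layer sizes and the seed are arbitrary assumptions, chosen large enough for the estimate to be stable.

```python
import math
import torch

torch.manual_seed(0)                       # arbitrary seed, for repeatability only
input_size, output_size = 256, 512         # assumed sizes for the check

standard = torch.randn(input_size, output_size)   # samples from N(0, 1)
target_std = math.sqrt(2.0 / input_size)
scaled = standard * target_std                     # samples from N(0, sqrt(2/n))

measured_std = scaled.std().item()
measured_mean = scaled.mean().item()
print(f"target std:    {target_std:.4f}")
print(f"measured std:  {measured_std:.4f}")
print(f"measured mean: {measured_mean:.4f}")
```

Multiplying a standard normal sample by a constant scales its standard deviation by that constant and leaves the mean at 0, which is what the measurement should reflect.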


## Defining an Activation Function

The ReLU activation is $\mathrm{ReLU}(x) = \max(0, x)$.

Here, I give you the complete implementation of a ReLU activation function as a Module:

In [46]:
class ReluAct(nn.Module):
    def __init__(self, input_size, output_size):
        super().__init__()
        # No nn.Parameters are needed, there are no trainable
        # weights or biases associated with the activation
        # function


    def forward(self, x):
        # The most elegant approach is the following --
        # Clamp the lowest possible value to be 0 with:
        # return torch.clamp(x, min=0)
        # A different method, which is more complex, but also more flexible is:

        # https://docs.pytorch.org/docs/stable/generated/torch.where.html
        # Wherever x>0, it will keep the value from x. Everywhere else, it
        # will replace the value with 0s
        return torch.where((x > 0), x, 0)
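
To illustrate the two approaches mentioned in the comments, the sketch below checks that `torch.where` and `torch.clamp` agree, and shows the kind of flexibility `torch.where` offers by switching the negative branch to a small slope (a leaky ReLU). The slope of 0.01 is an assumed value for the example.

```python
import torch
import torch.nn.functional as F

x = torch.tensor([-2.0, -0.5, 0.0, 0.5, 2.0])

# Two equivalent ways of writing ReLU
via_where = torch.where(x > 0, x, torch.zeros_like(x))
via_clamp = torch.clamp(x, min=0)

# torch.where lets the "else" branch be any expression, e.g. a leaky ReLU
slope = 0.01  # assumed negative slope, for illustration
leaky_via_where = torch.where(x > 0, x, slope * x)
leaky_builtin = F.leaky_relu(x, negative_slope=slope)

print(via_where)
print(leaky_via_where)
```

The clamp version is shorter, but it can only express a hard floor, whereas the `torch.where` version adapts to other piecewise activations.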



### Graded Task 2

Implement the Sigmoid activation: $\displaystyle \frac{1}{1+\exp\left(-x\right)}$

Hint: https://docs.pytorch.org/docs/stable/generated/torch.exp.html

In [48]:
class SigmoidAct(nn.Module):
    def __init__(self, input_size, output_size):
        super().__init__()
        # No nn.Parameters are needed, there are no trainable
        # weights or biases associated with the activation
        # function


    def forward(self, x):
        # implement a sigmoid activation
        return 1 / (1 + torch.exp(-x))
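
A quick way to check a hand-written sigmoid is to compare it with PyTorch's built-in `torch.sigmoid` on a few values, including extreme ones. This is only a verification sketch; the test values are arbitrary.

```python
import torch

x = torch.tensor([-100.0, -1.0, 0.0, 1.0, 100.0])

manual = 1 / (1 + torch.exp(-x))     # the formula from the task
builtin = torch.sigmoid(x)           # PyTorch's own implementation

print(manual)
print(builtin)

agrees = torch.allclose(manual, builtin, atol=1e-6)
all_finite = torch.isfinite(manual).all().item()
```

For very negative inputs `torch.exp(-x)` overflows to infinity in float32, but `1 / (1 + inf)` still evaluates to 0, so the forward pass stays finite.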


## Defining A Loss Function

We can define a loss function using nn.Module similarly to activation functions.

Here, you are given a sample implementation of the Mean Absolute Error, defined as
\begin{equation}
l\left(\widehat{y}, y\right)=\frac{1}{n}\sum_{i=1}^{n} \left|\widehat{y}_i - y_i\right|
\end{equation}

In [49]:
class SampleMAELoss(nn.Module):
    def __init__(self):
        super().__init__()

    def forward(self, pred, y):
        return torch.mean(torch.abs(pred-y))

### Graded Task 3

Implement the Mean Squared Error loss, defined as
\begin{equation}
l\left(\widehat{y}, y\right)=\frac{1}{n}\sum_{i=1}^{n} \left(\widehat{y}_i - y_i\right)^2
\end{equation}

In [50]:
class SampleMSELoss(nn.Module):
    def __init__(self):
        super().__init__()

    def forward(self, pred, y):
        return torch.mean((pred - y)**2)  #implement
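
One thing worth checking when using these losses on batches is that `pred` and `y` have the same shape. If the predictions have shape (n, 1) and the labels have shape (n), the subtraction broadcasts to an (n, n) matrix and the mean is taken over all pairs rather than over matching elements. The sketch below uses made-up values to show the difference.

```python
import torch

pred = torch.tensor([[0.9], [0.2], [0.6]])   # shape (3, 1), like a batched model output
y = torch.tensor([1.0, 0.0, 1.0])            # shape (3,), like a flat vector of labels

diff_broadcast = pred - y                    # silently broadcasts to shape (3, 3)
diff_aligned = pred.squeeze(-1) - y          # shape (3,): one difference per sample

mse_broadcast = torch.mean(diff_broadcast ** 2).item()
mse_aligned = torch.mean(diff_aligned ** 2).item()

print(diff_broadcast.shape, diff_aligned.shape)
print(mse_broadcast, mse_aligned)
```

Here the aligned MSE is 0.07, while the broadcast version averages nine pairwise differences and gives a noticeably larger value, with no error raised.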

## Putting All this Together


We are going to combine all of our custom modules into a single overarching module that will define our model.

In [51]:
class SimpleModelFromSamples(nn.Module):
    def __init__(self, input_size, output_size):
        super().__init__()

        self.layer1 = SampleFeedForwardLayer(input_size, 10)
        # because we are using modules to define our activation functions, we must
        # store them within the model
        self.act1 = ReluAct(input_size, 10)

        self.layer2 = SampleFeedForwardLayer(10, 10)
        self.act2 = ReluAct(10, 10)

        self.layer3 = SampleFeedForwardLayer(10, output_size)
        self.act3 = ReluAct(10, output_size)

    def forward(self, x):
        x = self.layer1(x)
        x = self.act1(x)
        x = self.layer2(x)
        x = self.act2(x)
        x = self.layer3(x)
        x = self.act3(x)
        return x

In [54]:
sample_model = SimpleModelFromSamples(7, 1)

In [53]:
sample_model(sub_train_dataset[0][0])


tensor([0.], grad_fn=<WhereBackward0>)

### Graded Task 4

Implement a model including the FeedForwardHeLayer and Sigmoid activation modules that you defined earlier in the lab. It should have the following exact structure:

1. A FeedForwardHeLayer with 16 output neurons, followed by a ReLU activation
2. A FeedForwardHeLayer with 32 output neurons, followed by a ReLU activation
3. A FeedForwardHeLayer with 64 output neurons, followed by a ReLU activation
4. A SampleFeedForwardLayer with 16 output neurons, followed by a Sigmoid activation
5. A SampleFeedForwardLayer with 1 output neuron, followed by a Sigmoid activation

In [55]:
class CustomModel(nn.Module):
    def __init__(self, input_size, output_size):
        super().__init__()

        #'''DEFINE YOUR LAYERS AND ACTIVATION FUNCTIONS'''

        #A FeedForwardHeLayer with 16 output neurons, followed by a ReLU activation
        self.HeLayer_1 = FeedForwardHeLayer(input_size, 16)
        self.ReLU_activation_1 = ReluAct(input_size, 16)

        #A FeedForwardHeLayer with 32 output neurons, followed by a ReLU activation
        self.HeLayer_2 = FeedForwardHeLayer(16, 32)
        self.ReLU_activation_2 = ReluAct(16, 32)

        #A FeedForwardHeLayer with 64 output neurons, followed by a ReLU activation
        self.HeLayer_3 = FeedForwardHeLayer(32, 64)
        self.ReLU_activation_3 = ReluAct(32, 64)

        #A SampleFeedForwardLayer with 16 output neurons, followed by a Sigmoid activation
        self.Layer_4 = SampleFeedForwardLayer(64, 16)
        self.Sigmoid_activation_4 = SigmoidAct(64, 16)

        #A SampleFeedForwardLayer with 1 output neuron, followed by a Sigmoid activation
        self.Layer_5 = SampleFeedForwardLayer(16, output_size)
        self.Sigmoid_activation_5 = SigmoidAct(16, output_size)

    def forward(self, x):
        #'''DEFINE THE FEEDFORWARD PROCESS'''
        x = self.HeLayer_1(x)
        x = self.ReLU_activation_1(x)

        x = self.HeLayer_2(x)
        x = self.ReLU_activation_2(x)

        x = self.HeLayer_3(x)
        x = self.ReLU_activation_3(x)

        x = self.Layer_4(x)
        x = self.Sigmoid_activation_4(x)

        x = self.Layer_5(x)
        x = self.Sigmoid_activation_5(x)

        return x
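
As a rough cross-check of the architecture, the number of trainable parameters can be worked out by hand from the layer sizes. The sketch below does this arithmetic in plain Python, assuming 7 input features and 1 output as in the Titanic data used in this lab; `layer_specs` and `count_parameters` are hypothetical names introduced for the example.

```python
# (layer type, inputs, outputs, has_bias) following the structure listed in Graded Task 4.
# Assumes 7 input features and 1 output, matching the Titanic data used in this lab.
layer_specs = [
    ("FeedForwardHeLayer", 7, 16, True),
    ("FeedForwardHeLayer", 16, 32, True),
    ("FeedForwardHeLayer", 32, 64, True),
    ("SampleFeedForwardLayer", 64, 16, False),
    ("SampleFeedForwardLayer", 16, 1, False),
]

def count_parameters(n_in, n_out, has_bias):
    # one weight per (input, output) pair, plus one bias per output if present
    return n_in * n_out + (n_out if has_bias else 0)

per_layer = [count_parameters(n_in, n_out, b) for _, n_in, n_out, b in layer_specs]
total_parameters = sum(per_layer)
print(per_layer)          # [128, 544, 2112, 1024, 16]
print(total_parameters)   # 3824
```

The same total can be compared against `sum(p.numel() for p in model.parameters())` on an instantiated model; a mismatch usually points to a missing bias or a wrong layer size.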

## Define the Backpropagation

Unlike the last lab, we will apply the back-propagation weight updates manually, rather than using an optimizer.

In [56]:
def fit_sample_model(model_to_train, loss_func, epochs, lr=0.01, batch_size=64):
    N = len(sub_train_dataset)
    for epoch in range(epochs):
        correct = 0
        train_loss = 0
        print(f"Epoch {epoch+1}\n-------------------------------")
        for i in range((N - 1) // batch_size + 1):
            # Get the start and end indices of the batch
            start_i = i * batch_size
            end_i = start_i + batch_size

            # Extract x and y data
            x_train, y_train = sub_train_dataset[start_i:end_i]
            pred = model_to_train(x_train)
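            # pred has shape (batch_size, 1) but y_train has shape (batch_size);
            # drop the trailing dimension so the loss compares element-wise
            # instead of broadcasting to a (batch_size, batch_size) matrix
            loss = loss_func(pred.squeeze(-1), y_train)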



            loss.backward()


            # When we update the parameters, we must tell PyTorch
            # to pause auto-computing gradients of
            # our operations
            with torch.no_grad():
                train_loss += loss.item()

                # The most important part of this to pay attention to is the .T
                # This is because the shape of torch.round(pred) is (batch_size, 1), a matrix
                # with a single column, whereas the shape of y_train is (batch_size) --
                # a flat vector. We could use reshape to convert torch.round(pred) to have
                # shape (batch_size). However, .T (transpose) gives shape (1, batch_size),
                # which compares element-wise with y_train in the same way
                correct += (torch.round(pred).T==y_train).type(torch.float).sum().item()
                for p in model_to_train.parameters():
                    # here we manually define our weight and bias update
                    # rather than using an optimizer
                    p -= p.grad * lr
                # reset the gradients to 0
                # otherwise, in the next pass, the new gradients
                # will be added to the previous ones
                model_to_train.zero_grad()
        correct /= N
        print(f"Train Error: \n Accuracy: {(100*correct):>0.1f}%, Avg loss: {train_loss:>8f} \n")
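
The core of the loop above (backward pass, update under `torch.no_grad()`, then resetting the gradients) can be seen in isolation on a toy problem. The following is a minimal sketch that fits a single weight to data generated from $y = 3x$; the data, learning rate and number of steps are assumptions chosen for the example.

```python
import torch

# Toy data: y = 3x, so the ideal weight is 3 (hypothetical example)
x = torch.linspace(-1, 1, 20).reshape(-1, 1)
y = 3 * x

w = torch.zeros(1, 1, requires_grad=True)   # the single trainable weight
lr = 0.1
losses = []

for step in range(200):
    loss = torch.mean((x @ w - y) ** 2)     # MSE on the whole toy dataset
    loss.backward()                         # autograd fills w.grad
    with torch.no_grad():                   # the update itself must not be tracked
        w -= lr * w.grad
    w.grad.zero_()                          # otherwise gradients accumulate across steps
    losses.append(loss.item())

print(w.item(), losses[0], losses[-1])
```

The weight moves towards 3 and the loss shrinks; removing the `zero_()` call would make each step use the sum of all previous gradients instead of the current one.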



In [57]:
sample_model = SimpleModelFromSamples(7, 1)
loss_func = SampleMAELoss()
epochs = 10
lr = 0.001
batch_size=64
fit_sample_model(sample_model, loss_func, epochs, lr, batch_size)

Epoch 1
-------------------------------
Train Error: 
 Accuracy: 4.2%, Avg loss: 95.643007 

Epoch 2
-------------------------------
Train Error: 
 Accuracy: 5.3%, Avg loss: 79.981307 

Epoch 3
-------------------------------
Train Error: 
 Accuracy: 6.1%, Avg loss: 65.720197 

Epoch 4
-------------------------------
Train Error: 
 Accuracy: 6.5%, Avg loss: 52.763155 

Epoch 5
-------------------------------
Train Error: 
 Accuracy: 6.5%, Avg loss: 41.438906 

Epoch 6
-------------------------------
Train Error: 
 Accuracy: 5.6%, Avg loss: 31.798775 

Epoch 7
-------------------------------
Train Error: 
 Accuracy: 6.0%, Avg loss: 23.638237 

Epoch 8
-------------------------------
Train Error: 
 Accuracy: 10.9%, Avg loss: 17.222318 

Epoch 9
-------------------------------
Train Error: 
 Accuracy: 14.4%, Avg loss: 13.510290 

Epoch 10
-------------------------------
Train Error: 
 Accuracy: 28.8%, Avg loss: 11.566397 



### Graded Task 5

Create a new training function, fit_your_model(), which trains the CustomModel you defined in Task 4.

You should also compute and output the validation loss and accuracy on val_dataset in this function.

You should also use your custom MSE loss for this training.

In [60]:
def fit_your_model(model, loss_func, epochs, lr, batch_size):
  train_N = len(sub_train_dataset)
  val_N = len(val_dataset)

  for epoch in range(epochs):
        model.train()
        correct_train = 0
        train_loss = 0
        print(f"Epoch {epoch+1}\n-------------------------------")

        #Training loop
        for i in range((train_N - 1) // batch_size + 1):
            start_i = i * batch_size
            end_i = start_i + batch_size

            x_train, y_train = sub_train_dataset[start_i:end_i]
            pred = model(x_train)
            # align shapes: pred is (batch_size, 1), y_train is (batch_size)
            loss = loss_func(pred.squeeze(-1), y_train)
            loss.backward()

            with torch.no_grad():
                train_loss += loss.item()
                correct_train += (torch.round(pred).T == y_train).type(torch.float).sum().item()

                for p in model.parameters():
                    p -= p.grad * lr
                model.zero_grad()

        train_a = correct_train/train_N

        #Validation loop
        model.eval()
        val_loss = 0
        correct_val = 0

        with torch.no_grad():
          for i in range((val_N - 1) // batch_size + 1):
            start_i = i * batch_size
            end_i = start_i + batch_size

            x_val, y_val = val_dataset[start_i:end_i]
            val_predicted = model(x_val)

            # align shapes: val_predicted is (batch_size, 1), y_val is (batch_size)
            val_loss += loss_func(val_predicted.squeeze(-1), y_val).item()
            correct_val += (torch.round(val_predicted).T == y_val).type(torch.float).sum().item()
        val_a = correct_val / val_N

        print(f"Train -> Accuracy: {100*train_a:.1f}%, Avg loss: {train_loss:.6f}")
        print(f"Val -> Accuracy: {100*val_a:.1f}%, Avg loss: {val_loss:.6f} \n")


  #pass
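
Both training functions iterate over `(N - 1) // batch_size + 1` batches. The short sketch below checks, in plain Python, that this expression is the ceiling of `N / batch_size`, and that slicing past the end of a sequence simply yields a shorter final batch. The helper name `n_batches` is hypothetical.

```python
import math

def n_batches(n_samples, batch_size):
    # integer form of ceil(n_samples / batch_size), as used in the training loops
    return (n_samples - 1) // batch_size + 1

# Compare against math.ceil for a range of dataset sizes
matches_ceil = all(
    n_batches(n, 64) == math.ceil(n / 64) for n in range(1, 1000)
)

# Slicing beyond the end does not raise; it returns the remaining items
samples = list(range(10))
last_batch = samples[8:12]

print(n_batches(64, 64), n_batches(65, 64), last_batch)
```

This is why the loops need no special case for a dataset whose size is not a multiple of the batch size.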

In [61]:
model = CustomModel(7,1)
loss_func = SampleMSELoss()
epochs = 10
lr = 0.001
batch_size=64
fit_your_model(model, loss_func, epochs, lr, batch_size)

Epoch 1
-------------------------------
Train -> Accuracy: 40.9%, Avg loss: 3.919954
Val -> Accuracy: 38.0%, Avg loss: 1.487672 

Epoch 2
-------------------------------
Train -> Accuracy: 40.9%, Avg loss: 3.826605
Val -> Accuracy: 38.0%, Avg loss: 1.452235 

Epoch 3
-------------------------------
Train -> Accuracy: 40.5%, Avg loss: 3.734572
Val -> Accuracy: 38.0%, Avg loss: 1.417005 

Epoch 4
-------------------------------
Train -> Accuracy: 40.5%, Avg loss: 3.645530
Val -> Accuracy: 38.0%, Avg loss: 1.382465 

Epoch 5
-------------------------------
Train -> Accuracy: 40.5%, Avg loss: 3.559556
Val -> Accuracy: 38.0%, Avg loss: 1.348983 

Epoch 6
-------------------------------
Train -> Accuracy: 40.7%, Avg loss: 3.477955
Val -> Accuracy: 38.0%, Avg loss: 1.316521 

Epoch 7
-------------------------------
Train -> Accuracy: 40.7%, Avg loss: 3.400793
Val -> Accuracy: 38.0%, Avg loss: 1.285132 

Epoch 8
-------------------------------
Train -> Accuracy: 40.4%, Avg loss: 3.328177
Val -

### No Marks

Suggestions for bonus experimentation to further your own learning and understanding (this is content that we will visit later in the course). No marks will be awarded for this; it is purely for your own practice:

1. Experiment with the number of layers, and the number of neurons in each layer
2. Try different numbers of epochs, different batch sizes and different learning rates
3. Look at implementing other weight initialisations, such as Xavier initialisation
4. Consider other activation functions, such as leaky ReLU
5. Practice using the .to('cuda') method, to build and train models on GPU
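
For the last suggestion, a common pattern is to pick the device once and move both the model and the data to it. The sketch below falls back to the CPU when no GPU is available; it uses PyTorch's built-in `nn.Linear` purely as a stand-in for a model, and the tensor sizes are assumptions.

```python
import torch
from torch import nn

# Use the GPU if one is available, otherwise stay on the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"

stand_in_model = nn.Linear(7, 1).to(device)   # stand-in for a custom model
x = torch.randn(4, 7, device=device)          # inputs must live on the same device

out = stand_in_model(x)
print(out.shape, out.device)
```

The model's parameters and its inputs need to be on the same device; mixing a 'cuda' model with 'cpu' inputs raises a runtime error.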