<a href="https://colab.research.google.com/github/alimoorreza/CS167-sp25-notes/blob/main/Day19_Building_Simple_MLP_with_PyTorch.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# CS167: Day19
## Building a Simple MLP using PyTorch Library

#### CS167: Machine Learning, Spring 2025


📜 [Syllabus](https://analytics.drake.edu/~reza/teaching/cs167_sp25/cs167_syllabus_sp25.pdf)

# Introduction to PyTorch

We can use the PyTorch framework to build and train MLPs and other neural networks such as CNNs, RNNs, LSTMs, and Transformers. Let's learn the basics of PyTorch.

In [None]:
# import torch library
import torch
import torch.nn as nn
import numpy as np

## __Put the Model on Training Device (GPU or CPU)__
We want to accelerate the training process using a graphics processing unit (GPU). Fortunately, Colab gives us access to a GPU. You need to enable it from _Runtime-->Change runtime type-->GPU or TPU_.

In [None]:
# check to see if torch.cuda is available, otherwise it will use CPU
import torch
import torch.nn as nn
import numpy as np
device = (
    "cuda"
    if torch.cuda.is_available()
    else "cpu"
)
print(f"Using {device} device")
# if it prints 'cuda' then colab is running using GPU device

Using cuda device
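Once `device` is known, tensors and layers can be placed on it with `.to(device)`. The snippet below is a minimal sketch of that step; the names `x` and `layer` are illustrative and do not appear elsewhere in these notes.

```python
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"

# illustrative names: 'x' and 'layer' are assumptions, not part of the notes
x = torch.randn(4, 2).to(device)      # move a tensor to the selected device
layer = nn.Linear(2, 3).to(device)    # move a layer's weights and bias to the same device

out = layer(x)                        # inputs and parameters must live on the same device
print(x.device, out.device, tuple(out.shape))
```

If the tensor and the layer were on different devices, the call `layer(x)` would raise a runtime error, which is why both are moved explicitly.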


# __Building Multilayer Perceptron (MLP)__
A multilayer perceptron is the simplest type of neural network. It consists of perceptrons (aka nodes, neurons) arranged in layers.
<div>
<img src="https://analytics.drake.edu/~reza/teaching/cs167_sp25/notes/images/mlp_toy_example.png" width=800/>
</div>



In [None]:
# let's generate 4 random samples of (x1, x2) for the above network
torch.manual_seed(0)                      # for reproducibility
random_X = torch.randn(4,2)               # you could imagine that these are pairs of (x1, x2) as shown in the above table
print('random_X = \n', random_X.numpy())


input_feature_size = random_X.shape[1]    # number of columns corresponds to feature dimension
print('\n\ninput feature dimension: ', input_feature_size)

random_X = 
 [[ 1.5409961  -0.2934289 ]
 [-2.1787894   0.56843126]
 [-1.0845224  -1.3985955 ]
 [ 0.40334684  0.83802634]]


input feature dimension:  2


In [None]:
# you can also explicitly incorporate the x0 input which accounts for the bias term in our network
# recall that x0 will always be constant value of 1
'''
input_x_vector = torch.ones(4, 3)
input_x_vector[:,1:3] = random_X # using a slicing operation let's squeeze in all (x1, x2) while retaining x0 as 1
print(input_x_vector.numpy())
input_feature_size = input_x_vector.shape[1] # number of columns corresponds to feature dimension
print('\n\ninput feature dimension: ', input_feature_size)
'''

Each of these questions needs to be answered before you set up your neural network:
- Q1: how many hidden layers should there be? (depth)
- Q2: how many neurons should be in each layer? (width)
- Q3: how many dense connections should there be between each pair of adjacent layers?
- Q4: what should the activation be at each of the intermediate layers?
  - we could use _sigmoid()_, _tanh()_, _rectified-linear-unit()_, etc.
- Q5: what should the activation of the final layer be?
  - it depends on the task: _classification_ (sigmoid(), softmax()) vs. _regression_
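To make the activation choices in Q4 and Q5 concrete, here is a small sketch that applies each of the functions named above to the same input. The tensor `z` and its values are illustrative assumptions, chosen only to show how each function reshapes its input.

```python
import torch
import torch.nn as nn

z = torch.tensor([-2.0, 0.0, 2.0])    # illustrative pre-activation values

sigmoid_out = nn.Sigmoid()(z)         # squashes each value into (0, 1)
tanh_out    = nn.Tanh()(z)            # squashes each value into (-1, 1)
relu_out    = nn.ReLU()(z)            # replaces negative values with 0
softmax_out = nn.Softmax(dim=0)(z)    # non-negative values that sum to 1

print('sigmoid:', sigmoid_out.tolist())
print('tanh:   ', tanh_out.tolist())
print('relu:   ', relu_out.tolist())
print('softmax:', softmax_out.tolist())
```

Sigmoid and softmax produce values that can be read as probabilities, which is why they are common choices for the final layer in classification.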

In [None]:
torch.manual_seed(1) # for reproducibility
# Q1: how many hidden layers should be there? (depth)
# answer: there is only 1 hidden layer
num_of_hidden_layer = 1







# Q2: how many neurons should be in each layer? (width)
# answer: there are 2 neurons in the input  layer
#         there are 3 neurons in the hidden layer
#         there is  1 neuron  in the output layer
#num_of_neurons_input_layer  = input_feature_size # also can be assigned from 'input_feature_size' (which we computed in the previous cell)
num_of_neurons_input_layer  = 2
num_of_neurons_hidden_layer = 3
num_of_neurons_output_layer = 1






# Q3: how many dense connections should there be between each pair of adjacent layers?
# answer: there should be 2x3 dense connections (between input  layer and hidden layer: dense_connections_W1)
#         there should be 3x1 dense connections (between hidden layer and output layer: dense_connections_W2)
dense_connections_W1 = torch.randn(num_of_neurons_input_layer,  num_of_neurons_hidden_layer)
dense_connections_W2 = torch.randn(num_of_neurons_hidden_layer, num_of_neurons_output_layer)
print('Randomly initialized weights between input  layer and hidden layer: dense_connections_W1=\n', dense_connections_W1.numpy())
print('Randomly initialized weights between hidden layer and output layer: dense_connections_W2=\n', dense_connections_W2.numpy())


# add the bias terms for all the layers except input layer
bias_terms_hidden    = torch.randn(num_of_neurons_hidden_layer)
bias_terms_output    = torch.randn(num_of_neurons_output_layer)
print('bias_terms_hidden:\n', bias_terms_hidden.numpy())
print('bias_terms_output:\n', bias_terms_output.numpy())

Randomly initialized weights between input  layer and hidden layer: dense_connections_W1=
 [[ 0.66135216  0.2669241   0.06167726]
 [ 0.6213173  -0.45190597 -0.16613023]]
Randomly initialized weights between hidden layer and output layer: dense_connections_W2=
 [[-1.5227685 ]
 [ 0.38168392]
 [-1.0276086 ]]
bias_terms_hidden:
 [-0.5630528  -0.89229053 -0.05825018]
bias_terms_output:
 [-0.19550958]


The network has 2 x 3 = 6 connections between the input and hidden layers and 3 x 1 = 3 connections between the hidden and output layers. Their weights were randomly initialized by the PyTorch code above.
<div>
<img src="https://analytics.drake.edu/~reza/teaching/cs167_sp25/notes/images/mlp_toy_example0.png" width=800/>
</div>
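As a quick check on the connection counts, the sketch below tallies every trainable number in this 2-3-1 network: one weight per connection plus one bias per non-input neuron. The variable names are illustrative assumptions.

```python
import torch

# layer widths of the toy network (2 inputs, 3 hidden neurons, 1 output)
num_in, num_hidden, num_out = 2, 3, 1

W1 = torch.randn(num_in, num_hidden)     # weights: input  -> hidden
W2 = torch.randn(num_hidden, num_out)    # weights: hidden -> output
b_hidden = torch.randn(num_hidden)       # one bias per hidden neuron
b_output = torch.randn(num_out)          # one bias per output neuron

num_weights = W1.numel() + W2.numel()              # 6 + 3
num_biases  = b_hidden.numel() + b_output.numel()  # 3 + 1
total_parameters = num_weights + num_biases

print('weights:', num_weights, 'biases:', num_biases, 'total:', total_parameters)
```

So the toy network has 9 weights and 4 biases, 13 trainable parameters in total.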



In [None]:
# Q4: what should the activation be at each of the intermediate layers?
# answer: let's use sigmoid() activation function in the hidden layer
sigmoid_activation_hidden = nn.Sigmoid()

In [None]:
# Q5: what should the activation of the final layer be? (let's assume a binary classification task, for which sigmoid activation is used)
sigmoid_activation_output = nn.Sigmoid()

__Forward Pass in Multilayer Perceptron (MLP)__

<div>
<img src="https://analytics.drake.edu/~reza/teaching/cs167_sp25/notes/images/mlp_toy_example_forward_pass1.png" width=800/>
</div>

Each neuron performs two operations:
- a dot product between a weight vector (edges in the graph) and an input vector
- passing that number through an activation function, which produces a number as output

We can collectively compute all the dot products in a layer using a single matrix multiplication, [torch.matmul()](https://pytorch.org/docs/stable/generated/torch.matmul.html), as follows. The cells below do this for the first sample, `random_X[0,:]`.

We also add the bias term after computing the matrix multiplication.

In [None]:
matrix_mult_X_and_W1 = torch.matmul(random_X[0,:], dense_connections_W1) + bias_terms_hidden
print('hidden layer input vector and weight vector dot products: \n', matrix_mult_X_and_W1.numpy())
output_hidden_layer = sigmoid_activation_hidden(matrix_mult_X_and_W1)
print('output of hidden layer: \n', output_hidden_layer.numpy())


hidden layer input vector and weight vector dot products: 
 [ 0.27377588 -0.3483593   0.08554165]
output of hidden layer: 
 [0.5680196  0.41378036 0.5213724 ]


In [None]:
matrix_mult_hidden_and_W2 = torch.matmul(output_hidden_layer, dense_connections_W2) + bias_terms_output
print('output layer pre-activation (before sigmoid): \n', matrix_mult_hidden_and_W2)
final_output = sigmoid_activation_output(matrix_mult_hidden_and_W2)
print('output of output layer (after sigmoid): \n', final_output.numpy())

output layer pre-activation (before sigmoid): 
 tensor([-1.4383])
output of output layer (after sigmoid): 
 [0.1918079]
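The cells above run the forward pass for one sample only. The same two matrix multiplications can process all four samples at once, which is where the matrix-matrix form of `torch.matmul()` comes in. The sketch below is self-contained: it hard-codes the inputs, weights, and biases printed earlier in these notes, and the names `hidden_all` and `output_all` are illustrative assumptions.

```python
import torch

# values copied from the printed outputs in the cells above
random_X = torch.tensor([[ 1.5409961,  -0.2934289 ],
                         [-2.1787894,   0.56843126],
                         [-1.0845224,  -1.3985955 ],
                         [ 0.40334684,  0.83802634]])
dense_connections_W1 = torch.tensor([[0.66135216,  0.2669241,   0.06167726],
                                     [0.6213173,  -0.45190597, -0.16613023]])
dense_connections_W2 = torch.tensor([[-1.5227685], [0.38168392], [-1.0276086]])
bias_terms_hidden = torch.tensor([-0.5630528, -0.89229053, -0.05825018])
bias_terms_output = torch.tensor([-0.19550958])

# (4, 2) @ (2, 3) -> (4, 3); the bias vector is broadcast across the 4 rows
hidden_all = torch.sigmoid(torch.matmul(random_X, dense_connections_W1) + bias_terms_hidden)
# (4, 3) @ (3, 1) -> (4, 1); one prediction per sample
output_all = torch.sigmoid(torch.matmul(hidden_all, dense_connections_W2) + bias_terms_output)

print('hidden layer outputs, shape:', tuple(hidden_all.shape))
print('final outputs, shape:       ', tuple(output_all.shape))
print('prediction for first sample:', output_all[0].tolist())
```

The first row of `output_all` should agree with the single-sample result above, up to floating-point rounding.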


# __Group activity__
Make another simple MLP with the specifications below and perform the 'Forward Pass' of the MLP.

In [None]:
torch.manual_seed(0) # for reproducibility
# Q1: how many hidden layers should be there? (depth)
# answer: there is only 1 hidden layer
num_of_hidden_layer = 1




# Q2: how many neurons should be in each layer? (width)
# answer: there are 3 neurons in the input  layer
#         there are 4 neurons in the hidden layer
#         there is  1 neuron  in the output layer
num_of_neurons_input_layer  =
#num_of_neurons_input_layer  = input_feature_size # also can be assigned from 'input_feature_size' (which we computed in the previous cell)
num_of_neurons_hidden_layer =
num_of_neurons_output_layer =




# Q3: how many dense connections should there be between each pair of adjacent layers?
# answer: there should be ?x? dense connections (between input  layer and hidden layer: dense_connections_W1)
#         there should be ?x1 dense connections (between hidden layer and output layer: dense_connections_W2)
# add the bias terms for all the layers except input layer


# Q4: what should the activation be at each of the intermediate layers?
# answer: let's use sigmoid() activation function in the hidden layer

# Q5: what should the activation of the final layer be? (let's assume a binary classification task, for which sigmoid activation is used)


# do the Forward Pass in Multilayer Perceptron (MLP)
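For checking your work after attempting the activity, here is one possible sketch of a 3-4-1 network and its forward pass. It is not the only valid answer. The input `random_X3`, with 4 random samples of 3 features each, is an assumption, since the activity does not specify the input data.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # for reproducibility

# widths from the activity specification: 3 inputs, 4 hidden neurons, 1 output
num_of_neurons_input_layer  = 3
num_of_neurons_hidden_layer = 4
num_of_neurons_output_layer = 1

# hypothetical input: 4 random samples with 3 features each
random_X3 = torch.randn(4, num_of_neurons_input_layer)

# 3x4 and 4x1 dense connections, plus biases for the hidden and output layers
dense_connections_W1 = torch.randn(num_of_neurons_input_layer,  num_of_neurons_hidden_layer)
dense_connections_W2 = torch.randn(num_of_neurons_hidden_layer, num_of_neurons_output_layer)
bias_terms_hidden = torch.randn(num_of_neurons_hidden_layer)
bias_terms_output = torch.randn(num_of_neurons_output_layer)

sigmoid_activation_hidden = nn.Sigmoid()
sigmoid_activation_output = nn.Sigmoid()

# forward pass for all samples at once
output_hidden_layer = sigmoid_activation_hidden(torch.matmul(random_X3, dense_connections_W1) + bias_terms_hidden)
final_output = sigmoid_activation_output(torch.matmul(output_hidden_layer, dense_connections_W2) + bias_terms_output)
print('final output shape:', tuple(final_output.shape))
```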

# __Building Modular Code for Multilayer Perceptron (MLP)__

<div>
<img src="https://analytics.drake.edu/~reza/teaching/cs167_sp25/notes/images/mlp_network1.png" width=800/>
</div>

To make the MLP code modular, create a network class with two methods:
- `__init__()`
- `forward()`


In [None]:
import torch
from torch import nn

# You can give your new network any name, e.g., SimpleMLP.
# However, your network class must inherit from nn.Module.
# That way, it can access many useful methods and attributes
# from the parent class nn.Module.

class SimpleMLP(nn.Module):
  def __init__(self):
    super().__init__()
    # your network layer construction should take place here
    # ...
    # ...

  def forward(self, x):
    # your code for MLP forward pass should take place here
    # ...
    # ...
    return x
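The skeleton above could be filled in as follows for the 2-3-1 toy network from earlier in these notes. This is a minimal sketch under stated assumptions, not the only way to write it: it uses `nn.Linear`, which stores a layer's weight matrix and bias vector together, and the constructor arguments and attribute names (`hidden_layer`, `output_layer`) are illustrative choices.

```python
import torch
from torch import nn

class SimpleMLP(nn.Module):
  def __init__(self, num_inputs=2, num_hidden=3, num_outputs=1):
    super().__init__()
    # each nn.Linear holds the dense connections and bias terms of one layer
    self.hidden_layer = nn.Linear(num_inputs, num_hidden)
    self.output_layer = nn.Linear(num_hidden, num_outputs)
    self.sigmoid = nn.Sigmoid()

  def forward(self, x):
    # hidden layer: linear transform followed by sigmoid activation
    x = self.sigmoid(self.hidden_layer(x))
    # output layer: sigmoid again, assuming a binary classification task
    x = self.sigmoid(self.output_layer(x))
    return x

torch.manual_seed(0)             # for reproducibility
model = SimpleMLP()
random_X = torch.randn(4, 2)     # 4 samples of (x1, x2)
predictions = model(random_X)    # calling the model runs forward()
print('predictions shape:', tuple(predictions.shape))
```

Calling `model(random_X)` rather than `model.forward(random_X)` is the usual convention, since `nn.Module` routes the call through `forward()`.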