# Objective

* Introduce dense layers mathematically
* Introduce dense layers with PyTorch
* Build a model using dense layers

# Dense layers and multilayer perceptrons (MLPs)

In neural networks, dense layers (also known as fully connected layers) are among the most common building blocks of deep learning models. In a dense layer, every neuron is connected to every neuron in the previous layer. This means that each input feature feeds into every neuron of the layer, and each neuron's output is in turn passed to every neuron of the next dense layer.

Dense layers transform their inputs into a new representation, whose dimension may be larger or smaller than that of the input, which allows more complex models to be learned. They are called "dense" because each neuron in the layer is connected to every neuron in the previous layer.

In a dense layer, the output of each neuron is computed as a weighted sum of the inputs, followed by a non-linear activation function. The weights in the layer are learned during the training process, and are adjusted to minimize the error between the predicted output and the actual output.

Dense layers are often stacked together in deep neural networks, with each layer learning increasingly complex features from the input data.

## Mathematically 


Mathematically, a dense layer can be written as

$y = f(Wx + b)$

where $x$ is the input vector of size $n$, $y$ is the output vector of size $m$, $W$ is the weight matrix of size $m \times n$ (a trainable parameter), $b$ is the bias vector of size $m$ (also trainable), and $f$ is the activation function, applied element-wise to the vector $Wx + b$. For instance, $f$ can be the function $\mathrm{ReLU}$, defined by $\mathrm{ReLU}(x) = x$ if $x \geq 0$ and $\mathrm{ReLU}(x) = 0$ otherwise.
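As a small illustration of the formula above, the following sketch computes $y = f(Wx + b)$ by hand with ReLU as $f$. The sizes `n = 3` and `m = 2` and the random values are arbitrary choices made for this example only.

```python
import torch

torch.manual_seed(0)  # fixed seed so the example is reproducible

n, m = 3, 2            # input size n and output size m (arbitrary illustrative values)
W = torch.randn(m, n)  # weight matrix of shape (m, n)
b = torch.randn(m)     # bias vector of size m
x = torch.randn(n)     # input vector of size n

z = W @ x + b          # affine part Wx + b, a vector of size m
y = torch.relu(z)      # ReLU applied element-wise

print("z =", z)
print("y =", y)
```

Every negative entry of `z` becomes 0 in `y`, while non-negative entries pass through unchanged.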



## MLP: a very simple example


In the context of PyTorch, MLP refers to a multilayer perceptron model implemented using the PyTorch framework. Recall that PyTorch is a popular deep learning library that provides tools and functionalities for building and training neural networks.

In PyTorch, an MLP is typically constructed by combining multiple layers, including linear layers (also known as fully connected layers) and activation functions. The linear layer in PyTorch is implemented by the torch.nn.Linear class. It represents a fully connected layer in which every output neuron is connected to every input coming from the previous layer.

The torch.nn.Linear class takes two required arguments: the number of input features and the number of output features. These arguments define the shape of the weight matrix that holds the connection weights between the neurons. The input features correspond to the size of the previous layer, and the output features correspond to the size of the current layer.
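To make the link between these two arguments and the weight matrix concrete, the sketch below inspects a single layer. The sizes 10 and 20 are illustrative. torch.nn.Linear stores its weight with shape (out_features, in_features) and computes `x @ W.T + b` for a batch `x`.

```python
import torch
import torch.nn as nn

layer = nn.Linear(in_features=10, out_features=20)

print(layer.weight.shape)  # weight is stored as (out_features, in_features)
print(layer.bias.shape)    # one bias per output feature

# Reproduce the layer's output by hand for a batch of 32 samples
x = torch.randn(32, 10)
manual = x @ layer.weight.T + layer.bias
print(torch.allclose(layer(x), manual, atol=1e-6))
```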

Here's an example of how you can define an MLP using linear layers in PyTorch:



In [6]:
import torch
import torch.nn as nn

# Define input and output sizes
input_size = 10
output_size = 5

# Define a simple dense neural network with one hidden layer
class MLP(nn.Module):
    def __init__(self):
        super(MLP, self).__init__()
        self.fc1 = nn.Linear(input_size, 20)  # 20 hidden units 
        self.fc2 = nn.Linear(20, output_size)

    def forward(self, x):
        x = torch.relu(self.fc1(x))  # activation function for the hidden layer --this computes relu( W x +b ) described above mathematically
        x = self.fc2(x)
        return x

# Create an instance of the model
model = MLP()

# Generate some random input data
x = torch.randn(32, input_size)  # a batch of 32 samples, each a vector of dimension 10

# Feed the input through the model to generate output
output = model(x)
print(output.shape)  # should be (32, 5) since we have 32 samples and 5 output classes

torch.Size([32, 5])


In this example, the MLP consists of two linear layers (fc1 and fc2). fc1 maps the input_size (10) input features to 20 hidden units, and fc2 maps those 20 hidden units to output_size (5) outputs. The torch.relu function is the activation function applied to the output of fc1, and no activation function is applied to the output of fc2.

By stacking multiple linear layers with activation functions, an MLP in PyTorch can learn complex patterns and relationships in the data.
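For a plain stack of layers like this one, the same architecture can also be written with nn.Sequential. The sketch below is one possible equivalent formulation, not a replacement for the class above; the names `hidden_size` and `model_seq` are introduced here for illustration.

```python
import torch
import torch.nn as nn

input_size, hidden_size, output_size = 10, 20, 5

# Same structure as the MLP class: Linear -> ReLU -> Linear
model_seq = nn.Sequential(
    nn.Linear(input_size, hidden_size),
    nn.ReLU(),
    nn.Linear(hidden_size, output_size),
)

x = torch.randn(32, input_size)  # batch of 32 samples
out = model_seq(x)
print(out.shape)

# Trainable parameters: (10*20 + 20) + (20*5 + 5)
n_params = sum(p.numel() for p in model_seq.parameters())
print(n_params)
```

Subclassing nn.Module, as in the example above, stays the more flexible option when the forward pass is not a simple chain of layers.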

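# Exercise: A more realistic example with an MLP

# Let's get the data

Download the dataset from https://raw.githubusercontent.com/jbrownlee/Datasets/master/pima-indians-diabetes.data.csv and save it as pima-indians-diabetes.csv in your working directory, since that is the file name the code below expects.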


Using the NumPy function loadtxt(), you can load the file as a matrix of numerical values. The dataset consists of eight input variables and one output variable, which is the last column. The objective is to create a model that maps rows of input variables to an output variable, commonly referred to as a binary classification problem. The input variables are as follows:

* Number of times pregnant
* Plasma glucose concentration at 2 hours in an oral glucose tolerance test
* Diastolic blood pressure (mm Hg)
* Triceps skin fold thickness (mm)
* 2-hour serum insulin (μIU/ml)
* Body mass index (weight in kg / (height in m)²)
* Diabetes pedigree function
* Age (years)

The output variable is a binary class label (0 or 1). Once the CSV file is loaded into memory, you can divide the columns of data into input and output variables. The data will be stored as a 2D array where the first dimension represents the rows and the second dimension represents the columns, for example, (rows, columns). You can divide the array into two arrays by selecting subsets of columns using the standard NumPy slice operator “:”. The first eight columns can be selected by using the slice 0:8, and the output column can be selected by using index 8.

In [3]:
import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim
# load the dataset, split into input (X) and output (y) variables
dataset = np.loadtxt('pima-indians-diabetes.csv', delimiter=',')
X = dataset[:,0:8]
y = dataset[:,8]
 


Now let's convert the data above to PyTorch tensors.

In [None]:
import torch 
X = torch.tensor(X, dtype=torch.float32)
y = torch.tensor(y, dtype=torch.float32).reshape(-1, 1)
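The `reshape(-1, 1)` call turns the label vector into a column, so that its shape matches a model output with one value per row. A minimal sketch with made-up stand-in labels (not the real dataset):

```python
import numpy as np
import torch

# Stand-in labels for illustration only
y_np = np.array([0.0, 1.0, 1.0, 0.0])

y_flat = torch.tensor(y_np, dtype=torch.float32)  # shape (4,)
y_col = y_flat.reshape(-1, 1)                     # shape (4, 1): one label per row

print(y_flat.shape, y_col.shape)
print(y_col.dtype)
```

Loss functions such as nn.BCELoss expect predictions and targets of the same shape, which is why the column form is convenient here.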

# Define the model

In [4]:

# define the model
class PimaClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        # define the layers of the neural network
        self.hidden1 = nn.Linear(8, 12)  # hidden layer 1: 8 input features -> 12 units
        self.act1 = nn.ReLU()  # activation function for hidden layer 1
        self.hidden2 = nn.Linear(12, 8)  # hidden layer 2
        self.act2 = nn.ReLU()  # activation function for hidden layer 2
        self.output = nn.Linear(8, 1)  # output layer
        self.act_output = nn.Sigmoid()  # activation function for the output layer

    def forward(self, x):
        # define the forward pass of the neural network
        x = self.act1(self.hidden1(x))  # pass input through hidden layer 1
        x = self.act2(self.hidden2(x))  # pass output of hidden layer 1 through hidden layer 2
        x = self.act_output(self.output(x))  # pass output of hidden layer 2 through output layer
        return x

model = PimaClassifier()  # initialize the model
print(model)  # print the model architecture


PimaClassifier(
  (hidden1): Linear(in_features=8, out_features=12, bias=True)
  (act1): ReLU()
  (hidden2): Linear(in_features=12, out_features=8, bias=True)
  (act2): ReLU()
  (output): Linear(in_features=8, out_features=1, bias=True)
  (act_output): Sigmoid()
)
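To get a feel for the size of this network, the sketch below counts the trainable parameters of a stack with the same layer sizes as the architecture printed above. It is rebuilt with nn.Sequential under the hypothetical name `net` so that the snippet runs on its own.

```python
import torch.nn as nn

# Same layer sizes as PimaClassifier: 8 -> 12 -> 8 -> 1
net = nn.Sequential(
    nn.Linear(8, 12), nn.ReLU(),
    nn.Linear(12, 8), nn.ReLU(),
    nn.Linear(8, 1), nn.Sigmoid(),
)

# Each Linear layer contributes in*out weights plus out biases
for name, p in net.named_parameters():
    print(name, tuple(p.shape), p.numel())

total = sum(p.numel() for p in net.parameters())
print("total trainable parameters:", total)  # (8*12+12) + (12*8+8) + (8*1+1) = 221
```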



# Train the model



In [None]:
loss_fn = nn.BCELoss()  # binary cross-entropy loss function
optimizer = optim.Adam(model.parameters(), lr=0.001)  # Adam optimizer with learning rate of 0.001

n_epochs = 100  # number of epochs for training
batch_size = 10  # batch size for mini-batch gradient descent

for epoch in range(n_epochs):
    for i in range(0, len(X), batch_size):
        Xbatch = X[i:i+batch_size]  # select a mini-batch of inputs
        y_pred = model(Xbatch)  # make predictions for the mini-batch
        ybatch = y[i:i+batch_size]  # select the corresponding outputs for the mini-batch
        loss = loss_fn(y_pred, ybatch)  # compute the loss for the mini-batch
        optimizer.zero_grad()  # reset the gradients to zero
        loss.backward()  # compute gradients
        optimizer.step()  # update model parameters using gradients
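The loss used above, nn.BCELoss, averages $-[t \log p + (1 - t)\log(1 - p)]$ over the batch, where $p$ is the predicted probability and $t$ the 0/1 label. The following sketch checks this against a hand computation on three made-up predictions.

```python
import torch
import torch.nn as nn

# Made-up probabilities and labels, for illustration only
p = torch.tensor([[0.9], [0.2], [0.6]])  # predicted probabilities
t = torch.tensor([[1.0], [0.0], [0.0]])  # true labels

# Binary cross-entropy computed by hand, averaged over the batch
manual = -(t * torch.log(p) + (1 - t) * torch.log(1 - p)).mean()

# Built-in version (default reduction is the mean)
builtin = nn.BCELoss()(p, t)

print(manual.item(), builtin.item())
```

The third sample contributes most to the loss here, since the model assigns probability 0.6 to class 1 while the label is 0.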

# Evaluate the model

In [None]:

# compute accuracy
with torch.no_grad():  # gradients are not needed for evaluation
    y_pred = model(X)  # make predictions for the entire dataset
accuracy = (y_pred.round() == y).float().mean()  # fraction of predictions that match the labels
print(f"Accuracy {accuracy}")  # print the accuracy of the model

# make class predictions with the model
predictions = (y_pred > 0.5).int()  # threshold predicted probabilities at 0.5 to make class predictions
for i in range(5):
    # print the input variables, predicted class, and actual class for the first 5 examples in the dataset
    print('%s => %d (expected %d)' % (X[i].tolist(), predictions[i], y[i]))
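The accuracy above is measured on the same data the model was trained on, so it tends to be optimistic. One common remedy is to hold out part of the data for evaluation. The sketch below shows a simple random split on stand-in tensors; the names `X_demo` and `y_demo`, the 100-row size, and the 80/20 ratio are assumptions for illustration, not part of the original exercise.

```python
import torch

torch.manual_seed(0)

# Stand-in data with the same layout as X and y (8 features, 1 label column)
X_demo = torch.randn(100, 8)
y_demo = torch.randint(0, 2, (100, 1)).float()

# Shuffle the row indices, then keep one fifth of them for testing
perm = torch.randperm(len(X_demo))
n_test = len(X_demo) // 5
test_idx, train_idx = perm[:n_test], perm[n_test:]

X_train, y_train = X_demo[train_idx], y_demo[train_idx]
X_test, y_test = X_demo[test_idx], y_demo[test_idx]

print(X_train.shape, X_test.shape)
```

Training would then use only `X_train` and `y_train`, and the accuracy would be computed on `X_test` and `y_test`.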

# Remark on activation functions

Activation functions are an essential component of neural networks. They introduce non-linearity to the network, allowing it to learn complex patterns and make more accurate predictions. Activation functions are applied to the output of each neuron or layer in a neural network.

Here are some commonly used activation functions:

* Sigmoid: squashes the input value into the range between 0 and 1.
    * Formula: σ(x) = 1 / (1 + exp(-x))
    * Range: (0, 1)
    * Example: Logistic regression, binary classification problems

* ReLU (Rectified Linear Unit): returns the input value if it is positive, and 0 otherwise.
    * Formula: f(x) = max(0, x)
    * Range: [0, +∞)
    * Example: Convolutional Neural Networks (CNNs), deep learning models

* Leaky ReLU: an extension of ReLU that allows small negative values.
    * Formula: f(x) = max(αx, x), where α is a small positive constant (e.g., 0.01)
    * Range: (-∞, +∞)
    * Example: Neural networks where preventing dead neurons is important

* Tanh (Hyperbolic Tangent): maps the input to the range (-1, 1), similar to the sigmoid function but with a steeper gradient.
    * Formula: tanh(x) = (exp(x) - exp(-x)) / (exp(x) + exp(-x))
    * Range: (-1, 1)
    * Example: Recurrent Neural Networks (RNNs)

* Softmax: used in multi-class classification problems to convert a vector of real numbers into a probability distribution over classes.
    * Formula: σ(z)_i = exp(z_i) / sum_j exp(z_j) for each element z_i of the input vector z
    * Range: [0, 1] (normalized probabilities that sum to 1)
    * Example: Multi-class classification, output layer of a neural network


These are just a few examples of activation functions. Each activation function has different properties and is suitable for different types of problems and network architectures. The choice of activation function depends on the specific requirements and characteristics of the problem at hand.

In [5]:
import torch
import torch.nn as nn

# Input tensor
input_tensor = torch.randn(10)

# Sigmoid Function
sigmoid = nn.Sigmoid()
output_sigmoid = sigmoid(input_tensor)
print("Sigmoid:", output_sigmoid)

# ReLU (Rectified Linear Unit)
relu = nn.ReLU()
output_relu = relu(input_tensor)
print("ReLU:", output_relu)

# Leaky ReLU
leaky_relu = nn.LeakyReLU(negative_slope=0.01)
output_leaky_relu = leaky_relu(input_tensor)
print("Leaky ReLU:", output_leaky_relu)

# Tanh (Hyperbolic Tangent)
tanh = nn.Tanh()
output_tanh = tanh(input_tensor)
print("Tanh:", output_tanh)

# Softmax
softmax = nn.Softmax(dim=0)
output_softmax = softmax(input_tensor)
print("Softmax:", output_softmax)

Sigmoid: tensor([0.7462, 0.7571, 0.8109, 0.7599, 0.7049, 0.5072, 0.7223, 0.3963, 0.5618,
        0.7307])
ReLU: tensor([1.0787, 1.1367, 1.4556, 1.1524, 0.8707, 0.0287, 0.9558, 0.0000, 0.2485,
        0.9982])
Leaky ReLU: tensor([ 1.0787,  1.1367,  1.4556,  1.1524,  0.8707,  0.0287,  0.9558, -0.0042,
         0.2485,  0.9982])
Tanh: tensor([ 0.7927,  0.8133,  0.8968,  0.8185,  0.7018,  0.0287,  0.7424, -0.3976,
         0.2435,  0.7608])
Softmax: tensor([0.1216, 0.1289, 0.1773, 0.1309, 0.0988, 0.0426, 0.1076, 0.0272, 0.0530,
        0.1122])
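As a cross-check of the formulas listed earlier, the sketch below evaluates each one by hand and compares it with the corresponding PyTorch built-in. The input is random, and α = 0.01 is the same illustrative slope used above.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.randn(10)

# Formulas written out by hand
sigmoid_manual = 1 / (1 + torch.exp(-x))
relu_manual = torch.clamp(x, min=0)        # max(0, x)
leaky_manual = torch.maximum(0.01 * x, x)  # max(alpha * x, x) with alpha = 0.01
tanh_manual = (torch.exp(x) - torch.exp(-x)) / (torch.exp(x) + torch.exp(-x))
softmax_manual = torch.exp(x) / torch.exp(x).sum()

# Compare with the built-in implementations
checks = {
    "sigmoid": torch.allclose(sigmoid_manual, torch.sigmoid(x), atol=1e-6),
    "relu": torch.allclose(relu_manual, torch.relu(x)),
    "leaky_relu": torch.allclose(leaky_manual, F.leaky_relu(x, negative_slope=0.01), atol=1e-6),
    "tanh": torch.allclose(tanh_manual, torch.tanh(x), atol=1e-6),
    "softmax": torch.allclose(softmax_manual, torch.softmax(x, dim=0), atol=1e-6),
}
print(checks)
```

In practice the built-in versions are preferable, since they are implemented with numerical stability in mind (for example, softmax on large inputs).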


Refs: https://machinelearningmastery.com/develop-your-first-neural-network-with-pytorch-step-by-step/