#### **Module 2: Neural Network Building Blocks**



Neural Network

<img src = 'https://learnopencv.com/wp-content/uploads/2017/10/mlp-diagram.jpg' height=300 width=600>

Node/Neuron/Perceptron

<img src='https://media.geeksforgeeks.org/wp-content/uploads/20250528125143444422/Activation-functions-in-Neural-Networks.webp' height=300 width=600>


**Activation function**

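An activation function is applied to a neuron's weighted sum of inputs (plus bias) to produce the neuron's output. It introduces non-linearity: without it, any stack of linear layers collapses into a single linear transformation, so the network could only learn linear relationships. The activation also controls how strongly the neuron's signal is passed to the next layer. ReLU, for example, outputs zero for negative inputs and passes positive inputs through unchanged.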

<img src= 'https://www.researchgate.net/publication/344331692/figure/fig8/AS:965939822616576@1607309408063/Artificial-neural-network-activation-functions-In-this-figure-the-most-common.ppm' height=200 width=600>
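
As a rough illustration of the role of activation functions, the sketch below applies three common ones to the same inputs, then checks that two linear layers stacked without an activation in between behave like a single linear layer. The input values and the names `sample_inputs`, `layer_a`, `layer_b`, and `points` are made up for this example.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Same inputs, three different activation functions.
sample_inputs = torch.tensor([-2.0, -0.5, 0.0, 0.5, 2.0])
relu_out = torch.relu(sample_inputs)        # negatives become 0
sigmoid_out = torch.sigmoid(sample_inputs)  # squashed into (0, 1)
tanh_out = torch.tanh(sample_inputs)        # squashed into (-1, 1)
print("ReLU:   ", relu_out)
print("Sigmoid:", sigmoid_out)
print("Tanh:   ", tanh_out)

# Without an activation, two linear layers collapse into one linear map.
layer_a = nn.Linear(2, 4, bias=False)  # hypothetical layers for illustration
layer_b = nn.Linear(4, 1, bias=False)
points = torch.randn(5, 2)

with torch.no_grad():
    stacked = layer_b(layer_a(points))
    merged_weight = layer_b.weight @ layer_a.weight  # shape (1, 2)
    collapsed = points @ merged_weight.T

print("Stacked layers equal one linear map:", torch.allclose(stacked, collapsed, atol=1e-6))
```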

In [None]:
import torch
import torch.nn as nn
import matplotlib.pyplot as plt
import numpy as np

#### Data Preparation

In [None]:
N_SAMPLES_PER_CLASS = 500
N_FEATURES = 2
RANDOM_SEED = 42
torch.manual_seed(RANDOM_SEED)

class0_points = torch.randn(N_SAMPLES_PER_CLASS, N_FEATURES) + torch.tensor([-2, -2])
class0_labels = torch.zeros(N_SAMPLES_PER_CLASS, 1)



class1_points = torch.randn(N_SAMPLES_PER_CLASS, N_FEATURES) + torch.tensor([2, 2])
class1_labels = torch.ones(N_SAMPLES_PER_CLASS, 1)

X = torch.cat([class0_points, class1_points], dim=0)
y = torch.cat([class0_labels, class1_labels], dim=0)

print(X)


tensor([[-0.0731, -0.5127],
        [-1.0993, -4.1055],
        [-1.3216, -3.2345],
        ...,
        [ 2.4603,  3.1791],
        [ 0.9894,  1.3797],
        [ 0.5451,  1.4102]])


In [None]:
y.unique(), len(y)

(tensor([0., 1.]), 1000)

In [None]:
shuffled_indices = torch.randperm(len(X))
# shuffled_indices

In [None]:
# Define the split point (e.g., 80% for training, 20% for testing)
train_size = int(0.8 * len(X))
train_indices = shuffled_indices[:train_size]
test_indices = shuffled_indices[train_size:]


# Create the training and testing sets using the indices
X_train, y_train = X[train_indices], y[train_indices]
X_test, y_test = X[test_indices], y[test_indices]

print(f"Total samples: {len(X)}")
print(f"Training samples: {len(X_train)}")
print(f"Testing samples: {len(X_test)}")


Total samples: 1000
Training samples: 800
Testing samples: 200
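
The same 80/20 split can also be sketched with `torch.utils.data.random_split`, which shuffles and splits a dataset in one call. This is an alternative to the manual index-based split above, not what this notebook uses. The placeholder tensors and the names `demo_X`, `demo_y`, and `split_generator` are assumptions for illustration.

```python
import torch
from torch.utils.data import TensorDataset, random_split

# Placeholder data with the same shapes as X and y in this notebook.
demo_X = torch.randn(1000, 2)
demo_y = torch.zeros(1000, 1)
full_dataset = TensorDataset(demo_X, demo_y)

# A seeded generator makes the split reproducible.
split_generator = torch.Generator().manual_seed(42)
train_subset, test_subset = random_split(
    full_dataset, [800, 200], generator=split_generator
)

# The two subsets should not share any sample index.
overlap = set(train_subset.indices) & set(test_subset.indices)
print(f"Training samples: {len(train_subset)}")
print(f"Testing samples: {len(test_subset)}")
print(f"Overlapping indices: {len(overlap)}")
```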


#### Dataset and DataLoader

In [None]:
# Use a `DataLoader` to feed the data to the model in mini-batches.
# Mini-batches keep memory use bounded, and the noise in mini-batch gradients often helps training.

from torch.utils.data import DataLoader, TensorDataset


train_dataset = TensorDataset(X_train, y_train)
test_dataset = TensorDataset(X_test, y_test)

BATCH_SIZE = 32

train_loader = DataLoader(dataset=train_dataset, batch_size=BATCH_SIZE, shuffle=True)
test_loader = DataLoader(dataset=test_dataset, batch_size=BATCH_SIZE, shuffle=False)

data_iter = iter(train_loader)
first_batch_features, first_batch_labels = next(data_iter)
print(f"\nShape of one batch of features: {first_batch_features.shape}")
print(f"Shape of one batch of labels: {first_batch_labels.shape}")



Shape of one batch of features: torch.Size([32, 2])
Shape of one batch of labels: torch.Size([32, 1])
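
When the dataset size is not a multiple of the batch size, the last batch is smaller. The sketch below shows this for 200 samples (the size of the test set here) with a batch size of 32. The placeholder data and the names `demo_dataset`, `demo_loader`, and `drop_last_loader` are assumptions for illustration.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# 200 placeholder samples with 2 features each.
demo_dataset = TensorDataset(torch.randn(200, 2), torch.zeros(200, 1))

demo_loader = DataLoader(demo_dataset, batch_size=32, shuffle=False)
batch_sizes = [features.shape[0] for features, _ in demo_loader]
print(batch_sizes)  # [32, 32, 32, 32, 32, 32, 8]

# drop_last=True discards the final, incomplete batch.
drop_last_loader = DataLoader(demo_dataset, batch_size=32, drop_last=True)
print(len(demo_loader), len(drop_last_loader))  # 7 6
```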


#### Model Definition

In [None]:

class SimpleClassifier(nn.Module):
    def __init__(self, input_features, hidden_units):
        super().__init__()
        self.linear1 = nn.Linear(in_features=input_features, out_features=hidden_units) # fully-connected layer or dense layer
        self.linear2 = nn.Linear(in_features=hidden_units, out_features=1)
        self.relu = nn.ReLU()
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        x = self.linear1(x)
        x = self.relu(x)
        x = self.linear2(x)
        x = self.sigmoid(x)
        return x



In [None]:
model = SimpleClassifier(input_features=N_FEATURES, hidden_units=10)
print("\nModel Architecture:")
print(model)




Model Architecture:
SimpleClassifier(
  (linear1): Linear(in_features=2, out_features=10, bias=True)
  (linear2): Linear(in_features=10, out_features=1, bias=True)
  (relu): ReLU()
  (sigmoid): Sigmoid()
)
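
A quick way to sanity-check a model like this is to pass a dummy batch through it and count its trainable parameters. The sketch below rebuilds the same layer stack with `nn.Sequential` so that it runs on its own. `sketch_model` and `dummy_batch` are hypothetical names, not part of the notebook.

```python
import torch
import torch.nn as nn

# Same layer stack as SimpleClassifier with 2 input features and 10 hidden units.
sketch_model = nn.Sequential(
    nn.Linear(2, 10),
    nn.ReLU(),
    nn.Linear(10, 1),
    nn.Sigmoid(),
)

dummy_batch = torch.randn(32, 2)  # one batch of 32 two-feature samples
with torch.no_grad():
    probs = sketch_model(dummy_batch)

# Weights and biases: (2*10 + 10) + (10*1 + 1) = 41
n_params = sum(p.numel() for p in sketch_model.parameters())

print("Output shape:", probs.shape)
print("Trainable parameters:", n_params)
```

The sigmoid at the end keeps every output between 0 and 1, so each value can be read as the predicted probability of class 1.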


----

#### ***Loss Function***

A loss function is a mathematical function that quantifies the error between a model's prediction and the actual target value.

Strictly speaking, a loss function applies to a *single training example*. It provides the signal that the learning algorithm uses to update the model's weights and biases.

Common loss functions:

- Mean Absolute Error (MAE) / L1 Loss [*Regression*]
- Mean Squared Error (MSE) / L2 Loss [*Regression*]
- Binary Cross-Entropy Loss / Log Loss [*Classification*]
- Categorical Cross-Entropy Loss [*Classification*]


#### ***Cost Function***

The cost function, sometimes called the objective function, is the average of the per-example losses over the entire training set. It quantifies the model's performance on the whole training dataset rather than on one example.
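
To make the loss/cost distinction concrete, the sketch below computes binary cross-entropy by hand for three made-up predictions, averages the per-example losses into a cost, and compares the result with `nn.BCELoss`, which averages by default. The values in `probs` and `targets` are invented for illustration.

```python
import torch
import torch.nn as nn

probs = torch.tensor([[0.9], [0.2], [0.6]])    # predicted probabilities
targets = torch.tensor([[1.0], [0.0], [0.0]])  # true labels

# Loss: one value per training example.
per_example_loss = -(
    targets * torch.log(probs) + (1 - targets) * torch.log(1 - probs)
)

# Cost: the average of the per-example losses.
cost = per_example_loss.mean()

# nn.BCELoss uses reduction="mean" by default, so it returns the cost directly.
builtin_cost = nn.BCELoss()(probs, targets)

print("Per-example losses:", per_example_loss.flatten())
print("Cost (manual):     ", cost.item())
print("Cost (nn.BCELoss): ", builtin_cost.item())
```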

----

# ***Backward Pass (Backpropagation)***

<p>The calculated loss is propagated backward through the network, starting from the output layer and moving towards the input layer. Using the chain rule of calculus, the algorithm calculates the gradient of the loss function with respect to each weight and bias in the network. </p>
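
As a minimal sketch of what the backward pass computes, the example below uses a single "neuron" without an activation and a squared-error loss, then compares the gradients from PyTorch's autograd with the chain-rule result worked out by hand. The numbers are made up for illustration.

```python
import torch

w = torch.tensor(3.0, requires_grad=True)  # weight
b = torch.tensor(1.0, requires_grad=True)  # bias
x = torch.tensor(2.0)                      # input
target = torch.tensor(10.0)

# Forward pass
pred = w * x + b              # 3 * 2 + 1 = 7
loss = (pred - target) ** 2   # (7 - 10)^2 = 9

# Backward pass: autograd applies the chain rule from the loss back to w and b.
loss.backward()

# Chain rule by hand:
#   dL/dw = dL/dpred * dpred/dw = 2 * (pred - target) * x = 2 * (-3) * 2 = -12
#   dL/db = dL/dpred * dpred/db = 2 * (pred - target) * 1 = -6
print("dL/dw:", w.grad.item())
print("dL/db:", b.grad.item())
```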


# ***Optimization***
Optimization is the process of adjusting a model's parameters (weights and biases) to minimize a loss function.

---
#### ***Gradient ∇L(θ)***
The gradient of the loss function, denoted as ∇L(θ), is a vector of partial derivatives. Each element in this vector is the partial derivative of the loss function L with respect to one specific parameter θᵢ:

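$\nabla L(\theta) = \left[ \frac{\partial L}{\partial \theta_1}, \frac{\partial L}{\partial \theta_2}, \frac{\partial L}{\partial \theta_3}, \dots, \frac{\partial L}{\partial \theta_n} \right]$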

The gradient vector ∇L(θ) has two fundamental properties:
- Direction: It points in the direction of the steepest ascent of the loss function in the multi-dimensional parameter space.
- Magnitude: Its length or norm, ||∇L(θ)||, represents the steepness of that ascent.
----
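
The direction and magnitude properties of the gradient can be checked on a toy loss. The sketch below assumes a hypothetical two-parameter loss L(θ) = θ₁² + 3θ₂², computes its gradient with autograd, and confirms that a small step against the gradient lowers the loss while a step along it raises the loss.

```python
import torch

def toy_loss(theta):
    # Hypothetical loss, chosen only because its gradient is easy to verify by hand.
    return theta[0] ** 2 + 3 * theta[1] ** 2

theta = torch.tensor([1.0, 2.0], requires_grad=True)
loss = toy_loss(theta)
loss.backward()

gradient = theta.grad                   # [2*θ1, 6*θ2] = [2, 12]
gradient_norm = gradient.norm().item()  # sqrt(2^2 + 12^2) = sqrt(148)

step = 0.01
with torch.no_grad():
    loss_after_descent = toy_loss(theta - step * gradient).item()
    loss_after_ascent = toy_loss(theta + step * gradient).item()

print("Gradient:", gradient)
print("Norm:", gradient_norm)
print("Loss before / after descent / after ascent:",
      loss.item(), loss_after_descent, loss_after_ascent)
```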
#### ***Learning Rate (α)***
The learning rate is a hyperparameter that sets the size of each step taken in the direction opposite to the gradient.
- Too small: Training will be very slow.
- Too large: You might overshoot the minimum and bounce around, failing to converge.
----
#### ***Gradient Descent***
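The fundamental update rule for Gradient Descent is:

 *new_weights = old_weights - learning_rate * gradient*

In symbols, $\theta_{\text{new}} = \theta_{\text{old}} - \alpha \, \nabla L(\theta_{\text{old}})$: each step moves the parameters against the gradient, scaled by the learning rate.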
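
The update rule and the effect of the learning rate can be sketched on a one-parameter toy problem. The example below assumes a hypothetical loss (θ − 3)², whose minimum is at θ = 3, and runs 20 gradient descent steps with three different learning rates. The function name `run_gradient_descent` is made up for this illustration.

```python
import torch

def run_gradient_descent(learning_rate, steps=20):
    """Minimize the toy loss (theta - 3)^2 starting from theta = 0."""
    theta = torch.tensor(0.0, requires_grad=True)
    for _ in range(steps):
        loss = (theta - 3.0) ** 2
        loss.backward()
        with torch.no_grad():
            # new_weights = old_weights - learning_rate * gradient
            theta -= learning_rate * theta.grad
        theta.grad.zero_()  # clear the gradient before the next step
    return theta.item()

final_values = {lr: run_gradient_descent(lr) for lr in (0.01, 0.1, 1.1)}
for lr, value in final_values.items():
    print(f"learning_rate={lr}: theta after 20 steps = {value:.3f}")
```

With the smallest rate, θ is still far from 3 after 20 steps. With the middle rate it is close to 3. With the largest rate each step overshoots the minimum by more than the previous one, so θ diverges.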


#### ***Variants of Gradient Descent***

- Batch Gradient Descent (BGD): Calculates the gradient using the entire training dataset for one update.
- Stochastic Gradient Descent (SGD): Calculates the gradient using a single, randomly chosen data point for one update.
- Mini-Batch Gradient Descent: The compromise and the de facto standard. It calculates the gradient using a small batch of data (e.g., 32, 64, 256 samples).
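
In PyTorch, the three variants differ only in the batch size passed to the `DataLoader`. The sketch below counts how many parameter updates one epoch would perform for each variant, using 800 placeholder samples (the size of the training set here). The names `demo_dataset` and `updates_per_epoch` are assumptions for illustration.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

demo_dataset = TensorDataset(torch.randn(800, 2), torch.zeros(800, 1))

variants = {
    "batch": len(demo_dataset),  # the whole dataset in one batch
    "stochastic": 1,             # one sample per update
    "mini-batch": 32,            # a small batch per update
}

# One optimizer step per batch, so len(loader) is the number of updates per epoch.
updates_per_epoch = {
    name: len(DataLoader(demo_dataset, batch_size=batch_size, shuffle=True))
    for name, batch_size in variants.items()
}
print(updates_per_epoch)  # {'batch': 1, 'stochastic': 800, 'mini-batch': 25}
```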

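#### ***Momentum-Based Methods (Adding "Memory")***

- SGD with Momentum: Adds a decaying running sum of past gradients (a "velocity") to each update, which damps oscillations and speeds up progress along consistent directions.

#### ***Adaptive Learning Rate Methods (Per-Parameter Tuning)***

- Adagrad (Adaptive Gradient Algorithm): Scales each parameter's step by the inverse square root of its accumulated squared gradients, so parameters with large past gradients take smaller steps.
- RMSprop (Root Mean Square Propagation): Like Adagrad, but uses an exponentially decaying average of squared gradients, so the effective learning rate does not keep shrinking toward zero.
- Adam (Adaptive Moment Estimation): Combines momentum (a running average of gradients) with RMSprop-style scaling (a running average of squared gradients).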
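
All of these optimizers are available in `torch.optim` and share the same `zero_grad()` / `step()` interface, so swapping one for another is a one-line change. The sketch below takes one step with each on a tiny linear layer. The hyperparameter values (`lr=0.01`, `momentum=0.9`) and the random data are arbitrary choices for illustration, not recommendations.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
inputs = torch.randn(16, 2)
targets = torch.randn(16, 1)

# Each entry builds an optimizer for a given iterable of parameters.
optimizer_factories = {
    "SGD with momentum": lambda params: torch.optim.SGD(params, lr=0.01, momentum=0.9),
    "Adagrad": lambda params: torch.optim.Adagrad(params, lr=0.01),
    "RMSprop": lambda params: torch.optim.RMSprop(params, lr=0.01),
    "Adam": lambda params: torch.optim.Adam(params, lr=0.01),
}

weight_change = {}
for name, make_optimizer in optimizer_factories.items():
    layer = nn.Linear(2, 1)  # fresh layer for each optimizer
    optimizer = make_optimizer(layer.parameters())
    weights_before = layer.weight.detach().clone()

    loss = nn.functional.mse_loss(layer(inputs), targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # Largest absolute change in any weight after a single step.
    weight_change[name] = (layer.weight.detach() - weights_before).abs().max().item()

for name, change in weight_change.items():
    print(f"{name}: max weight change after one step = {change:.5f}")
```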

#### Training Process

In [None]:

loss_function = nn.BCELoss()  # Binary Cross-Entropy Loss for binary classification.
learning_rate = 0.01
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

epochs = 5

for epoch in range(epochs):
    model.train()

    for batch_features, batch_labels in train_loader:
        # Forward pass
        y_pred = model(batch_features)

        # Calculate loss for this batch.
        loss = loss_function(y_pred, batch_labels)

        # Clear gradients from the previous batch.
        optimizer.zero_grad()

        # Backward pass (Backpropagation)
        loss.backward()

        # Update the model's weights and biases.
        optimizer.step()


    model.eval()
    with torch.no_grad():
        full_train_preds = model(X_train)
        epoch_loss = loss_function(full_train_preds, y_train)
        print(f'Epoch {epoch+1}/{epochs} | Loss: {epoch_loss.item():.4f}')

Epoch 1/5 | Loss: 0.1657
Epoch 2/5 | Loss: 0.0401
Epoch 3/5 | Loss: 0.0231
Epoch 4/5 | Loss: 0.0180
Epoch 5/5 | Loss: 0.0155
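
A common variation, not used in this notebook, is to drop the final `nn.Sigmoid()` from the model and train on raw outputs (logits) with `nn.BCEWithLogitsLoss`, which applies the sigmoid internally in a numerically more stable way. The sketch below only checks that the two formulations give the same loss on made-up values. `logits` and `targets` are invented for illustration.

```python
import torch
import torch.nn as nn

logits = torch.tensor([[2.0], [-1.0], [0.5]])   # raw model outputs before sigmoid
targets = torch.tensor([[1.0], [0.0], [1.0]])

# What this notebook does: sigmoid in the model, then BCELoss.
loss_from_probs = nn.BCELoss()(torch.sigmoid(logits), targets)

# The alternative: no sigmoid in the model, BCEWithLogitsLoss on the logits.
loss_from_logits = nn.BCEWithLogitsLoss()(logits, targets)

print("BCELoss on probabilities:   ", loss_from_probs.item())
print("BCEWithLogitsLoss on logits:", loss_from_logits.item())
```

With that alternative, predictions at inference time would need an explicit `torch.sigmoid` to turn logits into probabilities.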


#### Evaluate Model

In [None]:


model.eval()  # Set model to evaluation mode

total_correct = 0
total_samples_in_test = 0

with torch.no_grad():
    for batch_features, batch_labels in test_loader:
        test_preds = model(batch_features)
        test_preds_labels = test_preds.round()

        total_correct += (test_preds_labels == batch_labels).sum().item()
        total_samples_in_test += batch_labels.size(0)

accuracy = (total_correct / total_samples_in_test) * 100
print(f"\nModel Accuracy on Test Set: {accuracy:.2f}%")




Model Accuracy on Test Set: 100.00%
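
The accuracy calculation above can be traced on a handful of values. The sketch below uses four made-up probabilities and labels, and shows that `.round()` here acts like a 0.5 decision threshold.

```python
import torch

probs = torch.tensor([[0.9], [0.2], [0.6], [0.4]])   # model outputs (invented)
labels = torch.tensor([[1.0], [0.0], [0.0], [0.0]])  # true classes (invented)

predicted = probs.round()              # 0.9 -> 1, 0.2 -> 0, 0.6 -> 1, 0.4 -> 0
thresholded = (probs >= 0.5).float()   # same decision written as a threshold

correct = (predicted == labels).sum().item()   # 3 of 4 match
accuracy = correct / labels.size(0) * 100

print(f"Correct: {correct} / {labels.size(0)}")
print(f"Accuracy: {accuracy:.2f}%")  # Accuracy: 75.00%
```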


#### Save Model

In [None]:
MODEL_SAVE_PATH = 'simple_classifier.pth'
torch.save(model.state_dict(), MODEL_SAVE_PATH)

##### Load Model

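`torch.save(model.state_dict(), ...)` stores only the model's parameters (its `state_dict`), not its architecture; `.pth` is just a conventional file extension. To load the weights in PyTorch, first create a model instance with the same architecture, then load the saved `state_dict` into it.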

In [None]:
# 1. Create a new instance of the model. It must have the same architecture.
loaded_model = SimpleClassifier(input_features=N_FEATURES, hidden_units=10)
print("Created a new, untrained model instance:")
print(loaded_model)

Created a new, untrained model instance:
SimpleClassifier(
  (linear1): Linear(in_features=2, out_features=10, bias=True)
  (linear2): Linear(in_features=10, out_features=1, bias=True)
  (relu): ReLU()
  (sigmoid): Sigmoid()
)


In [None]:
loaded_model.load_state_dict(torch.load(MODEL_SAVE_PATH))


<All keys matched successfully>
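
One way to confirm that a save/load round trip worked is to check that the original and restored models produce identical outputs for the same input. The sketch below does this with a stand-in model and a temporary directory. `build_model`, `probe`, and the file name `demo_weights.pth` are hypothetical and not part of the notebook.

```python
import os
import tempfile

import torch
import torch.nn as nn

def build_model():
    # Stand-in with the same layer stack as SimpleClassifier (2 -> 10 -> 1).
    return nn.Sequential(nn.Linear(2, 10), nn.ReLU(), nn.Linear(10, 1), nn.Sigmoid())

torch.manual_seed(0)
original = build_model()
restored = build_model()  # same architecture, different random weights

probe = torch.tensor([[0.5, -1.0]])

with tempfile.TemporaryDirectory() as tmp_dir:
    weights_path = os.path.join(tmp_dir, "demo_weights.pth")
    torch.save(original.state_dict(), weights_path)
    restored.load_state_dict(torch.load(weights_path))

original.eval()
restored.eval()
with torch.no_grad():
    outputs_match = torch.equal(original(probe), restored(probe))

print("Outputs match after loading:", outputs_match)
```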

#### Inference

In [None]:

point_1 = torch.tensor([[-2.5, 3.0]])
point_2 = torch.tensor([[6.8, -2.5]])


loaded_model.eval()  # evaluation mode; good practice before inference
with torch.no_grad():
    pred_1 = loaded_model(point_1)
    pred_2 = loaded_model(point_2)

print(f"Point 1 prediction: probability: {pred_1.item()} class: {int(pred_1.round().item())} ")
print(f"Point 2 prediction: probability: {pred_2.item()} class: {int(pred_2.round().item())} ")


Point 1 prediction: probability: 0.2791285812854767 class: 0 
Point 2 prediction: probability: 0.9995813965797424 class: 1 
