# Basics of installing PyTorch (CUDA) in Anaconda
- Open Anaconda Powershell Prompt
- Create new virtual environment: conda create -n py312 python=3.12
- Activate it: conda activate py312
- Install PyTorch using CONDA: conda install pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia
- Verify PyTorch installation: python -c "import torch; print(torch.__version__)"
- Verify CUDA availability: python -c "import torch; print(torch.cuda.is_available())"
### Alternative: CPU-only PyTorch
- Install PyTorch for CPU: conda install pytorch torchvision torchaudio cpuonly -c pytorch

## Tips to redirect your Jupyter Notebook kernel to the new environment
First activate your virtual environment in the Anaconda Powershell Prompt, then:
- #### Install ipykernel
conda install ipykernel

- #### Add the environment to Jupyter (e.g. if your virtual environment name is py312)
python -m ipykernel install --user --name=py312
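
To check that a notebook is really running on the new kernel, one option is to print the interpreter path from a notebook cell. This is only a small sketch: if the kernel was registered as described above, the printed paths would normally contain the environment name (`py312` in this example), but the exact location depends on your Anaconda installation.

```python
import sys

# Path of the Python interpreter that runs this notebook kernel
print(sys.executable)

# Root folder of the active environment
print(sys.prefix)
```

If the paths point to the base Anaconda installation instead, select the `py312` kernel from the notebook's kernel menu and run the cell again.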

In [3]:
# check that PyTorch is installed; otherwise follow the steps above to install it

import torch
torch.__version__

'2.5.1'

# Introduction to PyTorch
-------------------------------------------
A tensor can be viewed as a multi-dimensional array. Similar to how an n-dimensional vector is shown as a one-dimensional array with _n_ elements relative to a specific basis, any tensor can be expressed as a multi-dimensional array when referenced to a basis. The individual values within this multi-dimensional structure are referred to as the tensor's components.
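
As a small illustration of this idea, the sketch below builds tensors with zero, one, two, and three dimensions and prints the number of dimensions and the shape of each. The variable names and values are made up for the example.

```python
import torch

scalar = torch.tensor(3.0)                        # 0-dimensional: a single component
vector = torch.tensor([1.0, 2.0, 3.0])            # 1-dimensional: 3 components
matrix = torch.tensor([[1.0, 2.0], [3.0, 4.0]])   # 2-dimensional: 2 x 2 components
cube = torch.zeros(2, 3, 4)                       # 3-dimensional: 2 x 3 x 4 components

for name, t in [("scalar", scalar), ("vector", vector),
                ("matrix", matrix), ("cube", cube)]:
    # ndim is the number of dimensions, shape lists the size of each one
    print(name, t.ndim, tuple(t.shape))
```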

[PyTorch](https://pytorch.org/foundation) is an open-source machine learning library developed by Facebook's AI Research lab. It's known for its flexibility, intuitive design, and dynamic computational graph which makes debugging easier.
This library offers multi-dimensional tensor data structures and implements various mathematical functions to manipulate these tensors. It also includes numerous tools for efficient tensor serialisation and for handling arbitrary data types, along with several other practical utilities.

PyTorch shares significant similarities with NumPy, though it uses the term "tensor" instead of "N-dimensional array". For example,

In [4]:
import torch
import numpy as np

array_np = np.array([[1, 2, 3],
                    [4, 5, 6]])
array_pytorch = torch.tensor([[1, 2, 3],
                             [4, 5, 6]])
print(array_np)
print(array_pytorch)

[[1 2 3]
 [4 5 6]]
tensor([[1, 2, 3],
        [4, 5, 6]])


Now let us create tensors in PyTorch.

In [3]:
x = torch.tensor([1, 2, 3, 4]) # This creates a 1-dimensional tensor (vector) with 4 elements
x

tensor([1, 2, 3, 4])

In [29]:
# create specific tensors
zeros = torch.zeros(3, 4)  # 3x4 tensor of zeros
ones = torch.ones(2, 3)    # 2x3 tensor of ones
rand = torch.rand(2, 2)    # 2x2 tensor of random numbers (0-1)

rand

tensor([[0.8318, 0.4992],
        [0.3006, 0.1118]])

Let us now explore some common [tensor operations](https://pytorch.org/docs/stable/tensors.html).

In [30]:
# define tensors
a = torch.tensor([1, 2, 3])
b = torch.tensor([4, 5, 6])

In [31]:
a.shape # dimension of the tensor a

torch.Size([3])

In [32]:
c = torch.randn(4, 4)  # creates a 4x4 tensor with random values
c.view(2, 8)  # reshape to 2x8 tensor / view() reshapes a tensor without changing its data

tensor([[ 0.8371,  0.0040, -0.1456, -0.6450,  1.7360,  1.6073,  0.8404,  1.4880],
        [-1.3423,  0.0705,  0.8875,  0.2284, -0.2351, -0.8133, -0.3248,  1.1377]])

In [33]:
c = torch.randn((1, 2, 3, 4, 5))
c.squeeze().shape  # squeeze() removes all dimensions of size one

torch.Size([2, 3, 4, 5])

In [34]:
c.unsqueeze(dim=5).shape  # unsqueeze() inserts a new dimension of size one at index `dim`

torch.Size([1, 2, 3, 4, 5, 1])

In [35]:
a + b # element-wise addition of tensors

tensor([5, 7, 9])

In [36]:
a * b # element-wise multiplication of tensors

tensor([ 4, 10, 18])

In [37]:
a @ b  # matrix multiplication (note: both operands are 1-dimensional tensors)

tensor(32)

In [38]:
a / b  # element-wise division of tensors

tensor([0.2500, 0.4000, 0.5000])

In [39]:
torch.dot(a, b)  # scalar product of two vectors / element-wise multiplication followed by addition of tensors

tensor(32)

**Exercise: can you guess why the scalar product and matrix multiplication produce the same answer?**

A parallel [PyTorch CUDA](https://pytorch.org/docs/2.5/cuda.html) version is also available, allowing you to execute tensor calculations on supported NVIDIA GPUs (the minimum compute capability depends on the PyTorch build). In this course, because of time constraints, we restrict ourselves to a pre-defined dataset. If a future project of yours needs CUDA acceleration, install the CUDA version of PyTorch.
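
A common pattern for code that should run with or without a GPU is sketched below. It picks `cuda` when it is available and falls back to `cpu` otherwise; the tensor and the arithmetic are only placeholders for illustration.

```python
import torch

# Use the GPU if PyTorch can see one, otherwise stay on the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Create a tensor directly on the chosen device and do a small computation
x = torch.ones(2, 3, device=device)
y = (x * 2).sum()

print(device)
print(y.item())  # 12.0
```

The same code runs unchanged on a CPU-only installation, which is why this fallback is convenient in shared notebooks.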

# Classes
----------------------------------------------
Classes usually refer to the categories or labels that our neural network is supposed to predict.

Classification can be of two types:
- binary classification (yes or no/malignant or benign/dog or cat etc.) or
- multi-class classification (cat or dog or capybara/digit recognition/species of flowers etc.).

### Example 1: Binary classification
---------------------------------------

In [40]:
# Binary Classification - Email Spam Detection

# Simplified representation of email features
emails = [
    {"id": 1, "contains_promotional_words": True, "has_suspicious_links": True, "from_known_sender": False},
    {"id": 2, "contains_promotional_words": False, "has_suspicious_links": False, "from_known_sender": True},
    {"id": 3, "contains_promotional_words": True, "has_suspicious_links": False, "from_known_sender": True},
    {"id": 4, "contains_promotional_words": True, "has_suspicious_links": True, "from_known_sender": False}
]

# Example classification
classifications = {
    1: "Spam",      # Has promotional words and suspicious links, not from known sender
    2: "Not Spam",  # No promotional words or suspicious links, from known sender
    3: "Not Spam",  # Has promotional words but no suspicious links and from known sender
    4: "Spam"       # Has promotional words and suspicious links, not from known sender
}

print("Emails and their classifications:")
for email in emails:
    email_id = email["id"]
    features = f"Promotional words: {email['contains_promotional_words']}, " \
              f"Suspicious links: {email['has_suspicious_links']}, " \
              f"Known sender: {email['from_known_sender']}"
    classification = classifications[email_id]
    print(f"Email {email_id}: {features} → Classification: {classification}")

Emails and their classifications:
Email 1: Promotional words: True, Suspicious links: True, Known sender: False → Classification: Spam
Email 2: Promotional words: False, Suspicious links: False, Known sender: True → Classification: Not Spam
Email 3: Promotional words: True, Suspicious links: False, Known sender: True → Classification: Not Spam
Email 4: Promotional words: True, Suspicious links: True, Known sender: False → Classification: Spam


In this binary classification example, we have two possible classes: 'Spam' or 'Not Spam'. Features like promotional words, suspicious links, and whether the sender is known help determine the class.

### Example 2: Multi-class Classification
------------------------------------------------------

In [41]:
# Simple representation of plant features
plants = [
    {"id": 1, "height_cm": 15, "has_flowers": True, "leaf_shape": "round"},
    {"id": 2, "height_cm": 150, "has_flowers": False, "leaf_shape": "needle"},
    {"id": 3, "height_cm": 25, "has_flowers": True, "leaf_shape": "oval"},
    {"id": 4, "height_cm": 5, "has_flowers": False, "leaf_shape": "round"}
]

# Example classification
plant_classifications = {
    1: "Herb",
    2: "Tree",
    3: "Shrub",
    4: "Grass"
}

print("Plants and their classifications:")
for plant in plants:
    plant_id = plant["id"]
    features = f"Height: {plant['height_cm']}cm, " \
              f"Has flowers: {plant['has_flowers']}, " \
              f"Leaf shape: {plant['leaf_shape']}"
    classification = plant_classifications[plant_id]
    print(f"Plant {plant_id}: {features} → Classification: {classification}")

Plants and their classifications:
Plant 1: Height: 15cm, Has flowers: True, Leaf shape: round → Classification: Herb
Plant 2: Height: 150cm, Has flowers: False, Leaf shape: needle → Classification: Tree
Plant 3: Height: 25cm, Has flowers: True, Leaf shape: oval → Classification: Shrub
Plant 4: Height: 5cm, Has flowers: False, Leaf shape: round → Classification: Grass


In this multi-class classification example, we have four possible classes: 'Herb', 'Tree', 'Shrub', or 'Grass'. Features like height, presence of flowers, and leaf shape help determine the class. A machine learning classifier would learn to assign one of these multiple classes based on patterns in features.
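
Before a classifier can learn from such labels, the class names are usually mapped to integer indices. The sketch below shows one simple way to do this; the `class_to_idx` dictionary and the `labels` list are hypothetical names introduced only for this example.

```python
# Class names from the plant example above
class_names = ["Herb", "Tree", "Shrub", "Grass"]

# Map every class name to an integer index (0, 1, 2, 3)
class_to_idx = {name: idx for idx, name in enumerate(class_names)}

# Hypothetical labels for five plants
labels = ["Herb", "Tree", "Shrub", "Grass", "Tree"]

# Integer targets that a model can be trained on
targets = [class_to_idx[label] for label in labels]

print(class_to_idx)
print(targets)  # [0, 1, 2, 3, 1]
```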

------------------------------------------------------
## Fully-connected neural network

Okay, now that we know the basic terminology and have our libraries set up, let us try building our first network. For this, we will use the [iris dataset](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_iris.html) from scikit-learn. Before proceeding, make sure to install scikit-learn from your Anaconda Powershell Prompt (for example, `conda install scikit-learn`).

In [43]:
import matplotlib.pyplot as plt
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

Note: if you see an error message such as `No module named 'matplotlib'`, open your Anaconda Powershell Prompt, activate your environment, and install the missing library from there (for example, `conda install matplotlib`).

Once it is installed, restart the kernel.

### Step 1: Load and explore the Iris dataset
------------------------------------------
The [Iris dataset](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_iris.html) is a classic dataset in machine learning practice containing measurements of sepals and petals from three species of iris flowers.

In [44]:
from sklearn.datasets import load_iris

# load the dataset
iris = load_iris()

# extract features and target classes
X = iris.data
y = iris.target
feature_names = iris.feature_names
target_names = iris.target_names

# print to check the overall structure of our dataset
# and also to find how many classes we have

print(f"Dataset dimensions: {X.shape}")
print(f"Target classes: {target_names}")

Dataset dimensions: (150, 4)
Target classes: ['setosa' 'versicolor' 'virginica']


We now know that our dataset has 150 samples, 4 features, and 3 target classes.

### Step 2: Split data into training and testing sets
------------------------------------------

We now divide our data into training and testing sets in an 80:20 ratio. This means we will use 80% of our data for training and 20% for evaluating the model's performance.

In [45]:
# split data into training and testing sets with a seed for reproducibility
# X_train here contains training set for feature data
# y_train here contains target labels for training set, or what we want to predict, or the ground truth

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
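
To see what an 80:20 split produces, the sketch below applies `train_test_split` to stand-in arrays that have the same shape as the Iris data (150 samples, 4 features). The arrays are filled with placeholder values, so only the resulting shapes are meaningful here.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Stand-in data with the same shape as the Iris dataset
X_demo = np.zeros((150, 4))
y_demo = np.repeat([0, 1, 2], 50)   # 50 samples per class

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

print(X_tr.shape, X_te.shape)  # (120, 4) (30, 4)
print(y_tr.shape, y_te.shape)  # (120,) (30,)
```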

### Step 3: Standardise or scale the feature data
------------------------------------------

In [46]:
# standardise the feature data
scaler = StandardScaler()

# learn the parameter from training data and fit a transformer to it
# fit() - computes mean and std deviation to scale
# transform() - used to scale using mean and std deviation calculated using fit()
# fit_transform() - combination of both fit() and transform()
X_train = scaler.fit_transform(X_train)

# transform only (no fit), so that test-set statistics do not leak into the scaling
X_test = scaler.transform(X_test)
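
What the scaler does can be sketched by hand with NumPy. The toy arrays below are made up for illustration. The mean and standard deviation come from the training data only and are then reused for the test data, which mirrors the `fit_transform()` / `transform()` split above.

```python
import numpy as np

# Toy data: 3 training samples and 1 test sample with 2 features each
train = np.array([[1.0, 10.0],
                  [2.0, 20.0],
                  [3.0, 30.0]])
test = np.array([[2.0, 40.0]])

# "fit": statistics are computed from the training data only
mean = train.mean(axis=0)
std = train.std(axis=0)   # population standard deviation, as StandardScaler uses

# "transform": the same statistics are applied to both sets
train_scaled = (train - mean) / std
test_scaled = (test - mean) / std

print(train_scaled.mean(axis=0))  # approximately [0, 0]
print(train_scaled.std(axis=0))   # approximately [1, 1]
print(test_scaled)
```

Note that the scaled test data does not need to have zero mean or unit variance, because it is scaled with the training statistics.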

Now let us convert feature matrices to FloatTensor (tensor type for numerical data) and LongTensor (tensor type for integer labels).

In [47]:
X_train_tensor = torch.FloatTensor(X_train)
y_train_tensor = torch.LongTensor(y_train)


X_test_tensor = torch.FloatTensor(X_test)
y_test_tensor = torch.LongTensor(y_test)
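
An alternative way to write the same conversion is `torch.tensor` with an explicit `dtype`, which makes the resulting type visible in the code. The sketch below uses two made-up samples rather than the real training data.

```python
import numpy as np
import torch

# Made-up feature matrix (NumPy default float64) and integer labels
features = np.array([[5.1, 3.5],
                     [4.9, 3.0]])
labels = np.array([0, 2])

# Explicit dtypes: 32-bit floats for features, 64-bit integers for labels
X_t = torch.tensor(features, dtype=torch.float32)
y_t = torch.tensor(labels, dtype=torch.long)

print(features.dtype, "->", X_t.dtype)  # float64 -> torch.float32
print(y_t.dtype)                        # torch.int64
```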

### Step 4: Create tensor dataset and [data loader](https://www.eletreby.me/blog/getting-started-with-pytorch-dataset-and-dataloader) for batch training
-------------------------------------------------------

The `DataLoader` class wraps a `Dataset` and handles batching and shuffling; it can also use Python's multiprocessing to speed up data retrieval.

In [48]:
# Combine features and labels into a single dataset
train_dataset = TensorDataset(X_train_tensor, y_train_tensor)
train_loader = DataLoader(dataset=train_dataset, batch_size=len(X_train_tensor), shuffle=False)
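
The loader above puts the whole training set into a single batch. For comparison, the sketch below shows mini-batching with shuffling on a small random dataset. The dataset size of 10 and the batch size of 4 are arbitrary choices for illustration.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Small random dataset: 10 samples, 4 features, labels in {0, 1, 2}
X_demo = torch.randn(10, 4)
y_demo = torch.randint(0, 3, (10,))

demo_dataset = TensorDataset(X_demo, y_demo)
demo_loader = DataLoader(demo_dataset, batch_size=4, shuffle=True)

batch_sizes = []
for xb, yb in demo_loader:
    # Each iteration yields one batch of features and the matching labels
    batch_sizes.append(xb.shape[0])
    print(tuple(xb.shape), tuple(yb.shape))

print(batch_sizes)  # [4, 4, 2]
```

The last batch is smaller because 10 is not a multiple of 4.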

Finally, our dataset is ready for model definition, training, and evaluation.

The following sections will explain the model that we will utilise in this notebook.

## Multi-layer perceptron
---------------------------------------
As we have seen in the previous section, a [multi-layer perceptron](https://www.datacamp.com/tutorial/multilayer-perceptrons-in-machine-learning) is a type of [feedforward neural network (FNN)](https://deepai.org/machine-learning-glossary-and-terms/feed-forward-neural-network) composed of fully connected neurons with a non-linear activation function. It is commonly used to classify data that is not linearly separable.

![MLP](https://upload.wikimedia.org/wikipedia/commons/4/46/Colored_neural_network.svg)

### Input layer:
The input layer serves as the entry point for data into the neural network. Each neuron in this layer represents one feature from our dataset (such as petal length in the iris dataset). These neurons don't perform any computation - they simply pass the input values to the next layer.

### Hidden layer:
Hidden layers form the core computational engine of the neural network:

- Each neuron connects to all neurons in the previous and next layers
- These connections have associated weights that determine their importance
- The network "learns" by adjusting these weights during training
- Multiple hidden layers allow the network to build increasingly complex representations

### Output layer:
The output layer produces the final prediction or classification result. In our example dataset,

- Each output neuron typically represents a different class (setosa, versicolor, or virginica in the iris example)
- The number of output neurons depends on your specific task

#### Workflow:
- Information propagates in a forward direction through the network
- Within each (artificial) neuron, input signals are aggregated via a weighted sum operation
- This aggregated value is then passed through an activation function (introducing non-linearity). Common activation functions include [sigmoid](https://machinelearningmastery.com/a-gentle-introduction-to-sigmoid-function/), [tanh](https://www.geeksforgeeks.org/tanh-activation-in-neural-network/), [ReLU (Rectified Linear Unit)](https://medium.com/@gauravnair/the-spark-your-neural-network-needs-understanding-the-significance-of-activation-functions-6b82d5f27fbf#69d4), etc. In this course, we will be using rectified linear unit or ReLU as our activation function.
- The resulting output is then forwarded to neurons in the subsequent layer
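
The steps above can be sketched for a tiny layer of two neurons. This is only a minimal illustration: the input, weight, and bias values are made up, and `torch.relu` plays the role of the activation function.

```python
import torch

# One input sample with 3 features
x = torch.tensor([1.0, -2.0, 3.0])

# Weights and biases of a layer with 2 neurons (one row of W per neuron)
W = torch.tensor([[0.5, 0.25, -1.0],
                  [1.0, 0.0, 1.0]])
b = torch.tensor([0.5, 0.0])

# Weighted sum of the inputs for each neuron
z = W @ x + b

# ReLU keeps positive values and replaces negative values with zero
activation = torch.relu(z)

print(z)           # tensor([-2.5000,  4.0000])
print(activation)  # tensor([0., 4.])
```

The `activation` values are what would be forwarded to the neurons of the next layer.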


Check out the [Neural Network Playground](https://playground.tensorflow.org/) to visualise a neural network and experiment with settings such as learning rate, activation, regularisation, and problem type.

## Step 1: Define the MLP model

In [49]:
class MLP(nn.Module):
    def __init__(self, input_size, hidden_size, num_classes):
        """
        Initialise a simple feedforward MLP architecture.
        
        Parameters:
         input_size: Number of input features (e.g., 4 for Iris dataset)
         hidden_size: Number of neurons in the hidden layer
         num_classes: Number of output classes (e.g., 3 for Iris species)
        """
        super(MLP, self).__init__()
        
        # First layer (input to hidden)
        self.layer1 = nn.Linear(input_size, hidden_size)
        self.relu = nn.ReLU()
        
        # Second layer (hidden to hidden)
        self.layer2 = nn.Linear(hidden_size, hidden_size)
        
        # Output layer (hidden to output)
        self.output = nn.Linear(hidden_size, num_classes)
        
    def forward(self, x):
        """
        Define the forward pass through the network.
        
        Parameter:
         x: Input tensor of shape [batch_size, input_size]
            (a single sample of shape [input_size] also works)
        
        Returns:
         Output tensor of shape [batch_size, num_classes] with one raw score per class
        """

        # Forward pass through the network
        # Each step applies a linear transformation followed by a non-linear activation
        
        x = self.layer1(x)
        x = self.relu(x)
            
        x = self.layer2(x)
        x = self.relu(x)
            
        x = self.output(x)
        return x

## Step 2: Set model parameters

In [50]:
input_size = 4    # Assuming 4 features (like Iris dataset)
hidden_size = 15  # Neurons in hidden layer
num_classes = 3   # Output classes 

## Step 3: Initialise model

In [51]:
model = MLP(input_size, hidden_size, num_classes)
model

MLP(
  (layer1): Linear(in_features=4, out_features=15, bias=True)
  (relu): ReLU()
  (layer2): Linear(in_features=15, out_features=15, bias=True)
  (output): Linear(in_features=15, out_features=3, bias=True)
)
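
As a quick sanity check on the architecture printed above, the number of trainable parameters can be counted by hand and compared with what PyTorch reports. The sketch below rebuilds the same layer sizes with `nn.Sequential` so that it runs on its own. This is a stand-in for the `MLP` class, not a replacement for it.

```python
import torch
import torch.nn as nn

# Same layer sizes as the MLP above: 4 -> 15 -> 15 -> 3
sketch = nn.Sequential(
    nn.Linear(4, 15),
    nn.ReLU(),
    nn.Linear(15, 15),
    nn.ReLU(),
    nn.Linear(15, 3),
)

# Each linear layer has (inputs * outputs) weights plus one bias per output
expected = (4 * 15 + 15) + (15 * 15 + 15) + (15 * 3 + 3)
n_params = sum(p.numel() for p in sketch.parameters())
print(n_params, expected)  # 363 363

# A batch of 5 random samples gives one row of 3 class scores per sample
out = sketch(torch.randn(5, 4))
print(tuple(out.shape))  # (5, 3)
```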

## Step 4: Loss function
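
The cell for this step computes a sum of squared errors (SSE) between the network output and a one-hot encoded label. As a minimal worked sketch, the example below uses made-up prediction values and `torch.nn.functional.one_hot` in place of the manual indexing used in the cell.

```python
import torch
import torch.nn.functional as F

# Hypothetical network output for one sample with 3 classes
y_pred = torch.tensor([[0.2, 0.7, 0.1]])

# True class index of that sample
label = torch.tensor([1])

# One-hot encoding: class 1 of 3 becomes [0, 1, 0]
one_hot = F.one_hot(label, num_classes=3).float()

# SSE: (0.2 - 0)^2 + (0.7 - 1)^2 + (0.1 - 0)^2 = 0.04 + 0.09 + 0.01
sse = torch.sum((y_pred - one_hot) ** 2)

print(one_hot)               # tensor([[0., 1., 0.]])
print(round(sse.item(), 4))  # 0.14
```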

In [53]:
def calculate_loss(model, X, y_true):
    """
    Calculate loss for the model without training.
    
    Parameters:
     model: The MLP model
     X: Input features (torch tensor) - single sample, not batched
     y_true: True label (torch tensor) - single label, not batched
    
    Returns:
     loss: Sum of squared errors loss value
    """
    
    # Add batch dimension of 1 for model compatibility
    X_input = X.unsqueeze(0)  # Adds batch dimension [1, input_features]
    y_pred = model(X_input)   # y_pred shape: [1, num_classes]
    
    # Get number of classes from model's output layer
    num_classes = model.output.out_features
    
    # Create one-hot encoded label for a single sample
    # We are working with a classification problem that has 3 possible
    # classes (0, 1, and 2), so one-hot encoding gives:
    #
    #   Class 0 becomes: [1, 0, 0]
    #   Class 1 becomes: [0, 1, 0]
    #   Class 2 becomes: [0, 0, 1]
    y_true_one_hot = torch.zeros(1, num_classes)
    
    # Convert label tensor to integer and set the appropriate position to 1
    label_idx = y_true.item()
    y_true_one_hot[0, label_idx] = 1
    
    # Calculate sum of squared errors between prediction and one-hot label
    squared_errors = (y_pred - y_true_one_hot) ** 2
    loss = torch.sum(squared_errors)
    
    # Convert the loss tensor to a Python float and print it
    loss_value = loss.item()
    print(f"Current SSE loss: {loss_value:.4f}")
    
    # Return the loss as a Python float
    return loss_value
###############################################################
# Usage with individual data points
total_loss = 0
num_samples = 0

# Iterate through the dataset individually
for features, labels in train_loader:
    
    for i in range(features.size(0)):
        
        # Extract individual feature and label
        single_feature = features[i]  # Single feature
        single_label = labels[i]      # Single label
        
        # Calculate loss for individual sample
        loss = calculate_loss(model, single_feature, single_label)
        total_loss += loss
        num_samples += 1

# Calculate average loss across all processed samples
if num_samples > 0:
    avg_loss = total_loss / num_samples
    print(f"Average SSE loss across {num_samples} samples: {avg_loss:.4f}")

Current SSE loss: 1.4324
Current SSE loss: 1.3866
Current SSE loss: 0.5571
Current SSE loss: 1.4376
Current SSE loss: 1.3976
Current SSE loss: 1.5637
Current SSE loss: 0.5010
Current SSE loss: 1.4522
Current SSE loss: 1.4301
Current SSE loss: 1.4059
Current SSE loss: 1.5602
Current SSE loss: 0.4424
Current SSE loss: 0.5309
Current SSE loss: 1.4000
Current SSE loss: 1.4456
Current SSE loss: 0.5168
Current SSE loss: 1.4834
Current SSE loss: 1.4882
Current SSE loss: 0.5446
Current SSE loss: 1.3786
Current SSE loss: 0.4514
Current SSE loss: 1.3389
Current SSE loss: 0.4205
Current SSE loss: 1.4467
Current SSE loss: 1.3262
Current SSE loss: 0.4913
Current SSE loss: 1.4641
Current SSE loss: 1.4273
Current SSE loss: 1.4041
Current SSE loss: 0.5863
Current SSE loss: 1.5350
Current SSE loss: 1.4361
Current SSE loss: 1.4168
Current SSE loss: 1.4230
Current SSE loss: 0.4549
Current SSE loss: 1.4557
Current SSE loss: 0.4704
Current SSE loss: 1.2869
Current SSE loss: 1.4077
Current SSE loss: 0.5058
