# Principles of Deep Learning

Thinking about deep learning as a biological process of adaptation can be very helpful. Let's break down how a neural network learns, which, at its core, is a cycle of guessing, checking, and correcting.

Think of it like this: you're trying to teach a cell culture to respond to a new growth factor. You expose it, measure a response (e.g., protein expression), see how far off it is from the desired response, and then tweak the signaling pathway to get closer next time. The deep learning training loop is a mathematical formalization of that exact process.

This entire cycle is often called training the model. It consists of four key steps that are repeated over and over.


## 1. The Forward Pass: Making a Prediction ➡️
The forward pass (or forward propagation) is the process of your neural network making a guess. You take your input data—say, a set of gene expression values from an RNA-seq experiment—and pass it forward through the network's layers.

Each layer in the network is composed of "neurons" (nodes). A neuron receives inputs, performs a simple calculation, and passes the result to the next layer. This calculation is typically a weighted sum of its inputs, plus a value called a bias, which is then fed into an activation function.

- Weighted Sum: This works just like a linear regression: $z = (w_1 x_1 + w_2 x_2 + \dots) + b$. The weights ($w$) are the most important part; they are the internal parameters the network will "learn." Initially, they are random. They represent the strength of the connection between neurons, analogous to synaptic strength.

- Activation Function: This function introduces non-linearity, which is critical. A biological neuron either fires or it doesn't; it is not a simple linear switch. An activation function like ReLU (Rectified Linear Unit) or Sigmoid loosely mimics this. It takes the weighted sum ($z$) and decides what the neuron's output should be.
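
As a minimal sketch of the calculation described above, here is a single neuron written in plain Python. The input values, weights, and bias are hypothetical numbers picked only for illustration, and the helper names (`relu`, `sigmoid`, `neuron`) are not part of any library.

```python
import math

def relu(z):
    """ReLU: pass positive values through, clamp negatives to zero."""
    return max(0.0, z)

def sigmoid(z):
    """Sigmoid: squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron(inputs, weights, bias, activation):
    """Weighted sum of the inputs plus a bias, passed through an activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)

# Hypothetical values, chosen only for illustration.
x = [0.5, -1.0, 2.0]
w = [0.4, 0.3, -0.2]
b = 0.1

# Here z is about -0.4, so ReLU outputs 0 and sigmoid outputs a value below 0.5.
print("ReLU output:   ", neuron(x, w, b, relu))
print("Sigmoid output:", neuron(x, w, b, sigmoid))
```

The same weighted sum gives different outputs depending on the activation, which is why the choice of activation function matters for what each layer can express.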

This process continues layer by layer until the final layer produces an output: the network's prediction ($\hat{y}$). For example, it might output a single number between 0 and 1, representing the probability that your input gene expression profile corresponds to a "cancerous" cell.


## 2. Loss Calculation: Quantifying the Error 📉
Now that the network has made a prediction ($\hat{y}$), we need to tell it how wrong it was. The loss function (also called a cost function or objective function) does exactly this. It compares the network's prediction ($\hat{y}$) with the ground truth (the correct label, $y$), which you know from your experimental data.

The result is a single number called the loss. A high loss means the prediction was terrible; a low loss means it was pretty good.

A common loss function for binary classification tasks (like "cancerous" vs. "healthy") is Binary Cross-Entropy. The formula looks a bit intimidating, but the concept is simple: it heavily penalizes predictions that are confidently wrong.

$$\text{Loss} = -\left[\, y \log(\hat{y}) + (1 - y) \log(1 - \hat{y}) \,\right]$$

If the true label $y = 1$ and your model predicts $\hat{y} = 0.9$ (90% confident it's 1), the loss is small. If it predicts $\hat{y} = 0.1$, the loss is huge!


### Example of Loss calculation using Binary Cross-Entropy 

Formula:
        Loss = -[y*log(y_hat) + (1-y)*log(1-y_hat)]

Predictions: 0.9 and 0.1.
Ground truth = 1

In [6]:
import numpy as np

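# Apply the loss function with ground truth y = 1.
# The second term, (1 - y) * log(1 - y_hat), vanishes because 1 - y = 0.
y = 1
y9 = -(y*np.log(0.9) + (1-y)*np.log(1-0.9))
y1 = -(y*np.log(0.1) + (1-y)*np.log(1-0.1))

print(f'loss of prediction 0.9 = {y9:.2f}')
print(f'loss of prediction 0.1 = {y1:.2f}')

loss of prediction 0.9 = 0.11
loss of prediction 0.1 = 2.30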
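
The calculation above covers only the case where the ground truth is 1. As a hedged sketch, the general formula can be wrapped in a small function that handles both labels. The function name `binary_cross_entropy` and the clipping constant `eps` are assumptions made for this illustration; the clipping is there only to avoid taking `log(0)`.

```python
import math

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    """Binary cross-entropy for one sample.

    y_true is the label (0 or 1); y_pred is the predicted probability of class 1.
    """
    # Keep the prediction strictly inside (0, 1) so that log() is always defined.
    y_pred = min(max(y_pred, eps), 1.0 - eps)
    return -(y_true * math.log(y_pred) + (1 - y_true) * math.log(1.0 - y_pred))

# Confident and correct predictions give a small loss;
# confident and wrong predictions give a large one.
for y_true, y_pred in [(1, 0.9), (1, 0.1), (0, 0.1), (0, 0.9)]:
    loss = binary_cross_entropy(y_true, y_pred)
    print(f"y = {y_true}, y_hat = {y_pred}: loss = {loss:.4f}")
```

Note the symmetry: predicting 0.1 when the label is 0 costs the same as predicting 0.9 when the label is 1.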


## 3. Backward Propagation: Assigning Blame ⬅️
This is the magic of deep learning. Now that we have the loss, we need to figure out which weights in the network were most responsible for the error and how to change them to do better next time. This process is called backward propagation or backprop.

Using calculus (specifically the chain rule), backprop calculates the gradient of the loss with respect to every single weight and bias in the network. A gradient is essentially a vector that points in the direction of the steepest ascent of the loss function. Therefore, if we move the weights in the opposite direction of the gradient, we will decrease the loss.

Think of it as a "blame assignment" algorithm. It starts from the loss and works its way backward through the network, layer by layer, calculating how much each weight contributed to the final error. A weight that had a large impact on the wrong output will get a large gradient.
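
To make the chain rule concrete, here is a small sketch for the simplest possible "network": one input, one weight, one bias, a sigmoid activation, and binary cross-entropy as the loss. All numbers and function names are hypothetical. The analytic gradient is compared against a finite-difference estimate, which is a common way to sanity-check a hand-derived gradient.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(w, b, x):
    """Forward pass of a one-weight neuron: y_hat = sigmoid(w * x + b)."""
    return sigmoid(w * x + b)

def bce(y, y_hat):
    """Binary cross-entropy for a single sample."""
    return -(y * math.log(y_hat) + (1 - y) * math.log(1 - y_hat))

def analytic_gradients(w, b, x, y):
    """Chain rule: dLoss/dw = dLoss/dz * dz/dw, and likewise for b."""
    y_hat = forward(w, b, x)
    dloss_dz = y_hat - y      # for sigmoid + cross-entropy this product simplifies to (y_hat - y)
    return dloss_dz * x, dloss_dz   # dz/dw = x, dz/db = 1

def numeric_gradient(f, value, h=1e-6):
    """Central finite difference, used only as a check."""
    return (f(value + h) - f(value - h)) / (2 * h)

# Hypothetical starting point and training sample.
w, b, x, y = 0.5, -0.2, 1.5, 1.0

grad_w, grad_b = analytic_gradients(w, b, x, y)
num_w = numeric_gradient(lambda v: bce(y, forward(v, b, x)), w)
num_b = numeric_gradient(lambda v: bce(y, forward(w, v, x)), b)

print(f"dLoss/dw: analytic = {grad_w:.6f}, numeric = {num_w:.6f}")
print(f"dLoss/db: analytic = {grad_b:.6f}, numeric = {num_b:.6f}")
```

Both gradients come out negative here because the prediction is below the label, so moving against the gradient increases `w` and `b` and pushes the prediction toward 1. In a real network, the same multiplication of local derivatives is carried out layer by layer.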


## 4. Iteration and Optimization: Updating the Model 🛠️
The final step is to actually update the weights and biases using the gradients we just calculated. This is handled by an optimizer. The most fundamental optimizer is Stochastic Gradient Descent (SGD).

The update rule is simple:
new_weight = old_weight − (learning_rate × gradient)

The learning rate is a small number (e.g., 0.01) that controls how big of a step we take. It's a critical hyperparameter:

- Too high, and you might overshoot the optimal weights, like a clumsy scientist adding way too much reagent.

- Too low, and the model will learn excruciatingly slowly.
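
The effect of the learning rate can be seen on a toy problem. The sketch below minimizes the hypothetical loss `loss(w) = w**2`, whose gradient is `2 * w` and whose minimum is at `w = 0`. The three learning rates are illustrative values for this toy function only, not recommendations for real networks.

```python
def gradient_descent(learning_rate, start=1.0, steps=20):
    """Repeatedly apply new_weight = old_weight - learning_rate * gradient."""
    w = start
    for _ in range(steps):
        gradient = 2 * w          # derivative of the toy loss w**2
        w = w - learning_rate * gradient
    return w

# Hypothetical learning rates chosen to show the three regimes.
for lr in (1.1, 0.1, 0.001):
    final_w = gradient_descent(lr)
    print(f"learning_rate = {lr}: w after 20 steps = {final_w:.4f}")
```

With a rate of 1.1 every step overshoots the minimum and the weight grows in magnitude, with 0.1 the weight shrinks quickly toward 0, and with 0.001 it has barely moved after 20 steps.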

This entire four-step cycle—forward pass, loss calculation, backprop, and weight update—is one iteration. We repeat this process many times, usually by feeding the model data in small batches (e.g., 32 or 64 samples at a time). One full pass through the entire training dataset is called an epoch. After many epochs, the network's weights are finely tuned, and the loss is minimized. The model has learned!
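
The relationship between batches, iterations, and epochs can be sketched without any deep learning library. The dataset size, batch size, and the helper `iterate_minibatches` below are assumptions made for this illustration; in PyTorch, this bookkeeping is usually delegated to a data loader.

```python
import random

def iterate_minibatches(num_samples, batch_size, rng):
    """Yield lists of sample indices, covering the dataset once in random order."""
    indices = list(range(num_samples))
    rng.shuffle(indices)
    for start in range(0, num_samples, batch_size):
        yield indices[start:start + batch_size]

rng = random.Random(0)                 # fixed seed so the shuffling is repeatable
num_samples, batch_size, num_epochs = 100, 32, 3   # hypothetical sizes

iterations = 0
for epoch in range(num_epochs):
    for batch in iterate_minibatches(num_samples, batch_size, rng):
        # The forward pass, loss, backprop, and weight update
        # for this batch would happen here.
        iterations += 1

# 100 samples in batches of 32 gives 4 batches per epoch (32 + 32 + 32 + 4).
print(f"{iterations} iterations over {num_epochs} epochs")
```

The last batch of each epoch is smaller than the others because 100 is not a multiple of 32; training code typically either accepts that or drops the incomplete batch.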

## 5. Hands-On Example with PyTorch

Here is a simple, fully-commented script that demonstrates this entire process. We'll create a toy dataset where we try to classify points into two categories based on their (x, y) coordinates.

### LEARNING SOME PYTORCH BASICS
The torch.nn module provides everything you need to assemble a neural network. The key idea is modularity. You build a complex network by stacking together simpler pieces, much like building with LEGO blocks.

The main components you'll use from torch.nn are:

- Layers: These are the fundamental building blocks of a network. They take in tensors (the data), perform a specific mathematical operation, and pass the resulting tensors to the next layer. The most common ones are:
    - nn.Linear(in_features, out_features): A fully connected layer, which applies a linear transformation: y=Wx+b.
    - nn.Conv2d(...): A convolutional layer, essential for image processing tasks.
    - nn.RNN(...), nn.LSTM(...): Recurrent layers used for sequence data like text or time series.

- Activation Functions: These introduce non-linearity into your network, allowing it to learn complex patterns. Without them, any stack of linear layers would collapse into a single linear model. Common examples include:
    - nn.ReLU(): Rectified Linear Unit. It's simple and effective: f(x)=max(0,x).
    - nn.Sigmoid(): Squeezes values between 0 and 1, often used for binary classification outputs.
    - nn.Softmax(dim=1): Used for multi-class classification to convert raw scores into probabilities along the given dimension.

- Loss Functions: These measure how far your model's prediction is from the actual target (the ground truth). The goal of training is to minimize this value.
    - nn.MSELoss(): Mean Squared Error, common for regression tasks.
    - nn.BCELoss(): Binary Cross-Entropy, used for binary classification when the model outputs a probability.
    - nn.CrossEntropyLoss(): The standard choice for multi-class classification. It expects raw scores (logits) and applies log-softmax internally.

- Containers: These help you organize your layers into a cohesive model.
    - nn.Module: The base class for all neural network modules. When you define your own network, you'll create a class that inherits from nn.Module. This is crucial because it automatically tracks all the learnable parameters (weights and biases) within your layers.
    - nn.Sequential: A container that allows you to chain layers and operations together in a simple, linear stack.
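
As a short sketch of how these pieces fit together, the snippet below stacks two linear layers and a ReLU with nn.Sequential and counts the parameters that PyTorch tracks automatically. The layer sizes (10, 5, and 1) are arbitrary choices for illustration.

```python
import torch
import torch.nn as nn

# Chain the layers in the order the data should flow through them.
model = nn.Sequential(
    nn.Linear(in_features=10, out_features=5),   # 10*5 weights + 5 biases
    nn.ReLU(),                                   # no learnable parameters
    nn.Linear(in_features=5, out_features=1),    # 5*1 weights + 1 bias
)

# parameters() yields every learnable tensor registered in the model.
num_params = sum(p.numel() for p in model.parameters())
print("Learnable parameters:", num_params)       # 55 + 6 = 61

# A batch of 3 random samples with 10 features each.
output = model(torch.randn(3, 10))
print("Output shape:", output.shape)
```

nn.Sequential is convenient when the data flows straight through a stack of layers. Subclassing nn.Module is the more flexible option when the forward pass needs branching or other custom logic.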

#### A Small Example: Building a Partial Network

In [1]:
import torch
import torch.nn as nn

# 1. Define the network architecture by creating a class that inherits from nn.Module.
class SimpleNetwork(nn.Module):
    def __init__(self):
        # Call the constructor of the parent class (nn.Module)
        super(SimpleNetwork, self).__init__()

        # Define the layers you will use. nn.Module will track their parameters.
        self.layer1 = nn.Linear(in_features=10, out_features=5) # Input: 10, Hidden: 5
        self.activation = nn.ReLU()
        self.layer2 = nn.Linear(in_features=5, out_features=1) # Hidden: 5, Output: 1

    # The forward() method defines the sequence of operations.
    # This is where you specify how data flows through the network.
    def forward(self, x):
        x = self.layer1(x)
        x = self.activation(x)
        x = self.layer2(x)
        return x

In [2]:
# 2. Create an instance of the network.
model = SimpleNetwork()
print('"Simple Network" model architecture:')
print(model)

"Simple Network" model architecture:
SimpleNetwork(
  (layer1): Linear(in_features=10, out_features=5, bias=True)
  (activation): ReLU()
  (layer2): Linear(in_features=5, out_features=1, bias=True)
)


In [3]:
# 3. Create some dummy input data.
# Let's pretend we have a batch of 3 samples, each with 10 features.
input_tensor = torch.randn(3,10)
input_tensor

tensor([[-0.7015,  1.4560, -1.5479, -0.2399, -0.5083,  0.8717,  0.0032, -0.8095,
          1.3964,  0.5748],
        [-1.3217,  0.8034,  1.8372,  0.4521, -0.9674,  0.5541, -0.8576,  0.0545,
         -0.7165, -1.3152],
        [ 0.4965, -0.3070,  0.9838, -0.5722, -0.9212,  1.3286, -0.0956,  0.1977,
         -0.5860,  2.6606]])

In [4]:
# 4. Pass the data through the model (this calls the forward() method).
output = model(input_tensor)

In [5]:
print('\nShape of the input tensor:', input_tensor.shape)
print('\nShape of the output tensor:', output.shape)
print('\nOutput from the model:', '\n', output)


Shape of the input tensor: torch.Size([3, 10])

Shape of the output tensor: torch.Size([3, 1])

Output from the model: 
 tensor([[ 0.0202],
        [-0.1091],
        [-0.0347]], grad_fn=<AddmmBackward0>)


### Now move to a complete example

In [59]:
import torch
import torch.nn as nn

# --- 0. Data Preparation ---
# Let's create some synthetic data. Imagine these are two features from your cells.
# We want to classify a cell as type 0 or type 1.
# Features (e.g., expression of Gene A, expression of Gene B)

X = torch.tensor([[1.0, 1.0], [1.0, 4.0], [2.0, 2.0], [4.0, 3.0], [5.0, 4.0], [6.0, 5.0]], dtype=torch.float32)
# Labels (Ground Truth: Type 0 or Type 1)
Y = torch.tensor([[0], [0], [0], [1], [1], [1]], dtype=torch.float32)

print(f'Ground truth values (6 cells with a type 0 or 1): \n{Y}')
print(f'Cell type tensor shape: \n{Y.shape}')
print(f'\nRelative expression of gene A and B per cell: \n{X}')
print(f'Gene expression tensor shape: \n{X.shape}')

Ground truth values (6 cells with a type 0 or 1): 
tensor([[0.],
        [0.],
        [0.],
        [1.],
        [1.],
        [1.]])
Cell type tensor shape: 
torch.Size([6, 1])

Relative expression of gene A and B per cell: 
tensor([[1., 1.],
        [1., 4.],
        [2., 2.],
        [4., 3.],
        [5., 4.],
        [6., 5.]])
Gene expression tensor shape: 
torch.Size([6, 2])


In [60]:
# --- Define the Model Architecture ---
# A very simple neural network with one input layer, one hidden layer, and one output layer.

class SimpleNeuralNet(nn.Module):
    def __init__(self):
        super(SimpleNeuralNet, self).__init__()
        self.features1 = nn.Linear(2, 4) # Input features=2, hidden neurons=4
        self.relu = nn.ReLU() # Non-linear activation function
        self.features2 = nn.Linear(4, 1) # Hidden neurons=4, output=1 (a single score for type 1)
        self.sigmoid = nn.Sigmoid() # Squashes output to a probability (0 to 1)

    # The forward() method defines the sequence of operations.
    def forward(self, x):
        out = self.features1(x)
        out = self.relu(out)
        out = self.features2(out)
        out = self.sigmoid(out)
        return out

In [62]:
# Create an instance of the model and check its untrained prediction for the first cell.
class_model = SimpleNeuralNet()
print('"Simple Neural Network" model architecture:')
print(class_model)
print(f"\nInitial (random) prediction for first data point: {class_model(X[0]).item():.4f}\n")

"Simple Neural Network" model architecture:
SimpleNeuralNet(
  (features1): Linear(in_features=2, out_features=4, bias=True)
  (relu): ReLU()
  (features2): Linear(in_features=4, out_features=1, bias=True)
  (sigmoid): Sigmoid()
)

Initial (random) prediction for first data point: 0.5624



In [63]:
# --- Define Loss and Optimizer ---
learning_rate = 0.01
loss_function = nn.BCELoss() # Binary Cross-Entropy Loss
optimizer = torch.optim.SGD(class_model.parameters(), lr=learning_rate) # Stochastic Gradient Descent
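
As a quick sanity check, nn.BCELoss can be compared with the Binary Cross-Entropy formula from section 2. This sketch assumes the default behaviour of nn.BCELoss, which averages the loss over all samples in the batch; the two predictions are the same illustrative values used earlier (0.9 and 0.1, both with ground truth 1).

```python
import math
import torch
import torch.nn as nn

# Two predictions for samples whose true label is 1.
predictions = torch.tensor([[0.9], [0.1]])
targets = torch.tensor([[1.0], [1.0]])

loss_fn = nn.BCELoss()                       # default reduction is the mean over samples
torch_loss = loss_fn(predictions, targets).item()

# Same quantity by hand: average of -log(0.9) and -log(0.1).
manual_loss = (-math.log(0.9) - math.log(0.1)) / 2

print(f"nn.BCELoss: {torch_loss:.4f}")
print(f"By hand:    {manual_loss:.4f}")
```

nn.BCELoss expects probabilities between 0 and 1, which is why the model ends with a sigmoid layer.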

In [65]:
# --- The Training Loop ---
num_epochs = 100000

# ----------------------------------------------
#  THE FOUR STEPS OF DEEP LEARNING IN A FOR LOOP
# ----------------------------------------------

for epoch in range(num_epochs):
    # 1. FORWARD PASS: Make a prediction
    predictions = class_model(X)
    
    # 2. LOSS CALCULATION: Quantify the error
    loss = loss_function(predictions, Y)
    
    # 3. BACKWARD PROPAGATION: Calculate gradients ("assign blame")
    # First, clear old gradients from the previous iteration
    optimizer.zero_grad()
    # Now, perform backpropagation
    loss.backward()

    # 4. UPDATE WEIGHTS: Adjust the model's parameters
    optimizer.step()

    # Show the training progress every 10,000 epochs
    if (epoch + 1) % 10000 == 0:
        print(f'Epoch [{epoch+1}/{num_epochs}], Loss: {loss.item():.4f}')

print("\n--- Training Finished ---")

Epoch [10000/100000], Loss: 0.0002
Epoch [20000/100000], Loss: 0.0002
Epoch [30000/100000], Loss: 0.0002
Epoch [40000/100000], Loss: 0.0002
Epoch [50000/100000], Loss: 0.0002
Epoch [60000/100000], Loss: 0.0002
Epoch [70000/100000], Loss: 0.0001
Epoch [80000/100000], Loss: 0.0001
Epoch [90000/100000], Loss: 0.0001
Epoch [100000/100000], Loss: 0.0001

--- Training Finished ---
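
Once training has finished, the model is typically used to classify new samples. The sketch below shows one common pattern: disable gradient tracking and threshold the predicted probability at 0.5. To keep the snippet self-contained it uses an untrained stand-in model with the same 2 → 4 → 1 shape, so its predictions are meaningless; the helper `predict_classes`, the threshold, and the two "new cells" are all assumptions for illustration. With the trained model from above, the call would be `predict_classes(class_model, new_cells)`.

```python
import torch
import torch.nn as nn

def predict_classes(model, inputs, threshold=0.5):
    """Return hard 0/1 labels from a model that outputs probabilities."""
    model.eval()                  # put layers such as dropout into inference mode
    with torch.no_grad():         # gradients are not needed for prediction
        probabilities = model(inputs)
    return (probabilities >= threshold).float()

# Untrained stand-in with the same layer sizes; swap in a trained model in practice.
stand_in_model = nn.Sequential(
    nn.Linear(2, 4), nn.ReLU(), nn.Linear(4, 1), nn.Sigmoid()
)

# Hypothetical expression values (Gene A, Gene B) for two new cells.
new_cells = torch.tensor([[1.5, 2.0], [5.5, 4.5]])

predicted = predict_classes(stand_in_model, new_cells)
print(predicted)
```

The 0.5 threshold is a convention rather than a rule; it can be raised or lowered when false positives and false negatives have different costs.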
