### Day 2: Forward Propagation

Goal:
Understand how input moves through a neural network to produce an output.

Core equation:
z = Wx + b

a = activation(z)


In [7]:
import torch           # PyTorch library for tensor computations
import torch.nn as nn  # PyTorch's neural network module for defining neural networks


In [8]:
print("Single Neuron Forward Pass")

# Input features
x = torch.tensor([2.0, 3.0])    # Input vector of size 2 (2 features)

# Weights
w = torch.tensor([0.5, 0.8])   # Weights for each input feature

# Bias
b = torch.tensor(1.0)          # Single scalar bias for the neuron

# Forward pass
z = torch.dot(w, x) + b    # Weighted sum plus bias: multiply each weight by its input, sum the products, add b
a = torch.relu(z)          # ReLU clips negative values: a = 0 if z < 0, otherwise a = z

print("Input:", x)
print("Weights:", w)
print("Bias:", b)
print("z = w·x + b =", z.item())
print("Output after ReLU:", a.item())


Single Neuron Forward Pass
Input: tensor([2., 3.])
Weights: tensor([0.5000, 0.8000])
Bias: tensor(1.)
z = w·x + b = 4.400000095367432
Output after ReLU: 4.400000095367432


Each input has a weight

Weighted sum + bias = z

Activation function gives final output

<img src="image.png" alt="Alt text" width="700" height="400">
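In the run above z is positive, so ReLU leaves it unchanged. The sketch below reuses the same input and weights but swaps in a hypothetical bias of -5.0 (not part of the original example) to show what happens when the weighted sum is negative.

```python
import torch

x = torch.tensor([2.0, 3.0])      # same input as above
w = torch.tensor([0.5, 0.8])      # same weights as above
b_neg = torch.tensor(-5.0)        # hypothetical bias, chosen only to push z below zero

z_neg = torch.dot(w, x) + b_neg   # 1.0 + 2.4 - 5.0 = -1.6
a_neg = torch.relu(z_neg)         # negative z is clipped to 0

print("z:", z_neg.item())
print("Output after ReLU:", a_neg.item())
```

The neuron's output is 0 here: for this input, the neuron is inactive.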

In [None]:
print("\nSingle Layer Forward Pass")

# Input (2 features)
x = torch.tensor([2.0, 3.0])          # x1,x2

# Weight matrix: 3 neurons (rows) × 2 inputs (columns); rows are [w11, w12], [w21, w22], [w31, w32]
W = torch.tensor([[0.1, 0.2],
                  [0.3, 0.4],
                  [0.5, 0.6]])

# Bias for each neuron                # b1 b2 b3
b = torch.tensor([0.1, 0.2, 0.3])

# Forward pass         
z = torch.matmul(W, x) + b            # matrix multiplication (W·x) + bias
a = torch.relu(z)

print("Input:", x)
print("z = W·x + b =", z)             # z is a vector of size 3 (one for each neuron)
print("Output after ReLU:", a)        # a is also a vector of size 3 (one for each neuron)



Single Layer Forward Pass
Input: tensor([2., 3.])
z = W·x + b = tensor([0.9000, 2.0000, 3.1000])
Output after ReLU: tensor([0.9000, 2.0000, 3.1000])


Each row of W = one neuron

We now get multiple outputs

This is a layer
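To make "each row of W = one neuron" concrete, the sketch below computes every neuron separately with the single-neuron formula and compares the result with the matrix version. The variable names `z_layer` and `z_rows` are introduced here for illustration only.

```python
import torch

x = torch.tensor([2.0, 3.0])
W = torch.tensor([[0.1, 0.2],
                  [0.3, 0.4],
                  [0.5, 0.6]])
b = torch.tensor([0.1, 0.2, 0.3])

# Whole layer at once
z_layer = torch.matmul(W, x) + b

# One neuron at a time: row i of W dotted with x, plus bias i
z_rows = torch.stack([torch.dot(W[i], x) + b[i] for i in range(W.shape[0])])

print(z_layer)
print(z_rows)
print("Same result:", torch.allclose(z_layer, z_rows))
```

The matrix multiplication is just the three single-neuron computations done together.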

### Full MLP Forward Pass (2 Layers)

Input (2) → Hidden (3) → Output (1)


In [None]:
print("\nComplete MLP Forward Pass")

# Input
x = torch.tensor([2.0, 3.0])

# Layer 1: Input → Hidden
W1 = torch.tensor([[0.1, 0.2],
                   [0.3, 0.4],
                   [0.5, 0.6]])
b1 = torch.tensor([0.1, 0.2, 0.3])

# Layer 2: Hidden → Output
W2 = torch.tensor([[0.7, 0.8, 0.9]])
b2 = torch.tensor([0.4])

# Forward pass
z1 = torch.matmul(W1, x) + b1  # z1 has size 3 (hidden layer output before activation)
a1 = torch.relu(z1)            # a1 is the output of hidden layer

z2 = torch.matmul(W2, a1) + b2 # z2 has size 1 (output layer value before activation)
output = torch.sigmoid(z2)     # sigmoid function = 1/(1+exp(-z))

print("Hidden layer output:", a1)
print("Final output:", output.item())     # item() to get scalar value from single-element tensor



Complete MLP Forward Pass
Hidden layer output: tensor([0.9000, 2.0000, 3.1000])
Final output: 0.9955922961235046


Multiply input by weights.

Add bias.

Apply activation.

Pass to next layer.

Repeat until output.
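The five steps above can be written as a loop. This is a minimal sketch, assuming each layer is stored as a `(W, b, activation)` tuple; the `forward` helper and the `layers` list are hypothetical names, not PyTorch API.

```python
import torch

def forward(x, layers):
    """Run x through a list of (W, b, activation) tuples, one per layer."""
    a = x
    for W, b, activation in layers:
        z = torch.matmul(W, a) + b   # multiply input by weights, add bias
        a = activation(z)            # apply activation, then pass to next layer
    return a

# Same weights and biases as the 2-layer MLP above
layers = [
    (torch.tensor([[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]),
     torch.tensor([0.1, 0.2, 0.3]),
     torch.relu),
    (torch.tensor([[0.7, 0.8, 0.9]]),
     torch.tensor([0.4]),
     torch.sigmoid),
]

out = forward(torch.tensor([2.0, 3.0]), layers)
print("Final output:", out.item())
```

With the same weights, this reproduces the final output of the manual forward pass (about 0.9956).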

### Same Network Using PyTorch Layers


In [14]:
print("\nUsing PyTorch nn.Sequential")

model = nn.Sequential(     # Container that applies the layers in order
    nn.Linear(2, 3),       # Input → Hidden: 2 inputs, 3 neurons; weights and biases are randomly initialized
    nn.ReLU(),             # Applied element-wise to the 3 hidden-layer outputs
    nn.Linear(3, 1),       # Hidden → Output: 3 inputs, 1 neuron; weights and bias are randomly initialized
    nn.Sigmoid()           # Squashes the single output into the range (0, 1)
)

x = torch.tensor([2.0, 3.0])
output = model(x)

print("Model output:", output.item())



Using PyTorch nn.Sequential
Model output: 0.6329331398010254




#### 1. Input (2 features) → [Hidden Layer: 3 neurons] → [Output Layer: 1 neuron]

       Input Layer        Hidden Layer                 Output Layer
          [x₁]               [h₁]
          [x₂]      →        [h₂]      → ReLU →            [y]       → Sigmoid
                             [h₃]



#### 2. Complete Line-by-Line Breakdown

#### Line 1: Creating the Model Structure

    model = nn.Sequential(

What this does:

nn.Sequential is a container that stacks layers in sequence

Think of it as a pipeline where data flows through each layer in order: Input → Layer 1 → Layer 2 → ... → Output

Analogy: a factory assembly line:

Raw Material → Machine 1 → Machine 2 → Final Product
   (Input)    (Layer 1)   (Layer 2)     (Output)
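The pipeline idea can be checked directly: an `nn.Sequential` can be iterated layer by layer, and feeding the data through each layer by hand gives the same result as calling the container. This is a small sketch; the name `seq_model` and the seed value are arbitrary choices for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # arbitrary seed so the random weights are reproducible

seq_model = nn.Sequential(
    nn.Linear(2, 3),
    nn.ReLU(),
    nn.Linear(3, 1),
    nn.Sigmoid(),
)
x = torch.tensor([2.0, 3.0])

# Walk the assembly line by hand: the output of one layer is the input of the next
h = x
for layer in seq_model:
    h = layer(h)
    print(type(layer).__name__, "->", tuple(h.shape))

print("Matches seq_model(x):", torch.allclose(h, seq_model(x)))
```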

#### Line 2: First Layer (Input → Hidden)

    nn.Linear(2, 3),   # Input → Hidden

Breaking down nn.Linear(2, 3):

This creates a fully connected layer (also called dense layer)

2: Number of input features (must match your data)

3: Number of neurons in this layer

What happens INSIDE this layer:

Weight Matrix Created: Size = 3 × 2 (3 neurons, each with 2 weights)


Weights = [[w₁₁, w₁₂],  # Neuron 1: weights for x₁ and x₂
           [w₂₁, w₂₂],  # Neuron 2: weights for x₁ and x₂  
           [w₃₁, w₃₂]]  # Neuron 3: weights for x₁ and x₂

Bias Vector Created: Size = 3 (one bias per neuron)


Bias = [b₁, b₂, b₃]

Mathematical Operation (for input [x₁, x₂]):


For each neuron i:

    hᵢ = (wᵢ₁ × x₁) + (wᵢ₂ × x₂) + bᵢ

In matrix form:

    [h₁]   [w₁₁  w₁₂]          [b₁]
    [h₂] = [w₂₁  w₂₂] × [x₁] + [b₂]
    [h₃]   [w₃₁  w₃₂]   [x₂]   [b₃]

Example with numbers (illustrative values standing in for the random initial weights):

```
 Suppose after initialization:
 Weights = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
 Bias = [0.1, 0.2, 0.3]
 Input x = [2.0, 3.0]

 Calculation:
 h₁ = (0.1×2) + (0.2×3) + 0.1 = 0.2 + 0.6 + 0.1 = 0.9
 h₂ = (0.3×2) + (0.4×3) + 0.2 = 0.6 + 1.2 + 0.2 = 2.0
 h₃ = (0.5×2) + (0.6×3) + 0.3 = 1.0 + 1.8 + 0.3 = 3.1

 Output from Linear(2,3): [0.9, 2.0, 3.1]
```
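The shapes and the hand calculation for this layer can be checked on a real `nn.Linear(2, 3)`. The sketch below overwrites the random parameters with the example values (copying values in like this is done purely for demonstration, not something a training workflow would normally do).

```python
import torch
import torch.nn as nn

layer = nn.Linear(2, 3)
print("Weight shape:", tuple(layer.weight.shape))  # 3 neurons x 2 inputs
print("Bias shape:", tuple(layer.bias.shape))      # one bias per neuron

# Replace the random initial values with the example values
with torch.no_grad():
    layer.weight.copy_(torch.tensor([[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]))
    layer.bias.copy_(torch.tensor([0.1, 0.2, 0.3]))

h = layer(torch.tensor([2.0, 3.0]))
print("Layer output:", h)
```

The output agrees with the hand-computed values 0.9, 2.0, and 3.1, up to floating-point rounding.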

#### Line 3: Activation Function (ReLU)

    nn.ReLU(),

What ReLU does:

ReLU = Rectified Linear Unit

Formula: f(x) = max(0, x)

Applied element-wise to each neuron's output

Continuing our example:


Input to ReLU: [0.9, 2.0, 3.1]
Apply ReLU: max(0, value) for each value

h₁ after ReLU = max(0, 0.9) = 0.9
h₂ after ReLU = max(0, 2.0) = 2.0  
h₃ after ReLU = max(0, 3.1) = 3.1

Output: [0.9, 2.0, 3.1] (all positive, so unchanged)

Why ReLU?

Introduces non-linearity (without it, stacked linear layers collapse into a single linear layer)

Fast computation (just max(0, x))

Helps with the vanishing gradient problem (its gradient is 1 for every positive input)
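The non-linearity point can be verified numerically. The sketch below removes the activation between the two example layers and shows that the result equals one linear layer with combined weights `W2·W1` and combined bias `W2·b1 + b2`. The names `W_combined` and `b_combined` are introduced here for illustration.

```python
import torch

W1 = torch.tensor([[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]])
b1 = torch.tensor([0.1, 0.2, 0.3])
W2 = torch.tensor([[0.7, 0.8, 0.9]])
b2 = torch.tensor([0.4])
x = torch.tensor([2.0, 3.0])

# Two linear layers with NO activation in between
stacked = W2 @ (W1 @ x + b1) + b2

# The same mapping written as a single linear layer
W_combined = W2 @ W1        # shape (1, 2)
b_combined = W2 @ b1 + b2   # shape (1,)
single = W_combined @ x + b_combined

print("Stacked:", stacked)
print("Single: ", single)
print("Equivalent:", torch.allclose(stacked, single))
```

Because the two are equivalent, extra linear layers add no expressive power on their own; a non-linear activation such as ReLU between them is what changes that.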

#### Line 4: Second Layer (Hidden → Output)

    nn.Linear(3, 1),   # Hidden → Output

Breaking down nn.Linear(3, 1):

3: Input features (must match output size of previous layer)

1: Number of neurons in output layer

What happens INSIDE:

Weight Matrix: Size = 1 × 3 (1 neuron with 3 weights)


Weights = [[w₁, w₂, w₃]]  # Single neuron, 3 inputs

Bias: Single value

    Bias = [b]

Mathematical Operation:

    y = (w₁ × h₁) + (w₂ × h₂) + (w₃ × h₃) + b

Continuing our example (illustrative values standing in for the random initial weights):

    Suppose after initialization:
    Weights = [[0.7, 0.8, 0.9]]
    Bias = [0.4]
    Input from ReLU layer: [0.9, 2.0, 3.1]

    Calculation:
    y = (0.7×0.9) + (0.8×2.0) + (0.9×3.1) + 0.4
      = 0.63 + 1.6 + 2.79 + 0.4
      = 5.42

#### Line 5: Output Activation (Sigmoid)

    nn.Sigmoid()

What Sigmoid does:

Formula: f(x) = 1 / (1 + e⁻ˣ)

Squashes output between 0 and 1

Continuing our example:

Input to Sigmoid: 5.42
Apply Sigmoid: 1 / (1 + e⁻⁵·⁴²)

Calculate:
e⁻⁵·⁴² ≈ 0.0044
1 + 0.0044 = 1.0044
1 / 1.0044 ≈ 0.9956

Final output: 0.9956

Why Sigmoid at output?

Produces values between 0 and 1

Can be interpreted as probability

Good for binary classification (yes/no, 1/0)
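The sigmoid arithmetic above, and its use for a yes/no decision, can be sketched in plain Python. The 0.5 cutoff is a common convention assumed here for illustration; the original text does not prescribe a threshold.

```python
import math

def sigmoid(z):
    """Logistic sigmoid: 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

p = sigmoid(5.42)                # value from the worked example
label = 1 if p >= 0.5 else 0     # assumed threshold of 0.5

print("Probability:", round(p, 4))
print("Predicted class:", label)
```

Note that sigmoid(0) is exactly 0.5, so with this threshold the sign of the pre-activation value decides the class.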

#### Line 6: Model Definition Complete

    )

Closes the nn.Sequential container

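Model is now defined but not trained yet: its weights are random, so its output (0.6329 in the run above) differs from the worked example's 0.9956 and changes each time the model is created, unless a random seed is fixed.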
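To connect the line-by-line walkthrough with the manual MLP, the sketch below builds the same architecture and copies the example weights into it. The name `demo_model` is hypothetical, and overwriting parameters by hand is done only to make the numbers match the worked example.

```python
import torch
import torch.nn as nn

demo_model = nn.Sequential(
    nn.Linear(2, 3),
    nn.ReLU(),
    nn.Linear(3, 1),
    nn.Sigmoid(),
)

# Index 0 is the first Linear layer, index 2 is the second
with torch.no_grad():
    demo_model[0].weight.copy_(torch.tensor([[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]))
    demo_model[0].bias.copy_(torch.tensor([0.1, 0.2, 0.3]))
    demo_model[2].weight.copy_(torch.tensor([[0.7, 0.8, 0.9]]))
    demo_model[2].bias.copy_(torch.tensor([0.4]))

out = demo_model(torch.tensor([2.0, 3.0]))
print("Model output:", out.item())
```

With these values the model returns roughly 0.9956, the same result as the hand calculation.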



### Batch Example (Concept Only)

Concept:

Instead of one input, we process many at once.

This is called a batch.

In [None]:
print("\nBatch Forward Pass")

# Batch of 3 samples
batch_input = torch.tensor([[2.0, 3.0],        # Shape (3, 2): 3 samples (rows), each with 2 features
                            [1.0, 4.0],
                            [0.5, 0.5]])

batch_output = model(batch_input)

print("Batch input shape:", batch_input.shape)
print("Batch output shape:", batch_output.shape)
print(batch_output)                              # Shape (3, 1): one output value per sample



Batch Forward Pass
Batch input shape: torch.Size([3, 2])
Batch output shape: torch.Size([3, 1])
tensor([[0.6329],
        [0.6117],
        [0.6239]], grad_fn=<SigmoidBackward0>)
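A batch is processed row by row in effect: each sample gets the same output it would get on its own. The sketch below checks this on a freshly created model (named `batch_model` here, with an arbitrary seed), so its numbers will not match the outputs printed above.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # arbitrary seed so the random weights are reproducible

batch_model = nn.Sequential(
    nn.Linear(2, 3),
    nn.ReLU(),
    nn.Linear(3, 1),
    nn.Sigmoid(),
)

batch_input = torch.tensor([[2.0, 3.0],
                            [1.0, 4.0],
                            [0.5, 0.5]])

# All three samples in one call: shape (3, 2) in, shape (3, 1) out
batch_output = batch_model(batch_input)

# The same samples one at a time, stacked back into shape (3, 1)
per_sample = torch.stack([batch_model(row) for row in batch_input])

print("Batch output shape:", tuple(batch_output.shape))
print("Matches per-sample results:", torch.allclose(batch_output, per_sample, atol=1e-6))
```

Batching does not change the results; it lets the same computation run on many samples in a single matrix operation.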
