# PyTorch Tutorial: Building Neural Networks

Now that you understand tensors and gradients, it's time to build your first neural network! This notebook covers the building blocks of neural networks in PyTorch.

## Learning Objectives

By the end of this notebook, you will:
- Understand what neural networks are and how they work
- Learn to use `nn.Module` to create custom networks
- Understand different types of layers (Linear, activation functions, etc.)
- Build your first neural network from scratch
- Understand the forward pass through a network

---

## What is a Neural Network?

A **neural network** is a series of connected layers that transform input data into output predictions. Think of it as a pipeline:

**Input → Layer 1 → Layer 2 → ... → Layer N → Output**

Each layer:
1. Takes input data
2. Applies a transformation (usually: multiply by weights, add bias, apply activation)
3. Produces output that becomes input to the next layer

### Key Components:
- **Weights**: Parameters that the network learns
- **Biases**: Additional parameters for each layer
- **Activation Functions**: Non-linear functions that allow networks to learn complex patterns
- **Layers**: Building blocks that perform specific operations
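
The transformation described above (multiply by weights, add bias, apply activation) can be sketched by hand for a single layer. This is a minimal illustration, not how PyTorch layers are normally written; the values of `inp`, `W`, and `b` are made up purely for the example.

```python
import torch

# Hypothetical toy values, chosen only for illustration
inp = torch.tensor([[1.0, 2.0]])          # one sample with 2 features
W = torch.tensor([[0.5, -1.0],
                  [1.5,  0.5],
                  [-1.0, 0.25]])          # weights: 3 neurons x 2 inputs
b = torch.tensor([0.0, 1.0, -0.5])        # biases: one per neuron

z = inp @ W.T + b      # multiply by weights, add bias -> [[-1.5, 3.5, -1.0]]
a = torch.relu(z)      # apply activation              -> [[0.0, 3.5, 0.0]]

print("Before activation:", z)
print("After activation: ", a)
```

The output `a` would then be the input to the next layer in the pipeline.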



## Setting Up

Let's import the necessary modules:


In [None]:
import torch
import torch.nn as nn  # nn module contains all neural network components
import torch.nn.functional as F  # Functional interface for operations
import matplotlib.pyplot as plt
import numpy as np

# Set random seed for reproducibility
torch.manual_seed(42)

print("PyTorch version:", torch.__version__)


## Understanding nn.Module

`nn.Module` is the base class for all neural network components in PyTorch. It provides:
- Automatic registration of parameters, so autograd can track their gradients
- Easy parameter management
- Device management (CPU/GPU)
- Model saving and loading capabilities
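
As a rough sketch of the capabilities listed above, the snippet below uses a single `nn.Linear` layer (which is itself an `nn.Module`). The variable names `layer`, `restored`, and `device` are illustrative assumptions, and the state is kept in memory rather than written to disk.

```python
import torch
import torch.nn as nn

layer = nn.Linear(2, 3)  # nn.Linear is itself an nn.Module

# Parameter management: parameters are registered automatically
param_names = [name for name, _ in layer.named_parameters()]
print(param_names)  # ['weight', 'bias']

# Device management: move every parameter with one call
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
layer = layer.to(device)

# Saving and loading: state_dict() maps parameter names to tensors
state = layer.state_dict()
restored = nn.Linear(2, 3)
restored.load_state_dict(state)  # restored now holds the same values as layer
```

In practice the state dict is usually written to a file with `torch.save` and read back with `torch.load`.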

Let's create our first simple network:


In [None]:
# Create a simple neural network class
class SimpleNet(nn.Module):
    """
    A simple neural network with one hidden layer.
    
    Architecture:
    Input (2 features) → Hidden Layer (3 neurons) → Output (1 value)
    """
    
    def __init__(self):
        # Call parent class constructor
        super(SimpleNet, self).__init__()
        
        # Define layers
        # nn.Linear(input_size, output_size)
        # This creates: output = input @ weight.T + bias
        self.hidden = nn.Linear(2, 3)  # 2 inputs → 3 outputs
        self.output = nn.Linear(3, 1)  # 3 inputs → 1 output
    
    def forward(self, x):
        """
        Define the forward pass (how data flows through the network).
        This is called automatically when you do: model(input)
        """
        # Pass through hidden layer
        x = self.hidden(x)  # Shape: (batch, 2) → (batch, 3)
        
        # Apply activation function (we'll learn about these next)
        x = torch.relu(x)  # ReLU: max(0, x)
        
        # Pass through output layer
        x = self.output(x)  # Shape: (batch, 3) → (batch, 1)
        
        return x

# Create an instance of our network
model = SimpleNet()

print("Model created!")
print("Model structure:")
print(model)
print()

# Let's see what parameters the model has
print("Model parameters:")
for name, param in model.named_parameters():
    print(f"{name}: shape {param.shape}, requires_grad={param.requires_grad}")


### Testing Our Network

Let's pass some data through our network:


In [None]:
# Create some sample input data
# Shape: (batch_size, input_features)
# batch_size = 3 (3 samples), input_features = 2
x = torch.tensor([[1.0, 2.0],
                  [3.0, 4.0],
                  [5.0, 6.0]])

print("Input shape:", x.shape)
print("Input data:")
print(x)
print()

# Forward pass through the network
# This automatically calls the forward() method
output = model(x)

print("Output shape:", output.shape)
print("Output data:")
print(output)
print()

# Note: The network hasn't been trained yet, so these outputs come from
# randomly initialized weights and are not meaningful predictions.
# We'll learn how to train networks in the next notebook.


## Understanding Layers

Let's explore the most common types of layers in PyTorch:


### 1. Linear Layer (Fully Connected / Dense Layer)

A Linear layer performs: `output = input @ weight.T + bias`

This is the most common layer type in neural networks.


In [None]:
# Create a linear layer
# nn.Linear(in_features, out_features)
linear = nn.Linear(in_features=4, out_features=3)

print("Linear layer:")
print(linear)
print()

# Check the parameters
print("Weight shape:", linear.weight.shape)  # (out_features, in_features) = (3, 4)
print("Bias shape:", linear.bias.shape)      # (out_features,) = (3,)
print()

# Create some input data
x = torch.randn(2, 4)  # batch_size=2, features=4
print("Input shape:", x.shape)
print("Input:")
print(x)
print()

# Forward pass
output = linear(x)
print("Output shape:", output.shape)  # (batch_size, out_features) = (2, 3)
print("Output:")
print(output)
print()

# Manual calculation (for understanding):
# output = x @ weight.T + bias
manual_output = x @ linear.weight.T + linear.bias
print("Manual calculation matches:", torch.allclose(output, manual_output))


### 2. Activation Functions

Activation functions introduce **non-linearity** into neural networks. Without them, a stack of linear layers collapses into a single linear transformation, no matter how many layers you add!
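
To see why, here is a small sketch showing that two linear layers with no activation between them compute the same thing as one linear layer. The layer sizes and the names `f1`, `f2`, `W`, and `b` are arbitrary choices for this illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
f1 = nn.Linear(4, 8)
f2 = nn.Linear(8, 3)

# Fold both layers into a single weight matrix and bias:
# f2(f1(x)) = (x @ W1.T + b1) @ W2.T + b2 = x @ (W2 @ W1).T + (W2 @ b1 + b2)
W = f2.weight @ f1.weight            # (3, 8) @ (8, 4) -> (3, 4)
b = f2.weight @ f1.bias + f2.bias    # shape (3,)

sample = torch.randn(5, 4)
stacked = f2(f1(sample))             # two layers, no activation in between
collapsed = sample @ W.T + b         # one equivalent linear transformation

print("Same result:", torch.allclose(stacked, collapsed, atol=1e-5))
```

Placing a non-linear function such as ReLU between `f1` and `f2` breaks this equivalence, which is what lets deeper networks represent more than a single linear map.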

Common activation functions:


In [None]:
# Create some input values
x = torch.linspace(-5, 5, 100)

# ReLU (Rectified Linear Unit): max(0, x)
# Most common activation function
relu = torch.relu(x)

# Sigmoid: 1 / (1 + exp(-x))
# Outputs between 0 and 1
sigmoid = torch.sigmoid(x)

# Tanh: (exp(x) - exp(-x)) / (exp(x) + exp(-x))
# Outputs between -1 and 1
tanh = torch.tanh(x)

# Visualize activation functions
plt.figure(figsize=(12, 4))

plt.subplot(1, 3, 1)
plt.plot(x.numpy(), relu.numpy(), 'b-', linewidth=2)
plt.title('ReLU: max(0, x)', fontsize=12)
plt.xlabel('x', fontsize=10)
plt.ylabel('y', fontsize=10)
plt.grid(True, alpha=0.3)
plt.axhline(y=0, color='k', linestyle='--', linewidth=0.5)
plt.axvline(x=0, color='k', linestyle='--', linewidth=0.5)

plt.subplot(1, 3, 2)
plt.plot(x.numpy(), sigmoid.numpy(), 'r-', linewidth=2)
plt.title('Sigmoid: 1/(1+exp(-x))', fontsize=12)
plt.xlabel('x', fontsize=10)
plt.ylabel('y', fontsize=10)
plt.grid(True, alpha=0.3)
plt.axhline(y=0.5, color='k', linestyle='--', linewidth=0.5)
plt.axvline(x=0, color='k', linestyle='--', linewidth=0.5)

plt.subplot(1, 3, 3)
plt.plot(x.numpy(), tanh.numpy(), 'g-', linewidth=2)
plt.title('Tanh', fontsize=12)
plt.xlabel('x', fontsize=10)
plt.ylabel('y', fontsize=10)
plt.grid(True, alpha=0.3)
plt.axhline(y=0, color='k', linestyle='--', linewidth=0.5)
plt.axvline(x=0, color='k', linestyle='--', linewidth=0.5)

plt.tight_layout()
plt.show()

print("Key differences:")
print("- ReLU: Simple, fast, most common. Outputs 0 for negative inputs.")
print("- Sigmoid: Smooth, outputs 0-1. Good for probabilities.")
print("- Tanh: Smooth, outputs -1 to 1. Similar to sigmoid but centered at 0.")


### 3. Sequential: Building Networks Easily

`nn.Sequential` allows you to build networks without writing a class. It's great for simple architectures where each layer feeds directly into the next:


In [None]:
# Build a network using Sequential
# This has the same architecture as our SimpleNet (with its own random weights)
# but is simpler to write
sequential_model = nn.Sequential(
    nn.Linear(2, 3),      # Hidden layer: 2 → 3
    nn.ReLU(),            # Activation
    nn.Linear(3, 1)       # Output layer: 3 → 1
)

print("Sequential model:")
print(sequential_model)
print()

# Test it
x = torch.tensor([[1.0, 2.0]])
output = sequential_model(x)
print("Input:", x)
print("Output:", output)
print()

# Access individual layers
print("First layer (index 0):", sequential_model[0])
print("First layer weight shape:", sequential_model[0].weight.shape)
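
Index-based access can get hard to read as a model grows. One option, sketched below, is to pass an `OrderedDict` to `nn.Sequential` so each layer gets a name. The names `hidden`, `act`, and `output` are arbitrary labels chosen for this example.

```python
from collections import OrderedDict

import torch
import torch.nn as nn

# Same 2 -> 3 -> 1 architecture, but with named layers
named_model = nn.Sequential(OrderedDict([
    ("hidden", nn.Linear(2, 3)),
    ("act", nn.ReLU()),
    ("output", nn.Linear(3, 1)),
]))

print(named_model)
print("Hidden weight shape:", named_model.hidden.weight.shape)  # torch.Size([3, 2])

# Integer indexing still works alongside the names
print("Same layer:", named_model[0] is named_model.hidden)

named_out = named_model(torch.tensor([[1.0, 2.0]]))
print("Output shape:", named_out.shape)
```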


## Building a More Complex Network

Let's build a network for a classification problem (e.g., classifying images):


In [None]:
class ClassificationNet(nn.Module):
    """
    A neural network for classification.
    
    Architecture:
    Input (784 features, e.g., flattened 28x28 image)
    → Hidden Layer 1 (128 neurons) + ReLU
    → Hidden Layer 2 (64 neurons) + ReLU
    → Output Layer (10 classes, e.g., digits 0-9), returned as raw scores (logits)
    """
    
    def __init__(self, input_size=784, hidden1=128, hidden2=64, num_classes=10):
        super(ClassificationNet, self).__init__()
        
        # Define layers
        self.fc1 = nn.Linear(input_size, hidden1)  # Fully connected layer 1
        self.fc2 = nn.Linear(hidden1, hidden2)       # Fully connected layer 2
        self.fc3 = nn.Linear(hidden2, num_classes)   # Output layer
        
        # Note: We don't define ReLU here because we'll use F.relu in forward()
        # This is a common pattern
    
    def forward(self, x):
        # Flatten input if needed (for images)
        # x shape: (batch, height, width) → (batch, height*width)
        x = x.view(x.size(0), -1)  # -1 means "figure out this dimension"
        
        # Layer 1
        x = self.fc1(x)
        x = F.relu(x)  # Apply ReLU activation
        
        # Layer 2
        x = self.fc2(x)
        x = F.relu(x)
        
        # Output layer (no activation here - a loss such as nn.CrossEntropyLoss
        # applies log-softmax to these raw scores internally)
        x = self.fc3(x)
        
        return x

# Create the model
model = ClassificationNet()
print("Classification Network:")
print(model)
print()

# Count parameters
total_params = sum(p.numel() for p in model.parameters())
trainable_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"Total parameters: {total_params:,}")
print(f"Trainable parameters: {trainable_params:,}")
print()

# Test with dummy data
dummy_input = torch.randn(5, 784)  # 5 samples, 784 features
output = model(dummy_input)
print(f"Input shape: {dummy_input.shape}")
print(f"Output shape: {output.shape}")  # (5, 10) - 5 samples, 10 class scores
print("Output (class scores):")
print(output)
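
The class scores printed above are raw logits, not probabilities. The sketch below shows one way to turn logits into probabilities and predicted classes with softmax, and checks that `nn.CrossEntropyLoss` expects the raw logits. The tensors `logits` and `targets` are random stand-ins, not real model outputs or labels.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(5, 10)             # stand-in for a (5, 10) model output
targets = torch.randint(0, 10, (5,))    # hypothetical class labels

# Softmax over the class dimension turns scores into probabilities
probs = F.softmax(logits, dim=1)
predicted = probs.argmax(dim=1)
print("Each row sums to 1:", torch.allclose(probs.sum(dim=1), torch.ones(5)))
print("Predicted classes:", predicted)

# CrossEntropyLoss takes raw logits and applies log-softmax itself
loss = nn.CrossEntropyLoss()(logits, targets)
manual = F.nll_loss(F.log_softmax(logits, dim=1), targets)
print("Losses match:", torch.allclose(loss, manual))
```

This is why the output layer is usually left without an activation when the model is trained with a cross-entropy loss.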


## Understanding the Forward Pass

Let's trace through what happens during a forward pass:


In [None]:
# Create a simple example to trace through
simple_model = nn.Sequential(
    nn.Linear(3, 4),
    nn.ReLU(),
    nn.Linear(4, 2)
)

# Create input
x = torch.tensor([[1.0, 2.0, 3.0]])
print("Input:", x)
print("Input shape:", x.shape)
print()

# Trace through each layer
print("Forward pass:")
print("-" * 50)

# After first linear layer
x1 = simple_model[0](x)
print(f"After Linear(3→4): {x1}")
print(f"  Shape: {x1.shape}")

# After ReLU
x2 = simple_model[1](x1)
print(f"After ReLU: {x2}")
print(f"  Shape: {x2.shape}")

# After second linear layer
x3 = simple_model[2](x2)
print(f"After Linear(4→2): {x3}")
print(f"  Shape: {x3.shape}")
print()

# Compare with direct forward pass
direct_output = simple_model(x)
print("Direct forward pass output:", direct_output)
print("Matches:", torch.allclose(x3, direct_output))
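
Calling each layer by hand works for small models. For larger ones, a forward hook can record the same information automatically. The sketch below is one possible approach; `traced_model`, `record_shape`, and `shapes` are names made up for this example.

```python
import torch
import torch.nn as nn

traced_model = nn.Sequential(
    nn.Linear(3, 4),
    nn.ReLU(),
    nn.Linear(4, 2),
)

shapes = []

def record_shape(module, inputs, output):
    # Called by PyTorch after each hooked module's forward pass
    shapes.append((module.__class__.__name__, tuple(output.shape)))

# Attach the hook to every layer, run one forward pass, then clean up
handles = [layer.register_forward_hook(record_shape) for layer in traced_model]
traced_model(torch.tensor([[1.0, 2.0, 3.0]]))
for handle in handles:
    handle.remove()

for layer_name, shape in shapes:
    print(f"{layer_name}: output shape {shape}")
```

Removing the handles afterwards keeps the hooks from firing on later forward passes.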


## Common Layer Types (Reference)

Here's a quick reference of other common layers you'll encounter:


In [None]:
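# Dropout: Randomly zeroes some values during training (helps prevent overfitting)
dropout = nn.Dropout(p=0.5)  # 50% dropout rate
x = torch.randn(3, 4)
print("Before dropout:", x)
# Modules start in training mode, so dropout is active here.
# Surviving values are scaled by 1 / (1 - p) = 2; dropout.eval() turns dropout off.
print("After dropout (training):", dropout(x))  # Some values become 0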
print()

# Batch Normalization: Normalizes inputs to each layer (helps training)
batch_norm = nn.BatchNorm1d(4)  # For 1D data with 4 features
x = torch.randn(3, 4)  # (batch, features)
print("Before batch norm:", x)
print("After batch norm:", batch_norm(x))
print()

# Note: We'll learn about Conv2d (for images) and other layers in later notebooks
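
Dropout behaves differently in training and evaluation mode, which is easy to miss. The sketch below uses an input of all ones so the effect is visible; the names `drop` and `ones` are illustrative.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
drop = nn.Dropout(p=0.5)
ones = torch.ones(2, 6)

# Training mode: each value is either zeroed or scaled by 1 / (1 - p) = 2
drop.train()
train_out = drop(ones)
print("Training mode:", train_out)

# Evaluation mode: dropout is a no-op and the input passes through unchanged
drop.eval()
eval_out = drop(ones)
print("Eval mode:    ", eval_out)
```

Calling `model.train()` or `model.eval()` on a full network switches every dropout and batch norm layer inside it at once.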


## Practice Exercises

### Exercise 1: Build a Simple Network
Create a network with:
- Input: 10 features
- Hidden layer: 20 neurons with ReLU
- Output: 5 classes

### Exercise 2: Count Parameters
For the network in Exercise 1, calculate:
- Number of weights in the first layer
- Number of biases in the first layer
- Total parameters

### Exercise 3: Forward Pass
Create input data of shape (3, 10) and pass it through your network. What's the output shape?


## Solutions to Exercises

### Exercise 1 Solution


In [None]:
# Exercise 1 Solution
class ExerciseNet(nn.Module):
    def __init__(self):
        super(ExerciseNet, self).__init__()
        self.fc1 = nn.Linear(10, 20)
        self.fc2 = nn.Linear(20, 5)
    
    def forward(self, x):
        x = self.fc1(x)
        x = F.relu(x)
        x = self.fc2(x)
        return x

model = ExerciseNet()
print("Network:")
print(model)


### Exercise 2 Solution


In [None]:
# Exercise 2 Solution
print("First layer (fc1):")
print(f"  Weight shape: {model.fc1.weight.shape}")  # (20, 10)
print(f"  Number of weights: {model.fc1.weight.numel()}")  # 20 * 10 = 200
print(f"  Bias shape: {model.fc1.bias.shape}")  # (20,)
print(f"  Number of biases: {model.fc1.bias.numel()}")  # 20
print()

total_params = sum(p.numel() for p in model.parameters())
print(f"Total parameters: {total_params}")
print(f"  Breakdown: (10*20 + 20) + (20*5 + 5) = 200 + 20 + 100 + 5 = 325")
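
The breakdown above follows a general rule: a linear layer has `in_features * out_features` weights plus `out_features` biases. The helper below, `linear_param_count`, is a hypothetical function written for this illustration and is not part of PyTorch.

```python
import torch.nn as nn

def linear_param_count(in_features, out_features):
    """Weights plus biases for one fully connected layer."""
    return in_features * out_features + out_features

def mlp_param_count(sizes):
    """Total parameters for a stack of linear layers with the given sizes."""
    return sum(linear_param_count(i, o) for i, o in zip(sizes, sizes[1:]))

print(mlp_param_count([10, 20, 5]))         # 325, the Exercise 1 network
print(mlp_param_count([784, 128, 64, 10]))  # 109386, the ClassificationNet sizes

# Cross-check the formula against a real layer
check_layer = nn.Linear(10, 20)
actual = sum(p.numel() for p in check_layer.parameters())
print(actual == linear_param_count(10, 20))
```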


### Exercise 3 Solution


In [None]:
# Exercise 3 Solution
x = torch.randn(3, 10)  # 3 samples, 10 features
print("Input shape:", x.shape)

output = model(x)
print("Output shape:", output.shape)  # (3, 5) - 3 samples, 5 class scores
print("Output:")
print(output)


## Key Takeaways

1. **nn.Module**: Base class for all neural networks in PyTorch
2. **forward()**: Defines how data flows through the network
3. **nn.Linear**: Fully connected layer (most common layer type)
4. **Activation Functions**: Introduce non-linearity (ReLU, sigmoid, tanh)
5. **nn.Sequential**: Easy way to build simple networks where layers run one after another
6. **Parameters**: Weights and biases are automatically tracked for gradient computation
7. **Forward Pass**: Simply call `model(input)` to get predictions

## What's Next?

In the next notebook, we'll learn about:
- **Training Neural Networks**: How to train your models
- **Loss Functions**: Measuring how wrong predictions are
- **Optimizers**: Algorithms that update network parameters
- **Training Loop**: The complete process of training a model

Now that you can build networks, it's time to teach them to learn!

---

**Excellent work! You can now build neural networks! 🎉**
