# Lecture 1: Neural Network Fundamentals + MNIST
## Conceptual Understanding Approach





## The Journey Begins
### From Human Vision to Machine Learning

> "How do we teach machines to recognize what we see instantly?"

Today's Goal: Build a system that can recognize handwritten digits like a human child learning to read numbers.





## The Brain Analogy

### How Does Your Brain Recognize a Number?
🧠 **Your brain**: Billions of neurons working together  
🔗 **Connections**: Neurons send signals to each other  
⚡ **Learning**: Connections strengthen with experience  
🎯 **Recognition**: Patterns emerge from experience  

### Can We Simulate This?
Yes! That's exactly what neural networks do.





## What is a Neural Network?

### Simple Analogy: A Decision Committee
- Each "neuron" is like a committee member
- They receive information (inputs)
- They make a decision (output)
- They vote on the final answer

### The Magic: Learning from Examples
Just like how you learned to recognize numbers by seeing many examples!
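
To make the committee picture concrete, here is a minimal sketch of a single "committee member" in plain Python. The pixel values and weights are made up for illustration (nothing here is learned), and the helper names `relu` and `neuron` are hypothetical, not part of any library.

```python
def relu(value):
    """Pass positive values through; clamp negatives to zero."""
    return max(0.0, value)

def neuron(inputs, weights, bias):
    """One 'committee member': weighted sum of the inputs plus a bias, then ReLU."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    return relu(weighted_sum + bias)

# Three made-up pixel intensities and hand-picked (not learned) weights.
pixels = [0.0, 0.5, 1.0]
member_a = neuron(pixels, weights=[0.2, 0.4, 0.6], bias=0.1)
member_b = neuron(pixels, weights=[-1.0, -1.0, -1.0], bias=0.0)

print("member A says:", member_a)  # positive: this member "votes" for the pattern
print("member B says:", member_b)  # zero: this member stays silent
```

Training is the process of finding good values for `weights` and `bias` instead of picking them by hand.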





## Meet MNIST: The "Hello World" of Computer Vision

### Why Handwritten Digits?
✅ **Simple**: Only 10 classes (0-9)  
✅ **Visual**: We can see what's happening  
✅ **Practical**: Real-world application  
✅ **Achievable**: We can get great results  

### The Challenge
28×28 pixels = 784 numbers → "What digit is this?"
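
As a rough illustration of where the 784 comes from, the sketch below flattens a fake 28×28 grid into one long list, row by row. The pixel values are placeholders (each one is just its own position), not real MNIST data.

```python
# A fake 28x28 "image": each pixel holds its own position (row * 28 + column).
HEIGHT, WIDTH = 28, 28
image = [[row * WIDTH + col for col in range(WIDTH)] for row in range(HEIGHT)]

# Flatten row by row into a single list of numbers.
flat = [pixel for row in image for pixel in row]

print(len(flat))                    # how many numbers the network receives
print(flat[WIDTH] == image[1][0])   # the first pixel of row 1 lands at index 28
```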





## Building Intuition: The Learning Process

### How Would You Teach a Child?
1. **Show examples**: "This is a 3, this is a 7..."
2. **Let them try**: "What number do you think this is?"
3. **Correct mistakes**: "Actually, that's a 6, not a 5"
4. **Practice more**: Repeat with many examples
5. **Test understanding**: New examples they haven't seen

**This is exactly what we do with neural networks!**





## The Architecture: 784 → 128 → 64 → 10

### Think of it as Layers of Understanding

**Layer 1 (784 inputs)**: Raw pixel values  
*"I see some dark and light spots"*

**Layer 2 (128 neurons)**: Basic patterns  
*"I see some curves and lines"*

**Layer 3 (64 neurons)**: Complex features  
*"I see number-like shapes"*

**Layer 4 (10 outputs)**: Final decision  
*"This looks most like a 3!"*
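
One way to get a feel for this architecture is to count what it has to learn. The sketch below assumes fully connected layers where every output neuron has one weight per input plus one bias, which matches the default behavior of `nn.Linear`.

```python
layer_sizes = [784, 128, 64, 10]

total = 0
# Pair each layer with the next one: (784, 128), (128, 64), (64, 10).
for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
    weights = n_in * n_out   # one weight per input-output connection
    biases = n_out           # one bias per output neuron
    total += weights + biases
    print(f"{n_in:>3} -> {n_out:>3}: {weights + biases} parameters")

print(f"total: {total}")
```

Almost all of the parameters sit in the first layer, because that is where the 784 pixels connect to the network.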





## The Learning Magic: How Do Computers Learn?

### Trial and Error, But Smarter!
1. **Make a guess**: "I think this is a 7"
2. **Check if right**: "Oops, it was actually a 2"  
3. **Adjust thinking**: "Next time I see this pattern, think 2"
4. **Repeat**: Do this millions of times, getting better each time

### The Beautiful Part
The computer figures out what patterns matter **on its own**!
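
The guess, check, adjust loop above can be sketched with a single adjustable number. This is a toy under stated assumptions: the data follows a made-up rule (`y = 2 * x`), the error measure is squared error, and the learning rate of 0.05 is picked by hand.

```python
# Toy data: the hidden rule is y = 2 * x. The model must discover the 2.
examples = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]

weight = 0.0          # initial guess for the rule
learning_rate = 0.05  # how big each adjustment is

for step in range(200):
    for x, target in examples:
        guess = weight * x                   # 1. make a guess
        error = guess - target               # 2. check how wrong it was
        gradient = 2 * error * x             # slope of the squared error w.r.t. weight
        weight -= learning_rate * gradient   # 3. adjust to reduce the error

print(round(weight, 4))
```

Nobody tells the program that the answer is 2; it gets there by repeatedly nudging `weight` in the direction that shrinks the error.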



In [None]:
"""
Lecture 1: Neural Network Fundamentals + MNIST Digits
A simple, refactored implementation using torchvision datasets and Accelerate.
"""
#%%
import torch
import torch.nn as nn
from torchvision import datasets, transforms
from torch.utils.data import DataLoader
import torchmetrics
from accelerate import Accelerator

def load_mnist_data():
    """Load and preprocess MNIST dataset"""
    # Download the image dataset and convert each image
    # to a tensor of pixel values scaled to [0, 1]
    transform  = transforms.ToTensor()
    train_data = datasets.MNIST('data', train=True, download=True, transform=transform)
    test_data  = datasets.MNIST('data', train=False, transform=transform)

    # Create DataLoaders to read the data efficiently in batches
    train_loader = DataLoader(train_data, batch_size=64, shuffle=True)
    test_loader  = DataLoader(test_data, batch_size=64, shuffle=False)

    return train_loader, test_loader

#%%
class SimpleNeuralNet(nn.Module):
    def __init__(self):
        super().__init__()

        # Network architecture: 784 -> 128 -> 64 -> 10
        self.flatten_image = nn.Flatten()               # Convert 28x28 image to 784 pixels
        self.hidden_layer_1 = nn.Linear(in_features = 28*28, out_features = 128)       # First hidden layer
        self.relu_1 = nn.ReLU()                         # Activation function
        self.hidden_layer_2 = nn.Linear(in_features = 128, out_features = 64)        # Second hidden layer
        self.relu_2 = nn.ReLU()                         # Activation function
        self.output_layer = nn.Linear(in_features = 64, out_features = 10)           # Output layer (10 digits)

        print("Created neural network: 784 -> 128 -> 64 -> 10")

    def forward(self, single_batch):
        """Forward pass through the network"""
        x = single_batch
        x = self.flatten_image(x)                # Flatten image
        x = self.relu_1(self.hidden_layer_1(x))  # First layer + activation
        x = self.relu_2(self.hidden_layer_2(x))  # Second layer + activation
        x = self.output_layer(x)                 # Final output (no activation)
        return x

#%%
def train_model(model, train_loader, optimizer, accelerator, epochs=3):
    """Training loop with accelerate"""
    criterion = nn.CrossEntropyLoss()

    model.train()
    for epoch in range(epochs):
        total_loss = 0
        for x_batch, y_batch in train_loader:

            # Forward pass
            predictions = model(x_batch)
            loss = criterion(predictions, y_batch)

            # Backward pass
            optimizer.zero_grad()  # Clear gradients
            accelerator.backward(loss)  # Use accelerator for backward pass
            optimizer.step()       # Update weights

            total_loss += loss.item()
        avg_loss = total_loss / len(train_loader)
        accelerator.print(f'Epoch {epoch+1}: Average Loss = {avg_loss:.4f}')

def test_model(model, test_loader):
    """Test the recently trained model accuracy"""
    model.eval()
    correct = 0
    total = 0

    # Disable gradient tracking during evaluation; it is restored automatically on exit
    with torch.no_grad():
        for x_batch, y_batch in test_loader:
            predictions = model(x_batch)
            accuracy = torchmetrics.functional.accuracy(predictions, y_batch, task='multiclass', num_classes=10)
            correct += accuracy * len(y_batch)
            total += len(y_batch)

    print(f'Test Accuracy: {(correct/total)*100:.2f}%')

#%%
if __name__ == "__main__":
    # Initialize Accelerator
    accelerator = Accelerator()

    # Step 1: Load data
    train_loader, test_loader = load_mnist_data()

    # Step 2: Create model and optimizer
    model = SimpleNeuralNet()
    PARAMETERS_TO_OPTIMIZE = model.parameters()
    optimizer = torch.optim.SGD(params = PARAMETERS_TO_OPTIMIZE, lr=0.01)
    
    # Prepare everything with Accelerate
    model, optimizer, train_loader, test_loader = accelerator.prepare(model, optimizer, train_loader, test_loader)

    accelerator.print(f'Model has {sum(p.numel() for p in model.parameters())} parameters')

    # Step 3: Train the model
    train_model(model, train_loader, optimizer, accelerator)

    # Step 4: Test the model
    test_model(model, test_loader)


## Future Improvements
# TODO:ajinkyak: Simple Trainer Function, custom mix of features inspired by lightning. Shift device management to accelerate
# TODO:ajinkyak: Flag: Overfit one batch.



## PyTorch: Our Learning Toolkit

### Why PyTorch?
🔧 **Easy to use**: Write code that looks like math  
🚀 **Powerful**: Handles the hard stuff automatically  
🧪 **Flexible**: Great for experimentation  
🌍 **Popular**: Used by researchers and companies worldwide  

### The Magic Word: `autograd`
PyTorch's `autograd` automatically computes the gradients that tell each weight how to change to reduce the error!

#### PyTorch library
- For writing neural networks
- For adjusting the weights of a neural network to reduce its error


In [None]:
#%%
import torch

# EXAMPLE 1
# Create a tensor with gradient tracking enabled
x = torch.tensor(2.0, requires_grad=True)

# Define a simple function: y = x^2 + 3x + 1
y = x**2 + 3*x + 1

# Backpropagate (compute dy/dx = 2x + 3, which is 7 at x = 2)
y.backward()

# Gradient is stored in x.grad
print(f"x: {x.item()}")
print(f"y: {y.item()}")
print(f"dy/dx: {x.grad.item()}")

#%%
# EXAMPLE 2
# A vector of inputs
x = torch.randn(3, requires_grad=True)

# A simple function: y = sum(x^2)
y = (x**2).sum()

# Compute gradients (dy/dx = 2x for each element of x)
y.backward()

print("x:", x)
print("Gradient dy/dx:", x.grad)


# Why Autograd Matters
# Neural networks: Training requires gradients of the loss w.r.t. millions of parameters → Autograd handles this automatically.
# Efficiency: Optimized C++ backend with GPU acceleration.
# Integration: Works seamlessly with torch.optim for gradient-based optimization.


#%%
from torch.autograd import grad

x1 = torch.tensor(2, requires_grad=True, dtype=torch.float16)
x2 = torch.tensor(3, requires_grad=True, dtype=torch.float16)
x3 = torch.tensor(1, requires_grad=True, dtype=torch.float16)
x4 = torch.tensor(4, requires_grad=True, dtype=torch.float16)

x1, x2, x3, x4

f = x1 * x2 + x3 * x4

# f = x1 * x2 + x3 * x4
# f = 2 * 3 + 1 * 4 = 10
# df/dx1 = x2 = 3
# df/dx2 = x1 = 2
# df/dx3 = x4 = 4
# df/dx4 = x3 = 1

df_dx = grad(outputs = f, inputs = [x1, x2, x3, x4])
print(f'gradient of x1 = {df_dx[0]}')
print(f'gradient of x2 = {df_dx[1]}')
print(f'gradient of x3 = {df_dx[2]}')
print(f'gradient of x4 = {df_dx[3]}')




## Training: The Learning Journey

### What Happens During Training?

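**Epoch 1**: "I'm just guessing randomly" (10% accuracy)  
**Epoch 5**: "I'm starting to see patterns" (50% accuracy)  
**Epoch 10**: "I'm getting pretty good!" (90% accuracy)  
**Epoch 20**: "I'm as good as a human!" (95%+ accuracy)  

*These figures are illustrative, not measured results. An untrained 10-class model starts near chance (about 10%), and how fast accuracy climbs depends on the optimizer, learning rate, and batch size.*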

### Watching Your Model Learn
It's like watching a child learn to read numbers!





## Success Metrics: How Do We Know It's Working?

### The Report Card
📊 **Accuracy**: What percentage did we get right?  
📈 **Loss**: How far are our predicted probabilities from the correct answers? (Confident wrong answers cost the most.)  
⏱️ **Speed**: How fast can we make decisions?  
🎯 **Consistency**: Do we perform well on new data?  
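
To show how accuracy and loss can tell different stories, here is a small sketch in plain Python. It assumes a softmax followed by cross-entropy, which is the usual pairing for classification; the scores and labels are made up, and it uses 3 classes instead of MNIST's 10 to keep the numbers readable.

```python
import math

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    largest = max(scores)
    exps = [math.exp(s - largest) for s in scores]  # subtract max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(scores, true_class):
    """Negative log of the probability assigned to the correct class."""
    return -math.log(softmax(scores)[true_class])

# Made-up scores for three examples; the third is confidently wrong.
batch_scores = [[2.0, 0.5, 0.1], [0.2, 0.1, 3.0], [4.0, 0.0, 0.0]]
batch_labels = [0, 2, 1]

predicted = [scores.index(max(scores)) for scores in batch_scores]
accuracy = sum(p == y for p, y in zip(predicted, batch_labels)) / len(batch_labels)
losses = [cross_entropy(s, y) for s, y in zip(batch_scores, batch_labels)]

print(f"accuracy: {accuracy:.2f}")
print("per-example loss:", [round(loss, 3) for loss in losses])
```

Accuracy only counts right versus wrong, while the loss also reflects how much probability went to the correct class, so the confidently wrong third example dominates the total.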

### Our Goal Today
Get our model to >95% accuracy on MNIST!





## Common Challenges & Solutions

### "My Model Isn't Learning!"
- Check your learning rate (not too fast, not too slow)
- Make sure data is properly normalized
- Verify that your loss is actually decreasing
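
The learning-rate advice can be seen on a toy problem. The sketch below runs gradient descent on the made-up loss `w**2` (minimum at `w = 0`, gradient `2 * w`) with three hand-picked learning rates; `minimize` is a hypothetical helper, not a PyTorch function.

```python
def minimize(learning_rate, steps=20, start=1.0):
    """Run gradient descent on loss(w) = w**2, whose gradient is 2 * w."""
    w = start
    for _ in range(steps):
        w -= learning_rate * 2 * w
    return w

too_slow = minimize(0.001)   # barely moves away from the starting point
reasonable = minimize(0.1)   # gets close to the minimum at w = 0
too_fast = minimize(1.1)     # overshoots further on every step

print(f"lr=0.001 -> w = {too_slow:.4f}")
print(f"lr=0.1   -> w = {reasonable:.4f}")
print(f"lr=1.1   -> w = {too_fast:.4f}")
```

Real networks are far less tidy than this one-parameter example, but the same three behaviors (crawling, converging, diverging) show up in the training loss.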

### "It Works on Training but Not Test Data!"
- This is called "overfitting" - memorizing instead of learning
- Solution: More data, simpler model, or regularization





## The Bigger Picture: Why This Matters

### MNIST is Just the Beginning
Today: Recognize handwritten digits  
Tomorrow: Detect diseases in X-rays  
Next week: Generate beautiful art  
Next month: Understand natural language  

### You're Learning Universal Principles
The concepts you learn today apply to **all** of deep learning!





## Hands-On Lab Preview

### What You'll Build Today
1. **Load MNIST data**: Get our "textbook" of examples
2. **Create the network**: Build our "brain" 
3. **Train the model**: Let it learn from examples
4. **Test performance**: See how well it learned
5. **Visualize results**: See what it got right and wrong

### Expected Experience
😅 First run: "It's not working!"  
🤔 After debugging: "Oh, I see what's wrong"  
😊 Final result: "Wow, it really works!"  





## Key Takeaways

### What You'll Understand After Today
🎯 **Core Concept**: Neural networks learn by example  
🧠 **Architecture**: Layers build up understanding  
📚 **Training**: Iterative improvement through feedback  
🔧 **PyTorch**: Practical tool for implementation  
📊 **Evaluation**: How to measure success  

### Most Important Insight
**Computers can learn to recognize patterns just like humans do - through experience and practice!**

