# 🎓 Week 9 - Day 3: Introduction to PyTorch

## 📋 Today's Learning Goals:

By the end of this notebook, you will be able to:

✅ **Create and manipulate PyTorch tensors** (creation, indexing, reshaping)

✅ **Build a small neural network** in PyTorch

✅ **Understand the training loop** (forward, loss, backward, step)

---

## 🔥 What is PyTorch?

PyTorch is a **deep learning framework** developed by Facebook AI Research that:
- Makes building neural networks easy and intuitive
- Automatically computes gradients (no manual calculus!)
- Runs on GPUs for fast training
- Has a Pythonic API that feels natural

**Why PyTorch?**
- Widely used in deep learning research
- Powers products at Meta, Tesla, OpenAI, and more
- Runs eagerly, so you can debug it with ordinary Python tools (`print`, `pdb`)
- Huge community and ecosystem

---

## 📖 Notebook Structure:

1. **Part 1:** Setup and Installation
2. **Part 2:** Tensors in PyTorch (creation, indexing, reshaping)
3. **Part 3:** Building a Small Neural Network
4. **Part 4:** Understanding the Training Loop
5. **Part 5:** Complete Training Example
6. **Part 6:** Challenge Exercise

---
---

# 📦 PART 1: Setup and Installation

## 🎯 What we'll do:
- Import PyTorch and other necessary libraries
- Check PyTorch version
- Verify GPU availability (if applicable)
- Set random seeds for reproducibility
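For the GPU check in the list above, a common pattern is to pick a device once and move tensors to it. This is a minimal sketch; the names `device` and `demo_tensor` are illustrative and not used elsewhere in this notebook.

```python
import torch

# Pick the GPU when PyTorch can see one, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Tensors are created on the CPU by default; .to(device) returns a tensor
# on the chosen device (the same tensor if it is already there).
demo_tensor = torch.ones(2, 3).to(device)

print(f"Selected device: {device}")
print(f"Tensor lives on: {demo_tensor.device}")
```

On a machine without a GPU both lines report `cpu`, so the same code runs everywhere.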

## 🤔 Why set random seeds?
Setting random seeds makes random operations repeatable, so you get the **same results every time** you run the code in the same environment. This is crucial for:
- Debugging (if something goes wrong, you can reproduce it)
- Comparing results with classmates
- Scientific reproducibility
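A quick way to see the effect is to draw random numbers twice with the same seed. The sketch below assumes nothing beyond `torch.manual_seed` and `torch.rand`; the variable names are illustrative.

```python
import torch

torch.manual_seed(42)
first_draw = torch.rand(3)

torch.manual_seed(42)          # reset the generator to the same seed
second_draw = torch.rand(3)

third_draw = torch.rand(3)     # no reset: continues the random stream

print(torch.equal(first_draw, second_draw))  # same seed, same numbers
print(torch.equal(first_draw, third_draw))   # different position in the stream
```

Resetting the seed replays the same sequence, which is why the seed is set once at the top of the notebook.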

In [None]:
# ============================================
# IMPORT LIBRARIES
# ============================================

import torch                    # Main PyTorch library
import torch.nn as nn           # Neural network modules
import torch.optim as optim     # Optimization algorithms (SGD, Adam, etc.)
import numpy as np              # For numerical operations
import matplotlib.pyplot as plt # For visualization

# ============================================
# SET RANDOM SEEDS FOR REPRODUCIBILITY
# ============================================
# Why? So everyone gets the same results!

torch.manual_seed(42)    # PyTorch random seed
np.random.seed(42)       # NumPy random seed

# ============================================
# CHECK PYTORCH VERSION AND GPU AVAILABILITY
# ============================================

print("="*60)
print("🔥 PYTORCH SETUP INFORMATION")
print("="*60)
print(f"✅ PyTorch Version: {torch.__version__}")
print(f"✅ CUDA Available (GPU): {torch.cuda.is_available()}")

if torch.cuda.is_available():
    print(f"🎮 GPU Device: {torch.cuda.get_device_name(0)}")
else:
    print("💻 Running on CPU (no GPU detected)")

print(f"✅ Random seeds set to 42 for reproducibility")
print("="*60)
print("\n🚀 Ready to learn PyTorch!\n")

---
---

# 🎲 PART 2: Tensors in PyTorch

## 🎯 Learning Objectives:
1. Understand what tensors are
2. Create tensors in multiple ways
3. Index and slice tensors
4. Reshape tensors

## 🤔 What is a Tensor?

A **tensor** is a multi-dimensional array. Think of it as:
- **0D tensor** (scalar): Just a number → `5`
- **1D tensor** (vector): A list of numbers → `[1, 2, 3]`
- **2D tensor** (matrix): A table of numbers → `[[1,2], [3,4]]`
- **3D tensor**: A cube of numbers (like RGB images)
- **4D tensor**: A batch of images

Tensors are similar to NumPy arrays but:
- Can run on GPUs (much faster!)
- Have automatic differentiation (autograd)
- Are optimized for deep learning
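To make "automatic differentiation" concrete, here is a minimal sketch that asks PyTorch for the derivative of a simple function. The function and the names `x_val` and `y_val` are made up for illustration.

```python
import torch

# requires_grad=True tells PyTorch to record operations on this tensor.
x_val = torch.tensor(3.0, requires_grad=True)

# y = x^2 + 2x, so dy/dx = 2x + 2
y_val = x_val ** 2 + 2 * x_val

# backward() fills x_val.grad with dy/dx evaluated at x = 3.
y_val.backward()

print(x_val.grad)  # 2*3 + 2 = 8
```

The same mechanism computes the gradients of a loss with respect to every weight in a network.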

---

## 📝 Section 2.1: Creating Tensors

Let's learn different ways to create tensors!

In [None]:
# ============================================
# METHOD 1: CREATE TENSOR FROM PYTHON LIST
# ============================================
# This is the most intuitive way to create a tensor

# Create a 1D tensor (vector)
tensor_1d = torch.tensor([1, 2, 3, 4, 5])

print("📊 1D Tensor (Vector):")
print(tensor_1d)
print(f"Shape: {tensor_1d.shape}")      # Shape tells us dimensions
print(f"Data type: {tensor_1d.dtype}")  # Data type (int64, float32, etc.)
print(f"Device: {tensor_1d.device}")    # CPU or GPU
print()

# Create a 2D tensor (matrix)
tensor_2d = torch.tensor([[1, 2, 3],
                          [4, 5, 6]])

print("📊 2D Tensor (Matrix):")
print(tensor_2d)
print(f"Shape: {tensor_2d.shape}")      # (2, 3) = 2 rows, 3 columns
print(f"Data type: {tensor_2d.dtype}")
print()

# Create a 3D tensor
tensor_3d = torch.tensor([[[1, 2], [3, 4]],
                          [[5, 6], [7, 8]]])

print("📊 3D Tensor:")
print(tensor_3d)
print(f"Shape: {tensor_3d.shape}")      # (2, 2, 2)
print()

In [None]:
# ============================================
# METHOD 2: CREATE TENSORS WITH SPECIFIC VALUES
# ============================================

print("🔹 Creating Special Tensors:\n")

# Zeros tensor (all elements are 0)
# Useful for: Initializing accumulators, padding, etc.
zeros = torch.zeros(3, 4)  # 3 rows, 4 columns
print("Zeros Tensor (3x4):")
print(zeros)
print(f"Shape: {zeros.shape}\n")

# Ones tensor (all elements are 1)
# Useful for: Masks, weights initialization, etc.
ones = torch.ones(2, 3)
print("Ones Tensor (2x3):")
print(ones)
print(f"Shape: {ones.shape}\n")

# Random tensor (values between 0 and 1)
# Useful for: Weight initialization
random_uniform = torch.rand(2, 2)  # Uniform distribution [0, 1)
print("Random Tensor - Uniform [0,1):")
print(random_uniform)
print()

# Random normal tensor (mean=0, std=1)
# Useful for: Weight initialization in neural networks
random_normal = torch.randn(2, 2)  # Normal distribution
print("Random Tensor - Normal Distribution:")
print(random_normal)
print()

# Range tensor (similar to Python's range)
# Useful for: Creating sequences, indexing
range_tensor = torch.arange(0, 10, 2)  # Start=0, End=10, Step=2
print("Range Tensor (0 to 10, step 2):")
print(range_tensor)
print()

# Identity matrix (diagonal of ones)
# Useful for: Linear algebra operations
identity = torch.eye(3)  # 3x3 identity matrix
print("Identity Matrix (3x3):")
print(identity)
print()

---

## 📝 Section 2.2: Tensor Properties and Data Types

Understanding tensor properties is crucial for debugging and optimization.
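One property that often causes surprises is `dtype`. The sketch below shows how PyTorch promotes types when integer and float tensors meet; the variable names are illustrative.

```python
import torch

int_values = torch.tensor([1, 2, 3])            # inferred as int64
float_values = torch.tensor([0.5, 0.5, 0.5])    # inferred as float32

mixed_sum = int_values + float_values   # int64 + float32 is promoted to float32
true_division = int_values / 2          # "/" always produces a float result
floor_division = int_values // 2        # "//" keeps the integer dtype

print(mixed_sum.dtype)
print(true_division.dtype)
print(floor_division.dtype)
```

Checking `.dtype` after an operation is a cheap way to catch accidental type changes before they reach a model.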

In [None]:
# ============================================
# UNDERSTANDING TENSOR PROPERTIES
# ============================================

# Create a sample tensor
sample = torch.randn(3, 4)  # 3 rows, 4 columns of random numbers

print("🔍 Tensor Properties:\n")
print(f"Tensor:\n{sample}\n")

# Shape: Dimensions of the tensor
print(f"📐 Shape: {sample.shape}")           # torch.Size([3, 4])
print(f"   → 3 rows, 4 columns\n")

# Size: Same as shape (alternative method)
print(f"📐 Size: {sample.size()}")           # Same as shape
print()

# Number of dimensions
print(f"📊 Number of dimensions (ndim): {sample.ndim}")
print(f"   → This is a 2D tensor\n")

# Total number of elements
print(f"🔢 Total elements (numel): {sample.numel()}")
print(f"   → 3 × 4 = 12 elements\n")

# Data type
print(f"🎯 Data type (dtype): {sample.dtype}")
print(f"   → float32 is PyTorch's default floating-point type\n")

# Device (CPU or GPU)
print(f"💻 Device: {sample.device}")
print(f"   → Tensors are created on the CPU unless you move them\n")

In [None]:
# ============================================
# WORKING WITH DIFFERENT DATA TYPES
# ============================================
# PyTorch supports many data types, but we mainly use:
# - torch.float32 (default): For neural network weights
# - torch.int64: For labels/indices
# - torch.bool: For masks

print("🎨 Different Data Types:\n")

# Float tensor (default)
float_tensor = torch.tensor([1.0, 2.0, 3.0])
print(f"Float: {float_tensor}")
print(f"dtype: {float_tensor.dtype}\n")

# Integer tensor
int_tensor = torch.tensor([1, 2, 3])
print(f"Integer: {int_tensor}")
print(f"dtype: {int_tensor.dtype}\n")

# Convert between types using .to()
# This is very common when preparing data!
converted = int_tensor.to(torch.float32)
print(f"Converted to float: {converted}")
print(f"dtype: {converted.dtype}\n")

# Or use the legacy typed constructors
# (torch.tensor(data, dtype=...) is preferred in new code)
float_tensor2 = torch.FloatTensor([1, 2, 3])
long_tensor = torch.LongTensor([1, 2, 3])
print(f"FloatTensor: {float_tensor2}, dtype: {float_tensor2.dtype}")
print(f"LongTensor: {long_tensor}, dtype: {long_tensor.dtype}")

---

## 📝 Section 2.3: Indexing and Slicing Tensors

Basic indexing and slicing work much like NumPy arrays and Python lists, with one exception: PyTorch does not support negative slice steps.

**Remember:**
- Python uses **0-based indexing** (first element is at index 0)
- Negative indices count from the end (-1 is the last element)
- Slicing syntax: `[start:end:step]`
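Two things go beyond the basic slicing syntax: selecting elements by condition, and reversing a tensor. PyTorch tensors reject negative slice steps, so `torch.flip` is used for reversal. A minimal sketch with illustrative names:

```python
import torch

values = torch.tensor([10, 20, 30, 40, 50])

# Boolean mask indexing: keep only the elements where the condition holds.
mask = values > 25
selected = values[mask]

# Reversal: torch.flip returns a reversed copy along the given dimension.
reversed_values = torch.flip(values, dims=[0])

print(selected)          # tensor([30, 40, 50])
print(reversed_values)   # tensor([50, 40, 30, 20, 10])
```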

In [None]:
# ============================================
# INDEXING 1D TENSORS (VECTORS)
# ============================================

# Create a 1D tensor
vector = torch.tensor([10, 20, 30, 40, 50])
print("Original Vector:")
print(vector)
print()

# Access single elements
print("🔹 Single Element Access:")
print(f"First element (index 0): {vector[0]}")
print(f"Third element (index 2): {vector[2]}")
print(f"Last element (index -1): {vector[-1]}")
print(f"Second to last (index -2): {vector[-2]}")
print()

# Slicing: [start:end] - end is NOT included
print("🔹 Slicing:")
print(f"First 3 elements [0:3]: {vector[0:3]}")      # [10, 20, 30]
print(f"From index 2 to end [2:]: {vector[2:]}")     # [30, 40, 50]
print(f"Up to index 3 [:3]: {vector[:3]}")           # [10, 20, 30]
print(f"Every other element [::2]: {vector[::2]}")   # [10, 30, 50]
# Negative steps such as [::-1] raise an error in PyTorch; use .flip() instead
print(f"Reversed with .flip(0): {vector.flip(0)}")   # [50, 40, 30, 20, 10]
print()

In [None]:
# ============================================
# INDEXING 2D TENSORS (MATRICES)
# ============================================

# Create a 2D tensor (3 rows, 4 columns)
matrix = torch.tensor([[1,  2,  3,  4],
                       [5,  6,  7,  8],
                       [9, 10, 11, 12]])

print("Original Matrix (3x4):")
print(matrix)
print(f"Shape: {matrix.shape}\n")

# Access single element: [row, column]
print("🔹 Single Element Access:")
print(f"Element at row 0, col 0: {matrix[0, 0]}")     # 1
print(f"Element at row 1, col 2: {matrix[1, 2]}")     # 7
print(f"Last element: {matrix[-1, -1]}")              # 12
print()

# Access entire rows
print("🔹 Row Access:")
print(f"First row [0, :]: {matrix[0, :]}")            # [1, 2, 3, 4]
print(f"Second row [1]: {matrix[1]}")                 # [5, 6, 7, 8]
print(f"Last row [-1]: {matrix[-1]}\n")               # [9, 10, 11, 12]

# Access entire columns
print("🔹 Column Access:")
print(f"First column [:, 0]: {matrix[:, 0]}")         # [1, 5, 9]
print(f"Third column [:, 2]: {matrix[:, 2]}")         # [3, 7, 11]
print(f"Last column [:, -1]: {matrix[:, -1]}\n")      # [4, 8, 12]

# Slicing: Get sub-matrices
print("🔹 Sub-matrix Slicing:")
print(f"Top-left 2x2 [:2, :2]:\n{matrix[:2, :2]}\n")  # [[1,2], [5,6]]
print(f"Bottom-right 2x2 [1:, 2:]:\n{matrix[1:, 2:]}\n")  # [[7,8], [11,12]]
print(f"Middle 2x2 [1:3, 1:3]:\n{matrix[1:3, 1:3]}\n")    # [[6,7], [10,11]]

---

## 📝 Section 2.4: Reshaping Tensors

**Why reshape?**
- Neural networks expect specific input shapes
- Images need to be flattened for fully-connected layers
- Batching data requires adding/removing dimensions

**Key methods:**
- `.view()`: Returns a new view of the same data (fast, memory-efficient)
- `.reshape()`: More flexible, may copy data if needed
- `.squeeze()`: Removes dimensions of size 1
- `.unsqueeze()`: Adds a dimension of size 1
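The difference between `.view()` and `.reshape()` shows up on non-contiguous tensors, for example after a transpose. The sketch below is an illustration with made-up names, not part of the notebook's own cells.

```python
import torch

base = torch.arange(6).view(2, 3)   # [[0, 1, 2], [3, 4, 5]]
transposed = base.t()               # shape (3, 2), no longer contiguous in memory

# .view() needs a compatible memory layout, so it fails here.
try:
    transposed.view(6)
    view_failed = False
except RuntimeError:
    view_failed = True

# .reshape() falls back to copying the data when a view is impossible.
flat_copy = transposed.reshape(6)

# A successful .view() shares memory with the original tensor.
alias = base.view(3, 2)
alias[0, 0] = 100

print(view_failed)     # True
print(flat_copy)       # tensor([0, 3, 1, 4, 2, 5])
print(base[0, 0])      # tensor(100): the write through the view is visible
```

A reasonable rule of thumb is to use `.reshape()` unless you specifically need a guarantee that no copy is made.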

In [None]:
# ============================================
# RESHAPING WITH .view() AND .reshape()
# ============================================

# Create a tensor with 12 elements
original = torch.arange(12)  # [0, 1, 2, ..., 11]
print("Original Tensor:")
print(original)
print(f"Shape: {original.shape}\n")

# Reshape to 3x4 (3 rows, 4 columns)
reshaped_3x4 = original.view(3, 4)
print("Reshaped to 3x4:")
print(reshaped_3x4)
print(f"Shape: {reshaped_3x4.shape}\n")

# Reshape to 2x6
reshaped_2x6 = original.view(2, 6)
print("Reshaped to 2x6:")
print(reshaped_2x6)
print(f"Shape: {reshaped_2x6.shape}\n")

# Use -1 to let PyTorch calculate the dimension automatically
# Rule: Total elements must remain the same!
reshaped_auto = original.view(4, -1)  # 4 rows, PyTorch calculates 3 columns
print("Reshaped to 4x? (auto-calculated):")
print(reshaped_auto)
print(f"Shape: {reshaped_auto.shape}")
print(f"PyTorch calculated: 12 ÷ 4 = 3 columns\n")

# Flatten to 1D (common operation before fully-connected layers)
flattened = original.view(-1)  # -1 means "flatten everything"
print("Flattened to 1D:")
print(flattened)
print(f"Shape: {flattened.shape}\n")

In [None]:
# ============================================
# ADDING AND REMOVING DIMENSIONS
# ============================================

# Create a 2D tensor
tensor_2d = torch.tensor([[1, 2, 3],
                          [4, 5, 6]])
print("Original 2D Tensor:")
print(tensor_2d)
print(f"Shape: {tensor_2d.shape}\n")

# Add dimension at position 0 (create a batch dimension)
# This is very common when feeding single samples to a model!
unsqueezed_0 = tensor_2d.unsqueeze(0)
print("After unsqueeze(0) - Added batch dimension:")
print(unsqueezed_0)
print(f"Shape: {unsqueezed_0.shape}")  # (1, 2, 3)
print(f"Interpretation: 1 batch, 2 rows, 3 columns\n")

# Add dimension at position 1
unsqueezed_1 = tensor_2d.unsqueeze(1)
print("After unsqueeze(1) - Added dimension in middle:")
print(unsqueezed_1)
print(f"Shape: {unsqueezed_1.shape}\n")  # (2, 1, 3)

# Remove dimensions of size 1
tensor_with_ones = torch.randn(1, 3, 1, 4)  # Shape: (1, 3, 1, 4)
print(f"Tensor with extra dimensions: {tensor_with_ones.shape}")

squeezed = tensor_with_ones.squeeze()  # Remove all dimensions of size 1
print(f"After squeeze(): {squeezed.shape}")  # (3, 4)
print("Removed dimensions of size 1\n")

In [None]:
# ============================================
# PRACTICAL EXAMPLE: IMAGE PREPROCESSING
# ============================================
# This is exactly what happens when you feed images to a neural network!

print("🖼️ Practical Example: Preparing an Image for a Neural Network\n")

# Simulate a grayscale image: 28x28 pixels (like MNIST digits)
image = torch.randn(28, 28)
print(f"1. Original image shape: {image.shape}")
print(f"   → This is a 2D grayscale image\n")

# Step 1: Flatten the image for a fully-connected layer
flattened_image = image.view(-1)  # Convert to 1D
print(f"2. Flattened image: {flattened_image.shape}")
print(f"   → 28 × 28 = {flattened_image.shape[0]} pixels\n")

# Step 2: Add batch dimension (neural networks expect batches)
batched_image = flattened_image.unsqueeze(0)
print(f"3. Added batch dimension: {batched_image.shape}")
print(f"   → Shape is now (batch_size=1, features=784)\n")

print("✅ This image is now ready to be fed into a neural network!")
print("💡 PyTorch layers expect the batch dimension first: (batch, features)")

---
---

# 🧠 PART 3: Building a Small Neural Network

## 🎯 What we'll learn:
- How to define a neural network class in PyTorch
- Understanding `nn.Module` (the base class for all networks)
- Creating layers with `nn.Linear`
- Implementing the forward pass

## 🏗️ Neural Network Architecture:

We'll build a simple 2-layer network:

```
Input (4 features)
      ↓
Linear Layer (4 → 8 neurons)
      ↓
ReLU Activation (non-linearity)
      ↓
Linear Layer (8 → 1 neuron)
      ↓
Output (prediction)
```

This is a **regression network** (predicts a continuous value).
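As a quick sanity check of the diagram, the same stack of layers can be sketched with `nn.Sequential`. This is only an illustration of the shapes and parameter count, not the class-based model this notebook builds; the names `sketch_net` and `sketch_batch` are made up.

```python
import torch
import torch.nn as nn

# Same layer sizes as the diagram: 4 -> 8 -> 1 with a ReLU in between.
sketch_net = nn.Sequential(
    nn.Linear(4, 8),
    nn.ReLU(),
    nn.Linear(8, 1),
)

sketch_batch = torch.randn(3, 4)      # 3 samples, 4 features each
sketch_out = sketch_net(sketch_batch)

# (4*8 + 8) + (8*1 + 1) = 49 learnable parameters
sketch_params = sum(p.numel() for p in sketch_net.parameters())

print(sketch_out.shape)   # torch.Size([3, 1])
print(sketch_params)      # 49
```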

---

## 📝 Section 3.1: Understanding nn.Module

**Every PyTorch neural network is a class that inherits from `nn.Module`.**

You must implement two methods:
1. `__init__()`: Define your layers here
2. `forward()`: Define how data flows through layers
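Here is a minimal sketch of those two methods on a one-layer module. `TinyLinear` is a hypothetical name used only for this illustration. It also checks that `nn.Linear` computes `x @ weight.T + bias`, since the weight matrix is stored with shape `(out_features, in_features)`.

```python
import torch
import torch.nn as nn

class TinyLinear(nn.Module):
    def __init__(self):
        super().__init__()            # lets nn.Module register the layer below
        self.fc = nn.Linear(4, 2)     # layers are defined in __init__

    def forward(self, x):
        return self.fc(x)             # data flow is defined in forward

tiny = TinyLinear()
tiny_input = torch.randn(5, 4)

via_call = tiny(tiny_input)           # calling the module runs forward()
by_hand = tiny_input @ tiny.fc.weight.T + tiny.fc.bias

print(tiny.fc.weight.shape)                         # torch.Size([2, 4])
print(torch.allclose(via_call, by_hand, atol=1e-6)) # True
```

Calling `tiny(tiny_input)` rather than `tiny.forward(tiny_input)` is the usual convention, because the call also runs any hooks registered on the module.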

In [None]:
# ============================================
# DEFINE A SIMPLE NEURAL NETWORK
# ============================================

class SimpleNet(nn.Module):
    """
    A simple 2-layer neural network for regression.
    
    Architecture:
    - Input layer: 4 features
    - Hidden layer: 8 neurons + ReLU activation
    - Output layer: 1 neuron (for prediction)
    
    Why this architecture?
    - Simple enough to understand
    - Complex enough to learn patterns
    - Similar to networks used in real projects
    """
    
    def __init__(self, input_size=4, hidden_size=8, output_size=1):
        """
        Initialize the network layers.
        
        Args:
            input_size (int): Number of input features
            hidden_size (int): Number of neurons in hidden layer
            output_size (int): Number of output values
        
        Why call super().__init__()?:
        - This initializes the parent class (nn.Module)
        - Required for PyTorch to track parameters
        """
        # Initialize parent class - ALWAYS DO THIS FIRST!
        super(SimpleNet, self).__init__()
        
        # ============================================
        # DEFINE LAYERS
        # ============================================
        
        # Layer 1: Linear transformation (input → hidden)
        # nn.Linear(in_features, out_features)
        # This creates: output = input @ weight.T + bias
        self.fc1 = nn.Linear(input_size, hidden_size)
        # fc1 has: 4×8 = 32 weights + 8 biases = 40 parameters
        
        # Activation function: ReLU (Rectified Linear Unit)
        # ReLU(x) = max(0, x)
        # Why ReLU? Introduces non-linearity, fast to compute
        self.relu = nn.ReLU()
        
        # Layer 2: Linear transformation (hidden → output)
        self.fc2 = nn.Linear(hidden_size, output_size)
        # fc2 has: 8×1 = 8 weights + 1 bias = 9 parameters
        
        # Total parameters: 40 + 9 = 49 learnable parameters
    
    def forward(self, x):
        """
        Define the forward pass (how data flows through the network).
        
        Args:
            x (Tensor): Input data, shape (batch_size, input_size)
        
        Returns:
            Tensor: Network output, shape (batch_size, output_size)
        
        What happens here:
        1. Input (batch, 4) → Linear → (batch, 8)
        2. (batch, 8) → ReLU → (batch, 8) [some values become 0]
        3. (batch, 8) → Linear → (batch, 1) [final prediction]
        """
        # Step 1: First linear layer
        # x has shape: (batch_size, 4)
        x = self.fc1(x)  # Output shape: (batch_size, 8)
        
        # Step 2: Apply ReLU activation
        # This makes the network non-linear (can learn complex patterns)
        # Without activation, the network is just linear regression!
        x = self.relu(x)  # Shape stays: (batch_size, 8)
        
        # Step 3: Second linear layer (output)
        x = self.fc2(x)  # Output shape: (batch_size, 1)
        
        return x


# ============================================
# CREATE AN INSTANCE OF THE NETWORK
# ============================================

# Create the model
model = SimpleNet(input_size=4, hidden_size=8, output_size=1)

print("="*60)
print("🧠 NEURAL NETWORK CREATED")
print("="*60)
print("\n📊 Architecture:")
print(model)
print()

In [None]:
# ============================================
# INSPECT THE MODEL PARAMETERS
# ============================================
# Understanding parameters helps you:
# - Debug model size issues
# - Calculate memory requirements
# - Understand what the model is learning

print("🔍 Model Parameters:\n")

# Method 1: Print all parameters with names
print("📋 All Parameters (with names):")
for name, param in model.named_parameters():
    print(f"  {name:20s} | Shape: {str(param.shape):15s} | Elements: {param.numel():4d}")
print()

# Method 2: Count total parameters
total_params = sum(p.numel() for p in model.parameters())
trainable_params = sum(p.numel() for p in model.parameters() if p.requires_grad)

print("📊 Parameter Summary:")
print(f"  Total parameters: {total_params}")
print(f"  Trainable parameters: {trainable_params}")
print()

# Explanation of parameter counts:
print("💡 Parameter Breakdown:")
print(f"  Layer 1 (fc1): 4 inputs × 8 outputs = 32 weights + 8 biases = 40 parameters")
print(f"  Layer 2 (fc2): 8 inputs × 1 output = 8 weights + 1 bias = 9 parameters")
print(f"  Total: 40 + 9 = 49 parameters")
print()
print("✅ All parameters are trainable (requires_grad=True)")

In [None]:
# ============================================
# TEST THE FORWARD PASS
# ============================================
# Let's verify the model works by passing some data through it

print("🧪 Testing Forward Pass:\n")

# Create some random input data
# Shape: (batch_size=3, features=4)
# This represents 3 samples, each with 4 features
test_input = torch.randn(3, 4)

print("📥 Input:")
print(test_input)
print(f"Shape: {test_input.shape}\n")

# Pass through the model (forward pass)
# PyTorch automatically calls the forward() method
test_output = model(test_input)

print("📤 Output:")
print(test_output)
print(f"Shape: {test_output.shape}\n")

print("✅ Success! The model processed the input correctly.")
print("💡 The output is not meaningful yet because the weights are still randomly initialized.")

---
---

# 🔄 PART 4: Understanding the Training Loop

## 🎯 What is the Training Loop?

The training loop is where the **learning happens**. It consists of 4 key steps that repeat for many iterations:

### **The 4 Steps of Training:**

```python
# 1. FORWARD PASS: pass data through the network to get predictions
predictions = model(inputs)

# 2. COMPUTE LOSS: calculate how wrong the predictions are
loss = criterion(predictions, targets)

# 3. BACKWARD PASS: calculate how to adjust each weight
optimizer.zero_grad()  # Clear old gradients
loss.backward()        # Compute new gradients

# 4. UPDATE WEIGHTS: adjust weights to reduce loss
optimizer.step()
```

**Analogy:** Think of it like learning to throw darts:
1. **Forward:** Throw a dart (make a prediction)
2. **Loss:** See how far from bullseye (measure error)
3. **Backward:** Figure out how to adjust your throw (compute gradients)
4. **Update:** Adjust your technique (update weights)

Repeat thousands of times → Get better at hitting bullseye!
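The four steps can be seen end to end on a toy problem with a single weight. The sketch below is an illustration with made-up names and numbers: it learns `toy_w` so that `toy_w * 2` matches a target of 6, using plain SGD to keep the update rule easy to follow.

```python
import torch

toy_w = torch.tensor(0.0, requires_grad=True)    # the single "weight"
toy_optimizer = torch.optim.SGD([toy_w], lr=0.1)
toy_x, toy_target = torch.tensor(2.0), torch.tensor(6.0)

toy_losses = []
for step in range(20):
    toy_prediction = toy_w * toy_x                   # 1. forward pass
    toy_loss = (toy_prediction - toy_target) ** 2    # 2. compute loss
    toy_optimizer.zero_grad()                        # 3. clear old gradients...
    toy_loss.backward()                              #    ...and compute new ones
    toy_optimizer.step()                             # 4. update the weight
    toy_losses.append(toy_loss.item())

print(f"first loss: {toy_losses[0]:.4f}, last loss: {toy_losses[-1]:.6f}")
print(f"learned weight: {toy_w.item():.4f}")   # approaches 3.0, since 3 * 2 = 6
```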

---

## 📝 Section 4.1: Components of Training

Before we can train, we need:
1. **Loss Function** (criterion): Measures how wrong our predictions are
2. **Optimizer**: Updates weights to reduce loss
3. **Data**: Inputs and their correct outputs
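To make "how wrong our predictions are" concrete, here is mean squared error worked out by hand in plain Python. The numbers are small illustrative values and the variable names are made up for this sketch.

```python
hand_predictions = [2.0, 3.0, 4.0]
hand_targets = [2.5, 3.2, 3.8]

# Squared error per sample: (-0.5)^2, (-0.2)^2, (0.2)^2
squared_errors = [(p - t) ** 2 for p, t in zip(hand_predictions, hand_targets)]

# Mean of the squared errors: (0.25 + 0.04 + 0.04) / 3 = 0.11
hand_mse = sum(squared_errors) / len(squared_errors)

print(f"MSE: {hand_mse:.4f}")
```

Because each error is squared, the sample that is off by 0.5 contributes more than the other two combined.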

In [None]:
# ============================================
# STEP 1: DEFINE LOSS FUNCTION
# ============================================
# Loss function (also called criterion) measures prediction error
# Lower loss = better predictions

# For regression problems, we use Mean Squared Error (MSE)
# MSE = average of (prediction - target)²
criterion = nn.MSELoss()

print("📉 Loss Function: Mean Squared Error (MSE)")
print("   Formula: MSE = (1/n) * Σ(prediction - target)²")
print("   Why MSE? Penalizes large errors more than small ones")
print()

# Example: How MSE works
predictions = torch.tensor([2.0, 3.0, 4.0])
targets = torch.tensor([2.5, 3.2, 3.8])
example_loss = criterion(predictions, targets)

print("🔍 Example MSE Calculation:")
print(f"  Predictions: {predictions.numpy()}")
print(f"  Targets:     {targets.numpy()}")
print(f"  MSE Loss:    {example_loss.item():.4f}")
print(f"  Interpretation: Average squared difference is {example_loss.item():.4f}")
print()

In [None]:
# ============================================
# STEP 2: DEFINE OPTIMIZER
# ============================================
# Optimizer updates weights to minimize loss
# Think of it as the "learning algorithm"

# We'll use Adam optimizer (most popular choice)
# Adam = Adaptive Moment Estimation
# - Adapts learning rate for each parameter
# - Usually works well without tuning

learning_rate = 0.01  # How big are the update steps?
                      # Too high = unstable, too low = slow learning

optimizer = optim.Adam(model.parameters(), lr=learning_rate)

print("⚙️ Optimizer: Adam")
print(f"   Learning rate: {learning_rate}")
print(f"   What it optimizes: All {sum(p.numel() for p in model.parameters())} parameters in our model")
print()
print("💡 Why Adam?")
print("   - Adapts learning rate automatically")
print("   - Works well for most problems")
print("   - A common default choice in deep learning")
print()

---

## 📝 Section 4.2: Understanding Each Step in Detail

Let's break down each step of the training loop with detailed explanations.
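One detail worth seeing in isolation is why gradients are cleared before each backward pass: PyTorch adds new gradients to whatever is already stored in `.grad`. A minimal sketch with illustrative names:

```python
import torch

acc_param = torch.tensor(1.0, requires_grad=True)

(2 * acc_param).backward()
grad_after_first = acc_param.grad.item()    # d(2p)/dp = 2

(2 * acc_param).backward()
grad_after_second = acc_param.grad.item()   # 2 + 2 = 4: the gradient accumulated

acc_param.grad.zero_()                      # what optimizer.zero_grad() does per parameter
(2 * acc_param).backward()
grad_after_reset = acc_param.grad.item()    # back to 2

print(grad_after_first, grad_after_second, grad_after_reset)
```

Without the reset, every update would be driven by a sum of stale and fresh gradients.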

In [None]:
# ============================================
# DEMONSTRATION: ONE TRAINING ITERATION
# ============================================
# Let's manually go through one iteration step-by-step

print("="*60)
print("🎓 DETAILED WALKTHROUGH: ONE TRAINING ITERATION")
print("="*60)
print()

# Prepare some dummy data
# In real training, this would come from your dataset
dummy_input = torch.randn(5, 4)   # 5 samples, 4 features each
dummy_target = torch.randn(5, 1)  # 5 target values

print("📦 Input Data:")
print(f"   Input shape: {dummy_input.shape}   (5 samples × 4 features)")
print(f"   Target shape: {dummy_target.shape} (5 target values)")
print()

# ============================================
# STEP 1: FORWARD PASS
# ============================================
print("📊 STEP 1: FORWARD PASS")
print("   → Pass input through the network to get predictions\n")

predictions = model(dummy_input)

print(f"   Predictions shape: {predictions.shape}")
print(f"   First 3 predictions: {predictions[:3].detach().numpy().flatten()}")
print(f"   First 3 targets:     {dummy_target[:3].numpy().flatten()}")
print()

# ============================================
# STEP 2: COMPUTE LOSS
# ============================================
print("📉 STEP 2: COMPUTE LOSS")
print("   → Calculate how wrong our predictions are\n")

loss = criterion(predictions, dummy_target)

print(f"   Loss value: {loss.item():.4f}")
print(f"   What this means: Average squared error is {loss.item():.4f}")
print(f"   Goal: Make this number smaller!")
print()

# ============================================
# STEP 3: BACKWARD PASS (BACKPROPAGATION)
# ============================================
print("🔙 STEP 3: BACKWARD PASS (Backpropagation)")
print("   → Calculate gradients for all parameters\n")

# CRITICAL: Always zero gradients first!
# Why? Gradients accumulate by default in PyTorch
# If we don't zero them, old gradients will be added to new ones
optimizer.zero_grad()
print("   ✓ Cleared old gradients (optimizer.zero_grad())")

# Compute gradients via backpropagation
# This is the magic of PyTorch - automatic differentiation!
loss.backward()
print("   ✓ Computed gradients (loss.backward())")

# Let's inspect a gradient to see what happened
first_param_grad = list(model.parameters())[0].grad
print(f"   Example gradient shape: {first_param_grad.shape}")
print(f"   Gradient statistics: mean={first_param_grad.mean():.6f}, std={first_param_grad.std():.6f}")
print()

# ============================================
# STEP 4: UPDATE WEIGHTS
# ============================================
print("⚡ STEP 4: UPDATE WEIGHTS")
print("   → Adjust parameters to reduce loss\n")

# Get current weight before update
first_param_before = list(model.parameters())[0].data.clone()

# Update weights using computed gradients
# Plain SGD would use: new_weight = old_weight - learning_rate * gradient
# Adam additionally rescales each step using running averages of past gradients
optimizer.step()
print("   ✓ Updated all weights (optimizer.step())")

# Get weight after update to see the change
first_param_after = list(model.parameters())[0].data
weight_change = (first_param_after - first_param_before).abs().mean()

print(f"   Average weight change: {weight_change:.6f}")
print()

print("="*60)
print("✅ ONE TRAINING ITERATION COMPLETE!")
print("="*60)
print()
print("💡 Key Takeaway:")
print("   Repeat these 4 steps thousands of times → Model learns!")
print("   Over many iterations: predictions get better, loss trends downward")

---
---

# 🎯 PART 5: Complete Training Example

## 🎯 What we'll do:
Now let's put everything together and train a model on a synthetic dataset!

We'll:
1. Generate synthetic dataset
2. Train the model for multiple epochs
3. Visualize the training progress
4. Evaluate the final model

## 📊 Dataset:
We'll create a simple regression problem:
- **Input:** 4 features (random numbers)
- **Output:** Sum of all features (with some noise)
- **Task:** Learn to predict the sum from the features

This is a learnable pattern - the model should achieve low loss!
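How low is "low loss" here? Assuming the noise added to the target has standard deviation 0.1 (the value used when the dataset is generated in this part), even a perfect predictor of the sum is left with an MSE of about 0.1² = 0.01. The sketch below estimates that floor; it uses its own random generator and illustrative names so it does not disturb the notebook's seeded random state.

```python
import torch

gen = torch.Generator().manual_seed(0)    # local generator, independent of the global seed

floor_features = torch.randn(10_000, 4, generator=gen)
floor_noise = torch.randn(10_000, 1, generator=gen) * 0.1
floor_labels = floor_features.sum(dim=1, keepdim=True) + floor_noise

# A "perfect" model predicts the exact sum; only the noise remains as error.
ideal_predictions = floor_features.sum(dim=1, keepdim=True)
noise_floor = torch.mean((ideal_predictions - floor_labels) ** 2).item()

print(f"MSE of the ideal predictor: {noise_floor:.4f}")   # close to 0.01
```

A training loss that levels off near 0.01 therefore means the model has learned essentially all of the learnable pattern.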

In [None]:
# ============================================
# STEP 1: GENERATE SYNTHETIC DATASET
# ============================================

print("📊 Generating Synthetic Dataset...\n")

# Generate 1000 training samples
n_samples = 1000
n_features = 4

# Random inputs: drawn from a standard normal distribution (mean 0, std 1)
X_train = torch.randn(n_samples, n_features)

# Target: sum of all features + small noise
# This creates a learnable pattern!
y_train = X_train.sum(dim=1, keepdim=True) + torch.randn(n_samples, 1) * 0.1

print(f"✅ Training Data Created:")
print(f"   Input shape: {X_train.shape}   (1000 samples × 4 features)")
print(f"   Output shape: {y_train.shape} (1000 target values)")
print()
print(f"📋 Example Data:")
print(f"   First sample input: {X_train[0].numpy()}")
print(f"   First sample target: {y_train[0].item():.4f}")
print(f"   Sum of inputs: {X_train[0].sum().item():.4f}")
print(f"   (Target ≈ Sum due to small noise)")
print()

# Generate 200 test samples (for evaluation)
n_test = 200
X_test = torch.randn(n_test, n_features)
y_test = X_test.sum(dim=1, keepdim=True) + torch.randn(n_test, 1) * 0.1

print(f"✅ Test Data Created:")
print(f"   Input shape: {X_test.shape}")
print(f"   Output shape: {y_test.shape}")
print()

In [None]:
# ============================================
# STEP 2: CREATE FRESH MODEL AND OPTIMIZER
# ============================================
# Create a new model for clean training

model = SimpleNet(input_size=4, hidden_size=8, output_size=1)
criterion = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=0.01)

print("🧠 Model Ready for Training")
print(f"   Architecture: 4 → 8 → 1")
print(f"   Parameters: {sum(p.numel() for p in model.parameters())}")
print(f"   Optimizer: Adam (lr=0.01)")
print(f"   Loss Function: MSE")
print()

In [None]:
# ============================================
# STEP 3: TRAINING LOOP
# ============================================

print("="*60)
print("🏋️ TRAINING THE MODEL")
print("="*60)
print()

# Training configuration
num_epochs = 100  # How many times to go through the entire dataset
                  # More epochs = more learning (but risk overfitting)

# List to store loss history for plotting
train_losses = []

# Main training loop
for epoch in range(num_epochs):
    # ============================================
    # THE 4 TRAINING STEPS (repeated each epoch)
    # ============================================
    
    # 1. FORWARD PASS
    predictions = model(X_train)
    
    # 2. COMPUTE LOSS
    loss = criterion(predictions, y_train)
    
    # 3. BACKWARD PASS
    optimizer.zero_grad()  # Clear old gradients
    loss.backward()        # Compute new gradients
    
    # 4. UPDATE WEIGHTS
    optimizer.step()
    
    # Save loss for plotting
    train_losses.append(loss.item())
    
    # Print progress every 10 epochs
    if (epoch + 1) % 10 == 0:
        print(f"Epoch [{epoch+1:3d}/{num_epochs}] | Loss: {loss.item():.4f}")

print()
print("="*60)
print("✅ TRAINING COMPLETE!")
print("="*60)
print()
print(f"📊 Final Training Loss: {train_losses[-1]:.4f}")
print(f"📈 Loss Reduction: {train_losses[0]:.4f} → {train_losses[-1]:.4f}")
print(f"📉 Improvement: {((train_losses[0] - train_losses[-1]) / train_losses[0] * 100):.1f}%")

In [None]:
# ============================================
# STEP 4: VISUALIZE TRAINING PROGRESS
# ============================================

plt.figure(figsize=(12, 5))

# Plot 1: Loss over time
plt.subplot(1, 2, 1)
plt.plot(train_losses, linewidth=2, color='blue')
plt.xlabel('Epoch', fontsize=12)
plt.ylabel('Loss (MSE)', fontsize=12)
plt.title('Training Loss Over Time', fontsize=14, fontweight='bold')
plt.grid(True, alpha=0.3)

# Add annotations
plt.text(0, train_losses[0], f'Start: {train_losses[0]:.3f}', 
         fontsize=10, color='red', ha='left')
plt.text(len(train_losses)-1, train_losses[-1], f'End: {train_losses[-1]:.3f}', 
         fontsize=10, color='green', ha='right')

# Plot 2: Loss (log scale) - better for seeing later improvements
plt.subplot(1, 2, 2)
plt.plot(train_losses, linewidth=2, color='green')
plt.xlabel('Epoch', fontsize=12)
plt.ylabel('Loss (MSE) - Log Scale', fontsize=12)
plt.title('Training Loss (Log Scale)', fontsize=14, fontweight='bold')
plt.yscale('log')
plt.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print("\n📊 How to read the plots:")
print("   → Left: loss typically drops quickly at first, then flattens out")
print("   → Right: the log scale makes small late-stage improvements visible")
print("   ✅ A steadily decreasing curve is a sign of healthy training")

In [None]:
# ============================================
# STEP 5: EVALUATE ON TEST DATA
# ============================================

print("\n🎯 EVALUATING TRAINED MODEL\n")

# Set model to evaluation mode
# This switches layers such as dropout and batch norm to inference behavior
# (our simple model uses neither, but calling eval() is a good habit)
model.eval()

# Disable gradient tracking (saves memory and speeds up inference)
with torch.no_grad():
    # Make predictions on test set
    test_predictions = model(X_test)
    
    # Compute test loss
    test_loss = criterion(test_predictions, y_test)

print(f"📊 Test Set Performance:")
print(f"   Test Loss (MSE): {test_loss.item():.4f}")
print(f"   Training Loss: {train_losses[-1]:.4f}")
print()

# Rough heuristic: treat a test loss within 20% of the training loss as "similar"
if test_loss.item() < train_losses[-1] * 1.2:
    print("✅ Good! Test loss is similar to training loss")
    print("   → Model generalizes well to new data")
else:
    print("⚠️ Test loss is much higher than training loss")
    print("   → Model may be overfitting")

# Show some example predictions
print("\n🔍 Sample Predictions vs Targets:")
print("\n   Prediction  |  Target  |  Error")
print("   " + "="*40)
for i in range(min(5, len(y_test))):  # guard against test sets with fewer than 5 samples
    pred = test_predictions[i].item()
    target = y_test[i].item()
    error = abs(pred - target)
    print(f"   {pred:10.4f}  |  {target:7.4f}  |  {error:6.4f}")

print("\n💡 Interpretation:")
print("   Small errors → the model learned the pattern well!")
print("   If the errors above are small, the model has learned to sum its inputs.")

---
---

# 🎯 PART 6: Challenge Exercise

## 🏆 Your Turn to Code!

Now that you understand PyTorch basics, try this challenge:

### **Challenge: Build and Train a Classifier**

**Task:** Create a neural network that classifies data into 2 categories (binary classification)

### **Requirements:**

1. **Network Architecture:**
   - Input: 10 features
   - Hidden layer 1: 16 neurons + ReLU
   - Hidden layer 2: 8 neurons + ReLU
   - Output: 1 neuron + Sigmoid (for binary classification)

2. **Training Setup:**
   - Use Binary Cross Entropy Loss (`nn.BCELoss()`)
   - Use SGD optimizer (`optim.SGD()`)
   - Train for 50 epochs
   - Use the provided dataset below

3. **Evaluation:**
   - Calculate accuracy on test set
   - Plot training loss
   - Show confusion matrix (optional)
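For the optional confusion matrix, one possible approach is to tally the four outcomes directly with tensor comparisons. The sketch below uses made-up probabilities and labels (not the challenge dataset), and `confusion_counts` is a hypothetical helper name chosen for illustration.

```python
import torch

def confusion_counts(pred_labels, true_labels):
    """Return (TP, TN, FP, FN) counts for binary labels given as 0/1 tensors."""
    pred = pred_labels.bool().flatten()
    true = true_labels.bool().flatten()
    tp = (pred & true).sum().item()    # predicted 1, actually 1
    tn = (~pred & ~true).sum().item()  # predicted 0, actually 0
    fp = (pred & ~true).sum().item()   # predicted 1, actually 0
    fn = (~pred & true).sum().item()   # predicted 0, actually 1
    return tp, tn, fp, fn

# Illustrative values only (assumed, not produced by a trained model)
probs = torch.tensor([[0.9], [0.2], [0.7], [0.4], [0.1]])
labels = torch.tensor([[1.0], [0.0], [0.0], [1.0], [0.0]])
pred_labels = (probs > 0.5).float()

tp, tn, fp, fn = confusion_counts(pred_labels, labels)
print(f"TP={tp} TN={tn} FP={fp} FN={fn}")
```

The four counts always add up to the number of samples, which makes a quick sanity check.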

### **Hints:**
- For binary classification, use `nn.Sigmoid()` as final activation
- BCELoss expects predictions in range [0, 1]
- Accuracy = (correct predictions) / (total predictions)
- Use `(predictions > 0.5).float()` to convert probabilities to binary predictions
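The hints above can be tried out on a handful of hand-written values before wiring them into a real model. This is a minimal sketch with assumed numbers; the variable names are illustrative and not part of the challenge template.

```python
import torch
import torch.nn as nn

# Illustrative values only: outputs after a Sigmoid, and the true 0/1 labels
probabilities = torch.tensor([[0.91], [0.08], [0.62], [0.35]])
targets = torch.tensor([[1.0], [0.0], [0.0], [0.0]])

# BCELoss expects probabilities in [0, 1] and float targets of the same shape
loss = nn.BCELoss()(probabilities, targets)

# Threshold at 0.5 to turn probabilities into hard class predictions
predicted_classes = (probabilities > 0.5).float()

# Accuracy = correct predictions / total predictions
correct = (predicted_classes == targets).sum().item()
accuracy = correct / targets.numel()
print(f"Loss: {loss.item():.4f} | Accuracy: {accuracy:.2%}")
```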

### **Bonus Challenges:**
- Try different learning rates (0.001, 0.01, 0.1)
- Add more hidden layers
- Compare SGD vs Adam optimizer
- Visualize decision boundary (advanced!)
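One way to organize the learning-rate and optimizer comparisons is to wrap training in a function and call it once per configuration. The sketch below does this on a toy regression problem (assumed here, deliberately not the challenge dataset) with a small `nn.Sequential` model. `final_loss` is a hypothetical helper, and the printed numbers depend on the seed and settings.

```python
import torch
import torch.nn as nn
import torch.optim as optim

def final_loss(make_optimizer, num_epochs=50):
    """Train a fresh tiny model and return the loss from the last epoch."""
    torch.manual_seed(0)  # same data and initial weights for every run
    X = torch.randn(200, 4)
    y = X.sum(dim=1, keepdim=True)
    model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))
    criterion = nn.MSELoss()
    optimizer = make_optimizer(model.parameters())

    for _ in range(num_epochs):
        loss = criterion(model(X), y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return loss.item()

results = {}
for lr in (0.001, 0.01, 0.1):
    results[f"SGD  lr={lr}"] = final_loss(lambda params: optim.SGD(params, lr=lr))
    results[f"Adam lr={lr}"] = final_loss(lambda params: optim.Adam(params, lr=lr))

for name, loss_value in results.items():
    print(f"{name:<16} final loss: {loss_value:.4f}")
```

Resetting the seed inside the function keeps the comparison fair: every run starts from the same weights and sees the same data, so only the optimizer settings differ.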

---

### **Dataset Provided:**

In [None]:
# ============================================
# CHALLENGE DATASET (Binary Classification)
# ============================================

# Generate synthetic binary classification data
torch.manual_seed(42)

# Training data
X_train_challenge = torch.randn(500, 10)  # 500 samples, 10 features
# Target: 1 if sum > 0, else 0
y_train_challenge = (X_train_challenge.sum(dim=1) > 0).float().unsqueeze(1)

# Test data
X_test_challenge = torch.randn(100, 10)
y_test_challenge = (X_test_challenge.sum(dim=1) > 0).float().unsqueeze(1)

print("📦 Challenge Dataset Ready!")
print(f"   Training: {X_train_challenge.shape} inputs, {y_train_challenge.shape} labels")
print(f"   Test: {X_test_challenge.shape} inputs, {y_test_challenge.shape} labels")
print(f"\n   Class distribution:")
print(f"   Class 0: {(y_train_challenge == 0).sum().item()} samples")
print(f"   Class 1: {(y_train_challenge == 1).sum().item()} samples")
print("\n💡 Your task: Build a network that predicts the class!")

In [None]:
# ============================================
# YOUR CODE HERE
# ============================================

# Step 1: Define your network class
class BinaryClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        # TODO: Define your layers
        # Hint: Input(10) → Hidden(16) → Hidden(8) → Output(1)
        pass
    
    def forward(self, x):
        # TODO: Implement forward pass
        # Remember: Use Sigmoid at the end!
        pass

# Step 2: Create model, loss, and optimizer
# TODO: Your code here

# Step 3: Training loop
# TODO: Implement training for 50 epochs

# Step 4: Evaluate and plot
# TODO: Calculate accuracy and plot loss

# Good luck! 🍀

---
---

# 📚 Summary: What We Learned Today

## ✅ Part 2: Tensors in PyTorch

**Key Concepts:**
- **Tensors** are multi-dimensional arrays (like NumPy arrays, but with GPU support and automatic differentiation)
- **Creation:** `torch.tensor()`, `torch.zeros()`, `torch.randn()`, etc.
- **Indexing:** Works like NumPy - `tensor[0]`, `tensor[:, 1]`, etc.
- **Reshaping:** Use `.view()` or `.reshape()` to change dimensions (`.view()` requires contiguous memory)
- **Important properties:** `.shape`, `.dtype`, `.device`, `.numel()`

**Key Takeaways:**
- Tensors can be moved to a GPU, which can greatly speed up large computations
- Always check tensor shapes - shape mismatches are one of the most common sources of bugs!
- Use `.view(-1)` to flatten, `.unsqueeze(0)` to add a batch dimension
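The tensor operations summarized above fit in a few lines. This is a small sketch with arbitrary example values; it runs on the CPU and assumes nothing beyond `torch` itself.

```python
import torch

t = torch.arange(12, dtype=torch.float32)   # 1-D tensor: 0, 1, ..., 11
print(t.shape, t.dtype, t.device, t.numel())

m = t.view(3, 4)          # 3 rows x 4 columns (shares memory with t)
print(m[0])               # first row
print(m[:, 1])            # second column

flat = m.view(-1)         # flatten to 1-D; -1 lets PyTorch infer the size
batched = m.unsqueeze(0)  # add a leading batch dimension: (3, 4) -> (1, 3, 4)
print(flat.shape, batched.shape)

# .view() needs contiguous memory; a transpose is not contiguous,
# so .reshape() (which copies when necessary) is the safer choice there
transposed = m.t()
reshaped = transposed.reshape(-1)
print(reshaped.shape)
```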

---

## ✅ Part 3: Building Neural Networks

**Key Concepts:**
- **nn.Module:** Base class for all neural networks
- **Two required methods:**
  - `__init__()`: Define layers
  - `forward()`: Define data flow
- **Common layers:**
  - `nn.Linear()`: Fully-connected layer
  - `nn.ReLU()`: Activation function
  - `nn.Sigmoid()`: Squashes outputs into (0, 1), used for binary classification

**Key Takeaways:**
- Always call `super().__init__()` first, before defining any layers!
- Activation functions add non-linearity (without them, stacked linear layers collapse into a single linear layer)
- Use `.named_parameters()` to inspect learnable parameters
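As a compact illustration of these points, the sketch below defines a small network and inspects its parameters. `TinyNet` is a hypothetical class written for this example (it is not the notebook's `SimpleNet`), using the same 4 → 8 → 1 layout.

```python
import torch
import torch.nn as nn

class TinyNet(nn.Module):  # hypothetical example class
    def __init__(self, input_size=4, hidden_size=8, output_size=1):
        super().__init__()  # must run before layers are assigned
        self.fc1 = nn.Linear(input_size, hidden_size)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        # Data flow: linear -> ReLU -> linear
        return self.fc2(self.relu(self.fc1(x)))

model = TinyNet()
for name, param in model.named_parameters():
    print(f"{name:<12} shape={tuple(param.shape)}  count={param.numel()}")

total = sum(p.numel() for p in model.parameters())
print(f"Total parameters: {total}")

out = model(torch.randn(5, 4))  # batch of 5 samples, 4 features each
print(out.shape)
```

The total is easy to check by hand: (4 × 8 + 8) for the first layer plus (8 × 1 + 1) for the second.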

---

## ✅ Part 4: The Training Loop

**The 4 Sacred Steps:**

```python
predictions = model(inputs)             # 1. FORWARD
loss = criterion(predictions, targets)  # 2. LOSS
optimizer.zero_grad()                   # 3. BACKWARD: clear old gradients,
loss.backward()                         #    then compute new ones
optimizer.step()                        # 4. UPDATE
```

**Key Components:**
- **Loss Function:** Measures prediction error (MSE, CrossEntropy)
- **Optimizer:** Updates weights to reduce loss (Adam, SGD)
- **Epochs:** Number of times to go through entire dataset

**Key Takeaways:**
- **ALWAYS** call `optimizer.zero_grad()` before `loss.backward()`
- Training is just repeating these 4 steps thousands of times!
- Loss should trend downward over time - if it doesn't, check the learning rate, the data, and the order of the steps

---

## 🎯 Critical Things to Remember:

### **The Training Loop Order:**
```python
# ✅ CORRECT ORDER
predictions = model(inputs)             # 1. Forward
loss = criterion(predictions, targets)  # 2. Loss
optimizer.zero_grad()                   # 3. Clear gradients
loss.backward()                         # 4. Compute gradients
optimizer.step()                        # 5. Update weights

# ❌ WRONG - Missing zero_grad()
loss.backward()
optimizer.step()  # Gradients accumulate - model won't learn properly!
```
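To see the accumulation concretely, the following sketch calls `backward()` twice on the same tiny expression without clearing in between. It uses a bare tensor rather than a model or optimizer, purely to keep the example short.

```python
import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)

# First backward pass: d(sum(w^2))/dw = 2w = [2, 4]
(w ** 2).sum().backward()
first = w.grad.clone()

# Second backward pass WITHOUT clearing: new gradients are added to the old ones
(w ** 2).sum().backward()
accumulated = w.grad.clone()

# Clearing first gives the correct gradient again
w.grad.zero_()
(w ** 2).sum().backward()
fresh = w.grad.clone()

print(first, accumulated, fresh)
```

`optimizer.zero_grad()` performs this reset for every parameter the optimizer manages, which is why it belongs in each iteration of the loop.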

### **Common Beginner Mistakes:**

1. **Forgetting `optimizer.zero_grad()`**
   - Gradients accumulate → wrong updates
   
2. **Wrong tensor shapes**
   - Check shapes often: `print(tensor.shape)`
   
3. **Not setting model to eval mode**
   - Use `model.eval()` before testing (and `model.train()` before training again)
   
4. **Using `loss.backward()` on non-scalar tensors**
   - Loss must be a single number
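Mistakes 3 and 4 are easy to reproduce in isolation. The sketch below uses a standalone `nn.Dropout` layer (the notebook's model has no dropout, so this is an assumed stand-in) and a three-element tensor. The exact wording of the error message may vary between PyTorch versions.

```python
import torch
import torch.nn as nn

# Mistake 3: Dropout behaves differently in train and eval mode
layer = nn.Dropout(p=0.5)
x = torch.ones(1000)

layer.train()
train_out = layer(x)   # roughly half the entries are zeroed, the rest scaled to 2.0
layer.eval()
eval_out = layer(x)    # dropout is inactive: output equals input
print((train_out == 0).sum().item(), torch.equal(eval_out, x))

# Mistake 4: backward() needs a scalar loss
w = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
error_message = None
try:
    (w * 2).backward()           # shape (3,), not a scalar
except RuntimeError as err:
    error_message = str(err)
    print("RuntimeError:", error_message)

(w * 2).mean().backward()        # reduce to a scalar first
print(w.grad)                    # each entry is 2/3
```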

---

## 🚀 Next Steps:

**What's coming next:**
- **Week 10:** Convolutional Neural Networks (CNNs) for image processing
- **Week 11:** Transfer learning with pre-trained models
- **Week 12:** Recurrent Neural Networks (RNNs) for sequences

**To practice before next class:**
1. Complete the challenge exercise above
2. Try different network architectures
3. Experiment with learning rates
4. Read PyTorch tutorials: https://pytorch.org/tutorials/

---

## 💡 Pro Tips:

1. **Always check shapes:** `print(tensor.shape)` is your best friend
2. **Start simple:** Get a small network working before making it complex
3. **Monitor loss:** It should trend downward during training
4. **Use GPU when possible:** Can give a large speedup for big models
5. **Read documentation:** PyTorch docs are excellent!
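For tip 4, a common pattern is to pick the device once and move both the model and the data to it. This is a minimal sketch that falls back to the CPU when no GPU is present; the layer sizes are arbitrary.

```python
import torch
import torch.nn as nn

# Pick the GPU if one is available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = nn.Linear(4, 1).to(device)      # move model parameters to the device
inputs = torch.randn(8, 4).to(device)   # inputs must live on the same device

outputs = model(inputs)
print(outputs.shape, outputs.device)

# Move results back to the CPU before converting to NumPy or plotting
outputs_cpu = outputs.detach().cpu()
```

Mixing devices (for example a GPU model with CPU inputs) raises a runtime error, so it helps to move everything in one place near the top of the notebook.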

---

## 📖 Additional Resources:

- **PyTorch Official Tutorials:** https://pytorch.org/tutorials/
- **PyTorch Documentation:** https://pytorch.org/docs/
- **PyTorch Forums:** https://discuss.pytorch.org/
- **Deep Learning with PyTorch (Free Book):** https://pytorch.org/assets/deep-learning/Deep-Learning-with-PyTorch.pdf

---

# 🎉 Congratulations!

You now understand the fundamentals of PyTorch:
- ✅ How to create and manipulate tensors
- ✅ How to build neural networks
- ✅ How to train models using the training loop

**You're ready to build real deep learning applications!** 🚀

Keep practicing, experiment with different architectures, and don't be afraid to make mistakes - that's how we learn!

---

*"The only way to learn PyTorch is to code in PyTorch."*

**Happy coding! 🔥**