## **🤖 Phase 3: Deep Learning with PyTorch**
We’ll start by building a neural network that recognizes handwritten digits using the classic MNIST dataset.

This will teach you:

* How neural networks work

* How to use PyTorch (a powerful deep learning library)

* Training, loss, and evaluation

### ✅ Step 1: Install PyTorch (if not already installed)


In [1]:
pip install torch torchvision

Collecting torch
  Downloading torch-2.7.0-cp312-cp312-manylinux_2_28_x86_64.whl.metadata (29 kB)
Collecting torchvision
  Downloading torchvision-0.22.0-cp312-cp312-manylinux_2_28_x86_64.whl.metadata (6.1 kB)
Collecting filelock (from torch)
  Downloading filelock-3.18.0-py3-none-any.whl.metadata (2.9 kB)
Collecting sympy>=1.13.3 (from torch)
  Downloading sympy-1.14.0-py3-none-any.whl.metadata (12 kB)
Collecting fsspec (from torch)
  Downloading fsspec-2025.3.2-py3-none-any.whl.metadata (11 kB)
Collecting nvidia-cuda-nvrtc-cu12==12.6.77 (from torch)
  Downloading nvidia_cuda_nvrtc_cu12-12.6.77-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cuda-runtime-cu12==12.6.77 (from torch)
  Downloading nvidia_cuda_runtime_cu12-12.6.77-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cuda-cupti-cu12==12.6.80 (from torch)
  Downloading nvidia_cuda_cupti_cu12-12.6.80-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata 

### ✅ Step 2: Import Required Libraries

In [1]:
import torch
import torchvision
import torchvision.transforms as transforms
import torch.nn as nn
import torch.optim as optim


1. import torch
    * This imports the core PyTorch library.

    * torch is the core of PyTorch: it provides tensors (similar to NumPy arrays, but with GPU support and automatic differentiation) and the basic operations on them.
2. import torchvision

   torchvision is a companion package that provides:
      * Popular datasets (like MNIST, CIFAR-10)

      * Common image models (like ResNet, AlexNet)

      * Useful tools for working with images
3. import torchvision.transforms as transforms
   
   This module helps you preprocess and augment image data, for example by:

      * Converting images to tensors

      * Normalizing them

      * Randomly flipping, cropping, resizing, etc.
4. import torch.nn as nn

   This is the Neural Network module — it includes layers like:

      * nn.Linear for fully connected layers

      * nn.Conv2d for convolutional layers

      * nn.ReLU, nn.Softmax, etc. for activation functions

   You’ll use nn to build your model architecture.

5. import torch.optim as optim

   This gives you optimizers like:

      * optim.SGD (Stochastic Gradient Descent)

      * optim.Adam (Adaptive Moment Estimation, which adapts the step size for each parameter)

   Optimizers update the model's weights during training, using the gradients of the loss, so that the loss decreases.
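
As a quick sanity check of the imports above, the following minimal sketch exercises `torch`, `nn`, and `optim` on tiny made-up data. The names `t`, `layer`, and `opt` are illustrative only and are not used later in the notebook.

```python
import torch
import torch.nn as nn
import torch.optim as optim

# Tensors support element-wise math, much like NumPy arrays.
t = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
doubled = t * 2
print(torch.__version__)
print(doubled.tolist())

# nn provides layers: this Linear layer maps 4 input features to 2 outputs.
layer = nn.Linear(4, 2)
out = layer(torch.zeros(3, 4))  # a batch of 3 samples
print(out.shape)

# optim provides optimizers that update a module's parameters.
opt = optim.SGD(layer.parameters(), lr=0.1)
print(type(opt).__name__)
```

If this cell runs without errors, the installation from Step 1 is most likely working.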

### ✅ Step 3: Load MNIST Dataset

In [None]:
# Transform to convert images to tensor and normalize
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5,), (0.5,))
])

# Download training and test datasets

# 1. Download & Transform the Training Set
trainset = torchvision.datasets.MNIST(
    root='./data', # folder to save the data
    train=True,    # this is the training set
    download=True, # download it if not already present
    transform=transform  # apply the transform you defined earlier
)

# 2. Load Training Set in Batches
trainloader = torch.utils.data.DataLoader(
    trainset,        # the dataset to load
    batch_size=64,   # number of images in each batch
    shuffle=True     # shuffle data for training randomness
)

# 3. Download & Transform the Test Set
testset = torchvision.datasets.MNIST(
    root='./data',
    train=False,
    download=True,
    transform=transform
)

# 4. Load Test Set in Batches
testloader = torch.utils.data.DataLoader(
    testset,
    batch_size=64,
    shuffle=False    # no need to shuffle for evaluation
)


100%|██████████| 9.91M/9.91M [00:09<00:00, 1.01MB/s]
100%|██████████| 28.9k/28.9k [00:00<00:00, 91.6kB/s]
100%|██████████| 1.65M/1.65M [00:02<00:00, 612kB/s] 
100%|██████████| 4.54k/4.54k [00:00<00:00, 19.2MB/s]


### 🪄 Step-by-Step:
1. transforms.Compose([...])

    Think of this like a pipeline or a to-do list.

    It will apply each step in order to every image you load.

2. transforms.ToTensor()

    Converts the image from a PIL image (or NumPy array) into a PyTorch tensor.

    Also scales pixel values from [0, 255] to [0.0, 1.0].

3. transforms.Normalize((0.5,), (0.5,))

    This step rescales the pixel values from [0, 1] to [-1, 1].

    Each pixel value is transformed as:

    new_value = (original - mean) / std

    With mean = 0.5 and std = 0.5:

    new_value = (original - 0.5) / 0.5

    So:

    * 0 → -1

    * 0.5 → 0

    * 1 → 1

🧠 Centering the inputs around zero generally helps gradient-based training converge faster and more stably.

### 🔎 Why (0.5,)?

   * (0.5,) is a one-element tuple: Normalize expects one mean and one standard deviation per channel, and grayscale images (like MNIST) have only 1 channel.

   * For color images (RGB), you'd use 3 values like (0.5, 0.5, 0.5).
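
The arithmetic above can be reproduced by hand. This is a minimal sketch that applies the same formula with plain tensor operations rather than calling `transforms.Normalize` itself; the sample pixel values and the 2×2 RGB image are made up for illustration.

```python
import torch

# Three sample pixel values as they would look after ToTensor(), i.e. in [0, 1].
pixels = torch.tensor([0.0, 0.5, 1.0])
mean, std = 0.5, 0.5
normalized = (pixels - mean) / std
print(normalized.tolist())  # [-1.0, 0.0, 1.0]

# RGB case: one mean and one std per channel, broadcast over height and width.
rgb = torch.ones(3, 2, 2)  # hypothetical 3-channel 2x2 image, every pixel 1.0
rgb_mean = torch.tensor([0.5, 0.5, 0.5]).view(3, 1, 1)
rgb_std = torch.tensor([0.5, 0.5, 0.5]).view(3, 1, 1)
rgb_normalized = (rgb - rgb_mean) / rgb_std
print(rgb_normalized.shape)  # torch.Size([3, 2, 2])
```

Each channel is shifted and scaled independently, which is why the number of values in the tuple has to match the number of channels.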

### 🧠 What’s MNIST?

   * A dataset of 70,000 handwritten digits (0 through 9)

   * Each image is 28×28 pixels, grayscale

   * A classic dataset for learning image classification

### 🟩 Download training and test datasets
1. Download & Transform the Training Set:

       trainset = torchvision.datasets.MNIST(
           root='./data', # folder to save the data
           train=True,    # this is the training set
           download=True, # download it if not already present
           transform=transform  # apply the transform you defined earlier
       )

   * Downloads the 60,000 training images.

   * Applies your transform to each image when it is loaded:

       * Converts to tensor

       * Normalizes pixel values

2. Load Training Set in Batches:

       trainloader = torch.utils.data.DataLoader(
           trainset,        # the dataset to load
           batch_size=64,   # number of images in each batch
           shuffle=True     # shuffle data for training randomness
       )

   * Breaks the data into mini-batches of 64 images each.

   * shuffle=True reshuffles the data at the start of every epoch, so the model doesn’t see the images in the same order every time.

3. Download & Transform the Test Set:

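       testset = torchvision.datasets.MNIST(
           root='./data',
           train=False,
           download=True, transform=transform
       )

   * This time, train=False loads the 10,000 test images.

4. Load Test Set in Batches:

   * testloader batches the test set the same way, but with shuffle=False, since the order of the images doesn’t matter for evaluation.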
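
To see what a DataLoader actually yields without downloading anything, here is a small sketch that uses random stand-in data shaped like MNIST. The `fake_*` names and the choice of 1,000 samples are assumptions made for this illustration only.

```python
import math
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in data shaped like MNIST: 1000 random 1x28x28 "images" with labels 0-9.
fake_images = torch.rand(1000, 1, 28, 28)
fake_labels = torch.randint(0, 10, (1000,))
fake_set = TensorDataset(fake_images, fake_labels)

loader = DataLoader(fake_set, batch_size=64, shuffle=True)

# Each iteration yields one batch of images and the matching labels.
images, labels = next(iter(loader))
print(images.shape)  # torch.Size([64, 1, 28, 28])
print(labels.shape)  # torch.Size([64])

# Number of batches is ceil(dataset_size / batch_size); the last batch may be smaller.
print(len(loader))            # 16
print(math.ceil(60000 / 64))  # 938 batches for the MNIST training set
print(math.ceil(10000 / 64))  # 157 batches for the MNIST test set
```

The same batch shapes should appear when iterating over `trainloader` and `testloader` from the cell above.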

### ✅ Step 4: Define a Simple Neural Network

Here we define a simple feedforward neural network (also called a fully connected or dense network) that classifies MNIST images (28×28 grayscale digits).

In [3]:
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.fc1 = nn.Linear(28*28, 128)
        self.fc2 = nn.Linear(128, 64)
        self.fc3 = nn.Linear(64, 10)

    def forward(self, x):
        x = x.view(-1, 28*28)  # flatten the image
        x = torch.relu(self.fc1(x))
        x = torch.relu(self.fc2(x))
        x = self.fc3(x)
        return x

net = Net()


1. class Net(nn.Module): 

   * You’re creating your own neural network class that inherits from PyTorch’s nn.Module (which gives you access to lots of helpful features).
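2. def __init__(self):
       super(Net, self).__init__()

    * __init__ is the constructor: it sets up the layers.

    * super(Net, self).__init__() runs the nn.Module constructor, which sets up the bookkeeping PyTorch needs to track the layers and their parameters. In Python 3 it can be shortened to super().__init__().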
3. self.fc1 = nn.Linear(28*28, 128)

   * First fully connected (dense) layer

   * Input: 28×28 = 784 (flattened image)

   * Output: 128 neurons
4. self.fc2 = nn.Linear(128, 64)

   * Second fully connected layer: maps the 128 values from fc1 to 64 neurons
5. self.fc3 = nn.Linear(64, 10)

   * Final layer: maps to 10 output classes (for digits 0–9)
6. def forward(self, x):
       x = x.view(-1, 28*28)

   * Reshapes the input batch from [batch, 1, 28, 28] to [batch, 784]

   * -1 tells PyTorch to infer that dimension from the others, so the batch size is preserved
7. x = torch.relu(self.fc1(x))
   x = torch.relu(self.fc2(x))

   * Applies the ReLU activation after each of the first two layers

   * ReLU (Rectified Linear Unit) computes max(0, x); this non-linearity is what lets the stacked layers learn more than a single linear mapping
8. x = self.fc3(x)

   * The final layer returns raw scores (called logits) for each class; no softmax is applied here because nn.CrossEntropyLoss (Step 5) applies it internally
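
A quick way to check the shapes described above is to push a random batch through the network and count its parameters. This sketch repeats the `Net` class from the cell above so that it runs on its own; the batch of 4 random images (`dummy`) is a made-up input, not MNIST data.

```python
import torch
import torch.nn as nn

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(28 * 28, 128)
        self.fc2 = nn.Linear(128, 64)
        self.fc3 = nn.Linear(64, 10)

    def forward(self, x):
        x = x.view(-1, 28 * 28)      # [batch, 1, 28, 28] -> [batch, 784]
        x = torch.relu(self.fc1(x))  # [batch, 128]
        x = torch.relu(self.fc2(x))  # [batch, 64]
        return self.fc3(x)           # [batch, 10] logits

net = Net()
dummy = torch.rand(4, 1, 28, 28)     # 4 random "images"
logits = net(dummy)
print(logits.shape)                  # torch.Size([4, 10])

# Each Linear layer has in*out weights plus out biases:
# (784*128 + 128) + (128*64 + 64) + (64*10 + 10) = 109386
n_params = sum(p.numel() for p in net.parameters())
print(n_params)                      # 109386
```

Almost all of the parameters sit in the first layer, because it connects every one of the 784 pixels to each of the 128 neurons.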


### ✅ Step 5: Define Loss Function and Optimizer

Next, set up the loss function and the optimizer that will be used to train the network.

In [4]:
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(net.parameters(), lr=0.001)


1. criterion = nn.CrossEntropyLoss()

**🔥 What it does:**

   * Measures how wrong your model's predictions are.

   * Perfect for classification tasks like digit recognition (0–9).

**✅ Why CrossEntropyLoss?**

Because:

* Your model outputs 10 raw scores (one for each class).

* CrossEntropyLoss combines:

    * Log-softmax: turns the scores into log-probabilities.

    * Negative log-likelihood loss: penalizes the model more the less probability it assigns to the correct class.

2. optimizer = optim.Adam(net.parameters(), lr=0.001)

**🔥 What it does:**

 * Updates the model's weights during training to reduce the loss.

**✅ Why Adam?**

 * It’s one of the most widely used optimizers.

 * Combines momentum (as in SGD with momentum) with RMSProp-style per-parameter step sizes.

 * It usually works well with little tuning, which makes it a good default for beginners.

net.parameters():

  * Gives the optimizer access to all the learnable weights in your model.

lr=0.001 (learning rate):

   * Controls how big the steps are when updating weights.

   * 0.001 is the default for Adam in PyTorch and a reasonable starting point.
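
The claim that CrossEntropyLoss bundles a softmax step with a log loss can be checked numerically. The sketch below uses made-up logits for a 3-class problem (not MNIST) and compares `nn.CrossEntropyLoss` against `log_softmax` followed by `nll_loss`.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Made-up logits for 2 samples and 3 classes, with their correct classes.
logits = torch.tensor([[2.0, 0.5, -1.0],
                       [0.1, 0.2, 3.0]])
targets = torch.tensor([0, 2])

# One call on raw logits...
loss_combined = nn.CrossEntropyLoss()(logits, targets)

# ...matches log-softmax followed by negative log-likelihood.
log_probs = F.log_softmax(logits, dim=1)
loss_manual = F.nll_loss(log_probs, targets)
print(torch.allclose(loss_combined, loss_manual))  # True

# The same logits scored against wrong labels give a larger loss.
wrong_targets = torch.tensor([2, 0])
loss_wrong = nn.CrossEntropyLoss()(logits, wrong_targets)
print(loss_wrong.item() > loss_combined.item())    # True
```

This is also why the network's forward method returns raw logits: adding a softmax layer before this loss would apply softmax twice.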

### ✅ Step 6: Train the Model

In [5]:
for epoch in range(5):  # 5 epochs
    running_loss = 0.0
    for images, labels in trainloader:
        optimizer.zero_grad()
        outputs = net(images)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
    
    print(f"Epoch {epoch+1}, Loss: {running_loss:.3f}")


Epoch 1, Loss: 374.266
Epoch 2, Loss: 184.975
Epoch 3, Loss: 133.082
Epoch 4, Loss: 105.000
Epoch 5, Loss: 89.613
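
The numbers printed above are sums of the per-batch losses, which is why they look large. As a rough sketch, dividing by the number of batches gives an average loss per batch; the values in `epoch_sums` are copied from the output above, and the batch count assumes 60,000 training images with batch_size=64.

```python
# Summed losses per epoch, copied from the training output above.
epoch_sums = [374.266, 184.975, 133.082, 105.000, 89.613]

# 60,000 images in batches of 64 -> 938 batches (the last batch is smaller).
num_batches = -(-60000 // 64)  # ceiling division

for epoch, total in enumerate(epoch_sums, start=1):
    print(f"Epoch {epoch}, mean batch loss: {total / num_batches:.3f}")
```

Because the last batch holds fewer images, this is an average over batches rather than an exact average over images, but the difference is small.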


1. for epoch in range(5):

   * You're training the model for 5 full passes through the training data (called "epochs").

2. for images, labels in trainloader:

   *  trainloader gives you batches of data (e.g., 64 images + labels at a time).

   * images: input data (like pictures of digits).

   * labels: the correct answers (e.g., 7, 2, 0, etc.).

3. optimizer.zero_grad()

   * Clears out any old gradients from the previous batch (PyTorch accumulates them by default).

4. outputs = net(images)

   * Runs the images through your model and gets predictions (raw scores).

5. loss = criterion(outputs, labels)

   * Compares predictions to the correct labels.

   * Computes how wrong the model is (the "loss").

6. loss.backward()

   * Backpropagates the loss to compute gradients: how the loss changes with respect to each weight.

7. optimizer.step()

   * Updates the model weights using those gradients.

8. running_loss += loss.item()

   * Adds up the loss from each batch, so the printed value is the sum of the batch losses for the epoch (not an average).
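
To illustrate why optimizer.zero_grad() is needed, here is a minimal sketch with a single made-up weight `w` and a toy loss of 3·w, whose gradient is always 3. The SGD optimizer and the learning rate of 0.1 are chosen only to keep the arithmetic easy to follow.

```python
import torch

w = torch.tensor([1.0], requires_grad=True)
opt = torch.optim.SGD([w], lr=0.1)

# First backward pass: d(3*w)/dw = 3
(3 * w).sum().backward()
first = w.grad.item()
print(first)   # 3.0

# Second backward pass without clearing: the new gradient is added to the old one.
(3 * w).sum().backward()
second = w.grad.item()
print(second)  # 6.0

# zero_grad() clears the stored gradient, so the next backward pass starts fresh.
opt.zero_grad()
(3 * w).sum().backward()
third = w.grad.item()
print(third)   # 3.0

# step() applies the SGD update: w <- w - lr * grad = 1.0 - 0.1 * 3
opt.step()
print(w.item())
```

Without the zero_grad() call, each batch in the training loop would be updated using a mix of its own gradients and those of all earlier batches.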
 

### ✅ Step 7: Evaluate Accuracy

Check how well your model performs on the test dataset (data it has never seen before).

In [6]:
correct = 0
total = 0
with torch.no_grad():
    for images, labels in testloader:
        outputs = net(images)
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

print(f"Test Accuracy: {100 * correct / total:.2f}%")


Test Accuracy: 96.67%


1. with torch.no_grad():

   * This tells PyTorch not to calculate gradients (which saves memory and speeds things up).

   * You don't need gradients during testing — only during training.

2. for images, labels in testloader:

   * You're getting batches of images + their correct labels from the testloader.

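3. outputs = net(images)

   * Runs the images through the model to get a score for each class.

4. _, predicted = torch.max(outputs.data, 1)

   * outputs.data contains the scores for each class (10 numbers per image). Inside torch.no_grad(), plain outputs works just as well.

   * torch.max(..., 1) returns both the highest score and its index along dimension 1; the index is the predicted class (e.g., 3 or 7), and the score itself is discarded via _.

5. total += labels.size(0) and correct += (predicted == labels).sum().item()

   * Count the images seen so far and how many predictions match their labels; accuracy is correct / total.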
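
The accuracy calculation can be traced on a tiny example. This sketch uses three made-up score rows for a 3-class problem instead of real model outputs, so every number can be checked by eye.

```python
import torch

# Made-up scores for 3 samples and 3 classes, with their true labels.
logits = torch.tensor([[0.1, 2.5, 0.3],
                       [1.9, 0.2, 0.4],
                       [0.3, 0.1, 0.2]])
labels = torch.tensor([1, 0, 2])

# torch.max along dim 1 returns (highest score, index of highest score) per row.
values, predicted = torch.max(logits, 1)
print(predicted.tolist())  # [1, 0, 0]

# The indices are the same as what torch.argmax returns.
print(torch.equal(predicted, torch.argmax(logits, dim=1)))  # True

correct = (predicted == labels).sum().item()
total = labels.size(0)
print(f"Accuracy: {100 * correct / total:.2f}%")  # Accuracy: 66.67%
```

The third sample is misclassified (predicted 0, label 2), so 2 of 3 predictions are correct.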