![image.png](https://i.imgur.com/a3uAqnb.png)

# PyTorch Basics
---

# **📌 How Is Data Represented in Deep Learning?**

Deep learning models process data in the form of **tensors** (multi-dimensional arrays).  
The shape of the tensor depends on the type of data being used.

## **🔹 1️⃣ Tabular Data (Structured Data)**
- **Shape:** `(batch_size, features)`
- Each **row** is a sample, and each **column** is a feature.
- **Handled by:** `nn.Linear` (Fully Connected Layers).

## **🔹 2️⃣ Image Data (Computer Vision)**
- **Shape:** `(batch_size, channels, height, width)`
  - **RGB Image:** `channels = 3` (Red, Green, Blue).
  - **Grayscale Image:** `channels = 1` (sometimes omitted).
- **Handled by:** `nn.Conv2d` (Convolutional Layers).

| **Data Type** | **Tensor Shape** | **Handled by** |
|--------------|-----------------|---------------|
| **Tabular Data** | `(batch_size, features)` | `nn.Linear` |
| **Image Data** | `(batch_size, channels, height, width)` | `nn.Conv2d` |

✅ Each data type has a specific tensor representation and requires different processing techniques.
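A minimal sketch of the two representations, using random tensors in place of real data. The batch size 8, the 4 features, and the 6 output channels are arbitrary illustration values:

```python
import torch
import torch.nn as nn

# Tabular data: 8 samples (rows), 4 features (columns) -> (batch_size, features)
tabular = torch.randn(8, 4)
fc = nn.Linear(in_features=4, out_features=2)
print(fc(tabular).shape)  # torch.Size([8, 2])

# Image data: 8 RGB images of size 32x32 -> (batch_size, channels, height, width)
images = torch.randn(8, 3, 32, 32)
conv = nn.Conv2d(in_channels=3, out_channels=6, kernel_size=3)
print(conv(images).shape)  # torch.Size([8, 6, 30, 30])
```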


# **📌 How to Change the Dimensions in PyTorch?**

Manipulating tensor shapes is essential in deep learning. PyTorch provides several functions to modify tensor dimensions.

## **🔹 1️⃣ Flatten**
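- Converts **any shape** to `(batch_size, features)` when called with `start_dim=1` (or via `nn.Flatten()`); plain `.flatten()` collapses everything into 1D.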
- **Example:**  
  `(batch_size, channels, height, width) → (batch_size, features)`

## **🔹 2️⃣ Squeeze**
- **Removes dimensions** with size `1`.
- **Example:**  
  `(1, 32, 3, 28, 28) → (32, 3, 28, 28)`

## **🔹 3️⃣ Unsqueeze**
- **Adds a dimension** with size `1` at a specified position.
- **Example:**  
  `(3, 28, 28) → (1, 3, 28, 28)`

## **🔹 4️⃣ Permute**
- **Reorders the dimensions** of a tensor by specifying the **new order of indices**.
- **Example:**  
  `(32, 28, 28, 3) → permute(0, 3, 1, 2) → (32, 3, 28, 28)`

## **🔹 5️⃣ View (works similarly to reshape)**
- **Reshapes a tensor freely** while maintaining the same number of elements.
- **Example:**  
  `(32, 28, 28, 3) → view(-1, 28*28*3) → (32, 28*28*3)`

| **Operation** | **Function** | **Purpose** | **Example Transformation** |
|--------------|-------------|-------------|----------------------------|
| **Flatten** | `.flatten(start_dim=1)` | Convert tensor to (batch, features) | `(32, 3, 28, 28) → (32, 3*28*28)` |
| **Squeeze** | `.squeeze()` | Remove dims of size 1 | `(1, 3, 28, 28) → (3, 28, 28)` |
| **Unsqueeze** | `.unsqueeze(dim)` | Add a dim of size 1 | `(3, 28, 28) → (1, 3, 28, 28)` |
| **Permute** | `.permute(dims)` | Change order of dimensions | `(32, 28, 28, 3) → (32, 3, 28, 28)` |
| **View** | `.view(shape)` | Reshape freely | `(32, 28, 28, 3) → (32, 28*28*3)` |


In [None]:
# My comments:
# Image data = (batch_size, channels, height, width)
#   batch_size = number of samples, channels = color channels (RGB = 3, grayscale = 1),
#   height and width = size of each image.
# Previously we merged channels, height and width into one feature axis (batch_size, features)
# by flattening, then applied nn.Linear.
# Here we keep that structure: pass the images through Conv2d, then an activation, then pooling,
# then flatten, then classification or regression (a fully connected NN, i.e. the nn.Linear layer).

# batch_size = how many samples we have, features = how many columns
# ---------
# How to handle images (working with dimensions), 5 important functions:
# 1. flatten: converts any shape to (batch_size, features). Use it after all the Conv2d layers,
#    then nn.Linear, then softmax or sigmoid.
# 2. squeeze: removes dimensions of size 1, even the batch_size dimension. If an error shows
#    a shape like (1, 32, 3, 28, 28), squeeze to get rid of the extra 1.
# 3. unsqueeze: adds a dimension of size 1 (the opposite of squeeze).
# 4. permute: reorders the dimensions, e.g. (32, 28, 28, 3) → permute(0, 3, 1, 2)
#    0: keep batch_size first, 3: move channels to second place,
#    1: height goes third, 2: width goes fourth → (32, 3, 28, 28).
#    PyTorch layers such as nn.Conv2d expect (batch_size, channels, height, width),
#    so permute helps arrange the data that way.
# 5. view: reshapes a tensor. Difference from reshape: view always shares memory with the original
#    (and fails if the memory layout does not allow it), while reshape returns a view when possible and a copy otherwise.
#    (32, 28, 28, 3) → view(-1, 28*28*3) → (32, 28*28*3); -1 lets PyTorch infer that dimension
#    (here batch_size), and features = channels*height*width.



In [None]:
import torch

# 1️⃣ Flatten - Convert any shape to (batch_size, features),
x = torch.randn(32, 3, 28, 28)
x_flat = x.flatten(start_dim=1)  # dims are indexed from 0 (batch), so start_dim=1 starts at channels: (batch_size, channels*height*width) = dims (0, 1*2*3) = (samples, features)
print("Flatten:", x_flat.shape)  # (32, 2352)

# 2️⃣ Squeeze - Remove dimensions with size 1
x = torch.randn(1, 3, 28, 28)  # new x with a leading dimension of size 1
x_sq = x.squeeze()
print("Squeeze:", x_sq.shape)  # (3, 28, 28)

# 3️⃣ Unsqueeze - Add a new dimension of size 1
x = torch.randn(3, 28, 28)
x_unsq = x.unsqueeze(0)
print("Unsqueeze:", x_unsq.shape)  # (1, 3, 28, 28)

# 4️⃣ Permute - Reorder dimensions; the batch dimension stays first
x = torch.randn(32, 28, 28, 3)  # (batch, height, width, channels): indices 0=batch, 1=height, 2=width, 3=channels
x_perm = x.permute(0, 3, 1, 2)  # (batch, channels, height, width): channels now come right after batch
print("Permute:", x_perm.shape)  # (32, 3, 28, 28)

# 5️⃣ View - Reshape freely while keeping the same number of elements. Like flatten, but you choose the target shape; reshape does the same job (view never copies, reshape copies only when it has to).
x = torch.randn(32, 28, 28, 3)
x_view = x.view(32,-1)  # Flatten all except batch
print("View:", x_view.shape)  # (32, 28*28*3)


Flatten: torch.Size([32, 2352])
Squeeze: torch.Size([3, 28, 28])
Unsqueeze: torch.Size([1, 3, 28, 28])
Permute: torch.Size([32, 3, 28, 28])
View: torch.Size([32, 2352])
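Two common pitfalls with the functions above, sketched with random tensors: `squeeze()` without an argument also removes a batch dimension of size 1, and `view()` fails on a non-contiguous tensor (for example, right after `permute`), while `reshape()` falls back to copying.

```python
import torch

# Pitfall 1: squeeze() drops *every* size-1 dim, including a batch of size 1
x = torch.randn(1, 1, 28, 28)      # one grayscale image
print(x.squeeze().shape)           # torch.Size([28, 28]): batch AND channel dims are gone
print(x.squeeze(0).shape)          # torch.Size([1, 28, 28]): only the batch dim is removed

# Pitfall 2: permute() returns a non-contiguous tensor, so view() cannot merge its dims
x = torch.randn(32, 28, 28, 3).permute(0, 3, 1, 2)  # (32, 3, 28, 28)
print(x.is_contiguous())           # False
try:
    x.view(32, -1)
except RuntimeError as err:
    print("view failed:", type(err).__name__)

# reshape() works (it copies when needed); so does .contiguous().view()
flat = x.reshape(32, -1)
flat2 = x.contiguous().view(32, -1)
print(flat.shape, torch.equal(flat, flat2))  # torch.Size([32, 2352]) True
```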


### 🔹 Changing Data Type or Moving Data/Model to CPU/GPU  

PyTorch allows you to **change the datatype** of a tensor and **move it between CPU and GPU** using `.to()`.  

---

### ✅ **Change Datatype**
Use `.to(dtype)` to convert a tensor's data type.

In [None]:
import torch

# Create a float32 tensor
x = torch.tensor([1.2, 2.3, 3.4], dtype=torch.float32)  # a tensor is a multi-dimensional array
print(x.dtype)  # Output: torch.float32

# Convert to float16 (equivalently x.half() or x.type(torch.float16); PyTorch tensors have no .astype())
x_half = x.to(torch.float16)  # .to() can change the dtype, or move the tensor between CPU and GPU
# To convert to integers (int64): x.to(torch.long), which truncates the values (1.2 → 1)
print(x_half.dtype)  # Output: torch.float16

torch.float32
torch.float16


### ✅ **Move Tensors to GPU (if available)**
Use `.to(device)` to move a tensor to GPU for faster computation.

**GPUs are usually much faster than CPUs** for training and running inference with deep learning models.  


In [None]:
# Automatically select CPU or GPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Create a tensor and move it to GPU
x_gpu = x.to(device)
print(x_gpu.device)  # Output: cuda:0 (if GPU is available) or cpu

cuda:0


**⚠️ IMPORTANT:** When training a model, always move **both** the model and the data to the same device. Otherwise, you will get an error like this:

`RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!`

This usually means you forgot to call `.to(device)` on the model or on one of the input tensors.
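A minimal sketch of putting the model and the data on the same device. One detail worth knowing: `nn.Module.to()` moves the module's parameters in place (and also returns the module), whereas `Tensor.to()` returns a *new* tensor, so the result must be assigned back. The tiny `nn.Linear(4, 2)` model is only a placeholder:

```python
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = nn.Linear(4, 2)  # placeholder model
model.to(device)         # modules are moved in place

x = torch.randn(8, 4)
x = x.to(device)         # tensors are NOT moved in place: reassign the result

out = model(x)
# Parameters, input and output now all live on the same device
print(next(model.parameters()).device, x.device, out.device)
```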

### 🧠 AI Layers

PyTorch provides various **neural network layers** to build deep learning models. Below are some of the most commonly used layers.

---

## 🔹 1️⃣ Linear Layer (`nn.Linear`)

### 📌 **Usage**
1. Used for **fully connected layers** (Dense layers).
2. Typically used as the **final layer** in CNNs for classification.

In [None]:
# Example

import torch
import torch.nn as nn

# Define a Linear layer
linear_layer = nn.Linear(in_features=5, out_features=3)  # 5 input features → 3 output features

# Random input tensor (batch_size=16, in_features=5)
x = torch.randn(16, 5) #in linear layer batch size first then features
print(x)

# Forward pass
output = linear_layer(x)

print("Input Shape:", x.shape)       # (16, 5)
print("Output Shape:", output.shape)  # (16, 3)

tensor([[-0.3215, -1.5391, -0.4777,  0.3489, -1.0123],
        [ 0.5330, -0.0193, -0.0350, -0.8496, -0.9325],
        [-0.6535,  0.8277, -0.2233,  0.2804,  0.8666],
        [ 0.5195, -1.3002,  0.0297,  0.2345,  0.6802],
        [-0.2172,  2.1970,  1.1002,  0.7686, -0.8665],
        [ 0.0791, -1.1951,  0.0657,  1.0787,  0.4159],
        [-0.2228, -0.2784,  0.8773, -0.2988, -0.8165],
        [ 0.3773, -0.9491, -0.5993,  0.3754,  1.1734],
        [-0.2324,  0.2299,  1.0320, -1.3078,  1.0531],
        [ 0.0797,  0.9410,  0.8499,  0.7984,  0.0718],
        [ 0.7790,  0.2712,  0.4804, -1.1975,  0.0832],
        [ 0.1350, -0.4812,  0.5602, -0.8057,  0.7549],
        [ 0.8890, -0.2294,  0.3046, -0.1650, -0.6700],
        [-0.9126,  0.3510, -0.3198, -1.0591, -0.4240],
        [ 0.4036, -1.5656, -1.3853,  1.4013, -0.2497],
        [-0.3393, -0.5641, -1.4072, -0.3465,  0.6024]])
Input Shape: torch.Size([16, 5])
Output Shape: torch.Size([16, 3])
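To see what `nn.Linear` actually computes: it applies `y = x @ W.T + b` to the last dimension, where the weight `W` has shape `(out_features, in_features)`. A quick check with the same sizes as above:

```python
import torch
import torch.nn as nn

linear_layer = nn.Linear(in_features=5, out_features=3)
x = torch.randn(16, 5)

print(linear_layer.weight.shape)  # torch.Size([3, 5]) -> (out_features, in_features)
print(linear_layer.bias.shape)    # torch.Size([3])

# Same result computed by hand
manual = x @ linear_layer.weight.T + linear_layer.bias
print(torch.allclose(linear_layer(x), manual, atol=1e-6))  # True
```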


## 🔹 2️⃣ Convolutional Layer (`nn.Conv2d`)


###  📌 **Usage**
✅ `nn.Conv2d` is used for **feature extraction** in images.  
🚫 It **does not perform classification or regression**—you need a `nn.Linear` layer for that.  


In [None]:
# Example

import torch
import torch.nn as nn

# Define a Conv2D layer
conv_layer = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3)  # in_channels must match the input (RGB → 3, grayscale → 1); out_channels = number of filters, a design choice
# CNNs usually increase the channel count layer by layer (e.g. 16 → 32 → 64), whereas stacked nn.Linear layers usually shrink toward the output size.

# Random input tensor (batch_size=16, channels=3, height=32, width=32)
x = torch.randn(16, 3, 32, 32)

# Forward pass
output = conv_layer(x)

print("Input Shape:", x.shape)       # (16, 3, 32, 32)
print("Output Shape:", output.shape)  # (16, 16, 30, 30) = (batch_size, out_channels, height, width); 32 - 3 + 1 = 30 from the output-size formula


Input Shape: torch.Size([16, 3, 32, 32])
Output Shape: torch.Size([16, 16, 30, 30])


# **📌 Important Note**
When you use **Convolutional layers (`nn.Conv2d`)**, the **output channels tend to be larger** than the input channels (unlike `nn.Linear`).  

### **Why?**
- Each convolutional layer **extracts more useful features** from the input.
- As more filters are applied, the **number of channels increases**.
- Meanwhile, **spatial size (height & width) decreases**.

✅ **This allows the model to capture richer features while reducing unnecessary spatial details.**


The **output image size** after a convolutional layer is calculated using the formula (or you can get it by trial-and-error):

![image.png](https://i.imgur.com/8XKBFBU.jpeg)
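For reference, the standard form of this formula (as in the PyTorch `nn.Conv2d` documentation, with dilation = 1) is `out = floor((in + 2*P - K) / S) + 1`, where `K` is the kernel size, `P` the padding and `S` the stride; with the default `ceil_mode=False`, `nn.MaxPool2d` follows the same formula. A small sketch with a hypothetical helper `conv_output_size`, cross-checked against real layers:

```python
import torch
import torch.nn as nn

def conv_output_size(size, kernel_size, padding=0, stride=1):
    """Output height/width of Conv2d or MaxPool2d (assumes dilation=1, ceil_mode=False)."""
    return (size + 2 * padding - kernel_size) // stride + 1

# Conv2d(kernel_size=3), no padding: 32 -> 30
print(conv_output_size(32, kernel_size=3))             # 30
# Conv2d(kernel_size=3, padding=1): "same" size, 32 -> 32
print(conv_output_size(32, kernel_size=3, padding=1))  # 32
# MaxPool2d(kernel_size=2, stride=2): 30 -> 15
print(conv_output_size(30, kernel_size=2, stride=2))   # 15

# Cross-check against actual PyTorch layers
x = torch.randn(1, 3, 32, 32)
print(nn.Conv2d(3, 16, kernel_size=3)(x).shape)        # torch.Size([1, 16, 30, 30])
print(nn.MaxPool2d(kernel_size=2, stride=2)(x).shape)  # torch.Size([1, 3, 16, 16])
```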

## 🔹 3️⃣ Pooling (`nn.MaxPool2d` / `nn.AvgPool2d`)

###  📌 **Usage**
- **Reduces spatial dimensions** (height & width) while retaining important features.
- Typically used **after Conv2D layers** to downsample feature maps.

In [None]:
import torch
import torch.nn as nn

# Define layers
conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, stride=1, padding=1)  # padding=1 keeps height/width ("same" padding)
relu = nn.ReLU()  # An old friend ;) Why do we need it here? To add non-linearity.
pool = nn.MaxPool2d(kernel_size=2, stride=2)  # halves height and width

# Sample input tensor (batch_size=2, channels=3, height=32, width=32)
x = torch.randn(2, 3, 32, 32)

"""
1 2 3 4
1 2 3 4
1 2 3 4
1 2 3 4
"""
# 🔹 Step 1: Convolution
x = conv(x)
print("After Conv2D:", x.shape)  # (2, 16, 32, 32): channels 3 → 16; padding=1 keeps 32x32

# 🔹 Step 2: ReLU Activation
x = relu(x)
print("After ReLU:", x.shape)  # (2, 16, 32, 32)

# 🔹 Step 3: Pooling (Reduces spatial size)
x = pool(x)
print("After Pooling:", x.shape)  # (2, 16, 16, 16): pooling halves height and width

# Summary:
# Conv2d  : extracts features; here it increases the number of channels (3 → 16)
# Pooling : reduces the spatial size (height, width)
# ReLU    : adds non-linearity

After Conv2D: torch.Size([2, 16, 32, 32])
After ReLU: torch.Size([2, 16, 32, 32])
After Pooling: torch.Size([2, 16, 16, 16])
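The heading also mentions `nn.AvgPool2d`, but the example above only uses max pooling. A small side-by-side on a hand-made 4x4 feature map (values chosen for illustration): max pooling keeps the largest value of each 2x2 window, average pooling keeps their mean.

```python
import torch
import torch.nn as nn

# One 4x4 feature map, shaped (batch=1, channels=1, height=4, width=4)
fmap = torch.tensor([[1., 2., 3., 4.],
                     [1., 2., 3., 4.],
                     [1., 2., 3., 4.],
                     [1., 2., 3., 4.]]).reshape(1, 1, 4, 4)

max_out = nn.MaxPool2d(kernel_size=2, stride=2)(fmap)
avg_out = nn.AvgPool2d(kernel_size=2, stride=2)(fmap)

print(max_out.squeeze())  # values: [[2., 4.], [2., 4.]]
print(avg_out.squeeze())  # values: [[1.5, 3.5], [1.5, 3.5]]
```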


### **But wait a minute, if Conv2D extracts the features only, how should we do the classification/regression? 🤔**

Conv2D layers are responsible for extracting features (edges, textures, patterns, etc.) from the input data, but they **do not perform classification or regression directly**.  

To classify or predict, we need to **map the extracted features** to the desired output using **fully connected layers (`nn.Linear`)** after flattening the feature maps.

---

### **🏗️ CNN Structure Example**
A typical CNN model for classification or regression follows this pattern:

1️⃣ **`nn.Conv2d`** → Extracts features.  
2️⃣ **`nn.MaxPool2d`** → Reduces feature map size to focus on important information.  
3️⃣ **`nn.Conv2d`** → Extracts more features.  

... Add more layers if you want

4️⃣ **Flatten & `nn.Linear`** → Maps extracted features to output classes or predictions.


![image.png](https://i.imgur.com/fwNdXJs.jpeg)


---

### **🔹 Example: Classification with CNN**
| **Layer**                  | **Purpose**                        | **Example Shape Transformation** |
|----------------------------|-------------------------------------|-----------------------------------|
| **Input Image**            | Raw input                          | `(batch_size, channels, height, width)` |
| **`nn.Conv2d`**            | Extract features                   | `(32, 3, 32, 32) → (32, 16, 30, 30)` |
| **`nn.MaxPool2d`**         | Downsample feature maps            | `(32, 16, 30, 30) → (32, 16, 15, 15)` |
| **`nn.Conv2d`**            | Extract more features              | `(32, 16, 15, 15) → (32, 32, 13, 13)` |
| **Flatten**                | Prepare for fully connected layers | `(32, 32, 13, 13) → (32, 32*13*13)` |
| **`nn.Linear`**            | Map features to classes/predictions| `(32, 32*13*13) → (32, 10)` (for 10 classes) |
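The table above can be written as a single `nn.Sequential` and checked with a random batch. The `nn.ReLU` layers are an addition (the table leaves activations out, but the earlier examples put one after each convolution); the layer sizes follow the table:

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3),   # (32, 3, 32, 32)  -> (32, 16, 30, 30)
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2),       # (32, 16, 30, 30) -> (32, 16, 15, 15)
    nn.Conv2d(16, 32, kernel_size=3),  # (32, 16, 15, 15) -> (32, 32, 13, 13)
    nn.ReLU(),
    nn.Flatten(),                      # (32, 32, 13, 13) -> (32, 5408)
    nn.Linear(32 * 13 * 13, 10),       # (32, 5408)       -> (32, 10)
)

x = torch.randn(32, 3, 32, 32)
# Print the shape after every layer
for layer in model:
    x = layer(x)
    print(f"{layer.__class__.__name__:>10}: {tuple(x.shape)}")
```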




In [None]:
### STEP2:Model class: ###


import torch
import torch.nn as nn

#### Define layers
conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3)  # Conv2D layer
relu = nn.ReLU()  # Activation function (does not change the shape)
pool = nn.MaxPool2d(kernel_size=2)  # Pooling layer
flatten = nn.Flatten()  # Flatten layer to prepare for Linear
linear = nn.Linear(16 * 15 * 15, 10)  # Fully connected layer for classification: in_features = channels*height*width, 10 classes (0 to 9)
# 16 * 15 * 15 = channels * height * width after pooling
softmax = nn.Softmax(dim=1)  # Another old friend ;) softmax for multi-class outputs (sigmoid for binary or multi-label outputs)

# Sample input tensor (batch_size=2, channels=3, height=32, width=32)
x = torch.randn(2, 3, 32, 32)


#### USE THE LAYERS

# 🔹 Step 1: Convolution
x = conv(x)
print("After Conv2D:", x.shape)  # (2, 16, 30, 30): channels 3 → 16
# Height/width shrink from 32 to 32 - 3 + 1 = 30 because kernel_size=3 and padding defaults to 0.
# To keep the size at 32x32, pass padding=1 to nn.Conv2d.


# 🔹 Step 2: ReLU Activation
x = relu(x)
print("After ReLU:", x.shape)  # (2, 16, 30, 30)

# 🔹 Step 3: Pooling (Reduces spatial size)
x = pool(x)
print("After Pooling:", x.shape)  # (2, 16, 15, 15)

# 🔹 Step 4: Flatten (Convert to batch_size, features)
x = flatten(x)
print("After Flatten:", x.shape)  # (2, 16*15*15)

# 🔹 Step 5: Fully Connected Layer
x = linear(x)  # in_features (16*15*15) must equal the flattened size from the previous step
# Tip: if you are unsure of that size, temporarily put a wrong number here; the shape-mismatch error message reports the actual flattened size
print("After Linear (Logits):", x.shape)  # (2, 10) → 10 values, one per class

# 🔹 Step 6: Softmax (convert logits, the raw (2, 10) scores from the linear layer, into probabilities)
x = softmax(x)  # each row now sums to 1: one probability per class
# For regression we would skip softmax and typically use MSE loss. For multi-class classification we use
# nn.CrossEntropyLoss, which applies (log-)softmax internally, so a separate softmax is not needed for training.
print("After Softmax (Probabilities):", x[0])  # Probabilities for each class


After Conv2D: torch.Size([2, 16, 30, 30])
After ReLU: torch.Size([2, 16, 30, 30])
After Pooling: torch.Size([2, 16, 15, 15])
After Flatten: torch.Size([2, 3600])
After Linear (Logits): torch.Size([2, 10])
After Softmax (Probabilities): tensor([0.0890, 0.0669, 0.1922, 0.1198, 0.1254, 0.0540, 0.0289, 0.0952, 0.1651,
        0.0636], grad_fn=<SelectBackward0>)
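Instead of provoking a shape-mismatch error to find the `in_features` of the first `nn.Linear` (the trick described in the comments above), a common alternative is to run one dummy sample through the convolutional part and read the flattened size. A sketch reusing the same layer settings:

```python
import torch
import torch.nn as nn

features = nn.Sequential(
    nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2),
    nn.Flatten(),
)

# One dummy sample with the real input size (channels=3, height=32, width=32)
with torch.no_grad():
    n_features = features(torch.zeros(1, 3, 32, 32)).shape[1]
print(n_features)  # 3600 (= 16 * 15 * 15)

classifier = nn.Linear(n_features, 10)
```

PyTorch also provides `nn.LazyLinear(out_features)`, which infers `in_features` on the first forward pass.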


# **Great! Now you know how to build a CNN model!**

However, PyTorch projects usually follow a common structure to organize the workflow.

---

## **📌 PyTorch Workflow Organization**

### **It consists of 4 main components:**
1️⃣ **Dataset Class**  
- Handles loading and preprocessing data.  
- Converts raw data (e.g., images, CSVs) into model-ready tensors.  

2️⃣ **Model Class**  
- Defines the architecture of your neural network (e.g., layers, activations).  

3️⃣ **Training Loop**  
- Updates model weights using backpropagation and optimizers.  
- Computes the loss for every batch and adjusts the parameters to minimize it.  

4️⃣ **Validation Loop**  
- Evaluates the model's performance on a validation set.  
- Does not update weights but computes metrics like accuracy or loss.  

---

### **📌 Note:**
All the labs will follow this structure. You will just modify the content for different tasks, such as changing datasets, architectures, or loss functions.

# **📌 Dataset Class**

- The **Dataset Class** is designed to **load and preprocess only one sample** at a time.
- The **DataLoader** uses the Dataset Class to load **multiple samples (batches)**.

## **1️⃣ It Could Be Ready-to-Use:**

In [None]:
#Ready data

from torchvision.datasets import MNIST

train_dataset = MNIST(root="./data", train=True, download=True)
test_dataset = MNIST(root="./data", train=False, download=True)

# Another ready-to-use dataset: CIFAR-10 (32x32 RGB images, 10 classes)

from torchvision.datasets import CIFAR10

train_dataset = CIFAR10(root="./data", train=True, download=True)
test_dataset = CIFAR10(root="./data", train=False, download=True)

Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz
Failed to download (trying next):
HTTP Error 404: Not Found

Downloading https://ossci-datasets.s3.amazonaws.com/mnist/train-images-idx3-ubyte.gz
Downloading https://ossci-datasets.s3.amazonaws.com/mnist/train-images-idx3-ubyte.gz to ./data/MNIST/raw/train-images-idx3-ubyte.gz


100%|██████████| 9.91M/9.91M [00:00<00:00, 16.5MB/s]


Extracting ./data/MNIST/raw/train-images-idx3-ubyte.gz to ./data/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz
Failed to download (trying next):
HTTP Error 404: Not Found

Downloading https://ossci-datasets.s3.amazonaws.com/mnist/train-labels-idx1-ubyte.gz
Downloading https://ossci-datasets.s3.amazonaws.com/mnist/train-labels-idx1-ubyte.gz to ./data/MNIST/raw/train-labels-idx1-ubyte.gz


100%|██████████| 28.9k/28.9k [00:00<00:00, 496kB/s]


Extracting ./data/MNIST/raw/train-labels-idx1-ubyte.gz to ./data/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz
Failed to download (trying next):
HTTP Error 404: Not Found

Downloading https://ossci-datasets.s3.amazonaws.com/mnist/t10k-images-idx3-ubyte.gz
Downloading https://ossci-datasets.s3.amazonaws.com/mnist/t10k-images-idx3-ubyte.gz to ./data/MNIST/raw/t10k-images-idx3-ubyte.gz


100%|██████████| 1.65M/1.65M [00:00<00:00, 4.47MB/s]


Extracting ./data/MNIST/raw/t10k-images-idx3-ubyte.gz to ./data/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz
Failed to download (trying next):
HTTP Error 404: Not Found

Downloading https://ossci-datasets.s3.amazonaws.com/mnist/t10k-labels-idx1-ubyte.gz
Downloading https://ossci-datasets.s3.amazonaws.com/mnist/t10k-labels-idx1-ubyte.gz to ./data/MNIST/raw/t10k-labels-idx1-ubyte.gz


100%|██████████| 4.54k/4.54k [00:00<00:00, 11.0MB/s]

Extracting ./data/MNIST/raw/t10k-labels-idx1-ubyte.gz to ./data/MNIST/raw






In [None]:
from torchvision import transforms

# Define a transform to convert a PIL image to a tensor (models need tensors as input)
transform = transforms.ToTensor()

# Get the first sample
image, label = train_dataset[0]

# Convert PIL image to tensor
image_tensor = transform(image)

# Print the shape of the tensor
print("Image Tensor Shape:", image_tensor.shape)  # torch.Size([1, 28, 28]): 1 channel, so MNIST images are grayscale
print("Label:", label)

Image Tensor Shape: torch.Size([1, 28, 28])
Label: 5


## **2️⃣ Or You Have to Define It Yourself (We will explore how to define it in another lab).**
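As a preview, a minimal sketch of a hand-written Dataset, using a hypothetical class `RandomImageDataset` that serves random tensors instead of real files. `__getitem__` returns exactly one `(image, label)` pair, and `DataLoader` stacks those samples into batches:

```python
import torch
from torch.utils.data import Dataset, DataLoader

class RandomImageDataset(Dataset):
    """Hypothetical dataset: random 1x28x28 'images' with labels 0-9."""

    def __init__(self, n_samples=100):
        self.images = torch.randn(n_samples, 1, 28, 28)
        self.labels = torch.randint(0, 10, (n_samples,))

    def __len__(self):
        # Number of samples in the dataset
        return len(self.labels)

    def __getitem__(self, idx):
        # Load (and, in a real dataset, preprocess) ONE sample
        return self.images[idx], self.labels[idx]

dataset = RandomImageDataset()
image, label = dataset[0]
print(image.shape, label.shape)    # torch.Size([1, 28, 28]) torch.Size([])

loader = DataLoader(dataset, batch_size=32, shuffle=True)
images, labels = next(iter(loader))
print(images.shape, labels.shape)  # torch.Size([32, 1, 28, 28]) torch.Size([32])
```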

# **📌 Model Class**
---

## **📌 Key Components:**
1️⃣ **Define Layers (`__init__` method):**  
- Use PyTorch modules (e.g., `nn.Conv2d`, `nn.Linear`) to create the model's architecture.  

2️⃣ **Forward Pass (`forward` method):**  
- Specify how the input should be fed through the layers step by step.  

---

## **1️⃣ You may define It Yourself:**

In [None]:
import torch
import torch.nn as nn

class CustomModel(nn.Module):
    def __init__(self):
        """
        1️⃣ Define all layers in the model.
        """
        super(CustomModel, self).__init__()

        # Convolutional Layer + Activation + Pooling
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=16, kernel_size=3, stride=1, padding=1)  # out_channels is a design choice (often 16 or 32); padding=1 keeps the 28x28 size
        # Activation function (adds non-linearity)
        self.relu = nn.ReLU()

        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)

        # Fully Connected Layer
        #What is the expected size after doing the maxpooling? Check above to know the original size
        self.fc = nn.Linear(16 * 14 * 14, 10)  # Output 10 classes,#14*14 using the formula

        # Softmax layer: multi-class classification, so convert logits to probabilities
        self.softmax = nn.Softmax(dim=1)

    def forward(self, x):  # here the layers defined above are actually used
        """
        2️⃣ Define the forward pass (how data flows through the model).
        """
        x = self.conv1(x)  # Convolution: (batch, 16, 28, 28)
        x = self.relu(x)  # Activation function
        x = self.pool(x)  # Pooling: (batch, 16, 14, 14)
        x = x.view(-1, 16 * 14 * 14)  # Flatten before the fully connected layer; -1 infers the batch size
        # 16 = out_channels of conv1; 14x14 = spatial size after pooling (from the formula)
        x = self.fc(x)  # Fully connected layer
        x = self.softmax(x)  # Convert logits to probabilities (note: nn.CrossEntropyLoss expects raw logits)
        return x  # Returns probability distribution
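One caveat about the softmax at the end of `forward`: `nn.CrossEntropyLoss` (used in the full training process below) expects raw logits and applies `log_softmax` internally, so with this model softmax is effectively applied twice. Training still works, but the usual pattern is to return logits and apply softmax only when probabilities are needed. A sketch of that variant (`LogitsModel` is a hypothetical name) plus a check of what the loss computes:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LogitsModel(nn.Module):
    """Same layers as CustomModel, but forward() returns raw logits."""

    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 16, kernel_size=3, stride=1, padding=1)
        self.relu = nn.ReLU()
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)
        self.fc = nn.Linear(16 * 14 * 14, 10)

    def forward(self, x):
        x = self.pool(self.relu(self.conv1(x)))
        x = torch.flatten(x, start_dim=1)  # (batch, 16*14*14)
        return self.fc(x)                  # logits: no softmax here

model = LogitsModel()
images = torch.randn(4, 1, 28, 28)
labels = torch.tensor([0, 3, 7, 9])

logits = model(images)
loss = nn.CrossEntropyLoss()(logits, labels)

# CrossEntropyLoss == negative log-likelihood of log_softmax(logits)
manual = F.nll_loss(F.log_softmax(logits, dim=1), labels)
print(torch.allclose(loss, manual, atol=1e-6))  # True

# Probabilities (e.g. for display) are computed only when needed
probs = torch.softmax(logits, dim=1)
```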


## **2️⃣ Or It Could Be Given Ready-to-Use (Will explore in another lab):**

In [None]:
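# Example: pretrained model (weights API of torchvision >= 0.13; older versions used pretrained=True)
import torch
import torchvision.models as models
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)



### For our custom model:
# 1: create an object, 2: call it on a batch (e.g. inside the training loop)
model = CustomModel()  # add constructor arguments if any
x = torch.randn(2, 1, 28, 28)  # dummy batch: 2 grayscale 28x28 images
pred = model(x)  # pred.shape: torch.Size([2, 10])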



# **📌 Training Loop**

## **What is the Training Loop?**
The **training loop** is responsible for **updating the model's weights** so that it learns to minimize the loss function.


In [None]:
def train_one_epoch(model, dataloader, criterion, optimizer, device):
    model.train()  # Training mode: enables dropout, and batch norm uses batch statistics (updating its running stats)
    total_loss = 0

    # loop for steps ####
    for images, labels in dataloader:

        images, labels = images.to(device), labels.to(device)  # move the batch to the same device as the model (e.g. GPU)


        outputs = model(images)  # Forward pass
        loss = criterion(outputs, labels)  # Compute loss


        # Backpropagation steps (from stage 2)
        optimizer.zero_grad()  # Reset gradients from the previous step (PyTorch accumulates them by default)
        loss.backward()  # Backpropagation (compute gradients)
        optimizer.step()  # Update model parameters using the gradients just computed by loss.backward()

        # Collect the loss
        total_loss += loss.item()

    return total_loss / len(dataloader)  # Return average loss





# Terminology (example: the data is split into 5 batches):
# Step (iteration): one batch passing through the loop (forward pass, backward pass, weight update).
# Epoch: one full pass over ALL batches, i.e. 5 steps here. Training for 20 epochs means 20 * 5 = 100 steps;
# with shuffle=True, the samples are reshuffled into new batches at the start of every epoch.
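The step/epoch arithmetic can be checked directly on a `DataLoader`. A sketch with a made-up dataset of 300 samples (random tensors wrapped in `TensorDataset`) and `batch_size=64`, which happens to give exactly 5 batches:

```python
import math
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical dataset: 300 samples with 4 features each, plus integer labels
dataset = TensorDataset(torch.randn(300, 4), torch.randint(0, 2, (300,)))
loader = DataLoader(dataset, batch_size=64, shuffle=True)

steps_per_epoch = len(loader)                 # ceil(300 / 64) = 5 batches
print(steps_per_epoch, math.ceil(300 / 64))   # 5 5

# Batch sizes within one epoch: four full batches, then the remainder
print([x.shape[0] for x, y in loader])        # [64, 64, 64, 64, 44]

num_epochs = 20
print("total steps:", num_epochs * steps_per_epoch)  # total steps: 100
```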


# **📌 Validation Loop**

## **What is the Validation Loop?**
- The **validation loop** is used to **evaluate model performance** on unseen data.  
- Unlike the training loop, **it does NOT update the model’s weights**.  
- It helps track **loss and accuracy** to monitor model improvements.

In [None]:
#### Doesn't update the weights, we can calculate the accuracy

def validate(model, dataloader, criterion, device):  # criterion is unused here; kept for a consistent signature
    model.eval()  # Evaluation mode: disables dropout, and batch norm uses its running statistics
    correct, total = 0, 0
    with torch.no_grad():  # Disable gradient calculation
        for images, labels in dataloader:  # each iteration yields one batch of images and their labels
            images, labels = images.to(device), labels.to(device)  # data must be on the same device as the model


            outputs = model(images)
            correct += (outputs.argmax(dim=1) == labels).sum().item()
            total += labels.size(0)


    return 100 * correct / total  # Return accuracy

# Another way: also accumulate the loss inside the loop
# (total_loss += criterion(outputs, labels).item()), then return both values:
#     avg_loss = total_loss / len(dataloader)
#     accuracy = 100 * correct / total
#     return avg_loss, accuracy



#### Any Dataset class is designed to read one sample at a time.
#### DataLoader wraps the Dataset and reads many samples at once, e.g. batch_size=64 (larger batches use more GPU memory, so reduce it if you get an out-of-memory error),
# and shuffles the training data only.

#  EX
# train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)
# test_loader = DataLoader(test_dataset, batch_size=64, shuffle=False)
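The commented-out alternative above can be written out in full. A sketch of a validation function that returns both the average loss and the accuracy (the name `validate_with_loss` is hypothetical), tried on a tiny random model and dataset:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

def validate_with_loss(model, dataloader, criterion, device):
    model.eval()                  # eval mode: dropout off, batch norm uses running stats
    total_loss, correct, total = 0.0, 0, 0
    with torch.no_grad():         # no gradients needed for evaluation
        for images, labels in dataloader:
            images, labels = images.to(device), labels.to(device)
            outputs = model(images)
            total_loss += criterion(outputs, labels).item()
            correct += (outputs.argmax(dim=1) == labels).sum().item()
            total += labels.size(0)
    avg_loss = total_loss / len(dataloader)  # average loss per batch
    accuracy = 100 * correct / total         # accuracy in percent
    return avg_loss, accuracy

# Tiny smoke test with random data (hypothetical sizes)
device = torch.device("cpu")
model = nn.Linear(4, 3).to(device)
data = TensorDataset(torch.randn(50, 4), torch.randint(0, 3, (50,)))
loader = DataLoader(data, batch_size=16)
avg_loss, accuracy = validate_with_loss(model, loader, nn.CrossEntropyLoss(), device)
print(f"loss={avg_loss:.3f}, accuracy={accuracy:.1f}%")
```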

# **📌 Full Training Process in PyTorch**

Now that you understand the **Dataset Class, Model Class, Training Loop, and Validation Loop**, it's time to put everything together into a **full training process**.


In [None]:
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, Dataset
import torchvision.transforms as transforms
from torchvision.datasets import MNIST

# Put the data in a DataLoader
# (why? covered in another lab)
# 🔹 Load MNIST Dataset
transform = transforms.ToTensor()
train_dataset = MNIST(root="./data", train=True, transform=transform, download=True)
test_dataset = MNIST(root="./data", train=False, transform=transform, download=True)

train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=64, shuffle=False)


#Model class :
#Train loop : images and labels on to(device)
#Validation loop: images and labels on to(device)


# Run Training
model = CustomModel()  # defined in a previous cell
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")  # the same device the data will be moved to
model.to(device)  # move the model too, like the images and labels in the training and validation loops

criterion = nn.CrossEntropyLoss()  # loss for multi-class classification
optimizer = optim.Adam(model.parameters(), lr=0.0008)




#loop for epochs #####
for epoch in range(5):  # Train for 5 epochs
    train_one_epoch(model, train_loader, criterion, optimizer, device)
    accuracy = validate(model, test_loader, criterion, device)
    print(f"Epoch {epoch+1}: Validation Accuracy = {accuracy:.2f}%")


Epoch 1: Validation Accuracy = 92.26%
Epoch 2: Validation Accuracy = 94.50%
Epoch 3: Validation Accuracy = 96.33%
Epoch 4: Validation Accuracy = 96.84%
Epoch 5: Validation Accuracy = 97.27%


![image.png](https://i.imgur.com/1xbDOQX.jpeg)

### Contributed by: Mohamed Eltayeb
