


### **PyTorch**

#### **Introduction**

* PyTorch is an **open-source machine learning framework**.
* Mainly used for **developing and training deep learning models**.
* Developed by **Facebook's AI Research Lab** and released in **2016**.
* Offers a **flexible and dynamic approach** to building neural networks.
* Popular among researchers and developers.

#### **Key Features**

1. **Dynamic Computational Graphs**

   * Graphs are **built and modified on-the-fly** as the program runs.
   * Allows for **intuitive and flexible** model development.
   * Supports standard **Python control flow** and easy debugging.

2. **Automatic Differentiation**

   * Efficient computation of **gradients for backpropagation** through the `autograd` engine.

3. **GPU Acceleration**

   * Enables training on **GPUs** to **speed up computations**.

4. **Rich Ecosystem**

   * Built-in utilities for **data loading**, **model building**, **optimization**, and **evaluation**.
   * Backed by a **large and active community** with many tutorials and pre-trained models.

5. **Comparison with TensorFlow**

   * TensorFlow 1.x: uses **static computation graphs** (TensorFlow 2.x runs eagerly by default and compiles graphs with `tf.function`).
   * PyTorch: uses **dynamic graphs** for more **flexibility and ease of use**.
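A minimal sketch of what a dynamic graph means in practice: the graph is recorded while ordinary Python code runs, so a plain `if` statement decides which operations end up in it. The function `f` below is a hypothetical example, not part of PyTorch.

```python
import torch

def f(x):
    # Ordinary Python control flow picks the branch at runtime,
    # and autograd records only the operations that actually execute.
    if x.sum() > 0:
        return (x ** 2).sum()
    return (x ** 3).sum()

x = torch.tensor([1.0, 2.0], requires_grad=True)
y = f(x)        # the sum is 3 > 0, so the x ** 2 branch runs
y.backward()
print(x.grad)   # tensor([2., 4.]), i.e. d/dx of sum(x**2) = 2x
```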

#### **Use in Industry and Research**

* **Widely used in research**.
* Gaining popularity in **industry applications**.
* Provides a **user-friendly platform** for building deep learning models.

---





## 🔥 **PyTorch - In-Depth**

---

### **1. PyTorch Architecture Overview**

* **Core Components**:

  1. **Tensors** – Multidimensional arrays, like NumPy arrays but with GPU support.
  2. **Autograd** – Automatic differentiation engine for backpropagation.
  3. **nn.Module** – Base class for all neural networks.
  4. **torch.optim** – Optimization algorithms (SGD, Adam, etc.).
  5. **Data utilities** – `torch.utils.data.Dataset` & `DataLoader` for handling data.

* **Workflow**:

  * Define model using `nn.Module`
  * Forward pass → loss calculation
  * Backward pass using `autograd`
  * Optimizer updates parameters
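The four workflow steps above map onto a single training step, sketched below. `TinyModel`, the layer sizes, the learning rate, and the random batch are hypothetical choices for illustration only.

```python
import torch
from torch import nn

torch.manual_seed(0)

class TinyModel(nn.Module):  # hypothetical model, for illustration only
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(3, 1)

    def forward(self, x):
        return self.linear(x)

model = TinyModel()
loss_fn = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

x = torch.randn(8, 3)  # synthetic batch: 8 samples with 3 features
y = torch.randn(8, 1)  # synthetic targets

weight_before = model.linear.weight.detach().clone()

pred = model(x)            # forward pass
loss = loss_fn(pred, y)    # loss calculation
optimizer.zero_grad()      # clear gradients left over from earlier steps
loss.backward()            # backward pass: autograd fills each parameter's .grad
optimizer.step()           # optimizer updates the parameters in place

print(torch.equal(weight_before, model.linear.weight))  # False: the weights moved
```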

---

### **2. Tensors in PyTorch**

* Similar to **NumPy arrays**, but can run on **GPU** using `.to("cuda")` or `.cuda()`.

* Created using:

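  ```python
  import torch

  x = torch.tensor([1.0, 2.0])  # from a Python list
  y = torch.zeros(2, 3)         # 2x3 tensor of zeros
  z = torch.rand(4, 4)          # 4x4 tensor of uniform random values in [0, 1)
  ```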

* **Operations**: element-wise, matrix multiplication, reshaping (`.view()` or `.reshape()`), etc.

* **Device control**:

  ```python
  device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
  x = x.to(device)
  ```
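A short sketch of the operations listed above (element-wise product, matrix multiplication, and the difference between `.view()` and `.reshape()`), using small hand-checkable tensors chosen for illustration:

```python
import torch

a = torch.tensor([[1., 2.], [3., 4.]])
b = torch.tensor([[10., 20.], [30., 40.]])

elementwise = a * b  # values [[10, 40], [90, 160]]
matmul = a @ b       # values [[70, 100], [150, 220]]; same as torch.matmul(a, b)

c = torch.arange(6).view(2, 3)  # view() reinterprets contiguous memory as 2 x 3
ct = c.t()                      # the transpose is a non-contiguous view
print(ct.is_contiguous())       # False
flat = ct.reshape(6)            # reshape() copies when needed; ct.view(6) would raise
print(flat)                     # tensor([0, 3, 1, 4, 2, 5])

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
a = a.to(device)                # move to the GPU when one is available
```

In short, `.view()` never copies and therefore needs compatible memory layout, while `.reshape()` returns a view when it can and a copy otherwise.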

---

### **3. Autograd - Automatic Differentiation**

* **`requires_grad=True`** tracks computation for automatic differentiation.

* Builds a **dynamic computation graph** at runtime.

* Example:

  ```python
  x = torch.tensor([2.0], requires_grad=True)
  y = x**2
  y.backward()
  print(x.grad)  # Output: tensor([4.])
  ```

* **`.backward()`** computes gradients.

* Use **`with torch.no_grad():`** to disable gradient tracking during inference.
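A minimal sketch of the difference `torch.no_grad()` makes: the same expression is tracked outside the block and not tracked inside it.

```python
import torch

x = torch.tensor([2.0], requires_grad=True)

y = x ** 2              # tracked: y has a grad_fn
with torch.no_grad():
    z = x ** 2          # not tracked: no graph is recorded

print(y.requires_grad, z.requires_grad)  # True False
print(z.grad_fn)                         # None
```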

---

### **4. `nn.Module` and Model Building**

* Every model in PyTorch is a subclass of `nn.Module`.

#### **Example:**

```python
import torch.nn as nn

class MyModel(nn.Module):
    def __init__(self):
        super(MyModel, self).__init__()
        self.linear = nn.Linear(10, 1)

    def forward(self, x):
        return self.linear(x)
```

* Key methods:

  * `__init__()`: define layers
  * `forward()`: define forward pass
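A sketch of how the `MyModel` class above is used: calling the module instance runs `forward()` (plus any registered hooks), and `parameters()` exposes the learnable tensors. The batch of 4 random samples is an illustrative assumption.

```python
import torch
import torch.nn as nn

class MyModel(nn.Module):
    def __init__(self):
        super().__init__()              # same as super(MyModel, self).__init__()
        self.linear = nn.Linear(10, 1)  # 10 weights + 1 bias

    def forward(self, x):
        return self.linear(x)

model = MyModel()
x = torch.randn(4, 10)   # a batch of 4 samples with 10 features each
out = model(x)           # calling the instance runs forward() and any hooks

print(out.shape)                                   # torch.Size([4, 1])
print(sum(p.numel() for p in model.parameters()))  # 11
```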

---

### **5. Optimizers (torch.optim)**

* PyTorch provides various optimizers:

  * `SGD`, `Adam`, `RMSprop`, etc.

* Example:

  ```python
  optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
  ```

* Steps:

  1. `optimizer.zero_grad()`
  2. `loss.backward()`
  3. `optimizer.step()`
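Why `zero_grad()` comes first: each `backward()` call adds to the existing `.grad` instead of overwriting it. A small sketch with a single hypothetical parameter `w` and the toy loss `3 * w`:

```python
import torch

w = torch.tensor(1.0, requires_grad=True)
optimizer = torch.optim.SGD([w], lr=0.1)

# Without zero_grad(), each backward() call adds to the existing .grad
for _ in range(2):
    loss = 3 * w          # d(loss)/dw = 3
    loss.backward()
accumulated = w.grad.clone()
print(accumulated)        # tensor(6.)

# The usual order: zero_grad, backward, step
optimizer.zero_grad()
loss = 3 * w
loss.backward()           # w.grad is 3 again, not 9
optimizer.step()          # w <- w - lr * grad = 1.0 - 0.1 * 3
print(w.item())           # approximately 0.7
```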

---

### **6. Data Loading Utilities**

* **`Dataset`**: Custom data logic
* **`DataLoader`**: Batches, shuffling, multiprocessing
* Example:

  ```python
  import torch
  from torch.utils.data import DataLoader, Dataset

  class MyDataset(Dataset):
      def __init__(self):
          self.data = torch.randn(100, 10)

      def __len__(self):
          return len(self.data)

      def __getitem__(self, idx):
          return self.data[idx]

  loader = DataLoader(MyDataset(), batch_size=32, shuffle=True)
  ```
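A quick sketch of what the loader yields for this dataset: 100 samples split into batches of 32 give three full batches and one final batch of 4 (pass `drop_last=True` to `DataLoader` to discard that last partial batch).

```python
import torch
from torch.utils.data import DataLoader, Dataset

class MyDataset(Dataset):
    def __init__(self):
        self.data = torch.randn(100, 10)

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        return self.data[idx]

loader = DataLoader(MyDataset(), batch_size=32, shuffle=True)

# 100 samples in batches of 32 -> three full batches and one batch of 4
batch_sizes = [batch.shape[0] for batch in loader]
print(len(loader), batch_sizes)  # 4 [32, 32, 32, 4]
```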

---

### **7. Training Loop Structure**

```python
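for epoch in range(epochs):                 # one epoch = one full pass over the data
    for inputs, targets in dataloader:      # iterate over mini-batches
        outputs = model(inputs)             # forward pass
        loss = criterion(outputs, targets)  # compare predictions with targets

        optimizer.zero_grad()               # clear gradients from the previous step
        loss.backward()                     # backpropagate to compute new gradients
        optimizer.step()                    # update the parameters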
```
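The skeleton above leaves `model`, `criterion`, `optimizer`, `dataloader`, and `epochs` undefined. Below is a self-contained sketch that fills them in with assumed choices: synthetic data following y = 2x + 1 plus noise, a single `nn.Linear(1, 1)` layer, `nn.MSELoss`, and SGD with `lr=0.1`.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Synthetic regression data (an assumption for illustration): y = 2x + 1 plus noise
X = torch.randn(256, 1)
Y = 2 * X + 1 + 0.1 * torch.randn(256, 1)
dataloader = DataLoader(TensorDataset(X, Y), batch_size=32, shuffle=True)

model = nn.Linear(1, 1)
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
epochs = 20

for epoch in range(epochs):
    for inputs, targets in dataloader:
        outputs = model(inputs)             # forward pass
        loss = criterion(outputs, targets)  # compute loss

        optimizer.zero_grad()               # clear old gradients
        loss.backward()                     # backpropagate
        optimizer.step()                    # update parameters

# The learned weight and bias should end up close to 2 and 1
print(model.weight.item(), model.bias.item())
```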

---

### ✅ **Tips for Beginners**

* Use `.to(device)` to move model and tensors to GPU.
* Track gradients only when training (not during inference).
* Use **TensorBoard**, **WandB**, or **Matplotlib** to monitor training.
* Save a model's learned parameters with `torch.save(model.state_dict(), path)` and restore them with `model.load_state_dict(torch.load(path))`.

---




In [1]:
import torch

### Tensors
At its core, PyTorch is a library for processing tensors. A tensor is a number, vector, matrix, or, more generally, an n-dimensional array. Let's create a tensor with a single number.

In [2]:
t1=torch.tensor(6.0)
t1

tensor(6.)

In [3]:
t1.dtype

torch.float32

**Vector**

In [4]:
t2=torch.tensor([1.,2,3,4])
t2

tensor([1., 2., 3., 4.])

**Matrix**

In [5]:
t3=torch.tensor([[5,6,7],
                [8,9,2],
                [1,2,3]])
t3

tensor([[5, 6, 7],
        [8, 9, 2],
        [1, 2, 3]])

In [6]:
t3.shape

torch.Size([3, 3])

**3-Dimensional Array**

In [7]:
t4 = torch.tensor([[[1,2,3],[4,5,6],[8,9,10]],[[9,8,7],[6,5,4],[3,2,1]]])
t4

tensor([[[ 1,  2,  3],
         [ 4,  5,  6],
         [ 8,  9, 10]],

        [[ 9,  8,  7],
         [ 6,  5,  4],
         [ 3,  2,  1]]])

In [8]:
t4.ndim

3

In [9]:
t1

tensor(6.)

In [10]:
t1.shape

torch.Size([])

In [11]:
t1.size()

torch.Size([])

In [12]:
t2.shape

torch.Size([4])

In [13]:
t2.size()

torch.Size([4])

In [14]:
t4.shape

torch.Size([2, 3, 3])

In [15]:
t4.size()

torch.Size([2, 3, 3])
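As the cells above show, `.shape` and `.size()` are interchangeable. A short sketch of a few related accessors on the same `t4` tensor:

```python
import torch

t4 = torch.tensor([[[1, 2, 3], [4, 5, 6], [8, 9, 10]],
                   [[9, 8, 7], [6, 5, 4], [3, 2, 1]]])

print(t4.shape == t4.size())    # True: .shape is an alias for .size()
print(t4.size(0), t4.shape[1])  # 2 3: the size of a single dimension
print(t4.ndim, t4.numel())      # 3 18: number of dimensions, total element count
```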

---

## 🧮 **Tensor Operations and Gradients in PyTorch**

### **1. Creating Tensors**


In [16]:
x = torch.tensor(3.)
w = torch.tensor(4., requires_grad=True)
b = torch.tensor(5., requires_grad=True)

* `x`, `w`, and `b` are all scalar tensors (single float values).
* `w` and `b` have `requires_grad=True`, which tells PyTorch to **track gradients** for them (useful for autograd).



### **2. Arithmetic Operation**

In [17]:
y = w * x + b
y

tensor(17., grad_fn=<AddBackward0>)


* Combines tensors using standard arithmetic.

* Value of `y`:
  $y = w \times x + b = 4 \times 3 + 5 = 17$

* PyTorch automatically tracks this computation for backpropagation.

---

### **3. Gradient Computation using Autograd**

In [18]:
y.backward()

* Computes the **derivatives of `y` w\.r.t. tensors with `requires_grad=True`**, i.e., `w` and `b`.

* This uses PyTorch's **autograd** system (automatic differentiation engine).

---

### ✅ **Key Concept: Autograd**

* PyTorch keeps track of operations using a **dynamic computation graph**.
* `.backward()` triggers the computation of gradients.
* After `.backward()`, you can access gradients via the `.grad` attribute of each tensor created with `requires_grad=True`.

## 🧠 **Viewing Gradients in PyTorch**

### **1. Accessing Gradients**

* PyTorch stores computed gradients in the `.grad` attribute of tensors:

In [19]:
print('dy/dx:', x.grad)  # None
print('dy/dw:', w.grad)  # tensor(3.)
print('dy/db:', b.grad)  # tensor(1.)

dy/dx: None
dy/dw: tensor(3.)
dy/db: tensor(1.)



### **2. Explanation of Gradient Values**

| Derivative | Value  | Reason                                                     |
| ---------- | ------ | ---------------------------------------------------------- |
| dy/dx      | `None` | `x` does **not** have `requires_grad=True`, so no gradient |
| dy/dw      | `3.`   | Gradient of `y = wx + b` w\.r.t. `w` is `x = 3`            |
| dy/db      | `1.`   | Gradient of `y = wx + b` w\.r.t. `b` is `1` (∂y/∂b = 1)    |

---

### ✅ **Key Notes:**

* `.grad` gives **partial derivatives** of output w\.r.t. each tensor with `requires_grad=True`.
* `x.grad` is `None` because we didn’t set `requires_grad=True` for `x`.
* The term **"grad"** is short for **gradient**: the vector of partial derivatives of the output with respect to each input. For a scalar tensor such as `w`, it is simply the derivative.
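`.grad` values accumulate: each call to `.backward()` adds to whatever is already stored rather than overwriting it, which is why later cells in this notebook reset gradients with `.grad.zero_()`. A small sketch reusing the same `x`, `w`, and `b` values:

```python
import torch

x = torch.tensor(3.)
w = torch.tensor(4., requires_grad=True)
b = torch.tensor(5., requires_grad=True)

# Build and backpropagate the same expression twice without clearing .grad
for _ in range(2):
    y = w * x + b
    y.backward()

print(w.grad, b.grad)  # tensor(6.) tensor(2.): each call added 3 to w.grad and 1 to b.grad

# Reset the gradients in place before the next backward pass
w.grad.zero_()
b.grad.zero_()
print(w.grad, b.grad)  # tensor(0.) tensor(0.)
```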

---




## 🧮 PyTorch Tensor Functions

### 🔹 1. Creating a Tensor with a Fixed Value


📌 This creates a **3×2 tensor** where **every element is 42**.

In [20]:
t6 = torch.full((3, 2), 42)
t6

tensor([[42, 42],
        [42, 42],
        [42, 42]])

### 🔹 2. Tensor Concatenation

In [21]:
t_new=torch.tensor([[5,6],
                [8,2],
                [1,2]])
t_new,t6

(tensor([[5, 6],
         [8, 2],
         [1, 2]]),
 tensor([[42, 42],
         [42, 42],
         [42, 42]]))

In [22]:
t7 = torch.cat((t_new, t6))
t7

tensor([[ 5,  6],
        [ 8,  2],
        [ 1,  2],
        [42, 42],
        [42, 42],
        [42, 42]])


📌 `torch.cat()` joins tensors **along the first dimension (rows)** if not specified otherwise.

---

## ✅ Summary:

| Function                 | Description                                           |
| ------------------------ | ----------------------------------------------------- |
| `torch.full(shape, val)` | Creates a tensor filled with the given constant `val` |
| `torch.cat(tensors)`     | Concatenates tensors with compatible shapes           |

🧠 Make sure the tensors have **matching dimensions except along the axis you're concatenating** (default is `dim=0`).
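A sketch contrasting the default `dim=0`, an explicit `dim=1`, and `torch.stack`, which adds a new dimension instead of extending an existing one:

```python
import torch

t_new = torch.tensor([[5, 6], [8, 2], [1, 2]])
t6 = torch.full((3, 2), 42)

print(torch.cat((t_new, t6)).shape)         # torch.Size([6, 2]): rows appended (dim=0)
print(torch.cat((t_new, t6), dim=1).shape)  # torch.Size([3, 4]): columns appended
print(torch.stack((t_new, t6)).shape)       # torch.Size([2, 3, 2]): new leading dimension
```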

---



In [23]:
t8=torch.sin(t7)
t8

tensor([[-0.9589, -0.2794],
        [ 0.9894,  0.9093],
        [ 0.8415,  0.9093],
        [-0.9165, -0.9165],
        [-0.9165, -0.9165],
        [-0.9165, -0.9165]])

In [24]:
t9=torch.reshape(t8,(3,2,2))
t9

tensor([[[-0.9589, -0.2794],
         [ 0.9894,  0.9093]],

        [[ 0.8415,  0.9093],
         [-0.9165, -0.9165]],

        [[-0.9165, -0.9165],
         [-0.9165, -0.9165]]])
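`torch.reshape` only rearranges elements, so the total count must stay the same (here 6 × 2 = 3 × 2 × 2 = 12). A sketch using a stand-in tensor with the same 6 × 2 shape as `t8`, showing `-1` for an inferred dimension and the error raised on a mismatch:

```python
import torch

t = torch.arange(12.).reshape(6, 2)  # stand-in with the same shape as t8 (6 x 2)

print(t.reshape(3, 2, 2).shape)  # torch.Size([3, 2, 2])
print(t.reshape(3, -1).shape)    # torch.Size([3, 4]): -1 is inferred as 12 / 3
print(t.reshape(-1).shape)       # torch.Size([12]): flatten to 1-D

try:
    t.reshape(5, 2)              # 5 * 2 = 10 elements, but t has 12
except RuntimeError as err:
    print("RuntimeError:", err)
```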

You can learn more about tensor operations here: [https://pytorch.org/docs/stable/torch.html](https://pytorch.org/docs/stable/torch.html). Experiment with more tensor functions and operations in new cells.

---

## 🔄 Interoperability with NumPy (PyTorch + NumPy)

### 🔹 What is NumPy?

**NumPy** is a powerful library for numerical computing in Python.

**Used with:**

* **Pandas** – Data analysis
* **Matplotlib** – Visualization
* **OpenCV** – Image processing

**Why use with PyTorch?**

> PyTorch doesn't reinvent the wheel — it works well with NumPy arrays to benefit from Python’s existing data science ecosystem.

---

### 🔸 NumPy Array Creation Example

In [25]:
import numpy as np

x = np.array([[1, 2], [3, 4.1]])
x

array([[1. , 2. ],
       [3. , 4.1]])


### 🔸 Converting NumPy array to PyTorch Tensor

In [26]:
y = torch.from_numpy(x)
y

tensor([[1.0000, 2.0000],
        [3.0000, 4.1000]], dtype=torch.float64)

> ✅ `torch.from_numpy()` **shares memory** with NumPy — meaning changes in one reflect in the other unless explicitly copied.
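A minimal sketch of the memory sharing described above, contrasted with `torch.tensor()`, which always copies its input:

```python
import numpy as np
import torch

a = np.array([1.0, 2.0, 3.0])
shared = torch.from_numpy(a)  # shares a's memory buffer
copied = torch.tensor(a)      # always copies the data

a[0] = 100.0                  # modify the NumPy array in place
print(shared[0].item())       # 100.0 (the change is visible through the tensor)
print(copied[0].item())       # 1.0 (the copy is unaffected)
```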

---

## ✅ Summary Table

| Operation       | PyTorch Code                 | Notes                   |
| --------------- | ---------------------------- | ----------------------- |
| NumPy → PyTorch | `torch.from_numpy(np_array)` | Shares memory (no copy) |
| PyTorch → NumPy | `tensor.numpy()`             | Must be a CPU tensor that does not require grad; shares memory |
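In practice, tensors that take part in training (such as parameters created with `requires_grad=True`) cannot be converted directly. A sketch of the common `.detach().cpu().numpy()` pattern, using a hypothetical random tensor `w`:

```python
import torch

w = torch.randn(2, 3, requires_grad=True)

try:
    w.numpy()                   # refused: w is part of an autograd graph
except RuntimeError as err:
    print("RuntimeError:", err)

arr = w.detach().cpu().numpy()  # drop grad tracking, ensure it is on the CPU, then convert
print(arr.shape, arr.dtype)     # (2, 3) float32
```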

---

In [27]:
x.dtype,y.dtype

(dtype('float64'), torch.float64)

We can convert a PyTorch tensor to a NumPy array using the tensor's `.numpy()` method.

In [28]:
y

tensor([[1.0000, 2.0000],
        [3.0000, 4.1000]], dtype=torch.float64)

In [29]:
z=y.numpy()
z

array([[1. , 2. ],
       [3. , 4.1]])

In [30]:
z.dtype

dtype('float64')


**The interoperability between PyTorch and NumPy is essential** because most datasets you'll work with will likely be read and preprocessed as NumPy arrays.

You might wonder why we need a library like PyTorch at all, since NumPy already provides data structures and utilities for working with multi-dimensional numeric data. There are two main reasons:

1. **Autograd**: The ability to automatically compute gradients for tensor operations is essential for training deep learning models.
2. **GPU support**: When working with massive datasets and large models, PyTorch tensor operations can run efficiently on a Graphics Processing Unit (GPU). Computations that take hours on a CPU can often finish in minutes on a GPU.

---



### Making Training Data

In [31]:
# Input (temp, rainfall, humidity)
inputs = np.array([
    [73, 67, 43],
    [91, 88, 64],
    [87, 134, 58],
    [102, 43, 37],
    [69, 96, 70]
], dtype='float32')


In [32]:
# Targets (apples, oranges)
target = np.array([
    [56, 70],
    [81, 101],
    [119, 133],
    [22, 37],
    [103, 119]
], dtype='float32')


### 🔹 Convert Inputs and Targets to Tensors

In [33]:
inputs = torch.from_numpy(inputs)
target = torch.from_numpy(target)

print(inputs, "\n")
print(target)


tensor([[ 73.,  67.,  43.],
        [ 91.,  88.,  64.],
        [ 87., 134.,  58.],
        [102.,  43.,  37.],
        [ 69.,  96.,  70.]]) 

tensor([[ 56.,  70.],
        [ 81., 101.],
        [119., 133.],
        [ 22.,  37.],
        [103., 119.]])


In [34]:
w=torch.randn(2,3,requires_grad=True)
b=torch.randn(2,requires_grad=True)
w,b

(tensor([[ 0.5965,  0.8186,  0.7135],
         [-1.5023,  1.5681,  0.3915]], requires_grad=True),
 tensor([ 1.4868, -1.7268], requires_grad=True))

In [35]:
M=w.T

In [36]:
inputs.shape,w.T.shape

(torch.Size([5, 3]), torch.Size([3, 2]))

In [37]:
def model1(y):
    return torch.matmul(y, w.T) + b


In [38]:
def model(x):
  return x @ w.T + b
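Written out, `model` (and the equivalent `model1`) computes one linear equation per target fruit. In the sketch below, $X$ is the 5 × 3 `inputs` matrix, $W$ is the 2 × 3 weight matrix `w`, and $\mathbf{b}$ is the 2-element bias `b`; the product $(5 \times 3)(3 \times 2)$ gives a 5 × 2 prediction matrix, and $\mathbf{b}$ is broadcast across its rows. For row $i$:

$$
\hat{Y} = X W^{\top} + \mathbf{b}
$$

$$
\begin{aligned}
\hat{y}_{i,\text{apples}} &= w_{11}\,\text{temp}_i + w_{12}\,\text{rainfall}_i + w_{13}\,\text{humidity}_i + b_1 \\
\hat{y}_{i,\text{oranges}} &= w_{21}\,\text{temp}_i + w_{22}\,\text{rainfall}_i + w_{23}\,\text{humidity}_i + b_2
\end{aligned}
$$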

In [39]:
preds=model(inputs)
preds

tensor([[130.5627,  10.4983],
        [173.4749,  24.6075],
        [204.4653, 100.3995],
        [123.9342, -73.0525],
        [171.1810,  72.5526]], grad_fn=<AddBackward0>)

In [40]:
preds1=model1(inputs)
preds1

tensor([[130.5627,  10.4983],
        [173.4749,  24.6075],
        [204.4653, 100.3995],
        [123.9342, -73.0525],
        [171.1810,  72.5526]], grad_fn=<AddBackward0>)

In [42]:
def MSE(actual,target):
  diff=actual-target
  return torch.sum(diff*diff)/diff.numel()
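The `MSE` function implements mean squared error, $\text{MSE} = \frac{1}{n}\sum_{k=1}^{n}(\hat{y}_k - y_k)^2$, where $n$ is `diff.numel()`, the total number of entries. A sketch comparing it with PyTorch's built-in `torch.nn.functional.mse_loss` (which averages over all elements by default) on a small, hand-checkable example:

```python
import torch
import torch.nn.functional as F

def MSE(actual, target):
    diff = actual - target
    return torch.sum(diff * diff) / diff.numel()

actual = torch.tensor([[1., 2.], [3., 4.]])
target = torch.tensor([[1., 0.], [3., 8.]])

# Differences are 0, 2, 0, -4 -> squares 0, 4, 0, 16 -> mean 20 / 4 = 5
print(MSE(actual, target))         # tensor(5.)
print(F.mse_loss(actual, target))  # tensor(5.) (reduction="mean" is the default)
```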


In [43]:
loss=MSE(preds,target)
loss

tensor(6116.2705, grad_fn=<DivBackward0>)

In [44]:
loss.backward()

In [46]:
print(w,"\n")
print(w.grad)

tensor([[ 0.5965,  0.8186,  0.7135],
        [-1.5023,  1.5681,  0.3915]], requires_grad=True) 

tensor([[ 7279.1094,  7102.8770,  4525.1616],
        [-5712.3613, -4853.7646, -3332.3557]])


In [48]:
w.grad.zero_()
b.grad.zero_()
print(w.grad,"\n")
print(b.grad)

tensor([[0., 0., 0.],
        [0., 0., 0.]]) 

tensor([0., 0.])


In [49]:
preds=model(inputs)
preds

tensor([[130.5627,  10.4983],
        [173.4749,  24.6075],
        [204.4653, 100.3995],
        [123.9342, -73.0525],
        [171.1810,  72.5526]], grad_fn=<AddBackward0>)

In [50]:
loss=MSE(preds,target)
loss

tensor(6116.2705, grad_fn=<DivBackward0>)

In [51]:
loss.backward()

In [52]:
print(w.grad,"\n")
print(b.grad)

tensor([[ 7279.1094,  7102.8770,  4525.1616],
        [-5712.3613, -4853.7646, -3332.3557]]) 

tensor([ 84.5236, -64.9989])


In [53]:
with torch.no_grad():
  w-=w.grad*1e-5
  b-=b.grad*1e-5
  w.grad.zero_()
  b.grad.zero_()
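The manual update above is plain gradient descent, $w \leftarrow w - \eta\,\nabla_w L$ with learning rate $\eta = 10^{-5}$. As a sketch, the snippet below re-creates the notebook's data and `MSE` function and checks that `torch.optim.SGD` with `lr=1e-5` produces the same parameters as the manual update; the names `w1`, `b1`, `w2`, `b2` and the random seed are illustrative assumptions.

```python
import torch

torch.manual_seed(0)
inputs = torch.tensor([[73., 67., 43.], [91., 88., 64.], [87., 134., 58.],
                       [102., 43., 37.], [69., 96., 70.]])
target = torch.tensor([[56., 70.], [81., 101.], [119., 133.],
                       [22., 37.], [103., 119.]])

def MSE(actual, target):
    diff = actual - target
    return torch.sum(diff * diff) / diff.numel()

# Two identical copies of the same random starting parameters
w1 = torch.randn(2, 3, requires_grad=True)
b1 = torch.randn(2, requires_grad=True)
w2 = w1.detach().clone().requires_grad_(True)
b2 = b1.detach().clone().requires_grad_(True)

# Manual update, as in the notebook
loss = MSE(inputs @ w1.T + b1, target)
loss.backward()
with torch.no_grad():
    w1 -= w1.grad * 1e-5
    b1 -= b1.grad * 1e-5

# The same update through torch.optim.SGD
optimizer = torch.optim.SGD([w2, b2], lr=1e-5)
optimizer.zero_grad()
loss = MSE(inputs @ w2.T + b2, target)
loss.backward()
optimizer.step()

print(torch.allclose(w1, w2), torch.allclose(b1, b2))  # True True
```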

In [54]:
print(w,"\n")
print(b)

tensor([[ 0.5238,  0.7476,  0.6682],
        [-1.4452,  1.6166,  0.4248]], requires_grad=True) 

tensor([ 1.4859, -1.7261], requires_grad=True)


In [55]:
preds=model(inputs)
preds

tensor([[118.5434,  19.3540],
        [157.7034,  36.2104],
        [185.9892, 113.8067],
        [111.7801, -63.9052],
        [156.1711,  83.4871]], grad_fn=<AddBackward0>)

In [56]:
loss=MSE(preds,target)
loss

tensor(4374.4385, grad_fn=<DivBackward0>)

In [57]:
epochs = 400
for i in range(epochs):
  preds=model(inputs)
  loss=MSE(preds,target)
  loss.backward()

  with torch.no_grad():
    w-=w.grad*1e-5
    b-=b.grad*1e-5
    w.grad.zero_()
    b.grad.zero_()
  print(f"(Epoch: {i}/{epochs}) &  Loss: {loss}")

(Epoch: 0/400) &  Loss: 4374.4384765625
(Epoch: 1/400) &  Loss: 3197.49853515625
(Epoch: 2/400) &  Loss: 2401.27001953125
(Epoch: 3/400) &  Loss: 1861.6370849609375
(Epoch: 4/400) &  Loss: 1494.958251953125
(Epoch: 5/400) &  Loss: 1244.870361328125
(Epoch: 6/400) &  Loss: 1073.387451171875
(Epoch: 7/400) &  Loss: 954.9134521484375
(Epoch: 8/400) &  Loss: 872.1976318359375
(Epoch: 9/400) &  Loss: 813.6151123046875
(Epoch: 10/400) &  Loss: 771.3317260742188
(Epoch: 11/400) &  Loss: 740.06689453125
(Epoch: 12/400) &  Loss: 716.2627563476562
(Epoch: 13/400) &  Loss: 697.5203857421875
(Epoch: 14/400) &  Loss: 682.2232666015625
(Epoch: 15/400) &  Loss: 669.2815551757812
(Epoch: 16/400) &  Loss: 657.9599609375
(Epoch: 17/400) &  Loss: 647.7632446289062
(Epoch: 18/400) &  Loss: 638.3564453125
(Epoch: 19/400) &  Loss: 629.5144653320312
(Epoch: 20/400) &  Loss: 621.084228515625
(Epoch: 21/400) &  Loss: 612.9625854492188
(Epoch: 22/400) &  Loss: 605.0801391601562
(Epoch: 23/400) &  Loss: 597.3887

In [59]:
preds=model(inputs)
preds

tensor([[ 57.9893,  69.4484],
        [ 81.5940,  97.8730],
        [118.7667, 140.6984],
        [ 24.3170,  33.4585],
        [ 99.2575, 116.0088]], grad_fn=<AddBackward0>)

In [60]:
loss=MSE(preds,target)
loss

tensor(11.4576, grad_fn=<DivBackward0>)

In [61]:
from math import sqrt
sqrt(loss)

3.384905090804598

In [62]:
preds

tensor([[ 57.9893,  69.4484],
        [ 81.5940,  97.8730],
        [118.7667, 140.6984],
        [ 24.3170,  33.4585],
        [ 99.2575, 116.0088]], grad_fn=<AddBackward0>)

In [63]:
target

tensor([[ 56.,  70.],
        [ 81., 101.],
        [119., 133.],
        [ 22.,  37.],
        [103., 119.]])


---

## 🧠 Neural Network using PyTorch

### 🔍 GPU Availability Check

To check whether a GPU is available for training models in Google Colab, run:

```python
# To check GPU availability
!nvidia-smi
```

### 🖥️ Sample Output:

```
Wed May 24 08:25:17 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.85.12    Driver Version: 525.85.12    CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      | MIG M.               |
|===============================+======================+======================|
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   42C    P8     9W /  70W |     3MiB / 15360MiB  |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
```

---

### 📌 Notes:

* `Tesla T4`: the GPU assigned to this Colab session (the GPU type Colab provides can vary between sessions).
* `Memory-Usage`: Only 3 MiB used out of 15,360 MiB (i.e., \~15 GB).
* `GPU-Util`: 0% → GPU is currently **idle**, ready for use.
* **No running processes found** → No model is currently using the GPU.

---

### ✅ Use Case:

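Before training deep learning models in PyTorch, it's good practice to check whether a GPU is available: for large models and datasets, GPU training is typically much faster than CPU training. (Note that the training run below reports `Using cpu device`, so that session did not have a GPU attached.)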
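`nvidia-smi` reports what the machine has; from inside Python, PyTorch itself can confirm that it sees the GPU. A minimal sketch using `torch.cuda.is_available()` and `torch.cuda.get_device_name()`:

```python
import torch

if torch.cuda.is_available():
    device = torch.device("cuda")
    print("GPU found:", torch.cuda.get_device_name(0))
else:
    device = torch.device("cpu")
    print("No CUDA GPU found, using the CPU")

x = torch.ones(2, 2, device=device)  # allocate directly on the chosen device
print(x.device)
```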

---


In [64]:
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets
from torchvision.transforms import ToTensor, Lambda, Compose
import matplotlib.pyplot as plt

In [65]:
# Download training data from open datasets.
training_data = datasets.FashionMNIST(
    root="data",
    train=True,
    download=True,
    transform=ToTensor(),
)

100%|██████████| 26.4M/26.4M [00:01<00:00, 20.7MB/s]
100%|██████████| 29.5k/29.5k [00:00<00:00, 335kB/s]
100%|██████████| 4.42M/4.42M [00:00<00:00, 6.06MB/s]
100%|██████████| 5.15k/5.15k [00:00<00:00, 17.0MB/s]


In [66]:
# Download test data from open datasets.
test_data = datasets.FashionMNIST(
    root="data",
    train=False,
    download=True,
    transform=ToTensor(),
)

In [67]:
type(training_data)

In [68]:
batch_size = 64

# Create data loaders.
train_dataloader = DataLoader(training_data, batch_size=batch_size)
test_dataloader = DataLoader(test_data, batch_size=batch_size)

for X, y in test_dataloader:
    print(f"Shape of X [N, C, H, W]: {X.shape}")
    print(f"Shape of y: {y.shape}, {y.dtype}")
    # print(X)
    # print(y)
    break

Shape of X [N, C, H, W]: torch.Size([64, 1, 28, 28])
Shape of y: torch.Size([64]), torch.int64


In [69]:
# Get cpu or gpu device for training.
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using {device} device")

Using cpu device


In [70]:
# Define model
class NeuralNetwork(nn.Module):
    def __init__(self):
        super(NeuralNetwork, self).__init__()
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28*28, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, 10)
        )

    def forward(self, x):
        x = self.flatten(x)
        logits = self.linear_relu_stack(x)
        return logits

model = NeuralNetwork().to(device)
print(model)

NeuralNetwork(
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (linear_relu_stack): Sequential(
    (0): Linear(in_features=784, out_features=512, bias=True)
    (1): ReLU()
    (2): Linear(in_features=512, out_features=512, bias=True)
    (3): ReLU()
    (4): Linear(in_features=512, out_features=10, bias=True)
  )
)


In [71]:
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
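`nn.CrossEntropyLoss` expects raw, unnormalized scores (logits) and integer class labels; internally it combines `log_softmax` with the negative log-likelihood loss, which is why `NeuralNetwork.forward` returns logits without a softmax. A small check with made-up logits:

```python
import torch
import torch.nn.functional as F
from torch import nn

logits = torch.tensor([[2.0, 0.5, -1.0],
                       [0.1, 0.2, 3.0]])  # made-up scores: 2 samples, 3 classes
labels = torch.tensor([0, 2])             # integer class indices

ce = nn.CrossEntropyLoss()(logits, labels)
manual = F.nll_loss(F.log_softmax(logits, dim=1), labels)

print(ce.item(), manual.item())
print(torch.allclose(ce, manual))  # True
```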

In [72]:
def train(dataloader, model, loss_fn, optimizer):
    size = len(dataloader.dataset)
    model.train()
    for batch, (X, y) in enumerate(dataloader):
        X, y = X.to(device), y.to(device)

        # Compute prediction error
        pred = model(X)
        loss = loss_fn(pred, y)

        # Backpropagation
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        if batch % 100 == 0:
            loss, current = loss.item(), batch * len(X)
            print(f"loss: {loss:>7f}  [{current:>5d}/{size:>5d}]")


In [73]:
def test(dataloader, model, loss_fn):
    size = len(dataloader.dataset)
    num_batches = len(dataloader)
    model.eval()
    test_loss, correct = 0, 0

    with torch.no_grad():
        for X, y in dataloader:
            X, y = X.to(device), y.to(device)
            pred = model(X)
            test_loss += loss_fn(pred, y).item()
            correct += (pred.argmax(1) == y).type(torch.float).sum().item()

    test_loss /= num_batches
    correct /= size
    print(f"Test Error: \n Accuracy: {(100*correct):>0.1f}%, Avg loss: {test_loss:>8f} \n")
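The accuracy line in `test()` compares the index of the largest logit in each row with the true label. A sketch with made-up logits for three samples and three classes (values chosen for illustration):

```python
import torch

logits = torch.tensor([[2.0, 0.1, -1.0],   # argmax 0
                       [0.0, 3.0, 0.5],    # argmax 1
                       [0.2, 0.1, 0.0]])   # argmax 0
y = torch.tensor([0, 2, 0])                # true labels

hits = logits.argmax(1) == y               # tensor([True, False, True])
correct = hits.type(torch.float).sum().item()
print(correct, correct / len(y))           # 2.0 correct, accuracy about 0.667
```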

In [74]:
epochs = 5
for t in range(epochs):
    print(f"Epoch {t+1}\n-------------------------------")
    train(train_dataloader, model, loss_fn, optimizer)
    test(test_dataloader, model, loss_fn)
print("Done!")

Epoch 1
-------------------------------
loss: 2.299036  [    0/60000]
loss: 2.292254  [ 6400/60000]
loss: 2.264594  [12800/60000]
loss: 2.264613  [19200/60000]
loss: 2.256015  [25600/60000]
loss: 2.217301  [32000/60000]
loss: 2.226303  [38400/60000]
loss: 2.189348  [44800/60000]
loss: 2.196368  [51200/60000]
loss: 2.159278  [57600/60000]
Test Error: 
 Accuracy: 48.7%, Avg loss: 2.145223 

Epoch 2
-------------------------------
loss: 2.157396  [    0/60000]
loss: 2.155075  [ 6400/60000]
loss: 2.080441  [12800/60000]
loss: 2.107667  [19200/60000]
loss: 2.055507  [25600/60000]
loss: 1.985938  [32000/60000]
loss: 2.027066  [38400/60000]
loss: 1.935456  [44800/60000]
loss: 1.957864  [51200/60000]
loss: 1.883225  [57600/60000]
Test Error: 
 Accuracy: 57.1%, Avg loss: 1.866807 

Epoch 3
-------------------------------
loss: 1.899794  [    0/60000]
loss: 1.880609  [ 6400/60000]
loss: 1.736744  [12800/60000]
loss: 1.800181  [19200/60000]
loss: 1.681061  [25600/60000]
loss: 1.626066  [32000/600

In [75]:
# Save the model
torch.save(model.state_dict(), "model.pth")
print("Saved PyTorch Model State to model.pth")

Saved PyTorch Model State to model.pth
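A self-contained sketch of the same `state_dict` save/load round trip used by this cell and the load cell that follows. A small `nn.Linear(4, 2)` stands in for `NeuralNetwork`, and the temporary file path is an assumption for illustration.

```python
import os
import tempfile

import torch
from torch import nn

model = nn.Linear(4, 2)  # small stand-in for NeuralNetwork (assumption)
path = os.path.join(tempfile.mkdtemp(), "model.pth")

torch.save(model.state_dict(), path)  # saves only the parameter tensors

restored = nn.Linear(4, 2)            # a fresh instance with the same architecture
result = restored.load_state_dict(torch.load(path))
print(result)                         # <All keys matched successfully>

same = all(torch.equal(p, q) for p, q in zip(model.parameters(), restored.parameters()))
print(same)                           # True
```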


In [76]:
# Load model
model = NeuralNetwork()
# load_state_dict() returns a report of missing/unexpected keys; printing it confirms the match
print(model.load_state_dict(torch.load("model.pth")))

<All keys matched successfully>


In [77]:
# Prediction
classes = [
    "T-shirt/top",
    "Trouser",
    "Pullover",
    "Dress",
    "Coat",
    "Sandal",
    "Shirt",
    "Sneaker",
    "Bag",
    "Ankle boot"
]
model.eval()
x, y = test_data[0][0], test_data[0][1]

In [78]:
with torch.no_grad():
    pred = model(x)
    predicted, actual = classes[pred[0].argmax(0)], classes[y]
    print(f'Predicted: "{predicted}", Actual: "{actual}"')


Predicted: "Ankle boot", Actual: "Ankle boot"
