# PyTorch implementation

In [1]:
!pip install numpy torch

In [1]:
import torch

# Tensors

At its core, PyTorch is a library for processing tensors. A tensor is a number, a vector, a matrix, or any n-dimensional array.


In [2]:
# Let's create a tensor with a single number
t1 = torch.tensor(4.8)
t1

tensor(4.8000)

In [3]:
# Vector tensor - 1D

t2 = torch.tensor([1,2,3,4,5.])
t2

tensor([1., 2., 3., 4., 5.])

In [4]:
# Matrix - 2D

t3 = torch.tensor([ [1,2,3,100], [4,5,6,200], [7,8,9,300] ])
t3

tensor([[  1,   2,   3, 100],
        [  4,   5,   6, 200],
        [  7,   8,   9, 300]])

In [5]:
# 3D tensor

t4 = torch.tensor([
    [
       [1,2,100],
       [3,4,200],
    ],
    [
        [5,6,1000],
        [7,8,2000],
    ]
])
t4

tensor([[[   1,    2,  100],
         [   3,    4,  200]],

        [[   5,    6, 1000],
         [   7,    8, 2000]]])

**NOTE:** Tensors can have any number of dimensions and a different length along each dimension.

We can inspect the length along each dimension using the ```.shape``` property of a tensor.

In [6]:
print(t1)
print("\nt1 shape:", t1.shape)

tensor(4.8000)

t1 shape: torch.Size([])


In [7]:
print(t2)
t2.shape

tensor([1., 2., 3., 4., 5.])


torch.Size([5])

In [8]:
print(t3)
t3.shape

tensor([[  1,   2,   3, 100],
        [  4,   5,   6, 200],
        [  7,   8,   9, 300]])


torch.Size([3, 4])

In [9]:
print(t4)
t4.shape

tensor([[[   1,    2,  100],
         [   3,    4,  200]],

        [[   5,    6, 1000],
         [   7,    8, 2000]]])


torch.Size([2, 2, 3])

In [10]:
# 3D tensor with an improper (ragged) shape

t5 = torch.tensor([
    [
       [1,2],
       [3,4,200],
    ],
    [
        [5,6,1000],
        [7,8,2000],
    ]
])
t5

# It is not possible to create tensors with an improper shape

ValueError: expected sequence of length 2 at dim 2 (got 3)

# Tensor operations and gradients

We can combine tensors with the usual arithmetic operations. Let's look at an example.

In [11]:
# Create tensors

x = torch.tensor(5.)
y = torch.tensor(10, requires_grad= True)
z = torch.tensor(15, requires_grad= True)

x, y, z

RuntimeError: Only Tensors of floating point and complex dtype can require gradients

In [15]:
x = torch.tensor(5.)
y = torch.tensor(10., requires_grad= True)
z = torch.tensor(15., requires_grad= True)

# requires_grad=True on y and z means autograd tracks them, so we can compute derivatives w.r.t. y and z but not x
x, y, z

(tensor(5.), tensor(10., requires_grad=True), tensor(15., requires_grad=True))

## 1.) Arithmetic operations


In [16]:
a = x * y + z
a

tensor(65., grad_fn=<AddBackward0>)

## 2.) Autograd (automatic gradients)

Note: What makes PyTorch especially useful is that it can automatically compute the derivative of "a" w.r.t. the tensors that have ```requires_grad = True```, i.e. here y and z. This feature is called *Autograd*.

To compute the derivatives, we can invoke the ```.backward``` method on our result "a".

In [17]:
a.backward()

NOTE: The derivatives of a w.r.t. input tensors are stored in the ```.grad``` property of the respective tensors.

In [18]:
y.grad

tensor(5.)

In [19]:
z.grad

tensor(1.)

In [20]:
# Compute derivatives [a = x * y + z]

print("da/dx", x.grad)
print("da/dy", y.grad)      # x = 5
print("da/dz", z.grad)      # 1

da/dx None
da/dy tensor(5.)
da/dz tensor(1.)


In [21]:
# Let's print the gradients once more.

print("da/dx", x.grad)
print("da/dy", y.grad)
print("da/dz", z.grad)

# No change occurred: the stored values are the same because backward() was not called again

da/dx None
da/dy tensor(5.)
da/dz tensor(1.)


### Note:

1. ```da/dy``` has the same value as ```x```, i.e. 5.

2. ```da/dz``` has the value ```1```.

3. ```da/dx``` is None because x doesn't have ```requires_grad=True```.

The ```grad``` in y.grad is short for gradient, which is another term for derivative.

The term gradient is primarily used while dealing with vectors and matrices.
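One detail worth seeing in code is that ```.grad``` accumulates: each call to ```.backward()``` adds to the stored value instead of overwriting it. The sketch below reuses the same ```x```, ```y``` and ```z``` values as above; the names ```first```, ```second``` and ```after_reset``` are only illustrative.

```python
import torch

x = torch.tensor(5.)
y = torch.tensor(10., requires_grad=True)
z = torch.tensor(15., requires_grad=True)

# First backward pass: da/dy = x = 5 and da/dz = 1
a = x * y + z
a.backward()
first = (y.grad.item(), z.grad.item())

# Rebuild the expression and call backward again:
# the new gradients are added to the values already stored in .grad
a = x * y + z
a.backward()
second = (y.grad.item(), z.grad.item())

# Reset the stored gradients in place before the next computation
y.grad.zero_()
z.grad.zero_()
after_reset = (y.grad.item(), z.grad.item())

print(first, second, after_reset)
```

This accumulation is why gradients are usually reset to zero between training steps.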

# Tensor functions

Apart from arithmetic operations, the torch module also contains many functions for creating and manipulating tensors.

Let's look at some examples.

## 1.) Tensor with a fixed value for all elements

In [22]:
t6 = torch.full((4,5), 8)
t6

tensor([[8, 8, 8, 8, 8],
        [8, 8, 8, 8, 8],
        [8, 8, 8, 8, 8],
        [8, 8, 8, 8, 8]])

In [23]:
t7 = torch.full((4,5), 0)
t7

tensor([[0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0]])

## 2.) Concatenate 2 tensors with compatible shapes

In [24]:
t8 = torch.cat((t6, t7))
t8

tensor([[8, 8, 8, 8, 8],
        [8, 8, 8, 8, 8],
        [8, 8, 8, 8, 8],
        [8, 8, 8, 8, 8],
        [0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0]])

In [25]:
t3

tensor([[  1,   2,   3, 100],
        [  4,   5,   6, 200],
        [  7,   8,   9, 300]])

In [26]:
t9 = torch.full((3,4), 2)
t9

tensor([[2, 2, 2, 2],
        [2, 2, 2, 2],
        [2, 2, 2, 2]])

In [27]:
t10 = torch.cat((t3, t9))
t10

tensor([[  1,   2,   3, 100],
        [  4,   5,   6, 200],
        [  7,   8,   9, 300],
        [  2,   2,   2,   2],
        [  2,   2,   2,   2],
        [  2,   2,   2,   2]])
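What "compatible shapes" means depends on the dimension along which the tensors are joined. As a small sketch (the tensors ```left``` and ```right``` are made up for illustration), ```torch.cat``` joins along ```dim=0``` by default, which requires equal column counts, while ```dim=1``` places tensors side by side and requires equal row counts:

```python
import torch

left = torch.full((3, 4), 8)
right = torch.full((3, 2), 0)

# dim=1: same number of rows (3), columns are appended -> shape (3, 6)
side_by_side = torch.cat((left, right), dim=1)
print(side_by_side.shape)

# dim=0 (the default): column counts 4 and 2 differ, so this fails
mismatch_error = None
try:
    torch.cat((left, right))
except RuntimeError as err:
    mismatch_error = str(err)
print("cat along dim=0 failed:", mismatch_error is not None)
```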

## 3.) Calculate the Sine of each element

In [28]:
t11 = torch.sin(t10)
t11

tensor([[ 0.8415,  0.9093,  0.1411, -0.5064],
        [-0.7568, -0.9589, -0.2794, -0.8733],
        [ 0.6570,  0.9894,  0.4121, -0.9998],
        [ 0.9093,  0.9093,  0.9093,  0.9093],
        [ 0.9093,  0.9093,  0.9093,  0.9093],
        [ 0.9093,  0.9093,  0.9093,  0.9093]])

## 4.) Change the shape of a tensor

In [29]:
t12 = t11.reshape(3, 2, 4)
t12

tensor([[[ 0.8415,  0.9093,  0.1411, -0.5064],
         [-0.7568, -0.9589, -0.2794, -0.8733]],

        [[ 0.6570,  0.9894,  0.4121, -0.9998],
         [ 0.9093,  0.9093,  0.9093,  0.9093]],

        [[ 0.9093,  0.9093,  0.9093,  0.9093],
         [ 0.9093,  0.9093,  0.9093,  0.9093]]])
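Reshaping only works when the total number of elements stays the same (here 6 x 4 = 3 x 2 x 4 = 24). A short sketch with a made-up tensor ```t``` shows this constraint, along with the ```-1``` placeholder that lets PyTorch infer one dimension:

```python
import torch

t = torch.arange(24)

a = t.reshape(3, 2, 4)   # 3 * 2 * 4 = 24 elements
b = t.reshape(6, -1)     # -1 is inferred as 24 / 6 = 4
print(a.shape, b.shape)

# 5 * 5 = 25 does not match 24 elements, so reshape raises an error
reshape_failed = False
try:
    t.reshape(5, 5)
except RuntimeError:
    reshape_failed = True
print("reshape(5, 5) failed:", reshape_failed)
```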

# Interoperability with NumPy

Instead of reinventing the wheel, PyTorch interoperates well with NumPy to leverage its existing ecosystem of tools and libraries.

In [30]:
# Create an array in Numpy

import numpy as np

x = np.array([ [1,2],[3,4] ])
x

array([[1, 2],
       [3, 4]])

**NOTE:** We can convert a NumPy array to a PyTorch tensor using ```torch.from_numpy(numpy_array_name)```.

We can also convert a tensor back to a NumPy array by calling its ```.numpy()``` method, i.e. ```tensor_name.numpy()```.

In [31]:
# Converting a Numpy array to a torch tensor

y = torch.from_numpy(x)
y

tensor([[1, 2],
        [3, 4]])

In [32]:
x.dtype, y.dtype

(dtype('int64'), torch.int64)

In [33]:
# Converting a tensor to a NumPy array

z = y.numpy()
z

array([[1, 2],
       [3, 4]])
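One consequence of this conversion is easy to miss: ```torch.from_numpy``` shares memory with the source array, whereas ```torch.tensor``` makes a copy. The sketch below (variable names ```arr```, ```shared``` and ```copied``` are illustrative) shows the difference:

```python
import numpy as np
import torch

arr = np.array([[1, 2], [3, 4]])

shared = torch.from_numpy(arr)  # shares memory with arr
copied = torch.tensor(arr)      # independent copy of the data

# Modify the NumPy array in place
arr[0, 0] = 100

print(shared[0, 0].item())  # the shared tensor sees the change
print(copied[0, 0].item())  # the copy keeps the old value
```

If the array will be modified later and the tensor should stay unchanged, copying is the safer choice.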

# Why Interoperability?

- The interoperability between NumPy and PyTorch is essential because most datasets you'll work with will likely be read and preprocessed as NumPy arrays.

- You might wonder why we need a library like PyTorch at all, since NumPy already provides data structures and utilities for working with multi-dimensional numeric data. There are 2 main reasons -

1. **Autograd:** The ability to automatically compute gradients for tensor operations is essential for training deep learning models.

2. **GPU Support:** While working with massive datasets and large models, PyTorch tensor operations can be performed efficiently using a GPU.

- Computations that might typically take hours can be completed within minutes using GPUs.
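As a minimal sketch of the GPU point, assuming nothing about the machine beyond an installed PyTorch, tensors can be created on or moved to a device, and operations then run on that device. The code falls back to the CPU when no CUDA GPU is available:

```python
import torch

# Pick the GPU if one is available, otherwise stay on the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"

a = torch.ones(2, 3, device=device)      # created directly on the device
b = torch.full((2, 3), 2.0).to(device)   # created on the CPU, then moved

# Both operands live on the same device, so the addition runs there
total = (a + b).sum()
print(device, total.item())
```

Operands must be on the same device; mixing a CPU tensor with a GPU tensor in one operation raises an error.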

# Linear Regression from Scratch using PyTorch

In [34]:
import numpy as np
import torch

In [35]:
# Making training data
# (temp, rainfall, humidity)

INPUTS = np.array([
    [70,60,42],
    [90,50,40],
    [87,134,58],
    [100,43,37],
    [69,96,70]],
    dtype = 'float32')

In [39]:
# Targets -> (apples, oranges)
TARGET = np.array([
    [60, 70],
    [50,40],
    [134,58],
    [43,37],
    [96,70]],
    dtype = 'float32')

In [40]:
# Convert both the inputs and the targets to tensors

INPUT = torch.from_numpy(INPUTS)
TARGET = torch.from_numpy(TARGET)

print("inputs: ", INPUT)
print("\ntarget: ", TARGET)


inputs:  tensor([[ 70.,  60.,  42.],
        [ 90.,  50.,  40.],
        [ 87., 134.,  58.],
        [100.,  43.,  37.],
        [ 69.,  96.,  70.]])

target:  tensor([[ 60.,  70.],
        [ 50.,  40.],
        [134.,  58.],
        [ 43.,  37.],
        [ 96.,  70.]])


In [50]:
# Initializing weights and biases

w = torch.randn(2,3 , requires_grad=True)
b = torch.randn(2, requires_grad=True)

print("Weight: ", w)
print("\nBiases: ", b)

Weight:  tensor([[ 1.3838, -2.4968,  0.8480],
        [-0.7035,  0.1487,  1.2486]], requires_grad=True)

Biases:  tensor([0.8560, 0.0715], requires_grad=True)


In [51]:
#define the model

def model(x):
  return x @ w.t() + b
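To make the shapes in ```x @ w.t() + b``` concrete, here is a small worked sketch with hand-picked weights (these values are chosen for easy arithmetic and are not the random weights above). An input of shape (N, 3) times the transposed (2, 3) weight matrix gives (N, 2), and the bias of shape (2,) is broadcast across the rows:

```python
import torch

inputs = torch.tensor([[70., 60., 42.],
                       [90., 50., 40.]])   # shape (2, 3)

# Hand-picked weights: the first output copies the first feature,
# the second output adds the second and third features
w = torch.tensor([[1., 0., 0.],
                  [0., 1., 1.]])           # shape (2, 3)
b = torch.tensor([0.5, -0.5])              # shape (2,)

preds = inputs @ w.t() + b                 # (2, 3) @ (3, 2) + (2,) -> (2, 2)
print(preds)
# Row 0: [70 + 0.5, 60 + 42 - 0.5] = [70.5, 101.5]
# Row 1: [90 + 0.5, 50 + 40 - 0.5] = [90.5, 89.5]
```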

# NOTE: From here on, I did not follow the lecture

In [53]:
# prediction
# The model needs the tensor INPUT, not the NumPy array INPUTS
preds = model(INPUT)
print(preds)

tensor([[ -16.4715,   12.1895],
        [  34.4764,   -5.8652],
        [-164.1422,   31.2112],
        [  63.2480,  -17.6872],
        [ -83.9969,   53.2079]], grad_fn=<AddBackward0>)

In [54]:
print(TARGET)

tensor([[ 60.,  70.],
        [ 50.,  40.],
        [134.,  58.],
        [ 43.,  37.],
        [ 96.,  70.]])


In [55]:
# Loss function MSE

def MSE(actual, target):
  diff = actual - target
  return torch.sum(diff * diff) / diff.numel()


In [57]:
# Error

loss = MSE(TARGET, preds)
print("loss: ", loss)

loss:  tensor(13722.2441, grad_fn=<DivBackward0>)

In [58]:
# Compute Gradients

loss.backward()

In [59]:
print("w: ", w, "\n")
print("w.grad: ", w.grad)

w:  tensor([[ 1.3838, -2.4968,  0.8480],
        [-0.7035,  0.1487,  1.2486]], requires_grad=True) 

w.grad:  tensor([[-25850.0859, -37034.7266, -19785.3574],
        [-10279.5635,  -7989.1128,  -5409.1655]])


In [60]:
print("b: ", b, "\n")
print("b.grad: ", b.grad)

b:  tensor([0.8560, 0.0715], requires_grad=True) 

b.grad:  tensor([-329.9316, -121.1663])


In [61]:
# Reset grad

w.grad.zero_()
b.grad.zero_()

print(w.grad)
print(b.grad)

tensor([[0., 0., 0.],
        [0., 0., 0.]])
tensor([0., 0.])

In [64]:
# Adjust parameters

preds = model(INPUT)
print(preds)

loss = MSE(TARGET, preds)
print("\nLoss: ", loss)

loss.backward()
print("\nw.grad:", w.grad)
print("\nb.grad:", b.grad)


tensor([[ -16.4715,   12.1895],
        [  34.4764,   -5.8652],
        [-164.1422,   31.2112],
        [  63.2480,  -17.6872],
        [ -83.9969,   53.2079]], grad_fn=<AddBackward0>)

Loss:  tensor(13722.2441, grad_fn=<DivBackward0>)

w.grad: tensor([[-25850.0859, -37034.7266, -19785.3574],
        [-10279.5635,  -7989.1128,  -5409.1655]])

b.grad: tensor([-329.9316, -121.1663])


In [65]:
# Adjust weight and reset grad
with torch.no_grad():
  w -= w.grad * 1e-5
  b -= b.grad * 1e-5
  w.grad.zero_()
  b.grad.zero_()
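The manual update above is one step of plain gradient descent. As a sketch for comparison, the code below checks on made-up random data that ```torch.optim.SGD``` (without momentum) performs the same update. The names ```w1```, ```w2```, ```mse``` and the learning rate ```lr``` are illustrative, and the optimizer is a swapped-in alternative, not something the cells above use:

```python
import torch

torch.manual_seed(0)
inputs = torch.randn(5, 3)
targets = torch.randn(5, 2)
lr = 1e-2

def mse(pred, target):
    diff = pred - target
    return torch.sum(diff * diff) / diff.numel()

# Two identical copies of the parameters
w1 = torch.randn(2, 3, requires_grad=True)
b1 = torch.randn(2, requires_grad=True)
w2 = w1.detach().clone().requires_grad_(True)
b2 = b1.detach().clone().requires_grad_(True)

# Manual step: no_grad() keeps the update itself out of the autograd graph
loss1 = mse(inputs @ w1.t() + b1, targets)
loss1.backward()
with torch.no_grad():
    w1 -= w1.grad * lr
    b1 -= b1.grad * lr
    w1.grad.zero_()
    b1.grad.zero_()

# Same step using the built-in optimizer
optimizer = torch.optim.SGD([w2, b2], lr=lr)
loss2 = mse(inputs @ w2.t() + b2, targets)
optimizer.zero_grad()
loss2.backward()
optimizer.step()

print(torch.allclose(w1, w2), torch.allclose(b1, b2))
```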

In [66]:
print(w)
print(b)

tensor([[ 1.6423, -2.1264,  1.0458],
        [-0.6007,  0.2286,  1.3027]], requires_grad=True)
tensor([0.8593, 0.0727], requires_grad=True)


In [67]:
# Calculate again

preds = model(INPUT)
loss = MSE(TARGET, preds)
print(loss)

tensor(6985.7822, grad_fn=<DivBackward0>)
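As a sanity check on the hand-written ```MSE``` function, the sketch below compares it with PyTorch's built-in ```torch.nn.functional.mse_loss``` on a tiny made-up example where the result can be worked out by hand:

```python
import torch
import torch.nn.functional as F

def MSE(actual, target):
    diff = actual - target
    return torch.sum(diff * diff) / diff.numel()

preds = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
target = torch.tensor([[1.5, 2.0], [2.0, 6.0]])

# Differences: -0.5, 0, 1, -2 -> squares: 0.25, 0, 1, 4 -> mean: 5.25 / 4
manual = MSE(preds, target)
builtin = F.mse_loss(preds, target)  # default reduction is the mean

print(manual.item(), builtin.item())
```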


In [68]:
# Training for multiple epochs

epochs = 400
for i in range(epochs):
  preds = model(INPUT)
  loss = MSE(TARGET, preds)
  loss.backward()

  with torch.no_grad():
    w -= w.grad * 1e-5
    b -= b.grad * 1e-5
    w.grad.zero_()
    b.grad.zero_()
  print(f"Epochs({i}/{epochs}) & Loss: {loss}")

Epochs(0/400) & Loss: 6985.7822265625
Epochs(1/400) & Loss: 6192.0703125
Epochs(2/400) & Loss: 5610.65625
Epochs(3/400) & Loss: 5180.0078125
Epochs(4/400) & Loss: 4856.5068359375
Epochs(5/400) & Loss: 4609.2392578125
Epochs(6/400) & Loss: 4416.28564453125
Epochs(7/400) & Loss: 4262.1064453125
Epochs(8/400) & Loss: 4135.685546875
Epochs(9/400) & Loss: 4029.208251953125
Epochs(10/400) & Loss: 3937.133544921875
Epochs(11/400) & Loss: 3855.529296875
Epochs(12/400) & Loss: 3781.604248046875
Epochs(13/400) & Loss: 3713.37646484375
Epochs(14/400) & Loss: 3649.436767578125
Epochs(15/400) & Loss: 3588.78125
Epochs(16/400) & Loss: 3530.69873046875
Epochs(17/400) & Loss: 3474.67578125
Epochs(18/400) & Loss: 3420.348388671875
Epochs(19/400) & Loss: 3367.453125
Epochs(20/400) & Loss: 3315.80078125
Epochs(21/400) & Loss: 3265.252685546875
Epochs(22/400) & Loss: 3215.70751953125
Epochs(23/400) & Loss: 3167.090576171875
Epochs(24/400) & Loss: 3119.34423828125
Epochs(25/400) & Loss: 3072.425048828125
...

In [69]:
preds = model(INPUT)
loss = MSE(TARGET, preds)
print(loss)

tensor(136.5571, grad_fn=<DivBackward0>)


In [70]:
from math import sqrt
sqrt(loss)

11.685764122635833

In [71]:
preds

tensor([[ 64.1042,  46.4683],
        [ 53.7314,  44.4190],
        [114.2530,  59.7681],
        [ 45.3772,  41.0406],
        [112.4409,  78.7282]], grad_fn=<AddBackward0>)

In [73]:
TARGET

tensor([[ 60.,  70.],
        [ 50.,  40.],
        [134.,  58.],
        [ 43.,  37.],
        [ 96.,  70.]])

## You can see that the predictions are now reasonably close to the targets

# Neural Network using PyTorch

In [77]:
# To check GPU

!nvidia-smi

/bin/bash: line 1: nvidia-smi: command not found


In [78]:
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets
from torchvision.transforms import ToTensor, Lambda, Compose
import matplotlib.pyplot as plt

In [79]:
# Download training data from open datasets.
training_data = datasets.FashionMNIST(
    root="data",
    train=True,
    download=True,
    transform=ToTensor(),
)

# Download test data from open datasets.
test_data = datasets.FashionMNIST(
    root="data",
    train=False,
    download=True,
    transform=ToTensor(),
)

100%|██████████| 26.4M/26.4M [00:01<00:00, 17.9MB/s]
100%|██████████| 29.5k/29.5k [00:00<00:00, 273kB/s]
100%|██████████| 4.42M/4.42M [00:00<00:00, 4.89MB/s]
100%|██████████| 5.15k/5.15k [00:00<00:00, 8.08MB/s]


In [80]:
type(training_data)

In [81]:
batch_size = 64

# Create data loaders.
train_dataloader = DataLoader(training_data, batch_size=batch_size)
test_dataloader = DataLoader(test_data, batch_size=batch_size)

for X, y in test_dataloader:
    print("Shape of X [N, C, H, W]: ", X.shape)
    print("Shape of y: ", y.shape, y.dtype)
    # print(X)
    # print(y)
    break

Shape of X [N, C, H, W]:  torch.Size([64, 1, 28, 28])
Shape of y:  torch.Size([64]) torch.int64
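To see how a ```DataLoader``` splits a dataset into batches without downloading anything, here is a small sketch on synthetic data. The 150 all-zero "images" and the ```TensorDataset``` wrapper are assumptions made for illustration and are not part of the FashionMNIST pipeline above:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# 150 fake grayscale 28x28 images with labels 0..9
images = torch.zeros(150, 1, 28, 28)
labels = torch.arange(150) % 10
dataset = TensorDataset(images, labels)

loader = DataLoader(dataset, batch_size=64)

# 150 samples with batch_size=64 give batches of 64, 64 and 22
batch_shapes = [tuple(X.shape) for X, y in loader]
print(len(loader), batch_shapes)
```

The last batch is smaller because 150 is not a multiple of 64; passing ```drop_last=True``` would discard it instead.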


In [82]:
# Get cpu or gpu device for training.
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using {device} device")

Using cpu device


In [83]:
# Define model
class NeuralNetwork(nn.Module):
    def __init__(self):
        super(NeuralNetwork, self).__init__()
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28*28, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, 10)
        )

    def forward(self, x):
        x = self.flatten(x)
        logits = self.linear_relu_stack(x)
        return logits

model = NeuralNetwork().to(device)
print(model)

NeuralNetwork(
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (linear_relu_stack): Sequential(
    (0): Linear(in_features=784, out_features=512, bias=True)
    (1): ReLU()
    (2): Linear(in_features=512, out_features=512, bias=True)
    (3): ReLU()
    (4): Linear(in_features=512, out_features=10, bias=True)
  )
)
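The printed layer sizes determine how many trainable parameters the network has. As a sketch, the same stack of layers is rebuilt below with ```nn.Sequential``` (the name ```stack``` is illustrative) and its parameters are counted: each ```Linear``` layer holds in_features x out_features weights plus out_features biases.

```python
import torch
from torch import nn

stack = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 512),
    nn.ReLU(),
    nn.Linear(512, 512),
    nn.ReLU(),
    nn.Linear(512, 10),
)

# Count every weight and bias element
total = sum(p.numel() for p in stack.parameters())
expected = (784 * 512 + 512) + (512 * 512 + 512) + (512 * 10 + 10)
print(total, expected)

# A batch of 64 images of shape (1, 28, 28) maps to 64 rows of 10 logits
out = stack(torch.zeros(64, 1, 28, 28))
print(out.shape)
```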


In [84]:
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

In [85]:
def train(dataloader, model, loss_fn, optimizer):
    size = len(dataloader.dataset)
    model.train()
    for batch, (X, y) in enumerate(dataloader):
        X, y = X.to(device), y.to(device)

        # Compute prediction error
        pred = model(X)
        loss = loss_fn(pred, y)

        # Backpropagation
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        if batch % 100 == 0:
            loss, current = loss.item(), batch * len(X)
            print(f"loss: {loss:>7f}  [{current:>5d}/{size:>5d}]")

In [86]:
def test(dataloader, model, loss_fn):
    size = len(dataloader.dataset)
    num_batches = len(dataloader)
    model.eval()
    test_loss, correct = 0, 0
    with torch.no_grad():
        for X, y in dataloader:
            X, y = X.to(device), y.to(device)
            pred = model(X)
            test_loss += loss_fn(pred, y).item()
            correct += (pred.argmax(1) == y).type(torch.float).sum().item()
    test_loss /= num_batches
    correct /= size
    print(f"Test Error: \n Accuracy: {(100*correct):>0.1f}%, Avg loss: {test_loss:>8f} \n")
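The accuracy line in ```test``` can be followed step by step on a tiny made-up batch: ```argmax(1)``` picks the highest-scoring class per row, the comparison with ```y``` marks correct predictions, and the sum counts them.

```python
import torch

# Made-up scores for 4 samples and 3 classes
pred = torch.tensor([[0.1, 2.0, -1.0],
                     [1.5, 0.2, 0.3],
                     [0.0, 0.1, 0.9],
                     [2.0, 1.0, 0.0]])
y = torch.tensor([1, 0, 1, 0])

predicted_classes = pred.argmax(1)   # index of the largest score in each row
correct = (predicted_classes == y).type(torch.float).sum().item()
accuracy = correct / len(y)

print(predicted_classes.tolist(), correct, accuracy)
```

Here the third sample is predicted as class 2 but labeled 1, so 3 of 4 predictions are correct.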

In [87]:
epochs = 5
for t in range(epochs):
    print(f"Epoch {t+1}\n-------------------------------")
    train(train_dataloader, model, loss_fn, optimizer)
    test(test_dataloader, model, loss_fn)
print("Done!")

Epoch 1
-------------------------------
loss: 2.303740  [    0/60000]
loss: 2.294145  [ 6400/60000]
loss: 2.266220  [12800/60000]
loss: 2.266222  [19200/60000]
loss: 2.256898  [25600/60000]
loss: 2.208330  [32000/60000]
loss: 2.231146  [38400/60000]
loss: 2.184168  [44800/60000]
loss: 2.185485  [51200/60000]
loss: 2.159570  [57600/60000]
Test Error: 
 Accuracy: 42.4%, Avg loss: 2.151335 

Epoch 2
-------------------------------
loss: 2.153939  [    0/60000]
loss: 2.150010  [ 6400/60000]
loss: 2.088743  [12800/60000]
loss: 2.116952  [19200/60000]
loss: 2.060699  [25600/60000]
loss: 1.990301  [32000/60000]
loss: 2.032094  [38400/60000]
loss: 1.941968  [44800/60000]
loss: 1.955834  [51200/60000]
loss: 1.890530  [57600/60000]
Test Error: 
 Accuracy: 53.7%, Avg loss: 1.883492 

Epoch 3
-------------------------------
loss: 1.906819  [    0/60000]
loss: 1.884526  [ 6400/60000]
loss: 1.768864  [12800/60000]
loss: 1.819063  [19200/60000]
loss: 1.697651  [25600/60000]
loss: 1.649017  [32000/600

In [88]:
#save model
torch.save(model.state_dict(), "model.pth")
print("Saved PyTorch Model State to model.pth")

Saved PyTorch Model State to model.pth
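Saving a ```state_dict``` stores only the parameter tensors, not the model class, so loading requires constructing the model first. The sketch below shows a round trip using an in-memory buffer instead of a file, with a small ```nn.Linear``` standing in for the network; both choices are assumptions made to keep the example self-contained:

```python
import io

import torch
from torch import nn

torch.manual_seed(0)
original = nn.Linear(4, 2)

# Save only the parameters (weight and bias tensors)
buffer = io.BytesIO()
torch.save(original.state_dict(), buffer)
buffer.seek(0)

# A fresh model has different random weights until the state is loaded
restored = nn.Linear(4, 2)
restored.load_state_dict(torch.load(buffer, map_location="cpu"))

x = torch.randn(3, 4)
same = torch.equal(original(x), restored(x))
print(list(original.state_dict().keys()), same)
```

The ```map_location="cpu"``` argument is useful when a model saved on a GPU machine is loaded on a machine without one.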


In [89]:
#load model
model = NeuralNetwork()
model.load_state_dict(torch.load("model.pth"))

<All keys matched successfully>

In [90]:
## Prediction

classes = [
    "T-shirt/top",
    "Trouser",
    "Pullover",
    "Dress",
    "Coat",
    "Sandal",
    "Shirt",
    "Sneaker",
    "Bag",
    "Ankle boot",
]

model.eval()
x, y = test_data[0][0], test_data[0][1]
with torch.no_grad():
    pred = model(x)
    predicted, actual = classes[pred[0].argmax(0)], classes[y]
    print(f'Predicted: "{predicted}", Actual: "{actual}"')

Predicted: "Ankle boot", Actual: "Ankle boot"
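The model returns raw logits, and taking the ```argmax``` is enough to pick a class. If a confidence value is also wanted, a softmax turns the logits into probabilities without changing which class is largest. A sketch with made-up logits and a shortened class list:

```python
import torch

classes = ["T-shirt/top", "Trouser", "Pullover"]  # shortened, illustrative list
logits = torch.tensor([[1.0, 3.0, 0.5]])          # one sample, 3 classes

# Softmax over the class dimension gives probabilities that sum to 1
probs = torch.softmax(logits, dim=1)
index = probs[0].argmax(0).item()

print(classes[index], round(probs[0, index].item(), 3))
```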
