# <font color = 'orange'> PyTorch </font>

---

## Tensors

At its core, PyTorch is a library for processing tensors. A tensor is a number, vector, matrix, or any n-dimensional array. Let's create a tensor with a single number.

In [1]:
import torch

In [2]:
# creating a scalar (0-dimensional tensor) in torch
t1 = torch.tensor(4.)
print(t1)
print(t1.dtype)

tensor(4.)
torch.float32
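The dtype printed above comes from PyTorch's type inference: a float literal such as `4.` gives `torch.float32`, while an integer literal gives `torch.int64`. A minimal sketch, assuming the default float dtype has not been changed with `torch.set_default_dtype`:

```python
import torch

# PyTorch infers the dtype from the Python values passed in
print(torch.tensor(4).dtype)    # integer literal -> torch.int64
print(torch.tensor(4.).dtype)   # float literal   -> torch.float32 (the default float dtype)

# The dtype can also be requested explicitly
t = torch.tensor([1, 2, 3], dtype=torch.float64)
print(t.dtype)                  # torch.float64

# .to() converts to another dtype and returns a new tensor
print(t.to(torch.int32))        # tensor([1, 2, 3], dtype=torch.int32)
```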


In [3]:
# creating a 1D tensor (vector) in torch
t2 = torch.tensor([1,2,3,4])
print(t2)

tensor([1, 2, 3, 4])


In [4]:
# creating a 2D tensor in torch
t3 = torch.tensor([ [5, 6], [7, 8], [9, 10] ])
print(t3)

tensor([[ 5,  6],
        [ 7,  8],
        [ 9, 10]])


In [5]:
# creating a 3D tensor in torch
t4 = torch.tensor([
    [
        [11, 12, 13],
        [14, 15, 16]
    ],
    [
        [17, 18, 19],
        [20, 21, 22]
    ]
 ])

print(t4)

tensor([[[11, 12, 13],
         [14, 15, 16]],

        [[17, 18, 19],
         [20, 21, 22]]])


In [6]:
# to find the shape of the tensor

print(t1)
print(t1.shape,'\n')

print(t2)
print(t2.shape,'\n')

print(t3)
print(t3.shape, '\n')

print(t4)
print(t4.shape, '\n')

tensor(4.)
torch.Size([]) 

tensor([1, 2, 3, 4])
torch.Size([4]) 

tensor([[ 5,  6],
        [ 7,  8],
        [ 9, 10]])
torch.Size([3, 2]) 

tensor([[[11, 12, 13],
         [14, 15, 16]],

        [[17, 18, 19],
         [20, 21, 22]]])
torch.Size([2, 2, 3]) 



In [7]:
# torch.tensor cannot create a ragged array (rows of different lengths)

t5 = torch.tensor([[1, 2, 3],[4, 5],[6]])
print(t5)

ValueError: expected sequence of length 3 at dim 1 (got 2)
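If the data really is ragged, one common workaround is to pad every row to the same length. The sketch below uses `torch.nn.utils.rnn.pad_sequence`; the choice of `0` as the padding value is an assumption and should be a value that cannot be confused with real data:

```python
import torch
from torch.nn.utils.rnn import pad_sequence

# Ragged data: each row is stored as its own 1D tensor
rows = [torch.tensor([1, 2, 3]), torch.tensor([4, 5]), torch.tensor([6])]

# Pad every row to the longest length so the result is rectangular
padded = pad_sequence(rows, batch_first=True, padding_value=0)
print(padded)
# tensor([[1, 2, 3],
#         [4, 5, 0],
#         [6, 0, 0]])

# Keep the original lengths so later code can ignore the padding
lengths = torch.tensor([len(r) for r in rows])
print(lengths)  # tensor([3, 2, 1])
```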

---

## Tensor operations and gradients

We can combine tensors with the usual arithmetic operations.  
Let's look at an example:

In [8]:
# Create tensors.
x = torch.tensor(3.)
w = torch.tensor(4., requires_grad=True) # autograd tracks operations on w so we can differentiate with respect to it
b = torch.tensor(5., requires_grad=True)
x, w, b

(tensor(3.), tensor(4., requires_grad=True), tensor(5., requires_grad=True))

We've created three tensors: `x`, `w`, and `b`, all numbers. `w` and `b` have an additional parameter `requires_grad` set to `True`. We'll see what it does in just a moment.

Let's create a new tensor `y` by combining these tensors.

In [9]:
# arithmetic operations
y = w * x + b
y

tensor(17., grad_fn=<AddBackward0>)

In [10]:
# compute derivatives
# y.backward() computes the derivative of y with respect to every tensor that has requires_grad=True
y.backward() # "backward" refers to backpropagation

In [11]:
# display gradients
print('dy/dx :',x.grad) # x has requires_grad=False, so no gradient is stored
print('dy/dw :',w.grad)
print('dy/db :',b.grad)

dy/dx : None
dy/dw : tensor(3.)
dy/db : tensor(1.)


As expected, `dy/dw` has the same value as `x`, i.e., `3`, and `dy/db` has the value `1`. Note that `x.grad` is `None` because `x` doesn't have `requires_grad` set to `True`.

The "grad" in `w.grad` is short for _gradient_, which is another term for derivative. The term _gradient_ is primarily used while dealing with vectors and matrices.
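To see the vector case, here is a small sketch (the values are made up for illustration) where `w` holds three numbers, so `w.grad` holds three partial derivatives. Because `backward()` without arguments expects a scalar, the products are summed first:

```python
import torch

# Vector version of the example above: w and x now hold 3 values each
x = torch.tensor([1., 2., 3.])
w = torch.tensor([4., 5., 6.], requires_grad=True)
b = torch.tensor(5., requires_grad=True)

# backward() needs a scalar, so reduce the vector to one number
y = (w * x).sum() + b
y.backward()

print(w.grad)  # tensor([1., 2., 3.]) -> dy/dw_i = x_i, one entry per element of w
print(b.grad)  # tensor(1.)
```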

---

## Tensor functions

Apart from arithmetic operations, the `torch` module also contains many functions for creating and manipulating tensors. Let's look at some examples.

In [12]:
# create a tensor with a fixed value for every element
t6 = torch.full((3, 2), 42)
t6

tensor([[42, 42],
        [42, 42],
        [42, 42]])

In [13]:
t7 = torch.cat((t3, t6))

print(t3, '\n')
print(t6, '\n')

print('Concatenation \n',t7)

tensor([[ 5,  6],
        [ 7,  8],
        [ 9, 10]]) 

tensor([[42, 42],
        [42, 42],
        [42, 42]]) 

Concatenation 
 tensor([[ 5,  6],
        [ 7,  8],
        [ 9, 10],
        [42, 42],
        [42, 42],
        [42, 42]])
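`torch.cat` joins along dimension 0 by default, which is why the rows of `t6` were appended below `t3`. A short sketch of joining along columns instead, plus `torch.stack`, which creates a new dimension rather than extending an existing one:

```python
import torch

t3 = torch.tensor([[5, 6], [7, 8], [9, 10]])
t6 = torch.full((3, 2), 42)

# dim=0 (the default) appends rows: (3, 2) + (3, 2) -> (6, 2)
print(torch.cat((t3, t6)).shape)     # torch.Size([6, 2])

# dim=1 places the tensors side by side: (3, 2) + (3, 2) -> (3, 4)
print(torch.cat((t3, t6), dim=1))
# tensor([[ 5,  6, 42, 42],
#         [ 7,  8, 42, 42],
#         [ 9, 10, 42, 42]])

# torch.stack adds a new leading dimension: two (3, 2) tensors -> (2, 3, 2)
print(torch.stack((t3, t6)).shape)   # torch.Size([2, 3, 2])
```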


In [14]:
t8 = torch.sin(t7)

t8

tensor([[-0.9589, -0.2794],
        [ 0.6570,  0.9894],
        [ 0.4121, -0.5440],
        [-0.9165, -0.9165],
        [-0.9165, -0.9165],
        [-0.9165, -0.9165]])

In [15]:
t9 = t8.reshape(3, 2, 2)

t9

tensor([[[-0.9589, -0.2794],
         [ 0.6570,  0.9894]],

        [[ 0.4121, -0.5440],
         [-0.9165, -0.9165]],

        [[-0.9165, -0.9165],
         [-0.9165, -0.9165]]])
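`reshape` only rearranges the existing elements, so the target shape must hold the same number of values (here 6*2 = 3*2*2 = 12). A sketch using a stand-in tensor with the same 6x2 shape as `t8` (its values differ from the notebook's `t8`):

```python
import torch

# Stand-in with the same (6, 2) shape as t8
t8 = torch.sin(torch.arange(12, dtype=torch.float32)).reshape(6, 2)

# -1 lets PyTorch infer one dimension from the total number of elements (12)
print(t8.reshape(3, -1, 2).shape)   # torch.Size([3, 2, 2])
print(t8.reshape(-1).shape)         # torch.Size([12]), flattened to 1D

# A shape with a different number of elements is rejected
try:
    t8.reshape(5, 2)
except RuntimeError as err:
    print("reshape failed:", err)
```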

---

## Interoperability with Numpy

[Numpy](http://www.numpy.org/) is a popular open-source library used for mathematical and scientific computing in Python. It enables efficient operations on large multi-dimensional arrays and has a vast ecosystem of supporting libraries, including:

* [Pandas](https://pandas.pydata.org/) for file I/O and data analysis
* [Matplotlib](https://matplotlib.org/) for plotting and visualization
* [OpenCV](https://opencv.org/) for image and video processing


Instead of reinventing the wheel, PyTorch interoperates well with Numpy to leverage its existing ecosystem of tools and libraries.

In [16]:
import numpy as np

x = np.array([[1, 2], [3, 4]])
x

array([[1, 2],
       [3, 4]])

In [17]:
# converting numpy array to a torch tensor
y = torch.from_numpy(x)
y


tensor([[1, 2],
        [3, 4]], dtype=torch.int32)

In [18]:
# converting torch tensor to a numpy array
z = y.numpy()
z

array([[1, 2],
       [3, 4]])

The interoperability between PyTorch and Numpy is essential because most datasets you'll work with will likely be read and preprocessed as Numpy arrays.
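One detail to keep in mind when converting: `torch.from_numpy` and `.numpy()` share memory with the original array (for CPU tensors), while `torch.tensor` copies the data. A small sketch:

```python
import numpy as np
import torch

x = np.array([[1, 2], [3, 4]])

shared = torch.from_numpy(x)   # shares memory with x, no copy
copied = torch.tensor(x)       # independent copy of the data

x[0, 0] = 100                  # modify the NumPy array in place
print(shared[0, 0].item())     # 100 -> the change is visible through the shared tensor
print(copied[0, 0].item())     # 1   -> the copy is unaffected

# .numpy() on a CPU tensor also shares memory
z = shared.numpy()
z[1, 1] = -1
print(x[1, 1])                 # -1
```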

You might wonder why we need a library like PyTorch at all since Numpy already provides data structures and utilities for working with multi-dimensional numeric data. There are two main reasons:

1. **Autograd**: The ability to automatically compute gradients for tensor operations is essential for training deep learning models.
2. **GPU support**: While working with massive datasets and large models, PyTorch tensor operations can be performed efficiently using a Graphics Processing Unit (GPU). Computations that might typically take hours can be completed within minutes using GPUs.
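A minimal sketch of the GPU workflow, which falls back to the CPU when no CUDA device is available (the matrix size is arbitrary):

```python
import torch

# Use the GPU if PyTorch can see one, otherwise fall back to the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"

a = torch.randn(1000, 1000)   # created on the CPU by default
a = a.to(device)              # copy to the chosen device
b = a @ a                     # the matmul runs on the device holding its inputs
print(b.device)

# Tensors must be moved back to the CPU before converting to NumPy
result = b.cpu().numpy()
print(result.shape)           # (1000, 1000)
```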

---

## Linear Regression using PyTorch

---

In [19]:
# making the training data
# Inputs (temp, rainfall, humidity)
inputs = np.array([[73, 67, 43],
                   [91, 88, 64],
                   [87, 134, 58],
                   [102, 43, 37],
                   [69, 96, 70]], dtype='float32')

In [20]:
# Targets (apples, oranges)
# the two outputs we want to predict
target = np.array([[56, 70],
                   [81, 101],
                   [119, 133],
                   [22, 37],
                   [103, 119]], dtype='float32')

In [21]:
# converting the NumPy arrays inputs and target to torch tensors
inputs = torch.from_numpy(inputs)
target = torch.from_numpy(target)

print(inputs, '\n')
print(target)

tensor([[ 73.,  67.,  43.],
        [ 91.,  88.,  64.],
        [ 87., 134.,  58.],
        [102.,  43.,  37.],
        [ 69.,  96.,  70.]]) 

tensor([[ 56.,  70.],
        [ 81., 101.],
        [119., 133.],
        [ 22.,  37.],
        [103., 119.]])


---

In [22]:
# weights and biases

# 3 weights for each output as there are 3 input features
w = torch.randn(2, 3, requires_grad = True)
# 1 bias for each output
b = torch.randn(2, requires_grad = True)

print(w, '\n')
print(b)

tensor([[-0.3462,  0.0238,  1.0212],
        [-1.9029,  0.5728, -0.9464]], requires_grad=True) 

tensor([-0.2942,  0.3121], requires_grad=True)


---

In [23]:
# define the model
# multi-output linear regression: each of the 2 outputs is a linear function of the 3 inputs
# y = x @ w.T + b

def model(x):
    return x @ w.t() + b

# shapes of the variables
# x = 5x3 (5 samples, 3 features)
# w = 2x3 (one row of weights per output)
# b = (2,), broadcast across the 5 rows
# output = 5x2

---
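The shape comments in the model cell can be checked directly. This sketch uses random stand-ins with the same shapes as `inputs`, `w`, and `b` and shows that the `(2,)` bias is broadcast to every row:

```python
import torch

# Random stand-ins with the notebook's shapes: 5 samples, 3 features, 2 outputs
inputs = torch.randn(5, 3)
w = torch.randn(2, 3, requires_grad=True)
b = torch.randn(2, requires_grad=True)

# (5, 3) @ (3, 2) -> (5, 2); b with shape (2,) is broadcast across the 5 rows
out = inputs @ w.t() + b
print(out.shape)  # torch.Size([5, 2])

# Broadcasting gives the same result as adding b to each row explicitly
explicit = inputs @ w.t() + b.expand(5, 2)
print(torch.allclose(out, explicit))  # True
```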

In [24]:
# prediction
pred = model(inputs)
# predicted (apples, oranges) for each of the 5 inputs
pred

tensor([[  19.9360, -140.9179],
        [  35.6483, -183.0155],
        [  31.9985, -143.3764],
        [   3.1990, -204.1723],
        [  49.5818, -142.2466]], grad_fn=<AddBackward0>)

In [25]:
# actual output
print(target)

tensor([[ 56.,  70.],
        [ 81., 101.],
        [119., 133.],
        [ 22.,  37.],
        [103., 119.]])


---

In [26]:
# loss function MSE

def MSE(actual, predicted):
    diff = actual - predicted
    return torch.sum(diff * diff) / diff.numel()
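The hand-written `MSE` should match PyTorch's built-in `torch.nn.functional.mse_loss` with its default `reduction='mean'`. A quick check on a couple of made-up predictions:

```python
import torch
import torch.nn.functional as F

def MSE(actual, predicted):
    diff = actual - predicted
    return torch.sum(diff * diff) / diff.numel()

target = torch.tensor([[56., 70.], [81., 101.]])
pred = torch.tensor([[50., 72.], [85., 100.]])

# squared errors: 36 + 4 + 16 + 1 = 57, divided by 4 elements = 14.25
print(MSE(target, pred))          # tensor(14.2500)
print(F.mse_loss(pred, target))   # tensor(14.2500)
```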

In [27]:
# error
loss = MSE(target, pred)
loss
# the weights and biases are randomly initialized, so the loss is high

tensor(34208.2578, grad_fn=<DivBackward0>)

---

In [28]:
# compute gradients
loss.backward()

In [29]:
print(w, '\n')
# w.grad holds dL/dw: the derivative (slope) of the loss with respect to each weight, used by gradient descent
print(w.grad) # dL/dW

tensor([[-0.3462,  0.0238,  1.0212],
        [-1.9029,  0.5728, -0.9464]], requires_grad=True) 

tensor([[ -3986.4746,  -4800.4077,  -2786.8525],
        [-21582.5508, -22321.8770, -14097.3867]])


In [30]:
print(b, '\n')
print(b.grad) # dL/dB

tensor([-0.2942,  0.3121], requires_grad=True) 

tensor([ -48.1273, -254.7457])


---

In [31]:
# reset gradient
w.grad.zero_()
b.grad.zero_()

# PyTorch accumulates gradients across backward() calls, so we reset them after every update
print(w.grad)
print(b.grad)

tensor([[0., 0., 0.],
        [0., 0., 0.]])
tensor([0., 0.])
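Why the reset matters: every call to `backward()` adds into `.grad` instead of overwriting it. A tiny sketch with a single made-up parameter:

```python
import torch

w = torch.tensor(2., requires_grad=True)

# Two backward passes without zeroing: the gradients are summed into w.grad
(3 * w).backward()
print(w.grad)   # tensor(3.)
(3 * w).backward()
print(w.grad)   # tensor(6.) -> 3 + 3, not 3

# Zeroing in place resets the accumulator before the next step
w.grad.zero_()
(3 * w).backward()
print(w.grad)   # tensor(3.)
```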


In [33]:
# Training for multiple epochs
lr = 1e-5  # learning rate
for i in range(400):
    preds = model(inputs)
    loss = MSE(target, preds)
    loss.backward() # compute the gradients of the 6 weights and 2 biases

    with torch.no_grad():
        # gradient descent step: w = w - lr * dL/dw (likewise for b) to reduce the loss
        w -= w.grad * lr
        b -= b.grad * lr
        # reset the gradients
        w.grad.zero_()
        b.grad.zero_()

    print(f'Epoch {i+1} / 400 and loss {loss}')

# the loss should decrease with every epoch

Epoch 1 / 400 and loss 45.273277282714844
Epoch 2 / 400 and loss 45.1656379699707
Epoch 3 / 400 and loss 45.05843734741211
Epoch 4 / 400 and loss 44.951759338378906
Epoch 5 / 400 and loss 44.84560775756836
Epoch 6 / 400 and loss 44.73998260498047
Epoch 7 / 400 and loss 44.634727478027344
Epoch 8 / 400 and loss 44.530052185058594
Epoch 9 / 400 and loss 44.42581558227539
Epoch 10 / 400 and loss 44.32209777832031
Epoch 11 / 400 and loss 44.21875762939453
Epoch 12 / 400 and loss 44.11594772338867
Epoch 13 / 400 and loss 44.013572692871094
Epoch 14 / 400 and loss 43.91169738769531
Epoch 15 / 400 and loss 43.81022644042969
Epoch 16 / 400 and loss 43.7092399597168
Epoch 17 / 400 and loss 43.608673095703125
Epoch 18 / 400 and loss 43.508541107177734
Epoch 19 / 400 and loss 43.4088134765625
Epoch 20 / 400 and loss 43.309627532958984
Epoch 21 / 400 and loss 43.21076202392578
Epoch 22 / 400 and loss 43.11243438720703
Epoch 23 / 400 and loss 43.01438903808594
Epoch 24 / 400 and loss 42.91684722900
...
Epoch 352 / 400 and loss 22.612197875976562
Epoch 353 / 400 and loss 22.57071304321289
Epoch 354 / 400 and loss 22.52927017211914
Epoch 355 / 400 and loss 22.487890243530273
Epoch 356 / 400 and loss 22.446653366088867
Epoch 357 / 400 and loss 22.405479431152344
Epoch 358 / 400 and loss 22.364376068115234
Epoch 359 / 400 and loss 22.323314666748047
Epoch 360 / 400 and loss 22.282373428344727
Epoch 361 / 400 and loss 22.241512298583984
Epoch 362 / 400 and loss 22.200754165649414
Epoch 363 / 400 and loss 22.160032272338867
Epoch 364 / 400 and loss 22.11937713623047
Epoch 365 / 400 and loss 22.078868865966797
Epoch 366 / 400 and loss 22.038389205932617
Epoch 367 / 400 and loss 21.997961044311523
Epoch 368 / 400 and loss 21.957674026489258
Epoch 369 / 400 and loss 21.917396545410156
Epoch 370 / 400 and loss 21.87723159790039
Epoch 371 / 400 and loss 21.837207794189453
Epoch 372 / 400 and loss 21.797197341918945
Epoch 373 / 400 and loss 21.75723648071289
Epoch 374 / 400 and loss 21.717418670
...

In [34]:
# prediction with new model

preds = model(inputs)
loss = MSE(target, preds)
print(loss)
# the loss is now far lower than the loss of the randomly initialized model

tensor(20.6700, grad_fn=<DivBackward0>)


---

In [35]:
print('Actual Values \n', target,'\n\n')
print('Predicted Values \n',preds)
# the predictions are now close to the actual values

Actual Values 
 tensor([[ 56.,  70.],
        [ 81., 101.],
        [119., 133.],
        [ 22.,  37.],
        [103., 119.]]) 


Predicted Values 
 tensor([[ 56.9172,  70.8043],
        [ 84.1505,  96.5361],
        [114.6704, 141.5655],
        [ 20.1411,  39.1192],
        [105.7920, 110.7839]], grad_fn=<AddBackward0>)
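The same training loop can be expressed with PyTorch's built-in pieces: `nn.Linear` holds the weights and bias, `nn.MSELoss` computes the mean squared error, and `torch.optim.SGD` performs the update and the gradient reset. This is a sketch, not a reproduction of the run above; `nn.Linear` uses its own random initialization, so the loss values will differ:

```python
import torch
from torch import nn

# Same toy data as above
inputs = torch.tensor([[73, 67, 43], [91, 88, 64], [87, 134, 58],
                       [102, 43, 37], [69, 96, 70]], dtype=torch.float32)
target = torch.tensor([[56, 70], [81, 101], [119, 133],
                       [22, 37], [103, 119]], dtype=torch.float32)

model = nn.Linear(3, 2)                                   # weight (2x3) and bias (2,)
loss_fn = nn.MSELoss()                                    # mean squared error
optimizer = torch.optim.SGD(model.parameters(), lr=1e-5)

losses = []
for epoch in range(400):
    preds = model(inputs)
    loss = loss_fn(preds, target)
    optimizer.zero_grad()   # replaces w.grad.zero_() / b.grad.zero_()
    loss.backward()
    optimizer.step()        # replaces w -= w.grad * lr and b -= b.grad * lr
    losses.append(loss.item())

print(losses[0], losses[-1])
```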


---

## Neural Network using PyTorch
The dataset used is Fashion MNIST, which is available through the `torchvision` package (`torchvision.datasets.FashionMNIST`).

In [36]:
import torch
from torch import nn # neural network
from torch.utils.data import DataLoader
from torchvision import datasets # dataset is present in torchvision module
from torchvision.transforms import ToTensor, Lambda, Compose
import matplotlib.pyplot as plt

In [37]:
# downloading training data
training_data = datasets.FashionMNIST(root = 'data', train = True, download = True, transform = ToTensor())

# downloading test data
test_data = datasets.FashionMNIST(root = 'data', train = False, download = True, transform = ToTensor())

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-images-idx3-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-images-idx3-ubyte.gz to data\FashionMNIST\raw\train-images-idx3-ubyte.gz


100%|█████████████████████████████████████████████████████████████████| 26421880/26421880 [00:10<00:00, 2454193.22it/s]


Extracting data\FashionMNIST\raw\train-images-idx3-ubyte.gz to data\FashionMNIST\raw

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-labels-idx1-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-labels-idx1-ubyte.gz to data\FashionMNIST\raw\train-labels-idx1-ubyte.gz


100%|████████████████████████████████████████████████████████████████████████| 29515/29515 [00:00<00:00, 144954.47it/s]


Extracting data\FashionMNIST\raw\train-labels-idx1-ubyte.gz to data\FashionMNIST\raw

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-images-idx3-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-images-idx3-ubyte.gz to data\FashionMNIST\raw\t10k-images-idx3-ubyte.gz


100%|███████████████████████████████████████████████████████████████████| 4422102/4422102 [00:02<00:00, 1950706.79it/s]


Extracting data\FashionMNIST\raw\t10k-images-idx3-ubyte.gz to data\FashionMNIST\raw

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-labels-idx1-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-labels-idx1-ubyte.gz to data\FashionMNIST\raw\t10k-labels-idx1-ubyte.gz


100%|█████████████████████████████████████████████████████████████████████████| 5148/5148 [00:00<00:00, 2620103.99it/s]

Extracting data\FashionMNIST\raw\t10k-labels-idx1-ubyte.gz to data\FashionMNIST\raw






In [38]:
type(training_data)

torchvision.datasets.mnist.FashionMNIST

In [39]:
batch_size = 64

# creating data loaders.
train_dataloader = DataLoader(training_data, batch_size=batch_size)
test_dataloader = DataLoader(test_data, batch_size=batch_size)

# each iteration over test_dataloader yields one batch of 64 samples (images x, labels y)
for x, y in test_dataloader:
    print("Shape of X [N, C, H, W]: ", x.shape)
    print("Shape of y: ", y.shape, y.dtype)
    print()
    print(x)
    print(y)
    break

Shape of X [N, C, H, W]:  torch.Size([64, 1, 28, 28])
Shape of y:  torch.Size([64]) torch.int64

tensor([[[[0., 0., 0.,  ..., 0., 0., 0.],
          [0., 0., 0.,  ..., 0., 0., 0.],
          [0., 0., 0.,  ..., 0., 0., 0.],
          ...,
          [0., 0., 0.,  ..., 0., 0., 0.],
          [0., 0., 0.,  ..., 0., 0., 0.],
          [0., 0., 0.,  ..., 0., 0., 0.]]],


        [[[0., 0., 0.,  ..., 0., 0., 0.],
          [0., 0., 0.,  ..., 0., 0., 0.],
          [0., 0., 0.,  ..., 0., 0., 0.],
          ...,
          [0., 0., 0.,  ..., 0., 0., 0.],
          [0., 0., 0.,  ..., 0., 0., 0.],
          [0., 0., 0.,  ..., 0., 0., 0.]]],


        [[[0., 0., 0.,  ..., 0., 0., 0.],
          [0., 0., 0.,  ..., 0., 0., 0.],
          [0., 0., 0.,  ..., 0., 0., 0.],
          ...,
          [0., 0., 0.,  ..., 0., 0., 0.],
          [0., 0., 0.,  ..., 0., 0., 0.],
          [0., 0., 0.,  ..., 0., 0., 0.]]],


        ...,


        [[[0., 0., 0.,  ..., 0., 0., 0.],
          [0., 0., 0.,  ..., 0., 
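The `DataLoader`s above iterate in a fixed order. For training data it is common to pass `shuffle=True` so each epoch sees the samples in a new order. A self-contained sketch with a small random stand-in dataset (not FashionMNIST) built with `TensorDataset`:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in data: 150 fake 1x28x28 images with labels 0-9
images = torch.rand(150, 1, 28, 28)
labels = torch.randint(0, 10, (150,))
dataset = TensorDataset(images, labels)

# shuffle=True reorders the samples at the start of every epoch
loader = DataLoader(dataset, batch_size=64, shuffle=True)

print(len(loader))  # 3 batches: 64 + 64 + 22
for x, y in loader:
    print(x.shape, y.shape)
```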

In [40]:
# Get cpu or gpu device for training.
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using {device} device")

Using cpu device


In [41]:
# define model
class NeuralNetwork(nn.Module):
    def __init__(self):
        super(NeuralNetwork, self).__init__()
        self.flatten = nn.Flatten()
        # input_layer = 28*28, hidden_layer1 = 512, hidden_layer2 = 512, output_layer = 10
        self.linear_relu_stack = nn.Sequential(
                nn.Linear(28*28, 512), # connect from 28*28 neuron layer to 512 neuron layer
                nn.ReLU(),
                nn.Linear(512, 512),
                nn.ReLU(),
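                nn.Linear(512, 10),
                # NOTE: nn.CrossEntropyLoss (used below) applies log-softmax to raw logits itself,
                # so this extra Softmax layer is redundant and can slow training considerably.
                # nn.Softmax() without an explicit dim also triggers a deprecation warning.
                nn.Softmax()
            )

    def forward(self, x):
        # flatten each 1x28x28 image into a 784-element vector
        x = self.flatten(x)
        # because of the final Softmax layer, these are class probabilities rather than raw logits
        logits = self.linear_relu_stack(x)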
        return logits


model = NeuralNetwork().to(device)
print(model)

NeuralNetwork(
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (linear_relu_stack): Sequential(
    (0): Linear(in_features=784, out_features=512, bias=True)
    (1): ReLU()
    (2): Linear(in_features=512, out_features=512, bias=True)
    (3): ReLU()
    (4): Linear(in_features=512, out_features=10, bias=True)
    (5): Softmax(dim=None)
  )
)
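`nn.CrossEntropyLoss` expects raw, unnormalized scores (logits) and applies log-softmax itself, so the usual pattern is to end the network with the last `nn.Linear` layer. A sketch of the same architecture without the final `nn.Softmax` (the class name `NeuralNetworkLogits` is hypothetical), applying softmax only when probabilities are actually needed:

```python
import torch
from torch import nn

class NeuralNetworkLogits(nn.Module):  # hypothetical name for this variant
    def __init__(self):
        super().__init__()
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28 * 28, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, 10),   # raw scores (logits), no Softmax layer
        )

    def forward(self, x):
        return self.linear_relu_stack(self.flatten(x))

model = NeuralNetworkLogits()
x = torch.rand(4, 1, 28, 28)          # a fake batch of 4 images
logits = model(x)
probs = torch.softmax(logits, dim=1)  # probabilities only when needed
print(logits.shape)                   # torch.Size([4, 10])
print(probs.sum(dim=1))               # each row sums to 1 (up to rounding)
```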


In [42]:
# loss function and optimizer

loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr = 1e-3)
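`nn.CrossEntropyLoss` combines `log_softmax` and the negative log-likelihood loss, so it should receive raw logits. The sketch below checks that equivalence on made-up scores, then shows what happens when the inputs are already probabilities, as with a model that ends in `nn.Softmax`: even a perfect one-hot prediction over 10 classes cannot push the loss below about 1.46.

```python
import math
import torch
import torch.nn.functional as F

logits = torch.tensor([[2.0, -1.0, 0.5]])
target = torch.tensor([0])

# cross_entropy = log_softmax followed by negative log-likelihood
ce = F.cross_entropy(logits, target)
manual = F.nll_loss(F.log_softmax(logits, dim=1), target)
print(torch.allclose(ce, manual))  # True

# Probabilities as input: values are squeezed into [0, 1], so the loss has a floor
perfect = torch.zeros(1, 10)
perfect[0, 3] = 1.0
floor_loss = F.cross_entropy(perfect, torch.tensor([3]))
print(floor_loss)                  # about 1.4612, not 0
print(math.log(math.e + 9) - 1)    # same floor by hand: -1 + log(e^1 + 9 * e^0)
```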

In [43]:
def train(dataloader, model, loss_fn, optimizer):
    size = len(dataloader.dataset)
    model.train() # put the model in training mode
    # layers such as dropout and batch normalization behave differently during training and evaluation
    for batch, (x, y) in enumerate(dataloader):
        x, y = x.to(device), y.to(device)

        # predict and calculate the error
        pred = model(x)
        loss = loss_fn(pred, y)

        # backpropagation
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        if batch % 100 == 0:
            loss, current = loss.item(), batch * len(x)
            print(f"loss: {loss:>7f}  [{current:>5d}/{size:>5d}]")

In [44]:
def test(dataloader, model, loss_fn):
    size = len(dataloader.dataset)
    num_batches = len(dataloader)
    model.eval()
    test_loss, correct = 0, 0
    with torch.no_grad():
        for X, y in dataloader:
            X, y = X.to(device), y.to(device)
            pred = model(X)
            test_loss += loss_fn(pred, y).item()
            correct += (pred.argmax(1) == y).type(torch.float).sum().item()
    test_loss /= num_batches
    correct /= size
    print(f"Test Error: \n Accuracy: {(100*correct):>0.1f}%, Avg loss: {test_loss:>8f} \n")
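The accuracy line in `test` counts how often the highest-scoring class matches the label. A worked example on made-up scores for 4 samples and 3 classes:

```python
import torch

# Made-up model outputs: 4 samples, 3 classes
pred = torch.tensor([[0.1, 2.0, -1.0],
                     [1.5, 0.2, 0.3],
                     [0.0, 0.1, 3.0],
                     [2.0, 1.0, 0.0]])
y = torch.tensor([1, 0, 1, 2])

predicted_class = pred.argmax(1)  # index of the highest score in each row
correct = (predicted_class == y).type(torch.float).sum().item()
accuracy = correct / len(y)
print(predicted_class)            # tensor([1, 0, 2, 0])
print(accuracy)                   # 0.5 -> rows 0 and 1 are correct
```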

In [45]:
epochs = 5
for t in range(epochs):
    print(f"Epoch {t+1}\n-------------------------------")
    train(train_dataloader, model, loss_fn, optimizer)
    test(test_dataloader, model, loss_fn)
print("Done!")

Epoch 1
-------------------------------
loss: 2.301853  [    0/60000]
loss: 2.301404  [ 6400/60000]
loss: 2.301467  [12800/60000]
loss: 2.302562  [19200/60000]
loss: 2.300681  [25600/60000]
loss: 2.301144  [32000/60000]
loss: 2.301294  [38400/60000]
loss: 2.300223  [44800/60000]
loss: 2.301973  [51200/60000]
loss: 2.301639  [57600/60000]
Test Error: 
 Accuracy: 17.4%, Avg loss: 2.300764 

Epoch 2
-------------------------------
loss: 2.300459  [    0/60000]
loss: 2.300156  [ 6400/60000]
loss: 2.299892  [12800/60000]
loss: 2.301411  [19200/60000]
loss: 2.299201  [25600/60000]
loss: 2.299381  [32000/60000]
loss: 2.300031  [38400/60000]
loss: 2.298625  [44800/60000]
loss: 2.300725  [51200/60000]
loss: 2.300221  [57600/60000]
Test Error: 
 Accuracy: 19.1%, Avg loss: 2.299264 

Epoch 3
-------------------------------
loss: 2.299032  [    0/60000]
loss: 2.298865  [ 6400/60000]
loss: 2.298250  [12800/60000]
loss: 2.300213  [19200/60000]
loss: 2.297649  [25600/60000]
loss: 2.297542  [32000/60000]
loss: 2.298701  [38400/60000]
loss: 2.296918  [44800/60000]
loss: 

In [46]:
#save model
torch.save(model.state_dict(), "model.pth")
print("Saved PyTorch Model State to model.pth")

Saved PyTorch Model State to model.pth


In [47]:
#load model
model = NeuralNetwork()
model.load_state_dict(torch.load("model.pth"))

<All keys matched successfully>
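Saving the `state_dict` stores only the parameters, so the architecture must be re-created before loading. A self-contained round-trip sketch using a small `nn.Linear` and an in-memory buffer instead of `model.pth`; the `weights_only=True` flag assumes a recent PyTorch release (it restricts unpickling to tensors and plain containers):

```python
import io
import torch
from torch import nn

model = nn.Linear(3, 2)

# Save only the parameters, here to an in-memory buffer instead of a file
buffer = io.BytesIO()
torch.save(model.state_dict(), buffer)
buffer.seek(0)

# Re-create the same architecture, then load the saved parameters into it
restored = nn.Linear(3, 2)
restored.load_state_dict(torch.load(buffer, weights_only=True))

x = torch.randn(4, 3)
print(torch.equal(model(x), restored(x)))  # True
```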

In [48]:
# Prediction

classes = [
    "T-shirt/top",
    "Trouser",
    "Pullover",
    "Dress",
    "Coat",
    "Sandal",
    "Shirt",
    "Sneaker",
    "Bag",
    "Ankle boot",
]

model.eval()
x, y = test_data[0][0], test_data[0][1]
with torch.no_grad():
    pred = model(x)
    predicted, actual = classes[pred[0].argmax(0)], classes[y]
    print(f'Predicted: "{predicted}", Actual: "{actual}"')

Predicted: "Ankle boot", Actual: "Ankle boot"
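For a ranked view rather than a single label, `topk` returns the highest scores with their indices. A sketch with made-up raw scores; it assumes a model that returns logits, since the model above already ends in `nn.Softmax` and its outputs should not be passed through softmax a second time:

```python
import torch

classes = ["T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
           "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot"]

# Made-up raw scores (logits) for one image
scores = torch.tensor([[0.1, -1.2, 0.3, 0.0, 0.2, 1.5, 0.4, 2.1, 0.9, 3.0]])
probs = torch.softmax(scores, dim=1)

# Top 3 classes with their probabilities, highest first
top_p, top_i = probs.topk(3, dim=1)
for p, i in zip(top_p[0], top_i[0]):
    print(f"{classes[i.item()]}: {p.item():.3f}")
```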


---