<a href="https://colab.research.google.com/github/RafsanJany-44/Machine-School/blob/main/PyTorch.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

#TENSORS
Tensors are similar to NumPy’s ndarrays, except that tensors can run on GPUs or other specialized hardware to accelerate computing. If you’re familiar with ndarrays, you’ll be right at home with the Tensor API. If not, follow along in this quick API walkthrough.

In [113]:
import torch
import numpy as np

#Tensor Initialization
Tensors can be initialized in various ways. Take a look at the following examples:
<br><br>
<b>Directly from data</b>

Tensors can be created directly from data. The data type is automatically inferred.



In [114]:
data = [[1, 2], [3, 4]]
x_data = torch.tensor(data)

In [115]:
x_data

tensor([[1, 2],
        [3, 4]])
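
As a small illustration of the automatic type inference, the sketch below builds tensors from integer and floating-point lists and inspects their `dtype`. The variable names are illustrative only, and the float result assumes the default floating-point dtype has not been changed.

```python
import torch

# Integer inputs are inferred as 64-bit integers.
int_tensor = torch.tensor([[1, 2], [3, 4]])
# Float inputs use the default floating-point dtype (float32 unless changed).
float_tensor = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
# Passing dtype explicitly overrides the inference.
forced_tensor = torch.tensor([[1, 2], [3, 4]], dtype=torch.float64)

print(int_tensor.dtype)
print(float_tensor.dtype)
print(forced_tensor.dtype)
```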

#From a NumPy array

Tensors can be created from NumPy arrays (and vice versa - see Bridge with NumPy).

In [116]:
np_array = np.array(data)
x_np = torch.from_numpy(np_array)

In [117]:
np_array

array([[1, 2],
       [3, 4]])

In [118]:
x_np

tensor([[1, 2],
        [3, 4]])

#From another tensor:

The new tensor retains the properties (shape, datatype) of the argument tensor, unless explicitly overridden.

In [119]:
x_ones = torch.ones_like(x_data) # retains the properties of x_data
print(f"Ones Tensor: \n {x_ones} \n")

x_rand = torch.rand_like(x_data, dtype=torch.float) # overrides the datatype of x_data
print(f"Random Tensor: \n {x_rand} \n")

Ones Tensor: 
 tensor([[1, 1],
        [1, 1]]) 

Random Tensor: 
 tensor([[0.9247, 0.6148],
        [0.1284, 0.7852]]) 



#With random or constant values:

shape is a tuple of tensor dimensions. In the functions below, it determines the dimensionality of the output tensor.



In [120]:
shape = (2, 3,)
rand_tensor = torch.rand(shape)
ones_tensor = torch.ones(shape)
zeros_tensor = torch.zeros(shape)

print(f"Random Tensor: \n {rand_tensor} \n")
print(f"Ones Tensor: \n {ones_tensor} \n")
print(f"Zeros Tensor: \n {zeros_tensor}")

Random Tensor: 
 tensor([[0.7593, 0.5892, 0.9364],
        [0.0156, 0.0845, 0.0341]]) 

Ones Tensor: 
 tensor([[1., 1., 1.],
        [1., 1., 1.]]) 

Zeros Tensor: 
 tensor([[0., 0., 0.],
        [0., 0., 0.]])


#Tensor Attributes
Tensor attributes describe their shape, datatype, and the device on which they are stored.

In [121]:
tensor = torch.rand(3, 4)

print(f"Shape of tensor: {tensor.shape}")
print(f"Datatype of tensor: {tensor.dtype}")
print(f"Device tensor is stored on: {tensor.device}")

Shape of tensor: torch.Size([3, 4])
Datatype of tensor: torch.float32
Device tensor is stored on: cpu


#Tensor Operations
Over 100 tensor operations, including transposing, indexing, slicing, mathematical operations, linear algebra, random sampling, and more, are described in the PyTorch `torch` API documentation.

Each of them can be run on the GPU (at typically higher speeds than on a CPU). If you’re using Colab, allocate a GPU by going to Edit > Notebook Settings.



In [122]:
# We move our tensor to the GPU if available
if torch.cuda.is_available():
  tensor = tensor.to('cuda')
  print(f"Device tensor is stored on: {tensor.device}")
else:
  print("No GPU available; the tensor stays on the CPU.")

Device tensor is stored on: cuda:0


Standard numpy-like indexing and slicing:

In [123]:
tensor = torch.ones(4, 4)
print(tensor)
tensor[:,1] = 0
print(tensor)

tensor([[1., 1., 1., 1.],
        [1., 1., 1., 1.],
        [1., 1., 1., 1.],
        [1., 1., 1., 1.]])
tensor([[1., 0., 1., 1.],
        [1., 0., 1., 1.],
        [1., 0., 1., 1.],
        [1., 0., 1., 1.]])


#Joining tensors
Use torch.cat to concatenate a sequence of tensors along a given dimension. See also torch.stack, another tensor joining op that is subtly different from torch.cat.

In [124]:
t1 = torch.cat([tensor, tensor, tensor], dim=1)
print(t1)

tensor([[1., 0., 1., 1., 1., 0., 1., 1., 1., 0., 1., 1.],
        [1., 0., 1., 1., 1., 0., 1., 1., 1., 0., 1., 1.],
        [1., 0., 1., 1., 1., 0., 1., 1., 1., 0., 1., 1.],
        [1., 0., 1., 1., 1., 0., 1., 1., 1., 0., 1., 1.]])
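
To make the difference between `torch.cat` and `torch.stack` concrete, here is a minimal sketch (the names `base`, `joined`, and `stacked` are illustrative). `torch.cat` extends an existing dimension, while `torch.stack` inserts a new one.

```python
import torch

base = torch.ones(4, 4)

# cat joins along an existing dimension: dim 1 grows from 4 to 12.
joined = torch.cat([base, base, base], dim=1)

# stack creates a new dimension of size 3 in front of the existing ones.
stacked = torch.stack([base, base, base], dim=0)

print(joined.shape)
print(stacked.shape)
```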


#Multiplying tensors

In [125]:
# This computes the element-wise product
print(f"tensor.mul(tensor) \n {tensor.mul(tensor)} \n")
# Alternative syntax:
print(f"tensor * tensor \n {tensor * tensor}")

tensor.mul(tensor) 
 tensor([[1., 0., 1., 1.],
        [1., 0., 1., 1.],
        [1., 0., 1., 1.],
        [1., 0., 1., 1.]]) 

tensor * tensor 
 tensor([[1., 0., 1., 1.],
        [1., 0., 1., 1.],
        [1., 0., 1., 1.],
        [1., 0., 1., 1.]])


This computes the matrix multiplication between two tensors:

In [126]:
print(f"tensor.matmul(tensor.T) \n {tensor.matmul(tensor.T)} \n")
# Alternative syntax:
print(f"tensor @ tensor.T \n {tensor @ tensor.T}")

tensor.matmul(tensor.T) 
 tensor([[3., 3., 3., 3.],
        [3., 3., 3., 3.],
        [3., 3., 3., 3.],
        [3., 3., 3., 3.]]) 

tensor @ tensor.T 
 tensor([[3., 3., 3., 3.],
        [3., 3., 3., 3.],
        [3., 3., 3., 3.],
        [3., 3., 3., 3.]])
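
With a tensor of ones and zeros the two products are easy to confuse, so the following sketch repeats the comparison on a small matrix with distinct entries (`m` is an illustrative name, not part of the notebook above).

```python
import torch

m = torch.tensor([[1., 2.],
                  [3., 4.]])

# Element-wise product: each entry is multiplied by itself.
elementwise = m * m   # [[1, 4], [9, 16]]

# Matrix product: rows of the left operand dotted with columns of the right.
matrix = m @ m        # [[7, 10], [15, 22]]

print(elementwise)
print(matrix)
```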


<b>In-place operations</b><br>
Operations that have a `_` suffix are in-place. For example, `x.copy_(y)` and `x.t_()` will change `x`.

In [127]:
print(tensor, "\n")
tensor.add_(5)
print(tensor)

tensor([[1., 0., 1., 1.],
        [1., 0., 1., 1.],
        [1., 0., 1., 1.],
        [1., 0., 1., 1.]]) 

tensor([[6., 5., 6., 6.],
        [6., 5., 6., 6.],
        [6., 5., 6., 6.],
        [6., 5., 6., 6.]])
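
For contrast, the sketch below runs the out-of-place `add` next to the in-place `add_` on the same tensor. The variable names are illustrative.

```python
import torch

x_orig = torch.ones(3)

# Out-of-place: returns a new tensor and leaves x_orig untouched.
y_new = x_orig.add(5)
unchanged = torch.equal(x_orig, torch.ones(3))

# In-place: modifies x_orig's own storage and returns a tensor backed by it.
returned = x_orig.add_(5)
same_storage = returned.data_ptr() == x_orig.data_ptr()

print(unchanged)                    # x_orig was still all ones after add
print(torch.equal(x_orig, y_new))   # after add_, x_orig matches y_new
print(same_storage)
```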


#Bridge with NumPy
Tensors on the CPU and NumPy arrays can share their underlying memory locations, and changing one will change the other.

#Tensor to NumPy array

In [128]:
t = torch.ones(5)
print(f"t: {t}")
n = t.numpy()
print(f"n: {n}")

t: tensor([1., 1., 1., 1., 1.])
n: [1. 1. 1. 1. 1.]


In [129]:
t.add_(1)
print(f"t: {t}")
print(f"n: {n}")

t: tensor([2., 2., 2., 2., 2.])
n: [2. 2. 2. 2. 2.]


#NumPy array to Tensor


In [130]:
n = np.ones(5)
t = torch.from_numpy(n)

np.add(n, 1, out=n)
print(f"t: {t}")
print(f"n: {n}")

t: tensor([2., 2., 2., 2., 2.], dtype=torch.float64)
n: [2. 2. 2. 2. 2.]
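
The memory sharing applies to `torch.from_numpy`; building a tensor with `torch.tensor` copies the data instead. A minimal sketch of the difference, using illustrative names:

```python
import numpy as np
import torch

arr = np.ones(3)

shared = torch.from_numpy(arr)  # shares memory with arr
copied = torch.tensor(arr)      # makes an independent copy

# Modify the NumPy array after both tensors were created.
arr[0] = 10.0

print(shared)  # reflects the change to arr
print(copied)  # still holds the original values
```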


#TORCH.AUTOGRAD
`torch.autograd` is PyTorch’s automatic differentiation engine that powers neural network training. In this section, you will get a conceptual understanding of how autograd helps a neural network train.

#Usage in PyTorch
Let’s take a look at a single training step. For this example, we load a pretrained resnet18 model from torchvision. We create a random data tensor to represent a single image with 3 channels and a height and width of 64, and initialize its corresponding label to random values. The label for this pretrained model has shape (1, 1000).

In [131]:
import torch
from torchvision.models import resnet18, ResNet18_Weights
model = resnet18(weights=ResNet18_Weights.DEFAULT)
data = torch.rand(1, 3, 64, 64)
labels = torch.rand(1, 1000)

Next, we run the input data through each of the model’s layers to make a prediction. This is the forward pass.

In [132]:
prediction = model(data) # forward pass

We use the model’s prediction and the corresponding label to calculate the error (loss). The next step is to backpropagate this error through the network. Backward propagation is kicked off when we call .backward() on the error tensor. Autograd then calculates and stores the gradients for each model parameter in the parameter’s .grad attribute.

In [133]:
loss = (prediction - labels).sum()
loss.backward() # backward pass

Next, we load an optimizer, in this case SGD with a learning rate of 0.01 and momentum of 0.9. We register all the parameters of the model in the optimizer.

In [134]:
optim = torch.optim.SGD(model.parameters(), lr=1e-2, momentum=0.9)

Finally, we call .step() to initiate gradient descent. The optimizer adjusts each parameter by its gradient stored in .grad.

In [135]:
optim.step() #gradient descent
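
The same four steps can be tried without downloading pretrained weights. The sketch below is an assumption-laden stand-in: it swaps resnet18 for a tiny `nn.Linear` layer (`toy_model`) and uses made-up tensor sizes, but follows the forward pass, loss, backward pass, and optimizer step in the same order.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical stand-in for resnet18: a single linear layer.
toy_model = nn.Linear(4, 2)
toy_data = torch.rand(1, 4)
toy_labels = torch.rand(1, 2)

# Keep copies of the parameters to compare after the update.
before = [p.detach().clone() for p in toy_model.parameters()]

toy_optimizer = torch.optim.SGD(toy_model.parameters(), lr=1e-2, momentum=0.9)

toy_prediction = toy_model(toy_data)             # forward pass
toy_loss = (toy_prediction - toy_labels).sum()   # same toy loss as above
toy_loss.backward()                              # backward pass fills .grad
toy_optimizer.step()                             # gradient descent

changed = [not torch.equal(b, p) for b, p in zip(before, toy_model.parameters())]
print(toy_model.bias.grad)  # d(loss)/d(bias) is 1 for each output
print(changed)
```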

#Differentiation in Autograd
Let’s take a look at how autograd collects gradients. We create two tensors a and b with requires_grad=True. This signals to autograd that every operation on them should be tracked.

In [136]:
import torch

a = torch.tensor([2., 3.], requires_grad=True)
b = torch.tensor([6., 4.], requires_grad=True)

We create another tensor Q from a and b.<br>
Q=3a<sup>3</sup>−b<sup>2</sup>

In [137]:
Q = 3*a**3 - b**2

Let’s assume a and b to be parameters of an NN, and Q to be the error. In NN training, we want gradients of the error w.r.t. parameters,<br>

dQ/da = 9a<sup>2</sup>

dQ/db = −2b

When we call `.backward()` on Q, autograd calculates these gradients and stores them in the respective tensors’ .grad attribute.

We need to explicitly pass a gradient argument in `Q.backward()` because it is a vector. gradient is a tensor of the same shape as Q, and it represents the gradient of Q w.r.t. itself, i.e.<br>

dQ/dQ = 1

Equivalently, we can also aggregate Q into a scalar and call backward implicitly, like `Q.sum().backward()`.

In [138]:
external_grad = torch.tensor([1., 1.])
Q.backward(gradient=external_grad)

Gradients are now deposited in `a.grad` and `b.grad`.

In [139]:
# check if collected gradients are correct
print(9*a**2 == a.grad)
print(-2*b == b.grad)

tensor([True, True])
tensor([True, True])
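
The scalar route mentioned above can be checked the same way. This sketch rebuilds the tensors under new, illustrative names (`a2`, `b2`, `Q2`) so it does not disturb the ones already defined, then calls `backward()` on the sum.

```python
import torch

a2 = torch.tensor([2., 3.], requires_grad=True)
b2 = torch.tensor([6., 4.], requires_grad=True)
Q2 = 3 * a2**3 - b2**2

# Q2.sum() is a scalar, so no gradient argument is needed.
Q2.sum().backward()

print(a2.grad)  # 9 * a**2 evaluated at [2, 3], i.e. 36 and 81
print(b2.grad)  # -2 * b evaluated at [6, 4], i.e. -12 and -8
```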


#Exclusion from the DAG
`torch.autograd` tracks operations on all tensors which have their `requires_grad` flag set to True. For tensors that don’t require gradients, setting this attribute to False excludes them from the gradient computation DAG.

The output tensor of an operation will require gradients even if only a single input tensor has `requires_grad=True`.



In [140]:
x = torch.rand(5, 5)
y = torch.rand(5, 5)
z = torch.rand((5, 5), requires_grad=True)

a = x + y
print(f"Does `a` require gradients? : {a.requires_grad}")
b = x + z
print(f"Does `b` require gradients?: {b.requires_grad}")

Does `a` require gradients? : False
Does `b` require gradients?: True


In a NN, parameters that don’t compute gradients are usually called frozen parameters. It is useful to “freeze” part of your model if you know in advance that you won’t need the gradients of those parameters (this offers some performance benefits by reducing autograd computations).

In finetuning, we freeze most of the model and typically only modify the classifier layers to make predictions on new labels. Let’s walk through a small example to demonstrate this. As before, we load a pretrained resnet18 model, and freeze all the parameters.

In [141]:
from torch import nn, optim

model = resnet18(weights=ResNet18_Weights.DEFAULT)

# Freeze all the parameters in the network
for param in model.parameters():
    param.requires_grad = False

Let’s say we want to finetune the model on a new dataset with 10 labels. In resnet, the classifier is the last linear layer model.fc. We can simply replace it with a new linear layer (unfrozen by default) that acts as our classifier.

In [142]:
model.fc = nn.Linear(512, 10)

Now all parameters in the model, except the parameters of `model.fc`, are frozen. The only parameters that compute gradients are the weights and bias of `model.fc`.

In [143]:
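# Optimize only the classifier: every parameter is registered with the optimizer,
# but only model.fc has requires_grad=True, so only it receives gradients and updates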
optimizer = optim.SGD(model.parameters(), lr=1e-2, momentum=0.9)
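
To see the effect of freezing without loading resnet18, here is a minimal sketch on a hypothetical two-layer model. `backbone`, `head`, and `toy_net` are illustrative names standing in for the pretrained layers and the new classifier.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical stand-ins: a "pretrained" backbone and a fresh classifier head.
backbone = nn.Linear(8, 4)
head = nn.Linear(4, 2)
toy_net = nn.Sequential(backbone, nn.ReLU(), head)

# Freeze the backbone only.
for param in backbone.parameters():
    param.requires_grad = False

trainable = [name for name, p in toy_net.named_parameters() if p.requires_grad]
print(trainable)  # only the head (index 2 in the Sequential) remains trainable

frozen_before = backbone.weight.detach().clone()
head_bias_before = head.bias.detach().clone()

# All parameters are registered, as in the cell above.
sgd = torch.optim.SGD(toy_net.parameters(), lr=1e-2, momentum=0.9)
toy_net(torch.rand(1, 8)).sum().backward()
sgd.step()

print(backbone.weight.grad)                          # no gradient was computed
print(torch.equal(frozen_before, backbone.weight))   # frozen weights unchanged
print(torch.equal(head_bias_before, head.bias))      # head bias was updated
```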

#NEURAL NETWORKS
Neural networks can be constructed using the `torch.nn` package.

Now that you have had a glimpse of autograd, note that `nn` depends on autograd to define models and differentiate them. An `nn.Module` contains layers, and a method `forward(input)` that returns the output.

For example, look at this network that classifies digit images:

[Figure: convnet]


It is a simple feed-forward network. It takes the input, feeds it through several layers one after the other, and then finally gives the output.
<br>
<br>
<br>
A typical training procedure for a neural network is as follows:

1. Define the neural network that has some learnable parameters (or weights)

2. Iterate over a dataset of inputs

3. Process input through the network

4. Compute the loss (how far is the output from being correct)

5. Propagate gradients back into the network’s parameters

6. Update the weights of the network, typically using a simple update rule: `weight = weight - learning_rate * gradient`
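
The six steps above can be sketched in plain Python before bringing in `torch.nn`. This toy example is an assumption for illustration only: the "network" is a single weight, the dataset is three made-up points on the line y = 2x, and the gradient is written out by hand instead of being computed by autograd.

```python
# Made-up dataset for the relationship y = 2 * x.
samples = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]

weight = 0.0       # step 1: a "network" with one learnable parameter
step_size = 0.05   # plays the role of learning_rate

for epoch in range(100):
    for x_i, y_i in samples:                   # step 2: iterate over the dataset
        pred = weight * x_i                    # step 3: process the input
        sample_loss = (pred - y_i) ** 2        # step 4: squared-error loss
        gradient = 2 * (pred - y_i) * x_i      # step 5: d(loss)/d(weight)
        weight = weight - step_size * gradient # step 6: update rule

print(weight)  # approaches 2.0, the slope used to generate the data
```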

#Define the network

In [144]:
import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):

    def __init__(self):
        super(Net, self).__init__()
        # 1 input image channel, 6 output channels, 5x5 square convolution
        # kernel
        self.conv1 = nn.Conv2d(1, 6, 5)
        self.conv2 = nn.Conv2d(6, 16, 5)
        # an affine operation: y = Wx + b
        self.fc1 = nn.Linear(16 * 5 * 5, 120)  # 5*5 from image dimension
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        # Max pooling over a (2, 2) window
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
        # If the size is a square, you can specify with a single number
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        x = torch.flatten(x, 1) # flatten all dimensions except the batch dimension
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x


net = Net()
print(net)

Net(
  (conv1): Conv2d(1, 6, kernel_size=(5, 5), stride=(1, 1))
  (conv2): Conv2d(6, 16, kernel_size=(5, 5), stride=(1, 1))
  (fc1): Linear(in_features=400, out_features=120, bias=True)
  (fc2): Linear(in_features=120, out_features=84, bias=True)
  (fc3): Linear(in_features=84, out_features=10, bias=True)
)


You just have to define the forward function, and the backward function (where gradients are computed) is automatically defined for you using autograd. You can use any of the Tensor operations in the forward function.

The learnable parameters of a model are returned by net.parameters()

In [156]:
params = list(net.parameters())
print(len(params))
print(params[0].size())  # conv1's .weight
print(params)

10
torch.Size([6, 1, 5, 5])
[Parameter containing:
tensor([[[[ 1.9232e-01, -5.6334e-03, -1.6369e-01,  1.2523e-01, -1.4956e-01],
          [ 1.9773e-01, -1.2453e-01, -1.0987e-01, -1.8757e-01, -1.4330e-01],
          [-1.8613e-01, -1.3763e-01, -1.4940e-01, -1.4837e-01,  6.2525e-02],
          [ 1.6503e-01, -1.2780e-01,  1.5100e-01, -1.6044e-01, -7.9404e-02],
          [-1.4913e-01,  1.9683e-01, -1.1957e-01, -1.7480e-01, -8.9490e-02]]],


        [[[ 1.1825e-01,  4.8853e-02,  1.8553e-01,  1.4455e-01,  1.6862e-01],
          [-1.5426e-01,  3.5097e-02, -6.4067e-03,  1.4910e-04, -1.3959e-01],
          [-8.1503e-02, -1.4404e-02, -1.0750e-02,  1.7928e-01,  3.4924e-02],
          [-9.8808e-02, -3.5931e-02,  1.9002e-01,  1.7632e-01, -1.2082e-01],
          [ 8.7676e-02, -1.4107e-01,  1.8256e-01, -4.8303e-03,  6.1732e-04]]],


        [[[-1.0891e-01,  9.1681e-02,  1.2000e-01,  5.1473e-02,  1.1189e-01],
          [-3.8148e-02,  1.5144e-01,  1.7460e-01,  2.0005e-01, -6.9418e-02],
          [ 4.191

Let’s try a random 32x32 input. <br>Note: expected input size of this net (LeNet) is 32x32. To use this net on the MNIST dataset, please resize the images from the dataset to 32x32.

In [146]:
input = torch.randn(1, 1, 32, 32)
out = net(input)
print(out)

tensor([[-0.0406,  0.0197, -0.1107, -0.0444, -0.0290,  0.0264,  0.0165,  0.0543,
         -0.0335,  0.0046]], grad_fn=<AddmmBackward0>)
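
The 32x32 requirement follows from the layer arithmetic: `fc1` expects 16 * 5 * 5 = 400 features. The sketch below traces one spatial dimension through the network using the standard output-size formula for convolution and pooling without dilation. `conv_out` is a hypothetical helper written for this illustration, not a PyTorch function.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output length along one spatial dimension for a conv or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1


def flattened_features(size):
    """Trace an input of the given height/width through Net's conv and pool layers."""
    size = conv_out(size, kernel=5)            # conv1, 5x5 kernel
    size = conv_out(size, kernel=2, stride=2)  # 2x2 max pooling
    size = conv_out(size, kernel=5)            # conv2, 5x5 kernel
    size = conv_out(size, kernel=2, stride=2)  # 2x2 max pooling
    return 16 * size * size                    # 16 output channels from conv2


print(flattened_features(32))  # 32 -> 28 -> 14 -> 10 -> 5, so 16 * 5 * 5 = 400
print(flattened_features(28))  # 28 -> 24 -> 12 -> 8 -> 4, so 16 * 4 * 4 = 256
```

A 28x28 MNIST image would produce 256 features, which does not match the 400 inputs of `fc1`; this is why the images need to be resized.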


Zero the gradient buffers of all parameters and backpropagate with random gradients:

In [147]:
net.zero_grad()
out.backward(torch.randn(1, 10))

<font color='coral'>NOTE</font>

torch.nn only supports mini-batches. The entire torch.nn package only supports inputs that are a mini-batch of samples, and not a single sample.

For example, nn.Conv2d will take in a 4D Tensor of nSamples x nChannels x Height x Width.

If you have a single sample, just use input.unsqueeze(0) to add a fake batch dimension.
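
A minimal sketch of adding the batch dimension, assuming a single-channel 32x32 sample like the one used above (`single`, `batch`, and `conv` are illustrative names):

```python
import torch
from torch import nn

conv = nn.Conv2d(1, 6, 5)

single = torch.randn(1, 32, 32)  # one sample: nChannels x Height x Width
batch = single.unsqueeze(0)      # add a batch dimension of size 1 at the front

print(batch.shape)        # nSamples x nChannels x Height x Width
print(conv(batch).shape)  # 6 output channels, 28x28 after the 5x5 convolution
```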

Before proceeding further, let’s recap all the classes you’ve seen so far.

Recap:<br>
torch.Tensor - A multi-dimensional array with support for autograd operations like backward(). Also holds the gradient w.r.t. the tensor.

nn.Module - Neural network module. Convenient way of encapsulating parameters, with helpers for moving them to GPU, exporting, loading, etc.

nn.Parameter - A kind of Tensor, that is automatically registered as a parameter when assigned as an attribute to a Module.

autograd.Function - Implements forward and backward definitions of an autograd operation. Every Tensor operation creates at least a single Function node that connects to functions that created a Tensor and encodes its history.

At this point, we covered:

- Defining a neural network
- Processing inputs and calling backward

Still left:

- Computing the loss
- Updating the weights of the network

<b>Loss Function</b><br>
A loss function takes the (output, target) pair of inputs, and computes a value that estimates how far away the output is from the target.

There are several different loss functions under the `nn` package. A simple one is `nn.MSELoss`, which computes the mean-squared error between the output and the target.

In [157]:
input

tensor([[[[-0.3298,  0.5683,  0.9246,  ..., -1.1337,  0.8672,  0.9334],
          [ 1.2841, -0.4362,  1.4985,  ...,  0.5957, -0.1073, -1.3281],
          [-0.6991,  1.2390,  0.0929,  ...,  0.6532, -1.0390,  1.1587],
          ...,
          [ 0.7134, -0.2108, -1.8475,  ..., -0.2834,  0.5376,  0.0920],
          [ 1.4700,  0.0525, -1.5714,  ...,  0.0936,  0.1018,  1.3091],
          [-1.9015,  0.2292, -0.1164,  ...,  0.2501, -0.2112,  0.9867]]]])

In [148]:
output = net(input)
target = torch.randn(10)  # a dummy target, for example
target = target.view(1, -1)  # make it the same shape as output
criterion = nn.MSELoss()

loss = criterion(output, target)
print(loss)

tensor(1.0661, grad_fn=<MseLossBackward0>)
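
To show what `nn.MSELoss` computes with its default settings, the sketch below compares it against the mean of squared differences written out by hand. The tensors and their `demo_` names are made up for the illustration.

```python
import torch
from torch import nn

demo_output = torch.tensor([[1.0, 2.0]])
demo_target = torch.tensor([[0.0, 4.0]])

# Squared differences are 1 and 4, so their mean is 2.5.
by_hand = ((demo_output - demo_target) ** 2).mean()

# nn.MSELoss averages over all elements by default.
built_in = nn.MSELoss()(demo_output, demo_target)

print(by_hand.item(), built_in.item())
```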


Now, if you follow loss in the backward direction, using its .grad_fn attribute, you will see a graph of computations that looks like this:

In [149]:
"""input -> conv2d -> relu -> maxpool2d -> conv2d -> relu -> maxpool2d-> flatten -> linear -> relu -> linear -> relu -> linear-> MSELoss-> loss"""

'input -> conv2d -> relu -> maxpool2d -> conv2d -> relu -> maxpool2d\n      -> flatten -> linear -> relu -> linear -> relu -> linear\n      -> MSELoss\n      -> loss'

So, when we call loss.backward(), the whole graph is differentiated w.r.t. the neural net parameters, and all Tensors in the graph that have requires_grad=True will have their .grad Tensor accumulated with the gradient.

In [150]:
print(loss.grad_fn)  # MSELoss
print(loss.grad_fn.next_functions[0][0])  # Linear
print(loss.grad_fn.next_functions[0][0].next_functions[0][0])  # AccumulateGrad for the last linear layer's bias

<MseLossBackward0 object at 0x7fd110ba68f0>
<AddmmBackward0 object at 0x7fd110ba7820>
<AccumulateGrad object at 0x7fd110ba68f0>


<b>Backprop</b><br>
To backpropagate the error, all we have to do is call `loss.backward()`. You need to clear the existing gradients first, though, or the new gradients will be accumulated onto the existing ones.

Now we shall call loss.backward(), and have a look at conv1’s bias gradients before and after the backward.

In [151]:
net.zero_grad()     # zeroes the gradient buffers of all parameters

print('conv1.bias.grad before backward')
print(net.conv1.bias.grad)

loss.backward()

print('conv1.bias.grad after backward')
print(net.conv1.bias.grad)

conv1.bias.grad before backward
None
conv1.bias.grad after backward
tensor([ 0.0033,  0.0062,  0.0012, -0.0184, -0.0012,  0.0032])
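
The accumulation behaviour is easiest to see on a single scalar parameter. In this sketch (with the illustrative name `w`), calling `backward()` twice without clearing doubles the stored gradient.

```python
import torch

w = torch.tensor(2.0, requires_grad=True)

(3 * w).backward()
first = w.grad.item()    # gradient of 3 * w with respect to w is 3

(3 * w).backward()       # no clearing in between, so the gradients add up
second = w.grad.item()   # 3 + 3 = 6

w.grad.zero_()           # clear the accumulated gradient by hand
third = w.grad.item()

print(first, second, third)
```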


Now, we have seen how to use loss functions.

Read Later:

The neural network package contains various modules and loss functions that form the building blocks of deep neural networks. A full list is available in the `torch.nn` documentation.

The only thing left to learn is:

Updating the weights of the network

<b>Update the weights</b><br>
The simplest update rule used in practice is the Stochastic Gradient Descent (SGD):

In [152]:
#weight = weight - learning_rate * gradient

We can implement this using simple Python code:

In [153]:
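learning_rate = 0.01
with torch.no_grad():  # do not record the update itself in the autograd graph
    for f in net.parameters():
        f.sub_(f.grad * learning_rate)  # weight = weight - learning_rate * gradient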

However, as you use neural networks, you will want to try different update rules such as SGD, Nesterov-SGD, Adam, RMSProp, etc. PyTorch provides a small package for this, `torch.optim`, which implements all these methods. Using it is simple:

In [154]:
import torch.optim as optim

# create your optimizer
optimizer = optim.SGD(net.parameters(), lr=0.01)

# in your training loop:
optimizer.zero_grad()   # zero the gradient buffers
output = net(input)
loss = criterion(output, target)
loss.backward()
optimizer.step()    # Does the update
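
As a sanity check that plain `optim.SGD` (no momentum) applies the same rule as the manual update, the sketch below performs one step both ways on a one-element parameter. The names and the loss `w ** 2` are assumptions chosen so the numbers are easy to follow.

```python
import torch

w_manual = torch.tensor([1.0], requires_grad=True)
w_optim = torch.tensor([1.0], requires_grad=True)
lr = 0.1

# Manual update: the gradient of w ** 2 at w = 1 is 2.
(w_manual ** 2).sum().backward()
with torch.no_grad():
    w_manual -= lr * w_manual.grad   # 1.0 - 0.1 * 2.0 = 0.8

# Same step through torch.optim.
sgd = torch.optim.SGD([w_optim], lr=lr)
sgd.zero_grad()
(w_optim ** 2).sum().backward()
sgd.step()

print(w_manual, w_optim)
```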

NOTE

Observe how gradient buffers had to be manually set to zero using optimizer.zero_grad(). This is because gradients are accumulated as explained in the Backprop section.