<h1><center>PyTorch Introduction</center></h1>

<br> A deep learning framework is an interface, library, or tool that lets us build deep learning models more easily and quickly, without working through the details of the underlying algorithms. Such frameworks provide a clear and concise way to define models from a collection of pre-built, optimized components. PyTorch, TensorFlow, and Keras are examples of commonly used deep learning frameworks.


<br><br>A standard deep learning framework can:
<ul><li>Build and operate computational graphs (More on this to follow)</li>
    <li>Perform forward and back propagation</li>
    <li>Parallelize on GPU</li>
    <li>Provide standard architectures and other widely used primitives</li>
    </ul>
    
Why PyTorch?

<ul><li>Dynamic computational graph</li>
    <li>Memory-efficient usage</li>
    <li>Developer friendly: easy to implement and debug</li>
    <li>Tensor computation with strong GPU acceleration</li>
    </ul>
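
To make the first point concrete, here is a minimal sketch of what "dynamic" means in practice: the graph is recorded afresh on every forward pass, so ordinary Python control flow can change its shape. The function name `dynamic_forward` and the values used are illustrative assumptions, not part of PyTorch.

```python
import torch

def dynamic_forward(x, w, n_steps):
    # The number of multiplications is decided at run time by plain Python,
    # so each call can record a different graph.
    h = x
    for _ in range(n_steps):
        h = h * w
    return h.sum()

w = torch.tensor(2.0, requires_grad=True)
x = torch.tensor([1.0, 3.0])

loss = dynamic_forward(x, w, n_steps=3)  # computes sum(x) * w**3
loss.backward()

print(loss.item())    # 4 * 2**3 = 32.0
print(w.grad.item())  # 4 * 3 * 2**2 = 48.0
```

Calling `dynamic_forward` again with a different `n_steps` would build a differently shaped graph without any recompilation step.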

<h2>Basics of PyTorch</h2>

<h3>Building Block #1 : Tensors</h3>

If you’ve ever done machine learning in Python, you’ve probably come across NumPy. We use NumPy because it is much faster than Python lists at matrix operations, since it does most of the heavy lifting in C.

But for training deep neural networks, NumPy arrays alone don’t cut it: NumPy runs on the CPU only, and code written with it alone could take months to train some state-of-the-art networks. This is where tensors come into play. PyTorch provides a data structure called a Tensor, which is very similar to NumPy’s ndarray. Unlike the latter, <b>tensors can tap into the resources of a GPU to significantly speed up matrix operations</b>.
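
As a rough sketch of the similarity, the snippet below runs the same matrix product once with NumPy and once with a tensor, moving the tensor to a GPU only if one happens to be available. It assumes `numpy` and `torch` are already installed; the variable names are illustrative.

```python
import numpy as np
import torch

a_np = np.array([[1.0, 2.0], [3.0, 4.0]], dtype=np.float32)
a_t = torch.tensor(a_np)  # copies the NumPy data into a tensor

# Same operation in both libraries
prod_np = a_np @ a_np

# Only the tensor version can run on a GPU; fall back to the CPU otherwise
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
prod_t = (a_t.to(device) @ a_t.to(device)).cpu()

print(prod_np)
print(prod_t)
```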

In [None]:
!pip3 install torch

Collecting nvidia-cuda-nvrtc-cu12==12.1.105 (from torch)
  Using cached nvidia_cuda_nvrtc_cu12-12.1.105-py3-none-manylinux1_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cuda-runtime-cu12==12.1.105 (from torch)
  Using cached nvidia_cuda_runtime_cu12-12.1.105-py3-none-manylinux1_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cuda-cupti-cu12==12.1.105 (from torch)
  Using cached nvidia_cuda_cupti_cu12-12.1.105-py3-none-manylinux1_x86_64.whl.metadata (1.6 kB)
Collecting nvidia-cudnn-cu12==8.9.2.26 (from torch)
  Using cached nvidia_cudnn_cu12-8.9.2.26-py3-none-manylinux1_x86_64.whl.metadata (1.6 kB)
Collecting nvidia-cublas-cu12==12.1.3.1 (from torch)
  Using cached nvidia_cublas_cu12-12.1.3.1-py3-none-manylinux1_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cufft-cu12==11.0.2.54 (from torch)
  Using cached nvidia_cufft_cu12-11.0.2.54-py3-none-manylinux1_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-curand-cu12==10.3.2.106 (from torch)
  Using cached nvidia_curand_cu12-10.3.2.106-py3-

In [None]:
!nvcc --version

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Tue_Aug_15_22:02:13_PDT_2023
Cuda compilation tools, release 12.2, V12.2.140
Build cuda_12.2.r12.2/compiler.33191640_0


In [None]:
import torch
import numpy as np

In [None]:
print(torch.__version__)

2.3.1+cu121


#**Tensor**

###Initializing a tensor

In [None]:
# Initializing a tensor
a = torch.Tensor([1, 2,3])
print(a.size())
print(a)

torch.Size([3])
tensor([1., 2., 3.])


In [None]:
a.type()

'torch.FloatTensor'

In [None]:
t = torch.Tensor([[1, 2, 3], [4, 5, 6]])
print(t)
print(t.t())
print(t.size())

tensor([[1., 2., 3.],
        [4., 5., 6.]])
tensor([[1., 4.],
        [2., 5.],
        [3., 6.]])
torch.Size([2, 3])


### Creating a scalar tensor

In [None]:
a = torch.tensor(20)
print(a)
print(a.size())
print(a.type())

tensor(20)
torch.Size([])
torch.LongTensor


###Converting a PyTorch tensor to a Python scalar

In [None]:
print(a.item())
print(type(a.item()))

20
<class 'int'>


###Creating a float tensor

In [None]:
a = torch.tensor([2.0,3])
a.type()

'torch.FloatTensor'

###Providing the data type while creating the tensor

In [None]:
a = torch.tensor([2,3],dtype=torch.int32)
a.type()

'torch.IntTensor'

###Just like NumPy, assigning a tensor to another name does not copy it: both names point to the same memory location (use clone() for an independent copy)

In [None]:
a = torch.tensor([2.0, 3])
b = a
c=a.clone()
b[0] = 11
print(a)
print(b)
print(c)

tensor([11.,  3.])
tensor([11.,  3.])
tensor([2., 3.])
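
One way to check which tensors share memory is to compare their `data_ptr()` values, as in the sketch below. It also shows that a `view` shares storage with its source, just like a plain assignment does. The variable names are illustrative.

```python
import torch

a = torch.tensor([2.0, 3.0])
b = a          # alias: same tensor object, same memory
c = a.clone()  # independent copy with its own memory

print(b.data_ptr() == a.data_ptr())  # True
print(c.data_ptr() == a.data_ptr())  # False

# A view is a different tensor object but still shares the underlying storage
v = a.view(2, 1)
v[0, 0] = 11
print(a)  # tensor([11.,  3.])
print(c)  # tensor([2., 3.])
```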


###2D tensor

In [None]:
a = torch.tensor([[1,2],[3,4]])
print(a)

tensor([[1, 2],
        [3, 4]])


###Creating and initializing a tensor with random values

In [None]:
a = torch.rand(3,4)
print(a)

tensor([[0.1166, 0.9067, 0.9322, 0.1386],
        [0.4537, 0.1143, 0.8421, 0.8472],
        [0.3904, 0.6540, 0.0512, 0.9952]])


###Creating and initializing a float tensor with zeros

In [None]:
a = torch.zeros((4,5), dtype=torch.float32)
print(a)

tensor([[0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.]])


###Creating an Identity tensor

In [None]:
a = torch.eye(4,4)
print(a)

tensor([[1., 0., 0., 0.],
        [0., 1., 0., 0.],
        [0., 0., 1., 0.],
        [0., 0., 0., 1.]])


#**Tensor Operations**

###Addition of two tensors

In [None]:
x = torch.rand(3,3)
y = torch.rand(3,3)
z = x+y
print(z)

tensor([[0.8764, 0.7288, 0.8337],
        [1.2990, 1.3119, 0.5343],
        [0.2397, 0.4752, 1.9146]])


###Addition of two tensors (alternative)

In [None]:
z = x.add(y)
print(z)
print(x)

tensor([[0.8764, 0.7288, 0.8337],
        [1.2990, 1.3119, 0.5343],
        [0.2397, 0.4752, 1.9146]])
tensor([[0.0111, 0.0831, 0.7757],
        [0.3303, 0.7507, 0.0851],
        [0.1100, 0.0687, 0.9490]])


###In-place addition of two tensors

In [None]:
print(x)

x.add_(y)

print(x)

tensor([[0.0111, 0.0831, 0.7757],
        [0.3303, 0.7507, 0.0851],
        [0.1100, 0.0687, 0.9490]])
tensor([[0.8764, 0.7288, 0.8337],
        [1.2990, 1.3119, 0.5343],
        [0.2397, 0.4752, 1.9146]])


###Element-wise multiplication

In [None]:
print(x*y)

tensor([[0.7583, 0.4706, 0.0484],
        [1.2583, 0.7362, 0.2400],
        [0.0311, 0.1932, 1.8488]])


###Matrix multiplication

In [None]:
# y[:,0] is the first column of y (a vector of size 3), so this is a matrix-vector product
z = torch.matmul(x,y[:,0])
print(z)
z.size()

tensor([1.5724, 2.4641, 0.9160])


torch.Size([3])
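
The cell above multiplies a matrix by a vector. For comparison, here is a small sketch of how `torch.matmul` treats different operand shapes, together with the equivalent `@` operator and `mm` method. The tensors `A`, `B`, and `v` are made up for illustration.

```python
import torch

A = torch.arange(6, dtype=torch.float32).view(2, 3)  # [[0, 1, 2], [3, 4, 5]]
B = torch.ones(3, 4)
v = torch.ones(3)

# Matrix x matrix: (2, 3) @ (3, 4) -> (2, 4)
print(torch.matmul(A, B).size())

# Matrix x vector: (2, 3) @ (3,) -> (2,), here the row sums of A
print(torch.matmul(A, v))  # tensor([ 3., 12.])

# The @ operator and mm() give the same result for 2D inputs
print(torch.equal(A @ B, A.mm(B)))  # True
```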

###Reshaping tensors

In [None]:
x = torch.rand(5,6)
print("Size of x :\n",x.size())
print("x:\n",x)
y = x.view(30)
print("Size of y :\n",y.size())
print("y:\n",y)
y = x.view(3,10)
print("Size of y :\n",y.size())
print("y:\n",y)

z = x.view(15,-1)
print("Size of z :\n",z.size())
print("z:\n",z)

z = x.view(-1,15)
print("Size of z :\n",z.size())
print("z:\n",z)

Size of x :
 torch.Size([5, 6])
x:
 tensor([[0.9707, 0.0126, 0.8908, 0.5658, 0.0661, 0.7999],
        [0.4487, 0.3790, 0.6069, 0.3659, 0.6588, 0.8794],
        [0.9515, 0.5225, 0.7153, 0.5951, 0.1788, 0.1032],
        [0.2507, 0.5446, 0.0125, 0.2736, 0.2900, 0.1199],
        [0.1821, 0.9479, 0.3811, 0.0064, 0.1512, 0.4417]])
Size of y :
 torch.Size([30])
y:
 tensor([0.9707, 0.0126, 0.8908, 0.5658, 0.0661, 0.7999, 0.4487, 0.3790, 0.6069,
        0.3659, 0.6588, 0.8794, 0.9515, 0.5225, 0.7153, 0.5951, 0.1788, 0.1032,
        0.2507, 0.5446, 0.0125, 0.2736, 0.2900, 0.1199, 0.1821, 0.9479, 0.3811,
        0.0064, 0.1512, 0.4417])
Size of y :
 torch.Size([3, 10])
y:
 tensor([[0.9707, 0.0126, 0.8908, 0.5658, 0.0661, 0.7999, 0.4487, 0.3790, 0.6069,
         0.3659],
        [0.6588, 0.8794, 0.9515, 0.5225, 0.7153, 0.5951, 0.1788, 0.1032, 0.2507,
         0.5446],
        [0.0125, 0.2736, 0.2900, 0.1199, 0.1821, 0.9479, 0.3811, 0.0064, 0.1512,
         0.4417]])
Size of z :
 torch.Size([15, 2])
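
A short sketch of two details of `view`: passing `-1` lets PyTorch infer one dimension from the total number of elements, and `view` only works when the requested shape is compatible with the tensor's memory layout. For a transposed (non-contiguous) tensor, `reshape` is used here as an alternative because it copies the data when a view is not possible. The tensor values are illustrative.

```python
import torch

x = torch.arange(30).view(5, 6)

# -1 is inferred: 30 elements / 15 rows = 2 columns
z = x.view(15, -1)
print(z.size())  # torch.Size([15, 2])

# Transposing changes strides without moving data, so the result is non-contiguous
xt = x.t()
print(xt.is_contiguous())  # False

try:
    xt.view(30)  # view cannot flatten this layout
except RuntimeError as err:
    print("view failed:", type(err).__name__)

# reshape falls back to copying when a view is not possible
flat = xt.reshape(30)
print(flat.size())  # torch.Size([30])
```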

###Generating Numbers in Sequence

In [None]:
x = torch.arange(9)
print(x)
x = x.view(3,3)
print(x)

tensor([0, 1, 2, 3, 4, 5, 6, 7, 8])
tensor([[0, 1, 2],
        [3, 4, 5],
        [6, 7, 8]])


###Concatenating two tensors at a given dimension

In [None]:
print(x)
print(torch.cat((x,x),dim=0))
print(torch.cat((x,x),dim=1))
print(x+x)

tensor([[0, 1, 2],
        [3, 4, 5],
        [6, 7, 8]])
tensor([[0, 1, 2],
        [3, 4, 5],
        [6, 7, 8],
        [0, 1, 2],
        [3, 4, 5],
        [6, 7, 8]])
tensor([[0, 1, 2, 0, 1, 2],
        [3, 4, 5, 3, 4, 5],
        [6, 7, 8, 6, 7, 8]])
tensor([[ 0,  2,  4],
        [ 6,  8, 10],
        [12, 14, 16]])


###Reducing the redundant dimension by squeeze operation

In [None]:
x1= torch.rand(1,5)
print("size of x1 before squeezing:\n",x1.size())
print(x1)
x1 = x1.squeeze()
print("size of x1 after squeezing:\n", x1.size())
print(x1)

size of x1 before squeezing:
 torch.Size([1, 5])
tensor([[0.5327, 0.3877, 0.8448, 0.6483, 0.3719]])
size of x1 after squeezing:
 torch.Size([5])
tensor([0.5327, 0.3877, 0.8448, 0.6483, 0.3719])


In [None]:
x1 = torch.rand(1,1,5)
print("size of x1 before squeezing:\n", x1.size())
print(x1)
x1 = x1.squeeze()
print("size of x1 after squeezing:\n", x1.size())
print(x1)

size of x1 before squeezing:
 torch.Size([1, 1, 5])
tensor([[[0.0103, 0.5598, 0.0166, 0.7481, 0.0359]]])
size of x1 after squeezing:
 torch.Size([5])
tensor([0.0103, 0.5598, 0.0166, 0.7481, 0.0359])


In [None]:
x2 = torch.rand(3,1,4,1,5)
print("size of x2 before squeezing:\n", x2.size())
#print(x2)
x2 = x2.squeeze()
print("size of x2 after squeezing:\n", x2.size())
x2 = x2.squeeze()  # squeezing again is a no-op: no size-1 dimensions remain
print("size of x2 after squeezing:\n", x2.size())
#print(x2)

size of x2 before squeezing:
 torch.Size([3, 1, 4, 1, 5])
size of x2 after squeezing:
 torch.Size([3, 4, 5])
size of x2 after squeezing:
 torch.Size([3, 4, 5])
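
Calling `squeeze()` with no arguments removes every size-1 dimension at once. As a small sketch, passing `dim` removes only that one dimension, and leaves the tensor unchanged if that dimension is not of size 1. The name `x3` is illustrative.

```python
import torch

x3 = torch.rand(3, 1, 4, 1, 5)

# Remove only dimension 1; the size-1 dimension at index 3 stays
print(x3.squeeze(dim=1).size())  # torch.Size([3, 4, 1, 5])

# Dimension 0 has size 3, so nothing is removed
print(x3.squeeze(dim=0).size())  # torch.Size([3, 1, 4, 1, 5])

# No argument: all size-1 dimensions are removed
print(x3.squeeze().size())       # torch.Size([3, 4, 5])
```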


###Extending the dimension by unsqueeze operation

In [None]:
x = torch.rand(3,4,5,6)
print(x.size())
print(x.unsqueeze(dim=2).size())
print(x.unsqueeze(dim=3).size())
print(x.unsqueeze(dim=4).size())

torch.Size([3, 4, 5, 6])
torch.Size([3, 4, 1, 5, 6])
torch.Size([3, 4, 5, 1, 6])
torch.Size([3, 4, 5, 6, 1])
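
A few more ways to add a dimension, sketched on a smaller tensor: negative `dim` values count from the end, and indexing with `None` has the same effect as `unsqueeze`. The name `m` is illustrative.

```python
import torch

m = torch.rand(3, 4)

print(m.unsqueeze(0).size())   # torch.Size([1, 3, 4])
print(m.unsqueeze(-1).size())  # torch.Size([3, 4, 1])

# Indexing with None inserts a size-1 dimension at that position
print(m[:, None, :].size())    # torch.Size([3, 1, 4])

# unsqueeze followed by squeeze on the same dim restores the original shape
print(m.unsqueeze(1).squeeze(1).size())  # torch.Size([3, 4])
```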


###NumPy Bridge
####Converting a NumPy array to a torch tensor

In [None]:
x = np.array([[1.0,2],[3,4]])
print(type(x))
y = torch.from_numpy(x)
print(y)
y.type()

<class 'numpy.ndarray'>
tensor([[1., 2.],
        [3., 4.]], dtype=torch.float64)


'torch.DoubleTensor'

####Converting a torch tensor to a NumPy array

In [None]:
z = y.numpy()
z.dtype

dtype('float64')
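
One thing worth knowing about this bridge: for CPU tensors, `torch.from_numpy` and `.numpy()` share memory with the original object rather than copying it, so a change on one side is visible on the other. The sketch below illustrates this, with `torch.tensor(...)` shown as a way to get an independent copy. The variable names are illustrative.

```python
import numpy as np
import torch

arr = np.array([[1.0, 2.0], [3.0, 4.0]])
t = torch.from_numpy(arr)  # shares memory with arr

arr[0, 0] = 100.0
print(t[0, 0].item())  # 100.0

back = t.numpy()       # also shares the same memory
t[1, 1] = -1.0
print(arr[1, 1], back[1, 1])  # -1.0 -1.0

copy = torch.tensor(arr)  # torch.tensor always copies the data
arr[0, 1] = 0.0
print(copy[0, 1].item())  # 2.0
```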

**CUDA Tensors**

####Checking the availability of CUDA/GPU

In [None]:
torch.cuda.is_available()

True

In [None]:
# Linux bash command to print the status of nvidia gpu (memory and processes)
!nvidia-smi

Sun Aug 18 10:32:32 2024       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.05             Driver Version: 535.104.05   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|   0  Tesla T4                       Off | 00000000:00:04.0 Off |                    0 |
| N/A   48C    P8              11W /  70W |      3MiB / 15360MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                    

####Defining the device object (cpu/gpu)

In [None]:
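# Use the GPU when one is available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")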
x = torch.rand(3,3)
print(x)
x.type()

tensor([[0.7483, 0.3773, 0.2417],
        [0.2552, 0.0043, 0.6945],
        [0.4298, 0.0839, 0.3604]])


'torch.FloatTensor'

###Transferring the tensor to the respective device (here cpu->gpu)

In [None]:
x = x.to(device)
x.type()

'torch.cuda.FloatTensor'


####Transferring the tensor from gpu->cpu


In [None]:
# Transferring the tensor from gpu->cpu
device_cpu = torch.device("cpu")
x = x.to(device_cpu)
x.type()

'torch.FloatTensor'
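
Instead of creating a tensor on the CPU and moving it afterwards, a tensor can also be allocated directly on a device with the `device=` argument. The sketch below is written to run on machines with or without a GPU; the variable names are illustrative.

```python
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

a = torch.ones(2, 3, device=device)  # allocated directly on the chosen device
b = torch.rand(2, 3).to(device)      # created on the CPU, then moved

# Operands of an operation must live on the same device
c = a + b
print(c.device)

# .cpu() is shorthand for .to("cpu")
on_cpu = c.cpu()
print(on_cpu.device)  # cpu
```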

<h3> Building Block #2 : Computation Graph</h3>
    
Computation graphs lie at the heart of the way modern deep learning frameworks work, and PyTorch is no exception. A computation graph represents a calculation as a graph: nodes hold values (inputs, weights, and intermediate results), and edges record which operation produced each value from which inputs.

Now, why should we create such a graph when we could simply execute the operations required to compute the output one after another?

Imagine what would happen if you didn’t merely have to calculate the output but also had to train the network. You would have to compute the gradients for all the weights (labelled by purple nodes), which means working your way through the chain rule by hand before you could update the weights.

<b>The computation graph is simply a data structure that allows you to efficiently apply the chain rule to compute gradients for all of your parameters.</b>
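
As a minimal sketch of this idea, the snippet below builds a tiny graph for `loss = (w*x + b)**2` and lets PyTorch apply the chain rule by calling `backward()`. The values are chosen only so the gradients are easy to check by hand.

```python
import torch

x = torch.tensor(3.0)                      # input, no gradient needed
w = torch.tensor(2.0, requires_grad=True)  # parameters tracked by autograd
b = torch.tensor(1.0, requires_grad=True)

y = w * x + b   # 2 * 3 + 1 = 7
loss = y ** 2   # 49

# Each result remembers the operation that produced it
print(loss.grad_fn)

# Walk the graph backwards, applying the chain rule at each node
loss.backward()

print(w.grad)  # dloss/dw = 2 * y * x = 42
print(b.grad)  # dloss/db = 2 * y     = 14
```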

<h3> Building Block #3 : Variables</h3>

The Variable, just like a Tensor, is a class used to hold data. It differs, however, in the way it is meant to be used. <b>Variables are specifically tailored to hold values which change during training of a neural network, i.e. the learnable parameters of our network.</b> Tensors, on the other hand, are used to store values that are not to be learned. For example, a Tensor may be used to store the loss values generated by each example. Note that since PyTorch 0.4 the Variable class has been merged into Tensor: calling Variable(tensor) still works but simply returns a Tensor, and a tensor created with requires_grad=True plays the role described here.

A Variable wraps a tensor. You can access this tensor through the <b>.data</b> attribute of the Variable.

The Variable also stores the gradient of a scalar quantity (say, the loss) with respect to the parameter it holds. This gradient can be accessed through the <b>.grad</b> attribute. It is the gradient accumulated up to this particular node; by the chain rule, the gradient at each subsequent node in the backward pass is obtained by multiplying the local gradient along the connecting edge with the gradient computed at the node just before it.

The third attribute a Variable holds is <b>grad_fn</b>, the Function object that created the variable (it is None for tensors created directly by the user).

When training neural networks, there are two steps: the forward pass and the backward pass. If you were to implement them with plain Python functions, you would have to define two functions: one to compute the output during the forward pass, and another to compute the gradient to be propagated.

<b>PyTorch folds these two separate functions (for the forward and the backward pass) into two member functions of a single class called torch.autograd.Function.</b>

PyTorch combines Variables and functions to create a computation graph.
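
To show what the two member functions look like, here is a minimal sketch of a custom `torch.autograd.Function` that implements ReLU. The class name `MyReLU` is a made-up example; in everyday code the built-in operations already come with their own backward functions.

```python
import torch

class MyReLU(torch.autograd.Function):
    @staticmethod
    def forward(ctx, inp):
        # Save the input so backward() can tell where the output was clamped
        ctx.save_for_backward(inp)
        return inp.clamp(min=0)

    @staticmethod
    def backward(ctx, grad_output):
        (inp,) = ctx.saved_tensors
        # The gradient passes through where inp > 0 and is zero elsewhere
        return grad_output * (inp > 0).to(grad_output.dtype)

x = torch.tensor([-1.0, 2.0], requires_grad=True)
y = MyReLU.apply(x)  # custom Functions are invoked through .apply
y.sum().backward()

print(y)       # values: [0., 2.]
print(x.grad)  # tensor([0., 1.])
```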

In [None]:
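import torch
from torch.autograd import Variable  # legacy wrapper; since PyTorch 0.4 it simply returns a Tensor

N = 64       # batch size
D_in = 1000  # input dimension
H = 100      # hidden dimension
D_out = 1    # output dimension

# Inputs and targets are fixed data, so they do not need gradients
x = Variable(torch.randn(N, D_in), requires_grad=False)
y = Variable(torch.randn(N, D_out), requires_grad=False)

# Learnable parameters: autograd tracks every operation on these
w1 = Variable(torch.randn(D_in, H), requires_grad=True)
b1 = Variable(torch.randn(H), requires_grad=True)
w2 = Variable(torch.randn(H, D_out), requires_grad=True)
b2 = Variable(torch.randn(D_out), requires_grad=True)

learning_rate = 1e-6
for t in range(500):
    # Forward pass: linear layer -> ReLU (clamp at 0) -> linear layer
    y_pred = (x.mm(w1) + b1).clamp(min=0).mm(w2) + b2
    # Sum-of-squares loss
    loss = (y_pred - y).pow(2).sum()
    # Backward pass: fills in .grad for every parameter
    loss.backward()
    # Gradient descent step on .data, so the update itself is not tracked
    w1.data -= learning_rate * w1.grad
    w2.data -= learning_rate * w2.grad
    b1.data -= learning_rate * b1.grad
    b2.data -= learning_rate * b2.grad
    # Reset the gradients, because backward() accumulates into .grad
    w1.grad.data.zero_()
    w2.grad.data.zero_()
    b1.grad.data.zero_()
    b2.grad.data.zero_()
    if t % 10 == 0:
        print(loss)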

tensor(2350935.5000, grad_fn=<SumBackward0>)
tensor(7613089., grad_fn=<SumBackward0>)
tensor(1627877., grad_fn=<SumBackward0>)
tensor(8939.0469, grad_fn=<SumBackward0>)
tensor(818.5197, grad_fn=<SumBackward0>)
tensor(229.6561, grad_fn=<SumBackward0>)
tensor(68.0868, grad_fn=<SumBackward0>)
tensor(20.6655, grad_fn=<SumBackward0>)
tensor(6.3920, grad_fn=<SumBackward0>)
tensor(2.0080, grad_fn=<SumBackward0>)
tensor(0.6394, grad_fn=<SumBackward0>)
tensor(0.2060, grad_fn=<SumBackward0>)
tensor(0.0671, grad_fn=<SumBackward0>)
tensor(0.0220, grad_fn=<SumBackward0>)
tensor(0.0073, grad_fn=<SumBackward0>)
tensor(0.0025, grad_fn=<SumBackward0>)
tensor(0.0009, grad_fn=<SumBackward0>)
tensor(0.0004, grad_fn=<SumBackward0>)
tensor(0.0002, grad_fn=<SumBackward0>)
tensor(9.1850e-05, grad_fn=<SumBackward0>)
tensor(5.4582e-05, grad_fn=<SumBackward0>)
tensor(3.4742e-05, grad_fn=<SumBackward0>)
tensor(2.3655e-05, grad_fn=<SumBackward0>)
tensor(1.6587e-05, grad_fn=<SumBackward0>)
tensor(1.2095e-05, grad_f
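
For comparison, here is a sketch of the same kind of training loop written without `Variable` and `.data`, using plain tensors with `requires_grad=True` and a `torch.no_grad()` block for the update. The smaller sizes, the seed, the learning rate, and the step count are assumptions chosen to keep the example quick; they are not the settings used above.

```python
import torch

torch.manual_seed(0)
N, D_in, H, D_out = 16, 20, 10, 1

x = torch.randn(N, D_in)
y = torch.randn(N, D_out)

# Plain tensors with requires_grad=True take the place of Variables
w1 = torch.randn(D_in, H, requires_grad=True)
b1 = torch.randn(H, requires_grad=True)
w2 = torch.randn(H, D_out, requires_grad=True)
b2 = torch.randn(D_out, requires_grad=True)

learning_rate = 1e-5
losses = []
for t in range(300):
    y_pred = (x.mm(w1) + b1).clamp(min=0).mm(w2) + b2
    loss = (y_pred - y).pow(2).sum()
    loss.backward()

    # Updates inside no_grad() are not recorded in the graph
    with torch.no_grad():
        for p in (w1, b1, w2, b2):
            p -= learning_rate * p.grad
            p.grad.zero_()

    losses.append(loss.item())

print(losses[0], losses[-1])
```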

In [None]:
print(w1.size())
print(w2.size())

torch.Size([1000, 100])
torch.Size([100, 1])


In [None]:
w1.ndim

2