## Difference between NumPy arrays and PyTorch tensors

- GPU Usage
- AutoGrad Functionality - Automatic Gradient Computation

In [1]:
import torch

In [2]:
torch.__version__

'2.4.1+cu121'

Tensors are generalized N-dimensional containers of data

![Tensors](./image.png)

In [3]:
l = [1, 2, 3]
t1 = torch.tensor(l)
t1

tensor([1, 2, 3])

In [4]:
type(t1)

torch.Tensor

In [5]:
import numpy as np

In [6]:
np_array = np.array([2, 3, 4])
t2 = torch.tensor(np_array, dtype=torch.int64)
t2

tensor([2, 3, 4])

In the cell above, specifying `dtype` is optional: when it is omitted, `torch.tensor` infers the dtype from the NumPy array. Pass `dtype` only when you want a particular type
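As a minimal sketch (the names `copied` and `shared` are illustrative and not part of the original notebook), `torch.tensor` can be called on a NumPy array without `dtype` and copies the data, whereas `torch.from_numpy` shares memory with the array:

```python
import numpy as np
import torch

# Fix the NumPy dtype explicitly so the result is the same on every platform
np_array = np.array([2, 3, 4], dtype=np.int64)

copied = torch.tensor(np_array)      # copies the data; dtype is inferred as int64
shared = torch.from_numpy(np_array)  # shares memory with np_array

np_array[0] = 100                    # modify the NumPy array in place

print(copied.dtype)  # torch.int64
print(copied)        # still holds the original values
print(shared)        # reflects the change made to np_array
```

Sharing memory avoids a copy, but it also means that in-place changes on either side are visible on the other.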

In [7]:
ones = torch.ones((4, 4))
ones

tensor([[1., 1., 1., 1.],
        [1., 1., 1., 1.],
        [1., 1., 1., 1.],
        [1., 1., 1., 1.]])

In [8]:
zeros = torch.zeros((3,3,3))
zeros

tensor([[[0., 0., 0.],
         [0., 0., 0.],
         [0., 0., 0.]],

        [[0., 0., 0.],
         [0., 0., 0.],
         [0., 0., 0.]],

        [[0., 0., 0.],
         [0., 0., 0.],
         [0., 0., 0.]]])

In [9]:
rand = torch.rand((3,4))
rand

tensor([[0.5065, 0.5688, 0.4653, 0.7962],
        [0.2921, 0.6281, 0.9642, 0.1720],
        [0.2326, 0.0644, 0.2773, 0.3396]])

Every time we run the cell above, the outputs are different. To make the results reproducible, call `torch.manual_seed(10)` (or any other fixed seed) first

In [10]:
torch.manual_seed(10)

rand = torch.rand((3,4))
rand

tensor([[0.4581, 0.4829, 0.3125, 0.6150],
        [0.2139, 0.4118, 0.6938, 0.9693],
        [0.6178, 0.3304, 0.5479, 0.4440]])
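A small sketch of what reproducibility means here: resetting the seed to the same value before each call should yield identical random tensors. The names `first` and `second` are illustrative.

```python
import torch

torch.manual_seed(10)
first = torch.rand((3, 4))

torch.manual_seed(10)          # reset the generator to the same state
second = torch.rand((3, 4))

# Both draws start from the same generator state, so they match exactly
print(torch.equal(first, second))
```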

Random integers between 0 and 4 (the upper bound, 5, is exclusive)

In [11]:
randint = torch.randint(0, 5, size=(2,2))
randint

tensor([[1, 0],
        [1, 3]])

In [12]:
arange = torch.arange(0, 10)
arange

tensor([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [13]:
arange3 = torch.arange(0, 10, 3)
arange3

tensor([0, 3, 6, 9])

In [14]:
reshape = torch.arange(0, 9).reshape(3, 3)
reshape

tensor([[0, 1, 2],
        [3, 4, 5],
        [6, 7, 8]])

To create a tensor of zeros or ones with the same shape (and dtype) as another tensor, use `torch.zeros_like` or `torch.ones_like`

In [15]:
zeros1 = torch.zeros_like(reshape)
zeros1

tensor([[0, 0, 0],
        [0, 0, 0],
        [0, 0, 0]])

In [16]:
ones1 = torch.ones_like(reshape)
ones1

tensor([[1, 1, 1],
        [1, 1, 1],
        [1, 1, 1]])

In [17]:
tsr = torch.arange(0, 9).reshape(3, 3)
tsr

tensor([[0, 1, 2],
        [3, 4, 5],
        [6, 7, 8]])

## Zero dimensional Tensor

In [18]:
zerod = torch.tensor(5)
zerod

tensor(5)

In [19]:
zerod = torch.tensor(3)
oned = torch.tensor([3,4,5])
twod = torch.tensor([[3,4,5], [4,6,7]])
threed = torch.tensor([[[4,5,6], [23,3,5]], [[4,5,1], [6,7,8]]])

In [20]:
print('zerod shape : ', zerod.shape)
print('oned shape : ', oned.shape)
print('twod shape : ', twod.shape)
print('threed shape : ', threed.shape)

zerod shape :  torch.Size([])
oned shape :  torch.Size([3])
twod shape :  torch.Size([2, 3])
threed shape :  torch.Size([2, 2, 3])


In [21]:
print('zerod ndim : ', zerod.ndim)
print('oned ndim : ', oned.ndim)
print('twod ndim : ', twod.ndim)
print('threed ndim : ', threed.ndim)

zerod ndim :  0
oned ndim :  1
twod ndim :  2
threed ndim :  3


## Tensor Attributes

In [22]:
tsr.shape

torch.Size([3, 3])

In [23]:
tsr.ndim

2

In [24]:
tsr.dtype

torch.int64

In [25]:
tsr.device

device(type='cpu')

By default, tensors are created on the CPU. To use a GPU, we need to pass the device explicitly (for example `device='cuda'`)

### To check if a `GPU` is available

In [26]:
torch.cuda.is_available()

True

### To check the number of cuda devices

In [27]:
torch.cuda.device_count()

2

### To check the name of the device

In [28]:
torch.cuda.get_device_name()

'NVIDIA A100 80GB PCIe'

We can also pass the index of the device as the parameter to this function

In [29]:
torch.cuda.get_device_name(0)

'NVIDIA A100 80GB PCIe'

We can explicitly request the GPU when creating a tensor

In [30]:
cuda_tensor = torch.tensor([2, 4, 5], device='cuda')
cuda_tensor.device

device(type='cuda', index=0)

We can move tensors back and forth between the CPU and the GPU

In [31]:
cuda_tensor.to('cpu')
cuda_tensor.device

device(type='cuda', index=0)

This doesn't work because `.to()` returns a new tensor instead of modifying the tensor in place. You need to assign the result to a variable

In [32]:
cuda_tensor = cuda_tensor.to('cpu')
cuda_tensor.device

device(type='cpu')
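The cells above assume that a GPU is present. A common device-agnostic pattern, sketched here under the assumption that falling back to the CPU is acceptable, picks the device once and reuses it (the names `device`, `t` and `moved` are illustrative):

```python
import torch

# Use the GPU when one is available, otherwise fall back to the CPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

t = torch.tensor([2, 4, 5], device=device)

# .to() returns a new tensor, so the result must be assigned
moved = t.to('cpu')

print(t.device)
print(moved.device)
```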

## Automatic Typecasting

In [33]:
t1 = torch.tensor([4, 6.7, 8])
t1.dtype

torch.float32

In [34]:
t1

tensor([4.0000, 6.7000, 8.0000])

Everything got upcast to `float32`

We can also explicitly set the type of the tensor

In [35]:
t2 = torch.tensor([5, 1.2, 5], dtype=torch.int64)
t2

tensor([5, 1, 5])

Everything got downcast: the fractional part of `1.2` is truncated
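An existing tensor can also be converted after creation. The sketch below uses `.to(dtype)` and the shorthand `.float()`; the variable names are illustrative. Converting from float to integer truncates toward zero rather than rounding.

```python
import torch

t = torch.tensor([5, 1.2, -1.7])   # inferred as float32

as_int = t.to(torch.int64)         # truncates toward zero
back_to_float = as_int.float()     # shorthand for .to(torch.float32)

print(as_int)          # tensor([ 5,  1, -1])
print(back_to_float)   # tensor([ 5.,  1., -1.])
```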

## Accessing elements in a tensor

In [36]:
t = torch.tensor([[545,45,43], [54, 54, 43], [3, 56, 4]])
t[0]

tensor([545,  45,  43])

In [37]:
t[0][2]

tensor(43)

If we want to extract only the scalar value

In [38]:
t[0][2].item()

43

Remember that only a tensor with exactly one element (such as a zero-dimensional tensor) can be converted to a Python scalar

In [39]:
t[0].item()

RuntimeError: a Tensor with 3 elements cannot be converted to Scalar
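As a brief sketch, `.item()` works on any tensor holding exactly one element, regardless of its number of dimensions, while `.tolist()` converts tensors with several elements to plain Python lists. The names `single`, `value`, `row` and `element` are illustrative.

```python
import torch

t = torch.tensor([[545, 45, 43], [54, 54, 43], [3, 56, 4]])

single = torch.tensor([[7]])   # two dimensions, but only one element
value = single.item()          # 7

row = t[0].tolist()            # [545, 45, 43]
element = t[0, 2].item()       # same element as t[0][2], in one indexing step

print(value, row, element)
```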

#### Slicing

In [40]:
t[:1, -1]

tensor([43])

In [41]:
t[:2, 1:]

tensor([[45, 43],
        [54, 43]])

## Broadcasting

In [42]:
tensor = torch.tensor([4, 5, 5, 1, 33, 0])
tensor > 3

tensor([ True,  True,  True, False,  True, False])

In [43]:
tensor = torch.tensor([4, 5, 5, 1, 33, 0])
tensor[tensor > 3]

tensor([ 4,  5,  5, 33])

In [44]:
tensor + 10

tensor([14, 15, 15, 11, 43, 10])
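The examples above broadcast a scalar against a one-dimensional tensor. The same rule applies between tensors of different shapes, as in this sketch with illustrative names `col` and `row`: dimensions of size 1 are stretched to match the other operand.

```python
import torch

col = torch.tensor([[1], [2], [3]])   # shape (3, 1)
row = torch.tensor([10, 20, 30])      # shape (3,)

# col is stretched across columns and row across rows, giving shape (3, 3)
result = col + row

print(result.shape)   # torch.Size([3, 3])
print(result)
```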

## Standard Functions

In [45]:
t = t[:2]
t

tensor([[545,  45,  43],
        [ 54,  54,  43]])

In [46]:
t.sum()

tensor(784)

In [47]:
torch.sum(t)

tensor(784)

`dim=0` sums down the rows, leaving one value per column (the result has the shape of a single row)

In [48]:
t.sum(dim=0)

tensor([599,  99,  86])

`dim=1` sums across the columns, leaving one value per row (the result has one entry for each row)

In [49]:
t.sum(dim=1)

tensor([633, 151])
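To make the effect of `dim` concrete, this sketch prints the shape of each reduction and also shows `keepdim=True`, which keeps the reduced dimension with size 1. The variable names are illustrative.

```python
import torch

t = torch.tensor([[545, 45, 43], [54, 54, 43]])   # shape (2, 3)

col_sums = t.sum(dim=0)                  # shape (3,): one sum per column
row_sums = t.sum(dim=1)                  # shape (2,): one sum per row
row_sums_kept = t.sum(dim=1, keepdim=True)   # shape (2, 1)

print(col_sums.shape, row_sums.shape, row_sums_kept.shape)
```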

In [50]:
t.max()

tensor(545)

In [51]:
t.max(dim=0)

torch.return_types.max(
values=tensor([545,  54,  43]),
indices=tensor([0, 1, 0]))

In [52]:
t.max(dim=0).values

tensor([545,  54,  43])

In [53]:
t.max(dim=0).indices

tensor([0, 1, 0])

## Matrix Multiplication

`torch.mul()`, `*` -> Element-wise multiplication

In [54]:
t1 = torch.tensor([3, 4, 5])
t2 = torch.tensor([4, 5, 6])

torch.mul(t1, t2)

tensor([12, 20, 30])

which is the same as

In [55]:
t1 * t2

tensor([12, 20, 30])

`torch.matmul()`, `@` -> Matrix multiplication

In [56]:
t3 = torch.tensor([[3, 4, 5], [4, 6, 1]])
t4 = torch.tensor([[4, 5], [5, 1], [9, 8]])

torch.matmul(t3, t4)

tensor([[77, 59],
        [55, 34]])

which is the same as

In [57]:
t3 @ t4

tensor([[77, 59],
        [55, 34]])
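The difference between the two operations shows up in the shape requirements. In this sketch (the flag `elementwise_failed` is illustrative), a `(2, 3)` and a `(3, 2)` tensor can be matrix-multiplied because the inner dimensions match, but they cannot be multiplied element-wise because the shapes do not broadcast.

```python
import torch

t3 = torch.tensor([[3, 4, 5], [4, 6, 1]])      # shape (2, 3)
t4 = torch.tensor([[4, 5], [5, 1], [9, 8]])    # shape (3, 2)

product = t3 @ t4          # (2, 3) @ (3, 2) -> (2, 2)
print(product.shape)

elementwise_failed = False
try:
    t3 * t4                # shapes (2, 3) and (3, 2) cannot be broadcast
except RuntimeError as err:
    elementwise_failed = True
    print('element-wise multiplication failed:', err)
```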

### Transpose

In [58]:
t3.T

tensor([[3, 4],
        [4, 6],
        [5, 1]])

## Storing and loading tensors/models

In [59]:
torch.save(t3, 'weights.pt')

Some people also use `.pth` as the extension

Calling `torch.load` with `weights_only=False` (the current default value in this version) implicitly uses the default `pickle` module. It is possible to construct malicious pickle data that executes arbitrary code during unpickling, so pass `weights_only=True` when loading files you do not fully trust

In [60]:
t3 = torch.load('weights.pt', weights_only=True)
t3

tensor([[3, 4, 5],
        [4, 6, 1]])
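The same pair of calls also handles a dictionary of tensors, which is a common way to store several weights in one file. This is a minimal sketch: the keys `weight` and `bias`, the file name `checkpoint.pt` and the use of a temporary directory are all illustrative choices.

```python
import os
import tempfile

import torch

checkpoint = {
    'weight': torch.tensor([[3., 4.], [5., 6.]]),
    'bias': torch.tensor([0.5, -0.5]),
}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'checkpoint.pt')
    torch.save(checkpoint, path)
    # weights_only=True restricts unpickling to tensors and simple containers
    restored = torch.load(path, weights_only=True)

print(restored.keys())
print(torch.equal(restored['weight'], checkpoint['weight']))
```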

In [61]:
a = torch.tensor([[1.00001, 2.00002], [3.00004, 4.00001]])
b = torch.tensor([[1., 2], [3, 4]])

torch.allclose(a, b, atol=1e-4)

True

In [62]:
torch.allclose(a, b, atol=1e-5)

False

In [63]:
vec = torch.linspace(-5, 5, 10)
vec

tensor([-5.0000, -3.8889, -2.7778, -1.6667, -0.5556,  0.5556,  1.6667,  2.7778,
         3.8889,  5.0000])

In [64]:
vec2 = torch.clone(vec)
vec2

tensor([-5.0000, -3.8889, -2.7778, -1.6667, -0.5556,  0.5556,  1.6667,  2.7778,
         3.8889,  5.0000])
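`torch.clone` creates an independent copy, unlike plain assignment, which only creates a second name for the same tensor. A short sketch, with `alias` as an illustrative name:

```python
import torch

vec = torch.linspace(-5, 5, 10)

vec2 = torch.clone(vec)   # independent copy with its own memory
alias = vec               # just another name for the same tensor

vec2[0] = 100.0           # does not affect vec
alias[1] = -100.0         # does affect vec

print(vec[0].item())      # -5.0
print(vec[1].item())      # -100.0
```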

## Computing gradients

To compute gradients, the tensor's `dtype` must be a floating point (or complex) type

In [65]:
grad = torch.tensor([5, 6, 7], requires_grad=True)

RuntimeError: Only Tensors of floating point and complex dtype can require gradients

In [66]:
grad = torch.tensor([5, 6, 7], dtype=torch.float32, requires_grad=True)
grad

tensor([5., 6., 7.], requires_grad=True)

## AutoGrad

Note that
- `requires_grad` must be set to `True` to compute gradients
- The tensor must have a floating point dtype in order to use `requires_grad=True`
- `grad` is an attribute; `backward()` is a method
- Call `backward()` on the output `y`
- `dy/dx` is then available as `x.grad`

In [67]:
x = torch.tensor(12., requires_grad=True)
y = 3*x**2
y.backward()
x.grad

tensor(72.)
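One detail worth knowing, shown here as a sketch with illustrative variable names: every call to `backward()` adds to the existing `.grad` instead of overwriting it, so the gradient has to be reset (for example with `x.grad.zero_()`) before it is computed again.

```python
import torch

x = torch.tensor(12., requires_grad=True)

y = 3 * x**2
y.backward()
first = x.grad.clone()         # 6 * 12 = 72

y = 3 * x**2                   # rebuild the graph before calling backward again
y.backward()
accumulated = x.grad.clone()   # 72 + 72 = 144, gradients are summed

x.grad.zero_()                 # reset the stored gradient in place
y = 3 * x**2
y.backward()
fresh = x.grad.clone()         # 72 again

print(first, accumulated, fresh)
```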

We can also do this for functions having more than one variable

In [68]:
x = torch.tensor(12., requires_grad=True)
z = torch.tensor(3., requires_grad=True)
y = 3*x**2 - z**3
y.backward()
print('dy/dx =',x.grad)
print('dy/dz =',z.grad)

dy/dx = tensor(72.)
dy/dz = tensor(-27.)


In this case, as long as at least one variable has `requires_grad=True`, `backward()` does not raise an error. The gradient of a variable with `requires_grad=False` is simply `None`

In [69]:
x = torch.tensor(12., requires_grad=True)
z = torch.tensor(3., requires_grad=False) # requires_grad has been set to False
y = 3*x**2 - z**3
y.backward()
print('dy/dx =',x.grad)
print('dy/dz =',z.grad)

dy/dx = tensor(72.)
dy/dz = None


### The real use-case

In [70]:
x = torch.tensor(12, requires_grad=False)
w = torch.tensor(34., requires_grad=True)
b = torch.tensor(90., requires_grad=True)

y = w * x + b
y.backward()

print("dy/dw =", w.grad)
print("dy/db =", b.grad)

dy/dw = tensor(12.)
dy/db = tensor(1.)


### A New Problem

In [71]:
x = torch.ones((2, 2), requires_grad=True) # .ones() would naturally create float tensors
y = x + 2
z = y * y + 3
out = z.mean()

out.backward()

print("dout/dx = ", x.grad)
print("dout/dy = ", y.grad)
print("dout/dz = ", z.grad)

dout/dx =  tensor([[1.5000, 1.5000],
        [1.5000, 1.5000]])
dout/dy =  None
dout/dz =  None




### Leaf Nodes

- All tensors that we create explicitly through a function like `torch.tensor()` or `torch.rand()` (and that are not the result of some operation) are **leaf tensors**
- When we create a tensor with `requires_grad=True`, it gets attached to a computation graph (as stated earlier, as a leaf)
- Any tensor computed from such a leaf has `requires_grad=True` but is not a leaf tensor, because it is an intermediate node of the computation graph that gets created
- If we set `requires_grad=False`, the tensor is not attached to any graph (but remains a leaf, because we created it). Any tensor computed only from such tensors is also a leaf, because no computation graph is created

#### Gradients are stored only for leaf nodes
Unless `retain_grad()` is called explicitly on a non-leaf tensor

In [72]:
a = torch.rand((2, 2), requires_grad=True)
b = a + 2
c = torch.rand((2, 2), requires_grad=False)
d = c - 2

print(a.is_leaf)
print(b.is_leaf)
print(c.is_leaf)
print(d.is_leaf)

True
False
True
True


Let us revisit the previous example and try to find which are all the leaf tensors

In [73]:
x = torch.ones((2, 2), requires_grad=True) # .ones() would naturally create float tensors
y = x + 2
z = y * y + 3
out = z.mean()

out.backward()

print("dout/dx = ", x.grad)
print("dout/dy = ", y.grad)
print("dout/dz = ", z.grad)

print(x.is_leaf)
print(y.is_leaf)
print(z.is_leaf)
print(out.is_leaf)

dout/dx =  tensor([[1.5000, 1.5000],
        [1.5000, 1.5000]])
dout/dy =  None
dout/dz =  None
True
False
False
False




#### NOTE: Gradients are stored only for leaf nodes, and only for leaf nodes with `requires_grad=True`

There is a workaround to get the gradient of intermediate tensors too. Before calling `.backward()`, call `.retain_grad()` on each tensor whose gradient you want

In [74]:
x = torch.ones((2, 2), requires_grad=True) # .ones() would naturally create float tensors
y = x + 2
z = y * y + 3
out = z.mean()

y.retain_grad()
z.retain_grad()
out.backward()

print("dout/dx = ", x.grad)
print("dout/dy = ", y.grad)
print("dout/dz = ", z.grad)

print(x.is_leaf)
print(y.is_leaf)
print(z.is_leaf)
print(out.is_leaf)

dout/dx =  tensor([[1.5000, 1.5000],
        [1.5000, 1.5000]])
dout/dy =  tensor([[1.5000, 1.5000],
        [1.5000, 1.5000]])
dout/dz =  tensor([[0.2500, 0.2500],
        [0.2500, 0.2500]])
True
False
False
False


Now you can see that, even though they aren't leaf nodes, their gradients are available
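As an alternative sketch, `torch.autograd.grad` returns the gradients with respect to chosen tensors directly, without `retain_grad()` and without writing to any `.grad` attribute. The names `dout_dy` and `dout_dz` are illustrative.

```python
import torch

x = torch.ones((2, 2), requires_grad=True)
y = x + 2
z = y * y + 3
out = z.mean()

# Returns one gradient per requested input, in the same order
dout_dy, dout_dz = torch.autograd.grad(out, [y, z])

print(dout_dy)   # 2 * y / 4 = 1.5 for every element
print(dout_dz)   # 1 / 4 = 0.25 for every element
print(x.grad)    # None: autograd.grad does not populate .grad
```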

## Activation Functions

There are three ways to access activation functions in PyTorch

### Function

```
torch.relu()
torch.sigmoid()
torch.tanh()
...
```

### Class

```
torch.nn.ReLU
torch.nn.Sigmoid
torch.nn.Tanh
torch.nn.Softmax
torch.nn.LeakyReLU
torch.nn.ELU
...
```

### Functional (`torch.nn.functional`)

```
torch.nn.functional.relu()
torch.nn.functional.celu()
torch.nn.functional.selu()
torch.nn.functional.gelu()
torch.nn.functional.sigmoid()
torch.nn.functional.tanh()
torch.nn.functional.leaky_relu()
...
```

In [75]:
x = torch.tensor([2., 4, -4])

print(torch.relu(x))

relu_object = torch.nn.ReLU()
print(relu_object(x))

print(torch.nn.functional.relu(x))

tensor([2., 4., 0.])
tensor([2., 4., 0.])
tensor([2., 4., 0.])
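To show that these interfaces compute the same thing, the sketch below compares the function and class forms of sigmoid with the formula 1 / (1 + exp(-x)) written out by hand. The names `manual`, `from_function` and `from_class` are illustrative.

```python
import torch

x = torch.tensor([2., 4, -4])

manual = 1 / (1 + torch.exp(-x))        # sigmoid written out by hand
from_function = torch.sigmoid(x)        # function form
from_class = torch.nn.Sigmoid()(x)      # class form: build the module, then call it

print(torch.allclose(manual, from_function))
print(torch.allclose(from_function, from_class))
```

The class form is convenient when the activation is a layer inside a model, while the function form is handy for one-off calls.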


## Loss functions

### Class

```
torch.nn.MSELoss
torch.nn.CrossEntropyLoss
torch.nn.BCELoss
```

In [76]:
y_true = [6, 7, 8, 9]
y_pred = [7, 8, 9, 10]

mse = torch.nn.MSELoss()
mse(y_pred, y_true)

AttributeError: 'list' object has no attribute 'size'

- Plain Python lists cannot be passed to a loss function; convert them to tensors first
- Tensors must have a floating point dtype to compute the loss

In [77]:
y_true = torch.tensor([6, 7, 8, 9])
y_pred = torch.tensor([7, 8, 9, 10])

mse = torch.nn.MSELoss()
mse(y_pred, y_true)

RuntimeError: "mse_cpu" not implemented for 'Long'

In [78]:
y_true = torch.tensor([6., 7, 8, 9])
y_pred = torch.tensor([7., 8, 9, 10])

mse = torch.nn.MSELoss()
mse(y_pred, y_true)

tensor(1.)
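A short sketch of what `MSELoss` computes with its default settings: the mean of the squared differences. It also shows `.float()` as one way to convert integer tensors before computing the loss. The names `manual_mse` and `builtin_mse` are illustrative.

```python
import torch

# Integer tensors converted to float32 so the loss can be computed
y_true = torch.tensor([6, 7, 8, 9]).float()
y_pred = torch.tensor([7, 8, 9, 10]).float()

manual_mse = ((y_pred - y_true) ** 2).mean()     # mean of squared errors
builtin_mse = torch.nn.MSELoss()(y_pred, y_true)

print(manual_mse)    # tensor(1.)
print(builtin_mse)   # tensor(1.)
```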