<a href="https://colab.research.google.com/github/Voland24/ModernComputerVisionPytorch/blob/main/AutoGradsTensorsAndSpeedComparison.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Calculating gradients of tensors is useful when implementing neural networks, since gradients are a core component of gradient descent.
When defining a tensor, we can specify that its gradient should be calculated by passing requires_grad=True.

In [2]:
import torch
x = torch.tensor([[2., -1.], [1., 1.]], requires_grad=True)
print(x)

tensor([[ 2., -1.],
        [ 1.,  1.]], requires_grad=True)


We define a simple function whose gradient (i.e. first derivative) we want to calculate.

In [12]:
out = x.pow(2).sum()  # out = sum(x ** 2)
                      # analytically, d(out)/dx = 2 * x

print(out)

tensor(7., grad_fn=<SumBackward0>)


We now calculate the gradient of out with respect to x, i.e. how a small change in each element of x changes the value of out.


In [13]:
out.backward()

After the backward pass, the gradient of out with respect to x is stored in x.grad.

In [14]:
grads = x.grad
print(grads)

tensor([[ 4., -2.],
        [ 2.,  2.]])


This matches our expectation: each gradient entry is twice the corresponding entry of x, and we showed analytically that the derivative is 2 * x.
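The same result can be cross-checked numerically. The following is a minimal sketch, assuming a central finite-difference estimate with a step size eps chosen for illustration; the helper f and the names autograd_grad and numeric_grad are hypothetical and not part of the original notebook. float64 is used so that rounding error stays small.

```python
import torch

# Same values as above, in float64 so the finite-difference estimate is accurate
x = torch.tensor([[2., -1.], [1., 1.]], dtype=torch.float64, requires_grad=True)

def f(t):
    # The function from the notebook: sum of squared entries
    return t.pow(2).sum()

# Gradient computed by autograd
f(x).backward()
autograd_grad = x.grad.clone()

# Central finite differences: (f(x + step) - f(x - step)) / (2 * eps), one entry at a time
eps = 1e-6  # assumed step size
numeric_grad = torch.zeros_like(x)
with torch.no_grad():
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            step = torch.zeros_like(x)
            step[i, j] = eps
            numeric_grad[i, j] = (f(x + step) - f(x - step)) / (2 * eps)

print(torch.allclose(autograd_grad, numeric_grad, atol=1e-6))
print(torch.equal(autograd_grad, 2 * x.detach()))
```

Nudging one entry at a time is exactly the "small change in x" reading of the gradient, which is why the two estimates should agree up to rounding.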

# **Why use PyTorch at all?**

When updating weights during backpropagation in a neural network, the update of one weight does not (and should not) affect the updates of the other weights in the same step, so conceptually all weights are updated simultaneously. On a CPU, however, this work is done largely sequentially: we calculate all the necessary derivatives and then update the weights one after another.

We can use GPU threads, which are lightweight and numerous, to perform these independent updates in parallel and therefore faster. PyTorch is optimized to run on a GPU.
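To make the independence of the per-weight updates concrete, here is a minimal sketch of a single gradient descent step written as one tensor operation. The weight matrix w, the data inp and target, the mean-squared-error loss, and the learning rate lr are all assumptions made for illustration and are not taken from the notebook.

```python
import torch

torch.manual_seed(0)
w = torch.rand(3, 2, requires_grad=True)  # hypothetical weight matrix
inp = torch.rand(4, 3)                    # hypothetical batch of inputs
target = torch.rand(4, 2)                 # hypothetical targets
lr = 0.1                                  # assumed learning rate

# Forward pass and backward pass: w.grad receives d(loss)/d(w) for every entry of w
loss = ((inp @ w - target) ** 2).mean()
loss.backward()

w_before = w.detach().clone()
grad = w.grad.clone()

with torch.no_grad():
    w -= lr * w.grad  # a single tensor operation updates every weight
    w.grad.zero_()    # reset the gradient before any further backward pass

new_loss = ((inp @ w - target) ** 2).mean()
print(loss.item(), new_loss.item())
```

Because each entry of w is updated using only its own gradient entry, the update has no ordering dependency between weights, which is what lets a GPU spread it across many threads.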

In [3]:
x = torch.rand(1,6400)
y = torch.rand(6400,5000)

device = 'cuda' if torch.cuda.is_available() else 'cpu'  # run on the GPU if one is available

x, y = x.to(device), y.to(device)  # transfer the tensors to the selected device's memory

%timeit z = (x@y)  # multiply the two tensors on that device


The slowest run took 6.59 times longer than the fastest. This could mean that an intermediate result is being cached.
25.9 µs ± 27.1 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [4]:
x, y = x.cpu(), y.cpu()
%timeit z = (x@y)

# multiplying the same two tensors is much slower on the CPU

9.1 ms ± 83.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


In [5]:
import numpy as np
x = np.random.random((1,6400))
y = np.random.random((6400,5000))
%timeit z = np.matmul(x,y)

19.9 ms ± 544 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


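GPU multiplication of tensors took around 26 microseconds.
CPU multiplication of tensors took around 9 milliseconds.
CPU multiplication of NumPy arrays took around 20 milliseconds.

The GPU figure should be read with caution: CUDA operations are launched asynchronously, so a timer that does not wait for the GPU may capture mostly the launch overhead. The NumPy arrays are also float64 while the tensors are float32, so the two CPU figures are not a like-for-like comparison.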
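A minimal sketch of a timing helper that waits for queued GPU work before stopping the clock, using torch.cuda.synchronize() as the synchronization point. The function name time_matmul and its repeats parameter are hypothetical and not part of the original notebook; the sketch falls back to the CPU when no GPU is present, and no timings are claimed here.

```python
import time
import torch

def time_matmul(a, b, repeats=10):
    """Return the average seconds per a @ b, waiting for the GPU if one is used."""
    use_cuda = a.is_cuda
    a @ b  # warm-up run so one-time setup cost is not measured
    if use_cuda:
        torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(repeats):
        a @ b
    if use_cuda:
        torch.cuda.synchronize()  # block until all queued kernels have finished
    return (time.perf_counter() - start) / repeats

device = 'cuda' if torch.cuda.is_available() else 'cpu'
x = torch.rand(1, 6400, device=device)
y = torch.rand(6400, 5000, device=device)
print(f"{device}: {time_matmul(x, y) * 1e3:.3f} ms per matmul")
```

Synchronizing once after the loop, rather than after every call, keeps the synchronization overhead out of the per-call average while still counting all of the GPU work.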