## TENSOR BASICS

A tensor is a multidimensional array. It can also be viewed as a function that takes in multiple vectors and outputs other vectors, tensors, or scalars. In a neural net, tensors hold the inputs, the outputs, and the learned parameters (weights), so they are what keeps track of what has been learned.

In [None]:

import torch # import the module

tensor_empty = torch.empty(3,3) # arguments give the size of each dimension; here a 2D tensor (3 rows, 3 columns) with uninitialized values

tensor_random = torch.rand(1,2,3) # tensor of shape (1,2,3) filled with random values in [0, 1)

tensor_zero = torch.zeros(2,2) # all items are 0; a size is required ((2,2) is just an example). torch.ones works the same way

x = torch.ones(2,2, dtype=torch.float16) # set the data type to float16

x = torch.tensor([2.5,0.1]) # create a tensor from a Python list
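
To check what each constructor produced, you can inspect the shape and dtype attributes. The following is a minimal sketch; the variable names a, b, c and d are only illustrative, and it assumes the default dtype has not been changed from float32.

```python
import torch

a = torch.empty(3, 3)                       # uninitialized values, shape (3, 3)
b = torch.rand(1, 2, 3)                     # uniform random values in [0, 1)
c = torch.zeros(2, 3)                       # all zeros
d = torch.ones(2, 2, dtype=torch.float16)   # explicit data type

print(a.shape, a.dtype)  # size of each dimension and the element type
print(b.ndim)            # number of dimensions
print(c)
print(d.dtype)
```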



You can also perform arithmetic operations on tensors.

In [None]:
x = torch.rand(2,2)
y = torch.rand(2,2)
z = x + y #or z = torch.add(x,y)
print(z)

z = torch.sub(x,y) # subtraction
z = torch.mul(x,y) # element-wise multiplication

y.mul_(x) # in place: multiplies y by x element-wise and stores the result in y. Functions with a trailing underscore modify the tensor in place
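
The difference between torch.mul and the in-place mul_ is easier to see with fixed values. This is a small sketch with hand-picked numbers (not from the original cell); y_before is a hypothetical helper variable used only to keep a copy.

```python
import torch

x = torch.tensor([1.0, 2.0])
y = torch.tensor([3.0, 4.0])

z = torch.mul(x, y)    # returns a new tensor; x and y are unchanged
y_before = y.clone()   # keep a copy of y for comparison

y.mul_(x)              # in place: y itself now holds the product

print(z)
print(y_before)
print(y)
```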

You can perform slicing operations

In [None]:
x = torch.rand(5,3)
print(x)
print(x[:,0]) # all rows, first column only
print(x[1,:]) # row 1, all columns
print(x[1,1]) # element at row 1, column 1; use x[1,1].item() to get the value as a Python number
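
With random values it is hard to see which items a slice picked. As an illustration, the same slices on a tensor built with torch.arange (an assumption made here for readability, not part of the original cell) give predictable results.

```python
import torch

# Rows are [0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10, 11], [12, 13, 14]
x = torch.arange(15).reshape(5, 3)

first_col = x[:, 0]       # first column of every row
row_1 = x[1, :]           # second row (indices start at 0)
elem = x[1, 1].item()     # single element as a plain Python number

print(first_col)
print(row_1)
print(elem)
```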

You can reshape tensors

In [None]:
x = torch.rand(4,4) # 16 values
print(x)
y = x.view(16) # the same 16 values in one dimension; you could also do y = x.view(-1,8)
# if you pass -1, PyTorch infers that dimension from the others, so x.view(-1,8) is the same as x.view(2,8)
print(y)
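
A quick way to confirm what view did is to print the resulting shapes. The sketch below also shows that a view shares its data with the original tensor; the names flat and two_rows are illustrative only.

```python
import torch

x = torch.rand(4, 4)

flat = x.view(16)          # one dimension with 16 items
two_rows = x.view(-1, 8)   # -1 is inferred as 2, because 16 / 8 = 2

print(flat.shape)
print(two_rows.shape)

# A view does not copy the data: writing through the view changes x as well
flat[0] = 42.0
print(x[0, 0])
```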

Converting from a torch tensor to a NumPy array
Note: if the tensor is on the CPU (not the GPU), both objects share the same memory. Changing one changes the other.

In [None]:
import torch
import numpy as np

a = torch.ones(5)
print(a)

b = a.numpy()
print(type(b))

a.add_(1)
print(a)
print(b) # b is incremented too, because both objects point to the same memory location

Converting from a NumPy array to a torch tensor

In [None]:
a = np.ones(5)
print(a)

b = torch.from_numpy(a)
print(b)

a += 1
print(a)
print(b) # tensor gets modified too!

Checking if operations are being conducted on CPU vs GPU

In [None]:
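if torch.cuda.is_available(): # checks whether a CUDA-capable GPU is available
    device = torch.device("cuda")
    x = torch.ones(5, device=device) # create the tensor directly on the GPU
    y = torch.ones(5)
    y = y.to(device) # move an existing tensor to the GPU
    z = x + y # computed on the GPU
    z = z.to("cpu") # move the result back to the CPU (needed before calling z.numpy())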
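
A common variation is to pick the device once and fall back to the CPU when no GPU is present, so the same code runs on both. This is a sketch of that pattern, not something the original cell does.

```python
import torch

# Use the GPU when available, otherwise stay on the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

x = torch.ones(5, device=device)   # created directly on the chosen device
y = torch.ones(5).to(device)       # created on the CPU, then moved
z = (x + y).to("cpu")              # bring the result back to the CPU

print(device)
print(z)
```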



Setting requires_grad=True tells PyTorch that it will need to calculate the gradient of this tensor later, in the optimization steps. Any variable you want to optimize must be created with this flag.

In [None]:
x = torch.ones(5, requires_grad=True) # by default it is False
print(x)

## GRADIENT CALCULATIONS WITH AUTOGRAD

Autograd: This package is PyTorch's engine for calculating derivatives. It records all the operations performed on a gradient-enabled tensor in a directed acyclic graph called the dynamic computational graph. The leaves of this graph are input tensors and the roots are output tensors. Gradients are calculated by tracing the graph from the root to the leaves and multiplying the gradients along the way using the chain rule.

Neural networks are nothing more than composite mathematical functions that are delicately tweaked (trained) to output the required result. The tweaking, or training, is done through a remarkable algorithm called backpropagation. Backpropagation calculates the gradients of the loss with respect to the weights, which are then used to update the weights and eventually reduce the loss. It is essentially an application of the chain rule from calculus.

Creating and training a neural network involves the following essential steps:
1. Define the architecture
2. Forward propagate on the architecture using input data
3. Calculate the loss
4. Backpropagate to calculate the gradient for each weight
5. Update the weights using a learning rate
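
The five steps above can be sketched on a toy problem. Everything in this example is an assumption made for illustration: the data (y = 2x), the single-weight "architecture", the mean squared error loss, the learning rate of 0.01 and the 100 iterations.

```python
import torch

# Hypothetical toy data following y = 2 * x
X = torch.tensor([1.0, 2.0, 3.0, 4.0])
Y = 2 * X

# 1. Define the architecture: a single weight, prediction = w * x
w = torch.tensor(0.0, requires_grad=True)
lr = 0.01  # learning rate

for epoch in range(100):
    y_pred = w * X                      # 2. forward propagate
    loss = ((y_pred - Y) ** 2).mean()   # 3. calculate the loss (MSE)
    loss.backward()                     # 4. backpropagate: fills w.grad
    with torch.no_grad():               # 5. update the weight, untracked
        w -= lr * w.grad
    w.grad.zero_()                      # reset the gradient for the next pass

print(w.item())     # should end up close to 2
print(loss.item())  # should end up close to 0
```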

The change in the loss for a small change in a weight is called the gradient of that weight and is calculated using backpropagation. The gradient is then used to update the weight, scaled by a learning rate, to reduce the loss and train the neural net.
This is done iteratively. In each iteration, several gradients are calculated and a computation graph is built to store the corresponding gradient functions. PyTorch does this by building a Dynamic Computational Graph (DCG). The graph is rebuilt from scratch in every iteration, which keeps gradient calculation flexible. For example, for a forward operation (function) Mul, a backward operation (function) called MulBackward is dynamically added to the backward graph for computing the gradient.
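
The backward function attached to a result can be inspected through its grad_fn attribute. The sketch below uses two hand-picked scalars; the exact class name printed (for example MulBackward0) may vary between PyTorch versions.

```python
import torch

a = torch.tensor(2.0, requires_grad=True)
b = torch.tensor(3.0, requires_grad=True)

c = a * b  # forward Mul; autograd records the matching backward function

print(a.is_leaf, c.is_leaf)      # inputs are leaves, the result is not
print(a.grad_fn)                 # leaves have no grad_fn
print(type(c.grad_fn).__name__)  # name of the recorded backward function

c.backward()                     # dc/da = b and dc/db = a
print(a.grad, b.grad)
```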

In [None]:
import torch
x = torch.randn(3, requires_grad=True)
print(x)

# calculate the gradient of a function with respect to x
y = x+2 # creates the computational graph: the operation is +, the inputs are x and 2, the output is y

print(y) # grad_fn is AddBackward0 because y came from an addition

z = y*y*2
print(z)

z = z.mean() # grad_fn is MeanBackward0
z.backward() # calculates the gradient of z with respect to x (dz/dx); no argument needed since z is a scalar

print("gradients are", x.grad) # computed as a vector-Jacobian product using the chain rule
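
The result can be checked by hand. Here z is the mean over three items of 2(x_i + 2)^2, so the chain rule gives dz/dx_i = 4(x_i + 2) / 3. The sketch below compares that formula with what autograd returns; the name expected is illustrative.

```python
import torch

x = torch.randn(3, requires_grad=True)

y = x + 2
z = (y * y * 2).mean()
z.backward()

# Analytic gradient from the chain rule: dz/dx_i = 4 * (x_i + 2) / 3
expected = 4 * (x.detach() + 2) / 3

print(x.grad)
print(expected)
print(torch.allclose(x.grad, expected))
```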

If the last operation creates a scalar value (#2), we do not need to pass an argument to z.backward(). If it does not create a scalar value, we need to pass a vector of the same shape as z as the argument (#1).

In [None]:
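import torch
x = torch.randn(3, requires_grad=True)
y = x+2

#1 z is not a scalar: pass a vector with the same shape as z
z = y*y*2
v = torch.tensor([0.1, 1.0, 0.001], dtype=torch.float)
z.backward(v)

#2 z is a scalar: no argument needed
y = x+2 # rebuild the graph, since backward() frees it
z = (y*y*2).mean()
z.backward()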
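
For the non-scalar case, the vector passed to backward weights each output before the gradients are summed (a vector-Jacobian product). In this example each z_i depends only on x_i, so the expected result is v_i * 4(x_i + 2). A minimal check, with expected as an illustrative name:

```python
import torch

x = torch.randn(3, requires_grad=True)
z = (x + 2) * (x + 2) * 2            # non-scalar output, same shape as x

v = torch.tensor([0.1, 1.0, 0.001])  # one weight per item of z
z.backward(v)

# dz_i/dx_i = 4 * (x_i + 2), scaled by v_i
expected = v * 4 * (x.detach() + 2)

print(x.grad)
print(torch.allclose(x.grad, expected))
```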

During the training loop, updating the weights should not be part of the gradient computation. We have three options to stop PyTorch from tracking these operations:

In [None]:
import torch
x = torch.randn(3, requires_grad=True)
print(x)

#1: x.requires_grad_(False)
x.requires_grad_(False) # in place: turns off gradient tracking for x

#2: x.detach()
y = x.detach() # creates a new tensor that doesn't require the gradient

#3: wrap the code in a with torch.no_grad(): block
with torch.no_grad():
    y = x + 2
    print(y) # no gradient function (such as AddBackward0) is attached here
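
One way to confirm that each option really switched tracking off is to look at requires_grad and grad_fn afterwards. This is a small sketch; the variable names are illustrative.

```python
import torch

x = torch.randn(3, requires_grad=True)

y_tracked = x + 2            # normal operation: tracked by autograd
y_detached = x.detach()      # option 2: new tensor without tracking

with torch.no_grad():        # option 3: nothing inside is tracked
    y_no_grad = x + 2

x.requires_grad_(False)      # option 1: switch tracking off on x itself

print(y_tracked.grad_fn)     # an AddBackward0 object
print(y_detached.requires_grad)
print(y_no_grad.grad_fn)
print(x.requires_grad)
```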

Note: whenever we call the backward function, the gradient for the tensor is accumulated in the .grad attribute, i.e. the values are summed up:

In [None]:
import torch

weights = torch.ones(4, requires_grad=True)

#training loop
for epoch in range(3):
    model_output = (weights*3).sum() # simulates a model output
    
    model_output.backward() # calculates the gradients
    
    print(weights.grad)
    
    # without this step the gradients from every iteration are summed up, which we don't want;
    # we must empty the gradients before moving on to the next optimization step
    weights.grad.zero_()
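
To see the accumulation itself, the same loop can be run without resetting the gradients. In this sketch the gradient of (weights*3).sum() is 3 for every weight, so the stored values grow to 3, 6 and 9; the list accumulated is a hypothetical helper used only to record them.

```python
import torch

weights = torch.ones(4, requires_grad=True)
accumulated = []

for epoch in range(3):
    model_output = (weights * 3).sum()
    model_output.backward()                    # adds 3 to every entry of weights.grad
    accumulated.append(weights.grad.clone())   # snapshot; no zero_() call here

for g in accumulated:
    print(g)
```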

Emptying the gradients is an important part of the training step. The PyTorch optimizers provide this through zero_grad():

In [None]:
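import torch

weights = torch.ones(4, requires_grad=True)

optimizer = torch.optim.SGD([weights], lr=0.01) # stochastic gradient descent; the parameters must be passed as an iterable
loss = (weights*3).sum() # simulates a loss
loss.backward() # fills weights.grad
optimizer.step() # updates the weights using the gradients
optimizer.zero_grad() # resets the gradients, same purpose as weights.grad.zero_() in the loop above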

Summary: when you want to calculate gradients, remember to set the requires_grad parameter. If it is set to True, you can calculate the gradient using the .backward() function.
Before the next iteration of the optimization loop, we need to empty the gradients, either with weights.grad.zero_() or with optimizer.zero_grad().

## BACKPROPAGATION: THEORY WITH EXAMPLE