### Tensors
- Similar to arrays and matrices.
- Used to encode a model's inputs, outputs, and even its parameters.

- Notice the similarities to NumPy arrays
- Explain as if it were a language
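To make the second point concrete, here is a minimal sketch showing that a layer's learnable parameters are themselves tensors. It assumes a small torch.nn.Linear layer with made-up sizes, chosen only for illustration.

```python
import torch

# A small fully connected layer: 3 input features -> 2 output features
layer = torch.nn.Linear(3, 2)

# Its weight and bias are tensors (torch.nn.Parameter is a Tensor subclass)
print(type(layer.weight), layer.weight.shape)  # weight has shape (out_features, in_features)
print(type(layer.bias), layer.bias.shape)

# Inputs and outputs are tensors too
x = torch.rand(5, 3)  # a batch of 5 samples with 3 features each
y = layer(x)
print(y.shape)
```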

```python
import torch
import numpy as np
```

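```python
data = [[1, 2], [3, 4]]
x_data = torch.tensor(data)  # build a tensor directly from Python data

np_array = np.array(data)    # the same data as a NumPy array
t = torch.ones(5)            # a 1-D tensor of five ones

print(f'x_data {x_data}\n')

# Inheriting properties of a tensor: rand_like keeps the shape of x_data
# but overrides its dtype (int64) with float
x_rand = torch.rand_like(x_data, dtype=torch.float)

print(x_rand.shape)
```

```
x_data tensor([[1, 2],
        [3, 4]])

torch.Size([2, 2])
```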
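The cell above builds a tensor from a Python list and another from an existing tensor. A short sketch of a few other common constructors follows; the variable names are illustrative.

```python
import numpy as np
import torch

data = [[1, 2], [3, 4]]
x_data = torch.tensor(data)          # from a nested Python list (dtype inferred as int64)

np_array = np.array(data)
x_np = torch.from_numpy(np_array)    # from a NumPy array (shares memory with np_array)

x_ones = torch.ones_like(x_data)     # same shape and dtype as x_data, filled with ones
x_zeros = torch.zeros((2, 3))        # from a shape tuple, default dtype float32

print(x_data.dtype, x_np.dtype, x_ones.dtype, x_zeros.dtype)
```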


```python
# Tensors have attributes that can be accessed as member variables
tensor = torch.rand(3, 4)

print(f"Shape of tensor: {tensor.shape}")
print(f"Datatype of tensor: {tensor.dtype}")
print(f"Device tensor is stored on: {tensor.device}")
```

```
Shape of tensor: torch.Size([3, 4])
Datatype of tensor: torch.float32
Device tensor is stored on: cpu
```
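Beyond shape, dtype and device, a few other attributes and methods are handy when inspecting a tensor. A brief sketch:

```python
import torch

tensor = torch.rand(3, 4)

print(tensor.ndim)      # number of dimensions: 2
print(tensor.numel())   # total number of elements: 3 * 4 = 12

# .to() can also change the dtype; it returns a new tensor and leaves the original alone
as_double = tensor.to(torch.float64)
print(as_double.dtype, tensor.dtype)
```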


```python
# Move the tensor to the GPU if one is available
if torch.cuda.is_available():
    tensor = tensor.to('cuda')
```
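A common device-agnostic variant picks the device once and then creates or moves tensors with it. This is a sketch; the name device is just a convention, not something PyTorch requires. Note that .to() does not modify a tensor in place: it returns a tensor on the target device (a copy when the device changes), which is why the cell above reassigns tensor.

```python
import torch

# Choose the GPU when available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Create a tensor directly on the chosen device...
x = torch.ones(2, 2, device=device)

# ...or move an existing one; .to() returns a tensor on the target device
y = torch.rand(2, 2).to(device)

print(x.device, y.device)
```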

```python
tensor = torch.ones(4, 4)
tensor[:, 1] = 0  # set the second column of every row to 0
print(tensor)
```

```
tensor([[1., 0., 1., 1.],
        [1., 0., 1., 1.],
        [1., 0., 1., 1.],
        [1., 0., 1., 1.]])
```
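Tensors support NumPy-style indexing and slicing. A short sketch on the same 4x4 tensor:

```python
import torch

tensor = torch.ones(4, 4)
tensor[:, 1] = 0                 # zero out the second column, as in the cell above

print(tensor[0])                 # first row
print(tensor[:, 0])              # first column
print(tensor[..., -1])           # last column (... stands for all leading dimensions)
```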


```python
# Concatenate three copies side by side along dim=1 (the columns)
t1 = torch.cat([tensor, tensor, tensor], dim=1)
print(t1)
```

```
tensor([[1., 0., 1., 1., 1., 0., 1., 1., 1., 0., 1., 1.],
        [1., 0., 1., 1., 1., 0., 1., 1., 1., 0., 1., 1.],
        [1., 0., 1., 1., 1., 0., 1., 1., 1., 0., 1., 1.],
        [1., 0., 1., 1., 1., 0., 1., 1., 1., 0., 1., 1.]])
```
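torch.cat joins tensors along an existing dimension, so the choice of dim decides which size grows. torch.stack, by contrast, adds a new dimension. A sketch comparing the resulting shapes:

```python
import torch

tensor = torch.ones(4, 4)

rows = torch.cat([tensor, tensor, tensor], dim=0)     # stacked vertically: 12 rows
cols = torch.cat([tensor, tensor, tensor], dim=1)     # side by side: 12 columns
stacked = torch.stack([tensor, tensor, tensor])       # new leading dimension of size 3

print(rows.shape, cols.shape, stacked.shape)
```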


```python
# This computes the element-wise product
print(f"tensor.mul(tensor) \n {tensor.mul(tensor)} \n")
# Alternative syntax:
print(f"tensor * tensor \n {tensor * tensor}")
```

```
tensor.mul(tensor) 
 tensor([[1., 0., 1., 1.],
        [1., 0., 1., 1.],
        [1., 0., 1., 1.],
        [1., 0., 1., 1.]]) 

tensor * tensor 
 tensor([[1., 0., 1., 1.],
        [1., 0., 1., 1.],
        [1., 0., 1., 1.],
        [1., 0., 1., 1.]])
```
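Element-wise operations also work between tensors of different but compatible shapes through broadcasting. In this sketch the scale vector is made up for illustration: a length-4 vector multiplies each row of the 4x4 tensor entry by entry.

```python
import torch

tensor = torch.ones(4, 4)
tensor[:, 1] = 0

scale = torch.tensor([1., 2., 3., 4.])   # hypothetical per-column scale factors
scaled = tensor * scale                  # scale is broadcast across all 4 rows

print(scaled)
```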


```python
# Matrix multiplication of tensor with its transpose
print(f"tensor.matmul(tensor.T) \n {tensor.matmul(tensor.T)} \n")
# Alternative syntax:
print(f"tensor @ tensor.T \n {tensor @ tensor.T}")
```

```
tensor.matmul(tensor.T) 
 tensor([[3., 3., 3., 3.],
        [3., 3., 3., 3.],
        [3., 3., 3., 3.],
        [3., 3., 3., 3.]]) 

tensor @ tensor.T 
 tensor([[3., 3., 3., 3.],
        [3., 3., 3., 3.],
        [3., 3., 3., 3.],
        [3., 3., 3., 3.]])
```
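Every entry of tensor @ tensor.T is 3 because entry (i, j) is the dot product of row i with row j, and each row [1, 0, 1, 1] contains three ones. A quick sketch that checks one entry by hand:

```python
import torch

tensor = torch.ones(4, 4)
tensor[:, 1] = 0

product = tensor @ tensor.T

# Entry (0, 1) computed by hand: multiply row 0 and row 1 element-wise, then sum
manual = (tensor[0] * tensor[1]).sum()

print(product[0, 1].item(), manual.item())
```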


- Bridge between NumPy and tensors
- Tensors on the CPU and NumPy arrays can share their underlying memory: if you convert a tensor to a NumPy array with .numpy(), or a NumPy array to a tensor with torch.from_numpy(), a change to one is reflected in the other.

```python
t = torch.ones(5)
print(f"t: {t}")
n = t.numpy()            # n shares memory with t
print(f"n: {n}")

t.add_(1)                # in-place add on the tensor also changes n
print(f"t: {t}")
print(f"n: {n}")

t = torch.from_numpy(n)  # t now shares memory with n
np.add(n, 1, out=n)      # in-place add on the array also changes t
print(f"t: {t}")
print(f"n: {n}")
```

```
t: tensor([1., 1., 1., 1., 1.])
n: [1. 1. 1. 1. 1.]
t: tensor([2., 2., 2., 2., 2.])
n: [2. 2. 2. 2. 2.]
t: tensor([3., 3., 3., 3., 3.])
n: [3. 3. 3. 3. 3.]
```
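Not every conversion shares memory. torch.tensor always copies its input, so later changes to the NumPy array do not show up in that tensor. A sketch contrasting the two:

```python
import numpy as np
import torch

n = np.ones(3)

shared = torch.from_numpy(n)   # shares memory with n
copied = torch.tensor(n)       # makes an independent copy

n += 1                         # in-place change to the NumPy array

print(shared)   # reflects the change
print(copied)   # still all ones
```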


------

### Intro to Autograd
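- torch.autograd is PyTorch’s automatic differentiation engine that powers neural network training.
- Training happens in two steps. In forward propagation, the network makes its best guess at the correct output by running the input through its layers. In backward propagation, it adjusts its parameters in proportion to the error in that guess, using the gradients of the error with respect to the parameters.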
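A minimal sketch of forward and backward propagation on a single number rather than a network: the forward pass computes y = x², and the backward pass fills in dy/dx = 2x, which is 6 at x = 3.

```python
import torch

# requires_grad=True asks autograd to track operations on x
x = torch.tensor(3.0, requires_grad=True)

y = x ** 2        # forward pass: compute the output
y.backward()      # backward pass: compute dy/dx and store it in x.grad

print(x.grad)     # dy/dx = 2 * x = 6
```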

#### Usage in PyTorch
- Let’s take a look at a single training step. For this example, we load a pretrained resnet18 model from torchvision. We then create a random data tensor representing a single image with 3 channels and a height and width of 64, along with a corresponding label tensor initialized to random values.

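```python
import torch
import torchvision

# Load ResNet-18 with ImageNet-pretrained weights. Older torchvision releases
# used pretrained=True; the weights argument requires torchvision >= 0.13.
model = torchvision.models.resnet18(weights=torchvision.models.ResNet18_Weights.DEFAULT)
data = torch.rand(1, 3, 64, 64)   # one random 3-channel, 64x64 "image"
labels = torch.rand(1, 1000)      # random labels matching the model's 1000-class output
```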

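- Now, to make a forward pass through the network, we use _prediction = model(data)_. During this pass autograd records the operations that produce the prediction (as a graph of gradient functions), so that the gradients can be computed later in the backward pass.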

```python
prediction = model(data)
```
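To see what autograd records during a forward pass without the full ResNet, here is a small sketch on hand-made tensors (the names w and out are illustrative). Each result tensor carries a grad_fn pointing to the operation that produced it, and .backward() later walks back through these links.

```python
import torch

w = torch.randn(3, requires_grad=True)

out = (w * 2).sum()        # forward pass through two operations

print(out.grad_fn)                     # the sum that produced out
print(out.grad_fn.next_functions)      # links back to the multiplication before it
print(w.grad_fn)                       # None: w was created directly, not by an operation
```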

- We use the model’s prediction and the corresponding label to calculate the error (loss), then backpropagate this error through the network.
- Backward propagation is kicked off when we call .backward() on the error tensor. 
- Autograd calculates and stores the gradients for each model parameter in the parameter’s .grad attribute.

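```python
loss = (prediction - labels).sum()  # a toy loss for illustration, not a real training objective
loss.backward()                     # backward pass: fills .grad for every model parameter
```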
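A sketch of where the gradients end up, using a small hypothetical torch.nn.Linear layer in place of ResNet-18. Before the first backward pass .grad is None; afterwards it holds a tensor with the parameter's shape. Calling .backward() again on a fresh loss adds to the stored gradient rather than replacing it, which is why training loops reset gradients between steps.

```python
import torch

torch.manual_seed(0)
layer = torch.nn.Linear(4, 2)   # hypothetical stand-in for the model
x = torch.rand(1, 4)

print(layer.weight.grad)        # None: no backward pass yet

loss = layer(x).sum()
loss.backward()
first = layer.weight.grad.clone()
print(first.shape)              # same shape as layer.weight: (2, 4)

# A second forward/backward pass accumulates into .grad
layer(x).sum().backward()
print(torch.allclose(layer.weight.grad, 2 * first))
```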

- Next, we load an optimizer, in this case SGD with a learning rate of 0.01 and momentum of 0.9. We register all the parameters of the model in the optimizer.
- Finally, we call .step() to initiate gradient descent. The optimizer adjusts each parameter by its gradient stored in .grad.

```python
optim = torch.optim.SGD(model.parameters(), lr=1e-2, momentum=0.9)
optim.step()  # gradient descent: update each parameter using its .grad
```
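Putting the pieces together, a typical training step also clears old gradients with optim.zero_grad() before the backward pass, because gradients accumulate in .grad. The sketch below assumes a hypothetical one-parameter linear model with made-up data (targets y = 2x) rather than ResNet-18, and uses plain SGD without momentum so the update is easy to check: each parameter p becomes p - lr * p.grad.

```python
import torch

torch.manual_seed(0)

# Hypothetical toy problem: learn y = 2x
x = torch.randn(32, 1)
y = 2 * x

model = torch.nn.Linear(1, 1)
optim = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = torch.nn.MSELoss()

initial_loss = loss_fn(model(x), y).item()

for _ in range(50):
    optim.zero_grad()               # clear gradients from the previous step
    loss = loss_fn(model(x), y)     # forward pass
    loss.backward()                 # backward pass: fill each parameter's .grad
    before = model.weight.detach().clone()
    optim.step()                    # gradient descent update

# For plain SGD, the last update was weight - lr * weight.grad
expected = before - 0.1 * model.weight.grad
print(torch.allclose(model.weight.detach(), expected))

final_loss = loss_fn(model(x), y).item()
print(initial_loss, final_loss)     # the loss should be much smaller after training
```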

- As an example, consider two tensors, a and b, to see how autograd works internally. Setting requires_grad=True tells autograd to track every operation on them.

```python
# requires_grad=True tells autograd to track every operation on a and b
a = torch.tensor([2., 3.], requires_grad=True)
b = torch.tensor([6., 4.], requires_grad=True)

# Treat Q as the error and a and b as the parameters it depends on
Q = 3*a**3 - b**2
```

- When we call .backward() on Q, autograd calculates the gradients of Q with respect to a and b and stores them in the respective tensors’ .grad attributes.
- Because Q is a vector rather than a scalar, .backward() needs to know how to weight its elements. One option is to aggregate Q into a scalar first and call .backward() on that: Q.sum().backward().
- The other option is to pass an explicit gradient argument to Q.backward(). This is a tensor of the same shape as Q that represents the gradient of Q with respect to itself, i.e. dQ/dQ = 1 for every element.
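The gradients that autograd should produce can be worked out by hand from the definition of Q:

$$
Q = 3a^3 - b^2, \qquad \frac{\partial Q}{\partial a} = 9a^2, \qquad \frac{\partial Q}{\partial b} = -2b
$$

With a = [2, 3] and b = [6, 4], these give ∂Q/∂a = [36, 81] and ∂Q/∂b = [-12, -8].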


```python
external_grad = torch.tensor([1., 1.])   # dQ/dQ = 1 for each element of Q
Q.backward(gradient=external_grad)

# Equivalent alternative: Q.sum().backward()
# Check that the collected gradients match the analytical ones
print(9*a**2 == a.grad)
print(-2*b == b.grad)
```

```
tensor([True, True])
tensor([True, True])
```
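The gradient argument weights each element of Q, so Q.backward(gradient=v) computes the gradients of the weighted sum (v * Q).sum(); passing all ones is therefore the same as Q.sum().backward(). The sketch below uses fresh copies of a and b, since the backward pass frees the previous graph by default, plus a hypothetical weight vector that keeps only the first element of Q.

```python
import torch

a = torch.tensor([2., 3.], requires_grad=True)
b = torch.tensor([6., 4.], requires_grad=True)
Q = 3*a**3 - b**2

# Weight the first element of Q by 1 and the second by 0
Q.backward(gradient=torch.tensor([1., 0.]))
print(a.grad, b.grad)   # only the first element receives a gradient

# Summing first and calling backward matches gradient=torch.ones_like(Q)
a2 = torch.tensor([2., 3.], requires_grad=True)
b2 = torch.tensor([6., 4.], requires_grad=True)
(3*a2**3 - b2**2).sum().backward()
print(a2.grad, b2.grad)  # 9 * a2**2 and -2 * b2
```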
