<a href="https://colab.research.google.com/github/ChenshuLiu/Pytorch-Tutorial/blob/main/Pytorch_Tutorial.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# PyTorch Tutorial
#### Chenshu Liu
Reference: https://youtu.be/c36lUUr864M?list=PLw1A_xmFf3RLPk5cKf1PZo6-3E-boYifg 

In [3]:
import torch
import numpy as np

## Tensors Basics
* use `torch.empty(dimension)` to define an uninitialized tensor: memory is allocated but not filled, so the printed values are arbitrary

In [3]:
x = torch.empty(2, 3)
print(x)

tensor([[8.6337e-34, 0.0000e+00, 3.5032e-44],
        [0.0000e+00,        nan, 0.0000e+00]])


* use `torch.rand(dimension)` to define a tensor of the given dimension filled with random numbers drawn uniformly from $[0, 1)$

In [4]:
x = torch.rand(2, 2)
print(x)

tensor([[0.1364, 0.7621],
        [0.7713, 0.0085]])


* use `torch.ones(dimension, dtype)` to define a tensor filled with ones; the optional `dtype` argument sets the element type

In [5]:
x = torch.ones(2, 2, dtype = torch.double)
print(x)
print(x.dtype)
print(x.size())

tensor([[1., 1.],
        [1., 1.]], dtype=torch.float64)
torch.float64
torch.Size([2, 2])


* use `torch.tensor([array of values])` to build a tensor object from a list (or array) of values

In [6]:
x = torch.tensor([2.5, 0.1])
print(x)

tensor([2.5000, 0.1000])


### Tensor Arithmetic
* Addition `torch.add(a,b)`
* Subtraction `torch.sub(a,b)`
* Multiplication `torch.mul(a,b)`
* Division `torch.div(a,b)`
* For in-place modification, append an underscore `_` to the operation name (e.g. `a.add_(b)` adds b to a and stores the sum in a)
* Without the `_`, the operation leaves its operands unchanged and returns a new tensor, so the result needs to be assigned to an object (e.g. `c = a.add(b)`)

In [24]:
x = torch.rand(2, 2)
y = torch.rand(2, 2)
# elementwise addition
z = x + y
#  or
z = torch.add(x, y)
print(x)
print(y)
print(z)

tensor([[0.6109, 0.4688],
        [0.5876, 0.3950]])
tensor([[0.5343, 0.4071],
        [0.7797, 0.0901]])
tensor([[1.1452, 0.8760],
        [1.3672, 0.4851]])


In [27]:
z = y.add(x)
print(x)
print(y)
print(z)
# in-place modification on y
y.add_(x)
print(y)

tensor([[0.6109, 0.4688],
        [0.5876, 0.3950]])
tensor([[1.7561, 1.3448],
        [1.9548, 0.8800]])
tensor([[2.3670, 1.8137],
        [2.5424, 1.2750]])
tensor([[2.3670, 1.8137],
        [2.5424, 1.2750]])
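The cells above only exercise addition. As a minimal sketch of the other listed operations, the following uses small fixed tensors (the names `a`, `b`, `diff`, `prod`, and `quot` are illustrative, not from the original notebook) so the results are easy to check by hand:

```python
import torch

a = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
b = torch.tensor([[2.0, 2.0], [2.0, 2.0]])

diff = torch.sub(a, b)  # elementwise, same as a - b
prod = torch.mul(a, b)  # elementwise, same as a * b (not a matrix product)
quot = torch.div(a, b)  # elementwise, same as a / b

print(diff)
print(prod)
print(quot)

# in-place variant: a itself is overwritten with a * b
a.mul_(b)
print(a)
```

After the last call, `a` holds the same values as `prod`, while `b` is unchanged.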


### Tensor Slicing

In [14]:
x = torch.rand(3, 3)
print(x[1, 1])
print(type(x[1, 1]))
# .item() returns the value as a plain Python number (only valid for single-element tensors)
print(x[1, 1].item())
print(type(x[1, 1].item()))

tensor(0.9280)
<class 'torch.Tensor'>
0.9280080795288086
<class 'float'>

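The cell above indexes a single element. Slicing whole rows, columns, or sub-blocks uses the same colon syntax as NumPy. A small sketch, using `torch.arange` to get predictable values (the variable names are illustrative):

```python
import torch

# 3x3 tensor holding 0..8, row by row
x = torch.arange(9, dtype=torch.float32).view(3, 3)

first_col = x[:, 0]   # every row, column 0
second_row = x[1, :]  # row 1, every column
block = x[:2, 1:]     # rows 0-1, columns 1-2

print(first_col)
print(second_row)
print(block)
```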

### Tensor Resizing
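When one dimension is unknown, pass `-1` for it and specify the other dimensions; PyTorch infers the missing size from the total number of elements (`reshape` in NumPy treats `-1` the same way)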

In [23]:
x = torch.rand(4, 4)
print(x.shape)
# -1 is inferred from the total element count: 16 elements / 8 columns = 2 rows
y = x.view((-1, 8))
print(y.shape)

torch.Size([4, 4])
torch.Size([2, 8])
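To make the role of `-1` more concrete, here is a short sketch with a 16-element tensor. It also shows that `view` does not copy: the result shares memory with the original tensor. The variable names are illustrative.

```python
import torch

x = torch.arange(16)

a = x.view(-1, 8)  # 16 / 8 = 2 rows
b = x.view(4, -1)  # 16 / 4 = 4 columns
c = x.view(-1)     # flatten to one dimension
print(a.shape, b.shape, c.shape)

# a view shares memory with the original tensor,
# so writing through the view changes x as well
a[0, 0] = 100
print(x[0])
```

Only one dimension can be `-1` per call, since otherwise the sizes could not be inferred unambiguously.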


### Tensor and Numpy
The `torch.from_numpy()` function converts a NumPy array to a tensor object without copying the data: the tensor and the array share the same memory, so modifying the NumPy array also modifies the converted tensor (and vice versa)

In [6]:
a = np.ones(5)
b = torch.from_numpy(a)
print(type(a))
print(a)
print(type(b))
print(b)

# modifying the NumPy array also alters the tensor, because they share memory
a += 1
print(a)
print(b)

<class 'numpy.ndarray'>
[1. 1. 1. 1. 1.]
<class 'torch.Tensor'>
tensor([1., 1., 1., 1., 1.], dtype=torch.float64)
[2. 2. 2. 2. 2.]
tensor([2., 2., 2., 2., 2.], dtype=torch.float64)
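The conversion also works in the other direction. As a hedged sketch (for CPU tensors), `Tensor.numpy()` shares memory in the same way, whereas `torch.tensor(array)` makes an independent copy. The variable names below are illustrative:

```python
import numpy as np
import torch

# tensor -> NumPy: the array shares memory with the CPU tensor
t = torch.ones(3)
n = t.numpy()
t.add_(1)
print(n)  # reflects the in-place change made to t

# torch.tensor() copies the data, so later changes to the array do not propagate
a = np.ones(3)
c = torch.tensor(a)
a += 1
print(a)
print(c)  # still all ones
```

Choosing `torch.tensor(a)` over `torch.from_numpy(a)` is one way to avoid accidental coupling when the original array will keep changing.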


## Autograd
To calculate the gradient of a model's output with respect to several values (i.e. $x_1, \dots, x_n$) in preparation for backpropagation, create the tensor with `requires_grad = True`. PyTorch then records every operation applied to it so that the gradients can be computed later by calling `backward()`.

reference:
1. Gradient: https://builtin.com/data-science/gradient-descent 

In [7]:
import torch

In [6]:
x = torch.randn(3, requires_grad = True)
print(x)

y = x+2
print(y) #backward gradient with addition -- "AddBackward0"
z = y*y*2 
print(z) #backward gradient with multiplication -- "MulBackward0"
z = z.mean() 
print(z) #backward gradient with mean operation -- "MeanBackward0"
z.backward()
print(x.grad) #the gradients with respect to the three x's

tensor([-1.0523,  0.5415,  0.9816], requires_grad=True)
tensor([0.9477, 2.5415, 2.9816], grad_fn=<AddBackward0>)
tensor([ 1.7963, 12.9187, 17.7795], grad_fn=<MulBackward0>)
tensor(10.8315, grad_fn=<MeanBackward0>)
tensor([1.2636, 3.3887, 3.9754])
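The printed gradient can be checked by hand. The computation above is $z = \frac{1}{n}\sum_{i=1}^{n} 2(x_i + 2)^2$ with $n = 3$, so $\frac{\partial z}{\partial x_i} = \frac{4(x_i + 2)}{n}$. For the first entry, $4 \times 0.9477 / 3 \approx 1.2636$, which matches the output. The sketch below repeats the check with fixed inputs (the values and the name `expected` are chosen here for illustration):

```python
import torch

x = torch.tensor([-1.0, 0.5, 1.0], requires_grad=True)

# same computation as above: y = x + 2, z = mean(2 * y * y)
z = (2 * (x + 2) ** 2).mean()
z.backward()

# analytic gradient: dz/dx_i = 4 * (x_i + 2) / n, with n = 3
expected = 4 * (x.detach() + 2) / 3

print(x.grad)
print(expected)
```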


By default, `requires_grad` is `False`. In that case no operations are recorded, so calling `backward()` to get the gradient with respect to the independent variables (i.e. $x_1, \dots, x_n$) raises a `RuntimeError`.

In [9]:
x = torch.randn(3, requires_grad = False)
print(x)

y = x+2
print(y) #will not show backward gradient with addition -- "AddBackward0"
z = y*y*2 
print(z) #will not show backward gradient with multiplication -- "MulBackward0"
z = z.mean() 
print(z) #will not show backward gradient with mean operation -- "MeanBackward0"
z.backward() #will produce error because the gradients were not requested in x
# print(x.grad)

tensor([-0.2098, -0.7104, -1.0785])
tensor([1.7902, 1.2896, 0.9215])
tensor([6.4095, 3.3259, 1.6985])
tensor(3.8113)


RuntimeError: ignored

### Removing Gradients Tracking
Gradient tracking can be turned off in the following ways:

* `tensorobj.requires_grad_(False)` (NOTE: any method whose name ends with an underscore modifies the object in-place)

In [16]:
x = torch.randn(3, requires_grad = True)
print(x)
x.requires_grad_(False) #in-place modification
print(x)

tensor([ 0.8699,  0.5317, -0.7827], requires_grad=True)
tensor([ 0.8699,  0.5317, -0.7827])


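* `tensorobj.detach()`, which returns a new tensor with the same values that is no longer tracked; the original tensor keeps `requires_grad = True`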

In [20]:
x = torch.randn(3, requires_grad = True)
print(x)
y = x.detach() #does not modify in-place
print(y)

tensor([-1.0555,  0.7687,  0.2573], requires_grad=True)
tensor([-1.0555,  0.7687,  0.2573])


* Perform the operations inside a `with torch.no_grad():` block; results computed in the block are not tracked

In [21]:
x = torch.randn(3, requires_grad = True)
print(x)
with torch.no_grad():
  y = x+2
  print(y)

tensor([ 0.6684, -1.3532,  1.0301], requires_grad=True)
tensor([2.6684, 0.6468, 3.0301])
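A quick way to confirm what each approach does is to inspect the `requires_grad` attribute of the results. The following sketch (variable names are illustrative) compares a detached tensor, a result computed under `torch.no_grad()`, and a normally tracked result:

```python
import torch

x = torch.randn(3, requires_grad=True)

y = x.detach()         # new tensor, not tracked
with torch.no_grad():
    z = x + 2          # computed without tracking
w = x + 2              # computed normally, so it is tracked

print(y.requires_grad, z.requires_grad, w.requires_grad)
print(x.requires_grad)  # x itself is unchanged by detach() and no_grad()

# detach() does not copy: y shares storage with x,
# so an in-place write to y is visible through x
y[0] = 0.0
print(x[0].item())
```

Because `detach()` shares storage, call `.clone()` on the detached tensor when an independent copy is needed.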


### Gradient Accumulation During Epochs

In [22]:
weights = torch.ones(4, requires_grad = True)

for epoch in range(3):
  model_output = (weights*3).sum()
  model_output.backward()
  print(weights.grad)

tensor([3., 3., 3., 3.])
tensor([6., 6., 6., 6.])
tensor([9., 9., 9., 9.])


Note that the gradients accumulate over the three epochs. This is because every call to `backward()` adds the newly computed gradients to whatever is already stored in `weights.grad` instead of overwriting it. This is NOT what we want for model training (i.e. iterating through multiple epochs), where each update should use only the gradients of the current iteration. Thus, we call `weights.grad.zero_()` at the end of each epoch to reset the stored gradients to zero before the next `backward()` call.

In [23]:
weights = torch.ones(4, requires_grad = True)
for epoch in range(3):
  model_output = (weights*3).sum()
  model_output.backward()
  print(weights.grad)
  weights.grad.zero_()

tensor([3., 3., 3., 3.])
tensor([3., 3., 3., 3.])
tensor([3., 3., 3., 3.])


Now the gradients in every epoch look right: the model is the same in each epoch, so the gradients should not change.
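In a training loop the same reset is usually done through an optimizer rather than by hand. The sketch below is an assumption layered on top of the example above: it uses `torch.optim.SGD` with a learning rate of 0.01, neither of which appears in the original notebook, and calls `optimizer.zero_grad()` after each update.

```python
import torch

weights = torch.ones(4, requires_grad=True)

# assumption: plain SGD with lr=0.01, chosen only for illustration
optimizer = torch.optim.SGD([weights], lr=0.01)

for epoch in range(3):
    model_output = (weights * 3).sum()
    model_output.backward()   # gradient is 3 for every weight
    optimizer.step()          # weights <- weights - lr * grad
    optimizer.zero_grad()     # clear gradients before the next epoch

print(weights)
```

Each step subtracts $0.01 \times 3 = 0.03$ from every weight, so after three epochs the weights are about $0.91$. Without the `zero_grad()` call the accumulated gradients would grow each epoch and the steps would become larger than intended.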

## Backpropagation