# Intro to Deep Learning - Recitation
TA: Shiwei Tan (shiwei.tan@rutgers.edu)

## Intro to Colab
+ Code & Text & Image
+ Context & Execution Order
+ Long-running Operations

## Intro to PyTorch
### Tensors

In [10]:
import torch

In [4]:
vec = torch.tensor([1, 5, 9])
vec

tensor([1, 5, 9])

In [5]:
mat = torch.tensor([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
])
mat

tensor([[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]])

In [11]:
zeros_vec = torch.zeros(6)
print(zeros_vec)
ones_mat = torch.ones((4, 6))
print(ones_mat)

tensor([0., 0., 0., 0., 0., 0.])
tensor([[1., 1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1., 1.]])


In [12]:
like = torch.zeros_like(ones_mat)
print(like)

tensor([[0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0.]])


In [13]:
print(mat)
print("[1, 1]:", mat[1, 1])
print("[1]:", mat[1])
print("[1, :]:", mat[1, :])
print("[:, 1]:", mat[:, 1])
print("[:2, :]:", mat[:2, :], sep='\n')

tensor([[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]])
[1, 1]: tensor(5)
[1]: tensor([4, 5, 6])
[1, :]: tensor([4, 5, 6])
[:, 1]: tensor([2, 5, 8])
[:2, :]:
tensor([[1, 2, 3],
        [4, 5, 6]])


### Operations

In [7]:
ones_vec = torch.ones(5)
twos_vec = torch.ones(5) * 2
print(ones_vec)
print(twos_vec)
print()

# Element-wise Operations
print("1.", ones_vec + twos_vec)
print("2.", ones_vec + 4)
print("3.", twos_vec * twos_vec)
print("4.", twos_vec ** 4)
print("5.", twos_vec ** twos_vec)
print("6.", torch.sin(ones_vec))
print("7.", torch.square(twos_vec))

tensor([1., 1., 1., 1., 1.])
tensor([2., 2., 2., 2., 2.])

1. tensor([3., 3., 3., 3., 3.])
2. tensor([5., 5., 5., 5., 5.])
3. tensor([4., 4., 4., 4., 4.])
4. tensor([16., 16., 16., 16., 16.])
5. tensor([4., 4., 4., 4., 4.])
6. tensor([0.8415, 0.8415, 0.8415, 0.8415, 0.8415])
7. tensor([4., 4., 4., 4., 4.])


In [8]:
print(twos_vec)
print(twos_vec.sum())
print(twos_vec.mean())
print(twos_vec.max())

tensor([2., 2., 2., 2., 2.])
tensor(10.)
tensor(2.)
tensor(2.)


In [9]:
vec = torch.tensor(list(range(16)))
print(vec)
print(vec.shape)

mat = vec.reshape((4, 4)) # vec keeps its shape; reshape returns a new tensor (which may share memory with vec)
print(mat)
print(mat.shape)
print(vec)

tensor([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15])
torch.Size([16])
tensor([[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11],
        [12, 13, 14, 15]])
torch.Size([4, 4])
tensor([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15])
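
One caveat about the reshape above: for a contiguous tensor, `reshape` usually returns a view that shares memory with the original, so an in-place write through the reshaped tensor also shows up in the original. The sketch below illustrates this with `torch.arange` (used here for brevity in place of `torch.tensor(list(range(16)))`). The PyTorch documentation advises against relying on whether `reshape` copies or views, so call `.clone()` when you need an independent copy.

```python
import torch

vec = torch.arange(16)         # contiguous 1-D tensor holding 0..15
mat = vec.reshape((4, 4))      # for a contiguous input this is typically a view

mat[0, 0] = 100                # in-place write through the reshaped tensor
print(vec[0])                  # vec sees the change because memory is shared

independent = vec.reshape((4, 4)).clone()  # clone() forces a separate copy
independent[0, 1] = -1
print(vec[1])                  # vec is not affected by writes to the clone
```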


In [10]:
# Matrix Operations
print(mat)
print(mat.T)
print(mat @ mat)
print(torch.mm(mat, mat))

tensor([[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11],
        [12, 13, 14, 15]])
tensor([[ 0,  4,  8, 12],
        [ 1,  5,  9, 13],
        [ 2,  6, 10, 14],
        [ 3,  7, 11, 15]])
tensor([[ 56,  62,  68,  74],
        [152, 174, 196, 218],
        [248, 286, 324, 362],
        [344, 398, 452, 506]])
tensor([[ 56,  62,  68,  74],
        [152, 174, 196, 218],
        [248, 286, 324, 362],
        [344, 398, 452, 506]])
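
It is easy to confuse `*` (element-wise) with `@` (matrix product). A minimal sketch of the difference, using the same 4x4 matrix as above:

```python
import torch

mat = torch.arange(16).reshape(4, 4)

elementwise = mat * mat   # result[i, j] = mat[i, j] * mat[i, j]
matmul = mat @ mat        # result[i, j] = sum over k of mat[i, k] * mat[k, j]

print(elementwise[1, 1])  # 5 * 5 = 25
print(matmul[1, 1])       # 4*1 + 5*5 + 6*9 + 7*13 = 174
```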


PyTorch also supports concatenation, stacking, and many other tensor operations.

Check the documentation if you are unsure:

https://pytorch.org/docs/stable/tensors.html
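
As a brief illustration of concatenation and stacking (the tensors `a` and `b` are made up for this sketch): `torch.cat` joins tensors along an existing dimension, while `torch.stack` creates a new one.

```python
import torch

a = torch.tensor([[1, 2, 3],
                  [4, 5, 6]])         # shape (2, 3)
b = torch.tensor([[7, 8, 9],
                  [10, 11, 12]])      # shape (2, 3)

rows = torch.cat([a, b], dim=0)       # join along dim 0 -> shape (4, 3)
cols = torch.cat([a, b], dim=1)       # join along dim 1 -> shape (2, 6)
stacked = torch.stack([a, b], dim=0)  # new leading dim  -> shape (2, 2, 3)

print(rows.shape, cols.shape, stacked.shape)
```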

### Gradient Information
PyTorch can automatically track the computational graph of your operations and compute gradients from it!

In [14]:
x = torch.nn.Parameter(torch.tensor([1.]), requires_grad=True)
print(x)
print(x * x)

Parameter containing:
tensor([1.], requires_grad=True)
tensor([1.], grad_fn=<MulBackward0>)


In [12]:
y = x * x + 5 * x
y

tensor([6.], grad_fn=<AddBackward0>)

In [13]:
# y.backward() computes dy/d(node output) for the nodes in the graph that produced y.
# Every operation in the expression for y produces a value and a corresponding node.
# For example, the expression x * x produces a node in the computational graph.
# That node is connected to its two operand nodes (x and the same x), and the node
# created later by the '+' operator is connected to this node, too.
# When you call .backward(), PyTorch walks the graph backward from the y node, applies
# the chain rule at each node it finds, and accumulates the result into .grad of leaf tensors such as x.
# This is why we can then read x.grad to get dy/dx.
y.backward()
x.grad

tensor([7.])
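
The value 7 can be checked by hand: for $y = x^2 + 5x$ we have $dy/dx = 2x + 5$, which equals 7 at $x = 1$. A small sketch comparing autograd with this formula (the name `analytic` is introduced here only for illustration):

```python
import torch

x = torch.nn.Parameter(torch.tensor([1.0]))  # requires_grad defaults to True
y = x * x + 5 * x
y.backward()

analytic = 2 * x.detach() + 5  # dy/dx = 2x + 5, derived by hand
print(x.grad)                  # gradient computed by autograd
print(analytic)                # gradient from the formula
```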

## Example Gradient Descent - Calculating $\sqrt2$
Let's say we want to calculate the value of $\sqrt2$.

If we define $f(x)=(x^2-2)^2$, then we can find $\sqrt2$ by minimizing $f(x)$: $f(x) \ge 0$ everywhere, and $f(x) = 0$ exactly when $x^2 = 2$.

We minimize $f(x)$ by repeatedly stepping $x$ in the direction that decreases $f(x)$. That direction is found by calculating $f'(x)$ and moving against its sign.
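
As a worked step (a sketch using the chain rule, where $\alpha$ denotes the step size, called `alpha` in the code):

$$
\begin{aligned}
f(x) &= (x^2 - 2)^2 \\
f'(x) &= 2(x^2 - 2) \cdot 2x = 4x(x^2 - 2) \\
x_{\text{new}} &= x - \alpha f'(x)
\end{aligned}
$$

For example, starting from $x = 1$ with $\alpha = 0.05$: $f'(1) = 4 \cdot 1 \cdot (1 - 2) = -4$, so $x_{\text{new}} = 1 - 0.05 \cdot (-4) = 1.2$. The derivative is negative, so the step moves $x$ to the right, toward $\sqrt2$.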


In [18]:
x = torch.nn.Parameter(torch.Tensor([1.0]), requires_grad = True)

def f(x):
    return torch.square(x * x - 2)

alpha = 0.05

for i in range(15):
  f_value = f(x)
  print("f(", x.item(), ") = ", f_value.item())

  f_value.backward()

  #print("f'(", x.data.item(), ") = ", x.grad.item())

  #print()

  with torch.no_grad():
    # The update itself should not be tracked in the computational graph, so we do it inside a no_grad() context.
    x -= alpha * x.grad

  # .backward() adds the new gradient to param.grad instead of overwriting it, so we clear the gradient before the next .backward().
  x.grad = None

print(2 ** 0.5)

f( 1.0 ) =  1.0
f( 1.2000000476837158 ) =  0.31359994411468506
f( 1.3344000577926636 ) =  0.04812602326273918
f( 1.3929471969604492 ) =  0.003563863690942526
f( 1.4095784425735474 ) =  0.00017131102504208684
f( 1.4132683277130127 ) =  7.143176844692789e-06
f( 1.414023756980896 ) =  2.8815361474698875e-07
f( 1.4141755104064941 ) =  1.158765883246815e-08
f( 1.4142059087753296 ) =  4.707203515863512e-10
f( 1.4142119884490967 ) =  1.9454660105111543e-11
f( 1.4142131805419922 ) =  1.1510792319313623e-12
f( 1.4142135381698608 ) =  1.4210854715202004e-14
f( 1.4142135381698608 ) =  1.4210854715202004e-14
f( 1.4142135381698608 ) =  1.4210854715202004e-14
f( 1.4142135381698608 ) =  1.4210854715202004e-14
1.4142135623730951
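
The loop above resets `x.grad` after every step because `.backward()` accumulates gradients. A minimal sketch of what happens without the reset (the names `first`, `second`, and `third` are introduced only for this illustration):

```python
import torch

x = torch.tensor([1.0], requires_grad=True)

(x * x).backward()
first = x.grad.clone()    # d(x^2)/dx at x = 1 is 2

(x * x).backward()        # second call without clearing x.grad
second = x.grad.clone()   # 2 + 2 = 4: the new gradient was added to the old one

x.grad = None             # reset, as the loop above does
(x * x).backward()
third = x.grad.clone()    # back to 2

print(first, second, third)
```

Without the reset, each descent step would use the sum of all past gradients rather than the current one.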


## Another Example of Gradient Descent - Finding the Midpoint
Suppose we have 3 points and want to find the point that minimizes the total squared distance to them. How do we find it?

We minimize the function: $f(p)=||p-p_1||^2+||p-p_2||^2+||p-p_3||^2$.

We use gradient descent again! The only difference is that now we have 2 variables to optimize: the two coordinates of $p$.
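
This particular problem also has a closed-form answer, which gives us something to check gradient descent against. A short derivation, assuming only the definition of $f(p)$ above:

$$
\begin{aligned}
\nabla f(p) &= 2(p - p_1) + 2(p - p_2) + 2(p - p_3) = 6p - 2(p_1 + p_2 + p_3) \\
\nabla f(p^*) = 0 \;&\Rightarrow\; p^* = \frac{p_1 + p_2 + p_3}{3}
\end{aligned}
$$

So the minimizer is the mean of the three points. Each gradient step $p \leftarrow p - \alpha \nabla f(p)$ shrinks the error $p - p^*$ by a factor of $1 - 6\alpha$, which is $0.94$ for $\alpha = 0.01$.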

In [16]:
point_1 = torch.Tensor([1,2])
point_2 = torch.Tensor([3,4])
point_3 = torch.Tensor([-1,5])


distance_minimizer = torch.nn.Parameter(torch.Tensor([0,0]), requires_grad = True)

alpha = 0.01

for i in range(150):
  distance_1 = torch.sum( torch.square( point_1 - distance_minimizer ) )
  distance_2 = torch.sum( torch.square( point_2 - distance_minimizer ) )
  distance_3 = torch.sum( torch.square( point_3 - distance_minimizer ) )

  total_distance = distance_1 + distance_2 + distance_3

  if (i + 1) % 10 == 0:
    print("Total Distance:", total_distance)
    print( distance_minimizer )

  total_distance.backward()

  with torch.no_grad():
    distance_minimizer -= alpha * distance_minimizer.grad

  distance_minimizer.grad = None

print( (point_1 + point_2 + point_3)/3 )


Total Distance: tensor(26.8940, grad_fn=<AddBackward0>)
Parameter containing:
tensor([0.4270, 1.5657], requires_grad=True)
Total Distance: tensor(16.7941, grad_fn=<AddBackward0>)
Parameter containing:
tensor([0.6914, 2.5350], requires_grad=True)
Total Distance: tensor(13.8641, grad_fn=<AddBackward0>)
Parameter containing:
tensor([0.8338, 3.0572], requires_grad=True)
Total Distance: tensor(13.0140, grad_fn=<AddBackward0>)
Parameter containing:
tensor([0.9105, 3.3384], requires_grad=True)
Total Distance: tensor(12.7674, grad_fn=<AddBackward0>)
Parameter containing:
tensor([0.9518, 3.4898], requires_grad=True)
Total Distance: tensor(12.6959, grad_fn=<AddBackward0>)
Parameter containing:
tensor([0.9740, 3.5714], requires_grad=True)
Total Distance: tensor(12.6751, grad_fn=<AddBackward0>)
Parameter containing:
tensor([0.9860, 3.6154], requires_grad=True)
Total Distance: tensor(12.6691, grad_fn=<AddBackward0>)
Parameter containing:
tensor([0.9925, 3.6390], requires_grad=True)
Total Distance: 

## Review
- Colab
- PyTorch tensors
- Auto-differentiation
- Gradient descent examples

TA:\
Shiwei Tan\
shiwei.tan@rutgers.edu\
Office hours: Thursday 3pm - 4pm, CBIM
