## Day 1: Basic Operations in PyTorch

Before introducing PyTorch, we will first implement a simple network using numpy (which you should already be familiar with).

Note: Search in the documentation whenever you feel unsure: https://pytorch.org/docs/stable/torch.html

In [1]:
import numpy as np
np.random.seed(2020)

In [2]:
'''
Simple feedforward network with ReLU activation.

N: batch size,
D_in: input dimension
H: hidden dimension
D_out: output dimension
'''
N, D_in, H, D_out = 64, 100, 200, 5

# create random input and output data
x = np.random.randn(N, D_in)*0.001 # input
y = np.random.randn(N, D_out)*0.001 # output (i.e., labels)


#Randomly initialize weights, bias terms are ignored
w1 = np.random.randn(D_in, H)
w2 = np.random.randn(H, D_out)

learning_rate = 1e-1 # you can change this value
# run for 2000 steps
for t in range(2000):
    # Forward pass: compute predicted y
    h = x.dot(w1) # shape: (N, H)
    h_relu = np.maximum(h, 0) # add relu activation
    y_pred = h_relu.dot(w2) # shape: (N, D_out)
    
    # Compute and print loss
    loss = np.square(y_pred, y).sum()
    if t%100 == 0:
        print (t, loss)
    
    # Backprop to compute gradients of w1 and w2 with respect to loss
    grad_y_pred = 2.0*(y_pred - y) # shape: (N, D_out)
    grad_w2 = h_relu.T.dot(grad_y_pred) # shape: (H, D_out)
    grad_h_relu = grad_y_pred.dot(w2.T) # shape: (N, H)
    grad_h = grad_h_relu.copy()
    grad_h[h<0] = 0 # hidden state values < 0 have no grad due to ReLU
    grad_w1 = x.T.dot(grad_h) # shape: (D_in, H)
    
    # Update weights
    w1 -= learning_rate * grad_w1 
    w2 -= learning_rate * grad_w2 
    
    
## run this and you should see the loss decreasing     

0 2.3520184568836067
100 0.9333180231669476
200 0.48612805469839365
300 0.2803719033005763
400 0.1744724200835167
500 0.11463585591478825
600 0.07832964286917948
700 0.055092783833448344
800 0.03961719982893863
900 0.029004932094094965
1000 0.021513919550906938
1100 0.016119207864832934
1200 0.012179067559047903
1300 0.009267101210280654
1400 0.007096674113319451
1500 0.005464666405809898
1600 0.0042290780438775325
1700 0.0032866944944580155
1800 0.0025635825342391573
1900 0.0020062774976896554


Numpy is a great framework, but it cannot utilize GPUs to accelerate its numerical computations. For modern deep neural networks, GPUs often provide speedups of 50x or greater, so unfortunately numpy won’t be enough for modern deep learning. Also, it's kind of troublesome to write the gradient calculation and backprop manually every time, especially for complext networks. Fortunately with PyTorch, things will be much easier.

Here we introduce the most fundamental PyTorch concept: the Tensor. A PyTorch Tensor is conceptually identical to a numpy array: a Tensor is an n-dimensional array, and PyTorch provides many functions for operating on these Tensors. Behind the scenes, Tensors can keep track of a computational graph and gradients, but they’re also useful as a generic tool for scientific computing.

Also unlike numpy, PyTorch Tensors can utilize GPUs to accelerate their numeric computations. To run a PyTorch Tensor on GPU, you simply need to cast it to a new datatype.

Here we first introduce some basic operations for PyTorch tensors, you will find them similar to NumPy.

In [3]:
'''
define a helper function that will summarise various properties of a tensor:
type, dimension, and contents of the tensor.
'''
def describe(x):
    print ("Type: {}".format(x.type()))
    print ("Shape/Size: {}".format(x.shape))
    print ("Values: \n{}".format(x))

In [4]:
import torch
torch.manual_seed(2020) # set a random seed for reproducing
print (torch.__version__)

# randomly initialize a tensor by specifying dimensions
describe(torch.Tensor(2, 3))

1.5.1
Type: torch.FloatTensor
Shape/Size: torch.Size([2, 3])
Values: 
tensor([[0., 0., 0.],
        [0., 0., 0.]])


In [5]:
describe(torch.rand(2, 3)) # uniform distribution [0, 1)
describe(torch.randn(2, 3)) # normal distribution

Type: torch.FloatTensor
Shape/Size: torch.Size([2, 3])
Values: 
tensor([[0.4869, 0.1052, 0.5883],
        [0.1161, 0.4949, 0.2824]])
Type: torch.FloatTensor
Shape/Size: torch.Size([2, 3])
Values: 
tensor([[-0.0264, -0.1360, -0.3136],
        [ 0.6418,  1.1961,  0.9936]])


We can also create tensors all filled with the same scalar. For creating a tensor of zeros or ones, we have built-in functions, and for filling it with specific values, we can use the fill_() method. Any PyTorch method with an underscore refers to an in-place operation; that is, it modifies the content in place without creating a new object.

In [6]:
describe(torch.zeros(2, 3))
x = torch.ones(2, 3)
describe(x)
x.fill_(5)
describe(x)

Type: torch.FloatTensor
Shape/Size: torch.Size([2, 3])
Values: 
tensor([[0., 0., 0.],
        [0., 0., 0.]])
Type: torch.FloatTensor
Shape/Size: torch.Size([2, 3])
Values: 
tensor([[1., 1., 1.],
        [1., 1., 1.]])
Type: torch.FloatTensor
Shape/Size: torch.Size([2, 3])
Values: 
tensor([[5., 5., 5.],
        [5., 5., 5.]])


You can also convert between NumPy arrays and PyTorch tensors.

In [7]:
npy = np.random.rand(2, 3)
describe(torch.from_numpy(npy))

Type: torch.DoubleTensor
Shape/Size: torch.Size([2, 3])
Values: 
tensor([[0.9477, 0.8570, 0.2478],
        [0.5205, 0.9915, 0.0510]], dtype=torch.float64)


Notice that the type of the tensor is DoubleTensor instead of the default FloatTensor. This corresponds with the data type of the NumPy random matrix, a float64.

You can also convert a tensor to other types (float, double, long, etc.)

In [8]:
x = x.long() # convert to long type
describe(x)

Type: torch.LongTensor
Shape/Size: torch.Size([2, 3])
Values: 
tensor([[5, 5, 5],
        [5, 5, 5]])


After you have created tensors, you can operate on them just like you do for Python or NumPy, or use PyTorch functions.

In [9]:
a = torch.randn(2, 3)
describe(a)
describe(a+a)
describe(torch.add(a, a))

Type: torch.FloatTensor
Shape/Size: torch.Size([2, 3])
Values: 
tensor([[ 1.0911, -0.5916, -3.1703],
        [-0.0083, -1.0251,  0.7644]])
Type: torch.FloatTensor
Shape/Size: torch.Size([2, 3])
Values: 
tensor([[ 2.1821, -1.1832, -6.3406],
        [-0.0165, -2.0501,  1.5288]])
Type: torch.FloatTensor
Shape/Size: torch.Size([2, 3])
Values: 
tensor([[ 2.1821, -1.1832, -6.3406],
        [-0.0165, -2.0501,  1.5288]])


Keeping track of the shape of your tensors is important when coding models. You can use view() function to reshape them.

In [10]:
a = torch.arange(6)
describe(a)

a = a.view(2, 3)
describe(a)

Type: torch.LongTensor
Shape/Size: torch.Size([6])
Values: 
tensor([0, 1, 2, 3, 4, 5])
Type: torch.LongTensor
Shape/Size: torch.Size([2, 3])
Values: 
tensor([[0, 1, 2],
        [3, 4, 5]])


You can specify the dimension in which you want to perform certain operations, like sum.

In [11]:
describe(torch.sum(a, dim=0)) # sum along the first dimension
describe(torch.sum(a, dim=1)) # sum along the second dimension
describe(torch.transpose(a, 0, 1)) # swap the dimensions (2,3)->(3,2)

Type: torch.LongTensor
Shape/Size: torch.Size([3])
Values: 
tensor([3, 5, 7])
Type: torch.LongTensor
Shape/Size: torch.Size([2])
Values: 
tensor([ 3, 12])
Type: torch.LongTensor
Shape/Size: torch.Size([3, 2])
Values: 
tensor([[0, 3],
        [1, 4],
        [2, 5]])


You can do indexing, slicing and joining in NumPy style as well.

In [12]:
x = torch.arange(6).view(2, 3)
describe(x)
describe(x[:1, :2])
describe(x[0, 1])

Type: torch.LongTensor
Shape/Size: torch.Size([2, 3])
Values: 
tensor([[0, 1, 2],
        [3, 4, 5]])
Type: torch.LongTensor
Shape/Size: torch.Size([1, 2])
Values: 
tensor([[0, 1]])
Type: torch.LongTensor
Shape/Size: torch.Size([])
Values: 
1


You can also use PyTorch indexing functions for complex indexing.

In [13]:
indices = torch.LongTensor([0, 2])
describe(torch.index_select(x, dim=1, index=indices))

Type: torch.LongTensor
Shape/Size: torch.Size([2, 2])
Values: 
tensor([[0, 2],
        [3, 5]])


We can join tensors with the concatenation function.

In [14]:
x = torch.arange(6).view(2, 3)
describe(x)
describe(torch.cat([x,x], dim=0))
describe(torch.cat([x,x], dim=1))
describe(torch.stack([x,x])) # simply stack 2 tensors together

Type: torch.LongTensor
Shape/Size: torch.Size([2, 3])
Values: 
tensor([[0, 1, 2],
        [3, 4, 5]])
Type: torch.LongTensor
Shape/Size: torch.Size([4, 3])
Values: 
tensor([[0, 1, 2],
        [3, 4, 5],
        [0, 1, 2],
        [3, 4, 5]])
Type: torch.LongTensor
Shape/Size: torch.Size([2, 6])
Values: 
tensor([[0, 1, 2, 0, 1, 2],
        [3, 4, 5, 3, 4, 5]])
Type: torch.LongTensor
Shape/Size: torch.Size([2, 2, 3])
Values: 
tensor([[[0, 1, 2],
         [3, 4, 5]],

        [[0, 1, 2],
         [3, 4, 5]]])


We can do various numerical and linear algebra operations.

In [15]:
x1 = torch.arange(6).view(2, 3)
describe(x1)
x2 = torch.ones(3, 2)
x2[:, 1] += 1 # broadcasting
describe(x2)

Type: torch.LongTensor
Shape/Size: torch.Size([2, 3])
Values: 
tensor([[0, 1, 2],
        [3, 4, 5]])
Type: torch.FloatTensor
Shape/Size: torch.Size([3, 2])
Values: 
tensor([[1., 2.],
        [1., 2.],
        [1., 2.]])


In [16]:
# matrix multiplication
# x1 and x2 need to be the same type
describe(torch.mm(x1, x2.long()))
describe(torch.matmul(x1, x2.long()))

Type: torch.LongTensor
Shape/Size: torch.Size([2, 2])
Values: 
tensor([[ 3,  6],
        [12, 24]])
Type: torch.LongTensor
Shape/Size: torch.Size([2, 2])
Values: 
tensor([[ 3,  6],
        [12, 24]])


PyTorch tensor class encapsulates the data (the tensor itself) and a range of operations, such as algebraic operations, indexing, and reshaping operations. 

More importantly, we can use automatic differentiation to automate the computation of backward passes in neural networks. The autograd package in PyTorch provides exactly this functionality. When using autograd, the forward pass of your network will define a computational graph; nodes in the graph will be Tensors, and edges will be functions that produce output Tensors from input Tensors. Backpropagating through this graph then allows you to easily compute gradients.

This sounds complicated, it’s pretty simple to use in practice. Each Tensor represents a node in a computational graph. If x is a Tensor that has x.requires_grad=True then x.grad is another Tensor holding the gradient of x with respect to some scalar value.

Here we use PyTorch Tensors and autograd to implement our two-layer network; now we no longer need to manually implement the backward pass through the network, and simply use loss.backward() to do the backprop!

In [17]:
dtype = torch.float
device = torch.device("cpu")
# device = torch.device("cuda") ## in case you run on GPU

N, D_in, H, D_out = 64, 100, 200, 5

# create random tensors to hold input and outputs
x = torch.randn(N, D_in, device=device, dtype=dtype)*0.001
y = torch.randn(N, D_out, device=device, dtype=dtype)*0.001

# create random tensors for weights
# remember to add requires_grad=True to indicate that we want to compute gradients with
# respect to these Tensors during the backward pass.
w1 = torch.randn(D_in, H, device=device, dtype=dtype, requires_grad=True)
w2 = torch.randn(H, D_out, device=device, dtype=dtype, requires_grad=True)


learning_rate = 1e-1
for t in range(2000):
    # clamp function implements the ReLU activation
    y_pred = x.mm(w1).clamp(min=0.).mm(w2)
    
    loss = (y_pred - y).pow(2).sum()
    if t%100 == 0:
        print (t, loss.item())
    
    # Use autograd to compute the backward pass. This call will compute the
    # gradient of loss with respect to all Tensors with requires_grad=True.
    # After this call w1.grad and w2.grad will be Tensors holding the gradient
    # of the loss with respect to w1 and w2 respectively.
    loss.backward()
    
    with torch.no_grad():
        w1 -= learning_rate * w1.grad
        w2 -= learning_rate * w2.grad
        
        # manually zero the gradients after updating weights
        # otherwise they will accumulate
        w1.grad.zero_()
        w2.grad.zero_()


0 4.825924873352051
100 0.9016178250312805
200 0.46214866638183594
300 0.26687490940093994
400 0.1663445234298706
500 0.10926847904920578
600 0.07445896416902542
700 0.05210120975971222
800 0.037226129323244095
900 0.027064412832260132
1000 0.019950421527028084
1100 0.014880144037306309
1200 0.011212642304599285
1300 0.008522626012563705
1400 0.006527079734951258
1500 0.005031653214246035
1600 0.0039027994498610497
1700 0.0030427188612520695
1800 0.002383644925430417
1900 0.0018749602604657412


### Summary

- Tensor and its operations

- Use Autograd to do backprop

References:

Natural Language Processing with PyTorch. O'REALLY. Delip Rao & Brian McMahan.

Official PyTorch Tutorials: https://pytorch.org/tutorials/beginner/pytorch_with_examples.html