---

# Practical Machine Learning with Python
# Chapter 9: Introduction to PyTorch

## Guillermo Avendaño-Franco 

### HPC Summer Workshop 2019

---

This material was compiled from a variety of sources, mostly other notebooks available on the Internet.

The most relevant sources for this notebook are:

 1. A class in Machine Learning at the University of Turin. The original authors were Dr. Ciro Cattuto, Dr. Laetitia Gauvin and Dr. André Panisson. The material was adapted to fit the delivery format of a workshop on Practical Machine Learning with Python for graduate students and faculty at West Virginia University. The original notebooks can be downloaded from [Rugantio Costa's GitHub repository](https://github.com/rugantio/MachineLearningCourse.git)

 2. An introduction to Machine Learning from the Yale Digital Humanities Lab. The series of notebooks can be found at:
    <https://github.com/YaleDHLab/lab-workshops.git>
    
 3. The notebooks from Aurélien Géron, author of *Hands-On Machine Learning with Scikit-Learn and TensorFlow*. The notebooks are located at <https://github.com/ageron/handson-ml>

# Introduction to PyTorch

PyTorch is a Python-based scientific computing package that is very similar to NumPy and is intended to be used in two contexts:

 1. As a replacement for NumPy when you want to take advantage of the power of GPUs

 2. As a full-featured deep learning research platform that provides maximum flexibility and speed

The idea behind PyTorch is to keep the familiar interface of NumPy ndarrays while allowing the processing to happen on GPUs if they are available.
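
Running on a GPU "if available" is usually expressed by choosing a device once and creating or moving tensors there. The following is a minimal sketch of that pattern; the variable names are illustrative, and on a machine without a CUDA GPU everything simply stays on the CPU.

```python
import torch

# Pick the GPU when one is visible to PyTorch, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Create a tensor directly on the chosen device.
a = torch.rand(3, 3, device=device)

# Move an existing CPU tensor to the chosen device (a no-op if it is already there).
b = torch.ones(3, 3).to(device)

c = a + b  # both operands live on the same device, so the sum is computed there
print(c.device)
```

Operands of one operation are expected to be on the same device, which is why both tensors are placed on `device` before they are added.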

In [1]:
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
import sklearn
import torch

## Creation

There are several tensor constructors, similar to those in NumPy.

In [3]:
x = torch.zeros(4,4)
x

tensor([[0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.]])

There is a constructor for uninitialized tensors:

In [11]:
y = torch.empty(5,5)
y

tensor([[ 0.0000e+00,  0.0000e+00,  0.0000e+00,  0.0000e+00,  0.0000e+00],
        [ 0.0000e+00,  0.0000e+00,  0.0000e+00,  0.0000e+00,  0.0000e+00],
        [ 0.0000e+00,  0.0000e+00,  0.0000e+00,  0.0000e+00,  0.0000e+00],
        [ 0.0000e+00,  0.0000e+00, -1.0842e-19,  0.0000e+00, -1.0842e-19],
        [ 9.8091e-45, -1.0842e-19,  0.0000e+00, -1.0842e-19,  8.4078e-45]])

The values are whatever remained in the memory locations where the tensor was allocated.

A tensor with uniformly distributed random values in [0, 1):
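
Because the contents of an uninitialized tensor are arbitrary, it should be written before it is read. A minimal sketch, using an illustrative buffer name and fill value:

```python
import torch

# torch.empty only allocates memory; its contents are arbitrary until written.
buf = torch.empty(5, 5)

# Overwrite every entry in place before reading from the tensor.
buf.fill_(1.5)

print(buf.sum().item())  # sum of 25 entries, each equal to 1.5
```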

In [14]:
x = torch.rand(3,3)
x

tensor([[0.1756, 0.2438, 0.2049],
        [0.1692, 0.0251, 0.7472],
        [0.4917, 0.8754, 0.5943]])

In [23]:
x = torch.randint(0,9,(3,3))
x

tensor([[4, 6, 4],
        [6, 1, 5],
        [6, 1, 6]])


As in NumPy, new tensors can be created from lists and lists of lists:

In [16]:
x = torch.tensor([3.14, 1.67])
x

tensor([3.1400, 1.6700])

New tensors that inherit properties (such as dtype and device) from an existing tensor are created with the `.new_*` methods:

In [30]:
x=torch.tensor(range(16), dtype=torch.int8).reshape(4,4)
x

tensor([[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11],
        [12, 13, 14, 15]], dtype=torch.int8)

In [34]:
x.new_empty((2,2))

tensor([[0, 0],
        [0, 0]], dtype=torch.int8)

In [33]:
x.new_full((2,2),2.3)

tensor([[2, 2],
        [2, 2]], dtype=torch.int8)

In [35]:
x.new_ones((2,2))

tensor([[1, 1],
        [1, 1]], dtype=torch.int8)

In [36]:
x.new_zeros((2,2))

tensor([[0, 0],
        [0, 0]], dtype=torch.int8)
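
The outputs above all keep `dtype=torch.int8` because the `.new_*` methods inherit the dtype of the tensor they are called on. The sketch below, with illustrative variable names, shows the inherited dtype next to an explicit override.

```python
import torch

base = torch.arange(16, dtype=torch.int8).reshape(4, 4)

# new_* methods inherit dtype and device from the tensor they are called on.
same = base.new_ones((2, 2))

# Passing dtype explicitly overrides the inherited one.
as_float = base.new_ones((2, 2), dtype=torch.float32)

print(same.dtype, as_float.dtype)
```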

## Operations

In [40]:
x=torch.rand((2,3))
print(x)
y=torch.rand((2,3))
print(y)

tensor([[0.5005, 0.6047, 0.5775],
        [0.4108, 0.5556, 0.6435]])
tensor([[0.5355, 0.3402, 0.7948],
        [0.4382, 0.2077, 0.4019]])


In [41]:
x+y

tensor([[1.0360, 0.9449, 1.3723],
        [0.8489, 0.7633, 1.0453]])

In [42]:
torch.add(x,y)

tensor([[1.0360, 0.9449, 1.3723],
        [0.8489, 0.7633, 1.0453]])

In [44]:
z=torch.empty((2,3))
torch.add(x,y, out=z)
z

tensor([[1.0360, 0.9449, 1.3723],
        [0.8489, 0.7633, 1.0453]])

In [45]:
y.add(x)

tensor([[1.0360, 0.9449, 1.3723],
        [0.8489, 0.7633, 1.0453]])

In [46]:
y.add_(x)
y

tensor([[1.0360, 0.9449, 1.3723],
        [0.8489, 0.7633, 1.0453]])
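
The last two cells differ in one important way: `add` returns a new tensor, while `add_` (with a trailing underscore) modifies the tensor it is called on. A small sketch with fixed values, chosen here only for illustration, makes the difference visible:

```python
import torch

a = torch.ones(2, 3)
b = torch.full((2, 3), 2.0)

c = a.add(b)      # out-of-place: returns a new tensor, a is unchanged
print(a[0, 0].item(), c[0, 0].item())

a.add_(b)         # in-place: the trailing underscore means a itself is modified
print(a[0, 0].item())
```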

## NumPy-like operations

The usual NumPy indexing and slicing operations also apply to tensors:

In [49]:
x=torch.tensor(range(64)).reshape(8,8)
x

tensor([[ 0,  1,  2,  3,  4,  5,  6,  7],
        [ 8,  9, 10, 11, 12, 13, 14, 15],
        [16, 17, 18, 19, 20, 21, 22, 23],
        [24, 25, 26, 27, 28, 29, 30, 31],
        [32, 33, 34, 35, 36, 37, 38, 39],
        [40, 41, 42, 43, 44, 45, 46, 47],
        [48, 49, 50, 51, 52, 53, 54, 55],
        [56, 57, 58, 59, 60, 61, 62, 63]])

In [53]:
y=x[2:5,:]
y

tensor([[16, 17, 18, 19, 20, 21, 22, 23],
        [24, 25, 26, 27, 28, 29, 30, 31],
        [32, 33, 34, 35, 36, 37, 38, 39]])

As in NumPy, slices are views: changing the view changes the underlying tensor.

In [55]:
y[0,0]=100
x

tensor([[  0,   1,   2,   3,   4,   5,   6,   7],
        [  8,   9,  10,  11,  12,  13,  14,  15],
        [100,  17,  18,  19,  20,  21,  22,  23],
        [ 24,  25,  26,  27,  28,  29,  30,  31],
        [ 32,  33,  34,  35,  36,  37,  38,  39],
        [ 40,  41,  42,  43,  44,  45,  46,  47],
        [ 48,  49,  50,  51,  52,  53,  54,  55],
        [ 56,  57,  58,  59,  60,  61,  62,  63]])

In [58]:
y=x.view(64)
z=x.view(-1,16)
print(x.size(),y.size(),z.size())

torch.Size([8, 8]) torch.Size([64]) torch.Size([4, 16])
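
Two details of `view` are worth spelling out: a `-1` dimension is inferred from the total number of elements, and the result shares storage with the original tensor. A minimal sketch, using a fresh tensor with illustrative names:

```python
import torch

t = torch.arange(64).reshape(8, 8)

flat = t.view(64)        # same 64 elements, one dimension
wide = t.view(-1, 16)    # -1 is inferred as 64 / 16 = 4

# A view shares storage with the original tensor, so writes are visible in both.
flat[0] = -1
print(t[0, 0].item(), wide.shape)
```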


## Data extraction into Python numbers and lists

In [65]:
y=x[6,6]
y

tensor(54)

In [66]:
y.item()

54

In [68]:
x.tolist()

[[0, 1, 2, 3, 4, 5, 6, 7],
 [8, 9, 10, 11, 12, 13, 14, 15],
 [100, 17, 18, 19, 20, 21, 22, 23],
 [24, 25, 26, 27, 28, 29, 30, 31],
 [32, 33, 34, 35, 36, 37, 38, 39],
 [40, 41, 42, 43, 44, 45, 46, 47],
 [48, 49, 50, 51, 52, 53, 54, 55],
 [56, 57, 58, 59, 60, 61, 62, 63]]

## Torch Tensors {to, from} NumPy Arrays

In [74]:
x=torch.tensor(range(36)).reshape(6,6)
x

tensor([[ 0,  1,  2,  3,  4,  5],
        [ 6,  7,  8,  9, 10, 11],
        [12, 13, 14, 15, 16, 17],
        [18, 19, 20, 21, 22, 23],
        [24, 25, 26, 27, 28, 29],
        [30, 31, 32, 33, 34, 35]])

In [75]:
y=x.numpy()
y

array([[ 0,  1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10, 11],
       [12, 13, 14, 15, 16, 17],
       [18, 19, 20, 21, 22, 23],
       [24, 25, 26, 27, 28, 29],
       [30, 31, 32, 33, 34, 35]])

The Torch Tensor and NumPy array will share their underlying memory locations (if the Torch Tensor is on CPU), and changing one will change the other.

In [76]:
x[0,0]=999
y

array([[999,   1,   2,   3,   4,   5],
       [  6,   7,   8,   9,  10,  11],
       [ 12,  13,  14,  15,  16,  17],
       [ 18,  19,  20,  21,  22,  23],
       [ 24,  25,  26,  27,  28,  29],
       [ 30,  31,  32,  33,  34,  35]])

Conversion from NumPy to Torch that shares the same memory is also possible:


In [85]:
xn=np.random.rand(3,3)
xn

array([[0.65551616, 0.57391767, 0.46032994],
       [0.26434871, 0.53665938, 0.39121073],
       [0.07917681, 0.12344948, 0.18009185]])

In [86]:
xt=torch.from_numpy(xn)
xt

tensor([[0.6555, 0.5739, 0.4603],
        [0.2643, 0.5367, 0.3912],
        [0.0792, 0.1234, 0.1801]], dtype=torch.float64)

In [87]:
xt[0,0]=100
xn

array([[1.00000000e+02, 5.73917671e-01, 4.60329936e-01],
       [2.64348709e-01, 5.36659382e-01, 3.91210726e-01],
       [7.91768079e-02, 1.23449484e-01, 1.80091852e-01]])
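
When shared memory is not wanted, `torch.tensor` can be used in place of `torch.from_numpy`, since it copies the data. The following sketch contrasts the two on a small illustrative array:

```python
import numpy as np
import torch

arr = np.zeros((2, 2))

shared = torch.from_numpy(arr)  # shares memory with arr
copied = torch.tensor(arr)      # copies the data into new memory

arr[0, 0] = 7.0  # modify the NumPy array after both tensors exist

print(shared[0, 0].item())  # follows the change made to arr
print(copied[0, 0].item())  # keeps the value from the time of the copy
```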

# Autograd: Automatic gradient evaluation

In [99]:
x = torch.ones(3, 3, requires_grad=True)
print(x)

tensor([[1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.]], requires_grad=True)


In [100]:
y = x + 3.14
print(y)

tensor([[4.1400, 4.1400, 4.1400],
        [4.1400, 4.1400, 4.1400],
        [4.1400, 4.1400, 4.1400]], grad_fn=<AddBackward0>)


In [101]:
y.grad_fn

<AddBackward0 at 0x1209798d0>

In [102]:
z = y**2 * y * 3
out = z.mean()

print(z, out)

tensor([[212.8739, 212.8739, 212.8739],
        [212.8739, 212.8739, 212.8739],
        [212.8739, 212.8739, 212.8739]], grad_fn=<MulBackward0>) tensor(212.8739, grad_fn=<MeanBackward0>)


Gradient tracking can also be enabled after the tensor is created:

In [96]:
a = torch.randn(2, 2)
a = ((a * 3) / (a - 1))
print(a.requires_grad)
a.requires_grad_(True)
print(a.requires_grad)
b = (a * a).sum()
print(b.grad_fn)

False
True
<SumBackward0 object at 0x1219cb590>


## Gradients

In [103]:
out.backward()

In [104]:
x.grad

tensor([[17.1396, 17.1396, 17.1396],
        [17.1396, 17.1396, 17.1396],
        [17.1396, 17.1396, 17.1396]])
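
The value 17.1396 can be checked by hand. Here `out` is the mean over 9 entries of 3·y³ with y = x + 3.14, so the derivative with respect to each entry of x is (1/9)·9·y² = y², which is 4.14² ≈ 17.1396 at x = 1. The sketch below repeats the computation in a self-contained form and compares autograd against that formula.

```python
import torch

x = torch.ones(3, 3, requires_grad=True)
y = x + 3.14
out = (y**2 * y * 3).mean()   # mean over 9 entries of 3 * y**3
out.backward()

# d(out)/dx = (1/9) * 9 * y**2 = y**2, evaluated at y = 4.14
analytic = (x.detach() + 3.14) ** 2

print(torch.allclose(x.grad, analytic))
```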

# Neural Networks

In [132]:
import torch
import torch.nn as nn
import torch.nn.functional as F


class Net(nn.Module):

    def __init__(self):
        super(Net, self).__init__()
        # 1 input image channel, 6 output channels, 3x3 square convolution
        # kernel
        self.conv1 = nn.Conv2d(1, 6, 3)
        self.conv2 = nn.Conv2d(6, 16, 3)
        # an affine operation: y = Wx + b
        self.fc1 = nn.Linear(16 * 6 * 6, 120)  # 6*6 from image dimension
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        # Max pooling over a (2, 2) window
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
        # If the window is square, a single number is enough
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        x = x.view(-1, self.num_flat_features(x))
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

    def num_flat_features(self, x):
        size = x.size()[1:]  # all dimensions except the batch dimension
        num_features = 1
        for s in size:
            num_features *= s
        return num_features


net = Net()
print(net)


Net(
  (conv1): Conv2d(1, 6, kernel_size=(3, 3), stride=(1, 1))
  (conv2): Conv2d(6, 16, kernel_size=(3, 3), stride=(1, 1))
  (fc1): Linear(in_features=576, out_features=120, bias=True)
  (fc2): Linear(in_features=120, out_features=84, bias=True)
  (fc3): Linear(in_features=84, out_features=10, bias=True)
)
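
The `16 * 6 * 6` input size of `fc1` follows from the spatial size of the feature maps, assuming a 32x32 single-channel input. Each 3x3 convolution without padding removes 2 pixels per dimension, and each 2x2 max pooling halves the size, rounding down. A sketch that traces the shapes with standalone layers:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

conv1 = nn.Conv2d(1, 6, 3)
conv2 = nn.Conv2d(6, 16, 3)

x = torch.randn(1, 1, 32, 32)          # batch of one 32x32 single-channel image
x = F.max_pool2d(F.relu(conv1(x)), 2)  # conv: 32 -> 30, pool: 30 -> 15
print(x.shape)
x = F.max_pool2d(F.relu(conv2(x)), 2)  # conv: 15 -> 13, pool: 13 -> 6 (rounded down)
print(x.shape)

flat = torch.flatten(x, 1)             # keep the batch dimension, flatten the rest
print(flat.shape)
```

Here `torch.flatten(x, 1)` plays the same role as the `view` call combined with `num_flat_features` in the class above.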


In [133]:
params = list(net.parameters())
print(len(params))
print(params[0].size())  # conv1's .weight

10
torch.Size([6, 1, 3, 3])
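
The 10 parameter tensors are one weight and one bias for each of the five layers. The sketch below rebuilds the same five layers in a plain `nn.ModuleList` (an arrangement assumed here only for counting, not part of the network above) and lists their shapes and the total number of trainable values.

```python
import torch.nn as nn

# The same five layers as Net, listed in order (used here only for counting).
layers = nn.ModuleList([
    nn.Conv2d(1, 6, 3),
    nn.Conv2d(6, 16, 3),
    nn.Linear(16 * 6 * 6, 120),
    nn.Linear(120, 84),
    nn.Linear(84, 10),
])

# Each layer contributes a weight tensor and a bias tensor.
for name, p in layers.named_parameters():
    print(name, tuple(p.shape))

total = sum(p.numel() for p in layers.parameters())
print(total)
```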


In [134]:
input = torch.randn(1, 1, 32, 32)
out = net(input)
print(out)

tensor([[-0.0330, -0.0107, -0.1141,  0.0615, -0.0580, -0.0052, -0.0361,  0.1123,
          0.0832, -0.0408]], grad_fn=<AddmmBackward>)


In [135]:
input.shape

torch.Size([1, 1, 32, 32])

In [136]:
net.zero_grad()
out.backward(torch.randn(1, 10))

In [137]:
output = net(input)
target = torch.randn(10)  # a dummy target, for example
target = target.view(1, -1)  # make it the same shape as output
criterion = nn.MSELoss()

loss = criterion(output, target)
print(loss)

tensor(1.0492, grad_fn=<MseLossBackward>)


In [138]:
print(loss.grad_fn)  # MSELoss
print(loss.grad_fn.next_functions[0][0])  # fc3, the last Linear layer (addmm)
print(loss.grad_fn.next_functions[0][0].next_functions[0][0])  # AccumulateGrad of an fc3 parameter (a leaf), not ReLU


<MseLossBackward object at 0x1219da250>
<AddmmBackward object at 0x1219da3d0>
<AccumulateGrad object at 0x1219da250>


In [139]:
net.zero_grad()     # zeroes the gradient buffers of all parameters

print('conv1.bias.grad before backward')
print(net.conv1.bias.grad)

loss.backward()

print('conv1.bias.grad after backward')
print(net.conv1.bias.grad)


conv1.bias.grad before backward
tensor([0., 0., 0., 0., 0., 0.])
conv1.bias.grad after backward
tensor([-1.3402e-04,  1.4335e-02, -2.1720e-05, -4.2137e-03, -2.3313e-03,
         6.1429e-03])
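
The gradients are cleared before `backward()` because PyTorch adds each newly computed gradient to whatever is already stored in `.grad`. A minimal sketch of that accumulation, on a small illustrative tensor rather than the network:

```python
import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)

(w * 3).sum().backward()
first = w.grad.clone()      # gradient of sum(3 * w) is 3 for each entry

(w * 3).sum().backward()    # a second backward pass adds to the stored gradient
second = w.grad.clone()

w.grad.zero_()              # reset before the next pass
print(first, second, w.grad)
```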


In [140]:
import torch.optim as optim

# create your optimizer
optimizer = optim.SGD(net.parameters(), lr=0.01)

# in your training loop:
optimizer.zero_grad()   # zero the gradient buffers
output = net(input)
loss = criterion(output, target)
loss.backward()
optimizer.step()    # Does the update
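
The cell above performs a single update. In practice the same four calls are repeated in a loop. The following sketch does that on a hypothetical toy regression problem (the data, the model, and the learning rate are assumptions made for illustration) and records the loss at every step.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)

# Hypothetical toy problem: recover y = 2*x0 - x1 from 64 random samples.
inputs = torch.randn(64, 2)
targets = (2 * inputs[:, 0] - inputs[:, 1]).unsqueeze(1)

model = nn.Linear(2, 1)
criterion = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=0.1)

losses = []
for step in range(100):
    optimizer.zero_grad()               # clear gradients from the previous step
    loss = criterion(model(inputs), targets)
    loss.backward()                     # compute gradients of the loss
    optimizer.step()                    # update the parameters
    losses.append(loss.item())

print(losses[0], losses[-1])
```

Comparing the first and last recorded losses is a quick check that the updates are actually reducing the error.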
