## PyTorch exercises

### Tensors

1. Make a tensor of size (2, 17)
2. Make a torch.FloatTensor of size (3, 1)
3. Make a torch.LongTensor of size (5, 2, 1)
  - fill the entire tensor with 7s
4. Make a torch.ByteTensor of size (5,)
  - fill the middle 3 indices with ones such that it records [0, 1, 1, 1, 0]
5. Perform a matrix multiplication of two tensors of size (2, 4) and (4, 2). Then do it in-place.
6. Do element-wise multiplication of two randomly filled $(n_1,n_2,n_3)$ tensors. Then store the result in an Numpy array.

### Forward-prop/backward-prop
1. Create a Tensor that `requires_grad` of size (5, 5).
2. Sum the values in the Tensor.
3. Multiply the tensor by 2 and assign the result to a new python variable (i.e. `x = result`)
4. Sum the variable's elements and assign to a new python variable
5. Print the gradients of all the variables
6. Now perform a backward pass on the last variable (NOTE: for each new python variable that you define, call `.retain_grad()`)
7. Print all gradients again

### Deep-forward NNs
1. Use dl_lab2. In Exercise 12 there, you had to build an $L$-layer neural network with the following structure: *[LINEAR -> RELU]$\times$(L-1) -> LINEAR -> SIGMOID*. Reimplement the manual code in PyTorch.
2. Compare test accuracy using different optimizers: SGD, Adam, Momentum.

## Implementing a deep convolutional neural network using PyTorch

### The multilayer CNN architecture

### Loading and preprocessing the data

In [20]:
import torch
import numpy as np
import torch.nn as nn
import torchvision 
from torchvision import transforms 
image_path = './'
transform = transforms.Compose([transforms.ToTensor()])

mnist_dataset = torchvision.datasets.MNIST(root=image_path, 
                                           train=True, 
                                           transform=transform, 
                                           download=True)

from torch.utils.data import Subset
mnist_valid_dataset = Subset(mnist_dataset, torch.arange(10000)) 
mnist_train_dataset = Subset(mnist_dataset, torch.arange(10000, len(mnist_dataset)))
mnist_test_dataset = torchvision.datasets.MNIST(root=image_path, 
                                           train=False, 
                                           transform=transform, 
                                           download=False)

In [21]:
from torch.utils.data import DataLoader


batch_size = 64
torch.manual_seed(1)
train_dl = DataLoader(mnist_train_dataset, batch_size, shuffle=True)
valid_dl = DataLoader(mnist_valid_dataset, batch_size, shuffle=False)

### Implementing a CNN using the torch.nn module

#### Configuring CNN layers in PyTorch

 * **Conv2d:** `torch.nn.Conv2d`
   * `out_channels`
   * `kernel_size`
   * `stride`
   * `padding`
   
   
 * **MaxPool2d:** `torch.nn.MaxPool2d`
   * `kernel_size`
   * `stride`
   * `padding`
   
   
 * **Dropout** `torch.nn.Dropout`
   * `p`

### Constructing a CNN in PyTorch

In [22]:
import torch.nn as nn
model = nn.Sequential()
model.add_module('conv1', nn.Conv2d(in_channels=1, out_channels=32, kernel_size=5, padding=2))
model.add_module('relu1', nn.ReLU())        
model.add_module('pool1', nn.MaxPool2d(kernel_size=2))   
model.add_module('conv2', nn.Conv2d(in_channels=32, out_channels=64, kernel_size=5, padding=2))
model.add_module('relu2', nn.ReLU())        
model.add_module('pool2', nn.MaxPool2d(kernel_size=2))      

x = torch.ones((4, 1, 28, 28))
model(x).shape

torch.Size([4, 64, 7, 7])

In [23]:
model.add_module('flatten', nn.Flatten()) 

x = torch.ones((4, 1, 28, 28))
model(x).shape

torch.Size([4, 3136])

In [24]:
model.add_module('fc1', nn.Linear(3136, 1024)) 
model.add_module('relu3', nn.ReLU()) 
model.add_module('dropout', nn.Dropout(p=0.5)) 

model.add_module('fc2', nn.Linear(1024, 10)) 

In [25]:
device = torch.device("mps")
#device = torch.device("cpu")

model = model.to(device) 

In [26]:
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

def train(model, num_epochs, train_dl, valid_dl):
    loss_hist_train = [0] * num_epochs
    accuracy_hist_train = [0] * num_epochs
    loss_hist_valid = [0] * num_epochs
    accuracy_hist_valid = [0] * num_epochs
    for epoch in range(num_epochs):
        model.train()
        for x_batch, y_batch in train_dl:
            x_batch = x_batch.to(device) 
            y_batch = y_batch.to(device) 
            pred = model(x_batch)
            loss = loss_fn(pred, y_batch)
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()
            loss_hist_train[epoch] += loss.item()*y_batch.size(0)
            is_correct = (torch.argmax(pred, dim=1) == y_batch).float()
            accuracy_hist_train[epoch] += is_correct.sum().cpu()

        loss_hist_train[epoch] /= len(train_dl.dataset)
        accuracy_hist_train[epoch] /= len(train_dl.dataset)
        
        model.eval()
        with torch.no_grad():
            for x_batch, y_batch in valid_dl:
                x_batch = x_batch.to(device) 
                y_batch = y_batch.to(device) 
                pred = model(x_batch)
                loss = loss_fn(pred, y_batch)
                loss_hist_valid[epoch] += loss.item()*y_batch.size(0) 
                is_correct = (torch.argmax(pred, dim=1) == y_batch).float() 
                accuracy_hist_valid[epoch] += is_correct.sum().cpu()

        loss_hist_valid[epoch] /= len(valid_dl.dataset)
        accuracy_hist_valid[epoch] /= len(valid_dl.dataset)
        
        print(f'Epoch {epoch+1} accuracy: {accuracy_hist_train[epoch]:.4f} val_accuracy: {accuracy_hist_valid[epoch]:.4f}')
    return loss_hist_train, loss_hist_valid, accuracy_hist_train, accuracy_hist_valid

torch.manual_seed(1)
num_epochs = 20
hist = train(model, num_epochs, train_dl, valid_dl)

Epoch 1 accuracy: 0.9482 val_accuracy: 0.9816


KeyboardInterrupt: 

In [None]:
import matplotlib.pyplot as plt


x_arr = np.arange(len(hist[0])) + 1

fig = plt.figure(figsize=(12, 4))
ax = fig.add_subplot(1, 2, 1)
ax.plot(x_arr, hist[0], '-o', label='Train loss')
ax.plot(x_arr, hist[1], '--<', label='Validation loss')
ax.set_xlabel('Epoch', size=15)
ax.set_ylabel('Loss', size=15)
ax.legend(fontsize=15)
ax = fig.add_subplot(1, 2, 2)
ax.plot(x_arr, hist[2], '-o', label='Train acc.')
ax.plot(x_arr, hist[3], '--<', label='Validation acc.')
ax.legend(fontsize=15)
ax.set_xlabel('Epoch', size=15)
ax.set_ylabel('Accuracy', size=15)

#plt.savefig('figures/14_13.png')
plt.show()

In [None]:
torch.mps.synchronize()
model_cpu = model.cpu()
pred = model(mnist_test_dataset.data.unsqueeze(1) / 255.)
is_correct = (torch.argmax(pred, dim=1) == mnist_test_dataset.targets).float()
print(f'Test accuracy: {is_correct.mean():.4f}') 

In [None]:
fig = plt.figure(figsize=(12, 4))
for i in range(12):
    ax = fig.add_subplot(2, 6, i+1)
    ax.set_xticks([]); ax.set_yticks([])
    img = mnist_test_dataset[i][0][0, :, :]
    pred = model(img.unsqueeze(0).unsqueeze(1))
    y_pred = torch.argmax(pred)
    ax.imshow(img, cmap='gray_r')
    ax.text(0.9, 0.1, y_pred.item(), 
            size=15, color='blue',
            horizontalalignment='center',
            verticalalignment='center', 
            transform=ax.transAxes)
    
    
#plt.savefig('figures/14_14.png')
plt.show()

In [5]:
import os

if not os.path.exists('models'):
    os.mkdir('models')

path = 'models/mnist-cnn.ph'
torch.save(model, path)
 

NameError: name 'torch' is not defined

# Tensors
## 1. Make a tensor of size (2, 17)







In [27]:
torch.Tensor(2, 17)

tensor([[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]])

## 2. Make a torch.FloatTensor of size (3, 1)

In [28]:
torch.FloatTensor(3, 1)

tensor([[0.],
        [0.],
        [0.]])

## 3. Make a torch.LongTensor of size (5, 2, 1)  - fill the entire tensor with 7s

In [29]:
torch.full((5, 2, 1), 7, dtype=torch.long)

tensor([[[7],
         [7]],

        [[7],
         [7]],

        [[7],
         [7]],

        [[7],
         [7]],

        [[7],
         [7]]])

## 4. Make a torch.ByteTensor of size (5,) - fill the middle 3 indices with ones such that it records [0, 1, 1, 1, 0]

In [30]:
x = torch.ByteTensor(5,)
x[1:-1] = 1
x

tensor([0, 1, 1, 1, 0], dtype=torch.uint8)

## 5. Perform a matrix multiplication of two tensors of size (2, 4) and (4, 2). Then do it in-place.

In [31]:
print(torch.mm(torch.Tensor(2, 4), torch.Tensor(4, 2)))

torch.Tensor(2, 4).mm(torch.Tensor(4, 2))

tensor([[0., 0.],
        [0., 0.]])


tensor([[0., 0.],
        [0., 0.]])

## 6. Do element-wise multiplication of two randomly filled $(n_1,n_2,n_3)$ tensors. Then store the result in an Numpy array.

In [32]:
n1, n2, n3 = 6, 1, 5

torch.mul(torch.randn(n1, n2, n3), torch.randn(n1, n2, n3)).numpy()

array([[[ 0.28335887,  0.35737008,  1.0660723 , -0.35816902,
         -0.4373236 ]],

       [[-0.26933527, -0.17327414,  2.0608163 , -0.61397487,
         -0.3014985 ]],

       [[ 0.15328172, -0.5223642 ,  0.24036913,  1.5886065 ,
         -0.3230379 ]],

       [[ 0.12424187, -1.0215594 ,  0.12761118, -0.03351678,
          0.36481565]],

       [[ 0.95602643, -1.3625723 ,  0.70075905,  0.554543  ,
         -0.22817194]],

       [[-1.1015437 , -1.0414515 ,  0.48947513,  0.00321078,
         -0.6435812 ]]], dtype=float32)

# Forward-prop/backward-prop
## 1. Create a Tensor that `requires_grad` of size (5, 5).






In [33]:
tensor = torch.randn(5, 5, requires_grad=True)
tensor

tensor([[-0.9276,  1.1120,  0.6155,  0.1938, -2.5832],
        [ 0.8539,  1.2466,  0.5057, -1.4782,  0.6147],
        [ 0.7124, -1.7765,  0.3539,  1.1996, -0.3030],
        [-1.7618,  0.6348, -0.7893, -1.6111, -1.8716],
        [ 0.5431,  0.6607,  2.2952,  0.6749,  1.7133]], requires_grad=True)

## 2. Sum the values in the Tensor.

In [34]:
tensor.sum().item()

0.8278233408927917

## 3. Multiply the tensor by 2 and assign the result to a new python variable (i.e. `x = result`)

In [35]:
x = tensor * 2
x

tensor([[-1.8552,  2.2239,  1.2311,  0.3876, -5.1664],
        [ 1.7079,  2.4931,  1.0114, -2.9565,  1.2295],
        [ 1.4247, -3.5530,  0.7077,  2.3992, -0.6060],
        [-3.5237,  1.2697, -1.5785, -3.2222, -3.7432],
        [ 1.0862,  1.3214,  4.5904,  1.3498,  3.4266]], grad_fn=<MulBackward0>)

## 4. Sum the variable's elements and assign to a new python variable

In [36]:
y = x.sum()
y.item()

1.6556466817855835

## 5. Print the gradients of all the variables


In [37]:
y.backward()

tensor.grad

tensor([[2., 2., 2., 2., 2.],
        [2., 2., 2., 2., 2.],
        [2., 2., 2., 2., 2.],
        [2., 2., 2., 2., 2.],
        [2., 2., 2., 2., 2.]])

## 6. Now perform a backward pass on the last variable (NOTE: for each new python variable that you define, call `.retain_grad()`)

In [38]:
tensor = torch.randn(5, 5, requires_grad=True)
tensor.retain_grad()

output = tensor * 2
output.retain_grad()  

output.sum().backward()

tensor.grad

tensor([[2., 2., 2., 2., 2.],
        [2., 2., 2., 2., 2.],
        [2., 2., 2., 2., 2.],
        [2., 2., 2., 2., 2.],
        [2., 2., 2., 2., 2.]])

## 7. Print all gradients again

In [39]:
print(tensor.grad)
output.grad

tensor([[2., 2., 2., 2., 2.],
        [2., 2., 2., 2., 2.],
        [2., 2., 2., 2., 2.],
        [2., 2., 2., 2., 2.],
        [2., 2., 2., 2., 2.]])


tensor([[1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.]])

# Deep-forward NNs
## 1. Use dl_lab2. In Exercise 12 there, you had to build an $L$-layer neural network with the following structure: *[LINEAR -> RELU]$\times$(L-1) -> LINEAR -> SIGMOID*. Reimplement the manual code in PyTorch.


In [58]:
def load_data():
    train_dataset = h5py.File('datasets/train_catvnoncat.h5', "r")
    train_set_x_orig = np.array(train_dataset["train_set_x"][:]) # your train set features
    train_set_y_orig = np.array(train_dataset["train_set_y"][:]) # your train set labels

    test_dataset = h5py.File('datasets/test_catvnoncat.h5', "r")
    test_set_x_orig = np.array(test_dataset["test_set_x"][:]) # your test set features
    test_set_y_orig = np.array(test_dataset["test_set_y"][:]) # your test set labels

    classes = np.array(test_dataset["list_classes"][:]) # the list of classes
    
    train_set_y_orig = train_set_y_orig.reshape((1, train_set_y_orig.shape[0]))
    test_set_y_orig = test_set_y_orig.reshape((1, test_set_y_orig.shape[0]))
    
    return train_set_x_orig, train_set_y_orig, test_set_x_orig, test_set_y_orig, classes

In [59]:
train_x_orig, train_y, test_x_orig, test_y, classes = load_data()


train_x_flatten = train_x_orig.reshape(train_x_orig.shape[0], -1).T   # The "-1" makes reshape flatten the remaining dimensions
test_x_flatten = test_x_orig.reshape(test_x_orig.shape[0], -1).T

train_x = train_x_flatten/255.
test_x = test_x_flatten/255.

In [220]:
class LLayerNN(nn.Module):
    def __init__(self, layer_dims):
        super(LLayerNN, self).__init__()
        self.layer_dims = layer_dims
        self.num_layers = len(layer_dims) - 1
        self.layers = nn.ModuleList()

        for l in range(1, self.num_layers):
            self.layers.append(nn.Linear(layer_dims[l-1], layer_dims[l]))
            self.layers.append(nn.ReLU())

        self.layers.append(nn.Linear(layer_dims[-2], layer_dims[-1]))
        self.layers.append(nn.Sigmoid())

    def forward(self, x):
        for layer in self.layers:
            x = layer(x.to(torch.float32))
        return x

In [274]:
layer_dims = [12288, 10000, 3000, 1000, 500, 100, 50, 20, 7, 5, 1]

model = LLayerNN(layer_dims)
model

LLayerNN(
  (layers): ModuleList(
    (0): Linear(in_features=12288, out_features=10000, bias=True)
    (1): ReLU()
    (2): Linear(in_features=10000, out_features=3000, bias=True)
    (3): ReLU()
    (4): Linear(in_features=3000, out_features=1000, bias=True)
    (5): ReLU()
    (6): Linear(in_features=1000, out_features=500, bias=True)
    (7): ReLU()
    (8): Linear(in_features=500, out_features=100, bias=True)
    (9): ReLU()
    (10): Linear(in_features=100, out_features=50, bias=True)
    (11): ReLU()
    (12): Linear(in_features=50, out_features=20, bias=True)
    (13): ReLU()
    (14): Linear(in_features=20, out_features=7, bias=True)
    (15): ReLU()
    (16): Linear(in_features=7, out_features=5, bias=True)
    (17): ReLU()
    (18): Linear(in_features=5, out_features=1, bias=True)
    (19): Sigmoid()
  )
)

## 2. Compare test accuracy using different optimizers: SGD, Adam, Momentum.

In [281]:
def train(model, train_x, train_y, test_x, test_y, num_epochs=100, optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0), criterion=nn.CrossEntropyLoss()):
    loss_hist_train = [0] * num_epochs
    accuracy_hist_train = [0] * num_epochs
    loss_hist_valid = [0] * num_epochs
    accuracy_hist_valid = [0] * num_epochs
    
    for epoch in range(num_epochs):
        X = torch.tensor(train_x.T, dtype=torch.float32)
        Y = torch.tensor(train_y, dtype=torch.float32)
        model.train()
        outputs = model(X)
        
        loss = criterion(outputs.T.to(torch.float32), Y)
        
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
        loss_hist_train[epoch] += loss.item()
        is_correct = (torch.argmax(outputs, dim=1) == torch.tensor(train_y, dtype=torch.float32)).float()
        accuracy_hist_train[epoch] += is_correct.sum().cpu()
        
        loss_hist_train[epoch] /= len(train_dl.dataset)
        accuracy_hist_train[epoch] /= len(train_dl.dataset)
        
        model.eval()
        
        with torch.no_grad():
            
            X_test = torch.tensor(test_x.T, dtype=torch.float32)
            Y_test = torch.tensor(test_y, dtype=torch.float32)
            
            pred_test = model(X_test)
            loss = criterion(pred_test.T.to(torch.float32), Y_test)
            loss_hist_valid[epoch] += loss.item() 
            is_correct = (torch.argmax(pred_test, dim=1) == Y_test).float() 
            accuracy_hist_valid[epoch] += is_correct.sum().cpu()

        loss_hist_valid[epoch] /= len(valid_dl.dataset)
        accuracy_hist_valid[epoch] /= len(valid_dl.dataset)
        
        print(f'Epoch {epoch+1} accuracy: {accuracy_hist_train[epoch]:.4f} val_accuracy: {accuracy_hist_valid[epoch]:.4f}')

    


### SGD

In [282]:
train(model, train_x, train_y, test_x, test_y)

Epoch 1 accuracy: 0.0027 val_accuracy: 0.0017
Epoch 2 accuracy: 0.0027 val_accuracy: 0.0017
Epoch 3 accuracy: 0.0027 val_accuracy: 0.0017
Epoch 4 accuracy: 0.0027 val_accuracy: 0.0017
Epoch 5 accuracy: 0.0027 val_accuracy: 0.0017
Epoch 6 accuracy: 0.0027 val_accuracy: 0.0017
Epoch 7 accuracy: 0.0027 val_accuracy: 0.0017
Epoch 8 accuracy: 0.0027 val_accuracy: 0.0017
Epoch 9 accuracy: 0.0027 val_accuracy: 0.0017
Epoch 10 accuracy: 0.0027 val_accuracy: 0.0017
Epoch 11 accuracy: 0.0027 val_accuracy: 0.0017
Epoch 12 accuracy: 0.0027 val_accuracy: 0.0017
Epoch 13 accuracy: 0.0027 val_accuracy: 0.0017
Epoch 14 accuracy: 0.0027 val_accuracy: 0.0017
Epoch 15 accuracy: 0.0027 val_accuracy: 0.0017
Epoch 16 accuracy: 0.0027 val_accuracy: 0.0017
Epoch 17 accuracy: 0.0027 val_accuracy: 0.0017
Epoch 18 accuracy: 0.0027 val_accuracy: 0.0017
Epoch 19 accuracy: 0.0027 val_accuracy: 0.0017
Epoch 20 accuracy: 0.0027 val_accuracy: 0.0017
Epoch 21 accuracy: 0.0027 val_accuracy: 0.0017
Epoch 22 accuracy: 0.0

### Adam

In [283]:
train(model, train_x, train_y, test_x, test_y, optimizer = torch.optim.Adam(model.parameters(), lr=0.0001), num_epochs=10)

Epoch 1 accuracy: 0.0027 val_accuracy: 0.0017
Epoch 2 accuracy: 0.0027 val_accuracy: 0.0017
Epoch 3 accuracy: 0.0027 val_accuracy: 0.0017
Epoch 4 accuracy: 0.0027 val_accuracy: 0.0017
Epoch 5 accuracy: 0.0027 val_accuracy: 0.0017
Epoch 6 accuracy: 0.0027 val_accuracy: 0.0017
Epoch 7 accuracy: 0.0027 val_accuracy: 0.0017
Epoch 8 accuracy: 0.0027 val_accuracy: 0.0017
Epoch 9 accuracy: 0.0027 val_accuracy: 0.0017
Epoch 10 accuracy: 0.0027 val_accuracy: 0.0017


### Momentum

In [284]:
train(model, train_x, train_y, test_x, test_y, optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9), num_epochs=10)

Epoch 1 accuracy: 0.0027 val_accuracy: 0.0017
Epoch 2 accuracy: 0.0027 val_accuracy: 0.0017
Epoch 3 accuracy: 0.0027 val_accuracy: 0.0017
Epoch 4 accuracy: 0.0027 val_accuracy: 0.0017
Epoch 5 accuracy: 0.0027 val_accuracy: 0.0017
Epoch 6 accuracy: 0.0027 val_accuracy: 0.0017
Epoch 7 accuracy: 0.0027 val_accuracy: 0.0017
Epoch 8 accuracy: 0.0027 val_accuracy: 0.0017
Epoch 9 accuracy: 0.0027 val_accuracy: 0.0017
Epoch 10 accuracy: 0.0027 val_accuracy: 0.0017
