## LeNet-5-Inspired Network in PyTorch

This code started as a shallow network, was extended to a dense network, and was then converted into a CNN. As a result, some variable names no longer match what they hold (e.g. `X_flat`).

#### Load dependencies

In [3]:
!pip install torch-summary
import torch
import torch.nn as nn

import torchvision
from torchvision.datasets import MNIST
from torchvision import transforms

from torchsummary import summary

import matplotlib.pyplot as plt





#### Load data

In [4]:
train = MNIST('data', train=True, transform=transforms.ToTensor(), download=True)
test = MNIST('data', train=False, transform=transforms.ToTensor())
# ...ToTensor() converts each image to a float tensor and scales pixels from [0, 255] to [0, 1]

Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz to data\MNIST\raw\train-images-idx3-ubyte.gz



Extracting data\MNIST\raw\train-images-idx3-ubyte.gz to data\MNIST\raw

Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz to data\MNIST\raw\train-labels-idx1-ubyte.gz



Extracting data\MNIST\raw\train-labels-idx1-ubyte.gz to data\MNIST\raw

Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz to data\MNIST\raw\t10k-images-idx3-ubyte.gz



Extracting data\MNIST\raw\t10k-images-idx3-ubyte.gz to data\MNIST\raw

Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz to data\MNIST\raw\t10k-labels-idx1-ubyte.gz



Extracting data\MNIST\raw\t10k-labels-idx1-ubyte.gz to data\MNIST\raw



#### Batch data

In [5]:
train_loader = torch.utils.data.DataLoader(train, batch_size=128, shuffle=True) 
test_loader = torch.utils.data.DataLoader(test, batch_size=128) 
# ...DataLoader() also accepts a custom sampler and can load batches in parallel worker processes (num_workers)

In [6]:
X_sample, y_sample = next(iter(train_loader)) # the iterator has no .next() method in current PyTorch

In [7]:
X_flat_sample_flat = X_sample.view(X_sample.shape[0], -1) # view() returns a reshaped view of the same data; -1 infers the remaining size (784)

In [8]:
X_flat_sample_flat.shape

torch.Size([128, 784])

In [9]:
X_flat_sample_2d = X_sample.view(X_sample.shape[0], 1, 28, 28) # batches already have this shape, so this view() changes nothing

In [10]:
X_flat_sample_2d.shape

torch.Size([128, 1, 28, 28])
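
What makes `view()` easy to misread is that it does not copy anything: the result shares memory with the original tensor. A small sketch on a toy tensor (the names `x` and `v` are only for illustration):

```python
import torch

x = torch.arange(6.0)   # tensor([0., 1., 2., 3., 4., 5.])
v = x.view(2, -1)       # -1 lets PyTorch infer the second dimension (3)

v[0, 0] = 99.0          # writing through the view...
print(x[0].item())      # ...also changes the original tensor: 99.0
print(v.shape)          # torch.Size([2, 3])

# Both tensors point at the same underlying storage.
same_storage = v.data_ptr() == x.data_ptr()
print(same_storage)     # True
```

If an independent copy is needed, calling `.clone()` on the result is one option.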

In [11]:
X_flat_sample_2d[0]

tensor([[[0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
          0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
          0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
          0.0000, 0.0000, 0.0000, 0.0000],
         [0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
          0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
          0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
          0.0000, 0.0000, 0.0000, 0.0000],
         [0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
          0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
          0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
          0.0000, 0.0000, 0.0000, 0.0000],
         [0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
          0.0000, 0.0000, 0.0000, 0.0078, 0.4745, 0.9961, 0.9961, 0.3059,
          0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,

#### Design neural network architecture

In [12]:
'''
# build our deep nn model (disabled draft, kept as a string)
model = nn.Sequential( # note: the layers are arguments of Sequential(), so each layer line ends with a comma

    ### hidden layer 1
    # input channels = 1, output channels = 32
    nn.Conv2d(1, 32, kernel_size=(5, 5)), # 32 filters, typically simple line orientations; 28x28 -> 24x24
    nn.ReLU(),

    ### hidden layer 2
    # input channels = 32, output channels = 64
    nn.Conv2d(32, 64, kernel_size=(5, 5)), # 64 nonlinear recombinations of the 32 input features; 24x24 -> 20x20
    nn.ReLU(),

    ### pooling layer
    nn.MaxPool2d((2, 2)), # downsampling: keeps 1 of every 4 activations (discards 75%); 20x20 -> 10x10
    # equivalent: nn.MaxPool2d((2, 2), stride=2), because stride defaults to the kernel size

    ### dropout layer
    nn.Dropout2d(0.3), # helps to generalize to unseen data

    ### flatten layer
    nn.Flatten(), # [N, 64, 10, 10] -> [N, 6400]

    ### hidden layer 3
    nn.Linear(64 * 10 * 10, 128),
    nn.ReLU(),

    ### dropout layer
    nn.Dropout(0.5), # helps to generalize to unseen data

    ### output layer
    # input features = 128, output features = 10 for our 10 defined classes
    nn.Linear(128, 10), # no ReLU here: CrossEntropyLoss expects raw logits

)
'''

In [13]:
# build our deep nn model (LeNet-5 style: two conv/pool stages, then three dense layers)
model = nn.Sequential(
    
    nn.Conv2d(1, 6, 5),    # 1 -> 6 channels, 5x5 kernel: 28x28 -> 24x24
    nn.ReLU(),
    nn.MaxPool2d(2, 2),    # 24x24 -> 12x12
    
    nn.Conv2d(6, 16, 5),   # 6 -> 16 channels, 5x5 kernel: 12x12 -> 8x8
    nn.ReLU(),
    nn.MaxPool2d(2, 2),    # 8x8 -> 4x4

    nn.Flatten(),          # [N, 16, 4, 4] -> [N, 256]
    
    nn.Linear(256, 120),
    nn.ReLU(),
    nn.Dropout(0.25),

    nn.Linear(120, 84),
    nn.ReLU(),
    nn.Dropout(0.5),

    nn.Linear(84, 10),     # raw logits for the 10 digit classes; no activation here

)
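
The input size of `nn.Linear(256, 120)` and the parameter counts follow from simple arithmetic. The sketch below recomputes them in plain Python; the helper names (`conv_out`, `pool_out`, `conv_params`, `linear_params`) are made up for this illustration and assume stride-1, unpadded convolutions and non-overlapping pooling, as in the model above:

```python
def conv_out(size, kernel):
    """Spatial size after a stride-1 convolution without padding."""
    return size - kernel + 1

def pool_out(size, window):
    """Spatial size after non-overlapping max pooling."""
    return size // window

def conv_params(in_ch, out_ch, kernel):
    """Weights (in_ch * out_ch * kernel^2) plus one bias per output channel."""
    return in_ch * out_ch * kernel * kernel + out_ch

def linear_params(n_in, n_out):
    """Weights (n_in * n_out) plus one bias per output unit."""
    return n_in * n_out + n_out

size = 28
size = pool_out(conv_out(size, 5), 2)  # 28 -> 24 -> 12
size = pool_out(conv_out(size, 5), 2)  # 12 -> 8 -> 4
flat_features = 16 * size * size       # 16 channels of 4x4 -> 256

param_counts = [
    conv_params(1, 6, 5),      # 156
    conv_params(6, 16, 5),     # 2416
    linear_params(256, 120),   # 30840
    linear_params(120, 84),    # 10164
    linear_params(84, 10),     # 850
]
print(flat_features, param_counts, sum(param_counts))
```

The first four counts match the `Param #` column that `summary()` reports for the corresponding layers.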

In [14]:
summary(model, (1, 28, 28))

Layer (type:depth-idx)                   Output Shape              Param #
├─Conv2d: 1-1                            [-1, 6, 24, 24]           156
├─ReLU: 1-2                              [-1, 6, 24, 24]           --
├─MaxPool2d: 1-3                         [-1, 6, 12, 12]           --
├─Conv2d: 1-4                            [-1, 16, 8, 8]            2,416
├─ReLU: 1-5                              [-1, 16, 8, 8]            --
├─MaxPool2d: 1-6                         [-1, 16, 4, 4]            --
├─Flatten: 1-7                           [-1, 256]                 --
├─Linear: 1-8                            [-1, 120]                 30,840
├─ReLU: 1-9                              [-1, 120]                 --
├─Dropout: 1-10                          [-1, 120]                 --
├─Linear: 1-11                           [-1, 84]                  10,164
├─ReLU: 1-12                             [-1, 84]                  --
├─Dropout: 1-13                          [-1, 84]                  --
├─L

#### Configure training hyperparameters

In [15]:
cost_fxn = nn.CrossEntropyLoss() # applies log-softmax internally, so the model must output raw logits
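
To see why the model needs no softmax layer of its own, the loss can be compared with an explicit log-softmax followed by the negative log-likelihood loss. A minimal sketch with random stand-in logits (the tensors here are illustrative, not model outputs):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 10)             # stand-in for raw model outputs: 4 samples, 10 classes
targets = torch.tensor([3, 0, 7, 1])    # stand-in class labels

loss_ce = nn.CrossEntropyLoss()(logits, targets)
loss_manual = F.nll_loss(F.log_softmax(logits, dim=1), targets)

print(torch.allclose(loss_ce, loss_manual))  # True
```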

In [16]:
# optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
# optimizer = torch.optim.Adam(model.parameters())
optimizer = torch.optim.NAdam(model.parameters())

#### Train

In [17]:
def accuracy_pct(pred_y, true_y):
  _, prediction = torch.max(pred_y, 1) # returns maximum values, indices; fed tensor, dim to reduce
  correct = (prediction == true_y).sum().item()
  return (correct / true_y.shape[0]) * 100.0
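
A quick sanity check of the accuracy metric on hand-made values can help catch off-by-one or dimension mistakes. The sketch below repeats the function (using `argmax`, which returns the same indices as the second output of `torch.max`) so that it runs on its own; the logits and labels are invented for the example:

```python
import torch

def accuracy_pct(pred_y, true_y):
    prediction = pred_y.argmax(dim=1)  # index of the largest logit per row
    correct = (prediction == true_y).sum().item()
    return (correct / true_y.shape[0]) * 100.0

# Four samples, three classes; the third row is deliberately misclassified.
logits = torch.tensor([[2.0, 0.1, 0.3],
                       [0.2, 1.5, 0.1],
                       [0.9, 0.8, 0.1],
                       [0.1, 0.2, 3.0]])
labels = torch.tensor([0, 1, 1, 2])

print(accuracy_pct(logits, labels))  # 75.0
```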

In [18]:
n_batches = len(train_loader)
n_batches

469
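
The batch count is the dataset size divided by the batch size, rounded up. A short sketch of that arithmetic, assuming the standard MNIST split sizes of 60,000 training and 10,000 test images:

```python
import math

batch_size = 128
n_train, n_test = 60_000, 10_000  # standard MNIST split sizes (assumed here)

train_batches = math.ceil(n_train / batch_size)
test_batches = math.ceil(n_test / batch_size)

# The final batch of each loader is smaller than the rest.
last_train_batch = n_train - (train_batches - 1) * batch_size
last_test_batch = n_test - (test_batches - 1) * batch_size

print(train_batches, last_train_batch)  # 469 96
print(test_batches, last_test_batch)    # 79 16
```

Because the last batch is smaller, averaging per-batch metrics with equal weights gives its samples slightly more influence than the others. The effect is small at these sizes.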

In [19]:
n_epochs = 15

print('Training for {} epochs. \n'.format(n_epochs))

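model.train() # make sure dropout is active during training

for epoch in range(n_epochs):
  
  avg_cost = 0.0
  avg_accuracy = 0.0
  
  for i, (X, y) in enumerate(train_loader): # enumerate() provides count of iterations  
    
    # forward propagation:
    # X_flat = X.view(X.shape[0], -1) # flattening was only needed for the dense network
    X_flat = X.view(X.shape[0], 1, 28, 28) # batches already have this shape; name kept from the dense version
    y_hat = model(X_flat)
    cost = cost_fxn(y_hat, y)
    avg_cost += cost.item() / n_batches # .item() gives a plain float, so no autograd graph is carried along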
    
    # backprop and optimization via gradient descent: 
    optimizer.zero_grad() # set gradients to zero; .backward() accumulates them in buffers
    cost.backward()
    optimizer.step()
    
    # calculate accuracy metric:
    accuracy = accuracy_pct(y_hat, y)
    avg_accuracy += accuracy / n_batches
    
    if (i + 1) % 100 == 0:
      print('Step {}'.format(i + 1))
    
  print('Epoch {}/{} complete: Cost: {:.3f}, Accuracy: {:.1f}% \n'
        .format(epoch + 1, n_epochs, avg_cost, avg_accuracy)) 

print('Training complete.')

Training for 15 epochs. 

Step 100
Step 200
Step 300
Step 400
Epoch 1/15 complete: Cost: 0.379, Accuracy: 88.4% 

Step 100
Step 200
Step 300
Step 400
Epoch 2/15 complete: Cost: 0.119, Accuracy: 96.7% 

Step 100
Step 200
Step 300
Step 400
Epoch 3/15 complete: Cost: 0.092, Accuracy: 97.5% 

Step 100
Step 200
Step 300
Step 400
Epoch 4/15 complete: Cost: 0.077, Accuracy: 98.0% 

Step 100
Step 200
Step 300
Step 400
Epoch 5/15 complete: Cost: 0.063, Accuracy: 98.2% 

Step 100
Step 200
Step 300
Step 400
Epoch 6/15 complete: Cost: 0.055, Accuracy: 98.5% 

Step 100
Step 200
Step 300
Step 400
Epoch 7/15 complete: Cost: 0.051, Accuracy: 98.5% 

Step 100
Step 200
Step 300
Step 400
Epoch 8/15 complete: Cost: 0.046, Accuracy: 98.7% 

Step 100
Step 200
Step 300
Step 400
Epoch 9/15 complete: Cost: 0.044, Accuracy: 98.8% 

Step 100
Step 200
Step 300
Step 400
Epoch 10/15 complete: Cost: 0.041, Accuracy: 98.9% 

Step 100
Step 200
Step 300
Step 400
Epoch 11/15 complete: Cost: 0.038, Accuracy: 98.9% 

Step
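
The `optimizer.zero_grad()` call in the loop matters because `.backward()` adds to existing gradients instead of overwriting them. A minimal sketch with a single scalar parameter (the name `w` and the function `3 * w` are purely illustrative):

```python
import torch

w = torch.tensor(1.0, requires_grad=True)

(3.0 * w).backward()
first = w.grad.item()         # d(3w)/dw = 3.0

(3.0 * w).backward()          # no zeroing in between
accumulated = w.grad.item()   # 3.0 + 3.0 = 6.0

w.grad.zero_()                # what zero_grad() does for every parameter
(3.0 * w).backward()
after_reset = w.grad.item()   # back to 3.0

print(first, accumulated, after_reset)
```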

#### Test model

In [20]:
n_test_batches = len(test_loader)
n_test_batches

79

In [21]:
model.eval() # evaluation mode: disables dropout (and would make batch norm use its running statistics)

with torch.no_grad(): # disables autograd, reducing memory consumption
  
  avg_test_cost = 0.0
  avg_test_acc = 0.0
  
  for X, y in test_loader:
    
    # make predictions: 
    # X_flat = X.view(X.shape[0], -1) # flattening to [N, 784] was only needed for the dense network
    X_flat = X.view(X.shape[0], 1, 28, 28)
    y_hat = model(X_flat)
    
    # calculate cost: 
    cost = cost_fxn(y_hat, y)
    avg_test_cost += cost.item() / n_test_batches
    
    # calculate accuracy:
    test_accuracy = accuracy_pct(y_hat, y)
    avg_test_acc += test_accuracy / n_test_batches

print('Test cost: {:.3f}, Test accuracy: {:.1f}%'.format(avg_test_cost, avg_test_acc))

# model.train() # 'undoes' model.eval()

Test cost: 0.036, Test accuracy: 99.1%
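
The effect of `model.eval()` on dropout can be observed in isolation. The sketch below applies a standalone `nn.Dropout` layer to a tensor of ones in both modes; the layer and input are stand-ins, not part of the trained model:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
drop = nn.Dropout(0.5)
x = torch.ones(1000)

drop.train()
train_out = drop(x)  # each element is either zeroed or scaled by 1 / (1 - 0.5) = 2

drop.eval()
eval_out = drop(x)   # dropout is a no-op in evaluation mode

print(train_out.unique())         # only the values 0 and 2 can occur
print(torch.equal(eval_out, x))   # True
```

Forgetting `model.eval()` before testing would leave dropout active and typically make the reported test metrics noisier and worse than they should be.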
