<font color='blue'>
    
# ECE449 Machine Learning - Assignment 2

## Task 3: LeNet with PyTorch


[//]: # "Notebook Created by Jinghua Wang (jinghuawang@intl.zju.edu.cn), last modified on 2020-11-04"

</font>

*LeNet* was one of the first published CNNs to capture wide attention for its performance on image classification tasks.
The model was introduced by (and named for) Yann LeCun, then a researcher at AT&T Bell Labs, to recognize handwritten digits in images. It is described in the paper "Gradient-Based Learning Applied to Document Recognition" by Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner.
The paper is available online at: http://yann.lecun.com/exdb/publis/pdf/lecun-98.pdf.


## LeNet

At a high level, LeNet (LeNet-5) consists of two parts:

(i) A convolutional encoder consisting of two convolutional layers.

(ii) A dense block consisting of three fully-connected layers.

The figure below shows the data flow in LeNet: the input is a handwritten digit, and the output is a probability over 10 possible outcomes.
<img src="img/lenet.png" width="1000">


We took a small liberty with the original LeNet-5 model by removing the Gaussian activation in the final layer.
Other than that, our LeNet architecture should match the original LeNet-5 architecture.

The figure below shows our LeNet architecture in more structural detail. Study it carefully before you start your LeNet implementation.

<img src="img/lenet-vert.png" width="200">

From a structural perspective, note how the height and width of each layer's output change through the convolutional block:

The first convolutional layer uses 2 pixels of padding to compensate for the reduction in height and width that would otherwise result from using a $5 \times 5$ kernel.

The second convolutional layer forgoes padding, and thus the height and width are both reduced by 4 pixels. 

As we go up the stack of layers, the number of channels increases layer-over-layer from 1 in the input to 6 after the first convolutional layer and 16 after the second convolutional layer.

Meanwhile, each pooling layer halves the height and width.

Finally, each fully-connected layer reduces dimensionality, with the last one emitting an output whose dimension matches the number of classes in our classification problem.
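
The shape changes described above follow from the usual output-size rule for convolution and pooling layers, $\lfloor (n + 2p - k)/s \rfloor + 1$. Here is a minimal sketch in plain Python, assuming $28 \times 28$ inputs, $5 \times 5$ convolution kernels with stride 1, and $2 \times 2$ pooling with stride 2. The helper name `out_size` is ours and is not part of the assignment template.

```python
def out_size(n, kernel, padding=0, stride=1):
    """Spatial output size of a conv/pool layer along one dimension."""
    return (n + 2 * padding - kernel) // stride + 1

conv1 = out_size(28, kernel=5, padding=2)    # padding of 2 keeps the size at 28
pool1 = out_size(conv1, kernel=2, stride=2)  # pooling halves it to 14
conv2 = out_size(pool1, kernel=5)            # no padding: 14 - 4 = 10
pool2 = out_size(conv2, kernel=2, stride=2)  # pooling halves it to 5
flat = 16 * pool2 * pool2                    # 16 channels flattened

print(conv1, pool1, conv2, pool2, flat)      # 28 14 10 5 400
```

The final value, 400, is the input dimension that the first fully-connected layer has to accept.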

Now, let's build LeNet and train this CNN using PyTorch.

<font color='blue'>
    
### We provide this notebook so that you can test and debug your Task 3 implementation in your .py file by yourself. You don't need to modify any code in this Jupyter notebook, but you are welcome to change it if you want to experiment with this example model training process.
    
### You don't need to submit this notebook.
    
</font>

In [1]:
# Automatically reload your code from task3_template.py whenever it changes.
# Remember to rename task3_template.py before you submit it.
%load_ext autoreload
%autoreload 2
%matplotlib notebook

In [2]:
import torch
from torch import nn

from task3_template import *

### 1. Load your LeNet-5 model

In [3]:
net = get_lenet()

In [4]:
X = torch.randn(size=(1, 1, 28, 28), dtype=torch.float32)
print("The output data shapes of all layers in your lenet are:")
for layer in net:
    X = layer(X)
    print(layer.__class__.__name__,'output shape: \t',X.shape)

The output data shapes of all layers in your lenet are:
Reshape output shape: 	 torch.Size([1, 1, 28, 28])
Conv2d output shape: 	 torch.Size([1, 6, 28, 28])
Sigmoid output shape: 	 torch.Size([1, 6, 28, 28])
AvgPool2d output shape: 	 torch.Size([1, 6, 14, 14])
Conv2d output shape: 	 torch.Size([1, 16, 10, 10])
Sigmoid output shape: 	 torch.Size([1, 16, 10, 10])
AvgPool2d output shape: 	 torch.Size([1, 16, 5, 5])
Flatten output shape: 	 torch.Size([1, 400])
Linear output shape: 	 torch.Size([1, 120])
Sigmoid output shape: 	 torch.Size([1, 120])
Linear output shape: 	 torch.Size([1, 84])
Sigmoid output shape: 	 torch.Size([1, 84])
Linear output shape: 	 torch.Size([1, 10])


In [5]:
print("The output data shapes of all layers in correct lenet are:")
print(get_correct_lenet_shape_str())

The output data shapes of all layers in correct lenet are:
Reshape output shape: 	 torch.Size([1, 1, 28, 28])
Conv2d output shape: 	 torch.Size([1, 6, 28, 28])
Sigmoid output shape: 	 torch.Size([1, 6, 28, 28])
AvgPool2d output shape: 	 torch.Size([1, 6, 14, 14])
Conv2d output shape: 	 torch.Size([1, 16, 10, 10])
Sigmoid output shape: 	 torch.Size([1, 16, 10, 10])
AvgPool2d output shape: 	 torch.Size([1, 16, 5, 5])
Flatten output shape: 	 torch.Size([1, 400])
Linear output shape: 	 torch.Size([1, 120])
Sigmoid output shape: 	 torch.Size([1, 120])
Linear output shape: 	 torch.Size([1, 84])
Sigmoid output shape: 	 torch.Size([1, 84])
Linear output shape: 	 torch.Size([1, 10])


#### The layer names and output data shapes printed by the two cells above should be identical.
#### (Please do not modify the layer names)
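
The comparison can also be done programmatically rather than by eye. Here is a minimal sketch, assuming both reports are available as multi-line strings in the format printed above. The helper `diff_shape_reports` is a hypothetical name, not part of the template, and the two short reports are made-up examples.

```python
def diff_shape_reports(yours, expected):
    """Return (line_no, your_line, expected_line) for every mismatching line."""
    a = [ln.strip() for ln in yours.strip().splitlines()]
    b = [ln.strip() for ln in expected.strip().splitlines()]
    mismatches = []
    for i in range(max(len(a), len(b))):
        left = a[i] if i < len(a) else "<missing>"
        right = b[i] if i < len(b) else "<missing>"
        # Compare token by token so tab/space differences are ignored
        if left.split() != right.split():
            mismatches.append((i + 1, left, right))
    return mismatches

# Made-up two-line reports; the first line of `yours` has a wrong shape
expected = """Conv2d output shape: \t torch.Size([1, 6, 28, 28])
AvgPool2d output shape: \t torch.Size([1, 6, 14, 14])"""
yours = """Conv2d output shape: \t torch.Size([1, 6, 24, 24])
AvgPool2d output shape: \t torch.Size([1, 6, 14, 14])"""

for line_no, got, want in diff_shape_reports(yours, expected):
    print(f"line {line_no}: got {got!r}, expected {want!r}")
```

An empty result means the two reports agree line for line.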

###  2. Train your LeNet model

In [6]:
batch_size = 256
train_iter, test_iter = load_data_fashion_mnist(batch_size=batch_size)

In [15]:
i = 0
for X, y in train_iter:
    print(len(X))
    print(len(y))
    i += 1
    if i > 10:
        break

RuntimeError: DataLoader worker (pid(s) 11332, 17452) exited unexpectedly
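
An error like the one above comes from the `DataLoader` worker processes rather than from the model. One common workaround, assuming your data loading code lets you set it, is to load batches in the main process with `num_workers=0`. The sketch below uses random stand-in data. The tensors and sizes are placeholders, not the Fashion-MNIST loader from the template.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in data: 1000 random 1x28x28 "images" with labels in [0, 10)
images = torch.randn(1000, 1, 28, 28)
labels = torch.randint(0, 10, (1000,))
dataset = TensorDataset(images, labels)

# num_workers=0 loads batches in the main process, so no worker
# subprocesses are spawned (slower, but nothing can exit unexpectedly).
loader = DataLoader(dataset, batch_size=256, shuffle=True, num_workers=0)

batch_sizes = [len(X) for X, y in loader]
print(batch_sizes)  # [256, 256, 256, 232]
```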

Now let us train and evaluate the LeNet-5 model using your implementation.

We recommend using deep learning acceleration hardware (such as a CUDA-compatible GPU). However, if you are using a Mac or another computer without a CUDA GPU, that is fine: you only need to train LeNet for 10 epochs.
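
Choosing the hardware at run time can be sketched as follows, using only standard PyTorch calls. How the template's `train_cnn` selects its device is not shown in this notebook, so treat this as an illustration rather than a description of that function.

```python
import torch

# Use the GPU when one is visible to PyTorch, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Tensors (and models) must be moved to the chosen device before use.
x = torch.randn(2, 1, 28, 28).to(device)
print(device, x.device.type)
```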

If you are running without deep learning acceleration hardware, we recommend printing the test batch accuracy for every batch in every epoch during training, which can make debugging easier.
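
Per-batch accuracy can be computed directly from the network outputs. Here is a minimal sketch, assuming the model returns one row of class scores per sample. The helper `batch_accuracy` is a hypothetical name, not a function from the template.

```python
import torch

def batch_accuracy(logits, labels):
    """Fraction of samples in one batch whose arg-max prediction matches the label."""
    predictions = logits.argmax(dim=1)
    return (predictions == labels).float().mean().item()

# Toy batch: 4 samples, 3 classes
logits = torch.tensor([[2.0, 0.1, 0.3],
                       [0.2, 1.5, 0.1],
                       [0.1, 0.2, 3.0],
                       [1.0, 0.9, 0.8]])
labels = torch.tensor([0, 1, 2, 1])
print(batch_accuracy(logits, labels))  # 0.75
```

Inside a training loop, the same call can be applied to each batch's outputs and printed alongside the cost.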

In [7]:
lr, num_epochs = 1.1, 10
# We recommend not starting training until your LeNet-5 model is correct
train_cnn(net, train_iter, test_iter, num_epochs, lr)

epoch: 0 | batch: 0 | cost: 2.31219482421875
epoch: 0 | batch: 50 | cost: 2.3105642795562744
epoch: 0 | batch: 100 | cost: 2.311424493789673
epoch: 0 | batch: 150 | cost: 2.294754981994629
epoch: 0 | batch: 200 | cost: 2.2430548667907715
epoch: 1 | batch: 0 | cost: 1.653808355331421
epoch: 1 | batch: 50 | cost: 1.1637475490570068
epoch: 1 | batch: 100 | cost: 1.0297045707702637
epoch: 1 | batch: 150 | cost: 0.8948513269424438
epoch: 1 | batch: 200 | cost: 0.9997429251670837
epoch: 2 | batch: 0 | cost: 0.7822579145431519
epoch: 2 | batch: 50 | cost: 0.9637014269828796
epoch: 2 | batch: 100 | cost: 0.7609109878540039
epoch: 2 | batch: 150 | cost: 0.633844256401062
epoch: 2 | batch: 200 | cost: 0.8041054606437683
epoch: 3 | batch: 0 | cost: 0.6174492835998535
epoch: 3 | batch: 50 | cost: 0.7399461269378662
epoch: 3 | batch: 100 | cost: 0.6168337464332581
epoch: 3 | batch: 150 | cost: 0.5519801378250122
epoch: 3 | batch: 200 | cost: 0.5882014036178589
epoch: 4 | batch: 0 | cost: 0.70365691

In [8]:
test_dataset_accuracy = evaluate_accuracy_try_gpu(net, test_iter)
print("After model training, your LeNet-5 achieves test dataset accuracy:",test_dataset_accuracy)

After model training, your LeNet-5 achieves test dataset accuracy: 0.8111


#### After 10 epochs of training with a learning rate between 0.9 and 1.1 and a batch size of 256, a test dataset accuracy (on all 10000 test samples) above 0.7000 is usually considered correct and will give you a full score for the train_cnn function.

Congratulations on finishing task 3!