# Lecture 4.1: Convolutional Neural Network

## 1. About Convolutional Neural Network


### 1.1 Transition From Neural Network

#### Hidden Layer Neural Network



> Let's recap what we covered in the Neural Network section using a simple NN with 1 hidden layer (an affine function paired with a non-linear function)
>    
> 1. [Yellow box] Pass input into an affine function $\boldsymbol{y} = A\boldsymbol{x} + \boldsymbol{b}$
> 2. [Pink box] Pass logits to non-linear function, for example sigmoid, tanh (hyperbolic tangent), ReLU, or LeakyReLU
> 3. [Blue box] Pass output of non-linear function to another affine function
> 4. [Red box] Pass output of final affine function to softmax function to get our probability distribution over K classes
> 5. [Purple box] Finally we can get our loss by using our cross entropy function
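
The five steps above can be sketched in a few lines of PyTorch. This is only an illustration: the layer sizes, the fake batch, and the choice of ReLU as the non-linearity are assumptions, not values taken from the lecture figure.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical sizes: 784 inputs (28*28), 100 hidden units, 10 classes
input_dim, hidden_dim, num_classes = 28 * 28, 100, 10

affine1 = nn.Linear(input_dim, hidden_dim)    # [Yellow] y = Ax + b
affine2 = nn.Linear(hidden_dim, num_classes)  # [Blue] second affine function

x = torch.randn(4, input_dim)                 # a batch of 4 fake inputs
labels = torch.tensor([3, 1, 4, 1])           # fake class labels

hidden = torch.relu(affine1(x))               # [Pink] non-linear function
logits = affine2(hidden)                      # [Blue] output of final affine function
probs = F.softmax(logits, dim=1)              # [Red] distribution over K classes
loss = F.nll_loss(torch.log(probs), labels)   # [Purple] cross entropy loss

# F.cross_entropy (like nn.CrossEntropyLoss) fuses the softmax and cross entropy steps
fused_loss = F.cross_entropy(logits, labels)
```

In practice the fused version is preferred because it is numerically more stable than taking the log of an explicit softmax.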

<img src="https://docs.google.com/uc?export=download&id=1Df14oUuKdj_Xi3s-Uy9WE6CBwy1dK-vi" alt="no_image" style="width: 900px;"/>

#### Basic Convolutional Neural Network (CNN)

- A basic CNN just requires 2 additional layers!
    - **Convolution** and **pooling** layers **before our feedforward neural network**

<img src="https://docs.google.com/uc?export=download&id=1K_GFyEwf22jwKSeES1jVckoTBKg7slij" alt="no_image" style="width: 900px;"/>

### 1.2 One Convolutional Layer: High Level View

<img src="https://docs.google.com/uc?export=download&id=1sPfrxnF94eAppJ9pbwJtdqZ72Kj9qEs1" alt="no_image" style="width: 900px;"/>
<img src="https://docs.google.com/uc?export=download&id=1-h5yLnUHl0dS7KM-RD6Y-mJEgJIZ0PHc" alt="no_image" style="width: 900px;"/>
<img src="https://docs.google.com/uc?export=download&id=1sPfrxnF94eAppJ9pbwJtdqZ72Kj9qEs1" alt="no_image" style="width: 900px;"/>
<img src="https://docs.google.com/uc?export=download&id=1lVK_eBUvEZ5ciYLknXxuUt3sjrbG3BKc" alt="no_image" style="width: 900px;"/>
<img src="https://docs.google.com/uc?export=download&id=1uiL61vs8wizkLa1Wg4HVIHA7xgZZjq34" alt="no_image" style="width: 900px;"/>

### 1.3 One Convolutional Layer: High Level View Summary
<img src="https://docs.google.com/uc?export=download&id=1sPfrxnF94eAppJ9pbwJtdqZ72Kj9qEs1" alt="no_image" style="width: 900px;"/>

- As the **kernel is sliding/convolving** across the image $\rightarrow$ 2 operations done **per patch**
    1. Element-wise multiplication
    2. Summation
- More **kernels** $=$ more **feature map channels**
    - Can capture **more information** about the input
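
As a rough sketch of the two per-patch operations, the loop below slides a kernel over an image by hand and compares the result with `F.conv2d`. The 5 x 5 image, the 3 x 3 kernel, and the helper name `slide_kernel` are made up for illustration.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

image = torch.randn(5, 5)    # hypothetical single-channel 5x5 input
kernel = torch.randn(3, 3)   # hypothetical 3x3 kernel

def slide_kernel(image, kernel):
    """Valid (no padding), stride-1 sliding window: multiply element-wise, then sum."""
    k = kernel.shape[0]
    out_h = image.shape[0] - k + 1
    out_w = image.shape[1] - k + 1
    feature_map = torch.zeros(out_h, out_w)
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i:i + k, j:j + k]
            feature_map[i, j] = (patch * kernel).sum()  # 1. multiply, 2. sum
    return feature_map

manual = slide_kernel(image, kernel)

# F.conv2d expects (batch, channels, H, W) inputs and (out_ch, in_ch, kH, kW) weights
builtin = F.conv2d(image.view(1, 1, 5, 5), kernel.view(1, 1, 3, 3))

# Stacking 4 kernels gives 4 feature-map channels, one per kernel
kernels = torch.randn(4, 1, 3, 3)
multi = F.conv2d(image.view(1, 1, 5, 5), kernels)
print(manual.shape, multi.shape)
```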
    

### 1.4 Multiple Convolutional Layers: High Level View
<img src="https://docs.google.com/uc?export=download&id=1Hm9vPsTdwLTOPs-eAwoRmUhAHVs7NuPJ" alt="no_image" style="width: 900px;"/>

### 1.5 Pooling Layer: High Level View
- 2 Common Types
    - Max Pooling
    - Average Pooling
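
A small sketch of the difference between the two pooling types, using a hand-written 4 x 4 input (the values are arbitrary) and a 2 x 2 window:

```python
import torch
import torch.nn as nn

# Arbitrary 4x4 input, reshaped to (batch, channels, H, W)
x = torch.tensor([[1., 2., 5., 6.],
                  [3., 4., 7., 8.],
                  [9., 10., 13., 14.],
                  [11., 12., 15., 16.]]).view(1, 1, 4, 4)

max_pool = nn.MaxPool2d(kernel_size=2)  # stride defaults to the kernel size
avg_pool = nn.AvgPool2d(kernel_size=2)

max_out = max_pool(x)  # keeps the largest value in each 2x2 window
avg_out = avg_pool(x)  # keeps the mean of each 2x2 window

print(max_out.view(2, 2))  # [[4, 8], [12, 16]]
print(avg_out.view(2, 2))  # [[2.5, 6.5], [10.5, 14.5]]
```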

<img src="https://docs.google.com/uc?export=download&id=1ucAjOwLsVc1QVEQt4X7entdToJRWxcwK" alt="no_image" style="width: 900px;"/>

### 1.6 Multiple Pooling Layers: High Level View
<img src="https://docs.google.com/uc?export=download&id=1Hm9vPsTdwLTOPs-eAwoRmUhAHVs7NuPJ" alt="no_image" style="width: 900px;"/>

### 1.7 Padding
<img src="https://docs.google.com/uc?export=download&id=1aAr0tpTkIf1ggO9HpUSRTv3QAD5S8PCM" alt="no_image" style="width: 900px;"/>

### 1.8 Padding Summary
- **Valid** Padding
    - Output size < Input Size
- **Same** Padding
    - Output size = Input Size
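
To make the two padding modes concrete, here is a minimal sketch with a 3 x 3 kernel on a fake 28 x 28 input; the channel counts and input size are chosen only for illustration.

```python
import torch
import torch.nn as nn

x = torch.randn(1, 1, 28, 28)  # fake single-channel 28x28 image

# Valid padding: no padding, so the output shrinks
valid_conv = nn.Conv2d(1, 1, kernel_size=3, stride=1, padding=0)

# Same padding: pad by (K - 1) / 2 = 1 so the output keeps the input size
same_conv = nn.Conv2d(1, 1, kernel_size=3, stride=1, padding=1)

valid_out = valid_conv(x)
same_out = same_conv(x)

print(valid_out.shape)  # torch.Size([1, 1, 26, 26])
print(same_out.shape)   # torch.Size([1, 1, 28, 28])
```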

### 1.9 Dimension Calculations
- $ O = \frac {W - K + 2P}{S} + 1$
    - $O$: output height/length
    - $W$: input height/length
    - $K$: filter size (kernel size)
    - $P$: padding
        - For same padding with stride 1 and an odd kernel size: $ P = \frac{K - 1}{2} $
    - $S$: stride
    
#### Example 1: Output Dimension Calculation for Valid Padding (Stride = 1, Kernel = 3 x 3, Padding = 0)
<img src="https://docs.google.com/uc?export=download&id=1h6TqOvPe0nnhqT3lnnc0uiMle0B9SwjA" alt="no_image" style="width: 900px;"/>

- $W = 4$
- $K = 3$
- $P = 0$
- $S = 1$
- $O = \frac {4 - 3 + 2*0}{1} + 1 = \frac {1}{1} + 1 = 1 + 1 = 2 $


#### Example 2: Output Dimension Calculation for Same Padding (Stride = 1, Kernel = 3 x 3, Padding = 1)
<img src="https://docs.google.com/uc?export=download&id=1rsYzCGOPMTKNh3ZTFtnegVU2Kpa871c3" alt="no_image" style="width: 900px;"/>

- $W = 5$
- $K = 3$
- $P = \frac{3 - 1}{2} = \frac{2}{2} = 1 $
- $S = 1 $
- $O = \frac {5 - 3 + 2*1}{1} + 1 = \frac {4}{1} + 1 = 5$
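
The two worked examples can be checked with a small helper. The function names `conv_output_size` and `same_padding` are hypothetical, and the floor division is an assumption about how non-integer results are handled (it matches what most frameworks do).

```python
def conv_output_size(w, k, p=0, s=1):
    """O = (W - K + 2P) / S + 1, rounded down when the division is not exact."""
    return (w - k + 2 * p) // s + 1

def same_padding(k):
    """Padding that preserves the input size for stride 1 and an odd kernel size."""
    return (k - 1) // 2

# Example 1: valid padding
example_1 = conv_output_size(w=4, k=3, p=0, s=1)

# Example 2: same padding
example_2 = conv_output_size(w=5, k=3, p=same_padding(3), s=1)

print(example_1, example_2)  # 2 5
```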

## 2. Building a Convolutional Neural Network with PyTorch

### Model A:
- 2 Convolutional Layers
    - Same Padding (same output size)
- 2 Max Pooling Layers
- 1 Fully Connected Layer
<img src="https://docs.google.com/uc?export=download&id=1kL3Ch9Pm7yspyGcICCmo60i1BLYp6cHc" alt="no_image" style="width: 900px;"/>

### Steps
- Step 1: Load Dataset
- Step 2: Make Dataset Iterable
- Step 3: Create Model Class
- Step 4: Instantiate Model Class
- Step 5: Instantiate Loss Class
- Step 6: Instantiate Optimizer Class
- Step 7: Train Model

### Step 1: Loading MNIST Train Dataset
#### Images of digits from 0 to 9

**MNIST Dataset and Size of Training Dataset (Excluding Labels)**

In [None]:
import torch
import torch.nn as nn
import torchvision.transforms as transforms
import torchvision.datasets as dsets

In [None]:
train_dataset = dsets.MNIST(root='./data/MNIST/',
                            train=True,
                            transform=transforms.ToTensor(),
                            download=True)

test_dataset = dsets.MNIST(root='./data/MNIST/',
                           train=False,
                           transform=transforms.ToTensor())

In [None]:
# `train_data` is deprecated in recent torchvision releases; use `data`
print(train_dataset.data.size())

torch.Size([60000, 28, 28])


**Size of our training dataset labels**

In [None]:
# `train_labels` is deprecated in recent torchvision releases; use `targets`
print(train_dataset.targets.size())

torch.Size([60000])


**Size of our testing dataset (excluding labels)**

In [None]:
# `test_data` is deprecated in recent torchvision releases; use `data`
print(test_dataset.data.size())

torch.Size([10000, 28, 28])


**Size of our testing dataset labels**

In [None]:
# `test_labels` is deprecated in recent torchvision releases; use `targets`
print(test_dataset.targets.size())

torch.Size([10000])


### Step 2: Make Dataset Iterable

**Load Dataset into Dataloader**

In [None]:
batch_size = 100
n_iters = 3000
num_epochs = n_iters / (len(train_dataset) / batch_size)
num_epochs = int(num_epochs)

train_loader = torch.utils.data.DataLoader(dataset=train_dataset,
                                           batch_size=batch_size,
                                           shuffle=True)

test_loader = torch.utils.data.DataLoader(dataset=test_dataset,
                                          batch_size=batch_size,
                                          shuffle=False)

In [None]:
num_epochs

5

### Step 3: Create Model Class

<img src="https://docs.google.com/uc?export=download&id=1kL3Ch9Pm7yspyGcICCmo60i1BLYp6cHc" alt="no_image" style="width: 900px;"/>

#### Two convolution layers
- $ O = \frac {W - K + 2P}{S} + 1$
    - $K$: **filter size (kernel size) = 5**
    - $P$: **same padding**
    - $S$: **stride = 1**

#### Two max-pooling layers
- $ O = \frac {W - K}{S} + 1$
    - $K$: **filter size (kernel size) = 2**
    - $S$: **stride size = filter size**
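
Applying both formulas to a 28 x 28 MNIST image gives 28 → 28 → 14 → 14 → 7. As a quick sanity check, the sketch below pushes a fake image through stand-alone layers with the same settings and records the shape after each one.

```python
import torch
import torch.nn as nn

x = torch.randn(1, 1, 28, 28)  # one fake MNIST-sized image

conv1 = nn.Conv2d(1, 16, kernel_size=5, stride=1, padding=2)   # same padding
pool = nn.MaxPool2d(kernel_size=2)                             # stride = kernel size
conv2 = nn.Conv2d(16, 32, kernel_size=5, stride=1, padding=2)  # same padding

shapes = [tuple(x.shape)]
for layer in (conv1, pool, conv2, pool):
    x = layer(x)
    shapes.append(tuple(x.shape))

for shape in shapes:
    print(shape)
# (1, 1, 28, 28)
# (1, 16, 28, 28)
# (1, 16, 14, 14)
# (1, 32, 14, 14)
# (1, 32, 7, 7)
```

The final shape explains why the fully connected layer takes `32 * 7 * 7` inputs.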


**Define our simple 2 convolutional layer CNN**

In [None]:
import torch
import torch.nn as nn
import torch.nn.functional as F

class CNNModel(nn.Module):
    def __init__(self):
        super(CNNModel, self).__init__()

        # Conv1 layer: output dim 16
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=16, kernel_size=5, stride=1, padding=2)
        self.relu1 = nn.ReLU()
        self.maxpool1 = nn.MaxPool2d(kernel_size=2, stride=2)

        # Conv2 layer: output dim 32
        self.conv2 = nn.Conv2d(in_channels=16, out_channels=32, kernel_size=5, stride=1, padding=2)
        self.relu2 = nn.ReLU()
        self.maxpool2 = nn.MaxPool2d(kernel_size=2, stride=2)

        # Calculate the size of the input to the fully connected layer
        # Assuming input image size is 28x28 (e.g., MNIST dataset)
        self.fc_input_dim = 32 * 7 * 7  # After two conv and two pool layers

        # FC layer; output dim: 10
        self.fc1 = nn.Linear(self.fc_input_dim, 10)

    def forward(self, x):
        x = self.conv1(x)
        x = self.relu1(x)
        x = self.maxpool1(x)

        x = self.conv2(x)
        x = self.relu2(x)
        x = self.maxpool2(x)

        x = x.view(-1, self.fc_input_dim)  # Flatten the tensor
        x = self.fc1(x)
        return x

### Step 4: Instantiate Model Class

**Our model**

In [None]:
model = CNNModel()

### Step 5: Instantiate Loss Class

In [None]:
criterion = nn.CrossEntropyLoss()

### Step 6: Instantiate Optimizer Class

- Simplified equation
    - $\theta = \theta - \eta \cdot \nabla_\theta$
        - $\theta$: parameters
        - $\eta$: learning rate (how fast we want to learn)
        - $\nabla_\theta$: parameters' gradients
- Even simpler equation
    - `parameters = parameters - learning_rate * parameters_gradients`
    - **At every iteration, we update our model's parameters**
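
The update rule can be written out by hand and compared with `torch.optim.SGD`. The toy loss (sum of squares) and the 3-element parameter vector below are assumptions chosen so the gradient is easy to verify.

```python
import torch

torch.manual_seed(0)
learning_rate = 0.01

# Manual update: parameters = parameters - learning_rate * parameters_gradients
theta = torch.randn(3, requires_grad=True)
before = theta.detach().clone()

loss = (theta ** 2).sum()   # toy loss whose gradient is 2 * theta
loss.backward()
grad = theta.grad.clone()

with torch.no_grad():       # the update itself must not be tracked by autograd
    theta -= learning_rate * theta.grad

# Same step using the built-in optimizer, starting from the same values
theta_opt = before.clone().requires_grad_(True)
optimizer = torch.optim.SGD([theta_opt], lr=learning_rate)
(theta_opt ** 2).sum().backward()
optimizer.step()

print(torch.allclose(theta.detach(), theta_opt.detach()))  # True
```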

**Optimizer**

In [None]:
learning_rate = 0.01

optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

### Parameters In-Depth

**Print model's parameters**

In [None]:
print(model.parameters())

<generator object Module.parameters at 0x7f47ecd66408>


In [None]:
print(len(list(model.parameters())))

6


In [None]:
# Convolution 1
print(list(model.parameters())[0].size())

In [None]:
# Convolution 1 Bias
print(list(model.parameters())[1].size())

In [None]:
# Convolution 2
print(list(model.parameters())[2].size())

In [None]:
# Convolution 2 Bias
print(list(model.parameters())[3].size())

In [None]:
# Fully Connected Layer 1
print(list(model.parameters())[4].size())

In [None]:
# Fully Connected Layer Bias
print(list(model.parameters())[5].size())
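
The six sizes follow from the layer definitions: a convolution weight has shape (out_channels, in_channels, K, K) and a linear weight has shape (out_features, in_features). The sketch below rebuilds only the parameterized layers of Model A in an `nn.ModuleDict` (a stand-in used here for brevity, not the `CNNModel` class itself) and lists the shapes by name.

```python
import torch.nn as nn

# Stand-in container holding the same parameterized layers as Model A
layers = nn.ModuleDict({
    'conv1': nn.Conv2d(1, 16, kernel_size=5, stride=1, padding=2),
    'conv2': nn.Conv2d(16, 32, kernel_size=5, stride=1, padding=2),
    'fc1': nn.Linear(32 * 7 * 7, 10),
})

param_shapes = {name: tuple(p.shape) for name, p in layers.named_parameters()}
for name, shape in param_shapes.items():
    print(name, shape)
# conv1.weight (16, 1, 5, 5)
# conv1.bias (16,)
# conv2.weight (32, 16, 5, 5)
# conv2.bias (32,)
# fc1.weight (10, 1568)
# fc1.bias (10,)
```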

### Step 7: Train Model

- Process
    1. **Load inputs as tensors** (the `DataLoader` already yields tensors; only the parameters need gradients)
        - CNN Input: (1, 28, 28)
        - Feedforward NN Input: (1, 28*28)
    2. Clear gradient buffers
    3. Get output given inputs
    4. Get loss
    5. Get gradients w.r.t. parameters
    6. Update parameters using gradients
        - `parameters = parameters - learning_rate * parameters_gradients`
    7. REPEAT

**Model training**

In [None]:
iter = 0
for epoch in range(num_epochs):
    for i, (images, labels) in enumerate(train_loader):

        # Clear gradients w.r.t. parameters
        optimizer.zero_grad()

        # Forward pass to get output/logits
        outputs = model(images)

        # Calculate Loss: softmax --> cross entropy loss
        loss = criterion(outputs, labels)

        # Getting gradients w.r.t. parameters
        loss.backward()

        # Updating parameters
        optimizer.step()

        iter += 1

        if iter % 500 == 0:
            # Calculate Accuracy
            correct = 0
            total = 0
            # Iterate through test dataset
            for images, labels in test_loader:

                # Forward pass only to get logits/output
                outputs = model(images)

                # Get predictions from the maximum value
                _, predicted = torch.max(outputs.data, 1)

                # Total number of labels
                total += labels.size(0)

                # Total correct predictions
                correct += (predicted == labels).sum()

            accuracy = 100 * correct.item() / total

            # Print Loss
            print('Iteration: {}. Loss: {}. Accuracy: {}'.format(iter, loss.item(), accuracy))

Iteration: 500. Loss: 0.3667975962162018. Accuracy: 89.48
Iteration: 1000. Loss: 0.20537136495113373. Accuracy: 93.25
Iteration: 1500. Loss: 0.15895791351795197. Accuracy: 94.72
Iteration: 2000. Loss: 0.2010471671819687. Accuracy: 96.01
Iteration: 2500. Loss: 0.17495428025722504. Accuracy: 96.2
Iteration: 3000. Loss: 0.34521064162254333. Accuracy: 96.79
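
The accuracy check inside the training loop can also be factored into a helper that disables gradient tracking, which saves memory during evaluation. This is a sketch, not the lecture's code: the name `evaluate`, the toy model, and the fake loader are all hypothetical stand-ins.

```python
import torch
import torch.nn as nn

def evaluate(model, data_loader):
    """Return the accuracy (in percent) of `model` over `data_loader`."""
    model.eval()                      # switch layers such as dropout to eval mode
    correct, total = 0, 0
    with torch.no_grad():             # no gradients are needed for evaluation
        for images, labels in data_loader:
            outputs = model(images)
            predicted = outputs.argmax(dim=1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()
    return 100 * correct / total

# Tiny stand-ins so the sketch runs without downloading MNIST
torch.manual_seed(0)
toy_model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
toy_loader = [(torch.randn(8, 1, 28, 28), torch.randint(0, 10, (8,)))
              for _ in range(3)]

toy_accuracy = evaluate(toy_model, toy_loader)
print(toy_accuracy)
```

If such a helper is called in the middle of training, `model.train()` should be called afterwards to restore training mode.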


### Model B:
- 2 Convolutional Layers
    - Same Padding (same output size)
- 2 **Average Pooling** Layers
- 1 Fully Connected Layer
<img src="https://docs.google.com/uc?export=download&id=13kU2k4ibhEy9YUGP4KHcv0tOkqiPjiOI" alt="no_image" style="width: 900px;"/>
<img src="https://docs.google.com/uc?export=download&id=11VqeA2765HZw_t65FlbxrjStKasnyV86" alt="no_image" style="width: 900px;"/>

### Steps
- Step 1: Load Dataset
- Step 2: Make Dataset Iterable
- Step 3: Create Model Class
- Step 4: Instantiate Model Class
- Step 5: Instantiate Loss Class
- Step 6: Instantiate Optimizer Class
- Step 7: Train Model

**2 Conv + 2 Average Pool + 1 FC (Zero Padding, Same Padding)**

In [None]:
import torch
import torch.nn as nn
import torchvision.transforms as transforms
import torchvision.datasets as dsets

'''
STEP 1: LOADING DATASET
'''

train_dataset = dsets.MNIST(root='./data/MNIST/',
                            train=True,
                            transform=transforms.ToTensor(),
                            download=True)

test_dataset = dsets.MNIST(root='./data/MNIST/',
                           train=False,
                           transform=transforms.ToTensor())

'''
STEP 2: MAKING DATASET ITERABLE
'''

batch_size = 100
n_iters = 3000
num_epochs = n_iters / (len(train_dataset) / batch_size)
num_epochs = int(num_epochs)

train_loader = torch.utils.data.DataLoader(dataset=train_dataset,
                                           batch_size=batch_size,
                                           shuffle=True)

test_loader = torch.utils.data.DataLoader(dataset=test_dataset,
                                          batch_size=batch_size,
                                          shuffle=False)

'''
STEP 3: CREATE MODEL CLASS
'''
class CNNModel(nn.Module):
    def __init__(self):
        super(CNNModel, self).__init__()

        # Convolution 1
        self.cnn1 = nn.Conv2d(in_channels=1, out_channels=16, kernel_size=5, stride=1, padding=2)
        self.relu1 = nn.ReLU()

        # Average pool 1
        self.avgpool1 = nn.AvgPool2d(kernel_size=2)

        # Convolution 2
        self.cnn2 = nn.Conv2d(in_channels=16, out_channels=32, kernel_size=5, stride=1, padding=2)
        self.relu2 = nn.ReLU()

        # Average pool 2
        self.avgpool2 = nn.AvgPool2d(kernel_size=2)

        # Fully connected 1 (readout)
        self.fc1 = nn.Linear(32 * 7 * 7, 10)

    def forward(self, x):
        # Convolution 1
        out = self.cnn1(x)
        out = self.relu1(out)

        # Average pool 1
        out = self.avgpool1(out)

        # Convolution 2
        out = self.cnn2(out)
        out = self.relu2(out)

        # Average pool 2
        out = self.avgpool2(out)

        # Resize
        # Original size: (100, 32, 7, 7)
        # out.size(0): 100
        # New out size: (100, 32*7*7)
        out = out.view(out.size(0), -1)

        # Linear function (readout)
        out = self.fc1(out)

        return out

'''
STEP 4: INSTANTIATE MODEL CLASS
'''

model = CNNModel()

'''
STEP 5: INSTANTIATE LOSS CLASS
'''
criterion = nn.CrossEntropyLoss()


'''
STEP 6: INSTANTIATE OPTIMIZER CLASS
'''
learning_rate = 0.01

optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

'''
STEP 7: TRAIN THE MODEL
'''
iter = 0
for epoch in range(num_epochs):
    for i, (images, labels) in enumerate(train_loader):

        # Clear gradients w.r.t. parameters
        optimizer.zero_grad()

        # Forward pass to get output/logits
        outputs = model(images)

        # Calculate Loss: softmax --> cross entropy loss
        loss = criterion(outputs, labels)

        # Getting gradients w.r.t. parameters
        loss.backward()

        # Updating parameters
        optimizer.step()

        iter += 1

        if iter % 500 == 0:
            # Calculate Accuracy
            correct = 0
            total = 0
            # Iterate through test dataset
            for images, labels in test_loader:

                # Forward pass only to get logits/output
                outputs = model(images)

                # Get predictions from the maximum value
                _, predicted = torch.max(outputs.data, 1)

                # Total number of labels
                total += labels.size(0)

                # Total correct predictions
                correct += (predicted == labels).sum()

            accuracy = 100 * correct.item() / total

            # Print Loss
            print('Iteration: {}. Loss: {}. Accuracy: {}'.format(iter, loss.item(), accuracy))

Iteration: 500. Loss: 0.5134146809577942. Accuracy: 85.56
Iteration: 1000. Loss: 0.3785870671272278. Accuracy: 88.82
Iteration: 1500. Loss: 0.4261114001274109. Accuracy: 89.75
Iteration: 2000. Loss: 0.2990914583206177. Accuracy: 91.14
Iteration: 2500. Loss: 0.2322869449853897. Accuracy: 92.26
Iteration: 3000. Loss: 0.18548358976840973. Accuracy: 92.71


> **Comparison of accuracies**
>
> It seems like the average pooling test accuracy is lower than the max pooling test accuracy! Does this mean max pooling is better? This is not definitive and depends on many factors, including the model's architecture, the seed (which affects random weight initialization) and more.

### Average Pooling Test Accuracy < Max Pooling Test Accuracy

### Model C:
- 2 Convolutional Layers
    - **Valid Padding** (smaller output size)
- 2 **Max Pooling** Layers
- 1 Fully Connected Layer
<img src="https://docs.google.com/uc?export=download&id=1LQFFmKWPMk6c1RRqEO4FZBGbBVajnEvq" alt="no_image" style="width: 900px;"/>


#### Two convolution layers
- $ O = \frac {W - K + 2P}{S} + 1$
    - $K$: filter size (kernel size) = 5
    - $P$: **without padding**
    - $S$: stride = 1

#### Two max-pooling layers
- $ O = \frac {W - K}{S} + 1$
    - $K$: filter size (kernel size) = 2
    - $S$: stride size = filter size
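
Before writing the model, it helps to trace the feature-map size through the four layers, since it determines the input dimension of the fully connected layer. The helper name `output_size` and the `stages` list are hypothetical; 32 output channels for the second convolution is assumed to match Models A and B.

```python
def output_size(w, k, p=0, s=1):
    """O = (W - K + 2P) / S + 1"""
    return (w - k + 2 * p) // s + 1

# Valid padding (P = 0) for the convolutions, stride = kernel size for pooling
stages = [
    ("conv1", dict(k=5, p=0, s=1)),
    ("pool1", dict(k=2, p=0, s=2)),
    ("conv2", dict(k=5, p=0, s=1)),
    ("pool2", dict(k=2, p=0, s=2)),
]

size = 28                      # MNIST images are 28 x 28
trace = {"input": size}
for name, cfg in stages:
    size = output_size(size, **cfg)
    trace[name] = size

fc_input_dim = 32 * size * size  # assumes 32 channels after the second convolution
print(trace)         # {'input': 28, 'conv1': 24, 'pool1': 12, 'conv2': 8, 'pool2': 4}
print(fc_input_dim)  # 512
```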

### Steps
- Step 1: Load Dataset
- Step 2: Make Dataset Iterable
- Step 3: Create Model Class
- Step 4: Instantiate Model Class
- Step 5: Instantiate Loss Class
- Step 6: Instantiate Optimizer Class
- Step 7: Train Model

**2 Conv + 2 Max Pool + 1 FC (Valid Padding, No Padding)**

In [None]:
import torch
import torch.nn as nn
import torchvision.transforms as transforms
import torchvision.datasets as dsets

'''
STEP 1: LOADING DATASET
'''

train_dataset = dsets.MNIST(root='./data/MNIST/',
                            train=True,
                            transform=transforms.ToTensor(),
                            download=True)

test_dataset = dsets.MNIST(root='./data/MNIST/',
                           train=False,
                           transform=transforms.ToTensor())

'''
STEP 2: MAKING DATASET ITERABLE
'''

batch_size = 100
n_iters = 3000
num_epochs = n_iters / (len(train_dataset) / batch_size)
num_epochs = int(num_epochs)

train_loader = torch.utils.data.DataLoader(dataset=train_dataset,
                                           batch_size=batch_size,
                                           shuffle=True)

test_loader = torch.utils.data.DataLoader(dataset=test_dataset,
                                          batch_size=batch_size,
                                          shuffle=False)

'''
STEP 3: CREATE MODEL CLASS
'''
class CNNModel(nn.Module):
    def __init__(self):
        super(CNNModel, self).__init__()

        ## TODO ##

        ## End of TODO ##

    def forward(self, x):

        ## TODO ##

        ## End of TODO ##

        return out

'''
STEP 4: INSTANTIATE MODEL CLASS
'''

model = CNNModel()

'''
STEP 5: INSTANTIATE LOSS CLASS
'''
criterion = nn.CrossEntropyLoss()


'''
STEP 6: INSTANTIATE OPTIMIZER CLASS
'''
learning_rate = 0.01

optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

'''
STEP 7: TRAIN THE MODEL
'''
iter = 0
for epoch in range(num_epochs):
    for i, (images, labels) in enumerate(train_loader):

        # Clear gradients w.r.t. parameters
        optimizer.zero_grad()

        # Forward pass to get output/logits
        outputs = model(images)

        # Calculate Loss: softmax --> cross entropy loss
        loss = criterion(outputs, labels)

        # Getting gradients w.r.t. parameters
        loss.backward()

        # Updating parameters
        optimizer.step()

        iter += 1

        if iter % 500 == 0:
            # Calculate Accuracy
            correct = 0
            total = 0
            # Iterate through test dataset
            for images, labels in test_loader:

                # Forward pass only to get logits/output
                outputs = model(images)

                # Get predictions from the maximum value
                _, predicted = torch.max(outputs.data, 1)

                # Total number of labels
                total += labels.size(0)

                # Total correct predictions
                correct += (predicted == labels).sum()

            accuracy = 100 * correct.item() / total

            # Print Loss
            print('Iteration: {}. Loss: {}. Accuracy: {}'.format(iter, loss.item(), accuracy))

Iteration: 500. Loss: 0.30961570143699646. Accuracy: 88.91
Iteration: 1000. Loss: 0.3901764154434204. Accuracy: 92.62
Iteration: 1500. Loss: 0.14511889219284058. Accuracy: 94.6
Iteration: 2000. Loss: 0.12225310504436493. Accuracy: 95.11
Iteration: 2500. Loss: 0.1264938861131668. Accuracy: 95.94
Iteration: 3000. Loss: 0.16216857731342316. Accuracy: 96.55


### Summary of Results

| Model A | Model B | Model C |
|------|------|------|
|   Max Pooling  | Average Pooling | Max Pooling |
| Same Padding | Same Padding | Valid Padding |
| 96.79% | 92.71% | 96.55% |


| All Models |
|------|
| INPUT $\rightarrow$ CONV $\rightarrow$ POOL $\rightarrow$ CONV $\rightarrow$ POOL $\rightarrow$ FC |
| Convolution Kernel Size = 5 x 5 |
| Convolution Kernel Stride = 1 |
| Pooling Kernel Size = 2 x 2 |

### General Deep Learning Notes on CNN and FNN
- 3 ways to expand a convolutional neural network
    - More convolutional layers
    - Less aggressive downsampling
        - Smaller kernel size for pooling (gradually downsampling)
    - More fully connected layers
- Cons
    - Need a larger dataset
        - Curse of dimensionality
    - Does not necessarily mean higher accuracy


## 3. Building a Convolutional Neural Network with PyTorch (GPU)
### Model A

<img src="https://docs.google.com/uc?export=download&id=1kL3Ch9Pm7yspyGcICCmo60i1BLYp6cHc" alt="no_image" style="width: 900px;"/>
<img src="https://docs.google.com/uc?export=download&id=1pk-vIRpGphx2MWxWme6alknQXHojCKQ_" alt="no_image" style="width: 900px;"/>

GPU: 2 things must be on GPU
- `model`
- `tensors`
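
A minimal sketch of both moves, using a tiny linear layer and a random batch as stand-ins; it falls back to the CPU when no GPU is available.

```python
import torch
import torch.nn as nn

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# 1. Move the model (its parameters) to the device
model = nn.Linear(4, 2).to(device)

# 2. Move every input tensor to the same device before the forward pass
batch = torch.randn(3, 4).to(device)

output = model(batch)
print(output.device)
```

Note that `model.to(device)` moves a module's parameters in place, while `tensor.to(device)` returns a new tensor that must be assigned back.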

### Steps
- Step 1: Load Dataset
- Step 2: Make Dataset Iterable
- Step 3: Create Model Class
- **Step 4: Instantiate Model Class**
- Step 5: Instantiate Loss Class
- Step 6: Instantiate Optimizer Class
- **Step 7: Train Model**

**2 Conv + 2 Max Pooling + 1 FC (Same Padding, Zero Padding)**

In [None]:
import torch
import torch.nn as nn
import torchvision.transforms as transforms
import torchvision.datasets as dsets

'''
STEP 1: LOADING DATASET
'''

train_dataset = dsets.MNIST(root='./data/MNIST/',
                            train=True,
                            transform=transforms.ToTensor(),
                            download=True)

test_dataset = dsets.MNIST(root='./data/MNIST/',
                           train=False,
                           transform=transforms.ToTensor())

'''
STEP 2: MAKING DATASET ITERABLE
'''

batch_size = 100
n_iters = 3000
num_epochs = n_iters / (len(train_dataset) / batch_size)
num_epochs = int(num_epochs)

train_loader = torch.utils.data.DataLoader(dataset=train_dataset,
                                           batch_size=batch_size,
                                           shuffle=True)

test_loader = torch.utils.data.DataLoader(dataset=test_dataset,
                                          batch_size=batch_size,
                                          shuffle=False)

'''
STEP 3: CREATE MODEL CLASS
'''
class CNNModel(nn.Module):
    def __init__(self):
        super(CNNModel, self).__init__()

        # Convolution 1 (same padding for a 5x5 kernel: (5 - 1) / 2 = 2)
        self.cnn1 = nn.Conv2d(in_channels=1, out_channels=16, kernel_size=5, stride=1, padding=2)
        self.relu1 = nn.ReLU()

        # Max pool 1
        self.maxpool1 = nn.MaxPool2d(kernel_size=2)

        # Convolution 2 (same padding)
        self.cnn2 = nn.Conv2d(in_channels=16, out_channels=32, kernel_size=5, stride=1, padding=2)
        self.relu2 = nn.ReLU()

        # Max pool 2
        self.maxpool2 = nn.MaxPool2d(kernel_size=2)

        # Fully connected 1 (readout): 28 -> 28 -> 14 -> 14 -> 7
        self.fc1 = nn.Linear(32 * 7 * 7, 10)

    def forward(self, x):
        # Convolution 1
        out = self.cnn1(x)
        out = self.relu1(out)

        # Max pool 1
        out = self.maxpool1(out)

        # Convolution 2
        out = self.cnn2(out)
        out = self.relu2(out)

        # Max pool 2
        out = self.maxpool2(out)

        # Resize
        # Original size: (100, 32, 7, 7)
        # out.size(0): 100
        # New out size: (100, 32*7*7)
        out = out.view(out.size(0), -1)

        # Linear function (readout)
        out = self.fc1(out)

        return out

'''
STEP 4: INSTANTIATE MODEL CLASS
'''

model = CNNModel()

#######################
#  USE GPU FOR MODEL  #
#######################

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model.to(device)

'''
STEP 5: INSTANTIATE LOSS CLASS
'''
criterion = nn.CrossEntropyLoss()


'''
STEP 6: INSTANTIATE OPTIMIZER CLASS
'''
learning_rate = 0.01

optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

'''
STEP 7: TRAIN THE MODEL
'''
iter = 0
for epoch in range(num_epochs):
    for i, (images, labels) in enumerate(train_loader):

        #######################
        #  USE GPU FOR MODEL  #
        #######################
        images = images.to(device)
        labels = labels.to(device)

        # Clear gradients w.r.t. parameters
        optimizer.zero_grad()

        # Forward pass to get output/logits
        outputs = model(images)

        # Calculate Loss: softmax --> cross entropy loss
        loss = criterion(outputs, labels)

        # Getting gradients w.r.t. parameters
        loss.backward()

        # Updating parameters
        optimizer.step()

        iter += 1

        if iter % 500 == 0:
            # Calculate Accuracy
            correct = 0
            total = 0
            # Iterate through test dataset
            for images, labels in test_loader:
                #######################
                #  USE GPU FOR MODEL  #
                #######################
                images = images.to(device)
                labels = labels.to(device)

                # Forward pass only to get logits/output
                outputs = model(images)

                # Get predictions from the maximum value
                _, predicted = torch.max(outputs.data, 1)

                # Total number of labels
                total += labels.size(0)

                #######################
                #  USE GPU FOR MODEL  #
                #######################
                # Total correct predictions
                if torch.cuda.is_available():
                    correct += (predicted.cpu() == labels.cpu()).sum()
                else:
                    correct += (predicted == labels).sum()

            accuracy = 100 * correct.item() / total

            # Print Loss
            print('Iteration: {}. Loss: {}. Accuracy: {}'.format(iter, loss.item(), accuracy))

Iteration: 500. Loss: 0.4038761258125305. Accuracy: 89.52
Iteration: 1000. Loss: 0.28193721175193787. Accuracy: 93.49
Iteration: 1500. Loss: 0.1496436446905136. Accuracy: 94.74
Iteration: 2000. Loss: 0.1562383770942688. Accuracy: 95.58
Iteration: 2500. Loss: 0.08470043540000916. Accuracy: 96.2
Iteration: 3000. Loss: 0.048186045140028. Accuracy: 96.71


> **More Efficient Convolutions via Toeplitz Matrices**
>
> This is beyond the scope of this particular lesson. But now that we understand how convolutions work, it is worth knowing that a 2D convolution (with a 5 x 5 kernel, for example) over a 2D image (a 28 x 28 MNIST image, for example) is quite inefficient when implemented with for-loops.
>
> A more efficient implementation converts our convolution kernel into a Toeplitz matrix and our image into a vector. Then a single matrix-vector multiplication between the Toeplitz matrix and the vector produces the whole output.
>
> There will be a whole lesson dedicated to this operation released down the road.
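
As a rough preview of the idea, the sketch below builds the matrix for a valid, stride-1 convolution and checks it against `F.conv2d`. The helper name `conv_matrix` and the 6 x 6 / 3 x 3 sizes are assumptions for illustration, and the future lesson may construct the matrix differently.

```python
import torch
import torch.nn.functional as F

def conv_matrix(kernel, in_h, in_w):
    """Matrix of a valid, stride-1 2D convolution acting on a flattened image.

    Each row holds the kernel values at the positions of one image patch,
    which gives the matrix a doubly blocked Toeplitz structure.
    """
    k_h, k_w = kernel.shape
    out_h, out_w = in_h - k_h + 1, in_w - k_w + 1
    matrix = torch.zeros(out_h * out_w, in_h * in_w)
    for i in range(out_h):
        for j in range(out_w):
            row = i * out_w + j
            for a in range(k_h):
                for b in range(k_w):
                    col = (i + a) * in_w + (j + b)
                    matrix[row, col] = kernel[a, b]
    return matrix

torch.manual_seed(0)
image = torch.randn(6, 6)
kernel = torch.randn(3, 3)

T = conv_matrix(kernel, 6, 6)                   # shape (16, 36)
as_matmul = (T @ image.reshape(-1)).view(4, 4)  # one matrix-vector product

reference = F.conv2d(image.view(1, 1, 6, 6), kernel.view(1, 1, 3, 3))[0, 0]
print(torch.allclose(as_matmul, reference, atol=1e-5))  # True
```

The matrix is built once per kernel with loops, but applying it to any image of that size is a single multiplication. It is also mostly zeros, so real implementations tend to rely on sparse or otherwise specialized routines.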

## Summary

- Transition from **Feedforward Neural Network**
    - Addition of **Convolutional** & **Pooling** Layers before Linear Layers
- One **Convolutional** Layer Basics
- One **Pooling** Layer Basics
    - Max pooling
    - Average pooling
- **Padding**
- **Output Dimension** Calculations and Examples
    -  $ O = \frac {W - K + 2P}{S} + 1$
- Convolutional Neural Networks
    - **Model A**: 2 Conv + 2 Max pool + 1 FC
        - Same Padding
    - **Model B**: 2 Conv + 2 Average pool + 1 FC
        - Same Padding
    - **Model C**: 2 Conv + 2 Max pool + 1 FC
        - Valid Padding
- Model Variation in **Code**
    - Modifying only step 3
- Ways to Expand Model’s **Capacity**
    - More convolutions
    - Gradual pooling
    - More fully connected layers
- **GPU** Code
    - 2 things on GPU
        - **model**
        - **tensors**
    - Modifying only **Step 4 & Step 7**
- **7 Step** Model Building Recap
    - Step 1: Load Dataset
    - Step 2: Make Dataset Iterable
    - Step 3: Create Model Class
    - Step 4: Instantiate Model Class
    - Step 5: Instantiate Loss Class
    - Step 6: Instantiate Optimizer Class
    - Step 7: Train Model

### *References*
[1] [![DOI](https://zenodo.org/badge/139945544.svg)](https://zenodo.org/badge/latestdoi/139945544)