# Convolutional Neural Networks with PyTorch
Convolutional neural networks (CNNs) are a class of neural networks that are generally composed of the following layers:
![alt text](https://stanford.edu/~shervine/images/architecture-cnn.png)

**Convolution layer (CONV)** ― The convolution layer (CONV) uses filters that perform convolution operations as they scan the input $I$ along its dimensions. Its hyperparameters include the filter size $F$ and the stride $S$. The resulting output $O$ is called a *feature map* or *activation map*.
![alt text](https://stanford.edu/~shervine/images/convolution-layer-a.png)
*Remark: the convolution step can be generalized to the 1D and 3D cases as well.*
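
To make the scanning operation concrete, here is a minimal pure-Python sketch of a single-channel 2D convolution (computed as a cross-correlation, which is what deep learning libraries typically do). The helper names `conv_output_size` and `conv2d_naive` and the zero-padding parameter `P` are assumptions introduced for illustration. The output size uses the standard formula $O = \lfloor (I - F + 2P)/S \rfloor + 1$.

```python
def conv_output_size(I, F, S=1, P=0):
    """Output length along one dimension for input size I, filter size F,
    stride S and zero-padding P (hypothetical helper)."""
    return (I - F + 2 * P) // S + 1


def conv2d_naive(image, kernel, stride=1):
    """Slide a square kernel over a 2D list of numbers (no padding)."""
    F = len(kernel)
    out_h = conv_output_size(len(image), F, stride)
    out_w = conv_output_size(len(image[0]), F, stride)
    output = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            # Multiply the kernel with the current window element-wise and sum.
            acc = 0
            for i in range(F):
                for j in range(F):
                    acc += image[r * stride + i][c * stride + j] * kernel[i][j]
            row.append(acc)
        output.append(row)
    return output


image = [[1, 2, 3, 0],
         [4, 5, 6, 1],
         [7, 8, 9, 2],
         [1, 0, 1, 3]]
kernel = [[1, 0],
          [0, -1]]

feature_map = conv2d_naive(image, kernel, stride=1)
print(feature_map)  # [[-4, -4, 2], [-4, -4, 4], [7, 7, 6]]
```

A 4x4 input with a 2x2 filter and stride 1 gives a 3x3 feature map, as the formula predicts.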

**Pooling (POOL)** ― The pooling layer (POOL) is a downsampling operation, typically applied after a convolution layer, which provides some spatial invariance. In particular, max and average pooling are special kinds of pooling in which the maximum and the average value of each window is taken, respectively.
* *Max pooling*: Each pooling operation selects the maximum value of the current view
  * Preserves detected features
  * Most commonly used
  ![alt text](https://stanford.edu/~shervine/images/max-pooling-a.png)

* *Average pooling*: Each pooling operation averages the values of the current view
  * Downsamples feature map
  * Used in LeNet
  ![alt text](https://stanford.edu/~shervine/images/average-pooling-a.png)
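
The two pooling variants differ only in how each view is reduced to a single value. The following pure-Python sketch illustrates this on a small feature map; the function `pool2d` and its parameters are hypothetical names chosen for this example, and the 2x2 window with stride 2 is an assumed (though common) setting.

```python
def pool2d(feature_map, size=2, stride=2, mode="max"):
    """Apply max or average pooling to a 2D list (hypothetical helper)."""
    reduce_fn = max if mode == "max" else (lambda v: sum(v) / len(v))
    out_h = (len(feature_map) - size) // stride + 1
    out_w = (len(feature_map[0]) - size) // stride + 1
    pooled = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            # Collect the values in the current view (window).
            window = [feature_map[r * stride + i][c * stride + j]
                      for i in range(size) for j in range(size)]
            row.append(reduce_fn(window))
        pooled.append(row)
    return pooled


fmap = [[1, 3, 2, 4],
        [5, 6, 1, 2],
        [7, 2, 9, 0],
        [1, 4, 3, 8]]

max_pooled = pool2d(fmap, mode="max")
avg_pooled = pool2d(fmap, mode="avg")
print(max_pooled)  # [[6, 4], [7, 9]]
print(avg_pooled)  # [[3.75, 2.25], [3.5, 5.0]]
```

Both variants halve each spatial dimension here. Max pooling keeps the strongest activation in each view, while average pooling smooths the values.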
  
**Fully Connected (FC)** ― The fully connected layer (FC) operates on a flattened input where each input is connected to all neurons. If present, FC layers are usually found towards the end of CNN architectures and can be used to optimize objectives such as class scores.
![alt text](https://stanford.edu/~shervine/images/fully-connected.png)
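
As a rough illustration of "flatten, then connect every input to every neuron", the sketch below flattens two tiny feature maps and computes two output scores. The helper names `flatten` and `fully_connected`, as well as the weights and biases, are arbitrary assumptions made for this example.

```python
def flatten(feature_maps):
    """Flatten a list of 2D feature maps (channels) into one vector."""
    return [v for fmap in feature_maps for row in fmap for v in row]


def fully_connected(x, weights, biases):
    """Each output neuron is a weighted sum of every input plus a bias."""
    return [sum(w * xi for w, xi in zip(w_row, x)) + b
            for w_row, b in zip(weights, biases)]


# Two 2x2 feature maps give 8 inputs after flattening.
feature_maps = [[[1, 2], [3, 4]],
                [[0, 1], [0, 1]]]
x = flatten(feature_maps)

# Illustrative weights: one row per output neuron, one column per input.
weights = [[1, 0, 0, 0, 0, 0, 0, 0],   # neuron 0 copies the first input
           [1, 1, 1, 1, 1, 1, 1, 1]]   # neuron 1 sums all inputs
biases = [0, -2]

scores = fully_connected(x, weights, biases)
print(x)       # [1, 2, 3, 4, 0, 1, 0, 1]
print(scores)  # [1, 10]
```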
 
References:
* [Convolutional Neural Networks cheatsheet](https://stanford.edu/~shervine/teaching/cs-230/cheatsheet-convolutional-neural-networks)
* [Convolutional Neural Networks](http://cs231n.github.io/convolutional-networks/)


## Import Modules

In [0]:
import torch 
import torch.nn as nn
import torchvision
import torchvision.transforms as transforms


In [0]:
# Device configuration
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')


## Hyperparameters

In [0]:
num_epochs = 5
num_classes = 10
batch_size = 100
learning_rate = 0.001


## MNIST dataset

In [0]:

train_dataset = torchvision.datasets.MNIST(root='../../data/',
                                           train=True, 
                                           transform=transforms.ToTensor(),
                                           download=True)

test_dataset = torchvision.datasets.MNIST(root='../../data/',
                                          train=False, 
                                          transform=transforms.ToTensor())


## Data loader

In [0]:

train_loader = torch.utils.data.DataLoader(dataset=train_dataset,
                                           batch_size=batch_size, 
                                           shuffle=True)

test_loader = torch.utils.data.DataLoader(dataset=test_dataset,
                                          batch_size=batch_size, 
                                          shuffle=False)

## Convolutional neural network

In [0]:
class ConvNet(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.layer1 = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=5, stride=1, padding=2),
            nn.BatchNorm2d(16),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2))
        self.layer2 = nn.Sequential(
            nn.Conv2d(16, 32, kernel_size=5, stride=1, padding=2),
            nn.BatchNorm2d(32),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2))
        # Two 2x2 max-pools reduce the 28x28 input to 7x7, with 32 channels
        self.fc = nn.Linear(7*7*32, num_classes)
        
    def forward(self, x):
        out = self.layer1(x)
        out = self.layer2(out)
        out = out.reshape(out.size(0), -1)
        out = self.fc(out)
        return out

In [0]:
model = ConvNet(num_classes).to(device)
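
The input size `7*7*32` of the final linear layer follows from the layer shapes. As a quick check, the sketch below traces a 28x28 MNIST image through the two blocks using the standard output-size formula $\lfloor (I - F + 2P)/S \rfloor + 1$. The helper `out_size` is a hypothetical name introduced only for this illustration.

```python
def out_size(size, kernel, stride, padding=0):
    """Spatial output size of a conv or pooling layer (hypothetical helper)."""
    return (size - kernel + 2 * padding) // stride + 1


size = 28  # MNIST images are 28x28
for _ in range(2):  # layer1 and layer2 use the same conv/pool settings
    size = out_size(size, kernel=5, stride=1, padding=2)  # conv keeps the size
    size = out_size(size, kernel=2, stride=2)             # max-pool halves it

channels = 32  # output channels of layer2
fc_inputs = size * size * channels
print(size, fc_inputs)  # 7 1568
```

If the input resolution or the layer settings change, the same arithmetic gives the new value to pass to `nn.Linear`.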

## Loss and Optimizer

In [0]:

crossentropy = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
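
`nn.CrossEntropyLoss` expects raw, unnormalized scores (logits), which is why `ConvNet.forward` does not end with a softmax. As a rough illustration of what it computes with its default settings (a log-softmax followed by the mean negative log-likelihood over the batch), here is a pure-Python sketch. The function `cross_entropy` is a hypothetical helper written for this example, not the PyTorch implementation.

```python
import math


def cross_entropy(logits, targets):
    """Mean negative log-likelihood of the target classes under softmax."""
    total = 0.0
    for row, target in zip(logits, targets):
        m = max(row)  # subtract the max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(v - m) for v in row))
        # -log(softmax(row)[target]) = log_sum_exp - row[target]
        total += log_sum_exp - row[target]
    return total / len(targets)


logits = [[2.0, 0.5, 0.1],   # confident in class 0
          [0.0, 0.0, 0.0]]   # uniform over 3 classes
targets = [0, 2]
loss = cross_entropy(logits, targets)
print(round(loss, 4))
```

For the uniform row the per-sample loss is $\log 3$, the value expected from a classifier that has learned nothing about three classes.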


## Train the model

In [0]:
model.train()  # training mode (batchnorm uses mini-batch mean/variance)
total_step = len(train_loader)
for epoch in range(num_epochs):
    for i, (images, labels) in enumerate(train_loader):
        images = images.to(device)
        labels = labels.to(device)
        
        # Forward pass
        outputs = model(images)
        loss = crossentropy(outputs, labels)
        
        # Backward and optimize
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        
        if (i+1) % 100 == 0:
            print('Epoch [{}/{}], Step [{}/{}], Loss: {:.4f}'
                  .format(epoch+1, num_epochs, i+1, total_step, loss.item()))

## Test the model

In [0]:
model.eval()  # eval mode (batchnorm uses moving mean/variance instead of mini-batch mean/variance)
with torch.no_grad():
    correct = 0
    total = 0
    for images, labels in test_loader:
        images = images.to(device)
        labels = labels.to(device)
        outputs = model(images)
        # Index of the highest score per sample is the predicted class
        _, predicted = torch.max(outputs, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

    print('Test Accuracy of the model on the 10000 test images: {} %'.format(100 * correct / total))


In [0]:
# Save the model checkpoint
torch.save(model.state_dict(), 'model.ckpt')