<a href="https://colab.research.google.com/github/akshatshah91/Game-AI/blob/master/pytorch_feedforward_neural_network.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Check Python version

In [174]:
import sys
sys.version

'3.6.9 (default, Jul 17 2020, 12:50:27) \n[GCC 8.4.0]'

# Install PyTorch

In [175]:
# !pip3 install http://download.pytorch.org/whl/cu80/torch-0.3.0.post4-cp36-cp36m-linux_x86_64.whl 
# !pip3 uninstall torch-0.3.0.post4
# !pip install torch
# import torch
# import torchvision

In [176]:
# !pip3 install torchvision

# Import PyTorch

In [177]:
import torch
import torch.nn as nn
import torchvision.datasets as dsets
import torchvision.transforms as transforms
from torch.autograd import Variable

# Initialize Hyper-parameters

In [178]:
input_size    = 784   # The image size = 28 x 28 = 784
hidden_size1  = 500   # The number of nodes in the first hidden layer
hidden_size2  = 250   # The number of nodes in the second hidden layer
num_classes   = 10    # The number of output classes. In this case, the digits 0 to 9
num_epochs    = 5     # The number of passes over the entire training set
batch_size    = 100   # The number of samples used in one iteration
learning_rate = 1e-3  # The step size used by the optimizer

# Download MNIST Dataset

MNIST is a database of 70,000 handwritten digits (0 to 9), split into 60,000 training images and 10,000 test images, that is often used as a benchmark for image classification.

In [179]:
train_dataset = dsets.MNIST(root='./data',
                           train=True,
                           transform=transforms.ToTensor(),
                           download=True)

test_dataset = dsets.MNIST(root='./data',
                           train=False,
                           transform=transforms.ToTensor())

# Load the Dataset

**Note**: We shuffle `train_loader` so that the learning process does not depend on the order of the data. `test_loader` is left unshuffled, because the order of the inputs does not affect the accuracy measured at evaluation time.


In [180]:
train_loader = torch.utils.data.DataLoader(dataset=train_dataset,
                                          batch_size=batch_size,
                                          shuffle=True)

test_loader = torch.utils.data.DataLoader(dataset=test_dataset,
                                          batch_size=batch_size,
                                          shuffle=False)
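
As a rough illustration of what shuffling and batching amount to, the sketch below mimics the idea in plain Python. `iterate_batches` is a hypothetical helper, not part of PyTorch, and the real `DataLoader` does considerably more (for example, collating samples into tensors and reshuffling at every epoch).

```python
import random

def iterate_batches(num_samples, batch_size, shuffle, seed=0):
    """Yield lists of sample indices, one list per batch (hypothetical helper)."""
    indices = list(range(num_samples))
    if shuffle:
        random.Random(seed).shuffle(indices)       # Fixed seed only to keep the sketch reproducible
    for start in range(0, num_samples, batch_size):
        yield indices[start:start + batch_size]

# 60,000 training images in batches of 100 give 600 iterations per epoch
train_batches = list(iterate_batches(60000, 100, shuffle=True))
test_batches = list(iterate_batches(10000, 100, shuffle=False))
print(len(train_batches), len(test_batches))       # prints: 600 100
```

The 600 iterations per epoch match the `Step [.../600]` counter printed during training.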

# Build the Feedforward Neural Network

### Feedforward Neural Network Model Structure

The FNN includes three fully-connected layers (`fc1`, `fc3` and `fc2`, applied in that order) with a non-linear ReLU layer after each of the first two. We call this structure a **2-hidden-layer FNN**, since the output layer (`fc2`) is not counted.

Running the forward pass sends the input images (x) through the network and produces an output (out) holding one score per class, which indicates how likely the image is to belong to each of the 10 classes. _For example, a cat image could get a score of 0.8 for the dog class and a score of 0.3 for the airplane class._
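
The raw outputs of the network are unnormalised scores (often called logits). One common way to turn such scores into probabilities that sum to 1 is the softmax function, sketched here in plain Python. The scores are made-up values for illustration, not outputs of the model in this notebook.

```python
import math

def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(scores)                                # Subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

scores = [2.0, 1.0, 0.1]                           # Hypothetical scores for three classes
probs = softmax(scores)
print([round(p, 3) for p in probs])
```

The class with the highest score also gets the highest probability, so the predicted class is the same whether or not softmax is applied.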

In [181]:
class Net(nn.Module):
    def __init__(self, input_size, hidden_size1, hidden_size2, num_classes):
        super(Net, self).__init__()                      # Initialize the parent class nn.Module
        self.fc1 = nn.Linear(input_size, hidden_size1)   # 1st Fully-Connected Layer: 784 (input data) -> 500 (hidden node)
        self.relu = nn.ReLU()                            # Non-Linear ReLU Layer: max(0,x)
        self.fc3 = nn.Linear(hidden_size1, hidden_size2) # 2nd Fully-Connected Layer: 500 (hidden node) -> 250 (hidden node)
        self.relu2 = nn.ReLU()                           # Second Non-Linear ReLU Layer
        self.fc2 = nn.Linear(hidden_size2, num_classes)  # Output Fully-Connected Layer: 250 (hidden node) -> 10 (output class)
    
    def forward(self, x):                              # Forward pass: stacking each layer together
        out = self.fc1(x)
        out = self.relu(out)
        out = self.fc3(out)
        out = self.relu2(out)
        out = self.fc2(out)
        return out
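
To get a feel for the size of this network, the sketch below counts its trainable parameters by hand, assuming the layer sizes set in the hyper-parameters (784, 500, 250, 10). `linear_params` is a hypothetical helper introduced only for this calculation.

```python
def linear_params(in_features, out_features):
    """Weights plus biases of one fully connected layer."""
    return in_features * out_features + out_features

layer_sizes = [784, 500, 250, 10]                  # input -> hidden 1 -> hidden 2 -> output
per_layer = [linear_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
print(per_layer, sum(per_layer))                   # prints: [392500, 125250, 2510] 520260
```

Most of the parameters sit in the first layer, which is why changing `hidden_size1` has the largest effect on model size.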

# Instantiate the FNN

We now create an instance of the FNN based on the structure defined above.

In [182]:
net = Net(input_size, hidden_size1, hidden_size2, num_classes)

# Enable GPU

_**Note**: Set `use_cuda = True` to run the code on a GPU when one is available_

In [183]:
use_cuda = True

In [184]:
if use_cuda and torch.cuda.is_available():
    net.cuda()

# Choose the Loss Function and Optimizer

The loss function (**criterion**) measures how far the network's output is from the correct class, which tells us how well or badly the network is performing. The **optimizer** decides how the weights are updated so that training converges toward the best weights for this network.

In [185]:
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(net.parameters(), lr=learning_rate)
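
For a single sample, the cross-entropy loss on raw scores is the negative log of the softmax probability assigned to the correct class. The plain-Python sketch below shows that calculation with made-up scores; `nn.CrossEntropyLoss` performs the same computation on tensors and, by default, averages it over the batch.

```python
import math

def cross_entropy(scores, label):
    """Negative log-probability of the correct class, computed from raw scores."""
    m = max(scores)                                # Subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_sum_exp - scores[label]

scores = [2.0, 1.0, 0.1]                           # Hypothetical scores for three classes
loss_right = cross_entropy(scores, 0)              # True class has the highest score
loss_wrong = cross_entropy(scores, 2)              # True class has the lowest score
print(round(loss_right, 4), round(loss_wrong, 4))
```

The loss is small when the correct class already has the highest score and grows as the correct class is scored lower.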

# Training the FNN Model

This process might take around 3 to 5 minutes depending on your machine. Detailed explanations are given as comments (#) in the following code.

In [186]:
for epoch in range(num_epochs):
    for i, (images, labels) in enumerate(train_loader):   # Load a batch of images and labels, together with the batch index
        images = images.view(-1, 28*28)                   # Flatten each 28 x 28 image into a vector of size 784 (Variable is no longer needed)
        
        if use_cuda and torch.cuda.is_available():
            images = images.cuda()
            labels = labels.cuda()
        
        optimizer.zero_grad()                             # Reset the gradients left over from the previous iteration
        outputs = net(images)                             # Forward pass: compute the class scores for each image
        loss = criterion(outputs, labels)                 # Compute the loss: difference between the class scores and the given labels
        loss.backward()                                   # Backward pass: compute the gradients of the loss w.r.t. the weights
        optimizer.step()                                  # Optimizer: update the weights using the gradients
        
        if (i+1) % 100 == 0:                              # Logging
            print('Epoch [%d/%d], Step [%d/%d], Loss: %.4f'
                 %(epoch+1, num_epochs, i+1, len(train_dataset)//batch_size, loss.item()))


Epoch [1/5], Step [100/600], Loss: 0.2622
Epoch [1/5], Step [200/600], Loss: 0.2559
Epoch [1/5], Step [300/600], Loss: 0.2385
Epoch [1/5], Step [400/600], Loss: 0.1566
Epoch [1/5], Step [500/600], Loss: 0.1771
Epoch [1/5], Step [600/600], Loss: 0.2524
Epoch [2/5], Step [100/600], Loss: 0.1553
Epoch [2/5], Step [200/600], Loss: 0.1212
Epoch [2/5], Step [300/600], Loss: 0.0581
Epoch [2/5], Step [400/600], Loss: 0.0878
Epoch [2/5], Step [500/600], Loss: 0.0511
Epoch [2/5], Step [600/600], Loss: 0.0334
Epoch [3/5], Step [100/600], Loss: 0.0283
Epoch [3/5], Step [200/600], Loss: 0.1541
Epoch [3/5], Step [300/600], Loss: 0.0661
Epoch [3/5], Step [400/600], Loss: 0.0564
Epoch [3/5], Step [500/600], Loss: 0.1814
Epoch [3/5], Step [600/600], Loss: 0.0423
Epoch [4/5], Step [100/600], Loss: 0.0050
Epoch [4/5], Step [200/600], Loss: 0.0193
Epoch [4/5], Step [300/600], Loss: 0.0989
Epoch [4/5], Step [400/600], Loss: 0.0324
Epoch [4/5], Step [500/600], Loss: 0.0579
Epoch [4/5], Step [600/600], Loss:
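
PyTorch adds new gradients to the ones already stored rather than overwriting them, which is why the training loop clears them at the start of every iteration. A minimal sketch with a single made-up parameter `w` (not part of the model above):

```python
import torch

w = torch.tensor(2.0, requires_grad=True)          # Hypothetical stand-alone parameter

(3 * w).backward()
first = w.grad.item()                              # d(3w)/dw = 3

(3 * w).backward()                                 # A second call adds to the stored gradient
accumulated = w.grad.item()                        # 3 + 3 = 6

w.grad.zero_()                                     # Reset, similar in effect to optimizer.zero_grad()
cleared = w.grad.item()

print(first, accumulated, cleared)                 # prints: 3.0 6.0 0.0
```

Without the reset, each update would use the sum of the gradients from all previous batches instead of the current one.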

# Testing the FNN Model

Similar to training the neural network, we load batches of test images and collect the outputs. The differences are:

1. No loss or gradient calculation
2. No weight updates
3. The number of correct predictions is counted


In [187]:
correct = 0
total = 0
with torch.no_grad():                                  # Gradients are not needed for evaluation
    for images, labels in test_loader:
        images = images.view(-1, 28*28)                # Flatten each 28 x 28 image into a vector of size 784
        
        if use_cuda and torch.cuda.is_available():
            images = images.cuda()
            labels = labels.cuda()
        
        outputs = net(images)
        _, predicted = torch.max(outputs, 1)           # Choose the best class from the output: the class with the highest score
        total += labels.size(0)                        # Increment the total count
        correct += (predicted == labels).sum().item()  # Increment the correct count (as a Python int)
    
print('Accuracy of the network on the 10K test images: %d %%' % (100 * correct // total))

Accuracy of the network on the 10K test images: 98 %
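
The accuracy calculation boils down to taking the highest-scoring class for each image and counting how often it matches the label. The sketch below shows the same idea in plain Python on made-up scores; `accuracy` is a hypothetical helper, not a PyTorch function.

```python
def accuracy(score_rows, labels):
    """Fraction of rows whose highest-scoring class matches the label."""
    correct = 0
    for scores, label in zip(score_rows, labels):
        predicted = max(range(len(scores)), key=lambda c: scores[c])   # argmax
        correct += int(predicted == label)
    return correct / len(labels)

# Hypothetical scores for four samples over three classes
score_rows = [[0.1, 2.0, 0.3], [1.5, 0.2, 0.1], [0.0, 0.4, 0.9], [0.8, 0.1, 0.3]]
labels = [1, 0, 2, 1]
print(accuracy(score_rows, labels))                # prints: 0.75
```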


# Save the trained FNN Model for future use

We can save the trained model's parameters (its `state_dict`) to a file that can be loaded and used later. Uncomment the line below to do so.

In [188]:
#torch.save(net.state_dict(), 'fnn_model.pkl')
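
As a hedged sketch of the full round trip, the example below saves a `state_dict` and loads it back into a fresh model. It uses a small `nn.Linear` as a stand-in for the trained network and a temporary directory as the save location; both are assumptions made to keep the example self-contained.

```python
import os
import tempfile

import torch
import torch.nn as nn

model = nn.Linear(4, 2)                            # Small stand-in for the trained network
path = os.path.join(tempfile.mkdtemp(), 'fnn_model.pkl')

torch.save(model.state_dict(), path)               # Save only the parameters

restored = nn.Linear(4, 2)                         # Must have the same architecture
restored.load_state_dict(torch.load(path))         # Copy the saved parameters into the new model
restored.eval()                                    # Switch to inference mode

same = all(torch.equal(a, b) for a, b in
           zip(model.state_dict().values(), restored.state_dict().values()))
print(same)                                        # prints: True
```

For the network in this notebook, the restored model would be created with `Net(input_size, hidden_size1, hidden_size2, num_classes)` before calling `load_state_dict`.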

# Congrats

You have finished building your first Feedforward Neural Network!

With the default setup of 1 hidden layer, the network consistently reached about 97%-98% accuracy after 5 epochs. After adding a second hidden layer with its own ReLU layer, and both increasing and decreasing the number of nodes in the hidden layers, the accuracy after 5 epochs stayed the same. However, I did notice that larger hidden layers slowed training down, while smaller ones sped it up. Furthermore, slightly increasing or decreasing the learning rate reduced accuracy by about 3%, so I left it at the default.
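
To illustrate why the learning rate matters, here is a sketch that uses plain gradient descent on the one-dimensional function f(w) = w², not the Adam optimizer or the network used above. The step sizes are chosen purely for illustration.

```python
def gradient_descent(lr, steps=20, w=1.0):
    """Minimise f(w) = w**2 with plain gradient descent; the gradient is 2*w."""
    for _ in range(steps):
        w = w - lr * 2 * w
    return w

small = gradient_descent(lr=0.01)                  # Too small: moves slowly toward the minimum at 0
good = gradient_descent(lr=0.3)                    # Reasonable: converges quickly
large = gradient_descent(lr=1.1)                   # Too large: overshoots and diverges
print(abs(small), abs(good), abs(large))
```

After 20 steps the middle setting is closest to the minimum, while the other two are either still far away or moving further from it.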