# Final Project - Eric Steele, Liam Tiemon, Mate Virag
### Variant 2 - Analysis and Comparison of Gluon, PyTorch and Tensorflow
May 4, 2018  
Dr. Kevin Kirby  
CSC 494

# Installation 
___

### Gluon
___

#### For Mac:
1. Open a terminal
    * Optional: create a new virtual environment
2. Run these commands
    * pip install gluon
    * pip install mxnet

#### For Windows:
1. Go to http://landinghub.visualstudio.com/visual-cpp-build-tools, then download and install the C++ compiler (Visual C++ Build Tools).
2. Open the Anaconda Prompt
    * Optional: create a new virtual environment
3. Run these commands
    * pip install gluon
    * pip install mxnet
    
___

### PyTorch
___
#### For Mac:
1. Open a terminal
    * Optional: create a new virtual environment
2. Go to PyTorch's website (http://pytorch.org) and specify your desired configuration
3. Run the generated pip or conda command to install PyTorch

#### For Windows:
1. Open the Anaconda Prompt
    * Optional: create a new virtual environment
2. Go to PyTorch's website (http://pytorch.org) and specify your desired configuration
3. Run the generated pip or conda command to install PyTorch
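After either installation, a quick way to confirm that the packages are importable from the active environment is a short Python check. This is a minimal sketch: it only tests whether the top-level `mxnet` and `torch` modules can be found, using the standard library's `importlib.util.find_spec`, and does not verify versions or GPU support. The helper name `check_installed` is hypothetical.

```python
import importlib.util


def check_installed(module_names):
    """Map each top-level module name to True if it can be found in this environment."""
    return {name: importlib.util.find_spec(name) is not None for name in module_names}


if __name__ == "__main__":
    # Hypothetical check for the two frameworks installed above
    for name, found in check_installed(["mxnet", "torch"]).items():
        print(f"{name}: {'found' if found else 'NOT found'}")
```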

# Documentation 

---

## Gluon

* http://gluon.mxnet.io contains most of the information needed to get started, along with code examples and good documentation.
    * This site got our project up and running. Following its tutorial gave us a working CNN, which we then adapted to our needs.
* http://mxnet.incubator.apache.org/api/python/index.html contains tutorials and documentation for the APIs.
    * Its API documentation helped us determine which APIs we needed for the layers in our CNN.
    
## PyTorch

* Beyond the official documentation from the developers and a few GitHub repositories, there isn't much PyTorch documentation available.
* PyTorch's [website](https://pytorch.org/tutorials/index.html) has enough code examples to get you started, but not enough to get a CNN working well.
* https://github.com/utkuozbulak/pytorch-custom-dataset-examples
    * This GitHub repository was a huge help in getting NkuMyaDevMaker.py up and running with PyTorch's network.
* https://github.com/pytorch/examples
    * The official PyTorch examples repository was useful for implementing layer connections, defining the network, adding dropout, and using activation functions.

# Ease of Use
___

## Network

___

#### Gluon

```python
# Imports
import mxnet as mx
from mxnet import gluon, autograd, ndarray
import numpy as np
import matplotlib.pyplot as plt

# Initialize the model
net = gluon.nn.Sequential()

# Declare hyperparameters
convo1_kernels = 20
convo1_kernel_size = (5,5)
convo2_kernels = 40
convo2_kernel_size = (5,5)
pooling = 2

hidden1_neurons = 20
dropout_rate = 0.3
hidden2_neurons = 15

# Define our network
with net.name_scope():
    net.add(gluon.nn.Conv2D(channels=convo1_kernels, kernel_size=convo1_kernel_size, use_bias=True, activation='relu'))
    net.add(gluon.nn.MaxPool2D(pool_size=pooling, strides=pooling))
    net.add(gluon.nn.BatchNorm())
    net.add(gluon.nn.Conv2D(channels=convo2_kernels, kernel_size=convo2_kernel_size, use_bias=True, activation='relu'))
    net.add(gluon.nn.MaxPool2D(pool_size=pooling, strides=pooling))
    net.add(gluon.nn.Flatten())
    net.add(gluon.nn.Dense(hidden1_neurons, activation="relu", use_bias=True))
    net.add(gluon.nn.Dropout(dropout_rate))
    net.add(gluon.nn.Dense(hidden2_neurons, activation="relu", use_bias=True))
    net.add(gluon.nn.Dense(1, activation="sigmoid", use_bias=True)) # Output layer
```

Defining a network in Gluon is similar to defining one in TensorFlow. Gluon offers the same types of convolutional and dense layers, with very similar parameter lists. Like TensorFlow, it also provides batch normalization, dropout and flattening layers.

#### PyTorch

```python
# Imports used
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import datasets, transforms
from torch.utils.data.dataset import Dataset
from torch.autograd import Variable


class Net(nn.Module):
    """Class that defines the neural network."""

    def __init__(self, depth, nk, kernel_size, padding, hidden_neurons, nc):
        """Defines the layers in the neural network."""
        super(Net, self).__init__()
        # out_channels defines the number of kernels
        self.conv1 = nn.Conv2d(in_channels=depth, out_channels=nk[0], kernel_size=kernel_size, padding=padding)
        self.conv2 = nn.Conv2d(in_channels=nk[0], out_channels=nk[1], kernel_size=kernel_size)
        #self.conv2_drop = nn.Dropout2d()
        # nc is the image size after convolution and pooling
        self.fc1 = nn.Linear(nc * nc * nk[1], hidden_neurons[0])
        self.fc2 = nn.Linear(hidden_neurons[0], hidden_neurons[1])
        # Single value output
        self.fc3 = nn.Linear(hidden_neurons[1], 1)

    def forward(self, x, pooling):
        """
        Defines the connections and the activation functions between layers, pushes the
        input patterns through the network and returns the network's output.
        """
        # Max pooling over a square window with stride equal to the pool size to avoid overlaps
        # Activation functions are applied here, even for the convolutional layers
        x = F.max_pool2d(F.relu(self.conv1(x)), kernel_size=pooling, stride=pooling)
        x = F.max_pool2d(F.relu(self.conv2(x)), kernel_size=pooling, stride=pooling)
        #x = F.max_pool2d(F.relu(self.conv2_drop(self.conv2(x))), kernel_size=pooling, stride=pooling)
        x = x.view(-1, self.num_flat_features(x))
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = torch.sigmoid(self.fc3(x))  # F.sigmoid is deprecated in favor of torch.sigmoid
        return x

    def num_flat_features(self, x):
        """
        Calculates the size of the flat array for the input of the first dense layer
        after the last convolutional layer.
        """
        size = x.size()[1:]  # All dimensions except the batch dimension
        num_features = 1
        for s in size:
            num_features *= s
        return num_features
```

Defining a network in PyTorch is quite different from Gluon or TensorFlow because PyTorch is a lower-level framework than the other two we used. The standard way to create a neural network in PyTorch is to implement a class that inherits from `torch.nn.Module`; our class uses three methods to define the network. The `__init__` method defines the layers, but at a much lower level than TensorFlow or Gluon. For example, PyTorch requires the user to specify the number of input channels for every layer (and the flattened input size of the first dense layer), while TensorFlow and Gluon compute these automatically. Furthermore, PyTorch's layers do not take an activation function as a parameter. Instead, the `forward` method must pass each layer's output through an activation function explicitly before using it as the input of the next layer. The third method, `num_flat_features`, computes the size of the flattened array fed into the first dense layer.
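Because `nn.Linear` needs its input size up front, the `nc` argument of the `Net` class has to be worked out by hand. For convolution and pooling without dilation, the spatial output size is `floor((size + 2 * padding - kernel) / stride) + 1`. The sketch below applies that formula to the layer order used in `forward` (padded conv1, pool, unpadded conv2, pool). The helper names and the 28x28 input, 5x5 kernels, padding of 2, pool size of 2 and 40 conv2 kernels are illustrative assumptions, not values from our runs; substitute the real image size and hyperparameters.

```python
def conv_output_size(size, kernel_size, stride=1, padding=0):
    """Spatial output size of a conv or pooling layer (square input, no dilation, floor mode)."""
    return (size + 2 * padding - kernel_size) // stride + 1


def flattened_size(image_size, kernel_size, padding, pooling, out_channels):
    """Trace conv1 (padded) -> pool -> conv2 (unpadded) -> pool, mirroring Net.forward.

    Returns (nc, flat): nc is the final spatial size and flat = nc * nc * out_channels,
    which is the in_features value that fc1 needs.
    """
    size = conv_output_size(image_size, kernel_size, padding=padding)  # conv1
    size = conv_output_size(size, pooling, stride=pooling)             # max pool
    size = conv_output_size(size, kernel_size)                         # conv2, no padding
    nc = conv_output_size(size, pooling, stride=pooling)               # max pool
    return nc, nc * nc * out_channels


# Hypothetical settings, for illustration only
nc, flat = flattened_size(image_size=28, kernel_size=5, padding=2, pooling=2, out_channels=40)
print(nc, flat)  # 5 1000
```

Gluon's `Conv2D` and `Dense` layers infer their input sizes on the first forward pass, which is why the Gluon network above never needs this calculation.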

### Built-in
___

#### Gluon
* Batching is handled automatically based on a batch size parameter passed to the DataLoader.
* Gluon includes a wide variety of pre-implemented basic, convolutional, pooling and activation layers, with convolution and pooling available in 1D, 2D and 3D variants.
* Input channels are automatically computed between layers.
* Batch normalization and dropout layers are pre-implemented.
* The flattening calculation is done automatically.
* Activation functions can be passed directly to layers, much like in TensorFlow.
* Layer constructors offer a wide range of parameter options.

#### PyTorch
* Lower level than TensorFlow and Gluon
* Batching is handled automatically based on a batch size parameter passed to the DataLoader.

### Implemented
___

#### Accuracy Function

```python
def accuracy(predictions, labels):
    """
    Accuracy function for a two-class classifier. Receives real numbers where one class
    is associated with 0.0 and the other with 1.0. A prediction within 0.33 of the
    label is considered a correct result. The function returns the number of
    correct classifications across a batch of predictions and labels.
    """
    # Convert mxnet NDArrays to numpy arrays and take the single output column
    pred = predictions.asnumpy()[:, 0]
    lab = labels.asnumpy()[:, 0]
    correct = 0
    for i in range(len(pred)):
        if abs(pred[i] - lab[i]) < 0.33:
            correct += 1
    return correct
```

When designing our own network in Gluon, we initially ran into difficulty with the accuracy function. The built-in `mx.metric.Accuracy` is intended for one-hot outputs, but we designed our network with a single output to match the design of the TensorFlow network from HW4. This mismatch caused our reported accuracy to stay at 50%. Once we identified the issue, we wrote our own accuracy function to match the formula used in TfCnn-MyaDev_For_HW4.py, and with it we could see that the networks were in fact learning. We were able to use this function with both Gluon and PyTorch.
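One likely explanation for the constant 50%, assuming roughly balanced classes, is that a one-hot style metric takes the argmax over each output row. With a single sigmoid output, every row has one column, so the argmax is always index 0, every example is predicted as class 0, and the score equals the fraction of class-0 labels. The pure-Python sketch below uses made-up predictions to contrast that behavior with the tolerance rule in our `accuracy` function. It does not call MXNet; `argmax_accuracy` is a hypothetical helper that only imitates the argmax step.

```python
def threshold_accuracy(predictions, labels, tolerance=0.33):
    """Count predictions within `tolerance` of their 0.0/1.0 label (the rule used above)."""
    return sum(1 for p, y in zip(predictions, labels) if abs(p - y) < tolerance)


def argmax(row):
    """Index of the largest value in a row of class scores."""
    return max(range(len(row)), key=lambda i: row[i])


def argmax_accuracy(predictions, labels):
    """Imitate a one-hot metric by treating each single output as a one-column score row."""
    return sum(1 for p, y in zip(predictions, labels) if argmax([p]) == int(y))


# Made-up sigmoid outputs and balanced 0/1 labels
preds = [0.9, 0.1, 0.8, 0.2, 0.6, 0.45]
labels = [1.0, 0.0, 1.0, 0.0, 1.0, 0.0]

print(threshold_accuracy(preds, labels), "of", len(preds))  # 4 of 6
print(argmax_accuracy(preds, labels) / len(preds))          # 0.5, whatever preds are
```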

#### Dataset and DataLoader

```python
import mxnet as mx

class MyaDevDataset(mx.gluon.data.Dataset):
    def __init__(self, X, Y, transform=None):
        self.X = X                  # NkuMyaDevMaker images
        self.Y = Y                  # NkuMyaDevMaker labels
        self.transform = transform  # Transformation function (optional)

    def __len__(self):
        return self.Y.shape[0]

    def __getitem__(self, idx):
        item = (self.X[idx], self.Y[idx])

        if self.transform:
            item = self.transform(item)

        return item
```



We initially tried to pass our training and test images to the DataLoader class as NumPy arrays, but had difficulty identifying the correct shape for those arrays, so we wrote a subclass of the Dataset class instead. Dataset is very straightforward: the data can be passed to `__init__` in any form, `__len__` returns the number of elements in the dataset, and `__getitem__` returns a single element by index (optionally with a transformation applied).

This Dataset is then passed to the DataLoader class along with a batch size. The DataLoader is iterable and handles minibatching: each iteration returns a tuple of data and labels from the Dataset, containing one batch of the specified size.
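To make the division of labor concrete, here is a framework-free sketch of the same protocol: a dataset exposing `__len__` and `__getitem__`, and a loader that uses only those two methods to assemble batches. `ListDataset` and `simple_batches` are hypothetical names for illustration. The real Gluon and PyTorch `DataLoader` classes also support shuffling, worker processes and stacking batches into framework arrays, none of which is modeled here.

```python
class ListDataset:
    """Hypothetical dataset following the same __len__/__getitem__ protocol as MyaDevDataset."""

    def __init__(self, X, Y, transform=None):
        self.X = X
        self.Y = Y
        self.transform = transform

    def __len__(self):
        return len(self.Y)

    def __getitem__(self, idx):
        item = (self.X[idx], self.Y[idx])
        return self.transform(item) if self.transform else item


def simple_batches(dataset, batch_size):
    """Yield (data, labels) tuples of up to batch_size items, in order, without shuffling."""
    for start in range(0, len(dataset), batch_size):
        stop = min(start + batch_size, len(dataset))
        items = [dataset[i] for i in range(start, stop)]
        data, labels = zip(*items)
        yield list(data), list(labels)


ds = ListDataset(X=["img0", "img1", "img2", "img3", "img4"], Y=[0, 1, 0, 1, 1])
for data, labels in simple_batches(ds, batch_size=2):
    print(data, labels)
# ['img0', 'img1'] [0, 1]
# ['img2', 'img3'] [0, 1]
# ['img4'] [1]
```

The last batch is smaller because five items do not divide evenly into batches of two; this sketch keeps it, which matches the documented defaults of Gluon's `last_batch='keep'` and PyTorch's `drop_last=False`. With the real class, the equivalent call would look like `mx.gluon.data.DataLoader(MyaDevDataset(X, Y), batch_size=32, shuffle=True)`, where the batch size of 32 is only an example.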

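Gluon and PyTorch use a very similar Dataset and DataLoader design; in both frameworks, batching is controlled by a parameter passed to the data loader.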

#### PyTorch
* We had to write a class to specify the connections between layers, and calculate by hand the input sizes of the layers that follow the max pooling and flattening steps.

# Results

## Reimplementing TensorFlow Network

Running a Gluon implementation of the network from TfCnn-MyaDev_For_HW4.py with the file's default hyperparameters consistently gave around 95% accuracy. This is roughly consistent with the results of running the network in TensorFlow.

The PyTorch implementation of the TfCnn-MyaDev_For_HW4.py network reached around 94% accuracy when training succeeded, but in roughly one run out of five it failed to improve.

## Designing New Networks

For our own network design in Gluon, we initially used SGD as our training algorithm and Mean Squared Error as the loss. This was effective but slow to produce results. To make the network learn more quickly, we added a batch normalization layer, a second convolution layer, a dropout layer, and a second hidden layer. There was some improvement, but training was still slow. We then tried the Adam training algorithm instead and found that it trained very quickly, hitting its accuracy threshold and stopping almost immediately. We changed the network to run for the maximum number of epochs and found that, rather than overfitting, it consistently reached accuracies above 98%.
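The difference between stopping at an accuracy threshold and always running the maximum number of epochs comes down to one condition in the training loop. The sketch below assumes a hypothetical `run_epoch` callable that trains for one epoch and returns the current accuracy; the accuracy curve is invented purely to show the control flow and is not taken from our runs.

```python
def train(run_epoch, max_epochs, accuracy_threshold=None):
    """Train for up to max_epochs, stopping early once accuracy_threshold is reached.

    run_epoch(epoch) is a hypothetical callable that trains one epoch and returns accuracy.
    Pass accuracy_threshold=None to always run all max_epochs.
    """
    history = []
    for epoch in range(max_epochs):
        acc = run_epoch(epoch)
        history.append(acc)
        if accuracy_threshold is not None and acc >= accuracy_threshold:
            break  # early stop: threshold reached
    return history


# Invented accuracy curve for a fast-learning optimizer
fake_curve = [0.90, 0.96, 0.97, 0.98, 0.985]
early = train(lambda e: fake_curve[e], max_epochs=5, accuracy_threshold=0.95)
full = train(lambda e: fake_curve[e], max_epochs=5)
print(len(early), len(full))  # 2 5
```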

In PyTorch we started with Adam as the training algorithm. However, we began with Mean Absolute Error, mistakenly thinking that PyTorch did not offer Mean Squared Error because the PyTorch version of that function is named differently than in Gluon. Once we identified this, we switched to MSE and got significantly better results. We also gradually increased the training set size, which helped as well. Our final network reached around 97% accuracy, although its accuracy fluctuated more than that of our Gluon network.
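For reference, the two losses differ only in how they penalize each error. PyTorch provides them as `torch.nn.MSELoss` and `torch.nn.L1Loss` (mean absolute error); Gluon's closest counterparts are `gluon.loss.L2Loss`, which computes half the squared error, and `gluon.loss.L1Loss`. The plain-Python sketch below computes both losses on a few made-up predictions so the numbers can be checked by hand; the function names are hypothetical.

```python
def mean_squared_error(preds, targets):
    """Average squared difference (what torch.nn.MSELoss computes with its default 'mean' reduction)."""
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(preds)


def mean_absolute_error(preds, targets):
    """Average absolute difference (torch.nn.L1Loss with its default 'mean' reduction)."""
    return sum(abs(p - t) for p, t in zip(preds, targets)) / len(preds)


# Made-up sigmoid outputs and 0/1 targets
preds = [0.9, 0.2, 0.6]
targets = [1.0, 0.0, 1.0]
print(round(mean_squared_error(preds, targets), 4))   # 0.07
print(round(mean_absolute_error(preds, targets), 4))  # 0.2333
```

Squaring shrinks small errors (a miss of 0.1 contributes 0.01) while a miss of 0.4 contributes 0.16, so MSE weights large errors more heavily relative to small ones than MAE does.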

Across both the existing network and our own designs, PyTorch's results were consistently less reliable than Gluon's.