<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Imports-and-Data-Loading" data-toc-modified-id="Imports-and-Data-Loading-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Imports and Data Loading</a></span><ul class="toc-item"><li><span><a href="#DataLoader-class" data-toc-modified-id="DataLoader-class-1.1"><span class="toc-item-num">1.1&nbsp;&nbsp;</span><a href="https://pytorch.org/docs/stable/data.html" target="_blank">DataLoader class</a></a></span></li></ul></li><li><span><a href="#Building-a-Neural--Network" data-toc-modified-id="Building-a-Neural--Network-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Building a Neural  Network</a></span><ul class="toc-item"><li><span><a href="#Activation-Functions" data-toc-modified-id="Activation-Functions-2.1"><span class="toc-item-num">2.1&nbsp;&nbsp;</span>Activation Functions</a></span></li></ul></li><li><span><a href="#Training" data-toc-modified-id="Training-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>Training</a></span><ul class="toc-item"><li><span><a href="#Loss-Function" data-toc-modified-id="Loss-Function-3.1"><span class="toc-item-num">3.1&nbsp;&nbsp;</span>Loss Function</a></span></li><li><span><a href="#Optimization" data-toc-modified-id="Optimization-3.2"><span class="toc-item-num">3.2&nbsp;&nbsp;</span>Optimization</a></span></li><li><span><a href="#GPU" data-toc-modified-id="GPU-3.3"><span class="toc-item-num">3.3&nbsp;&nbsp;</span>GPU</a></span></li><li><span><a href="#Training-Loop" data-toc-modified-id="Training-Loop-3.4"><span class="toc-item-num">3.4&nbsp;&nbsp;</span>Training Loop</a></span></li></ul></li><li><span><a href="#Predictions" data-toc-modified-id="Predictions-4"><span class="toc-item-num">4&nbsp;&nbsp;</span>Predictions</a></span></li><li><span><a href="#Saving-and-Loading-the-Model" data-toc-modified-id="Saving-and-Loading-the-Model-5"><span class="toc-item-num">5&nbsp;&nbsp;</span>Saving and Loading the Model</a></span></li><li><span><a href="#References" 
data-toc-modified-id="References-6"><span class="toc-item-num">6&nbsp;&nbsp;</span>References</a></span></li><li><span><a href="#Next-Steps:" data-toc-modified-id="Next-Steps:-7"><span class="toc-item-num">7&nbsp;&nbsp;</span>Next Steps:</a></span><ul class="toc-item"><li><span><a href="#(Computer-Vision)-Cat-Dog" data-toc-modified-id="(Computer-Vision)-Cat-Dog-7.1"><span class="toc-item-num">7.1&nbsp;&nbsp;</span>(Computer Vision) Cat Dog</a></span></li></ul></li></ul></div>

## Imports and Data Loading

In [1]:
# import PyTorch
import torch

# standard DS stack
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns; sns.set()
import pandas as pd
# embed static images in the ipynb
%matplotlib inline 

# neural network package
import torch.nn as nn 
import torch.nn.functional as F

# computer vision
import torchvision
from torchvision import transforms
from PIL import Image

# dataset loading
from torch.utils.data import Dataset, DataLoader, ConcatDataset

import copy
# import tqdm

As the focus of this tutorial is to understand some foundations of PyTorch and neural networks, we'll be using a small subset of a [dataset](#s0) suitable for a multivariate regression task. 

Dataset source: the Facebook Comment Volume Dataset from the UCI Machine Learning Repository.

In [27]:
exec(open("datasets/FB_comments/process_data.py").read())
[A.shape for A in [X_train, Y_train, X_test, Y_test]]

Original dataset shapes
Training set:(40949, 54), Testing set:(10044, 54)
Dataset shapes after PCA and random sampling
X_train.shape:(10237, 10), Y_train.shape:(10237,)
X_test.shape:(5022, 10), Y_test.shape:(5022,)


[(10237, 10), (10237,), (5022, 10), (5022,)]

### [DataLoader class](https://pytorch.org/docs/stable/data.html)

Every dataset, no matter what it contains, can interact with PyTorch if it implements the interface of the following abstract Python class:

```python
class Dataset(object):
    def __getitem__(self, idx):
        """ Retrieve an item from the dataset as a (tensor, label) pair.
        
        Args:
            idx: index
        """
        pass
        
    def __len__(self):
        """ Returns the size of the dataset (len)"""
        pass
```

This is referred to as a map-style dataset in the docs.

In [38]:
class FB_Dataset(Dataset): # inherit from torch's Dataset class.
    def __init__(self):
        # data loading
        # cast to float32 to match the default dtype of nn.Linear weights
        self.X = torch.from_numpy(np.vstack([X_train, X_test])).float()
        self.Y = torch.from_numpy(np.concatenate([Y_train, Y_test])).float()

        if self.X.shape[0] == self.Y.shape[0]:
            self.n_samples = self.X.shape[0]
        else:
            raise ValueError("Shape mismatch")
        
    def __getitem__(self, idx):
        return self.X[idx], self.Y[idx]
    
    def __len__(self):
        return self.n_samples
        # len(dataset)

fb_dataset = FB_Dataset()

In [None]:
dataloader = DataLoader(dataset=fb_dataset, batch_size=8, shuffle=True)
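
To make the batching concrete, here is a minimal sketch that uses synthetic data in place of the Facebook dataset. The tensor sizes, the 80/20 split, and the names `toy_dataset`, `train_loader`, and `val_loader` are assumptions for illustration only. It shows what a `DataLoader` yields on each iteration and one possible way to set aside a validation set with `random_split`.

```python
import torch
from torch.utils.data import TensorDataset, DataLoader, random_split

torch.manual_seed(0)

# Synthetic stand-in for the real data: 20 samples, 10 features, 1 target each.
features = torch.randn(20, 10)
targets = torch.randn(20)
toy_dataset = TensorDataset(features, targets)  # a ready-made map-style dataset

# Hypothetical 80/20 train/validation split.
train_set, val_set = random_split(toy_dataset, [16, 4])

train_loader = DataLoader(train_set, batch_size=8, shuffle=True)
val_loader = DataLoader(val_set, batch_size=8, shuffle=False)

# Each iteration yields one (inputs, targets) batch.
batch_inputs, batch_targets = next(iter(train_loader))
print(batch_inputs.shape, batch_targets.shape)
print(len(train_loader), len(val_loader))
```

Shuffling is usually only needed for the training loader; the validation loader can keep a fixed order.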

## Building a Neural  Network

In [23]:
class Net(nn.Module): # class inherits from nn.Module
    def __init__(self):
        super(Net, self).__init__() # initialize nn.Module
        # some fully connected layers w/ linear transformation
        """ nn.Linear(in_features, out_features, bias=True)
        Args:
            in_features: size of each input sample. For input shape (28, 28), 
                we would have in_features = 28 * 28 = 784
            out_features: size of each output sample.
        """
        self.fc1 = nn.Linear(10, 5)
        self.fc2 = nn.Linear(5, 3)
        self.fc3 = nn.Linear(3, 1)
    def forward(self, x): # defines the forward propagation
        x = F.relu(self.fc1(x)) # relu activation function
        x = F.relu(self.fc2(x))
        # Regression output: return the raw linear output of the last layer.
        # (log_softmax is meant for multi-class outputs; applied to a single
        # output unit along dim=1 it would always return 0.)
        return self.fc3(x)
network = Net()
print(network)

Net(
  (fc1): Linear(in_features=10, out_features=5, bias=True)
  (fc2): Linear(in_features=5, out_features=3, bias=True)
  (fc3): Linear(in_features=3, out_features=1, bias=True)
)


The network above is known as a **feedforward neural network** or **multilayer perceptron (MLP)** ([Goodfellow et al., Deep Learning Book](#s1)). It's called feedforward because there are no **feedback** connections in which the outputs of the model are fed back into previous layers.  

When feedforward neural networks are extended to include feedback connections, they are called **recurrent neural networks**. 

The **depth** of a network is defined as its number of layers (including the output layer but excluding the input layer), while the **width** of a network is defined as the maximal number of nodes in a layer. Depth is where the name "deep learning" comes from.

The structure of a network is typically called the **network architecture**, which includes how many layers the network contains, how the layers are connected to each other, and how many neurons (a.k.a. nodes, a.k.a. units) are in each layer.
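
As a rough illustration of these terms, the sketch below rebuilds the same layer sizes with `nn.Sequential` and counts depth, width, and trainable parameters. The name `mlp` is hypothetical, and measuring width over the hidden and output layers only (ignoring the 10 input features) is an assumption made here.

```python
import torch.nn as nn

# Same layer sizes as the tutorial's Net, written with nn.Sequential.
mlp = nn.Sequential(
    nn.Linear(10, 5), nn.ReLU(),
    nn.Linear(5, 3), nn.ReLU(),
    nn.Linear(3, 1),
)

linear_layers = [m for m in mlp if isinstance(m, nn.Linear)]

depth = len(linear_layers)  # hidden layers + output layer, input layer excluded
# Assumption: width is measured over hidden and output layers only.
width = max(layer.out_features for layer in linear_layers)
# Weights plus biases: (10*5 + 5) + (5*3 + 3) + (3*1 + 1)
n_params = sum(p.numel() for p in mlp.parameters())

print(depth, width, n_params)
```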

### Activation Functions

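Activation functions apply a nonlinear transformation to each layer's output when computing hidden layer values. Without them, a stack of linear layers would collapse into a single linear transformation.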

Here, we make use of the **rectified linear unit (ReLU)** as the activation function, using `F.relu`. This is the default recommendation for the activation function in modern deep learning. Here's [an article](https://machinelearningmastery.com/rectified-linear-activation-function-for-deep-learning-neural-networks/) about ReLU for your reference. 
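
ReLU simply computes `max(0, x)` element-wise. A small sketch with made-up input values, comparing `F.relu` against the same operation written by hand with `torch.clamp`:

```python
import torch
import torch.nn.functional as F

x = torch.tensor([-2.0, -0.5, 0.0, 1.5])

relu_out = F.relu(x)                  # built-in ReLU
manual_out = torch.clamp(x, min=0.0)  # max(0, x) written by hand

print(relu_out)
print(torch.equal(relu_out, manual_out))
```

Negative inputs are set to zero and positive inputs pass through unchanged.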

## Training 

### Loss Function 

Training a network involves passing data through the network and using the **loss function**, a criterion that measures the difference between a prediction and the actual target.

Below I'll include common loss functions for various supervised learning tasks:

**Regression**:
- Mean squared error: [`nn.MSELoss`](https://pytorch.org/docs/stable/generated/torch.nn.MSELoss.html#torch.nn.MSELoss)
- Mean absolute error, or L1: [`nn.L1Loss`](https://pytorch.org/docs/stable/generated/torch.nn.L1Loss.html#torch.nn.L1Loss)

**Binary Classification**:
- Binary cross-entropy: [`nn.BCELoss`](https://pytorch.org/docs/stable/generated/torch.nn.BCELoss.html#torch.nn.BCELoss)
- Binary cross-entropy with logits: [`nn.BCEWithLogitsLoss`](https://pytorch.org/docs/stable/generated/torch.nn.BCEWithLogitsLoss.html#torch.nn.BCEWithLogitsLoss) 

**Multi-class Classification**:
- Cross entropy : [`nn.CrossEntropyLoss`](https://pytorch.org/docs/stable/generated/torch.nn.CrossEntropyLoss.html#torch.nn.CrossEntropyLoss)
- Negative log likelihood: [`nn.NLLLoss`](https://pytorch.org/docs/stable/generated/torch.nn.NLLLoss.html#torch.nn.NLLLoss) 

In [None]:
# loss_fn = nn.MSELoss() # ex. regression
# loss_fn = nn.BCELoss() # ex. binary classification
# loss_fn = nn.CrossEntropyLoss() # ex. multi-class classification
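
To give a feel for how these criteria behave, here is a small sketch on made-up predictions and targets (all values are illustrative assumptions). It computes the two regression losses and also checks that `nn.CrossEntropyLoss` on raw scores matches `log_softmax` followed by `nn.NLLLoss`.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Regression: made-up predictions and targets.
pred = torch.tensor([2.5, 0.0, 2.0])
target = torch.tensor([3.0, -0.5, 2.0])
mse = nn.MSELoss()(pred, target)  # mean of squared errors
mae = nn.L1Loss()(pred, target)   # mean of absolute errors

# Multi-class: CrossEntropyLoss takes raw scores (logits) and is equivalent
# to applying log_softmax followed by NLLLoss.
logits = torch.tensor([[2.0, 0.5, -1.0], [0.1, 0.2, 3.0]])
labels = torch.tensor([0, 2])
ce = nn.CrossEntropyLoss()(logits, labels)
nll = nn.NLLLoss()(F.log_softmax(logits, dim=1), labels)

print(mse.item(), mae.item())
print(torch.allclose(ce, nll))
```

In the same spirit, `nn.BCEWithLogitsLoss` expects raw scores, while `nn.BCELoss` expects probabilities that have already passed through a sigmoid.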

### Optimization

The information about the loss function that is gained when we pass data through the network is used to update the weights of the network such that we minimize the loss function.
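
That "information" is the gradient of the loss with respect to each weight. The following sketch uses a hypothetical one-weight model `y = w * x + b` with made-up numbers and compares the gradients PyTorch computes against the chain rule worked out by hand.

```python
import torch

# One-weight "network": y = w * x + b, with made-up numbers.
w = torch.tensor(2.0, requires_grad=True)
b = torch.tensor(1.0, requires_grad=True)
x, t = torch.tensor(3.0), torch.tensor(10.0)

y = w * x + b        # forward pass: 2 * 3 + 1 = 7
loss = (y - t) ** 2  # squared error: (7 - 10)^2 = 9
loss.backward()      # backward pass fills in w.grad and b.grad

# Chain rule by hand: dL/dw = 2 * (y - t) * x, dL/db = 2 * (y - t)
print(w.grad, b.grad)
```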

In order to perform the updates on the neural network, we use an optimizer. 

In [6]:
# Implement Adam optimizer
optimizer = torch.optim.Adam(network.parameters(), lr=0.01)

TODO: Implement loss function

TODO: Explain choice of loss function. 

The **learning rate**, `lr`, is often a key parameter that needs to be tweaked in order to get a network to learn properly and efficiently.

Adaptive moment estimation (Adam) and stochastic gradient descent (SGD) are among the most widely used optimizers for deep learning networks. 

I decided to use Adam for this tutorial because it, along with RMSProp and AdaGrad, uses an adaptive learning rate: the size of each parameter's update is scaled individually, based on that parameter's gradient history. 
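
To illustrate how `lr` and the choice of optimizer affect an update, the sketch below takes a single step on the toy loss `w**2` starting from `w = 1`. The helper `one_step` and the toy loss are assumptions made for this example. With plain SGD the step is `lr` times the gradient, while the first Adam step has a size of roughly `lr` regardless of the gradient's magnitude.

```python
import torch

def one_step(optimizer_cls, lr):
    """Take a single optimizer step on loss = w**2, starting from w = 1."""
    w = torch.tensor([1.0], requires_grad=True)
    optimizer = optimizer_cls([w], lr=lr)
    optimizer.zero_grad()
    loss = (w ** 2).sum()
    loss.backward()   # gradient is 2 * w = 2
    optimizer.step()  # apply the optimizer's update rule
    return w.item()

w_sgd = one_step(torch.optim.SGD, lr=0.1)    # 1 - 0.1 * 2 = 0.8
w_adam = one_step(torch.optim.Adam, lr=0.1)  # first Adam step moves by about lr

print(w_sgd, w_adam)
```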

### GPU

PyTorch, by default, does CPU-based calculations. To take advantage of the GPU, the input tensors and model need to be moved to the GPU explicitly with the `to()` method.

`network` is simply an instance of the neural network class written above. 

In [4]:
# GPU Recipe:
if torch.cuda.is_available(): # If PyTorch reports that GPU is available
    device = torch.device("cuda") # device = GPU
else:
    device = torch.device("cpu") # device = CPU

network.to(device) # Copy model to device

Net(
  (fc1): Linear(in_features=10, out_features=5, bias=True)
  (fc2): Linear(in_features=5, out_features=3, bias=True)
  (fc3): Linear(in_features=3, out_features=1, bias=True)
)
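
The model and its inputs have to live on the same device. A minimal sketch, using a hypothetical single-layer model and a random batch, that runs on either CPU or GPU:

```python
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = nn.Linear(10, 1).to(device)    # parameters now live on `device`
batch = torch.randn(4, 10).to(device)  # inputs must be on the same device

output = model(batch)
print(output.device)
```

Note that `Tensor.to` returns a new tensor, so its result has to be assigned, whereas `Module.to` moves the module's parameters in place.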

### Training Loop

First, some terminology. 

An **epoch** is a single forward and backward pass through ALL of the samples.

A **batch**, then, refers to a subset of the total dataset, where `batch_size` is the number of samples used in one forward and backward pass.

The **number of iterations** is the number of passes (one forward plus one backward pass counts as one pass) needed to complete a single epoch, with each pass using `batch_size` samples.

In other words, if we have 50,000 samples and `batch_size=25`, then there are 50,000/25 == 2000 iterations in 1 epoch.
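
The same arithmetic can be checked in code. This sketch uses placeholder tensors of zeros (an assumption; only the sample count matters here) and relies on `len()` of a `DataLoader`, which gives the number of batches per epoch.

```python
import math
import torch
from torch.utils.data import TensorDataset, DataLoader

n_samples, batch_size = 50_000, 25
iterations_per_epoch = math.ceil(n_samples / batch_size)

# len() of a DataLoader gives the number of batches per epoch.
loader = DataLoader(TensorDataset(torch.zeros(n_samples, 1)), batch_size=batch_size)

# When batch_size does not divide n_samples, the last batch is smaller
# (unless drop_last=True), so the count rounds up: ceil(10 / 4) = 3.
uneven_loader = DataLoader(TensorDataset(torch.zeros(10, 1)), batch_size=4)

print(iterations_per_epoch, len(loader), len(uneven_loader))
```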

TODO: Explain backpropagation at high level.

TODO: Add practical explanation to code. 

In [20]:
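def train(network, loss_fn, train_loader, val_loader,
          n_epochs, optimizer=optimizer, device=device):
    for epoch in range(n_epochs):
        train_loss = 0.0
        val_loss = 0.0

        network.train() # training mode
        for batch in train_loader:
            optimizer.zero_grad() # clears gradient buffers of all parameters
            inputs, target = batch
            # transfer batch data to computation device
            inputs = inputs.to(device)
            target = target.to(device)
            # assumes a single-output regression network:
            # (batch_size, 1) -> (batch_size,) so the output matches the target shape
            output = network(inputs).squeeze(1)
            loss = loss_fn(output, target)
            # back propagation
            loss.backward()
            optimizer.step() # update model weights
            train_loss += loss.item()
        train_loss /= len(train_loader) # mean loss per training batch

        network.eval() # evaluation mode
        with torch.no_grad(): # no gradients needed for validation
            for inputs, target in val_loader:
                inputs = inputs.to(device)
                target = target.to(device)
                val_loss += loss_fn(network(inputs).squeeze(1), target).item()
        val_loss /= len(val_loader) # mean loss per validation batch

        print(f"Epoch {epoch + 1}: train loss = {train_loss:.4f}, "
              f"val loss = {val_loss:.4f}")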

## Predictions 
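
A minimal sketch of how predictions are commonly made in PyTorch, assuming a stand-in model with the same input and output sizes as `Net` and four made-up samples. The names `model` and `new_samples` are hypothetical, and the model here is untrained.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-in for a trained model with the same input/output sizes as Net.
model = nn.Sequential(nn.Linear(10, 5), nn.ReLU(), nn.Linear(5, 1))
new_samples = torch.randn(4, 10)  # four made-up samples, 10 features each

model.eval()           # inference mode (matters for layers like dropout; none here)
with torch.no_grad():  # no gradient tracking needed for prediction
    predictions = model(new_samples).squeeze(1)  # (4, 1) -> (4,)

print(predictions.shape)
```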

## Saving and Loading the Model 
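
One common approach is to save only the model's `state_dict` (its learned parameters) and load it into a freshly constructed model of the same architecture. The sketch below assumes a hypothetical `make_model` helper and a temporary file name chosen for illustration.

```python
import os
import tempfile
import torch
import torch.nn as nn

def make_model():
    # Hypothetical architecture; it must match between saving and loading.
    return nn.Sequential(nn.Linear(10, 5), nn.ReLU(), nn.Linear(5, 1))

model = make_model()

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "model_weights.pt")  # hypothetical file name
    torch.save(model.state_dict(), path)              # save the learned parameters only

    restored = make_model()
    restored.load_state_dict(torch.load(path))        # load them into a fresh instance
    restored.eval()

# Both models should now produce identical outputs for the same input.
x = torch.randn(2, 10)
same_output = torch.equal(model(x), restored(x))
print(same_output)
```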

## References
- <a id='s0'> </a> UCI Machine Learning Repository. *Facebook Comment Volume Dataset*. https://archive.ics.uci.edu/ml/datasets/Facebook+Comment+Volume+Dataset# 
- <a id='s1'></a>Goodfellow, I., Bengio, Y., Courville, A. (2016). *Deep learning* (Vol. 1). Cambridge: MIT press.
- Lu, Z., Pu, H., Wang, F., Hu, Z., & Wang, L. (2017). *The expressive power of neural networks: A view from the width*. In Advances in neural information processing systems (pp. 6231-6239).
- Brownlee, J. (2019). *A gentle introduction to the rectified linear unit (ReLU)*. Machine Learning Mastery. https://machinelearningmastery.com/rectified-linear-activation-function-for-deep-learning-neural-networks/


----
## Next Steps:
### (Computer Vision) Cat Dog
  - Save a CNN and use it on this dataset. 
  - Explain fundamental CNN concepts. 
  - Data from [Kaggle competition](https://www.kaggle.com/c/dogs-vs-cats/overview)
  - [jaeboklee](https://www.kaggle.com/jaeboklee/pytorch-cat-vs-dog)

In [17]:
exec(open("datasets/FB_comments/process_data.py").read())
[A.shape for A in [X_train, Y_train, X_test, Y_test]]

Original dataset shapes
Training set:(40949, 54), Testing set:(10044, 54)
Dataset shapes after PCA and random sampling
X_train.shape:(10237, 10), Y_train.shape:(10237,)
X_test.shape:(5022, 53), Y_test.shape:(5022,)



[(10237, 10), (10237,), (5022, 53), (5022,)]