<a href="https://colab.research.google.com/github/caocscar/workshops/blob/master/pytorch/Workshop_Classification.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Classification Problem

In [1]:
import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
from torch.utils.data import TensorDataset, DataLoader
import numpy as np
import pandas as pd

print('Torch version', torch.__version__)
print('Pandas version', pd.__version__)
print('Numpy version', np.__version__)

Torch version 1.3.1
Pandas version 0.25.3
Numpy version 1.17.4


The following should say `cuda:0`. If it does not, we need to go to *Edit* -> *Notebook settings* and change it to a `GPU` from `None`. You only have to do this once per notebook.

In [2]:
device = 'cuda:0' if torch.cuda.is_available() else 'cpu'
device

'cpu'

Read in dataset

In [0]:
df_train = pd.read_csv('https://raw.githubusercontent.com/greght/Workshop-Keras-DNN/master/ChallengeProblems/iris_training.csv', header=None)
df_val = pd.read_csv('https://raw.githubusercontent.com/greght/Workshop-Keras-DNN/master/ChallengeProblems/iris_test.csv', header=None)

Construct our x,y variables along with the training and validation dataset

In [0]:
x_train = df_train.iloc[:,0:-1]
y_train = df_train.iloc[:,-1]
x_val = df_val.iloc[:,0:-1]
y_val = df_val.iloc[:,-1]

Preprocess our data to go from a `pandas` DataFrame to a `numpy` array to a `torch` tensor.

In [0]:
xtrain = torch.tensor(x_train.to_numpy(), device=device, dtype=torch.float, requires_grad=True)
ytrain = torch.tensor(y_train.to_numpy(), device=device, dtype=torch.long, requires_grad=False)
xval = torch.tensor(x_val.to_numpy(), device=device, dtype=torch.float, requires_grad=True)
yval = torch.tensor(y_val.to_numpy(), device=device, dtype=torch.long, requires_grad=False)

We'll write a python class to define out neural network.

In [0]:
class FourLayerNN(nn.Module):
    def __init__(self, D_in, H1, H2, H3, D_out):
        super().__init__()
        self.linear1 = nn.Linear(D_in, H1)
        self.linear2 = nn.Linear(H1,H2)
        self.linear3 = nn.Linear(H2,H3)
        self.linear4 = nn.Linear(H3,D_out)
        
    def forward(self,x):
        h1_relu = self.linear1(x).clamp(min=0)
        h2_relu = self.linear2(h1_relu).clamp(min=0)
        h3_relu = self.linear3(h2_relu).clamp(min=0)
        y_pred = self.linear4(h3_relu)
        return y_pred

We create an instance of this class

In [7]:
model = FourLayerNN(xtrain.shape[1],1000,500,70,y_train.nunique()).to(device)
model

FourLayerNN(
  (linear1): Linear(in_features=4, out_features=1000, bias=True)
  (linear2): Linear(in_features=1000, out_features=500, bias=True)
  (linear3): Linear(in_features=500, out_features=70, bias=True)
  (linear4): Linear(in_features=70, out_features=3, bias=True)
)

We'll define a template for our `fit_model` function that contains `train`,  `validate`, and `accuracy` functions.

In [0]:
def fit_model(model, loss_fn, optimizer):
    def train(x,y):
        yhat = model(x)
        loss = loss_fn(yhat,y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item(), accuracy(yhat,y)
    
    def validate(x,y):
        yhat = model(x)
        loss = loss_fn(yhat,y)
        return loss.item(), accuracy(yhat,y)
    
    def accuracy(yhat,y):
        probs = np.argmax(yhat.cpu().detach().numpy(), axis=1)
        actual = y.cpu().detach().numpy()
        correct = (probs == actual).sum()
        total = y.shape[0]
        return correct / total   
    
    return train, validate

We define our *loss function*, *learning rate*, and our *optimizer*. We pass this to `fit_model` to return our `train` and `validate` functions.

In [0]:
loss_fn = nn.CrossEntropyLoss()
learning_rate = 0.01
optimizer = optim.Adagrad(model.parameters(), lr=learning_rate)
train, validate = fit_model(model, loss_fn, optimizer)

Define a `DataLoader` for our mini-batches.

In [0]:
train_data = TensorDataset(xtrain, ytrain)
train_loader = DataLoader(dataset=train_data, batch_size=60, shuffle=True)

Here is our training loop with mini-batch processing. We have to move each batch onto the GPU. We also should have a `DataLoader` for the validation dataset but we'll skip that in this case since it is so small.

In [11]:
epochs = 2000
for epoch in range(epochs):
    # training
    losses = []
    for i, (xbatch, ybatch) in enumerate(train_loader):
        xbatch = xbatch.to(device)
        ybatch = ybatch.to(device)
        loss, accuracy = train(xbatch, ybatch)
        losses.append(loss)
    training_loss = np.mean(losses)
    training_accuracy = np.mean(accuracy)
    # validation
    validation_loss, validation_accuracy = validate(xval, yval)
    # print intermediate results
    if epoch%100 == 99:
        print(f'{epoch}, {training_loss:.4f}, {training_accuracy:.2f}, {validation_loss:.4f}, {accuracy:.2f}')

99, 0.0790, 0.97, 0.0645, 0.97
199, 0.0817, 0.97, 0.0577, 0.97
299, 0.0537, 1.00, 0.0652, 1.00
399, 0.0497, 0.98, 0.0516, 0.98
499, 0.0403, 1.00, 0.0566, 1.00
599, 0.0382, 0.98, 0.0541, 0.98
699, 0.0382, 0.98, 0.0578, 0.98
799, 0.0355, 0.98, 0.0596, 0.98
899, 0.0338, 0.98, 0.0643, 0.98
999, 0.0385, 1.00, 0.0620, 1.00
1099, 0.0339, 1.00, 0.0672, 1.00
1199, 0.0327, 1.00, 0.0677, 1.00
1299, 0.0293, 1.00, 0.0716, 1.00
1399, 0.0293, 1.00, 0.0717, 1.00
1499, 0.0290, 1.00, 0.0738, 1.00
1599, 0.0267, 1.00, 0.0826, 1.00
1699, 0.0280, 1.00, 0.0815, 1.00
1799, 0.0274, 1.00, 0.0912, 1.00
1899, 0.0253, 0.98, 0.1166, 0.98
1999, 0.0249, 1.00, 0.0899, 1.00


### nn.Sequential

If we wanted to user the simpler `nn.Sequential` function, our model construction would have looked like this.

In [12]:
model_sequential = nn.Sequential(
    nn.Linear(xtrain.shape[1],1000),
    nn.ReLU(),
    nn.Linear(1000,500),
    nn.ReLU(),
    nn.Linear(500,70),
    nn.ReLU(),
    nn.Linear(70,y_train.nunique()),
).to(device)
print(model_sequential)

Sequential(
  (0): Linear(in_features=4, out_features=1000, bias=True)
  (1): ReLU()
  (2): Linear(in_features=1000, out_features=500, bias=True)
  (3): ReLU()
  (4): Linear(in_features=500, out_features=70, bias=True)
  (5): ReLU()
  (6): Linear(in_features=70, out_features=3, bias=True)
)
