## Supervised Learning Part II - PyTorch

The library we've been using in our examples so far, **scikit-learn**, offers many "canned" Machine Learning models through an interface where different algorithms can be manipulated as simple objects. It is a great way to get started with Machine Learning, and even to solve real-life problems using simple to moderately complex models and moderately large amounts of data.

However, to tackle more data-intensive tasks you need more powerful tools and **scikit-learn's** canned models won't be enough.

This is where **PyTorch** comes in. It has several advantages over **scikit-learn**, including but not limited to:

1. It supports GPU acceleration, whereas sklearn is built primarily to run on the CPU.

2. It explicitly requires you to define and control, as separate objects, the class of functions you'll use to approximate the Target (we'll call them Models from now on), the loss function, and the optimization solver. Some may argue this is an inconvenience, but it gives you more flexibility to pick the right combination of tools for each type of problem.

3. Its interface for defining Models is more flexible and expressive than sklearn's. This enables you to easily create very complex models. 
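As a minimal sketch of the first point, the snippet below picks a GPU when PyTorch can see one and falls back to the CPU otherwise. The tensor names and sizes are made up for illustration.

```python
import torch

# Use a GPU if PyTorch can see one; otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Tensors (and models) are placed on the chosen device explicitly.
a = torch.ones(3, 3, device=device)
b = torch.ones(3, 3).to(device)

# Operations on tensors that live on the same device run on that device.
c = a @ b
print(c.device.type)  # "cuda" on a GPU machine, "cpu" otherwise
```

The same `.to(device)` call works on a model, so the rest of the code does not need to change between CPU and GPU runs.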

**PyTorch** can be seen as a general-purpose Machine Learning, or even Scientific Computing, framework. However, here we focus on the most common usage of the library: building Neural Networks.

# Neural Networks Primer:

If you look beyond the cool name, you will see that Neural Networks are just another "family" of mathematical functions $f(X;\theta)$. But this family of functions is so useful that it sometimes seems almost magical. To add to their allure, Neural Networks can be difficult to write down mathematically, depending on how complex their "architecture" is. It is easier to visualize how they work by drawing them as a graph.
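To make the "just another function" point concrete, here is a small sketch of a network written out as an explicit function $f(X;\theta)$. The layer sizes, the ReLU nonlinearity and the random data are illustrative assumptions, not something prescribed by this workshop.

```python
import torch

torch.manual_seed(0)  # arbitrary seed, only to make the sketch reproducible

# theta: all the parameters of a tiny network with 4 inputs, 5 hidden units, 3 outputs.
# These sizes are illustrative choices.
W1, b1 = torch.randn(4, 5), torch.zeros(5)
W2, b2 = torch.randn(5, 3), torch.zeros(3)

def f(X, W1, b1, W2, b2):
    """f(X; theta): a linear map, a ReLU nonlinearity, then another linear map."""
    hidden = torch.relu(X @ W1 + b1)
    return hidden @ W2 + b2

X = torch.randn(10, 4)  # 10 made-up samples with 4 features each
print(f(X, W1, b1, W2, b2).shape)  # torch.Size([10, 3])
```

Stacking more layers only nests more functions inside `f`, which is why the formula quickly becomes harder to read than the picture.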


As we've seen, writing a Machine Learning algorithm generally involves solving the optimization problem

$\hat{\theta} = \underset{\theta}{\operatorname{argmin}} \; L(Y,f(X;\theta)), \qquad \hat{F}(X) = f(X;\hat{\theta})$

This generally requires finding critical points of a function, that is, points where the derivative of the function is 0.

Whether you attempt to do it analytically or numerically... How do you compute the derivatives of $L$ when there's a crazy Neural Network nested in it?
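One answer is to let the library do it. As a small sketch, PyTorch's automatic differentiation can compute the derivative of a toy nested function, and we can check it against the chain rule worked out by hand. The function below is made up for illustration.

```python
import torch

# A parameter we want to differentiate with respect to.
theta = torch.tensor(3.0, requires_grad=True)

# A toy "loss" with a function nested inside: L(theta) = (theta * sin(theta) - 1)^2
loss = (torch.sin(theta) * theta - 1.0) ** 2

# Automatic differentiation fills in theta.grad = dL/dtheta.
loss.backward()

# The same derivative worked out by hand with the chain rule, for comparison.
t = theta.detach()
by_hand = 2 * (torch.sin(t) * t - 1.0) * (torch.cos(t) * t + torch.sin(t))

print(torch.isclose(theta.grad, by_hand))
```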

# Backpropagation

For reasons we will not delve into in this workshop, an efficient way of solving the optimization problem above is by combining two numerical algorithms: Backpropagation and variations of Gradient Descent.

Here is a summary of the method with vanilla Gradient Descent:

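1. Initialize the parameters $\theta$, usually with small random values.

2. Forward pass: compute the predictions $f(X;\theta)$ and the loss $L(Y,f(X;\theta))$.

3. Backward pass (Backpropagation): apply the chain rule layer by layer, from the output back to the input, to obtain the gradient $\nabla_\theta L$.

4. Update the parameters by taking a small step against the gradient: $\theta \leftarrow \theta - \eta \nabla_\theta L$, where $\eta$ is the learning rate.

5. Repeat steps 2 to 4 for a fixed number of iterations, or until the loss stops decreasing.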
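As a rough sketch of vanilla Gradient Descent combined with Backpropagation, the loop below fits a straight line to made-up data, updating the parameters by hand instead of using an optimizer object. The data, the learning rate and the number of steps are all illustrative assumptions.

```python
import torch

torch.manual_seed(0)  # arbitrary seed for reproducibility

# Made-up data: y = 2x + 1 plus a little noise.
x = torch.linspace(-1, 1, 50).unsqueeze(1)
y = 2 * x + 1 + 0.05 * torch.randn(50, 1)

# Parameters theta = (w, b), tracked by autograd.
w = torch.zeros(1, 1, requires_grad=True)
b = torch.zeros(1, requires_grad=True)

learning_rate = 0.1
losses = []
for step in range(200):
    # Forward pass: predictions and mean squared error.
    y_hat = x @ w + b
    loss = ((y_hat - y) ** 2).mean()
    losses.append(loss.item())

    # Backward pass: compute d(loss)/dw and d(loss)/db.
    loss.backward()

    # Gradient Descent update, done outside of autograd tracking.
    with torch.no_grad():
        w -= learning_rate * w.grad
        b -= learning_rate * b.grad
        # Gradients accumulate by default, so reset them for the next step.
        w.grad.zero_()
        b.grad.zero_()

print(w.item(), b.item())  # should end up close to 2 and 1
```

In the rest of this section, `torch.optim.SGD` and `optimizer.zero_grad()` take care of the update and the gradient reset.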



To see this in action, let's go back to the Iris Dataset and train a model, this time with **PyTorch**:

In [222]:
from pandas import read_csv
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
import numpy as np
import torch

# LET'S CREATE OUR TRAINING AND TEST SETS

header = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width', 'species']

iris_dataset = read_csv('./data/iris.csv', names=header)

# ENCODE SPECIES AS CATEGORY NUMBERS
iris_dataset.loc[iris_dataset.species=='Iris-setosa', 'species'] = 0
iris_dataset.loc[iris_dataset.species=='Iris-versicolor', 'species'] = 1
iris_dataset.loc[iris_dataset.species=='Iris-virginica', 'species'] = 2

X = iris_dataset.values[:,0:4].astype('float32')
Y = iris_dataset.values[:,4].astype('int32')

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.20)

# CONVERT DATASETS TO PYTORCH TENSORS
# (torch.autograd.Variable IS DEPRECATED: PLAIN TENSORS ARE ENOUGH)
# FEATURES ARE FLOATS; CLASS LABELS MUST BE LONG INTEGERS FOR CrossEntropyLoss
X_train = torch.tensor(X_train, dtype=torch.float32)
X_test = torch.tensor(X_test, dtype=torch.float32)
Y_train = torch.tensor(Y_train, dtype=torch.long)
Y_test = torch.tensor(Y_test, dtype=torch.long)

In [283]:
# DEFINE OUR LOGISTIC REGRESSION MODEL AS A NEURAL NETWORK, INITIALIZE AN OPTIMIZER AND PICK A LOSS FUNCTION


# THERE IS A WAY TO CALL MODELS AS FUNCTIONS LIKE WE DID WITH SKLEARN, BUT CLASSES ARE PREFERABLE
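class LogisticRegression(torch.nn.Module):

    def __init__(self):
        super().__init__()
        # ONE LINEAR LAYER: 4 INPUT FEATURES -> 3 CLASS SCORES
        self.fc1 = torch.nn.Linear(4, 3)

    def forward(self, X):
        # RETURN RAW SCORES (LOGITS); CrossEntropyLoss APPLIES THE SOFTMAX INTERNALLY
        return self.fc1(X)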

model = LogisticRegression()

learning_rate = 0.01

optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)  
loss_function = torch.nn.CrossEntropyLoss()  



print(model)


LogisticRegression(
  (fc1): Linear(in_features=4, out_features=3, bias=True)
)
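The code comment above mentions a way to build models without writing a class. One such option, assumed here to be what the comment has in mind, is `torch.nn.Sequential`. The sketch below builds the same single linear layer that way; `sequential_model` and `batch` are names made up for this example.

```python
import torch

# One linear layer mapping 4 features to 3 class scores, with no class definition.
sequential_model = torch.nn.Sequential(
    torch.nn.Linear(4, 3),
)

# The model is called like a function on a batch of inputs.
batch = torch.zeros(2, 4)  # two made-up samples with 4 features each
print(sequential_model(batch).shape)  # torch.Size([2, 3])
```

Subclassing `torch.nn.Module` takes a few more lines, but it lets `forward` contain arbitrary Python logic, which matters once the architecture is no longer a plain stack of layers.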


In [288]:
# NOW WE TRAIN THE MODEL

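epochs = 100  # ASSUMED VALUE: THE ORIGINAL CELL DOES NOT SHOW WHERE `epochs` WAS DEFINED

for i in range(epochs):

    # RESET THE GRADIENTS ACCUMULATED IN THE PREVIOUS STEP
    optimizer.zero_grad()

    # FORWARD-PROPAGATION
    Y_hat = model(X_train)

    loss = loss_function(Y_hat, Y_train)
    print("Loss at step", i, "is:", loss.item())

    # BACKPROPAGATION
    loss.backward()

    # GRADIENT DESCENT UPDATE OF THE PARAMETERS
    optimizer.step()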
    
# NOTICE HOW THE ERROR DECREASES WITH EACH STEP:

Loss at step 0 is: 0.26828524470329285
Loss at step 1 is: 0.26822739839553833
Loss at step 2 is: 0.2681695222854614
Loss at step 3 is: 0.2681117355823517
Loss at step 4 is: 0.26805391907691956
Loss at step 5 is: 0.2679961919784546
Loss at step 6 is: 0.26793843507766724
Loss at step 7 is: 0.26788073778152466
Loss at step 8 is: 0.26782315969467163
Loss at step 9 is: 0.26776549220085144
Loss at step 10 is: 0.2677079439163208
Loss at step 11 is: 0.26765039563179016
Loss at step 12 is: 0.2675929069519043
Loss at step 13 is: 0.26753535866737366
Loss at step 14 is: 0.26747798919677734
Loss at step 15 is: 0.26742058992385864
Loss at step 16 is: 0.2673631012439728
Loss at step 17 is: 0.2673059105873108
Loss at step 18 is: 0.2672484517097473
Loss at step 19 is: 0.2671911120414734
Loss at step 20 is: 0.2671339213848114
Loss at step 21 is: 0.2670767903327942
Loss at step 22 is: 0.26701948046684265
Loss at step 23 is: 0.2669624388217926
Loss at step 24 is: 0.2669053077697754
Loss at step 25 is: 0.2

In [289]:
# HOW DID IT DO ON THE TEST SET?

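# DISABLE GRADIENT TRACKING: WE ARE ONLY EVALUATING, NOT TRAINING
with torch.no_grad():
    Y_hat_test = model(X_test)

# THE PREDICTED CLASS IS THE ONE WITH THE HIGHEST SCORE
Y_predicted = Y_hat_test.argmax(dim=1)

accuracy_score(Y_test.numpy(), Y_predicted.numpy())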


0.9666666666666667
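The model outputs raw scores (logits) rather than probabilities, because `CrossEntropyLoss` applies the softmax internally during training. If you want class probabilities at prediction time, a minimal sketch is to apply the softmax yourself. The logits below are made-up numbers standing in for `model(X_test)`.

```python
import torch

# Made-up logits for 2 samples and 3 classes, standing in for model(X_test).
logits = torch.tensor([[2.0, 0.5, -1.0],
                       [0.1, 0.2, 3.0]])

# Softmax turns each row of logits into probabilities that sum to 1.
probabilities = torch.softmax(logits, dim=1)

# argmax picks the most likely class; it gives the same answer on logits or probabilities.
predicted = probabilities.argmax(dim=1)
print(predicted)  # tensor([0, 2])
```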