# Simple Iris dataset classification with Torch

These notes provide a template for multi-class classification with PyTorch. In this example we use a fully connected network with two layers. The loss function that we use, `CrossEntropyLoss()`, expects an input of shape `(batch_size, C)`, where `C` is the number of classes (3 in our case). Therefore our network outputs a vector of length 3 for each sample. Beware: as explained later, these numbers are not the probabilities of belonging to each class. The conversion to probabilities (a softmax) is applied for us inside PyTorch's `CrossEntropyLoss()`. To get the probabilities that an input sample belongs to each class (a vector of length 3), we use the method `predict_prob()` of the class `fcnn`. It converts the raw network output into numbers that sum to one and can be interpreted as probabilities.
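To make the difference between raw outputs and probabilities concrete, here is a minimal sketch that is not part of the original notebook. The values in `logits` are made up for illustration; they stand in for the network output of two samples.

```python
import torch

# Hypothetical raw network outputs (logits) for 2 samples and 3 classes.
logits = torch.tensor([[2.0, 0.5, -1.0],
                       [0.1, 0.2, 3.0]])

# Softmax along dim=1 turns each row into non-negative numbers that sum to 1.
probs = torch.softmax(logits, dim=1)
row_sums = probs.sum(dim=1)

# Softmax preserves the ordering within a row, so the most likely class
# is the same whether we look at the logits or at the probabilities.
same_argmax = torch.equal(logits.argmax(dim=1), probs.argmax(dim=1))

print(probs)
print(row_sums)
print(same_argmax)
```

This is why the predicted class can be read directly from the raw output, while the probabilities require the extra softmax step.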

In [29]:
import torch
import torch.nn as nn
import torch.optim as optim
from torch.autograd import Variable
from sklearn import datasets
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, confusion_matrix, roc_auc_score
import numpy as np

In [30]:
class fcnn(nn.Module):
    def __init__(self, input_features=4, hidden_size=5, output_classes=3):
        """ iris dataset has 4 features and 3 flower species (classes) """
        super().__init__()
        self.layer1 = nn.Linear(input_features, hidden_size)
        self.layer2 = nn.Linear(hidden_size, output_classes)       
        self.softmax = nn.Softmax(dim=1)

    def forward(self, x):
        x = torch.sigmoid(self.layer1(x))
        x = self.layer2(x)
        return x

    def predict_prob(self, vec):
        """ To predict probabilities (scores) we need to transform the output of the 
            network (1x3 array for each sample) to an 1x3 array of numbers that sum up
            to 1 so that they can be interpreted as the probability of each class 
        """
        return self.softmax(vec)

Importing the dataset

In [16]:
iris = datasets.load_iris()
df = np.c_[iris.data, iris.target]

np.random.shuffle(df)

X = df[:, :-1]
y = df[:, -1]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = .5)

X_train = torch.from_numpy(X_train).float()
X_test = torch.from_numpy(X_test).float()
y_train = torch.from_numpy(y_train).long()
y_test = torch.from_numpy(y_test).long()

First we create an instance of the network using the class *fcnn* above, then we define the optimizer. We use [stochastic gradient descent](https://en.wikipedia.org/wiki/Stochastic_gradient_descent). The third fundamental ingredient is the loss function. We use [cross entropy](https://pytorch.org/docs/master/nn.html#torch.nn.CrossEntropyLoss). Different Python packages have slightly different definitions of cross entropy, but PyTorch's [documentation](https://pytorch.org/docs/stable/nn.html#torch.nn.CrossEntropyLoss) tells us that:
1. This function is suited for multi-class classification problems.
2. We don't have to apply a `softmax` activation at the output, because `CrossEntropyLoss()` already applies it internally (as a log-softmax).
3. We don't need to one-hot encode the classes (target vector `y`); the target vector should consist of the class indices (in our case 0, 1, 2).
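Points 2 and 3 can be checked numerically. The following is a minimal sketch, not part of the original notebook: the `logits` and `targets` values are made up, and the manual computation assumes the default `reduction='mean'` of `CrossEntropyLoss()`.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical raw outputs for 3 samples and 3 classes.
logits = torch.tensor([[2.0, 0.5, -1.0],
                       [0.1, 0.2, 3.0],
                       [0.3, 1.5, 0.2]])
# Targets are class indices (dtype int64), not one-hot vectors.
targets = torch.tensor([0, 2, 1])

# Built-in loss applied directly to the raw outputs.
loss_builtin = nn.CrossEntropyLoss()(logits, targets)

# Manual equivalent: log-softmax, then the mean negative log-probability
# of the true class of each sample.
log_probs = F.log_softmax(logits, dim=1)
loss_manual = -log_probs[torch.arange(len(targets)), targets].mean()

print(loss_builtin.item(), loss_manual.item())
```

The two numbers agree, which is why the network's `forward()` returns the raw output of the last linear layer and applies no softmax.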

In [17]:
# instantiate the network
net = fcnn()
net = net.float()
# print(net)

# define the optimizer 
optimizer = optim.SGD(net.parameters(), lr=0.1)    
# define the loss
loss_fun = nn.CrossEntropyLoss()

Now we write a loop that trains the network in mini-batches of 20 samples. For each batch we predict the class scores, compute the loss, calculate the gradients with `loss.backward()`, and update the weights and biases with `optimizer.step()`. Note that `optimizer.zero_grad()` is called only once, at the start of each epoch, so `loss.backward()` keeps accumulating (adding up) the gradients of all batches seen so far in that epoch, and every update uses this accumulated gradient rather than the gradient of the current batch alone. Also note that the value printed every 100 epochs is the loss of the last batch divided by the number of epochs, not a true per-epoch average.
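The accumulation behaviour of `loss.backward()` can be checked in isolation. The following is a minimal sketch, not part of the original notebook: the parameter `w` and the toy loss `sum(3 * w)` are made up for illustration, and no `optimizer.step()` is called so that `w` stays fixed.

```python
import torch
import torch.optim as optim

# A single hypothetical parameter vector and an optimizer that owns it.
w = torch.tensor([1.0, 2.0], requires_grad=True)
optimizer = optim.SGD([w], lr=0.1)

# First backward pass: the gradient of sum(3 * w) with respect to w is 3.
(3 * w).sum().backward()
grad_first = w.grad.clone()

# Second backward pass without zeroing: the new gradient is ADDED to the old one.
(3 * w).sum().backward()
grad_accumulated = w.grad.clone()

# After zero_grad() the next backward pass starts from a clean slate.
optimizer.zero_grad()
(3 * w).sum().backward()
grad_after_reset = w.grad.clone()

print(grad_first, grad_accumulated, grad_after_reset)
```

Where `zero_grad()` is placed relative to `backward()` and `step()` therefore decides which gradients enter each parameter update.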

In [24]:
epochs = 1000
batch_size = 20
epoch_loss = []
losses = []
for epoch in range(epochs):
    optimizer.zero_grad()        
    for i in range(0, X_train.shape[0], batch_size):

        x_b = X_train[i: i + batch_size].float()
        y_b = y_train[i: i + batch_size]

        y_hat = net(x_b)
        loss = loss_fun(y_hat, y_b) 
        loss.backward()
        optimizer.step()
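        losses.append(loss.item())  # store a plain float, not the tensor with its graph

    epoch_loss.append(loss.item())  # loss of the last batch in this epoch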
    if epoch % 100 == 0:
        print("Epoch {}, average loss = {}".format(epoch, loss/epochs))

# test accuracy
predicted = net(X_test)
_, y_pred = torch.max(predicted, 1)  # predicted classes
scores = net.predict_prob(predicted)  # predicted probabilities of each class for each sample

print('test set accuracy = {:1.3f}'.format(accuracy_score(y_test, y_pred)))
auc_ovr = roc_auc_score(y_true=y_test.detach().numpy(), y_score=scores.detach().numpy(), multi_class="ovr")
auc_ovo = roc_auc_score(y_true=y_test.detach().numpy(), y_score=scores.detach().numpy(), multi_class="ovo")
print('one-versus-rest area under the curve = {:1.3f}'.format(auc_ovr))
print('one-versus-one area under the curve = {:1.3f}'.format(auc_ovo))

Epoch 0, average loss = 5.9978363424306735e-05
Epoch 100, average loss = 6.051295713405125e-05
Epoch 200, average loss = 6.112334813224152e-05
Epoch 300, average loss = 6.179229967528954e-05
Epoch 400, average loss = 6.249862053664401e-05
Epoch 500, average loss = 6.323461275314912e-05
Epoch 600, average loss = 6.402952567441389e-05
Epoch 700, average loss = 6.493267574114725e-05
Epoch 800, average loss = 6.595510785700753e-05
Epoch 900, average loss = 6.7038883571513e-05
test set accuracy = 0.960
one-versus-rest area under the curve = 0.999
one-versus-one area under the curve = 0.999


Confusion matrix:

In [34]:
confusion_matrix(y_test.detach().numpy(), y_pred.detach().numpy())

array([[24,  0,  0],
       [ 0, 25,  1],
       [ 0,  2, 23]], dtype=int64)

Rows are the true classes and columns are the predicted classes. Using the class indices 0, 1, 2: only one sample that belongs to class 1 has been predicted as class 2, and two samples of class 2 have been predicted as class 1.
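The summary numbers can be recovered from the confusion matrix itself. The following is a minimal sketch, not part of the original notebook; it copies the matrix printed above into a NumPy array and assumes scikit-learn's convention that rows are true classes and columns are predicted classes.

```python
import numpy as np

# Confusion matrix copied from the output above
# (rows = true class, columns = predicted class).
cm = np.array([[24, 0, 0],
               [0, 25, 1],
               [0, 2, 23]])

n_samples = cm.sum()                    # size of the test set
n_correct = np.trace(cm)                # diagonal entries are correct predictions
accuracy = n_correct / n_samples
misclassified = n_samples - n_correct

# Fraction of each true class that was predicted correctly (per-class recall).
per_class_recall = np.diag(cm) / cm.sum(axis=1)

print(n_samples, n_correct, misclassified)
print(round(accuracy, 3))
print(per_class_recall)
```

The accuracy computed this way matches the test set accuracy printed after training.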

Let's manually check some test set predictions:

In [35]:
print(y_pred[:20])
print(y_test[:20])

tensor([2, 1, 0, 1, 1, 0, 2, 0, 1, 1, 1, 0, 1, 0, 2, 0, 2, 2, 0, 0])
tensor([2, 1, 0, 1, 1, 0, 2, 0, 2, 1, 1, 0, 1, 0, 2, 0, 2, 2, 0, 0])
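Instead of comparing the two printed tensors by eye, the mismatching positions can be located programmatically. The following is a minimal sketch, not part of the original notebook; it hard-codes the 20 values printed above so that it runs on its own.

```python
import torch

# The first 20 predictions and true labels, copied from the output above.
y_pred = torch.tensor([2, 1, 0, 1, 1, 0, 2, 0, 1, 1, 1, 0, 1, 0, 2, 0, 2, 2, 0, 0])
y_test = torch.tensor([2, 1, 0, 1, 1, 0, 2, 0, 2, 1, 1, 0, 1, 0, 2, 0, 2, 2, 0, 0])

# Indices where the predicted class differs from the true class.
mismatch_idx = torch.nonzero(y_pred != y_test).flatten()

for i in mismatch_idx.tolist():
    print(f"sample {i}: predicted {y_pred[i].item()}, true {y_test[i].item()}")
```

Among these 20 samples there is a single mismatch, a sample of class 2 predicted as class 1, which is one of the two such errors counted in the confusion matrix.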
