# Simple Iris dataset classification with Torch

Multi-class classification of the Iris dataset with a fully connected network. The network takes 4 input features and outputs one score per class for each sample in the batch. The softmax that turns these scores into class probabilities is applied inside the loss function.

In [8]:
import torch
import torch.nn as nn
import torch.optim as optim
from sklearn import datasets
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
import numpy as np


class fcnn(nn.Module):
    def __init__(self, input_features=4, hidden_size=5, output_classes=3):
        """ iris dataset has 4 features and 3 flower species (classes) """
        super().__init__()
        self.layer1 = nn.Linear(input_features, hidden_size)
        self.layer2 = nn.Linear(hidden_size, output_classes)       

    def forward(self, x):
        x = torch.sigmoid(self.layer1(x))
        x = torch.sigmoid(self.layer2(x))
        return x
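
To see what such a network returns, the sketch below pushes a random batch through a stand-in model with the same two layers and converts the outputs to class probabilities with `torch.softmax`. The `nn.Sequential` model, the batch of 5 samples, and the random inputs are assumptions made only to keep the example self-contained; this is not the trained network.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # fixed seed so the random batch is repeatable

# Stand-in with the same layout as fcnn: 4 -> 5 -> 3, sigmoid after each layer
model = nn.Sequential(
    nn.Linear(4, 5), nn.Sigmoid(),
    nn.Linear(5, 3), nn.Sigmoid(),
)

batch = torch.rand(5, 4)               # 5 hypothetical samples, 4 features each
scores = model(batch)                  # shape (5, 3): one score per class
probs = torch.softmax(scores, dim=1)   # normalize each row into probabilities

print(scores.shape)
print(probs.sum(dim=1))                # each row sums to 1 (up to rounding)
```

Each row of `probs` can be read as the predicted class distribution for one sample, and the position of the largest entry in a row is the predicted class.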

Let's load the dataset, shuffle it, split it into training and test sets, and convert the arrays to tensors.

In [9]:
iris = datasets.load_iris()
df = np.c_[iris.data, iris.target]

np.random.shuffle(df)

X = df[:, :-1]
y = df[:, -1]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25)

X_train = torch.from_numpy(X_train).float()
X_test = torch.from_numpy(X_test).float()
y_train = torch.from_numpy(y_train).long()
y_test = torch.from_numpy(y_test).long()
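
As a possible variation on the cell above, the sketch below fixes the random seed, stratifies the split by class, and rescales the features to [0, 1] with `MinMaxScaler`. The `random_state=0` and `stratify` arguments, the scaling step, and the variable names are assumptions added for illustration; the results reported in this notebook were obtained without them.

```python
import torch
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

iris = datasets.load_iris()

# random_state and stratify are assumptions: a repeatable, class-balanced split
X_tr, X_te, y_tr, y_te = train_test_split(
    iris.data, iris.target, test_size=0.25, random_state=0, stratify=iris.target
)

scaler = MinMaxScaler()
X_tr = scaler.fit_transform(X_tr)  # learn min/max from the training set only
X_te = scaler.transform(X_te)      # reuse the training min/max on the test set

# Convert to the tensor types the network and the loss expect
X_tr_t = torch.from_numpy(X_tr).float()
y_tr_t = torch.from_numpy(y_tr).long()

print(X_tr_t.shape, y_tr_t.shape)
print(X_tr_t.min().item(), X_tr_t.max().item())
```

Fitting the scaler on the training set alone keeps information about the test set out of the preprocessing step.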

First we make an instance of the network using the class *fcnn* above, then we define the optimizer; we use [stochastic gradient descent](https://en.wikipedia.org/wiki/Stochastic_gradient_descent). The third fundamental ingredient is the loss function, for which we use [cross entropy](https://pytorch.org/docs/master/nn.html#torch.nn.CrossEntropyLoss). Different Python packages define cross-entropy slightly differently, but Torch's documentation tells us that this function is suited to multi-class classification problems. Moreover, we don't need a *softmax* activation at the output, because `CrossEntropyLoss()` already applies the softmax (as a log-softmax) to the scores it receives. Here those scores are the sigmoid outputs of `layer2`, so they lie between 0 and 1.

In [10]:

# instantiate the network
net = fcnn()
net = net.float()
# print(net)

# define the optimizer 
optimizer = optim.SGD(net.parameters(), lr=0.1)    
# define the loss
loss_fun = nn.CrossEntropyLoss()
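
The claim that `CrossEntropyLoss()` already contains the softmax can be checked numerically. The sketch below compares the built-in loss with a hand-written version (log-softmax followed by the mean negative log-probability of the correct class). The random scores and the labels are hypothetical values used only for this check.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

scores = torch.randn(6, 3)                  # hypothetical raw outputs: 6 samples, 3 classes
targets = torch.tensor([0, 2, 1, 1, 0, 2])  # hypothetical class labels

# Built-in loss applied directly to the raw scores
ce = nn.CrossEntropyLoss()(scores, targets)

# Same quantity by hand: log-softmax, then the mean negative
# log-probability assigned to the correct class of each sample
log_probs = F.log_softmax(scores, dim=1)
manual = -log_probs[torch.arange(6), targets].mean()

print(ce.item(), manual.item())  # the two values agree up to rounding
```

Because the normalization happens inside the loss, applying an extra softmax in `forward` would normalize the scores twice.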

Now we write a loop that trains the network in mini-batches: we take 20 samples at a time, predict their classes, and call `loss.backward()` to compute the gradients of the loss with respect to the network parameters. `loss.backward()` does not change the parameters itself; it adds the new gradients to those already stored on each parameter. The weights and biases are updated by `optimizer.step()`, which in the loop below runs after every batch. Because `optimizer.zero_grad()` is called only once, at the start of each epoch, the gradients keep accumulating (adding up) across the batches of an epoch, so each step uses the sum of the gradients of all batches seen so far in that epoch.

In [11]:
epochs = 1000
batch_size = 20
epoch_loss = []
for epoch in range(epochs):
    optimizer.zero_grad()        
    for i in range(0, X_train.shape[0], batch_size):

        x_b = X_train[i: i + batch_size].float()
        y_b = y_train[i: i + batch_size]

        y_hat = net(x_b)
        loss = loss_fun(y_hat, y_b) 
        loss.backward()
        optimizer.step()

    epoch_loss.append(loss.item())  # loss of the last batch, stored as a plain float
    if epoch % 100 == 0:
        print("Epoch {}, loss = {}".format(epoch, loss.item()))

# test accuracy
with torch.no_grad():  # gradients are not needed for evaluation
    predicted = net(X_test)
y_pred = torch.argmax(predicted, dim=1)  # index of the largest score = predicted class

print('test set accuracy', accuracy_score(y_test.numpy(), y_pred.numpy()))

Epoch 0, loss = 1.0957356691360474
Epoch 100, loss = 0.7854022979736328
Epoch 200, loss = 0.7350257039070129
Epoch 300, loss = 0.6642157435417175
Epoch 400, loss = 0.6370225548744202
Epoch 500, loss = 0.6193720102310181
Epoch 600, loss = 0.6052758097648621
Epoch 700, loss = 0.5946289896965027
Epoch 800, loss = 0.5870515704154968
Epoch 900, loss = 0.5816200971603394
test set accuracy 0.9736842105263158
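
The way gradients add up across successive `backward()` calls in the training loop can be seen in isolation. The sketch below uses a hypothetical two-element tensor `w` and two toy functions of it, with no network or optimizer involved, to show the stored gradient after one call, after a second call without resetting, and after a reset.

```python
import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)

# First backward pass: the gradient of sum(3 * w) is 3 for each element
(3 * w).sum().backward()
first = w.grad.clone()

# Second backward pass without resetting: the new gradient (5) is added to the old one
(5 * w).sum().backward()
accumulated = w.grad.clone()

# Clear the stored gradient, as optimizer.zero_grad() does for the parameters it manages
w.grad = None
(5 * w).sum().backward()
fresh = w.grad.clone()

print(first, accumulated, fresh)
# tensor([3., 3.]) tensor([8., 8.]) tensor([5., 5.])
```

Calling `optimizer.zero_grad()` before every batch would therefore make each `optimizer.step()` use the gradient of the current batch only.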
