**NOTE: This notebook is written for the Google Colab platform, which provides free hardware acceleration. However, it can also be run (possibly with minor modifications) as a standard Jupyter notebook using a local GPU.**

In [None]:
#@title -- Installation of Packages -- { display-mode: "form" }
import sys
!{sys.executable} -m pip install skorch

In [None]:
#@title -- Import of Necessary Packages -- { display-mode: "form" }
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, OrdinalEncoder, OneHotEncoder
from sklearn.impute import SimpleImputer
from sklearn.compose import make_column_transformer
from sklearn.pipeline import make_pipeline
from sklearn.metrics import accuracy_score
from skorch import NeuralNetClassifier
import torch.nn as nn
import torch

In [None]:
#@title -- Downloading Data -- { display-mode: "form" }
!mkdir -p output
!mkdir -p data
!wget -nc -O data/iris.csv https://www.dropbox.com/s/v3ptdkv5fvmx5zk/iris.csv?dl=1

# Neural Network Classifiers

This notebook shows how a neural network built with the ``pytorch`` Python package can be applied to a simple classification task. We will show how such a network can be created and trained. We will use a very simple architecture: no convolutional layers, batch normalization or other special layers.

## The Dataset

In this example, we will again be using the Iris dataset, with which we are very familiar by now. We will now load it from the CSV file and split it into the train and test folds:

In [None]:
#@title -- Loading and Splitting the dataset df_train, df_test -- { display-mode: "form" }

# we load the data from the CSV
df = pd.read_csv("data/iris.csv")
display(df.head())

# we split it into train and test, stratifying by species
df_train, df_test = train_test_split(df, test_size=0.25,
                                     stratify=df['species'],
                                     random_state=4)

As usual, we sort the columns into categorical, numerical and output.

In [None]:
categorical_inputs = []
numeric_inputs = list(df.columns[:-1])
output = ["species"]

The preprocessing that we have applied so far re-encodes categorical attributes as numbers by assigning an integer to each unique value of the attribute (using the ``OrdinalEncoder`` transformer). For neural networks it is usually more suitable to use one-hot encoding instead: each categorical column gets as many input neurons as it has distinct values, and exactly one of them is active for any given sample. This kind of preprocessing can be achieved using the ``OneHotEncoder`` transformer. The preprocessing of numeric values can remain unchanged.
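To make the idea of one-hot encoding concrete, here is a minimal plain-Python sketch. It is only an illustration of the encoding itself, not of how ``OneHotEncoder`` is implemented; the function ``one_hot_encode`` and the ``colours`` column are hypothetical and do not come from the Iris data.

```python
def one_hot_encode(values):
    """Return (categories, rows); each row is the one-hot vector of one value."""
    categories = sorted(set(values))          # one input neuron per distinct value
    index = {cat: i for i, cat in enumerate(categories)}
    rows = []
    for value in values:
        row = [0.0] * len(categories)
        row[index[value]] = 1.0               # exactly one position is active
        rows.append(row)
    return categories, rows


# hypothetical categorical column (not part of the Iris dataset)
colours = ["red", "green", "blue", "green"]
categories, encoded = one_hot_encode(colours)

print(categories)
for colour, row in zip(colours, encoded):
    print(colour, row)
```

Each row contains a single 1 and zeros elsewhere, so the network does not see any artificial ordering between the categories, which an ordinal encoding would introduce.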

In [None]:
input_preproc = make_column_transformer(
    (make_pipeline(
        SimpleImputer(strategy='constant', fill_value='MISSING'),
        OneHotEncoder()),
     categorical_inputs),
    
    (make_pipeline(
        SimpleImputer(),
        StandardScaler()),
     numeric_inputs)
)

In [None]:
output_preproc = OrdinalEncoder()

X_train = input_preproc.fit_transform(df_train[categorical_inputs+numeric_inputs])
Y_train = output_preproc.fit_transform(df_train[output]).reshape(-1)

X_test = input_preproc.transform(df_test[categorical_inputs+numeric_inputs])
Y_test = output_preproc.transform(df_test[output]).reshape(-1)

In addition to our standard preprocessing, we will also transform the results into datatypes expected by PyTorch: i.e. into 32-bit floats (inputs) and 64-bit ints (class labels).

In [None]:
X_train = X_train.astype(np.float32)
Y_train = Y_train.astype(np.int64)
X_test = X_test.astype(np.float32)
Y_test = Y_test.astype(np.int64)

## Creating the Neural Network

To create our neural net, we will inherit from the base class ``nn.Module``. The way in which the layers are connected is defined in the ``forward`` method, which receives the input tensor as its argument and returns the output tensor. All layers with learnable parameters are created in the constructor. Layers which do not have any learnable parameters can be applied directly inside ``forward``.

A neural net must have a fixed number of input and output neurons. The number of inputs will equal the number of columns in the preprocessed input data. The number of outputs, on the other hand, will equal the number of classes, since the network is going to return their probabilities. The activation function of the output layer will be the softmax function, which ensures that the outputs of the last layer are non-negative and always sum to 1, so that they can be interpreted as properly normalized probabilities.
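The normalizing effect of softmax can be checked with a small plain-Python sketch. The ``softmax`` helper and the ``logits`` values below are assumptions made for illustration; in the notebook itself the same operation is performed by ``torch.softmax``.

```python
import math


def softmax(logits):
    """Map raw scores to probabilities that are positive and sum to 1."""
    shift = max(logits)                       # subtracting the max avoids overflow
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# hypothetical raw outputs of the last layer for one sample (three classes)
logits = [2.0, 1.0, 0.1]
probs = softmax(logits)

print(probs)       # one probability per class
print(sum(probs))  # sums to 1 up to floating-point rounding
```

The largest raw score always receives the largest probability, so the predicted class is simply the index of the maximum output.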

In [None]:
num_inputs = X_train.shape[1]
num_outputs = len(np.unique(Y_train))

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.fc1 = nn.Linear(num_inputs, 50)
        self.fc2 = nn.Linear(50, 50)
        self.fc3 = nn.Linear(50, num_outputs)

    def forward(self, x):
        y = self.fc1(x)
        y = torch.relu(y)
        
        y = self.fc2(y)
        y = torch.relu(y)
        
        y = self.fc3(y)
        y = torch.softmax(y, dim=1)
        
        return y

Our neural net can run either on the processor (CPU) or, if one is available, on the graphics card (GPU). We can specify the type of device that we intend to use. Our code will automatically select the "cuda" device (the GPU) if it is available, or "cpu" (the processor) if it is not.

In [None]:
device = "cuda" if torch.cuda.is_available() else "cpu"

The next step is to create the classifier itself. It will be an instance of the ``NeuralNetClassifier`` class from the ``skorch`` package, which wraps our neural net in the standard ``scikit-learn`` interface. This makes it significantly easier to run the training and also enables us to use the net as a part of a pipeline, should we need to.

When constructing the classifier we can enter a number of parameters. We are only going to show a handful (the rest can, of course, be found in the documentation):
* ``max_epochs``: the number of learning epochs (how many times we are going to iterate over the entire dataset);
* ``batch_size``: the size of the mini-batches; specifying -1 means that the entire dataset will be used as a single batch, i.e. we will be training in the full-batch mode;
* ``optimizer``: specifies which optimization method will be used;
* ``train_split``: specifies how the data will be split into training and validation folds; ``None`` means that the entire dataset is going to be used for training and there will be no validation set (note that this is not the testing set, which we have already separated manually; the validation set is used during training, e.g. for early stopping or for hyperparameter tuning);
* ``device``: specifies which device is going to be used (the processor or cuda).
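The relationship between ``max_epochs`` and ``batch_size`` can be sketched in plain Python. This is only an illustration of the bookkeeping, not the way ``skorch`` iterates over data internally; the helper ``batch_slices`` and the training-set size of 112 samples are assumptions made for the example.

```python
def batch_slices(n_samples, batch_size):
    """Return the (start, stop) index ranges visited during one epoch."""
    if batch_size == -1:                      # full-batch mode: a single batch with everything
        batch_size = n_samples
    return [(start, min(start + batch_size, n_samples))
            for start in range(0, n_samples, batch_size)]


n_train = 112      # hypothetical training-set size
max_epochs = 100

full_batch = batch_slices(n_train, -1)
mini_batch = batch_slices(n_train, 32)

print(full_batch)                      # a single slice covering all samples
print(mini_batch)                      # several slices; the last one may be shorter
print(max_epochs * len(full_batch))    # parameter updates in full-batch mode
print(max_epochs * len(mini_batch))    # parameter updates with mini-batches of 32
```

In full-batch mode there is exactly one parameter update per epoch, whereas smaller batches give more (but noisier) updates for the same number of epochs.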

If we want to parametrize the individual components that make up the classifier, we can use parameters with prefixes in the form ``prefix__parameter_name``. If, for instance, we want to change the learning rate used by the optimizer, we use the following parameter:
```
optimizer__lr=value
```
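The naming convention with a double underscore can be pictured as a simple routing step: the part before ``__`` names the component and the part after it names that component's parameter. The sketch below is a plain-Python illustration of this idea under that assumption; ``route_params`` is a hypothetical helper, not a function of ``skorch``, and the value 0.01 is arbitrary.

```python
def route_params(params):
    """Group keyword arguments by the prefix before the first double underscore."""
    routed = {"": {}}                         # "" collects the classifier's own parameters
    for name, value in params.items():
        prefix, sep, rest = name.partition("__")
        if sep:                               # e.g. "optimizer__lr" -> ("optimizer", "lr")
            routed.setdefault(prefix, {})[rest] = value
        else:
            routed[""][name] = value
    return routed


# illustrative values only
routed = route_params({"max_epochs": 100, "optimizer__lr": 0.01})
print(routed)
```

Here ``max_epochs`` stays with the classifier, while ``lr`` is handed to the component called ``optimizer``.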

In [None]:
net = NeuralNetClassifier(
    Net,
    max_epochs=100,
    batch_size=-1,
    optimizer=torch.optim.Adam,
    train_split=None,
    device=device
)

And now we can finally run the training – the interface is the same as that of all ``scikit-learn`` classifiers.

In [None]:
net.fit(X_train, Y_train)

## Testing

Now that we have trained our model, we need to test its performance.

### On Training Data

In [None]:
# lowercase y_train holds the predicted class indices; uppercase Y_train holds the true labels
y_train = net.predict(X_train)

In [None]:
cm = pd.crosstab(
    output_preproc.inverse_transform(
        Y_train.reshape(-1, 1)).reshape(-1),
    output_preproc.inverse_transform(
        y_train.reshape(-1, 1)).reshape(-1),
    rownames=['actual'],
    colnames=['predicted']
)
print(cm)

In [None]:
acc = accuracy_score(Y_train, y_train)
print("Accuracy = {}".format(acc))

### On Testing Data

In [None]:
# lowercase y_test holds the predicted class indices; uppercase Y_test holds the true labels
y_test = net.predict(X_test)

In [None]:
cm = pd.crosstab(
    output_preproc.inverse_transform(
        Y_test.reshape(-1, 1)).reshape(-1),
    output_preproc.inverse_transform(
        y_test.reshape(-1, 1)).reshape(-1),
    rownames=['actual'],
    colnames=['predicted']
)
print(cm)

In [None]:
acc = accuracy_score(Y_test, y_test)
print("Accuracy = {}".format(acc))