# Project IRIS - Phase 2: Keras

In this tutorial you will discover **how to use Keras+sklearn to develop and evaluate a neural network (NN) model for a multiclass classification problem**.

Goals:
* How to load data from CSV and make it available to Keras
* How to prepare multiclass classification data for modeling with NNs
* How to evaluate Keras NN models with scikit-learn

# <font color='blue'>A. Description of the input data

Same as in the previous notebook.

# <font color='blue'>B. Set-up and data import

Start by importing all classes and functions you will need:

* data loading functionalities from **Pandas** (learn more [here](https://pandas.pydata.org/)) - same as before
* data preparation and model evaluation from **Scikit-learn** - referred to as `sklearn` in the following (learn more [here](https://scikit-learn.org/stable/)) - same as before
* all the functionality we require from **Keras** (learn more [here](https://keras.io/))
* more as needed, e.g. **numpy**, **matplotlib**, ... (learn more [here](https://numpy.org/) and [here](https://matplotlib.org/) respectively)


In [0]:
# pandas
from pandas import read_csv

# sklearn
from sklearn.preprocessing import LabelEncoder
#from sklearn.pipeline import Pipeline
from sklearn.model_selection import train_test_split
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import KFold

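# keras
# NOTE: keras.wrappers.scikit_learn and keras.utils.np_utils are only available
# in older Keras 2.x releases; recent TensorFlow/Keras versions removed them
# (SciKeras provides a maintained scikit-learn wrapper, and
# keras.utils.to_categorical replaces np_utils.to_categorical)
from keras.models import Sequential
from keras.layers import Dense
from keras.wrappers.scikit_learn import KerasClassifier
from keras.utils import np_utils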

# numpy
import numpy as np

# matplotlib
import matplotlib.pyplot as plt
%matplotlib inline

In [0]:
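# fix random seed for reproducibility
# (this seeds NumPy only; Keras/TensorFlow weight initialization uses its own
# random generator, so training results may still vary from run to run)
seed = 123
np.random.seed(seed)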

Download the data.

In [0]:
# alternatively, download a local copy and inspect it, e.g.:
#!wget https://raw.githubusercontent.com/bonacor/CorsoSwComp/master/iris.data.csv
#!ls -trl iris.data.csv
#!head -5 iris.data.csv

import pandas as pd

url = 'https://raw.githubusercontent.com/dbonacorsi/SC_AA1920/master/datasets/iris.data.csv'

names = ['sepal-l', 'sepal-w', 'petal-l', 'petal-w', 'class']
dataset = pd.read_csv(url, names=names)
dataset

Split the data into the input features (X) and the target class (Y).

In [0]:
# split the dataset into features and labels
#dataframe = read_csv("iris.data.csv", header=None)   # alternative: load a local copy
data = dataset.values
X = data[:,0:4].astype(float)   # 1st to 4th columns (the 4 measurements) into X
Y = data[:,4]                   # 5th column (the class name) into Y
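
Because the DataFrame mixes four numeric columns with one string column, `dataset.values` comes back as a NumPy array of dtype `object`; that is why the feature columns are explicitly cast with `.astype(float)`. A tiny sketch of the same slicing, on two hand-written sample rows that are hypothetical stand-ins for the real CSV rows (the `_sample` names are chosen so they do not overwrite `X` and `Y`):

```python
import numpy as np

# two hypothetical rows in the same layout as the iris CSV: 4 measurements + class name
data_sample = np.array([[5.1, 3.5, 1.4, 0.2, "Iris-setosa"],
                        [7.0, 3.2, 4.7, 1.4, "Iris-versicolor"]], dtype=object)

X_sample = data_sample[:, 0:4].astype(float)   # columns 0..3 -> float feature matrix
Y_sample = data_sample[:, 4]                   # column 4 -> class names (still dtype object)

print(X_sample.dtype, X_sample.shape)   # float64 (2, 4)
print(Y_sample)
```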

In [0]:
len(X)

In [0]:
len(Y)

In [0]:
X

In [0]:
Y

# <font color='blue'>C. Data preparation/preprocessing

We did some data exploration in the previous notebook. Here, we focus a bit more on data preprocessing.

> *(NOTE: different datasets may require different data manipulation/preprocessing. This just applies to this specific case)*



## C1. One-hot encoding the output variable

I need to go from

    Iris-setosa
    Iris-versicolor
    Iris-virginica
    
to

    1, 0, 0
    0, 1, 0
    0, 0, 1
    
I will do it in 2 subsequent steps:
   1. encode the strings consistently as integers
   2. convert the vector of integers to a one-hot encoding
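
Both steps can be sketched in plain NumPy, as an illustration of what `LabelEncoder` and `to_categorical` compute (not of how those libraries are implemented): `np.unique(..., return_inverse=True)` sorts the class names and returns the integer index of each label, and indexing the identity matrix with those integers picks out the one-hot rows. The label list below is a hypothetical sample.

```python
import numpy as np

# hypothetical sample of labels, in the same string format as the 'class' column
labels = np.array(["Iris-setosa", "Iris-virginica", "Iris-versicolor", "Iris-setosa"])

# step 1: map each string to an integer; classes are sorted alphabetically,
# the same ordering LabelEncoder uses for its classes_
classes, codes = np.unique(labels, return_inverse=True)

# step 2: row i of the identity matrix is the one-hot vector for class i
one_hot = np.eye(len(classes))[codes]

print(classes)
print(codes)     # [0 2 1 0]
print(one_hot)
```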


In [0]:
# step 1: encode class values as integers
encoder = LabelEncoder()
encoder.fit(Y)
encoded_Y = encoder.transform(Y)

Check:

In [0]:
encoder

In [0]:
encoded_Y

In [0]:
# step 2: do one-hot encoding
transformed_Y = np_utils.to_categorical(encoded_Y)

In [0]:
transformed_Y

### <font color='red'>Exercise 1: `fit` or `fit and transform` (in sklearn)?

Can you quickly rewrite step 1 above using sklearn's `fit_transform` method, instead of separate `fit` and `transform` calls?

#### <font color='green'>Solution 1

In [0]:
# INSERT YOUR CODE HERE

# <font color='blue'>D. Define a NN model

## D1. Baseline NN model

You can create a baseline NN - a simple **Fully Connected NN (FCNN)** - for the IRIS multiclass classification problem with just a few lines of code:
   * input
       * as per our input dataset, this NN has 4 inputs (X)
   * hidden layer(s)
       * the hidden layer here has 8 nodes and uses a rectifier (**relu**) activation function, a common and sensible default
   * output
       * because we used a one-hot encoding for the target, the output layer must produce 3 output values, one for each class. We use a **softmax** activation function in the output layer, so that the output values lie between 0 and 1 and sum to 1, and can therefore be used as predicted probabilities: the class with the largest output value is taken as the model's prediction.
   * training
       * the network uses the efficient **adam** gradient-descent (GD) optimization algorithm with a **logarithmic loss function**, which is called **categorical crossentropy** in Keras.
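
To make the softmax output and the categorical crossentropy loss concrete, here is a minimal NumPy sketch for a single sample. The raw output values (`logits`) are made up for illustration, and the small clipping constant `eps` is an assumption added only to avoid `log(0)`; Keras' own implementation may differ in such details.

```python
import numpy as np

def softmax(z):
    # subtracting the max does not change the result but avoids overflow in exp
    e = np.exp(z - np.max(z))
    return e / e.sum()

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    # y_true: one-hot target; y_pred: predicted probabilities (same length)
    y_pred = np.clip(y_pred, eps, 1.0 - eps)   # avoid log(0)
    return -np.sum(y_true * np.log(y_pred))

logits = np.array([2.0, 1.0, 0.1])    # hypothetical raw outputs for the 3 classes
probs = softmax(logits)               # values in (0, 1) that sum to 1
y_true = np.array([1.0, 0.0, 0.0])    # one-hot target: the first class

predicted_class = int(np.argmax(probs))          # class with the largest probability
loss = categorical_crossentropy(y_true, probs)   # equals -log(probs[0]) here
print(probs, predicted_class, loss)
```

With a one-hot target only the term of the true class contributes to the loss, so minimizing it pushes the predicted probability of that class towards 1.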
   
Hence, the network topology of this simple FCNN with a single hidden layer can be summarized as:

    4 inputs -> 1 hidden layer with 8 nodes -> 3 outputs

and its implementation in Keras is as simple as this:

In [0]:
# create model
model = Sequential()
model.add(Dense(8, input_dim=4, activation='relu'))
model.add(Dense(3, activation='softmax'))

# Compile model
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

In [0]:
model.summary()
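
The parameter counts reported by `model.summary()` can be checked by hand: a `Dense` layer with `n_in` inputs and `n_out` units has one weight per input-unit pair plus one bias per unit. A small sketch of that arithmetic (`dense_params` is a hypothetical helper, not a Keras function):

```python
def dense_params(n_in, n_out):
    # weights (n_in * n_out) plus one bias per unit (n_out)
    return n_in * n_out + n_out

hidden = dense_params(4, 8)   # 4*8 + 8 = 40
output = dense_params(8, 3)   # 8*3 + 3 = 27
total = hidden + output       # 67 trainable parameters
print(hidden, output, total)  # 40 27 67
```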

In [0]:
%%time
#0 history=model.fit(X, Y, epochs=10)           # <-- error! Y holds class-name strings, not one-hot vectors
#1 
history=model.fit(X, transformed_Y, epochs=10)
#2 history=model.fit(X, transformed_Y, epochs=50)
#3 history=model.fit(X, transformed_Y, epochs=100)
#4 -- more refined? add e.g. batch_size=32 (the Keras default) or batch_size=10
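
The `batch_size` option sets how many samples are used for each weight update, and therefore how many updates happen per epoch; when it is not given, Keras' `fit()` uses 32. A quick sketch of the arithmetic for the 150 samples used in the cell above (`steps_per_epoch` is a hypothetical helper):

```python
import math

def steps_per_epoch(n_samples, batch_size):
    # the last batch can be smaller than batch_size, hence the ceiling
    return math.ceil(n_samples / batch_size)

n_samples = 150   # size of the full iris dataset used in this cell
for bs in (32, 10, 5):
    print(f"batch_size={bs}: {steps_per_epoch(n_samples, bs)} updates per epoch")
```

Smaller batches give more (and noisier) weight updates per epoch, at the cost of more steps to compute in each epoch.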

In [0]:
acc = history.history["accuracy"]   # the key is "acc" in Keras versions before 2.3
plt.plot(range(len(acc)), acc)
plt.xlabel("epoch")
plt.ylabel("training accuracy")

## D2. Train-Test splitting
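
Holding out part of the data lets us measure performance on samples the network never saw during training. Conceptually, a shuffled 80/20 split amounts to the NumPy sketch below; `shuffle_split` is a hypothetical helper that illustrates the idea and will not reproduce the exact split that `train_test_split` produces with the same seed. The `_demo` arrays are dummy data, named so they do not overwrite the real `X` and `Y`.

```python
import numpy as np

def shuffle_split(X, Y, test_size=0.2, seed=123):
    # shuffle the sample indices, then take the first test_size fraction as test set
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_test = int(np.ceil(test_size * len(X)))
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    return X[train_idx], X[test_idx], Y[train_idx], Y[test_idx]

# dummy data with the same shapes as the iris features and one-hot targets
X_demo = np.arange(150 * 4, dtype=float).reshape(150, 4)
Y_demo = np.eye(3)[np.arange(150) % 3]

Xtr_demo, Xte_demo, Ytr_demo, Yte_demo = shuffle_split(X_demo, Y_demo)
print(Xtr_demo.shape, Xte_demo.shape)   # (120, 4) (30, 4)
```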

In [0]:
X_train, X_test, Y_train, Y_test = train_test_split(X, transformed_Y, test_size=0.2, random_state=seed)

In [0]:
# create model
model = Sequential()
model.add(Dense(8, input_dim=4, activation='relu'))
model.add(Dense(3, activation='softmax'))
# Compile model
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

In [0]:
model.summary()

In [0]:
%%time
#1 
history=model.fit(X_train, Y_train, epochs=10, validation_data=(X_test,Y_test))
#2 history=model.fit(X_train, Y_train, epochs=100, validation_data=(X_test,Y_test))
#3 history=model.fit(X_train, Y_train, epochs=500, validation_data=(X_test,Y_test))  <-- signs of overfitting!
#4 history=model.fit(X_train, Y_train, epochs=120, validation_data=(X_test,Y_test), batch_size=32)
#5 history=model.fit(X_train, Y_train, epochs=100, validation_data=(X_test,Y_test), batch_size=10)

In [0]:
variable1=history.history["loss"]
variable2=history.history["val_loss"]
plt.plot(range(len(variable1)),variable1, label='loss')
plt.plot(range(len(variable2)),variable2, label='val_loss')
plt.legend()
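
One simple way to read these curves is to look for the epoch where the validation loss is lowest: if `loss` keeps decreasing after that point while `val_loss` rises, the model is starting to overfit. A minimal sketch on a made-up history dictionary (the numbers are hypothetical, not results from this notebook); on a real run you would use `history.history` instead of `hist_demo`:

```python
import numpy as np

# hypothetical loss curves with the same keys as Keras' history.history
hist_demo = {
    "loss":     [1.10, 0.80, 0.60, 0.45, 0.35, 0.28, 0.22],
    "val_loss": [1.12, 0.85, 0.66, 0.55, 0.52, 0.54, 0.58],
}

best_epoch = int(np.argmin(hist_demo["val_loss"]))   # 0-based index of lowest val_loss
gap = np.array(hist_demo["val_loss"]) - np.array(hist_demo["loss"])

print("lowest val_loss at epoch", best_epoch + 1)     # 1-based, as Keras prints epochs
print("val_loss - loss per epoch:", np.round(gap, 2))
```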

## D3. Introducing KerasClassifier



The idea here is to use the wrapper classes provided by Keras, which let you use NN models developed with Keras inside scikit-learn. Why? Because you get the best of both worlds: Keras is simple and convenient for NN design, and scikit-learn is powerful and versatile for many ML-related tasks.

Keras provides a *KerasClassifier* class that can be used as an *Estimator* in scikit-learn, the base type of model in that library. We first need to create our KerasClassifier, so that scikit-learn can use it. KerasClassifier takes a model-building function (`baseline_model()` below, which wraps the same model we defined above) as its `build_fn` argument, plus arguments that are passed on to the *fit()* function used internally to train the NN. Here, we pass:

* the number of epochs: 200
* the batch size: 5

to use when training the model. Progress output during training is also turned off by setting `verbose=0`.

The build function must return the constructed and compiled NN model, ready for training.

More info: https://keras.io/scikit-learn-api/

In [0]:
# define a baseline model
def baseline_model():
    # create model
    model = Sequential()
    model.add(Dense(8, input_dim=4, activation='relu'))
    model.add(Dense(3, activation='softmax'))
    # Compile model
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model

In [0]:
estimator = KerasClassifier(build_fn=baseline_model, epochs=200, batch_size=5, verbose=0)

In [0]:
estimator

## D4. Evaluate the model with $k$-Fold Cross-Validation

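It is time to evaluate our NN model on our data. Note that with k-fold CV the model is not trained only once: it is trained and evaluated k times, each time on a different split of the data.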

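The scikit-learn library offers a suite of techniques to evaluate models. A widely used standard for evaluating ML models is **k-fold cross-validation (k-fold CV)**: the data are split into k folds, and each fold is used once as the test set while the model is trained on the remaining k-1 folds. We proceed as follows: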

1. we define the model evaluation procedure: here, we shuffle the data before partitioning it, and we set the number of folds to 10 (a common default);
2. we evaluate our model (*estimator*) on our dataset (*X* and *transformed_Y*) using this 10-fold CV procedure.
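
The partitioning behind k-fold CV can be sketched in a few lines of NumPy: shuffle the sample indices, cut them into k roughly equal folds, and use each fold once as the test set while training on the others. `kfold_indices` is a hypothetical helper for illustration; in the notebook, `KFold` does this job (with its own shuffling, so the folds will differ).

```python
import numpy as np

def kfold_indices(n_samples, n_splits=10, seed=123):
    # shuffle once, then split the indices into n_splits nearly equal folds
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_samples), n_splits)
    for k in range(n_splits):
        test_idx = folds[k]
        # train on all the other folds
        train_idx = np.concatenate([folds[j] for j in range(n_splits) if j != k])
        yield train_idx, test_idx

splits_demo = list(kfold_indices(150, n_splits=10))
for train_idx, test_idx in splits_demo[:2]:
    print(len(train_idx), len(test_idx))   # 135 15
```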

Evaluating the model returns an array with one accuracy score for each of the k=10 models built on the different splits of the dataset. Since each of the 10 models is trained from scratch for 200 epochs, this can take a while.

The results are summarized as both the mean and the standard deviation of the model accuracy across the 10 folds.
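
The summary is a plain mean and standard deviation over the per-fold scores. With made-up fold accuracies (not results from this notebook), the computation looks like this; note that NumPy's `std()` computes the population standard deviation (`ddof=0`) by default.

```python
import numpy as np

# hypothetical per-fold accuracies, in the same form cross_val_score returns them
scores = np.array([1.0, 0.9, 1.0, 0.8])

mean, std = scores.mean(), scores.std()   # std uses ddof=0 by default
print("Accuracy: %.2f%% (%.2f%%)" % (mean * 100, std * 100))   # Accuracy: 92.50% (8.29%)
```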

In [0]:
# part 1
kfold = KFold(n_splits=10, shuffle=True, random_state=seed)

In [0]:
%%time
# part 2
results = cross_val_score(estimator, X, transformed_Y, cv=kfold)
print("Accuracy: %.2f%% (%.2f%%)" % (results.mean()*100, results.std()*100))

Note that the step above takes considerably longer than the previous cells: it trains 10 separate models, each for 200 epochs.

What we got is a reasonable estimate of the performance of the model on unseen data, and it is comparable to the best known results for this problem.