https://machinelearningmastery.com/multi-class-classification-tutorial-keras-deep-learning-library/

# Objectives
- Load data from CSV and make it available to Keras.
- Prepare multi-class classification data for modeling with neural networks using one hot encoding.
- Use Keras neural network models with scikit-learn.
- Define a neural network using Keras for multi-class classification.
- Evaluate Keras neural network models with scikit-learn using k-fold cross validation.

# 1. Problem Description
Use the standard machine learning problem called the [iris flowers dataset](https://archive.ics.uci.edu/ml/datasets/Iris).

All 4 input variables are numeric and share the same scale (centimeters). Each instance describes the measurements of one observed flower, and the output variable is the specific iris species.

Multi-class classification problem = more than two classes to be predicted (three flower species) <br>
The three class values require specialized handling.

Iris flower dataset is a well-studied problem -> expect to achieve a model accuracy in the range of 95% to 97% (target)

Download the iris flowers dataset from the UCI Machine Learning repository and place it in the current working directory with the filename `iris.csv`.

[Iris Flowers Dataset (iris.csv)](https://raw.githubusercontent.com/jbrownlee/Datasets/master/iris.csv)

# 2. Import Classes and Functions
- functionality from Keras
- data loading from [pandas](https://pandas.pydata.org/)
- data preparation and model evaluation from [scikit-learn](https://scikit-learn.org/stable/)
- splitting data for making predictions from scikit-learn
- setting the random seed with numpy

In [1]:
# data handling
import pandas

# keras
from keras.models import Sequential
from keras.layers import Dense
from keras.wrappers.scikit_learn import KerasClassifier
from keras.utils import np_utils

# evaluation
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import KFold
from sklearn.preprocessing import LabelEncoder
from sklearn.pipeline import Pipeline

# splitting dataset
from sklearn.model_selection import train_test_split

# generate random seed
import numpy as np

# 3. Load The Dataset
The dataset can be loaded directly. Because the output variable contains strings, it is easiest to load the data using pandas. Split the attributes (columns) into input variables (X) and output variables (Y).

In [2]:
# load dataset
dataframe = pandas.read_csv("iris.csv", header=None)

# numpy representation of df
dataset = dataframe.values
# all rows, first 4 columns (+ convert to float)
X = dataset[:,0:4].astype(float)
# all rows, 5th column
Y = dataset[:,4]
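
The column slicing above can be tried without the file on disk. This is a minimal sketch, assuming three rows copied from the data shown in this notebook; the names `sample_csv`, `X_sample` and `Y_sample` are made up for illustration, and the standard `csv` module stands in for `pandas.read_csv`.

```python
import csv
import io

import numpy as np

# A few rows in the same layout as iris.csv (no header line).
# The values are copied from the first rows displayed in this notebook.
sample_csv = """5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
4.7,3.2,1.3,0.2,Iris-setosa
"""

rows = list(csv.reader(io.StringIO(sample_csv)))
sample = np.array(rows, dtype=object)      # every cell is still a string here

X_sample = sample[:, 0:4].astype(float)    # columns 0..3 -> float inputs
Y_sample = sample[:, 4]                    # column 4 -> class label strings

print(X_sample.shape, Y_sample.shape)      # (3, 4) (3,)
```

Slicing with `0:4` excludes column 4, so the label strings never reach `astype(float)`.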

In [5]:
dataframe

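|     | 0   | 1   | 2   | 3   | 4              |
| --- | --- | --- | --- | --- | -------------- |
| 0   | 5.1 | 3.5 | 1.4 | 0.2 | Iris-setosa    |
| 1   | 4.9 | 3.0 | 1.4 | 0.2 | Iris-setosa    |
| 2   | 4.7 | 3.2 | 1.3 | 0.2 | Iris-setosa    |
| 3   | 4.6 | 3.1 | 1.5 | 0.2 | Iris-setosa    |
| 4   | 5.0 | 3.6 | 1.4 | 0.2 | Iris-setosa    |
| ... | ... | ... | ... | ... | ...            |
| 145 | 6.7 | 3.0 | 5.2 | 2.3 | Iris-virginica |
| 146 | 6.3 | 2.5 | 5.0 | 1.9 | Iris-virginica |
| 147 | 6.5 | 3.0 | 5.2 | 2.0 | Iris-virginica |
| 148 | 6.2 | 3.4 | 5.4 | 2.3 | Iris-virginica |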


In [6]:
# look at dataset
dataset

array([[5.1, 3.5, 1.4, 0.2, 'Iris-setosa'],
       [4.9, 3.0, 1.4, 0.2, 'Iris-setosa'],
       [4.7, 3.2, 1.3, 0.2, 'Iris-setosa'],
       [4.6, 3.1, 1.5, 0.2, 'Iris-setosa'],
       [5.0, 3.6, 1.4, 0.2, 'Iris-setosa'],
       [5.4, 3.9, 1.7, 0.4, 'Iris-setosa'],
       [4.6, 3.4, 1.4, 0.3, 'Iris-setosa'],
       [5.0, 3.4, 1.5, 0.2, 'Iris-setosa'],
       [4.4, 2.9, 1.4, 0.2, 'Iris-setosa'],
       [4.9, 3.1, 1.5, 0.1, 'Iris-setosa'],
       [5.4, 3.7, 1.5, 0.2, 'Iris-setosa'],
       [4.8, 3.4, 1.6, 0.2, 'Iris-setosa'],
       [4.8, 3.0, 1.4, 0.1, 'Iris-setosa'],
       [4.3, 3.0, 1.1, 0.1, 'Iris-setosa'],
       [5.8, 4.0, 1.2, 0.2, 'Iris-setosa'],
       [5.7, 4.4, 1.5, 0.4, 'Iris-setosa'],
       [5.4, 3.9, 1.3, 0.4, 'Iris-setosa'],
       [5.1, 3.5, 1.4, 0.3, 'Iris-setosa'],
       [5.7, 3.8, 1.7, 0.3, 'Iris-setosa'],
       [5.1, 3.8, 1.5, 0.3, 'Iris-setosa'],
       [5.4, 3.4, 1.7, 0.2, 'Iris-setosa'],
       [5.1, 3.7, 1.5, 0.4, 'Iris-setosa'],
       ...

In [3]:
X

array([[5.1, 3.5, 1.4, 0.2],
       [4.9, 3. , 1.4, 0.2],
       [4.7, 3.2, 1.3, 0.2],
       [4.6, 3.1, 1.5, 0.2],
       [5. , 3.6, 1.4, 0.2],
       [5.4, 3.9, 1.7, 0.4],
       [4.6, 3.4, 1.4, 0.3],
       [5. , 3.4, 1.5, 0.2],
       [4.4, 2.9, 1.4, 0.2],
       [4.9, 3.1, 1.5, 0.1],
       [5.4, 3.7, 1.5, 0.2],
       [4.8, 3.4, 1.6, 0.2],
       [4.8, 3. , 1.4, 0.1],
       [4.3, 3. , 1.1, 0.1],
       [5.8, 4. , 1.2, 0.2],
       [5.7, 4.4, 1.5, 0.4],
       [5.4, 3.9, 1.3, 0.4],
       [5.1, 3.5, 1.4, 0.3],
       [5.7, 3.8, 1.7, 0.3],
       [5.1, 3.8, 1.5, 0.3],
       [5.4, 3.4, 1.7, 0.2],
       [5.1, 3.7, 1.5, 0.4],
       [4.6, 3.6, 1. , 0.2],
       [5.1, 3.3, 1.7, 0.5],
       [4.8, 3.4, 1.9, 0.2],
       [5. , 3. , 1.6, 0.2],
       [5. , 3.4, 1.6, 0.4],
       [5.2, 3.5, 1.5, 0.2],
       [5.2, 3.4, 1.4, 0.2],
       [4.7, 3.2, 1.6, 0.2],
       [4.8, 3.1, 1.6, 0.2],
       [5.4, 3.4, 1.5, 0.4],
       [5.2, 4.1, 1.5, 0.1],
       [5.5, 4.2, 1.4, 0.2],
       ...

In [4]:
Y

array(['Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa',
       'Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa',
       'Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa',
       'Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa',
       'Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa',
       'Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa',
       'Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa',
       'Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa',
       'Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa',
       'Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa',
       'Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa',
       'Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa',
       'Iris-setosa', 'Iris-setosa', 'Iris-versicolor', 'Iris-versicolor',
       'Iris-versicolor', 'Iris-versicolor', 'Iris-versicolor',
       'Iris-versicolor', 'Iris-versicolor', ...

# 4. Encode The Output Variable
The output variable contains 3 different string values.

When modeling multi-class classification problems using neural networks, it is good practice to reshape the output attribute from a vector that contains one class value per instance -> matrix with one boolean column per class, indicating whether or not a given instance has that class value.

This is called [one hot encoding](https://machinelearningmastery.com/why-one-hot-encode-data-in-machine-learning/) or creating dummy variables from a categorical variable.

Example: <br>
3 class values
- Iris-setosa
- Iris-versicolor
- Iris-virginica

Observations turned into a one-hot encoded binary matrix for each data instance:

| Iris-setosa   | Iris-versicolor | Iris-virginica  |
| ------------- |:---------------:| ---------------:|
| 1             | 0               | 0               |
| 0             | 1               | 0               |
| 0             | 0               | 1               |

Encode the strings consistently as integers using the scikit-learn class `LabelEncoder`. <br>
Convert the vector of integers to a one hot encoding using the Keras function `to_categorical()`.

In [7]:
# encode class values as integers
encoder = LabelEncoder()
# learn the mapping from class label strings to integers, then apply it
encoder.fit(Y)
encoded_Y = encoder.transform(Y)
# convert integers to dummy variables (i.e. one hot encoded)
dummy_y = np_utils.to_categorical(encoded_Y)
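
What the two steps above produce can be reproduced in plain NumPy. This is only a sketch of the idea, not the scikit-learn or Keras implementation; the `labels` array is a made-up four-element example.

```python
import numpy as np

# Hypothetical label vector for illustration
labels = np.array(["Iris-virginica", "Iris-setosa", "Iris-versicolor", "Iris-setosa"])

# np.unique returns the sorted class names and, with return_inverse=True,
# the integer index of each label within that sorted list
classes, integer_labels = np.unique(labels, return_inverse=True)

# Indexing an identity matrix with the integer labels picks one row per sample,
# which is exactly a one hot encoding
one_hot = np.eye(len(classes))[integer_labels]

print(classes)          # ['Iris-setosa' 'Iris-versicolor' 'Iris-virginica']
print(integer_labels)   # [2 0 1 0]
print(one_hot)
```

Each row of `one_hot` contains a single 1, in the column of that sample's class.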

In [8]:
dummy_y

array([[1., 0., 0.],
       [1., 0., 0.],
       [1., 0., 0.],
       [1., 0., 0.],
       [1., 0., 0.],
       [1., 0., 0.],
       [1., 0., 0.],
       [1., 0., 0.],
       [1., 0., 0.],
       [1., 0., 0.],
       [1., 0., 0.],
       [1., 0., 0.],
       [1., 0., 0.],
       [1., 0., 0.],
       [1., 0., 0.],
       [1., 0., 0.],
       [1., 0., 0.],
       [1., 0., 0.],
       [1., 0., 0.],
       [1., 0., 0.],
       [1., 0., 0.],
       [1., 0., 0.],
       [1., 0., 0.],
       [1., 0., 0.],
       [1., 0., 0.],
       [1., 0., 0.],
       [1., 0., 0.],
       [1., 0., 0.],
       [1., 0., 0.],
       [1., 0., 0.],
       [1., 0., 0.],
       [1., 0., 0.],
       [1., 0., 0.],
       [1., 0., 0.],
       [1., 0., 0.],
       [1., 0., 0.],
       [1., 0., 0.],
       [1., 0., 0.],
       [1., 0., 0.],
       [1., 0., 0.],
       [1., 0., 0.],
       [1., 0., 0.],
       [1., 0., 0.],
       [1., 0., 0.],
       [1., 0., 0.],
       [1., 0., 0.],
       [1., 0., 0.],
       ...

# 5. Define The Neural Network Model
The Keras library provides wrapper classes to use neural network models developed with Keras in scikit-learn.

There is a `KerasClassifier` class in Keras that can be used as an Estimator in scikit-learn (base type of model). <br>
`KerasClassifier()` takes a function as its `build_fn` argument. This function must return the constructed neural network model, ready for training.

Network topology of this neural network with one hidden layer: <br>
4 inputs -> [8 hidden nodes] -> 3 outputs

Below is a function that creates a baseline neural network. It creates a fully connected network with 1 hidden layer that contains 8 neurons.

The hidden layer uses a rectifier activation function (ReLU). <br>
One-hot encoding for iris dataset -> output layer must create 3 output values, one for each class <br>
The class with the largest output value is the class predicted by the model.

The `softmax` activation function in the output layer ensures the output values lie between 0 and 1 and sum to 1, so they may be used as predicted probabilities.
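
To make the topology concrete, here is a minimal NumPy sketch of one forward pass through a 4 -> 8 -> 3 network. The weights are random placeholders (not trained values), and the names `W1`, `W2`, `softmax` and `forward` are chosen for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Randomly initialised weights, only to illustrate the shapes
W1, b1 = rng.normal(size=(4, 8)), np.zeros(8)   # 4 inputs -> 8 hidden nodes
W2, b2 = rng.normal(size=(8, 3)), np.zeros(3)   # 8 hidden nodes -> 3 outputs

def softmax(z):
    # Subtract the row maximum for numerical stability before exponentiating
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def forward(x):
    hidden = np.maximum(0.0, x @ W1 + b1)       # ReLU activation
    return softmax(hidden @ W2 + b2)            # one probability per class

x = np.array([[5.1, 3.5, 1.4, 0.2],
              [6.7, 3.0, 5.2, 2.3]])
probs = forward(x)
predicted_class = probs.argmax(axis=1)          # index of the largest output

n_params = W1.size + b1.size + W2.size + b2.size
print(probs.shape, n_params)                    # (2, 3) 67
```

The parameter count is (4 * 8 + 8) + (8 * 3 + 3) = 67, which is what a model of this shape has to learn.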

The network uses the efficient `Adam` gradient descent optimization algorithm with a logarithmic loss function, which is called `categorical_crossentropy` in Keras.
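
For one-hot targets, the logarithmic loss reduces to the negative log of the probability assigned to the true class. A small sketch, assuming made-up predictions; the function below is written for illustration and is not the Keras implementation (the `eps` clipping value is an assumption).

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    # Clip to avoid log(0), then sum -y * log(p) over the class axis
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    return -np.sum(y_true * np.log(y_pred), axis=1)

# Hypothetical one-hot targets and predicted probabilities
y_true = np.array([[1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0]])
y_pred = np.array([[0.9, 0.05, 0.05],
                   [0.2, 0.5, 0.3]])

losses = categorical_crossentropy(y_true, y_pred)
print(losses)   # -ln(0.9) and -ln(0.5), roughly 0.105 and 0.693
```

A confident correct prediction gives a loss near 0, while a less certain one is penalised more heavily.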

In [9]:
# define baseline model
def baseline_model():
    # create model
    model = Sequential() # start
    model.add(Dense(8, input_dim=4, activation='relu')) # hidden
    model.add(Dense(3, activation='softmax')) # output
    
    # Compile model
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model

Arguments passed to the `KerasClassifier` constructor are forwarded to the `fit()` function that is used internally to train the neural network. <br>
In this case:
- number of epochs = 200
- batch size = 5
- verbose = 0 -> training progress output turned off

In [10]:
estimator = KerasClassifier(build_fn=baseline_model, epochs=200, batch_size=5, verbose=0)
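
To get a feeling for what these settings mean, the number of weight updates per training run can be worked out by hand. A rough sketch, assuming one update per batch and a smaller final batch when the data does not divide evenly; `weight_updates` is a hypothetical helper.

```python
import math

def weight_updates(n_train, batch_size, epochs):
    # One gradient update per batch; a partial last batch still counts
    batches_per_epoch = math.ceil(n_train / batch_size)
    return batches_per_epoch * epochs

# 10-fold cross validation on 150 rows trains each model on 135 rows
print(weight_updates(135, 5, 200))   # 5400

# A 67% / 33% split of 150 rows leaves 100 rows for training
print(weight_updates(100, 5, 200))   # 4000
```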

# 6. Evaluate The Model with k-Fold Cross Validation
Evaluate the neural network model on the dataset.

scikit-learn has excellent capability to evaluate models using a suite of techniques. The gold standard for evaluating machine learning models is k-fold cross validation.

Define model evaluation procedure. <br>
Set the number of folds to 10 (a common choice) and shuffle the data before partitioning it.

In [11]:
kfold = KFold(n_splits=10, shuffle=True)
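
The idea behind a shuffled 10-fold split can be sketched in NumPy. This illustrates the procedure only: it does not reproduce the exact index order scikit-learn generates, and `kfold_indices` is a hypothetical helper.

```python
import numpy as np

def kfold_indices(n_samples, n_splits, seed=0):
    # Shuffle all row indices once, then cut them into n_splits folds
    rng = np.random.default_rng(seed)
    indices = rng.permutation(n_samples)
    folds = np.array_split(indices, n_splits)
    for i in range(n_splits):
        test_idx = folds[i]                      # fold i is held out
        train_idx = np.concatenate(
            [folds[j] for j in range(n_splits) if j != i]
        )                                        # the other folds train the model
        yield train_idx, test_idx

splits = list(kfold_indices(150, 10))
print(len(splits), len(splits[0][0]), len(splits[0][1]))   # 10 135 15
```

Every row is used for testing exactly once, and each of the 10 models is trained on the remaining 135 rows.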

Evaluate model (`estimator`) on dataset (`X` and `dummy_y`) using a 10-fold cross-validation procedure (`kfold`).

Evaluating the model returns an array of accuracy scores, one for each of the 10 models constructed for the splits of the dataset.

In [12]:
results = cross_val_score(estimator, X, dummy_y, cv=kfold)
print("Baseline: %.2f%% (%.2f%%)" % (results.mean()*100, results.std()*100))

Baseline: 98.00% (4.27%)


The results are summarized as both the mean and standard deviation of the model accuracy on the dataset.

# 7. Make Predictions
Split the dataset, train on 67% and make predictions on 33%. Since the output class values are encoded as integers, the predictions are integers. Use `encoder.inverse_transform()` to turn the predicted integers back into strings.

In [22]:
from sklearn.model_selection import train_test_split
import numpy as np

# fix random seed for reproducibility
seed = 7
np.random.seed(seed)

# split dataset
X_train, X_test, Y_train, Y_test = train_test_split(X, dummy_y, test_size=0.33, random_state=seed)
# fit model
estimator.fit(X_train, Y_train)
# make predictions
predictions = estimator.predict(X_test)

# output predictions (integers)
print(predictions)
# output predictions (strings)
print(encoder.inverse_transform(predictions))

[2 1 0 1 2 0 1 1 0 1 1 1 0 2 0 1 2 2 0 0 1 2 1 2 2 2 1 1 2 2 2 1 0 2 1 0 0
 0 0 2 2 1 2 2 1 0 1 1 2 0]
['Iris-virginica' 'Iris-versicolor' 'Iris-setosa' 'Iris-versicolor'
 'Iris-virginica' 'Iris-setosa' 'Iris-versicolor' 'Iris-versicolor'
 'Iris-setosa' 'Iris-versicolor' 'Iris-versicolor' 'Iris-versicolor'
 'Iris-setosa' 'Iris-virginica' 'Iris-setosa' 'Iris-versicolor'
 'Iris-virginica' 'Iris-virginica' 'Iris-setosa' 'Iris-setosa'
 'Iris-versicolor' 'Iris-virginica' 'Iris-versicolor' 'Iris-virginica'
 'Iris-virginica' 'Iris-virginica' 'Iris-versicolor' 'Iris-versicolor'
 'Iris-virginica' 'Iris-virginica' 'Iris-virginica' 'Iris-versicolor'
 'Iris-setosa' 'Iris-virginica' 'Iris-versicolor' 'Iris-setosa'
 'Iris-setosa' 'Iris-setosa' 'Iris-setosa' 'Iris-virginica'
 'Iris-virginica' 'Iris-versicolor' 'Iris-virginica' 'Iris-virginica'
 'Iris-versicolor' 'Iris-setosa' 'Iris-versicolor' 'Iris-versicolor'
 'Iris-virginica' 'Iris-setosa']
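
The predictions can also be compared against the held-out labels. Because the test labels are one-hot encoded, they first need to be turned back into integers with `argmax`. A minimal sketch with made-up values (not results from this notebook); indexing a `classes` array plays the role of `encoder.inverse_transform()` here.

```python
import numpy as np

classes = np.array(["Iris-setosa", "Iris-versicolor", "Iris-virginica"])

# Hypothetical values for illustration only
y_test_one_hot = np.array([[0.0, 0.0, 1.0],
                           [0.0, 1.0, 0.0],
                           [1.0, 0.0, 0.0],
                           [0.0, 1.0, 0.0]])
predicted = np.array([2, 1, 0, 2])

true_labels = y_test_one_hot.argmax(axis=1)      # one-hot rows -> integers
accuracy = np.mean(predicted == true_labels)     # fraction of correct predictions
predicted_names = classes[predicted]             # integers -> class names

print(true_labels)       # [2 1 0 1]
print(accuracy)          # 0.75
print(predicted_names)
```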
