# 🤔 Multi-Class Classification Tutorial with the ```Keras``` Deep Learning Library 🤔

* We're using the **Iris Dataset**. It's good for practicing with Neural Networks because all 4 input variables are numeric and measured in the same unit (cm).
* This is a **multi-class classification** problem, i.e. there are **more than 2 classes to be predicted**. (There are 3 flower species.)
* We can expect to achieve model accuracy in the range of **95-97%**.

In [1]:
import numpy
from keras.models import Sequential
from keras.layers import Dense
from keras.wrappers.scikit_learn import KerasClassifier
from keras.utils import np_utils
from sklearn import datasets
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import KFold
from sklearn.preprocessing import LabelEncoder
from sklearn.pipeline import Pipeline

Using TensorFlow backend.


### Initialize random number generator

Let's **seed** the random number generator with a constant value (let's go with **7**).

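Seeding makes our results far easier to reproduce from run to run. They may still not be bit-for-bit identical, because the backend (here TensorFlow) can introduce its own sources of nondeterminism.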
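
To see what seeding does in isolation, here is a minimal sketch using only NumPy. It assumes nothing about ```Keras```; the variable names `first_draw` and `second_draw` are made up for this illustration.

```python
import numpy

seed = 7

# Draw three numbers, then reseed with the same value and draw again.
numpy.random.seed(seed)
first_draw = numpy.random.rand(3)

numpy.random.seed(seed)
second_draw = numpy.random.rand(3)

# The same seed produces the same sequence of random numbers.
print(numpy.array_equal(first_draw, second_draw))  # True
```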

In [2]:
seed = 7
numpy.random.seed(seed)

### Load the dataset

In [3]:
iris = datasets.load_iris()
X, y = iris.data, iris.target

### Encode the output variable

In [4]:
# Encode class values as integers (iris.target already holds 0, 1, 2, so the values are unchanged)
encoder = LabelEncoder()
encoder.fit(y)
encoded_y = encoder.transform(y)

# Convert integers to dummy variables (i.e. one-hot encoded)
dummy_y = np_utils.to_categorical(encoded_y)
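
For intuition, one-hot encoding can be reproduced with plain NumPy. This is only a sketch of the idea, not how ```to_categorical``` is implemented; the `labels` array is a made-up example.

```python
import numpy

# Four example integer labels drawn from the three iris classes.
labels = numpy.array([0, 1, 2, 1])

# Row i of the 3x3 identity matrix is the one-hot vector for class i,
# so indexing the identity matrix by the labels encodes all of them at once.
one_hot = numpy.eye(3)[labels]
print(one_hot)
```

Each row contains a single 1 in the column of its class and 0 everywhere else.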

# Define the Neural Network model

* The ```Keras``` library provides **wrapper classes**, which let us use NN models developed in ```Keras``` in ```scikit-learn```.
* The ```KerasClassifier``` class can be used as an Estimator in ```scikit-learn```, the base type of model in the library.
* ```KerasClassifier``` takes a **model-building function** as an argument (```build_fn```). This function must return the constructed NN model, ready for training.
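
Why pass a function rather than a finished model? One plausible reading is that the wrapper needs to build a fresh model each time it is fitted, for example once per cross-validation fold. The sketch below illustrates that pattern in plain Python. `TinyWrapper` and `DummyModel` are hypothetical names invented here; this is not the ```Keras``` implementation.

```python
class TinyWrapper:
    """Hypothetical stand-in for a scikit-learn style wrapper (not Keras code)."""

    def __init__(self, build_fn, **fit_params):
        self.build_fn = build_fn      # function that returns a fresh model
        self.fit_params = fit_params  # e.g. epochs, batch_size, verbose

    def fit(self, X, y):
        # Build a brand-new model on every fit, then train it with the stored settings.
        self.model_ = self.build_fn()
        self.model_.fit(X, y, **self.fit_params)
        return self


class DummyModel:
    """Placeholder model that only records how fit() was called."""

    def fit(self, X, y, **params):
        self.received = params


wrapper = TinyWrapper(build_fn=DummyModel, epochs=200, batch_size=5, verbose=0)
wrapper.fit([[0.0]], [0])
print(wrapper.model_.received)
```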

About the code:
* The function below creates a **baseline NN** for the iris classification problem: a simple, fully connected network with **one hidden layer** of **8 neurons**.
* The hidden layer uses a **rectifier activation function**. Since we used **one hot encoding** for the dataset, the **output layer** must create **3 output values** (*one for each class*).
* The **largest output value** will be taken as the class predicted by the model!

Network topology: ```4 inputs -> [8 hidden nodes] -> 3 outputs```
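
To make the topology concrete, here is a rough NumPy sketch of one forward pass through a network of this shape. The weights are random, not trained, and the names (`W_hidden`, `W_out`, and so on) are assumptions made for this illustration.

```python
import numpy

rng = numpy.random.RandomState(7)

# Randomly initialised parameters, shaped like the layers described above.
W_hidden, b_hidden = rng.randn(4, 8), numpy.zeros(8)  # 4 inputs -> 8 hidden nodes
W_out, b_out = rng.randn(8, 3), numpy.zeros(3)        # 8 hidden nodes -> 3 outputs

x = numpy.array([[5.1, 3.5, 1.4, 0.2]])  # one flower: 4 measurements in cm

hidden = numpy.maximum(0.0, x @ W_hidden + b_hidden)    # rectifier (ReLU) activation
scores = hidden @ W_out + b_out                         # one raw score per class
predicted_class = int(numpy.argmax(scores, axis=1)[0])  # the largest output wins

# Count every weight and bias in the two layers.
n_params = W_hidden.size + b_hidden.size + W_out.size + b_out.size
print(hidden.shape, scores.shape, n_params)  # (1, 8) (1, 3) 67
```

The count of 67 comes from (4 × 8 + 8) parameters in the hidden layer plus (8 × 3 + 3) in the output layer. The sketch skips the softmax step because softmax does not change which output is largest.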

Notes on code:
* We use a "*softmax*" activation function in the **output layer**. This ensures the output values lie between 0 and 1 and sum to 1, so they may be used as **predicted probabilities**.
* The network uses the efficient **Adam gradient descent optimization algorithm** with a **logarithmic loss function**. In ```Keras```, it's called "*categorical_crossentropy*".
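
The two ingredients named above can be written out in a few lines of NumPy. This is a simplified sketch: the function names are chosen for this example, the input values are made up, and a production implementation would typically also clip probabilities away from zero before taking the logarithm.

```python
import numpy


def softmax(scores):
    # Subtract the row maximum first for numerical stability.
    shifted = scores - scores.max(axis=1, keepdims=True)
    exps = numpy.exp(shifted)
    return exps / exps.sum(axis=1, keepdims=True)


def categorical_crossentropy(y_true, y_prob):
    # Mean over samples of -log(probability assigned to the true class).
    return float(-numpy.mean(numpy.sum(y_true * numpy.log(y_prob), axis=1)))


scores = numpy.array([[2.0, 1.0, 0.1]])  # made-up raw outputs for one sample
y_true = numpy.array([[1.0, 0.0, 0.0]])  # one-hot label: class 0

probs = softmax(scores)
loss = categorical_crossentropy(y_true, probs)
print(probs, probs.sum(), loss)
```

The loss shrinks toward 0 as the probability assigned to the true class approaches 1.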

### Baseline model

In [5]:
def baseline_model():
    model = Sequential()
    model.add(Dense(8, input_dim=4, activation='relu'))
    model.add(Dense(3, activation='softmax'))
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model

With the baseline model function written, we can now create our ```KerasClassifier``` to use in ```scikit-learn```!

The arguments given to ```KerasClassifier``` are passed on to the ```fit()``` function that is used internally to train the NN. Let's pass in the following arguments when training the model:
* **number of epochs** set to 200
* **batch size** set to 5
* **verbose** set to 0 (*to turn off progress output during training*)
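
As a back-of-the-envelope check on what these settings mean, the arithmetic below counts weight updates. It assumes the 150-sample Iris dataset and the 10-fold cross-validation used later in this tutorial, so each model trains on 135 samples.

```python
import math

n_samples = 150  # size of the Iris dataset
n_folds = 10     # assumed: 10-fold cross-validation
epochs = 200
batch_size = 5

train_size = n_samples - n_samples // n_folds           # samples left for training in each fold
updates_per_epoch = math.ceil(train_size / batch_size)  # one weight update per batch
updates_per_fold = updates_per_epoch * epochs
print(train_size, updates_per_epoch, updates_per_fold)  # 135 27 5400
```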

### Create the ```KerasClassifier``` model

In [6]:
estimator = KerasClassifier(build_fn=baseline_model, epochs=200, batch_size=5, verbose=0)

# Evaluate the model with k-fold cross validation

First, let's define the model evaluation procedure. Let's set the number of folds to 10 (a good default number) and shuffle the data before partitioning it.

In [7]:
kfold = KFold(n_splits=10, shuffle=True, random_state=seed)
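
Conceptually, shuffled k-fold splitting looks something like the NumPy sketch below. It illustrates the idea only; it is not ```scikit-learn```'s implementation, and the indices it produces will differ from those of ```KFold```.

```python
import numpy

rng = numpy.random.RandomState(7)
indices = rng.permutation(150)          # shuffle the 150 sample indices
folds = numpy.array_split(indices, 10)  # 10 disjoint folds of 15 indices each

split_sizes = []
for k, test_idx in enumerate(folds):
    # Train on every fold except the one currently held out for testing.
    train_idx = numpy.concatenate([fold for j, fold in enumerate(folds) if j != k])
    split_sizes.append((len(train_idx), len(test_idx)))

print(split_sizes[0])  # (135, 15)
```

Every sample lands in the test set exactly once, so each of the 10 models is scored on data it never saw during training.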

Now let's evaluate our model (```estimator```) on our dataset (```X``` and ```dummy_y```) using a 10-fold **cross-validation procedure**.

Model evaluation takes about ten seconds. It returns an array of 10 scores (here, the metric is *accuracy*), one for each model trained and tested on a different split of the dataset.

In [8]:
results = cross_val_score(estimator, X, dummy_y, cv=kfold)
print('Baseline: %.2f%% (%.2f%%)' % (results.mean()*100, results.std()*100))

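
The print statement above reports the mean score across folds, followed by the standard deviation in parentheses. The sketch below shows only how that summary line is formatted. The scores in it are invented placeholders chosen to make the arithmetic easy to check; they are not results from this model.

```python
import numpy

# Invented placeholder scores for illustration only; NOT results from this model.
example_scores = numpy.array([0.5, 1.0])

summary = 'Baseline: %.2f%% (%.2f%%)' % (example_scores.mean() * 100, example_scores.std() * 100)
print(summary)  # Baseline: 75.00% (25.00%)
```

Note that NumPy's `std()` computes the population standard deviation by default (`ddof=0`).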

*Source*: https://machinelearningmastery.com/multi-class-classification-tutorial-keras-deep-learning-library/