## The Iris Dataset
In this tutorial we will create a neural network to classify 3 different types of Iris (Setosa, Versicolor and Virginica) based on their sepal length, sepal width, petal length and petal width.

![Irises](http://dataaspirant.com/wp-content/uploads/2017/01/irises.png)

This is a multi-class classification problem. It is similar to the Pima Indians binary classification tutorial, but with three classes to predict instead of two.

### Import dependencies
Start by importing the dependencies we will need for the project.

In [None]:
from keras.models import Sequential
from keras.layers import Dense
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from keras.utils import np_utils
from uoa_mlaas import use_cpu
use_cpu()

### Set seed
Set a seed value so that when we repeatedly run our code we will get the same result. Using the same seed is important when you want to compare algorithms.

In [None]:
seed = 7
np.random.seed(seed)
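
As a small illustration of what seeding does, the sketch below reseeds NumPy's generator and draws the same numbers twice. It is separate from the tutorial's pipeline, and the names `first_draw` and `second_draw` are made up for this example.

```python
import numpy as np

seed = 7

# Seed NumPy's global generator, draw three numbers, then reseed and repeat.
np.random.seed(seed)
first_draw = np.random.rand(3)

np.random.seed(seed)
second_draw = np.random.rand(3)

# The two draws match because the generator restarted from the same seed.
print(np.array_equal(first_draw, second_draw))
```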

### Import data
The Iris dataset contains four features measured on 150 different Iris flowers, plus the class of each flower. The columns in the dataset are described below.

* Sepal length (cm)
* Sepal width (cm)
* Petal length (cm)
* Petal width (cm)
* Class: Iris setosa, Iris versicolor or Iris virginica

Sepals are the parts of a flower that protect and support the petals. The petals surround the reproductive parts of the flower.

![Iris labeled](http://terpconnect.umd.edu/~petersd/666/html/iris_with_labels.jpg)

A sample of rows from the dataset is shown below (not in order).

|Sepal Length|Sepal Width|Petal Length|Petal Width|Class|
|---|---|---|---|-----------|
|5.1|3.5|1.4|0.2|Iris-setosa|
|4.9|3.0|1.4|0.2|Iris-setosa|
|7.0|3.2|4.7|1.4|Iris-versicolor|
|6.4|3.2|4.5|1.5|Iris-versicolor|
|6.3|3.3|6.0|2.5|Iris-virginica|
|5.8|2.7|5.1|1.9|Iris-virginica|

To load this data into memory, use the `np.loadtxt` function. The data type (`dtype`) is set to `str` because our input data is a mix of numbers and strings. This will be dealt with when we split the data.

In [None]:
data = np.loadtxt('data/iris.csv', delimiter=",", dtype=str)
print(data)
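
If you do not have `data/iris.csv` to hand, the sketch below shows the same call on three rows taken from the table above. It assumes the file has no header row and uses an in-memory `io.StringIO` buffer in place of a file; `csv_text` and `sample` are names invented for this example.

```python
import io
import numpy as np

# Three rows in the same comma-separated layout as the table above.
csv_text = (
    "5.1,3.5,1.4,0.2,Iris-setosa\n"
    "7.0,3.2,4.7,1.4,Iris-versicolor\n"
    "6.3,3.3,6.0,2.5,Iris-virginica\n"
)

# np.loadtxt accepts a file-like object as well as a file path.
sample = np.loadtxt(io.StringIO(csv_text), delimiter=",", dtype=str)

print(sample.shape)  # (rows, columns)
print(sample[0])     # every cell, including the numbers, is a string
```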

Separate the data into input (X) and output (y) variables.

Note that we convert the input data into floats.

In [None]:
X = data[:, 0:4].astype(float)
y = data[:, 4]
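
The reason for slicing before converting can be seen in a small sketch: the four measurement columns convert to floats, but the class column does not. The single `row` below is illustrative and stands in for the loaded data.

```python
import numpy as np

# One row stored as strings, the way np.loadtxt(dtype=str) returns it.
row = np.array([["5.1", "3.5", "1.4", "0.2", "Iris-setosa"]])

# The numeric columns convert cleanly.
features = row[:, 0:4].astype(float)

# The label column is not a number, so converting the whole array fails.
try:
    row.astype(float)
    whole_array_converted = True
except ValueError:
    whole_array_converted = False

print(features, whole_array_converted)
```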

If you look carefully at the target values, you will notice that they are strings, i.e. 'Iris-setosa', 'Iris-versicolor' and 'Iris-virginica'.

Keras needs numbers or matrices to work with, so we will need to reformat the target values.

The problem with converting the class values to numbers (e.g. 'Iris-setosa' becomes 0, 'Iris-versicolor' becomes 1, etc.) is that it implies that the target values are ordinal. That is, 'Iris-setosa' is somehow less than 'Iris-versicolor'.

A better way to represent classes in a multi-class classification problem is to 'one hot encode' the target values. An example is shown below. A matrix of zeros is generated. Each row corresponds to a sample and each column corresponds to a particular class. A 1 is placed into the column to indicate the class that the sample belongs to.

|Iris-setosa|Iris-versicolor|Iris-virginica|
|---|---|---|
|1|0|0|
|0|1|0|
|0|0|1|
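
The encoding in the table can be reproduced with plain NumPy, which may help to see what is going on. This is only a sketch of the idea, not how scikit-learn or Keras implement it; the `labels` array is made up, and the column order assumes the class names are sorted alphabetically.

```python
import numpy as np

labels = np.array(["Iris-setosa", "Iris-versicolor", "Iris-virginica", "Iris-setosa"])

# Map each class name to an integer (np.unique returns the classes sorted).
classes, label_ids = np.unique(labels, return_inverse=True)

# For each integer, pick the matching row of an identity matrix.
one_hot = np.eye(len(classes))[label_ids]

print(classes)
print(one_hot)
```

Each row of `one_hot` contains a single 1, in the column of the class that sample belongs to.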

One hot encoding is a two-step process. First, encode the target values (y) into an array of numbers using the `LabelEncoder` from scikit-learn. Then one hot encode the numbers with the `np_utils.to_categorical` function.

In [None]:
y_encoded = LabelEncoder().fit_transform(y) # Convert the classes into numbers
y_one_hot_encoded = np_utils.to_categorical(y_encoded) # One hot encode the numbers

As in the previous tutorial, use the `train_test_split` function from scikit-learn to split the input and target data into training and test datasets.

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y_one_hot_encoded, test_size=0.33, random_state=seed)
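
To get a feel for what a shuffled split does and how many samples end up in each set, here is a minimal NumPy sketch. The helper `shuffled_split` is hypothetical and is not scikit-learn's implementation, so the exact rows it selects will differ from `train_test_split`; it assumes the test share is rounded up to a whole sample.

```python
import numpy as np

def shuffled_split(n_samples, test_size, seed):
    """Return shuffled (train_idx, test_idx) index arrays (hypothetical helper)."""
    rng = np.random.RandomState(seed)
    indices = rng.permutation(n_samples)
    n_test = int(np.ceil(test_size * n_samples))  # round the test share up
    return indices[n_test:], indices[:n_test]

train_idx, test_idx = shuffled_split(n_samples=150, test_size=0.33, seed=7)
print(len(train_idx), len(test_idx))
```

Under these assumptions, the 150 flowers are divided into 100 training samples and 50 test samples, with no sample appearing in both.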

### Create the model
The code snippet below creates a very basic neural network model, with three layers: an input layer, a hidden layer and an output layer.

The first `Dense` call defines the fully connected hidden layer, which has four neurons. Its `input_dim=4` argument tells Keras to expect four input values, one for each of the four features.

The last layer has 3 neurons, one for each class.

In [None]:
model = Sequential()
model.add(Dense(4, input_dim=4, activation='relu', kernel_initializer='normal'))
model.add(Dense(3, activation='softmax', kernel_initializer='normal')) # softmax gives one probability per class, summing to 1
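
A quick way to check your understanding of the architecture is to count its trainable parameters by hand. The sketch below assumes each `Dense` layer has one weight per input-to-neuron connection plus one bias per neuron (the Keras default); `dense_params` is a helper invented for this example.

```python
def dense_params(n_inputs, n_units):
    """Weights plus one bias per unit for a fully connected layer."""
    return n_inputs * n_units + n_units

hidden = dense_params(n_inputs=4, n_units=4)  # 4 features -> 4 hidden neurons
output = dense_params(n_inputs=4, n_units=3)  # 4 hidden neurons -> 3 classes
total = hidden + output

print(hidden, output, total)
```

If the assumptions hold, the total should agree with the parameter count reported by `model.summary()`.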

### Compile the model
We then compile the model. The loss function is set to `categorical_crossentropy` (different from the loss function used in the binary classification tutorial) because we are performing multi-class classification.

In [None]:
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
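
To see what this loss measures, the sketch below computes the textbook categorical cross-entropy with NumPy for one hot targets. It is an illustration of the formula rather than Keras's own code, and the probability values are made up.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean cross-entropy between one hot targets and predicted probabilities."""
    y_pred = np.clip(y_pred, eps, 1.0)  # avoid taking log(0)
    return float(np.mean(-np.sum(y_true * np.log(y_pred), axis=1)))

y_true = np.array([[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
confident = np.array([[0.9, 0.05, 0.05], [0.1, 0.1, 0.8]])
unsure = np.array([[0.4, 0.3, 0.3], [0.3, 0.3, 0.4]])

loss_confident = categorical_crossentropy(y_true, confident)
loss_unsure = categorical_crossentropy(y_true, unsure)
print(loss_confident, loss_unsure)
```

Both sets of predictions pick the correct class, but the loss is lower when more probability is placed on it.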

### Fit the model
Now that we have compiled the model, we can train it with the data we prepared earlier. We are using more epochs but a smaller batch size than in the previous tutorial.

In [None]:
model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=200, batch_size=5)
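
The effect of these two settings can be worked out with some simple arithmetic. The sketch below assumes the weights are updated once per batch and that the split above leaves 100 training samples (the test share of 150 rounded up).

```python
import math

n_samples = 150   # flowers in the dataset
test_size = 0.33
n_train = n_samples - math.ceil(test_size * n_samples)  # assumes the test share rounds up

epochs = 200
batch_size = 5

updates_per_epoch = math.ceil(n_train / batch_size)
total_updates = epochs * updates_per_epoch

print(n_train, updates_per_epoch, total_updates)
```

A smaller batch size means more weight updates per epoch, which is one reason training takes longer per epoch than it would with larger batches.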

### Evaluate the model
Now that we have trained our model, we can evaluate the performance on the test data.

In [None]:
scores = model.evaluate(X_test, y_test)
print("\n\n{0}: {1:.2f}%".format(model.metrics_names[1], scores[1]*100))
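
The accuracy figure can be understood by comparing the most probable predicted class with the true class for each sample. The sketch below uses made-up probabilities in place of real `model.predict(X_test)` output, and assumes the columns follow the alphabetical class order.

```python
import numpy as np

class_names = np.array(["Iris-setosa", "Iris-versicolor", "Iris-virginica"])

# Made-up probabilities standing in for the model's predictions.
predicted = np.array([
    [0.8, 0.1, 0.1],
    [0.2, 0.7, 0.1],
    [0.1, 0.5, 0.4],
    [0.1, 0.2, 0.7],
])
# One hot targets for the same four samples.
actual = np.array([
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
    [0, 0, 1],
])

# argmax converts each row back to a class index.
predicted_ids = predicted.argmax(axis=1)
actual_ids = actual.argmax(axis=1)
accuracy = float(np.mean(predicted_ids == actual_ids))

print(class_names[predicted_ids])
print("accuracy: {0:.2f}%".format(accuracy * 100))
```

Here three of the four samples are classified correctly; the third is a virginica predicted as versicolor.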