<a href="https://colab.research.google.com/github/bpesquet/machine-learning-katas/blob/master/classic-datasets/Iris.ipynb"><img align="left" src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open in Colab" title="Open in Google Colaboratory"></a>


# Kata: Iris Dataset

| Learning type | Activity type | Objective |
| - | - | - |
| Supervised | Multiclass classification | Identify a flower's class |

## Instructions

This is a self-correcting exercise generated by [nbgrader](https://github.com/jupyter/nbgrader). 

Complete the cells beginning with `# YOUR CODE HERE` and run the subsequent cells to check your code.

## About the dataset

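[Iris](https://archive.ics.uci.edu/ml/datasets/iris) is a well-known multiclass classification dataset. It contains 150 examples split evenly across 3 classes of iris flowers (setosa, versicolor and virginica), 50 examples each. Every flower is described by 4 numeric features: sepal length, sepal width, petal length and petal width, all in centimetres.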

![Iris versicolor](images/Iris-versicolor-21_1.jpg)
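
To make the description above concrete, here is a minimal sketch that inspects the copy of the dataset bundled with scikit-learn. The variable names `iris` and `class_counts` are illustrative choices and are not used elsewhere in this notebook.

```python
import numpy as np
from sklearn.datasets import load_iris

# Load the copy of Iris bundled with scikit-learn
iris = load_iris()

# 150 samples, each described by 4 measurements in centimetres
print(iris.data.shape)
print(iris.feature_names)

# Integer targets 0, 1 and 2 index into target_names
class_counts = np.bincount(iris.target)
for name, count in zip(iris.target_names, class_counts):
    print(f"{name}: {count} samples")
```

Because the classes are perfectly balanced, accuracy is a reasonable metric for this exercise.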

## Package setup

In [None]:
# Import needed packages
# You may add or remove packages should you need them
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
from sklearn.datasets import load_iris
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import Adam
from keras.utils import to_categorical

# Display plots inline and change plot resolution to retina
%matplotlib inline
%config InlineBackend.figure_format = 'retina'
# Set Seaborn aesthetic parameters to defaults
sns.set()

## Utility functions

In [None]:
def plot_loss_acc(history):
    """Plot training and (optionally) validation loss and accuracy"""

    loss = history.history['loss']
    epochs = range(1, len(loss) + 1)

    plt.figure(figsize=(10, 10))

    plt.subplot(2, 1, 1)
    plt.plot(epochs, loss, '.--', label='Training loss')
    final_loss = loss[-1]
    title = 'Training loss: {:.4f}'.format(final_loss)
    plt.ylabel('Loss')
    if 'val_loss' in history.history:
        val_loss = history.history['val_loss']
        plt.plot(epochs, val_loss, 'o-', label='Validation loss')
        final_val_loss = val_loss[-1]
        title += ', Validation loss: {:.4f}'.format(final_val_loss)
    plt.title(title)
    plt.legend()

    # Older Keras releases record accuracy under 'acc', newer ones under 'accuracy'
    acc_key = 'acc' if 'acc' in history.history else 'accuracy'
    acc = history.history[acc_key]

    plt.subplot(2, 1, 2)
    plt.plot(epochs, acc, '.--', label='Training acc')
    final_acc = acc[-1]
    title = 'Training accuracy: {:.2f}%'.format(final_acc * 100)
    plt.xlabel('Epochs')
    plt.ylabel('Accuracy')
    if 'val_' + acc_key in history.history:
        val_acc = history.history['val_' + acc_key]
        plt.plot(epochs, val_acc, 'o-', label='Validation acc')
        final_val_acc = val_acc[-1]
        title += ', Validation accuracy: {:.2f}%'.format(final_val_acc * 100)
    plt.title(title)
    plt.legend()

## Step 1: Loading the data

In [None]:
# Load the Iris dataset included with scikit-learn
dataset = load_iris()

# Put data in a pandas DataFrame
df_iris = pd.DataFrame(dataset.data, columns=dataset.feature_names)
# Add target and class to DataFrame
df_iris['target'] = dataset.target
df_iris['class'] = dataset.target_names[dataset.target]
# Show 10 random samples
df_iris.sample(n=10)
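
The `class` column above is built with NumPy integer-array indexing: indexing `dataset.target_names` with the array of integer targets returns one class name per sample. The sketch below shows the same mechanism on hypothetical toy values (`names` and `targets` stand in for `dataset.target_names` and `dataset.target`).

```python
import numpy as np

# Hypothetical toy values standing in for dataset.target_names and dataset.target
names = np.array(["setosa", "versicolor", "virginica"])
targets = np.array([0, 2, 1, 0])

# Indexing an array with an integer array picks one element per index
labels = names[targets]
print(labels)
```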

### Question

Store training input data in a variable named `x_train` and training targets in a variable named `y_train`.

In [None]:
# YOUR CODE HERE

In [None]:
print(f'x_train: {x_train.shape}. y_train: {y_train.shape}')
print(f'Labels: {y_train}')
assert x_train.shape == (150,4)
assert y_train.shape == (150,)

## Step 2: Preparing the data

### Question

One-hot encode the targets stored in `y_train`, turning the (150,) vector of integer labels into a (150,3) matrix with a single 1 per row.
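
As a reminder of what one-hot encoding does, here is a minimal NumPy sketch on a hypothetical toy label vector (`labels`, `num_classes` and `one_hot` are illustrative names). It is one possible approach among several; the `to_categorical` function imported at the top of this notebook performs the same kind of encoding.

```python
import numpy as np

# Hypothetical toy labels: four samples belonging to three classes
labels = np.array([0, 2, 1, 0])
num_classes = 3

# Row i of the identity matrix is the one-hot vector for class i,
# so indexing it with the labels yields one encoded row per sample
one_hot = np.eye(num_classes, dtype=int)[labels]
print(one_hot)
```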

In [None]:
# YOUR CODE HERE

In [None]:
# Show a sample of encoded targets
df_iris_labels = pd.DataFrame(y_train)
df_iris_labels.sample(n=10)

In [None]:
print(f'y_train: {y_train.shape}')
assert y_train.shape == (150,3)
assert np.array_equal([1,0,0], y_train[0])
assert np.array_equal([0,1,0], y_train[50])
assert np.array_equal([0,0,1], y_train[100])

## Step 3: Training a model

### Question

Train a model on the data to obtain a training accuracy > 93%. Store the training history in a variable named `history`.

Tip: for best results, use the Adam optimizer with a learning rate of 0.1.

```python
model.compile(Adam(learning_rate=0.1), loss='categorical_crossentropy', metrics=['accuracy'])
```
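
To clarify what the loss and metric named in the tip measure, the sketch below recomputes both with NumPy on hypothetical predictions. It is a simplified illustration and not the Keras implementation; the function names and the `eps` clipping value are assumptions made here to avoid taking the logarithm of zero.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean cross-entropy between one-hot targets and predicted probabilities."""
    # Clip probabilities so that log() never receives 0 (eps is an assumed value)
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    return float(np.mean(-np.sum(y_true * np.log(y_pred), axis=1)))

def accuracy(y_true, y_pred):
    """Fraction of samples whose most probable class matches the target class."""
    return float(np.mean(np.argmax(y_true, axis=1) == np.argmax(y_pred, axis=1)))

# Hypothetical one-hot targets and softmax outputs for three samples
y_true = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]])
y_pred = np.array([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.3, 0.4, 0.3]])

loss_value = categorical_crossentropy(y_true, y_pred)
acc_value = accuracy(y_true, y_pred)
print(f"loss: {loss_value:.4f}, accuracy: {acc_value:.2%}")
```

The third sample is misclassified, so it contributes the largest term to the loss and lowers the accuracy to two correct predictions out of three.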

In [None]:
# YOUR CODE HERE

In [None]:
# Plot training history
plot_loss_acc(history)

In [None]:
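# Retrieve final accuracy (the key is 'acc' or 'accuracy' depending on the Keras version)
acc_key = 'acc' if 'acc' in history.history else 'accuracy'
final_acc = history.history[acc_key][-1]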
# Assert final accuracy
assert final_acc > 0.93