# Basic Classification: Classify Iris Flowers

This is a commonly used example for teaching how to build classification ML models.

The dataset is simple. There are four training features, which are the lengths and widths of each Iris flower's sepals and petals. The label is the species of the flower.

For those who don't remember their middle school botany class: the petals are the colorful parts of the flower, and the sepals are the green, leaf-like parts below the petals.

## Install and import the required packages

```python
# Use seaborn for pairplot.
!pip install -q seaborn
```

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
import tensorflow as tf

from tensorflow import keras
from tensorflow.keras import layers

# Make NumPy printouts easier to read.
np.set_printoptions(precision=3, suppress=True)

print(tf.__version__)
```

## Get the data
First, download and load the dataset. There are two CSV files: one for training and one for testing.

```python
CSV_COLUMN_NAMES = ['SepalLength', 'SepalWidth', 'PetalLength', 'PetalWidth', 'Species']
SPECIES = ['Setosa', 'Versicolor', 'Virginica']

train_path = tf.keras.utils.get_file(
    "iris_training.csv", "https://storage.googleapis.com/download.tensorflow.org/data/iris_training.csv")
test_path = tf.keras.utils.get_file(
    "iris_test.csv", "https://storage.googleapis.com/download.tensorflow.org/data/iris_test.csv")

train_dataset = pd.read_csv(train_path, names=CSV_COLUMN_NAMES, header=0)
test_dataset = pd.read_csv(test_path, names=CSV_COLUMN_NAMES, header=0)
```
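In pandas, passing both `names=` and `header=0` tells `read_csv()` to consume the file's first line as a header and replace it with the supplied names; with `names=` alone, pandas treats the first line as data. The sketch below shows the difference on a small in-memory CSV. The CSV text is made up for illustration and is only assumed to mimic the downloaded files, whose first line is a summary row rather than column names.

```python
import io

import pandas as pd

# Illustrative CSV text (made-up rows): the first line is a summary row,
# not the column names we want to use.
csv_text = """120,4,setosa,versicolor,virginica
6.4,2.8,5.6,2.2,2
5.0,2.3,3.3,1.0,1
4.9,2.5,4.5,1.7,2
"""

names = ['SepalLength', 'SepalWidth', 'PetalLength', 'PetalWidth', 'Species']

# header=0 consumes the first line as the header; names= then replaces it.
replaced = pd.read_csv(io.StringIO(csv_text), names=names, header=0)

# With names= but no header=, pandas keeps the first line as a data row.
kept = pd.read_csv(io.StringIO(csv_text), names=names)

print(len(replaced), len(kept))  # 3 4
```

Without `header=0`, the summary row would leak into the data as an extra example with text in numeric columns.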

Display the first few records of the training data, and then of the test data.

```python
train_dataset[:5]
```

```python
test_dataset[:5]
```

Output the columns. Note that the Species column is the label, and the remaining columns are the features.

```python
train_dataset.columns
```
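The Species label is stored as an integer class code (0, 1, or 2), and the `SPECIES` list gives the matching names, assuming code `i` corresponds to `SPECIES[i]`. A minimal sketch of turning codes into readable names with pandas `Series.map()`, using a few made-up label values rather than the real file:

```python
import pandas as pd

SPECIES = ['Setosa', 'Versicolor', 'Virginica']

# A few integer labels as they might appear in the Species column
# (illustrative values, not read from the real file).
labels = pd.Series([2, 1, 0, 0, 2], name='Species')

# Map each integer code to its name: code i -> SPECIES[i].
names = labels.map(dict(enumerate(SPECIES)))
print(names.tolist())
# ['Virginica', 'Versicolor', 'Setosa', 'Setosa', 'Virginica']

# Count how many examples belong to each class.
print(names.value_counts())
```

On the real data, `train_dataset['Species'].map(dict(enumerate(SPECIES))).value_counts()` would show how balanced the three classes are.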

## Inspect the data

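Use the seaborn pairplot() function to review the joint distribution of each pair of columns in the training set.

Because Species is stored as a number, it appears in the grid as well. Its row shows how each feature's values shift from one species to the next, which suggests that the features can be used to predict the species.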

```python
sns.pairplot(train_dataset[CSV_COLUMN_NAMES], diag_kind='kde')
```

The pandas DataFrame describe() method is useful for checking the overall statistics of the data.

```python
train_dataset.describe().transpose()
```
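`describe()` summarizes each column over all species at once. Grouping by the label first gives per-class statistics, which makes differences between species easier to see in numbers. The sketch below uses a tiny DataFrame with the same column names and made-up values; on the real data the equivalent call would be `train_dataset.groupby('Species').mean()`.

```python
import pandas as pd

# Toy rows with the same columns as train_dataset (made-up values).
df = pd.DataFrame({
    'SepalLength': [5.1, 4.9, 6.4, 6.9, 6.3, 5.8],
    'SepalWidth':  [3.5, 3.0, 3.2, 3.1, 3.3, 2.7],
    'PetalLength': [1.4, 1.4, 4.5, 4.9, 6.0, 5.1],
    'PetalWidth':  [0.2, 0.2, 1.5, 1.5, 2.5, 1.9],
    'Species':     [0, 0, 1, 1, 2, 2],
})

# Mean of each feature within each class. Large gaps between the rows
# point to features that help separate the species.
per_class = df.groupby('Species').mean()
print(per_class)
```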

## Split features from labels

Separate the target value—the "label"—from the features. This label is the value that you will train the model to predict.

```python
train_features = train_dataset.copy()
test_features = test_dataset.copy()

train_labels = train_features.pop('Species')
test_labels = test_features.pop('Species')
```
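`DataFrame.pop()` removes a column in place and returns it as a Series, which is why the code above works on a `copy()`: the original DataFrame keeps its label column. A small sketch with a made-up two-row DataFrame:

```python
import pandas as pd

df = pd.DataFrame({'PetalLength': [1.4, 4.5], 'Species': [0, 1]})

features = df.copy()               # copy so df keeps its Species column
labels = features.pop('Species')   # removes the column from features and returns it

print(list(features.columns))  # ['PetalLength']
print(labels.tolist())         # [0, 1]
print(list(df.columns))        # ['PetalLength', 'Species']
```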

```python
train_features[:5]
```

```python
train_labels[:5]
```

## Build and Train a Simple Classification Model

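Note that the first (hidden) layer uses a linear activation function. Because there is no nonlinearity between the two layers, together they compute a single linear transformation of the inputs, so this model is equivalent to multinomial logistic regression. The final softmax layer returns, for each example, the probability of each of the three classes.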

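```python
classification_model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),                      # four input features
    tf.keras.layers.Dense(8, activation='linear'),   # hidden layer, no nonlinearity
    tf.keras.layers.Dense(3, activation='softmax'),  # one probability per species
])
```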
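To make the two layers concrete, here is a NumPy sketch of the same forward pass with random, untrained weights of the same shapes (4 to 8 to 3); the weights and inputs are assumptions for illustration only. Keras `Dense` computes `x @ kernel + bias`, so with a linear hidden layer the composition collapses to a single affine map with `W = W1 @ W2` and `b = b1 @ W2 + b2`, followed by softmax.

```python
import numpy as np

rng = np.random.default_rng(0)

# Random weights with the same shapes as the Keras model (untrained, illustrative).
W1, b1 = rng.normal(size=(4, 8)), rng.normal(size=8)
W2, b2 = rng.normal(size=(8, 3)), rng.normal(size=3)

def softmax(z):
    # Subtracting the row max improves numerical stability without changing the result.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

x = rng.normal(size=(5, 4))  # a batch of 5 examples with 4 features each

# Two-layer forward pass: linear hidden layer, then softmax output.
hidden = x @ W1 + b1
probs = softmax(hidden @ W2 + b2)

# The same mapping written as one linear layer.
probs_single = softmax(x @ (W1 @ W2) + (b1 @ W2 + b2))

print(probs.sum(axis=1))                 # each row sums to 1
print(np.allclose(probs, probs_single))  # True
```

Adding a nonlinear activation such as `'relu'` to the hidden layer would break this equivalence and let the model learn nonlinear decision boundaries.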

The model needs to be compiled prior to training. 

```python
classification_model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(),
              metrics=['accuracy'])

classification_model.summary()
```
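"Sparse" categorical cross-entropy means the labels are integer class ids (0, 1, 2) rather than one-hot vectors. For each example, the loss is the negative log of the probability the model assigns to the true class, averaged over the batch. A minimal NumPy sketch, assuming softmax outputs as input; the clipping is similar in spirit to the small epsilon Keras uses to avoid `log(0)`, not a copy of its implementation.

```python
import numpy as np

def sparse_categorical_crossentropy(labels, probs, eps=1e-7):
    # labels: integer class ids, shape (n,); probs: softmax outputs, shape (n, k).
    probs = np.clip(probs, eps, 1.0 - eps)  # avoid log(0)
    # Probability assigned to the true class of each example.
    true_class_probs = probs[np.arange(len(labels)), labels]
    # Mean negative log-likelihood over the batch.
    return -np.mean(np.log(true_class_probs))

probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1]])
labels = np.array([0, 1])
print(sparse_categorical_crossentropy(labels, probs))  # about 0.290
```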

Now train the model using the fit() method.

Note that fit() returns a History object, which is stored in the history variable. After training, you can use it to review how the loss and accuracy changed from epoch to epoch.

```python
%%time
history = classification_model.fit(train_features, 
                                   train_labels, 
                                   epochs=100)
```


## Evaluating and Testing the Model

Training is complete. Now view the loss and accuracy using the history variable. 

```python
hist = pd.DataFrame(history.history)
hist['epoch'] = history.epoch
# A notebook cell only shows its last expression, so display() the head explicitly.
display(hist.head())
hist.tail()
```

```python
def plot_loss(history):
    # Plot the training loss and accuracy recorded for each epoch.
    plt.plot(history.history['loss'], label='loss')
    plt.plot(history.history['accuracy'], label='accuracy')
    plt.ylim([0, 2])
    plt.xlabel('Epoch')
    plt.ylabel('Value')  # loss and accuracy share this axis
    plt.legend()
    plt.grid(True)
```
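Loss and accuracy are on different scales (accuracy is a fraction between 0 and 1), so drawing them on separate axes can be easier to read. The sketch below is one possible alternative; `plot_history` is a hypothetical helper, and it assumes its argument has the same shape as `History.history`, a dict mapping metric names to per-epoch lists.

```python
import matplotlib.pyplot as plt

def plot_history(history_dict):
    # history_dict: {'loss': [...], 'accuracy': [...]}, one value per epoch.
    fig, (ax_loss, ax_acc) = plt.subplots(1, 2, figsize=(10, 4))

    ax_loss.plot(history_dict['loss'])
    ax_loss.set_xlabel('Epoch')
    ax_loss.set_ylabel('Loss')
    ax_loss.grid(True)

    ax_acc.plot(history_dict['accuracy'])
    ax_acc.set_xlabel('Epoch')
    ax_acc.set_ylabel('Accuracy')
    ax_acc.set_ylim([0, 1])  # accuracy is a fraction
    ax_acc.grid(True)

    fig.tight_layout()
    return fig

# Usage with made-up values; in the notebook, pass history.history instead.
fig = plot_history({'loss': [1.2, 0.8, 0.5], 'accuracy': [0.4, 0.7, 0.9]})
```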

```python
plot_loss(history)
```

Use the evaluate() method, passing in the test data, to see how well the model predicts the species of examples it did not see during training.

```python
test_loss, test_acc = classification_model.evaluate(test_features, test_labels, verbose=2)

print('\nTest accuracy:', test_acc)
print('\nTest loss:', test_loss)
```
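With integer labels and a softmax output, the accuracy metric counts an example as correct when the class with the highest predicted probability matches the label. A NumPy sketch of that calculation on made-up probabilities and labels:

```python
import numpy as np

# Softmax outputs for four examples (made-up values) and their true labels.
probs = np.array([[0.8, 0.1, 0.1],
                  [0.2, 0.5, 0.3],
                  [0.1, 0.3, 0.6],
                  [0.2, 0.7, 0.1]])
labels = np.array([0, 1, 1, 1])

predicted = probs.argmax(axis=1)         # highest-probability class per row
accuracy = np.mean(predicted == labels)  # fraction of correct predictions
print(predicted, accuracy)               # [0 1 2 1] 0.75
```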

## Using the Model for Inference

Use the predict() method to get predictions from the model. Recall that the last layer is a softmax layer, so each prediction is a row of three probabilities, one for each species (0, 1, or 2), and each row sums to 1.

```python
predictions = classification_model.predict(test_features)
```

```python
predictions[:5]
```

```
array([[0.147, 0.37 , 0.483],
       [0.05 , 0.354, 0.597],
       [0.655, 0.192, 0.153],
       [0.161, 0.363, 0.476],
       [0.141, 0.38 , 0.48 ]], dtype=float32)
```

```python
test_labels[:5]
```
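To compare predictions with the labels, each probability row can be reduced to a single class with `argmax()` and then to a species name through `SPECIES`, assuming class `i` corresponds to `SPECIES[i]`. The sketch below reuses the first five prediction rows printed earlier in this notebook:

```python
import numpy as np

SPECIES = ['Setosa', 'Versicolor', 'Virginica']

# The first five rows of predictions, as printed earlier.
predictions = np.array([[0.147, 0.37 , 0.483],
                        [0.05 , 0.354, 0.597],
                        [0.655, 0.192, 0.153],
                        [0.161, 0.363, 0.476],
                        [0.141, 0.38 , 0.48 ]])

predicted_ids = predictions.argmax(axis=1)            # most likely class per example
predicted_names = [SPECIES[i] for i in predicted_ids]
confidence = predictions.max(axis=1)                  # probability of that class

for name, p in zip(predicted_names, confidence):
    print(f'{name}: {p:.1%}')
```

In the notebook, `predicted_ids` can be compared element by element with `test_labels[:5]`.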