<br>

---

# 2 - Classification

---
<br>

In this example, we will build a flower classifier of 3 classes:
* Setosa
* Versicolor
* Virginica

Based on 4 numerical variables:
* Sepal length
* Sepal width
* Petal length
* Petal width

<br>

---

## 1.0 - Importing Modules

---
<br>

In [1]:
from __future__ import absolute_import, division, print_function, unicode_literals

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from IPython.display import clear_output
from six.moves import urllib

import tensorflow.compat.v2.feature_column as fc
import tensorflow as tf

<br>

---

## 2.0 - Dataset

---
<br>

The dataset is downloaded with `tf.keras.utils.get_file`, a helper in the `keras` submodule of `tensorflow` that fetches a file from a URL and caches it on the local machine.

In [2]:
# The CSV's own header row is not a clean list of column names, so we supply our own
COLUMN_NAMES = ['SepalLength', 'SepalWidth', 'PetalLength', 'PetalWidth', 'Species']
# The CSV encodes species as 0, 1 and 2. Use this list to decode them
SPECIES = ['Setosa', 'Versicolor', 'Virginica']

In [5]:
# Download the training and test files from the online source to the local machine
train_path = tf.keras.utils.get_file('iris_training.csv', 'https://storage.googleapis.com/download.tensorflow.org/data/iris_training.csv')
test_path = tf.keras.utils.get_file('iris_test.csv', 'https://storage.googleapis.com/download.tensorflow.org/data/iris_test.csv')

clear_output()

print("Iris training file location " + train_path)
print("Iris test file location " + test_path)

Iris training file location C:\Users\Acer\.keras\datasets\iris_training.csv
Iris test file location C:\Users\Acer\.keras\datasets\iris_test.csv


In [6]:
# Read each CSV file into a pandas DataFrame
train_data = pd.read_csv(train_path, names=COLUMN_NAMES, header=0)
test_data = pd.read_csv(test_path, names=COLUMN_NAMES, header=0)

In [7]:
# Separate the labels (Species) from the dataframes, leaving only the feature columns
train_y = train_data.pop('Species')
test_y = test_data.pop('Species')

In [8]:
# Let's see our data
print(train_data.shape)       # 120 rows, 4 columns
train_data.head()

(120, 4)


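|   | SepalLength | SepalWidth | PetalLength | PetalWidth |
|---|-------------|------------|-------------|------------|
| 0 | 6.4         | 2.8        | 5.6         | 2.2        |
| 1 | 5.0         | 2.3        | 3.3         | 1.0        |
| 2 | 4.9         | 2.5        | 4.5         | 1.7        |
| 3 | 4.9         | 3.1        | 1.5         | 0.1        |
| 4 | 5.7         | 3.8        | 1.7         | 0.3        |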


In [10]:
# Labels. Note that they are integers (0, 1, 2), which we will later map to meaningful
# class names such as Versicolor
train_y.head()

0    2
1    1
2    2
3    0
4    0
Name: Species, dtype: int64
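
The integer labels can be decoded with the `SPECIES` list defined earlier, since the list index matches the encoded value. A minimal sketch, with the list redefined so the snippet runs on its own and the five label values copied from the output above:

```python
# Same decoding list as defined earlier in the notebook
SPECIES = ['Setosa', 'Versicolor', 'Virginica']

# First five training labels, copied from the train_y.head() output
first_labels = [2, 1, 2, 0, 0]

# The encoded integer is simply the index into SPECIES
decoded = [SPECIES[label] for label in first_labels]
print(decoded)
```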

<br>

---

## 3.0 - Input Function

---
<br>

In [11]:
# Unlike the previous input function, where epochs and shuffling were specified separately, both are now
# controlled by a single boolean parameter, isTraining. If isTraining is True, the dataset is shuffled and repeated indefinitely
# Also, this input function returns the dataset directly instead of returning another function
def input_function( features, labels, isTraining=True, batch_size=256):
  # Convert input into datasets
  dataset = tf.data.Dataset.from_tensor_slices( (dict(features), labels) )

  # In training mode, shuffle the data and repeat it indefinitely
  if isTraining:
    dataset = dataset.shuffle(1000).repeat()

  return dataset.batch(batch_size)
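
To make the effect of `batch_size` concrete, here is a plain-Python stand-in for what `.batch()` does to one pass over a finite dataset. It is not `tf.data` itself: `batch_sizes` is a hypothetical helper written only for this illustration, and the row count of 120 is the size of the training set.

```python
def batch_sizes(n_rows, batch_size):
    """Return the size of each batch when n_rows items are split in order.

    Hypothetical helper that mimics batching one pass over a finite dataset
    without dropping the final, smaller batch.
    """
    sizes = []
    remaining = n_rows
    while remaining > 0:
        current = min(batch_size, remaining)
        sizes.append(current)
        remaining -= current
    return sizes

# One pass over 120 rows with the default batch_size of 256:
# everything fits in a single batch
print(batch_sizes(120, 256))

# A smaller batch size splits the same pass into several batches
print(batch_sizes(120, 50))
```

In training mode the dataset is repeated before it is batched, so batches are filled across epoch boundaries and each one holds the full 256 examples.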

<br>

---

## 4.0 - Feature Columns

---
<br>

All 4 features are numeric columns:
* Sepal Length
* Sepal Width
* Petal Length
* Petal Width

In [12]:
# We have just 4 features, all of which are length measurements, so they are numeric columns rather than categorical ones
feature_columns = []

for key in train_data.keys():
  print(key)
  feature_columns.append(tf.feature_column.numeric_column(key) )

print()
feature_columns

SepalLength
SepalWidth
PetalLength
PetalWidth



[NumericColumn(key='SepalLength', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None),
 NumericColumn(key='SepalWidth', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None),
 NumericColumn(key='PetalLength', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None),
 NumericColumn(key='PetalWidth', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None)]

<br>

---

## 5.0 - Building The Model

---
<br>

* There are various classification models; two well-known ones are:
  * `DNNClassifier` (deep neural network)
  * `LinearClassifier`, which works like linear regression except that it outputs a class


* In this case we choose `DNNClassifier` because the relationship between the features (sepal length, etc.) and the labels (Versicolor, Virginica, ...) might not be linear.

In [13]:
# Remember that tf.estimator provides many pre-made models

# DNNClassifier requires hidden_units, which defines the architecture of the DNN:
# one entry per hidden layer, giving the number of nodes in that layer.
# Remember that a neural network has layers:
#       Input layer - Hidden layers - Output layer
classifier = tf.estimator.DNNClassifier( 
    feature_columns=feature_columns,
    hidden_units=[30,10],
    n_classes=3 )         

clear_output()
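
As a rough check on what `hidden_units=[30,10]` implies, the sketch below counts the weights and biases of a fully connected network with 4 inputs, hidden layers of 30 and 10 nodes, and 3 outputs. It assumes plain dense layers with one bias per node, which is an assumption about the layer layout rather than something read from the trained model. `count_dense_parameters` is a hypothetical helper.

```python
def count_dense_parameters(layer_sizes):
    """Count weights and biases in a stack of fully connected layers.

    layer_sizes lists the node counts from input to output.
    Assumes every layer is dense with one bias per output node.
    """
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out   # weights between the two layers
        total += n_out          # one bias per node in the next layer
    return total

# 4 features -> 30 hidden -> 10 hidden -> 3 classes
print(count_dense_parameters([4, 30, 10, 3]))
```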

In [14]:
# Train the model. Remember that input_fn must be a function that takes no parameters,
# yet it still needs access to the training data and labels. Wrapping the call in a
# lambda achieves this
#
# "steps" is related to epochs, but it counts batches rather than passes over the data:
# training stops after exactly 5000 batches instead of after iterating the dataset N times.
classifier.train(
    input_fn = lambda: input_function(train_data, train_y),
    steps = 5000
)

Instructions for updating:
Use Variable.read_value. Variables in 2.X are initialized automatically both in eager and graph (inside tf.defun) contexts.
INFO:tensorflow:Calling model_fn.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 0...
INFO:tensorflow:Saving checkpoints for 0 into C:\Users\Acer\AppData\Local\Temp\tmpmdvo7o08\model.ckpt.
INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 0...
INFO:tensorflow:loss = 2.3024757, step = 0
INFO:tensorflow:global_step/sec: 559.49
INFO:tensorflow:loss = 1.564131, step = 100 (0.180 sec)
INFO:tensorflow:global_step/sec: 788.829
INFO:tensorflow:loss = 1.2945862, step = 200 (0.126 sec)
INFO:

<tensorflow_estimator.python.estimator.canned.dnn.DNNClassifierV2 at 0x29dfd5918b0>
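
Because the input function repeats the data indefinitely, `steps` is what ends training. Each step consumes one batch, so the number of examples seen and the equivalent number of epochs follow from simple arithmetic. A small sketch using the values from this notebook (120 training rows, `batch_size=256`, `steps=5000`):

```python
n_rows = 120        # training rows, from train_data.shape
batch_size = 256    # default in input_function
steps = 5000        # passed to classifier.train

# Each step processes one batch
examples_seen = steps * batch_size

# Equivalent number of full passes over the training set
equivalent_epochs = examples_seen / n_rows

print(examples_seen)
print(round(equivalent_epochs, 1))
```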

<br>

---

## 6.0 - Predicting User Inputs

---
<br>

Our model is done training. Let's obtain user input and predict the class of flower!

In [15]:
# Remember, even though the user input is a single example (unlike the test dataset), prediction still requires an input function of its own
# This one takes no labels, because the label is what we want to predict
def user_input_function(features, batch_size=256):
  return tf.data.Dataset.from_tensor_slices( dict(features) ).batch(batch_size)

features = COLUMN_NAMES[:4]
user_input = {}


print("Please enter the numeric value as prompted:")
for feature in features:
  user_input[feature] = [float( input(feature + ": ") )]       # Array of size 1. Only 1 item to predict

# Show the collected user input that will be passed to the model
print(user_input, end='\n\n')

# Remember: predict() returns a generator, so we wrap it in list() to collect the results
predicted_res = list( classifier.predict(input_fn=lambda: user_input_function(user_input) ) )


Please enter the numeric value as prompted:
SepalLength: 2.4
SepalWidth: 2.6
PetalLength: 6.5
PetalWidth: 6.3
{'SepalLength': [2.4], 'SepalWidth': [2.6], 'PetalLength': [6.5], 'PetalWidth': [6.3]}
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from C:\Users\Acer\AppData\Local\Temp\tmpmdvo7o08\model.ckpt-5000
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.


In [16]:
# The prediction is still an integer; map it to a meaningful class name like Versicolor
predicted_id = predicted_res[0]['class_ids'][0]
predicted_flower = SPECIES[predicted_id]
predicted_prob = predicted_res[0]['probabilities'][predicted_id]

print(f'Predicted: {predicted_flower} with certainty of {predicted_prob * 100}%')

Predicted: Virginica with certainty of 91.11234545707703%
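
Each prediction dictionary also carries the full `probabilities` vector, and `class_ids` is the index of its largest entry. The sketch below reproduces that relationship in plain Python with a softmax over made-up logits. The logit values are hypothetical and are not taken from the trained model.

```python
import math

SPECIES = ['Setosa', 'Versicolor', 'Virginica']

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]   # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for one flower (illustration only)
logits = [-1.0, 0.5, 3.0]
probabilities = softmax(logits)

# The predicted class is the index of the highest probability
predicted_id = max(range(len(probabilities)), key=lambda i: probabilities[i])
print(SPECIES[predicted_id], f'{probabilities[predicted_id] * 100:.1f}%')
```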
