
In [0]:
%tensorflow_version 2.x
import tensorflow as tf
import pandas as pd
import numpy as np

The aim of this example is to solve the Iris classification problem using `tf.estimator`, following the example outlined on the TensorFlow site.

We want to classify Iris flowers into three different species based on the size of their sepals and petals.

The Iris dataset contains four features and one label. In supervised learning, the label is the *answer* or *result* for an example; here, it is the species of the flower.

The four features of the flowers are:

1. sepal length

2. sepal width

3. petal length

4. petal width



**DATA SETUP**

In [0]:
#begin by defining constants for parsing the data
CSV_COLUMN_NAMES = ['SepalLength', 'SepalWidth', 'PetalLength', 'PetalWidth', 'Species']
SPECIES = ['Setosa', 'Versicolor', 'Virginica'] #the three species of Iris flowers

In [27]:
#We download the Iris data using keras and Pandas
#Getting train dataset
train_path = tf.keras.utils.get_file("iris_training.csv", "https://storage.googleapis.com/download.tensorflow.org/data/iris_training.csv")

#Getting test dataset
test_path = tf.keras.utils.get_file("iris_test.csv", "https://storage.googleapis.com/download.tensorflow.org/data/iris_test.csv")

#Reading dataset using pandas
train = pd.read_csv(train_path, names=CSV_COLUMN_NAMES, header=0)
test = pd.read_csv(test_path, names=CSV_COLUMN_NAMES, header=0)

#visualizing data using pandas
train.head()

|   | SepalLength | SepalWidth | PetalLength | PetalWidth | Species |
|---|---|---|---|---|---|
| 0 | 6.4 | 2.8 | 5.6 | 2.2 | 2 |
| 1 | 5.0 | 2.3 | 3.3 | 1.0 | 1 |
| 2 | 4.9 | 2.5 | 4.5 | 1.7 | 2 |
| 3 | 4.9 | 3.1 | 1.5 | 0.1 | 0 |
| 4 | 5.7 | 3.8 | 1.7 | 0.3 | 0 |
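
The `Species` column stores each label as an integer rather than a name. As a small illustration, and assuming the integer codes index into the `SPECIES` list defined earlier (which is how the prediction code later in this notebook uses them), the five labels shown above decode as follows:

```python
# Species names in label order, copied from the SPECIES constant above.
SPECIES = ['Setosa', 'Versicolor', 'Virginica']

# Integer labels of the five rows printed by train.head().
head_labels = [2, 1, 2, 0, 0]

# Decode each integer label into its species name.
head_species = [SPECIES[label] for label in head_labels]
print(head_species)
```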


In [28]:
#Remove the label which the model will be trained to predict (that is, 'Species')

train_y = train.pop('Species')
test_y = test.pop('Species')

train.head()

|   | SepalLength | SepalWidth | PetalLength | PetalWidth |
|---|---|---|---|---|
| 0 | 6.4 | 2.8 | 5.6 | 2.2 |
| 1 | 5.0 | 2.3 | 3.3 | 1.0 |
| 2 | 4.9 | 2.5 | 4.5 | 1.7 |
| 3 | 4.9 | 3.1 | 1.5 | 0.1 |
| 4 | 5.7 | 3.8 | 1.7 | 0.3 |


**Programming with estimators**

To write a TensorFlow program based on pre-made Estimators, we must perform the following steps:

1. Create input functions.

2. Define the model's feature columns.

3. Instantiate an Estimator.

4. Call one or more methods on the Estimator object, passing the appropriate input function as the source of the data.

We already encountered these steps in the fashion_mnist example.

**Input Functions**

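An input function returns a `tf.data.Dataset` object that outputs the following two-element tuple:

1. `features`: a Python dictionary in which each key is the name of a feature and each value is an array containing all of that feature's values.

2. `label`: an array containing the label of every example.

Here is a hand-written input function that illustrates the format: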



```
def input_evaluation_set():
    features = {'SepalLength': np.array([6.4, 5.0]),
                'SepalWidth':  np.array([2.8, 2.3]),
                'PetalLength': np.array([5.6, 3.3]),
                'PetalWidth':  np.array([2.2, 1.0])}
    labels = np.array([2, 1])
    return features, labels
```



In [0]:
def input_fn(features, labels, training=True, batch_size=256):
    dataset = tf.data.Dataset.from_tensor_slices((dict(features), labels))
    # Shuffle and repeat if you are in training mode.
    if training:
        dataset = dataset.shuffle(1000).repeat()
    return dataset.batch(batch_size)
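
To make the role of each call in `input_fn` concrete, here is a plain-Python analogy, not TensorFlow's implementation. The helper names `slice_examples` and `batched` are hypothetical. The analogy shuffles the whole list once and reuses that order on every pass, whereas `Dataset.shuffle` works through a buffer and, by default, reshuffles on each pass.

```python
import itertools
import random

def slice_examples(features, labels):
    """Yield one (feature_dict, label) pair per example, like from_tensor_slices."""
    for i, label in enumerate(labels):
        yield {name: column[i] for name, column in features.items()}, label

def batched(examples, batch_size):
    """Group consecutive examples into lists of at most batch_size items."""
    iterator = iter(examples)
    while True:
        batch = list(itertools.islice(iterator, batch_size))
        if not batch:
            return
        yield batch

# Two of the four feature columns, using the first three rows of the data.
features = {'SepalLength': [6.4, 5.0, 4.9], 'PetalLength': [5.6, 3.3, 4.5]}
labels = [2, 1, 2]

# Evaluation mode: one pass over the data, in order.
examples = list(slice_examples(features, labels))
eval_batches = list(batched(examples, batch_size=2))
print(eval_batches)

# Training mode: shuffle, then repeat forever; the caller decides when to stop.
random.shuffle(examples)
train_stream = batched(itertools.cycle(examples), batch_size=2)
first_three = list(itertools.islice(train_stream, 3))
print(len(first_three))
```

Because the training stream never ends, something else has to bound training; in this notebook that is the `steps` argument passed to `train`.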

**Define feature columns**

A feature column is an object that describes how the model should use raw input data from the features dictionary.

For the Iris data, the four raw features (sepal length, sepal width, petal length, and petal width) are numeric values.

In this case, we build one numeric feature column per feature, which tells the Estimator to represent each of the four features as a 32-bit floating-point value.

In [0]:
my_feature_columns = []
for key in train.keys():
    my_feature_columns.append(tf.feature_column.numeric_column(key=key))
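
As a rough illustration of what a 32-bit representation means for these measurements, the standard-library sketch below round-trips a value through a 32-bit float. It does not use TensorFlow, and `to_float32` is a hypothetical helper.

```python
import struct

def to_float32(value):
    """Round a Python float (64-bit) to the nearest 32-bit float."""
    return struct.unpack('f', struct.pack('f', value))[0]

sepal_length = 6.4
as_float32 = to_float32(sepal_length)

# The 32-bit value differs slightly from the 64-bit one, but by far less
# than the one-decimal precision of the measurements.
print(as_float32)
print(abs(as_float32 - sepal_length) < 1e-6)
```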

**Instantiate an Estimator**

To solve the Iris classification problem, we'll use a pre-made classifier Estimator.

In [48]:
#Build a DNN with 2 hidden layers with 30 and 10 hidden neurons each.
classifier = tf.estimator.DNNClassifier(
    feature_columns=my_feature_columns,
    #Two hidden layers of 30 and 10 neurons
    hidden_units=[30,10],
    #The model must choose between 3 classes
    n_classes=3
)

INFO:tensorflow:Using default config.
INFO:tensorflow:Using config: {'_model_dir': '/tmp/tmpbuwh2fqp', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': ClusterSpec({}), '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}
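
To get a feel for the size of this network, the sketch below counts trainable parameters, assuming each layer is a plain fully connected layer with a bias vector (an assumption about the internals of `DNNClassifier`, not something the log above states). `count_dense_parameters` is a hypothetical helper.

```python
def count_dense_parameters(layer_sizes):
    """Count weights and biases in a stack of fully connected layers."""
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out + n_out  # weight matrix plus bias vector
    return total

# 4 input features, hidden layers of 30 and 10 units, 3 output classes.
layer_sizes = [4, 30, 10, 3]
print(count_dense_parameters(layer_sizes))  # 493
```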


**Train the model**

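Train the model by calling the Estimator's `train` method. The `input_fn` call is wrapped in a `lambda` so that the Estimator receives a function that takes no arguments, and `steps` tells `train` how many training steps to run before stopping.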


In [60]:
classifier.train(input_fn=lambda: input_fn(train, train_y, training=True), steps=5000)

INFO:tensorflow:Calling model_fn.


To change all layers to have dtype float64 by default, call `tf.keras.backend.set_floatx('float64')`. To change just this layer, pass dtype='float64' to the layer constructor. If you are the author of this layer, you can disable autocasting by passing autocast=False to the base Layer constructor.

INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from /tmp/tmpbuwh2fqp/model.ckpt-10000
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 10000...
INFO:tensorflow:Saving checkpoints for 10000 into /tmp/tmpbuwh2fqp/model.ckpt.
INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 10000...
INFO:tensorflow:loss = 0.3622114, step = 10000
INFO:tensorflow:global_step/sec: 453.22
INFO:tensorflow:loss = 0.35941434, step = 10100 (0.224 sec)
INF

<tensorflow_estimator.python.estimator.canned.dnn.DNNClassifierV2 at 0x7f4e331622e8>
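
The log above starts at step 10000 even though the call asks for `steps=5000`. `steps` is the number of additional steps to run: when a checkpoint already exists in the model directory (for example, because the cell was run before), training resumes from it. The arithmetic can be sketched as follows; `global_step_after` and `examples_per_call` are hypothetical names, and the batch size is the default from `input_fn`.

```python
STEPS_PER_CALL = 5000   # the steps argument passed to classifier.train
BATCH_SIZE = 256        # default batch_size of input_fn

def global_step_after(restored_step, steps=STEPS_PER_CALL):
    """Global step reached when training resumes from restored_step."""
    return restored_step + steps

# The training log restores model.ckpt-10000; evaluation later restores model.ckpt-15000.
print(global_step_after(10000))  # 15000

# Each step consumes one batch, so one call draws this many examples
# (with repetition, since the training dataset repeats indefinitely).
examples_per_call = STEPS_PER_CALL * BATCH_SIZE
print(examples_per_call)  # 1280000
```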

**Evaluate the model**

Now that the model is trained, we can measure its performance on the test set.


In [61]:
eval_result = classifier.evaluate(input_fn=lambda: input_fn(test, test_y, training=False))
print('\nTest set accuracy: {accuracy:0.3f}\n'.format(**eval_result))

INFO:tensorflow:Calling model_fn.


To change all layers to have dtype float64 by default, call `tf.keras.backend.set_floatx('float64')`. To change just this layer, pass dtype='float64' to the layer constructor. If you are the author of this layer, you can disable autocasting by passing autocast=False to the base Layer constructor.

INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Starting evaluation at 2020-04-10T01:12:09Z
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from /tmp/tmpbuwh2fqp/model.ckpt-15000
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Inference Time : 0.29633s
INFO:tensorflow:Finished evaluation at 2020-04-10-01:12:09
INFO:tensorflow:Saving dict for global step 15000: accuracy = 0.96666664, average_loss = 0.32711506, global_step = 15000, loss = 0.32711506
INFO:tensorflow:Saving 'checkpoint_path' summary for global step 15000: /tmp/tmpbuwh2fqp/model.ckpt-15000

Test set accuracy: 0.967
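
Accuracy here is the fraction of test examples whose predicted class matches the label. The sketch below computes it for made-up labels and predictions (not the notebook's actual test data); `accuracy` is a hypothetical helper.

```python
def accuracy(labels, predictions):
    """Fraction of positions where the prediction equals the label."""
    if len(labels) != len(predictions):
        raise ValueError('labels and predictions must have the same length')
    correct = sum(1 for y, p in zip(labels, predictions) if y == p)
    return correct / len(labels)

# Illustrative values only: 29 of 30 predictions are correct.
labels = [0, 1, 2] * 10
predictions = list(labels)
predictions[-1] = 1  # introduce a single mistake

print('{:0.3f}'.format(accuracy(labels, predictions)))  # 0.967
```

A single mistake in 30 examples gives the same value as the logged accuracy of 0.96666664, which would be consistent with a 30-example test set, although the log does not state the size of the set.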

**Make Predictions**

We can use the trained model to predict the species of an Iris flower from unlabeled measurements. Below, we use the `predict` method, which returns a Python iterable that yields one dictionary of prediction results per example.

In [0]:
expected = ['Setosa', 'Versicolor', 'Virginica']
# New, unlabeled measurements to classify
predict_x = {
    'SepalLength': [5.1, 5.9, 6.9],
    'SepalWidth': [3.3, 3.0, 3.1],
    'PetalLength': [1.7, 4.2, 5.4],
    'PetalWidth': [0.5, 1.5, 2.1],
}

# Define a separate input function without labels, since the labels are what
# we want the model to predict. It gets its own name so that it does not
# shadow the training input_fn defined earlier.
def predict_input_fn(features, batch_size=256):
    # Convert the inputs to a Dataset without labels.
    return tf.data.Dataset.from_tensor_slices(dict(features)).batch(batch_size)

predictions = classifier.predict(
    input_fn=lambda: predict_input_fn(predict_x))


The following code prints each prediction with its probability, alongside the expected species:

In [74]:
for pred_dict, expec in zip(predictions, expected):
    class_id = pred_dict['class_ids'][0]
    probability = pred_dict['probabilities'][class_id]

    print('Prediction is "{}" ({:.1f}%), expected "{}"'.format(
        SPECIES[class_id], 100 * probability, expec))

INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from /tmp/tmpbuwh2fqp/model.ckpt-15000
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
Prediction is "Setosa" (92.4%), expected "Setosa"
Prediction is "Versicolor" (64.1%), expected "Versicolor"
Prediction is "Virginica" (67.3%), expected "Virginica"
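
The reported percentage for each example is the entry of `probabilities` at the predicted class. As a sketch of how such values typically arise, the code below applies a softmax to made-up logits and picks the most likely class. The logits are illustrative, and this is not the Estimator's internal code.

```python
import math

SPECIES = ['Setosa', 'Versicolor', 'Virginica']

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    shifted = [x - max(logits) for x in logits]  # for numerical stability
    exps = [math.exp(x) for x in shifted]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 0.5, -1.0]  # hypothetical scores for one flower
probabilities = softmax(logits)

# The predicted class is the index of the largest probability.
class_id = max(range(len(probabilities)), key=probabilities.__getitem__)

print('Prediction is "{}" ({:.1f}%)'.format(
    SPECIES[class_id], 100 * probabilities[class_id]))
```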
