<center> <h1>Iris Classification with TensorFlow</h1> </center>

In this Jupyter notebook we learn the basics of TensorFlow's higher-level machine learning API (tf.contrib.learn). Its many built-in features make it easy to configure, train and evaluate a model.

As an example, we use the famous [*Iris Flower Dataset*](https://en.wikipedia.org/wiki/Iris_flower_data_set):

  Iris Setosa (0)   | Iris Versicolor (1)   | Iris Virginica (2)
  -------------  | -------------   |--------------
  ![Iris setosa](IrisClassifier/Kosaciec_szczecinkowaty_Iris_setosa.jpg "Iris setosa")   | ![Iris versicolor](IrisClassifier/220px-Iris_versicolor_3.jpg "Iris versicolor") | ![Iris virginica](IrisClassifier/220px-Iris_virginica.jpg "Iris virginica")
  
We want to build a neural network that acts as a classifier: once trained, it assigns a feature vector consisting of the length and width of the sepals and petals to one of the three Iris species.
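The classifier's contract can be pictured in plain Python: four measurements go in, one of the three integer labels from the table comes out. In this sketch the helper names and the feature order (sepal length, sepal width, petal length, petal width, the usual order of the Iris data) are assumptions, and the trained network is replaced by a given list of per-class scores.

```python
# Hypothetical helpers; the feature order is assumed, the label numbering
# (0, 1, 2) follows the table above.
SPECIES = ["Iris setosa", "Iris versicolor", "Iris virginica"]
FEATURES = ["sepal_length", "sepal_width", "petal_length", "petal_width"]


def to_feature_vector(measurements):
    """Arrange a dict of measurements (in cm) into a 4-element feature vector."""
    return [float(measurements[name]) for name in FEATURES]


def species_from_scores(scores):
    """Return (label, name) of the class with the highest score."""
    label = max(range(len(scores)), key=lambda i: scores[i])
    return label, SPECIES[label]


x = to_feature_vector({"sepal_length": 5.9, "sepal_width": 3.0,
                       "petal_length": 4.2, "petal_width": 1.5})
print(x)                                      # [5.9, 3.0, 4.2, 1.5]
print(species_from_scores([0.1, 2.3, -0.5]))  # (1, 'Iris versicolor')
```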

First, let us import the relevant packages and load the training and test data.

In [1]:
import tensorflow as tf
import numpy as np

IRIS_TRAINING = "/home/jodahr/Jupyter/IrisClassifier/iris_training.csv"
IRIS_TEST = "/home/jodahr/Jupyter/IrisClassifier/iris_test.csv"

# Load datasets. np.int64 replaces np.int, a deprecated alias
# that was removed in NumPy 1.24.
training_set = tf.contrib.learn.datasets.base.load_csv_with_header(
    filename=IRIS_TRAINING,
    target_dtype=np.int64,
    features_dtype=np.float32)

test_set = tf.contrib.learn.datasets.base.load_csv_with_header(
    filename=IRIS_TEST,
    target_dtype=np.int64,
    features_dtype=np.float32)

# print test set to show the structure
print(test_set)

Dataset(data=array([[ 5.9000001 ,  3.        ,  4.19999981,  1.5       ],
       [ 6.9000001 ,  3.0999999 ,  5.4000001 ,  2.0999999 ],
       [ 5.0999999 ,  3.29999995,  1.70000005,  0.5       ],
       [ 6.        ,  3.4000001 ,  4.5       ,  1.60000002],
       [ 5.5       ,  2.5       ,  4.        ,  1.29999995],
       [ 6.19999981,  2.9000001 ,  4.30000019,  1.29999995],
       [ 5.5       ,  4.19999981,  1.39999998,  0.2       ],
       [ 6.30000019,  2.79999995,  5.0999999 ,  1.5       ],
       [ 5.5999999 ,  3.        ,  4.0999999 ,  1.29999995],
       [ 6.69999981,  2.5       ,  5.80000019,  1.79999995],
       [ 7.0999999 ,  3.        ,  5.9000001 ,  2.0999999 ],
       [ 4.30000019,  3.        ,  1.10000002,  0.1       ],
       [ 5.5999999 ,  2.79999995,  4.9000001 ,  2.        ],
       [ 5.5       ,  2.29999995,  4.        ,  1.29999995],
       [ 6.        ,  2.20000005,  4.        ,  1.        ],
       [ 5.0999999 ,  3.5       ,  1.39999998,  0.2       ],
       [ 5.

As you can see, loading the data in the desired form is easy: `load_csv_with_header` returns a `Dataset` named tuple containing the feature vectors (`data`) and the labels (`target`). If more feature engineering is needed, have a look [here](https://www.tensorflow.org/get_started/input_fn).
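To make the returned structure concrete, here is a plain-Python sketch of what the loader does with such a file. It assumes the header layout of the TensorFlow tutorial CSVs (number of samples, number of features, then the class names, e.g. `120,4,setosa,versicolor,virginica`) and an integer label in the last column of each row. The function name `load_csv_with_header_sketch` is hypothetical, and it returns Python lists instead of NumPy arrays.

```python
import csv
import io
from collections import namedtuple

# Same field names as the Dataset tuple printed above.
Dataset = namedtuple("Dataset", ["data", "target"])


def load_csv_with_header_sketch(lines, target_column=-1):
    """Parse an Iris-style CSV whose header starts with n_samples,n_features.

    Each following row holds the features plus an integer label in
    `target_column` (the last column by default).
    """
    reader = csv.reader(lines)
    header = next(reader)
    n_samples, n_features = int(header[0]), int(header[1])
    data, target = [], []
    for row in reader:
        if not row:
            continue  # tolerate blank lines
        target.append(int(row.pop(target_column)))
        features = [float(value) for value in row]
        if len(features) != n_features:
            raise ValueError("expected %d features, got %d" % (n_features, len(features)))
        data.append(features)
    if len(data) != n_samples:
        raise ValueError("header says %d samples, found %d" % (n_samples, len(data)))
    return Dataset(data=data, target=target)


# Two rows modelled on the first test samples printed above.
sample = io.StringIO("2,4,setosa,versicolor,virginica\n"
                     "5.9,3.0,4.2,1.5,1\n"
                     "6.9,3.1,5.4,2.1,2\n")
print(load_csv_with_header_sketch(sample))
# Dataset(data=[[5.9, 3.0, 4.2, 1.5], [6.9, 3.1, 5.4, 2.1]], target=[1, 2])
```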

Next, we configure the model, starting with the feature columns. For more information, see the documentation of [`tf.contrib.layers.real_valued_column`](https://www.tensorflow.org/api_docs/python/tf/contrib/layers/real_valued_column).

In [2]:
# Specify that all features have real-valued data (one column of dimension 4)
feature_columns = [tf.contrib.layers.real_valued_column("", dimension=4)]

# Build a DNN with 3 hidden layers of 10, 20 and 10 units, respectively.
# The default activation is tf.nn.relu; here we use tf.nn.sigmoid instead.
classifier = tf.contrib.learn.DNNClassifier(feature_columns=feature_columns,
                                            hidden_units=[10, 20, 10],
                                            n_classes=3,
                                            activation_fn=tf.nn.sigmoid,
                                            model_dir="/tmp/iris_model")

INFO:tensorflow:Using default config.
INFO:tensorflow:Using config: {'_task_type': None, '_task_id': 0, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7f3ce853d5f8>, '_master': '', '_num_ps_replicas': 0, '_num_worker_replicas': 0, '_environment': 'local', '_is_chief': True, '_evaluation_master': '', '_tf_config': gpu_options {
  per_process_gpu_memory_fraction: 1
}
, '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_secs': 600, '_session_config': None, '_save_checkpoints_steps': None, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_model_dir': '/tmp/iris_model'}
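For intuition, the configured network maps the 4 inputs through hidden layers of 10, 20 and 10 sigmoid units to 3 output logits, and the class probabilities are a softmax over those logits. The pure-Python sketch below performs that forward pass with random, untrained weights and counts the trainable parameters (weights plus biases). The helper names are hypothetical; the sketch only mirrors the arithmetic and is not how `DNNClassifier` is implemented.

```python
import math
import random


def count_parameters(n_inputs, hidden_units, n_classes):
    """Weights plus biases of a fully connected net: inputs -> hidden -> logits."""
    sizes = [n_inputs] + list(hidden_units) + [n_classes]
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def init_layers(sizes, seed=0):
    """Random, untrained weights (n_in x n_out) and zero biases per dense layer."""
    rng = random.Random(seed)
    return [([[rng.gauss(0.0, 0.1) for _ in range(n_out)] for _ in range(n_in)],
             [0.0] * n_out)
            for n_in, n_out in zip(sizes, sizes[1:])]


def forward(x, layers):
    """Sigmoid hidden layers, linear output layer (logits), then softmax."""
    a = x
    for i, (weights, biases) in enumerate(layers):
        z = [sum(a[j] * weights[j][k] for j in range(len(a))) + biases[k]
             for k in range(len(biases))]
        a = z if i == len(layers) - 1 else [sigmoid(v) for v in z]
    return softmax(a)


layers = init_layers([4, 10, 20, 10, 3])
probs = forward([5.9, 3.0, 4.2, 1.5], layers)
print(count_parameters(4, [10, 20, 10], 3))  # 513
print(len(probs), round(sum(probs), 6))      # 3 1.0
```

The 513 parameters break down as 4·10+10, 10·20+20, 20·10+10 and 10·3+3 for the four dense layers.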


In [3]:
# Define the training inputs
def get_train_inputs():
    x = tf.constant(training_set.data)
    y = tf.constant(training_set.target)
    return x, y

# tuple containing the feature tensor and the label tensor
print(get_train_inputs())

(<tf.Tensor 'Const:0' shape=(120, 4) dtype=float32>, <tf.Tensor 'Const_1:0' shape=(120,) dtype=int64>)


In [4]:
# Fit model.
classifier.fit(input_fn=get_train_inputs, steps=2000)

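# Alternatively, pass the NumPy arrays directly; two calls of 1000 steps
# each also train for 2000 steps in total: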
#classifier.fit(x=training_set.data, y=training_set.target, steps=1000)
#classifier.fit(x=training_set.data, y=training_set.target, steps=1000)

Instructions for updating:
Please switch to tf.summary.scalar. Note that tf.summary.scalar uses the node name instead of the tag. This means that TensorFlow will automatically de-duplicate summary names based on the scope they are created in. Also, passing a tensor or list of tags to a scalar summary op is no longer supported.
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Restoring parameters from /tmp/iris_model/model.ckpt-4000
INFO:tensorflow:Saving checkpoints for 4001 into /tmp/iris_model/model.ckpt.
INFO:tensorflow:loss = 0.0467828, step = 4001
INFO:tensorflow:global_step/sec: 647.449
INFO:tensorflow:loss = 0.0462221, step = 4101 (0.155 sec)
INFO:tensorflow:global_step/sec: 647.38
INFO:tensorflow:loss = 0.0456386, step = 4201 (0.154 sec)
INFO:tensorflow:global_step/sec: 629.875
INFO:tensorflow:loss = 0.045146, step = 4301 (0.159 sec)
INFO:tensorflow:global_step/sec: 648.569
INFO:tensorflow:loss = 0.0446019, step = 4401 (0.154 sec)
INFO:tensorflow:global_step/sec: 627

DNNClassifier(params={'head': <tensorflow.contrib.learn.python.learn.estimators.head._MultiClassHead object at 0x7f3ce853d278>, 'hidden_units': [10, 20, 10], 'feature_columns': (_RealValuedColumn(column_name='', dimension=4, default_value=None, dtype=tf.float32, normalizer=None),), 'optimizer': None, 'activation_fn': <function sigmoid at 0x7f3d03585510>, 'dropout': None, 'gradient_clip_norm': None, 'embedding_lr_multipliers': None, 'input_layer_min_slice_size': None})
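The log starts with `Restoring parameters from /tmp/iris_model/model.ckpt-4000`: because `model_dir` still held a checkpoint from earlier runs, `fit` resumed at global step 4000 and trained 2000 more steps, which is why the evaluation below restores `model.ckpt-6000`. Delete `/tmp/iris_model` to train from scratch. The hypothetical helper below models, as we understand the `Estimator.fit` documentation, how `steps` (additional steps) and `max_steps` (a cap on the total) interact with a restored global step.

```python
def final_global_step(start_step, steps=None, max_steps=None):
    """Global step after one fit() call that resumes from `start_step`.

    steps:     number of additional training steps.
    max_steps: absolute cap on the global step; no training happens if the
               restored checkpoint is already at or beyond it.
    """
    if steps is not None and max_steps is not None:
        raise ValueError("pass either steps or max_steps, not both")
    if steps is not None:
        return start_step + steps
    if max_steps is not None:
        return max(start_step, max_steps)
    raise ValueError("this sketch requires steps or max_steps")


# The run above: restored from step 4000, then fit(steps=2000).
print(final_global_step(4000, steps=2000))  # 6000
# Two fit(steps=1000) calls from scratch add up to 2000 steps ...
print(final_global_step(final_global_step(0, steps=1000), steps=1000))  # 2000
# ... whereas a second fit(max_steps=1000) call would not train at all.
print(final_global_step(1000, max_steps=1000))  # 1000
```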

In [5]:
# Define the test inputs
def get_test_inputs():
    x = tf.constant(test_set.data)
    y = tf.constant(test_set.target)

    return x, y

# Evaluate accuracy. One step suffices because get_test_inputs returns
# all test samples as a single batch.
accuracy_score = classifier.evaluate(input_fn=get_test_inputs,
                                     steps=1)["accuracy"]

print("\nTest Accuracy: {0:f}\n".format(accuracy_score))

Instructions for updating:
Please switch to tf.summary.scalar. Note that tf.summary.scalar uses the node name instead of the tag. This means that TensorFlow will automatically de-duplicate summary names based on the scope they are created in. Also, passing a tensor or list of tags to a scalar summary op is no longer supported.
INFO:tensorflow:Starting evaluation at 2017-07-02-20:53:09
INFO:tensorflow:Restoring parameters from /tmp/iris_model/model.ckpt-6000
INFO:tensorflow:Evaluation [1/1]
INFO:tensorflow:Finished evaluation at 2017-07-02-20:53:09
INFO:tensorflow:Saving dict for global step 6000: accuracy = 0.966667, global_step = 6000, loss = 0.0772428

Test Accuracy: 0.966667
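`evaluate` reports two numbers. With 30 test samples, `accuracy = 0.966667` corresponds to 29 correct predictions (29/30). The `loss` is, to our understanding, the mean softmax cross-entropy of the multi-class head, the same quantity printed during training. The plain-Python sketch below spells out both definitions on made-up logits; it is an illustration, not TensorFlow's implementation.

```python
import math


def softmax_cross_entropy(logits, label):
    """-log(softmax(logits)[label]), computed with log-sum-exp for stability."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_sum_exp - logits[label]


def loss_and_accuracy(batch_logits, labels):
    """Mean cross-entropy and fraction of samples whose argmax equals the label."""
    losses = [softmax_cross_entropy(z, y) for z, y in zip(batch_logits, labels)]
    predicted = [max(range(len(z)), key=lambda i: z[i]) for z in batch_logits]
    correct = sum(p == y for p, y in zip(predicted, labels))
    return sum(losses) / len(losses), correct / len(labels)


# Hypothetical logits for three samples, not produced by the trained model.
logits = [[4.0, 0.5, -2.0], [-1.0, 3.0, 1.0], [0.0, 2.5, 2.0]]
labels = [0, 1, 2]
loss, accuracy = loss_and_accuracy(logits, labels)
print(round(accuracy, 6))  # 0.666667 (the third sample is predicted as class 1)
print(loss > 0)            # True
```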



In [6]:
# Two new flower samples; pass input_fn=new_samples to classify them instead.
def new_samples():
    return np.array(
        [[6.4, 3.2, 4.5, 1.5],
         [6.4, 2.8, 5.6, 2.2]], dtype=np.float32)

# Here we predict the classes of the whole test set, so that the
# predictions can be compared with the true labels printed below.
def test():
    return test_set.data

predictions = list(classifier.predict(input_fn=test))
print(
    "Test Set, Class Predictions:    {}\n"
    .format(predictions))

# True labels of the test set, for comparison
print(test_set.target)

Instructions for updating:
Please switch to predict_classes, or set `outputs` argument.
INFO:tensorflow:Restoring parameters from /tmp/iris_model/model.ckpt-6000
Test Set, Class Predictions:    [1, 2, 0, 1, 1, 1, 0, 1, 1, 2, 2, 0, 2, 1, 1, 0, 1, 0, 0, 2, 0, 1, 2, 1, 1, 1, 0, 1, 2, 1]

[1 2 0 1 1 1 0 2 1 2 2 0 2 1 1 0 1 0 0 2 0 1 2 1 1 1 0 1 2 1]
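Comparing the two printed lists by eye is tedious. The plain-Python sketch below copies both outputs, counts the matches and builds a confusion matrix (the helper name `confusion_matrix` is our own). It reproduces the `accuracy = 0.966667` reported by `evaluate`: the single error is a virginica (2) sample predicted as versicolor (1).

```python
# Values copied from the two printed outputs above.
predicted = [1, 2, 0, 1, 1, 1, 0, 1, 1, 2, 2, 0, 2, 1, 1, 0, 1, 0, 0, 2,
             0, 1, 2, 1, 1, 1, 0, 1, 2, 1]
actual = [1, 2, 0, 1, 1, 1, 0, 2, 1, 2, 2, 0, 2, 1, 1, 0, 1, 0, 0, 2,
          0, 1, 2, 1, 1, 1, 0, 1, 2, 1]


def confusion_matrix(y_true, y_pred, n_classes=3):
    """Rows are true classes, columns are predicted classes."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


matrix = confusion_matrix(actual, predicted)
accuracy = sum(matrix[i][i] for i in range(3)) / len(actual)
mistakes = [i for i, (t, p) in enumerate(zip(actual, predicted)) if t != p]
print(matrix)              # [[8, 0, 0], [0, 14, 0], [0, 1, 7]]
print(round(accuracy, 6))  # 0.966667
print(mistakes)            # [7]
```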


In [7]:
logs_path = "/tmp/iris_model"

In [8]:
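def TB(cleanup=False):
    import webbrowser
    # TensorBoard listens on port 6006 by default
    webbrowser.open('http://localhost:6006')

    # Blocks the kernel until TensorBoard is stopped (kernel interrupt)
    !tensorboard --logdir={logs_path}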

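    if cleanup:
        # Remove the log directory used above (this also deletes the
        # model checkpoints stored there)
        !rm -r {logs_path}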

In [None]:
TB(cleanup=True)

Starting TensorBoard b'54' at http://jodahr-Lenovo-IdeaPad-Y510P:6006
(Press CTRL+C to quit)
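The `TB` helper blocks the notebook kernel until TensorBoard is stopped, and it may open the browser before the server is listening. One alternative is to start TensorBoard as a background process with the standard-library `subprocess` module. This is a minimal sketch, assuming the `tensorboard` executable is on `PATH`; the helper names and the fixed three-second wait are our own choices.

```python
import shutil
import subprocess
import time
import webbrowser


def tensorboard_command(logdir, port=6006):
    """Command line for serving the event files in `logdir` on `port`."""
    return ["tensorboard", "--logdir=" + logdir, "--port=" + str(port)]


def launch_tensorboard(logdir, port=6006, open_browser=True, wait_secs=3.0):
    """Start TensorBoard as a background process and return its handle.

    Returns None if no `tensorboard` executable is found on PATH.
    Stop the server later with `proc.terminate()`.
    """
    if shutil.which("tensorboard") is None:
        return None
    proc = subprocess.Popen(tensorboard_command(logdir, port))
    time.sleep(wait_secs)  # crude: give the server time to start listening
    if open_browser:
        webbrowser.open("http://localhost:%d" % port)
    return proc


print(tensorboard_command("/tmp/iris_model"))
# ['tensorboard', '--logdir=/tmp/iris_model', '--port=6006']
```

In the notebook, `proc = launch_tensorboard(logs_path)` would leave the kernel free for further cells, and `proc.terminate()` stops the server.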
