## Classification
Classification is the task of predicting, for a given data point, the probability that it belongs to each of the possible classes.

In this example we use the Iris dataset to predict the species of a flower.

In [1]:
from __future__ import absolute_import, division, print_function, unicode_literals

import pandas as pd  # pandas is a data analysis library that makes tabular data easy to manipulate

import tensorflow as tf

## Dataset
There are 3 different species of Iris flowers in the dataset.


*   Setosa
*   Versicolor
*   Virginica

Each flower is described by four features:

*  sepal length
*  sepal width
*  petal length
*  petal width




In [3]:
CSV_COLUMN_NAMES=['SepalLength','SepalWidth','PetalLength','PetalWidth','Species']
SPECIES=['Setosa','Versicolor','Virginica']

In [7]:
train_ds_url = "http://storage.googleapis.com/download.tensorflow.org/data/iris_training.csv"
test_ds_url = "http://storage.googleapis.com/download.tensorflow.org/data/iris_test.csv"
train_path= tf.keras.utils.get_file("iris_training.csv",train_ds_url)
test_path= tf.keras.utils.get_file("iris_test.csv",test_ds_url)
train=pd.read_csv(train_path, names=CSV_COLUMN_NAMES, header=0)
test=pd.read_csv(test_path, names=CSV_COLUMN_NAMES, header=0)

Downloading data from http://storage.googleapis.com/download.tensorflow.org/data/iris_training.csv
Downloading data from http://storage.googleapis.com/download.tensorflow.org/data/iris_test.csv


In [8]:
train.head()

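|   | SepalLength | SepalWidth | PetalLength | PetalWidth | Species |
|---|-------------|------------|-------------|------------|---------|
| 0 | 6.4 | 2.8 | 5.6 | 2.2 | 2 |
| 1 | 5.0 | 2.3 | 3.3 | 1.0 | 1 |
| 2 | 4.9 | 2.5 | 4.5 | 1.7 | 2 |
| 3 | 4.9 | 3.1 | 1.5 | 0.1 | 0 |
| 4 | 5.7 | 3.8 | 1.7 | 0.3 | 0 |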
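
The `Species` column in the output above holds integer class ids rather than names. As a small sketch of how those ids line up with the `SPECIES` list defined earlier, the snippet below decodes the five rows printed by `train.head()`. The helper `decode_species` is a hypothetical name introduced only for this illustration.

```python
# Species names in class-id order, as defined earlier in the notebook.
SPECIES = ['Setosa', 'Versicolor', 'Virginica']

# The five rows printed by train.head(): four measurements, then the class id.
head_rows = [
    (6.4, 2.8, 5.6, 2.2, 2),
    (5.0, 2.3, 3.3, 1.0, 1),
    (4.9, 2.5, 4.5, 1.7, 2),
    (4.9, 3.1, 1.5, 0.1, 0),
    (5.7, 3.8, 1.7, 0.3, 0),
]

def decode_species(class_id):
    """Return the species name for an integer class id (hypothetical helper)."""
    return SPECIES[class_id]

# The class id is the last element of each row.
head_species = [decode_species(row[-1]) for row in head_rows]
print(head_species)
# ['Virginica', 'Versicolor', 'Virginica', 'Setosa', 'Setosa']
```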


In [9]:
# Separate the labels: pop() removes the Species column from the features and returns it
train_y= train.pop('Species')
test_y= test.pop('Species')

## Input function

In [10]:
# Build a tf.data.Dataset of (features dict, labels) batches, the form the estimator requires
def input_function(features, labels, training=True, batch_size=256):
  ds = tf.data.Dataset.from_tensor_slices((dict(features), labels))
  if training:
    ds = ds.shuffle(1000).repeat()  # shuffle the examples and repeat them indefinitely
  return ds.batch(batch_size)
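
To show what the `training` flag changes, here is a plain-Python analogue of the shuffle, repeat, and batch steps. It is a sketch of the behaviour only, not the `tf.data` implementation, and `batch_stream` is a hypothetical name. In evaluation mode the stream is finite and ordered. In training mode it never ends, which is why training needs an explicit number of steps.

```python
import itertools
import random

def batch_stream(examples, batch_size, training=True, seed=0):
    """Plain-Python analogue of shuffle().repeat().batch(); not the tf.data code."""
    rng = random.Random(seed)

    def example_stream():
        while True:
            order = list(examples)
            if not order:
                return  # nothing to yield
            if training:
                rng.shuffle(order)  # new random order on every pass
            yield from order
            if not training:
                return  # evaluation: a single pass, then stop

    stream = example_stream()
    while True:
        batch = list(itertools.islice(stream, batch_size))
        if not batch:
            return
        yield batch

# Evaluation mode: finite, original order, last batch may be smaller.
eval_batches = list(batch_stream(range(5), batch_size=2, training=False))
print(eval_batches)  # [[0, 1], [2, 3], [4]]

# Training mode: endless, so take only the first four batches.
train_batches = list(itertools.islice(batch_stream(range(5), batch_size=2), 4))
print(train_batches)
```

Note that in training mode a batch can straddle two passes over the data, because batching happens after the repeat.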

## Feature columns

In [11]:
feature_cols=[]
for key in train.keys(): # train.keys() returns the column names (Species has already been popped)
  feature_cols.append(tf.feature_column.numeric_column(key=key))
print(feature_cols)

[NumericColumn(key='SepalLength', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None), NumericColumn(key='SepalWidth', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None), NumericColumn(key='PetalLength', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None), NumericColumn(key='PetalWidth', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None)]


## Building the classification model

TensorFlow provides several pre-built classification models, including:
*  DNNClassifier: a deep neural network.
*  LinearClassifier: works similarly to the linear regression model.

TensorFlow recommends DNNClassifier, so that is what we use here.



In [13]:
# Building a DNN with 2 hidden layers, the first with 30 hidden nodes and the second with 10 hidden nodes.
classifier= tf.estimator.DNNClassifier(
    feature_columns=feature_cols,
    hidden_units=[30,10],
    n_classes=3
)

INFO:tensorflow:Using default config.
INFO:tensorflow:Using config: {'_model_dir': '/tmp/tmpzz2coqod', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_checkpoint_save_graph_def': True, '_service': None, '_cluster_spec': ClusterSpec({}), '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}
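
To get a feel for the size of the network defined by `hidden_units=[30,10]`, the sketch below counts its trainable parameters. It assumes every layer is fully connected with one bias per unit, which is an assumption about the classifier's internals rather than something the notebook shows. `dense_param_count` is a hypothetical helper.

```python
def dense_param_count(layer_sizes):
    """Count weights and biases of fully connected layers with the given sizes."""
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out + n_out  # weight matrix plus one bias per output unit
    return total

# 4 input features -> 30 hidden nodes -> 10 hidden nodes -> 3 output classes
layer_sizes = [4, 30, 10, 3]
print(dense_param_count(layer_sizes))  # 493
```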


## Training

In [14]:
classifier.train(input_fn=lambda: input_function(train,train_y),steps=5000)
# A training step is one gradient update. 
# In one step batch_size examples are processed. 
# An epoch consists of one full cycle through the training data. This is usually many steps.

Instructions for updating:
Use Variable.read_value. Variables in 2.X are initialized automatically both in eager and graph (inside tf.defun) contexts.
INFO:tensorflow:Calling model_fn.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 0...
INFO:tensorflow:Saving checkpoints for 0 into /tmp/tmpzz2coqod/model.ckpt.
INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 0...
INFO:tensorflow:loss = 1.386832, step = 0
INFO:tensorflow:global_step/sec: 345.524
INFO:tensorflow:loss = 0.99469614, step = 100 (0.294 sec)
INFO:tensorflow:global_step/sec: 453.732
INFO:tensorflow:loss = 0.9130411, step = 200 (0.220 sec)
INFO:tensorflow:global_step/sec

<tensorflow_estimator.python.estimator.canned.dnn.DNNClassifierV2 at 0x7f068ea5aa90>
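
The relationship between steps, batch size, and epochs described in the comments can be worked out for this run. The sketch below assumes the training file contains 120 examples. That figure does not appear in the notebook and is labeled as an assumption in the code.

```python
steps = 5000       # from classifier.train(..., steps=5000)
batch_size = 256   # default batch_size of input_function
n_train = 120      # ASSUMPTION: number of rows in iris_training.csv

examples_seen = steps * batch_size   # examples processed over the whole run
epochs = examples_seen / n_train     # full passes through the training data
steps_per_epoch = n_train / batch_size

print(examples_seen)             # 1280000
print(round(epochs, 1))          # 10666.7
print(steps_per_epoch)           # 0.46875
```

Under that assumption the batch is larger than the dataset, so one epoch here takes less than a single step. On bigger datasets an epoch usually spans many steps.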

## Evaluating


In [16]:
eval_result =classifier.evaluate(input_fn=lambda:input_function(test,test_y,training=False))

INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Starting evaluation at 2022-04-11T09:04:39
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from /tmp/tmpzz2coqod/model.ckpt-5000
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Inference Time : 0.56982s
INFO:tensorflow:Finished evaluation at 2022-04-11-09:04:39
INFO:tensorflow:Saving dict for global step 5000: accuracy = 0.9, average_loss = 0.500116, global_step = 5000, loss = 0.500116
INFO:tensorflow:Saving 'checkpoint_path' summary for global step 5000: /tmp/tmpzz2coqod/model.ckpt-5000


In [21]:
print("Test accuracy: {accuracy: 0.3f} ".format(**eval_result))

Test accuracy:  0.900 
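
Accuracy is the fraction of test examples whose predicted class matches the true label. A minimal sketch of that calculation follows. The label lists are hypothetical and chosen only so that nine of ten predictions are correct. They are not the notebook's test data.

```python
def accuracy(true_labels, predicted_labels):
    """Fraction of positions where the predicted label equals the true label."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)

# Hypothetical labels for illustration only: one mistake at index 3.
true_labels      = [0, 1, 2, 2, 1, 0, 0, 1, 2, 1]
predicted_labels = [0, 1, 2, 1, 1, 0, 0, 1, 2, 1]

print("Test accuracy: {:.3f}".format(accuracy(true_labels, predicted_labels)))
# Test accuracy: 0.900
```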


## Prediction

In [26]:
def input_fun(features, batch_size=256):
  # Prediction needs no labels, so the dataset holds only the features dict
  return tf.data.Dataset.from_tensor_slices(dict(features)).batch(batch_size)

features = ['SepalLength', 'SepalWidth', 'PetalLength', 'PetalWidth']
predict = {}
print("Please type numeric values as prompted.")

for feature in features:
  # Keep asking until the input parses as a number (isdigit() would reject "1.1")
  while True:
    val = input(feature + ": ")
    try:
      predict[feature] = [float(val)]
      break
    except ValueError:
      print("Not a number, please try again.")

predictions = classifier.predict(input_fn=lambda: input_fun(predict))
for pred_dict in predictions:
  print(pred_dict)
  class_id = pred_dict['class_ids'][0]  # index of the most likely class
  probability = pred_dict['probabilities'][class_id]
  print('Prediction is "{}" ({:.1f}%)'.format(SPECIES[class_id], 100 * probability))

Please type numeric values as prompted.
SepalLength: 1.1
SepalWidth: 2.1
PetalLength: 3.1
PetalWidth: 4.1
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from /tmp/tmpzz2coqod/model.ckpt-5000
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
{'logits': array([-1.0505493 , -0.18821499,  0.31681746], dtype=float32), 'probabilities': array([0.1371049 , 0.3247581 , 0.53813696], dtype=float32), 'class_ids': array([2]), 'classes': array([b'2'], dtype=object), 'all_class_ids': array([0, 1, 2], dtype=int32), 'all_classes': array([b'0', b'1', b'2'], dtype=object)}
Prediction is "Virginica" (53.8%)
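
The `probabilities` in the output above can be reproduced from the `logits` with a softmax, and the reported class is simply the index of the largest probability. The sketch below redoes that calculation in plain Python using the logits printed above. The `softmax` function is written here for illustration and is not the classifier's own code.

```python
import math

SPECIES = ['Setosa', 'Versicolor', 'Virginica']

# Logits copied from the prediction output above.
logits = [-1.0505493, -0.18821499, 0.31681746]

def softmax(values):
    """Turn raw scores into probabilities that sum to 1."""
    shift = max(values)  # subtract the maximum for numerical stability
    exps = [math.exp(v - shift) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

probabilities = softmax(logits)

# The predicted class is the index with the highest probability.
class_id = max(range(len(probabilities)), key=probabilities.__getitem__)

print([round(p, 4) for p in probabilities])  # [0.1371, 0.3248, 0.5381]
print('{} ({:.1f}%)'.format(SPECIES[class_id], 100 * probabilities[class_id]))
# Virginica (53.8%)
```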
