**Classification**

Regression is used to predict a numeric value, while classification is used to separate data points into different classes.

We'll use a TensorFlow Estimator to classify flowers from the Iris dataset.

TensorFlow guide: https://www.tensorflow.org/tutorials/estimator/premade

In [None]:
%tensorflow_version 2.x

Colab only includes TensorFlow 2.x; %tensorflow_version has no effect.


In [None]:
from __future__ import absolute_import, division, print_function, unicode_literals
import tensorflow as tf
import pandas as pd


**Dataset**

This dataset separates flowers into 3 different species:

1.   Setosa
2.   Versicolor
3.   Virginica

Each flower is described by four features:

1.   sepal.length
2.   sepal.width
3.   petal.length
4.   petal.width





In [None]:
# let's define some constants

CSV_COLUMNS = ['SepalLength', 'SepalWidth', 'PetalLength', 'PetalWidth', 'Species']
SPECIES = ['Setosa', 'Versicolor', 'Virginica']

In [None]:
# use Keras (a module inside TensorFlow) to download the datasets, then read them into pandas DataFrames

train_path = tf.keras.utils.get_file(
    "iris_training.csv", "https://storage.googleapis.com/download.tensorflow.org/data/iris_training.csv"
)

test_path = tf.keras.utils.get_file(
    "iris_test.csv", "https://storage.googleapis.com/download.tensorflow.org/data/iris_test.csv"
)

train = pd.read_csv(train_path, names=CSV_COLUMNS, header=0)
test = pd.read_csv(test_path, names=CSV_COLUMNS, header=0)

In [None]:
train.head()

|   | SepalLength | SepalWidth | PetalLength | PetalWidth | Species |
|---|---|---|---|---|---|
| 0 | 6.4 | 2.8 | 5.6 | 2.2 | 2 |
| 1 | 5.0 | 2.3 | 3.3 | 1.0 | 1 |
| 2 | 4.9 | 2.5 | 4.5 | 1.7 | 2 |
| 3 | 4.9 | 3.1 | 1.5 | 0.1 | 0 |
| 4 | 5.7 | 3.8 | 1.7 | 0.3 | 0 |


Now we can pop the Species column off and use it as our label.

In [None]:
y_train = train.pop("Species")
y_test = test.pop("Species")
y_train.head()

0    2
1    1
2    2
3    0
4    0
Name: Species, dtype: int64

**Feature columns of data**

Categorical feature columns (columns holding strings or categories) must be converted to numeric feature columns. For that we can use tf.feature_column.categorical_column_with_vocabulary_list(key, vocabulary_list).

This TensorFlow method maps each unique value in a column to a unique integer ID.

vocabulary_list contains all of those unique values.

Numeric columns are declared with

tf.feature_column.numeric_column(key, dtype=tf.float32)

which reads the column's values as float32 (the default dtype). All four Iris features are numeric, so the cell below only needs numeric columns.
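Although the Iris data only needs numeric columns, the categorical case is easy to picture. Below is a plain-Python sketch of the lookup that categorical_column_with_vocabulary_list performs conceptually: each value in the vocabulary gets its index as an ID, and an out-of-vocabulary value gets default_value (TensorFlow's default is -1 when no OOV buckets are configured). The helper vocab_lookup and the colors vocabulary are hypothetical illustrations, not TensorFlow's implementation.

```python
# Plain-Python sketch of the vocabulary lookup behind a categorical column.
# `vocab_lookup` is a hypothetical helper, not part of TensorFlow.

def vocab_lookup(values, vocabulary_list, default_value=-1):
    """Map each value to its index in vocabulary_list; unknown values get default_value."""
    index = {word: i for i, word in enumerate(vocabulary_list)}
    return [index.get(v, default_value) for v in values]


colors = ["red", "green", "blue"]  # hypothetical vocabulary
print(vocab_lookup(["blue", "red", "purple"], colors))  # [2, 0, -1]
```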

In [None]:
feature_columns = []
for key in train.keys():
  feature_columns.append(tf.feature_column.numeric_column(key=key))

print(feature_columns)

[NumericColumn(key='SepalLength', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None), NumericColumn(key='SepalWidth', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None), NumericColumn(key='PetalLength', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None), NumericColumn(key='PetalWidth', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None)]


**Input Function**


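Here, too, we need an input function to feed data to the model. The Estimator calls the input function itself and does not pass our data to it, so, as in the last model, we either wrap the call in a Python lambda or define a function that returns the input function.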
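The pattern described above, binding the data into a function the model can call with no arguments, can be illustrated in plain Python. In this sketch make_batches and fake_train are hypothetical stand-ins that only mimic the calling convention (they are not the Estimator API); functools.partial is shown as an equivalent alternative to the lambda.

```python
from functools import partial


def make_batches(features, labels, batch_size=2):
    """Hypothetical stand-in for input_fn: pair features with labels and batch them."""
    pairs = list(zip(features, labels))
    return [pairs[i:i + batch_size] for i in range(0, len(pairs), batch_size)]


def fake_train(input_fn):
    """Hypothetical trainer: like an Estimator, it calls input_fn with no data arguments."""
    return input_fn()


x = [5.1, 4.9, 6.3]
y = [0, 0, 2]

# A lambda binds x, y and the options, producing a zero-argument function.
via_lambda = fake_train(lambda: make_batches(x, y, batch_size=2))
# functools.partial builds the same zero-argument callable without a lambda.
via_partial = fake_train(partial(make_batches, x, y, batch_size=2))

print(via_lambda)  # [[(5.1, 0), (4.9, 0)], [(6.3, 2)]]
print(via_lambda == via_partial)  # True
```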


In [None]:
# input function
def input_fn(features, labels, training=True, batch_size=256):
  # convert the inputs to a Dataset of (features dict, label) pairs
  dataset = tf.data.Dataset.from_tensor_slices((dict(features), labels))

  # when training, shuffle the data and repeat it indefinitely
  # (train() stops after the given number of steps)
  if training:
    dataset = dataset.shuffle(1000).repeat()

  return dataset.batch(batch_size)
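dataset.shuffle(1000) does not shuffle the whole dataset at once: it keeps a buffer of up to 1000 elements and emits a random element from it, refilling the buffer from the input as it goes. With only 120 training rows, a 1000-element buffer holds everything, so each pass is a full shuffle. Here is a simplified pure-Python sketch of that buffer, plus repeat and batch; the function names are hypothetical and this is not how tf.data is implemented internally.

```python
import random
from itertools import islice


def buffered_shuffle(items, buffer_size, rng):
    """Yield items in random order using a fixed-size buffer (simplified)."""
    buffer = []
    for item in items:
        buffer.append(item)
        if len(buffer) >= buffer_size:
            # Buffer is full: emit one random element to make room.
            yield buffer.pop(rng.randrange(len(buffer)))
    # Input exhausted: drain the rest of the buffer in random order.
    while buffer:
        yield buffer.pop(rng.randrange(len(buffer)))


def shuffle_repeat(items, buffer_size, rng):
    """Reshuffle and replay the data endlessly, like .shuffle(...).repeat()."""
    while True:
        yield from buffered_shuffle(items, buffer_size, rng)


def batch_stream(stream, batch_size):
    """Group a stream into lists of batch_size; the last batch may be smaller."""
    stream = iter(stream)
    while True:
        chunk = list(islice(stream, batch_size))
        if not chunk:
            return
        yield chunk


rng = random.Random(0)
data = list(range(10))

# A buffer larger than the data gives a full shuffle: every item appears once.
one_epoch = list(buffered_shuffle(data, buffer_size=1000, rng=rng))
print(sorted(one_epoch) == data)  # True

# The training-style stream never ends, so take just 3 batches of 4 (crossing an epoch boundary).
train_batches = list(islice(batch_stream(shuffle_repeat(data, 1000, rng), 4), 3))
print([len(b) for b in train_batches])  # [4, 4, 4]

# Evaluation-style: no shuffle, no repeat, so the stream ends with a smaller last batch.
print(list(batch_stream(data, 4)))  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```

The missing repeat() is why evaluation passes training=False: without it the dataset ends after one pass, so evaluate() stops on its own.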

**Building the Model**

Now we choose a model from TensorFlow's Estimator module. Two options for classification are:

*   DNNClassifier - Deep Neural Network
*   LinearClassifier

The choice of model depends on the relationships in the data. For the Iris dataset we may not be able to find a linear relationship between the features and the species, so DNNClassifier is a good choice.



In [None]:
# build a DNN classifier with 2 hidden layers

classifier = tf.estimator.DNNClassifier(
    feature_columns=feature_columns,

    # two hidden layers of 30 and 10 nodes respectively.
    hidden_units=[30,10],

    # the model must choose between 3 classes.
    n_classes=3
)



Above is a deep neural network with 2 hidden layers of 30 and 10 nodes respectively. These are the layer sizes the official TensorFlow tutorial uses. The number of neurons is a hyperparameter with no single right value; in practice it is chosen through experimentation and testing.
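To see what hidden_units=[30, 10] with n_classes=3 means computationally, here is a plain-Python sketch of one forward pass through a network of that shape: 4 inputs → 30 → 10 → 3 outputs, with ReLU on the hidden layers (DNNClassifier's default activation) and a softmax turning the 3 outputs into class probabilities. The weights are random placeholders, not the trained model's, and the helper names are hypothetical. It also counts the trainable parameters, which is one way to compare candidate layer sizes.

```python
import math
import random


def dense(inputs, weights, biases, relu):
    """Fully connected layer: out[j] = sum_i inputs[i] * weights[i][j] + biases[j]."""
    out = [sum(x * w_row[j] for x, w_row in zip(inputs, weights)) + b
           for j, b in enumerate(biases)]
    return [max(0.0, v) for v in out] if relu else out


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def init_layers(sizes, rng):
    """Random placeholder (weights, biases) for each pair of consecutive layer sizes."""
    return [([[rng.gauss(0, 0.1) for _ in range(n_out)] for _ in range(n_in)],
             [0.0] * n_out)
            for n_in, n_out in zip(sizes, sizes[1:])]


sizes = [4, 30, 10, 3]  # 4 features, hidden_units=[30, 10], n_classes=3
layers = init_layers(sizes, random.Random(42))

x = [6.4, 2.8, 5.6, 2.2]  # first row of train.head()
for i, (w, b) in enumerate(layers):
    is_hidden = i < len(layers) - 1  # ReLU on hidden layers, raw logits at the output
    x = dense(x, w, b, relu=is_hidden)
probs = softmax(x)

# Each layer has n_in * n_out weights plus n_out biases.
n_params = sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))
print(n_params)  # 493
print(round(sum(probs), 6))  # 1.0
```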

**Training the Model**



In [None]:
classifier.train(
    input_fn=lambda: input_fn(train, y_train, training=True),
    steps=5000
)
# the lambda wraps input_fn so train() gets a function it can call without passing our data

<tensorflow_estimator.python.estimator.canned.dnn.DNNClassifierV2 at 0x7d8ed51c4d00>

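**Steps argument**

This tells the classifier to train for 5000 steps; each step processes one batch of training data. You can change this value and see how the results change, but a higher value is not always better: training for too long can overfit the training data and takes more time.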
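To put 5000 steps in perspective: each step consumes one batch, so the number of passes over the data (epochs) is steps × batch_size / number of training examples. The training CSV has 120 rows and input_fn uses batch_size=256 by default, so the arithmetic below shows that 5000 steps is roughly ten thousand epochs over this small dataset. The helper epochs_seen is hypothetical.

```python
def epochs_seen(steps, batch_size, n_examples):
    """Approximate number of passes over the training data for a given number of steps."""
    return steps * batch_size / n_examples


# 120 training rows, batch_size=256 (input_fn's default), steps=5000
print(round(epochs_seen(5000, 256, 120), 1))  # 10666.7
```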


**Evaluation**



In [None]:
eval_result = classifier.evaluate(
    input_fn = lambda: input_fn(test, y_test, training=False)
)
print('\nTest set accuracy: {accuracy:0.3f}\n'.format(**eval_result))


Test set accuracy: 0.900



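**NOTE**

While training or evaluating, TensorFlow may print a warning that a layer is casting an input tensor from float64 to float32. The features are float64 because that is how pandas reads them, while the layer's dtype defaults to float32. If you intended to run this layer in float32, you can safely ignore this warning. If in doubt, this warning is likely only an issue if you are porting a TensorFlow 1.x model to TensorFlow 2.x.

To change all layers to have dtype float64 by default, call tf.keras.backend.set_floatx('float64').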
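Running the layer in float32 means each float64 input value is rounded to 32-bit precision. For measurements recorded to one decimal place, the error is tiny. This standard-library sketch round-trips a few values through a 32-bit float with struct to show the size of that error; the helper to_float32 is hypothetical.

```python
import struct


def to_float32(value):
    """Round-trip a Python float (64-bit) through a 32-bit IEEE 754 float."""
    return struct.unpack("f", struct.pack("f", value))[0]


for v in [6.4, 2.8, 0.1]:
    f32 = to_float32(v)
    print(v, f32, abs(f32 - v))  # original, float32 value, absolute rounding error
```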


**Predictions**

Now that we have a trained model, we can use it to make predictions.

In [None]:
def predict_input_fn(features, batch_size=256):
    # Convert the inputs to a Dataset without labels (there are no labels to predict against).
    # A separate name keeps the training input_fn above intact.
    return tf.data.Dataset.from_tensor_slices(dict(features)).batch(batch_size)

features = ['SepalLength', 'SepalWidth', 'PetalLength', 'PetalWidth']
predict = {}

print("Please type numeric values as prompted.")
for feature in features:
  # keep asking until the input parses as a number (e.g. 5 or 2.4)
  while True:
    val = input(feature + ": ")
    try:
      predict[feature] = [float(val)]
      break
    except ValueError:
      print("Not a number, please try again.")

predictions = classifier.predict(input_fn=lambda: predict_input_fn(predict))
for pred_dict in predictions:
    class_id = pred_dict['class_ids'][0]
    probability = pred_dict['probabilities'][class_id]

    print('Prediction is "{}" ({:.1f}%)'.format(
        SPECIES[class_id], 100 * probability))


Please type numeric values as prompted.
SepalLength: 2.4
SepalWidth: 2.4
PetalLength: 2.4
PetalWidth: 2.4
Prediction is "Virginica" (50.9%)
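Each item yielded by classifier.predict is a dictionary; the loop above reads its class_ids and probabilities entries. The predicted class is the one with the highest probability, and its index picks the name out of SPECIES. A plain-Python sketch with a made-up prediction dictionary (the values are illustrative, not real model output; describe and argmax are hypothetical helpers):

```python
SPECIES = ['Setosa', 'Versicolor', 'Virginica']


def describe(pred_dict):
    """Format one prediction dict the same way as the loop above."""
    class_id = pred_dict['class_ids'][0]
    probability = pred_dict['probabilities'][class_id]
    return 'Prediction is "{}" ({:.1f}%)'.format(SPECIES[class_id], 100 * probability)


def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=lambda i: values[i])


# Made-up example; real dicts come from classifier.predict and hold NumPy arrays.
probs = [0.12, 0.37, 0.51]
fake_pred = {'class_ids': [argmax(probs)], 'probabilities': probs}
print(describe(fake_pred))  # Prediction is "Virginica" (51.0%)
```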


In [None]:
print(predictions)

<generator object Estimator.predict at 0x7d8ebfdaec70>
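As the output shows, classifier.predict returns a generator rather than a list: predictions are computed only as you iterate, and a generator can be consumed just once. Since the for loop above already iterated over predictions, looping over it again would yield nothing; call predict again, or save the results with list(...), to reuse them. A pure-Python illustration with a hypothetical fake_predict stand-in:

```python
def fake_predict(rows):
    """Hypothetical stand-in for classifier.predict: lazily yields one result per row."""
    for row in rows:
        yield {'n_features': len(row)}


gen = fake_predict([[5.1, 3.5, 1.4, 0.2], [6.7, 3.0, 5.2, 2.3]])
print(gen)  # a generator object, not a list

first_pass = list(gen)   # consumes the generator
second_pass = list(gen)  # already exhausted
print(len(first_pass), len(second_pass))  # 2 0
```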
