<a href="https://colab.research.google.com/github/Joekr-HaHa/Tensorflow-Classification-Model/blob/main/TensorflowClassification.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Classification**

Rather than predicting a percentage or other numeric value, our model will predict the class that a given data point belongs to.

In [None]:
from __future__ import absolute_import, division, print_function, unicode_literals
import tensorflow as tf
import pandas as pd

We'll be using the Iris flower dataset, which describes each flower by four measurements and labels it as one of three species.

In [None]:
CSV_COLUMN_NAMES=['SepalLength','SepalWidth','PetalLength','PetalWidth','Species']
SPECIES=['Setosa','Versicolor','Virginica']

In [None]:
train_path=tf.keras.utils.get_file("iris_training.csv","https://storage.googleapis.com/download.tensorflow.org/data/iris_training.csv")
test_path=tf.keras.utils.get_file("iris_test.csv","https://storage.googleapis.com/download.tensorflow.org/data/iris_test.csv")

train=pd.read_csv(train_path, names=CSV_COLUMN_NAMES,header=0)
test=pd.read_csv(test_path,names=CSV_COLUMN_NAMES,header=0)
#using keras to grab our datasets and read them into a pandas dataframe

Let's see what the dataframe looks like:

In [None]:
train.head()

| | SepalLength | SepalWidth | PetalLength | PetalWidth | Species |
|---|---|---|---|---|---|
| 0 | 6.4 | 2.8 | 5.6 | 2.2 | 2 |
| 1 | 5.0 | 2.3 | 3.3 | 1.0 | 1 |
| 2 | 4.9 | 2.5 | 4.5 | 1.7 | 2 |
| 3 | 4.9 | 3.1 | 1.5 | 0.1 | 0 |
| 4 | 5.7 | 3.8 | 1.7 | 0.3 | 0 |
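
The `Species` column stores each class as an integer, which is an index into the `SPECIES` list defined earlier. As a small illustration, the labels of the five rows above can be mapped back to species names like this (the `labels` list is copied by hand from that output):

```python
SPECIES = ['Setosa', 'Versicolor', 'Virginica']

# Species values of the five rows shown by train.head(), copied by hand
labels = [2, 1, 2, 0, 0]

# Each integer label is an index into SPECIES
names = [SPECIES[label] for label in labels]
print(names)
```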


Now pop the `Species` column off each dataframe and use it as our label.

In [None]:
train_y=train.pop('Species')
test_y=test.pop('Species')
train.head() #species column gone now

| | SepalLength | SepalWidth | PetalLength | PetalWidth |
|---|---|---|---|---|
| 0 | 6.4 | 2.8 | 5.6 | 2.2 |
| 1 | 5.0 | 2.3 | 3.3 | 1.0 |
| 2 | 4.9 | 2.5 | 4.5 | 1.7 |
| 3 | 4.9 | 3.1 | 1.5 | 0.1 |
| 4 | 5.7 | 3.8 | 1.7 | 0.3 |


## Input Function

The input function is simple here: there is no epoch count to manage, and because every feature is already numeric, no conversion needs to be done.

In [None]:
def input_fn(features,labels,training=True,batch_size=256):
  #convert the inputs to a dataset
  dataset=tf.data.Dataset.from_tensor_slices((dict(features),labels))

  #shuffle and repeat if in training mode
  if training:
    dataset=dataset.shuffle(1000).repeat()
  
  return dataset.batch(batch_size)
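
To get a feel for what `repeat()` and `batch()` do to the stream of examples, here is a rough pure-Python analogy. It is only a sketch and not how `tf.data` works internally: `itertools.cycle` stands in for `repeat()`, the `batch` helper is written just for this illustration, shuffling is left out, and the five-row `rows` list is a toy stand-in for the training data.

```python
from itertools import cycle, islice

def batch(iterable, batch_size):
    """Group consecutive items into lists of at most batch_size items."""
    iterator = iter(iterable)
    while True:
        chunk = list(islice(iterator, batch_size))
        if not chunk:
            return
        yield chunk

rows = [0, 1, 2, 3, 4]  # toy stand-in for five training examples

# Evaluation-style pipeline: one pass, so the last batch may be smaller
eval_batches = list(batch(rows, 2))

# Training-style pipeline: the data repeats endlessly, so every batch is
# full and a batch can span the boundary between two passes over the data
train_batches = list(islice(batch(cycle(rows), 2), 4))

print(eval_batches)
print(train_batches)
```

Because the training dataset never runs out, it is the `steps` argument passed to `train` that decides when training stops.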

## Feature Columns

In [None]:
feature_columns=[]
for key in train.keys():
  feature_columns.append(tf.feature_column.numeric_column(key=key))
print(feature_columns)

[NumericColumn(key='SepalLength', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None), NumericColumn(key='SepalWidth', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None), NumericColumn(key='PetalLength', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None), NumericColumn(key='PetalWidth', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None)]


## Building a Deep Neural Network


In [None]:
#build a DNN with 2 hidden layers of 30 and 10 nodes respectively
classifier=tf.estimator.DNNClassifier(
    feature_columns=feature_columns,
    #two hidden layers of 30 and 10
    hidden_units=[30,10],
    #model must choose between three classes
    n_classes=3
)
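
For a sense of the model's size, the number of trainable weights can be worked out by hand. This sketch assumes every layer is fully connected with one bias per unit, which is the usual layout for this kind of network; `count_dense_params` is a hypothetical helper, not part of TensorFlow.

```python
def count_dense_params(layer_sizes):
    """Count weights and biases in a stack of fully connected layers."""
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out + n_out  # weight matrix plus one bias per unit
    return total

# 4 input features -> 30 hidden -> 10 hidden -> 3 output classes
total_params = count_dense_params([4, 30, 10, 3])
print(total_params)  # 150 + 310 + 33 = 493
```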


## Training

Remember, the `input_fn` argument must be a function object that the estimator can call itself, not the dataset that our input function returns.
Here we wrap the input function we defined above in a `lambda`, which gives us a function object that calls it with the right arguments.

In [None]:
classifier.train(
  input_fn=lambda:input_fn(train,train_y,training=True),
  steps=5000)
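
The difference between passing a function and passing its result can be seen with a small stand-in example. `make_dataset` and `run_later` are hypothetical names used only for this illustration, and `functools.partial` is shown as one possible alternative to `lambda`, assuming all you need is to fix the arguments in advance.

```python
from functools import partial

def make_dataset(features, labels, training=True):
    # Stand-in for input_fn: just reports what it was called with
    return (features, labels, training)

def run_later(fn):
    # Stand-in for an API that calls the function itself, with no arguments
    return fn()

# Both wrappers delay the call until run_later invokes them
via_lambda = run_later(lambda: make_dataset("X", "y", training=True))
via_partial = run_later(partial(make_dataset, "X", "y", training=True))

print(via_lambda == via_partial)
```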

## Evaluating

Evaluate the classifier on the test dataset, passing the test labels so it can measure how often it classifies correctly.

When predicting (the last cell below), each `pred_dict` contains a `probabilities` array holding the predicted probability of each class, while `class_ids` holds the index of the class with the highest probability. We look up that index in our `SPECIES` list and print the species name together with its probability as a percentage.

In [None]:
evaluation=classifier.evaluate(input_fn=lambda:input_fn(test,test_y,training=False))
print("\nTest set accuracy: {accuracy:0.3f}\n".format(**evaluation))
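
The `accuracy` value printed by this cell is the fraction of test examples whose predicted class matches the true label. A minimal sketch of that calculation, using made-up labels and predictions rather than real model output (the `accuracy` helper is written here for illustration):

```python
def accuracy(true_labels, predicted_labels):
    """Fraction of positions where the prediction equals the true label."""
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)

# Made-up example data, not results from the classifier
true_labels = [0, 1, 2, 2, 1]
predicted_labels = [0, 1, 2, 1, 1]

acc = accuracy(true_labels, predicted_labels)
print("{:0.3f}".format(acc))
```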

In [None]:
def predict_input_fn(features,batch_size=256):
  #convert the inputs to a dataset without labels
  #(named differently so it does not shadow the training input_fn above)
  return tf.data.Dataset.from_tensor_slices(dict(features)).batch(batch_size)

features=['SepalLength','SepalWidth','PetalLength','PetalWidth']
predict={}

print("Type numeric values as prompted")
for feature in features:
  #keep prompting until the input parses as a number
  while True:
    val=input(feature+": ")
    try:
      predict[feature]=[float(val)]
      break
    except ValueError:
      print("Please enter a number")

predictions=classifier.predict(input_fn=lambda: predict_input_fn(predict))
for pred_dict in predictions:
  print(pred_dict)
  #index of the most likely class, and its probability
  class_id=pred_dict['class_ids'][0]
  probability=pred_dict['probabilities'][class_id]

  print('Prediction is "{}" ({:.1f}%)'.format(SPECIES[class_id],100*probability))

Type numeric values as prompted
SepalLength: 6.9
SepalWidth: 3.1
PetalLength: 5.4
PetalWidth: 2.1
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from /tmp/tmp8r00pc8s/model.ckpt-5000
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
{'logits': array([-3.936913 , -1.5887439, -1.1499507], dtype=float32), 'probabilities': array([0.03610365, 0.3778749 , 0.5860214 ], dtype=float32), 'class_ids': array([2]), 'classes': array([b'2'], dtype=object), 'all_class_ids': array([0, 1, 2], dtype=int32), 'all_classes': array([b'0', b'1', b'2'], dtype=object)}
Prediction is "Virginica" (58.6%)
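
The `probabilities` array in the output above is consistent with applying a softmax to the `logits` array. The check below recomputes it in plain Python from the logits shown in that output; the `softmax` helper is written here for illustration and is not taken from the estimator's code.

```python
import math

SPECIES = ['Setosa', 'Versicolor', 'Virginica']

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    largest = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - largest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Logits copied from the printed pred_dict
logits = [-3.936913, -1.5887439, -1.1499507]
probs = softmax(logits)

# The predicted class is the index of the largest probability
class_id = probs.index(max(probs))
print(SPECIES[class_id], round(100 * probs[class_id], 1))
```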
