# TensorFlow with Estimators

Previously we saw how to build a full Multi-Layer Perceptron model with full Sessions in TensorFlow. Unfortunately, that was an extremely involved process. However, developers have created Estimators, which offer an easier-to-use workflow!

Estimators are much easier to use, but you sacrifice some level of customization of your model. Let's go ahead and explore them!

These Estimator objects allow us to quickly create models without needing to manually define the graph as we did in the last notebook.

When working with the MNIST dataset we had to manually define the graph and session, which is not how developers typically work in practice.

## Estimator Steps

1. Read in the data (normalize if necessary)
2. Train/test split the data, just like in scikit-learn
3. Create the Estimator feature columns (a list of specialized feature columns)
4. Create the Estimator input function (organizing the training data)
5. Train the Estimator model
6. Predict with a new test input function

We'll go over performing all of the above steps with TF estimator objects.

## Get the Data

We will use the Iris data set.

Let's get the data:

In [1]:
import pandas as pd
df = pd.read_csv("iris.csv")

In [2]:
df.head()

|   | sepal length (cm) | sepal width (cm) | petal length (cm) | petal width (cm) | target |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | 0.0 |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | 0.0 |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | 0.0 |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | 0.0 |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | 0.0 |


In [3]:
# Rename the columns: names containing spaces produce an error in TF Estimators.
df.columns = ['sepal_length','sepal_width','petal_length','petal_width','target']
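
Typing the new names by hand works for five columns. For wider tables the same renaming could be automated. The helper below is a minimal sketch: `clean_column_name` is a hypothetical name, not part of pandas or TensorFlow, and it assumes that the only problems are spaces and a trailing unit in parentheses.

```python
import re

def clean_column_name(name):
    """Turn a label such as 'sepal length (cm)' into 'sepal_length'."""
    # Drop a trailing unit in parentheses, e.g. " (cm)".
    name = re.sub(r"\s*\([^)]*\)\s*$", "", name)
    # Replace any remaining runs of whitespace with a single underscore.
    return re.sub(r"\s+", "_", name.strip())

raw_columns = [
    "sepal length (cm)", "sepal width (cm)",
    "petal length (cm)", "petal width (cm)", "target",
]
clean_columns = [clean_column_name(c) for c in raw_columns]
print(clean_columns)
```

With a DataFrame in hand, the equivalent call would be `df.columns = [clean_column_name(c) for c in df.columns]`.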

In [4]:
df.head()

|   | sepal_length | sepal_width | petal_length | petal_width | target |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | 0.0 |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | 0.0 |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | 0.0 |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | 0.0 |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | 0.0 |


In [5]:
X = df.drop('target',axis=1)
# Cast the target column to integers; the feature values can stay as floats without any issues.
y = df['target'].apply(int)
# The classes are currently sorted by label, so the data has to be shuffled before training;
# otherwise the test set would not be representative.

## Train Test Split

In [6]:
from sklearn.model_selection import train_test_split

In [7]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)
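
To see why the shuffling that train_test_split performs by default matters here, the sketch below imitates the split on a plain list of labels. It assumes the rows are stored sorted by class with 50 rows per class, as in the standard Iris data; the seed is arbitrary and only there to make the example repeatable.

```python
import random

# Assumption: labels are stored in sorted order, 50 rows per class.
labels = [0] * 50 + [1] * 50 + [2] * 50
n_test = len(labels) * 3 // 10  # 30% of the rows, matching test_size=0.3

# Without shuffling, the test set is simply the tail of the sorted data.
unshuffled_test = labels[-n_test:]

# With shuffling, the tail is a random sample of all rows.
shuffled = labels[:]
random.Random(0).shuffle(shuffled)
shuffled_test = shuffled[-n_test:]

print("unshuffled test classes:", sorted(set(unshuffled_test)))
print("shuffled test classes:  ", sorted(set(shuffled_test)))
```

An unshuffled split would test only on the last class and train on almost none of it, which is the situation the comment in the previous cell warns about.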

# Estimators

Let's see how to use the simpler Estimator interface!

In [8]:
import tensorflow as tf

## Feature Columns

In [9]:
X.columns

Index(['sepal_length', 'sepal_width', 'petal_length', 'petal_width'], dtype='object')

In [10]:
feat_cols = []

for col in X.columns:
    feat_cols.append(tf.feature_column.numeric_column(col))

In [11]:
feat_cols  # Specialized numeric column objects; each key matches a column name in the pandas DataFrame.

[NumericColumn(key='sepal_length', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None),
 NumericColumn(key='sepal_width', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None),
 NumericColumn(key='petal_length', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None),
 NumericColumn(key='petal_width', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None)]

## Input Function

In [44]:
# pandas_input_fn builds an input function directly from a pandas DataFrame.
# We create two input functions, one for training and one for prediction; they are alike except for the data they read (train vs. test).
# Earlier run with batch_size=10:
# input_func = tf.estimator.inputs.pandas_input_fn(x=X_train,y=y_train,batch_size=10,num_epochs=5,shuffle=True)
input_func = tf.estimator.inputs.pandas_input_fn(x=X_train,y=y_train,batch_size=20,num_epochs=5,shuffle=True)

- batch_size: serves the data to the network in batches instead of all at once, since feeding everything in one go may exhaust memory when the gradient is computed. Try different values if you get errors such as empty or None predictions in TensorFlow.

- num_epochs: an epoch is one full pass through the training data. Setting num_epochs=5 means the input function stops supplying data once every training point has been served 5 times; training then ends, unless the steps limit is reached first.

- shuffle: randomizes the order of the rows, since the target labels in the original data are sorted. train_test_split already shuffles by default, but shuffling again doesn't hurt.
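
The interaction between batch_size, num_epochs and the steps argument of train can be worked out with a little arithmetic. The sketch below assumes 105 training rows (70% of the 150 Iris rows) and assumes that batches are drawn from the whole stream of repeated data with a smaller final batch; `batches_available` is a hypothetical helper, not a TensorFlow function.

```python
import math

def batches_available(n_rows, num_epochs, batch_size):
    """Number of batches an input function can serve before it runs out of data."""
    # Assumption: batches are cut from the full stream of
    # n_rows * num_epochs examples, with a smaller last batch if needed.
    return math.ceil(n_rows * num_epochs / batch_size)

n_train = 105  # 70% of the 150 Iris rows (test_size=0.3)

for batch_size in (10, 20):
    available = batches_available(n_train, num_epochs=5, batch_size=batch_size)
    # train(steps=50) stops at whichever limit is reached first.
    steps_run = min(available, 50)
    print(f"batch_size={batch_size}: {available} batches available, {steps_run} steps run")
```

Under these assumptions a batch size of 20 yields only 27 batches, which agrees with the last checkpoint number reported in the training log of this notebook. The runs with batch_size=20 therefore performed fewer weight updates than the runs with batch_size=10.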

In [45]:
# classifier = tf.estimator.DNNClassifier(hidden_units=[10, 20, 10], n_classes=3,feature_columns=feat_cols)
classifier = tf.estimator.DNNClassifier(hidden_units=[10, 20, 10,15,20,25,30], n_classes=3,feature_columns=feat_cols)

# Creating the estimator, DNN = Deep Neural Network

INFO:tensorflow:Using default config.
INFO:tensorflow:Using config: {'_model_dir': 'C:\\Users\\ADMINI~1\\AppData\\Local\\Temp\\tmpn0z75spy', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x000002A76638B8D0>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}


- hidden_units: a list in which each number defines one hidden layer and the number of neurons in it. [10, 20, 10] means there is an input layer, then hidden layer 0 with 10 neurons, then hidden layer 1 with 20 neurons, then hidden layer 2 with 10 neurons, followed by the output layer.

- n_classes: how many classes to expect. There are 3 species, so we set this to 3.

- feature_columns: the list of numeric columns we created.
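
To get a feel for how much larger the seven-layer network is than the three-layer one, the sketch below lists the layer shapes and counts weights plus biases. It assumes plain fully connected layers with 4 input features and 3 output classes; both helper names are hypothetical and the count ignores anything the optimizer stores.

```python
def dense_layer_sizes(n_inputs, hidden_units, n_classes):
    """Return (fan_in, fan_out) pairs for each fully connected layer."""
    widths = [n_inputs] + list(hidden_units) + [n_classes]
    return list(zip(widths[:-1], widths[1:]))

def count_parameters(n_inputs, hidden_units, n_classes):
    """Weights plus biases, assuming plain fully connected layers."""
    return sum(fan_in * fan_out + fan_out
               for fan_in, fan_out in dense_layer_sizes(n_inputs, hidden_units, n_classes))

small = [10, 20, 10]
large = [10, 20, 10, 15, 20, 25, 30]

print(dense_layer_sizes(4, small, 3))
print(count_parameters(4, small, 3), count_parameters(4, large, 3))
```

The larger configuration has several times as many parameters to fit from the same 105 training rows, which is worth keeping in mind when comparing the results later on.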

In [46]:
# Train the estimator. steps sets how many training steps to run; training also stops early
# if the input function runs out of data first.
classifier.train(input_fn=input_func,steps=50)

INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Saving checkpoints for 0 into C:\Users\ADMINI~1\AppData\Local\Temp\tmpn0z75spy\model.ckpt.
INFO:tensorflow:loss = 23.804474, step = 0
INFO:tensorflow:Saving checkpoints for 27 into C:\Users\ADMINI~1\AppData\Local\Temp\tmpn0z75spy\model.ckpt.
INFO:tensorflow:Loss for final step: 2.072151.


<tensorflow_estimator.python.estimator.canned.dnn.DNNClassifier at 0x2a76638b400>

The crucial steps in the above process are:

- Creating the feature columns list
- Creating the input function
- Creating the estimator object
- Training the estimator

Many of these steps look like one-liners, but there is a lot going on behind the scenes, such as the creation of the session and graph that we built manually in the last notebook. This makes the process much simpler.

Once the above steps are done and the classifier is trained, we are in a position very similar to having called .fit on a scikit-learn model.

## Model Evaluation

**Use the predict method from the classifier model to create predictions from X_test**

In [47]:
pred_fn = tf.estimator.inputs.pandas_input_fn(x=X_test,batch_size=len(X_test),shuffle=False)

- We create an input function similar to the one above, but here we pass in the test data and do not pass y.

- We do not pass y_test because we will compare the predictions against it afterwards, using a confusion matrix.

- We feed in all of the test data in one go. Nothing is trained here: the classifier has already been trained on the training data, so we only run X_test through it and look at the predictions.

- shuffle is False because there is no longer any reason to shuffle, and keeping the original order lets us line the predictions up with y_test.

In [48]:
note_predictions = list(classifier.predict(input_fn=pred_fn))
# classifier.predict takes the prediction input function we just created as input_fn.
# It returns a generator, so we wrap it in list() to see the results.
# This calls the model, creates the graph and runs the classifier on the data supplied by pred_fn.

INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from C:\Users\ADMINI~1\AppData\Local\Temp\tmpn0z75spy\model.ckpt-27
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.


In [49]:
note_predictions  # We get a list of dictionaries, one per test row.
# class_ids holds the class that was predicted; probabilities holds the probability for each class, alongside other fields.
# The probabilities show how likely the row is to belong to class 0, 1 or 2.
# The logits are there as well, if you want to look at it from a more mathematical point of view.

[{'logits': array([-3.6043298,  2.7369003,  2.5462687], dtype=float32),
  'probabilities': array([0.00096386, 0.54698634, 0.45204976], dtype=float32),
  'class_ids': array([1], dtype=int64),
  'classes': array([b'1'], dtype=object)},
 {'logits': array([ 3.0936365, -1.7291112, -3.0589406], dtype=float32),
  'probabilities': array([0.98992985, 0.00796364, 0.00210656], dtype=float32),
  'class_ids': array([0], dtype=int64),
  'classes': array([b'0'], dtype=object)},
 {'logits': array([-2.276264 ,  1.7506222,  1.5753479], dtype=float32),
  'probabilities': array([0.00960109, 0.53848654, 0.45191237], dtype=float32),
  'class_ids': array([1], dtype=int64),
  'classes': array([b'1'], dtype=object)},
 {'logits': array([-2.4964437,  1.9128118,  1.7357835], dtype=float32),
  'probabilities': array([0.00657554, 0.5405638 , 0.45286062], dtype=float32),
  'class_ids': array([1], dtype=int64),
  'classes': array([b'1'], dtype=object)},
 {'logits': array([ 3.0629327, -1.70687  , -3.0233636], dtype=fl

In [50]:
note_predictions[0]

{'logits': array([-3.6043298,  2.7369003,  2.5462687], dtype=float32),
 'probabilities': array([0.00096386, 0.54698634, 0.45204976], dtype=float32),
 'class_ids': array([1], dtype=int64),
 'classes': array([b'1'], dtype=object)}
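
The logits and probabilities in this dictionary are related by the softmax function, and class_ids is the index of the largest probability. The sketch below reproduces that relationship in plain Python for the first prediction; the logits are copied from the output above, and `softmax` is written out by hand here instead of calling into TensorFlow.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    # Subtracting the maximum first keeps exp() numerically stable.
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

logits = [-3.6043298, 2.7369003, 2.5462687]  # copied from note_predictions[0]
probabilities = softmax(logits)
predicted_class = probabilities.index(max(probabilities))

print([round(p, 4) for p in probabilities])
print(predicted_class)
```

Note how close the probabilities for classes 1 and 2 are in this example: the model picks class 1, but without much confidence.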

In [51]:
final_preds  = []
for pred in note_predictions:
    final_preds.append(pred['class_ids'][0])

**Now create a classification report and a Confusion Matrix. Does anything stand out to you?**

In [23]:
from sklearn.metrics import classification_report,confusion_matrix

In [24]:
print(confusion_matrix(y_test,final_preds))
# Below is the result for batch_size=10,hidden_units=[10, 20, 10]

[[10  0  0]
 [ 0 10  6]
 [ 0  0 19]]


In [26]:
print(classification_report(y_test,final_preds))
# Try different batch sizes and hidden-unit configurations during training, even extreme ones, to see whether the
# model overfits and how the final result changes.
# Below is the result for batch_size=10, hidden_units=[10, 20, 10]

              precision    recall  f1-score   support

           0       1.00      1.00      1.00        10
           1       1.00      0.62      0.77        16
           2       0.76      1.00      0.86        19

   micro avg       0.87      0.87      0.87        45
   macro avg       0.92      0.88      0.88        45
weighted avg       0.90      0.87      0.86        45
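
The numbers in the classification report can be derived directly from the confusion matrix. The sketch below does this by hand for the matrix printed above, assuming the scikit-learn convention that rows are true classes and columns are predicted classes; `per_class_metrics` is a hypothetical helper written for illustration.

```python
# Confusion matrix from the run above: rows are true classes,
# columns are predicted classes (the scikit-learn convention).
cm = [
    [10, 0, 0],
    [0, 10, 6],
    [0, 0, 19],
]

def per_class_metrics(cm):
    """Return a list of (precision, recall) tuples, one per class."""
    n = len(cm)
    metrics = []
    for k in range(n):
        true_pos = cm[k][k]
        predicted_k = sum(cm[row][k] for row in range(n))  # column total
        actual_k = sum(cm[k])                              # row total
        precision = true_pos / predicted_k if predicted_k else 0.0
        recall = true_pos / actual_k if actual_k else 0.0
        metrics.append((precision, recall))
    return metrics

for label, (precision, recall) in enumerate(per_class_metrics(cm)):
    print(f"class {label}: precision={precision:.2f} recall={recall:.2f}")
```

What stands out is that every mistake is a class 1 flower predicted as class 2: six of the sixteen class 1 rows end up in the wrong column, which lowers the recall of class 1 and the precision of class 2.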



In [34]:
# Same batch size, more hidden layers
print(confusion_matrix(y_test,final_preds))
print("\n")
print(classification_report(y_test,final_preds))
# Below is the result for batch_size=10, hidden_units=[10, 20, 10, 15, 20, 25, 30]

# Conclusion: higher accuracy than batch_size=10 with hidden_units=[10, 20, 10]. The most accurate of all the runs.

[[10  0  0]
 [ 0 14  2]
 [ 0  0 19]]


              precision    recall  f1-score   support

           0       1.00      1.00      1.00        10
           1       1.00      0.88      0.93        16
           2       0.90      1.00      0.95        19

   micro avg       0.96      0.96      0.96        45
   macro avg       0.97      0.96      0.96        45
weighted avg       0.96      0.96      0.96        45



In [43]:
# Batch size doubled to 20, hidden layers kept as [10, 20, 10]
print(confusion_matrix(y_test,final_preds))
print("\n")
print(classification_report(y_test,final_preds))
# Below is the result for batch_size=20, hidden_units=[10, 20, 10]

# Conclusion: decrease in accuracy.

[[10  0  0]
 [ 0 16  0]
 [ 0 18  1]]


              precision    recall  f1-score   support

           0       1.00      1.00      1.00        10
           1       0.47      1.00      0.64        16
           2       1.00      0.05      0.10        19

   micro avg       0.60      0.60      0.60        45
   macro avg       0.82      0.68      0.58        45
weighted avg       0.81      0.60      0.49        45



In [52]:
# Batch size doubled to 20, hidden layers set to [10, 20, 10, 15, 20, 25, 30]
print(confusion_matrix(y_test,final_preds))
print("\n")
print(classification_report(y_test,final_preds))
# Below is the result for batch_size=20, hidden_units=[10, 20, 10, 15, 20, 25, 30]

# Conclusion: poor accuracy.

[[10  0  0]
 [ 0 16  0]
 [ 0 19  0]]


              precision    recall  f1-score   support

           0       1.00      1.00      1.00        10
           1       0.46      1.00      0.63        16
           2       0.00      0.00      0.00        19

   micro avg       0.58      0.58      0.58        45
   macro avg       0.49      0.67      0.54        45
weighted avg       0.38      0.58      0.45        45
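
To put the four runs side by side, the sketch below computes the overall accuracy from each confusion matrix reported above. The matrices are copied from the outputs; the dictionary keys are just descriptive labels chosen for this example.

```python
runs = {
    "batch_size=10, hidden_units=[10, 20, 10]":
        [[10, 0, 0], [0, 10, 6], [0, 0, 19]],
    "batch_size=10, hidden_units=[10, 20, 10, 15, 20, 25, 30]":
        [[10, 0, 0], [0, 14, 2], [0, 0, 19]],
    "batch_size=20, hidden_units=[10, 20, 10]":
        [[10, 0, 0], [0, 16, 0], [0, 18, 1]],
    "batch_size=20, hidden_units=[10, 20, 10, 15, 20, 25, 30]":
        [[10, 0, 0], [0, 16, 0], [0, 19, 0]],
}

def accuracy(cm):
    """Fraction of test rows on the diagonal of the confusion matrix."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total

scores = {name: accuracy(cm) for name, cm in runs.items()}
best = max(scores, key=scores.get)

for name, score in scores.items():
    print(f"{score:.2f}  {name}")
```

These comparisons are best read with some caution. Each configuration was trained only once, the config log shows '_tf_random_seed': None so the initial weights differ between runs, and the training log shows the final checkpoint at step 27 for batch_size=20, below the 50 steps requested. Part of the accuracy drop may therefore come from shorter training and not from the batch size itself.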

