# Premade Estimators

**requirements**

    tensorflow==1.7.0

Install the right version of TensorFlow with:

    $ pip install --upgrade tensorflow==1.7.0

In this notebook, we use two of TensorFlow's high-level APIs:

  * [Estimator](https://www.tensorflow.org/programmers_guide/estimators), which represents a complete model. The Estimator API provides methods to train the model, to judge the model's accuracy, and to generate predictions.
    
  * [Dataset](https://www.tensorflow.org/get_started/datasets_quickstart), which builds a data input pipeline. The Dataset API has methods to load and manipulate data, and feed it into your model. The Dataset API meshes well with the Estimator API.

To write a TensorFlow program based on pre-made Estimators, you must perform the following tasks:

  * Create one or more input functions.
  * Define the model's feature columns.
  * Instantiate an Estimator, specifying the feature columns and various hyperparameters.
  * Call one or more methods on the Estimator object, passing the appropriate input function as the source of the data.

## Preparation

In [1]:
import tensorflow as tf
import numpy as np
import pandas as pd

# Build small example datasets.
def input_evaluation_set():
    features = pd.DataFrame({'SepalLength': np.array([6.4, 5.0, 3.2, 6.5]),
                'SepalWidth':  np.array([2.8, 2.3, 5.3, 6.4]),
                'PetalLength': np.array([5.6, 3.3, 6.4, 3.3]),
                'PetalWidth':  np.array([2.2, 1.0, 6.4, 1.3])})
    labels = np.array([2, 1, 0, 2])
    return features, labels

def input_train_set():
    features = pd.DataFrame({'SepalLength': np.array([3.2, 5.3]),
                'SepalWidth':  np.array([3.8, 4.3]),
                'PetalLength': np.array([5.6, 3.3]),
                'PetalWidth':  np.array([2.2, 1.0])})
    labels = np.array([1, 2])
    return features, labels

train_X, train_y = input_train_set()
test_X, test_y = input_evaluation_set()

## Create input functions

An input function is a function that returns a `tf.data.Dataset` object which outputs the following two-element tuple:

  * features - A Python dict in which:
    * Each key is the name of a feature.
    * Each value is an array containing all of that feature's values.
  
  * label - An array containing the values of the label for every example.

A `tf.data.Dataset` represents a sequence of elements; the Estimator iterates over it to pull batches of `(features, label)` pairs.
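
To make the shape of that tuple concrete, here is a plain-Python imitation of how a column-oriented features dict and a label array can be sliced into per-example `(features, label)` pairs. It does not use TensorFlow; `slice_examples` is a hypothetical helper written only for illustration, and the values are the first two rows of the evaluation set defined above.

```python
def slice_examples(features, labels):
    """Yield one (feature_dict, label) pair per example.

    Hypothetical helper: a plain-Python imitation of slicing a
    (dict-of-columns, labels) tuple along its first dimension.
    """
    n = len(labels)
    # Every feature column must hold exactly one value per label.
    for name, column in features.items():
        if len(column) != n:
            raise ValueError(
                "column %r has %d values, expected %d" % (name, len(column), n))
    for i in range(n):
        yield {name: column[i] for name, column in features.items()}, labels[i]


# Each key is a feature name; each value lists that feature for all examples.
features = {
    'SepalLength': [6.4, 5.0],
    'SepalWidth':  [2.8, 2.3],
    'PetalLength': [5.6, 3.3],
    'PetalWidth':  [2.2, 1.0],
}
labels = [2, 1]

examples = list(slice_examples(features, labels))
print(examples[0])
```

Each yielded element pairs one value from every feature column with the label at the same position, which is why all columns and the label array must have the same length.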

In [2]:
# example of input function
def my_input_fn(features, labels, batch_size):
    """An input function for training"""
    # Convert the inputs to a Dataset.
    dataset = tf.data.Dataset.from_tensor_slices((dict(features), labels))

    # Shuffle, repeat, and batch the examples.
    return dataset.shuffle(1000).repeat().batch(batch_size)
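
The effect of chaining shuffle, repeat, and batch can be sketched without TensorFlow. The generator below is only an approximation under stated assumptions: the shuffle buffer is at least as large as the dataset (true for `shuffle(1000)` on these tiny sets), and the data is reshuffled on every pass. `shuffle_repeat_batch` and its `seed` parameter are hypothetical names, not part of the Dataset API.

```python
import itertools
import random


def shuffle_repeat_batch(examples, batch_size, seed=None):
    """Yield batches forever: reshuffle on each pass, then group into batches."""
    rng = random.Random(seed)

    def repeated():
        while True:                  # repeat(): the stream never ends
            epoch = list(examples)
            rng.shuffle(epoch)       # shuffle(): a new order on every pass
            for example in epoch:
                yield example

    stream = repeated()
    while True:
        # batch(): take the next batch_size examples from the endless stream
        yield list(itertools.islice(stream, batch_size))


# Two training examples, batch_size=1, and 10 steps: five passes over the data.
batches = shuffle_repeat_batch(['a', 'b'], batch_size=1, seed=0)
first_ten = list(itertools.islice(batches, 10))
print(len(first_ten))
```

Because the stream never ends, whatever consumes it has to decide when to stop, for example by limiting the number of steps.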

## Define the feature columns

A [feature column](https://developers.google.com/machine-learning/glossary/#feature_columns) is an object describing how the model should use raw input data from the features dictionary.  The [tf.feature_column](https://www.tensorflow.org/api_docs/python/tf/feature_column) module provides many options for representing data to the model.

In [3]:
# Feature columns describe how to use the input.
my_feature_columns = []
for key in train_X.keys():
    my_feature_columns.append(tf.feature_column.numeric_column(key=key))

## Instantiate an estimator

TensorFlow provides several pre-made classifier Estimators, including:

  * `tf.estimator.DNNClassifier` for deep models that perform multi-class classification.
  * `tf.estimator.DNNLinearCombinedClassifier` for wide & deep models.
  * `tf.estimator.LinearClassifier` for classifiers based on linear models.
  
Since the labels here take three values (0, 1, and 2), `tf.estimator.DNNClassifier` is a reasonable choice. Here's how we instantiate this Estimator:

In [4]:
# Build a DNN with 2 hidden layers and 10 nodes in each hidden layer.
classifier = tf.estimator.DNNClassifier(
    feature_columns=my_feature_columns,
    # Two hidden layers of 10 nodes each.
    hidden_units=[10, 10],
    # The model must choose between 3 classes.
    n_classes=3)



## Train, Evaluate, and Predict

In [5]:
# Train the model.
classifier.train(
    input_fn=lambda: my_input_fn(train_X, train_y, batch_size=1),
    steps=10  # Stop after 10 training steps; needed because my_input_fn repeats the data forever.
)

<tensorflow.python.estimator.canned.dnn.DNNClassifier at 0x117e98080>

In [None]:
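# Evaluate the model.
# my_input_fn repeats the data indefinitely, so `steps` bounds the evaluation:
# with batch_size=1, len(test_y) steps cover each test example once.
eval_y = classifier.evaluate(
    input_fn=lambda: my_input_fn(test_X, test_y, batch_size=1),
    steps=len(test_y)
)
print('\nTest set accuracy: {accuracy:0.3f}\n'.format(**eval_y))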

In [6]:
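# Predict. `predict` returns a generator; because my_input_fn shuffles and
# repeats, predictions arrive in random order and the generator never runs out.
pred_y = classifier.predict(
    input_fn=lambda: my_input_fn(test_X, test_y, batch_size=1)
)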

In [10]:
print(next(pred_y))

{'logits': array([-0.97360694, -1.5043912 ,  1.5658324 ], dtype=float32), 'probabilities': array([0.07012276, 0.04124224, 0.88863504], dtype=float32), 'class_ids': array([2]), 'classes': array([b'2'], dtype=object)}
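
The fields of this prediction are related: for a multi-class classifier, `probabilities` is the softmax of `logits`, and `class_ids` is the index of the largest probability. The following sketch recomputes both from the logits printed above using only the standard library; the `softmax` helper is written here for illustration and is not the Estimator's own code.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Logits copied from the prediction printed above.
logits = [-0.97360694, -1.5043912, 1.5658324]

probabilities = softmax(logits)
# The predicted class is the index with the highest probability.
class_id = max(range(len(probabilities)), key=probabilities.__getitem__)

print([round(p, 4) for p in probabilities], class_id)
```

Up to rounding, the recomputed values should agree with the `probabilities` and `class_ids` entries in the printed dictionary.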
