## The general outline of many TensorFlow programs:

    1) Import and parse the data sets.
    2) Create feature columns to describe the data.
    3) Select the type of model.
    4) Train the model.
    5) Evaluate the model's effectiveness.
    6) Let the trained model make predictions.


### 1)  Import and parsing

Keras is an open-source machine learning library; tf.keras is TensorFlow's implementation of Keras. The premade_estimator.py program uses only one tf.keras function: the tf.keras.utils.get_file convenience function, which copies a remote CSV file to the local file system.

In [1]:
import tensorflow as tf
import pandas as pd



#### Getting a file from url and storing in cache

**tf.keras.utils.get_file**

**get_file**(fname, origin, untar=False, md5_hash=None, file_hash=None, cache_subdir='datasets', hash_algorithm='auto', extract=False, archive_format='auto', cache_dir=None) 

___
    Downloads a file from a URL if it is not already in the cache.
    
    By default the file at the url `origin` is downloaded to the
    cache_dir `~/.keras`, placed in the cache_subdir `datasets`,
    and given the filename `fname`. The final location of a file
    `example.txt` would therefore be `~/.keras/datasets/example.txt`.
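
To make the default location concrete, here is a minimal sketch that composes the path from the three parts described above. It only mirrors the documented defaults (`~/.keras`, `datasets`, `fname`); it does not call or reproduce `get_file` itself, and the helper name `default_cache_path` is hypothetical.

```python
import os

def default_cache_path(fname, cache_dir=None, cache_subdir="datasets"):
    """Hypothetical helper: compose the documented default cache location."""
    if cache_dir is None:
        # Documented default cache directory: ~/.keras
        cache_dir = os.path.join(os.path.expanduser("~"), ".keras")
    return os.path.join(cache_dir, cache_subdir, fname)

url = "http://download.tensorflow.org/data/iris_training.csv"
fname = url.split("/")[-1]  # last URL segment, i.e. "iris_training.csv"
print(default_cache_path(fname))
```

The printed path depends on the home directory of the user running the code.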

In [2]:
TRAIN_URL = "http://download.tensorflow.org/data/iris_training.csv"
# Create a local copy of the training set.
train_path = tf.keras.utils.get_file(fname=TRAIN_URL.split('/')[-1],
                                         origin=TRAIN_URL)
print("Train path :", train_path)

CSV_COLUMN_NAMES = ['SepalLength', 'SepalWidth',
                    'PetalLength', 'PetalWidth', 'Species']
# Parse the local CSV file.
train = pd.read_csv(filepath_or_buffer=train_path,
                    names=CSV_COLUMN_NAMES,  # list of column names
                    header=0  # ignore the first row of the CSV file.
                    )
train[:5]

Train path : /home/sankaran/.keras/datasets/iris_training.csv


Unnamed: 0,SepalLength,SepalWidth,PetalLength,PetalWidth,Species
0,6.4,2.8,5.6,2.2,2
1,5.0,2.3,3.3,1.0,1
2,4.9,2.5,4.5,1.7,2
3,4.9,3.1,1.5,0.1,0
4,5.7,3.8,1.7,0.3,0


### 2) Describe the data

**a) Separate the independent and dependent variables**

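Python evaluates the right-hand side first: `train.pop('Species')` removes the label column from `train` in place and returns it, so `train_features` (the same DataFrame object as `train`) ends up holding only the features.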

In [3]:
    # 1. Assign the DataFrame's labels (the right-most column) to train_label.
    # 2. Delete (pop) the labels from the DataFrame.
    # 3. Assign the remainder of the DataFrame to train_features
    train_features, train_label = train, train.pop('Species')
    train_features[:5]

Unnamed: 0,SepalLength,SepalWidth,PetalLength,PetalWidth
0,6.4,2.8,5.6,2.2
1,5.0,2.3,3.3,1.0
2,4.9,2.5,4.5,1.7
3,4.9,3.1,1.5,0.1
4,5.7,3.8,1.7,0.3
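
The effect of combining a reference and an in-place `pop` in one assignment can be seen without pandas. In this small sketch a plain `dict` stands in for the DataFrame (an assumption for illustration only); `dict.pop` likewise removes the entry in place and returns it.

```python
# A plain dict stands in for the DataFrame (illustration only).
table = {"SepalLength": [6.4, 5.0], "Species": [2, 1]}

# The right-hand side is evaluated first, left to right:
#   1. `table` is a reference to the object, not a copy.
#   2. `table.pop("Species")` removes the key in place and returns its value.
features, label = table, table.pop("Species")

print(features is table)  # True: both names refer to the same object
print(sorted(features))   # ['SepalLength']
print(label)              # [2, 1]
```

Because no copy is made, `train` and `train_features` in the cell above are two names for one object; use `train.copy()` first if the original needs to stay intact.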


**b) Create feature columns** 

When you build an Estimator model, you pass it a list of feature columns that describes each of the features you want the model to use. The tf.feature_column module provides many options for representing data to the model.

For Iris, the four raw features are numeric values, so we'll build a list of feature columns to tell the Estimator model to represent each of the four features as 32-bit floating-point values.

In [4]:
# Create feature columns for all features.
my_feature_columns = []
for key in train_features.keys():
    my_feature_columns.append(tf.feature_column.numeric_column(key=key))
my_feature_columns

[_NumericColumn(key='SepalLength', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None),
 _NumericColumn(key='SepalWidth', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None),
 _NumericColumn(key='PetalLength', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None),
 _NumericColumn(key='PetalWidth', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None)]

### 3) Select model type

To specify a model type, instantiate an **Estimator** class. TensorFlow provides two categories of Estimators:

    1) pre-made Estimators, which someone else has already written for you.
    2) custom Estimators, which you must code yourself, at least partially.
    
TensorFlow provides several pre-made classifier Estimators, including:

    1) tf.estimator.DNNClassifier for deep models that perform multi-class classification.
    2) tf.estimator.DNNLinearCombinedClassifier for wide & deep models.
    3) tf.estimator.LinearClassifier for classifiers based on linear models.

    
Instantiating a tf.estimator.DNNClassifier **creates a framework for learning the model**. It handles the details of initialization, logging, saving and restoring, and many other features so you can concentrate on your model. 


**Using pre-made estimator - DNNClassifier**

\_\_init\_\_(

    hidden_units,
    feature_columns,
    model_dir=None,
    n_classes=2,
    weight_column=None,
    label_vocabulary=None,
    optimizer='Adagrad',
    activation_fn=tf.nn.relu,
    dropout=None,
    input_layer_partitioner=None,
    config=None,
    warm_start_from=None,
    loss_reduction=losses.Reduction.SUM
)

> **hidden_units**: Iterable of the number of hidden units per layer. All layers are fully connected. For example, [64, 32] means the first layer has 64 nodes and the second has 32.

> **feature_columns** : An iterable containing all the feature columns used by the model. All items in the set should be instances of classes derived from _FeatureColumn.

> **optimizer**: An instance of tf.Optimizer used to train the model. **Defaults to Adagrad** optimizer.

> **activation_fn**: Activation function applied to each layer. If None, will use **tf.nn.relu**.

> **dropout**: When not None, the probability we will drop out a given coordinate.

In [18]:
    # Build 2 hidden layer DNN with 10, 10 units respectively.
    classifier = tf.estimator.DNNClassifier(
        feature_columns=my_feature_columns,
        # Two hidden layers of 10 nodes each.
        hidden_units=[10, 10],
        # The model must choose between 3 classes.
        n_classes=3)
    classifier.model_dir

INFO:tensorflow:Using default config.
INFO:tensorflow:Using config: {'_master': '', '_tf_random_seed': None, '_num_ps_replicas': 0, '_log_step_count_steps': 100, '_save_checkpoints_secs': 600, '_model_dir': '/tmp/tmpg61wvmwu', '_task_type': 'worker', '_session_config': None, '_is_chief': True, '_save_summary_steps': 100, '_service': None, '_global_id_in_cluster': 0, '_keep_checkpoint_max': 5, '_task_id': 0, '_keep_checkpoint_every_n_hours': 10000, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7fc2640963c8>, '_num_worker_replicas': 1, '_save_checkpoints_steps': None, '_evaluation_master': ''}


'/tmp/tmpg61wvmwu'
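
To get a feel for the size of the network configured above, the following sketch counts the trainable weights and biases of a stack of fully connected layers. It assumes each layer has one weight matrix and one bias vector and that the output layer produces one logit per class; the actual DNNClassifier may keep additional variables (for example optimizer state), so treat the number as an estimate. The function name is hypothetical.

```python
def dense_parameter_count(n_inputs, hidden_units, n_classes):
    """Count weights and biases of a stack of fully connected layers."""
    sizes = [n_inputs] + list(hidden_units) + [n_classes]
    total = 0
    for fan_in, fan_out in zip(sizes, sizes[1:]):
        total += fan_in * fan_out + fan_out  # weight matrix plus bias vector
    return total

# 4 features -> 10 -> 10 -> 3 classes: 50 + 110 + 33
print(dense_parameter_count(4, [10, 10], 3))  # 193
```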

### 4) Train the model

Basically, we've wired a network but haven't yet let data flow through it. To train the neural network, call the Estimator object's train method.

**a) create an input function**

Estimator models receive their data through an **input function**: a function that returns a **tf.data.Dataset** object which yields the following two-element tuple:

    1) features - A Python dictionary in which:
        Each key is the name of a feature.
        Each value is an array containing all of that feature's values.
    2) label - An array containing the values of the label for every example.

Just to demonstrate the format of the input function, here's a simple implementation:

```python
import numpy as np

def input_evaluation_set():
    features = {'SepalLength': np.array([6.4, 5.0]),
                'SepalWidth':  np.array([2.8, 2.3]),
                'PetalLength': np.array([5.6, 3.3]),
                'PetalWidth':  np.array([2.2, 1.0])}
    labels = np.array([2, 1])
    return features, labels
```

Your input function may generate the features dictionary and label list any way you like. However, we recommend using TensorFlow's Dataset API, which can parse and manage all sorts of data.

In [19]:
# Convert the inputs to a Dataset.
dataset = tf.data.Dataset.from_tensor_slices((dict(train_features), train_label))
dataset = dataset.shuffle(1000).repeat().batch(10)
dataset

<BatchDataset shapes: ({PetalWidth: (?,), SepalWidth: (?,), SepalLength: (?,), PetalLength: (?,)}, (?,)), types: ({PetalWidth: tf.float64, SepalWidth: tf.float64, SepalLength: tf.float64, PetalLength: tf.float64}, tf.int64)>
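
What the `shuffle(...).repeat().batch(...)` chain does to the stream of examples can be approximated in plain Python. This is a conceptual sketch, not TensorFlow's implementation: it assumes the shuffle buffer is at least as large as the data set (so every pass is fully shuffled), and the function name is hypothetical.

```python
import itertools
import random

def shuffle_repeat_batch(examples, batch_size, seed=0):
    """Yield batches forever, reshuffling the examples on every pass."""
    rng = random.Random(seed)

    def endless_stream():
        while True:             # repeat(): never run out of examples
            epoch = list(examples)
            rng.shuffle(epoch)  # shuffle(): a new order for each pass
            yield from epoch

    stream = endless_stream()
    while True:                 # batch(): group consecutive examples
        yield list(itertools.islice(stream, batch_size))

batches = shuffle_repeat_batch(range(5), batch_size=2)
first_three = [next(batches) for _ in range(3)]
print(first_three)
```

With five examples and a batch size of two, the third batch mixes the last example of the first pass with the first example of the second pass, which is why repeating before batching never produces a short batch.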

**b) Call classifier.train**

Here we wrap up our input_fn call in a lambda to capture the arguments while providing an input function that takes no arguments, as expected by the Estimator. The steps argument tells the method to stop training after a number of training steps.

In [20]:
import code.iris_data as iris_data
# Train the Model.
classifier.train(
    input_fn=lambda:iris_data.train_input_fn(train_features, train_label, 10),
    steps=1000)


INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Saving checkpoints for 1 into /tmp/tmpg61wvmwu/model.ckpt.
INFO:tensorflow:step = 0, loss = 17.946484
INFO:tensorflow:global_step/sec: 833.627
INFO:tensorflow:step = 100, loss = 2.6058917 (0.120 sec)
INFO:tensorflow:global_step/sec: 1064.7
INFO:tensorflow:step = 200, loss = 0.78486764 (0.094 sec)
INFO:tensorflow:global_step/sec: 1085.52
INFO:tensorflow:step = 300, loss = 0.78320974 (0.092 sec)
INFO:tensorflow:global_step/sec: 1055.94
INFO:tensorflow:step = 400, loss = 1.0279917 (0.095 sec)
INFO:tensorflow:global_step/sec: 1044.49
INFO:tensorflow:step = 500, loss = 2.2164102 (0.095 sec)
INFO:tensorflow:global_step/sec: 1044.84
INFO:tensorflow:step = 600, loss = 0.48744252 (0.096 sec)
INFO:tensorflow:global_step/sec: 988.635
INFO:tensorflo

<tensorflow.python.estimator.canned.dnn.DNNClassifier at 0x7fc2640c0668>
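
The lambda in the `classifier.train` call exists only to turn a three-argument function into a callable that takes no arguments. The sketch below shows the pattern in isolation; `train_input_fn` here is a stand-in with a hypothetical body (it is not the code in `iris_data`), and `call_like_estimator` merely mimics a caller that invokes `input_fn()` with no arguments. `functools.partial` is shown as an equivalent alternative.

```python
from functools import partial

def train_input_fn(features, labels, batch_size):
    """Stand-in for a real input function (hypothetical body)."""
    return {"features": features, "labels": labels, "batch_size": batch_size}

def call_like_estimator(input_fn):
    """Mimic a caller that invokes input_fn with no arguments."""
    return input_fn()

features = {"SepalLength": [6.4, 5.0]}
labels = [2, 1]

via_lambda = call_like_estimator(lambda: train_input_fn(features, labels, 10))
via_partial = call_like_estimator(partial(train_input_fn, features, labels, 10))
print(via_lambda == via_partial)  # True: both wrappers pass the same arguments
```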

### 5) Evaluate

Now that the model has been trained, we can get some statistics on its performance. The following code block evaluates the accuracy of the trained model on the test data.

In [21]:
TEST_URL = "http://download.tensorflow.org/data/iris_test.csv"
test_path = tf.keras.utils.get_file(TEST_URL.split('/')[-1], TEST_URL)
test = pd.read_csv(test_path, names=CSV_COLUMN_NAMES, header=0)
test_x, test_y = test, test.pop('Species')

# Evaluate the model.
eval_result = classifier.evaluate(
    input_fn=lambda:iris_data.eval_input_fn(test_x, test_y,1))


INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Starting evaluation at 2018-03-05-08:20:17
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from /tmp/tmpg61wvmwu/model.ckpt-1000
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Finished evaluation at 2018-03-05-08:20:17
INFO:tensorflow:Saving dict for global step 1000: accuracy = 0.93333334, average_loss = 0.06811817, global_step = 1000, loss = 0.06811817


In [22]:
eval_result

{'accuracy': 0.93333334,
 'average_loss': 0.06811817,
 'global_step': 1000,
 'loss': 0.06811817}
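
The `accuracy` value is the fraction of test examples whose predicted class matches the label. The sketch below computes that fraction on made-up labels. The set size of 30 is an assumption (this section does not state how many test examples there are); under that assumption, an accuracy of about 0.9333 would correspond to 28 correct predictions.

```python
def accuracy(predicted, actual):
    """Fraction of positions where the predicted class equals the label."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)

# Hypothetical labels: 30 examples, 2 of them misclassified.
actual = [0, 1, 2] * 10
predicted = list(actual)
predicted[0] = 1  # true class is 0
predicted[1] = 2  # true class is 1
print(round(accuracy(predicted, actual), 4))  # 0.9333, i.e. 28 / 30
```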

### 6) Making predictions (inferring) from the trained model

The predict method returns a Python iterable, yielding a dictionary of prediction results for each example.

In [26]:
# Generate predictions from the model
expected = ['Setosa', 'Versicolor', 'Virginica']
predict_x = {
    'SepalLength': [5.1, 5.9, 6.9],
    'SepalWidth': [3.3, 3.0, 3.1],
    'PetalLength': [1.7, 4.2, 5.4],
    'PetalWidth': [0.5, 1.5, 2.1],
}

predictions = classifier.predict(
    input_fn=lambda:iris_data.eval_input_fn(predict_x,None,
                                            batch_size=1))
predictions

<generator object predict at 0x7fc250c153f0>

In [24]:
for p in predictions:
    print(p.keys())

INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from /tmp/tmpg61wvmwu/model.ckpt-1000
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
dict_keys(['logits', 'class_ids', 'probabilities', 'classes'])
dict_keys(['logits', 'class_ids', 'probabilities', 'classes'])
dict_keys(['logits', 'class_ids', 'probabilities', 'classes'])
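
The keys printed above are related: for a multi-class classifier, `probabilities` is typically the softmax of `logits`, and `class_ids` holds the index of the largest probability. The sketch below reproduces that relationship in plain Python. The logit values are hypothetical (not taken from the trained model), and the class order in `SPECIES` is assumed to match the `expected` list used in this section.

```python
import math

SPECIES = ["Setosa", "Versicolor", "Virginica"]  # assumed class order

def softmax(logits):
    """Convert raw scores to probabilities that sum to 1."""
    largest = max(logits)
    exps = [math.exp(x - largest) for x in logits]  # subtract max for stability
    total = sum(exps)
    return [e / total for e in exps]

logits = [-3.0, 1.0, 4.0]  # hypothetical values, not model output
probabilities = softmax(logits)

# Index of the most probable class, analogous to class_ids[0].
class_id = max(range(len(probabilities)), key=probabilities.__getitem__)
print(SPECIES[class_id], round(100 * probabilities[class_id], 1))
```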


In [27]:
for pred_dict, expec in zip(predictions, expected):
    template = ('\nPrediction is "{}" ({:.1f}%), expected "{}"')

    class_id = pred_dict['class_ids'][0]
    probability = pred_dict['probabilities'][class_id]

    print(template.format(iris_data.SPECIES[class_id],
                          100 * probability, expec))


INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from /tmp/tmpg61wvmwu/model.ckpt-1000
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.

Prediction is "Setosa" (99.8%), expected "Setosa"

Prediction is "Versicolor" (99.5%), expected "Versicolor"

Prediction is "Virginica" (87.8%), expected "Virginica"
