## part 1 - iris data tutorial

https://www.tensorflow.org/get_started/premade_estimators

In [9]:
import tensorflow as tf 
import numpy as np
import pandas as pd

ARCHITECTURE: 

```
High level TF APIs (Estimators)
^
Mid level TF APIs (Layers, Datasets, and Metrics)
^
Low level TF APIs (Python, Go, etc.)
^
TF Kernel  = TF Distributed Execution engine 
```

#### "What is a tensor?"

a tensor is an n-dimensional array, most often holding ints or floats. tensors are the only data passed between the nodes of the TF computation graph, hence the name TensorFlow.
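To make "n-dimensional array" concrete, here is a small sketch that uses NumPy arrays as stand-ins for tensors (an assumption for illustration; they are not `tf.Tensor` objects). The feature values are the first two rows of the training data shown further down.

```python
import numpy as np

# Tensors of increasing rank, represented here as NumPy arrays.
scalar = np.array(5.1)                      # rank 0: a single number
vector = np.array([6.4, 2.8, 5.6, 2.2])     # rank 1: one flower's 4 features
matrix = np.array([[6.4, 2.8, 5.6, 2.2],
                   [5.0, 2.3, 3.3, 1.0]])   # rank 2: a batch of 2 flowers

for name, t in [("scalar", scalar), ("vector", vector), ("matrix", matrix)]:
    print(name, "rank:", t.ndim, "shape:", t.shape, "dtype:", t.dtype)
```

The rank is the number of dimensions, and the shape gives the size along each one.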

You can write a TensorFlow program using the *Estimators API* (which trains and evaluates a model) and the *Datasets API* (which helps you build a data input pipeline and feed it to the estimator).

### Iris example

task = build a model to classify iris flowers into 3 species based on the size of their "sepals" (the outer parts that enclose the bud) and petals

how = the input data has 4 "features" per flower: the length/width of both petals and sepals.

there are 3 species, encoded as `0` (Setosa), `1` (Versicolor), and `2` (Virginica).


### classifier algorithm 

deep neural network with: 
- 2 hidden layers
- 10 nodes per hidden layer
- a flower's four features go into the NN...
- each of the 4 features feeds into all 10 nodes of the first hidden layer
- every node in the first hidden layer then feeds every node in the second hidden layer
- the second hidden layer feeds an output layer with 3 nodes, one per species, which yields a probability that the flower is type `0`, `1`, or `2`; the three probabilities add up to 1
- the species with the highest probability is the model's prediction
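The layer structure described above can be sketched as a single forward pass in NumPy. This is only an illustration under assumptions: the weights are random (untrained), ReLU is assumed as the hidden activation, and the helper names `relu`, `softmax`, and `forward` are made up for this sketch. It is not how `DNNClassifier` is implemented internally.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable

def relu(x):
    # Hidden-layer activation (assumed for this sketch).
    return np.maximum(x, 0.0)

def softmax(z):
    # Turn 3 raw scores into 3 probabilities that sum to 1.
    e = np.exp(z - z.max())
    return e / e.sum()

# Random, untrained weights: 4 features -> 10 nodes -> 10 nodes -> 3 classes.
W1, b1 = rng.normal(size=(4, 10)), np.zeros(10)
W2, b2 = rng.normal(size=(10, 10)), np.zeros(10)
W3, b3 = rng.normal(size=(10, 3)), np.zeros(3)

def forward(features):
    h1 = relu(features @ W1 + b1)   # every feature feeds all 10 nodes
    h2 = relu(h1 @ W2 + b2)         # every node feeds all 10 nodes
    return softmax(h2 @ W3 + b3)    # one probability per species

probs = forward(np.array([6.4, 2.8, 5.6, 2.2]))
print(probs, probs.sum())
```

Because the weights are random, the probabilities are meaningless here; the point is only the shapes and the fact that the three outputs sum to 1.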

### using a premade estimator 

you need to...
1. create 1+ "input functions"
2. define the model's feature columns 
3. instantiate the estimator and pass in the feature columns 
4. train, evaluate, predict!

### 0. Load training data from csv 

In [19]:
TRAIN_URL = "http://download.tensorflow.org/data/iris_training.csv"
CSV_COLUMN_NAMES = ['SepalLength', 'SepalWidth',
                    'PetalLength', 'PetalWidth', 'Species']
SPECIES = ['Setosa', 'Versicolor', 'Virginica']

y_name='Species'
train_path = tf.keras.utils.get_file(TRAIN_URL.split('/')[-1], TRAIN_URL)
train = pd.read_csv(train_path, names=CSV_COLUMN_NAMES, header=0)
train_x, train_y = train, train.pop(y_name)

In [20]:
train_x[:5]

|   | SepalLength | SepalWidth | PetalLength | PetalWidth |
|---|---|---|---|---|
| 0 | 6.4 | 2.8 | 5.6 | 2.2 |
| 1 | 5.0 | 2.3 | 3.3 | 1.0 |
| 2 | 4.9 | 2.5 | 4.5 | 1.7 |
| 3 | 4.9 | 3.1 | 1.5 | 0.1 |
| 4 | 5.7 | 3.8 | 1.7 | 0.3 |


In [22]:
train_y[:5]  # each entry in y is the species id for the matching row of x (that iris's 4 features)

0    2
1    1
2    2
3    0
4    0
Name: Species, dtype: int64

### 1. Input Functions 

Input functions = any functions that return a `tf.data.Dataset`! 
Each element of that `tf.data.Dataset` must be a pair: a dict of `features` (feature name -> values) and a `label`. For instance, if the label is "spam" or "not spam", features might include the sender name and subject line.
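The features-plus-label structure can be pictured with plain Python. The sketch below is a stand-in, not TensorFlow code: `slice_examples` is a hypothetical helper that mimics, in spirit, slicing column-oriented data into one `(feature_dict, label)` pair per example. The values are the first three rows of the training data shown earlier.

```python
# Column-oriented features, similar to dict(train_x): name -> list of values.
features = {
    "SepalLength": [6.4, 5.0, 4.9],
    "SepalWidth":  [2.8, 2.3, 2.5],
    "PetalLength": [5.6, 3.3, 4.5],
    "PetalWidth":  [2.2, 1.0, 1.7],
}
labels = [2, 1, 2]

def slice_examples(features, labels):
    """Yield one (feature_dict, label) pair per example (hypothetical helper)."""
    for i, label in enumerate(labels):
        yield {name: column[i] for name, column in features.items()}, label

examples = list(slice_examples(features, labels))
print(examples[0])
```

Each element pairs one flower's four named features with its species id.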

For the iris data, this function builds the `tf.data.Dataset`:

In [5]:
def train_input_fn(features, labels, batch_size):
    """An input function for training"""
    # Convert the inputs to a Dataset.
    dataset = tf.data.Dataset.from_tensor_slices((dict(features), labels))

    # Shuffle, repeat, and batch the examples.
    dataset = dataset.shuffle(1000).repeat().batch(batch_size)

    # Return the dataset.
    return dataset
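What shuffle, repeat, and batch do to the stream of examples can be imitated with plain Python generators. This is a rough sketch, not the `tf.data` implementation: `shuffle_repeat_batch` is a hypothetical helper, and it reshuffles the whole dataset on every pass, which approximates `shuffle(1000)` only under the assumption that the buffer is larger than the dataset.

```python
import itertools
import random

def shuffle_repeat_batch(examples, batch_size, seed=0):
    """Endlessly yield batches, reshuffling on each pass over the data."""
    rng = random.Random(seed)

    def stream():
        while True:               # repeat(): never run out of data
            epoch = list(examples)
            rng.shuffle(epoch)    # shuffle(): new order on each pass
            yield from epoch

    it = stream()
    while True:                   # batch(): group consecutive examples
        yield list(itertools.islice(it, batch_size))

batches = shuffle_repeat_batch(range(5), batch_size=3)
first_three = [next(batches) for _ in range(3)]
print(first_three)
```

Note that batching happens after repeating, so a batch can straddle the boundary between two passes over the data. Repeating forever is also why training needs an explicit `steps` limit.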

### 2. Feature Columns

a feature column describes one feature and tells the Estimator how to interpret that feature's raw values.

in the iris example, with 4 features, we'll have 4 feature columns, each telling the Estimator to treat its feature as a 32-bit floating-point value. 

In [23]:
my_feature_columns = []
for key in train_x.keys():
    my_feature_columns.append(tf.feature_column.numeric_column(key=key))

In [24]:
my_feature_columns

[_NumericColumn(key='SepalLength', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None),
 _NumericColumn(key='SepalWidth', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None),
 _NumericColumn(key='PetalLength', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None),
 _NumericColumn(key='PetalWidth', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None)]

### 3. Instantiate the Estimator

In [25]:
# instantiate a Deep Neural Network (DNN) estimator with 2 layers, 10 nodes each...

In [26]:
classifier = tf.estimator.DNNClassifier(
    feature_columns=my_feature_columns,
    # Two hidden layers of 10 nodes each.
    hidden_units=[10, 10],  # each element of hidden_units is one hidden layer; its value is that layer's node count
    # The model must choose between 3 classes.
    n_classes=3)

INFO:tensorflow:Using default config.
INFO:tensorflow:Using config: {'_model_dir': '/var/folders/hc/1t72w11d7tng3tbp88nqd6280000gn/T/tmpyir0blu3', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': None, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x11dc37518>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}


### 4. Train, Evaluate, Predict!

So the classifier (estimator object) has our feature columns - it knows what features to expect. And it knows how the model is structured. 

But we're still holding the raw training data -- so we pass X (features) and Y (labels) into our input function, and build the tf-readable Dataset. 

The model trains on the Dataset. 

In [29]:
classifier.train(
    input_fn=lambda: train_input_fn(train_x, train_y, batch_size=200),
    steps=1000)

INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from /var/folders/hc/1t72w11d7tng3tbp88nqd6280000gn/T/tmpyir0blu3/model.ckpt-1000
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Saving checkpoints for 1001 into /var/folders/hc/1t72w11d7tng3tbp88nqd6280000gn/T/tmpyir0blu3/model.ckpt.
INFO:tensorflow:loss = 11.569528, step = 1001
INFO:tensorflow:global_step/sec: 446.146
INFO:tensorflow:loss = 11.185089, step = 1101 (0.226 sec)
INFO:tensorflow:global_step/sec: 601.518
INFO:tensorflow:loss = 12.393882, step = 1201 (0.166 sec)
INFO:tensorflow:global_step/sec: 610.143
INFO:tensorflow:loss = 9.095687, step = 1301 (0.164 sec)
INFO:tensorflow:global_step/sec: 626.186
INFO:tensorflow:loss = 11.570629, step = 1401 (0.160 sec)
INFO:tensorflow:global_step/sec: 607.829
INFO:tensorflow:loss = 11.474016, step =

<tensorflow.python.estimator.canned.dnn.DNNClassifier at 0x11dc37550>
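A quick back-of-the-envelope check shows how much data 1000 steps covers. The batch size and step count come from the call above; the training-set size of 120 rows is an assumption about `iris_training.csv`, not something printed in this notebook.

```python
n_examples = 120   # assumption: number of rows in iris_training.csv
batch_size = 200   # from the train_input_fn call
steps = 1000       # from classifier.train

examples_seen = steps * batch_size
epochs = examples_seen / n_examples
print(f"{examples_seen} examples seen, about {epochs:.0f} passes over the data")
```

The log above starts at step 1001 and restores `model.ckpt-1000`, which suggests the training cell had already been run once; a second run of 1000 steps would explain the `model.ckpt-2000` checkpoint that evaluation loads.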

### Test the model on the test data (evaluate)

In [38]:
TEST_URL = "http://download.tensorflow.org/data/iris_test.csv"
test_path = tf.keras.utils.get_file(TEST_URL.split('/')[-1], TEST_URL)
test = pd.read_csv(test_path, names=CSV_COLUMN_NAMES, header=0)
test_x, test_y = test, test.pop(y_name)

In [35]:
def eval_input_fn(features, labels, batch_size):
    features=dict(features)
    if labels is None:
        # No labels, use only features.
        inputs = features
    else:
        inputs = (features, labels)

    # Convert the inputs to a Dataset.
    dataset = tf.data.Dataset.from_tensor_slices(inputs)

    # Batch the examples
    assert batch_size is not None, "batch_size must not be None"
    dataset = dataset.batch(batch_size)

    # Return the dataset.
    return dataset


In [37]:
eval_result = classifier.evaluate(
    input_fn=lambda:eval_input_fn(test_x, test_y, 100))

print('\nTest set accuracy: {accuracy:0.3f}\n'.format(**eval_result))

INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Starting evaluation at 2018-04-22-22:58:40
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from /var/folders/hc/1t72w11d7tng3tbp88nqd6280000gn/T/tmpyir0blu3/model.ckpt-2000
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Finished evaluation at 2018-04-22-22:58:40
INFO:tensorflow:Saving dict for global step 2000: accuracy = 0.96666664, average_loss = 0.054136667, global_step = 2000, loss = 1.6241

Test set accuracy: 0.967
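The numbers in the evaluation log can be cross-checked with a little arithmetic. This sketch assumes `loss` is the summed loss over one batch and `average_loss` is the per-example mean, and that the whole test set fit into the single batch of 100; under those assumptions their ratio gives the test-set size.

```python
average_loss = 0.054136667  # from the evaluation log
loss = 1.6241               # from the evaluation log
accuracy = 0.96666664       # from the evaluation log

# Assumption: loss is summed over the one batch, average_loss is per example.
n_test = round(loss / average_loss)
n_correct = round(accuracy * n_test)
print(n_test, "test examples,", n_correct, "classified correctly")
```

If the assumptions hold, an accuracy of 0.967 corresponds to a single misclassified flower.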



### predict new (unlabeled) data 

give the model 4 features, no label... this example has 3 unlabeled flowers to predict.

In [59]:
# Generate predictions from the model
expected = ['Setosa', 'Versicolor', 'Virginica']
predict_x = {
    'SepalLength': [5.1, 5.9, 6.9],
    'SepalWidth': [3.3, 3.0, 3.1],
    'PetalLength': [1.7, 4.2, 5.4],
    'PetalWidth': [0.5, 1.5, 2.1],
}

predictions = classifier.predict(
    input_fn=lambda:eval_input_fn(predict_x, None, batch_size=100)) # pass in no labels!
predictions

<generator object Estimator.predict at 0x11e035ca8>
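Since `predict` hands back a generator, the results can only be walked through once. The sketch below shows that behaviour with a plain Python generator; `fake_predict` is a hypothetical stand-in that yields dicts with a `class_ids` key, not the real Estimator output.

```python
def fake_predict():
    """Hypothetical stand-in for a predict call: yields one dict per flower."""
    for class_id in [0, 1, 2]:
        yield {"class_ids": [class_id]}

predictions = fake_predict()
first_pass = [p["class_ids"][0] for p in predictions]
second_pass = [p["class_ids"][0] for p in predictions]  # generator is already exhausted

print(first_pass)
print(second_pass)
```

To loop over predictions more than once, either call `predict` again or store the results in a list first.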

In [58]:
for pred_dict, expec in zip(predictions, expected):
    template = ('\nPrediction is "{}" ({:.1f}%), expected "{}"')

    class_id = pred_dict['class_ids'][0]
    probability = pred_dict['probabilities'][class_id]

    print(template.format(SPECIES[class_id],
                          100 * probability, expec))

INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from /var/folders/hc/1t72w11d7tng3tbp88nqd6280000gn/T/tmpyir0blu3/model.ckpt-2000
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.

Prediction is "Setosa" (99.8%), expected "Setosa"

Prediction is "Versicolor" (99.9%), expected "Versicolor"

Prediction is "Virginica" (98.8%), expected "Virginica"
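The step from probabilities to a species name is just "take the index of the largest probability". A minimal sketch, using made-up probabilities for one flower (illustrative values, not actual model output):

```python
SPECIES = ['Setosa', 'Versicolor', 'Virginica']

# Hypothetical probabilities for one flower (illustrative, not model output).
probabilities = [0.002, 0.988, 0.010]

# Index of the largest probability = predicted class id.
class_id = max(range(len(probabilities)), key=probabilities.__getitem__)
print('Prediction is "{}" ({:.1f}%)'.format(SPECIES[class_id],
                                            100 * probabilities[class_id]))
```

In the loop above, the same index is read from `pred_dict['class_ids'][0]` and then used to look up both the probability and the species name.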
