# Deep Learning Fundamentals<br />
<h2><p style="color:darkred">3 - Estimators in TensorFlow</p></h2>

## TensorFlow Programming Environment

<center><img src="./Images/tensorflow_programming_environment_new.png"  /></center>

## Beyond TensorFlow
<center><img src="./Images/machine-learning-next-1.png"  /></center>

## Load data

Let us re-load the data!

In [1]:
import scipy.io
data = scipy.io.loadmat('./Data/credit_card_data.mat')
X = data['X']
y = data['y']

# preprocessing: replace missing values with the per-feature median
# (sklearn.preprocessing.Imputer was removed in scikit-learn 0.22; SimpleImputer replaces it)
from sklearn.impute import SimpleImputer
imputer = SimpleImputer(strategy="median")
X = imputer.fit_transform(X)

# preprocessing: feature normalization, rescaling each feature to [0, 1]
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
X = scaler.fit_transform(X)

# model training: split the dataset into a training set (75%) and a test set (25%),
# keeping the class proportions of y identical in both (stratify=y)
from sklearn import model_selection
(X_trn, X_tst, y_trn, y_tst) = model_selection.train_test_split(X, y, test_size=0.25, stratify=y)
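
As a reminder of what the two preprocessing steps compute, here is a minimal NumPy sketch on a hypothetical 4×2 toy array (not the credit card data): median imputation fills each missing entry with its column's median, and min-max scaling maps each column onto [0, 1] via (x - min) / (max - min). The helpers `median_impute` and `min_max_scale` are illustrative names, not scikit-learn APIs.

```python
import numpy as np

def median_impute(X):
    """Replace NaNs in each column with that column's median (NaNs ignored)."""
    X = np.array(X, dtype=float)
    medians = np.nanmedian(X, axis=0)          # one median per feature
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = medians[cols]
    return X

def min_max_scale(X):
    """Map each column linearly onto [0, 1]: (x - min) / (max - min)."""
    X = np.asarray(X, dtype=float)
    col_min = X.min(axis=0)
    col_range = X.max(axis=0) - col_min
    col_range[col_range == 0] = 1.0            # constant columns: avoid division by zero
    return (X - col_min) / col_range

# Hypothetical toy data: 4 samples, 2 features, one missing value
X_toy = np.array([[1.0, 10.0],
                  [np.nan, 20.0],
                  [3.0, 30.0],
                  [5.0, 50.0]])
X_imp = median_impute(X_toy)
X_scaled = min_max_scale(X_imp)
print(X_imp[1, 0])      # 3.0, the median of [1, 3, 5]
print(X_scaled[:, 1])   # values 0, 0.25, 0.5, 1
```

In the cell above, both transformers are fitted on the full dataset before the split; fitting them on `X_trn` only and calling `transform` on `X_tst` would keep test-set statistics out of the preprocessing.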

In [2]:
# Additional casting step: without a label_vocabulary, DNNClassifier expects integer class ids as labels
import numpy as np
y_trn = y_trn.astype(np.int32)
y_tst = y_tst.astype(np.int32)

## Training an estimator with TensorFlow

### Handling the input pipeline

In [3]:
import tensorflow as tf
tf.reset_default_graph()

<center><h2>TensorFlow Dataset</h2></center>

<center><img src="./Images/dataset_classes.png"  /></center>

https://www.tensorflow.org/get_started/premade_estimators

<center><h2>Feature columns</h2></center>

<center><img src="./Images/inputs_to_model_bridge.jpg"  /></center>

https://www.tensorflow.org/get_started/feature_columns

In [4]:
# Define the training input function
def train_input_fcn(features, labels):
    dataset = tf.data.Dataset.from_tensor_slices(({"x": features}, labels))
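    # shuffle with a 1000-element buffer, repeat indefinitely, group into batches of 32
    dataset = dataset.shuffle(1000).repeat().batch(32)
    # Estimators accept a tf.data.Dataset returned by input_fn and create the iterator themselves
    return dataset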
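
The chain `shuffle(1000).repeat().batch(32)` in `train_input_fcn` can be mimicked in plain Python to see what each stage does. This is an illustrative sketch, not how `tf.data` is implemented: the generator names are hypothetical, the buffer size, repeat count and batch size are shrunk to 4, 2 and 3, and a finite `count` stands in for the infinite `repeat()`. The shuffle follows the buffer strategy described in the `Dataset.shuffle` documentation.

```python
import random

def shuffle_buffer(elements, buffer_size, rng):
    """Fill a buffer, emit a random item from it, refill that slot with the next input."""
    it = iter(elements)
    buffer = []
    for x in it:
        buffer.append(x)
        if len(buffer) == buffer_size:
            break
    for x in it:                      # continues where the fill loop stopped
        i = rng.randrange(len(buffer))
        yield buffer[i]
        buffer[i] = x
    rng.shuffle(buffer)               # drain the remaining items in random order
    yield from buffer

def repeat(make_epoch, count):
    """Concatenate `count` passes; each pass is regenerated, hence reshuffled."""
    for _ in range(count):
        yield from make_epoch()

def batch(elements, batch_size):
    """Group consecutive elements; the final batch may be smaller."""
    current = []
    for x in elements:
        current.append(x)
        if len(current) == batch_size:
            yield current
            current = []
    if current:
        yield current

rng = random.Random(0)
data = list(range(10))                # stand-in for the rows of X_trn
batches = list(batch(repeat(lambda: shuffle_buffer(data, 4, rng), count=2), 3))
print([len(b) for b in batches])      # [3, 3, 3, 3, 3, 3, 2]
```

Because shuffling happens before `repeat`, each pass is a permutation of the whole training set, while `batch` after `repeat` lets one batch straddle two passes. A buffer smaller than the dataset gives only a partial shuffle. With the infinite `repeat()` of the original function, training length is set solely by `steps` in `classifier.train`.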

In [5]:
# Define feature columns
feature_columns = [tf.feature_column.numeric_column(key="x", shape=X_trn.shape[1])]

### Train the classifier

In [6]:
# Initialize classifier
classifier = tf.estimator.DNNClassifier(feature_columns=feature_columns, n_classes=2, hidden_units=[10], model_dir='tmp')

INFO:tensorflow:Using default config.
INFO:tensorflow:Using config: {'_model_dir': 'tmp', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': None, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x1a23fd8ef0>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}
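
`hidden_units=[10]` requests a single hidden layer of 10 units, and with `n_classes=2` the predictions printed further down carry one logit per example. The NumPy sketch below reproduces that forward pass with random, untrained weights. It assumes the estimator's default ReLU activation and a hypothetical input width of 8 features; the names `dnn_logit`, `W1`, `b1`, `W2`, `b2` are illustrative, and the estimator learns the real weights itself during `train`.

```python
import numpy as np

def dnn_logit(x, W1, b1, W2, b2):
    """One hidden layer of ReLU units followed by a single output logit."""
    h = np.maximum(0.0, x @ W1 + b1)   # hidden layer: ReLU(x W1 + b1), shape (batch, 10)
    return h @ W2 + b2                 # output layer: one logit per example, shape (batch, 1)

n_features, n_hidden = 8, 10           # 8 is a hypothetical input width
rng = np.random.default_rng(0)
W1 = rng.normal(scale=0.1, size=(n_features, n_hidden))
b1 = np.zeros(n_hidden)
W2 = rng.normal(scale=0.1, size=(n_hidden, 1))
b2 = np.zeros(1)

x_batch = rng.uniform(size=(32, n_features))   # stand-in for a batch of features scaled to [0, 1]
logits = dnn_logit(x_batch, W1, b1, W2, b2)
n_params = W1.size + b1.size + W2.size + b2.size
print(logits.shape, n_params)          # (32, 1) 101
```

The parameter count grows as 11·d + 11 for d input features, so this network stays small whatever the width of `X_trn`.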


In [12]:
# Train classifier
classifier.train(input_fn=lambda: train_input_fcn(X_trn, y_trn), steps=1000)

INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from tmp/model.ckpt-1000
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Saving checkpoints for 1001 into tmp/model.ckpt.
INFO:tensorflow:loss = 16.65686, step = 1001
INFO:tensorflow:global_step/sec: 283.115
INFO:tensorflow:loss = 13.629836, step = 1101 (0.353 sec)
INFO:tensorflow:global_step/sec: 728.098
INFO:tensorflow:loss = 10.856381, step = 1201 (0.137 sec)
INFO:tensorflow:global_step/sec: 703.769
INFO:tensorflow:loss = 10.916653, step = 1301 (0.142 sec)
INFO:tensorflow:global_step/sec: 697.198
INFO:tensorflow:loss = 16.148895, step = 1401 (0.143 sec)
INFO:tensorflow:global_step/sec: 696.898
INFO:tensorflow:loss = 15.39549, step = 1501 (0.143 sec)
INFO:tensorflow:global_step/sec: 731.182
INFO:tensorflow:loss = 12.93605, step = 1601 (0.137 sec)

<tensorflow.python.estimator.canned.dnn.DNNClassifier at 0x1a23fe21d0>

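**Additional**: launch TensorBoard from a terminal with `tensorboard --logdir=tmp` to inspect the summaries (such as the training loss) that the estimator writes to its `model_dir`.

Note that the log above comes from re-running the training cell (execution count 12): since `tmp` already held a checkpoint, the estimator restored `model.ckpt-1000` and continued from step 1001 instead of starting from scratch. The evaluation and prediction cells below were executed before this re-run, which is why they report `global_step = 1000`.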

### Evaluate the model

In [8]:
# Define input function
def eval_input_fcn(features, labels):
    dataset = tf.data.Dataset.from_tensor_slices(({"x": features}, labels))
    # No "repeat" here: the dataset ends after one pass, so each test example is scored once
    dataset = dataset.batch(32)
    # The Estimator creates the iterator from the returned Dataset
    return dataset
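
Without `repeat()`, the evaluation dataset is finite, and `classifier.evaluate` called without `steps` runs until the input is exhausted. The hypothetical helper below (for illustration only) lists the batch sizes that `batch(32)` produces on a finite set; the last batch is usually smaller.

```python
def eval_batches(n_examples, batch_size=32):
    """Batch sizes from one pass of dataset.batch(batch_size) without repeat()."""
    full, rest = divmod(n_examples, batch_size)
    return [batch_size] * full + ([rest] if rest else [])

sizes = eval_batches(100)
print(len(sizes), sizes[-1])   # 4 4: four batches, the last holding only 4 examples
```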

In [9]:
print(classifier.evaluate(input_fn=lambda:eval_input_fcn(X_tst, y_tst)))

INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Starting evaluation at 2018-05-01-08:00:09
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from tmp/model.ckpt-1000
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Finished evaluation at 2018-05-01-08:00:11
INFO:tensorflow:Saving dict for global step 1000: accuracy = 0.8124, accuracy_baseline = 0.7788, auc = 0.74362123, auc_precision_recall = 0.5211195, average_loss = 0.4523286, global_step = 1000, label/mean = 0.2212, loss = 14.436019, prediction/mean = 0.24366973
{'accuracy': 0.8124, 'accuracy_baseline': 0.7788, 'auc': 0.74362123, 'auc_precision_recall': 0.5211195, 'average_loss': 0.4523286, 'label/mean': 0.2212, 'loss': 14.436019, 'prediction/mean': 0.24366973, 'global_step': 1000}
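
Several of these metrics are linked. `accuracy_baseline` is the accuracy of always predicting the majority class, here 1 - 0.2212 = 0.7788, so the model's 0.8124 is about 3.4 points above it. `loss` is the mean over batches of each batch's *summed* cross-entropy, assuming the TF 1.x default `loss_reduction` of a sum over the batch, whereas `average_loss` is the per-example mean. The sketch below recomputes these quantities on hypothetical toy labels and probabilities; `binary_metrics` is an illustrative helper, not a TensorFlow function.

```python
import numpy as np

def binary_metrics(labels, probs, batch_size=32):
    """Recompute some of the metrics evaluate() reports for a binary classifier."""
    labels = np.asarray(labels, dtype=float)
    p = np.clip(np.asarray(probs, dtype=float), 1e-7, 1 - 1e-7)        # avoid log(0)
    per_example = -(labels * np.log(p) + (1 - labels) * np.log(1 - p))  # cross-entropy
    label_mean = labels.mean()
    # per-batch summed losses, then averaged over batches (assumed SUM reduction)
    batch_sums = [per_example[i:i + batch_size].sum()
                  for i in range(0, len(labels), batch_size)]
    return {
        "accuracy": np.mean((p > 0.5) == (labels == 1)),
        "accuracy_baseline": max(label_mean, 1 - label_mean),  # always predict the majority class
        "average_loss": per_example.mean(),
        "loss": np.mean(batch_sums),
    }

# Hypothetical toy example: 4 labels, 4 predicted probabilities of class 1
m = binary_metrics([0, 0, 0, 1], [0.1, 0.2, 0.6, 0.7], batch_size=2)
print(round(m["accuracy"], 2))   # 0.75
```

Applied to the reported numbers, 32 × 0.4523286 ≈ 14.47 is close to `loss = 14.436019`; the small gap is consistent with a smaller final batch. The same per-batch summing explains why the training log shows losses around 11 to 16 rather than values below 1.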


### Get a prediction

In [10]:
# Define input function
def predict_input_fcn(features):
    # Prediction needs features only, so the dataset has no labels component
    dataset = tf.data.Dataset.from_tensor_slices({"x": features})
    dataset = dataset.batch(32)
    return dataset
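
For a two-class `DNNClassifier`, each prediction dict is derived from a single logit. The sketch below assumes the usual sigmoid link of the binary head: `logistic` is the sigmoid of the logit, `probabilities` is `[1 - p, p]`, and `class_ids` is the index of the larger probability. It rebuilds these fields from the first logit in the output below (-1.3834748). `binary_prediction` is a hypothetical helper.

```python
import numpy as np

def binary_prediction(logit):
    """Rebuild the fields of one prediction dict from its single logit."""
    p = 1.0 / (1.0 + np.exp(-logit))          # 'logistic': sigmoid of the logit
    probabilities = np.array([1.0 - p, p])    # P(class 0), P(class 1)
    class_id = int(np.argmax(probabilities))  # 'class_ids': most probable class
    return {"logistic": p, "probabilities": probabilities, "class_ids": class_id}

# Logit of the first test example, copied from the printed predictions
pred = binary_prediction(-1.3834748)
print(round(float(pred["logistic"]), 7))      # 0.2004515, matching the printed 'logistic'
```

The `classes` field holds the same class id as a byte string (`b'0'`), since no `label_vocabulary` was given.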

In [11]:
print(list(classifier.predict(input_fn=lambda:predict_input_fcn(X_tst[0:10]))))

INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from tmp/model.ckpt-1000
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
[{'logits': array([-1.3834748], dtype=float32), 'logistic': array([0.2004515], dtype=float32), 'probabilities': array([0.7995485 , 0.20045151], dtype=float32), 'class_ids': array([0]), 'classes': array([b'0'], dtype=object)}, {'logits': array([-2.1329706], dtype=float32), 'logistic': array([0.10593331], dtype=float32), 'probabilities': array([0.89406663, 0.10593331], dtype=float32), 'class_ids': array([0]), 'classes': array([b'0'], dtype=object)}, {'logits': array([-0.69265795], dtype=float32), 'logistic': array([0.33344206], dtype=float32), 'probabilities': array([0.66655797, 0.33344206], dtype=float32), 'class_ids': array([0]), 'classes': array([b'0'], dtype=object)}, {'logits': array([-1.6186552], dtype=float32), 'logistic': array([