### TensorFlow Estimators ###

Estimators are a high-level TensorFlow API that greatly simplifies ML programming. An Estimator encapsulates the following actions:

1. Training
2. Evaluating
3. Prediction
4. Export for Serving

#### Structure of a Pre-Made Estimator Program ####

It typically consists of the following 5 steps:

1. Convert CSV data into TensorFlow records
2. Define the feature columns
3. Create a relevant algorithm (the model)
4. Call the training, evaluation and inference methods
5. Export a serving function

### Import necessary libraries ###

In [17]:
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import numpy as np
import pandas as pd
import tensorflow as tf

### Load the dataset ###

In [18]:
train_filename = 'FinalData/finaldata.csv'
test_filename = 'FinalData/TestFinaldata.csv'

# read both CSV files through the file name variables so the paths are defined once
training_data = pd.read_csv(train_filename)
test_data = pd.read_csv(test_filename)


print("Training Dataset Shape:{}".format(training_data.shape))
print("Test Dataset Shape:{}".format(test_data.shape))

Training Dataset Shape:(12776, 28)
Test Dataset Shape:(1530, 26)
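
The two shapes above differ in column count (28 versus 26), so it can help to check which columns exist in only one of the files before selecting features. The following is a minimal sketch; `column_difference` is a hypothetical helper that is not part of the original notebook, and the column names in the example are placeholders rather than the real contents of the CSV files.

```python
def column_difference(train_columns, test_columns):
    """Return (only_in_train, only_in_test) as sorted lists of column names."""
    train_set, test_set = set(train_columns), set(test_columns)
    return sorted(train_set - test_set), sorted(test_set - train_set)


# Placeholder column names, for illustration only.
only_train, only_test = column_difference(
    ["HS", "AS", "ftr_label", "extra_col"],
    ["HS", "AS", "ftr_label"],
)
print(only_train, only_test)
```

With the DataFrames loaded above, the call would be `column_difference(training_data.columns, test_data.columns)`.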


### CSV input functions ###

In [23]:
def csv_input_fn(features, labels, batch_size):
    
    # convert the inputs to a Dataset
    dataset = tf.data.Dataset.from_tensor_slices((dict(features), labels))
    # shuffle, repeat indefinitely, and batch the examples
    dataset = dataset.shuffle(1000).repeat().batch(batch_size)
    
    return dataset

def eval_input_fn(features, labels, batch_size):
    
    # input function for evaluation, or for prediction when labels is None
    features = dict(features)
    
    if labels is None:
        #no labels only features
        inputs = features
    else:
        inputs = (features, labels)
        
    #convert inputs into dataset
    dataset = tf.data.Dataset.from_tensor_slices(inputs)
    
    assert batch_size is not None, "Batch Size must not be None"
    dataset = dataset.batch(batch_size)
    
    return dataset
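
The difference between the two input functions is that the training one shuffles and repeats forever, while the evaluation one visits each row once. The plain-Python sketch below only illustrates that idea; it is not how `tf.data` is implemented, and the names `repeat_and_batch` and `batch_once` are hypothetical. One simplification to keep in mind: `shuffle(1000)` in TensorFlow mixes examples through a 1000-element buffer, whereas this sketch reshuffles the whole list on every pass.

```python
import itertools
import random


def repeat_and_batch(rows, batch_size, seed=None):
    """Yield batches forever, reshuffling the rows on every pass (training)."""
    rows = list(rows)
    if not rows:
        raise ValueError("rows must not be empty")
    rng = random.Random(seed)

    def endless_rows():
        while True:
            rng.shuffle(rows)
            yield from rows

    stream = endless_rows()
    while True:
        yield list(itertools.islice(stream, batch_size))


def batch_once(rows, batch_size):
    """Yield every row exactly once, in order (evaluation and prediction)."""
    rows = list(rows)
    for start in range(0, len(rows), batch_size):
        yield rows[start:start + batch_size]


print(list(batch_once(range(5), 2)))  # [[0, 1], [2, 3], [4]]

training_batches = repeat_and_batch(range(5), 2, seed=0)
print([next(training_batches) for _ in range(4)])  # four batches; the stream never ends
```

Because the training stream never ends, training has to be bounded some other way, which is why the `train` call in this notebook passes `steps`.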

### Build an Estimator ###

In [26]:
#split into train and test sets
#(sklearn.cross_validation was removed in scikit-learn 0.20; use model_selection)
from sklearn.model_selection import train_test_split

#important columns
important_columns = ['AC', 'AF', 'AR', 'AS', 'AST', 'AY', 'HC', 'HF',
       'HR', 'HS', 'HST', 'HTAG', 'HTHG', 'HY','ht_label', 'at_label', 'league_label', 
        'HTCT', 'ATCT','HTWP', 'ATWP']


#get input and output features
X_all = training_data[important_columns]
y_all = training_data['ftr_label']

# Shuffle and split the dataset into training and testing set.
X_train, X_test, y_train, y_test = train_test_split(X_all, y_all, 
                                                    test_size = 100,
                                                    random_state = 2,
                                                    stratify = y_all)
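
In the split above, an integer `test_size` is an absolute number of samples (100 rows here), and `stratify = y_all` asks for the class mix to be kept similar in both parts. A quick way to check the second point is to compare the class shares of the two label sets. This is a minimal sketch; `class_proportions` is a hypothetical helper and the labels in the example are made up.

```python
from collections import Counter


def class_proportions(labels):
    """Map each class label to its share of all labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


# Made-up labels for illustration: two classes in a 3:1 ratio.
example = class_proportions([2, 2, 2, 0])
print(example)
```

With the notebook's variables, comparing `class_proportions(y_train)` with `class_proportions(y_test)` should show roughly the same shares for each class.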

In [27]:
#feature columns
my_feature_columns = []

for key in important_columns:
    my_feature_columns.append(tf.feature_column.numeric_column(key=key))

In [37]:
#classifier
classifier = tf.estimator.DNNClassifier(
    feature_columns = my_feature_columns,
    #three hidden layers of 30, 30 and 10 nodes
    hidden_units = [30, 30, 10],
    #model must choose between 3 classes
    n_classes=3
)

#train the model
training_result = classifier.train(
    input_fn = lambda: csv_input_fn(X_train,y_train, 16),
    steps = 6000
)


#evaluate the model
eval_result = classifier.evaluate(
    input_fn = lambda: eval_input_fn(X_test, y_test, 8)
)

print("Test Accuracy:{accuracy: 0.3f}\n".format(**eval_result))

INFO:tensorflow:Using default config.
INFO:tensorflow:Using config: {'_train_distribute': None, '_task_id': 0, '_keep_checkpoint_every_n_hours': 10000, '_is_chief': True, '_save_checkpoints_secs': 600, '_model_dir': 'C:\\Users\\MADHIV~1\\AppData\\Local\\Temp\\tmp4hit4amy', '_service': None, '_master': '', '_evaluation_master': '', '_global_id_in_cluster': 0, '_save_checkpoints_steps': None, '_keep_checkpoint_max': 5, '_log_step_count_steps': 100, '_device_fn': None, '_num_worker_replicas': 1, '_task_type': 'worker', '_save_summary_steps': 100, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x00000235B50CCD30>, '_num_ps_replicas': 0, '_tf_random_seed': None, '_session_config': None}
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Saving checkpoints for 0 

Wow! The test accuracy is **70%**. I can now call this model a minimum viable model.
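
To put an accuracy figure in context, it can be compared with what always predicting the most frequent class would achieve. The sketch below assumes plain lists of integer class labels; `accuracy` and `majority_baseline_accuracy` are hypothetical helpers, and the example labels are made up rather than taken from this dataset.

```python
from collections import Counter


def accuracy(true_labels, predicted_labels):
    """Fraction of positions where the prediction equals the true label."""
    true_labels, predicted_labels = list(true_labels), list(predicted_labels)
    if not true_labels or len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


def majority_baseline_accuracy(true_labels):
    """Accuracy obtained by always predicting the most frequent class."""
    counts = Counter(true_labels)
    return counts.most_common(1)[0][1] / sum(counts.values())


# Made-up labels for illustration only.
print(accuracy([0, 1, 2, 2], [0, 1, 1, 2]))
print(majority_baseline_accuracy([2, 2, 2, 0, 1]))
```

For this notebook, `majority_baseline_accuracy(y_test)` would give the number to compare against the reported test accuracy.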

### Export a serving function ###

To serve predictions from this model, we need to export it and save it in a folder. The serving input function below describes the inputs that the exported model expects.

In [38]:
def serve_input_fn():
    #create one placeholder per feature
    feature_placeholder = {}
    
    for cols in important_columns:
        feature_placeholder.update({cols: tf.placeholder(tf.float32, [None])})
    
    features = {
        key: tf.expand_dims(tensor, -1)
        for key, tensor in feature_placeholder.items()
    }
    
    return tf.estimator.export.ServingInputReceiver(features, feature_placeholder)
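
The `tf.expand_dims(tensor, -1)` call above turns each placeholder of shape `[None]` (one value per example) into shape `[None, 1]` (one single-value column per example). The NumPy sketch below shows the same shape change on a concrete batch; it uses `np.expand_dims` as a stand-in for the TensorFlow call, and the batch values are taken from the `AC` feature used later in this notebook.

```python
import numpy as np

# A batch of four values for one feature, shaped like the [None] placeholder.
batch = np.array([5.0, 7.0, 1.0, 4.0], dtype=np.float32)

# Append a trailing axis, mirroring tf.expand_dims(tensor, -1).
column = np.expand_dims(batch, -1)

print(batch.shape)   # (4,)
print(column.shape)  # (4, 1)
```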

In [49]:
expected = ['AwayTeam', 'Draw', 'HomeTeam']


predict_x = {
    'AC':[5.0, 7.0, 1.0, 4.0],
    'AF':[17.0, 16.0, 12.0, 15.0],
    'AR':[0.0, 0.0, 0.0, 0.0],
    'AS':[19.0, 13.0, 9.0, 11.0],
    'AST':[4.0, 1.0, 2.0, 2.0],
    'AY':[2.0, 3.0, 1.0, 3.0],
    'HC':[4.0, 3.0, 5.0, 6.0],
    'HF':[13.0, 18.0, 18.0, 17.0],
    'HR':[0.0, 0.0, 0.0, 0.0],
    'HS':[13.0, 11.0, 10.0, 14.0],
    'HST':[8.0, 5.0, 3.0, 3.0],
    'HTAG':[0.0, 0.0, 0.0, 0.0],
    'HTHG':[2.0, 1.0, 0.0, 0.0],
    'HY':[1.0, 4.0, 2.0, 1.0],
    'ht_label':[1.0, 6.0, 8.0, 9.0],
    'at_label':[2.0, 3.0, 1.0, 3.0],
    'league_label':[0.0, 0.0, 0.0, 0.0],
    'HTCT':[1.0, 1.0, 0.0, 0.0],
    'ATCT':[0.0, 0.0, 0.0, 0.0],
    'HTWP':[0.21, 0.56, 0.31, 0.40],
    'ATWP':[0.53, 0.23, 0.31, 0.71],
}

predictions = classifier.predict(
    input_fn = lambda : eval_input_fn(predict_x, labels=None, batch_size=4)
)

template = ('\n Prediction is "{}" ({:1f}%)')

# note: zip stops at the shorter input, so only the first three of the four rows are printed
for pred, expec in zip(predictions, expected):
    class_id = pred['class_ids'][0]
    probability = pred['probabilities'][class_id]
    
    print(template.format(expected[class_id], 100 * probability))

INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from C:\Users\MADHIV~1\AppData\Local\Temp\tmp4hit4amy\model.ckpt-6000
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.

 Prediction is "HomeTeam" (81.725407%)

 Prediction is "HomeTeam" (80.895269%)

 Prediction is "Draw" (42.565513%)
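
The loop above pairs predictions with the three-element `expected` list, so the fourth input row is never printed. A small variation iterates over the predictions alone and uses the list only as a lookup table for class names. This is a sketch under assumptions: `describe_predictions` is a hypothetical helper, the prediction dictionaries are made-up stand-ins for what `classifier.predict` yields, and the class order is the one assumed by the notebook's `expected` list.

```python
CLASS_NAMES = ['AwayTeam', 'Draw', 'HomeTeam']


def describe_predictions(predictions, class_names):
    """Return one formatted line per prediction dictionary."""
    lines = []
    for row, pred in enumerate(predictions):
        class_id = int(pred['class_ids'][0])
        probability = float(pred['probabilities'][class_id])
        lines.append('Row {}: "{}" ({:.1f}%)'.format(
            row, class_names[class_id], 100 * probability))
    return lines


# Made-up prediction dictionaries, for illustration only.
fake_predictions = [
    {'class_ids': [2], 'probabilities': [0.10, 0.08, 0.82]},
    {'class_ids': [1], 'probabilities': [0.30, 0.43, 0.27]},
]
for line in describe_predictions(fake_predictions, CLASS_NAMES):
    print(line)
```

Passing the generator returned by `classifier.predict(...)` instead of `fake_predictions` would print one line for every row of `predict_x`.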


So our model predicts the winning team together with a probability. Only the first three rows of `predict_x` appear in the output. The first two are predicted as **HomeTeam**; for the second row this agrees with its high **HTWP (Home Team Winning Probability)** of 0.56, although the first row actually has a higher **ATWP** (0.53) than **HTWP** (0.21). For the last printed data point, the model predicted **Draw**, and there **HTWP** and **ATWP** are equal (0.31).

If we want to improve the accuracy further, we can come up with a new derived feature called **Team Possession**. Since we only have **Half Time** goal data, a possession attribute would help predict the goals of the second half, because possession represents how much of the time the ball stays with a team. If a team's possession rate is higher, we can compare it with the team's winning probability and judge whether the team is likely to win the match in the second half.

If you find this attribute useful, please try this feature, retrain the model above, and share the accuracy results.