# Building Victory  Net

In this project, I am reading in matches from Destiny's Curcible -- the Player Versus Player arena.
The dataset consists of individual matches between team's Alpha and Bravo. The matches are zero-sum therefore in predicting victory -- the alphaVictory column of the data -- and there are only two outcomes. Alpha wins (0) or you losses (1). The complementary outcomes can be generated since bravoVictory = 1 - alphaVictory.

The data has already been preprocessed and flattened prior to building the network.

In [1]:
import pandas as pd
import numpy as np
import tensorflow as tf
from sklearn.preprocessing import LabelBinarizer
import math
%matplotlib inline
import matplotlib.pyplot as plt

## Reading and Batching

In [2]:
from sklearn.cross_validation import train_test_split

In [3]:
def split_preprocessed_training_batch(batch_size, npdf, nplabels):
    """
    Load the Preprocessed Training data and return them in batches of <batch_size> or less
    """
    
#     df = pd.read_csv("indexByMatch_batchUdacity_205_CLEANED_NORM.csv", index_col = "Unnamed: 0")
#     df.drop("Unnamed: 0", axis=1, inplace=True)
#     labels = df["alphaVictory"]
#     df.drop("alphaVictory", axis=1, inplace=True)
#     #labels.head(n=10)
    
#     nplabels = labels.values
#     npdf = df.values


#     X_train = pd.read_csv("X_Train_indexByMatch_103.csv", index_col="Unnamed: 0")
#     y_train = pd.read_csv("y_train_indexByMatch_103.csv")

    # Return the training data in batches of size <batch_size> or less
    return batch_features_labels(npdf, nplabels, batch_size)

In [4]:
def batch_features_labels(features, labels, batch_size):
    """
    Split features and labels into batches
    """
    for start in range(0, len(features), batch_size):
        end = min(start + batch_size, len(features))
        yield features[start:end], labels[start:end]

In [5]:
# x, y = load_preprocessed_training_batch(batch_size=128)
# print(x.shape, y.shape)
# y.head()

In [6]:
def normalize(df):
    result = df.copy()
    for feature_name in df.columns:
        max_value = df[feature_name].max()
        min_value = df[feature_name].min()
        result[feature_name] = (df[feature_name] - min_value) / (max_value - min_value)
    return result

Note: The normalization has already been done at the earlier step in the pipeline. 

In [7]:
def OneHotEncode(x):
    
    lb = LabelBinarizer()
    lb.fit(np.array(range(0, 1)))
#    lb.classes_
    x = lb.transform(x)
    return x

I am not sure that I need the one hot encoding since the column that we're trying to predict -- alphaVictory -- is made up entirely of 1s and 0s. 

## Input

The neural network needs to read the match data, one-hot encoded labels, and dropout keep probability. Implement the following functions
* Implement neural_net_image_input
  * Return a TF Placeholder
  * Set the shape using image_shape with batch size set to None.
  * Name the TensorFlow placeholder "x" using the TensorFlow name parameter in the TF Placeholder.
* Implement neural_net_label_input
  * Return a TF Placeholder
  * Set the shape using n_classes with batch size set to None.
  * Name the TensorFlow placeholder "y" using the TensorFlow name parameter in the TF Placeholder.
* Implement neural_net_keep_prob_input
  * Return a TF Placeholder for dropout keep probability.
  * Name the TensorFlow placeholder "keep_prob" using the TensorFlow name parameter in the TF Placeholder.
* Implement learningRate
  * return a TF Placeholder
  * name the TF placeholder "learningRate" using the name parameter in the TF Placeholder.

In [8]:
def neural_net_match_input(input_shape):
    """
    Return a Tensor for a batch of match input
    : image_shape: Shape of the the data for a given match -- flattened
    : return: Tensor for match input.
    """
    # TODO: Implement Function
    x = tf.placeholder(tf.float32, shape=[None, input_shape], name="x")
    return x


def neural_net_label_input(n_classes):
    """
    Return a Tensor for a batch of label input
    : n_classes: Number of classes
    : return: Tensor for label input.
    """
    # TODO: Implement Function
    y = tf.placeholder(tf.float32, shape=[None, n_classes], name="y")
    return y


def neural_net_keep_prob_input():
    """
    Return a Tensor for keep probability
    : return: Tensor for keep probability.
    """
    # TODO: Implement Function
    keep_prob = tf.placeholder(tf.float32, shape=None, name="keep_prob")
    return keep_prob

def learningRate():
    """
    Return a Tensor for learning Rate
    : return: Tensor for learning Rate.
    """
    # TODO: Implement Function
    learningRate = tf.placeholder(tf.float32, shape=None, name="learningRate")
    return learningRate

## Building the network

#### Output Layer

In [9]:
def output(x_tensor, num_outputs):
    
    """
    Apply an output layer to x_tensor using weight and bias
    : x_tensor: A 2-D tensor where the first dimension is batch size.
    : num_outputs: The number of output that the new tensor should be.
    : return: A 2-D tensor where the second dimension is num_outputs.
    """
    # TODO: Implement Function
    input_dims = x_tensor.get_shape().as_list()
    #print(input_dims)
    filter_shape = [input_dims[1], num_outputs]
    #print(filter_shape)
    
    std = math.sqrt( 2/ (filter_shape[0] * filter_shape[1]) ) #sqrt(2/(filterHeight*filterWidth #*filterDepth))    
    
    weight = tf.Variable(tf.truncated_normal(filter_shape, stddev=std))
    #print(weight)
    bias = tf.Variable(tf.zeros(num_outputs))
    #print(bias)
        
    fc2 = tf.nn.xw_plus_b(x_tensor, weights=weight, biases=bias)
    return fc2

#### Fully connected Layers

In [10]:
def fully_conn(x_tensor, num_outputs):
    
    fc1 = tf.nn.relu(output(x_tensor, num_outputs))
    
    return fc1

## Creating the Multi_Layer_Perceptron

In [11]:
def MLP(x, keep_prob):
    
    flattened = x
    
    # TODO: Apply 1, 2, or 3 Fully Connected Layers
    #    Play around with different number of outputs
    # Function Definition from Above:
    #   fully_conn(x_tensor, num_outputs)
    
    fc1 = fully_conn(flattened, num_outputs=1024)
    drop1 = tf.nn.dropout(fc1, keep_prob)
    fc2 = fully_conn(drop1, num_outputs=512)
    drop2 = tf.nn.dropout(fc2, keep_prob)
    fc3 = fully_conn(drop2, num_outputs=128)
    drop3 = tf.nn.dropout(fc3, keep_prob)
    fc4 = fully_conn(drop3, num_outputs=64)
    drop4 = tf.nn.dropout(fc4, keep_prob)
    fc5 = fully_conn(drop4, num_outputs=32)
    
    # TODO: Apply an Output Layer
    #    Set this to the number of classes
    # Function Definition from Above:
    #   output(x_tensor, num_outputs)
    
    logits = output(fc5, num_outputs=1)
    
    # TODO: return output
    return logits

## Build the Neural Network

In [25]:
##############################
## Build the Neural Network ##
##############################

# Remove previous weights, bias, inputs, etc..
tf.reset_default_graph()

# Inputs
x = neural_net_match_input(input_shape = 204)
y = neural_net_label_input(n_classes = 1)
keep_prob = neural_net_keep_prob_input()
#learningRate = learningRate()

# Model
logits = MLP(x, keep_prob)

# Name logits Tensor, so that is can be loaded from disk after training
logits = tf.identity(logits, name='logits')

# Loss and Optimizer
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=y))

#Adagrad optimizer.
#adaGradOpt = tf.train.AdagradOptimizer(learning_rate = learningRate, name=adaGradOpt).minimize(cost)

optimizer = tf.train.AdamOptimizer().minimize(cost)

# Accuracy
correct_pred = tf.equal(tf.argmax(logits, 1), tf.argmax(y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32), name='accuracy')

#tests.test_conv_net(conv_net)

## Bringing in the data

In [24]:
df = pd.read_csv("indexByMatch_batchUdacity_205_CLEANED_NORM.csv", index_col="Unnamed: 0")
#df.drop("Unnamed: 0", axis=1, inplace=True)
labels = df["alphaVictory"]
df.drop("alphaVictory", axis=1, inplace=True)
#labels.head(n=10)

nplabels = labels.values
npdf = df.values

X_train, X_test, y_train, y_test = train_test_split(npdf, nplabels, test_size=0.2, random_state=444)

newX_train = np.reshape(X_train, (-1, 204))
newy_train = np.reshape(y_train, (-1, 1))
newX_test = np.reshape(X_test, (-1, 204))
newy_test = np.reshape(y_test, (-1, 1))

In [26]:
print(npdf.shape)
print(nplabels.shape)
print(newX_train.shape)
print(newy_train.shape)
print(newX_test.shape)
print(newy_test.shape)

(124236, 204)
(124236,)
(99388, 204)
(99388, 1)
(24848, 204)
(24848, 1)


## Training the network

In [27]:
def train_neural_network(session, optimizer, keep_probability, feature_batch, label_batch):
    """
    Optimize the session on a batch of images and labels
    : session: Current TensorFlow session
    : optimizer: TensorFlow optimizer function
    : keep_probability: keep probability
    : feature_batch: Batch of Numpy image data
    : label_batch: Batch of Numpy label data
    """
    # TODO: Implement Function
    
    session.run(optimizer, feed_dict={x:feature_batch, y:label_batch, keep_prob:keep_probability})
    
    return


# Showing stats

Note: newX_test and newy_test are global variables. They're the testing set and are instantiated when the data is read into the program and split into training/testing sets. 

In [28]:
def print_stats(session, feature_batch, label_batch, cost, accuracy):
    """
    Print information about loss and validation accuracy
    : session: Current TensorFlow session
    : feature_batch: Batch of Numpy image data
    : label_batch: Batch of Numpy label data
    : cost: TensorFlow cost function
    : accuracy: TensorFlow accuracy function
    """
    
    # TODO: Implement Function

    loss = session.run(cost, feed_dict={x: feature_batch, y:label_batch,  keep_prob: 1.00})
    acc_ans = session.run(accuracy, feed_dict={x:newX_test, y:newy_test, keep_prob: 1.00})
    print("")
    print("cost/loss:  ", loss)
    print("accuracy:   ", acc_ans)
       
    pass

## Hyperparameters

* Set epochs to the number of iterations until the network stops learning or start overfitting
* Set batch_size to the highest number that your machine has memory for. Most people set them to common sizes of memory:
  * 64
  * 128
  * 256
  * ...
* Set keep_probability to the probability of keeping a node using dropout

In [29]:
# TODO: Tune Parameters
epochs = 10
batch_size = 128
keep_probability = 1.00
#learningRate = 1e-6

# Training on single (and smaller) testing batch

Instead of training the neural network on all the matches of data, let's use a single small test batch. This should save time while you iterate on the model to get a better accuracy. Once the final validation accuracy is 50% or greater, run the model on all the data in the next section.

In [30]:
print('Checking the Training on the testing batch...')

with tf.Session() as sess:
    # Initializing the variables
    sess.run(tf.global_variables_initializer())
    
    # Training cycle
    for epoch in range(epochs):
        batch_i = "Full dataset"
        for batch_features, batch_labels in split_preprocessed_training_batch(batch_size, newX_train, newy_train):
            train_neural_network(sess, optimizer, keep_probability, batch_features, batch_labels)
        print('Epoch {:>2}, Destiny Matches Batch {}:  '.format(epoch + 1, batch_i), end='')
        print_stats(sess, batch_features, batch_labels, cost, accuracy)

Checking the Training on the testing batch...
Epoch  1, Destiny Matches Batch Full dataset:  
cost/loss:   0.0
accuracy:    1.0
Epoch  2, Destiny Matches Batch Full dataset:  
cost/loss:   0.0
accuracy:    1.0
Epoch  3, Destiny Matches Batch Full dataset:  
cost/loss:   0.0
accuracy:    1.0
Epoch  4, Destiny Matches Batch Full dataset:  
cost/loss:   0.0
accuracy:    1.0
Epoch  5, Destiny Matches Batch Full dataset:  
cost/loss:   0.0
accuracy:    1.0
Epoch  6, Destiny Matches Batch Full dataset:  
cost/loss:   0.0
accuracy:    1.0
Epoch  7, Destiny Matches Batch Full dataset:  
cost/loss:   0.0
accuracy:    1.0
Epoch  8, Destiny Matches Batch Full dataset:  
cost/loss:   0.0
accuracy:    1.0
Epoch  9, Destiny Matches Batch Full dataset:  
cost/loss:   0.0
accuracy:    1.0
Epoch 10, Destiny Matches Batch Full dataset:  
cost/loss:   0.0
accuracy:    1.0
