## 1. Loading Data

The first step in training a machine learning algorithm is loading the training data. There are three common ways to do it:

- Preload data into memory
    - The simplest method is to preload all your data into memory and pass it to TensorFlow as a single array
    - Simply __read your data file into an array__ and pass it to TensorFlow (the dataset size is limited by the computer's available memory)
    - As long as the data ends up in a multidimensional array, you're good
        - For example, use pandas to load and preprocess the data
    
    
- Feed data step by step
    - Write code that feeds your training data step-by-step into TensorFlow as TensorFlow requests it 
    - TensorFlow calls the data loader function whenever it needs the next chunk of data, which gives you more control
    - Easier to process large datasets since it loads one chunk at a time
    - Have to write all of the code yourself
    
    
- Set up a custom data pipeline
    - This is the best option when you are working with enormous datasets like millions of images. A data pipeline allows TensorFlow to manage loading data into memory itself as it needs it
    - Data pipeline only loads data into memory in small chunks which means that it can work with large datasets
    - Requires writing TensorFlow-specific code
    - A big advantage of a data pipeline is that it can use __parallel processing__ across multiple CPUs
    - Can have several threads running at the same time to load and preprocess data
    - Training process doesn't have to stop and wait while the next chunk of data is loaded for the next training pass
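
The "feed data step by step" option can be sketched in plain Python as a generator that reads a CSV file and hands back one chunk of rows at a time. This is only an illustration of the idea, not TensorFlow's own API: the function name `iter_chunks`, the column names, and the sample values are all hypothetical, and an in-memory string stands in for a large file on disk.

```python
import csv
import io

def iter_chunks(csv_file, chunk_size):
    """Yield lists of up to chunk_size rows, each row a list of floats."""
    reader = csv.reader(csv_file)
    next(reader)  # skip the header row
    chunk = []
    for row in reader:
        chunk.append([float(value) for value in row])
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:  # final chunk, possibly smaller than chunk_size
        yield chunk

# Stand-in for a large file on disk (hypothetical columns and values)
sample = io.StringIO(
    "feature_a,feature_b,total_earnings\n"
    "1,0,10\n"
    "0,1,20\n"
    "1,1,30\n"
)

chunks = list(iter_chunks(sample, chunk_size=2))
print([len(chunk) for chunk in chunks])
```

Only one chunk is held in memory at a time, which is why this approach copes with files larger than available memory. The cost is that the chunking logic is yours to write and maintain.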

In [None]:
import os

# Turn off TensorFlow warning messages in program output
# (set before importing TensorFlow so the setting applies from startup)
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'

import pandas as pd
from sklearn.preprocessing import MinMaxScaler
import tensorflow as tf

# Clear session
tf.keras.backend.clear_session()

# Load the training data
training_data_df = pd.read_csv('Data/sales_data_training.csv', dtype=float) # parse every column as floating-point numbers

# Get X and Y data
X_training = training_data_df.drop('total_earnings', axis=1).values # call .values to get back the result as an array
Y_training = training_data_df[['total_earnings']].values

# Load testing data
test_data_df = pd.read_csv('Data/sales_data_test.csv', dtype=float) # parse every column as floating-point numbers

# Get X and Y test data
X_testing = test_data_df.drop('total_earnings', axis=1).values # .values for an array
Y_testing = test_data_df[['total_earnings']].values

''' 
  Scale the data (normalizing it to a range)
  
  The scaler scales the data by multiplying it by a constant number and adding a constant number.
'''

# Preprocess data
X_scaler = MinMaxScaler(feature_range=(0, 1))
Y_scaler = MinMaxScaler(feature_range=(0, 1))

# Scale the data using scaler
X_scaled_training = X_scaler.fit_transform(X_training)
Y_scaled_training = Y_scaler.fit_transform(Y_training)

# Scale testing
X_scaled_testing = X_scaler.transform(X_testing)
Y_scaled_testing = Y_scaler.transform(Y_testing)

# Print shape
print(X_scaled_testing.shape)
print(Y_scaled_testing.shape)

print('Note: Y values were scaled by multiplying by {:.10f} and adding {:.4f}'.format(Y_scaler.scale_[0], Y_scaler.min_[0]))
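
The "multiply by a constant and add a constant" description of min-max scaling can be checked by hand. The sketch below reimplements the arithmetic for a (0, 1) range in plain Python. The helper name `fit_min_max` and the earnings values are made up for illustration, and the code assumes the values are not all identical (otherwise the division would fail).

```python
def fit_min_max(values):
    """Return (scale, offset) so that value * scale + offset maps [min, max] onto [0, 1]."""
    low, high = min(values), max(values)
    scale = 1.0 / (high - low)   # assumes high != low
    offset = -low * scale
    return scale, offset

# Hypothetical earnings values, not taken from the real dataset
earnings = [50000.0, 125000.0, 200000.0]
scale, offset = fit_min_max(earnings)

# Forward transform: multiply by a constant, then add a constant
scaled = [value * scale + offset for value in earnings]

# Inverse transform: undo the addition, then undo the multiplication
restored = [(value - offset) / scale for value in scaled]

print(scaled)
print(restored)
```

The two constants play the same role as the `scale_` and `min_` values printed in the note above, which is why recording them lets you convert scaled predictions back to the original units.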

## 2. Define model layers, initialize, predict, & save

Our neural network accepts nine floating-point numbers as the input for each prediction, but the specific values we pass in are different every time we ask for a new prediction. A placeholder node represents that input: it reserves a spot in the graph that we fill with real data when the graph runs.

In [None]:
################################################
####### Define the model
################################################

# Define model parameters
learning_rate = 0.001
learning_epochs = 100
display_step = 5

# Define how many inputs and outputs are in our neural network
number_of_inputs = 9
number_of_outputs = 1

# Define how many neurons we want in each layer of our neural network
layer_1_nodes = 50
layer_2_nodes = 100
layer_3_nodes = 50

# Define input layer (new variable scope)
# "None" tells TensorFlow our neural network can accept batches of any size, and number_of_inputs tells it to
#  expect nine values for each record in the batch
with tf.variable_scope('input'):
    X = tf.placeholder(tf.float32, shape=(None, number_of_inputs))
    
# Layer 1
with tf.variable_scope('layer_1'):
    weights = tf.get_variable('weights1', shape=[number_of_inputs, layer_1_nodes], initializer=tf.contrib.layers.xavier_initializer())
    biases = tf.get_variable('biases1', shape=[layer_1_nodes], initializer=tf.zeros_initializer())
    layer_1_output = tf.nn.relu(tf.matmul(X, weights) + biases) # matrix multiplication and a standard rectified linear unit 

# Layer 2
with tf.variable_scope('layer_2'):
    weights = tf.get_variable('weights2', shape=[layer_1_nodes, layer_2_nodes], initializer=tf.contrib.layers.xavier_initializer())
    biases = tf.get_variable('biases2', shape=[layer_2_nodes], initializer=tf.zeros_initializer())
    layer_2_output = tf.nn.relu(tf.matmul(layer_1_output, weights) + biases)
    
# Layer 3
with tf.variable_scope('layer_3'):
    weights = tf.get_variable('weights3', shape=[layer_2_nodes, layer_3_nodes], initializer=tf.contrib.layers.xavier_initializer())
    biases = tf.get_variable('biases3', shape=[layer_3_nodes], initializer=tf.zeros_initializer())
    layer_3_output = tf.nn.relu(tf.matmul(layer_2_output, weights) + biases)
    
# Output layer   
with tf.variable_scope('output'):
    weights = tf.get_variable('weights4', shape=[layer_3_nodes, number_of_outputs], initializer=tf.contrib.layers.xavier_initializer())
    biases = tf.get_variable('biases4', shape=[number_of_outputs], initializer=tf.zeros_initializer())
    prediction = tf.matmul(layer_3_output, weights) + biases
    
# Define the cost function
with tf.variable_scope('cost'):
    Y = tf.placeholder(tf.float32, shape=(None, 1))
    cost = tf.reduce_mean(tf.squared_difference(prediction, Y))
    
# Define the optimizer function (train & optimize)
with tf.variable_scope('train'):
    optimizer = tf.train.AdamOptimizer(learning_rate).minimize(cost)

'''
    Logging Session
    
    Log our metrics so we can view them in TensorBoard. Add a scalar summary (tf.summary.scalar()) for each
    value we are logging; each call creates a new node in the graph. Use tf.summary.merge_all() to run
    all summary nodes at once.
'''    
    
# Create a summary operation to log the progress of the network (variable to hold the logs)
with tf.variable_scope('logging'):
    
    # Add a scalar summary to represent the value we are logging
    tf.summary.scalar('current_cost', cost)
    
    # Execute all summary nodes in the graph without explicitly listing them all
    summary = tf.summary.merge_all() # helper function

# Create a saver object
saver = tf.train.Saver()

################################################
####### Initialize Session
################################################

'''
    Training Session
    
    The first thing we always pass to session.run is the built-in operation that tells TensorFlow to initialize 
    all variables in our graph to their default values: tf.global_variables_initializer(). 
    Once all the variables in our graph are initialized, we're ready to create our training loop. 
'''

# Initialize a training session after defining the model
with tf.Session() as session:
    
    # Run the global variable initializer to initialize all variables/layers in the nn
    session.run(tf.global_variables_initializer()) # executes commands by calling session.run & pass in global initializer
    
    # Create log file writers to record the training progress - store train and test in different folders
    training_writer = tf.summary.FileWriter('./logs/training', session.graph)
    testing_writer = tf.summary.FileWriter('./logs/testing', session.graph)
    
    # Run the optimizer
    for epoch in range(learning_epochs):
        
        # Run one training pass over the training data
        session.run(optimizer, feed_dict={X: X_scaled_training, Y: Y_scaled_training}) # pass in the operation itself, not its name
        
        # Log the progress every display_step epochs; add in logging variables & cost
        if epoch % display_step == 0:
            training_cost, training_summary = session.run([cost, summary], feed_dict={X: X_scaled_training, Y: Y_scaled_training})
            testing_cost, testing_summary = session.run([cost, summary], feed_dict={X: X_scaled_testing, Y: Y_scaled_testing})
            print(epoch, training_cost, testing_cost)
            
            # Write the current cost summaries to the log files
            training_writer.add_summary(training_summary, epoch) # writes cost per epoch to the graph
            testing_writer.add_summary(testing_summary, epoch)
            
    # Training complete (runs once, after the loop has finished)
    print('Training is now complete.')
    
    final_training_cost = session.run(cost, feed_dict={X: X_scaled_training, Y: Y_scaled_training})
    final_testing_cost = session.run(cost, feed_dict={X: X_scaled_testing, Y: Y_scaled_testing})
    
    print('Final Training Cost: {}'.format(final_training_cost))
    print('Final Testing Cost: {}'.format(final_testing_cost))

################################################
####### Prediction
################################################  
    
    # Reuse the open session for predictions; run the 'prediction' operation (output layer)
    Y_predicted_scaled = session.run(prediction, feed_dict={X: X_scaled_testing})
    
    # Rescale back to the original units
    Y_predicted = Y_scaler.inverse_transform(Y_predicted_scaled)
    
    # Compare the first value of the real & predicted earnings
    real_earnings = test_data_df['total_earnings'].values[0] # first value
    predicted_earnings = Y_predicted[0][0]
    
    print("The actual earnings of Game #1 were ${}".format(real_earnings))
    print("Our neural network predicted earnings of ${}".format(predicted_earnings))   
    
    # Save the model
    save_path = saver.save(session, 'logs/trained_model.ckpt')
    print('Model Saved: {}'.format(save_path))
    

To view the logs in TensorBoard, run in a terminal: `tensorboard --logdir=filepath/logs`
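
To make the layer definitions above concrete, here is a plain-Python sketch of what one hidden layer and the cost function compute: a matrix multiplication plus biases passed through ReLU, and the mean of the squared differences between predictions and targets. The function names, the tiny 2-input, 2-node shapes, and all numbers are hypothetical and chosen only so the arithmetic is easy to follow by hand.

```python
def dense_relu(inputs, weights, biases):
    """One hidden layer: relu(inputs x weights + biases) for a batch of rows."""
    outputs = []
    for row in inputs:
        out_row = []
        for j in range(len(biases)):
            total = sum(row[i] * weights[i][j] for i in range(len(row))) + biases[j]
            out_row.append(max(0.0, total))  # ReLU: negative sums become 0
        outputs.append(out_row)
    return outputs

def mean_squared_error(predictions, targets):
    """Mean of squared differences, for single-output predictions."""
    squared = [(p[0] - t[0]) ** 2 for p, t in zip(predictions, targets)]
    return sum(squared) / len(squared)

# Hypothetical batch of 2 records with 2 inputs each, feeding 2 nodes
batch = [[1.0, 2.0], [0.0, 1.0]]
weights = [[0.5, -2.0], [0.25, 0.5]]  # shape: [inputs, nodes]
biases = [0.0, 0.0]

hidden = dense_relu(batch, weights, biases)
print(hidden)

# Hypothetical predictions and targets, each shaped [batch, 1]
cost = mean_squared_error([[1.0], [3.0]], [[0.0], [1.0]])
print(cost)
```

The batch dimension is never fixed inside `dense_relu`, which mirrors the `None` in the placeholder shape: the same weights apply to however many records are passed in.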

## Total Steps

1. Load the training and testing data from CSV files
2. Split each dataset into X (input features) and Y (the `total_earnings` target)
3. Create the scalers with MinMaxScaler()
4. Fit the scalers on the training data and scale it with fit_transform()
5. Scale the testing data with transform()
6. Define the model structure
    1. The first layer will have 50 nodes
    2. The second layer will have 100 nodes
    3. The third layer will have 50 nodes
7. Define cost and optimizer functions
8. Define model logging variables for TensorBoard
9. Initialize a training session
10. Train the model
11. Predict using new data
12. Save the model object
13. Open TensorBoard (Optional) - tensorboard --logdir=filepath/logs

## Build Model for Production Deployment
### There are several ways we can use this model

1. If we want to initialize all the variables to their default values, we call the init operation. 
2. If we want to generate an output, we pass in input data and then call the output operation. 
3. If we want to train the network we can call the train operator. If we export this model to a file using the normal way of saving model checkpoint files, 

In [None]:
############################################################
########## Build Model for Production Deployment
############################################################
    
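    # Create a new saved model builder object
    # (this code continues inside the `with tf.Session() as session:` block above, hence the indentation)
    model_builder = tf.saved_model.builder.SavedModelBuilder('exported_model')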
    
    '''
    First, we build a Python dictionary called inputs. In this dictionary, we list each tensor 
    that needs to be filled in when the model is run. Our model takes in one tensor with nine values as input, 
    so our model has one input.
    '''
    
    # Our model takes in one tensor with nine values, so it has a single input
    inputs = {
        'input': tf.saved_model.utils.build_tensor_info(X)
    }
    
    # Output is a single tensor with one value
    outputs = {
        'earnings': tf.saved_model.utils.build_tensor_info(prediction)
    }
    
    '''
    A signature def is sort of like a function or method declaration in a 
    programming language. We're telling TensorFlow that to run the model it should call a certain 
    function with certain parameters.
    '''
    
    # Define signature def
    signature_def = tf.saved_model.signature_def_utils.build_signature_def(
        inputs=inputs,
        outputs=outputs,
        method_name=tf.saved_model.signature_constants.PREDICT_METHOD_NAME
    )
    
    '''
    The meta graph is the structure of the computational graph, and the variables are the 
    values we set on each node in the graph. This tells TensorFlow that we want to export everything.
    '''
    
    # Configure model builder on how the model is exported
    model_builder.add_meta_graph_and_variables(
        session,
        tags=[tf.saved_model.tag_constants.SERVING],
        signature_def_map={
            tf.saved_model.signature_constants.DEFAULT_SERVING_SIGNATURE_DEF_KEY: signature_def
        }
    )
    
    # Save the model builder
    model_builder.save()