# Sales Prediction


Example code and text mostly lifted from  https://www.lynda.com/Google-TensorFlow-tutorials/


## Imports

In [1]:
# fullwidth notebook cells
from IPython.core.display import display, HTML
display(HTML("<style>.container { width:100% !important; }</style>"))

In [2]:

# suppress tensorflow FutureWarning
import warnings
with warnings.catch_warnings():
    warnings.filterwarnings("ignore",category=FutureWarning)
    import h5py
import tensorflow as tf

# for reading & slicing data
import pandas as pd

# for data preprocessing
from sklearn.preprocessing import MinMaxScaler

  return f(*args, **kwds)
  return f(*args, **kwds)
  return f(*args, **kwds)
  return f(*args, **kwds)


***

## Loading Data

### training data

`sales_data_training.csv` : Dataset of video games sold by an imaginary video game retailer. 

> We'll use this data to train the neural network that will predict how much money we can expect future video games to earn based on our historical data.

In [3]:
training_data_path = "data/sales_data_training.csv"
training_data_df = pd.read_csv(training_data_path, dtype = float)

# get a sense of the features present in the data
training_data_df.head()

Unnamed: 0,critic_rating,is_action,is_exclusive_to_us,is_portable,is_role_playing,is_sequel,is_sports,suitable_for_kids,total_earnings,unit_price
0,3.5,1.0,0.0,1.0,0.0,1.0,0.0,0.0,132717.0,59.99
1,4.5,0.0,0.0,0.0,0.0,1.0,1.0,0.0,83407.0,49.99
2,3.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,62423.0,49.99
3,4.5,1.0,0.0,0.0,0.0,0.0,0.0,1.0,69889.0,39.99
4,4.0,1.0,0.0,1.0,0.0,1.0,0.0,1.0,161382.0,59.99


> We'll use TensorFlow to build a neural network that tries to predict the total earnings of a new game, based on the other characteristics.

*Separate the training data*
 - Split the training data into the inputs `X_training` and their known outputs `Y_training`

> The X group is data about each video game that we'll pass into the neural network, and the Y group are the values we want to predict.

In [4]:
# drop the total_earnings column from the X data (data to train with)
# axis = 1 drops the column
X_training = training_data_df.drop('total_earnings', axis = 1).values

# retain only the total_earnings column (value to predict) for the Y data
Y_training = training_data_df[['total_earnings']].values

### testing data

`sales_data_test.csv`: Another dataset of videogames.

> Load in and seperate the testing data using the same logic as above

In [5]:
testing_data_path = "data/sales_data_test.csv"
testing_data_df = pd.read_csv(testing_data_path, dtype = float)

X_testing = testing_data_df.drop('total_earnings', axis = 1).values
Y_testing = testing_data_df[['total_earnings']].values

> The machine learning system will only get to see the training data set during the training phase. Then we'll use this test data to check the accuracy of the predictions from our neural network.

***

## Preprocessing Data

> We need to pre-process our data. If the numbers in one column are large but the numbers in another column are small, the neural network training won't work very well. 

> In order to train the neural network, we want to scale all the numbers in each column of our data set to be between the value of 0 and 1.  One way we can do this is to use the MinMaxScaler object from the popular scikit-learn library.

In [6]:
# Create scalars for inputs and outputs
X_scaler = MinMaxScaler(feature_range = (0,1))
Y_scaler = MinMaxScaler(feature_range = (0,1))

`fit_transform()`: fit to our data, and then transform the data using that fit. 

> The scaler fits the data by multiplying it by a constant and adding a constant.

In [7]:
# Scale training input and output
X_scaled_training = X_scaler.fit_transform(X_training)
Y_scaled_training = Y_scaler.fit_transform(Y_training)

`transform()`: transform the data using the pre-computed fit
> We want to make sure the test data is scaled by the same amount as the training data. 

In [8]:
# Scale testing data using same scaler
X_scaled_testing = X_scaler.transform(X_testing)
Y_scaled_testing = Y_scaler.transform(Y_testing)

`scale_`: scalaing factor

`min_`: additive term

> The predictions made with the NN will be scaled. We'll need to transform back to our original units and scale.

In [9]:
msg = 'Note: Y values were scaled by multiplying by {:.10f} and adding {:.4f}'
print(msg.format(Y_scaler.scale_[0], Y_scaler.min_[0]))

Note: Y values were scaled by multiplying by 0.0000036968 and adding -0.1159


***

## Define a model

> Our training data set has nine input features, so we need nine inputs in our neural network. 
<br>We are only predicting a single value, so have only one output.

In [10]:
# Define how many inputs and outputs are in our neural network
number_of_inputs = 9
number_of_outputs = 1


> Let's have three layers in their neural network that will train to find the relationship between the inputs and the output. There are many different types of layers you can use in the neural network, but we're going to use the most straightforward type, a fully connected neural network layer. That means that every node in each layer is connected to every node in the following layer.

> The first layer will have 50 nodes, the second layer will have 100 nodes, and the third layer will have 50 nodes again. To me, these layer sizes seem like a good starting point, but it's just a guess. Once the neural network is coded we can test out different layer sizes to see what layer size gives us the best accuracy.

![title](img/sales_prediction_model_diagram.png)

In [11]:
# define how many neurons we want in each layer of our NN
layer_1_nodes = 50
layer_2_nodes = 100
layer_3_nodes = 50

> TODO: Explain the below cell

In [12]:
# Define model parameters
learning_rate = 0.001
training_epochs = 100
display_step = 5

***

### Define the layers of the NN

##### Input layer

`variable_scope()`: A context manager for defining operations that creates variables (layers)

> Normally in Python we organize our code by creating new functions. In TensorFlow we use variable scopes.

<br>
> Any variables we create within the scope "input" will automatically get a prefix of "input" to their name internally in TensorFlow.

> TensorFlow has the ability to generate diagrams of the computational graph. By putting our nodes into scopes it helps TensorFlow generate more useful diagrams that are easier to understand. Everything within the same scope will be grouped together within the diagram.

`placeholder()`: Inserts a placeholder for a tensor that will be always fed

`X`: The input to our NN, a placeholder.

> Our neural network should accept nine floating point numbers as the input for making predictions, but each time we want a new prediction the specific values we pass in will be different. We use a placeholder node to represent that.

<br>
> When we create a new node we need to tell it what type of tensor it will accept. The data we are passing into our network will be floating point numbers, `tf.float32`. 

<br>
`shape = (None, number_of_inputs)`: The shape of the tensor for the model to expect.
> 
 - `None` tells TensorFlow our neural network can mix up batches of any size
 - `number_of_inputs` tells it to expect nine values for each record in the batch.

In [13]:
# Input Layer
with tf.variable_scope('input'):
    X  = tf.placeholder(tf.float32, shape = (None, number_of_inputs))

#### Fully connected layers

> Each fully connected layer of the neural network has three parts.
 - A bias value for each node
 - A weight value for each connection between each node and the nodes in the previous layer.
 - An activation function that outputs the result of the layer.

##### Layer 1

`get_variable()`: Creates a new variable or gets an existing variable. 

##### biases
`biases`: store the bias values for each node
>  This will be a variable instead of a placeholder because we want TensorFlow to remember the value over time.

`shape = [layer_1_nodes]`

> There's one bias value for each node in this layer, so the shape should be the same as the number of nodes in the layer.

`initializer = tf.zeros_initializer()`

> We need to tell TensorFlow the initial value of this variable. We can tell TensorFlow how to set the initial value of a variable by passing it one of TensorFlow's built-in initializer functions. We want the bias values for each node to default to zero, so use `tf.zeros_initializer()`

##### weights

`weights`: Store the weights for this layer

`shape  = [number_of_inputs, layer_1_nodes]`
> We want to have one weight for each node's connection to each node in the previous layer.  We'll say shape equals an array, one side of the array will be number of inputs, and the other side will be layer_1_nodes.

`initializer = tf.contrib.layers.xavier_initializer()`
> With neural networks, a lot of research has gone into the best initial values to use for weights. A good choice is an algorithm called Xavier initialization. 

##### layer output
`tf.matmul()`: Multiplies two matrices

`tf.nn.relu()`: Recified linear unit https://en.wikipedia.org/wiki/Rectifier_(neural_networks)

`layer_1_output`
> The last part of defining this layer is multiplying the weights by the inputs and calling an activation function.

> We're going to use matrix multiplication and a standard rectified linear unit or relu activation function. 

> We multiply the inputs, X, by the weights in this layer. To that we'll add the biases. Then we wrap that with a call to `relu()`.

In [14]:
# Layer 1
with tf.variable_scope('layer_1'):
    
    biases = tf.get_variable(name = "biases1",
                             shape = [layer_1_nodes],
                             initializer = tf.zeros_initializer())
    
    weights = tf.get_variable(name = "weights1",
                              shape  = [number_of_inputs, layer_1_nodes],
                         initializer = tf.contrib.layers.xavier_initializer())


    layer_1_output = tf.nn.relu(tf.matmul(X, weights) + biases)

  return f(*args, **kwds)


##### Layer 2 
> Similar to layer 1
 - change names
 - change shapes
 - change `matmul()` to take in the previous layer

In [15]:
# Layer 2
with tf.variable_scope('layer_2'):
    
    biases = tf.get_variable(name = "biases2",
                             shape = [layer_2_nodes],
                             initializer = tf.zeros_initializer())
    
    weights = tf.get_variable(name = "weights2",
                              shape  = [layer_1_nodes, layer_2_nodes],
                         initializer = tf.contrib.layers.xavier_initializer())


    layer_2_output = tf.nn.relu(tf.matmul(layer_1_output, weights) + biases)

##### Layer 3
> Similar to layer 2
 - change names
 - change shapes
 - change `matmul()` to take in the previous layer

In [16]:
# Layer 3
with tf.variable_scope('layer_3'):
    
    biases = tf.get_variable(name = "biases3",
                             shape = [layer_3_nodes],
                             initializer = tf.zeros_initializer())
    
    weights = tf.get_variable(name = "weights3",
                              shape  = [layer_2_nodes, layer_3_nodes],
                         initializer = tf.contrib.layers.xavier_initializer())


    layer_3_output = tf.nn.relu(tf.matmul(layer_2_output, weights) + biases)

##### Output layer 
> Similar to layer 3
 - change names
 - change shapes - use number_of_outputs
 - change `matmul()` to take in the previous layer

In [17]:
# Output layer

with tf.variable_scope('output'):
    
    biases = tf.get_variable(name = "biases_out",
                             shape = [number_of_outputs],
                             initializer = tf.zeros_initializer())
    
    weights = tf.get_variable(name = "weights_out",
                              shape  = [layer_3_nodes, number_of_outputs],
                         initializer = tf.contrib.layers.xavier_initializer())


    prediction = tf.nn.relu(tf.matmul(layer_3_output, weights) + biases)

#### Define the cost function that the NN will use to measure predictions

`Y`: A node for the expected value that we'll feed in during training

> Just like the input values it will be a placeholder node because we'll feed in a new value each time.
 - For the shape in this case we'll pass in `(None, 1)` since there's just one single output.

`cost`
> A cost function, also called a lost function tells us how wrong the neural network is when trying to predict the correct output for a single piece of training data.

`squared_difference()`: Element wise (x-y)(x-y)
`reduce_mean()`: Mean of elements across tensor dimensions
> To measure the cost we'll calculate the mean squared error between what the neural network predicted and what we expected it to calculate. To do that we'll call the `squared_difference()` function and pass in the actual predication and the expected value.

> To turn it into a mean square difference, we want to get the average value of that difference. So we'll wrap that with a call to `reduce_mean()`. 

In [18]:
with tf.variable_scope('cost'):
    
    Y = tf.placeholder(tf.float32, shape = (None, 1))
    cost = tf.reduce_mean(tf.squared_difference(prediction, Y))

#### Define the optimizer
`AdamOptimizer`: Powerful standard optimizer.

>The very last step is to create an optimizer operation that TensorFlow can call to train the network.
> We just need to pass in the learning rate which we've already pre-defined above. 

`minimize(cost)`: Which variable we want it to minimize.
> Tells TensorFlow that whenever we tell it to execute the optimizer, it should run one iteration of the Adam optimizer in an attempt to make the cost value smaller.

In [19]:
with tf.variable_scope('train'):
    optimizer = tf.train.AdamOptimizer(learning_rate).minimize(cost)

### Enable logging (for TensorBoard)

> Being able to visualize your data is very helpful. However, you have to tell TensorFlow to log the values you want to visualize.

`summary.scalar()`
> In TensorFlow, we log values by creating special operations in our graph called summary operations. These operations take in the value and create log data in a format that TensorBoard can understand. Then, we pass that summary data to a TensorFlow file writer object to save it to disk.

`tf.summary.scalar('current_cost', cost)`: Represents the value we are logging
> We can run this node by calling session.run on it just like any other node in our graph. Running it will generate the log data in the right format.

`summary = tf.summary.merge_all()`: Merge all summaries, helper function.
> Sometimes you'll want to log many different metrics. It can be tedious to have to call session.run on every single metric so TensorFlow has a shortcut. 

> Automatically executes all the summary nodes in your graph without you having to explicitly list them all

In [20]:
with tf.variable_scope('logging'):
    tf.summary.scalar('current_cost', cost)
    summary = tf.summary.merge_all()

#### Saving the model

`train.Saver()`: To save our model after we train it.

In [21]:
saver = tf.train.Saver()

### Setup the model training loop

#### setup the session

`tf.Session()`: For running TF sessions. (`InteractiveSession()` for code run over multiple cells))
> Within a session, we can ask TensorFlow to execute commands by calling `session.run()` and then we can pass in the command we want TensorFlow to execute. Those can either be global commands that TensorFlow provides or specific nodes in our graph that we want to execute. 

`session.run(tf.global_variables_initializer())`
> The first command we always run is the built in command to tell TensorFlow to initialize all variables in our graph to their default values.

`session.run(optimizer)`
> To train our neural network, we'll run its optimizer function over and over in the loop, either a fixed number of times or until it hits an accuracy level we want.  Inside the loop we'll tell TensorFlow to execute a single training pass over the training data by calling the optimizer function.

> We can do this by calling `session.run()` and then pass in a reference to the operation that we want to call. In this case, that's the optimizer operation that we defined above. 

`feed_dict`: Dictionary contating the data the optimizer needs to run
> It needs the training data and the expected results for this training pass. In our computational graph, we have a placeholder node called `X` that accepts the training data and a placeholder node called `Y` that accepts the expected results. To feed values into a placeholder node, we can pass them in as a parameter called feed_dict.

`summary.FileWriter()`: Write summary data to files.
> To create the log files to save our data to. 
 - If you put multiple log files in the same top-level folder, TensorBoard will show them all together and let you flip between them.

<br>
> We run the session once for each epoch, logging data every 5. After training we compare a prediction with known data, and save the model for future use.

In [22]:
# initialize a session to run TF operations
with tf.Session() as session:
    
    # Run the global variable initilizer to init all variables and layers
    session.run(tf.global_variables_initializer())
    
    # Create log writers to record training progress
    # Store training and testing data separately
    training_writer = tf.summary.FileWriter('logs/training', session.graph)
    testing_writer = tf.summary.FileWriter('logs/testing', session.graph)
    
    
    # Run the optimizer over and over to train the network
    # One epoch is one full run through the training data set
    for epoch in range(training_epochs):
        
        # Feed in the training data and do one stepf NN training
        session.run(optimizer,
                    feed_dict = {X: X_scaled_training, Y: Y_scaled_training})

        # Every 5 steps, log our process
        if epoch % 5 == 0:
            
            training_feed = {X: X_scaled_training, Y: Y_scaled_training}
            training_cost, training_summary = session.run([cost, summary],
                                                    feed_dict = training_feed)
            
            testing_feed =  {X: X_scaled_testing, Y: Y_scaled_testing}
            testing_cost, testing_summary = session.run([cost, summary],
                                                     feed_dict = testing_feed)
            
            print(epoch, training_cost, testing_cost)
            
            # write the current status to the log files
            training_writer.add_summary(training_summary, epoch)
            testing_writer.add_summary(testing_summary, epoch)
            
    print('Training done')
    
    final_training_cost = session.run(cost,
                                      feed_dict = {X: X_scaled_training,
                                                   Y: Y_scaled_training})
    
    final_testing_cost = session.run(cost,
                                     feed_dict = {X: X_scaled_testing,
                                                  Y: Y_scaled_testing})

    print('Final Training Cost: {}'.format(final_training_cost))
    print('Final Testing Cost: {}'.format(final_testing_cost))
    
    
    # Now that the NN is trained, lets use it to make predictions.
    # pass in the X testing data and run the prediction operation
    Y_prediction_scaled = session.run(prediction,
                                      feed_dict = {X:X_scaled_testing})

    # Unscale the data back to its original units (dollar$)
    Y_predicted = Y_scaler.inverse_transform(Y_prediction_scaled)
    
    # actual earnings of 0th game
    real_earnings = testing_data_df['total_earnings'].values[0]
    
    # predicted_earnings of 0th game
    predicted_earnings = Y_predicted[0][0]
    
    print('The actual earnings of Game #1 were ${}'.format(real_earnings))
    
    msg = 'The predicted earnings of Game #1 were ${}'
    print(msg.format(predicted_earnings))
    
    model_save_location = "logs/trained_model.ckpt"
    save_path = saver.save(session, model_save_location)
    print('Model saved: {}'.format(save_path))


0 0.113659315 0.12273928
5 0.113659315 0.12273928
10 0.113659315 0.12273928
15 0.113659315 0.12273928
20 0.113659315 0.12273928
25 0.113659315 0.12273928
30 0.113659315 0.12273928
35 0.113659315 0.12273928
40 0.113659315 0.12273928
45 0.113659315 0.12273928
50 0.113659315 0.12273928
55 0.113659315 0.12273928
60 0.113659315 0.12273928
65 0.113659315 0.12273928
70 0.113659315 0.12273928
75 0.113659315 0.12273928
80 0.113659315 0.12273928
85 0.113659315 0.12273928
90 0.113659315 0.12273928
95 0.113659315 0.12273928
Training done
Final Training Cost: 0.1136593148112297
Final Testing Cost: 0.122739277780056
The actual earnings of Game #1 were $247537.0
The predicted earnings of Game #1 were $31355.0
Model saved: logs/trained_model.ckpt


> The logs can be viewed in tensorboard

in terminal:
`tensorboard --logdir =logs`

 - then open a new browser in the link provided: `localhost:6006`


![Current Cost](img/logged_current_cost_plot_ex.png)

<img src="img/graph.png" width="700"/>