# Keras Tutorial

## Author: Samyakh Tukra

Every deep learning model follows a similar workflow:

1. Preprocess the data (this is also where you split it into training and test sets).
2. Build the neural network model and compile it for training.
3. Train the model.
4. Test (evaluate) the model on the held-out test data.
5. Deploy the model (make predictions, i.e. run inference).

Here I will walk through Keras, a popular high-level neural network API, using TensorFlow as the backend computing engine.
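Step 1 mentions splitting the data. In this notebook the split has already been done (there are separate training and test CSV files). If you start from one combined table instead, a minimal sketch with scikit-learn's `train_test_split` could look like the following. The synthetic array and the 75/25 ratio are illustrative assumptions, not the author's setup.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for a combined data set: 100 rows, 10 columns
# (9 feature columns plus one target column, mirroring the layout used below)
rng = np.random.default_rng(seed=0)
data = rng.random((100, 10))

# Hold out 25% of the rows as a test set; random_state makes the split repeatable
train_rows, test_rows = train_test_split(data, test_size=0.25, random_state=42)

print(train_rows.shape, test_rows.shape)
```

The rows are shuffled before splitting by default, so the test set is a random sample rather than simply the last rows of the file.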

In [None]:
# The first step is always to preprocess the data! (I'll be using the 2007 video game sales data I got from Kaggle.)
# I'll try to use a NN to predict the total earnings of a new game based on this training data.

''' However, the problem with my data is that the values in the total_earnings column span a wide range,
from small values all the way up to very large numbers, so it is important to scale the data to the range 0-1.
NNs train best when the data in every column is scaled to the same range!'''

import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Load training data set from CSV file
training_data_df = pd.read_csv("sales_data_training.csv") # passing in the file name/path

# Load testing data set from CSV file
test_data_df = pd.read_csv("sales_data_test.csv")

# Data needs to be scaled to a small range like 0 to 1 for the neural
# network to work well.
scaler = MinMaxScaler(feature_range=(0, 1))  # MinMaxScaler rescales each column independently into the range 0 to 1

# Scale both the training inputs and outputs
scaled_training = scaler.fit_transform(training_data_df)  # fit: learn each column's min/max from the training data; transform: apply the scaling
scaled_testing = scaler.transform(test_data_df)  # The test data MUST be scaled by the same amounts as the training data, so only call transform (not fit_transform)

# Print out the adjustment that the scaler applied to the total_earnings column of data
print("Note: total_earnings values were scaled by multiplying by {:.10f} and adding {:.6f}".format(scaler.scale_[8], scaler.min_[8]))

# Save copies of the scaled data as new CSV files:

# Create new pandas DataFrame objects from the scaled data
scaled_training_df = pd.DataFrame(scaled_training, columns=training_data_df.columns.values)
scaled_testing_df = pd.DataFrame(scaled_testing, columns=test_data_df.columns.values)

# Save scaled data dataframes to new CSV files
scaled_training_df.to_csv("sales_data_training_scaled.csv", index=False)
scaled_testing_df.to_csv("sales_data_test_scaled.csv", index=False)  # same file name that the test cell loads later

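MinMaxScaler works column by column: with `feature_range=(0, 1)` it maps each value x to (x - col_min) / (col_max - col_min), learning col_min and col_max from the data passed to `fit`. scikit-learn stores this as `scaled = x * scale_ + min_`, which is what the print statement above reports for total_earnings. The sketch below checks that formula against scikit-learn on a small made-up column (the numbers are illustrative, not from the sales data) and shows why the test data only gets `transform`: values outside the training range can land outside [0, 1].

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Made-up "earnings" column (one feature, five rows) used only for illustration
train = np.array([[100.0], [250.0], [400.0], [700.0], [1100.0]])
test = np.array([[50.0], [600.0], [1300.0]])

demo_scaler = MinMaxScaler(feature_range=(0, 1))
scaled_train = demo_scaler.fit_transform(train)  # learns min=100, max=1100 from the training data only

# The same mapping written out by hand: (x - min) / (max - min)
col_min, col_max = train.min(axis=0), train.max(axis=0)
manual = (train - col_min) / (col_max - col_min)
assert np.allclose(scaled_train, manual)

# scikit-learn stores it as x * scale_ + min_
assert np.allclose(scaled_train, train * demo_scaler.scale_ + demo_scaler.min_)

# Test data reuses the training min/max, so it can fall slightly outside [0, 1]
scaled_test = demo_scaler.transform(test)
print(scaled_test.ravel())
```

Fitting the scaler on the test data as well would silently use a different min/max, so the same scaled value would mean different dollar amounts in the two sets.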


## Now that the data is preprocessed, it's time to code the actual NN model!

In [None]:
import pandas as pd
from keras.models import Sequential
from keras.layers import Dense

training_data_df = pd.read_csv("sales_data_training_scaled.csv")

X = training_data_df.drop('total_earnings', axis=1).values
Y = training_data_df[['total_earnings']].values

# Define the model: a stack of fully connected (Dense) layers
model = Sequential()
# The first layer must be told the size of the input: there are 9 feature columns
# (every column except total_earnings), so we need input_dim=9
model.add(Dense(50, input_dim=9, activation='relu'))
model.add(Dense(100, activation='relu'))
model.add(Dense(50, activation='relu'))
model.add(Dense(1, activation='linear'))  # one output node: the predicted (scaled) earnings
# Compile the model: mean squared error loss (this is a regression problem) and the Adam optimizer
model.compile(loss="mean_squared_error", optimizer="adam")

'''The final output of our Neural Network should be a single number that represents the amount of money we predict a single game
will earn. So the last layer of our Neural Network needs to have exactly one output node.'''

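Each Dense layer computes activation(inputs @ W + b), where W has one row per input and one column per node. The sketch below rebuilds that computation in NumPy for the same layer sizes (9, 50, 100, 50, 1) with random weights, purely to show the shapes and to count the trainable parameters. It does not reproduce Keras's weight initialization or any trained values.

```python
import numpy as np

# Layer sizes used in the model above: 9 inputs, then 50, 100, 50 and 1 nodes
layer_sizes = [9, 50, 100, 50, 1]

rng = np.random.default_rng(seed=0)
# One (W, b) pair per Dense layer; random values stand in for learned weights
params = [
    (rng.normal(size=(n_in, n_out)), np.zeros(n_out))
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
]

def relu(z):
    return np.maximum(z, 0.0)

def forward(x):
    """Apply each Dense layer: ReLU on the hidden layers, linear on the output layer."""
    for i, (W, b) in enumerate(params):
        z = x @ W + b
        x = z if i == len(params) - 1 else relu(z)
    return x

# Trainable parameters per layer: n_in * n_out weights + n_out biases
total_params = sum(W.size + b.size for W, b in params)
print(total_params)  # 10701

batch = rng.random((4, 9))  # 4 hypothetical rows of 9 scaled features
print(forward(batch).shape)  # (4, 1)
```

For this architecture, `model.summary()` in Keras should report the same total of 10,701 trainable parameters (500 + 5,100 + 5,050 + 51).
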
In [None]:
# Train the model with model.fit. We pass in:
#  - the training inputs and expected outputs (X and Y)
#  - epochs: the number of full passes through the training data
#  - shuffle=True: shuffle the training data randomly before each epoch
#  - verbose=2: print one summary line (including the loss) per epoch so we can watch training progress
model.fit(
    X,
    Y,
    epochs=50,
    shuffle=True,
    verbose=2
)

# Load the separate test data set
test_data_df = pd.read_csv("sales_data_test_scaled.csv")

X_test = test_data_df.drop('total_earnings', axis=1).values
Y_test = test_data_df[['total_earnings']].values

# To test the model, we use model.evaluate!

test_error_rate = model.evaluate(X_test, Y_test, verbose=0)  # returns the loss (MSE) on the test data loaded above
print("The mean squared error (MSE) for the test data set is: {}".format(test_error_rate))
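
Conceptually, `model.fit` with `epochs=50` and `shuffle=True` runs an outer loop over epochs, reshuffles the rows at the start of each one, and updates the weights once per mini-batch (Keras's default `batch_size` is 32). The sketch below shows that loop structure in NumPy for a single linear layer trained with plain gradient descent on mean squared error. It swaps in plain SGD for Adam and uses made-up data, so it illustrates the loop, not Keras's internals.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Made-up regression data: y = 2*x0 - x1 + 0.5 plus a little noise
X_demo = rng.random((200, 2))
y_demo = (2 * X_demo[:, 0] - X_demo[:, 1] + 0.5 + rng.normal(scale=0.01, size=200)).reshape(-1, 1)

W = np.zeros((2, 1))
b = np.zeros(1)
learning_rate = 0.1
batch_size = 32
epochs = 50

for epoch in range(epochs):
    order = rng.permutation(len(X_demo))  # shuffle=True: new row order every epoch
    for start in range(0, len(X_demo), batch_size):
        idx = order[start:start + batch_size]
        xb, yb = X_demo[idx], y_demo[idx]
        error = xb @ W + b - yb
        # Gradients of the mean squared error with respect to W and b
        grad_W = 2 * xb.T @ error / len(idx)
        grad_b = 2 * error.mean(axis=0)
        W -= learning_rate * grad_W
        b -= learning_rate * grad_b
    # Loss over the whole training set after this epoch (what verbose=2 would report)
    mse = np.mean((X_demo @ W + b - y_demo) ** 2)

print(f"final training MSE: {mse:.5f}")
```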

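`model.evaluate` reports the MSE in the scaled 0-to-1 units, because Y_test was scaled. Since the scaling of total_earnings is linear (scaled = dollars * scale + min), an error of e in scaled units equals e / scale in dollars, so the root-mean-squared error in dollars is sqrt(MSE) / scale. A small sketch, where `scale` stands for `scaler.scale_[8]` (about 0.0000036968, judging by the constants the notebook uses for re-scaling) and the MSE value is made up:

```python
import math

def mse(predictions, targets):
    """Mean squared error, the quantity Keras reports for loss='mean_squared_error'."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)

def rmse_in_dollars(scaled_mse, scale):
    """Undo the linear scaling: errors shrink by `scale`, so squared errors shrink by scale**2."""
    return math.sqrt(scaled_mse) / scale

print(mse([0.2, 0.5], [0.1, 0.5]))

scale = 0.0000036968  # stands in for scaler.scale_[8] from the preprocessing step
example_mse = 0.0001  # hypothetical test-set MSE in scaled units, not a real result
print(rmse_in_dollars(example_mse, scale))
```

The offset (`min_`) cancels out when taking differences, which is why only the scale factor appears here.
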
In [None]:
# Now let's make a prediction for new data:
X = pd.read_csv("proposed_new_product.csv").values  # details of a proposed product (a hypothetical video game); its values must be scaled the same way as the training inputs

# Make a prediction with the neural network
prediction = model.predict(X)
'''There's just one trick to watch out for here. Keras assumes that we may ask for multiple predictions, each with
multiple output values, so it always returns predictions as a 2D array (one row per input row, one column per output node).
Since we only care about the first value of the first prediction, we can just grab element [0][0]. That gives us the actual prediction.'''

# Grab just the first element of the first prediction (since we only have one)
prediction = prediction[0][0]

''' Also, remember that the training data we used to train the neural network was scaled so that all the values were in the
zero to one range. That means the neural network's predictions will also be in the zero to one range. To see them back
in the original units (dollars), we have to reverse that scaling.'''

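# Re-scale the data from the 0-to-1 range back to dollars
# These constants come from when the data was originally scaled down to the 0-to-1 range (the values printed there, rounded):
# the scaler computed scaled = dollars * 0.0000036968 - 0.1159, so we add 0.1159 and then divide by 0.0000036968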
prediction = prediction + 0.1159
prediction = prediction / 0.0000036968

print("Earnings Prediction for Proposed Product - ${}".format(prediction))

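`model.predict` returns a NumPy array of shape (number of input rows, number of output nodes), here (1, 1). The snippet below uses a made-up array of that shape to show equivalent ways of pulling out the single value; `.item()` additionally converts it to a plain Python float and raises an error if the array unexpectedly holds more than one value.

```python
import numpy as np

# Stand-in for model.predict(X) on one input row with one output node
demo_prediction = np.array([[0.42]], dtype=np.float32)

first_a = demo_prediction[0][0]   # index the row, then the column (as in the notebook)
first_b = demo_prediction[0, 0]   # same element with a single NumPy index
first_c = demo_prediction.item()  # plain Python float; raises ValueError if size != 1

print(demo_prediction.shape)  # (1, 1)
```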

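The two hard-coded constants implement the inverse of the MinMaxScaler formula for the total_earnings column: if scaled = dollars * scale_ + min_, then dollars = (scaled - min_) / scale_. The constants the notebook uses imply min_ of about -0.1159 and scale_ of about 0.0000036968, so subtracting min_ is the same as adding 0.1159. A sketch of that inverse as a function, plus a round-trip check; the sample dollar amount is made up. In the notebook itself, reading `scaler.min_[8]` and `scaler.scale_[8]` directly would avoid the rounding in the typed constants, as long as `scaler` is still in memory.

```python
def unscale(scaled_value, scale, offset):
    """Invert MinMaxScaler's per-column mapping scaled = value * scale + offset."""
    return (scaled_value - offset) / scale

def scale_value(value, scale, offset):
    """Apply the forward mapping, used here only for the round-trip check."""
    return value * scale + offset

SCALE = 0.0000036968  # scaler.scale_[8], as rounded in the notebook
OFFSET = -0.1159      # scaler.min_[8], as implied by the notebook's "+ 0.1159"

dollars = 150000.0  # hypothetical earnings figure, for the round trip only
scaled = scale_value(dollars, SCALE, OFFSET)
print(unscale(scaled, SCALE, OFFSET))
```
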
In [None]:
# Saving a neural network:

# Save the model to disk
model.save("trained_model.h5")
print("Model saved to disk.")



# Loading a saved model:
from keras.models import load_model

model = load_model('trained_model.h5')

X = pd.read_csv("proposed_new_product.csv").values
prediction = model.predict(X)

# Grab just the first element of the first prediction (since we only have one)
prediction = prediction[0][0]

# Re-scale the data from the 0-to-1 range back to dollars
# These constants are from when the data was originally scaled down to the 0-to-1 range
prediction = prediction + 0.1159
prediction = prediction / 0.0000036968

print("Earnings Prediction for Proposed Product - ${}".format(prediction))

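Because the prediction code re-types the scaling constants by hand, one option is to save them next to the model file so that the loading code can read them back. A minimal sketch using Python's `json` module; the file name `scaler_params.json` and the helper names are hypothetical, and the values written here are the rounded constants from above (in the notebook you would write `float(scaler.scale_[8])` and `float(scaler.min_[8])`). Saving the whole fitted scaler object, for example with `joblib.dump`, is another common option.

```python
import json
import os
import tempfile

def save_scaling(path, scale, offset):
    """Write the total_earnings scaling parameters to a small JSON file."""
    with open(path, "w") as f:
        json.dump({"scale": scale, "offset": offset}, f)

def load_scaling(path):
    """Read the parameters back as (scale, offset)."""
    with open(path) as f:
        params = json.load(f)
    return params["scale"], params["offset"]

def to_dollars(scaled_prediction, scale, offset):
    """Invert scaled = dollars * scale + offset."""
    return (scaled_prediction - offset) / scale

# Write to a temporary directory so the example leaves no files behind
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "scaler_params.json")  # hypothetical file name
    save_scaling(path, 0.0000036968, -0.1159)
    scale, offset = load_scaling(path)

print(to_dollars(0.3, scale, offset))
```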
