# Training Models

### Training and evaluating the model
1. Get dataset
2. Scale dataset
3. Train the neural network model
4. Evaluate the model
5. Test on new data (predictions)
6. Save and load the model for future use
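
Step 2 in the list above is not shown in this notebook, which loads files that are already scaled. As a rough sketch only, assuming the data was min-max scaled into the range 0 to 1, the idea looks like this in plain Python. The function names and the sample figures are hypothetical and are not taken from the sales data set.

```python
def fit_min_max(values, low=0.0, high=1.0):
    """Return (scale, offset) so that value * scale + offset lies in [low, high]."""
    v_min, v_max = min(values), max(values)
    scale = (high - low) / (v_max - v_min)
    offset = low - v_min * scale
    return scale, offset


def apply_scaling(values, scale, offset):
    """Apply the linear transform value * scale + offset to every value."""
    return [v * scale + offset for v in values]


# Hypothetical earnings figures, used only to illustrate the transform
earnings = [50000.0, 125000.0, 200000.0]
scale, offset = fit_min_max(earnings)
scaled = apply_scaling(earnings, scale, offset)
print(scaled)
```

The same `scale` and `offset` found on the training data would be reused for the test data and for any new data, so that all inputs share one scale.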

### 2. Train the model

In [None]:
import pandas as pd 
from keras.models import Sequential
from keras.layers import Dense

# Get data
train = pd.read_csv('Data/sales_data_training_scaled.csv')
test = pd.read_csv('Data/sales_data_test_scaled.csv')

# Split train data into X (features) and y (target) arrays
# The column name must match the one used for the test data below
X = train.drop('total_earnings', axis=1).values
y = train[['total_earnings']].values

# Create a new sequential neural network (Keras Sequential API)
model = Sequential()

# Add dense layers; the first layer also declares the 9 input features
model.add(Dense(50, input_dim=9, activation='relu')) # first layer
model.add(Dense(100, activation='relu')) # second layer
model.add(Dense(50, activation='relu')) # third layer
model.add(Dense(1, activation='linear')) # output layer, predicts 1 value

# Compile the model
model.compile(loss='mean_squared_error', optimizer='adam')

# Train the model
model.fit(X,             # training features
          y,             # expected output
          epochs=50,     # number of passes over the training data
          shuffle=True,  # shuffle the training data before each epoch
          verbose=2)     # print one line of progress per epoch

# Split test data into X and y input arrays
X_test = test.drop('total_earnings', axis=1).values
y_test = test[['total_earnings']].values

# Measure the loss (mean squared error) on the testing data
test_error_rate = model.evaluate(X_test, y_test, verbose=0) # verbose=0 suppresses progress output

# Print error rate
print('The mean squared error (MSE) for the test data set is: {}'.format(test_error_rate))
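
Because the model is compiled with `loss='mean_squared_error'` and no extra metrics, `model.evaluate` returns a single number. The sketch below shows what that number measures, using plain Python and made-up values rather than the real test set. The function name and sample data are hypothetical, and this is not how Keras computes the value internally.

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between targets and predictions."""
    if len(y_true) != len(y_pred):
        raise ValueError('y_true and y_pred must have the same length')
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return sum(squared_errors) / len(squared_errors)


# Hypothetical scaled targets and predictions, for illustration only
y_true = [0.25, 0.5, 0.75]
y_pred = [0.25, 0.75, 0.5]

mse = mean_squared_error(y_true, y_pred)
print('MSE: {}'.format(mse))
```

Since the data is scaled, the error is expressed in scaled units rather than dollars, so it is mainly useful for comparing models trained on the same data.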

### 3. Making predictions

Use the trained model to make predictions for new data. The following code continues directly from the training step above.

In [None]:
# Load the new data
X_new = pd.read_csv('Data/proposed_new_product.csv').values

# Make predictions with the neural network
# Keras assumes we may ask for several predictions, each with several output values,
# so predict() always returns a 2D array
prediction = model.predict(X_new)

# Get the first element of the first prediction
first_prediction = prediction[0][0]

# Reverse the scaling to get the actual number (dollars)
# The constants are the offset and multiplier that were applied when the data set was scaled, so we undo them here
first_prediction = first_prediction + 0.1159
first_prediction = first_prediction / 0.000003698

# Print the prediction (about $265k)
print('Earnings Prediction for the First Proposed Product - ${}'.format(first_prediction))
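
The two arithmetic steps above can be wrapped in a small helper. This is a minimal sketch, assuming the earnings column was scaled as `scaled = dollars * scale - offset`, which is the relationship the two constants in the notebook imply. The function names and the sample prediction are hypothetical.

```python
SCALE = 0.000003698   # multiplier used in the notebook's reverse-scaling step
OFFSET = 0.1159       # offset used in the notebook's reverse-scaling step


def to_dollars(scaled_value, scale=SCALE, offset=OFFSET):
    """Undo the scaling: add the offset back, then divide by the multiplier."""
    return (scaled_value + offset) / scale


def to_scaled(dollars, scale=SCALE, offset=OFFSET):
    """Assumed forward transform, included only to check the round trip."""
    return dollars * scale - offset


# Hypothetical 2D prediction array with one row and one output value
predictions = [[0.5]]
first_scaled = predictions[0][0]

dollars = to_dollars(first_scaled)
print('Earnings prediction: ${:,.2f}'.format(dollars))
```

Keeping the constants in one place makes it harder for the prediction code to drift out of sync with the scaling step if the data is ever rescaled.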

### 4. Saving and loading models

Hierarchical Data Format (HDF) is a set of file formats (HDF4, HDF5) designed to store and organize large amounts of data.

It is a binary format well suited to storing large numerical arrays, which is why Keras can use it to save a model's structure and trained weights in a single file.

In [None]:
# Save the model to disk (structure of neural network and trained weights)
model.save('trained_model.h5') # pass in the file name; the .h5 extension stores it in HDF5 format

print('Model is saved to disk as trained_model.h5')

In [None]:
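# Load the model (load_model is a function in keras.models, not a method on a model)
from keras.models import load_model

model = load_model('trained_model.h5')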