The goal of this notebook is to summarize the best models from each algorithm and save them so they can be evaluated on other datasets.

In [1]:
import numpy as np
import pandas as pd
from time import gmtime, strftime, time
import pickle

from sklearn.utils import shuffle
from sklearn.model_selection import train_test_split, ShuffleSplit, GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import RobustScaler


from sklearn.metrics import make_scorer, mean_squared_error, accuracy_score 

from sklearn.linear_model import Lasso
from sklearn.ensemble import RandomForestRegressor
import xgboost as xgb

import keras
from keras import models
from keras.utils.np_utils import to_categorical
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from keras.layers import Activation
from keras.wrappers.scikit_learn import KerasRegressor

import matplotlib.pyplot as plt
import seaborn as sns

from Extract_Data import extract_data

Using TensorFlow backend.


Before running the extract_data function, remember to update the data with the latest prices from finance.yahoo.com.

In [2]:
SPY_pca, SPY_PCA_df, SPY_df = extract_data('SPY', None)
SPY_PCA_df_rand = shuffle(SPY_PCA_df, random_state = 0)
SPY_df_rand = shuffle(SPY_df, random_state = 0)



In [36]:
SPY_PCA_model_pkl_filename = 'Models/SPY_PCA_model.pkl'
with open(SPY_PCA_model_pkl_filename, 'wb') as file:  
    pickle.dump(SPY_pca, file)

Get the datasets ready by separating out the features and labels. Note that there are 4 datasets:
1. PCA
2. PCA with random order
3. regular dataframe
4. regular dataframe with random order
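The next cell repeats the same feature and target column lists for each of the four datasets, and the vertical and horizontal tests later repeat them again. A small helper could select them in one place. This is a minimal sketch: `PCA_COLS`, `FEATURE_COLS`, `TARGET_COLS` and `split_xy` are hypothetical names, and the column names are copied from the cells in this notebook.

```python
import pandas as pd

# Column names copied from the notebook cells; the constant names are hypothetical.
PCA_COLS = ['Dimension {}'.format(i) for i in range(1, 8)]
FEATURE_COLS = ['Open', 'High', 'Low', 'Adj Close', 'Volume', 'Range',
                'MA5 Adj Close', 'MA5 Volume',
                'MA5 Adj Close pct_change', 'MA5 Volume pct_change']
TARGET_COLS = ['Adj Close 1day', 'Adj Close 5day',
               'Adj Close 1day pct_change', 'Adj Close 5day pct_change',
               'Adj Close 1day pct_change cls', 'Adj Close 5day pct_change cls']


def split_xy(df, feature_cols, target_cols=TARGET_COLS):
    """Return (X, y) DataFrames selected from df by column name."""
    return df[feature_cols], df[target_cols]
```

For example, `SPY_df_X, SPY_df_y = split_xy(SPY_df, FEATURE_COLS)` or `SPY_PCA_X, SPY_PCA_y = split_xy(SPY_PCA_df, PCA_COLS)`.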

In [4]:
SPY_PCA_rand_X = SPY_PCA_df_rand[['Dimension 1', 'Dimension 2', 'Dimension 3', 'Dimension 4',
                                  'Dimension 5', 'Dimension 6', 'Dimension 7']]
SPY_PCA_rand_y = SPY_PCA_df_rand[['Adj Close 1day', 'Adj Close 5day',
                                  'Adj Close 1day pct_change', 'Adj Close 5day pct_change',
                                  'Adj Close 1day pct_change cls', 'Adj Close 5day pct_change cls']]

SPY_PCA_X = SPY_PCA_df[['Dimension 1', 'Dimension 2', 'Dimension 3', 'Dimension 4',
                        'Dimension 5', 'Dimension 6', 'Dimension 7']]
SPY_PCA_y = SPY_PCA_df[['Adj Close 1day', 'Adj Close 5day',
                        'Adj Close 1day pct_change', 'Adj Close 5day pct_change',
                        'Adj Close 1day pct_change cls', 'Adj Close 5day pct_change cls']]

SPY_df_rand_X = SPY_df_rand[['Open', 'High', 'Low', 'Adj Close', 'Volume', 'Range', 'MA5 Adj Close', 'MA5 Volume',
                             'MA5 Adj Close pct_change', 'MA5 Volume pct_change']]
SPY_df_rand_y = SPY_df_rand[['Adj Close 1day', 'Adj Close 5day', 'Adj Close 1day pct_change',
                             'Adj Close 5day pct_change', 'Adj Close 1day pct_change cls',
                             'Adj Close 5day pct_change cls']]

SPY_df_X = SPY_df[['Open', 'High', 'Low', 'Adj Close', 'Volume', 'Range', 'MA5 Adj Close', 'MA5 Volume',
                   'MA5 Adj Close pct_change', 'MA5 Volume pct_change']]
SPY_df_y = SPY_df[['Adj Close 1day', 'Adj Close 5day', 'Adj Close 1day pct_change',
                   'Adj Close 5day pct_change', 'Adj Close 1day pct_change cls',
                   'Adj Close 5day pct_change cls']]

Let's split the datasets and take a quick look at their shapes.

In [6]:
#split into train and test sets (random split for the shuffled data, chronological 80/20 split for the ordered data)
SPY_PCA_rand_Xtrain, SPY_PCA_rand_Xtest, SPY_PCA_rand_ytrain, SPY_PCA_rand_ytest = train_test_split(SPY_PCA_rand_X, SPY_PCA_rand_y, test_size = 0.2)

n_split = int(len(SPY_df_y) * 0.8)
SPY_PCA_Xtrain, SPY_PCA_ytrain = np.array(SPY_PCA_X)[:n_split, :], np.array(SPY_PCA_y)[:n_split] 
SPY_PCA_Xtest, SPY_PCA_ytest = np.array(SPY_PCA_X)[n_split:, :], np.array(SPY_PCA_y)[n_split:]

SPY_df_rand_Xtrain, SPY_df_rand_Xtest, SPY_df_rand_ytrain, SPY_df_rand_ytest = train_test_split(SPY_df_rand_X, SPY_df_rand_y, test_size = 0.2)

SPY_df_Xtrain, SPY_df_ytrain = np.array(SPY_df_X)[:n_split, :], np.array(SPY_df_y)[:n_split] 
SPY_df_Xtest, SPY_df_ytest = np.array(SPY_df_X)[n_split:, :], np.array(SPY_df_y)[n_split:]
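For the ordered datasets, the cell above keeps the first 80% of rows for training and the last 20% for testing, so the test period comes strictly after the training period. A standalone sketch of that chronological split, assuming the rows are sorted oldest first; `chronological_split` is a hypothetical helper, not a scikit-learn function.

```python
import numpy as np


def chronological_split(X, y, train_frac=0.8):
    """Split arrays by position: earliest rows train, latest rows test.

    Assumes rows are already in time order. No shuffling happens, so the
    test set lies strictly after the training set in time.
    """
    X, y = np.asarray(X), np.asarray(y)
    n_split = int(len(y) * train_frac)
    return X[:n_split], X[n_split:], y[:n_split], y[n_split:]
```

With the 6434 SPY rows used here, `int(6434 * 0.8)` is 5147, which matches the 5147 / 1287 shapes printed below.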

In [7]:
print(SPY_PCA_rand_Xtrain.shape, SPY_PCA_rand_Xtest.shape, SPY_PCA_rand_ytrain.shape, SPY_PCA_rand_ytest.shape)
print(SPY_PCA_Xtrain.shape, SPY_PCA_Xtest.shape, SPY_PCA_ytrain.shape, SPY_PCA_ytest.shape)

print(SPY_df_rand_Xtrain.shape, SPY_df_rand_Xtest.shape, SPY_df_rand_ytrain.shape, SPY_df_rand_ytest.shape)
print(SPY_df_Xtrain.shape, SPY_df_Xtest.shape, SPY_df_ytrain.shape, SPY_df_ytest.shape)

(5147, 7) (1287, 7) (5147, 6) (1287, 6)
(5147, 7) (1287, 7) (5147, 6) (1287, 6)
(5147, 10) (1287, 10) (5147, 6) (1287, 6)
(5147, 10) (1287, 10) (5147, 6) (1287, 6)


My target variables:
- 0: 'Adj Close 1day'
- 1: 'Adj Close 5day'
- 2: 'Adj Close 1day pct_change'
- 3: 'Adj Close 5day pct_change'
- 4: 'Adj Close 1day pct_change cls'
- 5: 'Adj Close 5day pct_change cls'

### Best performing:
#### Adj Close 1day: RandomForestRegressor Default with Original Randomized (Same as AAPL)

#### Adj Close 5day: RandomForestRegressor Default with Original Randomized (Same as AAPL)

#### Adj Close 1day pct_change: LSTM Optimized with Original
layer1 = 1024, layer2 = 256, epochs = 100, batch_size = 500

#### Adj Close 5day pct_change: RandomForestRegressor Default with Original Randomized

#### Adj Close 1day pct_change cls: XGBoost Optimized with Original
XGBClassifier(base_score=0.5, colsample_bylevel=1, colsample_bytree=1,
        gamma=0, learning_rate=0.01, max_delta_step=0, max_depth=6,
        min_child_weight=1, missing=None, n_estimators=1100, nthread=-1,
        objective='binary:logistic', reg_alpha=0.0001, reg_lambda=1,
        scale_pos_weight=1, seed=0, silent=True, subsample=1)

#### Adj Close 5day pct_change cls: XGBoost Optimized with Original
XGBClassifier(base_score=0.5, colsample_bylevel=1, colsample_bytree=1,
        gamma=0, learning_rate=0.01, max_delta_step=0, max_depth=8,
        min_child_weight=1, missing=None, n_estimators=1300, nthread=-1,
        objective='binary:logistic', reg_alpha=1e-05, reg_lambda=1,
        scale_pos_weight=1, seed=0, silent=True, subsample=1)

### Train Models and Save
#### Adj Close 1day: RandomForestRegressor Default with Original Randomized

In [9]:
original_perf = dict()

In [12]:
SPY_0_model = RandomForestRegressor()
SPY_0_model.fit(SPY_df_rand_Xtrain, SPY_df_rand_ytrain['Adj Close 1day'])

train_error = mean_squared_error(SPY_0_model.predict(SPY_df_rand_Xtrain), SPY_df_rand_ytrain['Adj Close 1day'])/ np.mean(SPY_df_rand_ytrain['Adj Close 1day'])
test_error = mean_squared_error(SPY_0_model.predict(SPY_df_rand_Xtest), SPY_df_rand_ytest['Adj Close 1day'])/ np.mean(SPY_df_rand_ytest['Adj Close 1day'])

print('Train error: {}% of mean'.format(train_error * 100))
print('Test error: {}% of mean'.format(test_error * 100))

original_perf['Adj Close 1day'] = [train_error, test_error]

Train error: 0.0006150283370063454% of mean
Test error: 0.002844558195920166% of mean
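The "% of mean" figures divide the mean squared error by the mean of the target, error = MSE(ŷ, y) / mean(y), and multiply by 100 when printing. A sketch of that metric as a function; `mse_over_mean` is a hypothetical name, and the denominator would normally come from the same split as the errors. For targets whose mean is close to zero, such as the pct_change columns, the ratio can become very large or even negative, so it is easiest to read for price-level targets.

```python
import numpy as np


def mse_over_mean(y_true, y_pred):
    """Mean squared error divided by the mean of y_true (x100 gives '% of mean')."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mse = np.mean((y_true - y_pred) ** 2)  # same value as sklearn's mean_squared_error
    return mse / np.mean(y_true)
```

For example, `mse_over_mean([1, 2, 3], [1, 2, 5])` is (4 / 3) / 2 = 2/3, while a target with a negative mean such as `[-1, -1]` turns a positive MSE into a negative score.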


In [13]:
SPY_0_model_pkl_filename = 'Models/SPY_0_model.pkl'
with open(SPY_0_model_pkl_filename, 'wb') as file:  
    pickle.dump(SPY_0_model, file)
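Each scikit-learn and XGBoost model is saved with the same open/dump pattern, and reloaded later with open/load. A pair of small helpers could hold that pattern; `save_pickle` and `load_pickle` are hypothetical names. Pickle files should only be loaded from sources you trust, and a pickled model may not load correctly under a different scikit-learn or xgboost version than the one that saved it.

```python
import pickle


def save_pickle(obj, path):
    """Serialize obj to path with pickle (binary mode)."""
    with open(path, 'wb') as file:
        pickle.dump(obj, file)


def load_pickle(path):
    """Load and return the object pickled at path."""
    with open(path, 'rb') as file:
        return pickle.load(file)
```

Usage would look like `save_pickle(SPY_0_model, 'Models/SPY_0_model.pkl')` and later `SPY_0_model = load_pickle('Models/SPY_0_model.pkl')`.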

#### Adj Close 5day: RandomForestRegressor Default with Original Randomized

In [15]:
SPY_1_model = RandomForestRegressor()
SPY_1_model.fit(SPY_df_rand_Xtrain, SPY_df_rand_ytrain['Adj Close 5day'])

train_error = mean_squared_error(SPY_1_model.predict(SPY_df_rand_Xtrain), SPY_df_rand_ytrain['Adj Close 5day']) / np.mean(SPY_df_rand_ytrain['Adj Close 5day'])
test_error = mean_squared_error(SPY_1_model.predict(SPY_df_rand_Xtest), SPY_df_rand_ytest['Adj Close 5day']) / np.mean(SPY_df_rand_ytest['Adj Close 5day'])

print('Train error: {}% of mean'.format(train_error * 100))
print('Test error: {}% of mean'.format(test_error * 100))

original_perf['Adj Close 5day'] = [train_error, test_error]

Train error: 0.0020762954066527667% of mean
Test error: 0.010555953793320328% of mean


In [16]:
SPY_1_model_pkl_filename = 'Models/SPY_1_model.pkl'
with open(SPY_1_model_pkl_filename, 'wb') as file:  
    pickle.dump(SPY_1_model, file)

#### Adj Close 1day pct_change: LSTM Optimized with Original
layer1 = 1024, layer2 = 256, epochs = 100, batch_size = 500

In [20]:
def window_transform_series(X, y, window_size):
    """Build (window, next-value) pairs for the LSTM.

    Sample i uses rows X[i : i + window_size] as input and y[i + window_size] as label.
    """
    # containers for input/output pairs
    X_result = []
    y_result = []
    for i in range(len(y) - window_size):
        X_result.append(X[i: i + window_size])
        y_result.append(y[i + window_size])

    # reshape: X to (n_samples, window_size, n_features), y to (n_samples, 1)
    X_result = np.asarray(X_result)
    X_result.shape = (np.shape(X_result)[0:3])

    y_result = np.asarray(y_result)
    y_result.shape = (len(y_result), 1)

    return X_result, y_result
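The windowing turns n rows into n - window_size samples: sample i holds rows i to i + window_size - 1 of X, and its label is y[i + window_size], the value right after the window. A vectorized sketch with the same output shapes, assuming NumPy 1.20 or later for `sliding_window_view` and that X and y have the same length; `make_windows` is a hypothetical name.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def make_windows(X, y, window_size):
    """Vectorized sketch of window_transform_series.

    Returns X_out with shape (n - window_size, window_size, n_features) and
    y_out with shape (n - window_size, 1), where X_out[i] == X[i:i + window_size]
    and y_out[i] == y[i + window_size].
    """
    X = np.asarray(X)
    y = np.asarray(y)
    n = len(y)
    # sliding_window_view puts the window axis last: (n - w + 1, n_features, w)
    views = sliding_window_view(X[:n], window_size, axis=0)
    # drop the last window (no label follows it) and move the window axis to position 1
    X_out = views[:-1].transpose(0, 2, 1).copy()
    y_out = y[window_size:].reshape(-1, 1)
    return X_out, y_out
```

With the SPY training arrays (5147 rows, 10 features) and `window_size = 5`, this gives inputs of shape (5142, 5, 10) and labels of shape (5142, 1).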

In [19]:
target_variable = 2
window_size = 5
batch_size = 500

Xtrain_LSTM, ytrain_LSTM = window_transform_series(SPY_df_Xtrain, SPY_df_ytrain[:, target_variable], window_size = window_size)
Xtest_LSTM, ytest_LSTM = window_transform_series(SPY_df_Xtest, SPY_df_ytest[:, target_variable], window_size = window_size)


good_size_train = Xtrain_LSTM.shape[0] - Xtrain_LSTM.shape[0] % batch_size
Xtrain_LSTM = Xtrain_LSTM[-good_size_train:]
ytrain_LSTM = ytrain_LSTM[-good_size_train:]

good_size_test = Xtest_LSTM.shape[0] - Xtest_LSTM.shape[0] % batch_size
Xtest_LSTM = Xtest_LSTM[-good_size_test:]
ytest_LSTM = ytest_LSTM[-good_size_test:]


SPY_2_model = Sequential()
SPY_2_model.add(LSTM(1024, batch_input_shape = (batch_size, Xtrain_LSTM.shape[1], Xtrain_LSTM.shape[2]),
               stateful = True, return_sequences = True))
SPY_2_model.add(LSTM(256))
SPY_2_model.add(Dense(1))
SPY_2_model.compile(loss = 'mean_squared_error', optimizer = 'adam')
SPY_2_model.fit(Xtrain_LSTM, ytrain_LSTM, epochs = 100, batch_size = batch_size, verbose = 1, shuffle = False)


<keras.callbacks.History at 0x23d3ba93390>
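Because the first LSTM layer is stateful with a fixed batch_input_shape, the model expects the number of samples passed to fit and predict to be a multiple of batch_size. The cell above therefore keeps only the most recent n - n % batch_size windows: 5142 training windows become 5000, and 1282 test windows become 1000. A standalone sketch of that trimming step, with `trim_to_batch_multiple` as a hypothetical helper name. It slices with `X[n - keep:]` because `X[-keep:]` returns the whole array when keep is 0.

```python
import numpy as np


def trim_to_batch_multiple(X, y, batch_size):
    """Keep the most recent samples so their count is a multiple of batch_size.

    Uses X[n - keep:] instead of X[-keep:] because X[-0:] would return the
    whole array when fewer than batch_size samples are available.
    """
    n = len(X)
    keep = n - n % batch_size
    return X[n - keep:], y[n - keep:]
```

For example, `trim_to_batch_multiple(Xtrain_LSTM, ytrain_LSTM, 500)` would return 5000 windows here.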

In [21]:
train_error = mean_squared_error(SPY_2_model.predict(Xtrain_LSTM, batch_size = batch_size), ytrain_LSTM) / np.mean(ytrain_LSTM)
test_error = mean_squared_error(SPY_2_model.predict(Xtest_LSTM, batch_size = batch_size), ytest_LSTM) / np.mean(ytest_LSTM)

print('Train error: {}% of mean'.format(train_error * 100))
print('Test error: {}% of mean'.format(test_error * 100))

original_perf['Adj Close 1day pct_change'] = [train_error, test_error]

Train error: 37.81070729071859% of mean
Test error: 14.055182927957716% of mean


In [22]:
SPY_2_model_pkl_filename = 'Models/SPY_2_model.h5'
SPY_2_model.save(SPY_2_model_pkl_filename)

#### Adj Close 5day pct_change: RandomForestRegressor Default with Original Randomized

In [24]:
SPY_3_model = RandomForestRegressor()
SPY_3_model.fit(SPY_df_rand_Xtrain, SPY_df_rand_ytrain['Adj Close 5day pct_change'])

train_error = mean_squared_error(SPY_3_model.predict(SPY_df_rand_Xtrain), SPY_df_rand_ytrain['Adj Close 5day pct_change']) / np.mean(SPY_df_rand_ytrain['Adj Close 5day pct_change'])
test_error = mean_squared_error(SPY_3_model.predict(SPY_df_rand_Xtest), SPY_df_rand_ytest['Adj Close 5day pct_change']) / np.mean(SPY_df_rand_ytest['Adj Close 5day pct_change'])

print('Train error: {}% of mean'.format(train_error * 100))
print('Test error: {}% of mean'.format(test_error * 100))

original_perf['Adj Close 5day pct_change'] = [train_error, test_error]

Train error: 4.963662841871456% of mean
Test error: 19.68931580372066% of mean


In [25]:
SPY_3_model_pkl_filename = 'Models/SPY_3_model.pkl'
with open(SPY_3_model_pkl_filename, 'wb') as file:  
    pickle.dump(SPY_3_model, file)

#### Adj Close 1day pct_change cls: XGBoost Optimized with Original
XGBClassifier(base_score=0.5, colsample_bylevel=1, colsample_bytree=1,
        gamma=0, learning_rate=0.01, max_delta_step=0, max_depth=6,
        min_child_weight=1, missing=None, n_estimators=1100, nthread=-1,
        objective='binary:logistic', reg_alpha=0.0001, reg_lambda=1,
        scale_pos_weight=1, seed=0, silent=True, subsample=1)

In [29]:
target_variable = 4
SPY_4_model = xgb.XGBClassifier(base_score=0.5, colsample_bylevel=1, colsample_bytree=1,
                                gamma=0, learning_rate=0.01, max_delta_step=0, max_depth=6,
                                min_child_weight=1, missing=None, n_estimators=1100, nthread=-1,
                                objective='binary:logistic', reg_alpha=0.0001, reg_lambda=1,
                                scale_pos_weight=1, seed=0, silent=True, subsample=1)
SPY_4_model.fit(SPY_df_Xtrain, SPY_df_ytrain[:, target_variable])
train_acc = accuracy_score(SPY_4_model.predict(SPY_df_Xtrain), SPY_df_ytrain[:, target_variable])
test_acc = accuracy_score(SPY_4_model.predict(SPY_df_Xtest), SPY_df_ytest[:, target_variable])

print('Train accuracy: {}%'.format(train_acc * 100))
print('Test accuracy: {}%'.format(test_acc * 100))

original_perf['Adj Close 1day pct_change cls'] = [train_acc, test_acc]

Train accuracy: 84.9232562657859%
Test accuracy: 45.84304584304584%


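The large gap between train accuracy (about 85%) and test accuracy (about 46%) points to overfitting, and a test accuracy below 50% is hard to judge without a reference point. One simple reference is the accuracy of always predicting the most frequent class from the training labels. A sketch, with `majority_baseline_accuracy` as a hypothetical helper; scikit-learn's `DummyClassifier(strategy='most_frequent')` computes the same baseline.

```python
import numpy as np


def majority_baseline_accuracy(y_train, y_test):
    """Accuracy on y_test of always predicting the most common class in y_train."""
    values, counts = np.unique(np.asarray(y_train), return_counts=True)
    majority = values[np.argmax(counts)]
    return float(np.mean(np.asarray(y_test) == majority))
```

Here it would be called as `majority_baseline_accuracy(SPY_df_ytrain[:, 4], SPY_df_ytest[:, 4])` and compared with the XGBoost test accuracy.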


In [30]:
SPY_4_model_pkl_filename = 'Models/SPY_4_model.pkl'
with open(SPY_4_model_pkl_filename, 'wb') as file:  
    pickle.dump(SPY_4_model, file)

#### Adj Close 5day pct_change cls: XGBoost Optimized with Original
XGBClassifier(base_score=0.5, colsample_bylevel=1, colsample_bytree=1,
        gamma=0, learning_rate=0.01, max_delta_step=0, max_depth=8,
        min_child_weight=1, missing=None, n_estimators=1300, nthread=-1,
        objective='binary:logistic', reg_alpha=1e-05, reg_lambda=1,
        scale_pos_weight=1, seed=0, silent=True, subsample=1)

In [32]:
target_variable = 5

SPY_5_model = xgb.XGBClassifier(base_score=0.5, colsample_bylevel=1, colsample_bytree=1,
                                gamma=0, learning_rate=0.01, max_delta_step=0, max_depth=8,
                                min_child_weight=1, missing=None, n_estimators=1300, nthread=-1,
                                objective='binary:logistic', reg_alpha=1e-05, reg_lambda=1,
                                scale_pos_weight=1, seed=0, silent=True, subsample=1)
SPY_5_model.fit(SPY_df_Xtrain, SPY_df_ytrain[:, target_variable])

train_acc = accuracy_score(SPY_5_model.predict(SPY_df_Xtrain), SPY_df_ytrain[:, target_variable])
test_acc = accuracy_score(SPY_5_model.predict(SPY_df_Xtest), SPY_df_ytest[:, target_variable])

print('Train accuracy: {}%'.format(train_acc * 100))
print('Test accuracy: {}%'.format(test_acc * 100))

original_perf['Adj Close 5day pct_change cls'] = [train_acc, test_acc]

Train accuracy: 94.4239362735574%
Test accuracy: 48.717948717948715%




In [33]:
SPY_5_model_pkl_filename = 'Models/SPY_5_model.pkl'
with open(SPY_5_model_pkl_filename, 'wb') as file:  
    pickle.dump(SPY_5_model, file)

In [34]:
pd.DataFrame(original_perf)[['Adj Close 1day','Adj Close 5day','Adj Close 1day pct_change','Adj Close 5day pct_change',
                             'Adj Close 1day pct_change cls','Adj Close 5day pct_change cls']]

|   |Adj Close 1day|Adj Close 5day|Adj Close 1day pct_change|Adj Close 5day pct_change|Adj Close 1day pct_change cls|Adj Close 5day pct_change cls|
|-|-|-|-|-|-|-|
|0|6e-06|2.1e-05|0.378107|0.049637|0.849233|0.944239|
|1|2.8e-05|0.000106|0.140552|0.196893|0.45843|0.487179|
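In the output above, rows 0 and 1 are the train and test scores stored in each two-element list of original_perf. Passing an explicit index to pd.DataFrame labels them, and single-number scores from a later evaluation can be appended as a new row with .loc. A sketch using a few values copied (rounded) from this notebook; the variable names are illustrative.

```python
import pandas as pd

# [train, test] scores copied (rounded) from the table above
perf = {'Adj Close 1day pct_change': [0.378107, 0.140552],
        'Adj Close 1day pct_change cls': [0.849233, 0.458430]}
table = pd.DataFrame(perf, index=['SPY train', 'SPY test'])

# single-number scores on the updated SPY data (from SPY2_perf, rounded)
updated = {'Adj Close 1day pct_change': 0.318149,
           'Adj Close 1day pct_change cls': 0.768336}
table.loc['SPY Updated'] = pd.Series(updated)  # values are aligned by column name

print(table)
```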


I'm going to test the models in two directions:
1. Vertical: test the models on the same equity, but with more recent data, using the original PCA.
2. Horizontal: test the models on a different equity, again using the original PCA.

### Vertical Testing with Updated Data

In [39]:
#load the PCA and the six trained models
SPY_PCA_model_pkl_filename = 'Models/SPY_PCA_model.pkl'
with open(SPY_PCA_model_pkl_filename, 'rb') as file:
    SPY_PCA_model = pickle.load(file)
    
SPY_0_model_pkl_filename = 'Models/SPY_0_model.pkl'
with open(SPY_0_model_pkl_filename, 'rb') as file:  
    SPY_0_model = pickle.load(file)
    
SPY_1_model_pkl_filename = 'Models/SPY_1_model.pkl'
with open(SPY_1_model_pkl_filename, 'rb') as file:  
    SPY_1_model = pickle.load(file)
    
SPY_2_model_pkl_filename = 'Models/SPY_2_model.h5'
SPY_2_model = models.load_model(SPY_2_model_pkl_filename)

SPY_3_model_pkl_filename = 'Models/SPY_3_model.pkl'
with open(SPY_3_model_pkl_filename, 'rb') as file:  
    SPY_3_model = pickle.load(file)
    
SPY_4_model_pkl_filename = 'Models/SPY_4_model.pkl'
with open(SPY_4_model_pkl_filename, 'rb') as file:  
    SPY_4_model = pickle.load(file)
    
SPY_5_model_pkl_filename = 'Models/SPY_5_model.pkl'
with open(SPY_5_model_pkl_filename, 'rb') as file:  
    SPY_5_model = pickle.load(file)
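The loading cell above repeats the same open/load pattern for every artifact. A loop over the naming convention used when saving (Models/SPY_<i>_model.pkl, with the LSTM saved as .h5) could replace it. This sketch builds the paths and chooses a loader by file extension; `artifact_paths` and `load_artifacts` are hypothetical names, and the .h5 loader is passed in (for example `models.load_model` from Keras) so the function itself does not import Keras.

```python
import os
import pickle


def artifact_paths(prefix='Models/SPY'):
    """Map each saved artifact to its file path, following the notebook's naming."""
    paths = {'pca': '{}_PCA_model.pkl'.format(prefix)}
    for i in range(6):
        ext = '.h5' if i == 2 else '.pkl'  # target 2 is the Keras LSTM
        paths[i] = '{}_{}_model{}'.format(prefix, i, ext)
    return paths


def load_artifacts(paths, h5_loader):
    """Load .pkl files with pickle and .h5 files with the supplied loader."""
    loaded = {}
    for key, path in paths.items():
        if os.path.splitext(path)[1] == '.h5':
            loaded[key] = h5_loader(path)
        else:
            with open(path, 'rb') as file:
                loaded[key] = pickle.load(file)
    return loaded
```

Usage would look like `loaded = load_artifacts(artifact_paths(), models.load_model)`, followed by `SPY_PCA_model = loaded['pca']` and `SPY_0_model = loaded[0]`.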

Try the models on the updated SPY dataset.

In [44]:
_, SPY2_PCA_df, SPY2_df = extract_data('SPY2', SPY_PCA_model)
SPY2_PCA_df_rand = shuffle(SPY2_PCA_df, random_state = 0)
SPY2_df_rand = shuffle(SPY2_df, random_state = 0)



In [45]:
SPY2_PCA_rand_X = SPY2_PCA_df_rand[['Dimension 1', 'Dimension 2', 'Dimension 3', 'Dimension 4',
                                    'Dimension 5', 'Dimension 6', 'Dimension 7']]
SPY2_PCA_rand_y = SPY2_PCA_df_rand[['Adj Close 1day', 'Adj Close 5day',
                                    'Adj Close 1day pct_change', 'Adj Close 5day pct_change',
                                    'Adj Close 1day pct_change cls', 'Adj Close 5day pct_change cls']]

SPY2_PCA_X = SPY2_PCA_df[['Dimension 1', 'Dimension 2', 'Dimension 3', 'Dimension 4',
                          'Dimension 5', 'Dimension 6', 'Dimension 7']]
SPY2_PCA_y = SPY2_PCA_df[['Adj Close 1day', 'Adj Close 5day',
                          'Adj Close 1day pct_change', 'Adj Close 5day pct_change',
                          'Adj Close 1day pct_change cls', 'Adj Close 5day pct_change cls']]

SPY2_df_rand_X = SPY2_df_rand[['Open', 'High', 'Low', 'Adj Close', 'Volume', 'Range', 'MA5 Adj Close', 'MA5 Volume',
                               'MA5 Adj Close pct_change', 'MA5 Volume pct_change']]
SPY2_df_rand_y = SPY2_df_rand[['Adj Close 1day', 'Adj Close 5day', 'Adj Close 1day pct_change',
                               'Adj Close 5day pct_change', 'Adj Close 1day pct_change cls',
                               'Adj Close 5day pct_change cls']]

SPY2_df_X = SPY2_df[['Open', 'High', 'Low', 'Adj Close', 'Volume', 'Range', 'MA5 Adj Close', 'MA5 Volume',
                     'MA5 Adj Close pct_change', 'MA5 Volume pct_change']]
SPY2_df_y = SPY2_df[['Adj Close 1day', 'Adj Close 5day', 'Adj Close 1day pct_change',
                     'Adj Close 5day pct_change', 'Adj Close 1day pct_change cls',
                     'Adj Close 5day pct_change cls']]

In [46]:
print(SPY2_PCA_rand_X.shape, SPY2_PCA_rand_y.shape)
print(SPY2_PCA_X.shape, SPY2_PCA_y.shape)

print(SPY2_df_rand_X.shape, SPY2_df_rand_y.shape)
print(SPY2_df_X.shape, SPY2_df_y.shape)

(6449, 7) (6449, 6)
(6449, 7) (6449, 6)
(6449, 10) (6449, 6)
(6449, 10) (6449, 6)


In [47]:
SPY2_perf = dict()

#### Adj Close 1day: RandomForestRegressor Default with Original Randomized (Same as AAPL)

In [49]:
error = mean_squared_error(SPY_0_model.predict(SPY2_df_rand_X), SPY2_df_rand_y['Adj Close 1day']) / np.mean(SPY2_df_rand_y['Adj Close 1day'])

print('Error: {}% of mean'.format(error * 100))

SPY2_perf['Adj Close 1day'] = error

Error: 0.0010732753937476827% of mean


#### Adj Close 5day: RandomForestRegressor Default with Original Randomized (Same as AAPL)

In [50]:
error = mean_squared_error(SPY_1_model.predict(SPY2_df_rand_X), SPY2_df_rand_y['Adj Close 5day']) / np.mean(SPY2_df_rand_y['Adj Close 5day'])

print('Error: {}% of mean'.format(error * 100))

SPY2_perf['Adj Close 5day'] = error

Error: 0.003767846190031942% of mean


In [51]:
def window_transform_series(X, y, window_size):
    """Build (window, next-value) pairs for the LSTM.

    Sample i uses rows X[i : i + window_size] as input and y[i + window_size] as label.
    """
    # containers for input/output pairs
    X_result = []
    y_result = []
    for i in range(len(y) - window_size):
        X_result.append(X[i: i + window_size])
        y_result.append(y[i + window_size])

    # reshape: X to (n_samples, window_size, n_features), y to (n_samples, 1)
    X_result = np.asarray(X_result)
    X_result.shape = (np.shape(X_result)[0:3])

    y_result = np.asarray(y_result)
    y_result.shape = (len(y_result), 1)

    return X_result, y_result

#### Adj Close 1day pct_change: LSTM Optimized with Original
layer1 = 1024, layer2 = 256, epochs = 100, batch_size = 500

In [52]:
target_variable = 2
window_size = 5
batch_size = 500

SPY2_X_LSTM, SPY2_y_LSTM = window_transform_series(np.array(SPY2_df_X), np.array(SPY2_df_y['Adj Close 1day pct_change']), window_size = window_size)

good_size_train = SPY2_X_LSTM.shape[0] - SPY2_X_LSTM.shape[0] % batch_size
SPY2_X_LSTM = SPY2_X_LSTM[-good_size_train:]
SPY2_y_LSTM = SPY2_y_LSTM[-good_size_train:]

error = mean_squared_error(SPY_2_model.predict(SPY2_X_LSTM, batch_size = batch_size), np.array(SPY2_y_LSTM)) / np.mean(SPY2_y_LSTM)

print('Error: {}% of mean'.format(error * 100))

SPY2_perf['Adj Close 1day pct_change'] = error

Error: 31.814899848662787% of mean


#### Adj Close 5day pct_change: RandomForestRegressor Default with Original Randomized

In [54]:
error = mean_squared_error(SPY_3_model.predict(SPY2_df_rand_X), SPY2_df_rand_y['Adj Close 5day pct_change']) / np.mean(SPY2_df_rand_y['Adj Close 5day pct_change'])

print('Error: {}% of mean'.format(error * 100))

SPY2_perf['Adj Close 5day pct_change'] = error

Error: 8.545652512624786% of mean


#### Adj Close 1day pct_change cls: XGBoost Optimized with Original
XGBClassifier(base_score=0.5, colsample_bylevel=1, colsample_bytree=1,
        gamma=0, learning_rate=0.01, max_delta_step=0, max_depth=6,
        min_child_weight=1, missing=None, n_estimators=1100, nthread=-1,
        objective='binary:logistic', reg_alpha=0.0001, reg_lambda=1,
        scale_pos_weight=1, seed=0, silent=True, subsample=1)

In [75]:
target_variable = 4
accuracy = accuracy_score(SPY_4_model.predict(np.array(SPY2_df_X)), SPY2_df_y['Adj Close 1day pct_change cls'])

print('Accuracy: {}%'.format(accuracy * 100))

SPY2_perf['Adj Close 1day pct_change cls'] = accuracy

Accuracy: 76.83361761513413%




#### Adj Close 5day pct_change cls: XGBoost Optimized with Original
XGBClassifier(base_score=0.5, colsample_bylevel=1, colsample_bytree=1,
        gamma=0, learning_rate=0.01, max_delta_step=0, max_depth=8,
        min_child_weight=1, missing=None, n_estimators=1300, nthread=-1,
        objective='binary:logistic', reg_alpha=1e-05, reg_lambda=1,
        scale_pos_weight=1, seed=0, silent=True, subsample=1)

In [76]:
accuracy = accuracy_score(SPY_5_model.predict(np.array(SPY2_df_X)), SPY2_df_y['Adj Close 5day pct_change cls'])

print('Accuracy: {}%'.format(accuracy * 100))

SPY2_perf['Adj Close 5day pct_change cls'] = accuracy

Accuracy: 85.08295859823228%




In [77]:
SPY2_perf

{'Adj Close 1day': 1.0732753937476826e-05,
 'Adj Close 1day pct_change': 0.31814899848662787,
 'Adj Close 1day pct_change cls': 0.7683361761513413,
 'Adj Close 5day': 3.767846190031942e-05,
 'Adj Close 5day pct_change': 0.08545652512624785,
 'Adj Close 5day pct_change cls': 0.8508295859823228}

### SPY performance Comparison

|MSE/mean and ACC|Adj Close 1day|Adj Close 5day|Adj Close 1day pct_change|Adj Close 5day pct_change|Adj Close 1day pct_change cls|Adj Close 5day pct_change cls|
|-|-|-|-|-|-|-|
|SPY train  |0.000006|0.000021|0.378107|0.049637|0.849233|0.944239|
|SPY test   |0.000028|0.000106|0.140552|0.196893|0.458430|0.487179|
|SPY Updated|0.000011|0.000038|0.318149|0.085457|0.768336|0.850830|

### Horizontal Testing with Updated Data

#### Trying AAPL2

In [78]:
_, AAPL2_PCA_df, AAPL2_df = extract_data('AAPL2', SPY_PCA_model)
AAPL2_PCA_df_rand = shuffle(AAPL2_PCA_df, random_state = 0)
AAPL2_df_rand = shuffle(AAPL2_df, random_state = 0)



In [80]:
AAPL2_PCA_rand_X = AAPL2_PCA_df_rand[['Dimension 1', 'Dimension 2', 'Dimension 3', 'Dimension 4',
                                      'Dimension 5', 'Dimension 6', 'Dimension 7']]
AAPL2_PCA_rand_y = AAPL2_PCA_df_rand[['Adj Close 1day', 'Adj Close 5day',
                                      'Adj Close 1day pct_change', 'Adj Close 5day pct_change',
                                      'Adj Close 1day pct_change cls', 'Adj Close 5day pct_change cls']]
AAPL2_PCA_X = AAPL2_PCA_df[['Dimension 1', 'Dimension 2', 'Dimension 3', 'Dimension 4',
                            'Dimension 5', 'Dimension 6', 'Dimension 7']]
AAPL2_PCA_y = AAPL2_PCA_df[['Adj Close 1day', 'Adj Close 5day',
                            'Adj Close 1day pct_change', 'Adj Close 5day pct_change',
                            'Adj Close 1day pct_change cls', 'Adj Close 5day pct_change cls']]

AAPL2_df_rand_X = AAPL2_df_rand[['Open', 'High', 'Low', 'Adj Close', 'Volume', 'Range', 'MA5 Adj Close', 'MA5 Volume',
                                 'MA5 Adj Close pct_change', 'MA5 Volume pct_change']]
AAPL2_df_rand_y = AAPL2_df_rand[['Adj Close 1day', 'Adj Close 5day', 'Adj Close 1day pct_change',
                                 'Adj Close 5day pct_change', 'Adj Close 1day pct_change cls',
                                 'Adj Close 5day pct_change cls']]

AAPL2_df_X = AAPL2_df[['Open', 'High', 'Low', 'Adj Close', 'Volume', 'Range', 'MA5 Adj Close', 'MA5 Volume',
                       'MA5 Adj Close pct_change', 'MA5 Volume pct_change']]
AAPL2_df_y = AAPL2_df[['Adj Close 1day', 'Adj Close 5day', 'Adj Close 1day pct_change',
                       'Adj Close 5day pct_change', 'Adj Close 1day pct_change cls',
                       'Adj Close 5day pct_change cls']]

In [81]:
print(AAPL2_PCA_rand_X.shape, AAPL2_PCA_rand_y.shape)
print(AAPL2_PCA_X.shape, AAPL2_PCA_y.shape)

print(AAPL2_df_rand_X.shape, AAPL2_df_rand_y.shape)
print(AAPL2_df_X.shape, AAPL2_df_y.shape)

(7220, 7) (7220, 6)
(7220, 7) (7220, 6)
(7220, 10) (7220, 6)
(7220, 10) (7220, 6)


In [82]:
AAPL2_perf = dict()

In [83]:
error = mean_squared_error(SPY_0_model.predict(AAPL2_df_rand_X), AAPL2_df_rand_y['Adj Close 1day']) / np.mean(AAPL2_df_rand_y['Adj Close 1day'])

print('Error: {}% of mean'.format(error * 100))

AAPL2_perf['Adj Close 1day'] = error

Error: 216.61559473813355% of mean


In [84]:
error = mean_squared_error(SPY_1_model.predict(AAPL2_df_rand_X), AAPL2_df_rand_y['Adj Close 5day']) / np.mean(AAPL2_df_rand_y['Adj Close 5day'])

print('Error: {}% of mean'.format(error * 100))

AAPL2_perf['Adj Close 5day'] = error

Error: 218.05364708970703% of mean


In [85]:
target_variable = 2
window_size = 5
batch_size = 500

AAPL2_X_LSTM, AAPL2_y_LSTM = window_transform_series(np.array(AAPL2_df_X), np.array(AAPL2_df_y['Adj Close 1day pct_change']), window_size = window_size)

good_size_train = AAPL2_X_LSTM.shape[0] - AAPL2_X_LSTM.shape[0] % batch_size
AAPL2_X_LSTM = AAPL2_X_LSTM[-good_size_train:]
AAPL2_y_LSTM = AAPL2_y_LSTM[-good_size_train:]

error = mean_squared_error(SPY_2_model.predict(AAPL2_X_LSTM, batch_size = batch_size), np.array(AAPL2_y_LSTM)) / np.mean(AAPL2_y_LSTM)

print('Error: {}% of mean'.format(error * 100))

AAPL2_perf['Adj Close 1day pct_change'] = error

Error: 56.4209137543583% of mean


In [86]:
error = mean_squared_error(SPY_3_model.predict(AAPL2_df_rand_X), AAPL2_df_rand_y['Adj Close 5day pct_change']) / np.mean(AAPL2_df_rand_y['Adj Close 5day pct_change'])

print('Error: {}% of mean'.format(error * 100))

AAPL2_perf['Adj Close 5day pct_change'] = error

Error: 62.833230362136426% of mean


In [87]:
target_variable = 4
accuracy = accuracy_score(SPY_4_model.predict(np.array(AAPL2_df_X)), AAPL2_df_y['Adj Close 1day pct_change cls'])

print('Accuracy: {}%'.format(accuracy * 100))

AAPL2_perf['Adj Close 1day pct_change cls'] = accuracy

Accuracy: 52.5623268698061%




In [88]:
accuracy = accuracy_score(SPY_5_model.predict(np.array(AAPL2_df_X)), AAPL2_df_y['Adj Close 5day pct_change cls'])

print('Accuracy: {}%'.format(accuracy * 100))

AAPL2_perf['Adj Close 5day pct_change cls'] = accuracy

Accuracy: 54.21052631578947%




In [89]:
AAPL2_perf

{'Adj Close 1day': 2.1661559473813354,
 'Adj Close 1day pct_change': 0.564209137543583,
 'Adj Close 1day pct_change cls': 0.525623268698061,
 'Adj Close 5day': 2.1805364708970703,
 'Adj Close 5day pct_change': 0.6283323036213643,
 'Adj Close 5day pct_change cls': 0.5421052631578948}

|MSE/mean and ACC|Adj Close 1day|Adj Close 5day|Adj Close 1day pct_change|Adj Close 5day pct_change|Adj Close 1day pct_change cls|Adj Close 5day pct_change cls|
|-|-|-|-|-|-|-|
|SPY train  |0.000006|0.000021|0.378107|0.049637|0.849233|0.944239|
|SPY test   |0.000028|0.000106|0.140552|0.196893|0.458430|0.487179|
|SPY Updated|0.000011|0.000038|0.318149|0.085457|0.768336|0.850830|
|AAPL Updated|2.166156|2.180536|0.564209|0.628332|0.525623|0.542105|

Next, let me build a pipeline-like workflow that produces these results for an individual stock.

In [94]:
#initialize the performance dictionary
perf = dict()

In [90]:
#loading models
pca_model = SPY_PCA_model
model0 = SPY_0_model
model1 = SPY_1_model
model2 = SPY_2_model
model3 = SPY_3_model
model4 = SPY_4_model
model5 = SPY_5_model

In [91]:
#extract the data and create shuffled copies
_, PCA_df, df = extract_data('HD', pca_model)
PCA_df_rand = shuffle(PCA_df, random_state = 0)
df_rand = shuffle(df, random_state = 0)



In [93]:
#separate the targets from the features
PCA_rand_X = PCA_df_rand[['Dimension 1', 'Dimension 2', 'Dimension 3', 'Dimension 4',
                          'Dimension 5', 'Dimension 6', 'Dimension 7']]
PCA_rand_y = PCA_df_rand[['Adj Close 1day', 'Adj Close 5day',
                          'Adj Close 1day pct_change', 'Adj Close 5day pct_change',
                          'Adj Close 1day pct_change cls', 'Adj Close 5day pct_change cls']]

PCA_X = PCA_df[['Dimension 1', 'Dimension 2', 'Dimension 3', 'Dimension 4',
                'Dimension 5', 'Dimension 6', 'Dimension 7']]
PCA_y = PCA_df[['Adj Close 1day', 'Adj Close 5day',
                'Adj Close 1day pct_change', 'Adj Close 5day pct_change',
                'Adj Close 1day pct_change cls', 'Adj Close 5day pct_change cls']]

df_rand_X = df_rand[['Open', 'High', 'Low', 'Adj Close', 'Volume', 'Range', 'MA5 Adj Close', 'MA5 Volume',
                     'MA5 Adj Close pct_change', 'MA5 Volume pct_change']]
df_rand_y = df_rand[['Adj Close 1day', 'Adj Close 5day', 'Adj Close 1day pct_change',
                     'Adj Close 5day pct_change', 'Adj Close 1day pct_change cls',
                     'Adj Close 5day pct_change cls']]

df_X = df[['Open', 'High', 'Low', 'Adj Close', 'Volume', 'Range', 'MA5 Adj Close', 'MA5 Volume',
           'MA5 Adj Close pct_change', 'MA5 Volume pct_change']]
df_y = df[['Adj Close 1day', 'Adj Close 5day', 'Adj Close 1day pct_change',
           'Adj Close 5day pct_change', 'Adj Close 1day pct_change cls',
           'Adj Close 5day pct_change cls']]

In [100]:
df_rand_y['Adj Close 1day'].name

'Adj Close 1day'

In [109]:
# this is what a user would pass in
data_list = [(df_rand_X, df_rand_y['Adj Close 1day']),
             (df_rand_X, df_rand_y['Adj Close 5day']),
             (df_X, df_y['Adj Close 1day pct_change']),
             (df_rand_X, df_rand_y['Adj Close 5day pct_change']),
             (df_X, df_y['Adj Close 1day pct_change cls']),
             (df_X, df_y['Adj Close 5day pct_change cls'])]

In [110]:
i = 0
error = mean_squared_error(model0.predict(data_list[i][0]), data_list[i][1]) / np.mean(data_list[i][1])

print('Error: {}% of mean'.format(error * 100))

perf[data_list[i][1].name] = error

Error: 22.457426852774436% of mean
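Each regression score in these cells is the mean squared error divided by the mean of the target, printed as a percentage. A minimal sketch of that computation as a standalone function follows; `mse_over_mean` is a hypothetical name, and the `ravel()` calls are an assumption added so that Keras-style `(n, 1)` predictions line up with 1-D targets.

```python
import numpy as np
from sklearn.metrics import mean_squared_error


def mse_over_mean(y_true, y_pred):
    """MSE divided by the mean of the target, as computed in the cells above."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    return mean_squared_error(y_true, y_pred) / np.mean(y_true)


# Toy numbers: MSE = (0 + 1 + 1) / 3 = 2/3 and mean(y_true) = 2, so the ratio is 1/3
print(mse_over_mean([1, 2, 3], [1, 3, 2]))
```

Because the MSE is in squared price units while the mean is in price units, the ratio still grows with the price level of the stock, so values for tickers with very different prices are not strictly comparable.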


In [111]:
i = 1
error = mean_squared_error(model1.predict(data_list[i][0]), data_list[i][1]) / np.mean(data_list[i][1])

print('Error: {}% of mean'.format(error * 100))

perf[data_list[i][1].name] = error

Error: 22.719214838362394% of mean


In [112]:
i = 2
window_size = 5
batch_size = 500

X_LSTM, y_LSTM = window_transform_series(np.array(data_list[i][0]), np.array(data_list[i][1]), window_size = window_size)

good_size_train = X_LSTM.shape[0] - X_LSTM.shape[0] % batch_size
X_LSTM = X_LSTM[-good_size_train:]
y_LSTM = y_LSTM[-good_size_train:]

error = mean_squared_error(model2.predict(X_LSTM, batch_size = batch_size), np.array(y_LSTM)) / np.mean(y_LSTM)

print('Error: {}% of mean'.format(error * 100))

perf[data_list[i][1].name] = error

Error: 42.9825275261483% of mean
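The cell above trims the windowed data to a multiple of `batch_size`, presumably because the LSTM was built with a fixed batch size (as stateful Keras LSTMs require). It also uses the unshuffled `df_X`, since windows only make sense on rows kept in time order. `window_transform_series` is defined earlier in the notebook and is not shown here, so the sketch below uses a hypothetical stand-in, `make_windows`, that pairs each window with the target at its last row; the real function may align targets differently. `trim_to_batch_multiple` is likewise hypothetical.

```python
import numpy as np


def make_windows(X, y, window_size):
    """Hypothetical stand-in for window_transform_series.

    Sample t holds feature rows t .. t+window_size-1 and is paired with the
    target at the last row of that window.
    """
    X = np.asarray(X)
    y = np.asarray(y)
    n = X.shape[0] - window_size + 1
    X_win = np.stack([X[t:t + window_size] for t in range(n)])  # (n, window, features)
    y_win = y[window_size - 1:]                                   # (n,)
    return X_win, y_win


def trim_to_batch_multiple(X, y, batch_size):
    """Keep the most recent samples so their count is a multiple of batch_size."""
    usable = X.shape[0] - X.shape[0] % batch_size
    start = X.shape[0] - usable  # explicit start index, safe when usable == 0
    return X[start:], y[start:]


X = np.arange(20).reshape(10, 2)  # 10 time steps, 2 features
y = np.arange(10)
X_win, y_win = make_windows(X, y, window_size=3)
X_trim, y_trim = trim_to_batch_multiple(X_win, y_win, batch_size=5)
print(X_win.shape, X_trim.shape)  # (8, 3, 2) (5, 3, 2)
```

Slicing with `X_LSTM[-good_size_train:]` returns the whole array when `good_size_train` is 0 (fewer samples than one batch), because `-0` is just `0`; computing the start index explicitly returns an empty array in that case instead.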


In [113]:
i = 3
error = mean_squared_error(model3.predict(np.array(data_list[i][0])), data_list[i][1]) / np.mean(data_list[i][1])

print('Error: {}% of mean'.format(error * 100))

perf[data_list[i][1].name] = error

Error: 44.71636578327715% of mean


In [115]:
i = 4
# for the classification targets this score is accuracy, not an error
error = accuracy_score(model4.predict(np.array(data_list[i][0])), data_list[i][1])

print('Error: {}% of mean'.format(error * 100))

perf[data_list[i][1].name] = error

Error: 52.60099612617598% of mean


In [116]:
i = 5
# for the classification targets this score is accuracy, not an error
error = accuracy_score(model5.predict(np.array(data_list[i][0])), data_list[i][1])

print('Error: {}% of mean'.format(error * 100))

perf[data_list[i][1].name] = error

Error: 54.12285556170448% of mean


In [124]:
perf

{'Adj Close 1day': 0.22457426852774437,
 'Adj Close 1day pct_change': 0.42982527526148306,
 'Adj Close 1day pct_change cls': 0.5260099612617598,
 'Adj Close 5day': 0.22719214838362392,
 'Adj Close 5day pct_change': 0.44716365783277146,
 'Adj Close 5day pct_change cls': 0.5412285556170449}

|MSE and ACC|Adj Close 1day|Adj Close 5day|Adj Close 1day pct_change|Adj Close 5day pct_change|Adj Close 1day pct_change cls|Adj Close 5day pct_change cls|
|-|-|-|-|-|-|-|
|SPY train  |0.000006|0.000021|0.378107|0.049637|0.849233|0.944239|
|SPY test   |0.000028|0.000106|0.140552|0.196893|0.458430|0.487179|
|SPY Updated|1.073274e-05|3.767846e-05|0.318148|0.318148|0.768336|0.850829|
|AAPL Updated|2.166155|2.180536|0.564209|0.628332|0.525623|0.542105|
|HD Updated |0.224574|0.227192|0.429825|0.447163|0.526009|0.541228|

Now let's wrap the whole process in a single function that takes a ticker symbol and applies the SPY models to it.

In [131]:
def SPY_apply(symbol):
    #initalize what we need
    perf = dict()
    
    #loading models
    pca_model = SPY_PCA_model
    models = [SPY_0_model, SPY_1_model, SPY_2_model,
              SPY_3_model, SPY_4_model, SPY_5_model]
    
    # extract the data (passing in the SPY PCA model) and make shuffled copies
    _, PCA_df, df = extract_data(symbol, pca_model)
    PCA_df_rand = shuffle(PCA_df, random_state = 0)
    df_rand = shuffle(df, random_state = 0)
    
    #separate the targets from the features
    PCA_rand_X = PCA_df_rand[['Dimension 1', 'Dimension 2', 'Dimension 3', 'Dimension 4',
                              'Dimension 5', 'Dimension 6', 'Dimension 7']]
    PCA_rand_y = PCA_df_rand[['Adj Close 1day', 'Adj Close 5day',
                              'Adj Close 1day pct_change', 'Adj Close 5day pct_change',
                              'Adj Close 1day pct_change cls', 'Adj Close 5day pct_change cls']]

    PCA_X = PCA_df[['Dimension 1', 'Dimension 2', 'Dimension 3', 'Dimension 4',
                    'Dimension 5', 'Dimension 6', 'Dimension 7']]
    PCA_y = PCA_df[['Adj Close 1day', 'Adj Close 5day',
                    'Adj Close 1day pct_change', 'Adj Close 5day pct_change',
                    'Adj Close 1day pct_change cls', 'Adj Close 5day pct_change cls']]

    df_rand_X = df_rand[['Open', 'High', 'Low', 'Adj Close', 'Volume', 'Range', 'MA5 Adj Close', 'MA5 Volume',
                         'MA5 Adj Close pct_change', 'MA5 Volume pct_change']]
    df_rand_y = df_rand[['Adj Close 1day', 'Adj Close 5day', 'Adj Close 1day pct_change',
                         'Adj Close 5day pct_change', 'Adj Close 1day pct_change cls',
                         'Adj Close 5day pct_change cls']]

    df_X = df[['Open', 'High', 'Low', 'Adj Close', 'Volume', 'Range', 'MA5 Adj Close', 'MA5 Volume',
               'MA5 Adj Close pct_change', 'MA5 Volume pct_change']]
    df_y = df[['Adj Close 1day', 'Adj Close 5day', 'Adj Close 1day pct_change',
               'Adj Close 5day pct_change', 'Adj Close 1day pct_change cls',
               'Adj Close 5day pct_change cls']]
    
    # this is what a user would pass in
    data_list = [(df_rand_X, df_rand_y['Adj Close 1day']),
                 (df_rand_X, df_rand_y['Adj Close 5day']),
                 (df_X, df_y['Adj Close 1day pct_change']),
                 (df_rand_X, df_rand_y['Adj Close 5day pct_change']),
                 (df_X, df_y['Adj Close 1day pct_change cls']),
                 (df_X, df_y['Adj Close 5day pct_change cls'])]
    
    for i in range(2):
        error = mean_squared_error(models[i].predict(data_list[i][0]), data_list[i][1]) / np.mean(data_list[i][1])
        print('Error: {}% of mean'.format(error * 100))
        perf[data_list[i][1].name] = error
        
    i = 2
    window_size = 5
    batch_size = 500

    X_LSTM, y_LSTM = window_transform_series(np.array(data_list[i][0]), np.array(data_list[i][1]), window_size = window_size)

    good_size_train = X_LSTM.shape[0] - X_LSTM.shape[0] % batch_size
    X_LSTM = X_LSTM[-good_size_train:]
    y_LSTM = y_LSTM[-good_size_train:]

    error = mean_squared_error(models[i].predict(X_LSTM, batch_size = batch_size), np.array(y_LSTM)) / np.mean(y_LSTM)
    print('Error: {}% of mean'.format(error * 100))
    perf[data_list[i][1].name] = error
    
    i = 3
    error = mean_squared_error(models[i].predict(data_list[i][0]), data_list[i][1]) / np.mean(data_list[i][1])
    print('Error: {}% of mean'.format(error * 100))
    perf[data_list[i][1].name] = error
    
    for i in range(4,6):
        # for the classification targets this score is accuracy, not an error
        error = accuracy_score(models[i].predict(np.array(data_list[i][0])), data_list[i][1])
        print('Error: {}% of mean'.format(error * 100))
        perf[data_list[i][1].name] = error
        
    return perf
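`SPY_apply` still branches on the index `i` and repeats the same scoring lines for each target. A minimal sketch of a table-driven alternative: each task carries its own prediction function, inputs, target and metric, so the LSTM case only needs a wrapper such as `lambda X: model.predict(X, batch_size=500)` around pre-windowed inputs. The names `evaluate`, `mse_over_mean` and `accuracy` are hypothetical, and scikit-learn dummy models stand in for the saved SPY models.

```python
import numpy as np
from sklearn.dummy import DummyClassifier, DummyRegressor
from sklearn.metrics import accuracy_score, mean_squared_error


def mse_over_mean(y_true, y_pred):
    """MSE divided by the target mean, the regression score used above."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    return mean_squared_error(y_true, y_pred) / np.mean(y_true)


def accuracy(y_true, y_pred):
    """Plain accuracy for the '... cls' targets."""
    return accuracy_score(np.asarray(y_true).ravel(), np.asarray(y_pred).ravel())


def evaluate(tasks):
    """Score each (target_name, predict_fn, X, y, metric) entry; return a perf dict."""
    perf = {}
    for name, predict_fn, X, y, metric in tasks:
        perf[name] = metric(y, predict_fn(X))
    return perf


# Toy stand-ins for the saved SPY models
X = np.random.RandomState(0).rand(20, 3)
y_reg = np.arange(1, 21, dtype=float)
y_cls = np.array([1] * 15 + [0] * 5)
reg = DummyRegressor(strategy='mean').fit(X, y_reg)
clf = DummyClassifier(strategy='most_frequent').fit(X, y_cls)

tasks = [('reg target', reg.predict, X, y_reg, mse_over_mean),
         ('cls target', clf.predict, X, y_cls, accuracy)]
print(evaluate(tasks))
```

Keeping the metric next to each task also makes it explicit which entries of `perf` are error ratios and which are accuracies.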

In [133]:
SPY_apply('FB')



Error: 0.09982693494705508% of mean
Error: 0.1475675954928723% of mean
Error: 29.711184438887038% of mean
Error: 44.910184343552125% of mean
Error: 49.87389659520807% of mean
Error: 47.91929382093317% of mean


{'Adj Close 1day': 0.0009982693494705508,
 'Adj Close 1day pct_change': 0.2971118443888704,
 'Adj Close 1day pct_change cls': 0.4987389659520807,
 'Adj Close 5day': 0.001475675954928723,
 'Adj Close 5day pct_change': 0.44910184343552123,
 'Adj Close 5day pct_change cls': 0.4791929382093317}

|MSE and ACC|Adj Close 1day|Adj Close 5day|Adj Close 1day pct_change|Adj Close 5day pct_change|Adj Close 1day pct_change cls|Adj Close 5day pct_change cls|
|-|-|-|-|-|-|-|
|SPY train  |0.000006|0.000021|0.378107|0.049637|0.849233|0.944239|
|SPY test   |0.000028|0.000106|0.140552|0.196893|0.458430|0.487179|
|SPY Updated|1.073274e-05|3.767846e-05|0.318148|0.318148|0.768336|0.850829|
|AAPL Updated|2.166155|2.180536|0.564209|0.628332|0.525623|0.542105|
|HD Updated |0.224574|0.227192|0.429825|0.447163|0.526009|0.541228|
|FB Updated |0.000998|0.001475|0.297111|0.449101|0.498738|0.479192|
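The `cls` accuracies for HD and FB fall roughly between 0.48 and 0.54, which is close to what a trivial classifier can reach. One quick reference point is the accuracy of always predicting the most frequent class in the labels being scored. `majority_baseline_accuracy` is a hypothetical helper, sketched under the assumption that the labels are discrete class values.

```python
import numpy as np


def majority_baseline_accuracy(y_true):
    """Accuracy of always predicting the most common class in y_true."""
    _, counts = np.unique(np.asarray(y_true).ravel(), return_counts=True)
    return counts.max() / counts.sum()


# Toy binary labels; in the notebook the input would be one of the '... cls' columns
print(majority_baseline_accuracy([1, 0, 1, 1, 0, 1, 0, 1]))  # 0.625
```

Applied to, for example, `df_y['Adj Close 1day pct_change cls']`, this gives the bar a model has to clear. Because it reads the class balance of the evaluation labels themselves, it is the best any constant prediction could do on that set.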

In [135]:
SPY_apply('AMZN')



Error: 9.364985775812924% of mean
Error: 9.377376385121924% of mean
Error: 85.04026364569988% of mean
Error: 76.61001176054467% of mean
Error: 49.142431021625654% of mean
Error: 51.043997017151376% of mean


{'Adj Close 1day': 0.09364985775812924,
 'Adj Close 1day pct_change': 0.8504026364569989,
 'Adj Close 1day pct_change cls': 0.49142431021625654,
 'Adj Close 5day': 0.09377376385121923,
 'Adj Close 5day pct_change': 0.7661001176054467,
 'Adj Close 5day pct_change cls': 0.5104399701715138}

|MSE and ACC|Adj Close 1day|Adj Close 5day|Adj Close 1day pct_change|Adj Close 5day pct_change|Adj Close 1day pct_change cls|Adj Close 5day pct_change cls|
|-|-|-|-|-|-|-|
|SPY train  |0.000006|0.000021|0.378107|0.049637|0.849233|0.944239|
|SPY test   |0.000028|0.000106|0.140552|0.196893|0.458430|0.487179|
|SPY Updated|1.073274e-05|3.767846e-05|0.318148|0.318148|0.768336|0.850829|
|AAPL Updated|2.166155|2.180536|0.564209|0.628332|0.525623|0.542105|
|HD Updated |0.224574|0.227192|0.429825|0.447163|0.526009|0.541228|
|FB Updated |0.000998|0.001475|0.297111|0.449101|0.498738|0.479192|
|AMZN Updated|0.093649|0.093773|0.850402|0.766100|0.491424|0.510439|

In [137]:
SPY_apply('VXX')



Error: 97.05980223545964% of mean
Error: 95.84183472825993% of mean
Error: -420.70907146557295% of mean
Error: -88.79930611683879% of mean
Error: 54.96277915632754% of mean
Error: 51.28205128205128% of mean


{'Adj Close 1day': 0.9705980223545964,
 'Adj Close 1day pct_change': -4.207090714655729,
 'Adj Close 1day pct_change cls': 0.5496277915632755,
 'Adj Close 5day': 0.9584183472825992,
 'Adj Close 5day pct_change': -0.8879930611683879,
 'Adj Close 5day pct_change cls': 0.5128205128205128}

|MSE and ACC|Adj Close 1day|Adj Close 5day|Adj Close 1day pct_change|Adj Close 5day pct_change|Adj Close 1day pct_change cls|Adj Close 5day pct_change cls|
|-|-|-|-|-|-|-|
|SPY train  |0.000006|0.000021|0.378107|0.049637|0.849233|0.944239|
|SPY test   |0.000028|0.000106|0.140552|0.196893|0.458430|0.487179|
|SPY Updated|1.073274e-05|3.767846e-05|0.318148|0.318148|0.768336|0.850829|
|AAPL Updated|2.166155|2.180536|0.564209|0.628332|0.525623|0.542105|
|HD Updated |0.224574|0.227192|0.429825|0.447163|0.526009|0.541228|
|FB Updated |0.000998|0.001475|0.297111|0.449101|0.498738|0.479192|
|AMZN Updated|0.093649|0.093773|0.850402|0.766100|0.491424|0.510439|
|VXX Updated|0.970598|0.958418|-4.207090|-0.887993|0.549627|0.512820|
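Mean squared error is never negative, so the negative VXX entries in the pct_change columns can only come from the denominator: the mean of those percentage-change targets is negative over the evaluated period. A target mean near zero would instead make the ratio blow up. A scale-free alternative, swapped in here and not the normalization the notebook uses, is to divide the MSE by the variance of the target, which equals 1 - R^2. `mse_over_variance` is a hypothetical name, and the toy returns below are made up for illustration.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score


def mse_over_variance(y_true, y_pred):
    """MSE divided by the variance of the target (equals 1 - R^2).

    0 is a perfect fit and 1 matches always predicting the target mean;
    unlike MSE divided by the target mean, it cannot be negative.
    """
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    return mean_squared_error(y_true, y_pred) / np.var(y_true)


y_true = np.array([-0.02, -0.01, 0.00, -0.03])  # toy returns with a negative mean
y_pred = np.array([-0.01, -0.01, -0.01, -0.02])

print(mean_squared_error(y_true, y_pred) / np.mean(y_true))  # negative, about -0.005
print(mse_over_variance(y_true, y_pred))                     # about 0.6
print(1 - r2_score(y_true, y_pred))                          # same value
```

Since this ratio compares the model with the trivial "predict the mean" forecast on the same target, it stays comparable across tickers whose prices or return levels differ.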