<a href="https://colab.research.google.com/github/Zanderl1987/Neural-Networks-and-Deep-Learning-Projects/blob/master/3_Multi_input_LSTM_Time_Series_Modeling_Practice_04142020.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Useful Tutorials referenced to create this notebook
https://www.kaggle.com/lokeshkumarn/timeseries-multivariate  
https://machinelearningmastery.com/multivariate-time-series-forecasting-lstms-keras/  
[Saving and loading keras models tutorial](https://machinelearningmastery.com/save-load-keras-deep-learning-models/)  
[Answer to MinMaxScaler broadcasting error](https://datascience.stackexchange.com/questions/22488/value-error-operands-could-not-be-broadcast-together-with-shapes-lstm)    
[Tutorial on building a multi-step LSTM model](https://machinelearningmastery.com/multi-step-time-series-forecasting-long-short-term-memory-networks-python/)  

[TA technical analysis library github](https://github.com/bukosabino/ta)  
[Using the Box-Cox transform to denoise time series data](https://mode.com/example-gallery/forecasting_prophet_python_cookbook/)  
[Python script from GitHub with code for advanced time series modeling with LSTMs combined with other NN layer types for superior performance](https://github.com/dhingratul/Stock-Price-Prediction/blob/master/src/timeSeriesPredict.py)  

## Next Steps
1. Denoise the data using the Box-Cox transform (tutorial linked above) and check whether this improves the model's performance.
2. Work through the entire multi-step LSTM forecasting tutorial linked above until every part of its code is fully understood.
3. Explore time series feature extraction libraries and identify any that are useful here.
4. Study the structure of LSTM time series models in greater detail. Determine whether the current model lacks other layer types or components and, if so, learn why they matter and how they affect performance.
5. Learn more about how LSTMs work, which kinds of features they use best, and which preprocessing steps may be missing here. The goal is a loose set of rules of thumb that shortens the modeling process.
6. Identify common mistakes, things to avoid, and essential practices in the LSTM time series modeling process.
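
As a starting point for step 1, here is a minimal sketch of a Box-Cox round trip with SciPy. It runs on a synthetic price-like series (an assumption, not the notebook's data), and the names `prices`, `transformed`, and `recovered` are illustrative. Box-Cox is usually described as a variance-stabilizing power transform, so whether it helps this model is something to test rather than assume.

```python
import numpy as np
from scipy import stats
from scipy.special import inv_boxcox

# Synthetic, strictly positive series (Box-Cox requires positive inputs).
rng = np.random.default_rng(0)
prices = 100.0 * np.exp(np.cumsum(rng.normal(0.0, 0.01, size=500)))

# With no lmbda argument, boxcox returns the transformed data and the fitted lambda.
transformed, fitted_lambda = stats.boxcox(prices)

# inv_boxcox undoes the transform, e.g. to map forecasts back to price units.
recovered = inv_boxcox(transformed, fitted_lambda)

print(f"fitted lambda: {fitted_lambda:.4f}")
print(f"max round-trip error: {np.max(np.abs(recovered - prices)):.2e}")
```

In a real pipeline the lambda would probably be fitted on the training split only and reused for the test split.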

In [0]:
%%capture
!pip install yfinance
!pip install xarray
!pip install fbprophet
!pip install pandas_profiling
!pip install --upgrade ta

In [7]:
import yfinance as yf
import xarray
import ta

import requests
import pandas as pd
import datetime
import time
import pandas_profiling
import numpy as np
import seaborn as sns
import xgboost
import matplotlib.pyplot as plt
from sklearn.preprocessing import LabelEncoder
from tqdm.notebook import tqdm


import plotly
from plotly.graph_objs import *
import plotly.tools as tls
from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot
import plotly.express as px
import plotly.offline as py
import plotly.graph_objs as go

from scipy.stats import norm
from scipy.stats import skew
from scipy import stats
from scipy.stats import pearsonr
from collections import Counter

from xgboost import XGBClassifier

import sklearn
from sklearn.model_selection import cross_val_predict
from sklearn.model_selection import GridSearchCV

# from sklearn.linear_model import LogisticRegression
# from sklearn.ensemble import RandomForestClassifier
# from sklearn.ensemble import ExtraTreesClassifier
# from sklearn.ensemble import GradientBoostingClassifier
# from sklearn.ensemble import AdaBoostClassifier
# from sklearn.svm import LinearSVC
# from sklearn.svm import SVC
# from sklearn.neural_network import MLPClassifier
# from sklearn.ensemble import VotingClassifier
# from sklearn.naive_bayes import GaussianNB
# from sklearn.neighbors import KNeighborsClassifier
# from sklearn.tree import DecisionTreeClassifier
# from sklearn.linear_model import SGDClassifier
# from sklearn.model_selection import StratifiedKFold

from sklearn.metrics import accuracy_score, precision_recall_curve, classification_report,confusion_matrix, f1_score

from sklearn.metrics import mean_squared_error
from keras.models import Sequential
from keras.callbacks import EarlyStopping
from keras.layers import Dense, LSTM, GRU, Dropout, Activation
from keras.layers import Conv1D, MaxPooling1D

import helper

import pandas_datareader as pdr

sns.set(style='ticks')
%matplotlib inline

from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = 'all'

#import category_encoders as ce

# Import data retrieval libraries (API keys redacted; load them from a secure store)
#import quandl
#quandl.ApiConfig.api_key = '<YOUR_QUANDL_API_KEY>'

#from fredapi import Fred
#fred = Fred(api_key='<YOUR_FRED_API_KEY>')

plt.style.use('dark_background') # use this if plotting in a dark themed notebook

from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [0]:
def get_yf_hist(symbol_list,startDate,endDate,interval):
  import matplotlib.pyplot as plt
  import datetime
  import yfinance as yf  

  # Get historical pricing data
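  # Pass start/end/interval by keyword: they are not the 2nd-4th positional parameters
  data = yf.download(symbol_list, start=startDate, end=endDate, interval=interval)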

  return data


def get_intraday(symbol_list,period,interval):
  intraday_list = []
  intraday_df = pd.DataFrame()
  for sym in symbol_list:
    intraday = yf.download(tickers=sym,
                           period=period,
                           interval=interval)
    intraday['symbol'] = sym
    intraday_df = pd.concat([intraday_df,intraday])

  return intraday_df

def rolling_zscore(data,return_period,window_length):
  log_returns = (np.log(data / data.shift(return_period)))
  # z-score = (value - rolling mean) / rolling std; the subtraction must be parenthesized
  zscore = (log_returns - log_returns.rolling(window_length).mean()) / log_returns.rolling(window_length).std()
  #results_dict = dict({'log_returns':log_returns})
  results_df = pd.DataFrame(zscore)

  return results_df


# convert series to supervised learning
def series_to_supervised(data, n_in=1, n_out=1, dropnan=True):
    n_vars = 1 if type(data) is list else data.shape[1]
    df = pd.DataFrame(data)
    cols, names = list(), list()
    # input sequence (t-n, ... t-1)
    for i in range(n_in, 0, -1):
        cols.append(df.shift(i))
        names += [('var%d(t-%d)' % (j+1, i)) for j in range(n_vars)]
    # forecast sequence (t, t+1, ... t+n)
    for i in range(0, n_out):
        cols.append(df.shift(-i))
        if i == 0:
            names += [('var%d(t)' % (j+1)) for j in range(n_vars)]
        else:
            names += [('var%d(t+%d)' % (j+1, i)) for j in range(n_vars)]
    # put it all together
    agg = pd.concat(cols, axis=1)
    agg.columns = names
    # drop rows with NaN values
    if dropnan:
        agg.dropna(inplace=True)
    return agg

# Defined at module level: nested after `return agg` it was unreachable
def prophet_format(df):
  df = df.reset_index()
  df = df.rename(columns={'Adj Close':'y',
                          'Date':'ds'})

  return df

In [0]:
class AMTD_API:
  
  def __init__(self,api_key,symbol=None):
    self.api_key = api_key

  def get_option_chain_specific(self,symbol,strike_range,strike_count):
    # strike_range: Returns options for the given range. Possible values are:

    # ITM: In-the-money
    # NTM: Near-the-money
    # OTM: Out-of-the-money
    # SAK: Strikes Above Market
    # SBK: Strikes Below Market
    # SNK: Strikes Near Market
    # ALL: All Strikes

    # Default is ALL.

    # optionType: Specifies the kind of contract to return

    # 'Type of contracts to return. Possible values are:

    # S: Standard contracts
    # NS: Non-standard contracts
    # ALL: All contracts

    # Default is ALL.''

    callback_url = 'http://localhost'

    end_point = "https://api.tdameritrade.com/v1/marketdata/chains"

    payload = {'apikey':self.api_key,
              'symbol': symbol,
              'contractType':'ALL',
              'strikeCount':strike_count,
              'includeQuotes':'TRUE',
              'strategy':'ANALYTICAL',
              'range':strike_range,
              'expMonth':'ALL',
              'optionType':'ALL'}

    content = requests.get(url = end_point,params=payload)
    data = content.json()
  
    call_data_norm = pd.json_normalize(data['callExpDateMap'],max_level=None)
    put_data_norm = pd.json_normalize(data['putExpDateMap'],max_level=None)

    call_data_list = []
    put_data_list = []

    opt_df = pd.DataFrame()
    call_data_df = pd.DataFrame()
    put_data_df = pd.DataFrame()

    for i in call_data_norm.iloc[0]:
      df_i_c = pd.DataFrame(i)
      #opt_df = pd.concat([opt_df,df_i_c])
      call_data_list.append(df_i_c)

    for i in put_data_norm.iloc[0]:
      df_i_p = pd.DataFrame(i)
      #opt_df = pd.concat([opt_df,df_i_p])
      put_data_list.append(df_i_p)

    call_data_df = pd.concat(call_data_list)
    put_data_df = pd.concat(put_data_list)
    opt_df = pd.concat([opt_df,call_data_df,put_data_df])

    # Convert to datetime from unix
    opt_df['tradeTimeInLong'] = pd.to_datetime(opt_df['tradeTimeInLong'],unit='ms')
    opt_df['quoteTimeInLong'] = pd.to_datetime(opt_df['quoteTimeInLong'],unit='ms')
    opt_df['expirationDate'] = pd.to_datetime(opt_df['expirationDate'],unit='ms')
    opt_df['lastTradingDay'] = pd.to_datetime(opt_df['lastTradingDay'],unit='ms')

    return opt_df


  def get_option_chain_all(self,symbol,strike_range='ALL'):
      # strike_range: Returns options for the given range. Possible values are:

      # ITM: In-the-money
      # NTM: Near-the-money
      # OTM: Out-of-the-money
      # SAK: Strikes Above Market
      # SBK: Strikes Below Market
      # SNK: Strikes Near Market
      # ALL: All Strikes

      # Default is ALL.

      # optionType: Specifies the kind of contract to return

      # 'Type of contracts to return. Possible values are:

      # S: Standard contracts
      # NS: Non-standard contracts
      # ALL: All contracts

      # Default is ALL.''

      callback_url = 'http://localhost'

      end_point = "https://api.tdameritrade.com/v1/marketdata/chains"

      payload = {'apikey':self.api_key,
                'symbol': symbol,
                'contractType':'ALL',
                #'strikeCount':strike_count,
                'includeQuotes':'TRUE',
                'strategy':'ANALYTICAL',
                'range':strike_range,
                'expMonth':'ALL',
                'optionType':'ALL'}

      content = requests.get(url = end_point,params=payload)
      data = content.json()
    
      call_data_norm = pd.json_normalize(data['callExpDateMap'],max_level=None)
      put_data_norm = pd.json_normalize(data['putExpDateMap'],max_level=None)

      call_data_list = []
      put_data_list = []

      opt_df = pd.DataFrame()
      call_data_df = pd.DataFrame()
      put_data_df = pd.DataFrame()

      for i in call_data_norm.iloc[0]:
        df_i_c = pd.DataFrame(i)
        #opt_df = pd.concat([opt_df,df_i_c])
        call_data_list.append(df_i_c)

      for i in put_data_norm.iloc[0]:
        df_i_p = pd.DataFrame(i)
        #opt_df = pd.concat([opt_df,df_i_p])
        put_data_list.append(df_i_p)

      call_data_df = pd.concat(call_data_list)
      put_data_df = pd.concat(put_data_list)
      opt_df = pd.concat([opt_df,call_data_df,put_data_df])

      
      # Convert to datetime from unix
      opt_df['tradeTimeInLong'] = pd.to_datetime(opt_df['tradeTimeInLong'],unit='ms')
      opt_df['quoteTimeInLong'] = pd.to_datetime(opt_df['quoteTimeInLong'],unit='ms')
      opt_df['expirationDate'] = pd.to_datetime(opt_df['expirationDate'],unit='ms')
      opt_df['lastTradingDay'] = pd.to_datetime(opt_df['lastTradingDay'],unit='ms')

      return opt_df


  def get_symbol_trading_data_bulk(self,symbol_list,f_name,sleep_time,loop_end):

    dir_path = "/content/drive/My Drive/Data Science/Datasets/Ameritrade_trade_data/"
    callback_url = 'http://localhost'
    symbol_data_df = pd.DataFrame()
    df_list = []

    bearer_auth = None  # token redacted: load credentials from a secure store, never hard-code them (unused below)
    end_point = "https://api.tdameritrade.com/v1/marketdata/quotes"

    today = datetime.date.today()  # `dt` was never imported; use the datetime module directly
    file_name = f"{f_name}_{today.month}_{today.day}_{today.year}.csv"
    payload = {'apikey':self.api_key,
                'symbol':symbol_list}

    for i in tqdm(range(0,loop_end,1)):
      content = requests.get(url = end_point,params=payload)
      data_2 = content.json()
      df1 = pd.DataFrame.from_dict(data_2,orient='index')
      df_list.append(df1)
      symbol_data_df = pd.concat(df_list,axis=0)
      symbol_data_df.to_csv(dir_path+file_name)
      print(symbol_data_df)
      print(i)
      print((i/loop_end) * 100)
      time.sleep(sleep_time)

    symbol_data_df['tradeTimeInLong'] = pd.to_datetime(symbol_data_df['tradeTimeInLong'],unit='ms')
    symbol_data_df['quoteTimeInLong'] = pd.to_datetime(symbol_data_df['quoteTimeInLong'],unit='ms')
    symbol_data_df['lastTradingDay'] = pd.to_datetime(symbol_data_df['lastTradingDay'],unit='ms')

    return symbol_data_df

  def get_option_chain_trading_bulk(self,symbol_list,f_name,sleep_time,loop_end):

    dir_path = "/content/drive/My Drive/Data Science/Datasets/Ameritrade_trade_data/"
    callback_url = 'http://localhost'
    symbol_data_df = pd.DataFrame()
    df_list = []

    end_point = "https://api.tdameritrade.com/v1/marketdata/chains"

    payload = {'apikey':self.api_key,
              'symbol': symbol_list,  # was the undefined name `symbol`
              'contractType':'ALL',
              #'strikeCount':5,
              'includeQuotes':'TRUE',
              'strategy':'ANALYTICAL',
              'range':'SNK',
              'expMonth':'ALL',
              'optionType':'ALL'}
    

    today = datetime.date.today()  # `dt` was never imported; use the datetime module directly
    file_name = f"{f_name}_{today.month}_{today.day}_{today.year}.csv"

    for i in range(0,loop_end,1):
      content = requests.get(url = end_point,params=payload)
      data_2 = content.json()
      df1 = pd.DataFrame.from_dict(data_2,orient='index')
      df_list.append(df1)
      symbol_data_df = pd.concat(df_list,axis=0)
      symbol_data_df.to_csv(dir_path+file_name)
      print(symbol_data_df)
      time.sleep(sleep_time)

    return symbol_data_df


  def get_all_price_data_daily(self,symbol):
    
    callback_url = 'http://localhost'

    end_point = f"https://api.tdameritrade.com/v1/marketdata/{symbol}/pricehistory"

    payload = {'apikey':self.api_key,
              'periodType':'year',
              #'period':20,
              'frequencyType':'daily',
              'frequency':1,
              #'endDate':1586233133000,
              'startDate':-2208938400000}

    content = requests.get(url = end_point,params=payload)
    data = content.json()
    data = data['candles']
    data = pd.DataFrame(data)
    data['datetime'] = pd.to_datetime(data['datetime'],unit='ms')

    return data


def my_describe(df, stats):
  d = df.describe()
  d.loc['IQR'] = d.loc['75%'] - d.loc['25%']
  # DataFrame.append was removed in pandas 2.0; pd.concat gives the same result
  return pd.concat([d, df.reindex(d.columns,axis=1).agg(stats)])

In [0]:
# amtd = AMTD_API(api_key='<YOUR_TD_AMERITRADE_API_KEY>')  # key redacted; load it from a secure store

# #symbol_list = ['SPY','CAT','AAPL','GOOGL','NVDA','INTC','AMZN','UNH','GLD','TLT']
# #symbol_list = ['SPY','TLT','IWM','QQQ','GLD','UUP']
# symbol_list = ['AMZN','MSFT','AMD','ADBE','GOOGL','NVDA','INTC','NFLX','NTES','ORCL','AAPL','XLK','TSLA','IBM','BKNG','GILD','ALXN','BIIB','ISRG','SPY']

# hd_list = []
# cp_list = []
# cp_list2 = []
# sl = []

# close_prices_df = pd.DataFrame()

# for i in symbol_list:
#   tsdf = amtd.get_all_price_data_daily(i)
#   tsdf['symbol'] = i
#   tsdf['datetime'] = pd.to_datetime(tsdf['datetime'])
#   tsdf['date'] = tsdf['datetime'].dt.date
#   tsdf.index = tsdf['date']
  

#   close_s = pd.Series(tsdf['close'],index=tsdf.index,name=i)

  
#   hd_list.append(tsdf)
#   cp_list.append(close_s)
#   cp_list2.append(tsdf['close'])
#   sl.append(i)

# d1 = pd.concat(hd_list)
# d1['datetime'] = d1['datetime'].dt.date


# close_df = pd.concat(cp_list,axis=1)
# close_df = close_df.sort_index(ascending=True)


# f"d2_unqs length: {len(d2_unqs)}"
# print()
# f"d2 shape: {d2.shape}"

In [0]:
def prophet_format(df):
  df = df.reset_index()
  df = df.rename(columns={'Adj Close':'y',
                          'Date':'ds'})

  return df

def format_data1(input_option,df1):
  '''
  input_option 1: multi-symbol adj_close prices from yfinance
  input_option 2: single-symbol with fbprophet used for feature extraction
  '''

  if input_option == 1:
    df2 = df1['Adj Close']
    df2 = df2.dropna(axis=0)
    #df2 = df2.sort_index(ascending=True)
    return df2
  elif input_option == 2:

    from fbprophet import Prophet

    m = Prophet()

    df2 = prophet_format(df1)
    # Prophet.fit returns the fitted model object, not a DataFrame
    return m.fit(df2)
  else:
    # `elif None:` could never run; fail loudly on an unknown option instead
    raise ValueError("Please specify an input option (1 or 2)")

# df2 = prophet_format(df1)
# df2 = df2.set_index('ds')
# df2.head()

In [0]:
# from fbprophet import Prophet

# m = Prophet(daily_seasonality=True)

# df2 = df1
# df2 = prophet_format(df2)
# df2 = m.fit(df2)
# #df2 = df2.set_index('ds')

# df2 = format_data1(input_option=2,df1=df1)

# #pd.isnull(close_df).sum()
# #df2.shape
# #df2.head()

In [13]:
from ta import add_all_ta_features
from ta.utils import dropna

#dir_path = "/content/drive/My Drive/Data Science/Datasets/"
#file_name = "None"

startDate = '1900-01-01'
endDate = datetime.date.today()

#symbol_list = ['DD','JPM','FO','STT','INA','CI','BK','RRD','GFF','KFT','LO']
#symbol_list = ['SPY','GLD','USO','TLT','IWM','UUP']
#symbol_list = ['SPY','JPM','STT','GS','BAC','C','WFC']
#symbol_list = ['AMZN','MSFT','AMD','ADBE','GOOGL','NVDA','INTC','NFLX','NOW','NTES','ORCL','AAPL','XLK','TSLA','IBM','BKNG','TDOC','GILD','ALXN','BIIB','ISRG']
symbol_list = ['^GSPC']

df1 = yf.download(symbol_list, startDate, endDate, interval='1d')

print(f"df1 shape: {df1.shape}")
#df1.head()

df1 = dropna(df1)

df2 = df1.copy()  # copy so the feature columns added below do not also modify df1

#df2 = add_all_ta_features(df1,open='Open',high='High',low='Low',close='Adj Close',volume='Volume')

# indicator_bb = ta.volatility.BollingerBands(close=df2['Adj Close'],n=20,ndev=2)

# df2['bb_bbm'] = indicator_bb.bollinger_mavg()
# df2['bb_bbh'] = indicator_bb.bollinger_hband()
# df2['bb_bbl'] = indicator_bb.bollinger_lband()

#df2['expanding_3'] = df1['Adj Close'].expanding(min_periods=3).median()

#indicator_stoch = ta.momentum.StochasticOscillator(high=df1['High'],low=df1['Low'],close=df1['Adj Close'],n=14,d_n=3,fillna=False)
#indicator_atr = ta.volatility.AverageTrueRange(high=df1['High'],low=df1['Low'],close=df1['Adj Close'],n=14,fillna=False)
#df2['stoch_osc'] = indicator_stoch.stoch()
#df2['stoch_sig'] = indicator_stoch.stoch_signal()

#df2['atr'] = indicator_atr.average_true_range()

df2['rmavg_3'] = df1['Adj Close'].rolling(window=3).mean()
df2['rmmed_3'] = df1['Adj Close'].rolling(window=3).median()

df2['rmavg_5'] = df1['Adj Close'].rolling(window=5).mean()
df2['rmmed_5'] = df1['Adj Close'].rolling(window=5).median()

df2['rmavg_8'] = df1['Adj Close'].rolling(window=8).mean()
df2['rmmed_8'] = df1['Adj Close'].rolling(window=8).median()

df2['rmavg_13'] = df1['Adj Close'].rolling(window=13).mean()
df2['rmmed_13'] = df1['Adj Close'].rolling(window=13).median()

df2['rmavg_21'] = df1['Adj Close'].rolling(window=21).mean()
df2['rmmed_21'] = df1['Adj Close'].rolling(window=21).median()

df2['rmavg_28'] = df1['Adj Close'].rolling(window=28).mean()
df2['rmmed_28'] = df1['Adj Close'].rolling(window=28).median()

df2['rmavg_30'] = df1['Adj Close'].rolling(window=30).mean()
df2['rmmed_30'] = df1['Adj Close'].rolling(window=30).median()

#df2['rmax_30'] = df1['Adj Close'].rolling(window=30).max()
#df2['rmin_30'] = df1['Adj Close'].rolling(window=30).min()
#df2['range_avg'] = (df2['rmax_30'] + df2['rmin_30']) / 2

#df2['rmavg_200'] = df1['Adj Close'].rolling(window=200).mean()
#df2['rmavg_252'] = df1['Adj Close'].rolling(window=252).mean()

#df2['rstd_5'] = df1['Adj Close'].rolling(window=5).std()
#df2['lag_1'] = df1['Adj Close'].shift(1)
#df2['lag_3'] = df1['Adj Close'].shift(3)
#df2['lag_5'] = df1['Adj Close'].shift(5)

#indicator_vwap = ta.volume.VolumeWeightedAveragePrice(high=df1['High'],low=df1['Low'],close=df1['Adj Close'],volume=df1['Volume'],n=5)
#df2['VWAP'] = indicator_vwap.volume_weighted_average_price()


df2['median_price'] = df2[['Open','High','Low','Adj Close']].median(axis=1)
df2['mean_price'] = df2[['Open','High','Low','Adj Close']].mean(axis=1)

df2 = df2.drop(['Volume'],axis=1)

# zero_count = df2[df2['Open'] == 0.0]
# df2 = df2.iloc[len(zero_count)+1:]

df2_columns = df2.columns

df2 = df2.dropna(how='any',axis=0)

print(f"df2 cleaned for NaNs shape: {df2.shape}")
df2.head()

print()

include_data_descriptions = False

if include_data_descriptions:
  # note: 'mad' needs pandas < 2.0, where DataFrame.mad still exists
  df2_described = my_describe(df2,['var','mad','median','skew','kurtosis']).T
  df2_described

[*********************100%***********************]  1 of 1 completed
df1 shape: (23196, 6)
df2 cleaned for NaNs shape: (17671, 21)


Date,Open,High,Low,Close,Adj Close,rmavg_3,rmmed_3,rmavg_5,rmmed_5,rmavg_8,rmmed_8,rmavg_13,rmmed_13,rmavg_21,rmmed_21,rmavg_28,rmmed_28,rmavg_30,rmmed_30,median_price,mean_price
1950-02-14,17.059999,17.059999,17.059999,17.059999,17.059999,17.193333,17.24,17.204,17.23,17.2325,17.235,17.117692,17.209999,17.011905,17.02,16.9925,17.0,16.976667,16.955,17.059999,17.059999
1950-02-15,17.059999,17.059999,17.059999,17.059999,17.059999,17.12,17.059999,17.17,17.209999,17.21125,17.235,17.143077,17.209999,17.028095,17.049999,16.997143,17.025001,16.99,17.0,17.059999,17.059999
1950-02-16,16.99,16.99,16.99,16.99,16.99,17.036666,17.059999,17.126,17.059999,17.17375,17.219999,17.156154,17.209999,17.034286,17.049999,16.9975,17.025001,16.994667,17.005,16.99,16.99
1950-02-17,17.15,17.15,17.15,17.15,17.15,17.066666,17.059999,17.1,17.059999,17.1525,17.179999,17.166154,17.209999,17.048571,17.049999,17.0,17.025001,17.002,17.025001,17.15,17.15
1950-02-20,17.200001,17.200001,17.200001,17.200001,17.200001,17.113333,17.15,17.092,17.059999,17.14875,17.175,17.177692,17.209999,17.064286,17.059999,17.006071,17.035,17.009333,17.04,17.200001,17.200001
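
The rolling mean and median columns in the cell above follow one pattern per window length, so they could also be generated in a loop. The sketch below uses a synthetic `toy` frame in place of the downloaded data; the column names mirror the notebook's, and everything else is illustrative.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the downloaded price frame.
rng = np.random.default_rng(1)
dates = pd.date_range("2020-01-01", periods=60, freq="B")
toy = pd.DataFrame({"Adj Close": 100 + rng.normal(0, 1, size=60).cumsum()}, index=dates)

windows = [3, 5, 8, 13, 21, 28, 30]
features = toy.copy()
for w in windows:
    roll = toy["Adj Close"].rolling(window=w)
    features[f"rmavg_{w}"] = roll.mean()    # rolling mean over w rows
    features[f"rmmed_{w}"] = roll.median()  # rolling median over w rows

# The longest window leaves leading rows with NaNs; drop them as the notebook does.
features = features.dropna(how="any", axis=0)
print(features.shape)
```

Driving the features from one list of window lengths makes it easier to try other lengths later.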





In [14]:
df2_values = df2.values
df2_values = df2_values.astype('float32')

n_input_slices = 1

df2_fmt = series_to_supervised(df2_values,n_input_slices,1)
df2_fmt3 = series_to_supervised(df2_values,n_input_slices,1)

print(f"df2_values shape: {df2_values.shape}")
print(f"df2_fmt shape: {df2_fmt.shape}")

df2_fmt

plt.style.use('dark_background') # use this if plotting in a dark themed notebook
 
import numpy as np
 
plot_heatmap = False

if plot_heatmap == True:

  corr_data = df2_fmt.diff(1,axis=0).corr()
  mask = np.zeros_like(corr_data, dtype=bool)  # np.bool was removed from NumPy; use the builtin
  #mask[np.triu_indices_from(mask)] = True
  plt.subplots(figsize = (15,12))
  sns.heatmap(corr_data,
              annot=True,
              mask = mask,
              cmap = 'RdBu', ## in order to reverse the bar replace "RdBu" with "RdBu_r"
              linewidths=.9,
              linecolor='gray',
              fmt='.2g',
              center = 0,
              square=True)
  
  plt.title("Correlations Among Features", y = 1.03,fontsize = 20, pad = 40);

elif plot_heatmap == False:
  pass

df2_values shape: (17671, 21)
df2_fmt shape: (17670, 42)


Unnamed: 0,var1(t-1),var2(t-1),var3(t-1),var4(t-1),var5(t-1),var6(t-1),var7(t-1),var8(t-1),var9(t-1),var10(t-1),var11(t-1),var12(t-1),var13(t-1),var14(t-1),var15(t-1),var16(t-1),var17(t-1),var18(t-1),var19(t-1),var20(t-1),var21(t-1),var1(t),var2(t),var3(t),var4(t),var5(t),var6(t),var7(t),var8(t),var9(t),var10(t),var11(t),var12(t),var13(t),var14(t),var15(t),var16(t),var17(t),var18(t),var19(t),var20(t),var21(t)
1,17.059999,17.059999,17.059999,17.059999,17.059999,17.193333,17.240000,17.204000,17.230000,17.232500,17.235001,17.117693,17.209999,17.011904,17.020000,16.992500,17.000000,16.976667,16.955000,17.059999,17.059999,17.059999,17.059999,17.059999,17.059999,17.059999,17.119999,17.059999,17.170000,17.209999,17.211250,17.235001,17.143076,17.209999,17.028095,17.049999,16.997143,17.025002,16.990000,17.000000,17.059999,17.059999
2,17.059999,17.059999,17.059999,17.059999,17.059999,17.119999,17.059999,17.170000,17.209999,17.211250,17.235001,17.143076,17.209999,17.028095,17.049999,16.997143,17.025002,16.990000,17.000000,17.059999,17.059999,16.990000,16.990000,16.990000,16.990000,16.990000,17.036667,17.059999,17.125999,17.059999,17.173750,17.219999,17.156153,17.209999,17.034286,17.049999,16.997499,17.025002,16.994667,17.005001,16.990000,16.990000
3,16.990000,16.990000,16.990000,16.990000,16.990000,17.036667,17.059999,17.125999,17.059999,17.173750,17.219999,17.156153,17.209999,17.034286,17.049999,16.997499,17.025002,16.994667,17.005001,16.990000,16.990000,17.150000,17.150000,17.150000,17.150000,17.150000,17.066666,17.059999,17.100000,17.059999,17.152500,17.180000,17.166153,17.209999,17.048571,17.049999,17.000000,17.025002,17.002001,17.025002,17.150000,17.150000
4,17.150000,17.150000,17.150000,17.150000,17.150000,17.066666,17.059999,17.100000,17.059999,17.152500,17.180000,17.166153,17.209999,17.048571,17.049999,17.000000,17.025002,17.002001,17.025002,17.150000,17.150000,17.200001,17.200001,17.200001,17.200001,17.200001,17.113333,17.150000,17.091999,17.059999,17.148750,17.174999,17.177692,17.209999,17.064285,17.059999,17.006071,17.035000,17.009333,17.040001,17.200001,17.200001
5,17.200001,17.200001,17.200001,17.200001,17.200001,17.113333,17.150000,17.091999,17.059999,17.148750,17.174999,17.177692,17.209999,17.064285,17.059999,17.006071,17.035000,17.009333,17.040001,17.200001,17.200001,17.170000,17.170000,17.170000,17.170000,17.170000,17.173334,17.170000,17.114000,17.150000,17.143749,17.160000,17.186922,17.209999,17.077143,17.059999,17.008928,17.035000,17.012333,17.040001,17.170000,17.170000
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
17666,2918.459961,2954.860107,2912.159912,2939.510010,2939.510010,2893.793213,2878.479980,2863.184082,2863.389893,2834.368652,2829.949951,2826.162354,2823.159912,2746.363770,2789.820068,2676.322754,2743.270020,2658.150879,2700.120117,2928.984863,2931.247559,2930.909912,2930.909912,2892.469971,2912.429932,2912.429932,2905.109863,2912.429932,2886.109863,2878.479980,2845.527588,2850.064941,2837.762207,2836.739990,2761.975342,2797.800049,2698.019531,2755.804932,2675.295410,2743.270020,2921.669922,2916.679932
17667,2930.909912,2930.909912,2892.469971,2912.429932,2912.429932,2905.109863,2912.429932,2886.109863,2878.479980,2845.527588,2850.064941,2837.762207,2836.739990,2761.975342,2797.800049,2698.019531,2755.804932,2675.295410,2743.270020,2921.669922,2916.679932,2869.090088,2869.090088,2821.610107,2830.709961,2830.709961,2894.216553,2912.429932,2884.904053,2878.479980,2857.296143,2850.064941,2836.581543,2830.709961,2779.128174,2799.310059,2719.209229,2772.495117,2689.339355,2755.804932,2849.899902,2847.625000
17668,2869.090088,2869.090088,2821.610107,2830.709961,2830.709961,2894.216553,2912.429932,2884.904053,2878.479980,2857.296143,2850.064941,2836.581543,2830.709961,2779.128174,2799.310059,2719.209229,2772.495117,2689.339355,2755.804932,2849.899902,2847.625000,2815.010010,2844.239990,2797.850098,2842.739990,2842.739990,2861.959961,2842.739990,2877.755859,2863.389893,2862.725098,2853.064941,2841.149170,2836.739990,2794.167969,2799.550049,2733.331055,2786.590088,2707.266602,2772.495117,2828.875000,2824.959961
17669,2815.010010,2844.239990,2797.850098,2842.739990,2842.739990,2861.959961,2842.739990,2877.755859,2863.389893,2862.725098,2853.064941,2841.149170,2836.739990,2794.167969,2799.550049,2733.331055,2786.590088,2707.266602,2772.495117,2828.875000,2824.959961,2868.879883,2898.229980,2863.550049,2868.439941,2868.439941,2847.296631,2842.739990,2878.765869,2868.439941,2871.554932,2865.915039,2846.448486,2842.739990,2812.253418,2823.159912,2747.362549,2793.810059,2728.301270,2786.590088,2868.659912,2874.774902
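
To make the reframed table above easier to read, here is a toy version of the same one-lag framing, built directly with `shift`. It is a simplified sketch of what `series_to_supervised` does for `n_in=1, n_out=1`, not a replacement for it; the two-variable array is made up for illustration.

```python
import numpy as np
import pandas as pd

# 5 time steps, 2 variables: rows are [0, 1], [2, 3], [4, 5], ...
values = np.arange(10, dtype="float32").reshape(5, 2)
df = pd.DataFrame(values, columns=["var1", "var2"])

lagged = df.shift(1).add_suffix("(t-1)")  # inputs: the previous row
current = df.add_suffix("(t)")            # candidate targets: the current row

# The first row has no predecessor, so dropna removes it.
framed = pd.concat([lagged, current], axis=1).dropna()
print(framed)
```

Each remaining row pairs the observations at `t-1` with those at `t`, which is why the reframed frame has one row fewer and twice as many columns as its input.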


In [15]:
#df2_fmt2 = df2_fmt.iloc[:,:len(df2.columns)+1]
#df2_fmt3 = df2_fmt2

#df2_fmt2 = df2_fmt.iloc[:,:-len(df2.columns)+1]
#df2_fmt2['var1(t)'] = df2_fmt3['var1(t)']

target_variable = 'var20(t)'
# .copy() makes df2_fmt2 an independent frame, which avoids the SettingWithCopyWarning
df2_fmt2 = df2_fmt.iloc[:,:-len(df2.columns)].copy()
df2_fmt2[target_variable] = df2_fmt3[target_variable]


print(f"df2_fmt2 shape: {df2_fmt2.shape}")
df2_fmt2.head()

df2_fmt2 shape: (17670, 22)




A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy



Unnamed: 0,var1(t-1),var2(t-1),var3(t-1),var4(t-1),var5(t-1),var6(t-1),var7(t-1),var8(t-1),var9(t-1),var10(t-1),var11(t-1),var12(t-1),var13(t-1),var14(t-1),var15(t-1),var16(t-1),var17(t-1),var18(t-1),var19(t-1),var20(t-1),var21(t-1),var20(t)
1,17.059999,17.059999,17.059999,17.059999,17.059999,17.193333,17.24,17.204,17.23,17.2325,17.235001,17.117693,17.209999,17.011904,17.02,16.9925,17.0,16.976667,16.955,17.059999,17.059999,17.059999
2,17.059999,17.059999,17.059999,17.059999,17.059999,17.119999,17.059999,17.17,17.209999,17.21125,17.235001,17.143076,17.209999,17.028095,17.049999,16.997143,17.025002,16.99,17.0,17.059999,17.059999,16.99
3,16.99,16.99,16.99,16.99,16.99,17.036667,17.059999,17.125999,17.059999,17.17375,17.219999,17.156153,17.209999,17.034286,17.049999,16.997499,17.025002,16.994667,17.005001,16.99,16.99,17.15
4,17.15,17.15,17.15,17.15,17.15,17.066666,17.059999,17.1,17.059999,17.1525,17.18,17.166153,17.209999,17.048571,17.049999,17.0,17.025002,17.002001,17.025002,17.15,17.15,17.200001
5,17.200001,17.200001,17.200001,17.200001,17.200001,17.113333,17.15,17.091999,17.059999,17.14875,17.174999,17.177692,17.209999,17.064285,17.059999,17.006071,17.035,17.009333,17.040001,17.200001,17.200001,17.17
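
The step above keeps every `(t-1)` column and a single `(t)` column as the target. Here is a self-contained sketch of that selection on a toy frame; the frame, `n_vars`, and the chosen target are illustrative assumptions. It takes an explicit copy of the slice before adding the target column, which is the usual way to avoid pandas' `SettingWithCopyWarning`.

```python
import numpy as np
import pandas as pd

n_vars = 3
cols = [f"var{j}(t-1)" for j in range(1, n_vars + 1)] + [f"var{j}(t)" for j in range(1, n_vars + 1)]
framed = pd.DataFrame(np.arange(24, dtype="float32").reshape(4, 6), columns=cols)

target_variable = "var2(t)"

# Keep only the lagged inputs; .copy() makes the result an independent frame.
inputs_and_target = framed.iloc[:, :-n_vars].copy()

# Append the single column to predict as the last column.
inputs_and_target[target_variable] = framed[target_variable]
print(inputs_and_target.columns.tolist())
```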


In [0]:
from sklearn.preprocessing import MinMaxScaler

#scaler = MinMaxScaler(feature_range=(0,1))


#scaled = scaler.fit_transform(df2_values)
#df2_reframed = series_to_supervised(scaled,n_input_slices,1)
#df2_reframed3 = series_to_supervised(scaled,n_input_slices,1)


#target_variable = 'var7(t)'

#df2_reframed = df2_reframed.iloc[:,:-len(df2.columns)]
#df2_reframed[target_variable] = df2_reframed3[target_variable]

#df2_reframed.shape
#df2_reframed.head()

In [54]:
scale_train = MinMaxScaler(feature_range=(0,1))
scale_test = MinMaxScaler(feature_range=(0,1))
X_test_scaler = MinMaxScaler(feature_range=(0,1))
y_test_scaler = MinMaxScaler(feature_range=(0,1))

#values = df2_reframed.values
values = df2_fmt2.values

train_split = int(len(values)*0.60)  # first 60% of the rows go to training
test_split = (len(values) - train_split)

train = values[:train_split]
test = values[train_split:]

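train = scale_train.fit_transform(train)
# NOTE: the test split gets its own scaler here, so train and test are mapped to
# [0, 1] with different min/max values; scale_train.transform(test) would keep
# both splits on the same scale
test = scale_test.fit_transform(test)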


X_train,y_train = train[:,:-1],train[:,-1]
X_test,y_test = test[:,:-1],test[:,-1]

X_test_scaler = X_test_scaler.fit(X_test)

print(f"test shape: {test.shape}")
print(f"train shape: {train.shape}")
print(f"X_train shape: {X_train.shape}",f"y_train shape: {y_train.shape}", f"X_test shape: {X_test.shape}",f"y_test shape: {y_test.shape}")

test shape: (7068, 22)
train shape: (10602, 22)
X_train shape: (10602, 21) y_train shape: (10602,) X_test shape: (7068, 21) y_test shape: (7068,)
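
The cell above fits one scaler per split. A common alternative, sketched here on toy data, is to fit a single `MinMaxScaler` on the training rows and reuse it for the test rows, so both splits share one mapping. The array `values_demo` and the other names are assumptions for illustration only.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Toy data: 10 rows, 2 columns, increasing over time like a trending price
values_demo = np.arange(20, dtype=float).reshape(10, 2)

split = int(len(values_demo) * 0.60)
train_raw, test_raw = values_demo[:split], values_demo[split:]

scaler = MinMaxScaler(feature_range=(0, 1))
train_scaled = scaler.fit_transform(train_raw)  # min/max learned from train only
test_scaled = scaler.transform(test_raw)        # same mapping applied to test

# Test values above the training maximum land outside [0, 1]; that is expected
print(train_scaled.min(), train_scaled.max())
print(test_scaled.min(), test_scaled.max())

# The same fitted scaler undoes the scaling exactly
restored = scaler.inverse_transform(test_scaled)
```

With trending data the scaled test values exceed 1, which is a property of the data rather than an error in the scaling.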


In [55]:
# reshape input to be 3D [samples, timesteps, features]
X_train = X_train.reshape(X_train.shape[0],1,X_train.shape[1])
X_test = X_test.reshape(X_test.shape[0],1,X_test.shape[1])

print(f"X_train reshaped shape: {X_train.shape}")
print(f"X_test reshaped shape: {X_test.shape}")

X_train reshaped shape: (10602, 1, 21)
X_test reshaped shape: (7068, 1, 21)
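
The reshape above gives every sample exactly one timestep. For comparison, the sketch below builds the same one-timestep layout on a toy array and then an overlapping-window layout with more than one timestep. The helper `sliding_windows` is hypothetical and is not used anywhere in this notebook.

```python
import numpy as np

X_demo = np.arange(12, dtype=float).reshape(4, 3)  # 4 samples, 3 features

# Same layout as the notebook: [samples, 1 timestep, features]
X_one_step = X_demo.reshape(X_demo.shape[0], 1, X_demo.shape[1])

def sliding_windows(X, timesteps):
    """Hypothetical helper: stack overlapping windows of consecutive rows."""
    n_windows = len(X) - timesteps + 1
    return np.stack([X[i:i + timesteps] for i in range(n_windows)])

X_two_steps = sliding_windows(X_demo, timesteps=2)

print(X_one_step.shape)   # (4, 1, 3)
print(X_two_steps.shape)  # (3, 2, 3)
```

With one timestep per sample the recurrent layers see no sequence inside a sample; any history reaches the model only through the lagged feature columns.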


In [0]:
# predicted = model.predict(X_test)
# X_test_rs = X_test.reshape(X_test.shape[0],X_test.shape[2])
# predicted = np.concatenate((predicted,X_test_rs[:,1:]),axis=1)

# predicted_df = pd.DataFrame(predicted)

# y_test = y_test.reshape(len(y_test),1)
# predicted_df['y_test'] = y_test

# predicted_df_cols = predicted_df.columns

# y_test = np.concatenate((y_test,X_test_rs[:,1:]),axis=1)
# y_test_df = pd.DataFrame(y_test)

# print(f"y_test shape: {y_test.shape}")
# print(f"X_test shape: {X_test.shape}")
# print(f"X_test_rs shape: {X_test_rs.shape}")
# print(f"predicted shape: {predicted.shape}")
# print(f"y_test_df shape: {y_test_df.shape}")


# #predicted_df = scale_test.inverse_transform(predicted_df)

In [0]:
# y_test = X_test_scaler.inverse_transform(y_test)

In [0]:
# y_test

In [0]:
early_stopping = EarlyStopping(monitor='val_loss',mode='auto',patience=25,restore_best_weights=True)

univ_dropout = 0.20
univ_recurrent_dropout = 0.20

model = Sequential()
model.add(LSTM(128,
               return_sequences=True,
               input_shape=(X_train.shape[1],X_train.shape[2]),
               dropout=univ_dropout,
               recurrent_dropout=univ_recurrent_dropout,
               activation='tanh'))
model.add(Conv1D(filters=64,kernel_size=10,padding='same',activation='relu'))
model.add(MaxPooling1D(pool_size=1))

model.add(LSTM(units=64,return_sequences=True,dropout=univ_dropout,recurrent_dropout=univ_recurrent_dropout))

model.add(Conv1D(filters=32,kernel_size=10,padding='same',activation='relu'))
model.add(MaxPooling1D(pool_size=1))

#model.add(LSTM(units=64,return_sequences=True,dropout=univ_dropout,recurrent_dropout=univ_recurrent_dropout))

#model.add(LSTM(units=64,return_sequences=True,dropout=univ_dropout,recurrent_dropout=univ_recurrent_dropout))

#model.add(LSTM(units=700,return_sequences=True,dropout=univ_dropout,recurrent_dropout=univ_recurrent_dropout))

#model.add(LSTM(units=600,return_sequences=True,dropout=univ_dropout,recurrent_dropout=univ_recurrent_dropout))
#model.add(LSTM(units=500,return_sequences=True,dropout=univ_dropout,recurrent_dropout=univ_recurrent_dropout))
#model.add(LSTM(units=400,return_sequences=True,dropout=univ_dropout,recurrent_dropout=univ_recurrent_dropout))
#model.add(LSTM(units=300,return_sequences=True,dropout=univ_dropout,recurrent_dropout=univ_recurrent_dropout))
#model.add(LSTM(units=200,return_sequences=True,dropout=univ_dropout,recurrent_dropout=univ_recurrent_dropout))
#model.add(LSTM(units=100,return_sequences=True,dropout=univ_dropout,recurrent_dropout=univ_recurrent_dropout))

model.add(LSTM(32,return_sequences=False,dropout=univ_dropout,recurrent_dropout=univ_recurrent_dropout))

# model.add(LSTM(units=10,
#                dropout=univ_dropout,
#                recurrent_dropout=univ_recurrent_dropout,
#                activation='tanh'))

model.add(Dense(1,activation='linear'))

model.compile(loss="mean_squared_error",optimizer="adam")

In [27]:
model.summary()

Model: "sequential_2"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
lstm_4 (LSTM)                (None, 1, 128)            76800     
_________________________________________________________________
conv1d_3 (Conv1D)            (None, 1, 64)             81984     
_________________________________________________________________
max_pooling1d_3 (MaxPooling1 (None, 1, 64)             0         
_________________________________________________________________
lstm_5 (LSTM)                (None, 1, 64)             33024     
_________________________________________________________________
conv1d_4 (Conv1D)            (None, 1, 32)             20512     
_________________________________________________________________
max_pooling1d_4 (MaxPooling1 (None, 1, 32)             0         
_________________________________________________________________
lstm_6 (LSTM)                (None, 32)               

In [28]:
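# Loss of the still-untrained model on the test split (model.fit runs in the next cell).
# EarlyStopping and use_multiprocessing have no effect on evaluate() with in-memory arrays.
score = model.evaluate(X_test,y_test,verbose=1,batch_size=len(y_test))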
score



0.15375617146492004

In [29]:
history = model.fit(X_train,y_train,validation_data=(X_test,y_test),epochs=250,verbose=2,callbacks=[early_stopping],shuffle=False,use_multiprocessing=True)

Train on 10602 samples, validate on 7068 samples
Epoch 1/1000
 - 3s - loss: 0.0034 - val_loss: 0.2705
Epoch 2/1000
 - 2s - loss: 0.0242 - val_loss: 0.2242
Epoch 3/1000
 - 2s - loss: 0.0126 - val_loss: 0.1683
Epoch 4/1000
 - 2s - loss: 0.0054 - val_loss: 0.1865
Epoch 5/1000
 - 2s - loss: 0.0053 - val_loss: 0.2165
Epoch 6/1000
 - 2s - loss: 0.0075 - val_loss: 0.1319
Epoch 7/1000
 - 2s - loss: 0.0038 - val_loss: 0.2009
Epoch 8/1000
 - 2s - loss: 0.0049 - val_loss: 0.5045
Epoch 9/1000
 - 2s - loss: 0.0278 - val_loss: 0.3080
Epoch 10/1000
 - 2s - loss: 0.0254 - val_loss: 0.2489
Epoch 11/1000
 - 2s - loss: 0.0069 - val_loss: 0.1782
Epoch 12/1000
 - 2s - loss: 0.0062 - val_loss: 0.1617
Epoch 13/1000
 - 2s - loss: 0.0061 - val_loss: 0.1424
Epoch 14/1000
 - 2s - loss: 0.0057 - val_loss: 0.1253
Epoch 15/1000
 - 2s - loss: 0.0056 - val_loss: 0.1807
Epoch 16/1000
 - 2s - loss: 0.0062 - val_loss: 0.1116
Epoch 17/1000
 - 2s - loss: 0.0056 - val_loss: 0.1661
Epoch 18/1000
 - 2s - loss: 0.0038 - val_l
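
The validation loss in the log above moves up and down from epoch to epoch, which is what the `patience` setting of `EarlyStopping` is meant to tolerate. The sketch below is a simplified model of that patience logic, not the Keras implementation; it assumes `min_delta=0` and uses the `val_loss` values from the first 17 epochs printed above. The function name `early_stopping_trace` is hypothetical.

```python
def early_stopping_trace(val_losses, patience):
    """Return (best_epoch, stopped_epoch) as 0-based indices.

    stopped_epoch is None when the patience budget is never used up.
    """
    best_loss = float("inf")
    best_epoch = None
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            # Improvement: remember it and reset the counter
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return best_epoch, epoch
    return best_epoch, None

# val_loss values copied from epochs 1-17 of the log above
val_losses = [0.2705, 0.2242, 0.1683, 0.1865, 0.2165, 0.1319, 0.2009, 0.5045,
              0.3080, 0.2489, 0.1782, 0.1617, 0.1424, 0.1253, 0.1807, 0.1116,
              0.1661]

print(early_stopping_trace(val_losses, patience=25))  # (15, None)
print(early_stopping_trace(val_losses, patience=3))   # (5, 8)
```

With `patience=25` nothing stops within these epochs and the best epoch so far is the 16th (index 15). With `restore_best_weights=True`, the weights from the best epoch are the ones kept once stopping triggers.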

In [30]:
trace_0 = go.Scatter(y=history.history['loss'],
                     name='train',
                     mode='lines',
                     marker= {'color':'#FF6F61','opacity':0.9})

trace_1 = go.Scatter(y=history.history['val_loss'],
                     name='test',
                     mode='lines',
                     marker= {'color':'#1d5dec','opacity':0.9})

data = [trace_0,trace_1]

layout = go.Layout(title="train/test MSE loss comparison",
                   template='plotly_dark',
                   height=600,
                   width=1000)

fig = go.Figure(data,layout=layout)
fig.show()
#py.iplot(fig)

In [31]:
weights_1 = model.get_weights()

print(f"weights1 list of arrays length: {len(weights_1)}")
print(f"Length of the first array in the list: {len(weights_1[0])}")

for i in weights_1:
  print(i.shape)
#weights_df1 = pd.DataFrame(weights_1[0])
#weights_df1

weights1 list of arrays length: 15
Length of the first array in the list: 21
(21, 512)
(128, 512)
(512,)
(10, 128, 64)
(64,)
(64, 256)
(64, 256)
(256,)
(10, 64, 32)
(32,)
(32, 128)
(32, 128)
(128,)
(32, 1)
(1,)
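
The 15 shapes listed above follow from the layer definitions: an LSTM stores an input kernel, a recurrent kernel and a bias, each four gates wide; a Conv1D stores a kernel and a bias; MaxPooling1D stores nothing. The sketch below rebuilds the list in plain Python. The helper names are hypothetical, and the layer sizes are read off the model cell.

```python
def lstm_shapes(input_dim, units):
    # kernel, recurrent kernel, bias; the factor 4 covers the four LSTM gates
    return [(input_dim, 4 * units), (units, 4 * units), (4 * units,)]

def conv1d_shapes(kernel_size, in_channels, filters):
    return [(kernel_size, in_channels, filters), (filters,)]

def dense_shapes(input_dim, units):
    return [(input_dim, units), (units,)]

def count_params(shape_list):
    total = 0
    for shape in shape_list:
        n = 1
        for dim in shape:
            n *= dim
        total += n
    return total

expected = (lstm_shapes(21, 128)           # LSTM(128) on 21 input features
            + conv1d_shapes(10, 128, 64)   # Conv1D(64, kernel_size=10)
            + lstm_shapes(64, 64)          # LSTM(64)
            + conv1d_shapes(10, 64, 32)    # Conv1D(32, kernel_size=10)
            + lstm_shapes(32, 32)          # LSTM(32)
            + dense_shapes(32, 1))         # Dense(1)

for shape in expected:
    print(shape)

print(count_params(lstm_shapes(21, 128)))        # 76800
print(count_params(conv1d_shapes(10, 128, 64)))  # 81984
```

The two printed counts agree with the `Param #` column for the first LSTM and the first Conv1D in the model summary.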


In [0]:
temp_scaler = MinMaxScaler(feature_range=(0,1))
temp_scaler = temp_scaler.fit(df2_values)

In [57]:
predicted = model.predict(X_test)
X_test_rs = X_test.reshape(X_test.shape[0],X_test.shape[2])
predicted = np.concatenate((predicted,X_test_rs[:,1:]),axis=1)
print(f"predicted shape: {predicted.shape}")

#predicted = scaler.inverse_transform(predicted)
predicted = temp_scaler.inverse_transform(predicted)
predicted_df = pd.DataFrame(predicted)

y_test = y_test.reshape(len(y_test),1)
print(f"y_test shape: {y_test.shape}")

predicted shape: (7068, 21)
y_test shape: (7068, 1)


In [59]:
# Pad y_test with the scaled feature columns so it has the 21 columns temp_scaler expects
y_test = np.concatenate((y_test,X_test_rs[:,1:]),axis=1)
#y_test = scaler.inverse_transform(y_test)
y_test = temp_scaler.inverse_transform(y_test)

y_test_df = pd.DataFrame(y_test)

print(f"y_test_df shape: {y_test_df.shape}")

y_test_df shape: (7068, 21)
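
The cells above pad the single prediction column out to 21 columns because the scaler only accepts arrays as wide as the data it was fitted on. The NumPy sketch below shows, on toy numbers, that padding and inverting the whole matrix gives the same result as inverting the target column directly with its own min and max. All names and values are assumptions for illustration.

```python
import numpy as np

# Toy raw data; column 0 plays the role of the target
raw = np.array([[10.0, 100.0],
                [20.0, 300.0],
                [30.0, 500.0]])
col_min, col_max = raw.min(axis=0), raw.max(axis=0)

# Scaled predictions for the target only, shape (n, 1)
pred_scaled = np.array([[0.25], [0.75]])

# Option 1: pad to full width, invert every column, keep the target column
padding = np.zeros((len(pred_scaled), raw.shape[1] - 1))
padded = np.concatenate((pred_scaled, padding), axis=1)
pred_via_padding = (padded * (col_max - col_min) + col_min)[:, 0]

# Option 2: invert the target column with its own min and max
pred_direct = pred_scaled[:, 0] * (col_max[0] - col_min[0]) + col_min[0]

print(pred_via_padding)  # [15. 25.]
print(pred_direct)       # [15. 25.]
```

Either way, the result is only meaningful if the min and max belong to the target column and come from the same scaler that produced the scaled values.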


In [60]:
y_test_df_labeled = y_test_df
y_test_df_labeled.columns = df2_columns

y_test_df_labeled.head()

Unnamed: 0,Open,High,Low,Close,Adj Close,rmavg_3,rmmed_3,rmavg_5,rmmed_5,rmavg_8,rmmed_8,rmavg_13,rmmed_13,rmavg_21,rmmed_21,rmavg_28,rmmed_28,rmavg_30,rmmed_30,median_price,mean_price
0,25.282196,20.518826,24.685068,22.459059,22.459059,18.297897,18.982821,16.834,16.870001,17.055,16.965,17.117693,17.209999,17.011904,17.02,16.9925,17.0,16.994068,16.955,20.710896,20.611809
1,31.035788,29.302935,27.14292,29.581408,29.581408,22.721884,21.004545,20.069607,20.990246,18.733566,19.025824,17.511684,17.209999,17.33559,17.02,17.230467,17.0,16.976667,16.955,25.289406,25.712238
2,33.094685,32.035248,34.257313,33.972134,33.972134,27.236248,28.131603,24.987226,23.010834,20.81308,21.306873,18.623079,17.424841,17.697063,17.02,17.709597,17.356934,17.239271,16.955,31.047823,31.055426
3,29.659431,32.035248,35.396034,33.701256,33.701256,30.986296,32.254192,28.464344,30.133802,22.288349,21.583521,19.757103,18.035465,18.068787,17.02,18.147934,17.634537,17.59931,16.955,33.108444,32.370747
4,26.382168,31.775553,29.228781,27.086872,27.086872,30.154144,32.254192,29.794209,30.133802,23.999409,24.909136,20.321505,18.103264,18.089281,17.02,18.28109,17.634537,17.816196,17.232653,29.670311,29.042898


In [0]:
#predicted = X_test_scaler.inverse_transform(predicted)
#y_test = X_test_scaler.inverse_transform(y_test)

In [61]:
#y_test = scaler.inverse_transform(y_test)
print(f"y_test_df shape: {y_test_df.shape}")
#y_test_df_labeled = y_test_df
#y_test_df_labeled.columns = df2_columns
#y_test_df_labeled.head()

y_test_df shape: (7068, 21)


In [62]:
np.sqrt(mean_squared_error(y_test[:,0],predicted[:,0]))

145.91615
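
An RMSE is easier to judge next to a reference value. One possible sanity check, sketched here on toy numbers, is to compare the model's RMSE with that of a persistence forecast that predicts each value as the previous one. The arrays and the `rmse` helper are assumptions for illustration and are not computed from this notebook's data.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error between two equally long 1D arrays."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

actual = np.array([10.0, 11.0, 12.0, 14.0])
model_pred = np.array([10.5, 10.5, 12.5, 13.5])

# Persistence forecast: the prediction for step t is the actual value at t-1
persistence_pred = actual[:-1]

# Compare on steps 1..n-1, where the persistence forecast is defined
rmse_model = rmse(actual[1:], model_pred[1:])
rmse_persistence = rmse(actual[1:], persistence_pred)

print(rmse_model)        # 0.5
print(rmse_persistence)  # about 1.414
```

A model whose RMSE is not clearly below the persistence value adds little over repeating the last observation.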

In [63]:
results = pd.concat([pd.Series(predicted[:,0]),pd.Series(y_test[:,0])],axis=1)
results.columns = ['theta_hat','theta']
results['diff'] = results['theta_hat'] - results['theta']
results

Unnamed: 0,theta_hat,theta,diff
0,229.387802,25.282196,204.105606
1,229.387802,31.035788,198.352020
2,229.387802,33.094685,196.293121
3,229.387802,29.659431,199.728363
4,229.387802,26.382168,203.005630
...,...,...,...
7063,2641.939697,2859.694336,-217.754639
7064,2646.988037,2778.726807,-131.738770
7065,2629.156250,2755.007568,-125.851318
7066,2621.056396,2799.891113,-178.834717


In [64]:
symbol_name = symbol_list[0]

trace_0 = go.Scatter(y=results['theta_hat'],
                     name='predicted',
                     mode='lines',
                     marker= {'color':'#FF6F61','opacity':0.9})

trace_1 = go.Scatter(y=results['theta'],
                     name=symbol_name,
                     mode='lines',
                     marker= {'color':'#1d5dec','opacity':0.9})

data = [trace_0,trace_1]

layout = go.Layout(title=f"LSTM model predictions vs. actual price for the {symbol_name} share price",
                  template='plotly_dark',
                   height=700,
                   width=1400)

fig = go.Figure(data,layout=layout)
fig.show()
#py.iplot(fig)

In [65]:
symbol_name = symbol_list[0]

trace_0 = go.Scatter(y=results['diff'],
                     name='difference',
                     mode='lines',
                     marker= {'color':'#FF6F61','opacity':0.9})

# trace_1 = go.Scatter(y=results['theta'],
#                      name=symbol_name,
#                      mode='lines',
#                      marker= {'color':'#1d5dec','opacity':0.9})

data = [trace_0]

layout = go.Layout(title=f"Difference between LSTM model predictions and the actual price of {symbol_name}",
                  template='plotly_dark',
                   height=700,
                   width=1400)

fig = go.Figure(data,layout=layout)
fig.show()

In [0]:
%%capture
!pip install h5py

In [0]:
folder_path = "/content/drive/My Drive/Data Science/Projects/LSTM Time Series Prediction/LSTM_multi_input_stocks_05062020_1"
model.save(folder_path)

In [69]:
from keras.models import model_from_json

# Serialize model to json
model_json = model.to_json()
with open("model_05072020_spx500.json","w") as json_file:
  json_file.write(model_json)

#Serialize weights to HDF5
model.save_weights("model_05072020_spx500.h5")

4857