<center> <font size = 6> Apple stock forecast with LSTM </font></center>

- [Using previous data to forecast Apple's closing price on 9/16/2020.](#9/16/2020)
- [Including Apple's close on 9/16/2020 to forecast the stock closing prices for the next five days.](#forecast)

<a id ="9/16/2020"> </a>
### I. Using previous data to forecast Apple's closing price on 9/16/2020.

In [17]:
from math import sqrt
from numpy import split
from numpy import array
from pandas import read_csv
from sklearn.metrics import mean_squared_error
from matplotlib import pyplot
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Flatten
from keras.layers import LSTM
from keras.layers import RepeatVector
from keras.layers import TimeDistributed
import pandas as pd

In [18]:
# split a univariate dataset into train/test sets
def split_dataset(data):
	# hold out the last 85 rows (17 five-day weeks) for testing
	train, test = data[0:-85], data[-85:]
	# restructure into windows of weekly data
	train = array(split(train, len(train)/5))
	test = array(split(test, len(test)/5))
	return train, test
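
To make the reshaping concrete, here is a minimal sketch that applies the same slicing to a synthetic series. The 100-row array is made up for illustration and stands in for the `Close` column. Integer division (`//`) is used so that `split` receives a whole number of sections.

```python
from numpy import arange, array, split

# Synthetic stand-in for the "Close" column: 100 rows, one feature.
data = arange(100, dtype=float).reshape(-1, 1)

# Same slicing as split_dataset: the last 85 rows are held out for testing.
train, test = data[:-85], data[-85:]

# Group consecutive rows into 5-day windows.
train_weeks = array(split(train, len(train) // 5))
test_weeks = array(split(test, len(test) // 5))

print(train_weeks.shape)  # (3, 5, 1)
print(test_weeks.shape)   # (17, 5, 1)
```

Each array has the layout `[weeks, days, features]`, which is what the later functions expect.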

In [19]:
def evaluate_forecasts(actual, predicted):
	scores = list()
	# calculate an RMSE score for each day
	for i in range(actual.shape[1]):
		# calculate mse
		mse = mean_squared_error(actual[:, i], predicted[:, i])
		# calculate rmse
		rmse = sqrt(mse)
		# store
		scores.append(rmse)
	# calculate overall RMSE
	s = 0
	for row in range(actual.shape[0]):
		for col in range(actual.shape[1]):
			s += (actual[row, col] - predicted[row, col])**2
	score = sqrt(s / (actual.shape[0] * actual.shape[1]))
	return score, scores
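
The two loops above can also be written with NumPy reductions. This is only a sketch: the name `rmse_scores` and the toy arrays are hypothetical, and it assumes both inputs are 2-D `[weeks, days]` arrays, so a trailing feature axis on the predictions would have to be dropped first.

```python
import numpy as np

def rmse_scores(actual, predicted):
    """Return (overall RMSE, per-day RMSE array) for [weeks, days] inputs."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    sq_err = (actual - predicted) ** 2
    per_day = np.sqrt(sq_err.mean(axis=0))   # one RMSE per forecast day
    overall = float(np.sqrt(sq_err.mean()))  # RMSE over every week and day
    return overall, per_day

# Toy data (made up): two weeks, three days each.
actual = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
predicted = np.array([[1.0, 2.0, 5.0], [4.0, 7.0, 6.0]])
overall, per_day = rmse_scores(actual, predicted)
print(overall, per_day)
```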

In [20]:
def summarize_scores(name, score, scores):
	s_scores = ', '.join(['%.1f' % s for s in scores])
	print('%s: [%.3f] %s' % (name, score, s_scores))
    
def to_supervised(train, n_input, n_out=5):
	# flatten data
	data = train.reshape((train.shape[0]*train.shape[1], train.shape[2]))
	X, y = list(), list()
	in_start = 0
	# step over the entire history one time step at a time
	for _ in range(len(data)):
		# define the end of the input sequence
		in_end = in_start + n_input
		out_end = in_end + n_out
		# ensure we have enough data for this instance
		if out_end <= len(data):
			X.append(data[in_start:in_end, :])
			y.append(data[in_end:out_end, 0])
		# move along one time step
		in_start += 1
	return array(X), array(y)
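
As a quick illustration of what `to_supervised` produces, the sketch below builds the same sliding windows on a toy series. The helper name `sliding_windows` and the 20-row series are assumptions made for the example.

```python
import numpy as np

def sliding_windows(data, n_input, n_out):
    """Pair each run of n_input rows with the n_out values of column 0 that follow."""
    X, y = [], []
    for start in range(len(data) - n_input - n_out + 1):
        in_end = start + n_input
        X.append(data[start:in_end, :])
        y.append(data[in_end:in_end + n_out, 0])
    return np.array(X), np.array(y)

# Toy series: the values 0..19 in a single column.
series = np.arange(20, dtype=float).reshape(-1, 1)
X, y = sliding_windows(series, n_input=10, n_out=5)

print(X.shape, y.shape)  # (6, 10, 1) (6, 5)
print(y[0])              # [10. 11. 12. 13. 14.]
```

With 20 rows, a 10-step input, and a 5-step output, only 6 complete samples fit. The window advances one row at a time, so consecutive samples overlap heavily.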

In [21]:
def build_model(train, n_input):
	# prepare data
	train_x, train_y = to_supervised(train, n_input)
	# define parameters
	verbose, epochs, batch_size = 0, 50, 16
	n_timesteps, n_features, n_outputs = train_x.shape[1], train_x.shape[2], train_y.shape[1]
	# reshape output into [samples, timesteps, features]
	train_y = train_y.reshape((train_y.shape[0], train_y.shape[1], 1))
	# define model
	model = Sequential()
	model.add(LSTM(200, activation='relu', input_shape=(n_timesteps, n_features)))
	model.add(RepeatVector(n_outputs))
	model.add(LSTM(200, activation='relu', return_sequences=True))
	model.add(TimeDistributed(Dense(100, activation='relu')))
	model.add(TimeDistributed(Dense(1)))
	model.compile(loss='mse', optimizer='adam')
	# fit network
	model.fit(train_x, train_y, epochs=epochs, batch_size=batch_size, verbose=verbose)
	return model

In [22]:
def forecast(model, history, n_input):
	# flatten data
	data = array(history)
	data = data.reshape((data.shape[0]*data.shape[1], data.shape[2]))
	# retrieve last observations for input data
	input_x = data[-n_input:, :]
	# reshape into [1, n_input, n]
	input_x = input_x.reshape((1, input_x.shape[0], input_x.shape[1]))
	# forecast the next week
	yhat = model.predict(input_x, verbose=0)
	# we only want the vector forecast
	yhat = yhat[0]
	return yhat
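
The input preparation inside `forecast` can be checked without a trained model. In this sketch the `history` list is hypothetical: three 5-day windows holding the values 0 through 14.

```python
import numpy as np

# Hypothetical history: three 5-day windows of a single feature.
history = [np.arange(i * 5, (i + 1) * 5, dtype=float).reshape(5, 1) for i in range(3)]
n_input = 10

data = np.array(history)                                            # (3, 5, 1)
flat = data.reshape(data.shape[0] * data.shape[1], data.shape[2])   # (15, 1)
input_x = flat[-n_input:, :].reshape(1, n_input, flat.shape[1])     # (1, 10, 1)

print(input_x.shape)  # (1, 10, 1)
```

Only the most recent `n_input` rows reach the model (here the values 5.0 through 14.0), wrapped in a batch of size one.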

In [23]:
def evaluate_model(train, test, n_input):
	# fit model
	model = build_model(train, n_input)
	# history is a list of weekly data
	history = [x for x in train]
	# walk-forward validation over each week
	predictions = list()
	for i in range(len(test)):
		# predict the week
		yhat_sequence = forecast(model, history, n_input)
		# store the predictions
		predictions.append(yhat_sequence)
		# get real observation and add to history for predicting the next week
		history.append(test[i, :])
	# evaluate predictions days for each week
	predictions = array(predictions)
	score, scores = evaluate_forecasts(test[:, :, 0], predictions)
	return score, scores
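
The walk-forward loop does not depend on the LSTM itself. The sketch below runs the same loop with a naive persistence baseline in place of the network. Both `persistence_forecast` and `walk_forward` are hypothetical names, and the data is synthetic.

```python
import numpy as np

def persistence_forecast(history, n_out=5):
    """Hypothetical baseline: repeat the last observed value n_out times."""
    last_value = np.array(history)[-1, -1, 0]
    return np.full((n_out, 1), last_value)

def walk_forward(train, test, forecast_fn):
    """Predict each test week from everything observed before it."""
    history = [week for week in train]
    predictions = []
    for week in test:
        predictions.append(forecast_fn(history))
        history.append(week)  # reveal the true week only after predicting it
    return np.array(predictions)

# Synthetic data: six 5-day weeks holding the values 0..29.
weeks = np.arange(30, dtype=float).reshape(6, 5, 1)
train, test = weeks[:4], weeks[4:]
preds = walk_forward(train, test, persistence_forecast)

print(preds.shape)     # (2, 5, 1)
print(preds[:, 0, 0])  # [19. 24.]
```

A baseline like this gives a reference point for the per-day RMSE values: an LSTM that cannot beat "tomorrow equals today" adds little.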

In [25]:
df=pd.read_csv("AAPL.csv")
dataset=df[["Close"]]
train, test = split_dataset(dataset.values)
n_input = 10
history=[x for x in test]
model=build_model(train, n_input)
yhat=forecast(model,history,n_input)
yhat

array([[112.10172],
       [109.49966],
       [106.66553],
       [106.25573],
       [107.43674]], dtype=float32)

In [28]:
score, scores = evaluate_model(train, test, n_input)
scores

[4.547840016064624,
 5.6832387773499935,
 5.910516801746006,
 6.461462928317763,
 7.1019631404807155]

In [31]:
forecast_df=pd.DataFrame(yhat).rename(columns={0:"Forecast"})
scores=pd.DataFrame(scores).rename(columns={0:"RMSE"})

In [32]:
import datetime

def workdays(d, end, excluded=(6, 7)):
    # collect every date from d to end, skipping Saturday (6) and Sunday (7)
    days = []
    while d.date() <= end.date():
        if d.isoweekday() not in excluded:
            days.append(d)
        d += datetime.timedelta(days=1)
    return days

# use new names so the forecast() and workdays() functions are not overwritten
dates=workdays(datetime.datetime(2020, 9, 14),
               datetime.datetime(2020, 9, 18 ))
dates=pd.DataFrame(dates).rename(columns={0:"Date"})

In [35]:
Apple=pd.concat([dates,forecast_df,scores],axis=1)
Apple

| | Date | Forecast | RMSE |
|---|---|---|---|
| 0 | 2020-09-14 | 112.101723 | 4.54784 |
| 1 | 2020-09-15 | 109.499657 | 5.683239 |
| 2 | 2020-09-16 | 106.665527 | 5.910517 |
| 3 | 2020-09-17 | 106.25573 | 6.461463 |
| 4 | 2020-09-18 | 107.436737 | 7.101963 |
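
The `Date` column above comes from a hand-written loop that skips Saturdays and Sundays. As an alternative sketch, pandas can generate the same business-day range directly with `pd.bdate_range`. The variable name `calendar` is an assumption for this example.

```python
import pandas as pd

# Monday 2020-09-14 through Friday 2020-09-18, weekends excluded.
calendar = pd.DataFrame({"Date": pd.bdate_range(start="2020-09-14", end="2020-09-18")})
print(calendar)
```

Like the loop, the default business-day frequency only skips weekends. Neither approach knows about market holidays, so a week that contains one would need an explicit holiday calendar.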


The model forecasts Apple's closing prices for 9/17 and 9/18 at 106.26 and 107.44, respectively. This forecast used data only through 9/11/2020. Now that Apple's closing price for 9/16/2020 is known, we can add the newer data and see whether it changes the forecast for 9/17 and 9/18. Doing so requires training and running the model again.

<a id= "forecast"> </a>
### II. Including Apple's close on 9/16/2020 to forecast the stock closing prices for the next five days.

In [54]:
# split a univariate dataset into train/test sets
def split_dataset(data):
	# hold out the last 80 rows for testing
	train, test = data[0:-80], data[-80:]
	# restructure into fixed-size windows: 4 rows each for train, 2 rows each for test
	train = array(split(train, len(train)/4))
	test = array(split(test, len(test)/2))
	return train, test
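
`split` requires the row count to divide evenly by the number of sections, so the window size has to fit the length of the data. One possible way to keep 5-day windows for any length is to drop the oldest rows first. This is only a sketch and not what the notebook does: `to_windows` is a hypothetical helper and the 23-row series is synthetic.

```python
import numpy as np

def to_windows(data, window=5):
    """Drop the oldest rows so the length is a multiple of `window`, then reshape."""
    remainder = len(data) % window
    trimmed = data[remainder:]  # discard the oldest `remainder` rows
    return trimmed.reshape(len(trimmed) // window, window, data.shape[1])

# Synthetic series of 23 rows: 23 % 5 leaves 3 rows over.
data = np.arange(23, dtype=float).reshape(-1, 1)
windows = to_windows(data)

print(windows.shape)     # (4, 5, 1)
print(windows[0, 0, 0])  # 3.0
```

Dropping from the start rather than the end keeps the most recent observations, which matter most for the forecast.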

In [55]:
def evaluate_forecasts(actual, predicted):
	scores = list()
	# calculate an RMSE score for each day
	for i in range(actual.shape[1]):
		# calculate mse
		mse = mean_squared_error(actual[:, i], predicted[:, i])
		# calculate rmse
		rmse = sqrt(mse)
		# store
		scores.append(rmse)
	# calculate overall RMSE
	s = 0
	for row in range(actual.shape[0]):
		for col in range(actual.shape[1]):
			s += (actual[row, col] - predicted[row, col])**2
	score = sqrt(s / (actual.shape[0] * actual.shape[1]))
	return score, scores

In [56]:
def summarize_scores(name, score, scores):
	s_scores = ', '.join(['%.1f' % s for s in scores])
	print('%s: [%.3f] %s' % (name, score, s_scores))
    
def to_supervised(train, n_input, n_out=5):
	# flatten data
	data = train.reshape((train.shape[0]*train.shape[1], train.shape[2]))
	X, y = list(), list()
	in_start = 0
	# step over the entire history one time step at a time
	for _ in range(len(data)):
		# define the end of the input sequence
		in_end = in_start + n_input
		out_end = in_end + n_out
		# ensure we have enough data for this instance
		if out_end <= len(data):
			X.append(data[in_start:in_end, :])
			y.append(data[in_end:out_end, 0])
		# move along one time step
		in_start += 1
	return array(X), array(y)

In [57]:
def build_model(train, n_input):
	# prepare data
	train_x, train_y = to_supervised(train, n_input)
	# define parameters
	verbose, epochs, batch_size = 0, 50, 16
	n_timesteps, n_features, n_outputs = train_x.shape[1], train_x.shape[2], train_y.shape[1]
	# reshape output into [samples, timesteps, features]
	train_y = train_y.reshape((train_y.shape[0], train_y.shape[1], 1))
	# define model
	model = Sequential()
	model.add(LSTM(200, activation='relu', input_shape=(n_timesteps, n_features)))
	model.add(RepeatVector(n_outputs))
	model.add(LSTM(200, activation='relu', return_sequences=True))
	model.add(TimeDistributed(Dense(100, activation='relu')))
	model.add(TimeDistributed(Dense(1)))
	model.compile(loss='mse', optimizer='adam')
	# fit network
	model.fit(train_x, train_y, epochs=epochs, batch_size=batch_size, verbose=verbose)
	return model

In [58]:
def forecast(model, history, n_input):
	# flatten data
	data = array(history)
	data = data.reshape((data.shape[0]*data.shape[1], data.shape[2]))
	# retrieve last observations for input data
	input_x = data[-n_input:, :]
	# reshape into [1, n_input, n]
	input_x = input_x.reshape((1, input_x.shape[0], input_x.shape[1]))
	# forecast the next week
	yhat = model.predict(input_x, verbose=0)
	# we only want the vector forecast
	yhat = yhat[0]
	return yhat

In [59]:
def evaluate_model(train, test, n_input):
	# fit model
	model = build_model(train, n_input)
	# history is a list of weekly data
	history = [x for x in train]
	# walk-forward validation over each week
	predictions = list()
	for i in range(len(test)):
		# predict the week
		yhat_sequence = forecast(model, history, n_input)
		# store the predictions
		predictions.append(yhat_sequence)
		# get real observation and add to history for predicting the next week
		history.append(test[i, :])
	# evaluate predictions days for each week
	predictions = array(predictions)
	score, scores = evaluate_forecasts(test[:, :, 0], predictions)
	return score, scores

In [60]:
df=pd.read_csv("AAPL (updated).csv")
dataset=df[["Close"]]
train, test = split_dataset(dataset.values)
n_input = 6
history=[x for x in test]
model=build_model(train, n_input)
yhat=forecast(model,history,n_input)
yhat

array([[112.28015 ],
       [110.680214],
       [108.2564  ],
       [109.275246],
       [110.04598 ]], dtype=float32)

In [62]:
forecast_df=pd.DataFrame(yhat).rename(columns={0:"Forecast"})

In [65]:
import datetime

def workdays(d, end, excluded=(6, 7)):
    # collect every date from d to end, skipping Saturday (6) and Sunday (7)
    days = []
    while d.date() <= end.date():
        if d.isoweekday() not in excluded:
            days.append(d)
        d += datetime.timedelta(days=1)
    return days

# use new names so the forecast() and workdays() functions are not overwritten
dates=workdays(datetime.datetime(2020, 9, 17),
               datetime.datetime(2020, 9, 23))
dates=pd.DataFrame(dates).rename(columns={0:"Date"})

In [66]:
Apple=pd.concat([dates,forecast_df],axis=1)
Apple

| | Date | Forecast |
|---|---|---|
| 0 | 2020-09-17 | 112.280151 |
| 1 | 2020-09-18 | 110.680214 |
| 2 | 2020-09-21 | 108.256401 |
| 3 | 2020-09-22 | 109.275246 |
| 4 | 2020-09-23 | 110.045982 |


With the updated data, the model revised its forecasts for Apple's closing prices on 9/17 and 9/18. Given Apple's closes through 9/16, the model now forecasts a close of 112.28 on 9/17 (up from 106.26), followed by a slide to 110.68 on 9/18 (up from 107.44). The slide continues on 9/21, to 108.26, before the price climbs again on 9/22 and 9/23. Even so, the forecast close for 9/23 is only 110.05, still below the forecast for 9/17.
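
To quantify the revision, the two forecasts can be placed side by side for the dates they share. The values below are copied from the two forecast tables in this notebook. The column labels are assumptions made for the example.

```python
import pandas as pd

# Forecasts for the two overlapping dates, copied from the tables above.
first = pd.Series({"2020-09-17": 106.255730, "2020-09-18": 107.436737},
                  name="Data to 9/11")
updated = pd.Series({"2020-09-17": 112.280151, "2020-09-18": 110.680214},
                    name="Data to 9/16")

comparison = pd.concat([first, updated], axis=1)
comparison["Revision"] = comparison["Data to 9/16"] - comparison["Data to 9/11"]
print(comparison.round(2))
```

Both revisions are upward: roughly 6.02 for 9/17 and 3.24 for 9/18. For scale, the first model's per-day RMSE on those two horizons was about 6.46 and 7.10, so the revisions are no larger than its typical error.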