In this report, we found that no single model is best across all industries for long-term (126-day) stock price prediction. In other words, there is no general model: each industry has its own optimal model. This is not unexpected, because stock price data are diverse and situational, so it is unlikely that any single model will be uniformly best across all industries or contexts. Based on our results, ARIMA GARCH methods are better for the Consumer Discretionary and Financials industries, and LSTM models are better for Healthcare and Industrials. Specifically, Cumulative Year (CumYr) ARIMA GARCH performs best for Consumer Discretionary, Year by Year (YrByYr) ARIMA GARCH performs best for Financials, YrByYr multivariate LSTM performs best for Healthcare, and YrByYr univariate LSTM performs best for Industrials. Overall, among the LSTM configurations, the YrByYr multivariate LSTM performs better than the LSTM models under other conditions.
After finding that there was no general model, we still looked for a model that might apply to all industries. To do so, we transformed our data and fitted a classification model. We converted the prices into a series of -1, 0, and 1 values depending on whether prices increased or decreased by a certain amount. We then ran an LSTM on this transformed data set to predict 126 new time points. This method did not work well: classification accuracy was low, around 40%, because the model predicted almost exclusively 0s. Thus, we concluded that LSTM is not a good fit for this setting, and that stock price data may generally be resistant to this kind of transformation.
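The three-class transformation can be illustrated with a minimal sketch. It is not the code used in this report: the function names, the one-step relative-change rule, the 1% threshold, and the sample prices are all assumptions made for illustration. It also shows why a model that predicts almost only 0s can do no better than the share of 0 labels in the data.

```python
import numpy as np

def label_moves(prices, threshold=0.01):
    """Map a price series to -1 / 0 / 1 labels (hypothetical rule).

    Assumption: a label is 1 when the one-step relative change exceeds
    `threshold`, -1 when it falls below -`threshold`, and 0 otherwise.
    """
    prices = np.asarray(prices, dtype=float)
    # One-step relative change between consecutive prices.
    returns = prices[1:] / prices[:-1] - 1.0
    labels = np.zeros(returns.shape, dtype=int)
    labels[returns > threshold] = 1
    labels[returns < -threshold] = -1
    return labels

def majority_baseline_accuracy(labels):
    """Accuracy of always predicting the most frequent label."""
    labels = np.asarray(labels)
    _, counts = np.unique(labels, return_counts=True)
    return counts.max() / labels.size

# Made-up prices, for illustration only.
prices = [100.0, 100.5, 103.0, 102.9, 99.0, 99.2]
labels = label_moves(prices, threshold=0.01)
print(labels.tolist())
print(majority_baseline_accuracy(labels))
```

Comparing a classifier's accuracy with this majority-class baseline is one way to check whether it has learned anything beyond the label frequencies.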
Finally, we ran LSTM and ARIMA GARCH models to predict 20 days of stock prices. We found that ARIMA GARCH outperformed the LSTM models in every case. This is reasonable, as time series models tend to predict well over the first several time points but become less accurate as the forecast horizon lengthens.
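The loss of accuracy at longer horizons can be made concrete with a simplified sketch. It assumes a pure random walk with independent increments of constant standard deviation, which is a simplification and not the ARIMA GARCH specification fitted in this report. Under that assumption, the standard deviation of the h-step-ahead forecast error grows with the square root of h. The function name and the value of `sigma` are hypothetical.

```python
import math

def random_walk_forecast_std(sigma, horizon):
    """Std. dev. of the h-step-ahead forecast error for a random walk.

    Assumption: increments are independent with constant std `sigma`,
    so the error variance after `horizon` steps is horizon * sigma**2.
    """
    if horizon < 1:
        raise ValueError("horizon must be at least 1")
    return sigma * math.sqrt(horizon)

sigma = 1.0  # hypothetical daily increment std, in price units
short_horizon = random_walk_forecast_std(sigma, 20)
long_horizon = random_walk_forecast_std(sigma, 126)
print(short_horizon, long_horizon)
```

In this simplified setting, the forecast uncertainty at 126 days is roughly two and a half times that at 20 days.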