This project is a timeseries-forecasting of MSFT ( microsoft ) daily closing stock prices using the fbprophet model as recommended by the professor. It uses more than 3 decades of weekday data to predict the next 7 days of stock prices, however some values are missing in the dataset. It uses concepts taught in class -- especially cross-validation for hyperparameter-tuning.

Although brief, it took great effort since I had to learn python and how to use Jupyter notebooks and the fbprophet API. I thank Dr. Venugopal and Ryan Wickman for helping me when I was stuck somewhere. It has been a very rewarding challenge for me since I had to spend many hours reading documentation and forums, which taught me a lot about the practicalities of ML and forecasting.

---

In [1]:
import numpy as np
from data_loader import *
import pandas as pd
from prophet import Prophet
import matplotlib.pyplot as pyplot
from sklearn.metrics import mean_absolute_error
from prophet.diagnostics import cross_validation
import itertools
from prophet.diagnostics import performance_metrics

The first thing we do is to import the data. Then, we change its shape to the format accepted by the fbprophet model.

In [2]:
df_original = pd.read_csv("data/MSFT.csv")
df_original['Date'] = pd.to_datetime(df_original.Date, format='%Y-%m-%d')
print(df_original.head())

        Date      Open      High       Low     Close  Adj Close      Volume
0 1986-03-13  0.088542  0.101563  0.088542  0.097222   0.061751  1031788800
1 1986-03-14  0.097222  0.102431  0.097222  0.100694   0.063956   308160000
2 1986-03-17  0.100694  0.103299  0.100694  0.102431   0.065059   133171200
3 1986-03-18  0.102431  0.103299  0.098958  0.099826   0.063405    67766400
4 1986-03-19  0.099826  0.100694  0.097222  0.098090   0.062302    47894400


We then split it into training and test sets. Here, the training set is mainly used for hyperparameter tuning.

In [3]:
prophet_df = df_original[['Date','Adj Close']]
prophet_df = prophet_df.rename(columns = {'Date':'ds'})
prophet_df = prophet_df.rename(columns = {'Adj Close':'y'})
print(prophet_df.head())

train_mask = (prophet_df["ds"] < "2019-01-01")
test_mask = (prophet_df["ds"] >= "2019-01-01") & (prophet_df["ds"] < "2020-01-01")

train_set = prophet_df.loc[train_mask]
test_set = prophet_df.loc[test_mask]


          ds         y
0 1986-03-13  0.061751
1 1986-03-14  0.063956
2 1986-03-17  0.065059
3 1986-03-18  0.063405
4 1986-03-19  0.062302


In the cells below, we first tune two parameters and then the third. Tuning them altogether would take too long since the third parameter is a different type that needs to be tuned over very large timeseries. So we make a reasonable assumption that its optimal value is independent of the values of the other two.

In [5]:
param_grid = {  
    'changepoint_prior_scale': [0.001, 0.01, 0.1, 0.5],
    'seasonality_prior_scale': [0.01, 0.1, 1.0, 10.0],
}

# Generate all combinations of parameters
all_params = [dict(zip(param_grid.keys(), v)) for v in itertools.product(*param_grid.values())]
mapes = []  # Store the MAPEs for each params here

# Use cross validation to evaluate all parameters
for params in all_params:
    m = Prophet(**params).fit(train_set)  # Fit model with given params
    df_cv = cross_validation(m, initial='11251 days', period='90 days', horizon='7 days', parallel="processes")
    df_p = performance_metrics(df_cv, rolling_window=1)
    mapes.append(df_p['mape'].values[0])

# Find the best parameters
tuning_results = pd.DataFrame(all_params)
tuning_results['mape'] = mapes
print(tuning_results)

best_params = all_params[np.argmin(mapes)]
print(best_params)


INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this.
INFO:prophet:Making 9 forecasts with cutoffs between 2017-01-03 00:00:00 and 2018-12-24 00:00:00
INFO:prophet:Applying in parallel with <concurrent.futures.process.ProcessPoolExecutor object at 0x7f5bdcf04880>
INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this.
INFO:prophet:Making 9 forecasts with cutoffs between 2017-01-03 00:00:00 and 2018-12-24 00:00:00
INFO:prophet:Applying in parallel with <concurrent.futures.process.ProcessPoolExecutor object at 0x7f5b8590aa00>
INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this.
INFO:prophet:Making 9 forecasts with cutoffs between 2017-01-03 00:00:00 and 2018-12-24 00:00:00
INFO:prophet:Applying in parallel with <concurrent.futures.process.ProcessPoolExecutor object at 0x7f5ba6a6b850>
INFO:prophet:Disabling daily seasonality. Run prophet with daily_seaso

In [6]:
param_grid = {  
    'changepoint_prior_scale': [0.5], 
    'seasonality_prior_scale': [0.01],
    'changepoint_range': [0.8, 0.85, 0.9, 0.95],
}

# Generate all combinations of parameters
all_params = [dict(zip(param_grid.keys(), v)) for v in itertools.product(*param_grid.values())]
mapes = []  # Store the MAPEs for each params here

# Use cross validation to evaluate all parameters
for params in all_params:
    m = Prophet(**params).fit(train_set)  # Fit model with given params
    df_cv = cross_validation(m, initial='7000 days', period='120 days', horizon='7 days', parallel="processes")
    df_p = performance_metrics(df_cv, rolling_window=1)
    mapes.append(df_p['mape'].values[0])

# Find the best parameters
tuning_results = pd.DataFrame(all_params)
tuning_results['mape'] = mapes
print(tuning_results)

best_params = all_params[np.argmin(mapes)]
print(best_params)

INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this.
INFO:prophet:Making 42 forecasts with cutoffs between 2005-07-05 00:00:00 and 2018-12-24 00:00:00
INFO:prophet:Applying in parallel with <concurrent.futures.process.ProcessPoolExecutor object at 0x7f5bc537ecd0>
INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this.
INFO:prophet:Making 42 forecasts with cutoffs between 2005-07-05 00:00:00 and 2018-12-24 00:00:00
INFO:prophet:Applying in parallel with <concurrent.futures.process.ProcessPoolExecutor object at 0x7f5bc53f6340>
INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this.
INFO:prophet:Making 42 forecasts with cutoffs between 2005-07-05 00:00:00 and 2018-12-24 00:00:00
INFO:prophet:Applying in parallel with <concurrent.futures.process.ProcessPoolExecutor object at 0x7f5bdcf04250>
INFO:prophet:Disabling daily seasonality. Run prophet with daily_se

On these tuned hyperparameter values, we test the fbprophet using the test set of data.

In [11]:
m = Prophet(seasonality_mode='multiplicative',changepoint_prior_scale=0.5,seasonality_prior_scale=10,changepoint_range=0.95)
m.fit(test_set)


INFO:prophet:Disabling yearly seasonality. Run prophet with yearly_seasonality=True to override this.
INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this.


<prophet.forecaster.Prophet at 0x7fc2fd629eb0>

In [12]:
df_cv_test = cross_validation(m, period='14 days', horizon = '7 days')
df_p = performance_metrics(df_cv)
df_p.tail()

INFO:prophet:Making 24 forecasts with cutoffs between 2019-02-05 00:00:00 and 2019-12-24 00:00:00


  0%|          | 0/24 [00:00<?, ?it/s]

INFO:prophet:n_changepoints greater than number of observations. Using 21.


Unnamed: 0,horizon,mse,rmse,mae,mape,mdape,smape,coverage
2,3 days,11.69091,3.419197,2.231122,0.071165,0.043069,0.069873,0.558824
3,4 days,11.490131,3.38971,2.222637,0.079886,0.045179,0.078976,0.514286
4,5 days,6.753642,2.598777,2.123525,0.082913,0.052493,0.083744,0.441176
5,6 days,9.008732,3.001455,2.388808,0.075657,0.053617,0.078399,0.424242
6,7 days,14.622048,3.823879,2.68103,0.07826,0.058324,0.080723,0.457143


We see that the MAPE is <0.1 which is excellent.
MAPE = Mean Average Percentage Error
This is an error metric that is considered very useful/apt for timeseries forecasting.