# Previous Day Baseline Model for Stock Price Prediction
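This notebook loads daily stock closing prices and compares three naive baselines: the previous day's closing price, the average of the previous five closing prices, and the previous day's closing price plus the most recent daily change. It calculates RMSE and MAE for each baseline and selects the one with the smallest error as the baseline model.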

In [1]:
from sklearn.metrics import root_mean_squared_error, mean_absolute_error
from b_feature_engineering import df_prelag, df_clolag

#### Model 1

In [2]:
# Import feature engineered data set with closing price lags for this analysis
df_eval2 = df_clolag.copy()
df_eval2.head(1)

Unnamed: 0,date,closing_price,clolag_1,clolag_2,clolag_3,clolag_4,clolag_5
0,2023-01-30,109.76,109.85,108.83,108.05,108.61,108.66


In [4]:
# In this first baseline, the previous day's closing price is used as the prediction.
# Root mean squared error (RMSE) and mean absolute error (MAE) are calculated as the error metrics.
rmse = root_mean_squared_error(df_eval2['closing_price'], df_eval2['clolag_1'])
print(f'RMSE using previous day baseline: {rmse:.6f}')

mae = mean_absolute_error(df_eval2['closing_price'], df_eval2['clolag_1'])
print(f' MAE using previous day baseline: {mae:.6f}')

RMSE using previous day baseline: 1.615091
 MAE using previous day baseline: 1.156646
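
To make the two metrics concrete, the sketch below computes RMSE and MAE by hand for a previous-day forecast. The four prices are hypothetical toy values, not the notebook's data, and the variable names are illustrative only.

```python
import math

# Hypothetical toy values, not taken from the notebook's data set.
actual = [109.76, 110.10, 109.50, 111.00]      # closing price on day t
predicted = [109.85, 109.76, 110.10, 109.50]   # closing price on day t-1

errors = [a - p for a, p in zip(actual, predicted)]

# RMSE squares each error before averaging, so large misses weigh more.
rmse_manual = math.sqrt(sum(e ** 2 for e in errors) / len(errors))

# MAE averages the absolute errors, so every miss weighs the same.
mae_manual = sum(abs(e) for e in errors) / len(errors)

print(f"RMSE: {rmse_manual:.4f}")
print(f" MAE: {mae_manual:.4f}")
```

RMSE is never smaller than MAE on the same errors, which matches the pattern in the outputs above; a wide gap between the two usually points to a few large misses.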


#### Model 2

In [24]:
# In this second baseline, the average of the previous five days' closing prices is used as the prediction.
# Root mean squared error (RMSE) and mean absolute error (MAE) are calculated as the error metrics.

df_eval2['avg'] = df_eval2[['clolag_1', 'clolag_2', 'clolag_3', 'clolag_4', 'clolag_5']].mean(axis=1)
df_eval2.head(1)

Unnamed: 0,date,closing_price,clolag_1,clolag_2,clolag_3,clolag_4,clolag_5,avg
0,2023-01-30,109.76,109.85,108.83,108.05,108.61,108.66,108.8


In [26]:
rmse = root_mean_squared_error(df_eval2['closing_price'], df_eval2['avg'])
print(f'RMSE five day average: {rmse:.6f}')

mae = mean_absolute_error(df_eval2['closing_price'], df_eval2['avg'])
print(f' MAE five day average: {mae:.6f}')

RMSE five day average: 2.423932
 MAE five day average: 1.789370
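
The lag columns come from `b_feature_engineering`, which is not shown here. As a rough sketch of how such columns and the five-day average could be built, the example below uses `shift` on a short price series. The first six prices are copied from the row displayed above; the last two prices and all dates are hypothetical, and the construction itself is an assumption about how `df_clolag` was produced.

```python
import pandas as pd

# First six closes are taken from the displayed row (clolag_5 ... closing_price);
# the last two closes and the business-day dates are hypothetical.
df = pd.DataFrame({
    "date": pd.date_range("2023-01-23", periods=8, freq="B"),
    "closing_price": [108.66, 108.61, 108.05, 108.83, 109.85, 109.76, 110.20, 109.90],
})

# Equivalent one-step form: mean of the five closes before day t.
rolling_avg = df["closing_price"].shift(1).rolling(window=5).mean()

# clolag_k holds the close from k rows (trading days) earlier.
lag_cols = [f"clolag_{k}" for k in range(1, 6)]
for k, col in enumerate(lag_cols, start=1):
    df[col] = df["closing_price"].shift(k)

# Rows without a full five-day history cannot be scored, so drop them.
df = df.dropna(subset=lag_cols).reset_index(drop=True)
df["avg"] = df[lag_cols].mean(axis=1)

print(df[["date", "closing_price", "avg"]])
```

Because every lag is shifted by at least one day, the average never includes the closing price being predicted, which keeps the baseline free of look-ahead.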


#### Model 3

In [27]:
# Import feature engineered data set with price difference lags for this analysis
df_eval3 = df_prelag.copy()
df_eval3.head(1)

Unnamed: 0,date,closing_price,clolag_1,price_diff,prilag_1,prilag_2,prilag_3,prilag_4,prilag_5
0,2023-02-06,110.75,111.15,-0.4,1.42,-0.42,0.09,0.3,-0.09


In [29]:
# In this third baseline, the previous day's price change (the difference between the previous two closing prices) is added to the previous day's closing price.
# Root mean squared error (RMSE) and mean absolute error (MAE) are calculated as the error metrics.

df_eval3['predict'] = df_eval3['clolag_1'] + df_eval3['prilag_1']
df_eval3.head(1)

Unnamed: 0,date,closing_price,clolag_1,price_diff,prilag_1,prilag_2,prilag_3,prilag_4,prilag_5,predict
0,2023-02-06,110.75,111.15,-0.4,1.42,-0.42,0.09,0.3,-0.09,112.57


In [30]:
rmse = root_mean_squared_error(df_eval3['closing_price'], df_eval3['predict'])
print(f'RMSE using previous two days difference: {rmse:.6f}')

mae = mean_absolute_error(df_eval3['closing_price'], df_eval3['predict'])
print(f' MAE using previous two days difference: {mae:.6f}')

RMSE using previous two days difference: 2.251344
 MAE using previous two days difference: 1.625349
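
This baseline is a linear extrapolation: adding the previous day's change to the previous day's close is the same as predicting `2 * close[t-1] - close[t-2]`. The sketch below reproduces the displayed row under the assumption that `price_diff` is `closing_price - clolag_1` and `prilag_1` is `price_diff` shifted by one day. The first three closes are derived from that row (109.73 is 111.15 minus 1.42); the last two are hypothetical.

```python
import pandas as pd

# 109.73, 111.15 and 110.75 follow from the displayed row; 111.40 and 110.90 are hypothetical.
df = pd.DataFrame({"closing_price": [109.73, 111.15, 110.75, 111.40, 110.90]})

df["clolag_1"] = df["closing_price"].shift(1)            # previous day's close
df["price_diff"] = df["closing_price"] - df["clolag_1"]  # today's change (assumed definition)
df["prilag_1"] = df["price_diff"].shift(1)               # previous day's change (assumed definition)

# Keep the second-lag close only to check the algebraic identity below.
df["clolag_2"] = df["closing_price"].shift(2)
df = df.dropna().reset_index(drop=True)

df["predict"] = df["clolag_1"] + df["prilag_1"]
df["extrapolated"] = 2 * df["clolag_1"] - df["clolag_2"]

print(df[["closing_price", "clolag_1", "prilag_1", "predict", "extrapolated"]].round(2))
```

The first row gives a prediction of 112.57 against an actual close of 110.75, as in the output above: the baseline carries the prior day's rise forward on a day when the price fell.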


Of the three baselines above, using the previous day's closing price as the prediction gave the smallest error on both metrics (RMSE 1.615, MAE 1.157).
Therefore, Model 1 was selected as the baseline model.
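
One caveat: the `head(1)` outputs show `df_clolag` starting on 2023-01-30 and `df_prelag` on 2023-02-06, which suggests the two frames may not cover exactly the same rows. If so, the three error figures are not computed on identical days. The sketch below shows one way to score all three baselines on a shared window; the prices are hypothetical and the `score` helper is illustrative, not part of the notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical closing prices, used only to illustrate the mechanics.
closes = pd.Series(
    [108.66, 108.61, 108.05, 108.83, 109.85, 109.76,
     110.20, 109.90, 110.60, 111.15, 110.75, 111.40],
    name="closing_price",
)

# Build all three predictions from the same series, then keep only
# the rows where every baseline has a value.
preds = pd.DataFrame({
    "closing_price": closes,
    "previous_day": closes.shift(1),
    "five_day_average": closes.shift(1).rolling(window=5).mean(),
    "previous_day_plus_change": closes.shift(1) + closes.diff().shift(1),
}).dropna()


def score(actual: pd.Series, predicted: pd.Series) -> dict:
    """Return RMSE and MAE for one baseline (illustrative helper)."""
    err = actual - predicted
    return {
        "rmse": float(np.sqrt((err ** 2).mean())),
        "mae": float(err.abs().mean()),
    }


results = pd.DataFrame(
    {name: score(preds["closing_price"], preds[name]) for name in preds.columns[1:]}
).T

print(results.round(4))
```

Scoring on a common set of rows makes the ranking easier to defend; with the real data, the ordering of the three baselines may or may not change.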