# Exploratory Data Analysis - Amazon Stock Prices and Predictions with FB Prophet
Let's explore Amazon stock data to test how accurately FB Prophet can forecast stock prices.

Making predictions about the stock market is a hard, challenging task, since stock prices depend on many factors: social changes, economics, political decisions and even social networks can dramatically impact the value of a particular stock, or even the entire market.

With this in mind, we will try to make our predictions as accurate as possible.

We can't expect perfect predictions. Our goal is to follow the price into the future in a reasonable way; if we can achieve this, we will be happy with our work.

Let's import the necessary libraries:

In [1]:
import pandas as pd
from fbprophet import Prophet
from fbprophet.plot import plot_plotly, plot_components_plotly
import plotly.express as px
import os, sys
path = os.getcwd()
path = os.path.dirname(path)
sys.path.append(path)
from train import train, save_model
import datetime as dt
from datetime import timedelta
from sklearn.metrics import mean_absolute_error, mean_squared_error
import numpy as np

Once we have used the functions in our *utils* module to transform the data into the format FB Prophet expects (a `ds` date column and a `y` value column), we can read the data and start working with it. In this example we are using the **Amazon stock price data:**

In [2]:
# Loading Amazon data
amazon = pd.read_csv('../data/amazon.csv')
amazon

Unnamed: 0,ds,y
0,2001-11-26,12.210000
1,2001-11-27,11.480000
2,2001-11-28,11.590000
3,2001-11-29,11.150000
4,2001-11-30,11.320000
...,...,...
5032,2021-11-19,3676.570068
5033,2021-11-22,3572.570068
5034,2021-11-23,3580.040039
5035,2021-11-24,3580.409912
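
The preparation step itself is not shown in this notebook. As a rough sketch of what such a transformation might involve, assuming a raw price table with `Date` and `Close` columns (hypothetical names; the real *utils* functions may work differently), the idea is simply to keep two columns and rename them to `ds` and `y`:

```python
import pandas as pd

def to_prophet_format(raw, date_col='Date', value_col='Close'):
    """Return a two-column frame (ds, y), the layout Prophet expects.

    Hypothetical helper: the column names are assumptions for illustration,
    not the actual API of the utils module.
    """
    out = raw[[date_col, value_col]].rename(columns={date_col: 'ds', value_col: 'y'})
    out['ds'] = pd.to_datetime(out['ds'])  # parse date strings
    # Drop missing rows and put the series in chronological order
    return out.dropna().sort_values('ds').reset_index(drop=True)

# Made-up raw rows, deliberately out of order and with an extra column
raw = pd.DataFrame({
    'Date': ['2001-11-27', '2001-11-26'],
    'Close': [11.48, 12.21],
    'Volume': [1, 2],
})
prepared = to_prophet_format(raw)
```

Any extra columns (such as `Volume` here) are dropped, because a plain Prophet model only uses `ds` and `y`.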


We can see that we have a lot of data available to train our model (almost exactly 20 years, from November 2001 to November 2021). This is good news, because in machine learning tasks more training data usually helps the model's accuracy. Let's take a look at the price over this period:

In [3]:
px.line(amazon, x='ds', y='y')

For roughly the first thirteen years of the data (until about 2015), the Amazon stock price was almost flat. From 2015 onwards, the price rises dramatically, multiplying its value more than ten times. We can see a spectacular rise at the start of the COVID-19 pandemic; after a few months, the price behaviour becomes more stable, rising slowly over time.

Let's predict the year 2021 in two different scenarios:
- First, we train our model with all the data before 2021 (a combination of flat periods and violent price changes)
- Then, we train the model with only the last five years or so (2015 to 2020, the period where the price rises fastest)

In [4]:
# We want to make predictions over 2021
amazon['ds'] = pd.to_datetime(amazon['ds'])
X_test = amazon[amazon['ds'].dt.year == 2021][['ds']]
X_test

Unnamed: 0,ds
4809,2021-01-04
4810,2021-01-05
4811,2021-01-06
4812,2021-01-07
4813,2021-01-08
...,...
5032,2021-11-19
5033,2021-11-22
5034,2021-11-23
5035,2021-11-24


## Training the model with all the data - Except Target

In [5]:
# Training with the full data - excluding the target
X_train_full_data = amazon[amazon['ds'].dt.year != 2021]
X_train_full_data

Unnamed: 0,ds,y
0,2001-11-26,12.210000
1,2001-11-27,11.480000
2,2001-11-28,11.590000
3,2001-11-29,11.150000
4,2001-11-30,11.320000
...,...,...
4804,2020-12-24,3172.689941
4805,2020-12-28,3283.959961
4806,2020-12-29,3322.000000
4807,2020-12-30,3285.850098


In [6]:
# Predictions
model = Prophet()
model.fit(X_train_full_data)
forecast = model.predict(X_test)

INFO:fbprophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this.


In [7]:
fig = plot_plotly(model, forecast, xlabel='Date', ylabel='Price')
fig.update_layout(title='AMAZON stock 2021 Predictions - Model trained with Full Data')
fig.show()

In [8]:
# Validating predictions
val = forecast.merge(amazon, on='ds', how='right')
val = val[['ds', 'yhat', 'y']]
val.columns = ['Date', 'Predicted Price', 'True Price']
val = val[val.Date.dt.year == 2021]
fig = px.scatter(val, x=val.Date, y=val.columns[1:],
                title='AMAZON stock 2021 Predictions - Validation')
fig.update_traces(marker_size=5)
fig.show()

Let's see the forecast components:

In [9]:
plot_components_plotly(model, forecast)
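
For context, Prophet models a time series as a sum of components, and these are what the components plot breaks out. In the notation of the Prophet paper (included here as background, not something derived in this notebook):

$$
y(t) = g(t) + s(t) + h(t) + \varepsilon_t
$$

Here $g(t)$ is the trend, $s(t)$ the periodic seasonality (weekly and yearly in this run, since the log above shows that daily seasonality was disabled), $h(t)$ the holiday effects (none were configured in this notebook), and $\varepsilon_t$ the error term.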

Our predictions are not bad at all. Let's see the scores:

In [10]:
def scores(y_true, y_pred):
    print('MAE:', mean_absolute_error(y_true, y_pred))
    print('RMSE', np.sqrt(mean_squared_error(y_true, y_pred)))

In [11]:
y_true = amazon[amazon.ds.dt.year == 2021]['y']
y_pred =  forecast['yhat']
scores(y_true, y_pred)

MAE: 266.8391958587969
RMSE 301.3748112322291
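
To make these two scores concrete, here is a small sketch that computes them by hand with NumPy. The arrays are made-up numbers for illustration, not Amazon prices. MAE is the average size of the miss in price units; RMSE squares the errors first, so large misses weigh more.

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean absolute error: average absolute difference, in price units."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean(np.abs(y_true - y_pred))

def rmse(y_true, y_pred):
    """Root mean squared error: like MAE, but penalises large misses more."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

# Illustrative values only: the errors are 1, 0 and 4
y_true_demo = [100.0, 102.0, 104.0]
y_pred_demo = [101.0, 102.0, 108.0]

demo_mae = mae(y_true_demo, y_pred_demo)    # (1 + 0 + 4) / 3
demo_rmse = rmse(y_true_demo, y_pred_demo)  # sqrt((1 + 0 + 16) / 3)
```

RMSE is never smaller than MAE. A small gap between the two, as in the scores above, suggests the errors are fairly even in size rather than dominated by a few very large misses.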


## Training the model with the last five years of data

In [12]:
# Data to train
X_train_last_five = amazon[(amazon.ds.dt.year >= 2015) & (amazon.ds.dt.year <=2020)]
X_train_last_five

Unnamed: 0,ds,y
3298,2015-01-02,308.519989
3299,2015-01-05,302.190002
3300,2015-01-06,295.290009
3301,2015-01-07,298.420013
3302,2015-01-08,300.459991
...,...,...
4804,2020-12-24,3172.689941
4805,2020-12-28,3283.959961
4806,2020-12-29,3322.000000
4807,2020-12-30,3285.850098


In [13]:
# Predictions
model = Prophet()
model.fit(X_train_last_five)
forecast2 = model.predict(X_test)
fig = plot_plotly(model, forecast2, xlabel='Date', ylabel='Price')
fig.update_layout(title='AMAZON stock 2021 Predictions - Model trained with last 5 years of Data')
fig.show()

INFO:fbprophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this.


In [14]:
# Validating predictions
val2 = forecast2.merge(amazon, on='ds', how='right')
val2 = val2[['ds', 'yhat', 'y']]
val2 = val2[val2.ds.dt.year == 2021]
fig = px.scatter(val2, x=val2.ds, y=val2.columns[1:],
                title='AMAZON stock 2021 Predictions - Validation')
fig.update_traces(marker_size=5)
fig.show()

In [15]:
plot_components_plotly(model, forecast2)

In this case, the model was trained on a period of sharp price rises, and it predicts that the price will keep rising at that pace. The scores here will be bad:

In [16]:
y_true = amazon[amazon.ds.dt.year == 2021]['y']
y_pred =  forecast2['yhat']
scores(y_true, y_pred)

MAE: 796.7691041859233
RMSE 885.0852387899911


## Results
As we can see, the predictions are much better when the model is trained with all the available data (MAE: 266.84, RMSE: 301.37) than with only the last five years (MAE: 796.77, RMSE: 885.09).

In [17]:
# Scores rounded from the outputs of cells 11 and 16
results = pd.DataFrame(
    {'MAE': [266.84, 796.77],
    'RMSE': [301.37, 885.09],
    'Train Data': ['All the data', 'Last 5 years']})

In [18]:
# These scores are measured on the 2021 hold-out data, not on the training data
px.bar(results, x='Train Data', y=['MAE', 'RMSE'], barmode='group',
        title='2021 Validation Scores: All Data vs Last 5 Years (Lower is Better)')

In [19]:
# val was already renamed in cell 8 to: Date, Predicted Price, True Price
val['Last 5 Years'] = val2['yhat']  # aligned on the shared row index
val = val.rename(columns={'Predicted Price': 'All the Data'})
px.line(val, x='Date', y=val.columns[1:], title='Amazon Stock Predictions: Train with all the Data vs Train with Last 5 years')

## Training a model to make future predictions in Amazon Stocks
Now that we know the model predicts better when trained with all the data, we can train a model to forecast future Amazon stock prices.

We want to train a model with our *train* module, save the model, then load the trained model and make future predictions with it:

In [20]:
# Training the model to predict the future
model = train('amazon', '../data/amazon.csv', False, True)

INFO:fbprophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this.


In [21]:
# Making future predictions with the model: two years

# 1. Creating the forecast Dates
X_test_future = []
end = dt.datetime.strptime('2023-12-31', '%Y-%m-%d').date()
start = dt.datetime.strptime('2021-11-20', '%Y-%m-%d').date()

for i in range((end-start).days):
    X_test_future += [(start+timedelta(i)).strftime('%Y-%m-%d')]

X_test_future = pd.DataFrame(X_test_future)
X_test_future.columns = ['ds']
X_test_future

Unnamed: 0,ds
0,2021-11-20
1,2021-11-21
2,2021-11-22
3,2021-11-23
4,2021-11-24
...,...
766,2023-12-26
767,2023-12-27
768,2023-12-28
769,2023-12-29
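
The same date frame can be built more compactly with pandas. The sketch below first reproduces the daily range from the loop above (which stops one day before `end`, so the last date is 2023-12-30), and then shows a weekday-only variant. The weekday variant is an assumption on our side, not part of the original workflow: the training data has no weekend rows because the market is closed, so weekend forecasts have no true price to compare against.

```python
import pandas as pd

# Daily calendar dates, equivalent to the loop above (2023-12-31 is excluded there)
future_daily = pd.DataFrame({'ds': pd.date_range('2021-11-20', '2023-12-30', freq='D')})

# Assumed alternative: keep only Monday to Friday.
# Exchange holidays are NOT removed by this; it only skips weekends.
future_weekdays = pd.DataFrame({'ds': pd.bdate_range('2021-11-20', '2023-12-30')})
```

Both frames hold datetime values in the `ds` column rather than strings; Prophet accepts either form.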


In [22]:
# 2. Making predictions: 2 years
forecast = model.predict(X_test_future)
fig = plot_plotly(model, forecast, xlabel='Date', ylabel='Price')
fig.update_layout(title='Amazon Stocks - Two Years Forecasting')
fig.show()

## Final Step: Saving the model
Finally, we can save our model with the *save_model* function from our *train* module:

In [23]:
# Saving the model
save_model('../models', model, 'amazon')

Model Succesfully Saved in: 
../models/amazon.json


## Conclusions
In this case, **the model performed better when trained with all the available data**, which lets it smooth its predictions. The model is not carried away by the big price rise of the previous years; it stays conservative in its predictions and follows the real price well.

With a mean price of about $3300-$3400 in 2021, **our MAE of about 266 corresponds to a relative error of roughly 8% (loosely speaking, about 92% accuracy), so our model performs well** at predicting Amazon stock prices over this period.
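
The percentage above comes from dividing the MAE by the mean price. The sketch below reproduces that arithmetic, using 3350 as an assumed midpoint of the $3300-$3400 range, and also defines a MAPE helper (mean absolute percentage error) as one possible way to measure relative error directly on the predictions. Neither helper is part of the notebook's own code.

```python
import numpy as np

def relative_mae(mae, mean_price):
    """MAE expressed as a fraction of the mean price."""
    return mae / mean_price

def mape(y_true, y_pred):
    """Mean absolute percentage error, as a fraction (0.08 means 8%)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean(np.abs((y_true - y_pred) / y_true))

# MAE taken from the score output above; 3350.0 is an assumed mean price
ratio = relative_mae(266.8391958587969, 3350.0)

# Illustrative values only: both predictions miss by 10%
demo_mape = mape([100.0, 200.0], [110.0, 180.0])
```

With the notebook's own data, `mape(y_true, forecast['yhat'])` would give the per-day relative error averaged over 2021, which avoids relying on an approximate mean price.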

In the next notebooks we will analyze the remaining stocks: Apple, BTC-USD, Ford, Microsoft and Tesla.