# Exploratory Data Analysis - Ford Stock Prices
This is the fourth EDA in our stock selection. We have already studied Amazon, Apple and the BTC-USD pair. Here we analyze Ford, a classic stock of the American markets.

Over the last 18 months, the Ford price has grown quickly, from $4 to $20, which is a remarkable performance.

Let's start analyzing this stock. As usual, our first step is to import the libraries:

In [1]:
import pandas as pd
from fbprophet import Prophet
from fbprophet.plot import plot_plotly, plot_components_plotly
import plotly.express as px
import os, sys
path = os.getcwd()
path = os.path.dirname(path)
sys.path.append(path)
from train import train, save_model
import datetime as dt
from datetime import timedelta
from sklearn.metrics import mean_absolute_error, mean_squared_error
import numpy as np

Let's look at the data: the DataFrame and the price evolution:

In [2]:
# Loading Ford data
ford = pd.read_csv('../data/ford.csv')
ford

Unnamed: 0,ds,y
0,2001-11-26,18.580000
1,2001-11-27,18.290001
2,2001-11-28,17.910000
3,2001-11-29,18.260000
4,2001-11-30,18.940001
...,...,...
5032,2021-11-19,19.389999
5033,2021-11-22,20.480000
5034,2021-11-23,20.200001
5035,2021-11-24,20.250000


In [3]:
px.line(ford, x='ds', y='y')

The Ford price behaves like a *sine or cosine* (wave-like) function over time: it falls, and when it is almost touching the floor it climbs back to the previous maximum and then starts to fall again.

This stock will be very hard to predict, because there is no clear trend or pattern for the model to follow.

Let's start by making predictions with all of the data and with half of it, and see what we get.

In [4]:
# Goal: predict the year 2021
ford['ds'] = pd.to_datetime(ford['ds'])
X_test = ford[ford['ds'].dt.year == 2021][['ds']]
X_test

Unnamed: 0,ds
4809,2021-01-04
4810,2021-01-05
4811,2021-01-06
4812,2021-01-07
4813,2021-01-08
...,...
5032,2021-11-19
5033,2021-11-22
5034,2021-11-23
5035,2021-11-24


## Case 1. Training the model with all the Data (2001-2020)

In [5]:
# Full data 2001-2020
X_train_full_data = ford[ford['ds'].dt.year != 2021]
X_train_full_data

Unnamed: 0,ds,y
0,2001-11-26,18.580000
1,2001-11-27,18.290001
2,2001-11-28,17.910000
3,2001-11-29,18.260000
4,2001-11-30,18.940001
...,...,...
4804,2020-12-24,8.860000
4805,2020-12-28,8.890000
4806,2020-12-29,8.820000
4807,2020-12-30,8.860000


In [6]:
# Predictions
model = Prophet()
model.fit(X_train_full_data)
forecast = model.predict(X_test)

INFO:fbprophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this.


In [7]:
fig = plot_plotly(model, forecast, xlabel='Date', ylabel='Price')
fig.update_layout(title='FORD stock 2021 Predictions - Model trained with Full Data')
fig.show()

In [8]:
# Validating predictions
val = forecast.merge(ford, on='ds', how='right')
val = val[['ds', 'yhat', 'y']]
val.columns = ['Date', 'Predicted Price', 'True Price']
val = val[val.Date.dt.year == 2021]
fig = px.scatter(val, x=val.Date, y=val.columns[1:],
                title='FORD stock 2021 Predictions - Validation')
fig.update_traces(marker_size=5)
fig.show()

In [9]:
# Scores
def scores(y_true, y_pred):
    print('MAE:', mean_absolute_error(y_true, y_pred))
    print('RMSE', np.sqrt(mean_squared_error(y_true, y_pred)))

y_true = ford[ford.ds.dt.year == 2021]['y']
y_pred =  forecast['yhat']
scores(y_true, y_pred)

MAE: 7.504724923584742
RMSE 8.001831384532519


In [10]:
print('Mean Ford Price in 2021: $', round(val['True Price'].mean(), 2))
print(f'Score: ', 1 - 7.5 / val['True Price'].mean())

Mean Ford Price in 2021: $ 13.61
Score:  0.4488546806936764


We got a poor score of about 0.45. As we said at the beginning of the notebook, this particular stock is very hard to predict because its price keeps moving up and down.
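The score printed above is one minus the MAE divided by the mean true price, with the MAE typed in by hand. A minimal sketch of the same calculation as a function follows. The name `relative_score` and the toy numbers are illustrative assumptions, not part of the original notebook.

```python
import numpy as np


def relative_score(y_true, y_pred):
    """Return 1 - MAE / mean(y_true); 1.0 means a perfect forecast."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mae = np.mean(np.abs(y_true - y_pred))
    return 1 - mae / y_true.mean()


# Toy values, not Ford data: every error is 1.0 and the mean true price is 10.0
toy_true = [8.0, 10.0, 12.0]
toy_pred = [9.0, 11.0, 11.0]
print(relative_score(toy_true, toy_pred))
```

In the notebook this could be called as `relative_score(y_true, y_pred)` with the 2021 series, which avoids copying the MAE into the formula by hand.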

Let's try training with less data.

## Case 2. Training with 10 years of data (2010-2020)

In [11]:
# Training data - 10 years
X_train_last_ten = ford[(ford.ds.dt.year >= 2010) & (ford.ds.dt.year <=2020)]
X_train_last_ten

Unnamed: 0,ds,y
2040,2010-01-04,10.28
2041,2010-01-05,10.96
2042,2010-01-06,11.37
2043,2010-01-07,11.66
2044,2010-01-08,11.69
...,...,...
4804,2020-12-24,8.86
4805,2020-12-28,8.89
4806,2020-12-29,8.82
4807,2020-12-30,8.86


In [12]:
# Predictions
model = Prophet()
model.fit(X_train_last_ten)
forecast2 = model.predict(X_test)
fig = plot_plotly(model, forecast2, xlabel='Date', ylabel='Price')
fig.update_layout(title='FORD stock 2021 Predictions - Model trained with last 10 years of Data')
fig.show()

INFO:fbprophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this.


In [13]:
# Validating predictions
val2 = forecast2.merge(ford, on='ds', how='right')
val2 = val2[['ds', 'yhat', 'y']]
val2 = val2[val2.ds.dt.year == 2021]
fig = px.scatter(val2, x=val2.ds, y=val2.columns[1:],
                title='FORD stock 2021 Predictions - Validation')
fig.update_traces(marker_size=5)
fig.show()

In [14]:
y_true = ford[ford.ds.dt.year == 2021]['y']
y_pred =  forecast2['yhat']
scores(y_true, y_pred)

MAE: 7.655413886016749
RMSE 8.129118100161572


In [15]:
print('Mean Ford Price in 2021: $', round(val['True Price'].mean(), 2))
print(f'Score: ', 1 - 7.65 / val['True Price'].mean())

Mean Ford Price in 2021: $ 13.61
Score:  0.4378317743075498


Again, our predictions are very far from the real price.

We can try to improve the predictions by training with less data, but it is clear that we will not be able to make proper predictions for this stock.

## Case 3. Training with 5 years of data (2015-2020)

In [16]:
# Training data - 5 years
X_train_last_five = ford[(ford.ds.dt.year >= 2015) & (ford.ds.dt.year <=2020)]
X_train_last_five

Unnamed: 0,ds,y
3298,2015-01-02,15.36
3299,2015-01-05,14.76
3300,2015-01-06,14.62
3301,2015-01-07,15.04
3302,2015-01-08,15.42
...,...,...
4804,2020-12-24,8.86
4805,2020-12-28,8.89
4806,2020-12-29,8.82
4807,2020-12-30,8.86


In [17]:
# Predictions
model = Prophet()
model.fit(X_train_last_five)
forecast3 = model.predict(X_test)
fig = plot_plotly(model, forecast3, xlabel='Date', ylabel='Price')
fig.update_layout(title='FORD stock 2021 Predictions - Model trained with last 5 years of Data')
fig.show()

INFO:fbprophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this.


In [18]:
# Validating predictions
val3 = forecast3.merge(ford, on='ds', how='right')
val3 = val3[['ds', 'yhat', 'y']]
val3 = val3[val3.ds.dt.year == 2021]
fig = px.scatter(val3, x=val3.ds, y=val3.columns[1:],
                title='FORD stock 2021 Predictions - Validation')
fig.update_traces(marker_size=5)
fig.show()

In [19]:
# Scores
y_true = ford[ford.ds.dt.year == 2021]['y']
y_pred =  forecast3['yhat']
scores(y_true, y_pred)

MAE: 8.081780084725905
RMSE 8.475016717324351


In [20]:
print('Mean FORD Price in 2021: $', round(val['True Price'].mean(), 2))
print(f'Score: ', 1 - 8.08 / val['True Price'].mean())

Mean FORD Price in 2021: $ 13.61
Score:  0.406232776000654


Our score gets worse each time we shorten the training window.

We could try selecting only the last year, but a good fit there would not mean that our predictions are accurate; it would only mean that the model found a pattern to follow and is able to stay near the price. So we decide to stop here and conclude that we cannot predict Ford stock properly.
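The three cases above repeat the same steps: slice a training window by year, fit, predict 2021, and score. A sketch of that loop is shown below. To keep it runnable without Prophet, `fit_predict` is a hypothetical stand-in callable (here a naive last-value forecaster) and the data frame is synthetic; `evaluate_windows` and `last_value_forecast` are illustrative names, not functions from this project.

```python
import numpy as np
import pandas as pd


def evaluate_windows(df, first_years, test_year, fit_predict):
    """Score one model per training window; returns a DataFrame of MAE/RMSE."""
    years = df['ds'].dt.year
    test = df[years == test_year]
    rows = []
    for first_year in first_years:
        # Training window: from first_year up to the year before the test year
        train = df[(years >= first_year) & (years < test_year)]
        y_pred = np.asarray(fit_predict(train, test[['ds']]), dtype=float)
        err = test['y'].to_numpy() - y_pred
        rows.append({'Train from': first_year,
                     'MAE': np.mean(np.abs(err)),
                     'RMSE': np.sqrt(np.mean(err ** 2))})
    return pd.DataFrame(rows)


def last_value_forecast(train, future):
    """Hypothetical stand-in model: repeat the last training price."""
    return np.full(len(future), train['y'].iloc[-1])


# Synthetic daily prices, not Ford data
dates = pd.date_range('2018-01-01', '2021-12-31', freq='D')
toy = pd.DataFrame({'ds': dates, 'y': np.linspace(5.0, 20.0, len(dates))})
results = evaluate_windows(toy, [2018, 2020], 2021, last_value_forecast)
print(results)
```

With Prophet, `fit_predict` would presumably fit a fresh `Prophet()` on `train` and return the `yhat` column of `model.predict(future)`; that wiring is an assumption and is not exercised here.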

## Results
Let's plot the results:

In [21]:
results = pd.DataFrame(
    {'MAE': [7.5, 7.65, 8.08], 
    'RMSE': [8.00, 8.12, 8.47], 
    'Train Data': ['All the data', 'Last 10 years', 'Last 5 years']})

In [22]:
px.bar(results, x='Train Data', y=['MAE', 'RMSE'], barmode='group', 
        title='2021 MAE and RMSE: All Data vs Last 10 and 5 Years (Less is Better)')

In [23]:
# val2 holds the 10-year model predictions and val3 the 5-year model predictions
val['Last 10 Years'] = val2['yhat']
val['Last 5 Years'] = val3['yhat']
# val already uses the 'Date' and 'True Price' column names from In [8]
val = val.rename(columns={'Predicted Price': 'All the Data'})
px.line(val, x='Date', y=val.columns[1:], title='Ford Stock Predictions: Train with all the Data vs Train with Last 5 and 10 years')

## Training a model to make predictions for Ford stock
We can try to make predictions with only the last two years of data:

In [24]:
model = train('ford', '../data/ford.csv', False, True, 504)

INFO:fbprophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this.


In [25]:
# Making future predictions with the model: one year

# 1. Creating the forecast Dates
X_test_future = []
end = dt.datetime.strptime('2022-12-31', '%Y-%m-%d').date()
start = dt.datetime.strptime('2021-11-20', '%Y-%m-%d').date()

for i in range((end-start).days):
    X_test_future += [(start+timedelta(i)).strftime('%Y-%m-%d')]

X_test_future = pd.DataFrame(X_test_future)
X_test_future.columns = ['ds']
X_test_future

Unnamed: 0,ds
0,2021-11-20
1,2021-11-21
2,2021-11-22
3,2021-11-23
4,2021-11-24
...,...
401,2022-12-26
402,2022-12-27
403,2022-12-28
404,2022-12-29
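The loop in the cell above builds one date string per calendar day. A possible shorter equivalent uses `pd.date_range`; the variable name `future_dates` is illustrative. Because the original loop stops one day before `end`, `periods` is set to the same day count rather than passing `end` directly.

```python
import pandas as pd

start = pd.Timestamp('2021-11-20')
end = pd.Timestamp('2022-12-31')

# Same span as the loop: (end - start).days consecutive calendar days from start
future_dates = pd.DataFrame(
    {'ds': pd.date_range(start=start, periods=(end - start).days, freq='D')}
)
print(future_dates.head())
print(len(future_dates))
```

Passing `freq='B'` instead would keep only weekdays, which may suit a stock that does not trade on weekends. That is a design choice, not something the original notebook does.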


In [26]:
# 2. Making predictions: 1 year
forecast = model.predict(X_test_future)
fig = plot_plotly(model, forecast, xlabel='Date', ylabel='Price')
fig.update_layout(title='Ford Stocks - One Year Forecasting')
fig.show()

## Final Step: Save the model
The pattern shown in the last forecast (the one-year blind predictions) makes sense, because the model follows the trend created over the last two years.

So we keep the model trained on the last two years of data and use it to make predictions.

In [27]:
# Saving the model
save_model('../models', model, 'ford')

Model Succesfully Saved in: 
../models/ford.json


## Conclusions
This notebook shows how difficult it is to make predictions about stocks.

The Ford stock is constantly changing, and **the model is not able to follow the price even when trained with lots of data**. The model just follows the trend and **can predict the future value of the price only if the price keeps following the current trend**.

In the next notebook, we will try to predict Microsoft stock.