# Exploratory Data Analysis - Tesla Stocks
In this last EDA notebook, we study the Tesla stock price.

Tesla is a relatively new stock in the markets (our data starts in June 2010). Over the last two years its price has risen very fast, multiplying its value many times over.

Let's check this with a plot, but first we must import our libraries:

In [1]:
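# Standard library
import os
import sys
import datetime as dt
from datetime import timedelta

# Data handling, plotting and metrics
import numpy as np
import pandas as pd
import plotly.express as px
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Forecasting (the fbprophet package is named `prophet` from version 1.0 onwards)
from fbprophet import Prophet
from fbprophet.plot import plot_plotly, plot_components_plotly

# Make the project root importable so we can reuse the helpers in train.py
path = os.path.dirname(os.getcwd())
sys.path.append(path)
from train import train, save_model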

In [2]:
# Loading Tesla Data
tesla = pd.read_csv('../data/tesla.csv')
tesla

|  | ds | y |
|---|---|---|
| 0 | 2010-06-29 | 4.778000 |
| 1 | 2010-06-30 | 4.766000 |
| 2 | 2010-07-01 | 4.392000 |
| 3 | 2010-07-02 | 3.840000 |
| 4 | 2010-07-06 | 3.222000 |
| ... | ... | ... |
| 2870 | 2021-11-19 | 1137.060059 |
| 2871 | 2021-11-22 | 1156.869995 |
| 2872 | 2021-11-23 | 1109.030029 |
| 2873 | 2021-11-24 | 1116.000000 |


In [3]:
px.line(tesla, x='ds', y='y')

The pattern is similar to Amazon, Apple and Microsoft, but here the price was **really** flat for about 80% of the period, and the final rise was very steep: from $50 to $1200 in less than two years.
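
To put a number on a run-up like this, a small helper can compute how many times the price multiplied between two dates. This is only a sketch: `price_multiple` is a hypothetical helper (not part of this project), and it assumes a frame with the same `ds`/`y` columns as `tesla`. The toy prices below are illustrative, not real quotes.

```python
import pandas as pd


def price_multiple(df: pd.DataFrame, start: str, end: str) -> float:
    """How many times the price multiplied between `start` and `end` (inclusive).

    Hypothetical helper: expects a `ds` date column and a `y` price column,
    like the `tesla` frame in this notebook.
    """
    data = df.assign(ds=pd.to_datetime(df["ds"])).sort_values("ds")
    in_window = (data["ds"] >= pd.Timestamp(start)) & (data["ds"] <= pd.Timestamp(end))
    window = data.loc[in_window, "y"]
    if window.empty:
        raise ValueError("no prices between start and end")
    # Last price in the window divided by the first one
    return float(window.iloc[-1] / window.iloc[0])


# Toy prices for illustration only; these are NOT real Tesla quotes
toy = pd.DataFrame({
    "ds": ["2019-12-02", "2020-06-01", "2021-11-01"],
    "y": [50.0, 180.0, 1200.0],
})
print(price_multiple(toy, "2019-12-01", "2021-11-30"))
```

On the real data, `price_multiple(tesla, start, end)` returns the multiple for whatever window you pass.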

Let's predict 2021, a really challenging forecasting task.

In [5]:
# We want to predict 2021 year
tesla['ds'] = pd.to_datetime(tesla['ds'])
X_test = tesla[tesla['ds'].dt.year == 2021][['ds']]
X_test

|  | ds |
|---|---|
| 2647 | 2021-01-04 |
| 2648 | 2021-01-05 |
| 2649 | 2021-01-06 |
| 2650 | 2021-01-07 |
| 2651 | 2021-01-08 |
| ... | ... |
| 2870 | 2021-11-19 |
| 2871 | 2021-11-22 |
| 2872 | 2021-11-23 |
| 2873 | 2021-11-24 |
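
Both training windows in this notebook (everything before 2021, and 2015-2020) are year-based slices of the same frame. A minimal sketch of a reusable split, assuming `ds` is already a datetime column; `split_by_year` and its parameters are hypothetical names, not part of the project's `train.py`.

```python
import pandas as pd


def split_by_year(df, test_year, first_train_year=None):
    """Split a Prophet-style frame (`ds`, `y`) into train and test sets by calendar year.

    Train: rows strictly before `test_year` (optionally only from `first_train_year` on).
    Test:  rows in `test_year`, keeping only `ds`, which is all Prophet's predict() needs.
    Assumes `ds` is already a datetime column.
    """
    years = df["ds"].dt.year
    train_mask = years < test_year
    if first_train_year is not None:
        train_mask &= years >= first_train_year
    train_df = df[train_mask]
    test_df = df.loc[years == test_year, ["ds"]]
    return train_df, test_df


# Toy frame for illustration
toy = pd.DataFrame({
    "ds": pd.to_datetime(["2014-12-31", "2015-01-02", "2020-12-30", "2021-01-04"]),
    "y": [1.0, 2.0, 3.0, 4.0],
})
train_all, test = split_by_year(toy, 2021)
train_recent, _ = split_by_year(toy, 2021, first_train_year=2015)
print(len(train_all), len(train_recent), len(test))  # 3 2 1
```

With the notebook's data, `split_by_year(tesla, 2021)` corresponds to Case 1 and `split_by_year(tesla, 2021, first_train_year=2015)` to Case 2. Filtering with `< test_year` rather than `!= 2021` also keeps any later rows out of training, which makes no difference here because the data ends in 2021.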


## Case 1. Training the model with all the data (2010-2020)

In [6]:
# Full data 2010-2020
X_train_full_data = tesla[tesla['ds'].dt.year != 2021]
X_train_full_data

|  | ds | y |
|---|---|---|
| 0 | 2010-06-29 | 4.778000 |
| 1 | 2010-06-30 | 4.766000 |
| 2 | 2010-07-01 | 4.392000 |
| 3 | 2010-07-02 | 3.840000 |
| 4 | 2010-07-06 | 3.222000 |
| ... | ... | ... |
| 2642 | 2020-12-24 | 661.770020 |
| 2643 | 2020-12-28 | 663.690002 |
| 2644 | 2020-12-29 | 665.989990 |
| 2645 | 2020-12-30 | 694.780029 |


In [7]:
# Predictions
model = Prophet()
model.fit(X_train_full_data)
forecast = model.predict(X_test)

INFO:fbprophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this.


In [8]:
fig = plot_plotly(model, forecast, xlabel='Date', ylabel='Price')
fig.update_layout(title='TESLA stock 2021 Predictions - Model trained with Full Data')
fig.show()

In [9]:
# Validating predictions
val = forecast.merge(tesla, on='ds', how='right')
val = val[['ds', 'yhat', 'y']]
val.columns = ['Date', 'Predicted Price', 'True Price']
val = val[val.Date.dt.year == 2021]
fig = px.scatter(val, x=val.Date, y=val.columns[1:],
                title='TESLA stock 2021 Predictions - Validation')
fig.update_traces(marker_size=5)
fig.show()

In [10]:
# Forecast Components
plot_components_plotly(model, forecast)

In [11]:
# Scores
def scores(y_true, y_pred):
    print('MAE:', mean_absolute_error(y_true, y_pred))
    print('RMSE', np.sqrt(mean_squared_error(y_true, y_pred)))

y_true = tesla[tesla.ds.dt.year == 2021]['y']
y_pred =  forecast['yhat']
scores(y_true, y_pred)

MAE: 299.54024501039817
RMSE 326.64569251772207
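
`scores` pairs `y_true` and `forecast['yhat']` by position, which works here only because both are sorted by date and have the same length. A slightly safer sketch joins on `ds` first, so a missing or reordered date raises an error instead of silently shifting the comparison. `scores_by_date` is a hypothetical helper; it assumes Prophet-style `ds`/`yhat` forecast columns and `ds`/`y` actuals.

```python
import numpy as np
import pandas as pd


def scores_by_date(actual: pd.DataFrame, forecast: pd.DataFrame) -> dict:
    """MAE and RMSE after pairing actual `y` with predicted `yhat` on the `ds` date."""
    merged = actual[["ds", "y"]].merge(
        forecast[["ds", "yhat"]], on="ds", how="inner", validate="one_to_one"
    )
    missing = len(actual) - len(merged)
    if missing:
        raise ValueError(f"{missing} actual dates have no matching prediction")
    errors = merged["y"] - merged["yhat"]
    return {
        "MAE": float(np.mean(np.abs(errors))),
        "RMSE": float(np.sqrt(np.mean(errors ** 2))),
    }


# Toy example: the forecast rows are deliberately in reverse order
actual = pd.DataFrame({"ds": pd.to_datetime(["2021-01-04", "2021-01-05"]), "y": [10.0, 20.0]})
predicted = pd.DataFrame({"ds": pd.to_datetime(["2021-01-05", "2021-01-04"]), "yhat": [18.0, 13.0]})
print(scores_by_date(actual, predicted))
```

In the notebook this would be called as `scores_by_date(tesla[tesla.ds.dt.year == 2021], forecast)`. In the toy example, pairing by position would have compared 10 with 18 and 20 with 13, tripling the MAE.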


In [12]:
print('Mean Tesla Price in 2021: $', round(val['True Price'].mean(), 2))
print('Score: ', 1 - 299.54 / val['True Price'].mean())  # 1 - MAE / mean price

Mean Tesla Price in 2021: $ 753.8
Score:  0.602626469222818
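
The "score" above is this notebook's own metric rather than a standard one: one minus the MAE divided by the mean true price, i.e. 1 - 299.54 / 753.8 ≈ 0.60 for this model. A small sketch that computes it directly from the arrays, so the MAE does not have to be copied by hand (`relative_score` is a hypothetical helper name). Because it divides by the average price, the same MAE counts for less when prices are high, so the score is only comparable across models evaluated on the same period.

```python
import numpy as np


def relative_score(y_true, y_pred) -> float:
    """1 - MAE / mean(y_true): 1.0 means a perfect fit; it can drop below 0 for very bad fits."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mae = np.mean(np.abs(y_true - y_pred))
    return float(1.0 - mae / np.mean(y_true))


# Toy check: mean true price 100 and MAE 10, so the score should be about 0.9
print(relative_score([90.0, 110.0], [100.0, 100.0]))
```

Like `scores`, it pairs values by position, so `relative_score(y_true, y_pred)` with the variables above reproduces the printed score without the rounded, hardcoded MAE.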


Our predictions are far from accurate: the model was trained with 8 years of flat data, and the price only started to rise in the last year, so the model cannot follow it properly. Taking this into account, however, the predictions are not bad at all.

We should be able to improve on this by training with less data:

## Case 2. Training with 5 years of data (2015-2020)

In [13]:
# Training data - 5 years
X_train_last_five = tesla[(tesla.ds.dt.year >= 2015) & (tesla.ds.dt.year <=2020)]
X_train_last_five

|  | ds | y |
|---|---|---|
| 1136 | 2015-01-02 | 43.862000 |
| 1137 | 2015-01-05 | 42.018002 |
| 1138 | 2015-01-06 | 42.256001 |
| 1139 | 2015-01-07 | 42.189999 |
| 1140 | 2015-01-08 | 42.124001 |
| ... | ... | ... |
| 2642 | 2020-12-24 | 661.770020 |
| 2643 | 2020-12-28 | 663.690002 |
| 2644 | 2020-12-29 | 665.989990 |
| 2645 | 2020-12-30 | 694.780029 |


In [20]:
# Predictions
model = Prophet()
model.fit(X_train_last_five)
forecast2 = model.predict(X_test)
fig = plot_plotly(model, forecast2, xlabel='Date', ylabel='Price')
fig.update_layout(title='TESLA stock 2021 Predictions - Model trained with last 5 years of Data')
fig.show()

INFO:fbprophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this.


In [21]:
# Validating predictions
val3 = forecast2.merge(tesla, on='ds', how='right')
val3 = val3[['ds', 'yhat', 'y']]
val3 = val3[val3.ds.dt.year == 2021]
fig = px.scatter(val3, x=val3.ds, y=val3.columns[1:],
                title='TESLA stock 2021 Predictions - Validation')
fig.update_traces(marker_size=5)
fig.show()

In [22]:
plot_components_plotly(model, forecast2)

In [23]:
# Scores
y_true = tesla[tesla.ds.dt.year == 2021]['y']
y_pred =  forecast2['yhat']
scores(y_true, y_pred)

MAE: 118.84902399924631
RMSE 149.45996488545765


In [18]:
print('Mean Tesla Price in 2021: $', round(val3['y'].mean(), 2))
print('Score: ', 1 - 118.84 / val3['y'].mean())  # 1 - MAE / mean price

Mean Tesla Price in 2021: $ 753.8
Score:  0.8423453615625281


These predictions are much better. Trained with less data, the model follows the trend better, because it can pick up that the nature of the trend is changing fast.
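
One plausible mechanism behind this (an interpretation, not something measured in this notebook): by default Prophet only places potential trend changepoints within the first 80% of the training history (`changepoint_range=0.8`). The sketch below approximates where that cutoff falls for each training window. It uses a weekday calendar from `pd.bdate_range` as a stand-in for the real trading days, and `last_changepoint_date` is a hypothetical helper that mirrors Prophet's default placement rule; on a fitted model, `model.changepoints` holds the actual dates.

```python
import numpy as np
import pandas as pd


def last_changepoint_date(dates, changepoint_range: float = 0.8) -> pd.Timestamp:
    """Approximate latest date where Prophet (default settings) can place a trend changepoint.

    Candidate changepoints are spread over the first `changepoint_range`
    fraction of the sorted training rows, so the last one sits at that boundary.
    """
    dates = pd.Series(pd.to_datetime(dates)).sort_values().reset_index(drop=True)
    hist_size = int(np.floor(len(dates) * changepoint_range))
    return dates.iloc[hist_size - 1]


# Weekdays as a rough proxy for trading days (market holidays are ignored)
full = pd.bdate_range("2010-06-29", "2020-12-31")
last5 = pd.bdate_range("2015-01-02", "2020-12-31")
print("All the data:", last_changepoint_date(full).date())
print("2015-2020:   ", last_changepoint_date(last5).date())
```

With all the data, the last possible changepoint lands in late 2018, before the strong rise; with the 2015-2020 window it moves into late 2019, so the final trend segment covers much more of the recent rise.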

The results are good enough to build a "reliable" model for future predictions:

## Results
Our best model was trained with five years of data:

In [36]:
results = pd.DataFrame(
    {'MAE': [299.54, 118.84], 
    'RMSE': [326.64, 149.45], 
    'Train Data': ['All the data', 'Last 5 years']})

In [37]:
px.bar(results, x='Train Data', y=['MAE', 'RMSE'], barmode='group',
        title='2021 Test Error: All Data vs Last 5 Years (Lower is Better)')
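
The chart can also be summarized as a relative reduction in error. A quick sketch using the same numbers as `results` (rebuilt here as `summary`, a new name, so the snippet runs on its own and does not overwrite the frame used by the chart):

```python
import pandas as pd

# Same numbers as the `results` frame above, indexed by training window
summary = pd.DataFrame(
    {"MAE": [299.54, 118.84], "RMSE": [326.64, 149.45]},
    index=["All the data", "Last 5 years"],
)

# Fraction by which each error metric drops when training on the last 5 years only
reduction = 1 - summary.loc["Last 5 years"] / summary.loc["All the data"]
print(reduction.round(3))
```

This gives roughly a 60% lower MAE and a 54% lower RMSE for the model trained on the last 5 years.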

In [39]:
# Both frames come from the same right-merge on `tesla`, so their row indexes line up
val['Last 5 Years'] = val3['yhat']
val = val.rename(columns={'Predicted Price': 'All the Data'})
px.line(val, x='Date', y=val.columns[1:], title='Tesla Stock Predictions: Train with All the Data vs Train with Last 5 Years')

## Training a model to make predictions over Tesla Stocks
Let's train our final model:

In [40]:
model = train('tesla', '../data/tesla.csv', False, True, len(X_train_last_five))

INFO:fbprophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this.


In [41]:
# Making future predictions with the model: next year

# 1. Creating the forecast dates: every calendar day from `start` onwards
X_test_future = []
end = dt.datetime.strptime('2022-12-31', '%Y-%m-%d').date()
start = dt.datetime.strptime('2021-11-20', '%Y-%m-%d').date()

# range() stops before (end - start).days, so `end` itself (2022-12-31) is not included
for i in range((end-start).days):
    X_test_future += [(start+timedelta(i)).strftime('%Y-%m-%d')]

X_test_future = pd.DataFrame(X_test_future)
X_test_future.columns = ['ds']
X_test_future

|  | ds |
|---|---|
| 0 | 2021-11-20 |
| 1 | 2021-11-21 |
| 2 | 2021-11-22 |
| 3 | 2021-11-23 |
| 4 | 2021-11-24 |
| ... | ... |
| 401 | 2022-12-26 |
| 402 | 2022-12-27 |
| 403 | 2022-12-28 |
| 404 | 2022-12-29 |
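
The loop above generates every calendar day, including weekends when the market is closed and the model never saw a price, and it starts on 2021-11-20 although the dataset already contains prices after that date (2021-11-22 to 2021-11-24, for example). A possible alternative, sketched under the assumption that only trading days after the last observation are wanted: `future_trading_days` is a hypothetical helper built on pandas' business-day calendar, which skips weekends but not exchange holidays.

```python
import pandas as pd


def future_trading_days(last_observed, end) -> pd.DataFrame:
    """Prophet-style `ds` frame of weekdays strictly after `last_observed`, up to `end` inclusive.

    Hypothetical helper. Weekends are skipped, but exchange holidays are not:
    that would need a holiday calendar on top of pd.bdate_range.
    """
    start = pd.Timestamp(last_observed) + pd.Timedelta(days=1)
    return pd.DataFrame({"ds": pd.bdate_range(start=start, end=end)})


# Example window: 2021-11-25 (Thanksgiving, market closed) is still included
print(future_trading_days("2021-11-24", "2021-12-05"))
```

With the notebook's data, `future_trading_days(tesla['ds'].max(), '2022-12-31')` could replace the loop, and the resulting frame can be passed straight to `model.predict`.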


In [42]:
# 2. Making predictions: 1 year
forecast = model.predict(X_test_future)
fig = plot_plotly(model, forecast, xlabel='Date', ylabel='Price')
fig.update_layout(title='Tesla Stocks - 2022 Forecasting')
fig.show()

## Final Step: Saving the model
The final step is to save our model in the *models* folder:

In [43]:
# Saving the model
save_model('../models', model, 'tesla')

Model Succesfully Saved in: 
../models/tesla.json


## Conclusions
**With 5 years of training data, our model reached a score of 0.84**, clearly better than the 0.60 of the model trained with all the data.

Given how extreme 2021 was for the Tesla stock price, our predictions were good, and the 2022 forecast looks sensible given the price over the last few years.