# Timeseries Forecasting

This notebook explains how to use `tsfresh` in time series forecasting.
Make sure you also read through the [documentation](https://tsfresh.readthedocs.io/en/latest/text/forecasting.html) to learn more about this feature.

It closely follows the other time series forecasting notebook, but this time uses more than one 
stock.
Conceptually this is not much different, but the pandas multi-index magic is a bit more advanced :-)

We will use the Ford (`F`), Apple (`AAPL`) and Alphabet (`GOOGL`) stocks.
Detailed explanations of the individual steps can be found in the other notebook.

In [None]:
%matplotlib inline

import numpy as np
import pandas as pd
import matplotlib.pylab as plt

from tsfresh import extract_features, select_features
from tsfresh.utilities.dataframe_functions import roll_time_series, make_forecasting_frame
from tsfresh.utilities.dataframe_functions import impute

try:
    import pandas_datareader.data as web
except ImportError:
    print("You need to install the pandas_datareader. Run pip install pandas_datareader.")

from sklearn.ensemble import AdaBoostRegressor

## Reading the data

In [None]:
df = web.DataReader(['F', "AAPL", "GOOGL"], 'stooq')["High"]
df.head()

In [None]:
plt.figure(figsize=(15, 6))
df.plot(ax=plt.gca())
plt.show()

This time we need to make sure to preserve the stock symbol information while reordering:

In [None]:
df_melted = df.copy()
df_melted["date"] = df_melted.index
df_melted = df_melted.melt(id_vars="date", value_name="high").sort_values(["Symbols", "date"])
df_melted = df_melted[["Symbols", "date", "high"]]

df_melted.head()
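
To make the reshaping step more concrete, here is a minimal sketch on a hand-made wide frame. The values and dates are made up, and `wide` and `long_df` are illustrative names; only the column layout mirrors the `df` and `df_melted` frames above.

```python
import pandas as pd

# Toy wide frame: one column per symbol, indexed by date (values are made up).
wide = pd.DataFrame(
    {"AAPL": [10.0, 11.0], "F": [5.0, 6.0]},
    index=pd.to_datetime(["2020-01-02", "2020-01-03"]),
)
# The name of the column axis becomes the name of the melted "variable" column.
wide.columns.name = "Symbols"

long_df = wide.copy()
long_df["date"] = long_df.index
# One row per (symbol, date) pair, so the symbol survives as a regular column.
long_df = long_df.melt(id_vars="date", value_name="high").sort_values(["Symbols", "date"])
long_df = long_df[["Symbols", "date", "high"]]
print(long_df)
```

Each symbol now forms a contiguous, date-sorted block of rows, which is the layout the rolling step expects.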

## Create training data sample

In [None]:
df_rolled = roll_time_series(df_melted, column_id="Symbols", column_sort="date",
                             max_timeshift=20, min_timeshift=5)

In [None]:
df_rolled.head()

## Extract Features

In [None]:
X = extract_features(df_rolled.drop("Symbols", axis=1), 
                     column_id="id", column_sort="date", column_value="high", 
                     impute_function=impute, show_warnings=False)

In [None]:
X.head()

We make the data a bit easier to work with by giving it a multi-index instead of the tuple index:

In [None]:
# split up the two parts of the index and give them proper names
X = X.set_index([X.index.map(lambda x: x[0]), X.index.map(lambda x: x[1])], drop=True)
X.index.names = ["Symbols", "last_date"]

In [None]:
X.head()
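
As a side note, the same conversion can be sketched with `pd.MultiIndex.from_tuples` on a toy frame. The frame `toy` and its single `feature` column are made up for illustration; we would expect this to produce the same kind of index as the `set_index` call above, but treat it as an alternative sketch rather than the notebook's method.

```python
import pandas as pd

# Toy feature frame whose index entries are plain (symbol, last_date) tuples.
tuples = [("AAPL", pd.Timestamp("2020-01-02")), ("F", pd.Timestamp("2020-01-02"))]
tuple_index = pd.Index(tuples, tupleize_cols=False)  # keep tuples, no MultiIndex yet
toy = pd.DataFrame({"feature": [1.0, 2.0]}, index=tuple_index)

# Turn the tuples into a two-level index with proper level names.
toy.index = pd.MultiIndex.from_tuples(toy.index, names=["Symbols", "last_date"])

# Selecting on the first level now returns all dates for that symbol.
print(toy.loc["AAPL"])
```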

The row for `(AAPL, 2015-07-15 00:00:00)`, which we looked at in the other notebook, is in the data again:

In [None]:
X.loc["AAPL", pd.to_datetime('2015-07-15')]

Just to repeat: the features in this row were calculated using only the time series values of `AAPL` up to and including `2015-07-15`, looking back at most 20 time steps (trading days).
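
The windowing idea can be sketched in plain pandas. `window_for` is a hypothetical helper written for this illustration and is not part of `tsfresh`; it assumes a window holds the current time step plus at most `max_timeshift` earlier ones, and the data is made up.

```python
import pandas as pd

def window_for(series, last_date, max_timeshift):
    """Return the values a rolled window ending at `last_date` would contain.

    Hypothetical helper for illustration only, not the tsfresh implementation.
    """
    upto = series.sort_index().loc[:last_date]   # nothing after last_date
    return upto.iloc[-(max_timeshift + 1):]      # current step plus earlier ones (assumption)

dates = pd.date_range("2020-01-01", periods=6, freq="D")
high = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], index=dates)

window = window_for(high, pd.Timestamp("2020-01-05"), max_timeshift=2)
print(window)
```

The value from `2020-01-06` never enters the window, which is what keeps the features free of information from the future.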

## Prediction

The next line might look like magic if you are not used to pandas transformations, but what it does is:

for each stock symbol separately:
* sort by date
* take the high value
* shift by 1 time step, so that each date holds the value of the following date
* bring into the same multi-index format as `X` above

In [None]:
y = df_melted.groupby("Symbols").apply(lambda x: x.set_index("date")["high"].shift(-1)).T.unstack()
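
The steps above can also be sketched on a toy frame with a group-wise `shift`. This is an alternative formulation on made-up data, not the line used in this notebook; `toy` and `target` are illustrative names.

```python
import pandas as pd

# Toy long frame in the same layout as df_melted (values are made up).
toy = pd.DataFrame({
    "Symbols": ["AAPL", "AAPL", "AAPL", "F", "F", "F"],
    "date": pd.to_datetime(["2020-01-02", "2020-01-03", "2020-01-06"] * 2),
    "high": [10.0, 11.0, 12.0, 5.0, 6.0, 7.0],
})

# Index by (Symbols, date), then shift within each symbol so that the
# value stored at a date is the high of the next available date.
target = (
    toy.sort_values(["Symbols", "date"])
       .set_index(["Symbols", "date"])["high"]
       .groupby(level="Symbols")
       .shift(-1)
)
print(target)
```

Because the shift happens per symbol, the last date of each stock has no target (`NaN`) instead of picking up the first value of the next stock.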

Quick consistency test:

In [None]:
y["AAPL", pd.to_datetime("2015-07-15")], df.loc[pd.to_datetime("2015-07-16"), "AAPL"]

In [None]:
y = y[y.index.isin(X.index)]
X = X[X.index.isin(y.index)]

Splitting into train and test samples works in principle the same way as with a single identifier, but this time we have a (symbol, date) multi-index, so the `loc` call looks a bit more complicated:

In [None]:
X_train = X.loc[(slice(None), slice(None, "2018")), :]
X_test = X.loc[(slice(None), slice("2019", "2020")), :]

y_train = y.loc[(slice(None), slice(None, "2018"))]
y_test = y.loc[(slice(None), slice("2019", "2020"))]
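
The same kind of selection can be written with `pd.IndexSlice`, which some readers find easier to read than nested `slice` objects. The sketch below uses a made-up frame `toy` with four dates around the turn of the year; note that the index is sorted first, since label slicing on a multi-index relies on that.

```python
import pandas as pd

dates = pd.to_datetime(["2018-12-28", "2018-12-31", "2019-01-02", "2019-01-03"])
index = pd.MultiIndex.from_product([["AAPL", "F"], dates], names=["Symbols", "last_date"])
toy = pd.DataFrame({"feature": range(8)}, index=index).sort_index()

idx = pd.IndexSlice
train = toy.loc[idx[:, :"2018"], :]   # all symbols, dates up to the end of 2018
test = toy.loc[idx[:, "2019":], :]    # all symbols, dates from 2019 onwards
print(train)
print(test)
```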

In [None]:
X_train_selected = select_features(X_train, y_train)

We train a separate regressor for each of the stocks:

In [None]:
adas = {stock: AdaBoostRegressor() for stock in ["AAPL", "F", "GOOGL"]}

for stock, ada in adas.items():
    ada.fit(X_train_selected.loc[stock], y_train.loc[stock])

Now let's check again how good our prediction is:

In [None]:
X_test_selected = X_test[X_train_selected.columns]

y_pred = pd.concat({
    stock: pd.Series(adas[stock].predict(X_test_selected.loc[stock]), index=X_test_selected.loc[stock].index)
    for stock in adas.keys()
})
y_pred.index.names = ["Symbols", "last_date"]
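
Passing a dictionary to `pd.concat` is what rebuilds the multi-index here: the dictionary keys become the outer index level. A minimal sketch with made-up numbers (`per_stock` and `combined` are illustrative names):

```python
import pandas as pd

dates = pd.to_datetime(["2020-01-02", "2020-01-03"])
# One series of (made-up) predictions per stock, each indexed by date only.
per_stock = {
    "AAPL": pd.Series([10.5, 11.5], index=dates),
    "F": pd.Series([5.5, 6.5], index=dates),
}

# The dict keys become the outer level, the dates the inner level.
combined = pd.concat(per_stock)
combined.index.names = ["Symbols", "last_date"]
print(combined)
```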

In [None]:
plt.figure(figsize=(15, 6))

y.unstack("Symbols").plot(ax=plt.gca())
y_pred.unstack("Symbols").shift(-1).plot(ax=plt.gca(), legend=None, marker=".")
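
The plot gives a visual impression only. If a number per stock is wanted, one option is the mean absolute error, grouped by symbol. This metric is our own addition for illustration and is not computed anywhere in this notebook; the sketch uses made-up values in `y_true_toy` and `y_pred_toy`.

```python
import pandas as pd

index = pd.MultiIndex.from_product(
    [["AAPL", "F"], pd.to_datetime(["2020-01-02", "2020-01-03"])],
    names=["Symbols", "last_date"],
)
y_true_toy = pd.Series([10.0, 12.0, 5.0, 6.0], index=index)
y_pred_toy = pd.Series([11.0, 11.0, 5.0, 8.0], index=index)

# Absolute error per row, averaged separately for each stock symbol.
mae = (y_true_toy - y_pred_toy).abs().groupby(level="Symbols").mean()
print(mae)
```

The same expression could be applied to `y_test` and `y_pred` from above, provided both share the same index and contain no missing values.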