# Timeseries Forecasting

This notebook explains how to use `tsfresh` for time series forecasting.
Make sure you also read through the [documentation](https://tsfresh.readthedocs.io/en/latest/text/forecasting.html) to learn more about this feature.

It is basically a copy of the other time series forecasting notebook, but this time using more than one stock.
Conceptually this is not much different, but the pandas multi-index magic is a bit advanced :-)

We will use the Google, Facebook and Alphabet stock.
Please find all documentation in the other notebook.

In [1]:
%matplotlib inline

import numpy as np
import pandas as pd
import matplotlib.pylab as plt

from tsfresh import extract_features, select_features
from tsfresh.utilities.dataframe_functions import roll_time_series, make_forecasting_frame
from tsfresh.utilities.dataframe_functions import impute

try:
    import pandas_datareader.data as web
except ImportError:
    print("You need to install the pandas_datareader. Run pip install pandas_datareader.")

from sklearn.ensemble import AdaBoostRegressor

## Reading the data

In [2]:
df_melted = pd.read_excel("new1.xlsx")

In [3]:
df_melted.head()

Unnamed: 0.1,Unnamed: 0,Symbols,date,High
0,1.0,Valueone,0,9.43
1,2.0,Valueone,1,9.56
2,3.0,Valueone,2,9.63
3,4.0,Valueone,3,9.59
4,5.0,Valueone,4,9.59
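
The frame above is in long format: one row per series (`Symbols`) and time step (`date`), with the observed value in `High`. This is the layout that the `column_id` and `column_sort` arguments used below refer to. As a minimal sketch, assuming a hypothetical wide table with one column per series (the names `wide`, `series_a` and `series_b` are made up for illustration and are not taken from `new1.xlsx`), `DataFrame.melt` produces the same layout:

```python
import pandas as pd

# Hypothetical wide table: one row per time step, one column per series.
wide = pd.DataFrame({
    "date": [0, 1, 2],
    "series_a": [9.43, 9.56, 9.63],
    "series_b": [3.10, 3.10, 3.11],
})

# Melt into long format: one row per (series, time step) pair.
long_df = wide.melt(id_vars="date", var_name="Symbols", value_name="High")

# Reorder the columns to match the layout of df_melted.
long_df = long_df[["Symbols", "date", "High"]]

print(long_df)
```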


## Create training data sample

In [4]:
df_rolled = roll_time_series(df_melted, column_id="Symbols", column_sort="date",
                             max_timeshift=20, min_timeshift=5)

Rolling: 100%|██████████| 30/30 [00:13<00:00,  2.19it/s]
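
To get a feeling for what the rolling step produces, here is a plain-Python sketch of the windowing for a single series. It is not the `tsfresh` implementation. It assumes that every window ends at one time step, contains at most `max_timeshift + 1` rows, and is dropped unless it has more than `min_timeshift` rows; please check this reading against the documentation of your installed version. The helper `rolling_windows` is a hypothetical name.

```python
def rolling_windows(values, max_timeshift, min_timeshift):
    """Return one window per end position, following the assumed rules."""
    windows = []
    for end in range(len(values)):
        # Look back at most max_timeshift steps from the end position.
        start = max(0, end - max_timeshift)
        window = values[start:end + 1]
        # Keep only windows with more than min_timeshift rows.
        if len(window) > min_timeshift:
            windows.append(window)
    return windows


series = list(range(30))  # toy series with 30 time steps
windows = rolling_windows(series, max_timeshift=20, min_timeshift=5)
lengths = [len(w) for w in windows]

print(len(windows), min(lengths), max(lengths))  # prints: 25 6 21
```

Under these assumptions the first five time steps produce no window at all, and every later time step becomes its own id with up to 21 rows of history.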


In [5]:
df_rolled.tail()

Unnamed: 0.1,Unnamed: 0,Symbols,date,High,id
439918,,Valuetwo2,852,3.1,"id=Valuetwo2,timeshift=853"
440547,,Valuetwo2,852,3.1,"id=Valuetwo2,timeshift=854"
439919,,Valuetwo2,853,3.1,"id=Valuetwo2,timeshift=853"
440548,,Valuetwo2,853,3.1,"id=Valuetwo2,timeshift=854"
440549,,Valuetwo2,854,3.11,"id=Valuetwo2,timeshift=854"


## Extract Features

In [6]:
X = extract_features(df_rolled.drop("Symbols", axis=1), 
                     column_id="id", column_sort="date", column_value="High", 
                     impute_function=impute, show_warnings=False)

Feature Extraction: 100%|██████████| 30/30 [03:38<00:00,  7.30s/it]


In [7]:
X.head()

id,High__abs_energy,High__absolute_sum_of_changes,"High__agg_autocorrelation__f_agg_""mean""__maxlag_40","High__agg_autocorrelation__f_agg_""median""__maxlag_40","High__agg_autocorrelation__f_agg_""var""__maxlag_40","High__agg_linear_trend__attr_""intercept""__chunk_len_10__f_agg_""max""","High__agg_linear_trend__attr_""intercept""__chunk_len_10__f_agg_""mean""","High__agg_linear_trend__attr_""intercept""__chunk_len_10__f_agg_""min""","High__agg_linear_trend__attr_""intercept""__chunk_len_10__f_agg_""var""","High__agg_linear_trend__attr_""intercept""__chunk_len_50__f_agg_""max""",...,High__symmetry_looking__r_0.9500000000000001,High__time_reversal_asymmetry_statistic__lag_1,High__time_reversal_asymmetry_statistic__lag_2,High__time_reversal_asymmetry_statistic__lag_3,High__value_count__value_-1,High__value_count__value_0,High__value_count__value_1,High__variance,High__variance_larger_than_standard_deviation,High__variation_coefficient
"id=ValueEight,timeshift=10",21.4765,0.02,-0.548034,-0.198913,0.80289,1.4,1.396,1.39,2.4e-05,0.0,...,1.0,0.01302,0.02786,0.03906,0.0,0.0,0.0,3.801653e-05,0.0,0.004412707
"id=ValueEight,timeshift=100",43.5456,0.0,0.0,0.0,0.0,1.44,1.44,1.44,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,4.930381e-32,0.0,1.541976e-16
"id=ValueEight,timeshift=101",43.5456,0.0,0.0,0.0,0.0,1.44,1.44,1.44,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,4.930381e-32,0.0,1.541976e-16
"id=ValueEight,timeshift=102",43.5456,0.0,0.0,0.0,0.0,1.44,1.44,1.44,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,4.930381e-32,0.0,1.541976e-16
"id=ValueEight,timeshift=103",43.5456,0.0,0.0,0.0,0.0,1.44,1.44,1.44,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,4.930381e-32,0.0,1.541976e-16


In [8]:
X.to_excel("latest.xlsx")

In [15]:
X["High__abs_energy"]["id=ValueEight,timeshift=100"]

43.54559999999999
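
The `abs_energy` feature is the sum of the squared values in a window, so this number can be checked by hand. The sketch below assumes that the window holds 21 identical values of 1.44. That is inferred from the constant 1.44 intercept features and the near-zero variance in the table above, not read from the data itself.

```python
import math

# Assumption: the window consists of 21 identical values of 1.44.
window = [1.44] * 21

# Absolute energy: sum of the squared values.
abs_energy = sum(v * v for v in window)

print(abs_energy)
# Compare with a tolerance, since floating-point sums are not exact.
assert math.isclose(abs_energy, 43.5456)
```

A window of 21 rows would also be consistent with `max_timeshift=20` covering the current time step plus 20 steps of history.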

In [16]:
X["High__abs_energy"]

id
id=ValueEight,timeshift=10      21.4765
id=ValueEight,timeshift=100     43.5456
id=ValueEight,timeshift=101     43.5456
id=ValueEight,timeshift=102     43.5456
id=ValueEight,timeshift=103     43.5456
                                 ...   
id=Valuetwo2,timeshift=95      187.8088
id=Valuetwo2,timeshift=96      188.0472
id=Valuetwo2,timeshift=97      188.2856
id=Valuetwo2,timeshift=98      188.5240
id=Valuetwo2,timeshift=99      188.7031
Name: High__abs_energy, Length: 25500, dtype: float64
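
Note that the index entries are plain strings, so they sort lexicographically: `timeshift=10` is followed by `timeshift=100`, not by `timeshift=11`. As a minimal sketch, assuming every id has the form `id=<name>,timeshift=<int>` as in the output above, the strings can be parsed into a two-level index that sorts numerically. The helper `parse_id` and the toy series `s` are hypothetical and only stand in for one column of `X`.

```python
import pandas as pd

# Toy stand-in for one feature column with string ids as its index.
s = pd.Series(
    [21.4765, 43.5456, 188.7031],
    index=[
        "id=ValueEight,timeshift=10",
        "id=ValueEight,timeshift=100",
        "id=Valuetwo2,timeshift=99",
    ],
    name="High__abs_energy",
)


def parse_id(raw):
    """Split 'id=<name>,timeshift=<int>' into a (name, int) tuple."""
    name_part, shift_part = raw.split(",")
    name = name_part.split("=", 1)[1]
    timeshift = int(shift_part.split("=", 1)[1])
    return name, timeshift


# Replace the string index with a (Symbols, timeshift) multi-index.
s.index = pd.MultiIndex.from_tuples(
    [parse_id(i) for i in s.index], names=["Symbols", "timeshift"]
)
s = s.sort_index()

print(s.loc["ValueEight"])
```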

Index(['High__abs_energy', 'High__absolute_sum_of_changes',
       'High__agg_autocorrelation__f_agg_"mean"__maxlag_40',
       'High__agg_autocorrelation__f_agg_"median"__maxlag_40',
       'High__agg_autocorrelation__f_agg_"var"__maxlag_40',
       'High__agg_linear_trend__attr_"intercept"__chunk_len_10__f_agg_"max"',
       'High__agg_linear_trend__attr_"intercept"__chunk_len_10__f_agg_"mean"',
       'High__agg_linear_trend__attr_"intercept"__chunk_len_10__f_agg_"min"',
       'High__agg_linear_trend__attr_"intercept"__chunk_len_10__f_agg_"var"',
       'High__agg_linear_trend__attr_"intercept"__chunk_len_50__f_agg_"max"',
       ...
       'High__symmetry_looking__r_0.9500000000000001',
       'High__time_reversal_asymmetry_statistic__lag_1',
       'High__time_reversal_asymmetry_statistic__lag_2',
       'High__time_reversal_asymmetry_statistic__lag_3',
       'High__value_count__value_-1', 'High__value_count__value_0',
       'High__value_count__value_1', 'High__variance',
     

We make the data a bit easier to work with by giving them a multi-index instead of the tuple index: