# Timeseries Forecasting

This notebook explains how to use `tsfresh` for time series forecasting.
Make sure you also read through the [documentation](https://tsfresh.readthedocs.io/en/latest/text/forecasting.html) to learn more on this feature.

It is essentially a copy of the other time series forecasting notebook, but this time it uses more than one stock.
Conceptually this is not much different, but the pandas multi-index handling is a bit more advanced :-)

We will use the Google, Facebook and Alphabet stock.
Please find all documentation in the other notebook.

In [1]:
%matplotlib inline

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from tsfresh import extract_features, select_features
from tsfresh.utilities.dataframe_functions import roll_time_series, make_forecasting_frame
from tsfresh.utilities.dataframe_functions import impute

try:
    import pandas_datareader.data as web
except ImportError:
    print("You need to install the pandas_datareader. Run pip install pandas_datareader.")

from sklearn.ensemble import AdaBoostRegressor

## Reading the data

In [2]:
df_melted = pd.read_excel("train.xlsx")

In [3]:
df_melted.head()

Unnamed: 0.1,Unnamed: 0,Symbols,Time,High
0,0,1 sensor response,0,0.013786
1,1,1 sensor response,1,0.017507
2,2,1 sensor response,2,0.017326
3,3,1 sensor response,3,0.017236
4,4,1 sensor response,4,0.01761


## Create training data sample

In [4]:
df_rolled = roll_time_series(df_melted, column_id="Symbols", column_sort="Time",
                             max_timeshift=999, min_timeshift=999)

Rolling: 100%|██████████| 30/30 [00:03<00:00,  8.15it/s]


In [5]:
df_rolled.tail()

Unnamed: 0.1,Unnamed: 0,Symbols,Time,High,id
76995,46995,9 sensor response 9,995,0.067651,"id=9 sensor response 9,timeshift=999"
76996,46996,9 sensor response 9,996,0.067713,"id=9 sensor response 9,timeshift=999"
76997,46997,9 sensor response 9,997,0.067775,"id=9 sensor response 9,timeshift=999"
76998,46998,9 sensor response 9,998,0.067836,"id=9 sensor response 9,timeshift=999"
76999,46999,9 sensor response 9,999,0.067897,"id=9 sensor response 9,timeshift=999"
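
To make the effect of `max_timeshift` and `min_timeshift` easier to see, here is a minimal pandas-only sketch of the rolling idea on a toy frame. It is not how `tsfresh` implements `roll_time_series`. The helper `roll_windows` and the columns `window_id` and `window_end` are hypothetical names used only for illustration. The sketch assumes that a window holds at most `max_timeshift + 1` rows and that windows with `min_timeshift` rows or fewer are dropped, which is one reading of the `tsfresh` documentation.

```python
import pandas as pd


def roll_windows(df, column_id, column_sort, max_timeshift, min_timeshift):
    """Hypothetical helper: cut every series into trailing windows.

    Illustration only, not the tsfresh implementation.
    """
    pieces = []
    for key, group in df.groupby(column_id, sort=False):
        group = group.sort_values(column_sort).reset_index(drop=True)
        for end in range(len(group)):
            # A window ends at position `end` and looks back at most
            # `max_timeshift` steps (assumption, see lead-in).
            start = max(0, end - max_timeshift)
            if end - start < min_timeshift:
                continue  # window too short, skip it
            window = group.iloc[start:end + 1].copy()
            window["window_id"] = key
            window["window_end"] = group[column_sort].iloc[end]
            pieces.append(window)
    return pd.concat(pieces, ignore_index=True)


# Toy data in the same long format as df_melted: one row per (series, time).
toy = pd.DataFrame({
    "Symbols": ["a"] * 5 + ["b"] * 5,
    "Time": list(range(5)) * 2,
    "High": [0.1, 0.2, 0.3, 0.4, 0.5, 1.0, 1.1, 1.2, 1.3, 1.4],
})

rolled = roll_windows(toy, "Symbols", "Time", max_timeshift=2, min_timeshift=2)
window_sizes = rolled.groupby(["window_id", "window_end"]).size()
print(window_sizes)
```

With both parameters set to the same value, every surviving window has exactly the same length. That is why the call above with `max_timeshift=999, min_timeshift=999` only keeps full-length windows.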


## Extract Features

In [7]:
X = extract_features(df_rolled.drop("Symbols", axis=1), 
                     column_id="id", column_sort="Time", column_value="High", 
                     impute_function=impute, show_warnings=False)

Feature Extraction: 100%|██████████| 26/26 [00:19<00:00,  1.30it/s]


In [8]:
X.head()

id,High__abs_energy,High__absolute_sum_of_changes,"High__agg_autocorrelation__f_agg_""mean""__maxlag_40","High__agg_autocorrelation__f_agg_""median""__maxlag_40","High__agg_autocorrelation__f_agg_""var""__maxlag_40","High__agg_linear_trend__attr_""intercept""__chunk_len_10__f_agg_""max""","High__agg_linear_trend__attr_""intercept""__chunk_len_10__f_agg_""mean""","High__agg_linear_trend__attr_""intercept""__chunk_len_10__f_agg_""min""","High__agg_linear_trend__attr_""intercept""__chunk_len_10__f_agg_""var""","High__agg_linear_trend__attr_""intercept""__chunk_len_50__f_agg_""max""",...,High__symmetry_looking__r_0.9500000000000001,High__time_reversal_asymmetry_statistic__lag_1,High__time_reversal_asymmetry_statistic__lag_2,High__time_reversal_asymmetry_statistic__lag_3,High__value_count__value_-1,High__value_count__value_0,High__value_count__value_1,High__variance,High__variance_larger_than_standard_deviation,High__variation_coefficient
"id=1 sensor response 1,timeshift=999",14.66556,0.320984,0.948025,0.948777,0.000919,-0.048056,-0.048266,-0.048442,-3.066782e-07,-0.041973,...,1.0,3.191407e-05,6.380381e-05,9.566774e-05,0.0,48.0,0.0,0.007273,0.0,0.991945
"id=1 sensor response,timeshift=999",737.932646,2.107785,0.941218,0.947792,0.001742,0.615003,0.594858,0.576394,0.0003647206,0.693506,...,1.0,0.0008286899,0.001660145,0.002494069,0.0,0.0,0.0,0.125729,0.0,0.453178
"id=10 sensor response 10,timeshift=999",18.729054,0.264712,0.78711,0.789804,0.015492,0.108367,0.105874,0.103238,8.501153e-06,0.117758,...,1.0,7.323802e-06,1.467303e-05,2.203423e-05,0.0,0.0,0.0,0.000956,0.0,0.231872
"id=10 sensor response,timeshift=999",0.790253,0.118053,0.954875,0.95713,0.000706,0.011647,0.011181,0.010705,4.583489e-07,0.013345,...,1.0,6.685702e-08,1.315695e-07,1.93984e-07,0.0,0.0,0.0,0.000101,0.0,0.383255
"id=11 sensor response 11,timeshift=999",42.138557,0.565351,0.902953,0.8972,0.002869,0.088254,0.08497,0.081467,1.39281e-05,0.106308,...,1.0,3.826741e-05,7.660806e-05,0.0001150185,0.0,0.0,0.0,0.005031,0.0,0.368201
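
To give a feeling for what some of these columns contain, the sketch below recomputes three of the simpler features by hand with NumPy on a toy frame. It assumes the usual definitions: `abs_energy` is the sum of squared values, `absolute_sum_of_changes` is the sum of absolute consecutive differences, and `variance` is the population variance. The helper `simple_features` and the toy data are hypothetical and only for illustration. `extract_features` computes many more features than this.

```python
import numpy as np
import pandas as pd


def simple_features(values):
    """Hypothetical helper: hand-computed versions of three features."""
    x = np.asarray(values, dtype=float)
    return pd.Series({
        "High__abs_energy": np.sum(x ** 2),
        "High__absolute_sum_of_changes": np.sum(np.abs(np.diff(x))),
        "High__variance": np.var(x),  # population variance (ddof=0)
    })


# Toy windows: one id per window, sorted by time.
toy = pd.DataFrame({
    "id": ["a"] * 4 + ["b"] * 4,
    "Time": [0, 1, 2, 3] * 2,
    "High": [1.0, 2.0, 4.0, 3.0, 0.5, 0.5, 0.5, 0.5],
})
toy = toy.sort_values(["id", "Time"])

# One row per id, one column per feature, like the matrix X above.
features = pd.DataFrame(
    {key: simple_features(group["High"]) for key, group in toy.groupby("id")}
).T
print(features)
```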


In [9]:
X.to_excel("latest.xlsx")

In [9]:
pwd

'/home/nitin/Desktop/shanur/time series solution using tsfresh'

We make the data a bit easier to work with by giving it a multi-index instead of the tuple index:
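
As a toy sketch of that idea, assuming the feature matrix is indexed by flat `(series, time)` tuples, the conversion could look like this. The frame `X_toy` and the level names `Symbols` and `last_time` are hypothetical and chosen only for illustration.

```python
import pandas as pd

# A flat index whose entries are tuples. tupleize_cols=False stops pandas
# from turning the tuples into a MultiIndex automatically.
flat_index = pd.Index(
    [("a", 2), ("a", 3), ("b", 2), ("b", 3)], tupleize_cols=False
)
X_toy = pd.DataFrame({"High__variance": [0.1, 0.2, 0.3, 0.4]}, index=flat_index)

# Replace the tuple index with a proper two-level MultiIndex.
X_multi = X_toy.copy()
X_multi.index = pd.MultiIndex.from_tuples(
    X_toy.index.tolist(), names=["Symbols", "last_time"]
)

# With a MultiIndex, all windows of one series can be selected directly.
print(X_multi.loc["a"])
```

The benefit of the multi-index is that selecting everything for one series, or for one time step across all series, becomes a plain `.loc` or `.xs` call instead of a filter over tuples.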