# Timeseries Forecasting

This notebook explains how to use `tsfresh` in time series forecasting.
Make sure you also read through the [documentation](https://tsfresh.readthedocs.io/en/latest/text/forecasting.html) to learn more on this feature.

It is basically a copy of the other time series forecasting notebook, but this time using more than one time series.
This is conceptually not much different, but the pandas multi-index magic is a bit advanced :-)

Instead of the Google, Facebook and Alphabet stock, we use sensor readings loaded from `train.xlsx`, with one series per value of the `Symbols` column (e.g. `Sensor1 c1`).
Please find all documentation in the other notebook.
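For orientation, here is a minimal sketch of the multi-series bookkeeping, using toy values in the same long format as `train.xlsx` (columns `Time`, `Symbols`, `High`). It is not the notebook's own code: indexing by `(Symbols, Time)` lets `groupby(level=...)` compute each series' next value as the forecasting target without leaking values from one series into another. The names `high` and `y` are illustrative.

```python
import pandas as pd

# Toy long-format data: two series ("A", "B") with three time steps each.
df = pd.DataFrame({
    "Time":    [0, 1, 2, 0, 1, 2],
    "Symbols": ["A", "A", "A", "B", "B", "B"],
    "High":    [1.0, 2.0, 3.0, 10.0, 20.0, 30.0],
})

# Multi-index (Symbols, Time): every value is addressed per series and time step.
high = df.set_index(["Symbols", "Time"]).sort_index()["High"]

# Target: the next High of the *same* series; the last step of each series gets NaN.
y = high.groupby(level="Symbols").shift(-1)

# Select the targets of a single series via its first index level.
print(y.xs("B", level="Symbols"))
```

With rolled windows, such a target would still have to be aligned to the `(series, last time)` pairs that identify each window rather than to every raw row.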

In [1]:
%matplotlib inline

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# tsfresh: feature extraction/selection plus helpers for rolling windows and imputing NaN/inf features
from tsfresh import extract_features, select_features
from tsfresh.utilities.dataframe_functions import impute, make_forecasting_frame, roll_time_series

# pandas_datareader is only needed to download stock data; this notebook reads train.xlsx instead.
try:
    import pandas_datareader.data as web
except ImportError:
    print("You need to install pandas_datareader. Run: pip install pandas_datareader")

from sklearn.ensemble import AdaBoostRegressor

## Reading the data

In [2]:
# Long format: one row per (Symbols, Time) with the measured value in the High column
df_melted = pd.read_excel("train.xlsx")

In [3]:
df_melted.head()

,Unnamed: 0.1,Unnamed: 0,Time,Symbols,High
0,0,0,Sensor1 c1,-50.85
1,1,1,Sensor1 c1,-49.4
2,2,2,Sensor1 c1,-40.04
3,3,3,Sensor1 c1,-47.14
4,4,4,Sensor1 c1,-33.58
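The two `Unnamed` columns carry no signal: in the rows shown they just count rows, which suggests they are row indexes saved into the file at some earlier point (pandas labels columns without a header name `Unnamed: 0` and so on). `extract_features` below only reads `High` via `column_value`, so they do no harm, but dropping them keeps the frames readable. A minimal sketch, assuming every column whose name starts with `Unnamed` is such a leftover; `drop_unnamed` is a hypothetical helper:

```python
import pandas as pd

def drop_unnamed(df: pd.DataFrame) -> pd.DataFrame:
    """Drop columns that pandas auto-labelled 'Unnamed: ...' (typically leftover saved indexes)."""
    keep = ~df.columns.astype(str).str.startswith("Unnamed")
    return df.loc[:, keep]

# Toy frame mimicking the first rows of train.xlsx shown above.
raw = pd.DataFrame({
    "Unnamed: 0.1": [0, 1],
    "Unnamed: 0": [0, 1],
    "Time": [0, 1],
    "Symbols": ["Sensor1 c1", "Sensor1 c1"],
    "High": [-50.85, -49.4],
})
clean = drop_unnamed(raw)
print(list(clean.columns))  # ['Time', 'Symbols', 'High']
```

In the notebook this would read `df_melted = drop_unnamed(pd.read_excel("train.xlsx"))`.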


## Create training data sample

In [4]:
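# Cut each series into windows; max_timeshift=min_timeshift=999 keeps only windows with exactly 1000 points
df_rolled = roll_time_series(df_melted, column_id="Symbols", column_sort="Time",
                             max_timeshift=999, min_timeshift=999)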

Rolling: 100%|██████████| 30/30 [00:01<00:00, 20.39it/s]


In [5]:
df_rolled.tail()

Unnamed: 0.1,Unnamed: 0,Time,Symbols,High,id
31995,8995,995,Sensor9 c1,643.57,"id=Sensor9 c1,timeshift=999"
31996,8996,996,Sensor9 c1,645.6,"id=Sensor9 c1,timeshift=999"
31997,8997,997,Sensor9 c1,653.53,"id=Sensor9 c1,timeshift=999"
31998,8998,998,Sensor9 c1,641.36,"id=Sensor9 c1,timeshift=999"
31999,8999,999,Sensor9 c1,642.65,"id=Sensor9 c1,timeshift=999"
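A plain-Python sketch of what the rolling step does here, written from our reading of the `roll_time_series` documentation rather than from its source: for every series and every end position `t`, the window covers positions `t - max_timeshift` through `t`, and windows ending before position `min_timeshift` are discarded. Assuming each series has 1000 points (Time 0 to 999, as the head and tail suggest), `max_timeshift = min_timeshift = 999` leaves exactly one 1000-point window per series, consistent with every row in the tail above carrying `timeshift=999`. The helper `rolled_windows` is hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

def rolled_windows(
    series_by_id: Dict[str, Sequence[float]],
    max_timeshift: int,
    min_timeshift: int,
) -> Dict[Tuple[str, int], List[float]]:
    """Return {(series id, end position): window values} for every kept window.

    Conceptual re-implementation for illustration only; not tsfresh's code.
    """
    windows: Dict[Tuple[str, int], List[float]] = {}
    for sid, values in series_by_id.items():
        # Windows ending before position min_timeshift would be too short, so start there.
        for t in range(min_timeshift, len(values)):
            # A window holds at most max_timeshift + 1 points, ending at position t.
            start = max(0, t - max_timeshift)
            windows[(sid, t)] = list(values[start : t + 1])
    return windows

# Same settings as the notebook, scaled down: 4 points per series, max = min = 3.
toy = {"Sensor1 c1": [1.0, 2.0, 3.0, 4.0], "Sensor9 c1": [5.0, 6.0, 7.0, 8.0]}
print(rolled_windows(toy, max_timeshift=3, min_timeshift=3))
# {('Sensor1 c1', 3): [1.0, 2.0, 3.0, 4.0], ('Sensor9 c1', 3): [5.0, 6.0, 7.0, 8.0]}
```

By the same logic, lowering `min_timeshift` (say to 99) while keeping `max_timeshift=999` would produce 901 overlapping windows per 1000-point series, i.e. many more training samples at the cost of a much larger rolled frame.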


## Extract Features

In [6]:
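# Drop the original series label: the rolled "id" column now identifies each window.
# Only the "High" column is used as the value; NaN/inf feature values are imputed.
X = extract_features(df_rolled.drop("Symbols", axis=1),
                     column_id="id", column_sort="Time", column_value="High",
                     impute_function=impute, show_warnings=False)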

Feature Extraction: 100%|██████████| 16/16 [00:08<00:00,  1.83it/s]


In [7]:
X.head()

id,High__abs_energy,High__absolute_sum_of_changes,"High__agg_autocorrelation__f_agg_""mean""__maxlag_40","High__agg_autocorrelation__f_agg_""median""__maxlag_40","High__agg_autocorrelation__f_agg_""var""__maxlag_40","High__agg_linear_trend__attr_""intercept""__chunk_len_10__f_agg_""max""","High__agg_linear_trend__attr_""intercept""__chunk_len_10__f_agg_""mean""","High__agg_linear_trend__attr_""intercept""__chunk_len_10__f_agg_""min""","High__agg_linear_trend__attr_""intercept""__chunk_len_10__f_agg_""var""","High__agg_linear_trend__attr_""intercept""__chunk_len_50__f_agg_""max""",...,High__symmetry_looking__r_0.9500000000000001,High__time_reversal_asymmetry_statistic__lag_1,High__time_reversal_asymmetry_statistic__lag_2,High__time_reversal_asymmetry_statistic__lag_3,High__value_count__value_-1,High__value_count__value_0,High__value_count__value_1,High__variance,High__variance_larger_than_standard_deviation,High__variation_coefficient
"id=Sensor1 c2,timeshift=999",1975064000.0,6549.12,0.978032,0.978772,0.000176,-308.200242,-322.104256,-335.119071,90.549462,-201.75,...,1.0,16503190.0,33104240.0,49628910.0,0.0,0.0,0.0,724380.9,1.0,0.761044
"id=Sensor1 c1,timeshift=999",1094491000.0,6401.74,0.977813,0.978643,0.00018,-220.713699,-231.722674,-241.575131,53.074467,-140.857571,...,1.0,6568537.0,13169820.0,19804200.0,0.0,1.0,0.0,396515.0,1.0,0.753719
"id=Sensor10 c2,timeshift=999",7452665000.0,10079.03,0.011889,0.007736,0.000767,2743.112499,2730.06907,2716.945297,68.797627,2748.965857,...,1.0,221623.1,497693.7,455099.3,0.0,0.0,0.0,78.91655,1.0,0.003254
"id=Sensor10 c1,timeshift=999",97312860000.0,54768.88,0.760298,0.75279,0.016021,14378.940923,13566.577716,12785.30196,545511.901531,17742.654143,...,1.0,-170612000000.0,-329296300000.0,-477723900000.0,0.0,0.0,0.0,77127150.0,1.0,1.954707
"id=Sensor11 c2,timeshift=999",16175320000.0,12561.72,0.77763,0.777902,0.000968,4074.369594,4057.496626,4039.931091,105.726122,4080.022714,...,1.0,-6576646.0,-12334380.0,-17108280.0,0.0,0.0,0.0,752.6326,1.0,0.006821
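To see what some of these columns mean, here is a plain-Python sketch of five of them, following the definitions in the tsfresh feature-calculator documentation and assuming population variance and standard deviation (ddof = 0). The helper `simple_features` is hypothetical. It also gives a quick consistency check on the table: since `abs_energy` is the sum of squares, it equals n·(variance + mean²). For `Sensor1 c2`, with n = 1000 points per window, that gives mean ≈ 1118.3 and standard deviation ≈ 851.1, whose ratio matches its `variation_coefficient` of 0.761.

```python
import math
from statistics import fmean, pstdev, pvariance
from typing import Dict, Sequence

def simple_features(x: Sequence[float]) -> Dict[str, float]:
    """Re-compute a few tsfresh-style features for one window (illustrative sketch)."""
    mean = fmean(x)
    var = pvariance(x)  # population variance (ddof = 0)
    std = pstdev(x)     # population standard deviation
    return {
        # Sum of squared values.
        "abs_energy": sum(v * v for v in x),
        # Sum of absolute differences between consecutive values.
        "absolute_sum_of_changes": sum(abs(b - a) for a, b in zip(x, x[1:])),
        "variance": var,
        # Boolean stored as 1.0/0.0, as in the table above.
        "variance_larger_than_standard_deviation": float(var > std),
        # Signed ratio std / mean; undefined for a zero mean.
        "variation_coefficient": std / mean if mean != 0 else math.nan,
    }

window = [1.0, 2.0, 4.0, 7.0]
print(simple_features(window))
```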


In [8]:
X.to_excel("latest.xlsx")
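When `X` is written out, its row labels (the window ids) go into the first column of the sheet. Reading the file back without `index_col` turns that index into an ordinary column, named `id` here or `Unnamed: 0` when the index has no name, which is what the leftover columns in `train.xlsx` look like. A minimal round-trip sketch with toy values; it uses an in-memory CSV buffer instead of `latest.xlsx` so it runs without an Excel engine, and the helper `roundtrip` is hypothetical:

```python
import io

import pandas as pd

def roundtrip(df: pd.DataFrame, **read_kwargs) -> pd.DataFrame:
    """Write df (index included) to an in-memory CSV and read it back."""
    buf = io.StringIO()
    df.to_csv(buf)
    buf.seek(0)
    return pd.read_csv(buf, **read_kwargs)

# Toy stand-in for X: one feature column, rows labelled by window id.
X_toy = pd.DataFrame(
    {"High__variance": [724380.9, 396515.0]},
    index=pd.Index(["id=Sensor1 c2,timeshift=999", "id=Sensor1 c1,timeshift=999"], name="id"),
)

print(roundtrip(X_toy).columns.tolist())                          # ['id', 'High__variance']
print(roundtrip(X_toy.reset_index(drop=True)).columns.tolist())   # ['Unnamed: 0', 'High__variance']
print(roundtrip(X_toy, index_col=0).index.equals(X_toy.index))    # True
```

For the Excel file the equivalent read would be `pd.read_excel("latest.xlsx", index_col=0)`.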


We make the data a bit easier to work with by giving them a multi-index instead of the tuple index: