## Rolling Window Statistics

* **Task**: add a summary of the values at previous time steps as a new feature.
    - We can calculate summary statistics over the values in a sliding window and include them as features in our dataset.
    - Example: the mean of the previous few values, also called the rolling mean.
* Pandas provides a `rolling()` method that, for each time step, collects the window of values ending at that step.
    - We can then apply a statistical function, such as the mean, to the window collected for each time step.
        * First, the series must be shifted by one step so that each window contains only values observed before the time step being predicted.
        * Then the rolling object can be created and the mean calculated on each window.
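
The two steps above can be sketched on a handful of made-up values. This is a minimal illustration, not the notebook's own cell: the numbers and the column name `mean(t-2, t-1)` are assumptions chosen so the arithmetic is easy to check by hand.

```python
import pandas as pd

# Hypothetical toy series; the values are illustrative only.
temps = pd.Series([10.0, 12.0, 14.0, 16.0, 18.0], name="t+1")

# Step 1: shift by one step so the window at row i only sees rows before i.
shifted = temps.shift(1)

# Step 2: mean over a window of width 2 on the shifted series.
rolling_mean = shifted.rolling(window=2).mean().rename("mean(t-2, t-1)")

features = pd.concat([rolling_mean, temps], axis=1)
print(features)
```

The first two rows of the rolling-mean column are NaN, because a width-2 window on a series shifted by one step needs two earlier observations before it can produce a value.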


In [1]:
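from pandas import DataFrame
from pandas import concat
from pandas import read_csv

# Series.from_csv was removed in pandas 1.0; read_csv plus squeeze returns the same Series
series = read_csv('../data/daily_temp.csv', header=0, index_col=0, parse_dates=True).squeeze('columns')
temps = DataFrame(series.values)
# shift by one step so that each window only contains earlier observations
shifted = temps.shift(1)
shifted.columns = ['shifted']

# mean over a window of width 2 on the shifted series
means = shifted['shifted'].rolling(2).mean()
dataframe = concat([means, temps], axis=1)
dataframe.columns = ['mean(t-1, t)', 't+1']

dataframe.head()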


,"mean(t-1, t)",t+1
0,,20.7
1,,17.9
2,19.30,18.8
3,18.35,14.6
4,16.70,15.8
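
The same pattern extends to other summary statistics over the window. The sketch below is an assumption-laden illustration (toy values, a window width of 3, and the column names are all made up here) showing how minimum, mean, and maximum could be computed side by side.

```python
import pandas as pd

# Hypothetical toy series; the values are illustrative only.
temps = pd.Series([10.0, 12.0, 14.0, 16.0, 18.0, 20.0])
width = 3

# Shift first so every window holds only observations before the target row.
window = temps.shift(1).rolling(window=width)

# Collect several statistics of the same window as separate feature columns.
stats = pd.concat([window.min(), window.mean(), window.max(), temps], axis=1)
stats.columns = ["min", "mean", "max", "t+1"]
print(stats)
```

With a width of 3 and a shift of 1, the first three rows have no complete window and are NaN; they would usually be dropped before fitting a model.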


In [2]:
temps.iloc[566] = 0.8
temps.iloc[565] = 0.2
temps.iloc[1290] = 0.1

In [3]:
temps.reset_index(inplace=True)
temps[0] = temps[0].astype("float")
temps.head()

,index,0
0,0,20.7
1,1,17.9
2,2,18.8
3,3,14.6
4,4,15.8


In [4]:
temps.columns = ["index", "temps"]

In [5]:
shifted = shifted[1:]

https://datamarket.com/data/set/22u3/international-airline-passengers-monthly-totals-in-thousands-jan-49-dec-60#!ds=22u3&display=line

In [6]:
from tsfresh import extract_features



In [7]:
extracted_features = extract_features(temps, column_id="index")

Feature Extraction: 100%|██████████| 3650/3650 [01:35<00:00, 38.30it/s]  


In [8]:
extracted_features.head()

id,temps__abs_energy,temps__absolute_sum_of_changes,"temps__agg_autocorrelation__f_agg_""mean""","temps__agg_autocorrelation__f_agg_""median""","temps__agg_autocorrelation__f_agg_""var""","temps__agg_linear_trend__f_agg_""max""__chunk_len_10__attr_""intercept""","temps__agg_linear_trend__f_agg_""max""__chunk_len_10__attr_""rvalue""","temps__agg_linear_trend__f_agg_""max""__chunk_len_10__attr_""slope""","temps__agg_linear_trend__f_agg_""max""__chunk_len_10__attr_""stderr""","temps__agg_linear_trend__f_agg_""max""__chunk_len_50__attr_""intercept""",...,temps__time_reversal_asymmetry_statistic__lag_1,temps__time_reversal_asymmetry_statistic__lag_2,temps__time_reversal_asymmetry_statistic__lag_3,temps__value_count__value_-inf,temps__value_count__value_0,temps__value_count__value_1,temps__value_count__value_inf,temps__value_count__value_nan,temps__variance,temps__variance_larger_than_standard_deviation
0,428.49,0.0,0.0,0.0,0.0,,,,,,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,320.41,0.0,0.0,0.0,0.0,,,,,,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,353.44,0.0,0.0,0.0,0.0,,,,,,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,213.16,0.0,0.0,0.0,0.0,,,,,,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,249.64,0.0,0.0,0.0,0.0,,,,,,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
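
Note that with `column_id="index"` every row is treated as its own one-point time series, which is why `temps__abs_energy` above is simply the squared value (20.7² = 428.49) and most other features are 0 or missing. One way to give each time step a real window is to build a long-format frame in which each id groups the preceding values. The helper below is a hypothetical pandas-only sketch (the name `make_windows`, the `time` column, and the window width are assumptions, not part of the notebook or of tsfresh).

```python
import pandas as pd

def make_windows(values, window):
    """Long-format frame: one id per time step, holding the `window` values before it."""
    rows = []
    for t in range(window, len(values)):
        for offset, value in enumerate(values[t - window:t]):
            rows.append({"id": t, "time": offset, "temps": value})
    return pd.DataFrame(rows)

# First five values of the temperature series shown earlier.
values = [20.7, 17.9, 18.8, 14.6, 15.8]
windows = make_windows(values, window=3)
print(windows)
```

A frame shaped like this could then be passed to `extract_features` with `column_id="id"` and `column_sort="time"`, so that each extracted row summarises a three-value window rather than a single point.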


In [9]:
from tsfresh import select_features
from tsfresh.utilities.dataframe_functions import impute

In [20]:
impute(extracted_features)
features_filtered = select_features(extracted_features.iloc[1:], shifted["shifted"])

In [21]:
features_filtered.head()

id,temps__abs_energy,temps__sum_values,temps__quantile__q_0.9,temps__quantile__q_0.8,temps__quantile__q_0.7,temps__quantile__q_0.6,temps__quantile__q_0.4,temps__quantile__q_0.3,temps__quantile__q_0.2,temps__quantile__q_0.1,...,temps__median,temps__mean,temps__maximum,"temps__fft_coefficient__coeff_0__attr_""real""","temps__fft_coefficient__coeff_0__attr_""abs""","temps__cwt_coefficients__widths_(2, 5, 10, 20)__coeff_0__w_5","temps__cwt_coefficients__widths_(2, 5, 10, 20)__coeff_0__w_20","temps__cwt_coefficients__widths_(2, 5, 10, 20)__coeff_0__w_2","temps__cwt_coefficients__widths_(2, 5, 10, 20)__coeff_0__w_10",temps__value_count__value_0
1,320.41,17.9,17.9,17.9,17.9,17.9,17.9,17.9,17.9,17.9,...,17.9,17.9,17.9,17.9,17.9,6.943044,3.471522,10.977917,4.909474,0.0
2,353.44,18.8,18.8,18.8,18.8,18.8,18.8,18.8,18.8,18.8,...,18.8,18.8,18.8,18.8,18.8,7.292136,3.646068,11.529879,5.156319,0.0
3,213.16,14.6,14.6,14.6,14.6,14.6,14.6,14.6,14.6,14.6,...,14.6,14.6,14.6,14.6,14.6,5.663042,2.831521,8.954055,4.004375,0.0
4,249.64,15.8,15.8,15.8,15.8,15.8,15.8,15.8,15.8,15.8,...,15.8,15.8,15.8,15.8,15.8,6.128497,3.064249,9.690005,4.333502,0.0
5,249.64,15.8,15.8,15.8,15.8,15.8,15.8,15.8,15.8,15.8,...,15.8,15.8,15.8,15.8,15.8,6.128497,3.064249,9.690005,4.333502,0.0
