# Chapter 7: Machine Learning Models for Time-series
## K-nearest neighbors with dynamic time warping in Python

In this section, we'll classify failures from force and torque measurements that a
robot records over time. We'll use a very simple classifier, k-nearest neighbors (kNN).
A word of warning: kNN relies on distances between pairs of samples, and computing
them is often the bottleneck, particularly with dynamic time warping (DTW), whose cost
grows with the product of the lengths of the two series being compared.
We'll combine TSFresh's feature extraction with a kNN algorithm. Chaining the steps
this way keeps the code short, as you'll find when reading the code snippets.
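
To make the distance computation concrete, here is a minimal pure-Python sketch of DTW and a 1-nearest-neighbor lookup built on it. It is one common formulation (squared point-wise costs, square root of the accumulated cost) and is not the tslearn implementation; the names `dtw_distance` and `predict_1nn` are hypothetical helpers introduced for illustration.

```python
def dtw_distance(a, b):
    """DTW distance between two numeric sequences (illustrative sketch)."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] holds the minimal accumulated squared cost of aligning
    # the first i points of a with the first j points of b.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            # Extend the cheapest of the three admissible predecessor alignments.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m] ** 0.5


def predict_1nn(train_series, train_labels, query):
    """Return the label of the training series closest to query under DTW."""
    distances = [dtw_distance(s, query) for s in train_series]
    nearest = min(range(len(distances)), key=distances.__getitem__)
    return train_labels[nearest]


# Toy data, not taken from the robot dataset.
train_series = [[0, 0, 1, 1], [5, 5, 6, 6]]
train_labels = ["ok", "fail"]
prediction = predict_1nn(train_series, train_labels, [0, 1, 1])
```

The nested loop fills an `n` by `m` table for every pair of series, which is why the distance step dominates the running time as the series get longer or the training set grows.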

In [10]:
from tsfresh.examples import load_robot_execution_failures
from tsfresh.examples.robot_execution_failures import download_robot_execution_failures

%matplotlib inline
%load_ext autoreload
%autoreload 2
%load_ext lab_black

We'll use the kNN classifier in tslearn. We could also have used the kNN classifier
in scikit-learn, which accepts a custom distance metric.

In the example, we will download a dataset of robotic execution failures from the
UCI machine learning repository and store it locally. This dataset contains force and
torque measurements on a robot after failure detection. For each sample, the task is
to classify whether the robot will report a failure:

In [3]:
download_robot_execution_failures()

In [4]:
df_ts, y = load_robot_execution_failures()

In [5]:
df_ts.head()

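|   | id | time | F_x | F_y | F_z | T_x | T_y | T_z |
|---|----|------|-----|-----|-----|-----|-----|-----|
| 0 | 1  | 0    | -1  | -1  | 63  | -3  | -1  | 0   |
| 1 | 1  | 1    | 0   | 0   | 62  | -3  | -1  | 0   |
| 2 | 1  | 2    | -1  | -1  | 61  | -3  | 0   | 0   |
| 3 | 1  | 3    | -1  | -1  | 63  | -2  | -1  | 0   |
| 4 | 1  | 4    | -1  | -1  | 63  | -3  | -1  | 0   |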


It's always important to check how balanced the two classes are. Since `y` holds one
Boolean label per sample, its mean is the share of samples labeled `True`:

In [6]:
print(f"{y.mean():.2f}")

0.24
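
The class share matters because it sets the score to beat. A minimal sketch, using made-up labels rather than the real `y`, of how the share of one class translates into the accuracy of a classifier that always predicts the majority class:

```python
# Illustrative labels only; the real `y` comes from load_robot_execution_failures().
labels = [True, False, False, False, True, False, False, False]

# For Boolean labels, the mean is the share of True values.
positive_share = sum(labels) / len(labels)

# A classifier that always predicts the more frequent class reaches this accuracy.
majority_baseline = max(positive_share, 1 - positive_share)

print(f"{positive_share:.2f} {majority_baseline:.2f}")
```

By the same arithmetic, a share of 0.24 means that always predicting the majority class would already be right about 76% of the time, so accuracy figures should be read against that baseline.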


We can then extract time-series features using TSFresh, as discussed in **Chapter 3,
Preprocessing Time-Series**. We can impute missing values and select features based
on relevance to the target. In TSFresh, the p-value from a statistical test is used to
calculate the feature significance:

In [8]:
from tsfresh import extract_features
from tsfresh import select_features
from tsfresh.utilities.dataframe_functions import impute

In [11]:
extracted_features = impute(extract_features(df_ts, column_id="id", column_sort="time"))
features_filtered = select_features(extracted_features, y)

Feature Extraction: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 38/38 [00:13<00:00,  2.82it/s]
 'F_x__partial_autocorrelation__lag_8'
 'F_x__partial_autocorrelation__lag_9' ...
 'T_z__matrix_profile__feature_"median"__threshold_0.98'
 'T_z__matrix_profile__feature_"25"__threshold_0.98'
 'T_z__matrix_profile__feature_"75"__threshold_0.98'] did not have any finite values. Filling with zeros.
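
Testing hundreds of features at once means some will look significant by chance, so the p-values need a multiple-testing correction before features are selected. To our understanding, TSFresh controls the false discovery rate with a Benjamini-Yekutieli-style procedure (its `select_features` function takes an `fdr_level` argument). The following is a pure-Python sketch of that procedure under this assumption, not TSFresh's own code; `benjamini_yekutieli` is a hypothetical helper and the p-values are invented.

```python
def benjamini_yekutieli(p_values, fdr_level=0.05):
    """Return one Boolean per p-value: True if the feature is kept as relevant."""
    m = len(p_values)
    if m == 0:
        return []
    # Harmonic correction that keeps the procedure valid for dependent tests.
    c_m = sum(1.0 / i for i in range(1, m + 1))
    # Indices of the p-values in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k whose p-value is at most k / (m * c_m) * fdr_level.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / (m * c_m) * fdr_level:
            k_max = rank
    # Keep the k_max features with the smallest p-values.
    relevant = [False] * m
    for idx in order[:k_max]:
        relevant[idx] = True
    return relevant


# Illustrative p-values for four hypothetical features (not from the dataset).
p_values = [0.02, 0.001, 0.5, 0.03]
relevant = benjamini_yekutieli(p_values)
```

With these invented values only the second feature survives: 0.02 and 0.03 would pass an uncorrected 0.05 threshold, but not the stricter rank-dependent thresholds.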


We can continue working with the `features_filtered` DataFrame, which contains the
TSFresh features that were selected as relevant, with one row per time series:

In [12]:
features_filtered.head()

Unnamed: 0,F_x__value_count__value_-1,F_x__abs_energy,F_x__root_mean_square,T_y__absolute_maximum,F_x__mean_n_absolute_max__number_of_maxima_7,F_x__range_count__max_1__min_-1,F_y__abs_energy,F_y__root_mean_square,F_y__mean_n_absolute_max__number_of_maxima_7,T_y__variance,T_y__standard_deviation,F_y__absolute_maximum,T_x__absolute_maximum,"F_x__fft_coefficient__attr_""abs""__coeff_1",F_x__absolute_maximum,"T_y__fft_coefficient__attr_""abs""__coeff_1",T_y__root_mean_square,T_y__abs_energy,T_y__mean_n_absolute_max__number_of_maxima_7,F_z__standard_deviation,F_z__variance,"F_z__agg_linear_trend__attr_""intercept""__chunk_len_10__f_agg_""var""",F_x__variance,F_x__standard_deviation,F_x__ratio_value_number_to_time_series_length,T_x__variance,T_x__standard_deviation,"T_x__fft_coefficient__attr_""abs""__coeff_1","T_y__fft_coefficient__attr_""abs""__coeff_2",F_x__cid_ce__normalize_True,F_x__autocorrelation__lag_1,F_x__partial_autocorrelation__lag_1,T_y__percentage_of_reoccurring_datapoints_to_all_datapoints,T_x__mean_n_absolute_max__number_of_maxima_7,"T_y__fft_coefficient__attr_""abs""__coeff_4",T_x__ratio_value_number_to_time_series_length,F_x__lempel_ziv_complexity__bins_100,"F_z__agg_linear_trend__attr_""intercept""__chunk_len_10__f_agg_""min""","F_y__fft_coefficient__attr_""abs""__coeff_0",T_y__percentage_of_reoccurring_values_to_all_values,...,"F_z__change_quantiles__f_agg_""var""__isabs_True__qh_0.8__ql_0.4","F_x__fft_aggregated__aggtype_""centroid""","F_z__change_quantiles__f_agg_""mean""__isabs_True__qh_1.0__ql_0.6",T_z__energy_ratio_by_chunks__num_segments_10__segment_focus_5,"F_y__agg_linear_trend__attr_""intercept""__chunk_len_5__f_agg_""max""","F_x__agg_linear_trend__attr_""intercept""__chunk_len_10__f_agg_""max""","T_y__agg_linear_trend__attr_""intercept""__chunk_len_10__f_agg_""min""","F_z__change_quantiles__f_agg_""var""__isabs_False__qh_1.0__ql_0.4",T_x__count_below__t_0,F_z__count_above_mean,"T_z__change_quantiles__f_agg_""mean""__isabs_True__qh_0.6__ql_0.2
","T_z__change_quantiles__f_agg_""mean""__isabs_True__qh_0.6__ql_0.4","F_y__change_quantiles__f_agg_""mean""__isabs_True__qh_0.6__ql_0.4",F_x__count_below_mean,T_y__fourier_entropy__bins_3,F_z__time_reversal_asymmetry_statistic__lag_2,T_y__permutation_entropy__dimension_6__tau_1,"T_z__fft_aggregated__aggtype_""variance""",F_z__permutation_entropy__dimension_5__tau_1,F_z__maximum,"F_z__change_quantiles__f_agg_""mean""__isabs_False__qh_1.0__ql_0.4",T_z__variation_coefficient,"T_x__agg_linear_trend__attr_""intercept""__chunk_len_5__f_agg_""min""",T_x__number_peaks__n_1,T_y__number_cwt_peaks__n_1,T_y__count_below__t_0,"T_x__change_quantiles__f_agg_""var""__isabs_True__qh_0.2__ql_0.0","F_z__change_quantiles__f_agg_""mean""__isabs_True__qh_1.0__ql_0.8",T_x__quantile__q_0.1,F_y__has_duplicate_max,"F_y__cwt_coefficients__coeff_13__w_2__widths_(2, 5, 10, 20)","F_y__cwt_coefficients__coeff_14__w_5__widths_(2, 5, 10, 20)",T_y__lempel_ziv_complexity__bins_3,T_y__quantile__q_0.1,F_z__time_reversal_asymmetry_statistic__lag_1,F_x__quantile__q_0.2,F_y__quantile__q_0.7,"T_x__change_quantiles__f_agg_""var""__isabs_False__qh_0.2__ql_0.0",T_z__large_standard_deviation__r_0.35000000000000003,T_z__quantile__q_0.9
1,14.0,14.0,0.966092,1.0,1.0,15.0,13.0,0.930949,1.0,0.222222,0.471405,1.0,3.0,1.0,1.0,1.165352,0.816497,10.0,1.0,1.203698,1.448889,0.65,0.062222,0.249444,0.133333,0.115556,0.339935,1.338261,0.870796,5.669467,-0.081633,-0.081633,1.0,3.0,4.165352,0.133333,0.333333,61.0,13.0,1.0,...,0.0,1.333333,0.0,0.016201,-0.333333,0.0,-1.0,0.0,1.0,10.0,0.0,0.0,0.0,14.0,0.735622,2181.909091,2.302585,4.909978,1.972247,64.0,0.0,-0.238179,-3.0,1.0,4.0,1.0,0.0,0.0,-3.0,1.0,-0.310265,-0.751682,0.4,-1.0,-596.0,-1.0,-1.0,0.0,0.0,0.0
2,7.0,25.0,1.290994,5.0,1.571429,13.0,76.0,2.250926,3.0,4.222222,2.054805,4.0,10.0,0.624118,3.0,6.020261,2.44949,90.0,3.285714,4.333846,18.782222,19.84,0.915556,0.956847,0.2,11.715556,3.422799,5.138517,8.680637,5.724246,-0.100208,-0.100208,0.8,6.285714,6.148091,0.6,0.533333,53.0,10.0,0.571429,...,0.888889,2.760761,0.666667,0.0,-0.333333,0.0,-3.0,12.916667,0.866667,8.0,0.285714,0.0,0.0,9.0,1.039721,6051.363636,2.302585,5.371714,2.397895,70.0,-1.5,-1.658312,-4.166667,4.0,4.0,0.933333,0.0,1.0,-9.2,1.0,-0.202951,0.057818,0.533333,-3.6,-680.384615,-1.0,-1.0,0.0,1.0,0.0
3,11.0,12.0,0.894427,5.0,1.0,14.0,40.0,1.632993,2.142857,3.128889,1.768867,3.0,7.0,2.203858,1.0,8.235442,2.620433,103.0,3.428571,4.616877,21.315556,22.01,0.355556,0.596285,0.2,6.933333,2.633122,8.113625,5.511715,6.27495,-0.357143,-0.357143,0.866667,6.0,8.188086,0.466667,0.466667,51.0,8.0,0.714286,...,3.1875,2.614065,2.0,0.0,0.833333,1.0,-4.0,9.142857,0.933333,7.0,0.571429,0.0,0.0,11.0,0.974315,3876.454545,2.302585,6.673949,2.397895,68.0,-1.0,-1.658312,-5.833333,6.0,3.0,0.866667,0.0,3.0,-6.6,0.0,0.539121,0.912474,0.533333,-4.0,-617.0,-1.0,0.0,0.0,1.0,0.0
4,5.0,16.0,1.032796,6.0,1.285714,10.0,60.0,2.0,2.428571,7.128889,2.669998,5.0,15.0,0.844394,2.0,12.067855,2.875181,124.0,3.714286,3.833188,14.693333,10.64,0.906667,0.95219,0.266667,12.426667,3.525148,23.423626,3.77399,6.213127,-0.327731,-0.327731,0.533333,9.142857,6.840792,0.6,0.533333,56.0,2.0,0.3,...,1.0,3.489022,3.666667,0.142857,2.333333,1.0,-5.0,8.138889,1.0,8.0,0.416667,0.0,0.666667,7.0,1.039721,11671.727273,2.302585,4.887339,2.271869,70.0,-1.833333,-1.788854,-9.333333,5.0,5.0,0.733333,0.0,0.0,-9.0,0.0,-2.64139,-0.609735,0.533333,-4.6,3426.307692,-1.0,1.0,0.0,0.0,0.0
5,9.0,17.0,1.064581,5.0,1.285714,13.0,46.0,1.75119,2.285714,4.16,2.039608,3.0,12.0,2.730599,2.0,6.44533,3.464102,180.0,4.428571,4.841487,23.44,16.0,0.773333,0.879394,0.266667,7.6,2.75681,0.947461,8.175089,5.211063,0.077586,0.077586,0.933333,9.428571,5.902516,0.6,0.533333,56.0,4.0,0.833333,...,0.0,2.678754,6.75,0.0,3.0,2.0,-5.0,41.1875,1.0,5.0,0.272727,0.0,0.666667,10.0,1.039721,7744.0,2.302585,4.509238,2.271869,73.0,3.25,-4.636809,-11.833333,5.0,5.0,0.933333,0.0,0.0,-9.6,0.0,0.591927,0.072771,0.466667,-5.0,-2609.0,-1.0,0.8,0.0,0.0,0.6


In [13]:
from sklearn.model_selection import TimeSeriesSplit, GridSearchCV
from tslearn.neighbors import KNeighborsTimeSeriesClassifier

In [14]:
knn = KNeighborsTimeSeriesClassifier()
param_search = {"metric": ["dtw"], "n_neighbors": [1, 2, 3]}
tscv = TimeSeriesSplit(n_splits=2)

In [15]:
gsearch = GridSearchCV(estimator=knn, cv=tscv, param_grid=param_search)
gsearch.fit(features_filtered, y)

GridSearchCV(cv=TimeSeriesSplit(gap=0, max_train_size=None, n_splits=2, test_size=None),
             estimator=KNeighborsTimeSeriesClassifier(),
             param_grid={'metric': ['dtw'], 'n_neighbors': [1, 2, 3]})
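
The grid search above scores each value of `n_neighbors` on the folds produced by `TimeSeriesSplit(n_splits=2)`. As a rough illustration of how such an expanding-window split partitions the rows, here is a pure-Python sketch that, to our understanding, mirrors scikit-learn's default behavior (no gap, no cap on the training size); `expanding_window_splits` is a hypothetical helper, not part of scikit-learn.

```python
def expanding_window_splits(n_samples, n_splits):
    """Return (train_indices, test_indices) pairs for an expanding-window split."""
    # Every test fold has the same size; leftover rows go to the first training fold.
    test_size = n_samples // (n_splits + 1)
    first_test_start = n_samples - n_splits * test_size
    splits = []
    for test_start in range(first_test_start, n_samples, test_size):
        # Training rows always precede the test rows, so no later row leaks backward.
        train_idx = list(range(0, test_start))
        test_idx = list(range(test_start, test_start + test_size))
        splits.append((train_idx, test_idx))
    return splits


# Toy example with 10 rows and 2 splits.
for train_idx, test_idx in expanding_window_splits(10, 2):
    print(train_idx, test_idx)
```

In the toy example the first fold trains on rows 0 to 3 and tests on rows 4 to 6, and the second trains on rows 0 to 6 and tests on rows 7 to 9. After fitting, the selected setting and its mean cross-validated score are available as `gsearch.best_params_` and `gsearch.best_score_`.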