# TSFRESH Robot Failure Example
This example shows how to use [tsfresh](https://tsfresh.readthedocs.io/) to extract useful features from multiple timeseries and use them to improve classification performance.

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt
import seaborn as sns
from tsfresh.examples.robot_execution_failures import download_robot_execution_failures, load_robot_execution_failures
from tsfresh import extract_features, extract_relevant_features, select_features
from tsfresh.utilities.dataframe_functions import impute
from tsfresh.feature_extraction import FeatureExtractionSettings
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

## Load and visualize data
The data consists of timeseries for 88 robots (`id` 1 to 88). For each robot, each timepoint (`time`) contains datapoints from 6 sensors (`a` to `f`) leading up to a success or failure (`y`).

In [None]:
download_robot_execution_failures()
df, y = load_robot_execution_failures()
df.head()

In [None]:
df[df.id == 3][['time', 'a', 'b', 'c', 'd', 'e', 'f']].plot(x='time', title='Success example (id 3)', figsize=(12, 6));
df[df.id == 20][['time', 'a', 'b', 'c', 'd', 'e', 'f']].plot(x='time', title='Failure example (id 20)', figsize=(12, 6));

## Extract Features

In [None]:
extraction_settings = FeatureExtractionSettings()
extraction_settings.IMPUTE = impute    # Fill in Infs and NaNs

In [None]:
%time X = extract_features(df, column_id='id', column_sort='time', feature_extraction_settings=extraction_settings);

In [None]:
X.head()

In [None]:
X.info()
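To make the shape of `X` concrete: `extract_features` turns the long-format table (many rows per `id`) into a wide table with one row per `id` and one column per (sensor, feature) pair. The following is a minimal pure-Python sketch of that idea with three hand-picked features. The toy rows and the `toy_extract` helper are invented for illustration, and this is not how tsfresh is implemented; only the `<column>__<feature>` naming imitates tsfresh's convention.

```python
from collections import defaultdict
from statistics import mean

# Toy long-format rows: (id, time, a, b). Values are made up for illustration.
rows = [
    (1, 0, 1.0, 10.0),
    (1, 1, 3.0, 10.0),
    (1, 2, 2.0, 13.0),
    (2, 1, 5.0, 0.0),   # deliberately out of time order
    (2, 0, 4.0, 2.0),
]
sensors = ["a", "b"]


def toy_extract(rows, sensors):
    """Hypothetical helper: one feature dict per id, keyed '<sensor>__<feature>'."""
    by_id = defaultdict(list)
    for row in rows:
        by_id[row[0]].append(row)

    features = {}
    for series_id, series_rows in by_id.items():
        # Order each series by time; this plays the role of column_sort='time'.
        ordered = sorted(series_rows, key=lambda r: r[1])
        feats = {}
        for col, name in enumerate(sensors, start=2):
            values = [r[col] for r in ordered]
            feats[name + "__mean"] = mean(values)
            feats[name + "__maximum"] = max(values)
            # Order-dependent feature: total absolute step-to-step change.
            feats[name + "__absolute_sum_of_changes"] = sum(
                abs(later - earlier) for earlier, later in zip(values, values[1:])
            )
        features[series_id] = feats
    return features


X_toy = toy_extract(rows, sensors)
print(X_toy[1]["a__mean"])   # 2.0
print(len(X_toy[2]))         # 6 features for id 2 (2 sensors x 3 features)
```

The real `X` follows the same pattern, only with many more features per sensor, which is why it ends up with far more columns than the original table.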

In [None]:
%time X_filtered = extract_relevant_features(df, y, column_id='id', column_sort='time', feature_extraction_settings=extraction_settings)

In [None]:
X_filtered.head()

In [None]:
X_filtered.info()
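The relevance filter behind `extract_relevant_features` works in two stages: each feature is tested individually against the target `y`, which gives one p-value per feature, and a multiple-testing procedure (Benjamini-Yekutieli, according to the tsfresh documentation) then decides which features to keep. Below is a simplified sketch of the second stage only. The p-values are made up, the `alpha` of 0.05 is an assumption chosen for the example, and the function is an illustration rather than tsfresh's own code.

```python
def benjamini_yekutieli(p_values, alpha=0.05):
    """Return a list of booleans: True where the null hypothesis is rejected,
    i.e. where the feature would be kept as relevant."""
    m = len(p_values)
    if m == 0:
        return []
    # Harmonic correction term that makes the procedure valid under dependence.
    c_m = sum(1.0 / i for i in range(1, m + 1))
    order = sorted(range(m), key=lambda i: p_values[i])

    # Find the largest rank k with p_(k) <= k / (m * c_m) * alpha.
    largest_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / (m * c_m) * alpha:
            largest_k = rank

    # Reject every hypothesis up to and including that rank.
    keep = [False] * m
    for idx in order[:largest_k]:
        keep[idx] = True
    return keep


# Hypothetical p-values, one per feature column.
p_values = {"a__mean": 0.0001, "b__maximum": 0.004, "c__mean": 0.03, "d__mean": 0.6}
mask = benjamini_yekutieli(list(p_values.values()))
relevant = [name for name, kept in zip(p_values, mask) if kept]
print(relevant)  # ['a__mean', 'b__maximum']
```

Note that `c__mean` is dropped even though its p-value is below 0.05: with many features tested at once, the threshold for each one is stricter than the nominal level.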

## Train and evaluate classifier

In [None]:
X_train, X_test, X_filtered_train, X_filtered_test, y_train, y_test = train_test_split(X, X_filtered, y, test_size=.4)
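Passing `X`, `X_filtered`, and `y` to a single `train_test_split` call matters: all three are split with the same shuffled row order, so both classifiers are trained and tested on the same robots. The split is random, and in scikit-learn the `random_state` argument fixes it for reproducible results. The pure-Python sketch below illustrates both points; `toy_split` and its data are hypothetical stand-ins, not scikit-learn's implementation.

```python
import random


def toy_split(*arrays, test_size=0.4, seed=0):
    """Hypothetical helper: split several equal-length lists with ONE shared shuffle.

    Returns train/test pairs in the same order as train_test_split:
    a_train, a_test, b_train, b_test, ...
    """
    n = len(arrays[0])
    if any(len(a) != n for a in arrays):
        raise ValueError("all inputs need the same length")
    indices = list(range(n))
    random.Random(seed).shuffle(indices)   # the seed plays the role of random_state
    n_test = int(round(n * test_size))
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    result = []
    for a in arrays:
        result.append([a[i] for i in train_idx])
        result.append([a[i] for i in test_idx])
    return result


ids = list(range(10))
X_all = [("all", i) for i in ids]        # stands in for X
X_rel = [("relevant", i) for i in ids]   # stands in for X_filtered
labels = [i % 2 for i in ids]            # stands in for y

X_tr, X_te, Xr_tr, Xr_te, y_tr, y_te = toy_split(X_all, X_rel, labels)

# Both feature tables contain exactly the same rows in their test parts.
print([i for _, i in X_te] == [i for _, i in Xr_te])  # True
```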

In [None]:
cl = DecisionTreeClassifier()
cl.fit(X_train, y_train)
print(classification_report(y_test, cl.predict(X_test)))

In [None]:
cl.n_features_

In [None]:
cl2 = DecisionTreeClassifier()
cl2.fit(X_filtered_train, y_train)
print(classification_report(y_test, cl2.predict(X_filtered_test)))

In [None]:
cl2.n_features_

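Compared to using all 1236 features, using only the 283 relevant features achieves better classification performance with far fewer inputs. Because `train_test_split` is called without a fixed `random_state`, the exact scores vary from run to run.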
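The reports printed by `classification_report` list precision, recall, and F1 score per class. As a reminder of what those numbers mean, here is a small sketch that computes them by hand. The labels and predictions are made up, and `precision_recall_f1` is an illustrative helper, not part of scikit-learn.

```python
def precision_recall_f1(y_true, y_pred, positive):
    """Compute precision, recall, and F1 for the class `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical ground truth and predictions (1 = success, 0 = failure).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]

for label in (0, 1):
    p, r, f = precision_recall_f1(y_true, y_pred, positive=label)
    print(label, round(p, 2), round(r, 2), round(f, 2))
```

Precision answers "of everything predicted as this class, how much was right", while recall answers "of everything that truly is this class, how much was found".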

## Extraction + filtering is the same as filtered extraction

Above, we performed two feature extraction runs: a filtered one and an unfiltered one. However, the result of the filtered run is the same as extracting all features and then filtering them.

In [None]:
X_filtered_2 = select_features(X, y)

In [None]:
(X_filtered.columns == X_filtered_2.columns).all()
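The elementwise `==` comparison presumes that both results have the same number of columns in the same order. If only the set of selected features matters, an order-insensitive check is a little more forgiving and also shows where two selections differ. A minimal sketch, with plain lists of hypothetical column names standing in for `X_filtered.columns` and `X_filtered_2.columns`:

```python
# Hypothetical column names; the second list has the same names in another order.
cols_1 = ["a__mean", "b__maximum", "c__mean"]
cols_2 = ["c__mean", "a__mean", "b__maximum"]

# Strict check: same names at the same positions.
same_order = len(cols_1) == len(cols_2) and all(x == y for x, y in zip(cols_1, cols_2))

# Relaxed check: same names, any order.
same_set = set(cols_1) == set(cols_2)

# Names that appear in only one of the two selections.
only_in_1 = sorted(set(cols_1) - set(cols_2))
only_in_2 = sorted(set(cols_2) - set(cols_1))

print(same_order, same_set, only_in_1, only_in_2)  # False True [] []
```

With the actual DataFrames, the relaxed check would read `set(X_filtered.columns) == set(X_filtered_2.columns)`.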