## Implementing TimeSeriesForestClassifier
Opened up sklearn's `RandomForestClassifier` to accept pipelines as `base_estimator`. This required:
* creating a `RandomIntervalFeatureExtractor` transformer
* changing how parameters are set when constructing the individual members of the ensemble
* passing the random state from the top level down to every pipeline component that uses randomness, including modifying `Pipeline` to have a `random_state` attribute
* adapting the sklearn helper functions for building trees in parallel and accumulating their predictions
* removing the sklearn input checks for tabular data in the parent classes
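
The interval transformer from the first item can be pictured roughly as follows. This is a minimal standard-library sketch, not the `RandomIntervalFeatureExtractor` code itself: the function names `sample_intervals`, `slope` and `interval_features`, the `min_length` parameter, and the choice of mean, standard deviation and slope as summary features are all assumptions made for illustration.

```python
import random
import statistics


def sample_intervals(series_length, n_intervals, rng, min_length=3):
    """Draw (start, end) index pairs, end exclusive, each at least min_length long."""
    intervals = []
    for _ in range(n_intervals):
        start = rng.randrange(0, series_length - min_length + 1)
        end = rng.randrange(start + min_length, series_length + 1)
        intervals.append((start, end))
    return intervals


def slope(values):
    """Least-squares slope of values against their time index 0..n-1."""
    n = len(values)
    t_mean = (n - 1) / 2
    v_mean = statistics.fmean(values)
    num = sum((t - t_mean) * (v - v_mean) for t, v in enumerate(values))
    den = sum((t - t_mean) ** 2 for t in range(n))
    return num / den


def interval_features(series, intervals):
    """Summarise each interval of one series by mean, standard deviation and slope."""
    features = []
    for start, end in intervals:
        window = series[start:end]
        features.extend(
            [statistics.fmean(window), statistics.pstdev(window), slope(window)]
        )
    return features


# Toy usage: one short series, two random intervals, three features per interval
rng = random.Random(123)
series = [0.0, 0.5, 1.0, 0.5, 0.0, -0.5, -1.0, -0.5]
intervals = sample_intervals(len(series), n_intervals=2, rng=rng)
print(intervals)
print(interval_features(series, intervals))
```

Passing the generator in explicitly is what makes the extraction reproducible from a single top-level seed. A real transformer would presumably vectorise this with NumPy and draw the intervals once during fitting so that the same intervals are reused at prediction time.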

Still to do:
* write unit tests
* ideally, replace the tabular-data input checks with time-series/panel checks, possibly implemented as part of the data container; it may also be worthwhile to add a keyword argument to the pipeline that sidesteps input checks in ensemble members, similar to `check_input` in sklearn's `DecisionTreeClassifier`
* always assign the same `random_state` to all components as part of `TSPipeline` (rather than in `TimeSeriesForestClassifier`)?
* `feature_importances_` has to be removed or adapted to a temporal importance curve (as in the paper)
* `decision_path()` has to be adapted
* `oob_score` methods have to be adapted
* `apply` method has to be adapted
* speed up the transforms, which are still very slow
* implement the "entrance" criterion (as in the paper)
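
The open question about where `random_state` should be assigned can be made concrete with a small sketch. It assumes only the sklearn `get_params`/`set_params` convention, in which nested parameters are named `<step>__<param>`. The function name `propagate_random_state` is hypothetical, and seeds are drawn with the standard library rather than NumPy to keep the example dependency-free.

```python
import random


def propagate_random_state(estimator, seed):
    """Give every `random_state` parameter of `estimator` its own derived seed.

    Works with any object that follows sklearn's get_params/set_params
    convention, where nested parameters are named `<step>__<param>`.
    """
    rng = random.Random(seed)
    updates = {}
    # Sort the keys so the seed assigned to each component is reproducible
    for key in sorted(estimator.get_params(deep=True)):
        if key == "random_state" or key.endswith("__random_state"):
            updates[key] = rng.randrange(2**31 - 1)
    if updates:
        estimator.set_params(**updates)
    return estimator
```

Called on a pipeline, this would set a derived seed on each step that exposes a `random_state` parameter and leave all other parameters untouched. Giving every component the same seed instead, as the item above considers, would amount to replacing the `rng.randrange(...)` draw with `seed` itself.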

In [1]:
from sktime.tsforest import TimeSeriesForestClassifier
from sktime.load_data import load_from_web_to_xdataframe
import pandas as pd

%load_ext autoreload
%autoreload 2

In [2]:
# Load the GunPoint train and test splits, caching downloads under data/
cache_path = 'data/'
dataset_name = 'GunPoint'
X_train, y_train = load_from_web_to_xdataframe(dataset_name, is_train_file=True,
                                               cache_path=cache_path)
X_test, y_test = load_from_web_to_xdataframe(dataset_name, is_test_file=True,
                                             cache_path=cache_path)
# Convert to plain pandas containers before fitting
X_train = pd.DataFrame(X_train)
y_train = pd.Series(y_train)
X_test = pd.DataFrame(X_test)
y_test = pd.Series(y_test)
print(X_train.shape, y_train.shape, X_test.shape, y_test.shape)

(50, 1) (50,) (150, 1) (150,)


In [4]:
clf = TimeSeriesForestClassifier(n_estimators=100, criterion='entropy', random_state=123)
%time clf.fit(X_train, y_train)
%time clf.score(X_test, y_test)

CPU times: user 29.3 s, sys: 168 ms, total: 29.5 s
Wall time: 29.3 s
CPU times: user 1min 26s, sys: 409 ms, total: 1min 27s
Wall time: 1min 26s


0.86