<a href="https://colab.research.google.com/github/timeseriesAI/tsai/blob/master/tutorial_nbs/02_ROCKET_a_new_SOTA_classifier.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

created by Ignacio Oguiza - email: timeseriesAI@gmail.com

## Purpose 😇

The purpose of this notebook is to introduce you to Rocket. 

ROCKET (RandOm Convolutional KErnel Transform) is a Time Series Classification (TSC) method released on Oct 29th, 2019. It has achieved **state-of-the-art accuracy on the UCR univariate time series classification datasets, surpassing HIVE-COTE (the previous state of the art since 2017), while running exceptionally fast compared to traditional DL methods.**

To achieve these 2 things at once is **VERY IMPRESSIVE**. ROCKET is certainly a new TSC method you should try.

Authors:
Dempster, A., Petitjean, F., & Webb, G. I. (2019). ROCKET: Exceptionally fast and accurate time series classification using random convolutional kernels. arXiv preprint arXiv:1910.13051.

[paper](https://arxiv.org/pdf/1910.13051)

There are 2 main limitations to the original ROCKET method though:
- Released code doesn't handle multivariate data
- It doesn't run on a GPU, so it's slow when used with large datasets

In this notebook you will learn:
- how to use the original ROCKET method
- about a new ROCKET version I have developed in PyTorch that handles both **univariate and multivariate** data and runs on **GPU**
- how to integrate the ROCKET features with fastai or other classifiers

## Import libraries 📚

In [None]:
# ## NOTE: UNCOMMENT AND RUN THIS CELL IF YOU NEED TO INSTALL/ UPGRADE TSAI
# stable = False # True: stable version in pip, False: latest version from github
# if stable: 
#     !pip install tsai -U >> /dev/null
# else:      
#     !pip install git+https://github.com/timeseriesAI/tsai.git -U >> /dev/null
# ## NOTE: REMEMBER TO RESTART (NOT RECONNECT/ RESET) THE KERNEL/ RUNTIME ONCE THE INSTALLATION IS FINISHED

In [135]:
import numpy as np
from tqdm.notebook import tqdm

In [57]:
from tsai.all import *
computer_setup()

os             : Linux-5.4.0-80-generic-x86_64-with-debian-bullseye-sid
python         : 3.7.3
tsai           : 0.2.22
fastai         : 2.5.2
fastcore       : 1.3.26
torch          : 1.9.1+cu102
n_cpus         : 10
device         : cuda (Tesla T4)


## How to use the original ROCKET method? 🚀

ROCKET is applied in 2 phases:

1. Generate features from each time series: with the default 10k kernels, ROCKET calculates 20k features (2 per kernel) from each time series, independently of the sequence length.
2. Apply a classifier to those calculated features. You can use any classifier; the original code uses 2 simple linear classifiers: RidgeClassifierCV and Logistic Regression.
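
To make phase 1 more concrete, here is a minimal NumPy sketch of a random convolutional kernel transform for univariate series. It follows the kernel recipe described in the ROCKET paper (random length from 7, 9 or 11, mean-centered normal weights, uniform bias, random dilation and padding, then the max and the proportion of positive values per kernel). The function names `generate_kernels`, `apply_kernel` and `rocket_features` are made up for this illustration; this is not the authors' optimized code, and it assumes the series length is at least 11.

```python
import numpy as np

def generate_kernels(input_length, n_kernels, rng, kernel_sizes=(7, 9, 11)):
    """Draw random kernels (weights, bias, dilation, padding)."""
    kernels = []
    for _ in range(n_kernels):
        length = int(rng.choice(kernel_sizes))
        weights = rng.normal(size=length)
        weights -= weights.mean()                      # mean-centered weights
        bias = rng.uniform(-1.0, 1.0)
        # dilation is sampled on an exponential scale, bounded by the series length
        max_exp = np.log2((input_length - 1) / (length - 1))
        dilation = int(2 ** rng.uniform(0.0, max_exp))
        # zero padding is applied to roughly half of the kernels
        padding = ((length - 1) * dilation) // 2 if rng.integers(2) == 1 else 0
        kernels.append((weights, bias, dilation, padding))
    return kernels

def apply_kernel(x, weights, bias, dilation, padding):
    """Convolve one series with one dilated kernel and return (max, ppv)."""
    x = np.pad(x, padding)
    span = (len(weights) - 1) * dilation
    n_out = len(x) - span
    out = np.full(n_out, bias)
    for j, w in enumerate(weights):
        out += w * x[j * dilation : j * dilation + n_out]
    return out.max(), (out > 0).mean()                 # ppv = proportion of positive values

def rocket_features(X, kernels):
    """X has shape (samples, len); the result has shape (samples, 2 * n_kernels)."""
    feats = np.empty((X.shape[0], 2 * len(kernels)))
    for i, x in enumerate(X):
        for k, kernel in enumerate(kernels):
            feats[i, 2 * k : 2 * k + 2] = apply_kernel(x, *kernel)
    return feats

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(5, 64))                      # synthetic data, for illustration only
demo_kernels = generate_kernels(X_demo.shape[1], n_kernels=100, rng=rng)
demo_features = rocket_features(X_demo, demo_kernels)  # shape (5, 200)
```

The output has 2 columns per kernel, which is why 10k kernels give 20k features regardless of the sequence length.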

### 1️⃣ Generate features

Let's first load the data. In this notebook we load pre-processed trials from local `.npy` files (instead of a UCR Time Series dataset).

The original method requires the time series to be in a 2d array of shape (samples, len). Remember that only univariate sequences are allowed in this original method.
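
If your data is multivariate, with shape (samples, vars, len), one possible way to obtain the 2d input the original method expects is to select a single variable. The sketch below uses a placeholder array and a hypothetical `channel` choice; which variable to keep (if any) depends on your problem.

```python
import numpy as np

# Placeholder array with shape (samples, vars, len); real data would be loaded instead
X_multi = np.zeros((40, 8, 256))

channel = 0                        # hypothetical choice of variable to keep
X_uni = X_multi[:, channel, :]     # shape (samples, len), as required by the original method
```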

In [124]:
data_p1 = np.load("../Pre-Processing/trials/subject_1_session_1_filt_ica_car.npy")
labels_p1 = np.array([0,1,2, 3]*5)

data_p2 = np.load("../Pre-Processing/trials/subject_1_session_2_filt_ica_car.npy")
labels_p2 = np.array([0,1,2, 3]*5)

In [125]:
data_p1 = np.concatenate((data_p1, data_p2), axis = 0)
labels_p1 = np.array([0,1,2,3]*10)

In [126]:
data_p1 = data_p1[:,:,:256]

In [127]:
data_p1.shape

(40, 8, 256)

## How to use ROCKET with large and/ or multivariate datasets on GPU? - Recommended ⭐️

As stated before, the current ROCKET method doesn't support multivariate time series or GPU. This may be a drawback in some cases. 

To overcome both limitations I've created a multivariate ROCKET on GPU in PyTorch.

### 1️⃣ Generate features

First you prepare the input data and normalize it per sample. The input to the PyTorch ROCKET is a 3d tensor of shape (samples, vars, len), preferably on GPU.

The way to use ROCKET in PyTorch is the following:

* Create a dataset as you would normally do in `tsai`. 
* Create a TSDataLoaders with the following kwargs: 
    * drop_last=False. In this way we get features for every input sample.
    * shuffle_train=False
    * batch_tfms=[TSStandardize(by_sample=True)] so that input is normalized by sample, as recommended by the authors
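
As a rough illustration of what per-sample normalization means, the NumPy sketch below standardizes each sample of a (samples, vars, len) array with its own mean and standard deviation. It assumes the statistics are computed over all variables and time steps of a sample; `standardize_by_sample` is a made-up helper, not the `TSStandardize` implementation, so check the `tsai` documentation for the exact behavior.

```python
import numpy as np

def standardize_by_sample(X, eps=1e-8):
    """Standardize each sample of a (samples, vars, len) array independently."""
    mean = X.mean(axis=(1, 2), keepdims=True)   # one mean per sample
    std = X.std(axis=(1, 2), keepdims=True)     # one std per sample
    return (X - mean) / (std + eps)             # eps avoids dividing by 0

rng = np.random.default_rng(0)
X_demo = rng.normal(loc=5.0, scale=3.0, size=(4, 8, 256))  # synthetic data
X_std = standardize_by_sample(X_demo)
```

After this step every sample has approximately zero mean and unit standard deviation, whatever its original scale.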


In [128]:
#X, y, splits = get_UCR_data('HandMovementDirection', split_data=False)
#splits = RandomSplitter()(range_of(data_p1))

splits = TrainValidTestSplitter(stratify = True, random_state= 10, valid_size = 0.2)(range_of(data_p1))
tfms  = [None, [Categorize()]]
batch_tfms = [TSStandardize(by_sample=True)]
dls = get_ts_dls(data_p1, labels_p1, splits=splits, tfms=tfms, drop_last=False, 
                 shuffle_train=False, batch_tfms=batch_tfms, bs=10_000)

In [129]:
splits

((#32) [31,12,5,6,3,21,20,34,1,18...], (#8) [2,27,35,30,14,13,7,24])

☣️☣️ You will be able to create a dls (TSDataLoaders) object with unusually large batch sizes. I've tested it with a large dataset and a batch size = 100_000 and it worked fine. This is because ROCKET is not a usual Deep Learning model. It just applies convolutions (kernels) one at a time to create the features.

Instantiate a rocket model with the desired n_kernels (authors use 10_000) and kernel sizes (7, 9 and 11 in the original paper). 

In [131]:
model = build_ts_model(ROCKET, dls=dls, n_kernels=20_000, kss=[7, 9, 11]) # defaults are n_kernels=10_000, kss=[7, 9, 11]; here we use 20_000 kernels

Now generate ROCKET features for the entire train and valid datasets using the `create_rocket_features` convenience function.

This transforms the original data, creating 2 features per kernel (40k features per sample with 20_000 kernels).

In [132]:
X_train, y_train = create_rocket_features(dls.train, model)
X_valid, y_valid = create_rocket_features(dls.valid, model)
X_train.shape, X_valid.shape

((32, 40000), (8, 40000))

### 2️⃣ Apply a classifier

Once you have built the features (40k per sample here), you can use them to train any classifier of your choice.

#### RidgeClassifierCV

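And now you apply a classifier of your choice. 
With RidgeClassifierCV in particular, there's no need to normalize the calculated features before passing them to the classifier, as it does so internally when `normalize=True` (as recommended by the authors). Note that the `normalize` parameter was deprecated in scikit-learn 1.0 and removed in 1.2, so this cell requires an older scikit-learn version.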

In [133]:
from sklearn.linear_model import RidgeClassifierCV
ridge = RidgeClassifierCV(alphas=np.logspace(-8, 8, 17), normalize=True)
ridge.fit(X_train, y_train)
print(f'alpha: {ridge.alpha_:.2E}  train: {ridge.score(X_train, y_train):.5f}  valid: {ridge.score(X_valid, y_valid):.5f}')


alpha: 1.00E+01  train: 1.00000  valid: 1.00000


In [None]:
valid_scores = []
    
for i in tqdm(range(0, 50)):
    splits = TrainValidTestSplitter(stratify = True, random_state= i, valid_size = 0.2)(range_of(data_p1))
    tfms  = [None, [Categorize()]]
    batch_tfms = [TSStandardize(by_sample=True)]
    dls = get_ts_dls(data_p1, labels_p1, splits=splits, tfms=tfms, drop_last=False, 
                     shuffle_train=False, batch_tfms=batch_tfms, bs=10_000)
    
    model = build_ts_model(ROCKET, dls=dls, n_kernels = 20000, kss = [7, 9, 11])
    
    X_train, y_train = create_rocket_features(dls.train, model)
    X_valid, y_valid = create_rocket_features(dls.valid, model)
    
    ridge = RidgeClassifierCV(alphas=np.logspace(-8, 8, 17), normalize=True)
    ridge.fit(X_train, y_train)
    valid_scores.append(ridge.score(X_valid, y_valid))



In [None]:
print(np.mean(valid_scores), np.std(valid_scores))  # mean and standard deviation of the validation accuracy across the 50 splits
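
When summarizing accuracy over repeated random splits, it can help to report the spread as well as the mean. The sketch below computes the mean, the sample standard deviation and the standard error of the mean (standard deviation divided by the square root of the number of runs). `summarize_scores` is a made-up helper and the scores are invented values for illustration, not results from this notebook.

```python
import numpy as np

def summarize_scores(scores):
    """Return (mean, sample std, standard error of the mean) of a list of scores."""
    scores = np.asarray(scores, dtype=float)
    mean = scores.mean()
    std = scores.std(ddof=1)               # sample standard deviation
    sem = std / np.sqrt(len(scores))       # standard error of the mean
    return mean, std, sem

demo_scores = [0.75, 0.875, 0.625, 0.75]   # invented values, for illustration only
demo_mean, demo_std, demo_sem = summarize_scores(demo_scores)
print(f'mean: {demo_mean:.5f}  std: {demo_std:.5f}  sem: {demo_sem:.5f}')
```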

This result is amazing!! The previous state of the art (InceptionTime) was 0.37837.

#### Logistic Regression

In the case of other classifiers (like Logistic Regression), the authors recommend a per-feature normalization.

In [113]:
eps = 1e-5
Cs = np.logspace(-5, 5, 11)
from sklearn.linear_model import LogisticRegression
best_loss = np.inf
for i, C in enumerate(Cs):
    f_mean = X_train.mean(axis=0, keepdims=True)
    f_std = X_train.std(axis=0, keepdims=True) + eps  # epsilon to avoid dividing by 0
    X_train_tfm2 = (X_train - f_mean) / f_std
    X_valid_tfm2 = (X_valid - f_mean) / f_std
    classifier = LogisticRegression(penalty='l2', C=C, n_jobs=-1)
    classifier.fit(X_train_tfm2, y_train)
    probas = classifier.predict_proba(X_train_tfm2)
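    # Note: nn.CrossEntropyLoss applies log-softmax to its input, so passing probabilities
    # (rather than logits) means the loss cannot drop below log(1 + 3/e) ≈ 0.74367 with 4 classes
    loss = nn.CrossEntropyLoss()(torch.tensor(probas), torch.tensor(y_train)).item()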
    train_score = classifier.score(X_train_tfm2, y_train)
    val_score = classifier.score(X_valid_tfm2, y_valid)
    if loss < best_loss:
        best_eps = eps
        best_C = C
        best_loss = loss
        best_train_score = train_score
        best_val_score = val_score
    print('{:2} eps: {:.2E}  C: {:.2E}  loss: {:.5f}  train_acc: {:.5f}  valid_acc: {:.5f}'.format(
        i, eps, C, loss, train_score, val_score))
print('\nBest result:')
print('eps: {:.2E}  C: {:.2E}  train_loss: {:.5f}  train_acc: {:.5f}  valid_acc: {:.5f}'.format(
        best_eps, best_C, best_loss, best_train_score, best_val_score))

 0 eps: 1.00E-05  C: 1.00E-05  loss: 1.26409  train_acc: 0.96875  valid_acc: 0.62500
 1 eps: 1.00E-05  C: 1.00E-04  loss: 0.93947  train_acc: 1.00000  valid_acc: 0.75000
 2 eps: 1.00E-05  C: 1.00E-03  loss: 0.77945  train_acc: 1.00000  valid_acc: 0.75000
 3 eps: 1.00E-05  C: 1.00E-02  loss: 0.74896  train_acc: 1.00000  valid_acc: 0.75000
 4 eps: 1.00E-05  C: 1.00E-01  loss: 0.74438  train_acc: 1.00000  valid_acc: 0.75000
 5 eps: 1.00E-05  C: 1.00E+00  loss: 0.74376  train_acc: 1.00000  valid_acc: 0.75000
 6 eps: 1.00E-05  C: 1.00E+01  loss: 0.74368  train_acc: 1.00000  valid_acc: 0.75000
 7 eps: 1.00E-05  C: 1.00E+02  loss: 0.74367  train_acc: 1.00000  valid_acc: 0.75000
 8 eps: 1.00E-05  C: 1.00E+03  loss: 0.74367  train_acc: 1.00000  valid_acc: 0.75000
 9 eps: 1.00E-05  C: 1.00E+04  loss: 0.74367  train_acc: 1.00000  valid_acc: 0.75000
10 eps: 1.00E-05  C: 1.00E+05  loss: 0.74367  train_acc: 1.00000  valid_acc: 0.75000

Best result:
eps: 1.00E-05  C: 1.00E+02  train_loss: 0.74367  tr
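
The loss in the output above levels off at 0.74367 even though train accuracy is 1.0. A likely explanation is that `nn.CrossEntropyLoss` expects logits and applies a softmax internally, while `predict_proba` already returns probabilities. The NumPy sketch below reproduces that effect with perfectly confident predictions for 4 classes and compares it with a log loss computed directly from the probabilities. The helper names are made up for this illustration.

```python
import numpy as np

def softmax_cross_entropy(inputs, targets):
    """Mean cross-entropy when `inputs` are treated as logits (softmax applied inside)."""
    shifted = inputs - inputs.max(axis=1, keepdims=True)
    log_softmax = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_softmax[np.arange(len(targets)), targets].mean()

def log_loss_from_probas(probas, targets, eps=1e-15):
    """Mean negative log-likelihood computed directly from probabilities."""
    p = np.clip(probas[np.arange(len(targets)), targets], eps, 1.0)
    return -np.log(p).mean()

targets = np.array([0, 1, 2, 3])
perfect_probas = np.eye(4)   # probability 1 on the correct class for every sample

double_softmax_loss = softmax_cross_entropy(perfect_probas, targets)  # log(1 + 3/e), about 0.74367
true_log_loss = log_loss_from_probas(perfect_probas, targets)         # 0 for perfect predictions
```

If you want a loss that keeps decreasing as predictions become more confident, computing it from the probabilities directly (as in `log_loss_from_probas`) may be a better criterion for comparing values of C.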

☣️ Note: Epsilon has a large impact on the result. You can actually test several values to find the one that best fits your problem, but bear in mind you can only select C and epsilon based on train data!!! 
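
One way to select hyperparameters using train data only is cross-validation within the training set. The sketch below does this for C with scikit-learn on synthetic data. `select_C_on_train` is a hypothetical helper, and it swaps the manual mean/std/epsilon normalization used in this notebook for `StandardScaler`, which leaves zero-variance features unscaled, so no epsilon is needed in this variant.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def select_C_on_train(X_train, y_train, Cs, n_splits=4, seed=0):
    """Pick C by stratified cross-validation on the training data only."""
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    mean_scores = []
    for C in Cs:
        # The scaler is fit inside each fold, so held-out folds never leak into the statistics
        clf = make_pipeline(StandardScaler(), LogisticRegression(C=C, max_iter=1000))
        scores = cross_val_score(clf, X_train, y_train, cv=cv)
        mean_scores.append(scores.mean())
    best = int(np.argmax(mean_scores))
    return Cs[best], mean_scores

# Synthetic stand-in for ROCKET features: 32 samples, 4 balanced classes
rng = np.random.default_rng(0)
y_demo = np.array([0, 1, 2, 3] * 8)
X_demo = rng.normal(size=(32, 50)) + y_demo[:, None]

Cs_demo = np.logspace(-3, 3, 7)
best_C_demo, cv_scores_demo = select_C_on_train(X_demo, y_demo, Cs_demo)
```

The validation set is then used only once, to evaluate the classifier trained with the selected C.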

##### RandomSearch

One way to do this is to run a random search over several epsilon and C values.

In [114]:
n_tests = 10
epss = np.logspace(-8, 0, 9)
Cs = np.logspace(-5, 5, 11)

from sklearn.linear_model import LogisticRegression
best_loss = np.inf
for i in range(n_tests):
    eps = np.random.choice(epss)
    C = np.random.choice(Cs)
    f_mean = X_train.mean(axis=0, keepdims=True)
    f_std = X_train.std(axis=0, keepdims=True) + eps  # epsilon to avoid dividing by 0
    X_train_tfm2 = (X_train - f_mean) / f_std
    X_valid_tfm2 = (X_valid - f_mean) / f_std
    classifier = LogisticRegression(penalty='l2', C=C, n_jobs=-1)
    classifier.fit(X_train_tfm2, y_train)
    probas = classifier.predict_proba(X_train_tfm2)
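    # Note: nn.CrossEntropyLoss applies log-softmax to its input, so passing probabilities
    # (rather than logits) means the loss cannot drop below log(1 + 3/e) ≈ 0.74367 with 4 classes
    loss = nn.CrossEntropyLoss()(torch.tensor(probas), torch.tensor(y_train)).item()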
    train_score = classifier.score(X_train_tfm2, y_train)
    val_score = classifier.score(X_valid_tfm2, y_valid)
    if loss < best_loss:
        best_eps = eps
        best_C = C
        best_loss = loss
        best_train_score = train_score
        best_val_score = val_score
    print('{:2}  eps: {:.2E}  C: {:.2E}  loss: {:.5f}  train_acc: {:.5f}  valid_acc: {:.5f}'.format(
        i, eps, C, loss, train_score, val_score))
print('\nBest result:')
print('eps: {:.2E}  C: {:.2E}  train_loss: {:.5f}  train_acc: {:.5f}  valid_acc: {:.5f}'.format(
        best_eps, best_C, best_loss, best_train_score, best_val_score))

 0  eps: 1.00E-05  C: 1.00E+03  loss: 0.74367  train_acc: 1.00000  valid_acc: 0.75000
 1  eps: 1.00E-02  C: 1.00E+04  loss: 0.74367  train_acc: 1.00000  valid_acc: 0.75000
 2  eps: 1.00E-03  C: 1.00E-05  loss: 1.26586  train_acc: 0.96875  valid_acc: 0.62500
 3  eps: 1.00E-05  C: 1.00E-02  loss: 0.74896  train_acc: 1.00000  valid_acc: 0.75000
 4  eps: 1.00E-06  C: 1.00E+01  loss: 0.74368  train_acc: 1.00000  valid_acc: 0.75000
 5  eps: 1.00E-04  C: 1.00E-03  loss: 0.77949  train_acc: 1.00000  valid_acc: 0.75000
 6  eps: 1.00E-05  C: 1.00E-04  loss: 0.93947  train_acc: 1.00000  valid_acc: 0.75000
 7  eps: 1.00E-01  C: 1.00E+03  loss: 0.74367  train_acc: 1.00000  valid_acc: 0.75000
 8  eps: 1.00E-04  C: 1.00E-01  loss: 0.74438  train_acc: 1.00000  valid_acc: 0.75000
 9  eps: 1.00E+00  C: 1.00E+00  loss: 0.74389  train_acc: 1.00000  valid_acc: 0.75000

Best result:
eps: 1.00E-01  C: 1.00E+03  train_loss: 0.74367  train_acc: 1.00000  valid_acc: 0.75000
