# ROCKET: A New SOTA Classifier with tsai-rs

This notebook introduces ROCKET (RandOm Convolutional KErnel Transform) and MiniRocket using tsai-rs.

## Purpose

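ROCKET is a time series classification method that transforms each series with a large number of random convolutional kernels and then trains a linear classifier on the resulting features. The paper reports accuracy comparable to state-of-the-art methods on the UCR archive at a fraction of their training time.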

Key features:
- Uses random convolutional kernels to generate features
- State-of-the-art accuracy on UCR benchmark
- Works with both univariate and multivariate data
- Very fast compared to deep learning methods
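
To make the first bullet concrete, here is a minimal NumPy sketch of the transform for univariate series, following the kernel sampling described in the ROCKET paper (lengths 7, 9 or 11, mean-centred Gaussian weights, uniform bias, exponentially sampled dilation, random padding). It is an illustration only, not the tsai-rs implementation; the function names are hypothetical.

```python
import numpy as np

def generate_kernels(n_kernels, seq_len, rng):
    """Draw random kernels in the style described in the ROCKET paper."""
    kernels = []
    for _ in range(n_kernels):
        length = int(rng.choice([7, 9, 11]))        # random kernel length
        weights = rng.normal(size=length)
        weights -= weights.mean()                   # mean-centred weights
        bias = rng.uniform(-1.0, 1.0)
        # Dilation is sampled on an exponential scale, bounded by the series length.
        max_exp = np.log2((seq_len - 1) / (length - 1))
        dilation = int(2 ** rng.uniform(0, max_exp))
        # Zero padding is switched on for roughly half of the kernels.
        padding = ((length - 1) * dilation) // 2 if rng.integers(2) == 1 else 0
        kernels.append((weights, bias, dilation, padding))
    return kernels

def apply_kernel(x, weights, bias, dilation, padding):
    """Return (PPV, max) for one kernel applied to one univariate series."""
    if padding > 0:
        x = np.pad(x, padding)                      # zero padding on both sides
    span = (len(weights) - 1) * dilation
    n_out = len(x) - span
    # Dilated convolution written as a sum of shifted, weighted slices.
    out = np.full(n_out, bias, dtype=np.float64)
    for j, w in enumerate(weights):
        out += w * x[j * dilation : j * dilation + n_out]
    return (out > 0).mean(), out.max()

def rocket_transform(X, kernels):
    """X has shape (samples, length); the result has shape (samples, 2 * n_kernels)."""
    feats = np.empty((X.shape[0], 2 * len(kernels)))
    for i, x in enumerate(X):
        for k, kernel in enumerate(kernels):
            feats[i, 2 * k], feats[i, 2 * k + 1] = apply_kernel(x, *kernel)
    return feats

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(5, 100))                  # synthetic data for illustration
kernels = generate_kernels(50, X_demo.shape[1], rng)
features = rocket_transform(X_demo, kernels)
print(features.shape)  # (5, 100)
```

Each kernel contributes two features: the proportion of positive values (PPV) and the maximum of its convolution output. With the 10,000 kernels used in the paper this gives 20,000 features per series.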

**Authors**: Dempster, A., Petitjean, F., & Webb, G. I. (2019)

[Paper](https://arxiv.org/pdf/1910.13051)

## Install tsai-rs

```bash
cd crates/tsai_python
maturin develop --release
```

## Import Libraries

In [None]:
import tsai_rs
import numpy as np
from sklearn.linear_model import RidgeClassifierCV, LogisticRegression
import sklearn.metrics as skm

print(f"tsai-rs version: {tsai_rs.version()}")
tsai_rs.my_setup()

## Load Data

Let's load a multivariate dataset to demonstrate ROCKET/MiniRocket.

In [None]:
# Load multivariate dataset
dsid = 'NATOPS'
X_train, y_train, X_test, y_test = tsai_rs.get_UCR_data(dsid, return_split=True)

print(f"Dataset: {dsid}")
print(f"X_train shape: {X_train.shape} (samples, variables, length)")
print(f"X_test shape: {X_test.shape}")
print(f"Classes: {np.unique(y_train)}")

## Configure MiniRocket

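MiniRocket is a reformulation of ROCKET that replaces the random kernels with a small, fixed set and keeps only the PPV feature. This leaves fewer hyperparameters, makes the transform almost deterministic, and speeds up computation considerably.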
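
As a rough illustration of where the reduction in hyperparameters comes from, the sketch below enumerates the fixed kernel set described in the MiniRocket paper: length 9, with weight 2 at three positions and weight -1 elsewhere. It also shows why the paper's default of roughly 10,000 features is in fact 9,996. How `MiniRocketConfig` treats `n_features=10000` internally is not shown in this notebook, so the rounding step here is an assumption.

```python
from itertools import combinations
import numpy as np

KERNEL_LENGTH = 9

# Every way of choosing 3 of the 9 positions gives one kernel.
position_sets = list(combinations(range(KERNEL_LENGTH), 3))

# Each kernel has weight 2 at the chosen positions and weight -1 at the other six.
kernels = np.full((len(position_sets), KERNEL_LENGTH), -1.0)
for row, positions in enumerate(position_sets):
    kernels[row, list(positions)] = 2.0

print(kernels.shape)  # (84, 9)

# The requested feature count is rounded down to a multiple of the kernel count
# (assumption: this mirrors the paper, not necessarily tsai-rs).
requested = 10_000
n_features = (requested // len(kernels)) * len(kernels)
print(n_features)  # 9996
```

Because the weights of every kernel sum to zero, the convolution output is unaffected by adding a constant offset to the input series.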

In [None]:
n_vars = X_train.shape[1]
seq_len = X_train.shape[2]
n_classes = len(np.unique(y_train))

print(f"Variables: {n_vars}")
print(f"Sequence length: {seq_len}")
print(f"Classes: {n_classes}")

In [None]:
# Configure MiniRocket
minirocket_config = tsai_rs.MiniRocketConfig(
    n_vars=n_vars,
    seq_len=seq_len,
    n_classes=n_classes,
    n_features=10000  # Default number of features
)

print(f"MiniRocket Config: {minirocket_config}")
print(f"  n_vars: {minirocket_config.n_vars}")
print(f"  seq_len: {minirocket_config.seq_len}")
print(f"  n_classes: {minirocket_config.n_classes}")
print(f"  n_features: {minirocket_config.n_features}")

## Preprocessing

For ROCKET/MiniRocket, the authors recommend standardizing by sample.
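
For reference, per-sample standardization can be sketched in plain NumPy as below. This assumes that `by_sample=True` computes a single mean and standard deviation per sample, across all variables and time steps; the notebook does not confirm the exact behavior of `ts_standardize`, and `standardize_by_sample` is a hypothetical helper.

```python
import numpy as np

def standardize_by_sample(X, eps=1e-8):
    """Standardize each sample of X (samples, variables, length) independently."""
    mean = X.mean(axis=(1, 2), keepdims=True)
    std = X.std(axis=(1, 2), keepdims=True)
    return (X - mean) / (std + eps)   # eps guards against constant series

rng = np.random.default_rng(0)
X_demo = rng.normal(loc=5.0, scale=3.0, size=(4, 2, 50)).astype(np.float32)
X_demo_std = standardize_by_sample(X_demo)

print(X_demo_std.shape)
print(X_demo_std.mean(axis=(1, 2)))  # close to 0 for every sample
print(X_demo_std.std(axis=(1, 2)))   # close to 1 for every sample
```

Because the statistics come from each sample alone, no information leaks from the training set to the test set, which is why the same call can be applied to both splits.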

In [None]:
# Standardize data (by sample, as recommended by ROCKET authors)
X_train_std = tsai_rs.ts_standardize(X_train.astype(np.float32), by_sample=True)
X_test_std = tsai_rs.ts_standardize(X_test.astype(np.float32), by_sample=True)

print("Before standardization:")
print(f"  Sample 0 mean: {X_train[0].mean():.4f}, std: {X_train[0].std():.4f}")
print("After standardization:")
print(f"  Sample 0 mean: {X_train_std[0].mean():.4f}, std: {X_train_std[0].std():.4f}")

## Try Different Datasets

In [None]:
# Test with several UCR datasets
datasets = ['ECG200', 'FordA', 'Wafer', 'GunPoint', 'Coffee']

print("Dataset configurations for MiniRocket:")
print("-" * 70)

for dsid in datasets:
    try:
        X_train, y_train, X_test, y_test = tsai_rs.get_UCR_data(dsid, return_split=True)
        n_vars = X_train.shape[1]
        seq_len = X_train.shape[2]
        n_classes = len(np.unique(y_train))
        
        config = tsai_rs.MiniRocketConfig(
            n_vars=n_vars,
            seq_len=seq_len,
            n_classes=n_classes,
            n_features=10000
        )
        
        print(f"{dsid:20} | train: {X_train.shape[0]:4d} | test: {X_test.shape[0]:4d} | "
              f"vars: {n_vars:2d} | len: {seq_len:4d} | classes: {n_classes}")
    except Exception as e:
        print(f"{dsid:20} | Error: {e}")

## Multivariate Datasets

In [None]:
# Multivariate datasets
multivariate = ['NATOPS', 'BasicMotions', 'Epilepsy']

print("Multivariate dataset configurations:")
print("-" * 70)

for dsid in multivariate:
    try:
        X_train, y_train, X_test, y_test = tsai_rs.get_UCR_data(dsid, return_split=True)
        n_vars = X_train.shape[1]
        seq_len = X_train.shape[2]
        n_classes = len(np.unique(y_train))
        
        config = tsai_rs.MiniRocketConfig(
            n_vars=n_vars,
            seq_len=seq_len,
            n_classes=n_classes
        )
        
        print(f"{dsid:20} | vars: {n_vars:2d} | len: {seq_len:4d} | classes: {n_classes} | config: {config}")
    except Exception as e:
        print(f"{dsid:20} | Error: {e}")

## Data Augmentation for ROCKET

You can apply augmentation before feature extraction.
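
The two augmentations used in this section can be approximated in NumPy as follows. The helpers `gaussian_noise` and `magnitude_scale` are hypothetical stand-ins written for illustration; they are not the tsai-rs implementations of `add_gaussian_noise` and `mag_scale`, whose exact sampling scheme may differ. The sketch also shows how augmented copies and their labels could be concatenated into a larger training set.

```python
import numpy as np

def gaussian_noise(X, std, rng):
    """Add zero-mean Gaussian noise to every time step."""
    return X + rng.normal(0.0, std, size=X.shape).astype(X.dtype)

def magnitude_scale(X, scale_range, rng):
    """Multiply each sample by one random factor drawn from scale_range."""
    low, high = scale_range
    factors = rng.uniform(low, high, size=(X.shape[0], 1, 1)).astype(X.dtype)
    return X * factors

rng = np.random.default_rng(42)
X_demo = rng.normal(size=(10, 1, 96)).astype(np.float32)  # synthetic series
y_demo = np.array([0, 1] * 5)

# Stack the original data with one noisy and one rescaled copy.
X_aug = np.concatenate([
    X_demo,
    gaussian_noise(X_demo, std=0.05, rng=rng),
    magnitude_scale(X_demo, scale_range=(0.9, 1.1), rng=rng),
])
y_aug = np.tile(y_demo, 3)  # labels repeat once per copy, in the same order

print(X_aug.shape, y_aug.shape)  # (30, 1, 96) (30,)
```

Only the training split should be augmented; the test split stays untouched so that the evaluation reflects real data.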

In [None]:
# Load a dataset
dsid = 'ECG200'
X_train, y_train, X_test, y_test = tsai_rs.get_UCR_data(dsid, return_split=True)

# Standardize
X_train_std = tsai_rs.ts_standardize(X_train.astype(np.float32), by_sample=True)
X_test_std = tsai_rs.ts_standardize(X_test.astype(np.float32), by_sample=True)

# Apply augmentation (can help with small datasets)
X_train_aug1 = tsai_rs.add_gaussian_noise(X_train_std, std=0.05, seed=42)
X_train_aug2 = tsai_rs.mag_scale(X_train_std, scale_range=(0.9, 1.1), seed=42)

print(f"Original train shape: {X_train_std.shape}")
print(f"Original plus two augmented copies: {X_train_std.shape[0] * 3} samples if concatenated")

## Analysis with Confusion Matrix

In [None]:
# Simulate ROCKET predictions (in practice, use predictions from a trained model)
np.random.seed(42)
# Map labels to 0..n_classes-1; ECG200 labels are -1 and 1, which cannot index a matrix directly
classes, y_test_int = np.unique(y_test, return_inverse=True)
y_test_int = y_test_int.astype(np.int64)
n_classes = len(classes)

# Simulate a 5% error rate (illustrative only; real accuracy varies by dataset)
y_pred = y_test_int.copy()
n_wrong = max(1, int(len(y_test) * 0.05))
wrong_idx = np.random.choice(len(y_test), n_wrong, replace=False)
y_pred[wrong_idx] = (y_pred[wrong_idx] + 1) % n_classes

# Compute confusion matrix
cm = tsai_rs.confusion_matrix(y_pred, y_test_int, n_classes=n_classes)
print(f"Confusion Matrix: {cm}")
print(f"Accuracy: {cm.accuracy():.4f}")
print(f"Macro F1: {cm.macro_f1():.4f}")

In [None]:
# Get matrix as numpy array
cm_matrix = cm.matrix()
print(f"Confusion matrix:\n{cm_matrix}")
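
The accuracy and macro F1 values reported above can also be reproduced by hand. The following NumPy sketch assumes that rows index the true class and columns the predicted class; the helper names are hypothetical, and the orientation used by `tsai_rs.confusion_matrix` is not confirmed by this notebook.

```python
import numpy as np

def confusion_counts(y_true, y_pred, n_classes):
    """Count (true, predicted) pairs; rows are true classes, columns predictions."""
    cm = np.zeros((n_classes, n_classes), dtype=np.int64)
    np.add.at(cm, (y_true, y_pred), 1)
    return cm

def accuracy(cm):
    """Fraction of samples on the diagonal."""
    return np.trace(cm) / cm.sum()

def macro_f1(cm):
    """Unweighted mean of the per-class F1 scores."""
    tp = np.diag(cm).astype(np.float64)
    fp = cm.sum(axis=0) - tp
    fn = cm.sum(axis=1) - tp
    denom = 2 * tp + fp + fn
    # A class with no true or predicted samples contributes an F1 of 0.
    f1 = np.divide(2 * tp, denom, out=np.zeros_like(tp), where=denom > 0)
    return f1.mean()

y_true = np.array([0, 0, 0, 1, 1, 1, 2, 2])
y_pred_demo = np.array([0, 0, 1, 1, 1, 1, 2, 0])
cm_demo = confusion_counts(y_true, y_pred_demo, n_classes=3)

print(cm_demo)
print(f"accuracy={accuracy(cm_demo):.4f}, macro_f1={macro_f1(cm_demo):.4f}")
```

Macro F1 weights every class equally, so it is more sensitive than accuracy to errors on rare classes.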

## Summary

This notebook demonstrated:

1. **MiniRocket Configuration**: `MiniRocketConfig` for both univariate and multivariate data
2. **Data Loading**: UCR datasets with `get_UCR_data`
3. **Preprocessing**: `ts_standardize` with `by_sample=True` (ROCKET recommendation)
4. **Augmentation**: `add_gaussian_noise`, `mag_scale` for data augmentation
5. **Analysis**: `confusion_matrix` for evaluation

### ROCKET vs MiniRocket

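| Feature | ROCKET | MiniRocket |
|---------|--------|------------|
| Kernels | 10,000 random | 84 fixed, reused across dilations and biases |
| Kernel length | Random from {7, 9, 11} | 9 (fixed) |
| Features per convolution output | 2 (PPV and max) | 1 (PPV) |
| Total features | 20,000 | 9,996 by default (about 10,000) |
| Speed | Fast | Faster |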

### Usage Example

To run full training from the tsai-rs CLI:

```bash
tsai train --model MiniRocket --dataset ECG200 --n-kernels 10000
```

In [None]:
# Complete workflow example
dsid = 'NATOPS'
X_train, y_train, X_test, y_test = tsai_rs.get_UCR_data(dsid, return_split=True)

# Preprocess
X_train_std = tsai_rs.ts_standardize(X_train.astype(np.float32), by_sample=True)
X_test_std = tsai_rs.ts_standardize(X_test.astype(np.float32), by_sample=True)

# Configure
n_vars, seq_len = X_train.shape[1], X_train.shape[2]
n_classes = len(np.unique(y_train))

config = tsai_rs.MiniRocketConfig(
    n_vars=n_vars,
    seq_len=seq_len,
    n_classes=n_classes,
    n_features=10000
)

print(f"Ready for MiniRocket training on {dsid}!")
print(f"Config: {config}")
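
The workflow above stops at configuration. The final stage of a ROCKET-style pipeline is a linear classifier trained on the extracted features; the ROCKET paper uses a ridge classifier, which is what the `RidgeClassifierCV` import at the top of this notebook provides. The sketch below illustrates that stage with a closed-form ridge solution in plain NumPy and synthetic stand-in features, since the tsai-rs feature extraction call is not shown in this notebook. The function names and the fixed `alpha` are assumptions made for illustration.

```python
import numpy as np

def fit_ridge_classifier(F, y, alpha=1.0):
    """Fit one-vs-rest ridge regression on +1/-1 targets."""
    classes = np.unique(y)
    Y = np.where(y[:, None] == classes[None, :], 1.0, -1.0)
    mu, sigma = F.mean(axis=0), F.std(axis=0) + 1e-8
    Z = (F - mu) / sigma                      # standardized features
    y_mean = Y.mean(axis=0)                   # acts as the intercept
    # Closed-form ridge solution: (Z^T Z + alpha I)^-1 Z^T (Y - mean(Y)).
    A = Z.T @ Z + alpha * np.eye(Z.shape[1])
    W = np.linalg.solve(A, Z.T @ (Y - y_mean))
    return classes, W, y_mean, mu, sigma

def predict(model, F):
    """Pick the class with the highest regression score."""
    classes, W, intercept, mu, sigma = model
    scores = ((F - mu) / sigma) @ W + intercept
    return classes[scores.argmax(axis=1)]

# Synthetic stand-in for extracted features: two classes with shifted means.
rng = np.random.default_rng(0)
F_train = np.vstack([rng.normal(0.0, 1.0, (40, 20)), rng.normal(1.0, 1.0, (40, 20))])
y_train_demo = np.repeat([0, 1], 40)
F_test = np.vstack([rng.normal(0.0, 1.0, (20, 20)), rng.normal(1.0, 1.0, (20, 20))])
y_test_demo = np.repeat([0, 1], 20)

model = fit_ridge_classifier(F_train, y_train_demo, alpha=1.0)
acc = (predict(model, F_test) == y_test_demo).mean()
print(f"Test accuracy on synthetic features: {acc:.2f}")
```

In practice the regularization strength would be chosen by cross-validation rather than fixed, which is what `RidgeClassifierCV` does over a grid of `alphas`.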