<a href="https://github.com/timeseriesAI/tsai-rs" target="_parent"><img src="https://img.shields.io/badge/tsai--rs-Time%20Series%20AI%20in%20Rust-blue" alt="tsai-rs"/></a>

# How to Work with NumPy Arrays in tsai-rs

This notebook shows how to load NumPy arrays, standardize and augment them, wrap them in datasets, and configure a model with **tsai-rs** for time series classification.

## Requirements

tsai-rs supports:
- Univariate and multivariate time series
- Labelled (X, y) and unlabelled (X) datasets
- Pre-split train/valid data
- In-memory and on-disk arrays (`np.memmap`)
- Efficient batch processing
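
As an illustration of the on-disk case, the sketch below writes a small synthetic array to a temporary file and reopens it with `np.memmap`. The file name and shapes are made up for the example, and whether tsai-rs consumes the memmap object unchanged is an assumption; only the NumPy calls themselves are standard.

```python
import os
import tempfile

import numpy as np

# Synthetic multivariate data: (samples, variables, sequence_length)
shape = (8, 3, 50)
rng = np.random.default_rng(0)
X_mem = rng.standard_normal(shape).astype(np.float32)

# Write the array to disk through a writable memmap
path = os.path.join(tempfile.mkdtemp(), "X_demo.dat")  # hypothetical file name
writer = np.memmap(path, dtype=np.float32, mode="w+", shape=shape)
writer[:] = X_mem
writer.flush()
del writer

# Reopen read-only; values are read from disk only when sliced
X_disk = np.memmap(path, dtype=np.float32, mode="r", shape=shape)
batch = np.array(X_disk[:4])  # copy one batch into regular memory

print(X_disk.shape, batch.shape)
```

Slicing before copying keeps memory use proportional to the batch rather than to the whole file.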

## Install tsai-rs

```bash
cd crates/tsai_python
maturin develop --release
```

## Import Libraries

In [None]:
import tsai_rs
import numpy as np
import matplotlib.pyplot as plt

print(f"tsai-rs version: {tsai_rs.version()}")
tsai_rs.my_setup()

## Load Data

In [None]:
# Load UCR dataset with train/test split
dsid = 'NATOPS'
X_train, y_train, X_test, y_test = tsai_rs.get_UCR_data(dsid, return_split=True)

print(f"X_train: {X_train.shape}, y_train: {y_train.shape}")
print(f"X_test: {X_test.shape}, y_test: {y_test.shape}")

In [None]:
# Data shape: (samples, variables, sequence_length)
n_samples = X_train.shape[0]
n_vars = X_train.shape[1]
seq_len = X_train.shape[2]
n_classes = len(np.unique(y_train))

print(f"Samples: {n_samples}")
print(f"Variables: {n_vars}")
print(f"Sequence length: {seq_len}")
print(f"Classes: {n_classes}")
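
The shape convention above implies that a univariate dataset stored as a 2D array needs an explicit variable axis. The following is a minimal NumPy sketch with synthetic data. The array names and labels are illustrative, and whether tsai-rs also accepts 2D input or string labels directly is not confirmed here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Univariate data is often stored as (samples, sequence_length)
X_2d = rng.standard_normal((10, 24)).astype(np.float32)

# Insert a variable axis to get (samples, 1, sequence_length)
X_3d = X_2d[:, np.newaxis, :]

# String labels can be mapped to integer class indices with np.unique
y_raw = np.array(["walk", "run", "walk", "sit", "run",
                  "sit", "walk", "run", "sit", "walk"])
classes, y_idx = np.unique(y_raw, return_inverse=True)

n_samples, n_vars, seq_len = X_3d.shape
print(n_samples, n_vars, seq_len, len(classes))
```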

## Data Preprocessing

In [None]:
# Standardize data (per sample)
X_train_std = tsai_rs.ts_standardize(X_train.astype(np.float32), by_sample=True)
X_test_std = tsai_rs.ts_standardize(X_test.astype(np.float32), by_sample=True)

print("Before standardization:")
print(f"  X_train mean: {X_train.mean():.4f}, std: {X_train.std():.4f}")

print("\nAfter standardization:")
print(f"  X_train_std mean: {X_train_std.mean():.6f}, std: {X_train_std.std():.4f}")
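
To make the per-sample behaviour concrete, here is a plain NumPy sketch of what standardizing by sample could look like. It assumes that `by_sample=True` means each sample is scaled with its own mean and standard deviation taken over all variables and time steps. The actual `ts_standardize` implementation may differ, for example by computing statistics per variable, and `standardize_per_sample` is a hypothetical helper, not part of tsai-rs.

```python
import numpy as np

def standardize_per_sample(X, eps=1e-8):
    """Scale each sample to zero mean and unit variance (illustrative helper)."""
    # One mean/std pair per sample, computed over the variable and time axes
    mean = X.mean(axis=(1, 2), keepdims=True)
    std = X.std(axis=(1, 2), keepdims=True)
    return (X - mean) / (std + eps)

rng = np.random.default_rng(0)
X_demo = (5.0 + 3.0 * rng.standard_normal((4, 2, 100))).astype(np.float32)
X_demo_std = standardize_per_sample(X_demo)

print(X_demo_std.mean(axis=(1, 2)))  # close to 0 for every sample
print(X_demo_std.std(axis=(1, 2)))   # close to 1 for every sample
```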

## Creating Datasets

In [None]:
# Create TSDataset objects
train_ds = tsai_rs.TSDataset(X_train_std, y_train)
test_ds = tsai_rs.TSDataset(X_test_std, y_test)

print(f"Train dataset: {train_ds}")
print(f"Test dataset: {test_ds}")
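
Before wrapping arrays in a dataset, it can help to validate them in NumPy so that shape or dtype problems surface early. The helper below is a hypothetical sketch (`check_arrays` is not a tsai-rs function), and whether `TSDataset` performs similar checks itself is not documented here.

```python
import numpy as np

def check_arrays(X, y=None):
    """Raise ValueError if X (and optionally y) look malformed."""
    if X.ndim != 3:
        raise ValueError(f"X must be 3D (samples, variables, seq_len), got {X.ndim}D")
    if not np.issubdtype(X.dtype, np.floating):
        raise ValueError(f"X must be floating point, got {X.dtype}")
    if not np.isfinite(X).all():
        raise ValueError("X contains NaN or infinite values")
    if y is not None and len(y) != X.shape[0]:
        raise ValueError(f"X has {X.shape[0]} samples but y has {len(y)} labels")

X_ok = np.zeros((6, 2, 30), dtype=np.float32)
y_ok = np.array([0, 1, 0, 1, 0, 1])
check_arrays(X_ok, y_ok)  # labelled dataset passes
check_arrays(X_ok)        # unlabelled dataset passes too

# A label array that is one element short is rejected
try:
    check_arrays(X_ok, y_ok[:-1])
except ValueError as err:
    message = str(err)
    print(message)
```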

## Model Configuration

In [None]:
# Configure model
config = tsai_rs.InceptionTimePlusConfig(
    n_vars=n_vars,
    seq_len=seq_len,
    n_classes=n_classes
)

print(f"Model config: {config}")

In [None]:
# Configure training
learner_config = tsai_rs.LearnerConfig(
    lr=1e-3,
    weight_decay=0.01,
    grad_clip=1.0
)

print(f"Learner config: {learner_config}")
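
Of the three settings, `grad_clip` is the least self-explanatory. One common interpretation, assumed here and not confirmed against the tsai-rs source, is clipping by global L2 norm: if the joint norm of all gradients exceeds the threshold, every gradient is rescaled by the same factor. A NumPy sketch with a hypothetical helper:

```python
import numpy as np

def clip_by_global_norm(grads, max_norm):
    """Rescale gradient arrays so their joint L2 norm is at most max_norm."""
    total_norm = np.sqrt(sum(float(np.sum(g ** 2)) for g in grads))
    # Scale is 1.0 when the norm is already within the limit
    scale = min(1.0, max_norm / (total_norm + 1e-12))
    return [g * scale for g in grads], total_norm

grads = [np.array([3.0, 0.0]), np.array([0.0, 4.0])]  # joint norm is 5.0
clipped, norm_before = clip_by_global_norm(grads, max_norm=1.0)
norm_after = np.sqrt(sum(float(np.sum(g ** 2)) for g in clipped))

print(norm_before, norm_after)
```

Because all gradients share one scale factor, the update direction is preserved and only its length shrinks.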

## Data Augmentation

In [None]:
# Apply augmentation
X_aug = tsai_rs.add_gaussian_noise(X_train_std, std=0.05, seed=42)
X_aug = tsai_rs.mag_scale(X_aug, scale_range=(0.9, 1.1), seed=42)

print(f"Original shape: {X_train_std.shape}")
print(f"Augmented shape: {X_aug.shape}")
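
For intuition, the two augmentations can be approximated in plain NumPy. This is a sketch under stated assumptions: noise is taken to be i.i.d. zero-mean Gaussian, and magnitude scaling is assumed to draw one factor per sample uniformly from `scale_range`. The helper names are hypothetical, and the tsai-rs functions may sample differently, so identical seeds will not reproduce their output.

```python
import numpy as np

def gaussian_noise(X, std, seed=None):
    """Add i.i.d. zero-mean Gaussian noise to every value."""
    rng = np.random.default_rng(seed)
    return X + rng.normal(0.0, std, size=X.shape).astype(X.dtype)

def magnitude_scale(X, scale_range, seed=None):
    """Multiply each sample by one factor drawn uniformly from scale_range."""
    rng = np.random.default_rng(seed)
    low, high = scale_range
    # Shape (samples, 1, 1) broadcasts one factor across variables and time
    factors = rng.uniform(low, high, size=(X.shape[0], 1, 1)).astype(X.dtype)
    return X * factors

X_demo = np.ones((5, 2, 40), dtype=np.float32)
X_noisy = gaussian_noise(X_demo, std=0.05, seed=42)
X_scaled = magnitude_scale(X_demo, scale_range=(0.9, 1.1), seed=42)

print(X_noisy.shape, X_scaled.shape)
```

Both operations preserve the array shape, which is why the original and augmented shapes printed above match.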

In [None]:
# Visualize augmentation
fig, axes = plt.subplots(2, 1, figsize=(12, 6))

sample_idx = 0
var_idx = 0

axes[0].plot(X_train_std[sample_idx, var_idx, :], label='Original')
axes[0].set_title('Original Time Series')
axes[0].legend()  # show the label passed to plot()

axes[1].plot(X_aug[sample_idx, var_idx, :], label='Augmented', color='orange')
axes[1].set_title('Augmented Time Series')
axes[1].legend()

plt.tight_layout()
plt.show()

## Working with Different Datasets

In [None]:
# Try different UCR datasets
datasets = ['ECG200', 'GunPoint', 'FordA', 'NATOPS']

print(f"{'Dataset':<15} {'Shape':<25} {'Classes':<10}")
print("-" * 50)

for name in datasets:
    try:
        X, y, _, _ = tsai_rs.get_UCR_data(name, return_split=True)
        # Use loop-local names so the notebook-level dsid and n_classes are not overwritten
        n_cls = len(np.unique(y))
        print(f"{name:<15} {str(X.shape):<25} {n_cls:<10}")
    except Exception as e:
        print(f"{name:<15} Error: {e}")
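
Beyond the number of classes, the class balance is often worth checking when comparing datasets. A short NumPy sketch on a made-up label array (the labels are synthetic, not taken from any UCR dataset):

```python
import numpy as np

y_demo = np.array([0, 0, 1, 2, 2, 2, 1, 0, 2, 2])

# return_counts gives the number of samples per distinct label
labels, counts = np.unique(y_demo, return_counts=True)
for label, count in zip(labels, counts):
    print(f"class {label}: {count} samples ({count / len(y_demo):.0%})")
```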

## Complete Pipeline

In [None]:
def classification_pipeline(dsid, model_type='inception'):
    """Load a UCR dataset, standardize it, and build datasets and configs (no training is run)."""
    
    # 1. Load data
    print(f"Loading {dsid}...")
    X_train, y_train, X_test, y_test = tsai_rs.get_UCR_data(dsid, return_split=True)
    
    n_vars = X_train.shape[1]
    seq_len = X_train.shape[2]
    n_classes = len(np.unique(y_train))
    
    print(f"  Shape: {X_train.shape}")
    print(f"  Variables: {n_vars}, Length: {seq_len}, Classes: {n_classes}")
    
    # 2. Preprocess
    X_train_std = tsai_rs.ts_standardize(X_train.astype(np.float32), by_sample=True)
    X_test_std = tsai_rs.ts_standardize(X_test.astype(np.float32), by_sample=True)
    
    # 3. Create datasets
    train_ds = tsai_rs.TSDataset(X_train_std, y_train)
    test_ds = tsai_rs.TSDataset(X_test_std, y_test)
    
    # 4. Configure model
    if model_type == 'inception':
        config = tsai_rs.InceptionTimePlusConfig(
            n_vars=n_vars, seq_len=seq_len, n_classes=n_classes
        )
    elif model_type == 'resnet':
        config = tsai_rs.ResNetPlusConfig(
            n_vars=n_vars, seq_len=seq_len, n_classes=n_classes
        )
    elif model_type == 'tst':
        config = tsai_rs.TSTConfig(
            n_vars=n_vars, seq_len=seq_len, n_classes=n_classes
        )
    else:
        raise ValueError(f"Unknown model type: {model_type}")
    
    print(f"  Model: {config}")
    
    # 5. Configure training
    learner_config = tsai_rs.LearnerConfig(
        lr=1e-3,
        weight_decay=0.01,
        grad_clip=1.0
    )
    
    print("  Ready for training!")
    
    return config, train_ds, test_ds, learner_config

# Run pipeline
config, train_ds, test_ds, learner_config = classification_pipeline('NATOPS', 'inception')
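
The pipeline stops before training, so batching is not shown. As a rough picture of what batch processing over NumPy arrays involves, here is a generic mini-batch iterator. It is a hypothetical helper written for this notebook, not the tsai-rs data loader, whose interface is not covered here.

```python
import numpy as np

def iter_batches(X, y=None, batch_size=32, shuffle=True, seed=None):
    """Yield (X_batch, y_batch) pairs; y_batch is None for unlabelled data."""
    idx = np.arange(X.shape[0])
    if shuffle:
        np.random.default_rng(seed).shuffle(idx)
    for start in range(0, len(idx), batch_size):
        sel = idx[start:start + batch_size]
        yield X[sel], (None if y is None else y[sel])

X_demo = np.zeros((70, 3, 20), dtype=np.float32)
y_demo = np.arange(70) % 2

batch_sizes = [xb.shape[0] for xb, _ in iter_batches(X_demo, y_demo, batch_size=32, seed=0)]
print(batch_sizes)  # [32, 32, 6]
```

The last batch is smaller because 70 is not a multiple of 32; a training loop can either keep it or drop it.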

## Summary

### Key Benefits
- Direct NumPy array support
- Efficient batch processing
- Support for univariate and multivariate data
- Built-in standardization and augmentation

### Pattern
```python
# Load data
X_train, y_train, X_test, y_test = tsai_rs.get_UCR_data(dsid, return_split=True)

# Standardize
X_train_std = tsai_rs.ts_standardize(X_train.astype(np.float32), by_sample=True)

# Create dataset
train_ds = tsai_rs.TSDataset(X_train_std, y_train)

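# Configure model (dimensions come from the array shape)
n_vars, seq_len = X_train.shape[1], X_train.shape[2]
n_classes = len(np.unique(y_train))
config = tsai_rs.InceptionTimePlusConfig(n_vars=n_vars, seq_len=seq_len, n_classes=n_classes)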
```

In [None]:
# Quick reference
print("NumPy Arrays Quick Reference")
print("=" * 50)
print("\n# Load data")
print("X_train, y_train, X_test, y_test = tsai_rs.get_UCR_data(dsid, return_split=True)")
print("\n# Standardize")
print("X_std = tsai_rs.ts_standardize(X.astype(np.float32), by_sample=True)")
print("\n# Create dataset")
print("ds = tsai_rs.TSDataset(X_std, y)")
print("\n# Augment")
print("X_aug = tsai_rs.add_gaussian_noise(X, std=0.05, seed=42)")