<a href="https://github.com/timeseriesAI/tsai-rs" target="_parent"><img src="https://img.shields.io/badge/tsai--rs-Time%20Series%20AI%20in%20Rust-blue" alt="tsai-rs"/></a>

# How to Train with Big Arrays Faster using tsai-rs

This notebook demonstrates efficient training strategies for larger-than-memory datasets using **tsai-rs**.

## Purpose

When training models on large datasets, the bottleneck is often data loading rather than GPU computation.

Key concepts covered:
1. **Memory-mapped arrays**: Keep data on disk, load only what's needed
2. **Efficient batching**: Create batches without loading full dataset
3. **Parallel data loading**: Use multiple workers for faster loading
4. **Data chunking**: Optimize chunk sizes for your workload

## Install tsai-rs

```bash
cd crates/tsai_python
maturin develop --release
```

## Import Libraries

In [None]:
import tsai_rs
import numpy as np
import os
import time
from pathlib import Path

print(f"tsai-rs version: {tsai_rs.version()}")
tsai_rs.my_setup()

## Memory-Mapped Arrays

Memory-mapped arrays (`np.memmap`) allow you to work with large datasets that don't fit in memory.

Key benefits:
- Data stays on disk until accessed
- Only the accessed portion is loaded into memory
- Efficient for creating batches from large datasets
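
The benefits listed above can be seen on a toy file. This is a minimal sketch using only NumPy and the standard library; the temporary file and the names `demo_path`, `lazy` and `first_two` are illustrative and not part of the notebook's data.

```python
import os
import tempfile
import numpy as np

# Write a small array to a temporary .npy file (stand-in for a large dataset).
demo_path = os.path.join(tempfile.mkdtemp(), "demo.npy")
np.save(demo_path, np.arange(24, dtype=np.float32).reshape(4, 2, 3))

# Opening with mmap_mode maps the file instead of reading it all into RAM.
lazy = np.load(demo_path, mmap_mode="r")
print(type(lazy).__name__)   # memmap

# Slicing touches only the requested rows; np.array copies them
# into a regular in-memory ndarray.
first_two = np.array(lazy[:2])
print(first_two.shape)       # (2, 2, 3)
```

Only the two requested samples are copied into memory; the rest of the file stays on disk.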

### Create Sample Data

In [None]:
# Create a data directory (no error if it already exists)
path = Path('data')
path.mkdir(parents=True, exist_ok=True)

# Create a moderate-sized array for demonstration
n_samples = 10000
n_vars = 10
seq_len = 100

arr = np.random.rand(n_samples, n_vars, seq_len).astype(np.float32)
np.save(path/'arr.npy', arr)

print(f"Array shape: {arr.shape}")
print(f"Array size: {arr.nbytes / 1e6:.2f} MB")

### Load as Memory-Mapped Array

In [None]:
# Load as memory-mapped array
# mmap_mode options:
# 'r'  - read only
# 'r+' - read and write
# 'c'  - copy-on-write (changes in memory only)

memmap_arr = np.load(path/'arr.npy', mmap_mode='c')
print(f"Type: {type(memmap_arr)}")
print(f"Shape: {memmap_arr.shape}")
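
The three `mmap_mode` values in the comments differ mainly in what happens on write. The sketch below illustrates this on a throwaway file; `mode_path` and the variable names are hypothetical and unrelated to `arr.npy`.

```python
import os
import tempfile
import numpy as np

mode_path = os.path.join(tempfile.mkdtemp(), "modes.npy")
np.save(mode_path, np.zeros(5, dtype=np.float32))

# 'r': read-only, any write raises an error.
ro = np.load(mode_path, mmap_mode="r")
try:
    ro[0] = 1.0
    read_only_blocked = False
except ValueError:
    read_only_blocked = True

# 'c': copy-on-write, the change lives in this array only, not in the file.
cow = np.load(mode_path, mmap_mode="c")
cow[0] = 2.0
file_after_cow = np.load(mode_path)[0]

# 'r+': read-write, the change is written back to the file.
rw = np.load(mode_path, mmap_mode="r+")
rw[0] = 3.0
rw.flush()
file_after_rw = np.load(mode_path)[0]

print(read_only_blocked, file_after_cow, file_after_rw)
```

For training data that should never change on disk, `'r'` and `'c'` both protect the file; `'c'` additionally allows in-place modification of the loaded batch.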

### Compare Performance: In-Memory vs Memory-Mapped

In [None]:
# Simulate batch creation
batch_size = 64

def create_batch_indices(n_samples, batch_size):
    return np.random.choice(n_samples, batch_size, replace=False)

# In-memory array indexing
n_iterations = 100
start = time.time()
for _ in range(n_iterations):
    idx = create_batch_indices(n_samples, batch_size)
    batch = arr[idx]
elapsed_memory = time.time() - start

# Memory-mapped array indexing
start = time.time()
for _ in range(n_iterations):
    idx = create_batch_indices(n_samples, batch_size)
    batch = memmap_arr[idx]
elapsed_memmap = time.time() - start

print(f"In-memory: {elapsed_memory:.4f}s for {n_iterations} batches")
print(f"Memory-mapped: {elapsed_memmap:.4f}s for {n_iterations} batches")
print(f"Ratio: {elapsed_memmap/elapsed_memory:.2f}x")
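
When reading these numbers, keep in mind that `arr.npy` was written moments ago and is small, so the operating system has probably cached it; the memory-mapped timing may then look close to the in-memory one. A slightly stricter timing helper might look like the sketch below. `time_batches` is a hypothetical name; it draws the indices up front so that only the reads are timed, and uses `time.perf_counter`, which is intended for measuring short intervals.

```python
import time
import numpy as np

def time_batches(X, batch_size, n_iterations, rng):
    """Return mean seconds per batch for random fancy-indexed reads from X."""
    # Draw all indices first so index generation is not part of the timing.
    all_idx = [rng.choice(len(X), batch_size, replace=False)
               for _ in range(n_iterations)]
    start = time.perf_counter()
    for idx in all_idx:
        _ = X[idx]
    return (time.perf_counter() - start) / n_iterations

rng = np.random.default_rng(0)
demo = rng.random((1000, 10, 100), dtype=np.float32)
per_batch = time_batches(demo, batch_size=64, n_iterations=20, rng=rng)
print(f"{per_batch * 1e3:.3f} ms per batch")
```

The same function accepts either an in-memory array or a memory-mapped one, so both cases are measured identically.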

## Efficient Data Loading with tsai-rs

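tsai-rs provides `TSDataset`, which pairs an input array with its labels, along with helpers such as `get_UCR_data` for loading datasets and `ts_standardize` for normalization.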

### TSDataset for Efficient Batching

In [None]:
# Load real dataset
dsid = 'NATOPS'
X_train, y_train, X_test, y_test = tsai_rs.get_UCR_data(dsid, return_split=True)

n_vars = X_train.shape[1]
seq_len = X_train.shape[2]
n_classes = len(np.unique(y_train))

print(f"Dataset: {dsid}")
print(f"Shape: {X_train.shape}")
print(f"Variables: {n_vars}, Sequence length: {seq_len}, Classes: {n_classes}")

In [None]:
# Standardize
X_train_std = tsai_rs.ts_standardize(X_train.astype(np.float32), by_sample=True)
X_test_std = tsai_rs.ts_standardize(X_test.astype(np.float32), by_sample=True)

# Create TSDataset
train_ds = tsai_rs.TSDataset(X_train_std, y_train)
test_ds = tsai_rs.TSDataset(X_test_std, y_test)

print(f"Train dataset: {train_ds}")
print(f"Test dataset: {test_ds}")
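
For intuition, per-sample standardization can be sketched in plain NumPy. This is an assumption about what `by_sample=True` does (one mean and one standard deviation per sample, taken over all variables and time steps); the actual tsai-rs implementation may differ, for example in its epsilon or in per-variable handling. `standardize_by_sample` is a hypothetical helper, not a tsai-rs function.

```python
import numpy as np

def standardize_by_sample(X, eps=1e-8):
    """Scale each sample of a (samples, vars, steps) array to zero mean, unit std."""
    # keepdims=True keeps the statistics broadcastable against X.
    mean = X.mean(axis=(1, 2), keepdims=True)
    std = X.std(axis=(1, 2), keepdims=True)
    return (X - mean) / (std + eps)   # eps guards against constant samples

rng = np.random.default_rng(0)
demo = rng.random((8, 3, 50)).astype(np.float32)
Z = standardize_by_sample(demo)
print(Z.shape, Z.dtype)
```

Because each sample uses only its own statistics, this kind of transform can be applied independently to any subset of the data, which is what makes chunked processing possible.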

### Benchmark Batch Creation

In [None]:
# Benchmark batch creation with numpy
batch_size = 64
n_iterations = 1000

start = time.time()
for _ in range(n_iterations):
    idx = np.random.choice(len(X_train_std), batch_size, replace=False)
    X_batch = X_train_std[idx]
    y_batch = y_train[idx]
elapsed = time.time() - start

print(f"Numpy batch creation: {elapsed*1000/n_iterations:.3f}ms per batch")
print(f"Batches per second: {n_iterations/elapsed:.0f}")
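
The benchmark draws an independent random batch each time, so within an "epoch" some samples may be seen several times and others not at all. A common alternative is to shuffle once and slice, as in this sketch; `iter_epoch_batches` is a hypothetical helper and only produces indices, so it works equally well with in-memory and memory-mapped arrays.

```python
import numpy as np

def iter_epoch_batches(n_samples, batch_size, rng):
    """Yield index arrays that together cover every sample exactly once."""
    order = rng.permutation(n_samples)
    for start in range(0, n_samples, batch_size):
        yield order[start:start + batch_size]   # last batch may be smaller

rng = np.random.default_rng(0)
batches = list(iter_epoch_batches(n_samples=180, batch_size=64, rng=rng))
print([len(b) for b in batches])   # [64, 64, 52]
```

Each yielded array can be used directly as `X[idx]` and `y[idx]`.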

## Strategies for Large Datasets

When working with larger-than-memory datasets:

### 1. Use Memory-Mapped Arrays

In [None]:
# Save processed data to disk
np.save(path/'X_processed.npy', X_train_std)
np.save(path/'y_processed.npy', y_train)

# Load as memory-mapped
X_mmap = np.load(path/'X_processed.npy', mmap_mode='c')
y_mmap = np.load(path/'y_processed.npy', mmap_mode='c')

print(f"X_mmap type: {type(X_mmap)}")
print(f"y_mmap type: {type(y_mmap)}")

### 2. Process Data in Chunks

In [None]:
def process_in_chunks(X, chunksize=1000):
    """Process a large array chunk by chunk to limit the memory used per step.

    Note: the concatenated result is still assembled in memory.
    """
    n_samples = len(X)
    n_chunks = (n_samples + chunksize - 1) // chunksize
    
    results = []
    for i in range(n_chunks):
        start = i * chunksize
        end = min((i + 1) * chunksize, n_samples)
        
        # Process chunk
        chunk = X[start:end]
        processed = tsai_rs.ts_standardize(chunk.astype(np.float32), by_sample=True)
        results.append(processed)
    
    return np.concatenate(results, axis=0)

# Example usage (with small data for demonstration)
X_chunked = process_in_chunks(X_train, chunksize=50)
print(f"Processed shape: {X_chunked.shape}")
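
When the processed result is itself too large for memory, the chunks can be written straight to a `.npy` file instead of being concatenated. The sketch below uses `np.lib.format.open_memmap` for that; `process_to_disk` and the toy `double` transform are hypothetical stand-ins (in this notebook the transform would be the standardization step), and the sketch assumes the transform keeps each chunk's shape.

```python
import os
import tempfile
import numpy as np

def process_to_disk(X, out_path, transform, chunksize=1000):
    """Apply `transform` chunk by chunk and write the results to a .npy file."""
    # open_memmap creates a .npy file of the final shape without holding it in RAM.
    out = np.lib.format.open_memmap(
        out_path, mode="w+", dtype=np.float32, shape=X.shape
    )
    for start in range(0, len(X), chunksize):
        end = min(start + chunksize, len(X))
        chunk = np.asarray(X[start:end], dtype=np.float32)
        out[start:end] = transform(chunk)   # assumes the chunk's shape is unchanged
    out.flush()
    return out

def double(chunk):
    # Toy stand-in for the real per-chunk transform.
    return chunk * 2

src = np.arange(10 * 2 * 3, dtype=np.float32).reshape(10, 2, 3)
out_path = os.path.join(tempfile.mkdtemp(), "processed.npy")
result = process_to_disk(src, out_path, double, chunksize=4)
print(result.shape)
```

The output file can later be reopened with `np.load(out_path, mmap_mode='r')` like any other `.npy` file.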

### 3. Efficient Index Sorting

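Sorting indices before accessing memory-mapped arrays can improve performance because the reads then move through the file in order instead of jumping back and forth. The effect is most visible when the file is not already held in the operating system's page cache.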

In [None]:
# Compare sorted vs unsorted index access
n_iterations = 100
batch_size = 128

# Unsorted indices
start = time.time()
for _ in range(n_iterations):
    idx = np.random.choice(len(X_mmap), batch_size, replace=False)
    batch = X_mmap[idx]
elapsed_unsorted = time.time() - start

# Sorted indices
start = time.time()
for _ in range(n_iterations):
    idx = np.sort(np.random.choice(len(X_mmap), batch_size, replace=False))
    batch = X_mmap[idx]
elapsed_sorted = time.time() - start

print(f"Unsorted indices: {elapsed_unsorted:.4f}s")
print(f"Sorted indices: {elapsed_sorted:.4f}s")
print(f"Improvement: {(elapsed_unsorted - elapsed_sorted) / elapsed_unsorted * 100:.1f}%")
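
Sorting changes the order of samples inside the batch, so the labels must be read with the same sorted indices. If the original random order matters, it can be restored afterwards. A minimal sketch on toy arrays (all names here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.arange(50 * 2 * 3, dtype=np.float32).reshape(50, 2, 3)
y = np.arange(50)

idx = rng.choice(len(X), 8, replace=False)
order = np.argsort(idx)        # positions that put idx in ascending order
sorted_idx = idx[order]

X_batch = X[sorted_idx]        # sequential-friendly read
y_batch = y[sorted_idx]        # same indices keep X and y aligned

# Optional: undo the sort to recover the originally drawn order.
inverse = np.argsort(order)
X_original_order = X_batch[inverse]
```

For shuffled training the restore step is usually unnecessary, since the order within a batch does not affect the averaged loss.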

## Training Configuration for Large Datasets

In [None]:
# Model configuration
config = tsai_rs.InceptionTimePlusConfig(
    n_vars=n_vars,
    seq_len=seq_len,
    n_classes=n_classes
)

# Training configuration optimized for large datasets
learner_config = tsai_rs.LearnerConfig(
    lr=1e-3,
    weight_decay=0.01,
    grad_clip=1.0
)

print(f"Model config: {config}")
print(f"Learner config: {learner_config}")

### Optimal Batch Sizes

Larger batch sizes can improve GPU utilization but require more memory.

In [None]:
# Memory estimation for different batch sizes
batch_sizes = [32, 64, 128, 256, 512]

print(f"{'Batch Size':<12} {'Memory (MB)':<15} {'Batches/Epoch'}")
print("-" * 45)

for bs in batch_sizes:
    # Memory for the input batch only; activations, gradients and
    # optimizer state come on top of this during training
    batch_memory = bs * n_vars * seq_len * 4 / 1e6  # float32 = 4 bytes
    n_batches = (len(X_train) + bs - 1) // bs
    print(f"{bs:<12} {batch_memory:<15.2f} {n_batches}")

## Key Learnings

### For In-Memory Datasets
- Use numpy arrays directly
- tsai-rs `TSDataset` provides efficient batching
- Standardize data once before training

### For Larger-than-Memory Datasets
1. **Use memory-mapped arrays** (`np.load(..., mmap_mode='c')`)
2. **Sort indices** before accessing disk-based arrays
3. **Process in chunks** to avoid memory issues
4. **Use appropriate batch sizes** for your GPU memory

### Performance Tips
- Profile your data loading to identify bottlenecks
- Keep frequently accessed data in memory
- Use SSD storage for memory-mapped arrays
- Consider data augmentation at batch time
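
If profiling shows that loading dominates, one option is to load upcoming batches in background threads while the current one is being used, which is the parallel data loading idea from the introduction. The sketch below uses only the standard library and NumPy; `prefetch_batches`, `n_workers` and `depth` are hypothetical names, not tsai-rs API, and whether threads actually help depends on how I/O-bound the workload is.

```python
from collections import deque
from concurrent.futures import ThreadPoolExecutor
import numpy as np

def prefetch_batches(X, index_batches, n_workers=2, depth=4):
    """Yield X[sorted idx] for each idx, loading up to `depth` batches ahead."""
    def load(idx):
        # Fancy indexing copies the selected rows, so the read
        # happens inside the worker thread.
        return X[np.sort(idx)]

    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        pending = deque()
        for idx in index_batches:
            pending.append(pool.submit(load, idx))
            if len(pending) >= depth:
                yield pending.popleft().result()   # FIFO keeps batch order
        while pending:
            yield pending.popleft().result()

X = np.arange(100 * 2 * 3, dtype=np.float32).reshape(100, 2, 3)
index_batches = [np.array([3, 1, 2]), np.array([10, 5])]
loaded = list(prefetch_batches(X, index_batches))
print([b.shape for b in loaded])
```

`X` can be a memory-mapped array; batches are returned in the order their indices were submitted.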

## Cleanup

In [None]:
# Clean up temporary files
for fname in ['arr.npy', 'X_processed.npy', 'y_processed.npy']:
    fpath = path / fname
    if fpath.exists():
        fpath.unlink()
        print(f"Removed {fpath}")

## Summary

This notebook covered strategies for efficient training with large datasets:

### Key Techniques
- Memory-mapped arrays for disk-based data
- Chunk-based processing
- Index sorting for sequential access
- Batch size optimization

### tsai-rs Tools
```python
# Data loading and standardization
X_std = tsai_rs.ts_standardize(X.astype(np.float32), by_sample=True)

# Efficient dataset
ds = tsai_rs.TSDataset(X_std, y)

# Model configuration
config = tsai_rs.InceptionTimePlusConfig(
    n_vars=n_vars,
    seq_len=seq_len,
    n_classes=n_classes
)
```

In [None]:
# Quick reference
print("Large Dataset Training Quick Reference")
print("=" * 50)
print("\n# Memory-mapped loading")
print("X_mmap = np.load('data.npy', mmap_mode='c')")
print("\n# Sorted index access (faster for disk)")
print("idx = np.sort(np.random.choice(n, batch_size, replace=False))")
print("batch = X_mmap[idx]")
print("\n# Standardization")
print("X_std = tsai_rs.ts_standardize(X.astype(np.float32), by_sample=True)")
print("\n# TSDataset")
print("ds = tsai_rs.TSDataset(X_std, y)")