# TurboLoader Quick Start Guide

Welcome to TurboLoader! This notebook will guide you through the basics of using TurboLoader for high-performance ML data loading.

## What is TurboLoader?

TurboLoader is a production-ready data loading library that is:
- **Fast**: 8,000+ images/sec with SIMD-accelerated transforms
- **Efficient**: Memory-mapped I/O and zero-copy operations
- **Easy**: Drop-in replacement for PyTorch DataLoader

Let's get started!

## Installation

If you haven't installed TurboLoader yet, run:

In [None]:
!pip install turboloader

## 1. Basic Data Loading

Let's start with the simplest example: loading images from a TAR archive.

In [None]:
import turboloader
import numpy as np

print(f"TurboLoader version: {turboloader.__version__}")

In [None]:
# Create a DataLoader
loader = turboloader.DataLoader(
    'path/to/your/dataset.tar',  # Path to TAR archive
    batch_size=32,                # Samples per batch
    num_workers=4,                # Parallel worker threads
    shuffle=False                 # Set to True to shuffle samples during training
)

# Iterate over batches
for batch in loader:
    # Each batch is a list of samples
    print(f"Batch size: {len(batch)}")
    
    # Access first sample
    sample = batch[0]
    image = sample['image']  # NumPy array (H, W, C)
    
    print(f"Image shape: {image.shape}")
    print(f"Image dtype: {image.dtype}")
    
    break  # Just load one batch for demo
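
Before pointing the loader at an archive, it can help to confirm what the archive actually contains. The sketch below uses only the standard-library `tarfile` module: it builds a tiny in-memory TAR and lists its members. The file names, the `.jpg`/`.cls` pairing, and the helper functions are hypothetical, and the payloads are placeholder bytes rather than real images; this guide does not specify the archive layout TurboLoader expects.

```python
import io
import tarfile


def build_demo_tar(samples):
    """Return the bytes of an in-memory TAR holding the given {name: payload} files."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as archive:
        for name, payload in samples.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)  # The header must carry the payload size
            archive.addfile(info, io.BytesIO(payload))
    return buffer.getvalue()


def list_tar_members(tar_bytes):
    """Return (name, size) for every regular file in the archive, in stored order."""
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r") as archive:
        return [(m.name, m.size) for m in archive.getmembers() if m.isfile()]


# Hypothetical layout: one image file and one label file per sample.
demo_samples = {
    "000000.jpg": b"placeholder image bytes",
    "000000.cls": b"3",
    "000001.jpg": b"more placeholder bytes",
    "000001.cls": b"7",
}

demo_tar = build_demo_tar(demo_samples)
members = list_tar_members(demo_tar)
for name, size in members:
    print(f"{name}: {size} bytes")
```

For an archive on disk, `tarfile.open('path/to/your/dataset.tar')` with `getmembers()` gives the same listing.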

## 2. Using Transforms

TurboLoader provides 19 SIMD-accelerated transforms for data augmentation.

In [None]:
# Create transform pipeline
transforms = turboloader.Compose([
    turboloader.Resize(256, 256),              # Resize to 256x256
    turboloader.RandomCrop(224, 224),          # Random crop to 224x224
    turboloader.RandomHorizontalFlip(0.5),     # Flip with 50% probability
    turboloader.ColorJitter(0.2, 0.2, 0.2, 0.1), # Color augmentation
    turboloader.ImageNetNormalize()            # ImageNet normalization
])

print("Transform pipeline created!")
print(f"Number of transforms: {len(transforms)}")

In [None]:
# Apply transforms manually
loader = turboloader.DataLoader(
    'path/to/your/dataset.tar',
    batch_size=16,
    num_workers=4
)

for batch in loader:
    # Apply transforms to each sample
    transformed_images = []
    for sample in batch:
        img = sample['image']
        # Apply each transform in sequence
        for transform in transforms:
            img = transform.apply(img)
        transformed_images.append(img)
    
    print(f"Processed {len(transformed_images)} images")
    print(f"Transformed shape: {transformed_images[0].shape}")
    break
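
To make the last step of the pipeline concrete, here is a minimal pure-Python sketch of ImageNet-style normalization on a single RGB pixel. The mean and standard deviation values are the commonly used ImageNet statistics; whether `turboloader.ImageNetNormalize` applies exactly these constants, or expects 8-bit input, is an assumption, and `normalize_pixel` is a hypothetical helper rather than part of the library.

```python
# Commonly used per-channel ImageNet statistics (RGB order) for inputs scaled to [0, 1].
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def normalize_pixel(rgb_uint8):
    """Scale an 8-bit RGB pixel to [0, 1], then standardize each channel."""
    return tuple(
        (value / 255.0 - mean) / std
        for value, mean, std in zip(rgb_uint8, IMAGENET_MEAN, IMAGENET_STD)
    )


black = normalize_pixel((0, 0, 0))
white = normalize_pixel((255, 255, 255))
print("black:", [round(v, 3) for v in black])
print("white:", [round(v, 3) for v in white])
```

After this step, pixel values are roughly zero-centered floats rather than integers in 0 to 255, so a normalized image should be stored as a floating-point array.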

## 3. PyTorch Integration

TurboLoader batches can be converted to PyTorch tensors and fed directly into a standard training loop.

In [None]:
import torch
import torch.nn as nn

# Create DataLoader
loader = turboloader.DataLoader(
    'path/to/your/dataset.tar',
    batch_size=64,
    num_workers=8,
    shuffle=True
)

# Convert to PyTorch tensors
to_tensor = turboloader.ToTensor(
    format=turboloader.TensorFormat.PYTORCH_CHW  # Convert to (C, H, W)
)

# Simple training loop
model = nn.Sequential(
    nn.Conv2d(3, 64, 3),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(64, 10)
)

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = model.to(device)
optimizer = torch.optim.Adam(model.parameters())
criterion = nn.CrossEntropyLoss()

# Training
for epoch in range(3):
    for batch in loader:
        # Convert to tensors
        images = []
        labels = []
        
        for sample in batch:
            img = to_tensor.apply(sample['image'])
            images.append(torch.from_numpy(img))
            labels.append(sample.get('label', 0))  # Default label
        
        images = torch.stack(images).float().to(device)
        labels = torch.tensor(labels).long().to(device)
        
        # Forward pass
        outputs = model(images)
        loss = criterion(outputs, labels)
        
        # Backward pass
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        
        print(f"Epoch {epoch+1}, Loss: {loss.item():.4f}")
        break  # Just one batch for demo
    break  # Stop after the first epoch for demo
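
The `(H, W, C)` to `(C, H, W)` conversion used in the loop above is only a reordering of the same values in memory. The sketch below shows the index arithmetic on a flat buffer in plain Python. `hwc_to_chw` is a hypothetical helper for illustration and says nothing about how `turboloader.ToTensor` is implemented.

```python
def hwc_to_chw(flat_hwc, height, width, channels):
    """Reorder a flat, row-major HWC buffer into a flat, row-major CHW buffer."""
    flat_chw = [None] * (height * width * channels)
    for h in range(height):
        for w in range(width):
            for c in range(channels):
                src = (h * width + w) * channels + c   # Channels are interleaved per pixel
                dst = (c * height + h) * width + w     # Each channel is one contiguous plane
                flat_chw[dst] = flat_hwc[src]
    return flat_chw


# A 1x2 image with 3 channels: two pixels stored as R, G, B triples.
pixels_hwc = ["R0", "G0", "B0", "R1", "G1", "B1"]
pixels_chw = hwc_to_chw(pixels_hwc, height=1, width=2, channels=3)
print(pixels_chw)
```

In CHW order, all red values come first, then all green, then all blue, which is the layout `nn.Conv2d` expects for each image in a batch.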

## 4. Performance Comparison

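Let's measure TurboLoader's throughput. For a fair comparison against PyTorch's DataLoader, time it over the same dataset with the same batch size and worker count.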

In [None]:
import time

def benchmark_dataloader(loader, num_batches=100):
    """Benchmark data loading speed."""
    start = time.perf_counter()  # Monotonic, high-resolution timer
    samples = 0
    
    for i, batch in enumerate(loader):
        if i >= num_batches:
            break
        samples += len(batch)
    
    elapsed = time.perf_counter() - start
    throughput = samples / elapsed
    
    return throughput, elapsed

# Benchmark TurboLoader
loader = turboloader.DataLoader(
    'path/to/your/dataset.tar',
    batch_size=32,
    num_workers=4
)

throughput, time_taken = benchmark_dataloader(loader)
print("\nTurboLoader Performance:")
print(f"  Throughput: {throughput:.1f} images/sec")
print(f"  Time: {time_taken:.2f}s")
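
When comparing loaders, note that `len(batch)` only counts samples if each batch is a list of samples. A loader that yields `(inputs, labels)` tuples would report 2 per batch. The sketch below is one way to handle both cases: it takes a `count_samples` callable and discards a few warm-up batches before timing. The function name, its parameters, and the two stand-in generators are hypothetical and take the place of real loaders.

```python
import itertools
import time


def measure_throughput(batches, count_samples=len, num_batches=100, warmup_batches=5):
    """Return (samples, elapsed_seconds, samples_per_second) for an iterable of batches."""
    iterator = iter(batches)

    # Discard warm-up batches so startup cost is excluded from the timing.
    for _ in itertools.islice(iterator, warmup_batches):
        pass

    samples = 0
    start = time.perf_counter()
    for batch in itertools.islice(iterator, num_batches):
        samples += count_samples(batch)
    elapsed = time.perf_counter() - start

    throughput = samples / elapsed if elapsed > 0 else float("inf")
    return samples, elapsed, throughput


def list_batches(n, batch_size):
    """Stand-in loader that yields each batch as a list of sample dicts."""
    for _ in range(n):
        yield [{"image": None}] * batch_size


def tuple_batches(n, batch_size):
    """Stand-in loader that yields each batch as an (inputs, labels) pair."""
    for _ in range(n):
        yield (list(range(batch_size)), list(range(batch_size)))


samples_a, _, _ = measure_throughput(list_batches(20, 32), num_batches=10, warmup_batches=2)
samples_b, _, _ = measure_throughput(
    tuple_batches(20, 32),
    count_samples=lambda batch: len(batch[0]),  # Count inputs, not tuple elements
    num_batches=10,
    warmup_batches=2,
)
print(samples_a, samples_b)
```

Passing the same `num_batches` and `warmup_batches` to both loaders keeps the comparison like for like.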

## 5. Advanced Features

### Distributed Training

TurboLoader supports distributed training with deterministic sharding.

In [None]:
# For distributed training (multi-GPU)
import torch.distributed as dist

# Initialize the process group (uncomment when running under a distributed launcher)
# dist.init_process_group(backend='nccl')

# Create distributed loader
loader = turboloader.DataLoader(
    'path/to/your/dataset.tar',
    batch_size=64,
    num_workers=4,
    shuffle=True,
    enable_distributed=True,
    # world_rank=dist.get_rank(),
    # world_size=dist.get_world_size(),
    drop_last=True
)

print("Distributed DataLoader configured!")
print("Each rank will automatically get its own shard.")
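
One common way to implement deterministic sharding is for every rank to build the same seeded permutation and then take every `world_size`-th index starting at its own rank. The sketch below illustrates that idea in plain Python. It is not necessarily the scheme TurboLoader uses, and `shard_indices` with its parameters is a hypothetical helper.

```python
import random


def shard_indices(num_samples, world_size, rank, seed=0, epoch=0, drop_last=True):
    """Return the sample indices assigned to `rank` for one epoch."""
    indices = list(range(num_samples))

    # Every rank uses the same seed, so all ranks build the same permutation.
    random.Random(seed + epoch).shuffle(indices)

    if drop_last:
        # Trim the tail so that every rank receives the same number of samples.
        usable = (num_samples // world_size) * world_size
        indices = indices[:usable]

    # Strided slice: rank r takes positions r, r + world_size, r + 2 * world_size, ...
    return indices[rank::world_size]


shards = [shard_indices(10, world_size=4, rank=r) for r in range(4)]
for rank, shard in enumerate(shards):
    print(f"rank {rank}: {shard}")
```

With 10 samples and 4 ranks, `drop_last=True` leaves 8 usable indices, so each rank gets 2 and no index appears on more than one rank.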

### TBL v2 Format

Convert TAR archives to TurboLoader's optimized binary format for even faster loading.

In [None]:
# Convert TAR to TBL v2
writer = turboloader.TblWriterV2(
    output_path="dataset.tbl",
    compression=True  # Enable LZ4 compression
)

# Read from TAR and write to TBL
reader = turboloader.DataLoader('input.tar', batch_size=1, num_workers=1)

for batch in reader:
    for sample in batch:
        writer.add_sample(
            data=sample['image'],
            format=turboloader.SampleFormat.JPEG,
            metadata={'label': sample.get('label', 0)}
        )

writer.finalize()
print("Conversion complete!")
print("TBL format provides 40-60% space savings with LZ4 compression.")

## Summary

You've learned how to:

1. ✅ Create a basic DataLoader
2. ✅ Apply SIMD-accelerated transforms
3. ✅ Integrate with PyTorch training loops
4. ✅ Benchmark performance
5. ✅ Use advanced features (distributed training, TBL format)

## Next Steps

- Check out more examples in the `examples/` directory
- Read the [documentation](https://github.com/ALJainProjects/TurboLoader)
- Run benchmarks to see performance on your data
- Join the community discussions

Happy training! 🚀