# RapidFire AI with TensorBoard in Google Colab

This tutorial demonstrates how to use RapidFire AI with TensorBoard for real-time metrics visualization in Google Colab.

## Why TensorBoard in Colab?

- **Real-time visualization**: View training metrics as they happen
- **No frontend loading delay**: TensorBoard loads instantly in Colab
- **Native Colab support**: TensorBoard works natively with `%tensorboard` magic
- **Live updates**: The dashboard refreshes about every 30 seconds, even while the training cell is still running

## Setup

First, let's install RapidFire AI and load the TensorBoard extension:

In [None]:
# Install RapidFire AI
!pip install rapidfireai

# Load TensorBoard extension
%load_ext tensorboard

## Configure RapidFire to Use TensorBoard

We'll set an environment variable to tell RapidFire to log to TensorBoard instead of MLflow:

In [None]:
import os

# Configure RapidFire to use TensorBoard
os.environ['RF_TRACKING_BACKEND'] = 'tensorboard'  # Options: 'mlflow', 'tensorboard', 'both'
# TensorBoard log directory will be auto-created in experiment path
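
The comment above lists three accepted values. As a guard against typos, a small helper can validate the value before exporting it. This is only a sketch: `set_tracking_backend` is a hypothetical name, not part of RapidFire, and the set of valid values is taken from that comment.

```python
import os

# Values listed in the comment above; treated here as the full set (an assumption).
VALID_BACKENDS = {"mlflow", "tensorboard", "both"}

def set_tracking_backend(backend: str) -> str:
    """Hypothetical helper: validate the value, then export RF_TRACKING_BACKEND."""
    normalized = backend.strip().lower()
    if normalized not in VALID_BACKENDS:
        raise ValueError(
            f"Unknown backend {backend!r}; expected one of {sorted(VALID_BACKENDS)}"
        )
    os.environ["RF_TRACKING_BACKEND"] = normalized
    return normalized

set_tracking_backend("tensorboard")
```

A misspelled value such as `"tensorbord"` then fails immediately with a `ValueError` instead of being written to the environment unnoticed.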

## Import RapidFire Components

In [None]:
from rapidfireai import Experiment
from rapidfireai.automl import List, RFGridSearch, RFModelConfig, RFLoraConfig, RFSFTConfig

## Load Dataset

In [None]:
from datasets import load_dataset

dataset = load_dataset("bitext/Bitext-customer-support-llm-chatbot-training-dataset")

# Select small, non-overlapping subsets for demonstration
train_dataset = dataset["train"].select(range(128))
eval_dataset = dataset["train"].select(range(128, 152))
train_dataset = train_dataset.shuffle(seed=42)
eval_dataset = eval_dataset.shuffle(seed=42)
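
When train and eval subsets are carved out of the same split, it is worth confirming that the two index ranges share no rows, since shared rows make the eval loss look better than it really is. A minimal sketch using plain Python ranges (the helper name `shared_indices` is hypothetical, and the ranges are illustrative):

```python
def shared_indices(train_idx, eval_idx):
    """Return the sorted row indices that appear in both selections."""
    return sorted(set(train_idx) & set(eval_idx))

# Disjoint selections: nothing is shared.
print(shared_indices(range(128), range(128, 152)))       # []

# Overlapping selections: every eval row is also a training row.
print(len(shared_indices(range(128), range(100, 124))))  # 24
```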

## Define Data Processing Function

In [None]:
def sample_formatting_function(row):
    """Function to preprocess each example from dataset"""
    SYSTEM_PROMPT = "You are a helpful and friendly customer support assistant."
    return {
        "prompt": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": row["instruction"]},
        ],
        "completion": [
            {"role": "assistant", "content": row["response"]}
        ]
    }
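
To see what the formatter returns, it can be applied to a single hand-written row. The function is repeated here so the snippet runs on its own; the example row is made up and only mimics the dataset's `instruction` and `response` columns.

```python
def sample_formatting_function(row):
    """Same formatter as above, repeated so this snippet is self-contained."""
    SYSTEM_PROMPT = "You are a helpful and friendly customer support assistant."
    return {
        "prompt": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": row["instruction"]},
        ],
        "completion": [{"role": "assistant", "content": row["response"]}],
    }

# Made-up row for illustration; real rows come from the dataset.
example_row = {
    "instruction": "How do I cancel my order?",
    "response": "You can cancel it from the Orders page.",
}

formatted = sample_formatting_function(example_row)
print([m["role"] for m in formatted["prompt"]])  # ['system', 'user']
print(formatted["completion"][0]["content"])     # You can cancel it from the Orders page.
```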

## Initialize Experiment

In [None]:
# Create experiment with unique name
experiment = Experiment(experiment_name="tensorboard-demo")

## Get TensorBoard Log Directory

The TensorBoard logs are stored in the experiment directory. Let's get the path:

In [None]:
# Look up the experiment path in RapidFire's database
from rapidfireai.db.rf_db import RfDb

db = RfDb()
experiment_path = db.get_experiments_path("tensorboard-demo")
tensorboard_log_dir = f"{experiment_path}/tensorboard_logs"

print(f"TensorBoard logs will be saved to: {tensorboard_log_dir}")
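
Once training starts, a quick way to confirm that something is being written is to look for TensorBoard event files, whose names begin with `events.out.tfevents`. The sketch below uses only the standard library. `list_event_files` is a hypothetical helper, and it assumes RapidFire writes its event files somewhere under the log directory built above.

```python
import tempfile
from pathlib import Path

def list_event_files(log_dir):
    """Return all TensorBoard event files found anywhere under log_dir."""
    root = Path(log_dir)
    if not root.is_dir():
        return []  # Directory not created yet, so nothing has been logged.
    return sorted(root.rglob("events.out.tfevents.*"))

# Demo on a throwaway directory with one fake event file.
with tempfile.TemporaryDirectory() as tmp:
    run_dir = Path(tmp) / "run_1"
    run_dir.mkdir()
    (run_dir / "events.out.tfevents.0.demo").touch()
    found = [p.name for p in list_event_files(tmp)]

print(found)  # ['events.out.tfevents.0.demo']
```

In the notebook, `list_event_files(tensorboard_log_dir)` should return an empty list until the first logging step has run.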

## Start TensorBoard

**IMPORTANT**: Start TensorBoard BEFORE running training, so you can watch metrics update in real time!

In [None]:
# Start TensorBoard (will update automatically as training progresses)
%tensorboard --logdir {tensorboard_log_dir}

## Define Model Configuration

We'll use a small model (TinyLlama) for fast training in Colab:

In [None]:
# Define LoRA configs
peft_configs = List([
    RFLoraConfig(
        r=8,
        lora_alpha=16,
        lora_dropout=0.1,
        target_modules=["q_proj", "v_proj"],
        bias="none"
    ),
    RFLoraConfig(
        r=32,
        lora_alpha=64,
        lora_dropout=0.1,
        target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
        bias="none"
    )
])

# Define model configs
config_set = List([
    RFModelConfig(
        model_name="TinyLlama/TinyLlama-1.1B-Chat-v1.0",
        peft_config=peft_configs,
        training_args=RFSFTConfig(
            learning_rate=1e-3,
            lr_scheduler_type="linear",
            per_device_train_batch_size=4,
            per_device_eval_batch_size=4,
            max_steps=64,  # Short training for demo
            gradient_accumulation_steps=1,
            logging_steps=2,  # Frequent logging for TensorBoard
            eval_strategy="steps",
            eval_steps=8,
            fp16=True,
        ),
        model_type="causal_lm",
        model_kwargs={"device_map": "auto", "torch_dtype": "auto", "use_cache": False},
        formatting_func=sample_formatting_function,
    )
])
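
The numbers in `RFSFTConfig` determine how many points show up in TensorBoard. Assuming these arguments keep their usual Hugging Face `Trainer` meaning on a single GPU (RapidFire's chunked execution may change the exact timing), the arithmetic for one run can be sketched as:

```python
# Values copied from the config above.
max_steps = 64
logging_steps = 2
eval_steps = 8
per_device_train_batch_size = 4
gradient_accumulation_steps = 1
train_rows = 128  # size of the training subset selected earlier

train_points = max_steps // logging_steps  # logged training-loss points
eval_points = max_steps // eval_steps      # evaluation passes
rows_seen = max_steps * per_device_train_batch_size * gradient_accumulation_steps
passes_over_data = rows_seen / train_rows  # single-GPU assumption

print(train_points, eval_points, rows_seen, passes_over_data)  # 32 8 256 2.0
```

So each run should produce roughly 32 training-loss points and 8 eval points, covering the 128-row subset about twice.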

## Define Model Creation Function

In [None]:
def sample_create_model(model_config):
    """Function to create model object for any given config"""
    from transformers import AutoModelForCausalLM, AutoTokenizer
    
    model_name = model_config["model_name"]
    model_kwargs = model_config["model_kwargs"]
    
    model = AutoModelForCausalLM.from_pretrained(model_name, **model_kwargs)
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    
    return (model, tokenizer)

## Create Config Group

In [None]:
# Simple grid search
config_group = RFGridSearch(
    configs=config_set,
    trainer_type="SFT"
)
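
Grid search generally means one run per combination of the listed options. Assuming `RFGridSearch` follows that convention and expands each `List` in the config, the setup above yields two runs, one per LoRA config. The sketch below reproduces the counting with plain dictionaries and `itertools.product`; it does not call RapidFire.

```python
from itertools import product

# Plain-dict stand-ins for the two RFLoraConfig entries defined earlier.
lora_options = [
    {"r": 8, "lora_alpha": 16, "target_modules": ["q_proj", "v_proj"]},
    {"r": 32, "lora_alpha": 64,
     "target_modules": ["q_proj", "k_proj", "v_proj", "o_proj"]},
]
# Only one learning rate is listed, so it adds no extra branches.
learning_rates = [1e-3]

runs = [
    {"lora": lora, "learning_rate": lr}
    for lora, lr in product(lora_options, learning_rates)
]
print(len(runs))                           # 2
print([run["lora"]["r"] for run in runs])  # [8, 32]
```

Adding a second learning rate to the list would double the count to four runs under the same assumption.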

## Run Training

**IMPORTANT**: While this cell is running:
1. Scroll up to the TensorBoard panel above
2. Watch metrics update in near real time (the dashboard refreshes about every 30 seconds)
3. See training loss, learning rate, and other metrics appear

This is the key advantage of TensorBoard in Colab: you can monitor training progress even while the cell is blocked!

In [None]:
# Launch training - metrics will appear in TensorBoard above!
experiment.run_fit(
    config_group, 
    sample_create_model, 
    train_dataset, 
    eval_dataset, 
    num_chunks=2,  # 2 chunks for demo
    seed=42
)
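
The `num_chunks=2` argument suggests that the training data is processed in two pieces. How RapidFire actually draws the chunk boundaries is not stated in this tutorial. The sketch below simply assumes contiguous, near-equal slices to give a feel for the sizes involved, and `chunk_bounds` is a hypothetical helper.

```python
def chunk_bounds(num_rows, num_chunks):
    """Split range(num_rows) into num_chunks contiguous, near-equal slices."""
    base, extra = divmod(num_rows, num_chunks)
    bounds, start = [], 0
    for i in range(num_chunks):
        size = base + (1 if i < extra else 0)  # spread any remainder over the first chunks
        bounds.append((start, start + size))
        start += size
    return bounds

print(chunk_bounds(128, 2))  # [(0, 64), (64, 128)]
```

Under that assumption, each of the two chunks holds 64 of the 128 training rows.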

## End Experiment

In [None]:
experiment.end()

## View TensorBoard Logs

After training completes, you can still view the full logs:

In [None]:
# View final logs
%tensorboard --logdir {tensorboard_log_dir}

## Using Both MLflow and TensorBoard

You can also log to both backends simultaneously by setting:

```python
os.environ['RF_TRACKING_BACKEND'] = 'both'
```

This gives you:
- **TensorBoard**: Real-time visualization during training
- **MLflow**: Experiment comparison and model registry

## Tips for Colab + TensorBoard

1. **Start TensorBoard first**: Always start TensorBoard before training
2. **Frequent logging**: Set `logging_steps` to a small value (e.g., 2-5) for responsive updates
3. **Refresh rate**: The TensorBoard dashboard reloads its data about every 30 seconds by default
4. **Multiple experiments**: Use different experiment names for different runs
5. **Clean logs**: Delete old logs with `!rm -rf {tensorboard_log_dir}` to start fresh
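
The shell command in tip 5 deletes the directory without asking. A slightly more defensive alternative, sketched here with the standard library, refuses to delete anything whose final path component is not `tensorboard_logs`. The helper name `clear_tensorboard_logs` and the directory-name check are illustrative choices, not RapidFire features.

```python
import shutil
import tempfile
from pathlib import Path

def clear_tensorboard_logs(log_dir, expected_name="tensorboard_logs"):
    """Delete log_dir and recreate it empty; return True if old logs were removed."""
    path = Path(log_dir)
    if path.name != expected_name:
        raise ValueError(
            f"Refusing to delete {path}: expected a '{expected_name}' directory"
        )
    existed = path.is_dir()
    if existed:
        shutil.rmtree(path)
    path.mkdir(parents=True, exist_ok=True)
    return existed

# Demo on a throwaway directory so nothing real is deleted.
with tempfile.TemporaryDirectory() as tmp:
    demo_dir = Path(tmp) / "tensorboard_logs"
    demo_dir.mkdir()
    (demo_dir / "events.out.tfevents.0.demo").touch()
    removed = clear_tensorboard_logs(demo_dir)
    remaining = list(demo_dir.iterdir())

print(removed, remaining)  # True []
```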

## Comparison: TensorBoard vs MLflow in Colab

| Feature | TensorBoard | MLflow |
|---------|-------------|--------|
| Real-time updates | ✅ Yes (30s polling) | ❌ No (delayed by frontend load time) |
| Colab native | ✅ `%tensorboard` magic | ❌ Requires tunneling |
| Load time | ✅ Near-instant | ❌ 3-5 minutes via tunnel |
| Model registry | ❌ No | ✅ Yes |
| Experiment comparison | ✅ Basic | ✅ Advanced |

**Recommendation**: Use the `'both'` backend to get the best of both worlds!

## Next Steps

- Try different model configs and compare in TensorBoard
- Experiment with the `'both'` backend for comprehensive tracking
- Check out other RapidFire tutorials for DPO and GRPO training

Happy training! 🚀