# Tiny Alignment Studio - Colab Demo

This notebook demonstrates the end-to-end workflow of Tiny Alignment Studio on Google Colab.
It covers setup, data preparation, Direct Preference Optimization (DPO) training, evaluation, and an optional dashboard.

## 1. Setup Environment

In [None]:
# Check GPU status
!nvidia-smi
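If you prefer to check for a GPU from Python rather than read the `nvidia-smi` table, the sketch below is one way to do it. The helper name `gpu_summary` is hypothetical and not part of this repository. It only relies on the standard `--query-gpu` and `--format` options of `nvidia-smi`.

```python
import shutil
import subprocess


def gpu_summary():
    """Return one 'name, total memory' line per visible GPU, or an empty list."""
    # No nvidia-smi binary on PATH usually means no NVIDIA driver in this runtime.
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=False,
    )
    # A non-zero exit code means the driver could not be queried.
    if result.returncode != 0:
        return []
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


gpus = gpu_summary()
print(gpus if gpus else "No GPU visible: switch the Colab runtime type to a GPU.")
```

An empty list here suggests changing the runtime type before running the training step.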

In [None]:
# Clone the repository
!git clone https://github.com/QuackPhuc/tiny-alignment-studio.git
%cd tiny-alignment-studio

In [None]:
# Pull latest changes (in case of updates)
!git pull origin main

In [None]:
# Upgrade pip and setuptools to avoid build errors
!pip install --upgrade pip setuptools wheel

# Install dependencies (approx. 1-2 minutes)
!pip install -e ".[dev]"
!pip install pyngrok  # Required for dashboard tunneling

## 2. Prepare Data

Download a sample of the Anthropic HH-RLHF dataset, validate it, and format it for DPO.

In [None]:
!python scripts/prepare_data.py --source Anthropic/hh-rlhf --max-samples 1000 --output-dir outputs/data
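To make "format it for DPO" concrete: each HH-RLHF record holds two full transcripts, `chosen` and `rejected`, that differ in the final assistant turn. A DPO trainer typically wants a `prompt` plus the two competing responses. The sketch below illustrates that split. The helper names are hypothetical, and the actual logic in `scripts/prepare_data.py` may differ.

```python
# Hypothetical helpers for illustration; not taken from scripts/prepare_data.py.
ASSISTANT_TAG = "\n\nAssistant:"


def split_dialogue(text):
    """Split a transcript at its last assistant turn into (prompt, response)."""
    cut = text.rfind(ASSISTANT_TAG)
    if cut == -1:
        raise ValueError("transcript has no assistant turn")
    cut += len(ASSISTANT_TAG)
    return text[:cut], text[cut:].strip()


def to_dpo_record(example):
    """Turn one {'chosen', 'rejected'} pair into a prompt/chosen/rejected triple."""
    prompt, chosen = split_dialogue(example["chosen"])
    rejected_prompt, rejected = split_dialogue(example["rejected"])
    # Records whose transcripts diverge before the last turn cannot share a prompt.
    if prompt != rejected_prompt:
        raise ValueError("chosen and rejected transcripts do not share a prompt")
    return {"prompt": prompt, "chosen": chosen, "rejected": rejected}


sample = {
    "chosen": "\n\nHuman: How do I boil an egg?\n\nAssistant: Simmer it for about ten minutes.",
    "rejected": "\n\nHuman: How do I boil an egg?\n\nAssistant: I don't know.",
}
record = to_dpo_record(sample)
print(record)
```

Raising on a prompt mismatch is one possible validation policy. A real pipeline might instead skip such records and log them.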

## 3. Train DPO Model

Run DPO training with the default configuration (TinyLlama + QLoRA).
On a T4 GPU, one epoch over the 1,000 samples should take roughly 5-10 minutes.

In [None]:
!python scripts/train.py --config configs/base.yaml
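For intuition about what the training step optimizes, here is a minimal sketch of the DPO loss for a single preference pair. It assumes you already have the summed log-probabilities of each response under the policy and under a frozen reference model. The function name and `beta=0.1` are illustrative assumptions; the value in `configs/base.yaml` and the repository's actual training code may differ.

```python
import math


def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss for one pair, given summed sequence log-probabilities."""
    # How much more the policy favors "chosen" over "rejected" than the reference does.
    margin = (policy_chosen - ref_chosen) - (policy_rejected - ref_rejected)
    x = beta * margin
    # -log(sigmoid(x)) written in a numerically stable form.
    if x >= 0:
        return math.log1p(math.exp(-x))
    return -x + math.log1p(math.exp(x))


# When the policy equals the reference, the margin is 0 and the loss is log(2).
print(dpo_loss(-10.0, -10.0, -10.0, -10.0))
# A policy that favors the chosen response more than the reference does gets a lower loss.
print(dpo_loss(-8.0, -12.0, -10.0, -10.0))
```

The loss falls as the policy raises the chosen response's likelihood relative to the reference and lowers the rejected one's, and `beta` controls how strongly that margin is rewarded.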

## 4. Evaluate

Load the base model with the trained adapter and generate a response to a sample prompt.

In [None]:
!python scripts/evaluate.py --adapter outputs/adapter --prompt "What is the best way to invest money?"

## 5. Launch Dashboard (Optional)

To view the Streamlit dashboard from Colab, tunnel port 8501 through ngrok. ngrok requires an authtoken, which you set in the cell below.

In [None]:
from pyngrok import ngrok

# Terminate open tunnels if any
ngrok.kill()

# UNCOMMENT AND PASTE YOUR AUTHTOKEN HERE
# ngrok.set_auth_token("YOUR_AUTHTOKEN_HERE")

# Open an HTTPS tunnel to port 8501 (Streamlit's default port)
try:
    tunnel = ngrok.connect(8501)
    # connect() returns an NgrokTunnel object; its public_url attribute holds the address
    print(f"Streamlit active at: {tunnel.public_url}")

    # Run Streamlit in the background
    !streamlit run src/ui/app.py &>/dev/null&
except Exception as e:
    print("Ngrok error:", e)
    print("Make sure you have set your authtoken!")
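The cell above prints the public URL as soon as Streamlit is launched, but the server may need a few seconds before it accepts connections. A small polling helper like the sketch below can confirm that the port is open before you follow the link. `wait_for_port` is a hypothetical helper, not part of this repository, and it uses only the standard library.

```python
import socket
import time


def wait_for_port(port, host="127.0.0.1", timeout=30.0, interval=0.5):
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused or timed out: wait briefly and retry.
            time.sleep(interval)
    return False
```

Calling `wait_for_port(8501)` in a new cell returns `True` once the dashboard is reachable locally. A `False` result suggests Streamlit failed to start, in which case rerunning the launch command without the `/dev/null` redirect shows its error output.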