# SQL-R1: Text-to-SQL RL Training on Kaggle

**Requirements**: Kaggle GPU Runtime (T4 16GB)

## Overview
- **Paper**: SQL-R1: Training Natural Language to SQL Reasoning Model By Reinforcement Learning
- **Algorithm**: GRPO (Group Relative Policy Optimization)
- **Model**: Qwen2.5-Coder-3B-Instruct

## 1. Environment Setup

⚠️ **After running the installation cell, RESTART THE KERNEL before continuing!**

In [None]:
!nvidia-smi

In [None]:
# Step 1: Install dependencies (RESTART KERNEL after this cell)

!pip install vllm==0.6.3 ray transformers accelerate bitsandbytes --quiet
!pip install flash-attn --no-build-isolation --quiet
!pip install wandb sqlparse func_timeout nltk ijson --quiet
!pip install hydra-core omegaconf --quiet

# Clone SQL-R1
%rm -rf SellWizr-Assignment
!git clone https://github.com/dancinglightning/SellWizr-Assignment.git

%cd SellWizr-Assignment/SQL-R1
!pip install -e . --quiet

print("\n" + "="*60)
print("RESTART KERNEL NOW (use the notebook's restart option)")
print("Then skip this cell and run the next one.")
print("="*60)

In [None]:
# Step 2: Run AFTER kernel restart
import os
os.chdir('/kaggle/working/SellWizr-Assignment/SQL-R1')

import torch
import pandas as pd
import numpy as np

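print(f"PyTorch: {torch.__version__}, CUDA: {torch.cuda.is_available()}")
# Query device properties only when a GPU is attached, so the cell fails gracefully otherwise
if torch.cuda.is_available():
    print(f"GPU Memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.1f} GB")
else:
    print("No CUDA device found: enable the T4 accelerator in the notebook settings.")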

## 2. Create Test Databases

In [None]:
import os, sqlite3

os.makedirs('data/NL2SQL/SynSQL-2.5M/databases', exist_ok=True)
os.makedirs('data/spider/database', exist_ok=True)

# Create minimal placeholder databases (schema only, no rows).
# IF NOT EXISTS keeps this cell safe to re-run.
test_schemas = {
    'concert_singer': ['CREATE TABLE IF NOT EXISTS singer (singer_id INT, name TEXT, country TEXT)'],
    'employee_hire_evaluation': ['CREATE TABLE IF NOT EXISTS employees (id INT, name TEXT, salary INT)'],
    'world_1': ['CREATE TABLE IF NOT EXISTS country (code TEXT, name TEXT, population INT)']
}

# Write each database under both directory layouts used in this notebook
for db_name, schemas in test_schemas.items():
    for path in ['data/spider/database', 'data/NL2SQL/SynSQL-2.5M/databases']:
        db_dir = f'{path}/{db_name}'
        os.makedirs(db_dir, exist_ok=True)
        conn = sqlite3.connect(f'{db_dir}/{db_name}.sqlite')
        for s in schemas:
            conn.execute(s)
        conn.commit()
        conn.close()

print(f"Created {len(test_schemas)} databases!")
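To confirm that a database contains the expected tables, the `sqlite_master` catalog can be queried. The sketch below is illustrative only: `list_tables` is a hypothetical helper, and an in-memory database stands in for the files on disk so the example runs anywhere.

```python
import sqlite3

def list_tables(conn):
    """Return the names of all user tables in a SQLite connection (hypothetical helper)."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]

# In-memory stand-in for one of the placeholder databases
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE IF NOT EXISTS singer (singer_id INT, name TEXT, country TEXT)")
tables = list_tables(conn)
conn.close()

print(tables)
```

To inspect a file created by the cell above, pass a path such as `data/spider/database/concert_singer/concert_singer.sqlite` to `sqlite3.connect` instead of `":memory:"`.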

## 3. Download Model

In [None]:
from huggingface_hub import snapshot_download
import os

MODEL_PATH = "models/Qwen2.5-Coder-3B-Instruct"
if not os.path.exists(MODEL_PATH):
    print("Downloading Qwen2.5-Coder-3B-Instruct...")
    snapshot_download(repo_id="Qwen/Qwen2.5-Coder-3B-Instruct", local_dir=MODEL_PATH, local_dir_use_symlinks=False)
print("Model ready!")

## 4. Check Training Data

In [None]:
import pandas as pd
train_df = pd.read_parquet('example_data/train.parquet')
print(f"Training samples: {len(train_df)}")

## 5. RL Training with GRPO (Memory Optimized for 3B on T4)

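**Key optimizations to fit the 3B model on a 16GB T4:**
- KL loss disabled, so no reference model is needed (saves ~6GB, roughly one float16 copy of the weights)
- float16 precision, since the T4 has no native bfloat16 support (the config below sets no dtype key, so this depends on the repository defaults)
- vLLM `gpu_memory_utilization` reduced to 0.2 (20%)
- Batch size = 1
- Full CPU offloading of parameters, gradients, and optimizer state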
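The ~6GB figure can be sanity-checked with simple arithmetic. The sketch below assumes a nominal 3 billion parameters (the exact count of the checkpoint may differ) and counts raw weights only, ignoring activations, KV cache, and optimizer state.

```python
# Back-of-the-envelope weight sizes (assumed nominal figures, not measurements)
PARAMS = 3e9        # assumption: nominal parameter count for a "3B" model
BYTES_FP16 = 2      # float16 stores each weight in 2 bytes
BYTES_FP32 = 4      # float32 stores each weight in 4 bytes

def weights_gb(n_params, bytes_per_param):
    """Size of the raw weights in GB (1 GB = 1e9 bytes)."""
    return n_params * bytes_per_param / 1e9

fp16_gb = weights_gb(PARAMS, BYTES_FP16)
fp32_gb = weights_gb(PARAMS, BYTES_FP32)

print(f"One float16 copy of the weights: ~{fp16_gb:.1f} GB")
print(f"One float32 copy of the weights: ~{fp32_gb:.1f} GB")
```

Under these assumptions, dropping the reference model removes about one float16 copy of the weights, and float16 takes half the space of float32.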

In [None]:
import os
os.environ['VLLM_ATTENTION_BACKEND'] = 'XFORMERS'
os.environ['TOKENIZERS_PARALLELISM'] = 'false'
os.environ['PYTORCH_CUDA_ALLOC_CONF'] = 'expandable_segments:True'

# Ultra memory-optimized config for 3B on T4
TRAIN_CONFIG = {
    # Data - minimal batch sizes
    'data.train_files': 'example_data/train.parquet',
    'data.val_files': 'example_data/test.parquet',
    'data.train_batch_size': 1,  # Minimum
    'data.val_batch_size': 1,
    'data.max_prompt_length': 512,  # Reduced
    'data.max_response_length': 256,  # Reduced
    
    # Model
    'actor_rollout_ref.model.path': 'models/Qwen2.5-Coder-3B-Instruct',
    'actor_rollout_ref.model.enable_gradient_checkpointing': True,
    
    # Actor - aggressive offloading
    'actor_rollout_ref.actor.ppo_mini_batch_size': 1,
    'actor_rollout_ref.actor.ppo_micro_batch_size': 1,
    'actor_rollout_ref.actor.fsdp_config.param_offload': True,
    'actor_rollout_ref.actor.fsdp_config.grad_offload': True,
    'actor_rollout_ref.actor.fsdp_config.optimizer_offload': True,
    'actor_rollout_ref.actor.optim.lr': '1e-6',
    
    # DISABLE KL loss - removes reference model, saves ~6GB!
    'actor_rollout_ref.actor.use_kl_loss': False,
    
    # Rollout - minimal vLLM memory
    'actor_rollout_ref.rollout.name': 'vllm',
    'actor_rollout_ref.rollout.tensor_model_parallel_size': 1,
    'actor_rollout_ref.rollout.gpu_memory_utilization': 0.2,  # Only 20%
    'actor_rollout_ref.rollout.n': 2,  # Fewer samples per prompt
    'actor_rollout_ref.rollout.temperature': 1.0,
    'actor_rollout_ref.rollout.log_prob_micro_batch_size': 1,
    
    # Reference model - also offload (even though KL is off)
    'actor_rollout_ref.ref.fsdp_config.param_offload': True,
    'actor_rollout_ref.ref.log_prob_micro_batch_size': 1,
    
    # Algorithm
    'algorithm.adv_estimator': 'grpo',
    'algorithm.kl_ctrl.kl_coef': 0.0,  # No KL penalty
    
    # Trainer
    'trainer.n_gpus_per_node': 1,
    'trainer.nnodes': 1,
    'trainer.total_epochs': 1,
    'trainer.save_freq': 100,
    'trainer.test_freq': 50,
    'trainer.critic_warmup': 0,
    'trainer.logger': "['console']",
    'trainer.project_name': 'SQL-R1-Kaggle',
    'trainer.experiment_name': '3B-T4-GRPO',
    'trainer.default_local_dir': 'logs/kaggle_run',
}

cmd_args = ' '.join([f"{k}={v}" for k, v in TRAIN_CONFIG.items()])
print("Config ready! Key memory savings:")
print("- KL loss disabled (no reference model)")
print("- Model uses float16 (half memory)")
print("- vLLM memory: 20%")
print("- Batch size: 1")

In [None]:
# Clear GPU memory before training
import torch
import gc
gc.collect()
torch.cuda.empty_cache()
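# mem_get_info returns (free, total) in bytes for the given device
free_bytes, total_bytes = torch.cuda.mem_get_info(0)
print(f"Free GPU memory: {free_bytes/1e9:.2f} GB of {total_bytes/1e9:.2f} GB")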

In [None]:
# Run training
!python -m verl.trainer.main_ppo {cmd_args}

## 6. Test Reward Function

In [None]:
from verl.utils.reward_score.synsql import extract_solution

test = '<think>Query analysis</think><answer>```sql\nSELECT * FROM employees\n```</answer>'
answer, think, _ = extract_solution(test)
print(f"Answer: {answer}\nThink: {think}")
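The test string above follows a tagged format: reasoning inside `<think>` and a fenced SQL block inside `<answer>`. As a rough illustration of what parsing such a response involves, here is a minimal regex-based sketch. `parse_tagged_response` is a hypothetical helper written for this example and is not the repository's `extract_solution`, whose behavior may differ.

```python
import re

FENCE = "`" * 3  # triple backtick, built programmatically to keep this example fence-safe

def parse_tagged_response(text):
    """Hypothetical parser: return (sql, reasoning), or (None, None) if a tag is missing."""
    think = re.search(r"<think>(.*?)</think>", text, re.DOTALL)
    answer = re.search(r"<answer>(.*?)</answer>", text, re.DOTALL)
    if not think or not answer:
        return None, None
    body = answer.group(1)
    # Prefer the contents of a fenced sql block; fall back to the raw answer body
    fenced = re.search(re.escape(FENCE) + r"sql\s*(.*?)" + re.escape(FENCE), body, re.DOTALL)
    sql_text = fenced.group(1).strip() if fenced else body.strip()
    return sql_text, think.group(1).strip()

sample = f"<think>Query analysis</think><answer>{FENCE}sql\nSELECT * FROM employees\n{FENCE}</answer>"
sql_text, reasoning = parse_tagged_response(sample)
print(f"SQL: {sql_text}\nReasoning: {reasoning}")
```

Returning `(None, None)` for malformed output gives a reward function a clear signal that the response did not follow the expected format.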

In [None]:
!nvidia-smi