# V2 STE-FP16 Training (Terminal-Based)

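This notebook runs V2 STE-FP16 training through a single bash script (`scripts/run_v2_training.sh`) for reliability, instead of spreading the setup across many notebook cells.

**Steps:**
1. Mount Google Drive
2. Clone/update repo
3. Verify input files on Google Drive
4. Run training script
5. Check results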

**Config:** Q2_A4 (MLP=2bit/rank32, Attn=4bit/rank8)

## Step 1: Mount Google Drive

In [None]:
from google.colab import drive
drive.mount('/content/drive')

## Step 2: Clone/Update Repository

In [None]:
import os
os.chdir('/content')

if os.path.exists('qwen3_apple_style_2bit_qat_lora'):
    print('Repo exists, pulling latest...')
    # Note: `git reset --hard` discards any local edits to tracked files in the repo
    !cd qwen3_apple_style_2bit_qat_lora && git fetch && git reset --hard origin/main
else:
    print('Cloning repo...')
    !git clone https://github.com/anemll/qwen3_apple_style_2bit_qat_lora.git

os.chdir('/content/qwen3_apple_style_2bit_qat_lora')
print(f'\nWorking directory: {os.getcwd()}')

## Step 3: Verify Files on Google Drive

Make sure these files exist:
- `/content/drive/MyDrive/qwen3_runs/q2_pt_good1.tgz` (V1 checkpoint)
- `/content/drive/MyDrive/qwen3_caches/alpaca_chat_think_both_L128_K128_R1024.tgz` (KD cache)

In [None]:
# Check Drive files
import os

checkpoint = '/content/drive/MyDrive/qwen3_runs/q2_pt_good1.tgz'
cache = '/content/drive/MyDrive/qwen3_caches/alpaca_chat_think_both_L128_K128_R1024.tgz'

print('Checking files on Google Drive...')
print(f'  Checkpoint: {"OK" if os.path.exists(checkpoint) else "MISSING"} - {checkpoint}')
print(f'  Cache:      {"OK" if os.path.exists(cache) else "MISSING"} - {cache}')

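missing = [p for p in (checkpoint, cache) if not os.path.exists(p)]
if missing:
    # Raise so that "Run all" stops here instead of failing later inside the training script
    raise FileNotFoundError(
        'Missing files! Upload them to Google Drive before continuing: ' + ', '.join(missing)
    )
print('\nAll files found. Ready to train!')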
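The check above only confirms that the two paths exist. An interrupted upload can leave a file that exists but cannot be unpacked, so a quick look inside each archive can save a failed run. The sketch below uses the standard `tarfile` module; `inspect_archive` is a hypothetical helper name, not part of the repo, and it reads only the first few entries, so it does not validate the whole archive.

```python
import tarfile
from pathlib import Path


def inspect_archive(path, max_members=5):
    """Return (ok, names). ok is False if the file is missing or is not a readable tar archive.

    Hypothetical helper: it reads only the first `max_members` entries,
    so a file truncated further in can still pass this check.
    """
    p = Path(path)
    if not p.is_file():
        return False, []
    try:
        # "r:*" lets tarfile detect the compression (gzip for .tgz) automatically
        with tarfile.open(p, "r:*") as tar:
            names = []
            for member in tar:
                names.append(member.name)
                if len(names) >= max_members:
                    break
            return True, names
    except (tarfile.TarError, EOFError, OSError):
        return False, []


# Example usage in this notebook (paths come from the cell above):
# for path in (checkpoint, cache):
#     ok, names = inspect_archive(path)
#     print("OK" if ok else "UNREADABLE", path, names)
```

The listed member names also show how each archive is laid out internally, which is useful if the extraction in Step 4 does not produce the expected paths.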

## Step 4: Run Training Script

This cell runs `scripts/run_v2_training.sh`, which:
1. Installs dependencies (transformers, accelerate, etc.)
2. Extracts checkpoint from Drive → `runs/tmp/backup_mlp_e2e_w_0.3824.pt`
3. Extracts L128 cache from Drive → `caches/alpaca_chat_think_both_L128_K128_R1024/`
4. Runs `scripts/train_v2_simple.py` for V2 STE-FP16 training
5. Saves results to `runs/v2_output/` and copies to Drive

In [None]:
%cd /content/qwen3_apple_style_2bit_qat_lora
!bash scripts/run_v2_training.sh
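A `!` shell command does not raise a Python exception when it exits with an error, so a failed script will not stop a "Run all" execution, and its output lives only in the cell. If that matters, the script can be launched from Python instead. The following is a minimal sketch using the standard `subprocess` module; `run_and_log` and the log path in the usage comment are hypothetical names chosen for illustration.

```python
import subprocess
import sys
from pathlib import Path


def run_and_log(cmd, log_path):
    """Run cmd, echo its output line by line, save it to log_path, and return the exit code."""
    log_path = Path(log_path)
    log_path.parent.mkdir(parents=True, exist_ok=True)
    with log_path.open("w") as log:
        proc = subprocess.Popen(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # merge stderr into the same stream
            text=True,
        )
        for line in proc.stdout:
            sys.stdout.write(line)  # show progress live in the cell
            log.write(line)         # keep a copy on disk
        return proc.wait()


# Example usage (the log path is an assumption, pick any location you like):
# code = run_and_log(["bash", "scripts/run_v2_training.sh"], "runs/v2_train.log")
# if code != 0:
#     raise RuntimeError(f"Training script failed with exit code {code}")
```

Writing the log under the Drive mount instead of the local disk would keep it available after the Colab runtime is recycled.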

## Step 5: Check Results

After training completes, check the output:

In [None]:
# List local output
!ls -la runs/v2_output/ 2>/dev/null || echo 'No local output yet'

print('\n--- Google Drive output ---')
!ls -la /content/drive/MyDrive/qwen3_runs/v2_output/ 2>/dev/null || echo 'No Drive output yet'
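If you prefer to inspect the results from Python, for example to pick the most recent file programmatically, the same listing can be sketched with `pathlib`. `list_outputs` is a hypothetical helper; it makes no assumption about which files the training script writes.

```python
from pathlib import Path


def list_outputs(directory):
    """Return (relative_path, size_in_bytes) for every file under directory, newest first."""
    root = Path(directory)
    if not root.is_dir():
        return []
    files = [p for p in root.rglob("*") if p.is_file()]
    # Sort by modification time so the most recently written file comes first
    files.sort(key=lambda p: p.stat().st_mtime, reverse=True)
    return [(str(p.relative_to(root)), p.stat().st_size) for p in files]


# Example usage with the two locations checked above:
# for name, size in list_outputs("runs/v2_output"):
#     print(f"{size / 1e6:10.1f} MB  {name}")
# print(list_outputs("/content/drive/MyDrive/qwen3_runs/v2_output"))
```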

---

## Optional: Run with Custom Parameters

If you need to customize training, run the Python script directly:

In [None]:
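# Custom training (example)
# The checkpoint and cache paths below are the ones extracted by the Step 4 script,
# so run Step 4 at least once first. Uncomment and modify as needed: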

# !python scripts/train_v2_simple.py \
#     --v1-checkpoint runs/tmp/backup_mlp_e2e_w_0.3824.pt \
#     --cache-dir caches/alpaca_chat_think_both_L128_K128_R1024 \
#     --batch-size 4 \
#     --max-steps 2000 \
#     --lr 5e-5
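When trying several settings, it can be easier to build the command from a dictionary than to edit the commented block by hand. The sketch below uses only the flags shown in the example above; `build_train_command` is a hypothetical helper, and any other option would need to be checked against `scripts/train_v2_simple.py` itself.

```python
import shlex


def build_train_command(overrides=None):
    """Build the argv list for train_v2_simple.py from defaults plus optional overrides."""
    # Defaults mirror the commented example above
    params = {
        "v1-checkpoint": "runs/tmp/backup_mlp_e2e_w_0.3824.pt",
        "cache-dir": "caches/alpaca_chat_think_both_L128_K128_R1024",
        "batch-size": 4,
        "max-steps": 2000,
        "lr": 5e-5,
    }
    params.update(overrides or {})
    argv = ["python", "scripts/train_v2_simple.py"]
    for name, value in params.items():
        argv += [f"--{name}", str(value)]
    return argv


# Example usage: a shorter run with a smaller batch
# argv = build_train_command({"max-steps": 500, "batch-size": 2})
# print(shlex.join(argv))  # inspect the command before running it
# import subprocess; subprocess.run(argv, check=True)
```

Passing the list form to `subprocess.run` avoids shell quoting problems, and `check=True` raises if the script exits with an error.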