# Ghost Architect — Colab T4 Main Notebook

This notebook is the single entry point for the full **Gemma-3 Trinity training** run on a Google Colab T4 GPU (16 GB VRAM).

## What this notebook does
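1. Validates the T4 runtime
2. Installs pinned dependencies
3. Mounts Drive and syncs project files
4. Writes the full T4 training configuration
5. Validates the dataset JSON
6. Launches training (placeholder until `src/train.py` is implemented)
7. Exports GGUF artifacts (placeholder until `src/export.py` is implemented)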


## 1) Runtime Check (T4 GPU expected)

In [None]:
import torch

assert torch.cuda.is_available(), 'CUDA is not available. In Colab, choose Runtime > Change runtime type > T4 GPU.'
gpu_name = torch.cuda.get_device_name(0)
gpu_mem_gb = torch.cuda.get_device_properties(0).total_memory / 1024**3
print(f'GPU: {gpu_name}')
print(f'VRAM: {gpu_mem_gb:.1f} GB')
if 'T4' not in gpu_name:
    print('Warning: This notebook is tuned for T4; adjust config if using a different GPU.')

!nvidia-smi
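
The T4 is a Turing GPU with compute capability 7.5, and native bfloat16 support starts with Ampere (8.0), so mixed-precision training on a T4 generally runs in float16. A minimal sketch of that decision follows. `pick_compute_dtype` is a hypothetical helper, not part of the project, and it takes the capability tuple as an argument so the logic can be checked without a GPU.

```python
def pick_compute_dtype(capability):
    """Return the name of a mixed-precision dtype for a CUDA compute capability.

    `capability` is a (major, minor) tuple, as returned by
    torch.cuda.get_device_capability(0).
    """
    major, _minor = capability
    # bfloat16 is natively supported from Ampere (8.x) onward; older GPUs
    # such as the T4 (7.5) fall back to float16.
    return "bfloat16" if major >= 8 else "float16"


print(pick_compute_dtype((7, 5)))  # T4
print(pick_compute_dtype((8, 0)))  # A100
```

In the notebook this could be called as `pick_compute_dtype(torch.cuda.get_device_capability(0))` right after the runtime check.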


## 2) Install Dependencies

In [None]:
!pip install -q "unsloth[colab-new]==2026.1.4"
!pip install -q "xformers<0.0.27" "trl<0.9.0" peft accelerate bitsandbytes
# Gemma 3 support was added in transformers 4.50.0, so 4.38.0 is too low a floor.
!pip install -q "torch>=2.1.0" "transformers>=4.50.0" datasets numpy scipy tqdm

print('Dependencies installed.')

## 3) Mount Drive and Sync Repository

In [None]:
from google.colab import drive
import os

drive.mount('/content/drive')

# Option A: clone from GitHub
# REPO_URL = 'https://github.com/<your-org>/<your-repo>.git'
# !git clone $REPO_URL /content/ghost_architect_gemma3

# Option B: copy from Drive (recommended if already uploaded)
# !cp -r /content/drive/MyDrive/ghost_architect_gemma3 /content/

# Create the project directory if neither option above was run, so %cd still works.
os.makedirs('/content/ghost_architect_gemma3', exist_ok=True)
%cd /content/ghost_architect_gemma3


## 4) Full T4 Trinity Training Config

In [None]:
config_yaml = '''
model_name: "unsloth/gemma-3-12b-it-bnb-4bit"
max_seq_length: 4096
load_in_4bit: true

lora:
  r: 64
  lora_alpha: 32
  target_modules:
    - q_proj
    - k_proj
    - v_proj
    - o_proj
    - gate_proj
    - up_proj
    - down_proj
  use_rslora: true
  use_dora: true
  lora_dropout: 0.1

training:
  per_device_train_batch_size: 1
  gradient_accumulation_steps: 4
  learning_rate: 2e-4
  max_steps: 60
  warmup_steps: 10
  logging_steps: 1
  save_steps: 20
  optimizer: "adamw_8bit"
  lr_scheduler_type: "cosine"

output:
  adapters_dir: "output/adapters"
  checkpoints_dir: "output/checkpoints"
  gguf_dir: "output/gguf"

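# Each fallback's value is the new setting for max_seq_length, lora.r, and lora.use_dora, in that order.
oom_fallbacks:
  - {action: reduce_seq_len, value: 2048}
  - {action: reduce_rank, value: 32}
  - {action: disable_dora, value: false}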
'''

os.makedirs('configs', exist_ok=True)
with open('configs/training_config_colab_t4.yaml', 'w') as f:
    f.write(config_yaml)

print('Wrote configs/training_config_colab_t4.yaml')
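
Two numbers implied by this configuration are worth working out: the effective batch size and the LoRA scaling factor. The sketch below assumes the standard definitions, where LoRA scales the adapter update by `lora_alpha / r` and rank-stabilized LoRA (`use_rslora: true`) scales it by `lora_alpha / sqrt(r)`. The variable names are illustrative only.

```python
import math

# Values copied from the config above.
per_device_train_batch_size = 1
gradient_accumulation_steps = 4
max_steps = 60
r = 64
lora_alpha = 32

# Sequences contributing to each optimizer step on a single GPU.
effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps
# Total sequences consumed over the whole run.
sequences_seen = effective_batch_size * max_steps

# Adapter scaling: classic LoRA vs. rank-stabilized LoRA.
lora_scale = lora_alpha / r
rslora_scale = lora_alpha / math.sqrt(r)

print(effective_batch_size, sequences_seen)  # 4 240
print(lora_scale, rslora_scale)              # 0.5 4.0
```

With `use_rslora: true` the update is scaled by 4.0 rather than 0.5, which is worth keeping in mind when comparing `learning_rate` against runs that use plain LoRA.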

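The config does not say how `oom_fallbacks` is consumed, since `src/train.py` is not written yet. One plausible reading is that each entry's `value` is the new setting for one config key, applied one step at a time after an out-of-memory error. The sketch below assumes that reading. `apply_fallback` and the action-to-key mapping are hypothetical.

```python
import copy

# Hypothetical mapping from fallback action to the config key it overrides.
ACTION_TO_KEY = {
    "reduce_seq_len": ("max_seq_length",),
    "reduce_rank": ("lora", "r"),
    "disable_dora": ("lora", "use_dora"),
}


def apply_fallback(config, fallback):
    """Return a copy of `config` with one fallback applied."""
    updated = copy.deepcopy(config)
    path = ACTION_TO_KEY[fallback["action"]]
    target = updated
    for key in path[:-1]:  # walk down to the parent mapping
        target = target[key]
    target[path[-1]] = fallback["value"]
    return updated


config = {"max_seq_length": 4096, "lora": {"r": 64, "use_dora": True}}
fallbacks = [
    {"action": "reduce_seq_len", "value": 2048},
    {"action": "reduce_rank", "value": 32},
    {"action": "disable_dora", "value": False},
]

for step in fallbacks:
    config = apply_fallback(config, step)
    print(step["action"], "->", config)
```

A training loop could catch `torch.cuda.OutOfMemoryError`, apply the next fallback, and retry, stopping once the list is exhausted.
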
## 5) Validate Dataset

In [None]:
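import json
import os

dataset_path = 'data/dataset.json'
assert os.path.exists(dataset_path), f'Missing dataset: {dataset_path}'

with open(dataset_path, 'r', encoding='utf-8') as f:
    raw = f.read().strip()

if raw:
    try:
        json.loads(raw)
    except json.JSONDecodeError as err:
        # Report the file alongside the parse position from the decoder.
        raise ValueError(f'Invalid JSON in {dataset_path}: {err}') from err
    print('Dataset JSON is valid.')
else:
    print('Dataset file is empty. Populate data/dataset.json before training.')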
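
Parsing confirms only that the file is well-formed JSON, not that it holds usable training records. The notebook does not define the record schema, so the sketch below makes a single assumption, that the dataset is a non-empty JSON array of objects, and reports which keys appear rather than requiring particular field names. `summarize_records` is a hypothetical helper.

```python
from collections import Counter


def summarize_records(records):
    """Check that `records` is a non-empty list of dicts and count their keys."""
    if not isinstance(records, list) or not records:
        raise ValueError("Expected a non-empty JSON array of records.")
    key_counts = Counter()
    for index, record in enumerate(records):
        if not isinstance(record, dict):
            raise ValueError(f"Record {index} is not a JSON object.")
        key_counts.update(record.keys())
    return len(records), dict(key_counts)


# Made-up records, purely for illustration.
sample = [{"a": 1, "b": 2}, {"a": 3}]
count, keys = summarize_records(sample)
print(count, keys)
```

A key whose count is lower than the record count is missing from at least one record, which is usually worth checking before training.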

## 6) Launch Full Training

In [None]:
# Expected CLI in future implementation:
# !python src/train.py --config configs/training_config_colab_t4.yaml --dataset data/dataset.json

print('Implement src/train.py next, then run the command above.')
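
Until `src/train.py` exists, the command above fixes only its interface: a `--config` path and a `--dataset` path. A minimal sketch of an argument parser matching that interface follows. Marking both flags as required and the help strings are assumptions, and the training logic itself is deliberately left out.

```python
import argparse


def build_parser():
    """Parser for the two flags used by the planned train.py command."""
    parser = argparse.ArgumentParser(description="Ghost Architect training entry point (sketch)")
    parser.add_argument("--config", required=True, help="Path to the YAML training config")
    parser.add_argument("--dataset", required=True, help="Path to the dataset JSON file")
    return parser


# Parse the same arguments the notebook command would pass.
args = build_parser().parse_args([
    "--config", "configs/training_config_colab_t4.yaml",
    "--dataset", "data/dataset.json",
])
print(args.config, args.dataset)
```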

## 7) Export to GGUF

In [None]:
# Expected CLI in future implementation:
# !python src/export.py --adapter_dir output/adapters --output_dir output/gguf --quantization q4_k_m

print('Implement src/export.py next, then run the command above.')
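
The export command above passes `--quantization q4_k_m`, so a future `src/export.py` will need to reject unsupported values early. The sketch below shows one way to do that with `argparse`. The list of accepted values is an illustrative subset of llama.cpp quantization types, and the default and case-folding behavior are assumptions.

```python
import argparse

# Illustrative subset of llama.cpp quantization types; extend as needed.
QUANTIZATIONS = ("q4_k_m", "q5_k_m", "q8_0", "f16")


def build_export_parser():
    """Parser for the three flags used by the planned export.py command."""
    parser = argparse.ArgumentParser(description="GGUF export entry point (sketch)")
    parser.add_argument("--adapter_dir", required=True)
    parser.add_argument("--output_dir", required=True)
    # Lowercase the input first so "Q4_K_M" and "q4_k_m" are both accepted.
    parser.add_argument("--quantization", type=str.lower,
                        choices=QUANTIZATIONS, default="q4_k_m")
    return parser


args = build_export_parser().parse_args([
    "--adapter_dir", "output/adapters",
    "--output_dir", "output/gguf",
    "--quantization", "Q4_K_M",
])
print(args.quantization)  # q4_k_m
```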