# ReFeynman Training Pipeline
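## Complete SFT + GRPO Training Workflow

**Runtime**: Google Colab (free T4 GPU)

**Training steps**:
1. Environment setup
2. Data generation (Gemini API)
3. SFT fine-tuning (LoRA)
4. GRPO reinforcement learning
5. Model evaluation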

## Step 1: Environment Setup

In [None]:
# Clone the repository
!git clone https://github.com/SeanDF333/ReFeynman.git
%cd ReFeynman

In [None]:
# Install dependencies
!pip install -q -r requirements.txt

In [None]:
# Configure environment variables
import os
from google.colab import userdata

# Add these keys under Colab Secrets first
os.environ['HF_TOKEN'] = userdata.get('HF_TOKEN')
os.environ['GEMINI_API_KEY'] = userdata.get('GEMINI_API_KEY')

# Or set them directly (insecure, for testing only)
# os.environ['HF_TOKEN'] = 'your_token_here'
# os.environ['GEMINI_API_KEY'] = 'your_key_here'
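
Before launching the longer steps, it can help to confirm that both keys actually reached the environment. The following is a minimal sketch, not part of the repository: the helper name `missing_env_vars` is hypothetical, and it assumes the later scripts read `HF_TOKEN` and `GEMINI_API_KEY` from `os.environ`.

```python
import os


def missing_env_vars(required, env=None):
    """Return the names in `required` that are unset or empty in `env`."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


# The two keys this notebook sets above.
missing = missing_env_vars(["HF_TOKEN", "GEMINI_API_KEY"])
if missing:
    print("Missing secrets:", ", ".join(missing))
else:
    print("All required secrets are set.")
```

Passing a plain dictionary as `env` makes the helper easy to check without touching the real environment.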

In [None]:
# Check the GPU
!nvidia-smi

## Step 2: Generate Training Data

In [None]:
# Generate Feynman-style dialogue data with Gemini
!python data/generate_data.py

In [None]:
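# Inspect the generated data
import json

# Read only the first record; explicit UTF-8 avoids platform-dependent decoding
with open('data/train.jsonl', 'r', encoding='utf-8') as f:
    sample = json.loads(f.readline())

print("Sample dialogue:")
# ensure_ascii=False keeps any non-ASCII text readable in the output
print(json.dumps(sample, indent=2, ensure_ascii=False))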
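
Looking at one record does not show whether the whole file is well formed. The sketch below scans every line, counts the records, tallies the top-level keys, and reports lines that fail to parse. The helper `summarize_jsonl` is hypothetical and makes no assumption about which fields `data/generate_data.py` writes.

```python
import json
import os
from collections import Counter


def summarize_jsonl(path):
    """Return (record count, top-level key counts, line numbers that failed to parse)."""
    n_records, key_counts, bad_lines = 0, Counter(), []
    with open(path, "r", encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # skip blank lines without counting them as errors
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                bad_lines.append(line_no)
                continue
            n_records += 1
            if isinstance(record, dict):
                key_counts.update(record.keys())
    return n_records, dict(key_counts), bad_lines


# Only runs if the data file from the previous cell exists.
if os.path.exists("data/train.jsonl"):
    n, keys, bad = summarize_jsonl("data/train.jsonl")
    print(f"{n} records, keys: {keys}, unparseable lines: {bad}")
```

A key that appears in fewer records than the total count is a quick hint that some generated samples are incomplete.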

## Step 3: SFT Training (LoRA)

In [None]:
# Run SFT training
# Estimated time: 2-3 hours (T4 GPU)
!python models/sft_trainer.py
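
LoRA fits on a T4 because it trains two small low-rank matrices per adapted weight instead of the full matrix. The arithmetic below is a sketch of that saving. The layer size and rank are hypothetical values chosen for illustration, not read from `models/sft_trainer.py`.

```python
def lora_added_params(d_out, d_in, rank):
    """Parameters LoRA adds to one d_out x d_in weight: B (d_out x rank) plus A (rank x d_in)."""
    return rank * (d_out + d_in)


# Hypothetical sizes for illustration only, not taken from the repository's config.
d_out, d_in, rank = 4096, 4096, 16
full = d_out * d_in
added = lora_added_params(d_out, d_in, rank)
print(f"full matrix:  {full:,} params")
print(f"LoRA adapter: {added:,} params ({added / full:.2%} of the full matrix)")
```

The ratio shrinks further as the layer grows, since the adapter scales with `d_out + d_in` while the full matrix scales with `d_out * d_in`.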

## Step 4: GRPO Training

In [None]:
# Run GRPO reinforcement learning
# Estimated time: 1-2 hours
!python models/grpo_trainer.py
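
GRPO scores each sampled completion relative to the other completions drawn for the same prompt, rather than against a learned value function. The sketch below illustrates that group normalization in plain Python. It is not taken from `models/grpo_trainer.py`, whose reward function and hyperparameters are not shown here. The function name, the `eps` value, and the use of the population standard deviation are assumptions.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize one prompt's rewards to zero mean and (roughly) unit variance."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; some implementations use the sample std
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four completions sampled from the same prompt.
rewards = [1.0, 0.0, 0.0, 1.0]
for r, a in zip(rewards, group_relative_advantages(rewards)):
    print(f"reward={r:.1f} -> advantage={a:+.3f}")
```

When every completion in a group receives the same reward, all advantages are zero and that group contributes no learning signal, which is one reason the reward function needs to separate good answers from bad ones.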

## Step 5: Test the Model

In [None]:
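# Load the trained model
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel
import torch

model_path = "checkpoints/grpo_final"

tokenizer = AutoTokenizer.from_pretrained(model_path)

# Pass 4-bit settings through BitsAndBytesConfig; the bare load_in_4bit
# argument is deprecated. float16 compute is used because the T4 has no
# native bfloat16 support.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-7B-Instruct",
    quantization_config=bnb_config,
    device_map="auto",
)
# Attach the LoRA adapter weights from the GRPO checkpoint
model = PeftModel.from_pretrained(model, model_path)
model.eval()

print("✅ Model loaded!")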

In [None]:
# Test inference
def ask_feynman(question: str) -> str:
    """Generate a reply to a single user question using the ChatML prompt format."""
    prompt = f"<|im_start|>user\n{question}<|im_end|>\n<|im_start|>assistant\n"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

    outputs = model.generate(
        **inputs,
        max_new_tokens=400,
        temperature=0.7,
        top_p=0.9,
        do_sample=True
    )

    # Decode only the newly generated tokens, skipping the prompt
    response = tokenizer.decode(outputs[0][inputs['input_ids'].shape[1]:], skip_special_tokens=True)
    return response

# Test questions
questions = [
    "Can you explain quantum entanglement like I'm 10 years old?",
    "What is calculus and why should I care?",
    "How do neural networks actually learn?"
]

for q in questions:
    print(f"\n{'='*60}")
    print(f"Q: {q}")
    print(f"{'='*60}")
    print(f"Feynman: {ask_feynman(q)}")
    print()
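
The prompt string in `ask_feynman` handles a single user turn. For multi-turn tests, the same ChatML layout can be assembled from a list of messages, as in the sketch below. The helper `build_chatml_prompt` is hypothetical. In practice `tokenizer.apply_chat_template` from `transformers` is the usual way to build this string, and its output may differ, for example by adding a system message.

```python
def build_chatml_prompt(messages):
    """Render a list of {'role', 'content'} dicts as ChatML, ending with an open assistant turn."""
    parts = [
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"
        for m in messages
    ]
    parts.append("<|im_start|>assistant\n")  # leave the assistant turn open for generation
    return "".join(parts)


# Hypothetical two-turn conversation followed by a new question.
history = [
    {"role": "user", "content": "What is calculus and why should I care?"},
    {"role": "assistant", "content": "It is the study of how things change."},
    {"role": "user", "content": "Can you give me an everyday example?"},
]
print(build_chatml_prompt(history))
```

With a single user message, the helper produces the same string as the f-string inside `ask_feynman`.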

## Step 6: Save to the Hugging Face Hub (Optional)

In [None]:
# Push to the Hugging Face Hub
# model.push_to_hub("YourUsername/ReFeynman-7B-GRPO")
# tokenizer.push_to_hub("YourUsername/ReFeynman-7B-GRPO")

## Step 7: Download the Model Locally

In [None]:
# Compress the checkpoint
!zip -r refeynman_checkpoint.zip checkpoints/grpo_final/

# Download
from google.colab import files
files.download('refeynman_checkpoint.zip')