DeepSeek-R1 Deploy and Finetune: Step-by-Step Guide
- GRPO reinforcement learning on phi4/qwen models, following the DeepSeek-R1-Zero paradigm:
deepseek-r1-zero-phi4.ipynb
deepseek-r1-zero-qwen7b.ipynb
- Distillation of the phi4 model, following the DeepSeek-R1 paradigm:
deepseek_r1_distill_phi4.py
- SFT (supervised fine-tuning) of deepseek_r1_distill-qwen on medical data:
medical-finetune-DeepSeek-R1-Distill-qwen7b
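
For orientation on the GRPO item above: the core idea of GRPO is to score each sampled completion relative to the other completions sampled for the same prompt, rather than against a learned value model. The snippet below is a minimal sketch of that group-relative normalization only. It is not taken from the notebooks listed here; the function name `group_relative_advantages`, the `eps` parameter, and the use of the population standard deviation are assumptions made for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of completions for a single prompt.

    Hypothetical helper for illustration: each advantage is the reward's
    deviation from the group mean, divided by the group standard deviation.
    `eps` (an assumed value) avoids division by zero when all rewards match.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an illustrative choice
    return [(r - mu) / (sigma + eps) for r in rewards]


# Example: four completions for one prompt, rewarded 1.0 if correct else 0.0.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
```

Completions that beat the group average receive a positive advantage and the rest a negative one; when every completion in a group gets the same reward, all advantages are zero and that group contributes no learning signal.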