Gen-Verse / ReasonFlux (351 stars). ReasonFlux: Hierarchical LLM Reasoning via Scaling Thought Templates. Topics: reinforcement-learning, chain-of-thought, llm-rlhf, sft-data, o1-mini, o1-preview, deepseek-v3, deepseek-r1. Updated Mar 22, 2025. Language: Python.
ssbuild / llm_rlhf (26 stars). Implements reinforcement learning training for LLMs such as GPT-2, LLaMA, BLOOM, and others. Topics: lora, reward, trl, llm, rlhf, trlx, llm-rlhf. Updated Sep 19, 2023. Language: Python.
Evil-cyber65 / ReasonFlux (0 stars). ReasonFlux: Hierarchical LLM Reasoning via Scaling Thought Templates. Topics: reinforcement-learning, chain-of-thought, llm-rlhf, sft-data, o1-mini, o1-preview, deepseek-v3, deepseek-r1. Updated Mar 23, 2025.