Insights: huggingface/open-r1
Overview
6 Pull requests merged by 3 people
- Async code reward fixes (#546, merged Mar 28, 2025)
- fix dataset parsing error (#540, merged Mar 28, 2025)
- Restore single-node instructions to run GRPO (#549, merged Mar 27, 2025)
- [WIP] RL goes brrr (#533, merged Mar 24, 2025)
- Fixes missing exception in run_script (#532, merged Mar 24, 2025)
- fix get_reward_funcs bug (#535, merged Mar 22, 2025)
1 Pull request opened by 1 person
- Configurable reward functions (#552, opened Mar 27, 2025)
7 Issues closed by 5 people
- log info (#554, closed Mar 28, 2025)
- Stuck in evaluation after maximum concurrency info (#551, closed Mar 28, 2025)
- How to train GRPO on 2 nodes (16 GPUs) (#370, closed Mar 26, 2025)
- Error with latest setup.py: trl.extras.vllm_client - Server is not up yet (#543, closed Mar 25, 2025)
- Stuck at lighteval "COMPUTING METRICS" (#531, closed Mar 25, 2025)
- `trl` version mismatch (#541, closed Mar 24, 2025)
- NCCL problem occurred when multiple GPU cards are saving model.safetensors (#160, closed Mar 24, 2025)
12 Issues opened by 12 people
- accuracy_reward: difference in ordering of arguments in verify? (#557, opened Mar 28, 2025)
- Bug: sft.py Doesn't Work for Non-Qwen Models & Has Issues with Generation (#556, opened Mar 28, 2025)
- The responses are always "!!!!!!!!!!!!!!!!!!!!!!!!!" during GRPO training (#555, opened Mar 28, 2025)
- Cannot Resume Training From a Trained Checkpoint. Is this a bug? (#553, opened Mar 28, 2025)
- Please fix GRPO default config bug (#550, opened Mar 27, 2025)
- Memory usage of different lengths (#548, opened Mar 26, 2025)
- Proposal: extensible reward functions (#547, opened Mar 26, 2025)
- Saving checkpoints error during GRPO (#544, opened Mar 25, 2025)
- SFT on Qwen2.5-1.5B-Instruct fails (#539, opened Mar 24, 2025)
- Error when setting packing=false (#536, opened Mar 22, 2025)
25 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
- 🦜 Enhance repetition penalty reward for languages that cannot be split by whitespace (#516, commented on Mar 25, 2025 • 2 new comments)
- SFT: learn to generate EOS token (#494, commented on Mar 25, 2025 • 0 new comments)
- Extend max_model_length to prevent context truncation (#463, commented on Mar 27, 2025 • 0 new comments)
- Resolve double BOS token issue (#462, commented on Mar 27, 2025 • 0 new comments)
- [DO NOT MERGE] SFT configs for Qwen coder models (#438, commented on Mar 23, 2025 • 0 new comments)
- New GRPO dataset and tasks: formally-verified program correctness (#379, commented on Mar 23, 2025 • 0 new comments)
- Crazy VRAM usage with longer prompts (#47, commented on Mar 27, 2025 • 0 new comments)
- lighteval script failed (#468, commented on Mar 27, 2025 • 0 new comments)
- The KL divergence collapses but the format reward becomes larger (#373, commented on Mar 27, 2025 • 0 new comments)
- Is it normal for a 1.5B model on an H100 80G to require several hundred hours for LiveCodeBench? (#466, commented on Mar 26, 2025 • 0 new comments)
- Prefix Caching should be turned off for GRPO (#491, commented on Mar 26, 2025 • 0 new comments)
- Instead of rising steadily, the reward fluctuates wildly (#403, commented on Mar 26, 2025 • 0 new comments)
- Does anyone have a working SFT training script for 1xH100? - OOM Error (#332, commented on Mar 26, 2025 • 0 new comments)
- GRPO with multiple GPUs got stuck (#478, commented on Mar 26, 2025 • 0 new comments)
- How to increase the context window from 4k to 32k on Qwen models? (#444, commented on Mar 26, 2025 • 0 new comments)
- [Installation] Failed to Build vllm on ARM Architecture with uv pip install - Unknown Runtime Environment (#510, commented on Mar 25, 2025 • 0 new comments)
- GRPO OOM (#475, commented on Mar 25, 2025 • 0 new comments)
- Evaluate GRPO vs. other RL algorithms (#11, commented on Mar 25, 2025 • 0 new comments)
- Can I use two GPUs for vLLM? (#471, commented on Mar 25, 2025 • 0 new comments)
- failed (exitcode: -8) local_rank: 6 (pid: 58423) of binary: /opt/miniconda/bin/python when running GRPO (#254, commented on Mar 25, 2025 • 0 new comments)
- SFT model makes repetitions during the inference phase (#492, commented on Mar 24, 2025 • 0 new comments)
- Fail to parse gold solution (#503, commented on Mar 24, 2025 • 0 new comments)
- Different max_position_embeddings and rope_theta in OpenR1-Qwen-7B-SFT and its base Qwen2.5-Math-7B-Instruct? (#469, commented on Mar 23, 2025 • 0 new comments)
- OOM: SFT of Qwen2.5-1.5B-Instruct on OpenR1-Math-220k (#506, commented on Mar 22, 2025 • 0 new comments)
- After completing Step-1 training using the given example of Qwen2.5-1.5B-Instruct, the performance has decreased. Is this normal? (#355, commented on Mar 22, 2025 • 0 new comments)