Describe the bug
During large-scale RLVR (reinforcement learning with verifiable rewards) training, we identified four critical issues affecting the stability and correctness of the ROLL framework:
- LoRA Weight Inconsistency: When training with LoRA adapters, parameter updates are not correctly gathered and broadcast from training nodes to rollout workers. As a result, the inference phase runs on outdated base model weights.
- State Recovery Failure: When resuming training, DynamicSamplingScheduler calls get_next_dataset_item() before dataset_iter is initialized in the init sequence, raising an AttributeError (see Logs).
- DeepSpeed Group Initialization: DeepSpeed initialization fails when it receives an empty parameter group, which is common when layers are frozen for LoRA.
- Ray Metadata Overflow: Long-running sessions can exhaust the system's /tmp partition. The framework currently offers no way to redirect Ray's temporary directory.
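
For the last item, Ray itself accepts a temporary-directory override (the _temp_dir argument of ray.init(), or --temp-dir on ray start), so one possible fix is for the framework to forward a configurable path. A minimal sketch, assuming a hypothetical ROLL_RAY_TEMP_DIR environment variable and a hypothetical helper name, neither of which exists in ROLL today:

```python
import os


def ray_init_kwargs(env=None):
    """Build keyword arguments for ray.init() with an optional temp-dir override.

    ROLL_RAY_TEMP_DIR is a hypothetical variable name used only for illustration.
    """
    env = os.environ if env is None else env
    kwargs = {}
    temp_dir = env.get("ROLL_RAY_TEMP_DIR", "").strip()
    if temp_dir:
        # Normalize to an absolute path before handing it to Ray.
        kwargs["_temp_dir"] = os.path.abspath(os.path.expanduser(temp_dir))
    return kwargs


# With the variable unset, Ray keeps its default location.
default_kwargs = ray_init_kwargs({})
# With the variable set, the path is forwarded as _temp_dir.
custom_kwargs = ray_init_kwargs({"ROLL_RAY_TEMP_DIR": "/data/ray_tmp"})

# Usage (requires Ray; not executed here):
#   import ray
#   ray.init(**ray_init_kwargs())
```

When workers join an already running cluster, the directory would presumably have to be set where the head node is started rather than in each worker.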
Logs
(DynamicSamplingScheduler) Traceback (most recent call last):
  File ".../roll/distributed/scheduler/generate_scheduler.py", line 478, in init
    self.get_next_dataset_item()
  File ".../roll/distributed/scheduler/generate_scheduler.py", line 727, in get_next_dataset_item
    if self.dataset_iter is None:
AttributeError: 'DynamicSamplingScheduler' object has no attribute 'dataset_iter'
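
The traceback shows get_next_dataset_item() reading self.dataset_iter before the attribute exists. One way to avoid this is to define the attribute before anything in the constructor can reach that method. A minimal sketch using a hypothetical stand-in class (SchedulerSketch is not the ROLL class, and the replay-on-resume loop is only an assumption about why the constructor calls the method):

```python
class SchedulerSketch:
    """Hypothetical stand-in for DynamicSamplingScheduler, for illustration only."""

    def __init__(self, dataset, resume_index=0):
        # Assumes a non-empty, re-iterable dataset.
        self.dataset = dataset
        # Define the attribute before any method that reads it can run.
        self.dataset_iter = None
        self.consumed = 0
        # On resume, replay the items that were already consumed.
        for _ in range(resume_index):
            self.get_next_dataset_item()

    def get_next_dataset_item(self):
        if self.dataset_iter is None:
            self.dataset_iter = iter(self.dataset)
        try:
            item = next(self.dataset_iter)
        except StopIteration:
            # Dataset exhausted: start a new pass over it.
            self.dataset_iter = iter(self.dataset)
            item = next(self.dataset_iter)
        self.consumed += 1
        return item


# Resuming after two consumed items continues with the third one.
scheduler = SchedulerSketch([10, 20, 30], resume_index=2)
first_after_resume = scheduler.get_next_dataset_item()
```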
DeepSpeed Error:
ValueError: optimizer got an empty parameter list
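
A possible guard for this error is to drop frozen parameters and empty groups before the optimizer is constructed. The sketch below works on the list-of-dicts parameter group layout that PyTorch optimizers accept. The helper name is hypothetical, and SimpleNamespace objects stand in for real parameters so the example runs without PyTorch:

```python
from types import SimpleNamespace


def build_param_groups(param_groups,
                       is_trainable=lambda p: getattr(p, "requires_grad", True)):
    """Keep only trainable parameters and drop groups that end up empty."""
    kept = []
    for group in param_groups:
        params = [p for p in group.get("params", []) if is_trainable(p)]
        if params:
            # Preserve the group's other options (lr, weight_decay, ...).
            kept.append({**group, "params": params})
    if not kept:
        raise ValueError(
            "no trainable parameters left after filtering; "
            "check which modules the LoRA adapters target"
        )
    return kept


# Stand-ins for a frozen base weight and a trainable LoRA weight.
frozen = SimpleNamespace(requires_grad=False)
lora_a = SimpleNamespace(requires_grad=True)

groups = build_param_groups([
    {"params": [frozen], "weight_decay": 0.0},           # becomes empty, dropped
    {"params": [frozen, lora_a], "weight_decay": 0.01},  # keeps only lora_a
])
```

Raising a dedicated error when nothing is trainable makes a misconfigured adapter easier to spot than the generic optimizer message.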
Environment:
- Hardware: NVIDIA H200 Cluster
- Backend: DeepSpeed + Ray