[examples][bug] fix silent eval max generate length not overriding#1317
[examples][bug] fix silent eval max generate length not overriding#1317erictang000 merged 2 commits intoNovaSky-AI:mainfrom
Conversation
There was a problem hiding this comment.
Code Review
This pull request correctly fixes an issue where eval_sampling_params.max_generate_length was not being overridden in example scripts. The fix is applied consistently across numerous scripts. However, I've noticed several changes, particularly in run_dapo_qwen3_1.7b_aime.sh and run_dapo_qwen3_30b_a3b_megatron_aime.sh, that seem unrelated to this bug fix. These include changes to resource configurations, hardcoded paths, and experiment-specific naming, which should be reverted to keep the example scripts generic and maintainable. Please address the comments to remove these out-of-scope changes.
I am having trouble creating individual review comments. Click here to see my feedback.
examples/train/algorithms/dapo/run_dapo_qwen3_30b_a3b_megatron_aime.sh (120-126)
These changes to trainer.project_name, run_name, export_path, and ckpt_path appear to be from a specific, local experimental run (e.g., project router_replay, date 02-27). Committing such personalized configurations to a shared example script should be avoided. Please revert these to their original, more generic values.
trainer.project_name="dapo_aime" \
trainer.run_name="dapo_qwen3_30b_a3b_base_megatron_tp${MEGATRON_TP}_pp${MEGATRON_PP}_cp${MEGATRON_CP}_ep${MEGATRON_EP}_etp${MEGATRON_ETP}" \
trainer.export_path="$HOME/exports/dapo_qwen3_30b_a3b_base_megatron_tp${MEGATRON_TP}_pp${MEGATRON_PP}_cp${MEGATRON_CP}_ep${MEGATRON_EP}_etp${MEGATRON_ETP}" \
trainer.hf_save_interval=300 \
trainer.resume_mode=latest \
trainer.max_ckpts_to_keep=3 \
trainer.ckpt_path="$HOME/ckpts/dapo_qwen3_30b_a3b_base_megatron_tp${MEGATRON_TP}_pp${MEGATRON_PP}_cp${MEGATRON_CP}_ep${MEGATRON_EP}_etp${MEGATRON_ETP}" \
examples/train/algorithms/dapo/run_dapo_qwen3_1.7b_aime.sh (11-13)
These changes to NUM_NODES and NUM_INFERENCE_ENGINES seem unrelated to the PR's goal of fixing the max_generate_length override. This doubles the resource requirements for this example script. If this change is intentional and necessary, it should be documented in the PR description. Otherwise, to keep this PR focused on the bug fix, it would be better to revert these changes.
examples/train/algorithms/dapo/run_dapo_qwen3_30b_a3b_megatron_aime.sh (10)
Hardcoding the DATA_DIR to /mnt/cluster_storage/data/dapo makes this example script less portable. The previous use of $HOME is more generic. Please revert this change to avoid committing environment-specific paths.
DATA_DIR="$HOME/data/dapo"
examples/train/algorithms/dapo/run_dapo_qwen3_30b_a3b_megatron_aime.sh (48)
This change to MEGATRON_PP seems out of scope for this PR, which is focused on fixing max_generate_length. If this is an intentional configuration update, please provide context in the PR description. Otherwise, it should be reverted to maintain the PR's focus.
MEGATRON_PP=1
examples/train/algorithms/dapo/run_dapo_qwen3_30b_a3b_megatron_aime.sh (58-61)
These environment variable exports appear to be specific to a particular execution environment (e.g., AWS with EFA). Including them directly in the example script reduces its portability. It would be better to document these as prerequisites for running in such an environment, rather than hardcoding them here. Please consider removing these lines.
uv run --isolated --extra megatron -m examples.train.algorithms.dapo.main_dapo \
# What does this PR do? Follow up to #1317 - we need to also add this to the doc pages. <!-- devin-review-badge-begin --> --- <a href="https://app.devin.ai/review/novasky-ai/skyrl/pull/1318" target="_blank"> <picture> <source media="(prefers-color-scheme: dark)" srcset="https://static.devin.ai/assets/gh-open-in-devin-review-dark.svg?v=1"> <img src="https://static.devin.ai/assets/gh-open-in-devin-review-light.svg?v=1" alt="Open with Devin"> </picture> </a> <!-- devin-review-badge-end --> --------- Signed-off-by: SumanthRH <sumanthrh@anyscale.com>
After #1187, the
eval_sampling_params.max_generate_lengthno longer was automatically set tosampling_params.max_generate_lengthin the case thateval_sampling_paramswas notNone. (code link) This was causing example scripts to have matching training behavior but diverging eval behavior. Manually settingeval_sampling_params.max_generate_lengthin example scripts should fix thisExamples of divergence due to unexpected lower


eval_sampling_params.max_generate_lengthcc: @justinvyu