[examples][bug] fix silent eval max generate length not overriding by erictang000 · Pull Request #1317 · NovaSky-AI/SkyRL

erictang000 · 2026-03-11T19:52:41Z

After #1187, the eval_sampling_params.max_generate_length no longer was automatically set to sampling_params.max_generate_length in the case that eval_sampling_params was not None. (code link) This was causing example scripts to have matching training behavior but diverging eval behavior. Manually setting eval_sampling_params.max_generate_length in example scripts should fix this

Examples of divergence due to unexpected lower eval_sampling_params.max_generate_length cc: @justinvyu

gemini-code-assist

Code Review

This pull request correctly fixes an issue where eval_sampling_params.max_generate_length was not being overridden in example scripts. The fix is applied consistently across numerous scripts. However, I've noticed several changes, particularly in run_dapo_qwen3_1.7b_aime.sh and run_dapo_qwen3_30b_a3b_megatron_aime.sh, that seem unrelated to this bug fix. These include changes to resource configurations, hardcoded paths, and experiment-specific naming, which should be reverted to keep the example scripts generic and maintainable. Please address the comments to remove these out-of-scope changes.

I am having trouble creating individual review comments. Click here to see my feedback.

examples/train/algorithms/dapo/run_dapo_qwen3_30b_a3b_megatron_aime.sh (120-126)

These changes to trainer.project_name, run_name, export_path, and ckpt_path appear to be from a specific, local experimental run (e.g., project router_replay, date 02-27). Committing such personalized configurations to a shared example script should be avoided. Please revert these to their original, more generic values.

  trainer.project_name="dapo_aime" \
  trainer.run_name="dapo_qwen3_30b_a3b_base_megatron_tp${MEGATRON_TP}_pp${MEGATRON_PP}_cp${MEGATRON_CP}_ep${MEGATRON_EP}_etp${MEGATRON_ETP}" \
  trainer.export_path="$HOME/exports/dapo_qwen3_30b_a3b_base_megatron_tp${MEGATRON_TP}_pp${MEGATRON_PP}_cp${MEGATRON_CP}_ep${MEGATRON_EP}_etp${MEGATRON_ETP}" \
  trainer.hf_save_interval=300 \
  trainer.resume_mode=latest \
  trainer.max_ckpts_to_keep=3 \
  trainer.ckpt_path="$HOME/ckpts/dapo_qwen3_30b_a3b_base_megatron_tp${MEGATRON_TP}_pp${MEGATRON_PP}_cp${MEGATRON_CP}_ep${MEGATRON_EP}_etp${MEGATRON_ETP}" \

examples/train/algorithms/dapo/run_dapo_qwen3_1.7b_aime.sh (11-13)

These changes to NUM_NODES and NUM_INFERENCE_ENGINES seem unrelated to the PR's goal of fixing the max_generate_length override. This doubles the resource requirements for this example script. If this change is intentional and necessary, it should be documented in the PR description. Otherwise, to keep this PR focused on the bug fix, it would be better to revert these changes.

examples/train/algorithms/dapo/run_dapo_qwen3_30b_a3b_megatron_aime.sh (10)

Hardcoding the DATA_DIR to /mnt/cluster_storage/data/dapo makes this example script less portable. The previous use of $HOME is more generic. Please revert this change to avoid committing environment-specific paths.

DATA_DIR="$HOME/data/dapo"

examples/train/algorithms/dapo/run_dapo_qwen3_30b_a3b_megatron_aime.sh (48)

This change to MEGATRON_PP seems out of scope for this PR, which is focused on fixing max_generate_length. If this is an intentional configuration update, please provide context in the PR description. Otherwise, it should be reverted to maintain the PR's focus.

MEGATRON_PP=1

examples/train/algorithms/dapo/run_dapo_qwen3_30b_a3b_megatron_aime.sh (58-61)

These environment variable exports appear to be specific to a particular execution environment (e.g., AWS with EFA). Including them directly in the example script reduces its portability. It would be better to document these as prerequisites for running in such an environment, rather than hardcoding them here. Please consider removing these lines.

uv run --isolated --extra megatron -m examples.train.algorithms.dapo.main_dapo \

devin-ai-integration

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 1 additional finding.

# What does this PR do? Follow up to #1317 - we need to also add this to the doc pages.  --- <a href="https://app.devin.ai/review/novasky-ai/skyrl/pull/1318" target="_blank"> <picture> <source media="(prefers-color-scheme: dark)" srcset="https://static.devin.ai/assets/gh-open-in-devin-review-dark.svg?v=1"> <img src="https://static.devin.ai/assets/gh-open-in-devin-review-light.svg?v=1" alt="Open with Devin"> </picture> </a>  --------- Signed-off-by: SumanthRH <sumanthrh@anyscale.com>

erictang000 added 2 commits March 11, 2026 19:46

fix silent eval max generate length not overriding

bd3ebf3

x

0d4ccce

gemini-code-assist bot reviewed Mar 11, 2026

View reviewed changes

devin-ai-integration bot reviewed Mar 11, 2026

View reviewed changes

SumanthRH approved these changes Mar 11, 2026

View reviewed changes

erictang000 merged commit 4e19c1f into NovaSky-AI:main Mar 11, 2026
1 check passed

erictang000 deleted the eval_max_gen_length branch March 11, 2026 20:22

SumanthRH mentioned this pull request Mar 11, 2026

[docs] Add explicit eval_sampling_params.max_generate_length #1318

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[examples][bug] fix silent eval max generate length not overriding#1317

[examples][bug] fix silent eval max generate length not overriding#1317
erictang000 merged 2 commits intoNovaSky-AI:mainfrom
erictang000:eval_max_gen_length

erictang000 commented Mar 11, 2026 •

edited

Loading

Uh oh!

gemini-code-assist bot left a comment

Uh oh!

devin-ai-integration bot left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

erictang000 commented Mar 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Code Review

examples/train/algorithms/dapo/run_dapo_qwen3_30b_a3b_megatron_aime.sh (120-126)

examples/train/algorithms/dapo/run_dapo_qwen3_1.7b_aime.sh (11-13)

examples/train/algorithms/dapo/run_dapo_qwen3_30b_a3b_megatron_aime.sh (10)

examples/train/algorithms/dapo/run_dapo_qwen3_30b_a3b_megatron_aime.sh (48)

examples/train/algorithms/dapo/run_dapo_qwen3_30b_a3b_megatron_aime.sh (58-61)

Uh oh!

devin-ai-integration bot left a comment

Choose a reason for hiding this comment

✅ Devin Review: No Issues Found

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

erictang000 commented Mar 11, 2026 •

edited

Loading