Skip to content

[examples][bug] fix silent eval max generate length not overriding#1317

Merged
erictang000 merged 2 commits intoNovaSky-AI:mainfrom
erictang000:eval_max_gen_length
Mar 11, 2026
Merged

[examples][bug] fix silent eval max generate length not overriding#1317
erictang000 merged 2 commits intoNovaSky-AI:mainfrom
erictang000:eval_max_gen_length

Conversation

@erictang000
Copy link
Collaborator

@erictang000 erictang000 commented Mar 11, 2026

After #1187, the eval_sampling_params.max_generate_length no longer was automatically set to sampling_params.max_generate_length in the case that eval_sampling_params was not None. (code link) This was causing example scripts to have matching training behavior but diverging eval behavior. Manually setting eval_sampling_params.max_generate_length in example scripts should fix this

Examples of divergence due to unexpected lower eval_sampling_params.max_generate_length cc: @justinvyu
image
image


Open with Devin

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request correctly fixes an issue where eval_sampling_params.max_generate_length was not being overridden in example scripts. The fix is applied consistently across numerous scripts. However, I've noticed several changes, particularly in run_dapo_qwen3_1.7b_aime.sh and run_dapo_qwen3_30b_a3b_megatron_aime.sh, that seem unrelated to this bug fix. These include changes to resource configurations, hardcoded paths, and experiment-specific naming, which should be reverted to keep the example scripts generic and maintainable. Please address the comments to remove these out-of-scope changes.

I am having trouble creating individual review comments. Click here to see my feedback.

examples/train/algorithms/dapo/run_dapo_qwen3_30b_a3b_megatron_aime.sh (120-126)

high

These changes to trainer.project_name, run_name, export_path, and ckpt_path appear to be from a specific, local experimental run (e.g., project router_replay, date 02-27). Committing such personalized configurations to a shared example script should be avoided. Please revert these to their original, more generic values.

  trainer.project_name="dapo_aime" \
  trainer.run_name="dapo_qwen3_30b_a3b_base_megatron_tp${MEGATRON_TP}_pp${MEGATRON_PP}_cp${MEGATRON_CP}_ep${MEGATRON_EP}_etp${MEGATRON_ETP}" \
  trainer.export_path="$HOME/exports/dapo_qwen3_30b_a3b_base_megatron_tp${MEGATRON_TP}_pp${MEGATRON_PP}_cp${MEGATRON_CP}_ep${MEGATRON_EP}_etp${MEGATRON_ETP}" \
  trainer.hf_save_interval=300 \
  trainer.resume_mode=latest \
  trainer.max_ckpts_to_keep=3 \
  trainer.ckpt_path="$HOME/ckpts/dapo_qwen3_30b_a3b_base_megatron_tp${MEGATRON_TP}_pp${MEGATRON_PP}_cp${MEGATRON_CP}_ep${MEGATRON_EP}_etp${MEGATRON_ETP}" \

examples/train/algorithms/dapo/run_dapo_qwen3_1.7b_aime.sh (11-13)

medium

These changes to NUM_NODES and NUM_INFERENCE_ENGINES seem unrelated to the PR's goal of fixing the max_generate_length override. This doubles the resource requirements for this example script. If this change is intentional and necessary, it should be documented in the PR description. Otherwise, to keep this PR focused on the bug fix, it would be better to revert these changes.

examples/train/algorithms/dapo/run_dapo_qwen3_30b_a3b_megatron_aime.sh (10)

medium

Hardcoding the DATA_DIR to /mnt/cluster_storage/data/dapo makes this example script less portable. The previous use of $HOME is more generic. Please revert this change to avoid committing environment-specific paths.

DATA_DIR="$HOME/data/dapo"

examples/train/algorithms/dapo/run_dapo_qwen3_30b_a3b_megatron_aime.sh (48)

medium

This change to MEGATRON_PP seems out of scope for this PR, which is focused on fixing max_generate_length. If this is an intentional configuration update, please provide context in the PR description. Otherwise, it should be reverted to maintain the PR's focus.

MEGATRON_PP=1

examples/train/algorithms/dapo/run_dapo_qwen3_30b_a3b_megatron_aime.sh (58-61)

medium

These environment variable exports appear to be specific to a particular execution environment (e.g., AWS with EFA). Including them directly in the example script reduces its portability. It would be better to document these as prerequisites for running in such an environment, rather than hardcoding them here. Please consider removing these lines.

uv run --isolated --extra megatron -m examples.train.algorithms.dapo.main_dapo \

Copy link
Contributor

@devin-ai-integration devin-ai-integration bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 1 additional finding.

Open in Devin Review

@erictang000 erictang000 merged commit 4e19c1f into NovaSky-AI:main Mar 11, 2026
1 check passed
@erictang000 erictang000 deleted the eval_max_gen_length branch March 11, 2026 20:22
SumanthRH added a commit that referenced this pull request Mar 11, 2026
# What does this PR do?

Follow up to #1317 - we need to also add this to the doc pages. 
<!-- devin-review-badge-begin -->

---

<a href="https://app.devin.ai/review/novasky-ai/skyrl/pull/1318"
target="_blank">
  <picture>
<source media="(prefers-color-scheme: dark)"
srcset="https://static.devin.ai/assets/gh-open-in-devin-review-dark.svg?v=1">
<img
src="https://static.devin.ai/assets/gh-open-in-devin-review-light.svg?v=1"
alt="Open with Devin">
  </picture>
</a>
<!-- devin-review-badge-end -->

---------

Signed-off-by: SumanthRH <sumanthrh@anyscale.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants