【CI】update rl cases #1835

Merged
hhaAndroid merged 11 commits into main from ci/rl_new
May 27, 2026

@kkscilife
Collaborator

No description provided.

@kkscilife kkscilife requested a review from hhaAndroid May 25, 2026 11:11
@kkscilife
Collaborator Author

@claude review

Comment thread autotest/module/train.py Outdated
return command, config
elif train_type == "rl":
infer_type = config.get("parameters", {}).get("infer_backend", "lmdeploy")
acceleator = config.get("parameters", {}).get("acceleator", "GPU")
Contributor

Claude: Typo: acceleator should be accelerator — both the variable name and the config key lookup string are misspelled. This means if someone adds accelerator: NPU to config.yaml, it won't be matched.

Suggested change
acceleator = config.get("parameters", {}).get("acceleator", "GPU")
accelerator = config.get("parameters", {}).get("accelerator", "GPU")
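
To illustrate why the misspelled key matters, here is a minimal sketch. The helper `read_accelerator` and the sample `config` dict are hypothetical; they only mirror the nested `config.get(...)` lookup quoted above and are not taken from autotest/module/train.py.

```python
def read_accelerator(config: dict, key: str = "accelerator") -> str:
    """Look up the accelerator like the quoted line does, defaulting to GPU."""
    return config.get("parameters", {}).get(key, "GPU")


# Hypothetical config, as if someone had added `accelerator: NPU` to config.yaml.
config = {"parameters": {"accelerator": "NPU"}}

# The misspelled key is absent from the dict, so the default silently wins.
misspelled = read_accelerator(config, key="acceleator")
# The correctly spelled key is present, so the configured value is returned.
corrected = read_accelerator(config)

print(misspelled, corrected)  # prints: GPU NPU
```

No exception is raised in the misspelled case, which is why the problem would only show up once a non-default accelerator is configured.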

Collaborator Author

Done

Comment thread autotest/module/train.py Outdated
f"cd {current_dir}; pwd; pip install -e .[all]; export GITHUB_RUN_ID={config.get('run_id')}; export WORK_DIR={work_dir}; "
+ cudnn_patch
+ f"bash -x examples/v1/scripts/run_rl.sh {config_path} {infer_type} ${{MODEL_PATH}} ${{DATA_PATH}} ${{EVAL_DATA_PATH}}"
+ f"bash -x autotest/utils/ci_run_rl.sh {acceleator} {infer_type} {config_path} ${{MODEL_PATH}} ${{DATA_PATH}} ${{EVAL_DATA_PATH}}"
Contributor

Claude: Also needs to be updated to match the corrected variable name:

Suggested change
+ f"bash -x autotest/utils/ci_run_rl.sh {acceleator} {infer_type} {config_path} ${{MODEL_PATH}} ${{DATA_PATH}} ${{EVAL_DATA_PATH}}"
+ f"bash -x autotest/utils/ci_run_rl.sh {accelerator} {infer_type} {config_path} ${{MODEL_PATH}} ${{DATA_PATH}} ${{EVAL_DATA_PATH}}"
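
As a rough sketch of how the command string above is assembled: inside a Python f-string, `{name}` is substituted by Python, while doubled braces such as `${{MODEL_PATH}}` are emitted as the literal `${MODEL_PATH}` and left for the shell to expand. The function name `build_rl_command` and the sample arguments are assumptions made for illustration only.

```python
def build_rl_command(accelerator: str, infer_type: str, config_path: str) -> str:
    """Build the CI launch command; shell variables stay unexpanded in the result."""
    return (
        f"bash -x autotest/utils/ci_run_rl.sh {accelerator} {infer_type} {config_path} "
        # Doubled braces are f-string escapes, so these become ${MODEL_PATH} etc.
        f"${{MODEL_PATH}} ${{DATA_PATH}} ${{EVAL_DATA_PATH}}"
    )


# Hypothetical arguments; the real values come from the test config.
command = build_rl_command("GPU", "lmdeploy", "path/to/config.py")
print(command)
```

With the variable renamed to `accelerator`, a leftover `{acceleator}` placeholder in the f-string would raise a `NameError` at runtime, so both places need to change together.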

Collaborator Author

Done

@@ -0,0 +1,208 @@
set -ex
Contributor

Claude: Missing shebang line. The script should start with #!/usr/bin/env bash (or #!/bin/bash) to explicitly declare the shell interpreter. Without it, the script behavior depends on whichever shell the caller uses.

Suggested change
set -ex
#!/usr/bin/env bash
set -ex
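
A check like the one described could be automated with a few lines of Python. This is only a sketch: `has_shebang` is a hypothetical helper, not something that exists in the repository, and the two sample scripts are made up.

```python
def has_shebang(script_text: str) -> bool:
    """Return True when the very first line of the script starts with '#!'."""
    first_line = script_text.split("\n", 1)[0]
    return first_line.startswith("#!")


# Made-up script contents for illustration.
without_shebang = "set -ex\necho hello\n"
with_shebang = "#!/usr/bin/env bash\nset -ex\necho hello\n"

print(has_shebang(without_shebang), has_shebang(with_shebang))  # prints: False True
```

The shebang only counts when it is on the first line, which is why the sketch inspects nothing else.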

Collaborator Author

Ignore; copied from the original script

Comment on lines +82 to 89
model_cfg.text_config.balancing_loss_cfg = None
if hasattr(model_cfg.text_config, "z_loss_cfg"):
model_cfg.text_config.z_loss_cfg = None
optim_cfg = AdamWConfig(lr=1e-6, foreach=False, weight_decay=0.1)
loss_cfg = GRPOLossConfig(
policy_loss_cfg=dict(
cliprange_high=0.2,
cliprange_high=0.28,
cliprange_low=0.2,
Contributor

Claude: Inconsistent indentation — the inner dict entries use 12 spaces while the outer list uses 4. This looks like it was intended to be 8-space aligned (matching the GSM8K config style). Consider reformatting for consistency:

Suggested change
model_cfg.text_config.balancing_loss_cfg = None
if hasattr(model_cfg.text_config, "z_loss_cfg"):
model_cfg.text_config.z_loss_cfg = None
optim_cfg = AdamWConfig(lr=1e-6, foreach=False, weight_decay=0.1)
loss_cfg = GRPOLossConfig(
policy_loss_cfg=dict(
cliprange_high=0.2,
cliprange_high=0.28,
cliprange_low=0.2,
"dataset": DatasetConfig(
name="geo3k",
anno_path=data_path,
class_name="VLMJsonlDataset",
media_root=media_root,
sample_ratio=1.0,
),
"tokenize_fn": RLQwen3VLTokenizeFnConfig(processor_path=model_path,
max_length=max_prompt_length),

Also, per project style (CLAUDE.md), use double quotes "VLMJsonlDataset" instead of single quotes 'VLMJsonlDataset'.
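
For the quote-style point, one way to find single-quoted string literals is to walk the file with the standard-library `tokenize` module. The sketch below is an assumption about how such a check could look; `single_quoted_strings` is a hypothetical helper and the sample source is invented. It does not describe the project's actual lint setup.

```python
import io
import tokenize


def single_quoted_strings(source: str) -> list[tuple[int, str]]:
    """Return (line number, literal) for every string literal delimited by single quotes."""
    found = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.STRING:
            # Drop optional string prefixes such as r, b, u or f before checking the quote.
            body = tok.string.lstrip("rRbBuUfF")
            if body.startswith("'") and not body.startswith("'''"):
                found.append((tok.start[0], tok.string))
    return found


# Invented two-line sample: the first line uses single quotes, the second double quotes.
sample = "class_name='VLMJsonlDataset'\nname=\"geo3k\"\n"
print(single_quoted_strings(sample))  # prints: [(1, "'VLMJsonlDataset'")]
```

In practice an auto-formatter would normally handle this; the sketch only shows what the convention means at the token level.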

Collaborator Author

Ignore

Comment on lines +127 to +136
dataloader_cfg = DataloaderConfig(
dataset_config_list=train_dataset_cfg,
pack_max_length=pack_max_length,
collator="fake_collator",
pack_level="none",
num_workers=8,
)
sampler_config = SamplerConfig(
dataloader_cfg=dataloader_cfg,
prompt_repeat_k=prompt_repeat_k,
Contributor

Claude: Same formatting issues here — mixed indentation and single quotes. These should match the project's double-quote convention ("VLMJsonlDataset").

Collaborator Author

Ignore

@claude
Contributor

claude Bot commented May 25, 2026

Claude: Review Summary

This PR updates CI RL test cases: fixes a broken config reference, migrates to a dedicated CI script (ci_run_rl.sh), updates the tracker path, rewrites the VL config to use newer APIs, and adds exp_tracker="jsonl" to both RL configs.

Issues Found

Bug (must fix):

  • autotest/module/train.py:60 — Typo acceleator should be accelerator (both variable name and config key lookup). Works today only because the default is "GPU", but will silently ignore a future accelerator: NPU entry in config.yaml.

Should fix:

  • autotest/utils/ci_run_rl.sh:1 — Missing shebang (#!/usr/bin/env bash). The script is invoked via bash -x so it works, but best practice is to include it.
  • autotest/config/rl_qwen3_vl_geometry3k_grpo.py — Inconsistent indentation in dataset config dicts and use of single quotes ('VLMJsonlDataset') where project standard requires double quotes.

Nits

  • Chinese comments in ci_run_rl.sh are fine for internal CI scripts but worth noting for consistency.
  • Typo in the shell comment on line 5: auotest/config/rl_qwen3_8B_grpo.py (should be autotest).

Overall the changes look reasonable — the migration to a self-contained CI script and updated config APIs make the RL test infrastructure more maintainable. Please fix the acceleator typo before merging.

total_train_steps = 15 # TODO: total_epoch
evaluate_step = 15
train_optimizer_steps = 4
train_batch_size = 1024
Collaborator

This is too large; you can refer to the previous value of 64.

train_batch_size = 1024
prompt_repeat_k = 5
rollout_tp_size = 2
rollout_tp_size = 1
Collaborator

Test with tp2 so that a bit more gets covered.

ignore_idx=-100,
use_kl_loss=True,
kl_loss_coef=0.001,
use_kl_loss=False,
Collaborator

Don't change this one either; it gives higher test coverage.

Collaborator Author

Actually, this configuration comes from examples/v1/config/rl_grpo_geo3k_judge.py.

@@ -1,57 +1,56 @@
"""RL Colocate Trainer example configuration (GRPO + GSM8K).
Collaborator

I think autotest might as well test the rl_qwen3p5_vl_35B_grpo_mixdata.py config directly. It gives higher coverage, and it is qwen3.5.

Collaborator Author

For the first batch, let's add the existing ones first.

@hhaAndroid hhaAndroid merged commit 6bf05f0 into main May 27, 2026
7 checks passed
@hhaAndroid hhaAndroid deleted the ci/rl_new branch May 27, 2026 02:21