【CI】update rl cases by kkscilife · Pull Request #1835 · InternLM/xtuner

kkscilife · 2026-05-25T11:11:49Z

No description provided.

kkscilife · 2026-05-25T11:12:52Z

claude · 2026-05-25T11:16:59Z

                return command, config
            elif train_type == "rl":
                infer_type = config.get("parameters", {}).get("infer_backend", "lmdeploy")
+                acceleator = config.get("parameters", {}).get("acceleator", "GPU")


Claude: Typo: acceleator should be accelerator — both the variable name and the config key lookup string are misspelled. This means if someone adds accelerator: NPU to config.yaml, it won't be matched.

Suggested change

acceleator = config.get("parameters", {}).get("acceleator", "GPU")

accelerator = config.get("parameters", {}).get("accelerator", "GPU")
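To see why the misspelled key matters, here is a minimal sketch that mimics the lookup in autotest/module/train.py. It uses a plain dict in place of the parsed config.yaml. The `parameters` layout comes from the diff, but the two helper functions are hypothetical.

```python
def read_accelerator_typo(config: dict) -> str:
    # Misspelled key, as in the original diff: a correctly spelled
    # "accelerator" entry is never found, so the default always wins.
    return config.get("parameters", {}).get("acceleator", "GPU")


def read_accelerator(config: dict) -> str:
    # Corrected key lookup, as suggested in the review.
    return config.get("parameters", {}).get("accelerator", "GPU")


npu_case = {"parameters": {"accelerator": "NPU", "infer_backend": "lmdeploy"}}
print(read_accelerator_typo(npu_case))  # GPU (the NPU setting is silently ignored)
print(read_accelerator(npu_case))       # NPU
print(read_accelerator({}))             # GPU (default when the key is absent)
```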

claude · 2026-05-25T11:17:01Z

                    f"cd {current_dir}; pwd; pip install -e .[all]; export GITHUB_RUN_ID={config.get('run_id')}; export WORK_DIR={work_dir}; "
                    + cudnn_patch
-                    + f"bash -x examples/v1/scripts/run_rl.sh {config_path} {infer_type} ${{MODEL_PATH}} ${{DATA_PATH}} ${{EVAL_DATA_PATH}}"
+                    + f"bash -x autotest/utils/ci_run_rl.sh {acceleator} {infer_type} {config_path} ${{MODEL_PATH}} ${{DATA_PATH}} ${{EVAL_DATA_PATH}}"


Claude: Also needs to be updated to match the corrected variable name:

Suggested change

+ f"bash -x autotest/utils/ci_run_rl.sh {acceleator} {infer_type} {config_path} ${{MODEL_PATH}} ${{DATA_PATH}} ${{EVAL_DATA_PATH}}"

+ f"bash -x autotest/utils/ci_run_rl.sh {accelerator} {infer_type} {config_path} ${{MODEL_PATH}} ${{DATA_PATH}} ${{EVAL_DATA_PATH}}"
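The doubled braces in these f-strings are intentional. `{{` and `}}` render as literal braces, so `${{MODEL_PATH}}` reaches the shell as `${MODEL_PATH}` and is expanded from the environment at run time, while `{accelerator}`, `{infer_type}` and `{config_path}` are filled in by Python. Here is a small sketch with illustrative values, which are not taken from the CI config:

```python
accelerator = "GPU"
infer_type = "lmdeploy"
config_path = "autotest/config/rl_qwen3_8B_grpo.py"  # example path for illustration

# Single braces are substituted by Python; doubled braces survive as literal
# "${...}" so the shell expands MODEL_PATH etc. from its environment.
cmd = (
    f"bash -x autotest/utils/ci_run_rl.sh {accelerator} {infer_type} {config_path} "
    f"${{MODEL_PATH}} ${{DATA_PATH}} ${{EVAL_DATA_PATH}}"
)
print(cmd)
```

The new script takes the accelerator first and the config path third. The old `examples/v1/scripts/run_rl.sh` call passed the config path first, so the positional order changed along with the script.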

claude · 2026-05-25T11:17:04Z

@@ -0,0 +1,208 @@
+set -ex


Claude: Missing shebang line. The script should start with #!/usr/bin/env bash (or #!/bin/bash) to explicitly declare the shell interpreter. Without it, the script behavior depends on whichever shell the caller uses.

Suggested change

set -ex

#!/usr/bin/env bash

set -ex

Ignore; copied from the original script.

claude · 2026-05-25T11:17:09Z

+    model_cfg.text_config.balancing_loss_cfg = None
+if hasattr(model_cfg.text_config, "z_loss_cfg"):
+    model_cfg.text_config.z_loss_cfg = None
+optim_cfg = AdamWConfig(lr=1e-6, foreach=False, weight_decay=0.1)
 loss_cfg = GRPOLossConfig(
    policy_loss_cfg=dict(
-        cliprange_high=0.2,
+        cliprange_high=0.28,
        cliprange_low=0.2,


Claude: Inconsistent indentation — the inner dict entries use 12 spaces while the outer list uses 4. This looks like it was intended to be 8-space aligned (matching the GSM8K config style). Consider reformatting for consistency:

Suggested change

model_cfg.text_config.balancing_loss_cfg = None

if hasattr(model_cfg.text_config, "z_loss_cfg"):

model_cfg.text_config.z_loss_cfg = None

optim_cfg = AdamWConfig(lr=1e-6, foreach=False, weight_decay=0.1)

loss_cfg = GRPOLossConfig(

policy_loss_cfg=dict(

cliprange_high=0.2,

cliprange_high=0.28,

cliprange_low=0.2,

"dataset": DatasetConfig(

name="geo3k",

anno_path=data_path,

class_name="VLMJsonlDataset",

media_root=media_root,

sample_ratio=1.0,

),

"tokenize_fn": RLQwen3VLTokenizeFnConfig(processor_path=model_path,

max_length=max_prompt_length),

Also, per project style (CLAUDE.md), use double quotes "VLMJsonlDataset" instead of single quotes 'VLMJsonlDataset'.
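Separately from the formatting, the diff raises `cliprange_high` from 0.2 to 0.28 while keeping `cliprange_low` at 0.2. This makes the clipping range asymmetric, matching the "Clip-Higher" setting described in the DAPO paper. The sketch below illustrates what these two bounds typically control in a PPO/GRPO-style surrogate. It is the textbook per-token clipped objective, not xtuner's `GRPOLossConfig` implementation, and the function name is hypothetical.

```python
import math


def clipped_surrogate(logp_new: float, logp_old: float, advantage: float,
                      cliprange_low: float = 0.2, cliprange_high: float = 0.28) -> float:
    """Per-token PPO-style clipped objective (to be maximized).

    The probability ratio is clipped to [1 - cliprange_low, 1 + cliprange_high],
    so a larger cliprange_high lets tokens with positive advantage gain more
    probability before the objective stops rewarding further increases.
    """
    ratio = math.exp(logp_new - logp_old)
    clipped = min(max(ratio, 1.0 - cliprange_low), 1.0 + cliprange_high)
    # Pessimistic bound: take the smaller of the unclipped and clipped terms.
    return min(ratio * advantage, clipped * advantage)


# A token whose probability rose from 0.4 to 0.5 (ratio 1.25) with advantage 1.
logp_old, logp_new = math.log(0.4), math.log(0.5)
sym = clipped_surrogate(logp_new, logp_old, advantage=1.0, cliprange_high=0.2)
asym = clipped_surrogate(logp_new, logp_old, advantage=1.0, cliprange_high=0.28)
print(round(sym, 4), round(asym, 4))  # 1.2 1.25
```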

claude · 2026-05-25T11:17:11Z

+dataloader_cfg = DataloaderConfig(
+    dataset_config_list=train_dataset_cfg,
+    pack_max_length=pack_max_length,
+    collator="fake_collator",
+    pack_level="none",
+    num_workers=8,
+)
+sampler_config = SamplerConfig(
+    dataloader_cfg=dataloader_cfg,
+    prompt_repeat_k=prompt_repeat_k,


Claude: Same formatting issues here — mixed indentation and single quotes. These should match the project's double-quote convention ("VLMJsonlDataset").

claude · 2026-05-25T11:17:36Z

Claude: ## Review Summary

This PR updates CI RL test cases: fixes a broken config reference, migrates to a dedicated CI script (ci_run_rl.sh), updates the tracker path, rewrites the VL config to use newer APIs, and adds exp_tracker="jsonl" to both RL configs.

Issues Found

Bug (must fix):

autotest/module/train.py:60 — Typo acceleator should be accelerator (both variable name and config key lookup). Works today only because the default is "GPU", but will silently ignore a future accelerator: NPU entry in config.yaml.

Should fix:

autotest/utils/ci_run_rl.sh:1 — Missing shebang (#!/usr/bin/env bash). The script is invoked via bash -x so it works, but best practice is to include it.
autotest/config/rl_qwen3_vl_geometry3k_grpo.py — Inconsistent indentation in dataset config dicts and use of single quotes ('VLMJsonlDataset') where project standard requires double quotes.

Nits

Chinese comments in ci_run_rl.sh are fine for internal CI scripts but worth noting for consistency.
Typo in the shell comment on line 5: auotest/config/rl_qwen3_8B_grpo.py (should be autotest).

Overall the changes look reasonable — the migration to a self-contained CI script and updated config APIs make the RL test infrastructure more maintainable. Please fix the acceleator typo before merging.

hhaAndroid · 2026-05-25T12:17:03Z

+total_train_steps = 15  # TODO: total_epoch
+evaluate_step = 15
+train_optimizer_steps = 4
+train_batch_size = 1024


This is too large; you can use the previous value, 64.
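For a sense of scale, here is the rollout volume per step and over the whole run, using `prompt_repeat_k = 5` and `total_train_steps = 15` from the same config. This assumes `train_batch_size` counts prompts and that each prompt is sampled `prompt_repeat_k` times, as the names suggest. The diff does not confirm those semantics, and the helper is hypothetical.

```python
def rollout_counts(train_batch_size: int, prompt_repeat_k: int,
                   total_train_steps: int) -> tuple[int, int]:
    """Return (rollouts per step, rollouts over the whole run).

    Assumes train_batch_size counts prompts and each prompt is sampled
    prompt_repeat_k times.
    """
    per_step = train_batch_size * prompt_repeat_k
    return per_step, per_step * total_train_steps


for bs in (1024, 64):
    per_step, total = rollout_counts(bs, prompt_repeat_k=5, total_train_steps=15)
    print(f"train_batch_size={bs}: {per_step} rollouts/step, {total} in total")
# train_batch_size=1024: 5120 rollouts/step, 76800 in total
# train_batch_size=64: 320 rollouts/step, 4800 in total
```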

hhaAndroid · 2026-05-25T12:17:21Z

+train_batch_size = 1024
 prompt_repeat_k = 5
-rollout_tp_size = 2
+rollout_tp_size = 1


Test tp2 instead; it covers more.
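With tensor parallelism, each rollout engine shards the model across `rollout_tp_size` GPUs. Dropping to tp=1 therefore skips the sharded inference path entirely. The arithmetic below assumes a hypothetical 8-GPU node on which every GPU is used for rollout, since the CI node size is not stated in the PR. Under that assumption, tp=2 halves the number of engines and each engine spans two GPUs:

```python
def rollout_engines(num_gpus: int, rollout_tp_size: int) -> int:
    """Number of inference engines when each engine spans rollout_tp_size GPUs."""
    if num_gpus % rollout_tp_size != 0:
        raise ValueError("num_gpus must be a multiple of rollout_tp_size")
    return num_gpus // rollout_tp_size


for tp in (1, 2):
    print(f"tp={tp}: {rollout_engines(8, tp)} engines x {tp} GPU(s) each")
# tp=1: 8 engines x 1 GPU(s) each
# tp=2: 4 engines x 2 GPU(s) each
```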

hhaAndroid · 2026-05-25T12:18:50Z

    ignore_idx=-100,
-    use_kl_loss=True,
-    kl_loss_coef=0.001,
+    use_kl_loss=False,


Don't change this either; keeping it gives higher test coverage.

Actually, this config is the one from examples/v1/config/rl_grpo_geo3k_judge.py.
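With `use_kl_loss=True`, GRPO-style losses typically add a penalty of `kl_loss_coef * KL(pi_theta || pi_ref)` on top of the clipped policy term. Flipping it to `False` removes that code path from the CI run. The sketch below shows the toggle using the common non-negative "k3" estimator. The diff does not show which estimator xtuner uses, so that choice and the function names are assumptions. Only the default coefficient of 0.001 comes from the diff.

```python
import math


def k3_kl(logp: float, logp_ref: float) -> float:
    # k3 estimator of KL(pi || pi_ref) from one sampled token:
    # exp(r) - r - 1 with r = logp_ref - logp; always >= 0, zero when equal.
    r = logp_ref - logp
    return math.exp(r) - r - 1.0


def total_loss(policy_loss: float, logp: float, logp_ref: float,
               use_kl_loss: bool = True, kl_loss_coef: float = 0.001) -> float:
    # With use_kl_loss=False the KL branch is never exercised.
    if not use_kl_loss:
        return policy_loss
    return policy_loss + kl_loss_coef * k3_kl(logp, logp_ref)


print(total_loss(1.0, math.log(0.5), math.log(0.4), use_kl_loss=False))  # 1.0
print(total_loss(1.0, math.log(0.5), math.log(0.4)))  # slightly above 1.0
```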

hhaAndroid · 2026-05-25T12:20:16Z

@@ -1,57 +1,56 @@
+"""RL Colocate Trainer example config (GRPO + GSM8K).


I think autotest could simply test the rl_qwen3p5_vl_35B_grpo_mixdata.py config directly. It has higher coverage, and it uses Qwen3.5.

For the first batch, let's add the existing ones.

kkscilife added 9 commits May 18, 2026 18:44

adapt to new code

2620c5b

new config

aa512a4

output to jsonl

71cdcef

use new script

2192551

try vl

c357e11

Merge branch 'main' into ci/rl_new

5a0673b

vl case

a10a582

change step

4f11574

set output format

d90d677

kkscilife requested a review from hhaAndroid May 25, 2026 11:11

claude Bot reviewed May 25, 2026


fix spell error

20e7aa8

hhaAndroid reviewed May 25, 2026


adjust grad_norm for running resume

e387118

hhaAndroid approved these changes May 27, 2026


hhaAndroid merged commit 6bf05f0 into main May 27, 2026
7 checks passed

hhaAndroid deleted the ci/rl_new branch May 27, 2026 02:21


2 participants