【CI】update rl cases#1835
Conversation
|
@claude review |
| return command, config | ||
| elif train_type == "rl": | ||
| infer_type = config.get("parameters", {}).get("infer_backend", "lmdeploy") | ||
| acceleator = config.get("parameters", {}).get("acceleator", "GPU") |
There was a problem hiding this comment.
Claude: Typo: acceleator should be accelerator — both the variable name and the config key lookup string are misspelled. This means if someone adds accelerator: NPU to config.yaml, it won't be matched.
| acceleator = config.get("parameters", {}).get("acceleator", "GPU") | |
| accelerator = config.get("parameters", {}).get("accelerator", "GPU") |
| f"cd {current_dir}; pwd; pip install -e .[all]; export GITHUB_RUN_ID={config.get('run_id')}; export WORK_DIR={work_dir}; " | ||
| + cudnn_patch | ||
| + f"bash -x examples/v1/scripts/run_rl.sh {config_path} {infer_type} ${{MODEL_PATH}} ${{DATA_PATH}} ${{EVAL_DATA_PATH}}" | ||
| + f"bash -x autotest/utils/ci_run_rl.sh {acceleator} {infer_type} {config_path} ${{MODEL_PATH}} ${{DATA_PATH}} ${{EVAL_DATA_PATH}}" |
There was a problem hiding this comment.
Claude: Also needs to be updated to match the corrected variable name:
| + f"bash -x autotest/utils/ci_run_rl.sh {acceleator} {infer_type} {config_path} ${{MODEL_PATH}} ${{DATA_PATH}} ${{EVAL_DATA_PATH}}" | |
| + f"bash -x autotest/utils/ci_run_rl.sh {accelerator} {infer_type} {config_path} ${{MODEL_PATH}} ${{DATA_PATH}} ${{EVAL_DATA_PATH}}" |
| @@ -0,0 +1,208 @@ | |||
| set -ex | |||
There was a problem hiding this comment.
Claude: Missing shebang line. The script should start with #!/usr/bin/env bash (or #!/bin/bash) to explicitly declare the shell interpreter. Without it, the script behavior depends on whichever shell the caller uses.
| set -ex | |
| #!/usr/bin/env bash | |
| set -ex |
There was a problem hiding this comment.
Ignore; copyed from original script
| model_cfg.text_config.balancing_loss_cfg = None | ||
| if hasattr(model_cfg.text_config, "z_loss_cfg"): | ||
| model_cfg.text_config.z_loss_cfg = None | ||
| optim_cfg = AdamWConfig(lr=1e-6, foreach=False, weight_decay=0.1) | ||
| loss_cfg = GRPOLossConfig( | ||
| policy_loss_cfg=dict( | ||
| cliprange_high=0.2, | ||
| cliprange_high=0.28, | ||
| cliprange_low=0.2, |
There was a problem hiding this comment.
Claude: Inconsistent indentation — the inner dict entries use 12 spaces while the outer list uses 4. This looks like it was intended to be 8-space aligned (matching the GSM8K config style). Consider reformatting for consistency:
| model_cfg.text_config.balancing_loss_cfg = None | |
| if hasattr(model_cfg.text_config, "z_loss_cfg"): | |
| model_cfg.text_config.z_loss_cfg = None | |
| optim_cfg = AdamWConfig(lr=1e-6, foreach=False, weight_decay=0.1) | |
| loss_cfg = GRPOLossConfig( | |
| policy_loss_cfg=dict( | |
| cliprange_high=0.2, | |
| cliprange_high=0.28, | |
| cliprange_low=0.2, | |
| "dataset": DatasetConfig( | |
| name="geo3k", | |
| anno_path=data_path, | |
| class_name="VLMJsonlDataset", | |
| media_root=media_root, | |
| sample_ratio=1.0, | |
| ), | |
| "tokenize_fn": RLQwen3VLTokenizeFnConfig(processor_path=model_path, | |
| max_length=max_prompt_length), |
Also, per project style (CLAUDE.md), use double quotes "VLMJsonlDataset" instead of single quotes 'VLMJsonlDataset'.
| dataloader_cfg = DataloaderConfig( | ||
| dataset_config_list=train_dataset_cfg, | ||
| pack_max_length=pack_max_length, | ||
| collator="fake_collator", | ||
| pack_level="none", | ||
| num_workers=8, | ||
| ) | ||
| sampler_config = SamplerConfig( | ||
| dataloader_cfg=dataloader_cfg, | ||
| prompt_repeat_k=prompt_repeat_k, |
There was a problem hiding this comment.
Claude: Same formatting issues here — mixed indentation and single quotes. These should match the project's double-quote convention ("VLMJsonlDataset").
|
Claude: ## Review Summary This PR updates CI RL test cases: fixes a broken config reference, migrates to a dedicated CI script ( Issues FoundBug (must fix):
Should fix:
Nits
Overall the changes look reasonable — the migration to a self-contained CI script and updated config APIs make the RL test infrastructure more maintainable. Please fix the |
| total_train_steps = 15 # TODO: total_epoch | ||
| evaluate_step = 15 | ||
| train_optimizer_steps = 4 | ||
| train_batch_size = 1024 |
| train_batch_size = 1024 | ||
| prompt_repeat_k = 5 | ||
| rollout_tp_size = 2 | ||
| rollout_tp_size = 1 |
| ignore_idx=-100, | ||
| use_kl_loss=True, | ||
| kl_loss_coef=0.001, | ||
| use_kl_loss=False, |
There was a problem hiding this comment.
其实这个配置是examples/v1/config/rl_grpo_geo3k_judge.py中的
| @@ -1,57 +1,56 @@ | |||
| """RL Colocate Trainer 示例配置(GRPO + GSM8K)。 | |||
There was a problem hiding this comment.
我觉得 autotest 要不直接测试 rl_qwen3p5_vl_35B_grpo_mixdata.py 这个配置。覆盖率更高,而且是 qwen3.5
No description provided.