Dev #161


Merged: 13 commits merged into master on Jun 19, 2025

Conversation

@xming521 (Owner) commented on Jun 19, 2025

  • Bump torch to 2.7.0 and vllm to 0.9.1; switch offline inference to chat-style invocation
  • Add test_model_args and vllm_args configuration options, allowing a custom test-set file
  • Add a config-file path option to the CLI, with support for the WECLONE_CONFIG_PATH environment variable
  • Update the max_new_tokens and enable_thinking parameters in the data-cleaning strategy to improve inference
  • Adapt parts of the functionality for Qwen3

fix #158 fix #83 fix #77 fix #69

@xming521 xming521 requested a review from Copilot June 19, 2025 08:54
@Copilot (Contributor) left a comment


Pull Request Overview

This PR adds support for VLLM and test-model configurations across the codebase, extends the CLI with a custom --config-path option, refactors the testing suite to run the full pipeline against multiple config files, and bumps the project and config versions.

  • Introduce VllmArgs and TestModelArgs in config_models.py and wire them into WcConfig and create_config_by_arg_type
  • Update offline_infer.vllm_infer signature and ChatModel initialization in api_service.py
  • Refactor tests/test_full_pipeV2.py to parameterize tests over all configs, replacing the removed test_full_pipe.py
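The parameterized-test refactor described above can be sketched as follows. This is a minimal illustration, not the project's actual test code: the config filenames are hypothetical, and the pipeline body is elided.

```python
import os
import pytest

# Hypothetical config names -- the real suite uses the files in tests/configs/.
CONFIG_FILES = [
    "tests/configs/full_pipe.jsonc",
    "tests/configs/vllm.jsonc",
]

@pytest.mark.parametrize("config_path", CONFIG_FILES)
def test_full_pipeline(config_path, monkeypatch):
    # Point the config loader at this file via the same env var the CLI sets.
    monkeypatch.setenv("WECLONE_CONFIG_PATH", config_path)
    # ... run the pipeline stages against this config ...
    assert os.environ["WECLONE_CONFIG_PATH"] == config_path
```

Parameterizing over the config list means each file becomes its own test case, so a failure report names the exact config that broke.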

Reviewed Changes

Copilot reviewed 18 out of 18 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| weclone/utils/config_models.py | Add VllmArgs and TestModelArgs; comment out deprecated enum items |
| weclone/utils/config.py | Handle "vllm" and "test_model" in create_config_by_arg_type |
| weclone/server/api_service.py | Pass a JSON dict to ChatModel via model_dump |
| weclone/prompts/clean_data.py | Fix markdown list numbering |
| weclone/eval/test_model.py | Use TestModelArgs.test_data_path and completion_config |
| weclone/data/qa_generator.py | Replace QaPairV2 with QaPair |
| weclone/data/models.py | Rename QaPairV2 to QaPair |
| weclone/data/clean/strategies.py | Update type hints from QaPairV2 to QaPair |
| weclone/core/inference/offline_infer.py | Extend vllm_infer and inject VllmArgs |
| weclone/cli.py | Add --config-path option (removed @click.group()) |
| tests/test_full_pipeV2.py | Restructure and parameterize full-pipeline tests |
| tests/configs/*.jsonc | Bump versions; add test_model_args and vision_api sections |
| settings.template.jsonc & pyproject.toml | Version bumps and dependency updates |
| README.md | Add a note about model-size behavior |
Comments suppressed due to low confidence (7)

weclone/cli.py:61

  • The @click.group() decorator was removed, so no CLI commands will be registered. Re-add @click.group() (and apply the option) to ensure subcommands load correctly.
@click.option("--config-path", default=None, help="指定配置文件路径,会设置WECLONE_CONFIG_PATH环境变量")
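For context, re-adding the decorator might look like the following minimal sketch. The subcommand name and its body are illustrative, not the project's actual code; only the group/option structure is the point.

```python
import os
import click

@click.group()
@click.option("--config-path", default=None,
              help="Path to the config file; sets the WECLONE_CONFIG_PATH environment variable")
def cli(config_path):
    # Export the path so config loading elsewhere can read it.
    if config_path is not None:
        os.environ["WECLONE_CONFIG_PATH"] = config_path

@cli.command(name="show-config")
def show_config():
    # Illustrative subcommand: print the resolved config path.
    click.echo(os.environ.get("WECLONE_CONFIG_PATH", "<default>"))
```

Without `@click.group()`, the function is a plain callable and `@cli.command()` has nothing to register subcommands on, which is why the CLI silently loses all commands.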

weclone/prompts/clean_data.py:15

  • The bullet starts with +3. which breaks standard markdown numbering. Change to 3. to restore correct formatting.
3. **风格代表性**  (Style Representativeness): 评估【回答 A】是否展现了自然、独特的人类对话风格特征。它是否仅仅是功能性的信息传递,还是带有个性化的色彩?关注点包括但不限于:是否体现了特定的语气(如友好、幽默、不耐烦、正式、脏话),是否包含口头禅、俚语、网络用语(如“yyds”、“绝绝子”)、表情符号 Emoji、颜文字、标点符号的特殊使用如“!!!”、“???”、“~”等表达、特定的缩写或短语、非标准的但一致的表达方式(如方言词汇、个人口癖)?如果包含请给予5分,不包含给予5分以下分数

README.md:45

  • [nitpick] This note is unprofessional and may confuse readers. Consider revising or removing to maintain a professional project tone.
> - 7B模型很容易训练成为二傻子,14B模型勉强可以交流,32B及以上的模型效果会更好。   

weclone/utils/config_models.py:126

  • Using Field(VisionApiConfig()) shares one default instance across all models. Switch back to default_factory=VisionApiConfig to ensure each MakeDatasetArgs gets its own VisionApiConfig.
    vision_api: VisionApiConfig = Field(VisionApiConfig())

weclone/utils/config_models.py:182

  • Providing a mutable default model instance can cause shared-state bugs. Use default_factory=VllmArgs instead of a direct instance.
    vllm_args: VllmArgs = Field(VllmArgs())

weclone/utils/config_models.py:183

  • Similarly, use default_factory=TestModelArgs to avoid sharing one instance of TestModelArgs across all WcConfig instances.
    test_model_args: TestModelArgs = Field(TestModelArgs())
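The three comments above are instances of the same shared-mutable-default pitfall. A minimal stdlib sketch of the difference, using dataclasses with a stand-in config class (Pydantic's `Field(default_factory=...)` follows the same pattern):

```python
from dataclasses import dataclass, field

class VisionApiConfig:
    """Stand-in for the real config model."""
    def __init__(self):
        self.headers = {}

@dataclass
class SharedDefault:
    # One instance is created at class-definition time and shared by
    # every SharedDefault object -- the bug flagged in the review.
    vision_api: VisionApiConfig = VisionApiConfig()

@dataclass
class FreshDefault:
    # default_factory runs per instance, so each object gets its own config.
    vision_api: VisionApiConfig = field(default_factory=VisionApiConfig)

a, b = SharedDefault(), SharedDefault()
a.vision_api.headers["Authorization"] = "token"
assert "Authorization" in b.vision_api.headers  # mutation leaks into b

c, d = FreshDefault(), FreshDefault()
c.vision_api.headers["Authorization"] = "token"
assert "Authorization" not in d.vision_api.headers  # isolated
```

The factory variant costs one extra object construction per instance but removes an entire class of cross-instance state bugs.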

weclone/core/inference/offline_infer.py:120

  • json_schema is not defined in this scope, causing a NameError. Either import or compute the schema or remove this field from extra_body.
    extra_body = {"guided_json": json_schema, "enable_thinking": False}
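One way to resolve the NameError is to define the schema before building `extra_body`. The sketch below assumes the guided output is a JSON object with a single score field; the project's actual schema may differ.

```python
# Define the JSON Schema that guided decoding should enforce before
# referencing it; previously `json_schema` was an undefined name here.
json_schema = {
    "type": "object",
    "properties": {"score": {"type": "integer"}},
    "required": ["score"],
}

extra_body = {"guided_json": json_schema, "enable_thinking": False}
```

Alternatively, if no structured output is needed at this call site, dropping the `guided_json` key entirely also removes the error.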

Comment on lines +61 to +62:

    # FULL = "full"
    # FREEZE = "freeze"

@Copilot AI commented Jun 19, 2025

Commented-out enum members clutter the code. If these values are deprecated, remove them or add a proper deprecation notice instead of leaving commented code.

Suggested change: delete both commented-out lines.

@xming521 xming521 merged commit dda773a into master Jun 19, 2025
2 checks passed