Dev #161


Merged: 13 commits merged into master on Jun 19, 2025

Conversation

@xming521 (Owner) commented on Jun 19, 2025

  • Bump torch to 2.7.0 and vllm to 0.9.1; switch offline inference to chat-style invocation
  • Add test_model_args and vllm_args configuration options, allowing a custom test-set file
  • Add a config-file path option to the CLI, with support for the WECLONE_CONFIG_PATH environment variable
  • Update the max_new_tokens and enable_thinking parameters in the data-cleaning strategy to improve inference
  • Adapt parts of the functionality for Qwen3

fix #158 fix #83 fix #77 fix #69

@xming521 xming521 requested a review from Copilot June 19, 2025 08:54
@Copilot (Contributor) left a comment


Pull Request Overview

This PR adds support for VLLM and test-model configurations across the codebase, extends the CLI with a custom --config-path option, refactors the testing suite to run the full pipeline against multiple config files, and bumps the project and config versions.

  • Introduce VllmArgs and TestModelArgs in config_models.py and wire them into WcConfig and create_config_by_arg_type
  • Update offline_infer.vllm_infer signature and ChatModel initialization in api_service.py
  • Refactor tests/test_full_pipeV2.py to parameterize tests over all configs, replacing the removed test_full_pipe.py
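The parameterized-test refactor described above can be sketched as follows. This is a minimal illustration, not the project's actual test code: the config filenames are hypothetical, and the pipeline body is elided.

```python
import os
import pytest

# Hypothetical config names -- the real suite uses the files in tests/configs/.
CONFIG_FILES = [
    "tests/configs/full_pipe.jsonc",
    "tests/configs/vllm.jsonc",
]

@pytest.mark.parametrize("config_path", CONFIG_FILES)
def test_full_pipeline(config_path, monkeypatch):
    # Point the config loader at this file via the same env var the CLI sets.
    monkeypatch.setenv("WECLONE_CONFIG_PATH", config_path)
    # ... run the pipeline stages against this config ...
    assert os.environ["WECLONE_CONFIG_PATH"] == config_path
```

Parameterizing over the config list means each file becomes its own test case, so a failure report names the exact config that broke.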

Reviewed Changes

Copilot reviewed 18 out of 18 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| weclone/utils/config_models.py | Add VllmArgs and TestModelArgs; comment out deprecated enum items |
| weclone/utils/config.py | Handle "vllm" and "test_model" in create_config_by_arg_type |
| weclone/server/api_service.py | Pass a JSON dict to ChatModel via model_dump |
| weclone/prompts/clean_data.py | Fix markdown list numbering |
| weclone/eval/test_model.py | Use TestModelArgs.test_data_path and completion_config |
| weclone/data/qa_generator.py | Replace QaPairV2 with QaPair |
| weclone/data/models.py | Rename QaPairV2 to QaPair |
| weclone/data/clean/strategies.py | Update type hints from QaPairV2 to QaPair |
| weclone/core/inference/offline_infer.py | Extend vllm_infer and inject VllmArgs |
| weclone/cli.py | Add --config-path option (removed @click.group()) |
| tests/test_full_pipeV2.py | Restructure and parameterize full-pipeline tests |
| tests/configs/*.jsonc | Bump versions; add test_model_args and vision_api sections |
| settings.template.jsonc & pyproject.toml | Version bumps and dependency updates |
| README.md | Add a note about model-size behavior |
Comments suppressed due to low confidence (7)

weclone/cli.py:61

  • The @click.group() decorator was removed, so no CLI commands will be registered. Re-add @click.group() (and apply the option) to ensure subcommands load correctly.
@click.option("--config-path", default=None, help="指定配置文件路径,会设置WECLONE_CONFIG_PATH环境变量")
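For context, re-adding the decorator might look like the following minimal sketch. The subcommand name and its body are illustrative, not the project's actual code; only the group/option structure is the point.

```python
import os
import click

@click.group()
@click.option("--config-path", default=None,
              help="Path to the config file; sets the WECLONE_CONFIG_PATH environment variable")
def cli(config_path):
    # Export the path so config loading elsewhere can read it.
    if config_path is not None:
        os.environ["WECLONE_CONFIG_PATH"] = config_path

@cli.command(name="show-config")
def show_config():
    # Illustrative subcommand: print the resolved config path.
    click.echo(os.environ.get("WECLONE_CONFIG_PATH", "<default>"))
```

Without `@click.group()`, the function is a plain callable and `@cli.command()` has nothing to register subcommands on, which is why the CLI silently loses all commands.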

weclone/prompts/clean_data.py:15

  • The bullet starts with +3. which breaks standard markdown numbering. Change to 3. to restore correct formatting.
3. **风格代表性**  (Style Representativeness): 评估【回答 A】是否展现了自然、独特的人类对话风格特征。它是否仅仅是功能性的信息传递,还是带有个性化的色彩?关注点包括但不限于:是否体现了特定的语气(如友好、幽默、不耐烦、正式、脏话),是否包含口头禅、俚语、网络用语(如“yyds”、“绝绝子”)、表情符号 Emoji、颜文字、标点符号的特殊使用如“!!!”、“???”、“~”等表达、特定的缩写或短语、非标准的但一致的表达方式(如方言词汇、个人口癖)?如果包含请给予5分,不包含给予5分以下分数

README.md:45

  • [nitpick] This note is unprofessional and may confuse readers. Consider revising or removing to maintain a professional project tone.
> - 7B模型很容易训练成为二傻子,14B模型勉强可以交流,32B及以上的模型效果会更好。   

weclone/utils/config_models.py:126

  • Using Field(VisionApiConfig()) shares one default instance across all models. Switch back to default_factory=VisionApiConfig to ensure each MakeDatasetArgs gets its own VisionApiConfig.
    vision_api: VisionApiConfig = Field(VisionApiConfig())

weclone/utils/config_models.py:182

  • Providing a mutable default model instance can cause shared-state bugs. Use default_factory=VllmArgs instead of a direct instance.
    vllm_args: VllmArgs = Field(VllmArgs())

weclone/utils/config_models.py:183

  • Similarly, use default_factory=TestModelArgs to avoid sharing one instance of TestModelArgs across all WcConfig instances.
    test_model_args: TestModelArgs = Field(TestModelArgs())
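The three comments above are instances of the same shared-mutable-default pitfall. A minimal stdlib sketch of the difference, using dataclasses with a stand-in config class (Pydantic's `Field(default_factory=...)` follows the same pattern):

```python
from dataclasses import dataclass, field

class VisionApiConfig:
    """Stand-in for the real config model."""
    def __init__(self):
        self.headers = {}

@dataclass
class SharedDefault:
    # One instance is created at class-definition time and shared by
    # every SharedDefault object -- the bug flagged in the review.
    vision_api: VisionApiConfig = VisionApiConfig()

@dataclass
class FreshDefault:
    # default_factory runs per instance, so each object gets its own config.
    vision_api: VisionApiConfig = field(default_factory=VisionApiConfig)

a, b = SharedDefault(), SharedDefault()
a.vision_api.headers["Authorization"] = "token"
assert "Authorization" in b.vision_api.headers  # mutation leaks into b

c, d = FreshDefault(), FreshDefault()
c.vision_api.headers["Authorization"] = "token"
assert "Authorization" not in d.vision_api.headers  # isolated
```

The factory variant costs one extra object construction per instance but removes an entire class of cross-instance state bugs.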

weclone/core/inference/offline_infer.py:120

  • json_schema is not defined in this scope, causing a NameError. Either import or compute the schema or remove this field from extra_body.
    extra_body = {"guided_json": json_schema, "enable_thinking": False}
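One way to resolve the NameError is to define the schema before building `extra_body`. The sketch below assumes the guided output is a JSON object with a single score field; the project's actual schema may differ.

```python
# Define the JSON Schema that guided decoding should enforce before
# referencing it; previously `json_schema` was an undefined name here.
json_schema = {
    "type": "object",
    "properties": {"score": {"type": "integer"}},
    "required": ["score"],
}

extra_body = {"guided_json": json_schema, "enable_thinking": False}
```

Alternatively, if no structured output is needed at this call site, dropping the `guided_json` key entirely also removes the error.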

Comment on lines +61 to +62:

    # FULL = "full"
    # FREEZE = "freeze"

@Copilot AI commented Jun 19, 2025

Commented-out enum members clutter the code. If these values are deprecated, remove them or add a proper deprecation notice instead of leaving commented code.

Suggested change: delete both commented-out lines.

@xming521 xming521 merged commit dda773a into master Jun 19, 2025
2 checks passed