project translated to English #166
…chat history export instructions and LangBot integration information.
…py: fix the output handling logic of vllm_infer.
Pull Request Overview
This PR aims to update and partially translate the project into English while introducing several improvements and new features. Key changes include:
- Addition of a new function (calculate_token_length) with logging in length_cdf.py.
- Updates to configuration models and arguments (including Telegram support) with updated version strings.
- Documentation and README adjustments to provide both Chinese and English details.
Reviewed Changes
Copilot reviewed 19 out of 20 changed files in this pull request and generated no comments.
File | Description |
---|---|
weclone/utils/length_cdf.py | Added calculate_token_length function; log messages and docstring remain in Chinese. |
weclone/utils/config_models.py | Refactored BaseModel usage and introduced TelegramArgs. |
weclone/data/clean/strategies.py | Increased max_new_tokens from 100 to 200 in vllm_infer call. |
tests/tests_data/test_person/test_0_730.csv | Updated test data rows to numeric/empty messages. |
README.md, settings.template.jsonc, etc. | Version and documentation updates for an English translation. |
Comments suppressed due to low confidence (4)
weclone/utils/length_cdf.py:41
- The log message in calculate_token_length is still in Chinese. Consider translating it (as well as the docstring) into English for consistency with the project's translated content.
logger.info(f"正在计算文本token长度: {text[:50]}...")
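For illustration, the translated log call might look like the sketch below. The body of calculate_token_length is not shown in this review, so the `tokenize` parameter and its whitespace-splitting default are hypothetical stand-ins for whatever tokenizer length_cdf.py actually uses; only the function name and the shape of the log message come from the PR.

```python
import logging

logger = logging.getLogger(__name__)


def calculate_token_length(text: str, tokenize=str.split) -> int:
    """Return the number of tokens in ``text``.

    ``tokenize`` is a hypothetical placeholder (whitespace split by default);
    the real function presumably uses the model's tokenizer.
    """
    # English version of the log message flagged in the review.
    logger.info(f"Calculating token length of text: {text[:50]}...")
    return len(tokenize(text))
```

Truncating to `text[:50]` in the log line is kept from the original call so long messages do not flood the log.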
weclone/data/clean/strategies.py:124
- The max_new_tokens default has been increased from 100 to 200. Please confirm that this change is intentional and will not negatively affect performance or output length expectations.
max_new_tokens=200,
tests/tests_data/test_person/test_0_730.csv:7
- The updated message content is a numeric value ('2.0156416') instead of text. Verify that this change is deliberate and does not break downstream assumptions about the message format.
10,5704142615879617852,文本,0,wxid_6789z5qlxzfj22,wxid_6789z5qlxzfj22,2.0156416,,2024/10/4 11:43
tests/tests_data/test_person/test_0_730.csv:8
- This test data row now has an empty message field. Please confirm that having an empty string as message content is intentional for the test scenario.
11,1337798072543283708,文本,0,wxid_6789z5qlxzfj22,wxid_6789z5qlxzfj22,,,2024/10/4 11:43
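One way to check the two test-data concerns above is to classify the message field of each row. This is a rough sketch, not part of the PR: it assumes the message sits in the seventh column (index 6), inferred from the two rows quoted above, and the helper names `classify_message` and `flag_rows` are made up for illustration.

```python
import csv
import io

# Assumption: the message content is the 7th field of each row,
# inferred from the two sample rows quoted in this review.
MSG_COLUMN = 6

SAMPLE = (
    "10,5704142615879617852,文本,0,wxid_6789z5qlxzfj22,wxid_6789z5qlxzfj22,2.0156416,,2024/10/4 11:43\n"
    "11,1337798072543283708,文本,0,wxid_6789z5qlxzfj22,wxid_6789z5qlxzfj22,,,2024/10/4 11:43\n"
)


def classify_message(value: str) -> str:
    """Label a message field as 'empty', 'numeric', or 'text'."""
    if value.strip() == "":
        return "empty"
    try:
        float(value)
    except ValueError:
        return "text"
    return "numeric"


def flag_rows(csv_text: str) -> list[tuple[str, str]]:
    """Return (row id, kind) for rows whose message is not ordinary text."""
    flagged = []
    for row in csv.reader(io.StringIO(csv_text)):
        kind = classify_message(row[MSG_COLUMN])
        if kind != "text":
            flagged.append((row[0], kind))
    return flagged


print(flag_rows(SAMPLE))
```

If numeric and empty messages are intentional edge cases for the cleaning strategies, a check like this could instead assert that the pipeline handles them without raising.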