feat(translation): ✨ Add helper and Google Translate modules #104

ChinaGodMan · 2025-03-23T02:49:26Z

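Split the previously duplicated Google Translate code into modules so it is easier to modify

Changes

Split the code duplicated across translate_chinese_to_filelang.py and translate_force_chinese_to_lang.py into standalone modules so it is easier to modify.

Type of change

Bug fix
New feature
Code optimization
QA

Testing

Tested locally; no errors.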

Summary by Sourcery

Refactors the translation functionality by extracting the Google Translate API calls and related utilities into a separate module for better code organization and reusability. It also introduces a helper module for file operations and other utility functions.

Enhancements:

Extracts Google Translate API calls and utilities into a google_translate module.
Introduces a helper module for file operations and other utility functions.
Improves code modularity and reusability by separating concerns.
Updates the translation scripts to use the new modules.
Adds a function to extract language codes from filenames.
Removes redundant code and improves code readability.
Adds a function to read json files.
Adds a function to extract chinese texts from lines.
Adds a function to replace encoded text with UTF-8 text.

* Split the previously duplicated Google Translate code into modules so it is easier to modify

changeset-bot · 2025-03-23T02:49:30Z

⚠️ No Changeset found

Latest commit: 947df86

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

pr-code-reviewer · 2025-03-23T02:49:32Z

👋 Hi there!

Add a descriptive docstring at the beginning of the file to explain its purpose and usage.
Utilize functions for repetitive tasks instead of placing all the code in the global scope.
Use more descriptive variable names and comments to improve code readability and maintainability.

Automatically generated with the help of gpt-3.5-turbo.

Feedback? Please don't hesitate to drop me an email at webber@takken.io.

sourcery-ai · 2025-03-23T02:49:36Z

Reviewer's Guide by Sourcery

This pull request refactors the translation scripts by extracting common functionalities into reusable modules (google_translate.py and helper.py). This change reduces code duplication, improves maintainability, and simplifies the main translation scripts (translate_force_chinese_to_lang.py and translate_chinese_to_filelang.py).

Sequence diagram for translate_readme function

sequenceDiagram
    participant TFCTL as translate_force_chinese_to_lang.py
    participant Helper as helper.py
    participant GT as google_translate.py

    TFCTL->>Helper: is_file_updated_more_than(readme_path, timeout)
    activate Helper
    Helper-->>TFCTL: True/False
    deactivate Helper

    alt File needs translation
        TFCTL->>Helper: read_file_to_memory(readme_path)
        activate Helper
        Helper-->>TFCTL: lines
        deactivate Helper

        TFCTL->>GT: replace_encoded_with_utf8(lines)
        activate GT
        GT-->>TFCTL: lines
        deactivate GT

        TFCTL->>GT: extract_chinese_texts(lines)
        activate GT
        GT-->>TFCTL: chinese_texts
        deactivate GT

        loop for each language
            TFCTL->>GT: translate_and_save(lines, chinese_texts, lang, True, translatefile)
            activate GT
            GT->>GT: translate_text(chinese_text, lang)
            activate GT
            GT-->>GT: translated_text
            deactivate GT
            GT-->>TFCTL: 
            deactivate GT
        end
    else File does not need translation
        TFCTL->>TFCTL: Skip translation
    end

Updated class diagram for translation modules

classDiagram
    class translate_force_chinese_to_lang {
        +translate_readme(data)
    }
    class translate_chinese_to_filelang {
        +process_files()
        +process_file(root, file, lang_code)
    }
    class google_translate {
        +replace_encoded_with_utf8(lines)
        +extract_chinese_texts(lines)
        +translate_text(text, target_lang)
        +translate_and_save(lines, chinese_texts, lang, shrink, file_path)
    }
    class helper {
        +read_file_to_memory(file_path)
        +is_file_updated_more_than(file_path, timeout_minutes)
        +read_json(file_path)
        +extract_lang_code(file)
    }

    translate_force_chinese_to_lang ..> google_translate : uses
    translate_force_chinese_to_lang ..> helper : uses
    translate_chinese_to_filelang ..> google_translate : uses
    translate_chinese_to_filelang ..> helper : uses

    note for translate_force_chinese_to_lang "Main script for translating README files, now using google_translate and helper modules."
    note for translate_chinese_to_filelang "Main script for translating files to different languages, now using google_translate and helper modules."
    note for google_translate "Module containing translation functions and data."
    note for helper "Module containing helper functions such as file reading and time checking."

File-Level Changes

| Change | Details | Files |
| --- | --- | --- |
| Refactored translation logic into reusable modules. | Created `google_translate.py` to encapsulate Google Translate API interactions, text encoding replacement, and translation caching. Created `helper.py` to encapsulate file reading, git commit time checking, and JSON loading. Removed duplicate code from `translate_force_chinese_to_lang.py` and `translate_chinese_to_filelang.py`. | `utils/translate_force_chinese_to_lang.py` `utils/translate_chinese_to_filelang.py` `utils/google_translate.py` `utils/helper.py` |
| Simplified the translation process in main scripts. | Replaced direct API calls and encoding handling with calls to the new modules. Reduced code duplication and improved readability. Improved thread management for concurrent translations. | `utils/translate_force_chinese_to_lang.py` `utils/translate_chinese_to_filelang.py` |

ChinaGodBot · 2025-03-23T02:49:36Z

@ChinaGodMan Hello, the people's servant will review and merge this request as soon as possible! 🚀 [Automatic reply, please do not respond]

instapr · 2025-03-23T02:49:39Z

Feedback:

Great job splitting the repetitive code into separate modules google_translate.py and helper.py.
The changes look good and the code is well-organized.
Ensure consistent formatting and comments across all files.
Good work on optimizing the code!
Consider adding docstrings for functions for better clarity.

sourcery-ai

Hey @ChinaGodMan - I've reviewed your changes - here's some feedback:

Overall Comments:

Consider adding error handling for file operations, especially in google_translate.py and helper.py.
The threading logic could be simplified by using a thread pool executor.

Here's what I looked at during the review

🟢 General issues: all looks good
🟢 Security: all looks good
🟢 Testing: all looks good
🟡 Complexity: 1 issue found
🟢 Documentation: all looks good


sourcery-ai · 2025-03-23T02:50:24Z

utils/google_translate.py

+
+
+# Translate and save the result, overwriting the original file
+def translate_and_save(lines, chinese_texts, lang, shrink, file_path):


issue (complexity): Consider using a ThreadPoolExecutor for concurrency and extracting the line-replacement logic into a helper function. This would improve readability and maintainability by reducing low-level threading and nested replacement logic.

Consider replacing the manual thread creation/locking and the reversed list comprehension logic with higher-level abstractions. For example, you can use a ThreadPoolExecutor to manage concurrency and collect results, and extract the in-place line replacement logic into a small helper function. This would make the code easier to read and maintain while keeping all functionality.

Example for concurrency:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def translate_worker(line_num, chinese_text, lang):
    translated = translate_text(chinese_text, lang)
    return (line_num, chinese_text, translated)

def translate_and_save(lines, chinese_texts, lang, shrink, file_path):
    translations = {}
    with ThreadPoolExecutor(max_workers=5) as executor:
        futures = {
            executor.submit(translate_worker, ln, ct, lang): (ln, ct)
            for ln, ct in chinese_texts
        }
        for future in as_completed(futures):
            ln, ct = futures[future]
            result = future.result()
            if result[2]:
                translations[(ln, ct)] = result[2]
    new_lines = update_lines(lines, chinese_texts, translations)
    # ... rest of the file write logic remains unchanged ...
```

And then extract line replacement logic:

```python
def update_lines(lines, chinese_texts, translations):
    updated_lines = list(lines)
    for ln, ct, translated in reversed(
        [(ln, ct, translations.get((ln, ct))) for ln, ct in chinese_texts if (ln, ct) in translations]
    ):
        updated_lines[ln] = updated_lines[ln].replace(ct, translated, 1)
    return updated_lines
```

These changes reduce low-level threading and nested replacement logic while preserving behavior.

sourcery-ai · 2025-03-23T02:50:24Z

utils/google_translate.py

+    # Replace the Chinese text from the end backwards
+    new_lines = lines[:]
+    for line_number, chinese_text, translated_text in reversed(
+            [(ln, ct, translations.get((ln, ct), None)) for ln, ct in chinese_texts if (ln, ct) in translations]):


suggestion (code-quality): Replace dict.get(x, None) with dict.get(x) (remove-none-from-default-get)

Suggested change

-            [(ln, ct, translations.get((ln, ct), None)) for ln, ct in chinese_texts if (ln, ct) in translations]):
+            [(ln, ct, translations.get((ln, ct))) for ln, ct in chinese_texts if (ln, ct) in translations]):

Explanation
When using a dictionary's get method you can specify a default to return if
the key is not found. This defaults to None, so it is unnecessary to specify
None if this is the required behaviour. Removing the unnecessary argument
makes the code slightly shorter and clearer.
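
A quick check of the behaviour described above, as a minimal sketch. The contents of `translations` are made up for illustration; only the key shape (a `(line_number, chinese_text)` tuple) follows the PR.

```python
# Illustrative data only: keys mirror the (line_number, chinese_text) tuples used in the PR.
translations = {(0, "你好"): "Hello"}

# For a key that is present, both spellings return the stored value.
present_explicit = translations.get((0, "你好"), None)
present_implicit = translations.get((0, "你好"))

# For a missing key, both fall back to None, so the explicit default adds nothing.
missing_explicit = translations.get((1, "世界"), None)
missing_implicit = translations.get((1, "世界"))
```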

sourcery-ai · 2025-03-23T02:50:24Z

utils/google_translate.py

+    # Call the translation API
+    api_url = 'https://translate.googleapis.com/translate_a/single'
+    params = {'client': 'gtx', 'dt': 't', 'sl': 'auto', 'tl': target_lang, 'q': text}
+    full_url = api_url + '?' + urlencode(params)


suggestion (code-quality): Use f-string instead of string concatenation [×2] (use-fstring-for-concatenation)

Suggested change

-    full_url = api_url + '?' + urlencode(params)
+    full_url = f'{api_url}?{urlencode(params)}'
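
The two spellings build the same string, which can be confirmed with a small sketch. The parameter names come from the PR; the `tl` and `q` values below are placeholders chosen for illustration.

```python
from urllib.parse import urlencode

api_url = 'https://translate.googleapis.com/translate_a/single'
# Same parameter names as the PR; the 'tl' and 'q' values are placeholders.
params = {'client': 'gtx', 'dt': 't', 'sl': 'auto', 'tl': 'en', 'q': '你好'}

concatenated = api_url + '?' + urlencode(params)
formatted = f'{api_url}?{urlencode(params)}'
```

The change is purely stylistic: `urlencode` still does the percent-encoding of the query in both versions.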

sourcery-ai · 2025-03-23T02:50:24Z

utils/google_translate.py

+        translated_text = translate_text(chinese_text, lang)
+        if translated_text:


suggestion (code-quality): Use named expression to simplify assignment and conditional (use-named-expression)

Suggested change

-        translated_text = translate_text(chinese_text, lang)
-        if translated_text:
+        if translated_text := translate_text(chinese_text, lang):

sourcery-ai · 2025-03-23T02:50:24Z

utils/google_translate.py

+        output_dir = os.path.dirname(file_path)
+        dir_with_lang = os.path.join(output_dir, lang)
+        if not os.path.exists(dir_with_lang):
+            os.makedirs(dir_with_lang)
+        output_path = os.path.join(dir_with_lang, 'README.md')
+        with open(output_path, 'w', encoding='utf-8') as f_out:
+            f_out.writelines(new_lines)
+        print(f"翻译完成，收缩到 [{lang}]目录,写入内容到'{output_path}'")


issue (code-quality): Extract code out into function (extract-method)
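
One possible shape for the extracted function, as a sketch only. The name `write_language_readme` is hypothetical and not part of the PR; the body mirrors the flagged block, with `os.makedirs(..., exist_ok=True)` standing in for the explicit existence check.

```python
import os

def write_language_readme(new_lines, file_path, lang):
    """Write new_lines to <dir of file_path>/<lang>/README.md and return that path.

    Hypothetical helper name; the steps follow the block flagged above.
    """
    dir_with_lang = os.path.join(os.path.dirname(file_path), lang)
    # exist_ok=True makes the separate os.path.exists() check unnecessary.
    os.makedirs(dir_with_lang, exist_ok=True)
    output_path = os.path.join(dir_with_lang, 'README.md')
    with open(output_path, 'w', encoding='utf-8') as f_out:
        f_out.writelines(new_lines)
    return output_path
```

Returning the path lets the caller keep the progress message next to the rest of the reporting logic instead of inside the helper.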

sourcery-ai · 2025-03-23T02:50:24Z

utils/helper.py

+    with open(file_path, 'r', encoding='utf-8') as f_in:
+        content = f_in.read()
+    virtual_file = io.StringIO(content)
+    lines = [line for line in virtual_file]


suggestion (code-quality): Replace identity comprehension with call to collection constructor (identity-comprehension)

Suggested change

-    lines = [line for line in virtual_file]
+    lines = list(virtual_file)

Explanation
Convert list/set/tuple comprehensions that do not change the input elements into calls to the corresponding collection constructor.

Before

    # List comprehensions
    [item for item in coll]
    [item for item in friends.names()]

    # Dict comprehensions
    {k: v for k, v in coll}
    {k: v for k, v in coll.items()}  # Only if we know coll is a `dict`

    # Unneeded call to `.items()`
    dict(coll.items())  # Only if we know coll is a `dict`

    # Set comprehensions
    {item for item in coll}

After

    # List comprehensions
    list(iter(coll))
    list(iter(friends.names()))

    # Dict comprehensions
    dict(coll)
    dict(coll)

    # Unneeded call to `.items()`
    dict(coll)

    # Set comprehensions
    set(coll)

All these comprehensions are just creating a copy of the original collection.
They can all be simplified by simply constructing a new collection directly. The
resulting code is easier to read and shows the intent more clearly.
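
Applied to the `io.StringIO` case from `helper.py`, the equivalence can be checked with a short sketch. The sample `content` is invented for illustration.

```python
import io

content = "第一行\nsecond line\n"  # made-up sample text
virtual_file = io.StringIO(content)
lines_from_list = list(virtual_file)

# Rewind before iterating again: a StringIO is exhausted after one pass.
virtual_file.seek(0)
lines_from_comprehension = [line for line in virtual_file]
```

Both forms yield the lines with their trailing newlines kept, so the later `writelines` call is unaffected.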

sourcery-ai · 2025-03-23T02:50:24Z

utils/helper.py

+    match = re.match(r'README_([a-zA-Z\-]+)\.md', file)
+    if match:
+        return match.group(1)


suggestion (code-quality): We've found these issues:

Use named expression to simplify assignment and conditional (use-named-expression)

Replace m.group(x) with m[x] for re.Match objects (use-getitem-for-re-match-groups)

Suggested change

-    match = re.match(r'README_([a-zA-Z\-]+)\.md', file)
-    if match:
-        return match.group(1)
+    if match := re.match(r'README_([a-zA-Z\-]+)\.md', file):
+        return match[1]
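
Put together as a complete function, the suggestion might look like the sketch below. The function name comes from the class diagram in this thread; the explicit `return None` for non-matching names is an assumption, since the rest of the original body is not shown. The walrus operator requires Python 3.8 or later.

```python
import re

def extract_lang_code(file):
    """Return the language code from a name such as README_en.md, or None."""
    if match := re.match(r'README_([a-zA-Z\-]+)\.md', file):
        return match[1]
    return None  # assumption: names that do not match yield None
```

Note that `re.match` anchors only at the start of the string, so a name with a trailing suffix after `.md` would also match; `re.fullmatch` is the stricter alternative if that matters.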

llamapreview

Auto Pull Request Review from LlamaPReview

1. Overview

1.1 Core Changes

Primary purpose and scope: Refactor the translation functionality to improve code organization, reusability, and maintainability. The main goal is to extract the Google Translate API calls and related utilities into separate modules.
Key components modified: translate_chinese_to_filelang.py, translate_force_chinese_to_lang.py are modified, and two new files google_translate.py and helper.py are added.
Cross-component impacts: The changes introduce new dependencies between the main translation scripts and the newly created modules (google_translate.py and helper.py).
Business value alignment: The refactoring reduces code duplication (by ~368 lines), improves system scalability and maintainability, and separates concerns, leading to easier future enhancements and bug fixes.

1.2 Technical Architecture

System design modifications: The original monolithic structure within translate_*.py files, which included translation logic, file operations, and utility functions, has been refactored into a modular structure.
Component interaction changes:
- google_translate.py: Contains the core translation logic using the Google Translate API.
- helper.py: Provides utility functions like file reading, commit time checking, JSON loading, and language code extraction.
- translate_*.py: Act as business entry points, utilizing the google_translate and helper modules.
Integration points impact: The main scripts now depend on the new modules.
Dependency changes and implications: New cross-file dependencies are introduced.

2. Critical Findings

2.1 Must Fix (P0🔴)

Issue: Incorrect mapping in translation_cache leading to erroneous translations.

Analysis Confidence: High
Impact: Critical; key business terms are mistranslated, rendering the output incorrect. Affects the core functionality of the translation process.
Resolution: Remove or correct the incorrect cache entries in translation_cache. Prioritize API translation for these terms.

Issue: Potential thread safety issue with the global translation_cache.

Analysis Confidence: High
Impact: Critical; concurrent access to translation_cache without proper locking can lead to race conditions and inconsistent data.
Resolution: Introduce a dedicated lock (cache_lock) to protect all read and write operations on translation_cache.

Issue: Incomplete error handling when translated_text is None.

Analysis Confidence: High
Impact: Critical; can lead to unexpected behavior or crashes when a translation fails and returns None.
Resolution: Add a check for None after calling translate_text and handle the case appropriately (e.g., logging an error, skipping the replacement).

2.2 Should Fix (P1🟡)

Issue: Inefficient string replacement in replace_encoded_with_utf8.

Analysis Confidence: High
Impact: Performance; the current implementation uses nested loops and repeated string replacements, which can be slow for large files or numerous replacements.
Suggested Solution: Use a regular expression-based approach for significantly faster replacement.

Issue: Incomplete Chinese character detection.

Analysis Confidence: High
Impact: Functionality; the current regular expression does not cover all Chinese characters, potentially missing some characters during translation.
Suggested Solution: Expand the regular expression to include extended Chinese character ranges.

2.3 Consider (P2🟢)

Area: Configuration Management

Analysis Confidence: Medium
Improvement Opportunity: Improve maintainability and flexibility by moving hardcoded values (like json_data and blacklist) to external configuration files.

Area: Persistent Translation Cache

Analysis Confidence: Medium
Improvement Opportunity: Enhance performance and reduce API calls by implementing a persistent cache (e.g., using shelve) to store translations across multiple runs.
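
A minimal sketch of what a `shelve`-backed cache could look like. Everything here is illustrative: `cached_translate`, the `cache_path` argument, and the `lang:text` key format are assumptions, and `translate` stands for any callable with the same arguments as `translate_text`.

```python
import shelve

def cached_translate(text, target_lang, translate, cache_path):
    """Return a translation, consulting an on-disk shelve cache first.

    translate is any callable (text, target_lang) -> str or None;
    cache_path is a hypothetical location for the cache file.
    """
    key = f"{target_lang}:{text}"  # shelve keys must be strings
    with shelve.open(cache_path) as cache:
        if key in cache:
            return cache[key]
        result = translate(text, target_lang)
        if result is not None:  # do not persist failed translations
            cache[key] = result
        return result
```

The `shelve` module does not support concurrent read/write access, so with the worker threads used in this PR the calls would need to be serialized, for example behind a lock.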

2.4 Summary of Action Items

Immediate (P0🔴): Fix incorrect translation_cache mappings, address thread safety for translation_cache, and handle None return values from translate_text.
High Priority (P1🟡): Optimize replace_encoded_with_utf8 for performance and improve Chinese character detection.
Medium Priority (P2🟢): Consider externalizing configuration and implementing a persistent translation cache.

3. Technical Analysis

3.1 Code Logic Analysis

📁 utils/google_translate.py - translate_text

Submitted PR Code:

    def translate_text(text, target_lang):
        if text in blacklist:
            return text
        # If the text is in the cache, check its boolean flag
        if text in translation_cache:
            cached_translation, needs_api_translation = translation_cache[text]
            # If the cached flag is False, use the cached translation directly
            if not needs_api_translation:
                # print(f"从缓存中获取翻译：{text} -> {cached_translation}")
                return cached_translation
            # If the flag is True, force an API call and ignore the cached translation
            else:
                print(f"{text} 在缓存中，但需要通过 API 翻译。")
        # Call the translation API
        api_url = 'https://translate.googleapis.com/translate_a/single'
        params = {'client': 'gtx', 'dt': 't', 'sl': 'auto', 'tl': target_lang, 'q': text}
        full_url = api_url + '?' + urlencode(params)
        try:
            # Call the API to fetch the translation
            response = urlopen(full_url)
            data = response.read().decode('utf-8')
            translated_text = json.loads(data.replace("'", "\u2019"))[0][0][0]
            # If this entry's cached flag is True, URL-encode the result
            if text in translation_cache and translation_cache[text][1]:
                translated_text = urllib.parse.quote(translated_text)
                # print(f"URL 编码后的翻译：{translated_text}")
            return translated_text
        except Exception as e:
            print(f"翻译错误：{e}")
            return None

Analysis:
- Current logic: Checks for blacklist, then cache, then calls Google Translate API. Handles exceptions during API calls. Conditionally URL-encodes the result.
- Potential issues: Incorrect cache entries (as noted in Critical Findings). No validation of the structure of the API response. No thread safety for translation_cache access.
- Cross-component impact: This function is the core of the translation process and is used by other functions in the module.
- Business logic considerations: The caching logic and URL encoding are specific to the use case.
LlamaPReview Suggested Improvements:

    from threading import Lock
    cache_lock = Lock()

    def translate_text(text, target_lang):
        if text in blacklist:
            return text

        with cache_lock:  # Protect translation_cache access
            if text in translation_cache:
                cached_translation, needs_api_translation = translation_cache[text]
                if not needs_api_translation:
                    return cached_translation
                else:
                    print(f"{text} in cache, but needs API translation.")

        api_url = 'https://translate.googleapis.com/translate_a/single'
        params = {'client': 'gtx', 'dt': 't', 'sl': 'auto', 'tl': target_lang, 'q': text}
        full_url = api_url + '?' + urlencode(params)
        try:
            response = urlopen(full_url)
            data = response.read().decode('utf-8')
            # Improved JSON parsing and validation
            try:
                translated_text = json.loads(data.replace("'", "\u2019"))[0][0][0]
                if not isinstance(translated_text, str):
                    raise ValueError("Invalid translation format")
            except (IndexError, TypeError, json.JSONDecodeError, ValueError) as e:
                print(f"API response parsing failed: {e}")
                return None

            with cache_lock: # Protect translation_cache access
                if text in translation_cache and translation_cache[text][1]:
                    translated_text = urllib.parse.quote(translated_text)
            return translated_text
        except Exception as e:
            print(f"Translation error: {e}")
            return None

Improvement rationale:
- Technical benefits: Adds thread safety for translation_cache access. Includes basic validation of the API response to prevent unexpected errors.
- Business value: Improves the robustness and reliability of the translation process.
- Risk assessment: Low risk; the changes are localized and address potential errors.

📁 utils/google_translate.py - replace_encoded_with_utf8

Submitted PR Code:

    def replace_encoded_with_utf8(lines):
        updated_lines = []
        for line in lines:
            for chinese_text, encoded_value in json_data.items():
                if encoded_value in line:
                    line = line.replace(encoded_value, chinese_text)
            updated_lines.append(line)
        return updated_lines

Analysis:
- Current logic: Iterates through each line and each key-value pair in json_data, performing string replacement.
- Potential issues: Inefficient due to nested loops and repeated replace calls.
- Cross-component impact: Used by both translate_chinese_to_filelang.py and translate_force_chinese_to_lang.py.
- Business logic considerations: This function handles the specific encoding used in the project.
LlamaPReview Suggested Improvements:

    import re

    def replace_encoded_with_utf8(lines):
        encoded_map = {v: k for k, v in json_data.items()}  # Reverse the dictionary
        pattern = re.compile("|".join(map(re.escape, encoded_map.keys()))) # Create regex

        def replacer(match):
            return encoded_map[match.group(0)]

        updated_lines = []
        for line in lines:
            updated_lines.append(pattern.sub(replacer, line)) # Apply regex substitution
        return updated_lines

Improvement rationale:
- Technical benefits: Significantly improves performance by using regular expressions for replacement, which is much more efficient than nested loops and repeated string replacements.
- Business value: Reduces processing time, especially for large files.
- Risk assessment: Low risk; the logic remains the same, but the implementation is optimized.
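
A slightly more defensive variant of the regex approach, offered as a sketch. It takes `json_data` as a parameter (the PR uses a module-level mapping), guards against an empty mapping, and orders the alternatives longest first; the sample mapping in the usage lines is invented.

```python
import re

def replace_encoded_with_utf8(lines, json_data):
    """Replace each encoded value in lines with its Chinese key in one regex pass."""
    encoded_map = {encoded: chinese for chinese, encoded in json_data.items()}
    if not encoded_map:
        # An empty alternation would match the empty string at every position.
        return list(lines)
    # Longest keys first, so a key that is a prefix of another cannot win.
    keys = sorted(encoded_map, key=len, reverse=True)
    pattern = re.compile("|".join(map(re.escape, keys)))
    return [pattern.sub(lambda m: encoded_map[m.group(0)], line) for line in lines]

# Made-up mapping in the same direction as the PR's json_data: Chinese text -> encoded value.
sample_map = {"中文": "%E4%B8%AD%E6%96%87", "中": "%E4%B8%AD"}
decoded = replace_encoded_with_utf8(["a %E4%B8%AD%E6%96%87 b\n", "%E4%B8%AD\n"], sample_map)
```

One behavioural difference from the nested-loop version is worth checking before adopting it: a single regex pass never rescans text it has already replaced, whereas sequential `str.replace` calls can act on the output of an earlier replacement.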

📁 utils/google_translate.py - extract_chinese_texts

Submitted PR Code:

    def extract_chinese_texts(lines):
        chinese_pattern = re.compile(r'[\u4e00-\u9fff]+')  # Matches Chinese characters
        chinese_texts = []

        for line_number, line in enumerate(lines):
            if "<!--AUTO" in line:
                continue
            for match in chinese_pattern.finditer(line):
                chinese_text = match.group()
                chinese_texts.append((line_number, chinese_text))
        return chinese_texts

Analysis:
- Current logic: Iterates through lines, finds Chinese characters using a regular expression, and stores their line number and text.
- Potential issues: The regular expression might not cover all Chinese characters (e.g., extended CJK Unified Ideographs).
- Cross-component impact: Used by both translation scripts.
- Business logic considerations: Accurate Chinese text extraction is crucial for correct translation.
LlamaPReview Suggested Improvements:

    def extract_chinese_texts(lines):
        chinese_pattern = re.compile(
            r'[\u4E00-\u9FFF\u3400-\u4DBF\U00020000-\U0002A6DF\U0002A700-\U0002B73F]+'  # Expanded range; code points above U+FFFF need 8-digit \U escapes
        )
        chinese_texts = []

        for line_number, line in enumerate(lines):
            if "<!--AUTO" in line:
                continue
            for match in chinese_pattern.finditer(line):
                chinese_text = match.group()
                chinese_texts.append((line_number, chinese_text))
        return chinese_texts

Improvement rationale:
- Technical benefits: More comprehensive Chinese character detection.
- Business value: Improves the accuracy of the translation by capturing all relevant Chinese text.
- Risk assessment: Low risk; the change only expands the range of characters matched.
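
The expanded ranges can be exercised with the sketch below. One detail matters when writing such a pattern in Python: the `\u` escape consumes exactly four hex digits, so the supplementary-plane ranges (CJK Extension B and C) are written here with eight-digit `\U` escapes. The constant name `CHINESE_PATTERN` is an illustrative choice.

```python
import re

# \u takes exactly four hex digits; code points above U+FFFF need \U plus eight digits.
CHINESE_PATTERN = re.compile(
    r'[\u3400-\u4DBF\u4E00-\u9FFF\U00020000-\U0002A6DF\U0002A700-\U0002B73F]+'
)

def extract_chinese_texts(lines):
    """Return (line_number, text) pairs for each run of CJK ideographs."""
    chinese_texts = []
    for line_number, line in enumerate(lines):
        if "<!--AUTO" in line:
            continue  # same skip rule as the PR code
        for match in CHINESE_PATTERN.finditer(line):
            chinese_texts.append((line_number, match.group()))
    return chinese_texts
```

A useful regression check for any change to this pattern is a line of plain ASCII letters and digits, which should produce no matches at all.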

📁 utils/google_translate.py - translate_and_save

    def translate_and_save(lines, chinese_texts, lang, shrink, file_path):
        translations = {}  # 每种语言有自己的翻译结果
        threads = []
        chunk_size = len(chinese_texts) // 5 or 1  # 假设5个线程，按块划分
        for i in range(0, len(chinese_texts), chunk_size):
            chunk = chinese_texts[i:i + chunk_size]
            thread = threading.Thread(target=translate_worker, args=(chunk, translations, lang))
            threads.append(thread)
            thread.start()
        # Wait for all threads to finish
        for thread in threads:
            thread.join()
        # Replace the Chinese text, working from the end backwards
        new_lines = lines[:]
        for line_number, chinese_text, translated_text in reversed(
                [(ln, ct, translations.get((ln, ct), None)) for ln, ct in chinese_texts if (ln, ct) in translations]):
            new_lines[line_number] = new_lines[line_number].replace(
                chinese_text, translated_text, 1)
        if shrink:  # Allow nested directories; write each language to its own README.md
            output_dir = os.path.dirname(file_path)
            dir_with_lang = os.path.join(output_dir, lang)
            if not os.path.exists(dir_with_lang):
                os.makedirs(dir_with_lang)
            output_path = os.path.join(dir_with_lang, 'README.md')
            with open(output_path, 'w', encoding='utf-8') as f_out:
                f_out.writelines(new_lines)
            print(f"翻译完成，收缩到 [{lang}]目录,写入内容到'{output_path}'")
        else:
            with open(file_path, 'w', encoding='utf-8') as f_out:
                f_out.writelines(new_lines)
            print(f"翻译完成，已将结果覆盖保存到 '{file_path}'")

Analysis:
- Current Logic: This function orchestrates the translation process. It divides the work into chunks, creates threads for parallel translation, waits for threads to complete, and then replaces the original Chinese text with the translated text. It handles saving the translated content to the appropriate file, either overwriting the original or creating a new file in a language-specific directory.
- Potential Issues: The code does not handle the case where translated_text is None (for example, after a translation error). It passes the value straight to replace, and str.replace raises a TypeError when given None.
- Cross-component impact: This is a key function that integrates the translation logic with file I/O.
- Business logic considerations: The shrink parameter controls whether to create separate files for each language or overwrite the original.
LlamaPReview Suggested Improvements:

    def translate_and_save(lines, chinese_texts, lang, shrink, file_path):
        translations = {}  # Each language gets its own translation results
        threads = []
        chunk_size = len(chinese_texts) // 5 or 1  # Aim for about 5 threads; split the work into chunks
        for i in range(0, len(chinese_texts), chunk_size):
            chunk = chinese_texts[i:i + chunk_size]
            thread = threading.Thread(target=translate_worker, args=(chunk, translations, lang))
            threads.append(thread)
            thread.start()
        # Wait for all threads to finish
        for thread in threads:
            thread.join()
        # Replace the Chinese text, working from the end backwards
        new_lines = lines[:]
        for line_number, chinese_text, translated_text in reversed(
                [(ln, ct, translations.get((ln, ct), None)) for ln, ct in chinese_texts if (ln, ct) in translations]):
            if translated_text is not None:  # Check for None before replacing
                new_lines[line_number] = new_lines[line_number].replace(
                    chinese_text, translated_text, 1)
            else:
                print(f"Translation failed for: {chinese_text} at line {line_number}")

        if shrink:  # Allow nested directories; write each language to its own README.md
            output_dir = os.path.dirname(file_path)
            dir_with_lang = os.path.join(output_dir, lang)
            if not os.path.exists(dir_with_lang):
                os.makedirs(dir_with_lang)
            output_path = os.path.join(dir_with_lang, 'README.md')
            with open(output_path, 'w', encoding='utf-8') as f_out:
                f_out.writelines(new_lines)
            print(f"翻译完成，收缩到 [{lang}]目录,写入内容到'{output_path}'")
        else:
            with open(file_path, 'w', encoding='utf-8') as f_out:
                f_out.writelines(new_lines)
            print(f"翻译完成，已将结果覆盖保存到 '{file_path}'")

Improvement rationale:
- Technical benefits: Adds a check for None before replacing the text, preventing potential errors.
- Business value: Improves the robustness of the translation process.
- Risk Assessment: Low risk, this is a simple check to handle a potential error case.
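
The chunk-and-join flow and the None check can be separated into two small functions, which makes each easier to test. This is only a sketch under stated assumptions: `fake_translate` is a hypothetical stand-in for the real API call, and `translate_chunks` and `apply_translations` are illustrative names that do not exist in the PR. The lock is a defensive choice for the shared dictionary, not something the original code uses.

```python
import threading


def fake_translate(text, lang):
    """Hypothetical translator: returns None to simulate a failed request."""
    return None if text == "坏" else f"[{lang}]{text}"


def translate_chunks(chinese_texts, lang, translate=fake_translate, workers=5):
    """Translate (line_number, text) pairs in roughly `workers` threads."""
    translations = {}
    lock = threading.Lock()

    def worker(chunk):
        for line_number, text in chunk:
            result = translate(text, lang)
            with lock:  # guard the shared dict explicitly
                translations[(line_number, text)] = result

    chunk_size = len(chinese_texts) // workers or 1
    threads = [
        threading.Thread(target=worker, args=(chinese_texts[i:i + chunk_size],))
        for i in range(0, len(chinese_texts), chunk_size)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return translations


def apply_translations(lines, chinese_texts, translations):
    """Replace each text once, back to front; collect entries that failed."""
    new_lines = lines[:]
    failed = []
    for line_number, text in reversed(chinese_texts):
        translated = translations.get((line_number, text))
        if translated is None:
            failed.append((line_number, text))
            continue
        new_lines[line_number] = new_lines[line_number].replace(text, translated, 1)
    return new_lines, failed
```

Returning the failed entries instead of printing them lets the caller decide whether to retry, log, or abort before anything is written to disk.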

3.2 Key Quality Aspects

System scalability considerations: The use of threading allows for parallel translation of different text chunks, improving scalability. The modular design also makes it easier to add new features or languages in the future.
Performance bottlenecks and optimizations: The main potential bottleneck is the Google Translate API call itself. The caching mechanism helps mitigate this, and the use of threading improves performance. The optimized replace_encoded_with_utf8 function further enhances performance.
Testing strategy and coverage: The PR description mentions local testing, but no specific test cases or coverage metrics are provided. Unit tests for the new modules and functions would be highly beneficial.
Documentation needs: The code is relatively well-commented, but additional documentation explaining the overall architecture and the purpose of each module could be helpful.
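
As a starting point for the unit tests mentioned above, a minimal `unittest` sketch could look like the following. The function body is copied from the code under review and inlined so the example runs on its own; in the repository the test would presumably import it from the new helper module instead, and the test class name is an assumption.

```python
import re
import unittest


def extract_chinese_texts(lines):
    """Inlined copy of the helper under review."""
    chinese_pattern = re.compile(r'[\u4e00-\u9fff]+')
    chinese_texts = []
    for line_number, line in enumerate(lines):
        if "<!--AUTO" in line:
            continue
        for match in chinese_pattern.finditer(line):
            chinese_texts.append((line_number, match.group()))
    return chinese_texts


class ExtractChineseTextsTest(unittest.TestCase):
    def test_returns_line_number_and_text(self):
        self.assertEqual(
            extract_chinese_texts(["Hello 世界", "纯中文"]),
            [(0, "世界"), (1, "纯中文")],
        )

    def test_skips_lines_with_auto_marker(self):
        self.assertEqual(extract_chinese_texts(["<!--AUTO 跳过 -->"]), [])

    def test_splits_runs_on_non_chinese_text(self):
        self.assertEqual(
            extract_chinese_texts(["安装 and 使用"]),
            [(0, "安装"), (0, "使用")],
        )
```

Saved as a test module, this can be run with `python -m unittest`.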

4. Overall Evaluation

Technical assessment: The PR represents a significant improvement in terms of code organization, maintainability, and potential performance. The modular design and use of threading are positive aspects. However, critical issues related to cache correctness, thread safety, and error handling must be addressed.
Business impact: The refactoring reduces maintenance costs, improves the speed of adding new languages, and potentially reduces translation error rates.
Risk evaluation: Medium; while the changes improve the system, the identified critical issues pose a risk to the correctness of the translation process.
Notable positive aspects and good practices: Modular design, use of threading, caching mechanism, separation of concerns.
Implementation quality: Generally good, but needs improvements in error handling, thread safety, and input validation.
Final recommendation: Request Changes; the P0 issues must be addressed before merging. The P1 issues should also be addressed before merging, or very shortly after. The P2 suggestions are recommended for future improvements.


cloudflare-workers-and-pages · 2025-03-23T03:58:35Z

Deploying qinwuyuan with Cloudflare Pages

Latest commit:	`947df86`
Status:	✅ Deploy successful!
Preview URL:	https://8f35ae85.qinwuyuan.pages.dev
Branch Preview URL:	https://google-translate.qinwuyuan.pages.dev


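* Concatenate the returned translation result data.
* If the text to be translated contains do-not-translate text, batch-replace it with placeholders.
* After translation, replace each placeholder with its corresponding blacklisted string.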
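
The placeholder approach described in this commit message can be sketched roughly as follows. The placeholder format (`__KEEP_n__`) and the function names `protect_terms` and `restore_terms` are assumptions made for illustration; the commit itself may use a different scheme.

```python
def protect_terms(text, blacklist):
    """Swap each blacklisted term for a placeholder before translation.

    Longer terms are handled first so a term that contains another
    blacklisted term is replaced as a whole.
    """
    mapping = {}
    for index, term in enumerate(sorted(blacklist, key=len, reverse=True)):
        if term in text:
            placeholder = f"__KEEP_{index}__"
            text = text.replace(term, placeholder)
            mapping[placeholder] = term
    return text, mapping


def restore_terms(text, mapping):
    """Put the original terms back after translation."""
    for placeholder, term in mapping.items():
        text = text.replace(placeholder, term)
    return text
```

For example, protecting `"GitHub"` in `"安装 GitHub 脚本"` yields `"安装 __KEEP_0__ 脚本"`, and restoring a translated `"Install __KEEP_0__ script"` gives `"Install GitHub script"`. This only works if the translation service returns the placeholder unchanged, which is worth verifying for each target language.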

sonarqubecloud · 2025-03-23T18:33:39Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

feat(translation): ✨ add helper and Google Translate modules

7b16849

* Split the duplicated Google Translate code into a module so it is easier to modify

pull-request-size bot added the size/XL label Mar 23, 2025

labels-and-badges bot added NO JIRA This PR does not have a Jira Ticket PR:size/XL Denotes a Pull Request that changes 500-999 lines. release This PR is a release labels Mar 23, 2025

ChinaGodBot assigned ChinaGodMan Mar 23, 2025

sourcery-ai bot reviewed Mar 23, 2025


llamapreview bot reviewed Mar 23, 2025


pull-request-size bot removed the size/XL label Mar 23, 2025

labels-and-badges bot added the PR:size/XXL Denotes a Pull Request that changes 1000+ lines. label Mar 23, 2025

pull-request-size bot added the size/XXL label Mar 23, 2025

labels-and-badges bot removed the PR:size/XL Denotes a Pull Request that changes 500-999 lines. label Mar 23, 2025

ChinaGodMan force-pushed the google_translate branch from e9be4df to aa0dff8 Compare March 23, 2025 04:03

pull-request-size bot added size/XL and removed size/XXL labels Mar 23, 2025

labels-and-badges bot added PR:size/XL Denotes a Pull Request that changes 500-999 lines. and removed PR:size/XXL Denotes a Pull Request that changes 1000+ lines. labels Mar 23, 2025

feat(localized translation): ✨ import the translation interface from the module and add 123 language codes

947df86

ChinaGodMan merged commit 2e9435e into main Mar 23, 2025
16 of 17 checks passed

ChinaGodBot deleted the google_translate branch March 23, 2025 19:04

ChinaGodMan added a commit that referenced this pull request Mar 26, 2025

Merge pull request #104 from ChinaGodMan/google_translate

07b6546

feat(translation): ✨ add helper and Google Translate modules

ChinaGodMan added a commit that referenced this pull request Mar 26, 2025

Merge pull request #104 from ChinaGodMan/google_translate

473dbee

feat(translation): ✨ add helper and Google Translate modules



		# Translate and save the result, overwriting the original file
		def translate_and_save(lines, chinese_texts, lang, shrink, file_path):

	[(ln, ct, translations.get((ln, ct), None)) for ln, ct in chinese_texts if (ln, ct) in translations]):
	[(ln, ct, translations.get((ln, ct))) for ln, ct in chinese_texts if (ln, ct) in translations]):

	full_url = api_url + '?' + urlencode(params)
	full_url = f'{api_url}?{urlencode(params)}'

		translated_text = translate_text(chinese_text, lang)
		if translated_text:

	translated_text = translate_text(chinese_text, lang)
	if translated_text:
	if translated_text := translate_text(chinese_text, lang):

	lines = [line for line in virtual_file]
	lines = list(virtual_file)
