Conversation
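- Refactor: remove the unstable content-reading check and use static rules instead
  * Delete the `BINARY_EXTENSIONS` constant and the `_DETECTION_CHUNK_SIZE` variable
  * Add a `_TEXT_EXTENSIONS` constant that explicitly includes text formats such as `.vue`, `.svelte`, `.bat`, and `.svg`
  * Adjust the `_BINARY_EXTENSIONS` set, removing `.bat`, `.svg`, and `.env` to avoid misclassification
  * Remove the `encoding` parameter from the `is_binary_file` function
  * Delete the file-header reading logic (`open(p, "rb")`, `chunk.decode`) and the `UnicodeDecodeError` exception handler
  * Introduce `mimetypes.guess_type()` as a fallback that recognizes the `text/` prefix and text MIME types such as `application/json` and `application/xml`
- Fix: resolve the defect where certain text extensions were misclassified as binary
  * Move `.bat` (Windows batch script) and `.svg` (vector graphics) from the binary list to the text list
  * Stop treating `.env` configuration files as binary
  * Ensure that empty files and nonexistent files are handled according to the new rules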
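The static rules described in this commit can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: the set contents are abbreviated and partly hypothetical, `_TEXT_MIME_TYPES` is an invented name, and treating an unrecognized MIME type as text is an assumption (the commit message does not say which default is used).

```python
import mimetypes
from pathlib import Path

# Abbreviated, partly hypothetical sets; the real contents live in the PR's source.
_TEXT_EXTENSIONS = frozenset({".vue", ".svelte", ".bat", ".svg", ".env"})
_BINARY_EXTENSIONS = frozenset({".png", ".jpg", ".zip", ".exe", ".pdf"})
# Hypothetical name for the "application/*" types that are really text.
_TEXT_MIME_TYPES = frozenset({"application/json", "application/xml"})


def is_binary_file(path):
    """Classify a path as binary from its name alone, never from its contents."""
    p = Path(path)
    # Dotfiles such as ".env" have an empty Path.suffix, so fall back to the name.
    ext = (p.suffix or p.name).lower()
    if ext in _TEXT_EXTENSIONS:
        return False
    if ext in _BINARY_EXTENSIONS:
        return True
    # Fallback for extensions in neither set: guess the MIME type from the name.
    mime_type, _ = mimetypes.guess_type(p.name)
    if mime_type is None:
        return False  # assumption: unknown types default to text
    if mime_type.startswith("text/") or mime_type in _TEXT_MIME_TYPES:
        return False
    return True
```

Because nothing is opened or decoded, the function gives the same answer for empty and nonexistent files as for any other path with that name.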
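- Refactor: migrate the binary detection strategy from "content reading" to "extension + MIME type"
  * Remove the `encoding` parameter and the file-header reading logic (`open`, `chunk.decode`) from the `is_binary_file` function
  * Remove the `BINARY_EXTENSIONS` constant and add `_TEXT_EXTENSIONS` and `_BINARY_EXTENSIONS` sets to make the classification explicit
  * Introduce `mimetypes.guess_type()` as the fallback check for unknown extensions
  * Correct the classification of `.bat`, `.svg`, `.env`, and similar files by moving them from the binary list to the text list
- Breaking change: the signature of `is_binary_file` changes
  * All callers must stop passing the `encoding` argument (e.g. `src/workspace/tools/read_tool.py`, `read_lines_tool.py`)
  * The old content-based logic, which was prone to misclassification, is removed; forcing recognition of non-UTF-8 text through the encoding parameter is no longer supported
- Docs update: update the unit tests to match the new logic
  * Rewrite `tests/utils/test_binary_detector.py`, deleting all detection cases based on file content (`null bytes`, `invalid utf8`)
  * In `tests/workspace/tools/test_binary_protection.py`, remove the simulated binary-file fixture that depended on content detection, along with its assertions
  * In `tests/core/test_audit_committer.py`, remove the test cases for binary content interception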
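For callers, the breaking change amounts to dropping one keyword argument. The sketch below is only illustrative: `read_text_if_allowed` is a hypothetical caller (not the actual code in `read_tool.py`), and the body of `is_binary_file` here is a placeholder that stands in for the new signature.

```python
def is_binary_file(path):
    """Placeholder with the new signature: there is no `encoding` parameter."""
    return str(path).lower().endswith((".png", ".zip"))


def read_text_if_allowed(path, encoding="utf-8"):
    """Hypothetical caller, loosely modeled on a read tool."""
    # Before the change the call was: is_binary_file(path, encoding=encoding)
    if is_binary_file(path):
        raise ValueError(f"refusing to read binary file: {path}")
    # The encoding is still used for reading, just not for detection.
    with open(path, encoding=encoding) as fh:
        return fh.read()
```

A call site that still passes `encoding=` to `is_binary_file` fails immediately with a `TypeError`, so leftover callers should surface quickly in tests.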
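- Refactor: simplify the conditional branches in the `is_binary` function
  * Remove the explicit check for the "text/" prefix, relying on later logic or the default value
  * Merge the checks for `application/json`, `application/xml`, `application/javascript`, and `application/x-yaml` into a single expression, `mime_type not in (...)`
  * Remove redundant comments to improve readability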
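The merged membership test might look something like the sketch below. The names `_TEXT_LIKE_MIME_TYPES` and `is_binary_mime` are hypothetical. The commit message does not say how `text/` types or a missing MIME type are handled after this change, so the sketch covers only the single merged expression and not the surrounding function.

```python
# MIME types the commit message names as text despite the "application/" prefix.
_TEXT_LIKE_MIME_TYPES = (
    "application/json",
    "application/xml",
    "application/javascript",
    "application/x-yaml",
)


def is_binary_mime(mime_type):
    """Return True unless mime_type is one of the text-like application types."""
    # One membership test replaces four separate equality comparisons.
    return mime_type not in _TEXT_LIKE_MIME_TYPES
```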
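- Fix: clean up redundant edge-case test logic
  * Delete the `test_unknown_extension_text_content` method, which verified the case of an unknown extension with plain-text content
  * Delete the `test_unknown_extension_xml_mime` method, which verified the case of an unknown extension with an XML MIME type
  * Keep the `test_unknown_extension_binary_content` method to continue covering binary content detection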
Completely removed the unstable logic for detecting file types based on content, along with the related tests and some edge-case tests.