Skip to content

fix: 清洗远程文件消息段文件名#8318

Open
liuwanwan1 wants to merge 3 commits into
AstrBotDevs:masterfrom
liuwanwan1:master
Open

fix: 清洗远程文件消息段文件名#8318
liuwanwan1 wants to merge 3 commits into
AstrBotDevs:masterfrom
liuwanwan1:master

Conversation

@liuwanwan1
Copy link
Copy Markdown

@liuwanwan1 liuwanwan1 commented May 24, 2026

变更内容

  • File 消息段远程下载增加文件名清洗,仅保留 basename 并移除空字节
  • 下载前主动创建 AstrBot 临时目录,避免目录不存在导致写入失败
  • 补充单元测试,覆盖包含路径分隔符和空字节的远程文件名

关联 Issue

Closes #8317

验证

  • UV_CACHE_DIR=/tmp/uv-cache UV_PYTHON_INSTALL_DIR=/tmp/uv-python uv run pytest tests/unit/test_file_message_component.py
  • UV_CACHE_DIR=/tmp/uv-cache UV_PYTHON_INSTALL_DIR=/tmp/uv-python uv run ruff format .
  • UV_CACHE_DIR=/tmp/uv-cache UV_PYTHON_INSTALL_DIR=/tmp/uv-python uv run ruff check .

Summary by Sourcery

Sanitize remote file names in File message downloads and ensure temporary directories exist before saving, with tests verifying the new behavior.

Bug Fixes:

  • Prevent unsafe or malformed remote file names from being used directly when downloading File message components by normalizing and sanitizing the basename.
  • Avoid download failures caused by missing AstrBot temporary directories by creating the directory tree before writing files.

Tests:

  • Add async unit test covering File component downloads with remote names containing path separators and null bytes to verify sanitization and temp directory handling.

@dosubot dosubot Bot added size:S This PR changes 10-29 lines, ignoring generated files. area:core The bug / feature is about astrbot's core, backend labels May 24, 2026
Copy link
Copy Markdown
Contributor

@sourcery-ai sourcery-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've reviewed your changes and they look great!


Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a filename sanitization function for file components and refactors the file download logic to utilize pathlib for better path management. It also includes a new unit test to verify that remote filenames are correctly sanitized. The reviewer suggested extending the sanitization logic to handle characters that are invalid on Windows filesystems (such as ':', '*', and '?') to ensure cross-platform compatibility.

Comment on lines +666 to +674
def _sanitize_file_component_name(name: str | None) -> str:
if not name:
return "file"

normalized = str(name).replace("\\", "/")
basename = PurePosixPath(normalized).name.replace("\x00", "").strip()
if basename in {"", ".", ".."}:
return "file"
return basename
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The _sanitize_file_component_name function correctly handles path traversal by extracting the basename and removing null bytes. However, it does not account for other characters that are invalid in filenames on certain operating systems, particularly Windows (e.g., :, *, ?, ", <, >, |). If a remote filename contains these characters, the subsequent download_file call will fail with an OSError on Windows. Consider replacing these characters with an underscore or removing them to ensure cross-platform compatibility. Additionally, as this is new functionality for handling file components, please ensure it is accompanied by unit tests.

Suggested change
def _sanitize_file_component_name(name: str | None) -> str:
if not name:
return "file"
normalized = str(name).replace("\\", "/")
basename = PurePosixPath(normalized).name.replace("\x00", "").strip()
if basename in {"", ".", ".."}:
return "file"
return basename
def _sanitize_file_component_name(name: str | None) -> str:
if not name:
return "file"
normalized = str(name).replace("\\", "/")
basename = PurePosixPath(normalized).name.replace("\\x00", "").strip()
# Remove or replace characters that are invalid on Windows filesystems
for char in ':*?"<>|':
basename = basename.replace(char, "_")
if basename in {"", ".", ".."}:
return "file"
return basename
References
  1. New functionality, such as handling attachments, should be accompanied by corresponding unit tests.

@liuwanwan1
Copy link
Copy Markdown
Author

已根据 Gemini Code Assist 的建议补充处理:

  • 将 Windows 文件系统非法字符 :*?"<>| 替换为 _
  • 扩展单元测试覆盖这些字符和空字节/路径分隔符组合

本地验证:

  • UV_CACHE_DIR=/tmp/uv-cache UV_PYTHON_INSTALL_DIR=/tmp/uv-python uv run pytest tests/unit/test_file_message_component.py
  • UV_CACHE_DIR=/tmp/uv-cache UV_PYTHON_INSTALL_DIR=/tmp/uv-python uv run ruff format .
  • UV_CACHE_DIR=/tmp/uv-cache UV_PYTHON_INSTALL_DIR=/tmp/uv-python uv run ruff check .

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:core The bug / feature is about astrbot's core, backend size:S This PR changes 10-29 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

File 消息段下载远程文件时未清洗文件名导致路径异常

1 participant