[Feature] implement log channel separation and request log level system #7190
Jiang-Jia-Jun merged 3 commits into PaddlePaddle:develop
Conversation
Thanks for your contribution!
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## develop #7190 +/- ##
==========================================
Coverage ? 73.75%
==========================================
Files ? 397
Lines ? 54828
Branches ? 8587
==========================================
Hits ? 40439
Misses ? 11682
Partials ? 2707
Flags with carried forward coverage won't be shown. ☔ View full report in Codecov by Sentry.
Pull request overview
This PR introduces a unified mechanism of log channel separation (main/request/console) and L0-L3 request-log leveling on top of FastDeploy's existing logging system, migrates many request lifecycle/scheduling/output logs to the request channel, and adds the corresponding unit tests and documentation.
Changes:
- Add fastdeploy.logger.request_logger (L0-L3 request logs) and fastdeploy.logger.config (parsing of log levels and request-log defaults).
- setup_logging now supports the main/request/console three-channel logger configuration, and scattered logger calls across many business modules are replaced with log_request/log_request_error.
- Update and add logging-related unit tests, and update the Chinese and English logging docs accordingly.
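Based on the call sites visible in this PR's diffs, the new interfaces are used roughly as below. This is a runnable sketch with stub implementations (the real functions live in fastdeploy.logger.request_logger and route to the request channel; only the call shapes are taken from the PR):

```python
# Stubs mirroring the call shapes in this PR's diffs; the real
# implementations live in fastdeploy.logger.request_logger.
def log_request(level, message, **fields):
    """Leveled request log (L0-L3); real version filters by configured level."""
    print(f"[L{level}] " + message.format(**fields))

def log_request_error(message, **fields):
    """Unleveled error log; real version also maps to console and error.log."""
    print("[ERROR] " + message.format(**fields))

log_request(level=1, message="Cache task with request_id ({request_id})",
            request_id="req-1")
log_request_error(message="Error in extracting tool call from response: {error}",
                  error="boom")
```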
Reviewed changes
Copilot reviewed 48 out of 48 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/scheduler/test_local_scheduler.py | Migrate assertions from scheduler_logger to verifying log_request calls |
| tests/scheduler/test_dp_scheduler.py | Adapt the DP scheduler's request-log assertions by mocking log_request |
| tests/output/test_process_batch_output.py | Replace llm_logger assertions with log_request, covering the preemption scenario |
| tests/output/test_process_batch_output_use_zmq.py | Replace llm_logger assertions with log_request, covering the abort-recycle scenario |
| tests/logger/test_setup_logging.py | Update default handler assertions and add channel logger/handler configuration tests |
| tests/logger/test_request_logger.py | Add tests for request_logger's _should_log/_truncate/log_request |
| tests/logger/test_logging_config.py | Add tests for resolve_log_level and request-log default parsing |
| tests/logger/test_logger.py | Update tests for the unified logger naming rules (defaults to the main channel) |
| tests/input/test_video_utils.py | Point assertLogs at the main-channel logger name |
| tests/entrypoints/test_llm.py | Change exception-log assertions to verify log_request_error calls |
| tests/entrypoints/test_engine_client.py | Change parameter-warning assertions to verify log_request calls; fix the getenv mock signature |
| tests/entrypoints/test_abort.py | Change abort-log assertions to verify log_request calls |
| tests/entrypoints/openai/v1/test_serving_completion_v1.py | Change exception-log assertions to verify log_request_error calls |
| fastdeploy/utils.py | Add a channel parameter to get_logger; route global loggers into channels such as main/console/comm by default |
| fastdeploy/scheduler/splitwise_scheduler.py | Migrate several error logs to log_request_error |
| fastdeploy/scheduler/local_scheduler.py | Migrate request enqueue/dequeue/completion logs to log_request |
| fastdeploy/scheduler/global_scheduler.py | Add log_request/log_request_error and adjust some request-related logs |
| fastdeploy/scheduler/dp_scheduler.py | Migrate request/result logs to log_request |
| fastdeploy/output/token_processor.py | Migrate many request lifecycle/exception logs to log_request/log_request_error |
| fastdeploy/logger/setup_logging.py | Switch the default dictConfig to the three-channel configuration and use LazyFileHandler |
| fastdeploy/logger/request_logger.py | Add the request-log module (L0-L3 levels, truncation, error-log interface) |
| fastdeploy/logger/logger.py | Support fetching loggers by channel; unified loggers default to the main channel |
| fastdeploy/logger/config.py | Add log-level parsing and request-log default configuration parsing |
| fastdeploy/input/tokenizer_client.py | Migrate polling/decoding exception logs to the request-channel log interfaces |
| fastdeploy/input/qwen3_vl_processor/qwen3_vl_processor.py | Migrate the "Processed request" log to the request channel |
| fastdeploy/input/qwen_vl_processor/qwen_vl_processor.py | Migrate the "Processed request" log to the request channel |
| fastdeploy/input/ernie4_5_vl_processor/ernie4_5_vl_processor.py | Migrate the "Processed request" log to the request channel |
| fastdeploy/input/base_processor.py | Migrate token_ids/request-processing logs to the request channel with levels |
| fastdeploy/entrypoints/openai/v1/serving_completion.py | Migrate exception and streaming-checkpoint logs to the request channel |
| fastdeploy/entrypoints/openai/v1/serving_chat.py | Migrate exception and streaming-checkpoint logs to the request channel |
| fastdeploy/entrypoints/openai/tool_parsers/ernie_x1_tool_parser.py | Migrate error logs to log_request_error |
| fastdeploy/entrypoints/openai/tool_parsers/ernie_45_vl_thinking_tool_parser.py | Migrate error logs to log_request_error |
| fastdeploy/entrypoints/openai/serving_reward.py | Migrate key output logs to log_request |
| fastdeploy/entrypoints/openai/serving_models.py | Migrate error logs to log_request_error |
| fastdeploy/entrypoints/openai/serving_engine.py | Migrate semaphore/initialization/error logs to the request-channel interfaces |
| fastdeploy/entrypoints/openai/serving_embedding.py | Migrate key output logs to log_request |
| fastdeploy/entrypoints/openai/serving_completion.py | Migrate exception and streaming-checkpoint logs to the request-channel interfaces |
| fastdeploy/entrypoints/openai/serving_chat.py | Migrate exception and streaming-checkpoint logs to the request-channel interfaces |
| fastdeploy/entrypoints/openai/protocol.py | Migrate the obsolete-metadata warning log to log_request |
| fastdeploy/entrypoints/openai/api_server.py | Migrate request-received/exception logs to the request-channel interfaces |
| fastdeploy/entrypoints/llm.py | Migrate the _receive_output exception log to log_request_error |
| fastdeploy/entrypoints/engine_client.py | Migrate validation/receive/abort/exception logs to the request-channel interfaces |
| fastdeploy/entrypoints/api_server.py | Migrate request-received and exception logs to the request-channel interfaces |
| fastdeploy/engine/request.py | Migrate from_dict/mm_positions exception logs to the request-channel interfaces |
| fastdeploy/engine/engine.py | Migrate add_requests/generate logs to the request-channel interfaces |
| fastdeploy/engine/async_llm.py | Migrate abort/generate exception logs to log_request_error |
| docs/zh/usage/log.md | Document log channels, request log levels, and environment variables (Chinese) |
| docs/usage/log.md | Document log channels, request log levels, and environment variables (English) |
from fastdeploy.engine.expert_service import start_data_parallel_service
from fastdeploy.engine.request import Request
from fastdeploy.inter_communicator import EngineWorkerQueue, IPCSignal
from fastdeploy.logger.request_logger import log_request, log_request_error
Is printing errors separately a common practice? Does this produce something like a request_error.log?
The separation is because log_request has 4 levels and filters logs against the currently configured level, while log_request_error is not leveled. Both print to request.log, and log_request_error is additionally mapped to the console and error.log.
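The routing described above can be illustrated with stdlib logging. This is a minimal sketch, not FastDeploy's actual implementation; the handler setup and file names just follow the description in this comment:

```python
import logging

# A "request" channel logger: every record goes to request.log, while
# ERROR records are additionally copied to error.log and the console.
request_logger = logging.getLogger("fastdeploy.request.demo")
request_logger.setLevel(logging.DEBUG)
request_logger.propagate = False

all_records = logging.FileHandler("request.log")   # every record
error_file = logging.FileHandler("error.log")
error_file.setLevel(logging.ERROR)                 # errors only
console = logging.StreamHandler()
console.setLevel(logging.ERROR)                    # errors only

for handler in (all_records, error_file, console):
    request_logger.addHandler(handler)

request_logger.info("request received")   # -> request.log only
request_logger.error("request failed")    # -> request.log, error.log, console
```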
llm_logger.info(f"Cache task with request_id ({request.get('request_id')})")
llm_logger.debug(f"cache task: {request}")
log_request(
    level=1,
The docs say level=1..n correspond to different verbosity; could this be an enum, so the meaning is clearer in the code?
OK, the plan is to switch to 4 enum values, aligning the names with the corresponding verbosity to make the meaning clear.
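A sketch of what such an enum could look like. Only LIFECYCLE (0) and CONTENT (2) are named elsewhere in this PR; the names for levels 1 and 3 below are placeholders, not the actual names chosen:

```python
from enum import IntEnum

class RequestLogLevel(IntEnum):
    """Hypothetical enum for the four request-log levels (L0-L3)."""
    LIFECYCLE = 0   # key lifecycle events: create / finish / abort
    SCHEDULING = 1  # placeholder name: scheduling and queueing detail
    CONTENT = 2     # request parameters, scheduling info, response content
    VERBOSE = 3     # placeholder name: most detailed output

# A leveled call then reads as, e.g.:
#   log_request(level=RequestLogLevel.CONTENT, message=...)
```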
if not function_call_arr:
    data_processor_logger.error("No valid tool calls found")
    log_request_error(message="No valid tool calls found")
For request-scoped messages like this one, it's recommended to include the request_id.
except Exception as e:
    data_processor_logger.error(f"Error in extracting tool call from response: {str(e)}")
    log_request_error(message="Error in extracting tool call from response: {error}", error=str(e))
api_server_logger.info(f"Chat Streaming response last send: {chunk.model_dump_json()}")
log_request(
    level=0,
    message="Chat Streaming response last send: request_id={request_id}, finish_reason={finish_reason}, completion_tokens={completion_tokens}",
If logprobs are available, please print them too; training relies on this value to detect NaNs.
"""
return {
    "enabled": int(os.getenv("FD_LOG_REQUESTS", "1")),
    "level": int(os.getenv("FD_LOG_REQUESTS_LEVEL", "0")),
The default level is 0; does that keep the current log output unchanged?
level=2 aligns with the request-related output that FastDeploy currently emits at the info level.
scheduler_logger = get_logger("scheduler", "scheduler.log")
api_server_logger = get_logger("api_server", "api_server.log")
console_logger = get_logger("console", "console.log", print_to_console=True)
llm_logger = get_logger("fastdeploy", channel="main")
Logging is currently implemented in several places; this upgrade could unify them all, instead of keeping one copy under logger and a similar one under utils.
For error-level logs, please review whether each one is request-scoped; if so, add the request_id for traceability.
Changes and notes in response to the review comments:
1. log_request has 4 levels and filters logs against the configured level; log_request_error is not leveled and prints error logs. Both print to request.log, and log_request_error is additionally mapped to the console and error.log.
2. The 4 levels of log_request each have a corresponding enum value:
3. For request-scoped error logs, request_id is added for traceability.
4. logprobs are now printed as well.
5. All logging-related implementations are consolidated into fastdeploy/logger, with compatibility shims kept in utils.
6. The default of FD_LOG_REQUESTS_LEVEL is set to 2, which corresponds to the pre-refactor info-level logs of the related FastDeploy modules.
…ger implementation from utils to logger module
PaddlePaddle-bot left a comment
🤖 AI Code Review
2026-04-16
📋 Review summary
PR overview: implements log channel separation and a request log level system, splitting logs into the main/request/console channels and supporting four request-log verbosity levels (L0-L3).
Scope of change: fastdeploy/logger/ (new module), entrypoints/, engine/, scheduler/, docs/ (documentation updates)
Impact tags: [Feature] [APIServer] [Engine] [Scheduler] [Docs]
Issues
| Severity | File | Summary |
|---|---|---|
| 🟡 Suggestion | docs/usage/environment_variables.md:28 | The change of the FD_LOG_REQUESTS_LEVEL default is a breaking change and needs to be called out explicitly |
Overall assessment
The logging refactor is well designed overall, with a clear code structure and fairly thorough test coverage. It implements log channel separation and leveled control, and the core implementation is of good quality. However, the default-value change is a breaking change; a prominent migration note in the docs is recommended.
# Request logging detail level (0-3). Higher level means more verbose output.
"FD_LOG_REQUESTS_LEVEL": lambda: int(os.getenv("FD_LOG_REQUESTS_LEVEL", "0")),
"FD_LOG_REQUESTS_LEVEL": lambda: int(os.getenv("FD_LOG_REQUESTS_LEVEL", "2")),
🟡 Suggestion: the FD_LOG_REQUESTS_LEVEL default changed from 0 to 2, which is a breaking change.
Impact:
- Old default 0 (LIFECYCLE): only key request lifecycle events (creation/completion/abort) are logged
- New default 2 (CONTENT): request parameters, scheduling info, and response content are logged
Potential impact:
- Existing deployments that do not set the FD_LOG_REQUESTS_LEVEL environment variable will produce more log output
- Log files may grow significantly, which can affect disk I/O and storage cost
Suggestions:
- Add an "Important" or "Breaking Change" callout in the log.md docs alerting users to the change in logging behavior
- Consider adding a breaking-change note to the PR description
- Alternatively, allow a migration period and change the default in the next release
Motivation
Optimize FastDeploy's logging system; the work is expected to span 4 PRs.
Modifications
Usage or Command
Log channel separation
FastDeploy splits logs into three channels:
- fastdeploy.main.* → fastdeploy.log
- fastdeploy.request.* → request.log
- fastdeploy.console.* → console.log
Request log levels
The request log (request.log) supports 4 levels, controlled by the environment variable FD_LOG_REQUESTS_LEVEL. The default level is L2 (CONTENT), which records request parameters, scheduling info, and response content. For leaner logs, set FD_LOG_REQUESTS_LEVEL=0 to record only key lifecycle events.
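Illustrative shell usage of the environment variables above (only the variable names and values come from this PR; where the export is placed depends on your deployment):

```shell
# Record only L0 lifecycle events in request.log
export FD_LOG_REQUESTS_LEVEL=0

# Disable request logging entirely (per the config parsing shown in this PR)
export FD_LOG_REQUESTS=0
```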
Accuracy Tests
1
Checklist
- Add at least one tag in the PR title from: [FDConfig],[APIServer],[Engine],[Scheduler],[PD Disaggregation],[Executor],[Graph Optimization],[Speculative Decoding],[RL],[Models],[Quantization],[Loader],[OP],[KVCache],[DataProcessor],[BugFix],[Docs],[CI],[Optimization],[Feature],[Benchmark],[Others],[XPU],[HPU],[GCU],[DCU],[Iluvatar],[Metax]
- Run pre-commit before commit.
- If the PR targets the release branch, make sure it has been submitted to the develop branch first, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.