
[Feature] implement log channel separation and request log level system #7190

Merged
Jiang-Jia-Jun merged 3 commits into PaddlePaddle:develop from xyxinyang:dev-log-v2
Apr 16, 2026

@xyxinyang xyxinyang commented Apr 3, 2026

Motivation

💡 If this PR is a Cherry Pick, the PR title needs to follow the format by adding the [Cherry-Pick] label at the very beginning and appending the original PR ID at the end. For example, [Cherry-Pick][CI] Add check trigger and logic(#5191)

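This PR is part of an effort to optimize FastDeploy's logging system, which is expected to take 4 PRs.

| PR | Content | Status |
|---|---|---|
| 1 | Add logging-related parameters; also output errors to the terminal | Merged |
| 2 | Log channel separation; request.log level separation and aggregation | This PR |
| 3 | Consolidate and simplify worker_process.log, cache_manager.log, comm.log, and Paddle logs | Not yet submitted |
| 4 | Standardize and consolidate trace.log | Not yet submitted |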

Modifications

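  • Fewer log files: a dozen-plus scattered files → a few core files (fastdeploy.log, request.log, error.log, comm.log)
  • Request log aggregation: logs for the full request path are written to request.log instead of being scattered across api_server.log, data_processor.log, scheduler.log, and other files. FD_LOG_REQUESTS controls whether this log file is produced.
  • Request log levels: a new FD_LOG_REQUESTS_LEVEL environment variable supports 4 request log levels (0-LIFECYCLE / 1-STAGES / 2-CONTENT / 3-FULL), so verbosity can be adjusted as needed.
  • Better error logs: request-related errors carry the request_id, which makes problems easier to trace.
  • Unified logger code: the logging implementation is consolidated in the fastdeploy/logger/ module, with backward compatibility preserved.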

Usage or Command

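Log channel separation

FastDeploy splits logs into three channels:

| Channel | Logger name | Output file | Description |
|---|---|---|---|
| main | fastdeploy.main.* | fastdeploy.log | Main log: system configuration, startup information, etc. |
| request | fastdeploy.request.* | request.log | Request log: request lifecycle and processing details |
| console | fastdeploy.console.* | console.log | Console log: written to the terminal and to console.log |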
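
The channel table above can be approximated with the standard library alone. The following is a minimal sketch, not the PR's `setup_logging` implementation: `build_channel_config` is a hypothetical helper, the format string is arbitrary, and `logging.FileHandler` with `delay=True` stands in for the `LazyFileHandler` mentioned in the review summary.

```python
import logging
import logging.config
import os
import tempfile


def build_channel_config(log_dir: str) -> dict:
    """Hypothetical helper: one file per channel; console also goes to the terminal."""

    def file_handler(filename: str) -> dict:
        return {
            "class": "logging.FileHandler",
            "filename": os.path.join(log_dir, filename),
            "formatter": "plain",
            "delay": True,  # open the file only when the first record is emitted
        }

    return {
        "version": 1,
        "disable_existing_loggers": False,
        "formatters": {"plain": {"format": "%(levelname)s %(name)s %(message)s"}},
        "handlers": {
            "main_file": file_handler("fastdeploy.log"),
            "request_file": file_handler("request.log"),
            "console_file": file_handler("console.log"),
            "terminal": {"class": "logging.StreamHandler", "formatter": "plain"},
        },
        "loggers": {
            # propagate=False keeps each channel's records out of the other files
            "fastdeploy.main": {"handlers": ["main_file"], "level": "INFO", "propagate": False},
            "fastdeploy.request": {"handlers": ["request_file"], "level": "INFO", "propagate": False},
            "fastdeploy.console": {
                "handlers": ["console_file", "terminal"],
                "level": "INFO",
                "propagate": False,
            },
        },
    }


log_dir = tempfile.mkdtemp()
logging.config.dictConfig(build_channel_config(log_dir))

# Child loggers such as fastdeploy.main.engine inherit the channel's handlers.
logging.getLogger("fastdeploy.main.engine").info("engine started")
logging.getLogger("fastdeploy.request.scheduler").info("request enqueued")
```

Because routing is decided by the logger name prefix, a module only needs to pick the right name to land in the right file.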

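Request log levels

The request log (request.log) supports 4 levels, controlled by the FD_LOG_REQUESTS_LEVEL environment variable:

| Level | Description | Example content |
|---|---|---|
| L0 | Key lifecycle events | Request creation/initialization, completion statistics (InputToken/OutputToken/elapsed time), first and last send of a streaming response, request abort |
| L1 | Processing stage details | Semaphore acquire/release, first-token time recording, signal handling (preemption/abortion/recovery), cache tasks, preprocessing time, parameter adjustment warnings |
| L2 | Request/response content and scheduling | Scheduling information (enqueue/pull/finish), request and response content (overlong content is truncated) |
| L3 | Full data (previously debug level) | Complete request and response data |

The default level is L2 (CONTENT), which records request parameters, scheduling information, and response content. For leaner logs, set FD_LOG_REQUESTS_LEVEL=0 to record only key lifecycle events.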
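
As an illustration of how such a level gate might work, here is a minimal sketch. It assumes a message is emitted when its level is less than or equal to the configured level and that FD_LOG_REQUESTS=0 disables request logging entirely. The names `should_log` and `truncate` and the 2048-character limit are hypothetical; the PR's own helpers (`_should_log`, `_truncate`) may behave differently.

```python
import os
from typing import Mapping


def should_log(level: int, env: Mapping[str, str] = os.environ) -> bool:
    """Return True if a message at `level` passes the configured request log level."""
    enabled = int(env.get("FD_LOG_REQUESTS", "1"))
    configured = int(env.get("FD_LOG_REQUESTS_LEVEL", "2"))  # default L2 (CONTENT)
    return bool(enabled) and level <= configured


def truncate(text: str, max_len: int = 2048) -> str:
    """Shorten overlong content; the limit here is an assumed value."""
    if len(text) <= max_len:
        return text
    return text[:max_len] + f"...[truncated {len(text) - max_len} chars]"
```

With no variables set, levels 0 to 2 pass and level 3 is dropped; with FD_LOG_REQUESTS_LEVEL=0 only lifecycle messages pass.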

Accuracy Tests

1

Checklist

  • Add at least a tag in the PR title.
    • Tag list: [[FDConfig],[APIServer],[Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
    • You can add new tags based on the PR content, but the semantics must be clear.
  • Format your code, run pre-commit before commit.
  • Add unit tests. Please write the reason in this PR if no unit tests.
  • Provide accuracy results.
  • If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

paddle-bot bot commented Apr 3, 2026

Thanks for your contribution!

codecov-commenter commented Apr 3, 2026

Codecov Report

❌ Patch coverage is 87.50000% with 47 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (develop@dec0b06). Learn more about missing BASE report.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| fastdeploy/entrypoints/engine_client.py | 70.96% | 9 Missing ⚠️ |
| fastdeploy/entrypoints/openai/serving_engine.py | 50.00% | 6 Missing ⚠️ |
| fastdeploy/entrypoints/api_server.py | 0.00% | 4 Missing ⚠️ |
| ...deploy/entrypoints/openai/v1/serving_completion.py | 55.55% | 4 Missing ⚠️ |
| fastdeploy/utils.py | 80.00% | 4 Missing ⚠️ |
| ...i/tool_parsers/ernie_45_vl_thinking_tool_parser.py | 40.00% | 3 Missing ⚠️ |
| ...astdeploy/entrypoints/openai/serving_completion.py | 91.30% | 2 Missing ⚠️ |
| fastdeploy/input/tokenizer_client.py | 50.00% | 2 Missing ⚠️ |
| fastdeploy/logger/__init__.py | 90.90% | 1 Missing and 1 partial ⚠️ |
| fastdeploy/output/token_processor.py | 90.00% | 2 Missing ⚠️ |

... and 8 more
Additional details and impacted files
@@            Coverage Diff             @@
##             develop    #7190   +/-   ##
==========================================
  Coverage           ?   73.75%           
==========================================
  Files              ?      397           
  Lines              ?    54828           
  Branches           ?     8587           
==========================================
  Hits               ?    40439           
  Misses             ?    11682           
  Partials           ?     2707           
Flag Coverage Δ
GPU 73.75% <87.50%> (?)

PaddlePaddle-bot

This comment was marked as outdated.

Copilot AI left a comment

Pull request overview

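This PR adds a unified mechanism to FastDeploy's existing logging system: log channels (main/request/console) and L0-L3 request log levels. It moves many request lifecycle, scheduling, and output logs to the request channel, and adds the corresponding unit tests and documentation.

Changes:

  • Add fastdeploy.logger.request_logger (L0-L3 request logs) and fastdeploy.logger.config (log level resolution and request log default parsing).
  • setup_logging supports logger configuration for the three channels (main/request/console), and log_request/log_request_error replace the scattered logger calls across many business modules.
  • Update and add logging-related unit tests, and update the Chinese and English logging documentation accordingly.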

Reviewed changes

Copilot reviewed 48 out of 48 changed files in this pull request and generated 4 comments.

| File | Description |
|---|---|
| tests/scheduler/test_local_scheduler.py | Move assertions from scheduler_logger to verifying log_request calls |
| tests/scheduler/test_dp_scheduler.py | Mock log_request to adapt the DP scheduler's request log assertions |
| tests/output/test_process_batch_output.py | Replace llm_logger assertions with log_request; covers the preemption scenario |
| tests/output/test_process_batch_output_use_zmq.py | Replace llm_logger assertions with log_request; covers the abort recycle scenario |
| tests/logger/test_setup_logging.py | Update default handler assertions and add channel logger/handler configuration tests |
| tests/logger/test_request_logger.py | Add tests for request_logger's _should_log/_truncate/log_request |
| tests/logger/test_logging_config.py | Add tests for resolve_log_level and request log default parsing |
| tests/logger/test_logger.py | Update tests for the unified logger naming rule (defaults to the main channel) |
| tests/input/test_video_utils.py | Point assertLogs at the main-channel logger name |
| tests/entrypoints/test_llm.py | Change exception log assertions to verify log_request_error calls |
| tests/entrypoints/test_engine_client.py | Change parameter warning assertions to verify log_request calls; fix the getenv mock signature |
| tests/entrypoints/test_abort.py | Change abort log assertions to verify log_request calls |
| tests/entrypoints/openai/v1/test_serving_completion_v1.py | Change exception log assertions to verify log_request_error calls |
| fastdeploy/utils.py | get_logger gains a channel parameter; global loggers default to channels such as main/console/comm |
| fastdeploy/scheduler/splitwise_scheduler.py | Change several error logs to log_request_error |
| fastdeploy/scheduler/local_scheduler.py | Move request enqueue/dequeue/completion logs to log_request |
| fastdeploy/scheduler/global_scheduler.py | Add log_request/log_request_error and adjust some request-related logs |
| fastdeploy/scheduler/dp_scheduler.py | Move request/result logs to log_request |
| fastdeploy/output/token_processor.py | Move many request lifecycle/exception logs to log_request/log_request_error |
| fastdeploy/logger/setup_logging.py | Default dictConfig becomes a three-channel configuration that uses LazyFileHandler |
| fastdeploy/logger/request_logger.py | New request logging module (L0-L3, truncation, error log interface) |
| fastdeploy/logger/logger.py | Support getting a logger by channel; the unified logger now defaults to the main channel |
| fastdeploy/logger/config.py | New log level resolution and request log default parsing |
| fastdeploy/input/tokenizer_client.py | Move polling/decoding exception logs to the request-channel log interface |
| fastdeploy/input/qwen3_vl_processor/qwen3_vl_processor.py | Move the "Processed request" log to the request channel |
| fastdeploy/input/qwen_vl_processor/qwen_vl_processor.py | Move the "Processed request" log to the request channel |
| fastdeploy/input/ernie4_5_vl_processor/ernie4_5_vl_processor.py | Move the "Processed request" log to the request channel |
| fastdeploy/input/base_processor.py | Move token_ids/request processing logs to the request channel and assign levels |
| fastdeploy/entrypoints/openai/v1/serving_completion.py | Move exception logs and key streaming-point logs to the request channel |
| fastdeploy/entrypoints/openai/v1/serving_chat.py | Move exception logs and key streaming-point logs to the request channel |
| fastdeploy/entrypoints/openai/tool_parsers/ernie_x1_tool_parser.py | Move error logs to log_request_error |
| fastdeploy/entrypoints/openai/tool_parsers/ernie_45_vl_thinking_tool_parser.py | Move error logs to log_request_error |
| fastdeploy/entrypoints/openai/serving_reward.py | Move key output logs to log_request |
| fastdeploy/entrypoints/openai/serving_models.py | Move error logs to log_request_error |
| fastdeploy/entrypoints/openai/serving_engine.py | Move semaphore/initialization/error logs to the request-channel interface |
| fastdeploy/entrypoints/openai/serving_embedding.py | Move key output logs to log_request |
| fastdeploy/entrypoints/openai/serving_completion.py | Move exception logs and key streaming-point logs to the request-channel interface |
| fastdeploy/entrypoints/openai/serving_chat.py | Move exception logs and key streaming-point logs to the request-channel interface |
| fastdeploy/entrypoints/openai/protocol.py | Move the metadata obsolete warning to log_request |
| fastdeploy/entrypoints/openai/api_server.py | Move request-received/exception logs to the request-channel interface |
| fastdeploy/entrypoints/llm.py | Move _receive_output exception logs to log_request_error |
| fastdeploy/entrypoints/engine_client.py | Move validation/receive/abort/exception logs to the request-channel interface |
| fastdeploy/entrypoints/api_server.py | Move request-received and exception logs to the request-channel interface |
| fastdeploy/engine/request.py | Move from_dict/mm_positions exception logs to the request-channel interface |
| fastdeploy/engine/engine.py | Move add_requests/generate logs to the request-channel interface |
| fastdeploy/engine/async_llm.py | Move abort/generate exception logs to log_request_error |
| docs/zh/usage/log.md | Document log channels, request log levels, and environment variables (Chinese) |
| docs/usage/log.md | Document log channels, request log levels, and environment variables (English) |

Comment thread fastdeploy/logger/request_logger.py Outdated
Comment thread fastdeploy/logger/setup_logging.py Outdated
Comment thread fastdeploy/scheduler/global_scheduler.py Outdated
Comment thread fastdeploy/logger/request_logger.py

Comment thread fastdeploy/engine/engine.py Outdated
from fastdeploy.engine.expert_service import start_data_parallel_service
from fastdeploy.engine.request import Request
from fastdeploy.inter_communicator import EngineWorkerQueue, IPCSignal
from fastdeploy.logger.request_logger import log_request, log_request_error
Collaborator

Is logging errors separately a common practice? Will it produce something like a request_error.log?

Collaborator Author

They are separate because log_request has 4 levels and filters messages against the currently configured level, while log_request_error is not leveled. Both write to request.log, and request errors are additionally routed to the terminal and to error.log.
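
The routing described in this reply can be sketched with in-memory handlers. This is an illustration under assumptions, not the code in fastdeploy/logger/request_logger.py: the keyword-style signatures mirror the call sites quoted in this review, `ListHandler` and `CONFIGURED_LEVEL` are hypothetical, and the three handlers stand in for request.log, error.log, and the terminal.

```python
import logging


class ListHandler(logging.Handler):
    """Hypothetical stand-in for a file or terminal sink; collects messages in memory."""

    def __init__(self, level: int = logging.NOTSET) -> None:
        super().__init__(level)
        self.lines = []

    def emit(self, record: logging.LogRecord) -> None:
        self.lines.append(record.getMessage())


request_log = ListHandler()               # receives everything on the request channel
error_log = ListHandler(logging.ERROR)    # receives errors only
terminal = ListHandler(logging.ERROR)     # receives errors only

logger = logging.getLogger("sketch.request")
logger.setLevel(logging.INFO)
logger.propagate = False
for handler in (request_log, error_log, terminal):
    logger.addHandler(handler)

CONFIGURED_LEVEL = 0  # assumed: only lifecycle events pass the filter


def log_request(level: int, message: str, **fields) -> None:
    """Leveled request log: dropped when `level` exceeds the configured level."""
    if level <= CONFIGURED_LEVEL:
        logger.info(message.format(**fields))


def log_request_error(message: str, **fields) -> None:
    """Unleveled error log: always emitted, and fans out to all three sinks."""
    logger.error(message.format(**fields))


log_request(level=0, message="request created: request_id={request_id}", request_id="r1")
log_request(level=1, message="semaphore acquired: request_id={request_id}", request_id="r1")
log_request_error(message="request failed: request_id={request_id}", request_id="r1")
```

The level-1 message is filtered out, while the error reaches every sink regardless of the configured level.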

Comment thread fastdeploy/engine/engine.py Outdated
llm_logger.info(f"Cache task with request_id ({request.get('request_id')})")
llm_logger.debug(f"cache task: {request}")
log_request(
level=1,
Collaborator

The level here ranges from 1 to n, and the docs say each value corresponds to a different degree of detail. Could it be an enum instead, so the meaning is clearer in the code?

Collaborator Author

OK. I plan to change it to 4 enum members whose names match the corresponding level of detail, so the meaning is clear.
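
One possible shape for such an enum is shown below. The member names and values follow the levels listed in the PR description (0-LIFECYCLE / 1-STAGES / 2-CONTENT / 3-FULL); the class name `RequestLogLevel` is an assumption for illustration and may not match the merged code.

```python
from enum import IntEnum


class RequestLogLevel(IntEnum):
    """Request log verbosity; a higher value means more detail."""

    LIFECYCLE = 0  # key lifecycle events
    STAGES = 1     # processing stage details
    CONTENT = 2    # request/response content and scheduling
    FULL = 3       # complete request and response data


# An IntEnum compares like an int, so a value parsed from the environment
# can be converted once and then compared against call-site levels.
configured = RequestLogLevel(int("2"))
emit_stage_logs = RequestLogLevel.STAGES <= configured
```

Using `IntEnum` rather than `Enum` keeps existing integer comparisons and `level=1` style call sites working while new code can write `level=RequestLogLevel.STAGES`.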


if not function_call_arr:
data_processor_logger.error("No valid tool calls found")
log_request_error(message="No valid tool calls found")
Collaborator

If a log like this is request-specific, I suggest including the request_id.


except Exception as e:
data_processor_logger.error(f"Error in extracting tool call from response: {str(e)}")
log_request_error(message="Error in extracting tool call from response: {error}", error=str(e))
Collaborator

Same as above; the same applies elsewhere.

api_server_logger.info(f"Chat Streaming response last send: {chunk.model_dump_json()}")
log_request(
level=0,
message="Chat Streaming response last send: request_id={request_id}, finish_reason={finish_reason}, completion_tokens={completion_tokens}",
Collaborator

If logprobs are present, log them as well; training relies on this value to check whether NaN has appeared.
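
To illustrate the kind of check the reviewer has in mind, the sketch below summarizes a list of per-token logprobs and counts NaN values so they are visible in a single log line. `summarize_logprobs` is a hypothetical helper; the PR may log logprobs in a different form.

```python
import math
from typing import Optional, Sequence


def summarize_logprobs(logprobs: Optional[Sequence[float]]) -> str:
    """Hypothetical helper: compact, log-friendly summary of per-token logprobs."""
    if not logprobs:
        return "logprobs=None"
    nan_count = sum(1 for value in logprobs if math.isnan(value))
    finite = [value for value in logprobs if math.isfinite(value)]
    minimum = min(finite) if finite else None
    return f"logprobs_count={len(logprobs)}, nan_count={nan_count}, min_logprob={minimum}"
```

A nonzero `nan_count` in the final streaming log line would flag the request without dumping the full logprob list at the lifecycle level.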

Comment thread fastdeploy/logger/config.py Outdated
"""
return {
"enabled": int(os.getenv("FD_LOG_REQUESTS", "1")),
"level": int(os.getenv("FD_LOG_REQUESTS_LEVEL", "0")),
Collaborator

The default level is 0; does that leave the current log output unchanged?

Collaborator Author

level=2 is what matches FastDeploy's current info-level request-related log output.

Comment thread fastdeploy/utils.py Outdated
scheduler_logger = get_logger("scheduler", "scheduler.log")
api_server_logger = get_logger("api_server", "api_server.log")
console_logger = get_logger("console", "console.log", print_to_console=True)
llm_logger = get_logger("fastdeploy", channel="main")
Collaborator

Logging is currently implemented in several places. This upgrade can unify all of it, so there is no need to keep one copy under logger and a similar one under utils.

@Jiang-Jia-Jun
Collaborator

For error logs, please review whether each one is request-level; if it is, add the request_id so it can be traced.

xyxinyang commented Apr 13, 2026

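Changes and notes in response to the review comments:

1. log_request has 4 levels and filters messages against the currently configured level. log_request_error is not leveled and logs errors. Both write to request.log, and log_request_error is additionally routed to the terminal and to error.log.

2. The 4 levels of log_request each have a corresponding enum member:

| Level | Enum name | Description | Example content |
|---|---|---|---|
| 0 | LIFECYCLE | Lifecycle start and end | Request creation/initialization, completion statistics (InputToken/OutputToken/elapsed time), first and last send of a streaming response, request abort |
| 1 | STAGES | Processing stages | Semaphore acquire/release, first-token time recording, signal handling (preemption/abortion/recovery), cache tasks, preprocessing time, parameter adjustment warnings |
| 2 | CONTENT | Content and scheduling | Request parameters, processed requests, scheduling information (enqueue/pull/finish), response content (overlong content is truncated) |
| 3 | FULL | Full data | Complete request and response data, raw received requests |

3. Request-level error logs now include the request_id for traceability.

4. logprob is now logged as well.

5. The logging implementation is consolidated in fastdeploy/logger, with a compatibility layer in utils.

6. The default value of FD_LOG_REQUESTS_LEVEL is now 2, which corresponds to the info-level logs of the relevant FastDeploy modules before this refactoring.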

…ger implementation from utils to logger module

@PaddlePaddle-bot PaddlePaddle-bot left a comment

🤖 AI Code Review | 2026-04-16

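📋 Review summary

PR overview: Implements log channel separation and a request log level system. Logs are split into three channels (main/request/console), with four request log verbosity levels (L0-L3).

Change scope: fastdeploy/logger/ (new module), entrypoints/, engine/, scheduler/, docs/ (documentation updates)

Impact tags: [Feature] [APIServer] [Engine] [Scheduler] [Docs]

Issues

| Severity | File | Summary |
|---|---|---|
| 🟡 Suggestion | docs/usage/environment_variables.md:28 | The FD_LOG_REQUESTS_LEVEL default change is a Breaking Change and needs to be stated explicitly |

Overall assessment

The logging refactoring is well designed overall, the code structure is clear, and test coverage is fairly thorough. Log channel separation and level control are implemented, and the core implementation is of good quality. However, the default value change is a Breaking Change, so a prominent migration note in the documentation is recommended.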


  # Request logging detail level (0-3). Higher level means more verbose output.
- "FD_LOG_REQUESTS_LEVEL": lambda: int(os.getenv("FD_LOG_REQUESTS_LEVEL", "0")),
+ "FD_LOG_REQUESTS_LEVEL": lambda: int(os.getenv("FD_LOG_REQUESTS_LEVEL", "2")),
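
🟡 Suggestion: The FD_LOG_REQUESTS_LEVEL default changes from 0 to 2, which is a Breaking Change.

Impact

  • Old default 0 (LIFECYCLE): records only key lifecycle events such as request creation, completion, and abort
  • New default 2 (CONTENT): records request parameters, scheduling information, and response content

Potential impact

  • Existing deployments that do not set the FD_LOG_REQUESTS_LEVEL environment variable will produce more log output
  • Log file size may grow significantly, which may affect disk I/O and storage cost

Recommendations

  1. Add an "Important" or "Breaking Change" notice to log.md to alert users to the change in logging behavior
  2. Consider adding a Breaking Change note to the PR description
  3. Alternatively, allow a migration period and change the default in the next release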
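
Deployments that prefer the earlier, quieter behaviour could pin the variable explicitly rather than rely on the default. The sketch below builds an environment for a launched process with the level pinned only when the operator has not already chosen one. `env_with_pinned_level` is a hypothetical helper, and the launch command itself is left out because it depends on the deployment.

```python
import os
from typing import Dict, Mapping, Optional


def env_with_pinned_level(level: int, base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Copy the environment and set FD_LOG_REQUESTS_LEVEL unless it is already set."""
    env = dict(os.environ if base is None else base)
    env.setdefault("FD_LOG_REQUESTS_LEVEL", str(level))
    return env


# Pin to 0 (LIFECYCLE), the default before this change according to the review above.
pinned_env = env_with_pinned_level(0, base={})
```

The resulting mapping can be passed as the `env` argument of `subprocess.run` or `subprocess.Popen` when starting the service.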

@Jiang-Jia-Jun Jiang-Jia-Jun merged commit 6e16438 into PaddlePaddle:develop Apr 16, 2026
36 of 38 checks passed

5 participants