[BugFix] Add a daemon thread to decode instances to monitor preallocated-block timeouts #7965

Open
CyanScholar wants to merge 2 commits into PaddlePaddle:develop from CyanScholar:develop

Conversation

@CyanScholar
Contributor

…on timeouts.

Motivation

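In the PD-disaggregated architecture, a P (prefill) instance that receives a request asks the D (decode) instance to preallocate blocks. If the P instance crashes or hits a network failure after making that request, the blocks on the D instance leak permanently. Over time the leaks accumulate and lead to 429 errors (block exhaustion).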

💡 If this PR is a Cherry Pick, the PR title needs to follow the format by adding the [Cherry-Pick] label at the very beginning and appending the original PR ID at the end. For example, [Cherry-Pick][CI] Add check trigger and logic(#5191)

Modifications

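Code changes (2 files)

  1. fastdeploy/envs.py: adds the environment variable FD_DECODE_PREALLOC_BLOCK_TIMEOUT (default 600s; set to 0 to disable)
  2. fastdeploy/engine/common_engine.py: adds the daemon-thread method _prealloc_timeout_monitor(), which periodically checks for requests that were preallocated but did not complete before the timeout and automatically reclaims their blocks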

Usage or Command

Accuracy Tests

Checklist

  • Add at least a tag in the PR title.
    • Tag list: [[FDConfig],[APIServer],[Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
    • You can add new tags based on the PR content, but the semantics must be clear.
  • Format your code, run pre-commit before commit.
  • [x ] Add unit tests. Please write the reason in this PR if no unit tests.
  • Provide accuracy results.
  • If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

Copilot AI review requested due to automatic review settings June 2, 2026 03:24
Contributor

Copilot AI left a comment

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Adds a configurable timeout-based cleanup mechanism to reclaim decode-side preallocated blocks when prefill never arrives, preventing long-lived resource leaks.

Changes:

  • Introduces FD_DECODE_PREALLOC_BLOCK_TIMEOUT environment setting for reclaim timeout configuration.
  • Starts a background monitor thread to periodically scan and reclaim timed-out preallocations.
  • Emits warnings/errors and returns a 408-like RequestOutput when reclaiming occurs.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

| File | Description |
|---|---|
| fastdeploy/envs.py | Adds a new env var to configure decode prealloc reclamation timeout. |
| fastdeploy/engine/common_engine.py | Adds a background monitor thread to reclaim timed-out preallocated blocks and notify request completion. |

Comment on lines +2224 to +2226
check_interval = max(10, min(timeout / 10, 60))
while self.running:
    time.sleep(check_interval)
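To make the clamping in the expression above concrete, here is a minimal sketch that evaluates it for a few timeouts. `check_interval_for` is a hypothetical helper name introduced only for illustration; the expression itself is copied from the diff.

```python
def check_interval_for(timeout: float) -> float:
    """Same expression as in the diff: timeout / 10, clamped to the range [10, 60] seconds."""
    return max(10, min(timeout / 10, 60))


# Print the polling interval the monitor would use for a few timeout settings.
for timeout in (30, 300, 600, 1200):
    print(f"timeout={timeout}s -> check every {check_interval_for(timeout)}s")
```

Under this expression, short timeouts are still polled no more often than every 10 seconds, and timeouts of 600s or more are polled every 60 seconds, so a reclaim can lag the configured timeout by up to one polling interval.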
Comment on lines +2241 to +2248
for req_id in timed_out:
    with self.resource_manager.lock:
        if req_id not in self.resource_manager.requests:
            continue
        if any(r.request_id == req_id for r in self.resource_manager.running):
            continue
    self.llm_logger.warning(f"Reclaiming preallocated blocks for {req_id}: timeout {timeout}s")
    self.resource_manager.pre_recycle_resource(req_id)
Comment on lines +2256 to +2257
except Exception as e:
    self.llm_logger.error(f"Prealloc timeout monitor error: {e}")
Comment thread fastdeploy/envs.py Outdated
Comment on lines +193 to +194
# Timeout (seconds) for D to reclaim preallocated blocks if P never follows through. 0 to disable.
"FD_DECODE_PREALLOC_BLOCK_TIMEOUT": lambda: int(os.getenv("FD_DECODE_PREALLOC_BLOCK_TIMEOUT", "600")),
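A minimal sketch of how a caller might interpret this setting, assuming the semantics stated in the comment (0 disables reclamation). `read_prealloc_timeout` and `monitor_enabled` are hypothetical helpers, not FastDeploy functions, and treating negative values as disabled is an assumption of this sketch.

```python
import os

ENV_NAME = "FD_DECODE_PREALLOC_BLOCK_TIMEOUT"


def read_prealloc_timeout(default: str = "600") -> int:
    """Read the timeout in seconds; the default mirrors the value in the snippet above."""
    return int(os.getenv(ENV_NAME, default))


def monitor_enabled(timeout: int) -> bool:
    """0 disables the monitor per the comment; negatives are also treated as disabled (assumption)."""
    return timeout > 0
```

With this shape, the engine would start the monitor thread only when `monitor_enabled(read_prealloc_timeout())` is true.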
@PaddlePaddle-bot

PaddlePaddle-bot commented Jun 2, 2026

🤖 Paddle-CI-Agent | ci_status_monitor | 2026-06-02 17:51:40

CI report generated from the following code (updated every 30 minutes):
PR commit: 624a2ca | Merge base: b0e2e01 (branch: develop)


1 Required jobs: 8/10 passed

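| Total runs (reruns) | Total jobs | ✅ Passed | ❌ Failed | ⏳ Running | ⏸️ Pending | Skipped |
|---|---|---|---|---|---|---|
| 42(0) | 42 | 36 | 5 | 0 | 1 | 0 |

| Job | Error type | Confidence | Log |
|---|---|---|---|
| Run FastDeploy Unit Tests and Coverage / run_tests_with_coverage | PR issue: E2E service unresponsive; the decode engine may have crashed | | Job |
| Approval | Approval required | | Job |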

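2 Failure details

🔴 Run FastDeploy Unit Tests and Coverage / run_tests_with_coverage: PR issue (confidence: medium)

Analyzer: ci_analyze_unittest_fastdeploy | Error type: PR issue | Confidence: medium

Failed cases:

| Case | Error summary |
|---|---|
| e2e/test_ernie_03b_pd_router_v1_ipc.py::test_non_chat_usage_non_stream | send_request() returned None; the service did not respond within 60s |

Key logs: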

api_url = 'http://0.0.0.0:8248/v1/completions'

    def test_non_chat_usage_non_stream(api_url):
>       response = send_request(url=api_url, payload=payload).json()
E       AttributeError: 'NoneType' object has no attribute 'json'

tests/e2e/test_ernie_03b_pd_router_v1_ipc.py:413: AttributeError
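Duration: 60.0613s (request timed out)
  • Root cause summary: the PR adds attribute assignments in the decode engine's core loop, which may crash the service

In common_engine.py, this PR adds task.is_preallocated = True in _process_allocate_resource_requests() and req.is_preallocated = False in _process_prefilled_requests(). If the task or req object defines __slots__ or is immutable (for example a frozen Pydantic model), the direct assignment raises AttributeError and crashes the decode core processing loop. The E2E test service in IPC mode then cannot respond to requests, and send_request() times out and returns None.

Suggested fixes:

  1. Check the type definitions of task (in _process_allocate_resource_requests) and req (in _process_prefilled_requests) to confirm that they support dynamic attribute assignment (no __slots__ restriction)
  2. If the objects do not support dynamic attributes, explicitly declare an is_preallocated: bool = False field in the corresponding request class (such as RequestTaskSchedulerRequest)
  3. Alternatively, use an external container: maintain preallocated_set: set in resource_manager instead of assigning an attribute on the object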
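The failure mode hypothesized above, and the two suggested workarounds, can be sketched as follows. Every class and variable here (`SlottedRequest`, `DeclaredRequest`, `preallocated_ids`) is a hypothetical stand-in; whether FastDeploy's request classes actually use `__slots__` is not confirmed by this thread.

```python
class SlottedRequest:
    """Stand-in for a request type that restricts attributes via __slots__."""
    __slots__ = ("request_id",)

    def __init__(self, request_id: str):
        self.request_id = request_id


class DeclaredRequest:
    """Same idea, but is_preallocated is declared up front (suggested fix 2)."""
    __slots__ = ("request_id", "is_preallocated")

    def __init__(self, request_id: str):
        self.request_id = request_id
        self.is_preallocated = False


def try_mark(req) -> bool:
    """Return True if the flag could be assigned, False if assignment raised AttributeError."""
    try:
        req.is_preallocated = True
        return True
    except AttributeError:
        return False


print(try_mark(SlottedRequest("a")))   # the undeclared attribute is rejected
print(try_mark(DeclaredRequest("b")))  # the declared field accepts the assignment

# Suggested fix 3: track preallocation outside the object, keyed by request id.
preallocated_ids = set()
preallocated_ids.add("a")
preallocated_ids.discard("a")  # cleared once prefill completes
```

Declaring the field keeps the state on the request itself, while the external set avoids touching the request class at the cost of keeping a second structure in sync.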

Related changes: fastdeploy/engine/common_engine.py: _process_allocate_resource_requests() L2099 (adds task.is_preallocated = True), _process_prefilled_requests() L2201-L2203 (adds req.is_preallocated = False)

🟡 Approval: manual approval required

This job requires manual approval; CI continues only after it is approved. Please complete the manual approval.

PaddlePaddle-bot

This comment was marked as outdated.

Comment thread fastdeploy/envs.py Outdated
"FD_HPU_MEASUREMENT_MODE": lambda: os.getenv("FD_HPU_MEASUREMENT_MODE", "0"),
"FD_PREFILL_WAIT_DECODE_RESOURCE_SECONDS": lambda: int(os.getenv("FD_PREFILL_WAIT_DECODE_RESOURCE_SECONDS", "30")),
# Timeout (seconds) for D to reclaim preallocated blocks if P never follows through. 0 to disable.
"FD_DECODE_PREALLOC_BLOCK_TIMEOUT": lambda: int(os.getenv("FD_DECODE_PREALLOC_BLOCK_TIMEOUT", "600")),
Collaborator

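Let's increase this timeout a bit more so that queuing on the P instance does not trigger a timeout. It could be changed to 20 minutes (1200s).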

Comment thread fastdeploy/engine/common_engine.py Outdated
try:
    now = time.time()
    with self.resource_manager.lock:
        skip_ids = (
Collaborator

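This does not feel very robust. You could add an is_preallocated flag to the request: set it to True in the earlier _process_allocate_resource_requests function, and set it to False in _process_prefilled_requests once the request has been processed successfully. The code here can then rely on the is_preallocated flag.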
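The flag lifecycle suggested in this comment could look roughly like the sketch below. `ToyRequest`, `prealloc_time`, and the three helper functions are hypothetical names for illustration; only `is_preallocated` and the two `_process_*` function names come from the thread.

```python
from dataclasses import dataclass


@dataclass
class ToyRequest:
    request_id: str
    is_preallocated: bool = False  # declared up front so assignment is always valid
    prealloc_time: float = 0.0     # hypothetical timestamp of the preallocation


def mark_preallocated(req: ToyRequest, now: float) -> None:
    """What _process_allocate_resource_requests would do under this suggestion."""
    req.is_preallocated = True
    req.prealloc_time = now


def mark_prefilled(req: ToyRequest) -> None:
    """What _process_prefilled_requests would do after successful processing."""
    req.is_preallocated = False


def find_timed_out(requests: dict, now: float, timeout: float) -> list:
    """Return ids of requests that are still preallocated and older than the timeout."""
    return [
        rid
        for rid, req in requests.items()
        if req.is_preallocated and now - req.prealloc_time > timeout
    ]
```

The monitor then needs only the flag and a timestamp to decide what is stale, rather than inferring state from membership in several scheduler collections.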

@codecov-commenter

codecov-commenter commented Jun 2, 2026

Codecov Report

❌ Patch coverage is 16.00000% with 21 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (develop@b0e2e01). Learn more about missing BASE report.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| fastdeploy/engine/common_engine.py | 16.00% | 19 Missing and 2 partials ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             develop    #7965   +/-   ##
==========================================
  Coverage           ?   67.86%           
==========================================
  Files              ?      467           
  Lines              ?    65216           
  Branches           ?    10013           
==========================================
  Hits               ?    44260           
  Misses             ?    18112           
  Partials           ?     2844           
| Flag | Coverage Δ |
|---|---|
| GPU | 78.15% <16.00%> (?) |
| XPU | 7.08% <4.00%> (?) |


@PaddlePaddle-bot PaddlePaddle-bot left a comment

🤖 Paddle-CI-Agent | pr_review | 2026-06-02 13:39:08

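📋 Review summary

PR overview: adds a daemon thread to Decode instances that monitors preallocated-block timeouts, preventing blocks from leaking permanently when a P instance crashes
Scope of changes: Engine (common_engine.py), environment variables (envs.py)
Impact tags: [Engine] [PD Disaggregation]

Issues

| Level | File | Summary |
|---|---|---|
| 🟡 Suggestion | fastdeploy/envs.py:194 | The PR description states a 600s default, but the code uses 1200s; the description and the code are inconsistent |

Status of earlier findings

| Finding | Issue | Status |
|---|---|---|
| F1 | check-then-act TOCTOU race window | ⚠️ Still present |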

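F1 additional notes: _prealloc_timeout_monitor scans under the lock to build the timed_out list, releases the lock, and then calls pre_recycle_resource for each entry. Between releasing the lock and reclaiming, _process_prefilled_requests may already have prefilled the request successfully (setting is_preallocated=False and calling add_prefilled_request), yet the monitor still runs pre_recycle_resource and frees its blocks, so the blocks of an active decode request are reclaimed by mistake. Suggestion: inside the for req_id in timed_out loop, before calling pre_recycle_resource, re-acquire the lock and check whether req.is_preallocated is still True.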
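The re-check suggested for F1 can be sketched as below. `ToyResourceManager`, `_recycle_locked`, and `reclaim_timed_out` are hypothetical names; in the real code pre_recycle_resource may acquire the lock itself, in which case a reentrant lock or a helper that assumes the lock is already held would be needed. That detail is an assumption of this sketch.

```python
import threading


class ToyResourceManager:
    """Toy stand-in for the resource manager: a lock plus a request table."""

    def __init__(self):
        self.lock = threading.Lock()
        self.requests = {}   # request_id -> {"is_preallocated": bool}
        self.reclaimed = []  # ids whose blocks were reclaimed, for inspection

    def _recycle_locked(self, req_id: str) -> None:
        """Reclaim blocks; the caller must already hold self.lock."""
        self.requests.pop(req_id, None)
        self.reclaimed.append(req_id)


def reclaim_timed_out(rm: ToyResourceManager, timed_out: list) -> None:
    """Re-validate each candidate under the lock before reclaiming it."""
    for req_id in timed_out:
        with rm.lock:
            req = rm.requests.get(req_id)
            # The request may have finished prefill (or vanished) since the scan.
            if req is None or not req["is_preallocated"]:
                continue
            # Check and act happen in the same critical section, closing the race window.
            rm._recycle_locked(req_id)
```

Because the check and the reclaim share one critical section, a request that completes prefill after the scan but before the reclaim is skipped instead of losing its blocks.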

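📝 PR convention check

The PR title format is compliant ([BugFix] is an official tag) and the description structure is mostly complete. However, the Usage or Command and Accuracy Tests sections contain only HTML comments (empty after rendering), and the Checklist has "Add unit tests" checked even though the diff contains no test code and gives no reason.

Suggested PR description (can be copied directly)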
## Motivation
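In the PD-disaggregated architecture, a P (prefill) instance that receives a request asks the D (decode) instance to preallocate blocks. If the P instance crashes or hits a network failure after making that request, the blocks on the D instance leak permanently. Over time the leaks accumulate and lead to 429 errors (block exhaustion).

## Modifications
1. **`fastdeploy/envs.py`**: adds the environment variable `FD_DECODE_PREALLOC_BLOCK_TIMEOUT` (default 1200s; set to 0 to disable)
2. **`fastdeploy/engine/common_engine.py`**: adds the daemon-thread method `_prealloc_timeout_monitor()`, which periodically checks for requests that were preallocated but did not complete before the timeout, automatically reclaims their blocks, and returns a 408 error code

## Usage or Command
Control the timeout through the environment variable:
```bash
export FD_DECODE_PREALLOC_BLOCK_TIMEOUT=1200  # default 1200s; set to 0 to disable
```

## Accuracy Tests
N/A (this change does not affect model output)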

## Checklist

- [x] Add at least a tag in the PR title.
  - Tag list: [`[FDConfig]`,`[APIServer]`,`[Engine]`, `[Scheduler]`, `[PD Disaggregation]`, `[Executor]`, `[Graph Optimization]`, `[Speculative Decoding]`, `[RL]`, `[Models]`, `[Quantization]`, `[Loader]`, `[OP]`, `[KVCache]`, `[DataProcessor]`, `[BugFix]`, `[Docs]`, `[CI]`, `[Optimization]`, `[Feature]`, `[Benchmark]`, `[Others]`, `[XPU]`, `[HPU]`, `[GCU]`, `[DCU]`, `[Iluvatar]`, `[Metax]`]
  - You can add new tags based on the PR content, but the semantics must be clear.
- [x] Format your code, run `pre-commit` before commit.
- [ ] Add unit tests. Please write the reason in this PR if no unit tests.
  - This change is a daemon-thread-level bugfix with no unit tests yet; adding integration tests later is recommended
- [x] Provide accuracy results.
- [ ] If the current PR is submitting to the `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.

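Overall assessment

The overall approach is sound: reclaiming on timeout from a daemon thread addresses block leaks in the PD-disaggregated architecture. The main points to watch are the mismatch between the description and the code's default value, and the earlier TOCTOU race that remains unfixed (it may cause the blocks of normally prefilled requests to be reclaimed by mistake).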

Comment thread fastdeploy/envs.py
"FD_HPU_MEASUREMENT_MODE": lambda: os.getenv("FD_HPU_MEASUREMENT_MODE", "0"),
"FD_PREFILL_WAIT_DECODE_RESOURCE_SECONDS": lambda: int(os.getenv("FD_PREFILL_WAIT_DECODE_RESOURCE_SECONDS", "30")),
# Timeout (seconds) for D to reclaim preallocated blocks if P never follows through. 0 to disable.
"FD_DECODE_PREALLOC_BLOCK_TIMEOUT": lambda: int(os.getenv("FD_DECODE_PREALLOC_BLOCK_TIMEOUT", "1200")),
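🟡 Suggestion: the PR description states a default timeout of 600s, but the actual default in the code is "1200" (1200 seconds).

A mismatch between the description and the implementation will mislead users when they configure this setting. Please make them consistent: if 1200s is intended, correct the PR description; if 600s is intended, correct the default here.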

5 participants