[BugFix] Add a daemon thread to the decode instance to monitor preallocated-block timeouts by CyanScholar · Pull Request #7965 · PaddlePaddle/FastDeploy

CyanScholar · 2026-06-02T03:24:56Z

…on timeouts.

Motivation

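In the PD-disaggregated architecture, after the P (prefill) instance receives a request, it asks the D (decode) instance to preallocate blocks. If the P instance crashes or hits a network failure after making that request, the blocks on the D instance leak permanently; as leaks accumulate, the blocks are eventually exhausted and requests fail with 429 errors.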

💡 If this PR is a Cherry Pick, the PR title needs to follow the format by adding the [Cherry-Pick] label at the very beginning and appending the original PR ID at the end. For example, [Cherry-Pick][CI] Add check trigger and logic(#5191)


Modifications

Code changes (2 files)

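- fastdeploy/envs.py: adds the environment variable FD_DECODE_PREALLOC_BLOCK_TIMEOUT (default 600 s; set to 0 to disable)
- fastdeploy/engine/common_engine.py: adds a daemon-thread method _prealloc_timeout_monitor() that periodically checks for requests whose blocks were preallocated but not completed within the timeout, and reclaims those blocks automatically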
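A minimal sketch of the mechanism described above, assuming a simplified bookkeeping object: `PreallocTracker`, `mark_preallocated`, `mark_prefilled`, and `reclaim_expired` are hypothetical names, not FastDeploy's API, and "reclaiming" here only records the request ID instead of freeing real blocks. A daemon thread wakes up periodically, selects preallocations older than the timeout, and reclaims them. Using `threading.Event.wait` instead of `time.sleep` (so the thread can be stopped without sleeping out a full interval) is a choice of this sketch.

```python
import threading
import time


class PreallocTracker:
    """Hypothetical stand-in for the decode-side bookkeeping of preallocations."""

    def __init__(self, timeout_s: float, check_interval_s: float):
        self.timeout_s = timeout_s
        self.check_interval_s = check_interval_s
        self.lock = threading.Lock()
        self.prealloc_time = {}  # req_id -> time.monotonic() at preallocation
        self.reclaimed = []      # req_ids whose blocks were reclaimed
        self._stop = threading.Event()

    def mark_preallocated(self, req_id: str) -> None:
        with self.lock:
            self.prealloc_time[req_id] = time.monotonic()

    def mark_prefilled(self, req_id: str) -> None:
        # P followed through, so the request is no longer a reclaim candidate.
        with self.lock:
            self.prealloc_time.pop(req_id, None)

    def reclaim_expired(self) -> None:
        now = time.monotonic()
        with self.lock:
            expired = [r for r, t in self.prealloc_time.items() if now - t >= self.timeout_s]
            for req_id in expired:
                # Selection and reclaim happen under one lock hold (no check-then-act gap).
                del self.prealloc_time[req_id]
                self.reclaimed.append(req_id)  # real code would free the blocks here

    def _monitor(self) -> None:
        # Event.wait() returns False on timeout, so this is an interruptible sleep loop.
        while not self._stop.wait(self.check_interval_s):
            try:
                self.reclaim_expired()
            except Exception as e:  # keep the daemon alive on unexpected errors
                print(f"Prealloc timeout monitor error: {e}")

    def start(self) -> threading.Thread:
        thread = threading.Thread(target=self._monitor, daemon=True)
        thread.start()
        return thread

    def stop(self) -> None:
        self._stop.set()


# Demo: P never follows through for "req-a"; "req-b" is prefilled normally.
tracker = PreallocTracker(timeout_s=0.2, check_interval_s=0.05)
monitor = tracker.start()
tracker.mark_preallocated("req-a")
tracker.mark_preallocated("req-b")
tracker.mark_prefilled("req-b")
time.sleep(0.5)
tracker.stop()
monitor.join()
```

After the demo, only `req-a` has been reclaimed; `req-b` was removed from tracking by `mark_prefilled` before it could time out.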

Usage or Command

Accuracy Tests

Checklist

- Add at least a tag in the PR title.
  - Tag list: [[FDConfig],[APIServer],[Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
  - You can add new tags based on the PR content, but the semantics must be clear.
- Format your code, run pre-commit before commit.
- [x] Add unit tests. Please write the reason in this PR if no unit tests.
- Provide accuracy results.
- If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.


Copilot

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Adds a configurable timeout-based cleanup mechanism to reclaim decode-side preallocated blocks when prefill never arrives, preventing long-lived resource leaks.

Changes:

- Introduces the FD_DECODE_PREALLOC_BLOCK_TIMEOUT environment setting to configure the reclaim timeout.
- Starts a background monitor thread that periodically scans for and reclaims timed-out preallocations.
- Emits warnings/errors and returns a 408-like RequestOutput when reclaiming occurs.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

File	Description
fastdeploy/envs.py	Adds a new env var to configure decode prealloc reclamation timeout.
fastdeploy/engine/common_engine.py	Adds a background monitor thread to reclaim timed-out preallocated blocks and notify request completion.

+        check_interval = max(10, min(timeout / 10, 60))
+        while self.running:
+            time.sleep(check_interval)
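The polling interval in this hunk clamps `timeout / 10` into the range [10, 60] seconds. A standalone check of that expression; the function names are illustrative only, and the latency bound is an approximation that ignores scan time and sleep drift:

```python
def check_interval(timeout: float) -> float:
    # Poll about ten times per timeout window, clamped to [10, 60] seconds.
    return max(10, min(timeout / 10, 60))


def worst_case_reclaim_delay(timeout: float) -> float:
    # A preallocation that expires just after a scan is caught by the next one,
    # so it is reclaimed at most about one interval after the timeout elapses.
    return timeout + check_interval(timeout)
```

With either the 600 s default in this hunk or the 1200 s default adopted later in the review, the monitor polls every 60 s, so a leaked preallocation is held for at most roughly `timeout + 60` seconds.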


+                for req_id in timed_out:
+                    with self.resource_manager.lock:
+                        if req_id not in self.resource_manager.requests:
+                            continue
+                        if any(r.request_id == req_id for r in self.resource_manager.running):
+                            continue
+                    self.llm_logger.warning(f"Reclaiming preallocated blocks for {req_id}: timeout {timeout}s")
+                    self.resource_manager.pre_recycle_resource(req_id)


+            except Exception as e:
+                self.llm_logger.error(f"Prealloc timeout monitor error: {e}")


+    # Timeout (seconds) for D to reclaim preallocated blocks if P never follows through. 0 to disable.
+    "FD_DECODE_PREALLOC_BLOCK_TIMEOUT": lambda: int(os.getenv("FD_DECODE_PREALLOC_BLOCK_TIMEOUT", "600")),
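Each entry in `envs.py` is a zero-argument lambda, so the variable is parsed each time the lambda is called rather than once at import. A minimal sketch of that pattern with the "0 to disable" convention; `get_env` and `monitor_enabled` are hypothetical helpers, and treating negative values as "disabled" is an assumption of this sketch:

```python
import os

# Hypothetical miniature of the lambda-based registry used in envs.py.
environment_variables = {
    # Timeout (seconds) for D to reclaim preallocated blocks if P never follows through. 0 to disable.
    "FD_DECODE_PREALLOC_BLOCK_TIMEOUT": lambda: int(os.getenv("FD_DECODE_PREALLOC_BLOCK_TIMEOUT", "600")),
}


def get_env(name: str):
    # Evaluate lazily so changes to os.environ are picked up on each lookup.
    return environment_variables[name]()


def monitor_enabled() -> bool:
    # In this sketch, a non-positive timeout disables the reclaim monitor.
    return get_env("FD_DECODE_PREALLOC_BLOCK_TIMEOUT") > 0
```

Because the value goes through `int()`, it must be a plain integer: a string such as `"600s"` raises `ValueError`.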


PaddlePaddle-bot · 2026-06-02T03:34:04Z

🤖 Paddle-CI-Agent | ci_status_monitor | 2026-06-02 17:51:40

CI report generated from the following code (updated every 30 minutes):
PR commit: 624a2ca | Merge base: b0e2e01 (branch: develop)

1 Required jobs: 8/10 passed

Total runs (reruns)	Total jobs	✅ Passed	❌ Failed	⏳ Running	⏸️ Waiting	Skipped
42(0)	42	36	5	0	1	0

Job	Error type	Confidence	Log
`Run FastDeploy Unit Tests and Coverage / run_tests_with_coverage`	PR issue: E2E service unresponsive; the decode engine may have crashed	Medium	Job
`Approval`	Approval required	N/A	Job

2 Failure details

🔴 Run FastDeploy Unit Tests and Coverage / run_tests_with_coverage: PR issue (confidence: medium)

Analyzer: ci_analyze_unittest_fastdeploy | Error type: PR issue | Confidence: medium

Failed cases:

Case	Error summary
`e2e/test_ernie_03b_pd_router_v1_ipc.py::test_non_chat_usage_non_stream`	`send_request()` returned None; the service did not respond within 60 s

Key log:

api_url = 'http://0.0.0.0:8248/v1/completions'

    def test_non_chat_usage_non_stream(api_url):
>       response = send_request(url=api_url, payload=payload).json()
E       AttributeError: 'NoneType' object has no attribute 'json'

tests/e2e/test_ernie_03b_pd_router_v1_ipc.py:413: AttributeError
Duration: 60.0613s (request timed out)

Root-cause summary: the PR adds attribute assignments in the decode engine's core loop, which may crash the service.

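This PR adds task.is_preallocated = True in _process_allocate_resource_requests() and req.is_preallocated = False in _process_prefilled_requests() in common_engine.py. If the task or req objects define __slots__ or are immutable (for example, a frozen Pydantic model), the direct assignment raises AttributeError and crashes the decode core processing loop. The E2E test service in IPC mode then cannot respond to requests, so send_request() times out and returns None.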

Suggested fixes:

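- Check the type definitions of task (in _process_allocate_resource_requests) and req (in _process_prefilled_requests) to confirm they allow dynamic attribute assignment (no __slots__ restriction).
- If they do not, explicitly declare an is_preallocated: bool = False field on the corresponding request class (for example RequestTask or SchedulerRequest).
- Alternatively, track the state externally: maintain a preallocated_set: set in resource_manager instead of assigning an attribute on the object.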
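The suspected failure mode and the external-set alternative can both be reproduced in isolation. In this sketch `SlottedTask` is a hypothetical request class that declares `__slots__` (whether FastDeploy's request classes actually do is not established here), and `PreallocRegistry` is a hypothetical version of the `preallocated_set` idea:

```python
import threading


class SlottedTask:
    # Hypothetical request type: __slots__ forbids attributes not listed here.
    __slots__ = ("request_id",)

    def __init__(self, request_id: str):
        self.request_id = request_id


def try_set_flag(task) -> bool:
    """Return True if a dynamic is_preallocated attribute could be attached."""
    try:
        task.is_preallocated = True
        return True
    except AttributeError:
        return False


class PreallocRegistry:
    """Tracks preallocation state outside the request objects."""

    def __init__(self):
        self._lock = threading.Lock()
        self._preallocated = set()

    def mark_preallocated(self, req_id: str) -> None:
        with self._lock:
            self._preallocated.add(req_id)

    def mark_prefilled(self, req_id: str) -> None:
        with self._lock:
            self._preallocated.discard(req_id)

    def is_preallocated(self, req_id: str) -> bool:
        with self._lock:
            return req_id in self._preallocated
```

Declaring the field on the request class also avoids the `AttributeError`; the registry variant keeps the request types untouched and puts the state behind a single lock.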

Related changes: fastdeploy/engine/common_engine.py: _process_allocate_resource_requests() L2099 (adds task.is_preallocated = True), _process_prefilled_requests() L2201-L2203 (adds req.is_preallocated = False)

🟡 Approval: manual approval required

This job requires manual approval; CI will continue only after it is approved. Please grant the approval manually.

juncaipeng · 2026-06-02T03:33:49Z

    "FD_HPU_MEASUREMENT_MODE": lambda: os.getenv("FD_HPU_MEASUREMENT_MODE", "0"),
    "FD_PREFILL_WAIT_DECODE_RESOURCE_SECONDS": lambda: int(os.getenv("FD_PREFILL_WAIT_DECODE_RESOURCE_SECONDS", "30")),
+    # Timeout (seconds) for D to reclaim preallocated blocks if P never follows through. 0 to disable.
+    "FD_DECODE_PREALLOC_BLOCK_TIMEOUT": lambda: int(os.getenv("FD_DECODE_PREALLOC_BLOCK_TIMEOUT", "600")),


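Let's make this timeout longer so that requests queuing on the P instance don't trigger spurious timeouts. It could be changed to 20 minutes (1200 s).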

juncaipeng · 2026-06-02T03:35:55Z

+            try:
+                now = time.time()
+                with self.resource_manager.lock:
+                    skip_ids = (


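This doesn't feel very robust. You could add an is_preallocated flag to the request: set it to True earlier in _process_allocate_resource_requests, and set it to False once _process_prefilled_requests has processed the request successfully. This check could then simply use the is_preallocated flag.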
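A sketch of the flag-based check suggested in this comment: the flag is set when blocks are preallocated, cleared when the prefilled request is processed, and the monitor only considers requests whose flag is still set. `Request`, `on_allocate`, `on_prefilled`, and `find_timed_out` are hypothetical simplifications of the engine's request objects and of `_process_allocate_resource_requests` / `_process_prefilled_requests`; declaring the flag as a dataclass field (rather than attaching it dynamically) is a choice of this sketch.

```python
from dataclasses import dataclass


@dataclass
class Request:
    request_id: str
    is_preallocated: bool = False  # declared field, not a dynamic attribute
    prealloc_time: float = 0.0


def on_allocate(req: Request, now: float) -> None:
    # Mirrors the step where D preallocates blocks on behalf of P.
    req.is_preallocated = True
    req.prealloc_time = now


def on_prefilled(req: Request) -> None:
    # Mirrors successful handling of the prefilled request sent by P.
    req.is_preallocated = False


def find_timed_out(requests, now: float, timeout: float):
    # Only requests still waiting on P are reclaim candidates.
    return [
        r.request_id
        for r in requests
        if r.is_preallocated and now - r.prealloc_time >= timeout
    ]
```

With the flag, the monitor no longer needs to infer "still preallocated" from other engine structures (such as the running list); it reads one field that is set and cleared at well-defined points.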

codecov-commenter · 2026-06-02T04:11:13Z

Codecov Report

❌ Patch coverage is 16.00000% with 21 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (develop@b0e2e01). Learn more about missing BASE report.

Files with missing lines	Patch %	Lines
fastdeploy/engine/common_engine.py	16.00%	19 Missing and 2 partials ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             develop    #7965   +/-   ##
==========================================
  Coverage           ?   67.86%           
==========================================
  Files              ?      467           
  Lines              ?    65216           
  Branches           ?    10013           
==========================================
  Hits               ?    44260           
  Misses             ?    18112           
  Partials           ?     2844
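The figures in this report are mutually consistent. Codecov computes coverage as hits / (hits + misses + partials), so a 16% patch with 21 uncovered lines (19 missing + 2 partials) implies 25 tracked lines, of which 4 are hit. The project figure of 67.86% is consistent with rounding down to two decimals. A quick check of both (the helper names are illustrative):

```python
import math


def coverage_pct(hits: int, misses: int, partials: int) -> float:
    # Codecov counts partially covered lines as not covered.
    return 100.0 * hits / (hits + misses + partials)


def round_down(value: float, places: int = 2) -> float:
    factor = 10 ** places
    return math.floor(value * factor) / factor


# Patch: 21 uncovered lines = 19 missing + 2 partials; 4 hits gives 16%.
patch = coverage_pct(hits=4, misses=19, partials=2)

# Project totals from the diff table; Hits + Misses + Partials = Lines.
project = coverage_pct(hits=44260, misses=18112, partials=2844)
```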

Flag	Coverage Δ
GPU	`78.15% <16.00%> (?)`
XPU	`7.08% <4.00%> (?)`

Flags with carried forward coverage won't be shown.



PaddlePaddle-bot

🤖 Paddle-CI-Agent | pr_review | 2026-06-02 13:39:08

📋 Review summary

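PR overview: adds a daemon thread to the decode instance that monitors preallocated-block timeouts, preventing permanent block leaks when a P instance crashes
Scope of changes: Engine (common_engine.py), environment variables (envs.py)
Affected-area tags: [Engine] [PD Disaggregation]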

Issues

Severity	File	Summary
🟡 Suggestion	`fastdeploy/envs.py:194`	The PR description states a default of 600 s, but the code uses 1200 s; description and code are inconsistent

Status of previous findings

Finding	Issue	Status
F1	Check-then-act TOCTOU race window	⚠️ Still present

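F1 details: _prealloc_timeout_monitor builds the timed_out list while holding the lock, then releases the lock and calls pre_recycle_resource on each entry. Between releasing the lock and reclaiming, _process_prefilled_requests may already have prefilled the request successfully (setting is_preallocated=False and calling add_prefilled_request), yet the monitor still calls pre_recycle_resource and frees its blocks, so an active decode request loses its blocks. Suggestion: inside the for req_id in timed_out loop, before calling pre_recycle_resource, re-acquire the lock and re-check that req.is_preallocated is still True.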
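A sketch of the F1 interleaving and one way to close it: candidates are collected under the lock, and each candidate is re-checked and reclaimed while holding the lock again, so a request prefilled in between is skipped. Performing the re-check and the reclaim in the same critical section (instead of re-checking, releasing, then reclaiming) is a choice of this sketch. `Engine`, `Req`, `requests`, and `reclaim_blocks` are hypothetical stand-ins for the engine, `resource_manager.requests`, and `pre_recycle_resource`. Whether `pre_recycle_resource` can safely run while `resource_manager.lock` is held depends on whether it takes that lock itself; if it does and the lock is not reentrant, an intermediate "reclaiming" state would be needed instead.

```python
import threading
from dataclasses import dataclass


@dataclass
class Req:
    request_id: str
    is_preallocated: bool
    prealloc_time: float


class Engine:
    def __init__(self):
        self.lock = threading.Lock()
        self.requests = {}  # req_id -> Req
        self.reclaimed = []

    def reclaim_blocks(self, req_id: str) -> None:
        # Stand-in for pre_recycle_resource; the caller must hold self.lock.
        self.requests.pop(req_id, None)
        self.reclaimed.append(req_id)

    def on_prefilled(self, req_id: str) -> None:
        with self.lock:
            req = self.requests.get(req_id)
            if req is not None:
                req.is_preallocated = False

    def scan(self, now: float, timeout: float):
        with self.lock:
            return [
                r.request_id
                for r in self.requests.values()
                if r.is_preallocated and now - r.prealloc_time >= timeout
            ]

    def reclaim(self, timed_out) -> None:
        for req_id in timed_out:
            with self.lock:
                req = self.requests.get(req_id)
                # Re-check: P may have followed through since scan() released the lock.
                if req is None or not req.is_preallocated:
                    continue
                self.reclaim_blocks(req_id)


# Simulate F1: the prefill for "b" lands between scan() and reclaim().
engine = Engine()
engine.requests["a"] = Req("a", True, 0.0)
engine.requests["b"] = Req("b", True, 0.0)
candidates = engine.scan(now=1300.0, timeout=1200.0)
engine.on_prefilled("b")
engine.reclaim(candidates)
```

Both requests are candidates at scan time, but only `a` is reclaimed; `b` keeps its blocks because the re-check sees that it has been prefilled.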

📝 PR conventions check

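The PR title format is compliant ([BugFix] is an official tag) and the description structure is mostly complete. However, the Usage or Command and Accuracy Tests sections contain only HTML comments (empty once rendered), and the Checklist ticks "Add unit tests" even though the diff contains no test code and no reason is given.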

Suggested PR description (can be copied directly)

## Motivation
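In the PD-disaggregated architecture, after the P instance receives a request, it asks the D instance to preallocate blocks. If the P instance crashes or hits a network failure after making that request, the blocks on the D instance leak permanently; as leaks accumulate, the blocks are eventually exhausted and requests fail with 429 errors.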

## Modifications
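1. **`fastdeploy/envs.py`**: adds the environment variable `FD_DECODE_PREALLOC_BLOCK_TIMEOUT` (default 1200 s; set to 0 to disable)
2. **`fastdeploy/engine/common_engine.py`**: adds a `_prealloc_timeout_monitor()` daemon-thread method that periodically checks for requests whose blocks were preallocated but not completed within the timeout, reclaims those blocks automatically, and returns a 408 error code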

## Usage or Command
Control the timeout with an environment variable:
```bash
export FD_DECODE_PREALLOC_BLOCK_TIMEOUT=1200  # default 1200 s; set to 0 to disable
```

## Accuracy Tests
N/A (this change does not affect model output)

## Checklist

- [x] Add at least a tag in the PR title.
  - Tag list: [`[FDConfig]`,`[APIServer]`,`[Engine]`, `[Scheduler]`, `[PD Disaggregation]`, `[Executor]`, `[Graph Optimization]`, `[Speculative Decoding]`, `[RL]`, `[Models]`, `[Quantization]`, `[Loader]`, `[OP]`, `[KVCache]`, `[DataProcessor]`, `[BugFix]`, `[Docs]`, `[CI]`, `[Optimization]`, `[Feature]`, `[Benchmark]`, `[Others]`, `[XPU]`, `[HPU]`, `[GCU]`, `[DCU]`, `[Iluvatar]`, `[Metax]`]
  - You can add new tags based on the PR content, but the semantics must be clear.
- [x] Format your code, run `pre-commit` before commit.
- [ ] Add unit tests. Please write the reason in this PR if no unit tests.
  - This change is a daemon-thread-level bugfix with no unit tests yet; integration tests should be added later.
- [x] Provide accuracy results.
- [ ] If the current PR is submitting to the `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.

Overall assessment

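The overall approach is sound: timeout-based reclamation in a daemon thread fixes the block leak in the PD-disaggregated architecture. The main concerns are the mismatch between the default value in the description and in the code, and the previously reported TOCTOU race, which is still unfixed and may cause the blocks of normally prefilled requests to be reclaimed by mistake.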

PaddlePaddle-bot · 2026-06-02T05:41:43Z

    "FD_HPU_MEASUREMENT_MODE": lambda: os.getenv("FD_HPU_MEASUREMENT_MODE", "0"),
    "FD_PREFILL_WAIT_DECODE_RESOURCE_SECONDS": lambda: int(os.getenv("FD_PREFILL_WAIT_DECODE_RESOURCE_SECONDS", "30")),
+    # Timeout (seconds) for D to reclaim preallocated blocks if P never follows through. 0 to disable.
+    "FD_DECODE_PREALLOC_BLOCK_TIMEOUT": lambda: int(os.getenv("FD_DECODE_PREALLOC_BLOCK_TIMEOUT", "1200")),


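🟡 Suggestion: the PR description states a default timeout of 600 s, but the actual default in the code is "1200" (1200 seconds).

A mismatch between description and implementation will mislead users configuring this setting. Please make them consistent: if 1200 s is intended, fix the PR description; if 600 s is intended, fix the default here.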

Add a daemon thread to the decode instance to monitor for preallocati…

f0024b6

…on timeouts.

Copilot AI review requested due to automatic review settings June 2, 2026 03:24

CyanScholar had a problem deploying to Metax_ci June 2, 2026 03:25 — with GitHub Actions Failure

Copilot AI reviewed Jun 2, 2026


This comment was marked as outdated.


juncaipeng reviewed Jun 2, 2026


Add a daemon thread to the decode instance to monitor for preallocati…

624a2ca

…on timeouts.

CyanScholar had a problem deploying to Metax_ci June 2, 2026 05:20 — with GitHub Actions Failure

PaddlePaddle-bot reviewed Jun 2, 2026


juncaipeng approved these changes Jun 2, 2026

CyanScholar wants to merge 2 commits into PaddlePaddle:develop from CyanScholar:develop

5 participants
