【Hackathon 10th Spring No.53】[Feature][KVCache] Support head-wise SWA cache recycle in ResourceManagerV1 [cf]#7717
Conversation
Thanks for your contribution!
CI report generated from the code below (refreshed every 30 minutes):
1 Task overview: all required tasks have passed (no failing required task detected); the PR can be considered for merging.
2 Task status summary
2.1 Required tasks: 0/0 passed
2.2 Optional tasks — 1/2 passed
3 Failure details (required only): no failing required tasks.
The PR1 head-wise allocator (PaddlePaddle#7717) emits flat global block IDs in [0, num_gpu_blocks * kv_num_heads) from a single shared min-heap, but the PR2 discrete kernel (PaddlePaddle#7718) ABI L1 expects per-head local IDs in {-1} ∪ [0, num_gpu_blocks). This causes cudaIllegalAddress on any request whose allocated IDs cross the num_gpu_blocks boundary (i.e. immediately on head index ≥ ceil(num_gpu_blocks / num_blocks)).

This commit normalizes IDs at the backend boundary in append_attn_backend.py using `local = flat % num_gpu_blocks` (sentinel -1 preserved), with a fail-fast assert to catch any residual OOB. The hotfix is bench-only; the canonical fix (per-head independent allocator pools) is deferred to PR1 v5 (RFC-PR1-reanchored.md §3). Also adds an FD_T53_HEAD_WISE_SWA_RATIO ∈ [0.0, 1.0] validator.

Refs:
.checkpoints/h10/task-53/design/PR2-HOTFIX-SPEC.md (Option B, OPUS-GATE PASS)
.checkpoints/h10/task-53/design/CONTRACT-ORACLE.md (I2, I7)
.checkpoints/h10/task-53/design/RFC-PR2-reanchored.md (ABI L1)

Files: 2 changed (1 backend hotfix, 1 envs validator)
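A minimal sketch of the boundary normalization described in that commit, assuming a NumPy block-table array; the function name is illustrative, not the actual `append_attn_backend.py` code:

```python
import numpy as np

def normalize_head_wise_block_ids(flat_ids: np.ndarray, num_gpu_blocks: int) -> np.ndarray:
    """Map flat global IDs in [0, num_gpu_blocks * kv_num_heads) onto the per-head
    local IDs in {-1} U [0, num_gpu_blocks) expected by the discrete kernel ABI.
    The -1 recycle sentinel is preserved unchanged."""
    local_ids = np.where(flat_ids < 0, -1, flat_ids % num_gpu_blocks)
    # Fail fast if any ID is still out of range for the kernel.
    in_range = (local_ids == -1) | ((local_ids >= 0) & (local_ids < num_gpu_blocks))
    assert in_range.all(), "head-wise block id out of range after normalization"
    return local_ids
```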
…not cache-ids) PaddlePaddle-bot flagged that _init_head_wise_free_list and the allocate/recycle paths exported the raw length of gpu_free_head_wise_block_list as free_gpu_block_num. That list holds num_gpu_blocks * kv_num_heads per-(block, head) cache ids, so the metric inflated by kv_num_heads (e.g. 8x for ERNIE-21B-A3B-Paddle). Divide by max(1, kv_num_heads) at all three sites so the exported counter stays in logical-block units, consistent with the legacy gpu_free_block_list semantics that downstream dashboards rely on.

Refs: review on PR PaddlePaddle#7717 (PaddlePaddle-bot)
Signed-off-by: bob-cloudforge <bob@cloudforge.solutions>
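A one-line illustration of the unit conversion (the helper name is hypothetical): the head-wise free list counts per-(block, head) slots, so dividing by the head count restores logical-block units.

```python
def exported_free_gpu_block_num(head_wise_free_slots: int, kv_num_heads: int) -> int:
    """head_wise_free_slots counts per-(block, head) entries; divide by the head
    count so the exported metric is in logical-block units (the exact rounding
    choice here is illustrative, not copied from the diff)."""
    return head_wise_free_slots // max(1, kv_num_heads)
```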
…sink-safe) PaddlePaddle-bot review on PR PaddlePaddle#7717 flagged the four 'if (block_id < 0) { block_id = 0; }' fallbacks in the c16 multiquery attention kernel as potentially unsafe — accessing block 0 when block_id == -1 looks like a silent OOB. Document the actual contract: block_id == -1 is the SWA recycle sentinel written by recycle_request_swa_head_cache (T53 PR1). The SWA mask built from chunk_start/chunk_end zeroes any contribution from this aged-out region in softmax, so the value loaded from block 0 is mathematically masked away.

SAFETY argument: when sink_size > 0, recycle_from_floor = sink_blocks guarantees the sink window is never recycled, so block_id == -1 cannot occur inside the attended sink region.

This is a comment-only change. No code semantics altered.

Refs: review on PR PaddlePaddle#7717 (PaddlePaddle-bot)
Signed-off-by: bob-cloudforge <bob@cloudforge.solutions>
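A tiny illustrative sketch (names are assumptions, not the PR's code) of the recycle-floor invariant behind that SAFETY argument: with sink_size > 0, recycling never starts below the sink blocks, so the -1 sentinel cannot land inside the attended sink region.

```python
def swa_recycle_floor(sink_size: int, block_size: int) -> int:
    """Lowest block index eligible for SWA recycling (the recycle_from_floor).
    Blocks [0, floor) hold the attention sink and are never recycled."""
    sink_blocks = (sink_size + block_size - 1) // block_size  # ceil division
    return sink_blocks
```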
Thanks for the review. Re:

@PaddlePaddle-bot — re: These four gates (
If you would prefer the CLI surface added in this PR instead of a follow-up, please confirm and I will append the — bob-cloudforge
PR1 backport of PR2 commit 327a43b. Avoids integer truncation underestimating available KV blocks when head_free % kv_num_heads != 0, which caused the scheduler to see 0 capacity on partial recycles and trigger false OOM rejections.

Signed-off-by: bob-cloudforge <bob@cloudforge.solutions>
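A sketch of the rounding issue that backport addresses; the helper name and exact expression are illustrative, inferred from the commit message rather than copied from the diff:

```python
def available_logical_blocks(head_free: int, kv_num_heads: int) -> int:
    """head_free counts per-(block, head) free slots. Plain truncation
    (head_free // kv_num_heads) reports 0 logical blocks whenever
    head_free < kv_num_heads, so a partial recycle looked like zero capacity
    and triggered false OOM rejections. Rounding up avoids the underestimate."""
    heads = max(1, kv_num_heads)
    return (head_free + heads - 1) // heads  # ceil division
```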
PaddlePaddle-bot left a comment
🤖 Paddle-CI-Agent | pr_review | 2026-05-06 18:24:58
📋 Review Summary
PR overview: adds head-wise SWA cache recycling to the V1 KV-cache scheduler (ResourceManagerV1 + PrefixCacheManager) so that models mixing SWA and full-attention heads (ERNIE-4.5-21B-A3B) can release cache promptly once a SWA head moves past its sliding window; benchmarks show a throughput gain of roughly +57% to +76%.
Scope of change: cache_manager/, engine/sched/, custom_ops/gpu_ops/append_attn/, config.py, model_executor/models/
Impact tags: [KVCache] [Scheduler] [OP] [FDConfig] [Models]
📝 PR Convention Check
Title issues: the current title 【Hackathon 10th Spring No.53】[Feature][KVCache] Support head-wise SWA cache recycle in ResourceManagerV1 [cf] breaks the convention in three places: ① the unofficial prefix 【Hackathon 10th Spring No.53】; ② two tags [Feature][KVCache] (the convention requires exactly one official tag); ③ the unofficial suffix [cf].
Description issues: the Checklist in the PR body uses custom items instead of the §D2 standard template items, and items that are already done (unit tests added, accuracy data provided) remain unchecked.
Suggested title (copy-paste ready):
[Feature] Support head-wise SWA cache recycle in ResourceManagerV1
Suggested PR description (copy-paste ready; Checklist ticked to match the actual state):
## Motivation
Hackathon 10th Spring Task No.53 — performance optimization of discrete KV Cache management and the AppendAttention operator (PR1 of 2). For models that mix SWA (Sliding-Window Attention) heads and full-attention heads inside the same layer (e.g. ERNIE-4.5-21B-A3B), the V1 KV-cache scheduling path (`ResourceManagerV1` + `PrefixCacheManager`) shares **one** `block_idx` **across all heads**, so SWA heads keep occupying cache after their window has ended and throughput suffers. This PR implements a head-wise SWA layout that recycles a SWA head's cache promptly once it moves past the sliding window, equivalent to what PR #6702 did for V0.
## Modifications
| Module | Changes |
|---|---|
| `fastdeploy/cache_manager/prefix_cache_manager.py` | New per-request head-wise GPU free list; `allocate_gpu_blocks_head_wise` / `recycle_gpu_blocks_head_wise`; TP-aware sizing (`num_key_value_heads // tp_size`) |
| `fastdeploy/engine/sched/resource_manager_v1.py` | New `recycle_request_swa_head_cache` (monotonic per-head cursor advance); `_should_skip_swa_recycle_for_overlap` (detects in-flight transfers); P4 cleanup in `_free_blocks` |
| `fastdeploy/model_executor/models/paddleformers/base.py` | Default-off ERNIE SWA fixture (window/sink/skip-freq/ratio), toggled by `FD_T53_HEAD_WISE_SWA_FIXTURE=1` |
| `fastdeploy/config.py` | Engine-main FDConfig fixture that mirrors the worker-side `head_wise_swa_ratio` injection so `ResourceManagerV1._should_use_head_wise_swa` reads the correct model_config |
| `custom_ops/gpu_ops/append_attn/` | New optional `block_table_hw` / `block_tables_headwise` parameters; SWA sentinel guard (`block_id=-1` falls back to 0) |
| Mutual-exclusion guard | `enable_prefix_caching=True + FD_HEAD_WISE_KV_CACHE=1` raises an exception at initialization |
| Environment variable | `FD_HEAD_WISE_KV_CACHE=0` by default; when disabled, behaviour is identical to mainline |
## Usage or Command
```bash
# Enable head-wise V1 cache + timely SWA recycle (all four variables must be set together)
export FD_T53_HEAD_WISE_SWA_FIXTURE=1   # engine-main FDConfig fixture
export ENABLE_V1_KVCACHE_SCHEDULER=1    # on by default, listed explicitly for clarity
export FD_HEAD_WISE_KV_CACHE=1          # enable per-head block tables
export FD_T53_HEAD_WISE_SWA_RATIO=1.0   # SWA recycle ratio (recycle is enabled when > 0)
python -m fastdeploy.entrypoints.openai.api_server \
--model baidu/ERNIE-4.5-21B-A3B-Paddle \
--max-model-len 32768
```
## Accuracy Tests
**Round 2 (128 prompts, A800-80GB):**
| Configuration | Output throughput (tok/s) | Δ |
|---|---|---|
| head-wise + recycle OFF | 706.29 | baseline |
| head-wise + recycle ON | 1107.98 | **+56.9%** ≥30% ✓ |
**Round 3 (1024 prompts, A800-80GB):**
| Configuration | Output throughput (tok/s) | Δ |
|---|---|---|
| head-wise + recycle OFF | 722.93 | baseline |
| head-wise + recycle ON | 1270.87 | **+75.8%** ≥30% ✓ |
Round 3 integrity: `completed=1024/1024`, `errors=0`, TTFT improved by -48.0% (2708 s → 1407 s).
Benchmark configuration: random fixed-IO dataset, input ≈ 10.6k tokens avg / output ≈ 4k tokens avg, request-rate=8, seed=42, A800-80GB (SM80).
## Checklist
- [x] Add at least a tag in the PR title.
- Tag list: [`[FDConfig]`,`[APIServer]`,`[Engine]`, `[Scheduler]`, `[PD Disaggregation]`, `[Executor]`, `[Graph Optimization]`, `[Speculative Decoding]`, `[RL]`, `[Models]`, `[Quantization]`, `[Loader]`, `[OP]`, `[KVCache]`, `[DataProcessor]`, `[BugFix]`, `[Docs]`, `[CI]`, `[Optimization]`, `[Feature]`, `[Benchmark]`, `[Others]`, `[XPU]`, `[HPU]`, `[GCU]`, `[DCU]`, `[Iluvatar]`, `[Metax]`]
- You can add new tags based on the PR content, but the semantics must be clear.
- [ ] Format your code, run `pre-commit` before commit.
- [x] Add unit tests. Please write the reason in this PR if no unit tests.
- [x] Provide accuracy results.
- [ ] If the current PR is submitting to the `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.

Issues
| Level | File | Summary |
|---|---|---|
| 🟡 Suggestion | fastdeploy/cache_manager/prefix_cache_manager.py:196 | available_gpu_resource returns 0.0 during the head-wise start-up window, which may trigger false OOM reports |
| ❓ Question | custom_ops/gpu_ops/append_attn/multiquery_attention_c16_impl.cuh:223 | The CUDA sentinel block_id=0 relies on the SWA mask fully covering recycled positions; the cross-request memory access needs additional verification |
| 🟡 Suggestion | tests/operators/ (missing) | A3 mandatory check: append_attention gains a new block_tables_headwise parameter, but there is no corresponding operator unit test under tests/operators/ |
Overall assessment
The overall design is clear: the default-off env gate, the P4 resource-leak fix, and the TP-aware head-count computation are all handled carefully, and the A800 benchmark results comfortably exceed the spec requirement (+56.9%~+75.8% ≥ 30%). The main concerns are the 0.0 return of available_gpu_resource during the head-wise start-up window and the mask dependence of the CUDA sentinel block_id=0; the missing operator tests under tests/operators/ need to be added before merge.
Reviewed code (fastdeploy/cache_manager/prefix_cache_manager.py):

```python
@property
def available_gpu_resource(self):
    if getattr(self, "head_wise", False) and self.num_gpu_blocks > 0:
```
🟡 Suggestion: available_gpu_resource risks returning 0.0 during a start-up window when head_wise=True.
__init__ initializes gpu_free_head_wise_block_list to an empty list [], while _init_head_wise_free_list() is only called from update_cache_config. If any component checks available_gpu_resource between the two, it gets (0/kv_num_heads)/num_gpu_blocks = 0.0, and the scheduler wrongly concludes resources are exhausted and rejects all requests.
Suggestion: return the legacy-path value at initialization until the head-wise free list has been populated:
```python
@property
def available_gpu_resource(self):
    if getattr(self, "head_wise", False) and self.num_gpu_blocks > 0:
        free_list = getattr(self, "gpu_free_head_wise_block_list", None)
        if not free_list:  # not yet initialized: fall back to the legacy path
            return len(self.gpu_free_block_list) / self.num_gpu_blocks
        head_free = len(free_list)
        return (head_free / max(1, self.kv_num_heads)) / self.num_gpu_blocks
    return len(self.gpu_free_block_list) / self.num_gpu_blocks if self.num_gpu_blocks > 0 else 0.0
```

Reviewed code (custom_ops/gpu_ops/append_attn/multiquery_attention_c16_impl.cuh):

```cpp
// window is never recycled, so block_id==-1 cannot occur inside the
// attended sink region and the fallback to block 0 is provably out of range.
int block_id = __ldg(&block_table_now[kv_idx_base / BLOCK_SIZE]);
if (block_id < 0) {
```
❓ Question: the fallback to block_id = 0 for the recycled sentinel relies on the SWA mask correctly covering every recycled position.
The SAFETY guarantee in the comment is that "the SWA mask is built from chunk_start/chunk_end and zeroes out the contribution of this aged-out region". However:
- If there is a timing window between the block-alignment check in recycle_request_swa_head_cache (skipped when total_tokens % block_size != 0) and chunk_start in the attention kernel, some block_id=-1 slots may sit in a region not yet covered by the mask.
- Block 0 is a legitimately allocated physical block, and the GPU really does read its KV data (even when the mask is 0). Under high concurrency this does not affect numerical correctness, but it is a cross-request memory access.
Suggested verification to add to the PR: a) describe precisely how the recycle trigger timing lines up with the mask boundary; or b) use a dedicated sentinel block (e.g. a pre-allocated all-zero copy at block_id=0) instead of reusing a live block's physical address.
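A small Python reference sketch (illustrative only, not the kernel) of why a value loaded through the block-0 fallback is numerically inert wherever the mask does cover it: positions outside [chunk_start, chunk_end) get a -inf logit before softmax, so their attention weight is exactly zero.

```python
import numpy as np

def masked_softmax_weights(logits: np.ndarray, chunk_start: int, chunk_end: int) -> np.ndarray:
    """Mask every position outside [chunk_start, chunk_end) before softmax.
    Whatever KV values were loaded for those positions (e.g. via the block-0
    fallback for a recycled slot) receive a weight of exactly 0.
    Assumes at least one position per row remains unmasked."""
    masked = logits.astype(float)
    idx = np.arange(masked.shape[-1])
    masked[..., (idx < chunk_start) | (idx >= chunk_end)] = -np.inf
    weights = np.exp(masked - masked.max(axis=-1, keepdims=True))
    return weights / weights.sum(axis=-1, keepdims=True)
```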
Motivation
Hackathon 10th Spring Task No.53 — performance optimization of discrete KV Cache management and the AppendAttention operator (PR1 of 2). Spec: https://github.com/PaddlePaddle/community/blob/master/hackathon/hackathon_10th/【Hackathon_10th】开源贡献个人挑战赛春节特别季—任务合集.md#no53.
For models that mix Sliding-Window Attention (SWA) heads with full-attention heads inside the same layer, today's V1 KV-cache scheduling path (`ResourceManagerV1` + `PrefixCacheManager`, gated by the default-on `ENABLE_V1_KVCACHE_SCHEDULER=1`) allocates one shared `block_idx` per layer for all heads. SWA heads finish their window long before full-attn heads, but their cache stays pinned until the whole layer evicts. Throughput suffers.

This PR teaches the V1 scheduler + `PrefixCacheManager` to manage `block_idx` per head (head-wise SWA layout) and recycle a SWA head's cache as soon as it crosses its window — the per-head equivalent of what PR #6702 did for V0.

Authorship: this PR is independently designed and implemented by the submitter for Hackathon 10th Spring No.53. The earlier community PR #6702 (V0, not merged) is referenced as prior art only; no code is lifted unattributed. Any future contributor work will be acknowledged via per-commit `Co-authored-by` trailers.

RFC: PaddlePaddle/community#1364.
Modifications
| Module | Changes |
|---|---|
| `fastdeploy/cache_manager/prefix_cache_manager.py` | Per-head GPU free list (`gpu_free_block_list_head_wise[head]`); `allocate_gpu_blocks_head_wise` / `recycle_gpu_blocks_head_wise`; TP-aware sizing (`num_key_value_heads // tp_size`) |
| `fastdeploy/engine/sched/resource_manager_v1.py` | `recycle_request_swa_head_cache` (per-head cursor advance ≥ window + sink); `_should_skip_swa_recycle_for_overlap` (per-request `cache_swap_metadata` / `cache_evict_metadata` inspection); P4 cleanup in `_free_blocks` |
| `fastdeploy/model_executor/models/paddleformers/base.py` | Default-off ERNIE SWA fixture, toggled by `FD_T53_HEAD_WISE_SWA_FIXTURE=1` |
| `fastdeploy/config.py` | Mirrors the `paddleformers/base.py` head-wise SWA attribute injection so `ResourceManagerV1._should_use_head_wise_swa` (engine-main) sees the same `model_config.head_wise_swa_ratio` as the worker; gated on `FD_T53_HEAD_WISE_SWA_FIXTURE` |
| Mutual-exclusion guard | `enable_prefix_caching=True + FD_HEAD_WISE_KV_CACHE=1` raises at `PrefixCacheManager.__init__` |
| Env gate | `FD_HEAD_WISE_KV_CACHE=0` default — bit-identical behaviour when disabled |
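A minimal, illustrative sketch (simplified assumptions, not the actual `PrefixCacheManager` code) of the per-head free-list bookkeeping the table above describes: one pool of per-(block, head) slots served from a shared min-heap, with capacity exported in logical-block units.

```python
import heapq

class HeadWiseFreeList:
    """Simplified model of the head-wise allocator: one free pool of
    (block, head) slots, with capacity reported in logical-block units."""

    def __init__(self, num_gpu_blocks: int, kv_num_heads: int):
        self.num_gpu_blocks = num_gpu_blocks
        self.kv_num_heads = kv_num_heads
        # One slot id per (block, head) pair, kept in a shared min-heap.
        self.free = list(range(num_gpu_blocks * kv_num_heads))
        heapq.heapify(self.free)

    def allocate_gpu_blocks_head_wise(self, num_slots: int) -> list:
        return [heapq.heappop(self.free) for _ in range(num_slots)]

    def recycle_gpu_blocks_head_wise(self, slots: list) -> None:
        for slot in slots:
            heapq.heappush(self.free, slot)

    @property
    def free_gpu_block_num(self) -> int:
        # Exported in logical-block units, matching the legacy counter semantics.
        return len(self.free) // max(1, self.kv_num_heads)
```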
Tests use real lightweight objects + `object.__new__` / AST or shape oracles (no `MagicMock`-only). PR2, not PR1, owns kernel-visible `block_tables_headwise` / FP8 scale-layout changes.

PR2 (separate) lands the AppendAttention rank-2 `block_tables_headwise` ABI + ForwardMeta wiring + `kv_num_heads` field as a frozen-shape parameter; PR1 keeps `share_inputs.block_tables` 2D and reaches the +30% recycle gate via cache-manager-side changes only.

Usage or Command
Accuracy Tests
Spec PR1 acceptance — throughput up ≥30% with timely SWA recycle vs without, same VRAM, fixed-IO dataset, V1 KV-cache scheduler on (`ENABLE_V1_KVCACHE_SCHEDULER=1`, default):

Round 2 (gate run — 128 prompts): recycle OFF 706.29 tok/s → recycle ON 1107.98 tok/s output throughput, +56.9% (≥30% ✓).

Round 3 (full run — 1024 prompts): recycle OFF 722.93 tok/s → recycle ON 1270.87 tok/s output throughput, +75.8% (≥30% ✓).

Round 3 integrity: `completed=1024/1024` in both arms, `errors=0`, mean TTFT improved -48.0% (2,708 s → 1,407 s).

Benchmark: `FastDeploy/benchmarks/benchmark_serving.py` — random fixed-IO dataset, input ≈ 10.6k tokens avg / output ≈ 4k tokens avg, request-rate=8, seed=42, `--ignore-eos`, server `--max-concurrency=8192`, YAML `eb45-21b-a3b-32k-bf16-kv50-512s.yaml` (kv_cache_max_ratio=0.50, max_seq_len=512). Fixed-IO integrity: both arms produce identical `total_input_tokens=1,356,656` / `total_output_tokens=518,946` for the 128-prompt gate run. Round 2 harness gate: `completed=128`, `nonempty_errors=0`. Round 3 target: `completed=1024`.

Correctness:
`tests/cache_manager/test_head_wise_*.py`, `tests/cache_manager/test_swa_recycle*.py`, and `tests/layers/test_append_attention_head_wise_shapes.py` — real `_FakeCacheManager` + `object.__new__(ResourceManagerV1)` + AST/shape oracles. No `MagicMock`-only tests.

CI run: https://github.com/PaddlePaddle/FastDeploy/pull/7717/checks
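A minimal sketch, assuming the module path from the Modifications table and a hypothetical fake, of the `object.__new__(ResourceManagerV1)` construction style these tests rely on (bypassing `__init__` so the recycle logic can be exercised against a lightweight cache-manager stand-in):

```python
# Hypothetical test sketch: illustrates the object.__new__ construction style,
# not the actual tests/cache_manager code.
from fastdeploy.engine.sched.resource_manager_v1 import ResourceManagerV1

class _FakeCacheManager:
    """Minimal stand-in that records recycled head-wise slots."""

    def __init__(self):
        self.recycled = []

    def recycle_gpu_blocks_head_wise(self, slots):
        self.recycled.extend(slots)

def make_resource_manager(fake_cache_manager):
    # Bypass __init__ (which needs a full engine config) and attach only the
    # attributes the head-wise recycle path is assumed to read.
    rm = object.__new__(ResourceManagerV1)
    rm.cache_manager = fake_cache_manager
    return rm
```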
Companion PR: #7718 (AppendAttention rank-2 head-wise block_idx kernel optimisation)
Checklist
`pre-commit run --all-files` clean