[Intel HPU] fix memory fragmentation issue and fix moe all_reduce issue by fmiao2372 · Pull Request #5357 · PaddlePaddle/FastDeploy

fmiao2372 · 2025-12-03T07:52:37Z

Motivation

Fix a memory fragmentation issue caused by the warmup process and a MoE all_reduce issue on Intel HPU.

💡 If this PR is a Cherry Pick, the PR title needs to follow the format by adding the [Cherry-Pick] label at the very beginning and appending the original PR ID at the end. For example, [Cherry-Pick][CI] Add check trigger and logic(#5191)

Modifications

Update the warmup sequence and the MoE all_reduce handling for HPU, following PR #5247.

Usage or Command

No change to command usage.

Accuracy Tests

ERNIE-4.5-21B-A3B-Paddle
100%|██████████| 1319/1319 [09:16<00:00, 2.37it/s]
Accuracy: 0.917
Invalid: 0.001
Latency: 556.834 s

Checklist

[Done] Add at least a tag in the PR title.
- Tag list: [[FDConfig], [APIServer], [Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
- You can add new tags based on the PR content, but the semantics must be clear.
[Done] Format your code, run pre-commit before commit.
[Done] Add unit tests. Please write the reason in this PR if no unit tests.
[Done] Provide accuracy results.
Conducted via local tests.
If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

Copilot

Pull request overview

This PR addresses memory fragmentation and MoE (Mixture of Experts) all_reduce issues for Intel HPU by modifying warmup sequence ordering and consolidating MoE all_reduce logic.

Key Changes:

- Reversed the warmup iteration order (batch sizes and sequence lengths) to improve memory allocation patterns.
- Moved the HPU-specific MoE all_reduce logic from backend-specific files into the common MoE layer for better maintainability.
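
The effect of reversing the warmup order can be illustrated with a toy model. The sketch below is not the FastDeploy code and does not model the real HPU allocator: `warmup_shapes` and `reserved_after_warmup` are hypothetical helpers, and the "caching allocator" rule (a freed block is reused only when it is at least as large as the request) is an assumption chosen to show why a largest-first order can leave less unusable memory behind.

```python
from itertools import product


def warmup_shapes(batch_sizes, seq_lens, largest_first=True):
    """Return (batch_size, seq_len) warmup shapes, largest first by default."""
    batches = sorted(batch_sizes, reverse=largest_first)
    lengths = sorted(seq_lens, reverse=largest_first)
    return list(product(batches, lengths))


def reserved_after_warmup(shapes):
    """Toy caching allocator: a freed block is reused only if it is big enough."""
    free_blocks = []  # sizes of cached blocks that are currently free
    reserved = 0      # total size requested from the (imaginary) device
    for batch_size, seq_len in shapes:
        need = batch_size * seq_len
        fitting = [block for block in free_blocks if block >= need]
        if fitting:
            block = min(fitting)       # best fit among the cached blocks
            free_blocks.remove(block)
        else:
            block = need               # nothing fits, so grow the pool
            reserved += need
        free_blocks.append(block)      # the warmup step ends, block is cached again
    return reserved


ascending = warmup_shapes([1, 2, 4], [128, 256], largest_first=False)
descending = warmup_shapes([1, 2, 4], [128, 256], largest_first=True)

print(reserved_after_warmup(ascending))   # 1920
print(reserved_after_warmup(descending))  # 1024
```

In this toy model the smallest-first order keeps growing the pool because no cached block ever fits the next request, while the largest-first order allocates once and reuses that block for every later shape.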

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| `fastdeploy/worker/hpu_model_runner.py` | Reversed iteration order for prefill batches, prefill lengths, decode batches, and decode block numbers during warmup to reduce memory fragmentation |
| `fastdeploy/model_executor/layers/moe/moe.py` | Added platform-specific all_reduce handling for Intel HPU by importing `tensor_model_parallel_all_reduce_custom` and conditionally using it in the forward method |
| `fastdeploy/model_executor/layers/backends/intel_hpu/moe/fused_moe_hpu_backend.py` | Removed duplicate all_reduce calls and unused import, moving the logic to the common MoE layer |
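
The consolidation described for `moe.py` can be sketched as a single reduction point in the common layer that picks the all_reduce implementation by platform. This is a single-process illustration under stated assumptions, not the actual FastDeploy implementation: `ToyMoELayer`, `make_reducer`, and `sum_across_ranks` are hypothetical names, the reducers are plain-Python stand-ins for the real collectives (including `tensor_model_parallel_all_reduce_custom`), and per-rank outputs are passed in as a list instead of living on separate devices.

```python
def sum_across_ranks(per_rank_outputs):
    """Elementwise sum over ranks: the value each rank holds after an all-reduce."""
    return [sum(values) for values in zip(*per_rank_outputs)]


def make_reducer(name, call_log):
    """Build a stand-in all_reduce that records its name each time it runs."""
    def reducer(per_rank_outputs):
        call_log.append(name)
        return sum_across_ranks(per_rank_outputs)
    return reducer


class ToyMoELayer:
    """Common MoE layer: the backend no longer reduces, this layer does it once."""

    def __init__(self, is_intel_hpu, tp_size, default_reduce, hpu_reduce):
        self.is_intel_hpu = is_intel_hpu
        self.tp_size = tp_size
        self.default_reduce = default_reduce
        self.hpu_reduce = hpu_reduce

    def forward(self, per_rank_expert_outputs):
        if self.tp_size <= 1:
            # No tensor parallelism, so there is nothing to reduce.
            return per_rank_expert_outputs[0]
        # Platform-specific choice, made in one place only.
        reduce_fn = self.hpu_reduce if self.is_intel_hpu else self.default_reduce
        return reduce_fn(per_rank_expert_outputs)


log = []
layer = ToyMoELayer(
    is_intel_hpu=True,
    tp_size=2,
    default_reduce=make_reducer("default", log),
    hpu_reduce=make_reducer("hpu_custom", log),
)
output = layer.forward([[1.0, 2.0], [3.0, 4.0]])
print(output, log)
```

Keeping the platform check in the common layer means each forward pass triggers exactly one reduction, which is the property the removed duplicate calls in the backend file would have violated.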

codecov-commenter · 2025-12-03T09:43:41Z

Codecov Report

❌ Patch coverage is 50.00000% with 2 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (develop@be0c960).

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| fastdeploy/model_executor/layers/moe/moe.py | 50.00% | 1 Missing and 1 partial ⚠️ |

Additional details and impacted files

@@            Coverage Diff             @@
##             develop    #5357   +/-   ##
==========================================
  Coverage           ?   59.44%           
==========================================
  Files              ?      325           
  Lines              ?    40261           
  Branches           ?     6093           
==========================================
  Hits               ?    23934           
  Misses             ?    14428           
  Partials           ?     1899

| Flag | Coverage Δ |
| --- | --- |
| GPU | `59.44% <50.00%> (?)` |

zoooo0820

LGTM

Copilot AI review requested due to automatic review settings December 3, 2025 07:52

Copilot AI reviewed Dec 3, 2025

[Intel HPU] fix memory fragmentation issue due to warmup process and fix moe all_reduce issue

e707784

fmiao2372 force-pushed the develop_hpu_fixbug branch from 5bc4f12 to e707784 Compare December 4, 2025 01:34

zoooo0820 approved these changes Dec 4, 2025

YuanRisheng added the skip-ci: coverage label Dec 4, 2025

EmmonsCurse approved these changes Dec 4, 2025

EmmonsCurse merged commit 209006e into PaddlePaddle:develop Dec 4, 2025
13 of 17 checks passed

fmiao2372 deleted the develop_hpu_fixbug branch December 5, 2025 01:14

liyonghua0910 pushed a commit to liyonghua0910/FastDeploy that referenced this pull request Dec 5, 2025

[Intel HPU] fix memory fragmentation issue due to warmup process and fix moe all_reduce issue (PaddlePaddle#5357)

e245bee
