@xunyoyo xunyoyo commented Nov 15, 2025

Add unit tests for Triton fused MoE backends with stubs for GPU/operator functionality.

Motivation

No. 21: add unit tests for the functional module fastdeploy/model_executor/layers/moe/fused_moe_triton_backend.py

Modifications

add tests/model_executor/test_fused_moe_triton_backend.py

Usage or Command

tests/model_executor/test_fused_moe_triton_backend.py:

python -m coverage run -m unittest tests.model_executor.test_fused_moe_triton_backend \
&& python -m coverage report -m --include='fastdeploy/model_executor/layers/moe/fused_moe_triton_backend.py'

Accuracy Tests

tests/model_executor/test_fused_moe_triton_backend.py:

Name                                                               Stmts   Miss  Cover   Missing
------------------------------------------------------------------------------------------------
fastdeploy/model_executor/layers/moe/fused_moe_triton_backend.py     442     66    85%   38-39, 62, 207-208, 270, 309, 459, 503-514, 586-648, 654-658, 691, 1151, 1197-1208, 1239-1254, 1405, 1408-1411, 1415-1429
------------------------------------------------------------------------------------------------
TOTAL                                                                442     66    85%

Checklist

  • Add at least one tag to the PR title.
    • Tag list: [[FDConfig], [APIServer], [Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
    • You can add new tags based on the PR content, but the semantics must be clear.
  • Format your code and run pre-commit before committing.
  • Add unit tests. If the PR has no unit tests, explain why in this PR.
  • Provide accuracy results.
  • If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

Copilot AI review requested due to automatic review settings November 15, 2025 11:58
paddle-bot bot commented Nov 15, 2025

Thanks for your contribution!

@paddle-bot paddle-bot bot added the contributor External developers label Nov 15, 2025
Copilot finished reviewing on behalf of xunyoyo November 15, 2025 12:00

Copilot AI left a comment

Pull Request Overview

This PR adds comprehensive unit tests for the Triton fused MoE backend module (fused_moe_triton_backend.py), achieving 85% code coverage. The tests use lightweight stubs and mocks to simulate GPU operations and Triton kernels, enabling testing without actual CUDA hardware.

Key changes:

  • Implemented stub/mock framework for GPU operations and Triton kernels
  • Added tests for four quantization methods: weight-only (wint8), wfp8afp8, tensor-wise FP8, and block-wise FP8
  • Covered weight creation, loading, processing, and inference execution paths
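The stubbing approach summarized above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual test code: the module name `fake_gpu_ops` and the function `fake_topk` are hypothetical, and the idea is only that a fake module registered in `sys.modules` is what a later `import` resolves to.

```python
import sys
import types
from unittest import mock


def install_stub_module(name: str, **attrs) -> types.ModuleType:
    """Register a fake module under `name` so later imports resolve to it."""
    module = types.ModuleType(name)
    for key, value in attrs.items():
        setattr(module, key, value)
    sys.modules[name] = module
    return module


# Hypothetical module and function names, used only for illustration.
fake_ops = install_stub_module(
    "fake_gpu_ops",
    fake_topk=mock.MagicMock(return_value=("topk_ids", "topk_weights")),
)

import fake_gpu_ops  # resolves to the stub; no GPU extension is loaded

ids, weights = fake_gpu_ops.fake_topk("gate_out", 2)
```

In a real suite the registration would usually be wrapped in `unittest.mock.patch.dict(sys.modules, ...)` so the stubs are removed again after each test.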

Comment on lines +1 to +9
"""Unit tests for the Triton fused MoE backends.

These tests install lightweight GPU/operator stubs so the real
``fastdeploy.model_executor.layers.moe.fused_moe_triton_backend`` module can be
imported and exercised without CUDA kernels. The suites cover the weight-only,
wfp8afp8, tensor-wise fp8, and block-wise fp8 quantization helpers to ensure the
most important control-flow branches are validated while keeping the numerics
deterministic and CPU friendly.
"""

Copilot AI Nov 15, 2025

The test file is missing the standard Apache 2.0 copyright header that is consistently used across the project. Please add the copyright header at the beginning of the file (before the module docstring) following this format:

# Copyright (c) 2025 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

    def __init__(self):
        self.calls: list[dict] = []

    def __getitem__(self, grid):  # noqa: D401 - behavior mirrors kernel launch

Copilot AI Nov 15, 2025

The noqa comment references 'D401' which is typically for imperative mood in docstrings, but this method has no docstring. Either add a docstring describing what this method does (e.g., 'Mock kernel launch behavior by recording call parameters.') or remove the noqa comment as it serves no purpose without a docstring.

Suggested change
-     def __getitem__(self, grid):  # noqa: D401 - behavior mirrors kernel launch
+     def __getitem__(self, grid):  # noqa: D401 - behavior mirrors kernel launch
+         """Mock kernel launch behavior by recording call parameters."""
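For context, a kernel stub of this shape might look like the sketch below. Only `__init__`, the `calls` list, and the `__getitem__(self, grid)` signature appear in the reviewed snippet; the class name `RecordingKernel` and the body of `__getitem__` are assumptions made for illustration. The point is that Triton's `kernel[grid](*args, **kwargs)` launch syntax can be imitated by returning a callable from `__getitem__`.

```python
class RecordingKernel:
    """Stand-in for a Triton kernel that records launches instead of running them."""

    def __init__(self):
        self.calls: list[dict] = []

    def __getitem__(self, grid):
        """Mock kernel launch behavior by recording call parameters."""

        def launcher(*args, **kwargs):
            # Assumed behavior: store the grid and arguments, run nothing.
            self.calls.append({"grid": grid, "args": args, "kwargs": kwargs})

        return launcher


kernel = RecordingKernel()
kernel[(4, 1)]("a_ptr", "b_ptr", BLOCK_SIZE_M=16)
```

A test can then assert on `kernel.calls` to check which grid and meta-parameters the backend chose, without a GPU.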

    is_checkpoint_bf16: bool = False
    weight_block_size: tuple[int, int] = (2, 2)

    def name(self):  # noqa: D401 - mimic FastDeploy quant config API

Copilot AI Nov 15, 2025

The noqa comment references 'D401' but the method lacks a docstring. Either add a proper docstring (e.g., 'Return the quantization configuration name.') or remove the noqa comment.

Suggested change
-     def name(self):  # noqa: D401 - mimic FastDeploy quant config API
+     def name(self):  # noqa: D401 - mimic FastDeploy quant config API
+         """Return the quantization configuration name."""
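As a rough illustration of the kind of config stub under review: the two fields below come from the reviewed snippet, while the class name `StubQuantConfig`, the use of `@dataclass`, and the returned string are assumptions, since the PR's actual values are not shown here.

```python
from dataclasses import dataclass


@dataclass
class StubQuantConfig:
    """Hypothetical stand-in for a quantization config used only in tests."""

    is_checkpoint_bf16: bool = False
    weight_block_size: tuple[int, int] = (2, 2)

    def name(self):
        """Return the quantization configuration name."""
        return "stub_quant"  # hypothetical value, not taken from the PR


cfg = StubQuantConfig(weight_block_size=(4, 4))
```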

        super().__init__()
        self.num_experts = num_experts

    def forward(self, x):  # noqa: D401 - deterministic gating scores

Copilot AI Nov 15, 2025

The noqa comment references 'D401' but there is no docstring. Add a docstring describing the method's behavior (e.g., 'Generate deterministic gating scores for testing.') or remove the unnecessary noqa comment.

Suggested change
-     def forward(self, x):  # noqa: D401 - deterministic gating scores
+     def forward(self, x):  # noqa: D401 - deterministic gating scores
+         """Generate deterministic gating scores for testing."""
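To show why deterministic gating scores help in such tests, here is a framework-free sketch. The reviewed class calls `super().__init__()` and so presumably subclasses a framework layer; the plain-Python `DeterministicGate` and the helper `topk_indices` below are hypothetical stand-ins, and the scoring rule is an assumption.

```python
class DeterministicGate:
    """Plain-Python stand-in for a gating layer with fixed, predictable scores."""

    def __init__(self, num_experts: int):
        self.num_experts = num_experts

    def forward(self, x):
        """Generate deterministic gating scores for testing."""
        # One row per token; expert e always scores e + 1, so the ranking is fixed.
        return [[float(e + 1) for e in range(self.num_experts)] for _ in x]


def topk_indices(scores, k):
    """Return the indices of the k highest scores, best first."""
    return sorted(range(len(scores)), key=lambda e: scores[e], reverse=True)[:k]


gate = DeterministicGate(num_experts=4)
scores = gate.forward([[0.1, 0.2], [0.3, 0.4]])  # two tokens
selected = [topk_indices(row, 2) for row in scores]
```

Because the scores do not depend on the input, every token routes to the same experts, which makes expert-selection assertions stable across runs.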
