Update version#13
Merged
Merged
Conversation
9c5aeab to
9dc4281
Compare
- enable paddle torch proxy in conftest via paddle.enable_compat(scope={"flashinfer"})
- in tests/attention/test_attention_sink_blackwell.py: prepend paddle.enable_compat(),
replace torch.manual_seed with paddle.seed, replace torch.testing.assert_close with
numpy.testing.assert_allclose, parametrize to a minimal shape for quick verification
- flashinfer/utils.py: access TorchVersion via torch.torch_version proxy with fallback
for paddle compat where paddle.torch_version is not exposed
- flashinfer/cute_dsl/fp4_common.py: add "from __future__ import annotations" to
defer evaluation of "int | torch.device | str | None" annotation which fails under
paddle proxy (torch.device is a CallableProxyModule, not a type)
adapt prefill trtllm paged attention for paddle compat
- flashinfer/prefill.py: convert workspace_size (tensor scalar from numel()*element_size())
to Python int via .item() before passing to the tvm_ffi C++ kernel, which expects int
but receives ffi.Tensor under paddle (doc item PFCCLab#11)
- tests/conftest.py: revert paddle.enable_compat() to global scope so that `import torch`
at conftest module level (outside flashinfer scope) also resolves via the proxy
paddle compat: decode workspace_size .item(), moe fp8 index via int8 view, autotuner shape tuple, moe test
support allreduce fusion
dist.group.WORLD compat
modify readme
modify format
fix env issue
fix some issue
paddle compat: fix dtype.itemsize + expand trtllm_allreduce_fusion test
- flashinfer/comm/trtllm_ar.py: paddle.dtype has no `itemsize`; add
_DTYPE_SIZE_MAP + _dtype_itemsize() fallback used in _should_use_oneshot
(fixes AttributeError when use_oneshot=None triggers the heuristic).
- tests/comm/test_trtllm_allreduce_fusion.py: restore full parametrize
scope (patterns/layouts/pdls/oneshots/trigger/fp32_acc); drop leftover
[DBG] prints; guard `if __name__ == "__main__"` block so mp-spawn
children do not re-enter it under pytest (was double-initializing
paddle TCPStore and SIGABRT in libuv).
Verified: pytest tests/comm/test_trtllm_allreduce_fusion.py::test_trtllm_allreduce_fusion[True-1024-dtype0-2] and [False-1024-dtype0-2] both pass on 2xGPU.
add adaptation paddle skill
paddle compat: revert over-adaptation in test_trtllm_gen_fused_moe
`torch.cuda.get_device_capability`, `tensor.device`, and `tensor.to(device)`
are fully aligned under `paddle.enable_compat()`. Revert the earlier
paddle-specific detours (`torch.device.cuda.get_device_capability`,
`paddle.device(x.place)`, `paddle.get_device()`) back to plain torch APIs.
Also record the finding in adaptation-paddle skill (§10, items 31-34) as a
"do-not-over-adapt" reference for future MoE test reviews.
Verified: `pytest tests/moe/test_trtllm_gen_fused_moe.py -k test_moe_quantization_classes`
passes (1 passed).
paddle compat: restore test_trtllm_gen_fused_moe to upstream + minimal patches
The previous adaptation commented out / trimmed ~1800 lines from upstream,
making future rebases painful and dropping valid test coverage. Reset the
file to exact upstream content (github.com/flashinfer-ai/flashinfer main)
and keep only the minimum compat patches needed to run on paddle:
test file patches:
- add `import paddle; paddle.enable_compat()` at top
- `block.aminmax()` -> `block.float().aminmax()` (paddle missing bf16 kernel)
- fp8 slice assign via `.view(torch.int8)` on both sides (paddle missing fp8 set_value kernel)
- `expertLogits.cpu()` -> `.cpu().float()` (paddle missing cpu-bf16 topk)
- `torch.random.manual_seed` -> `torch.manual_seed` (paddle.random lacks manual_seed)
- `torch.device(device="cuda")` -> `torch.device("cuda")` (paddle Device rejects kwarg)
same `torch.device(...)` kwarg fix in tests/moe/utils.py.
library patch (flashinfer/autotuner.py):
- `torch.cuda.OutOfMemoryError` missing under paddle. Use a sentinel placeholder
class (NOT `RuntimeError` - that would silently swallow real kernel errors).
Verified: `pytest test_trtllm_gen_fused_moe.py::test_fp8_block_scale_routed_activation_type_relu2_smoke`
passes. Larger parametrized cases still need library-side fixes (e.g.
`core.py::_init_packed_topk_ids` bitwise_or dtype mismatch).
Docs (skills/adaptation-paddle): record new patches 31-36 and the
"do-not-trim-upstream" lesson.
paddle compat: fix bitwise_or dtype mismatch in _init_packed_topk_ids
torch implicitly promotes int16->int32 in `(expert_ids << 16) | expert_weights`.
Paddle's bitwise_or does not, so it raises
ValueError: The type of data we are trying to retrieve (int16) does not
match the type of data (int32)
Explicitly .to(torch.int32) after .view(torch.int16). Works on both backends.
With this fix, routing-family tests (renormalize/sigmoid/deepseekv3/topk/
llama4/dyn_block/tier_1024/deepseek_ngroup1/routing_dtype_flexibility) all
progress past the dtype check. Remaining failures on this machine are
infrastructure (cubin artifactory unreachable), not paddle-compat.
modify skill
fix some issues
paddle compat: test_fused_rmsnorm_silu zero-patch adaptation
tests/norm/test_fused_rmsnorm_silu.py runs under paddle.enable_compat()
with no source changes (conftest.py already enables compat). Full run:
102 passed, 50 skipped (all skips due to torch.float4_e2m1fn_x2 missing
from paddle torch-proxy, not a kernel adaptation issue).
- adp_test.md: add row 18 recording PASS 102/152
- adaptation_exp.md: add section XI (flashinfer-ai#37-39) documenting zero-patch
result, rationale, reproduction command, and the methodology
recommendation (bare-run first, consult adaptation table only on
failure).
fix format
fix some issue
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
📌 Description
升级版本到0.6.11
当前适配跑通:
tests/attention/test_attention_sink_blackwell.py -k test_blackwell_trtllm_gen_context_attention_sinktests/attention/test_attention_sink_blackwell.py -k test_blackwell_trtllm_gen_decode_attention_sinktests/moe/test_trtllm_gen_fused_moe.py::test_fp8_block_scale_routed_activation_type_relu2_smoketests/comm/test_trtllm_allreduce_fusion.py::test_trtllm_allreduce_fusion[True-1024-dtype0-2]test_trtllm_gen_fused_moe.py::test_renormalize_routing[...FP8_Block_DeepSeek-1024-1024-8-RandomHiddenStates]test_trtllm_gen_fused_moe.py::test_sigmoid_routing[...FP8_Block_DeepSeek-1024-1024-8]test_trtllm_gen_fused_moe.py::test_dyn_block_kernel_routing[...FP8_Block_DeepSeek...]test_trtllm_gen_fused_moe.py::test_tier_1024_experts_routing[...FP8_Block_DeepSeek...]test_trtllm_gen_fused_moe.py::test_deepseek_ngroup1_block_per_token_routing[...FP8_Block_DeepSeek...]test_trtllm_gen_fused_moe.py::test_routing_dtype_flexibility[...FP8_Block_DeepSeek...]test_trtllm_gen_fused_moe.py::test_mxfp8_block_scale_moe_relu2_non_gated[...Shuffled E32_K4]test_trtllm_gen_fused_moe.py::test_mxfp8_block_scale_moe_relu2_deepseekv3_topk22test_trtllm_gen_fused_moe.py::test_fp8_block_scale_autotune_valid_configs[...MxFp8_Relu2]test_trtllm_gen_fused_moe.py::test_fp8_per_tensor_autotune_valid_configs_nonefp8[...PerTensor_Swiglu]test_trtllm_gen_fused_moe.py::test_llama4_routing[...FP8_Tensor-1024-1024-8]test_trtllm_gen_fused_moe.py::test_deepseekv3_routingtest_trtllm_gen_fused_moe.py::test_nvfp4_moe_gemm_biastests/norm/test_fused_rmsnorm_silu.pytorch.float4_e2m1fn_x2未暴露,与 paddle 适配无关)🔍 Related Issues
🚀 Pull Request Checklist
Thank you for contributing to FlashInfer! Before we review your pull request, please make sure the following items are complete.
✅ Pre-commit Checks
pre-commitby runningpip install pre-commit(or used your preferred method).pre-commit install.pre-commit run --all-filesand fixed any reported issues.🧪 Tests
unittest, etc.).Reviewer Notes