feat(ROCm): Add BF16 support for conv kernels on HIP/ROCm #47
Merged
M4jupitercannon merged 2 commits into ROCm:develop on Apr 14, 2026
Conversation
Register the bfloat16 data type for conv2d, conv3d, depthwise_conv2d and their grad/double_grad kernels on the HIP/ROCm platform.

Changes:
- Add a BFLOAT16 case to ToCudnnDataType in miopen_desc.h
- Register phi::bfloat16 for the conv2d, conv3d, and depthwise_conv2d kernels
- Register phi::bfloat16 for the conv2d_grad, conv3d_grad, conv2d_double_grad, conv3d_double_grad, and depthwise_conv2d_double_grad kernels
- Add test_hip_bf16_conv_kernel.py for BF16 conv validation

This enables PaddleOCR-VL and other BF16 models to run on AMD ROCm GPUs using the native backend.

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
Register bfloat16 for the layer_norm and layer_norm_grad kernels on HIP. This is required for the PaddleOCR-VL native backend, which uses BF16 precision.

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>


Description
This PR adds bfloat16 (BF16) data type support for convolution kernels on AMD ROCm/HIP GPUs.
Problem
The PaddleOCR-VL model uses BF16 precision, but the native HIP/ROCm backend fails because conv kernels are not registered for BF16. This blocks running PaddleOCR-VL with the native backend on AMD GPUs.
Changes
1. paddle/phi/backends/gpu/rocm/miopen_desc.h
   - Add a BFLOAT16 case to the ToCudnnDataType() mapping, returning miopenBFloat16
2. paddle/phi/kernels/gpudnn/conv_kernel.cu
   - Register phi::bfloat16 for the conv2d kernel
   - Register phi::bfloat16 for the conv3d kernel
   - Register phi::bfloat16 for the depthwise_conv2d kernel
3. paddle/phi/kernels/gpudnn/conv_grad_kernel.cu
   - Register phi::bfloat16 for the conv2d_grad kernel
   - Register phi::bfloat16 for the conv3d_grad kernel
   - Register phi::bfloat16 for the conv2d_double_grad kernel
   - Register phi::bfloat16 for the conv3d_double_grad kernel
   - Register phi::bfloat16 for the depthwise_conv2d_double_grad kernel
4. test/legacy_test/test_hip_bf16_conv_kernel.py (new)
Motivation
This is a port of the same fix from PaddlePaddle/Paddle#78587 to the ROCm fork, enabling PaddleOCR-VL and other BF16 models to run on AMD ROCm GPUs using the native backend.
Testing
- Added test_hip_bf16_conv_kernel.py with BF16 conv2d forward and grouped conv tests
- Tests are guarded by a core.is_compiled_with_rocm() check

cc: @PaddlePaddle/paddle-rocma