
feat(ROCm): Add BF16 support for conv kernels on HIP/ROCm #47

Merged

M4jupitercannon merged 2 commits into ROCm:develop from fchange:hip-bf16-conv-support on Apr 14, 2026


Conversation


fchange commented on Apr 13, 2026

Description

This PR adds bfloat16 (BF16) data type support for convolution kernels on AMD ROCm/HIP GPUs.

Problem

The PaddleOCR-VL model uses BF16 precision, but the native HIP/ROCm backend fails because conv kernels are not registered for BF16. This blocks running PaddleOCR-VL with the native backend on AMD GPUs.

Changes

1. paddle/phi/backends/gpu/rocm/miopen_desc.h

  • Added a BFLOAT16 case to ToCudnnDataType(), mapping it to miopenBFloat16

2. paddle/phi/kernels/gpudnn/conv_kernel.cu

  • Registered phi::bfloat16 for conv2d kernel
  • Registered phi::bfloat16 for conv3d kernel
  • Registered phi::bfloat16 for depthwise_conv2d kernel

3. paddle/phi/kernels/gpudnn/conv_grad_kernel.cu

  • Registered phi::bfloat16 for conv2d_grad kernel
  • Registered phi::bfloat16 for conv3d_grad kernel
  • Registered phi::bfloat16 for conv2d_double_grad kernel
  • Registered phi::bfloat16 for conv3d_double_grad kernel
  • Registered phi::bfloat16 for depthwise_conv2d_double_grad kernel

4. test/legacy_test/test_hip_bf16_conv_kernel.py (new)

  • Added unit tests for BF16 conv2d forward and grouped conv on HIP
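As background on the data type these registrations enable: bfloat16 keeps float32's 8 exponent bits but only 7 mantissa bits, so it spans the same numeric range at roughly 2-3 significant decimal digits. A minimal pure-Python sketch of the conversion (round-to-nearest-even, the usual hardware behavior; illustration only, not Paddle code, and NaN handling is omitted):

```python
import struct

def f32_to_bf16_bits(x: float) -> int:
    # Reinterpret the float32 bit pattern as an unsigned 32-bit int.
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    # Round to nearest even, then keep only the upper 16 bits.
    # (NaN payloads are not handled in this sketch.)
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    return (bits + rounding_bias) >> 16

def bf16_to_f32(b: int) -> float:
    # A bf16 value is a float32 whose low 16 mantissa bits are zero.
    (x,) = struct.unpack("<f", struct.pack("<I", b << 16))
    return x

def round_to_bf16(x: float) -> float:
    # Round-trip a float through bf16 precision.
    return bf16_to_f32(f32_to_bf16_bits(x))

print(round_to_bf16(3.14159))  # → 3.140625: only ~3 significant digits survive
```

Because only 8 bits of effective mantissa survive, BF16 conv tests typically compare against an FP32 reference with a relaxed tolerance (on the order of 1e-2) rather than expecting bitwise equality.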

Motivation

This is a port of the same fix from PaddlePaddle/Paddle#78587 to the ROCm fork, enabling PaddleOCR-VL and other BF16 models to run on AMD ROCm GPUs using the native backend.

Testing

  • Added test_hip_bf16_conv_kernel.py with BF16 conv2d forward and grouped conv tests
  • Tests are gated behind core.is_compiled_with_rocm() check
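The gating pattern above can be sketched as follows. This is a minimal illustration, not the actual test file: the `paddle.base` import path, the class name, and the skip message are assumptions, and the real test builds a bf16 conv2d and compares it against an fp32 reference rather than using a placeholder body.

```python
import unittest

# Guard the import so the sketch also loads where Paddle is absent;
# the actual test assumes a Paddle build providing paddle.base.core.
try:
    from paddle.base import core
    ON_ROCM = core.is_compiled_with_rocm()
except ImportError:
    ON_ROCM = False

@unittest.skipIf(not ON_ROCM, "BF16 conv kernels are only registered on ROCm builds")
class TestHipBF16Conv2D(unittest.TestCase):
    def test_forward(self):
        # Placeholder body: the real test runs a bf16 conv2d forward
        # pass and compares against an fp32 reference with a relaxed
        # tolerance, as noted above.
        self.assertTrue(ON_ROCM)
```

Gating at the class level keeps the whole file importable on CUDA and CPU builds, where the BF16 conv registrations added by this PR do not exist.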

cc: @PaddlePaddle/paddle-rocma

fchange and others added 2 commits April 13, 2026 10:11
Register bfloat16 data type for conv2d, conv3d, depthwise_conv2d
and their grad/double_grad kernels on HIP/ROCm platform.

Changes:
- Add BFLOAT16 case to ToCudnnDataType in miopen_desc.h
- Register phi::bfloat16 for conv2d, conv3d, depthwise_conv2d kernels
- Register phi::bfloat16 for conv2d_grad, conv3d_grad, conv2d_double_grad,
  conv3d_double_grad, depthwise_conv2d_double_grad kernels
- Add test_hip_bf16_conv_kernel.py for BF16 conv validation

This enables PaddleOCR-VL and other BF16 models to run on AMD ROCm GPUs
using the native backend.

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
Register bfloat16 for layer_norm and layer_norm_grad kernels on HIP.
This is required for PaddleOCR-VL native backend which uses BF16 precision.

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
fchange (Author) commented on Apr 13, 2026, attaching two screenshots.

M4jupitercannon merged commit 29d1c6f into ROCm:develop on Apr 14, 2026
1 check passed
