
feat(ROCm): Add BF16 support for conv kernels on HIP/ROCm #47

Merged

M4jupitercannon merged 2 commits into ROCm:develop from fchange:hip-bf16-conv-support on Apr 14, 2026


Conversation


fchange commented on Apr 13, 2026

Description

This PR adds bfloat16 (BF16) data type support for convolution kernels on AMD ROCm/HIP GPUs.

Problem

The PaddleOCR-VL model uses BF16 precision, but the native HIP/ROCm backend fails because conv kernels are not registered for BF16. This blocks running PaddleOCR-VL with the native backend on AMD GPUs.

Changes

1. paddle/phi/backends/gpu/rocm/miopen_desc.h

  • Added a BFLOAT16 case to ToCudnnDataType(), mapping it to miopenBFloat16

2. paddle/phi/kernels/gpudnn/conv_kernel.cu

  • Registered phi::bfloat16 for conv2d kernel
  • Registered phi::bfloat16 for conv3d kernel
  • Registered phi::bfloat16 for depthwise_conv2d kernel

3. paddle/phi/kernels/gpudnn/conv_grad_kernel.cu

  • Registered phi::bfloat16 for conv2d_grad kernel
  • Registered phi::bfloat16 for conv3d_grad kernel
  • Registered phi::bfloat16 for conv2d_double_grad kernel
  • Registered phi::bfloat16 for conv3d_double_grad kernel
  • Registered phi::bfloat16 for depthwise_conv2d_double_grad kernel

4. test/legacy_test/test_hip_bf16_conv_kernel.py (new)

  • Added unit tests for BF16 conv2d forward and grouped conv on HIP
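As background on the data type these registrations enable: bfloat16 keeps float32's 8 exponent bits but only 7 mantissa bits, so it spans the same numeric range at roughly 2-3 significant decimal digits. A minimal pure-Python sketch of the conversion (round-to-nearest-even, the usual hardware behavior; illustration only, not Paddle code, and NaN handling is omitted):

```python
import struct

def f32_to_bf16_bits(x: float) -> int:
    # Reinterpret the float32 bit pattern as an unsigned 32-bit int.
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    # Round to nearest even, then keep only the upper 16 bits.
    # (NaN payloads are not handled in this sketch.)
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    return (bits + rounding_bias) >> 16

def bf16_to_f32(b: int) -> float:
    # A bf16 value is a float32 whose low 16 mantissa bits are zero.
    (x,) = struct.unpack("<f", struct.pack("<I", b << 16))
    return x

def round_to_bf16(x: float) -> float:
    # Round-trip a float through bf16 precision.
    return bf16_to_f32(f32_to_bf16_bits(x))

print(round_to_bf16(3.14159))  # → 3.140625: only ~3 significant digits survive
```

Because only 8 bits of effective mantissa survive, BF16 conv tests typically compare against an FP32 reference with a relaxed tolerance (on the order of 1e-2) rather than expecting bitwise equality.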

Motivation

This is a port of the same fix from PaddlePaddle/Paddle#78587 to the ROCm fork, enabling PaddleOCR-VL and other BF16 models to run on AMD ROCm GPUs using the native backend.

Testing

  • Added test_hip_bf16_conv_kernel.py with BF16 conv2d forward and grouped conv tests
  • Tests are gated behind core.is_compiled_with_rocm() check
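The gating pattern above can be sketched as follows. This is a minimal illustration, not the actual test file: the `paddle.base` import path, the class name, and the skip message are assumptions, and the real test builds a bf16 conv2d and compares it against an fp32 reference rather than using a placeholder body.

```python
import unittest

# Guard the import so the sketch also loads where Paddle is absent;
# the actual test assumes a Paddle build providing paddle.base.core.
try:
    from paddle.base import core
    ON_ROCM = core.is_compiled_with_rocm()
except ImportError:
    ON_ROCM = False

@unittest.skipIf(not ON_ROCM, "BF16 conv kernels are only registered on ROCm builds")
class TestHipBF16Conv2D(unittest.TestCase):
    def test_forward(self):
        # Placeholder body: the real test runs a bf16 conv2d forward
        # pass and compares against an fp32 reference with a relaxed
        # tolerance, as noted above.
        self.assertTrue(ON_ROCM)
```

Gating at the class level keeps the whole file importable on CUDA and CPU builds, where the BF16 conv registrations added by this PR do not exist.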

cc: @PaddlePaddle/paddle-rocma

fchange and others added 2 commits April 13, 2026 10:11
Register bfloat16 data type for conv2d, conv3d, depthwise_conv2d
and their grad/double_grad kernels on HIP/ROCm platform.

Changes:
- Add BFLOAT16 case to ToCudnnDataType in miopen_desc.h
- Register phi::bfloat16 for conv2d, conv3d, depthwise_conv2d kernels
- Register phi::bfloat16 for conv2d_grad, conv3d_grad, conv2d_double_grad,
  conv3d_double_grad, depthwise_conv2d_double_grad kernels
- Add test_hip_bf16_conv_kernel.py for BF16 conv validation

This enables PaddleOCR-VL and other BF16 models to run on AMD ROCm GPUs
using the native backend.

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
Register bfloat16 for layer_norm and layer_norm_grad kernels on HIP.
This is required for PaddleOCR-VL native backend which uses BF16 precision.

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
fchange (Author) commented on Apr 13, 2026, attaching two screenshots.

M4jupitercannon merged commit 29d1c6f into ROCm:develop on Apr 14, 2026
1 check passed
