
Conversation

@Manfredss
Contributor

PR Category

User Experience

PR Types

New features

Description

This PR implements the addcmul operator for PaddlePaddle, which performs element-wise multiplication of two tensors, multiplies the result by a scalar value, and adds it to an input tensor.

Formula: output = input + value * tensor1 * tensor2

This operator provides users with a convenient operation for combined multiply-add computations.
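The formula above can be sketched as a small NumPy reference implementation. This is an illustrative sketch of the semantics only, not the actual PHI kernel code:

```python
import numpy as np

def addcmul_ref(input, tensor1, tensor2, value=1.0):
    """NumPy reference for addcmul: input + value * tensor1 * tensor2.

    Broadcasting follows NumPy semantics, matching the operator's behavior.
    """
    return input + value * np.multiply(tensor1, tensor2)

# Every element: 1 + 0.5 * 2 * 3 == 4
out = addcmul_ref(
    np.ones((2, 2)), np.full((2, 2), 2.0), np.full((2, 2), 3.0), value=0.5
)
```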

Implementation Details

Core Components

  1. C++ Kernels (paddle/phi/kernels/)

    • Forward kernels: addcmul_kernel.h and implementations for CPU/GPU
    • Backward kernels: addcmul_grad_kernel.h and implementations for CPU/GPU
    • Implementation files in impl/ directory with templated functions for different ranks (0-6D)
  2. Operator Configuration (paddle/phi/ops/yaml/)

    • Added addcmul operator definition in ops.yaml
    • Added addcmul_grad backward operator in backward.yaml
    • Configured with proper infer_meta and kernel functions
  3. Shape Inference (paddle/phi/infermeta/)

    • Implemented AddcmulInferMeta in ternary.cc/h
    • Handles broadcasting for three input tensors
    • Validates dimension compatibility
  4. PIR Support (paddle/fluid/pir/dialect/operator/interface/infer_symbolic_shape/)

    • Added AddcmulOpInferSymbolicShape for new IR system
    • Handles symbolic shape inference with broadcasting
  5. Python API (python/paddle/tensor/)

    • Added paddle.addcmul() function in math.py
    • Registered Tensor.addcmul() method in __init__.py
    • Supports both dynamic and static graph modes
  6. Testing (test/legacy_test/)

    • Comprehensive test suite with 52 test cases
    • Tests multiple data types: float16, float32, float64, bfloat16
    • Tests various tensor shapes and broadcasting scenarios
    • Tests gradient computation for all inputs
    • Tests zero-size tensors and error conditions
    • Tests both OpTest framework and high-level API
  7. Configuration (test/white_list/)

    • Added addcmul to FP64 gradient threshold whitelist

Features

  • Multi-device support: CPU and GPU (CUDA)
  • Multiple data types: float16, float32, float64, bfloat16
  • Broadcasting: Full NumPy-style broadcasting support
  • Gradient support: Automatic differentiation for all three inputs
  • Tensor dimensions: Supports 0D to 6D tensors
  • API compatibility: Similar interface to PyTorch's torch.addcmul
  • Zero-size tensors: Properly handles edge cases
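For reference, the gradients implied by output = input + value * tensor1 * tensor2 are grad_input = grad_out, grad_tensor1 = value * grad_out * tensor2, and grad_tensor2 = value * grad_out * tensor1 (with a sum-reduction over broadcast axes in the general case). A minimal NumPy sketch of the non-broadcast case, checked against a central finite difference (illustrative only, not the kernel implementation):

```python
import numpy as np

def addcmul_grads(grad_out, tensor1, tensor2, value):
    # Analytic gradients of out = input + value * t1 * t2 (no broadcasting here).
    g_input = grad_out
    g_t1 = grad_out * value * tensor2
    g_t2 = grad_out * value * tensor1
    return g_input, g_t1, g_t2

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 3))
t1 = rng.standard_normal((2, 3))
t2 = rng.standard_normal((2, 3))
go = np.ones((2, 3))

gi, g1, g2 = addcmul_grads(go, t1, t2, 0.5)

# Central finite-difference check on tensor1
eps = 1e-6
fd = ((x + 0.5 * (t1 + eps) * t2) - (x + 0.5 * (t1 - eps) * t2)) / (2 * eps)
assert np.allclose(g1, fd, atol=1e-5)
```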

Testing Results

All 52 tests pass successfully:

(paddle) D:\Xue\ML\Paddle\PaddleDebug>python test/legacy_test/test_addcmul.py
WARNING: Logging before InitGoogleLogging() is written to STDERR
W0113 14:12:09.673414 20628 gpu_resources.cc:116] Please NOTE: device: 0, GPU Compute Capability: 12.0, Driver API Version: 13.1, Runtime API Version: 12.9
....I0113 14:12:09.695261 20628 pir_interpreter.cc:1529] New Executor is Running ...
I0113 14:12:09.695261 20628 pir_interpreter.cc:1552] pir interpreter is running by multi-thread mode ...
..I0113 14:12:09.702877 20628 program_interpreter.cc:255] New Executor is Running.
I0113 14:12:09.704878 20628 interpreter_util.cc:624] Standalone Executor is Used.
W0113 14:12:09.744876 20628 eager_utils.cc:3584] Paddle static graph(PIR) not support input out tensor for now!!!!!
C:\Users\***\anaconda3\envs\paddle\Lib\site-packages\paddle\pir\math_op_patch.py:241: UserWarning: Tensor do not have 'place' interface for pir graph mode, try not to use it. None will be returned.
  warnings.warn(
..............................................
----------------------------------------------------------------------
Ran 52 tests in 16.889s

OK

Test coverage includes:

  • Basic functionality with various shapes (1D, 2D, 3D, large tensors)
  • Different value parameters (positive, negative, default)
  • Multiple data types (FP16, FP32, FP64, BF16)
  • Broadcasting scenarios
  • Gradient checks for all inputs
  • Zero-size tensor edge cases
  • Error handling for invalid inputs
  • Both static and dynamic graph modes
  • Tensor method (tensor.addcmul())

API Examples

Dynamic Graph Mode

import paddle

input = paddle.ones([2, 2])
tensor1 = paddle.ones([2, 2]) * 2
tensor2 = paddle.ones([2, 2]) * 3

# Using function API
out = paddle.addcmul(input, tensor1, tensor2, value=0.5)
# Result: [[4., 4.], [4., 4.]]

# Using tensor method
out = input.addcmul(tensor1, tensor2, value=0.5)

Static Graph Mode

import paddle

paddle.enable_static()
input = paddle.static.data('input', shape=[2, 2], dtype='float32')
tensor1 = paddle.static.data('tensor1', shape=[2, 2], dtype='float32')
tensor2 = paddle.static.data('tensor2', shape=[2, 2], dtype='float32')
out = paddle.addcmul(input, tensor1, tensor2, value=0.5)

Broadcasting

input = paddle.ones([3, 4])
tensor1 = paddle.randn([1, 4])
tensor2 = paddle.randn([3, 1])
out = paddle.addcmul(input, tensor1, tensor2, value=2.0)
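For these shapes, tensor1 ([1, 4]) and tensor2 ([3, 1]) broadcast to [3, 4] when multiplied, which then matches input. A quick NumPy check of the same broadcasting semantics (illustrative only):

```python
import numpy as np

input = np.ones((3, 4))
tensor1 = np.random.randn(1, 4)
tensor2 = np.random.randn(3, 1)

# [1, 4] * [3, 1] broadcasts to [3, 4], matching input's shape
out = input + 2.0 * tensor1 * tensor2
assert out.shape == (3, 4)
```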

Backward Compatibility

This PR adds new functionality without modifying existing APIs or behaviors. It is fully backward compatible.

Checklist

  • Implemented forward and backward kernels
  • Added operator YAML configurations
  • Implemented shape inference (InferMeta)
  • Added PIR symbolic shape inference
  • Created Python API wrapper
  • Registered tensor method
  • Added comprehensive test suite
  • All tests passing (52/52)
  • Added to gradient threshold whitelist
  • Code follows PaddlePaddle style guidelines
  • All comments in English
  • No linter errors

Related Issues

【启航计划】PaddlePaddle API Compatibility Enhancement No.354

Additional Notes

  • The operator uses Eigen for efficient computation with automatic vectorization
  • Mixed precision computation is handled via MPTypeTrait for numerical stability
  • Broadcasting follows NumPy semantics
  • Gradient computation is mathematically verified and tested
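One subtlety of the broadcast-plus-gradient combination: the backward pass must sum-reduce each gradient over the axes that were broadcast, so the result matches the original input shape. A hedged NumPy sketch of that reduction (the helper name reduce_to_shape is made up for illustration and is not how the kernels implement it):

```python
import numpy as np

def reduce_to_shape(grad, shape):
    """Sum-reduce a broadcast gradient back to the original input shape."""
    # Sum over leading axes added by broadcasting.
    while grad.ndim > len(shape):
        grad = grad.sum(axis=0)
    # Sum over axes that were size 1 in the original shape.
    for axis, size in enumerate(shape):
        if size == 1:
            grad = grad.sum(axis=axis, keepdims=True)
    return grad

# grad w.r.t. a tensor1 of shape [1, 4], with grad_out of shape [3, 4]
grad_out = np.ones((3, 4))
g_t1 = reduce_to_shape(grad_out * 2.0 * np.ones((3, 1)), (1, 4))
assert g_t1.shape == (1, 4)
```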

Files Changed

New Files (9):

paddle/phi/kernels/addcmul_kernel.h
paddle/phi/kernels/addcmul_grad_kernel.h
paddle/phi/kernels/impl/addcmul_kernel_impl.h
paddle/phi/kernels/impl/addcmul_grad_kernel_impl.h
paddle/phi/kernels/cpu/addcmul_kernel.cc
paddle/phi/kernels/cpu/addcmul_grad_kernel.cc
paddle/phi/kernels/gpu/addcmul_kernel.cu
paddle/phi/kernels/gpu/addcmul_grad_kernel.cu
test/legacy_test/test_addcmul.py

Modified Files (9):

paddle/phi/ops/yaml/ops.yaml
paddle/phi/ops/yaml/backward.yaml
paddle/phi/infermeta/ternary.h
paddle/phi/infermeta/ternary.cc
paddle/fluid/pir/dialect/operator/interface/infer_symbolic_shape/multiary_infer_sym.h
paddle/fluid/pir/dialect/operator/interface/infer_symbolic_shape/multiary_infer_sym.cc
python/paddle/tensor/__init__.py
python/paddle/tensor/math.py
test/white_list/op_threshold_white_list.py

…rnels for CPU/GPU (fp16, fp32, fp64, bf16)
- Add operator configuration in ops.yaml and backward.yaml
- Implement AddcmulInferMeta for shape inference
- Add PIR symbolic shape inference support
- Add Python API: paddle.addcmul() and Tensor.addcmul()
- Add comprehensive test suite (52 tests, all passing)
- Add to FP64 gradient threshold whitelist
- Formula: output = input + value * tensor1 * tensor2
- Supports broadcasting and multiple dtypes.
@paddle-bot

paddle-bot bot commented Jan 13, 2026

Your PR has been submitted. Thanks for your contribution!
Please wait for the result of CI firstly. See Paddle CI Manual for details.

Contributor

@zhwesky2010 zhwesky2010 left a comment


Check the test coverage and make sure every path is exercised.

Also run the PaConvert tests ahead of time to confirm the results match torch's, and attach a screenshot of the PaConvert case results.

return _C_ops.addmm_(input, x, y, beta, alpha)


def addcmul(
Contributor

For a newly added API, implement it directly at the C++ level; this part can be omitted.

@Manfredss
Contributor Author

/re-run all-failed

Contributor

@zhwesky2010 zhwesky2010 left a comment


Let's see how to reduce the size of this PR.


add_doc_and_signature(
"i1",
"addcmul",
Contributor

Don't delete unrelated code; after making your changes, check all of the diffs yourself to confirm they are as expected.

return _C_ops.addmm_(input, x, y, beta, alpha)


# def addcmul(
Contributor

Trim this PR's line count; delete these.

@codecov-commenter

codecov-commenter commented Jan 21, 2026

Codecov Report

❌ Patch coverage is 35.27132% with 167 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (develop@d5a87b6). Learn more about missing BASE report.

Files with missing lines Patch % Lines
paddle/phi/kernels/impl/addcmul_grad_kernel_impl.h 0.00% 129 Missing ⚠️
paddle/phi/kernels/impl/addcmul_kernel_impl.h 50.00% 30 Missing ⚠️
...terface/infer_symbolic_shape/multiary_infer_sym.cc 84.61% 6 Missing ⚠️
paddle/phi/kernels/funcs/common_shape.h 0.00% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             develop   #77333   +/-   ##
==========================================
  Coverage           ?   35.27%           
==========================================
  Files              ?        8           
  Lines              ?      258           
  Branches           ?        0           
==========================================
  Hits               ?       91           
  Misses             ?      167           
  Partials           ?        0           

☔ View full report in Codecov by Sentry.

@Manfredss
Contributor Author

/re-run all-failed
