Labels
needs-triage, type: bug
Issue: [RISC-V RVV] sqrt operator shows poor vectorization performance
Description
The sqrt (square root) operator performs poorly with the RISC‑V Vector (RVV) extension, achieving only 0.385× the performance of the scalar implementation. This is unexpected for a mathematical function that should see significant benefits from vectorization.
Steps to Reproduce
- Generate the sqrt operator with the following configuration:
    params = {
        "dtype": "float32",
        "batch": 14,
        "channels": 23,
        "input_height": 67,
        "input_width": 99
    }
- Export the operator to two targets:
  - RV target (scalar, without vector extension):
    llvm -mtriple=riscv64-linux-gnu -mcpu=generic-rv64 -mabi=lp64d -mattr=+64bit,+m,+a,+f,+d,+c
  - RVV target (with vector extension):
    llvm -mtriple=riscv64-linux-gnu -mcpu=generic-rv64 -mabi=lp64d -mattr=+64bit,+m,+a,+f,+d,+c,+v
- Run performance measurement on both targets.
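The two target strings differ only in the trailing `+v` attribute. A minimal sketch of generating both from one attribute list, so the scalar and vector builds cannot drift apart by accident; `riscv_target` and `BASE_ATTRS` are hypothetical names used for illustration and are not part of TVM:

```python
# Attribute list copied from the RV target string in this issue.
BASE_ATTRS = ["+64bit", "+m", "+a", "+f", "+d", "+c"]


def riscv_target(with_vector: bool) -> str:
    """Build the LLVM target string for the RV or RVV export (hypothetical helper)."""
    attrs = BASE_ATTRS + (["+v"] if with_vector else [])
    return (
        "llvm -mtriple=riscv64-linux-gnu -mcpu=generic-rv64 "
        "-mabi=lp64d -mattr=" + ",".join(attrs)
    )


rv_target = riscv_target(with_vector=False)   # scalar baseline
rvv_target = riscv_target(with_vector=True)   # same flags plus +v
print(rv_target)
print(rvv_target)
```

The resulting strings are the ones listed in the steps and can be passed wherever the export flow expects a target string.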
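Operator definition code:
    from tvm import relay

    def export_sqrt(params, set_dir=None, platform="rv"):
        # Input tensor in (batch, channels, height, width) order, sized from params.
        data = relay.var("data",
                         shape=(params["batch"], params["channels"],
                                params["input_height"], params["input_width"]),
                         dtype=params["dtype"])
        sqrt_op = relay.sqrt(data)
        # export_op is a helper from the reporter's harness and is not shown in this issue.
        # It reads params["op_name"], which the configuration in the steps does not list.
        export_op(sqrt_op, params["op_name"], [data], params, set_dir=set_dir)
Performance Data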
- RV execution time: 11.502000 ms
- RVV execution time: 29.906500 ms
- Acceleration ratio (RV/RVV): 0.385 (RVV is ~2.6× slower)
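For reference, the ratio follows directly from the two reported timings. This short check only reproduces the arithmetic; the variable names are illustrative:

```python
# Timings as reported in this issue, in milliseconds.
rv_ms = 11.502000
rvv_ms = 29.906500

speedup = rv_ms / rvv_ms    # below 1.0 means the RVV build is slower
slowdown = rvv_ms / rv_ms   # how many times slower the RVV build is

print(f"speedup (RV/RVV) = {speedup:.3f}, slowdown = {slowdown:.1f}x")
# prints: speedup (RV/RVV) = 0.385, slowdown = 2.6x
```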
Environment Information
- TVM version: 0.19.0
- LLVM version: [Please provide: llvm-config --version]
- Hardware: Spacemit K1‑X bit‑brick board
- CPU: Spacemit X60 (8 cores, 1.6 GHz)
- ISA: rv64imafdcv (with vector extensions)
- Memory: 7.6 GB
- OS: Bianbu 2.2, Linux kernel 6.6.63
- Operation: Elementwise square root on ~2.1M elements (14 × 23 × 67 × 99 = 2,135,826)
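As a sanity check on the workload size, the element count can be recomputed from the shape in the reproduction configuration. The sketch below also estimates the size of one float32 tensor and the average time per element from the reported timings; these derived figures are rough averages, not measurements:

```python
import math

# Shape from the reproduction configuration: (batch, channels, height, width).
shape = (14, 23, 67, 99)

num_elements = math.prod(shape)
bytes_per_tensor = num_elements * 4  # float32 is 4 bytes per element

# Average time per element, from the timings reported in this issue.
rv_ns_per_elem = 11.502000e6 / num_elements
rvv_ns_per_elem = 29.906500e6 / num_elements

print(num_elements)        # prints: 2135826
print(bytes_per_tensor)    # prints: 8543304
print(f"{rv_ns_per_elem:.2f} ns vs {rvv_ns_per_elem:.2f} ns per element")
```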
Expected Behavior
RVV vectorization should provide a performance improvement over the scalar RV baseline for mathematical functions like square root.
Additional Context
- The sqrt operation is applied elementwise to a tensor of ~2.1M elements.
- The performance regression (2.6× slower) suggests that the vectorized implementation of sqrt may be using suboptimal instructions or inefficient vector length management.
- This is part of a broader pattern where multiple mathematical operators (log, sqrt, etc.) show severe performance degradation with RVV, indicating a potential issue with vector intrinsic mapping or loop vectorization for elementwise math functions.
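One way to test the "suboptimal instructions" hypothesis is to inspect the generated assembly and see whether the RVV build actually uses the vector square-root instruction (`vfsqrt.v`) or still falls back to scalar `fsqrt.s`. The sketch below assumes the assembly has already been dumped to text by some means not covered in this issue. `count_mnemonics` is a hypothetical helper, and the sample listing is hand-written for illustration, not actual compiler output:

```python
from collections import Counter

# RISC-V mnemonics of interest: vector sqrt, scalar single-precision sqrt,
# and the vector-length configuration instructions.
MNEMONICS = ("vfsqrt.v", "fsqrt.s", "vsetvli", "vsetivli")


def count_mnemonics(asm_text: str) -> Counter:
    """Count occurrences of selected mnemonics in an assembly listing."""
    counts = Counter()
    for line in asm_text.splitlines():
        code = line.split("#", 1)[0].strip()  # drop trailing comments
        # Skip blank lines, labels, and assembler directives.
        if not code or code.endswith(":") or code.startswith("."):
            continue
        mnemonic = code.split()[0]
        if mnemonic in MNEMONICS:
            counts[mnemonic] += 1
    return counts


# Hand-written sample of a strip-mined vector loop body (illustrative only).
sample_asm = """
.LBB0_1:
    vsetvli t0, a2, e32, m1, ta, ma
    vle32.v v8, (a0)
    vfsqrt.v v8, v8
    vse32.v v8, (a1)
"""

counts = count_mnemonics(sample_asm)
print(dict(counts))
```

If the RVV listing shows many `fsqrt.s` instructions and few or no `vfsqrt.v`, the loop was probably not vectorized as intended. A high `vsetvli` count inside the hot loop would instead point toward vector length management.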