
opt(RVV): Optimize CV color conversion functions with intrinsics #4079

Merged: wangzhaode merged 2 commits into alibaba:master from ihb2032:opt/rvv-c3-nv21 on Dec 26, 2025

@ihb2032 (Contributor) commented Dec 23, 2025

Summary

Optimize MNNC3 and MNNNV21 related color conversion functions using RVV intrinsics, including C3 to YUV/XYZ/HSV/BGR555/BGR565 and NV21 to RGBA/RGB/BGRA/BGR.
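
For readers unfamiliar with this kind of kernel, here is a minimal sketch of one of the listed conversions (interleaved 8-bit RGB to a 16-bit 565 word). It is not the code from this PR. The function names `pack_pixel_565`, `rgb_to_565_scalar` and `pack_rgb_to_565` are hypothetical, and the bit layout (red in the top 5 bits, green in the middle 6, blue in the low 5) is an assumption that may differ from what MNN's functions produce. The RVV path is only compiled when the toolchain provides the v0.12+ intrinsics; otherwise the scalar loop is used.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__riscv_vector) && defined(__riscv_v_intrinsic) && __riscv_v_intrinsic >= 12000
#include <riscv_vector.h>
#define SKETCH_HAVE_RVV 1
#else
#define SKETCH_HAVE_RVV 0
#endif

/* Hypothetical helper: pack one pixel, assuming an R:5 | G:6 | B:5 layout. */
uint16_t pack_pixel_565(uint8_t r, uint8_t g, uint8_t b) {
    return (uint16_t)(((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3));
}

/* Scalar reference loop over `count` interleaved RGB pixels. */
void rgb_to_565_scalar(const uint8_t *src, uint16_t *dst, size_t count) {
    for (size_t i = 0; i < count; ++i) {
        dst[i] = pack_pixel_565(src[3 * i], src[3 * i + 1], src[3 * i + 2]);
    }
}

/* Strip-mined RVV variant (assumed shape, not taken from the PR). */
void pack_rgb_to_565(const uint8_t *src, uint16_t *dst, size_t count) {
#if SKETCH_HAVE_RVV
    while (count > 0) {
        /* vl elements per iteration; e8/m1 and e16/m2 share the same VLMAX. */
        size_t vl = __riscv_vsetvl_e8m1(count);
        /* Segment load de-interleaves R, G, B into three registers. */
        vuint8m1x3_t px = __riscv_vlseg3e8_v_u8m1x3(src, vl);
        vuint16m2_t r = __riscv_vzext_vf2_u16m2(__riscv_vget_v_u8m1x3_u8m1(px, 0), vl);
        vuint16m2_t g = __riscv_vzext_vf2_u16m2(__riscv_vget_v_u8m1x3_u8m1(px, 1), vl);
        vuint16m2_t b = __riscv_vzext_vf2_u16m2(__riscv_vget_v_u8m1x3_u8m1(px, 2), vl);
        r = __riscv_vsll_vx_u16m2(__riscv_vsrl_vx_u16m2(r, 3, vl), 11, vl);
        g = __riscv_vsll_vx_u16m2(__riscv_vsrl_vx_u16m2(g, 2, vl), 5, vl);
        b = __riscv_vsrl_vx_u16m2(b, 3, vl);
        vuint16m2_t out = __riscv_vor_vv_u16m2(__riscv_vor_vv_u16m2(r, g, vl), b, vl);
        __riscv_vse16_v_u16m2(dst, out, vl);
        src += 3 * vl;
        dst += vl;
        count -= vl;
    }
#else
    /* No RVV intrinsics available: fall back to the scalar reference. */
    rgb_to_565_scalar(src, dst, count);
#endif
}
```

Keeping a scalar reference next to the vector path is also what makes a PASSED/FAILED comparison like the one in the logs possible: both are run on the same input and the outputs are compared element by element.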

Environment

  • Platform: Banana PI BPI-F3
  • OS: EulixOS 3.0

Benchmark

<details>
<summary>Click to expand full test logs</summary>
root@yangwang:~/mnntest# ./test_c3_to_yuv
number=1
Scalar time: 0.000003 sec
RVV time    : 0.000022 sec
Speedup     : 0.13x
Test number=1: PASSED
number=3
Scalar time: 0.000004 sec
RVV time    : 0.000007 sec
Speedup     : 0.57x
Test number=3: PASSED
number=8
Scalar time: 0.000005 sec
RVV time    : 0.000006 sec
Speedup     : 0.84x
Test number=8: PASSED
number=33
Scalar time: 0.000019 sec
RVV time    : 0.000005 sec
Speedup     : 3.81x
Test number=33: PASSED
number=100
Scalar time: 0.000051 sec
RVV time    : 0.000009 sec
Speedup     : 5.76x
Test number=100: PASSED
number=1024
Scalar time: 0.000675 sec
RVV time    : 0.000071 sec
Speedup     : 9.50x
Test number=1024: PASSED
number=65536
Scalar time: 0.033821 sec
RVV time    : 0.004405 sec
Speedup     : 7.68x
Test number=65536: PASSED
number=1000000
Scalar time: 0.517025 sec
RVV time    : 0.066878 sec
Speedup     : 7.73x
Test number=1000000: PASSED
number=10000000
Scalar time: 1.987096 sec
RVV time    : 0.262969 sec
Speedup     : 7.56x
Test number=10000000: PASSED
number=50000000
Scalar time: 9.916534 sec
RVV time    : 1.314206 sec
Speedup     : 7.55x
Test number=50000000: PASSED

All tests PASSED 
root@yangwang:~# ./test_c3_to_xyz
number=1
Scalar time: 0.000001 sec
RVV time    : 0.000012 sec
Speedup     : 0.08x
Test number=1: PASSED
number=3
Scalar time: 0.000001 sec
RVV time    : 0.000002 sec
Speedup     : 0.50x
Test number=3: PASSED
number=8
Scalar time: 0.000001 sec
RVV time    : 0.000002 sec
Speedup     : 0.62x
Test number=8: PASSED
number=33
Scalar time: 0.000004 sec
RVV time    : 0.000002 sec
Speedup     : 2.12x
Test number=33: PASSED
number=100
Scalar time: 0.000012 sec
RVV time    : 0.000004 sec
Speedup     : 2.94x
Test number=100: PASSED
number=1024
Scalar time: 0.000122 sec
RVV time    : 0.000027 sec
Speedup     : 4.53x
Test number=1024: PASSED
number=65536
Scalar time: 0.007678 sec
RVV time    : 0.001646 sec
Speedup     : 4.67x
Test number=65536: PASSED
number=1000000
Scalar time: 0.118545 sec
RVV time    : 0.024820 sec
Speedup     : 4.78x
Test number=1000000: PASSED
number=10000000
Scalar time: 1.179671 sec
RVV time    : 0.249469 sec
Speedup     : 4.73x
Test number=10000000: PASSED
number=50000000
Scalar time: 5.890856 sec
RVV time    : 1.239900 sec
Speedup     : 4.75x
Test number=50000000: PASSED

All tests PASSED 
[root@EulixOS ~]# ./test_c3_to_bgr555
[RGB] count=1 | Scalar: 0.000001 s | RVV: 0.000017 s | Speedup: 0.06x
[BGR] count=1 | Scalar: 0.000000 s | RVV: 0.000001 s | Speedup: 0.00x
Test size=1 PASSED

[RGB] count=3 | Scalar: 0.000000 s | RVV: 0.000001 s | Speedup: 0.00x
[BGR] count=3 | Scalar: 0.000001 s | RVV: 0.000001 s | Speedup: 0.80x
Test size=3 PASSED

[RGB] count=8 | Scalar: 0.000000 s | RVV: 0.000001 s | Speedup: 0.00x
[BGR] count=8 | Scalar: 0.000001 s | RVV: 0.000001 s | Speedup: 1.25x
Test size=8 PASSED

[RGB] count=15 | Scalar: 0.000001 s | RVV: 0.000001 s | Speedup: 1.00x
[BGR] count=15 | Scalar: 0.000001 s | RVV: 0.000001 s | Speedup: 1.00x
Test size=15 PASSED

[RGB] count=31 | Scalar: 0.000001 s | RVV: 0.000001 s | Speedup: 1.00x
[BGR] count=31 | Scalar: 0.000002 s | RVV: 0.000001 s | Speedup: 2.00x
Test size=31 PASSED

[RGB] count=32 | Scalar: 0.000001 s | RVV: 0.000001 s | Speedup: 1.00x
[BGR] count=32 | Scalar: 0.000002 s | RVV: 0.000001 s | Speedup: 1.60x
Test size=32 PASSED

[RGB] count=33 | Scalar: 0.000001 s | RVV: 0.000001 s | Speedup: 1.00x
[BGR] count=33 | Scalar: 0.000002 s | RVV: 0.000001 s | Speedup: 2.00x
Test size=33 PASSED

[RGB] count=100 | Scalar: 0.000004 s | RVV: 0.000001 s | Speedup: 4.25x
[BGR] count=100 | Scalar: 0.000005 s | RVV: 0.000001 s | Speedup: 4.20x
Test size=100 PASSED

[RGB] count=1024 | Scalar: 0.000037 s | RVV: 0.000007 s | Speedup: 5.17x
[BGR] count=1024 | Scalar: 0.000056 s | RVV: 0.000006 s | Speedup: 9.40x
Test size=1024 PASSED

[RGB] count=65536 | Scalar: 0.002423 s | RVV: 0.000417 s | Speedup: 5.81x
[BGR] count=65536 | Scalar: 0.003773 s | RVV: 0.000398 s | Speedup: 9.48x
Test size=65536 PASSED

[RGB] count=1000000 | Scalar: 0.037708 s | RVV: 0.006110 s | Speedup: 6.17x
[BGR] count=1000000 | Scalar: 0.054990 s | RVV: 0.006060 s | Speedup: 9.07x
Test size=1000000 PASSED

[RGB] count=10000000 | Scalar: 0.368202 s | RVV: 0.061179 s | Speedup: 6.02x
[BGR] count=10000000 | Scalar: 0.544550 s | RVV: 0.060702 s | Speedup: 8.97x
Test size=10000000 PASSED

[RGB] count=50000000 | Scalar: 1.830795 s | RVV: 0.304496 s | Speedup: 6.01x
[BGR] count=50000000 | Scalar: 2.722448 s | RVV: 0.303552 s | Speedup: 8.97x
Test size=50000000 PASSED

Final Result: ALL TESTS PASSED 
[root@EulixOS ~]# ./test_c3_to_hsv
number=1
Scalar time: 0.000004 sec
RVV time    : 0.000029 sec
Speedup     : 0.14x
Test number=1: PASSED
number=3
Scalar time: 0.000002 sec
RVV time    : 0.000003 sec
Speedup     : 0.75x
Test number=3: PASSED
number=8
Scalar time: 0.000004 sec
RVV time    : 0.000003 sec
Speedup     : 1.42x
Test number=8: PASSED
number=33
Scalar time: 0.000016 sec
RVV time    : 0.000002 sec
Speedup     : 8.38x
Test number=33: PASSED
number=100
Scalar time: 0.000045 sec
RVV time    : 0.000005 sec
Speedup     : 9.00x
Test number=100: PASSED
number=1024
Scalar time: 0.000457 sec
RVV time    : 0.000038 sec
Speedup     : 12.06x
Test number=1024: PASSED
number=65536
Scalar time: 0.028634 sec
RVV time    : 0.002295 sec
Speedup     : 12.48x
Test number=65536: PASSED
number=1000000
Scalar time: 0.432992 sec
RVV time    : 0.034932 sec
Speedup     : 12.40x
Test number=1000000: PASSED
number=10000000
Scalar time: 4.342924 sec
RVV time    : 0.348972 sec
Speedup     : 12.44x
Test number=10000000: PASSED
number=50000000
Scalar time: 21.713869 sec
RVV time    : 1.750756 sec
Speedup     : 12.40x
Test number=50000000: PASSED

All tests PASSED 
[root@EulixOS ~]# ./test_c3_to_bgr565
[RGB] count=1 | Scalar: 0.000000 s | RVV: 0.000019 s | Speedup: 0.00x
[BGR] count=1 | Scalar: 0.000000 s | RVV: 0.000001 s | Speedup: 0.00x
Test size=1 PASSED

[RGB] count=3 | Scalar: 0.000001 s | RVV: 0.000001 s | Speedup: 1.25x
[BGR] count=3 | Scalar: 0.000001 s | RVV: 0.000001 s | Speedup: 1.00x
Test size=3 PASSED

[RGB] count=8 | Scalar: 0.000000 s | RVV: 0.000001 s | Speedup: 0.00x
[BGR] count=8 | Scalar: 0.000000 s | RVV: 0.000001 s | Speedup: 0.00x
Test size=8 PASSED

[RGB] count=15 | Scalar: 0.000001 s | RVV: 0.000001 s | Speedup: 1.00x
[BGR] count=15 | Scalar: 0.000002 s | RVV: 0.000001 s | Speedup: 2.00x
Test size=15 PASSED

[RGB] count=31 | Scalar: 0.000001 s | RVV: 0.000001 s | Speedup: 1.00x
[BGR] count=31 | Scalar: 0.000002 s | RVV: 0.000001 s | Speedup: 2.00x
Test size=31 PASSED

[RGB] count=32 | Scalar: 0.000002 s | RVV: 0.000001 s | Speedup: 2.00x
[BGR] count=32 | Scalar: 0.000002 s | RVV: 0.000001 s | Speedup: 2.00x
Test size=32 PASSED

[RGB] count=33 | Scalar: 0.000002 s | RVV: 0.000001 s | Speedup: 2.00x
[BGR] count=33 | Scalar: 0.000002 s | RVV: 0.000000 s | Speedup: 0.00x
Test size=33 PASSED

[RGB] count=100 | Scalar: 0.000004 s | RVV: 0.000001 s | Speedup: 4.25x
[BGR] count=100 | Scalar: 0.000006 s | RVV: 0.000001 s | Speedup: 6.25x
Test size=100 PASSED

[RGB] count=1024 | Scalar: 0.000037 s | RVV: 0.000007 s | Speedup: 5.34x
[BGR] count=1024 | Scalar: 0.000056 s | RVV: 0.000007 s | Speedup: 7.83x
Test size=1024 PASSED

[RGB] count=65536 | Scalar: 0.002868 s | RVV: 0.000410 s | Speedup: 6.99x
[BGR] count=65536 | Scalar: 0.004017 s | RVV: 0.000426 s | Speedup: 9.43x
Test size=65536 PASSED

[RGB] count=1000000 | Scalar: 0.037677 s | RVV: 0.005883 s | Speedup: 6.40x
[BGR] count=1000000 | Scalar: 0.055934 s | RVV: 0.005829 s | Speedup: 9.60x
Test size=1000000 PASSED

[RGB] count=10000000 | Scalar: 0.373215 s | RVV: 0.058184 s | Speedup: 6.41x
[BGR] count=10000000 | Scalar: 0.564888 s | RVV: 0.062682 s | Speedup: 9.01x
Test size=10000000 PASSED

[RGB] count=50000000 | Scalar: 1.861281 s | RVV: 0.290735 s | Speedup: 6.40x
[BGR] count=50000000 | Scalar: 2.821005 s | RVV: 0.290990 s | Speedup: 9.69x
Test size=50000000 PASSED

Final Result: ALL TESTS PASSED 
[root@EulixOS ~]# ./test_nv21_to_rgba
count=1
Scalar time: 0.0000 sec
RVV time    : 0.0000 sec
Speedup     : 0.06x
Test count=1: PASSED
count=3
Scalar time: 0.0000 sec
RVV time    : 0.0000 sec
Speedup     : 0.33x
Test count=3: PASSED
count=8
Scalar time: 0.0000 sec
RVV time    : 0.0000 sec
Speedup     : 2.00x
Test count=8: PASSED
count=33
Scalar time: 0.0000 sec
RVV time    : 0.0000 sec
Speedup     : 7.50x
Test count=33: PASSED
count=100
Scalar time: 0.0000 sec
RVV time    : 0.0000 sec
Speedup     : 6.46x
Test count=100: PASSED
count=1024
Scalar time: 0.0002 sec
RVV time    : 0.0000 sec
Speedup     : 8.68x
Test count=1024: PASSED
count=65536
Scalar time: 0.0131 sec
RVV time    : 0.0016 sec
Speedup     : 8.39x
Test count=65536: PASSED
count=1000000
Scalar time: 0.2007 sec
RVV time    : 0.0227 sec
Speedup     : 8.85x
Test count=1000000: PASSED
count=10000000
Scalar time: 1.9936 sec
RVV time    : 0.2274 sec
Speedup     : 8.77x
Test count=10000000: PASSED
count=50000000
Scalar time: 9.9450 sec
RVV time    : 1.1195 sec
Speedup     : 8.88x
Test count=50000000: PASSED

All tests PASSED 
[root@EulixOS ~]# ./test_nv21_to_rgb
count=1
Scalar time: 0.0000 sec
RVV time    : 0.0000 sec
Speedup     : 0.11x
Test count=1: PASSED
count=3
Scalar time: 0.0000 sec
RVV time    : 0.0000 sec
Speedup     : 0.33x
Test count=3: PASSED
count=8
Scalar time: 0.0000 sec
RVV time    : 0.0000 sec
Speedup     : 2.00x
Test count=8: PASSED
count=33
Scalar time: 0.0000 sec
RVV time    : 0.0000 sec
Speedup     : 2.78x
Test count=33: PASSED
count=100
Scalar time: 0.0000 sec
RVV time    : 0.0000 sec
Speedup     : 6.46x
Test count=100: PASSED
count=1024
Scalar time: 0.0002 sec
RVV time    : 0.0000 sec
Speedup     : 8.96x
Test count=1024: PASSED
count=65536
Scalar time: 0.0130 sec
RVV time    : 0.0014 sec
Speedup     : 9.01x
Test count=65536: PASSED
count=1000000
Scalar time: 0.1984 sec
RVV time    : 0.0213 sec
Speedup     : 9.32x
Test count=1000000: PASSED
count=10000000
Scalar time: 1.9873 sec
RVV time    : 0.2127 sec
Speedup     : 9.34x
Test count=10000000: PASSED
count=50000000
Scalar time: 9.9288 sec
RVV time    : 1.0647 sec
Speedup     : 9.33x
Test count=50000000: PASSED

All tests PASSED 
[root@EulixOS ~]# ./test_nv21_to_bgra
count=1
Scalar time: 0.0000 sec
RVV time    : 0.0000 sec
Speedup     : 0.10x
Test count=1: PASSED
count=3
Scalar time: 0.0000 sec
RVV time    : 0.0000 sec
Speedup     : 0.31x
Test count=3: PASSED
count=8
Scalar time: 0.0000 sec
RVV time    : 0.0000 sec
Speedup     : 1.12x
Test count=8: PASSED
count=33
Scalar time: 0.0000 sec
RVV time    : 0.0000 sec
Speedup     : 3.62x
Test count=33: PASSED
count=100
Scalar time: 0.0000 sec
RVV time    : 0.0000 sec
Speedup     : 6.77x
Test count=100: PASSED
count=1024
Scalar time: 0.0002 sec
RVV time    : 0.0000 sec
Speedup     : 9.07x
Test count=1024: PASSED
count=65536
Scalar time: 0.0134 sec
RVV time    : 0.0015 sec
Speedup     : 8.76x
Test count=65536: PASSED
count=1000000
Scalar time: 0.2019 sec
RVV time    : 0.0227 sec
Speedup     : 8.91x
Test count=1000000: PASSED
count=10000000
Scalar time: 2.0204 sec
RVV time    : 0.2268 sec
Speedup     : 8.91x
Test count=10000000: PASSED
count=50000000
Scalar time: 10.0888 sec
RVV time    : 1.1324 sec
Speedup     : 8.91x
Test count=50000000: PASSED

All tests PASSED 
[root@EulixOS ~]# ./test_nv21_to_bgr
count=1
Scalar time: 0.0000 sec
RVV time    : 0.0000 sec
Speedup     : 0.12x
Test count=1: PASSED
count=3
Scalar time: 0.0000 sec
RVV time    : 0.0000 sec
Speedup     : 0.42x
Test count=3: PASSED
count=8
Scalar time: 0.0000 sec
RVV time    : 0.0000 sec
Speedup     : 0.31x
Test count=8: PASSED
count=33
Scalar time: 0.0000 sec
RVV time    : 0.0000 sec
Speedup     : 3.22x
Test count=33: PASSED
count=100
Scalar time: 0.0000 sec
RVV time    : 0.0000 sec
Speedup     : 6.77x
Test count=100: PASSED
count=1024
Scalar time: 0.0002 sec
RVV time    : 0.0000 sec
Speedup     : 8.19x
Test count=1024: PASSED
count=65536
Scalar time: 0.0134 sec
RVV time    : 0.0014 sec
Speedup     : 9.27x
Test count=65536: PASSED
count=1000000
Scalar time: 0.2010 sec
RVV time    : 0.0214 sec
Speedup     : 9.38x
Test count=1000000: PASSED
count=10000000
Scalar time: 1.9991 sec
RVV time    : 0.2125 sec
Speedup     : 9.41x
Test count=10000000: PASSED
count=50000000
Scalar time: 10.0013 sec
RVV time    : 1.0601 sec
Speedup     : 9.43x
Test count=50000000: PASSED

All tests PASSED 

</details>
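
The speedup column in the logs above is consistent with scalar time divided by RVV time (for example 0.517025 / 0.066878 ≈ 7.73x). The source of the test programs is not part of this page, so the following is only a sketch of how such a measurement could be made more robust for small inputs: one untimed warm-up call, then an average over many repetitions. The names `time_per_call`, `speedup` and `demo_kernel` are hypothetical.

```c
#include <stddef.h>
#include <time.h>

typedef void (*kernel_fn)(void *ctx);

/* Average CPU seconds per call over `reps` calls, after one untimed warm-up. */
double time_per_call(kernel_fn fn, void *ctx, int reps) {
    if (reps <= 0) {
        return 0.0;
    }
    fn(ctx); /* warm-up: first-call costs are kept out of the measurement */
    clock_t t0 = clock();
    for (int i = 0; i < reps; ++i) {
        fn(ctx);
    }
    clock_t t1 = clock();
    return (double)(t1 - t0) / (double)CLOCKS_PER_SEC / (double)reps;
}

/* Speedup as scalar time over vector time; returns 0 if the divisor is 0. */
double speedup(double scalar_s, double rvv_s) {
    return rvv_s > 0.0 ? scalar_s / rvv_s : 0.0;
}

/* Stand-in workload so the sketch is self-contained; ignores ctx. */
void demo_kernel(void *ctx) {
    (void)ctx;
    volatile unsigned sink = 0;
    for (unsigned i = 0; i < 1000; ++i) {
        sink += i;
    }
}
```

With single-shot timing, a run that rounds to 0.000000 s yields entries such as `Speedup: 0.00x`, so the rows for counts below roughly 100 are probably dominated by timer resolution and first-call overhead rather than by the kernels themselves.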

Optimize MNNC3 and MNNNV21 related color conversion functions using RVV intrinsics, including C3 to YUV/XYZ/HSV/BGR555/BGR565 and NV21 to RGBA/RGB/BGRA/BGR.

Signed-off-by: ihb2032 <1355790728@qq.com>
Co-authored-by: lyd1992 <liuyudong@iscas.ac.cn>
@wangzhaode wangzhaode self-assigned this Dec 26, 2025
@wangzhaode wangzhaode merged commit 07bda85 into alibaba:master Dec 26, 2025
6 checks passed
wangzhaode added a commit that referenced this pull request Dec 26, 2025
opt(RVV): Optimize CV color conversion functions with intrinsics

GitOrigin-RevId: 9c25bfac9121a326b86cd69e67949f742321ee9f
@ihb2032 ihb2032 deleted the opt/rvv-c3-nv21 branch December 27, 2025 02:09
wangzhaode added a commit that referenced this pull request Dec 30, 2025
opt(RVV): Optimize CV color conversion functions with intrinsics

GitOrigin-RevId: f5d0f2956495380fe72b693ea9a4e322886b4d9f
wangzhaode added a commit that referenced this pull request Dec 30, 2025
opt(RVV): Optimize CV color conversion functions with intrinsics

GitOrigin-RevId: 1150246ab040f30449b228b1cc9d948de543b7dc
wangzhaode added a commit that referenced this pull request Jan 7, 2026
opt(RVV): Optimize CV color conversion functions with intrinsics

GitOrigin-RevId: 9a7c2a465c1101b6c0690aef44d502795c0970dc
wangzhaode added a commit that referenced this pull request Jan 8, 2026
opt(RVV): Optimize CV color conversion functions with intrinsics

GitOrigin-RevId: 6e09d6eb2b9e8a4c280564dc1d5f6f9ff5e16605
Juude pushed a commit to Juude/MNN that referenced this pull request Jan 14, 2026
opt(RVV): Optimize CV color conversion functions with intrinsics

GitOrigin-RevId: 9a7c2a465c1101b6c0690aef44d502795c0970dc
Juude pushed a commit to Juude/MNN that referenced this pull request Jan 14, 2026
opt(RVV): Optimize CV color conversion functions with intrinsics

GitOrigin-RevId: 6e09d6eb2b9e8a4c280564dc1d5f6f9ff5e16605
wangzhaode added a commit that referenced this pull request Jan 20, 2026
opt(RVV): Optimize CV color conversion functions with intrinsics

GitOrigin-RevId: 62f1133a743912812e28114c7c08821473d19a68
wangzhaode added a commit that referenced this pull request Jan 21, 2026
opt(RVV): Optimize CV color conversion functions with intrinsics

GitOrigin-RevId: 62f1133a743912812e28114c7c08821473d19a68
wangzhaode added a commit that referenced this pull request Jan 21, 2026
opt(RVV): Optimize CV color conversion functions with intrinsics

GitOrigin-RevId: f491541f2b63732eec0b1bd38327b6b3c4191fdb
ORIGINAL_AUTHOR=MNNSyncBot <hi@zhaode.wang>