Can RVV kernels be enabled by default? #5653

Closed
bhbruce opened this issue Oct 16, 2023 · 1 comment

Comments

@bhbruce
Contributor

bhbruce commented Oct 16, 2023

I found several RVV kernels in this project, but they aren't enabled by default.
E.g., XNNPACK has an RVV H-swish kernel, but the RISC-V branch of the config still selects the scalar kernel:

#elif XNN_ARCH_RISCV
f32_hswish_config.ukernel = (xnn_vunary_ukernel_fn) xnn_f32_vhswish_ukernel__scalar_u4;
f32_hswish_config.init.f32_hswish = xnn_init_f32_hswish_scalar_params;
f32_hswish_config.element_tile = 4;

Is there any reason or side effect preventing us from enabling them by default?

@fbarchard
Contributor

To configure microkernels, we need to run benchmarks on RVV hardware.
The kernels support m1, m2, m4 and m8. Normally we'd run the benchmark, select the fastest variant and plug that into the config.
There is a microkernel benchmark that shows the kernels that could be used:
f32_vhswish/rvv_u1v/N:3840/real_time ERROR OCCURRED: 'no V extension'
f32_vhswish/rvv_u1v/N:32640/real_time ERROR OCCURRED: 'no V extension'
f32_vhswish/rvv_u2v/N:3840/real_time ERROR OCCURRED: 'no V extension'
f32_vhswish/rvv_u2v/N:32640/real_time ERROR OCCURRED: 'no V extension'
f32_vhswish/rvv_u4v/N:3840/real_time ERROR OCCURRED: 'no V extension'
f32_vhswish/rvv_u4v/N:32640/real_time ERROR OCCURRED: 'no V extension'
f32_vhswish/rvv_u8v/N:3840/real_time ERROR OCCURRED: 'no V extension'
f32_vhswish/rvv_u8v/N:32640/real_time ERROR OCCURRED: 'no V extension'
f32_vhswish/scalar_u1/N:3840/real_time 322413 ns 322405 ns 2174 bytes=95.2816M/s
f32_vhswish/scalar_u1/N:32640/real_time 2736093 ns 2736025 ns 256 bytes=95.4354M/s
f32_vhswish/scalar_u2/N:3840/real_time 267837 ns 267827 ns 2611 bytes=114.696M/s
f32_vhswish/scalar_u2/N:32640/real_time 2278400 ns 2278154 ns 308 bytes=114.607M/s
f32_vhswish/scalar_u4/N:3840/real_time 289920 ns 289905 ns 2413 bytes=105.96M/s
f32_vhswish/scalar_u4/N:32640/real_time 2479500 ns 2479095 ns 284 bytes=105.312M/s
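
As a rough illustration of the "run the benchmark, select the fastest" step, here is a minimal C sketch that picks the variant with the lowest real time from a set of benchmark rows. The struct and function names (`bench_result`, `pick_fastest`, `scalar_n3840`) are hypothetical and not part of XNNPACK; the numbers are the scalar N:3840 timings from the log above.

```c
#include <assert.h>
#include <stddef.h>

// One benchmark row: variant name and measured real time in nanoseconds.
// (Hypothetical type for illustration; not an XNNPACK structure.)
struct bench_result {
  const char* name;
  double real_time_ns;
};

// Returns the index of the entry with the smallest real time.
// The array must contain at least one entry.
static size_t pick_fastest(const struct bench_result* results, size_t count) {
  assert(count > 0);
  size_t best = 0;
  for (size_t i = 1; i < count; i++) {
    if (results[i].real_time_ns < results[best].real_time_ns) {
      best = i;
    }
  }
  return best;
}

// Scalar timings for N:3840, copied from the benchmark log.
static const struct bench_result scalar_n3840[] = {
  {"scalar_u1", 322413.0},
  {"scalar_u2", 267837.0},
  {"scalar_u4", 289920.0},
};
```

Calling `pick_fastest(scalar_n3840, 3)` returns the index of the row with the lowest time. In practice the choice would presumably be checked across several sizes and on real hardware rather than taken from a single run.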

To test that they work, I manually hack the cpu_info RVV detection and run on qemu.
We prefer to test properly on real hardware before enabling them, but if you want to help with that, contributions are welcome.
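
For readers unfamiliar with the pattern being discussed, the following is a minimal, self-contained C sketch of runtime kernel selection: use a vector kernel only when the CPU reports the V extension, otherwise fall back to scalar. Every name here (`hswish_config`, `init_hswish_config`, `hswish_vector_stub`, the `has_vector` flag, the `vector_tile` parameter) is hypothetical and not XNNPACK's actual API; the vector kernel is a stub that simply calls the scalar code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

// Signature of a unary elementwise kernel (hypothetical, for illustration).
typedef void (*vunary_fn)(size_t n, const float* in, float* out);

// Scalar H-swish: y = x * min(max(x + 3, 0), 6) / 6.
static void hswish_scalar(size_t n, const float* in, float* out) {
  for (size_t i = 0; i < n; i++) {
    float t = in[i] + 3.0f;
    if (t < 0.0f) t = 0.0f;
    if (t > 6.0f) t = 6.0f;
    out[i] = in[i] * t / 6.0f;
  }
}

// Stand-in for a vector kernel. A real build would use RVV intrinsics here;
// this stub only forwards to the scalar code so the sketch runs anywhere.
static void hswish_vector_stub(size_t n, const float* in, float* out) {
  hswish_scalar(n, in, out);
}

struct hswish_config {
  vunary_fn ukernel;
  size_t element_tile;
};

// Chooses a kernel based on a runtime capability flag.
// `has_vector` and `vector_tile` are assumptions supplied by the caller.
static struct hswish_config init_hswish_config(bool has_vector, size_t vector_tile) {
  struct hswish_config config;
  if (has_vector) {
    config.ukernel = hswish_vector_stub;
    config.element_tile = vector_tile;
  } else {
    config.ukernel = hswish_scalar;
    config.element_tile = 4;  // matches the scalar_u4 config quoted in the issue
  }
  return config;
}
```

With this shape, a failed or absent V-extension check leaves the scalar path in place, which is why the detection result has to be overridden to exercise the vector kernels under emulation.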

@bhbruce bhbruce closed this as completed May 14, 2024