Fix: Types in sparse dot-products generic macro #253
Conversation
The `SIMSIMD_MAKE_INTERSECT_WEIGHTED` macro previously used `counter_type` when loading weights. Rather than just switching to the `weight_type`, it should use the `accumulator_type`, so as to match the behavior of SIMD intrinsics such as `_mm256_dpbf16_ps`, which widen the values before doing the dot-product.
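For illustration, a minimal scalar sketch of that widening step, assuming bf16 weights stored as raw 16-bit values; the `bf16_to_f32` helper and the accumulation line are illustrative, not the actual macro body:

```c
#include <stdint.h>

/* Hypothetical helper: widen a bf16 weight (assumed to be stored as a raw
 * 16-bit value) to f32 before multiplying, mirroring how `_mm256_dpbf16_ps`
 * widens its inputs in hardware. */
static inline float bf16_to_f32(uint16_t raw) {
    union { uint32_t u; float f; } conv;
    conv.u = (uint32_t)raw << 16; /* bf16 occupies the top half of an f32 */
    return conv.f;
}

/* Inside the matching branch of the intersection loop, the product would then
 * be accumulated in the wider type rather than in `counter_type`:
 *
 *     accumulator += bf16_to_f32(a_weights[i]) * bf16_to_f32(b_weights[j]);
 */
```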
Review comment on `spdot_counts_u16_turin`:

Would it be hard to use saturating addition everywhere? I can imagine receiving really large inputs in those functions. PS: Thanks for following the commit naming convention!
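As a rough sketch of what saturating addition could look like in the serial backends (the helper below and the use of the GCC/Clang `__builtin_add_overflow` builtin are assumptions, not existing SimSIMD code):

```c
#include <stdint.h>

/* Hypothetical scalar helper: clamp to the representable range instead of
 * wrapping around on overflow. */
static inline int64_t saturating_add_i64(int64_t a, int64_t b) {
    int64_t sum;
    if (__builtin_add_overflow(a, b, &sum)) /* GCC/Clang builtin */
        return b > 0 ? INT64_MAX : INT64_MIN;
    return sum;
}
```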
Good point.

bf16 / i16:

- Scalar: …
- AVX512: No intrinsic for saturating horizontal add, but can easily adapt the …
- SVE2: I don't see a dot-product intrinsic with saturation built-in, but here's an idea for inside the loop: `svint64_t tmp = svdot_s64(zero_vec, a_weights_vec, b_equal_weights_vec); product_vec = svqadd_s64_z(svptrue_b64(), product_vec, tmp);` I'm unfamiliar with SVE, so take the above with a grain of salt.
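Expanding that idea into a fuller sketch, assuming a hypothetical i16 weighted kernel where `a_weights` and `b_weights` already hold the weights of matching keys; `svdot_s64` widens the i16 lanes before multiplying, and the base-SVE `svqadd_s64` saturates the running sum instead of wrapping:

```c
#include <arm_sve.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical kernel, not part of SimSIMD: i16 dot-product with a
 * saturating i64 accumulator. */
static int64_t weighted_dot_i16_sve_saturating(int16_t const *a_weights, int16_t const *b_weights,
                                               size_t count) {
    svint64_t product_vec = svdup_s64(0);
    for (size_t i = 0; i < count; i += svcnth()) {
        svbool_t pg = svwhilelt_b16((uint64_t)i, (uint64_t)count);
        svint16_t a_vec = svld1_s16(pg, a_weights + i);
        svint16_t b_vec = svld1_s16(pg, b_weights + i);
        /* Dot-product into a fresh zero accumulator, so the only place the
         * running sum grows is the saturating add below. */
        svint64_t tmp = svdot_s64(svdup_s64(0), a_vec, b_vec);
        product_vec = svqadd_s64(product_vec, tmp);
    }
    /* Final lane-wise reduction; note this plain add can still wrap. */
    return svaddv_s64(svptrue_b64(), product_vec);
}
```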
I realize now that there are some subtleties to handling overflows in each backend, both for the implementation and the API design, so I will open a separate issue for that.
Thank you @real-eren! Sorry, I forgot to merge this sooner 🤦♂️
Re: types

The dot-product intrinsics `_mm256_dpbf16_ps`, `_mm256_dpwssd_epi32`, and `svbfdot_f32` each widen the values before multiplying them, so the scalar algorithm should also do that. Also, the macro currently uses the `counter_type`, which is an unsigned int.

Re: saturating addition

Same links; none of them perform saturating addition other than the currently used `_mm256_dpwssds_epi32`, so the current `spdot_counts_u16_turin` kernel is the odd one out.
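For reference, the distinction being drawn, sketched with the two VNNI intrinsics (requires AVX512VNNI and AVX512VL; the wrapper names are illustrative):

```c
#include <immintrin.h>

/* Both intrinsics multiply pairs of i16 lanes into i32 products and add them
 * to the i32 accumulator; only the "s" variant saturates that addition
 * instead of wrapping. */
static inline __m256i accumulate_wrapping(__m256i acc, __m256i a, __m256i b) {
    return _mm256_dpwssd_epi32(acc, a, b);
}
static inline __m256i accumulate_saturating(__m256i acc, __m256i a, __m256i b) {
    return _mm256_dpwssds_epi32(acc, a, b);
}
```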