MDEV-37107 - Optimise dot_product by loop-unrolling by a factor of 4

mikejuliet13 · svoj · commit 311171c176d7 · 2025-07-02T00:12:46.000+04:00
This patch introduces loop unrolling by a factor of 4 in the
dot_product() function used in vector-based distance calculations.

The goal is to improve SIMD utilization and overall performance during
high-throughput vector operations, particularly in indexing and search
routines that rely on this function.

Observations from benchmarking (ann-benchmark):
- Query throughput (QPS) improved by 4–10% across datasets.
- Indexing time was reduced by 22–28%.
- Loop unrolling factors of 8 or 16 yielded similar performance to
  factor-4 but made the code less readable. Hence, a factor of 4 was
  chosen to maintain a balance between performance and code clarity.

This change affects only the PowerPC code path and should not
introduce any behavioral regressions or side effects in unrelated
parts of the codebase.

Signed-off-by: Manjul Mohan <manjul.mohan@ibm.com>
diff --git a/sql/vector_mhnsw.cc b/sql/vector_mhnsw.cc
@@ -241,6 +241,7 @@ struct FVector
     // Round up to process full vector, including padding
     size_t base= ((len + POWER_dims - 1) / POWER_dims) * POWER_dims;
 
+    #pragma GCC unroll 4
     for (size_t i= 0; i < base; i+= POWER_dims)
     {
       vector short x= vec_ld(0, &v1[i]);
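
For readers without a PowerPC toolchain, the effect of the pragma can be tried on a portable scalar loop. The following is only a sketch under stated assumptions, not the code from sql/vector_mhnsw.cc: `kLanes` is a hypothetical stand-in for `POWER_dims`, `padded_len()` and `dot_product_sketch()` are illustrative names, and the AltiVec intrinsics are replaced by a plain inner loop. The pragma is a hint to the compiler (GCC 8 or later, and Clang, accept it); it changes the generated code, not the result.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for POWER_dims: elements consumed per outer iteration.
constexpr std::size_t kLanes = 8;

// Round len up to a whole number of kLanes-sized chunks, mirroring the
// "base" computation shown in the diff context.
constexpr std::size_t padded_len(std::size_t len)
{
  return ((len + kLanes - 1) / kLanes) * kLanes;
}

// Scalar dot product over padded buffers. Both inputs must hold at least
// padded_len(len) elements, with zeros in the padding.
inline std::int64_t dot_product_sketch(const std::int16_t *v1,
                                       const std::int16_t *v2,
                                       std::size_t len)
{
  assert(v1 != nullptr && v2 != nullptr);
  const std::size_t base = padded_len(len);
  std::int64_t sum = 0;

  // Ask the compiler to unroll the outer loop four times, as in the patch.
#if defined(__GNUC__)
#pragma GCC unroll 4
#endif
  for (std::size_t i = 0; i < base; i += kLanes)
  {
    // Stands in for one vector load and multiply-accumulate step.
    for (std::size_t j = 0; j < kLanes; j++)
      sum += std::int32_t(v1[i + j]) * v2[i + j];
  }
  return sum;
}
```

Comparing the assembly with and without the pragma (for example with `g++ -O2 -S`) shows whether the compiler honoured the hint on a given target.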
