LUCENE-9837: try to improve performance of VectorUtil.dotProduct #17
Here's my stab at it. It gives a ~25% improvement for big vectors and doesn't cause regressions. I tested sizes of 1, 4, 6, 8, 13, 16, 25, 32, 64, 100, 128, 207, 256, 300, 512, 702, and 1024. The speedups become substantial at sizes of 32 and above.
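For readers who want a feel for the kind of change involved, here is a minimal sketch of one common way to speed up a scalar dot product: unrolling the loop over several independent accumulators so the CPU can overlap the multiply-adds. This is an illustration under assumptions, not the code in this PR; the class name `DotProductSketch`, both method names, and the unroll factor of 4 are hypothetical.

```java
// Hypothetical illustration only: not the actual VectorUtil.dotProduct change.
final class DotProductSketch {
  private DotProductSketch() {}

  /** Baseline: one accumulator, one multiply-add per iteration. */
  static float dotProductSimple(float[] a, float[] b) {
    checkSameLength(a, b);
    float res = 0f;
    for (int i = 0; i < a.length; i++) {
      res += a[i] * b[i];
    }
    return res;
  }

  /** Unrolled by 4 (an assumed factor) with independent accumulators. */
  static float dotProductUnrolled(float[] a, float[] b) {
    checkSameLength(a, b);
    float acc0 = 0f, acc1 = 0f, acc2 = 0f, acc3 = 0f;
    int i = 0;
    // Largest multiple of 4 that fits in the array length.
    int limit = a.length - (a.length % 4);
    for (; i < limit; i += 4) {
      // The four sums do not depend on each other, which shortens
      // the dependency chain compared with a single accumulator.
      acc0 += a[i] * b[i];
      acc1 += a[i + 1] * b[i + 1];
      acc2 += a[i + 2] * b[i + 2];
      acc3 += a[i + 3] * b[i + 3];
    }
    float res = (acc0 + acc1) + (acc2 + acc3);
    // Handle the remaining 0 to 3 elements.
    for (; i < a.length; i++) {
      res += a[i] * b[i];
    }
    return res;
  }

  private static void checkSameLength(float[] a, float[] b) {
    if (a.length != b.length) {
      throw new IllegalArgumentException(
          "vector dimensions differ: " + a.length + " != " + b.length);
    }
  }
}
```

Because floating-point addition is not associative, the unrolled version can differ from the simple loop in the last few bits for general inputs, so any comparison between the two should allow a small tolerance.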
Benchmarks (JMH throughput in ops/us, higher is better; the first number on each row is the vector size)
DotProduct.dotProductOld 1 thrpt 45 184.487 ± 0.364 ops/us
DotProduct.dotProductNew 1 thrpt 45 184.675 ± 0.308 ops/us
DotProduct.dotProductOld 4 thrpt 45 132.525 ± 1.124 ops/us
DotProduct.dotProductNew 4 thrpt 45 133.688 ± 0.383 ops/us
DotProduct.dotProductOld 6 thrpt 45 127.527 ± 0.437 ops/us
DotProduct.dotProductNew 6 thrpt 45 122.776 ± 3.605 ops/us
DotProduct.dotProductOld 8 thrpt 45 95.154 ± 0.209 ops/us
DotProduct.dotProductNew 8 thrpt 45 109.312 ± 0.282 ops/us
DotProduct.dotProductOld 13 thrpt 45 73.528 ± 0.179 ops/us
DotProduct.dotProductNew 13 thrpt 45 75.585 ± 0.149 ops/us
DotProduct.dotProductOld 16 thrpt 45 67.102 ± 0.165 ops/us
DotProduct.dotProductNew 16 thrpt 45 71.895 ± 0.207 ops/us
DotProduct.dotProductOld 25 thrpt 45 46.128 ± 0.068 ops/us
DotProduct.dotProductNew 25 thrpt 45 49.999 ± 0.106 ops/us
DotProduct.dotProductOld 32 thrpt 45 40.341 ± 0.136 ops/us
DotProduct.dotProductNew 32 thrpt 45 46.885 ± 0.101 ops/us (+16%)
DotProduct.dotProductOld 64 thrpt 45 23.086 ± 0.039 ops/us
DotProduct.dotProductNew 64 thrpt 45 27.729 ± 0.046 ops/us (+20%)
DotProduct.dotProductOld 100 thrpt 45 14.183 ± 0.041 ops/us
DotProduct.dotProductNew 100 thrpt 45 17.707 ± 0.095 ops/us (+25%)
DotProduct.dotProductOld 128 thrpt 45 12.307 ± 0.022 ops/us
DotProduct.dotProductNew 128 thrpt 45 14.998 ± 0.099 ops/us (+22%)
DotProduct.dotProductOld 207 thrpt 45 7.749 ± 0.047 ops/us
DotProduct.dotProductNew 207 thrpt 45 9.069 ± 0.082 ops/us (+17%)
DotProduct.dotProductOld 256 thrpt 45 6.365 ± 0.012 ops/us
DotProduct.dotProductNew 256 thrpt 45 7.992 ± 0.016 ops/us (+26%)
DotProduct.dotProductOld 300 thrpt 45 5.066 ± 0.009 ops/us
DotProduct.dotProductNew 300 thrpt 45 6.381 ± 0.016 ops/us (+26%)
DotProduct.dotProductOld 512 thrpt 45 3.216 ± 0.017 ops/us
DotProduct.dotProductNew 512 thrpt 45 4.089 ± 0.009 ops/us (+27%)
DotProduct.dotProductOld 702 thrpt 45 2.370 ± 0.004 ops/us
DotProduct.dotProductNew 702 thrpt 45 2.809 ± 0.007 ops/us (+19%)
DotProduct.dotProductOld 1024 thrpt 45 1.611 ± 0.003 ops/us
DotProduct.dotProductNew 1024 thrpt 45 2.049 ± 0.004 ops/us (+27%)
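The percentages in parentheses appear to be the relative throughput gain of `dotProductNew` over `dotProductOld`, rounded to a whole number. A small sketch of that arithmetic, with a hypothetical helper name (`speedupPercent`), using two score pairs taken from the rows above:

```java
// Hypothetical helper for checking the (+N%) annotations.
final class SpeedupSketch {
  private SpeedupSketch() {}

  /** Relative gain of newScore over oldScore, in percent, rounded to the nearest integer. */
  static long speedupPercent(double oldScore, double newScore) {
    return Math.round((newScore / oldScore - 1.0) * 100.0);
  }

  public static void main(String[] args) {
    // Size 32: 40.341 -> 46.885 ops/us
    System.out.println(speedupPercent(40.341, 46.885));
    // Size 1024: 1.611 -> 2.049 ops/us
    System.out.println(speedupPercent(1.611, 2.049));
  }
}
```

This ignores the reported error margins, so for the small sizes, where the difference is within or close to the error, the percentage is not very meaningful.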