Navigation Menu

Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LUCENE-9837: try to improve performance of VectorUtil.dotProduct #17

Merged
merged 1 commit into from Mar 15, 2021

Conversation

rmuir
Copy link
Member

@rmuir rmuir commented Mar 13, 2021

Here's my stab at it. It gives ~ 25% improvement for big vectors and doesn't cause regressions. I tested sizes of 1,4,6,8,13,16,25,32,64,100,128,207,256,300,512,702,1024. It starts to give good speedups for sizes bigger than 32.

Benchmarks

DotProduct.dotProductOld 1 thrpt 45 184.487 ± 0.364 ops/us
DotProduct.dotProductNew 1 thrpt 45 184.675 ± 0.308 ops/us

DotProduct.dotProductOld 4 thrpt 45 132.525 ± 1.124 ops/us
DotProduct.dotProductNew 4 thrpt 45 133.688 ± 0.383 ops/us

DotProduct.dotProductOld 6 thrpt 45 127.527 ± 0.437 ops/us
DotProduct.dotProductNew 6 thrpt 45 122.776 ± 3.605 ops/us

DotProduct.dotProductOld 8 thrpt 45 95.154 ± 0.209 ops/us
DotProduct.dotProductNew 8 thrpt 45 109.312 ± 0.282 ops/us

DotProduct.dotProductOld 13 thrpt 45 73.528 ± 0.179 ops/us
DotProduct.dotProductNew 13 thrpt 45 75.585 ± 0.149 ops/us

DotProduct.dotProductOld 16 thrpt 45 67.102 ± 0.165 ops/us
DotProduct.dotProductNew 16 thrpt 45 71.895 ± 0.207 ops/us

DotProduct.dotProductOld 25 thrpt 45 46.128 ± 0.068 ops/us
DotProduct.dotProductNew 25 thrpt 45 49.999 ± 0.106 ops/us

DotProduct.dotProductOld 32 thrpt 45 40.341 ± 0.136 ops/us
DotProduct.dotProductNew 32 thrpt 45 46.885 ± 0.101 ops/us (+16%)

DotProduct.dotProductOld 64 thrpt 45 23.086 ± 0.039 ops/us
DotProduct.dotProductNew 64 thrpt 45 27.729 ± 0.046 ops/us (+20%)

DotProduct.dotProductOld 100 thrpt 45 14.183 ± 0.041 ops/us
DotProduct.dotProductNew 100 thrpt 45 17.707 ± 0.095 ops/us (+25%)

DotProduct.dotProductOld 128 thrpt 45 12.307 ± 0.022 ops/us
DotProduct.dotProductNew 128 thrpt 45 14.998 ± 0.099 ops/us (+21%)

DotProduct.dotProductOld 207 thrpt 45 7.749 ± 0.047 ops/us
DotProduct.dotProductNew 207 thrpt 45 9.069 ± 0.082 ops/us (+17%)

DotProduct.dotProductOld 256 thrpt 45 6.365 ± 0.012 ops/us
DotProduct.dotProductNew 256 thrpt 45 7.992 ± 0.016 ops/us (+26%)

DotProduct.dotProductOld 300 thrpt 45 5.066 ± 0.009 ops/us
DotProduct.dotProductNew 300 thrpt 45 6.381 ± 0.016 ops/us (+26%)

DotProduct.dotProductOld 512 thrpt 45 3.216 ± 0.017 ops/us
DotProduct.dotProductNew 512 thrpt 45 4.089 ± 0.009 ops/us (+27%)

DotProduct.dotProductOld 702 thrpt 45 2.370 ± 0.004 ops/us
DotProduct.dotProductNew 702 thrpt 45 2.809 ± 0.007 ops/us (+19%)

DotProduct.dotProductOld 1024 thrpt 45 1.611 ± 0.003 ops/us
DotProduct.dotProductNew 1024 thrpt 45 2.049 ± 0.004 ops/us (+27%)

@rmuir rmuir requested a review from msokolov March 13, 2021 09:45
@madrob
Copy link
Contributor

madrob commented Mar 13, 2021

Can you share the JMH code? Always interested to see how we set these up.

@rmuir
Copy link
Member Author

rmuir commented Mar 13, 2021

@madrob it is GPL i guess due to licensing of libraries used.

But I simply adapted an existing one from a hotspot bug report: https://bugs.openjdk.java.net/browse/JDK-8200477 I changed it to floats and instead of only testing 1024 i tested the series of sizes listed above

@rmuir
Copy link
Member Author

rmuir commented Mar 13, 2021

Here is attached benchmark and asm output (old and new).

DotProduct.java.txt
asm_old.txt
asm_new.txt

@msokolov
Copy link
Contributor

msokolov commented Mar 13, 2021 via email

Copy link
Contributor

@msokolov msokolov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

crazy all this loop unrolling, and awesome that you were able to achieve these nice gains by tickling the compiler just so!

@rmuir
Copy link
Member Author

rmuir commented Mar 15, 2021

We have https://issues.apache.org/jira/browse/LUCENE-9838 as a followup, but it requires the JDK16 incubating vector API. But in the meantime, this may eek out a tiny little performance gain for the scalar version.

@rmuir rmuir merged commit d48193e into apache:main Mar 15, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants