
Copy collected acc(maxFreqs) into empty acc, rather than merge them. #12846

Merged
10 commits merged on Jan 11, 2024

Conversation

vsop-479
Contributor

We could copy the collected acc into the buffered acc rather than merge them in the bufferSkip phase, because the target CompetitiveImpactAccumulator is cleared in the last writeSkipData phase.
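The difference can be sketched with a toy accumulator. This `Acc` class is a hypothetical, simplified stand-in for Lucene's CompetitiveImpactAccumulator (only the maxFreqs array part, with a made-up size), not the real implementation:

```java
// Simplified stand-in for CompetitiveImpactAccumulator:
// maxFreqs[i] holds the max term frequency seen for norm value i.
class Acc {
    final int[] maxFreqs = new int[8];

    // Merge: keep the max of both accumulators per norm (the general case).
    void addAll(Acc other) {
        for (int i = 0; i < maxFreqs.length; i++) {
            maxFreqs[i] = Math.max(maxFreqs[i], other.maxFreqs[i]);
        }
    }

    // Copy: overwrite this accumulator's state. A valid shortcut when the
    // caller knows this accumulator is empty (just cleared by writeSkipData).
    void copy(Acc other) {
        System.arraycopy(other.maxFreqs, 0, maxFreqs, 0, maxFreqs.length);
    }
}

public class CopyVsMerge {
    public static void main(String[] args) {
        Acc collected = new Acc();
        collected.maxFreqs[3] = 7;
        Acc buffered = new Acc();  // empty: cleared by the last writeSkipData
        buffered.copy(collected);  // same result as addAll, without the max() work
        System.out.println(buffered.maxFreqs[3]); // prints 7
    }
}
```

On an empty target, copy and addAll produce the same state; copy just skips the per-slot comparisons.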

@vsop-479
Contributor Author

@jpountz Please take a look when you get a chance.

Contributor

@jpountz jpountz left a comment


Thanks for looking into this. Did you observe a speedup with this change? Also can you add a test?

@vsop-479
Contributor Author

vsop-479 commented Nov 28, 2023

@jpountz
I added a test case for the accumulator copy, and replaced the iterative add with addAll for otherFreqNormPairs. Please take a look when you get a chance.

> And then maybe we can drop the requirement that the current accumulator is empty?

Do you mean copying all of the treeset's entries into the current accumulator, regardless of whether they are competitive and whether the accumulator is empty or not, without removing the less competitive entries the way the add phase does?

@jpountz
Contributor

jpountz commented Nov 28, 2023

Sorry I wasn't clear, I meant to replace entries of the treeset with entries of the other treeset by clearing it first, and then doing an addAll.
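The clear-then-addAll replacement can be illustrated with plain `java.util.TreeSet`. The `Integer` elements below are hypothetical stand-ins for the accumulator's freq/norm pairs:

```java
import java.util.TreeSet;

public class ClearThenAddAll {
    public static void main(String[] args) {
        // Stand-ins for the accumulator's TreeSet of freq/norm pairs.
        TreeSet<Integer> current = new TreeSet<>();
        current.add(1);
        current.add(9); // stale entries from a previous block
        TreeSet<Integer> other = new TreeSet<>();
        other.add(4);
        other.add(6);

        // "Replace" semantics: discard the current state, then bulk-add.
        current.clear();
        current.addAll(other);
        System.out.println(current); // prints [4, 6]
    }
}
```

After this sequence, `current` holds exactly the other set's entries; its previous competitive entries are intentionally discarded.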

@vsop-479
Contributor Author

vsop-479 commented Nov 29, 2023

> replace entries of the treeset with entries of the other treeset by clearing it first, and then doing an addAll.

Sorry, I am still confused about this. If we clear a non-empty treeset, how do we deal with its competitive entries?

@jpountz
Contributor

jpountz commented Nov 29, 2023

> Sorry, I am still confused about this. If we clear a non-empty treeset, how do we deal with its competitive entries?

This looks similar to the arraycopy that you perform on the maxFreqs array, ignoring existing values. We would just need to update the contract of this method to say something like: Copy the provided accumulator into this accumulator, ignoring its existing state. This is effectively the same as calling clear() then addAll().
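A hypothetical sketch of that contract, using a simplified stand-in class (`FreqNormAcc`, with `Integer` elements in place of real freq/norm pairs; names and field sizes are made up for illustration):

```java
import java.util.TreeSet;

// Hypothetical stand-in for the real accumulator, which pairs a maxFreqs
// array with a TreeSet of other freq/norm pairs.
class FreqNormAcc {
    final int[] maxFreqs = new int[8];
    final TreeSet<Integer> otherFreqNormPairs = new TreeSet<>();

    /**
     * Copy the provided accumulator into this accumulator, ignoring its
     * existing state. Effectively the same as calling clear() then addAll().
     */
    void copy(FreqNormAcc other) {
        System.arraycopy(other.maxFreqs, 0, maxFreqs, 0, maxFreqs.length);
        otherFreqNormPairs.clear();
        otherFreqNormPairs.addAll(other.otherFreqNormPairs);
    }
}

public class CopyContract {
    public static void main(String[] args) {
        FreqNormAcc stale = new FreqNormAcc();
        stale.maxFreqs[0] = 9;           // leftover state from a previous block
        stale.otherFreqNormPairs.add(1);
        FreqNormAcc src = new FreqNormAcc();
        src.maxFreqs[0] = 2;
        src.otherFreqNormPairs.add(5);
        stale.copy(src);                 // existing state is ignored, not merged
        System.out.println(stale.maxFreqs[0] + " " + stale.otherFreqNormPairs); // prints 2 [5]
    }
}
```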

@vsop-479
Contributor Author

vsop-479 commented Nov 30, 2023

@jpountz I applied copy to every level's empty acc. Please take a look when you get a chance.

So far I have only applied copy to an empty acc that was cleared by clear. Do you mean we could skip the clear (mainly Arrays.fill(maxFreqs, 0)) after writeSkipData, and directly overwrite the stale acc in the next write?

Contributor

@jpountz jpountz left a comment


I think I liked your previous patch better, where knowing whether the accumulator is empty was the responsibility of the caller rather than of CompetitiveFreqNormAccumulator. Are there reasons why that did not work? Or is this new approach perhaps helping cover more cases?

@vsop-479
Contributor Author

vsop-479 commented Dec 6, 2023

> is this new approach helping cover more cases maybe?

Yes. The previous patch only applied copy to level 0's acc in Lucene99SkipWriter.bufferSkip; the current patch applies copy to every level's acc in Lucene99SkipWriter.writeSkipData, by adding a flag that indicates whether the acc is empty.

> when knowing when the accumulator is empty is the responsibility of the caller rather than CompetitiveFreqNormAccumulator.

I will move the copy/addAll selection to the caller.

Before that, here is some performance data I want to share.

First, I measured method execution time (without assertions):

Method   ns     diff
addAll   48291  -
copy     17000  +64.7%

Then I added the implementations to JMH to enable the JIT's optimizations (auto-unrolling, auto-vectorization):

Benchmark                            (size)   Mode  Cnt   Score   Error   Units
VectorUtilBenchmark.accAddAll           256  thrpt   15  17.311 ± 4.468  ops/us
VectorUtilBenchmark.accCopy             256  thrpt   15  21.417 ± 7.009  ops/us

Benchmark                            (size)   Mode  Cnt   Score   Error   Units
VectorUtilBenchmark.accAddAll           256  thrpt   15  19.943 ± 4.831  ops/us
VectorUtilBenchmark.accCopy             256  thrpt   15  16.516 ± 1.904  ops/us

The speedup is not stable, so I split the implementation into an array part and a treeSet part.

  1. Array part:
Benchmark                              (size)   Mode  Cnt    Score    Error   Units
VectorUtilBenchmark.arrayCopy             256  thrpt   15  141.356 ±  3.640  ops/us 
VectorUtilBenchmark.arrayMax              256  thrpt   15   77.875 ±  1.310  ops/us 

Benchmark                              (size)   Mode  Cnt    Score    Error   Units
VectorUtilBenchmark.arrayCopy             256  thrpt   15  138.484 ±  5.038  ops/us
VectorUtilBenchmark.arrayMax              256  thrpt   15   78.250 ±  0.992  ops/us

ArrayMax is the baseline that iterates over the array and takes the max value.
ArrayCopy uses System.arraycopy, which gets a clear speedup.
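The two array-part variants being compared can be sketched as follows (a minimal illustration of the benchmarked operations, not the actual benchmark code; method names are made up):

```java
public class ArrayOps {
    // "arrayMax"-style baseline: element-wise max merge, one compare per slot.
    static void maxInto(int[] dst, int[] src) {
        for (int i = 0; i < dst.length; i++) {
            dst[i] = Math.max(dst[i], src[i]);
        }
    }

    // "arrayCopy"-style: bulk overwrite; System.arraycopy is a JVM intrinsic.
    static void copyInto(int[] dst, int[] src) {
        System.arraycopy(src, 0, dst, 0, dst.length);
    }

    public static void main(String[] args) {
        int[] dst = {1, 5, 0};
        int[] src = {4, 2, 3};
        maxInto(dst, src);  // dst is now {4, 5, 3}
        copyInto(dst, src); // dst is now {4, 2, 3}
        System.out.println(java.util.Arrays.toString(dst)); // prints [4, 2, 3]
    }
}
```

The copy variant does no comparisons and can be vectorized or turned into a memory move, which is consistent with the roughly 2x throughput above.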

  2. TreeSet part:
Benchmark                          (size)   Mode  Cnt   Score    Error   Units
VectorUtilBenchmark.treeAddAll        256  thrpt   15  27.673 ± 19.096  ops/us
VectorUtilBenchmark.treeScalarAdd     256  thrpt   15  34.888 ± 14.321  ops/us

Benchmark                          (size)   Mode  Cnt   Score    Error   Units
VectorUtilBenchmark.treeAddAll        256  thrpt   15  25.342 ±  4.151  ops/us
VectorUtilBenchmark.treeScalarAdd     256  thrpt   15  42.673 ± 19.396  ops/us

I also measured the method performance without JMH:

Method         ns     diff
treeScalarAdd  36000  -
treeAddAll     26875  +25.3%

The direct method timing indicates that TreeSet.addAll performs better, but the JMH benchmark indicates that scalarAdd (the baseline) performs better than TreeSet.addAll, which is unexpected to me. I am still working on finding out the reason.
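For reference, the two treeSet-part variants can be sketched like this (`Integer` elements are hypothetical stand-ins for freq/norm pairs, and the method names are made up to mirror the benchmark names):

```java
import java.util.TreeSet;

public class TreeMerge {
    // "treeScalarAdd"-style: insert entries one by one. In the real
    // accumulator, a per-entry add() is also where non-competitive entries
    // can be rejected, which a bulk addAll cannot do.
    static void scalarAdd(TreeSet<Integer> dst, TreeSet<Integer> src) {
        for (Integer e : src) {
            dst.add(e);
        }
    }

    // "treeAddAll"-style: bulk insert via TreeSet.addAll.
    static void bulkAdd(TreeSet<Integer> dst, TreeSet<Integer> src) {
        dst.addAll(src);
    }

    public static void main(String[] args) {
        TreeSet<Integer> src = new TreeSet<>();
        src.add(3);
        src.add(1);
        TreeSet<Integer> a = new TreeSet<>();
        TreeSet<Integer> b = new TreeSet<>();
        scalarAdd(a, src);
        bulkAdd(b, src);
        System.out.println(a.equals(b)); // prints true
    }
}
```

Both variants produce the same set; only the insertion path differs.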

Finally, I modified acc.copy to only copy the array and keep the original iterative add for the treeSet, and measured the performance:

Benchmark                            (size)   Mode  Cnt   Score    Error   Units
VectorUtilBenchmark.accAddAll           256  thrpt   15  16.180 ±  3.014  ops/us
VectorUtilBenchmark.accCopy             256  thrpt   15  31.807 ± 12.811  ops/us

Benchmark                            (size)   Mode  Cnt   Score   Error   Units
VectorUtilBenchmark.accAddAll           256  thrpt   15  20.414 ± 3.693  ops/us
VectorUtilBenchmark.accCopy             256  thrpt   15  24.432 ± 6.862  ops/us

Now acc.copy gets a speedup.
Any idea about that? @jpountz
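The final shape of copy described above (arraycopy for the array part, per-entry add for the treeSet part) can be sketched as follows; `HybridAcc` and its fields are hypothetical simplifications, not the actual Lucene class:

```java
import java.util.TreeSet;

// Hypothetical sketch: copy() uses System.arraycopy for the maxFreqs array,
// but keeps the original per-entry add loop for the treeSet part.
class HybridAcc {
    final int[] maxFreqs = new int[8];
    final TreeSet<Integer> otherFreqNormPairs = new TreeSet<>();

    void copy(HybridAcc other) {
        // Array part: bulk overwrite (where the measured speedup comes from).
        System.arraycopy(other.maxFreqs, 0, maxFreqs, 0, maxFreqs.length);
        // TreeSet part: original iterative add, which benchmarked no worse
        // than TreeSet.addAll here.
        for (Integer pair : other.otherFreqNormPairs) {
            otherFreqNormPairs.add(pair);
        }
    }
}

public class HybridCopy {
    public static void main(String[] args) {
        HybridAcc src = new HybridAcc();
        src.maxFreqs[1] = 4;
        src.otherFreqNormPairs.add(42);
        HybridAcc dst = new HybridAcc();
        dst.copy(src);
        System.out.println(dst.maxFreqs[1] + " " + dst.otherFreqNormPairs); // prints 4 [42]
    }
}
```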

@vsop-479
Contributor Author

@jpountz
I moved the copy/addAll selection to the caller, and reverted the treeSet's addAll to the iterative add, since there is no speedup with TreeSet.addAll in JMH.

Please take a look when you get a chance!


github-actions bot commented Jan 8, 2024

This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the dev@lucene.apache.org list. Thank you for your contribution!

@github-actions github-actions bot added the Stale label Jan 8, 2024
@github-actions github-actions bot removed the Stale label Jan 9, 2024
@vsop-479
Contributor Author

@jpountz
I removed the optimization for the non-zero-level accs, and the empty flag.
Please take a look when you get a chance!

@vsop-479
Contributor Author

@jpountz
I removed the empty-acc contract. Please take a look when you get a chance!

@jpountz jpountz merged commit 4b11803 into apache:main Jan 11, 2024
4 checks passed
@vsop-479 vsop-479 deleted the copy_empty_acc branch March 15, 2024 02:56