
HLL: Avoid some allocations when possible. #3314

Merged: merged 1 commit into apache:master from hll-chill-out on Aug 4, 2016

Conversation

@gianm (Contributor) commented Aug 3, 2016

  • HLLC.fold avoids duplicating the other buffer by saving and restoring its position.
  • HLLC.makeCollector(buffer) no longer duplicates incoming BBs.
  • Updated call sites where appropriate to duplicate BBs passed to HLLC.

Inspired by @navis's comment in #3111 (comment) (although I didn't go as far 😄 ).
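
To make the first bullet concrete, here is a minimal, self-contained sketch of the buffer-handling pattern. It is not the actual HyperLogLogCollector.fold code (which merges HLL registers); the class and method names below are illustrative only:

```java
import java.nio.ByteBuffer;

public class FoldBufferSketch
{
  // Old approach: duplicate() allocates a fresh ByteBuffer view object on every call.
  static void consumeWithDuplicate(ByteBuffer dest, ByteBuffer other)
  {
    final ByteBuffer view = other.duplicate(); // new object, shared bytes
    while (view.hasRemaining()) {
      dest.put(view.get());
    }
  }

  // New approach: read "other" directly, saving and restoring its position/limit so the
  // caller observes no change and no extra ByteBuffer object is created.
  static void consumeWithoutDuplicate(ByteBuffer dest, ByteBuffer other)
  {
    final int oldPosition = other.position();
    final int oldLimit = other.limit();
    try {
      while (other.hasRemaining()) {
        dest.put(other.get());
      }
    }
    finally {
      other.limit(oldLimit);
      other.position(oldPosition);
    }
  }

  public static void main(String[] args)
  {
    ByteBuffer dest = ByteBuffer.allocate(4);
    ByteBuffer other = ByteBuffer.wrap(new byte[]{1, 2, 3, 4});
    consumeWithoutDuplicate(dest, other);
    System.out.println(other.position()); // 0: position restored for the caller
  }
}
```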

Benchmarks, jdk1.8.0_60 (19–30% speedup depending on the test):

topN - master + sorting on hyperUniquesMet

Benchmark                                  (numSegments)  (rowsPerSegment)        (schemaAndQuery)  (threshold)  Mode  Cnt        Score       Error  Units
TopNBenchmark.queryMultiQueryableIndex                 1            750000                 basic.A           10  avgt   25   193478.532 ±  5680.045  us/op
TopNBenchmark.querySingleIncrementalIndex              1            750000                 basic.A           10  avgt   25  1571360.141 ± 37579.656  us/op
TopNBenchmark.querySingleQueryableIndex                1            750000                 basic.A           10  avgt   25   192961.326 ±  5667.723  us/op

topN - patch + sorting on hyperUniquesMet 

Benchmark                                  (numSegments)  (rowsPerSegment)  (schemaAndQuery)  (threshold)  Mode  Cnt        Score       Error  Units
TopNBenchmark.queryMultiQueryableIndex                 1            750000           basic.A           10  avgt   25   158187.083 ±  4630.679  us/op
TopNBenchmark.querySingleIncrementalIndex              1            750000           basic.A           10  avgt   25  1082888.189 ± 33717.391  us/op
TopNBenchmark.querySingleQueryableIndex                1            750000           basic.A           10  avgt   25   155275.814 ±  4052.757  us/op

@gianm added this to the 0.9.2 milestone on Aug 3, 2016
@fjy (Contributor) commented Aug 3, 2016

👍

@navis do you mind reviewing this one as well?

CardinalityAggregator.hashRow(selectorList, collector);
} else {
CardinalityAggregator.hashValues(selectorList, collector);
// Save position, limit and restore later instead of allocating a new ByteBuffer
Contributor:

This shouldn't be allocating a new byte buffer. duplicate() simply makes a new buffer view into the underlying data. Are you sure this one affects performance?

Contributor:

Actually, I believe it. Even for "small" allocations, since this is in aggregate it is going to be called again and again and again.

Contributor Author:

I didn't test Cardinality, but I did test HyperUnique, and the performance boost from avoiding the duplicate() of the left-hand side is substantial (a significant chunk of the overall ~19% boost).

@gianm (Contributor Author), Aug 3, 2016:

By the way, by "allocating a new ByteBuffer" I really meant "allocating a new ByteBuffer object pointing to the existing data". The ByteBuffer object has substantial overhead (8 fields!) and apparently the JVM has trouble optimizing that away, even for short-lived allocations.

FWIW, I also tried adjusting things so that the HyperLogLogCollector was reused too, but saw zero performance boost from that. I thought that was cool, and figured it was because the collector only has 3 fields, which makes it easier for the JVM to optimize.

Contributor:

Can you modify the comment to make a clearer distinction between the ByteBuffer object allocation and the buffer allocation itself?

Contributor Author:

yes, definitely.

Contributor Author:

Updated the comment to say "ByteBuffer object" instead of "ByteBuffer"

@gianm force-pushed the hll-chill-out branch 2 times, most recently from 066ae07 to 75bbc89 on August 3, 2016 at 18:29
@gianm (Contributor Author) commented Aug 3, 2016

@drcrallen I updated the makeCollector contract to say it's allowed to modify the position/limit of the incoming buffer, and updated call sites accordingly
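
As a rough sketch of what the updated contract implies for callers: since makeCollector(buffer) is now allowed to move the incoming buffer's position and limit, a call site that needs its own view left untouched duplicates the ByteBuffer before passing it in. In the code below, makeCollectorStub is only a stand-in for HyperLogLogCollector.makeCollector, not the real Druid code:

```java
import java.nio.ByteBuffer;

public class MakeCollectorContractSketch
{
  // Stand-in for HyperLogLogCollector.makeCollector: consumes the buffer's remaining bytes.
  static byte[] makeCollectorStub(ByteBuffer buffer)
  {
    final byte[] registers = new byte[buffer.remaining()];
    buffer.get(registers); // moves the buffer's position, as the new contract allows
    return registers;
  }

  public static void main(String[] args)
  {
    ByteBuffer shared = ByteBuffer.wrap(new byte[]{1, 2, 3, 4});

    // Caller-side duplicate(): the new view object is repositioned, "shared" is not.
    byte[] registers = makeCollectorStub(shared.duplicate());

    System.out.println(registers.length);   // 4
    System.out.println(shared.position());  // 0: untouched, because we duplicated
  }
}
```

Note that duplicate() still shares the underlying bytes, so the only cost is the view object itself, and it is now paid only at call sites that actually need an independent view.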


byte myOffset = getRegisterOffset();
short numNonZero = getNumNonZeroRegisters();
// Save position and restore later to avoid allocations due to duplicating the buffer
Contributor:

object

Contributor Author:

updated

@drcrallen (Contributor):

Minor doc comment at https://github.com/druid-io/druid/pull/3314/files#r73394157, but 👍 after the doc change and Travis.

- HLLC.fold avoids duplicating the other buffer by saving and restoring its position.
- HLLC.makeCollector(buffer) no longer duplicates incoming BBs.
- Updated call sites where appropriate to duplicate BBs passed to HLLC.
@navis (Contributor) commented Aug 4, 2016

@gianm Looks good. I think I should have dug into this more.
Just one thing: isn't it possible to lessen the overhead of the "swap" part of the fold() operation in HyperLogLogCollector?

@gianm (Contributor Author) commented Aug 4, 2016

@navis Possibly; to be honest, I did not look that closely for more opportunities. I just made the most obvious modifications.

@drcrallen (Contributor):

I support small, incremental improvements/changes.

@navis (Contributor) commented Aug 4, 2016

ok, I think I can do that after this. 👍 from me.

@drcrallen merged commit 9437a7a into apache:master on Aug 4, 2016
@gianm deleted the hll-chill-out branch on August 4, 2016 at 17:15
gianm added a commit to gianm/druid that referenced this pull request Oct 15, 2016
Despite the non-thread-safety of HyperLogLogCollector, it is actually currently used
by multiple threads during realtime indexing. HyperUniquesAggregator's "aggregate" and
"get" methods can be called simultaneously by OnheapIncrementalIndex, since its
"doAggregate" and "getMetricObjectValue" methods are not synchronized.

This means that the optimization of HyperLogLogCollector.fold in apache#3314 (saving and
restoring position rather than duplicating the storage buffer of the right-hand side)
could cause corruption in the face of concurrent writes.

This patch works around the issue by duplicating the storage buffer in "get" before
returning a collector. The returned collector still shares data with the original one,
but the situation is no worse than before apache#3314. In the future we may want to consider
making a thread safe version of HLLC that avoids these kinds of problems in realtime
indexing. But for now I thought it was best to do a small change that restored the old
behavior.
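
A minimal sketch of the workaround this commit message describes (the class and method names are illustrative, not Druid's actual HyperUniquesAggregator API): the getter hands out a view built on storageBuffer.duplicate(), which shares bytes with the aggregator's buffer but has its own position/limit, so the position save/restore performed by a concurrent fold on the aggregator's own view cannot disturb it.

```java
import java.nio.ByteBuffer;

public class GetWorkaroundSketch
{
  private final ByteBuffer storageBuffer = ByteBuffer.allocate(16);

  // Returns a read view with independent position/limit but shared underlying bytes.
  public ByteBuffer snapshot()
  {
    return storageBuffer.duplicate();
  }

  public static void main(String[] args)
  {
    GetWorkaroundSketch agg = new GetWorkaroundSketch();
    ByteBuffer reader = agg.snapshot();
    // Moving the reader's position does not affect the aggregator's own view, and vice versa.
    reader.position(8);
    System.out.println(agg.storageBuffer.position()); // 0
  }
}
```
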
fjy pushed a commit that referenced this pull request Oct 17, 2016
gianm added a commit to gianm/druid that referenced this pull request Oct 17, 2016
fjy pushed a commit that referenced this pull request Oct 17, 2016
@gianm mentioned this pull request on Dec 4, 2016
fundead pushed a commit to fundead/druid that referenced this pull request Dec 7, 2016
seoeun25 pushed a commit to seoeun25/incubator-druid that referenced this pull request Jan 10, 2020