Remove OffheapIncrementalIndex and clarify aggregator thread-safety needs. #11124
This patch does the following:

- Removes OffheapIncrementalIndex.
- Clarifies that Aggregators are required to be thread safe.
- Clarifies that BufferAggregators and VectorAggregators are not required to be thread safe.
- Removes thread safety code from some DataSketches aggregators that had it. (Not all of them did, and that's OK, because it wasn't necessary anyway.)
- Makes enabling "useOffheap" with groupBy v1 an error.

Rationale for removing the offheap incremental index:

- It is only used in one rare scenario: groupBy v1 (which is non-default) in "useOffheap" mode (also non-default). So you have to go pretty deep into the wilderness to get this code to activate in production. It is never used during ingestion.
- Its existence complicates developer efforts to reason about how aggregators get used, because the way it uses buffer aggregators is so different from how every other query engine uses them.
- It doesn't have meaningful testing.

By the way, I do believe that, given the way the offheap incremental index works, it actually didn't require buffer aggregators to be thread-safe. It synchronizes on "aggregate" and doesn't call "get" until it has stopped calling "aggregate". Nevertheless, this is a bother to think about, and for the above reasons I think it makes sense to remove the code anyway.
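To make the thread-safety split described above concrete, here is a minimal sketch. It is not code from this patch: `SimpleAggregator`, `SimpleBufferAggregator`, and the classes implementing them are hypothetical, simplified stand-ins for Druid's `Aggregator` and `BufferAggregator` (the real interfaces have more methods and do not take the value as an argument). The sketch assumes the on-heap style aggregator may be updated from several threads, while the buffer-backed one is only ever driven by a single thread.

```java
import java.nio.ByteBuffer;

// Hypothetical, simplified stand-in for an on-heap Aggregator.
interface SimpleAggregator {
  void aggregate(long value);
  long get();
}

// Hypothetical, simplified stand-in for a BufferAggregator.
interface SimpleBufferAggregator {
  void init(ByteBuffer buf, int position);
  void aggregate(ByteBuffer buf, int position, long value);
  long get(ByteBuffer buf, int position);
}

// Aggregator side: required to be thread safe, so shared state is guarded.
final class SynchronizedLongSum implements SimpleAggregator {
  private long sum;

  @Override
  public synchronized void aggregate(long value) {
    sum += value;
  }

  @Override
  public synchronized long get() {
    return sum;
  }
}

// BufferAggregator side: no locking; the caller is assumed to use it from one thread.
final class LongSumBuffer implements SimpleBufferAggregator {
  @Override
  public void init(ByteBuffer buf, int position) {
    buf.putLong(position, 0L);
  }

  @Override
  public void aggregate(ByteBuffer buf, int position, long value) {
    // Absolute get/put: the buffer's own position, limit, and mark stay untouched.
    buf.putLong(position, buf.getLong(position) + value);
  }

  @Override
  public long get(ByteBuffer buf, int position) {
    return buf.getLong(position);
  }
}

final class ThreadSafetyDemo {
  // Adds 1 to the aggregator perThread times from each of several threads, then reads the total.
  static long sumFromThreads(SimpleAggregator agg, int threads, int perThread) {
    Thread[] workers = new Thread[threads];
    for (int i = 0; i < threads; i++) {
      workers[i] = new Thread(() -> {
        for (int j = 0; j < perThread; j++) {
          agg.aggregate(1L);
        }
      });
      workers[i].start();
    }
    for (Thread worker : workers) {
      try {
        worker.join();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IllegalStateException(e);
      }
    }
    return agg.get();
  }

  public static void main(String[] args) {
    System.out.println(sumFromThreads(new SynchronizedLongSum(), 4, 10_000));
  }
}
```

Dropping `synchronized` from `SynchronizedLongSum` would allow lost updates under concurrent use. `LongSumBuffer` can skip the lock only because its caller is assumed to guarantee single-threaded access.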
Overall +1 on removing this. This PR is failing some IntelliJ inspections, but we might want to suppress them to accommodate #10001, which does appear to use some of these methods.
I took a look at #10001, and I don't see usage of
Oh, never mind, it's used by the parent class. But the OakIncrementalIndex doesn't support that method anyway: https://github.com/apache/druid/pull/10001/files#diff-8a952057b2e59239cbd4cbff14c3202ea7d8cd85b4bb4c174d9375b00a0e7b40R244-R250
Wow, there was a lot of thread to pull on behind the IntelliJ "unused item" inspections. The primitive get methods of BufferAggregator are no longer needed; nor is the BufferPool passed to the groupBy v1 merger. I removed them in the latest commit. I added an "unused" suppression to the stuff in IncrementalIndex that is unused now, but that #10001 may need.
You can avoid using "unused" annotation by allowing I can rebase #10001 after this PR is merged. Please let me know what you think.
I'm ok either way @liran-funaro. It depends on which patch is merged first, I think.
Please note that #10001 (`OakIncrementalIndex`) needs `BufferAggregator` to have the same API as `Aggregator`.
`OakIncrementalIndex` takes care of the concurrency, so it is OK to remove the concurrency requirements from `BufferAggregator`.
I would also suggest removing the generic type parameter from `IncrementalIndex<AggregatorType>`. It is no longer required:
- `private final AggregatorType[] aggs;` can be removed. It was only used by `OffheapIncrementalIndex`.
- `iterableWithPostAggregations()` should be defined as `abstract` and be implemented in `OnheapIncrementalIndex`. `OakIncrementalIndex` has its own implementation for this method.
- `getAggsForRow()` is also redundant.
- `initAggs()` can return `void`.
In fact, I already have this modification implemented: #11160.
Please consider merging it before this PR.
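The shape of the suggested cleanup can be sketched as follows. This is only an illustration under simplifying assumptions, not the code from #11160: `SketchIncrementalIndex` and `SketchOnheapIndex` are hypothetical names, the method signatures are reduced to strings, and rows are rendered as plain text. The point is that the base class carries no `AggregatorType` parameter or array, `initAggs` returns nothing, and `iterableWithPostAggregations` is abstract so that each implementation supplies its own.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, heavily simplified base class: no <AggregatorType> parameter
// and no AggregatorType[] field.
abstract class SketchIncrementalIndex {
  // Each implementation sets up its own aggregator storage, so nothing is returned.
  protected abstract void initAggs(String[] metricNames);

  // Abstract in the base class; each implementation decides how rows are materialized.
  public abstract Iterable<String> iterableWithPostAggregations();
}

// Hypothetical on-heap implementation that owns its aggregator state privately.
final class SketchOnheapIndex extends SketchIncrementalIndex {
  private final List<String> metrics = new ArrayList<>();

  SketchOnheapIndex(String... metricNames) {
    initAggs(metricNames);
  }

  @Override
  protected void initAggs(String[] metricNames) {
    for (String name : metricNames) {
      metrics.add(name);
    }
  }

  @Override
  public Iterable<String> iterableWithPostAggregations() {
    // Placeholder rows: each metric is shown with an initial value of 0.
    List<String> rows = new ArrayList<>();
    for (String metric : metrics) {
      rows.add(metric + "=0");
    }
    return rows;
  }
}
```

With this shape, a second implementation (such as an Oak-based one) could keep aggregator state in whatever form it likes, without the base class needing to know its type.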
  /**
   * Returns the float representation of the given aggregate byte array
   *
   * Converts the given byte buffer representation into the intermediate aggregate value.
   *
   * <b>Implementations must not change the position, limit or mark of the given buffer</b>
   *
   * Implementations are only required to support this method if they are aggregations which
   * have an {@link AggregatorFactory#getType()} of {@link org.apache.druid.segment.column.ValueType#FLOAT}.
   * If unimplemented, throwing an {@link UnsupportedOperationException} is common and recommended.
   *
   * @param buf byte buffer storing the byte array representation of the aggregate
   * @param position offset within the byte buffer at which the aggregate value is stored
   * @return the float representation of the aggregate
   */
  float getFloat(ByteBuffer buf, int position);

  /**
   * Returns the long representation of the given aggregate byte array
   *
   * Converts the given byte buffer representation into the intermediate aggregate value.
   *
   * <b>Implementations must not change the position, limit or mark of the given buffer</b>
   *
   * Implementations are only required to support this method if they are aggregations which
   * have an {@link AggregatorFactory#getType()} of {@link org.apache.druid.segment.column.ValueType#LONG}.
   * If unimplemented, throwing an {@link UnsupportedOperationException} is common and recommended.
   *
   * @param buf byte buffer storing the byte array representation of the aggregate
   * @param position offset within the byte buffer at which the aggregate value is stored
   * @return the long representation of the aggregate
   */
  long getLong(ByteBuffer buf, int position);

  /**
   * Returns the double representation of the given aggregate byte array
   *
   * Converts the given byte buffer representation into the intermediate aggregate value.
   *
   * <b>Implementations must not change the position, limit or mark of the given buffer</b>
   *
   * Implementations are only required to support this method if they are aggregations which
   * have an {@link AggregatorFactory#getType()} of {@link org.apache.druid.segment.column.ValueType#DOUBLE}.
   * If unimplemented, throwing an {@link UnsupportedOperationException} is common and recommended.
   *
   * The default implementation casts {@link BufferAggregator#getFloat(ByteBuffer, int)} to double.
   * This default method is added to enable smooth backward compatibility, please re-implement it if your aggregators
   * work with numeric double columns.
   *
   * @param buf byte buffer storing the byte array representation of the aggregate
   * @param position offset within the byte buffer at which the aggregate value is stored
   * @return the double representation of the aggregate
   */
  default double getDouble(ByteBuffer buf, int position)
  {
    return (double) getFloat(buf, position);
  }

It is important for #10001 (`OakIncrementalIndex`) that `BufferAggregator` has the same API as `Aggregator`.
So please keep these methods. Also please keep `isNull()`.
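As an illustration of the getter contract in the quoted Javadoc, here is a minimal sketch. `PrimitiveGetters` and `FloatValueGetters` are hypothetical names that mirror only the three methods shown in the diff, not the full `BufferAggregator` interface. The sketch assumes a float-typed aggregate stored as four bytes at `position`: it reads with an absolute `ByteBuffer` get so the buffer's position, limit, and mark are left alone, throws `UnsupportedOperationException` from the getter it does not support, and inherits the default `getDouble` that widens `getFloat`.

```java
import java.nio.ByteBuffer;

// Hypothetical, trimmed-down mirror of the three getters quoted in the diff.
interface PrimitiveGetters {
  float getFloat(ByteBuffer buf, int position);

  long getLong(ByteBuffer buf, int position);

  // Same default as in the quoted code: widen the float value to double.
  default double getDouble(ByteBuffer buf, int position) {
    return (double) getFloat(buf, position);
  }
}

// A float-typed aggregate: supports getFloat, rejects getLong, inherits getDouble.
final class FloatValueGetters implements PrimitiveGetters {
  @Override
  public float getFloat(ByteBuffer buf, int position) {
    // Absolute read: does not move the buffer's position, limit, or mark.
    return buf.getFloat(position);
  }

  @Override
  public long getLong(ByteBuffer buf, int position) {
    throw new UnsupportedOperationException("float aggregate does not support getLong");
  }
}

final class PrimitiveGettersDemo {
  public static void main(String[] args) {
    ByteBuffer buf = ByteBuffer.allocate(16);
    buf.putFloat(4, 1.5f); // absolute write at offset 4
    PrimitiveGetters getters = new FloatValueGetters();
    System.out.println(getters.getFloat(buf, 4));
    System.out.println(getters.getDouble(buf, 4));
    System.out.println(buf.position());
  }
}
```

An aggregator over double columns would normally override `getDouble` rather than rely on the default, since going through `getFloat` loses precision.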
Per #11124 (comment) I reverted the BufferAggregator and IncrementalIndex API changes mentioned in #11124 (comment).
LGTM. +1 for removing OffheapIncrementalIndex!
@liran-funaro thank you for taking a look at this PR and sorry for taking a long time to review your PRs. I quickly reviewed #11160 and left one question on it. Since this PR now has enough approvals, would you mind if we merge this PR before #11160 gets merged?
Sure. Go ahead.
Thanks!
* fix type
* Revert "Fix Keyclock auth integration test based on upstream changes (apache#622)". This reverts commit c1083fb.
* Bump netty4 to 4.1.68; suppress CVE-2021-37136 and CVE-2021-37137 for netty3 (apache#11844)
* bump netty4 to 4.1.68
* suppress CVE-2021-37136 and CVE-2021-37137 for netty3
* license
* add `prometheus-emitter` to distribution (apache#11812)
* add `prometheus-emitter` to distribution. Signed-off-by: Đặng Minh Dũng <dungdm93@live.com>
* add `druid-momentsketch` to distribution. Signed-off-by: Đặng Minh Dũng <dungdm93@live.com>
* Web console: update typescript 4.4 for faster build speeds (apache#11725)
* update typescript
* do not show pagination when there is only one page
* update snapshots
* fix pagination
* Remove OffheapIncrementalIndex and clarify aggregator thread-safety needs. (apache#11124)
* Remove OffheapIncrementalIndex and clarify aggregator thread-safety needs.
* Remove things that are now unused.
* Revert removal of getFloat, getLong, getDouble from BufferAggregator.
* OAK-related warnings, suppressions.
* Unused item suppressions.

Co-authored-by: Đặng Minh Dũng <dungdm93@live.com>
Co-authored-by: Vadim Ogievetsky <vadim@ogievetsky.com>
Co-authored-by: Gian Merlino <gianmerlino@gmail.com>