Remove OffheapIncrementalIndex and clarify aggregator thread-safety needs. #11124

gianm · 2021-04-16T19:02:44Z

This patch does the following:

- Removes OffheapIncrementalIndex.
- Clarifies that Aggregators are required to be thread safe.
- Clarifies that BufferAggregators and VectorAggregators are not required to be thread safe.
- Removes thread safety code from some DataSketches aggregators that had it. (Not all of them did, and that's OK, because it wasn't necessary anyway.)
- Makes enabling "useOffheap" with groupBy v1 an error.
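
The thread-safety split in the list above can be illustrated with a minimal sketch. The interfaces below (`SimpleAggregator`, `SimpleBufferAggregator`) are hypothetical, simplified stand-ins and not Druid's actual `Aggregator` / `BufferAggregator` interfaces. The sketch assumes the on-heap aggregator may be called from several threads at once, so it guards its own state, while the buffer aggregator does no locking and relies on its caller to use it from one thread at a time.

```java
import java.nio.ByteBuffer;

// Hypothetical, simplified stand-ins; NOT Druid's real interfaces.
interface SimpleAggregator {
  void aggregate(long value);
  long get();
}

interface SimpleBufferAggregator {
  void init(ByteBuffer buf, int position);
  void aggregate(ByteBuffer buf, int position, long value);
  long get(ByteBuffer buf, int position);
}

// On-heap aggregator: synchronizes internally so concurrent calls are safe.
final class SynchronizedLongSum implements SimpleAggregator {
  private long sum;

  @Override
  public synchronized void aggregate(long value) {
    sum += value;
  }

  @Override
  public synchronized long get() {
    return sum;
  }
}

// Buffer aggregator: no internal locking. Absolute reads and writes leave the
// buffer's position, limit, and mark unchanged.
final class LongSumBufferAggregator implements SimpleBufferAggregator {
  @Override
  public void init(ByteBuffer buf, int position) {
    buf.putLong(position, 0L);
  }

  @Override
  public void aggregate(ByteBuffer buf, int position, long value) {
    buf.putLong(position, buf.getLong(position) + value);
  }

  @Override
  public long get(ByteBuffer buf, int position) {
    return buf.getLong(position);
  }
}

final class ThreadSafetyDemo {
  // Several threads add 1 to one shared, internally synchronized aggregator.
  static long concurrentSum(int threads, int perThread) {
    SimpleAggregator agg = new SynchronizedLongSum();
    Thread[] workers = new Thread[threads];
    for (int i = 0; i < threads; i++) {
      workers[i] = new Thread(() -> {
        for (int j = 0; j < perThread; j++) {
          agg.aggregate(1L);
        }
      });
      workers[i].start();
    }
    try {
      for (Thread t : workers) {
        t.join();
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while waiting for workers", e);
    }
    return agg.get();
  }

  // The buffer aggregator is driven by a single thread only.
  static long singleThreadedBufferSum(int count) {
    ByteBuffer buf = ByteBuffer.allocate(Long.BYTES);
    SimpleBufferAggregator agg = new LongSumBufferAggregator();
    agg.init(buf, 0);
    for (int i = 0; i < count; i++) {
      agg.aggregate(buf, 0, 1L);
    }
    return agg.get(buf, 0);
  }

  public static void main(String[] args) {
    System.out.println(concurrentSum(4, 10_000));
    System.out.println(singleThreadedBufferSum(10_000));
  }
}
```

If the `synchronized` keywords were dropped from `SynchronizedLongSum`, `concurrentSum` could lose updates; the buffer variant avoids the locking cost because it is never shared between threads in this sketch.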

Rationale for removing the offheap incremental index:

- It is only used in one rare scenario: groupBy v1 (which is non-default) in "useOffheap" mode (also non-default). So you have to go pretty deep into the wilderness to get this code to activate in production. It is never used during ingestion.
- Its existence complicates developer efforts to reason about how aggregators get used, because the way it uses buffer aggregators is so different from how every other query engine uses them.
- It doesn't have meaningful testing.

By the way, I do believe that, given the way the offheap incremental index works, it didn't actually require buffer aggregators to be thread-safe. It synchronizes on "aggregate" and doesn't call "get" until it has stopped calling "aggregate". Nevertheless, this is a bother to think about, and for the above reasons I think it makes sense to remove the code anyway.
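
The access pattern described above can be sketched as follows. This is a minimal illustration and not the removed OffheapIncrementalIndex code; `UnsafeLongSum` and `ExternallySynchronizedIndex` are hypothetical names. The aggregator has no locking of its own. The caller serializes every `aggregate` call on a lock and reads the result only after all writer threads have been joined, so the aggregator is never accessed concurrently.

```java
import java.nio.ByteBuffer;

// Hypothetical buffer-backed sum with no internal locking.
final class UnsafeLongSum {
  void aggregate(ByteBuffer buf, int position, long value) {
    buf.putLong(position, buf.getLong(position) + value);
  }

  long get(ByteBuffer buf, int position) {
    return buf.getLong(position);
  }
}

// Hypothetical caller that provides all of the synchronization itself.
final class ExternallySynchronizedIndex {
  private final ByteBuffer buf = ByteBuffer.allocate(Long.BYTES); // zero-filled
  private final UnsafeLongSum agg = new UnsafeLongSum();
  private final Object lock = new Object();

  // Every write goes through the same lock.
  void add(long value) {
    synchronized (lock) {
      agg.aggregate(buf, 0, value);
    }
  }

  // Intended to be called only after all writers have finished.
  long result() {
    return agg.get(buf, 0);
  }

  static long run(int threads, int perThread) {
    ExternallySynchronizedIndex index = new ExternallySynchronizedIndex();
    Thread[] writers = new Thread[threads];
    for (int i = 0; i < threads; i++) {
      writers[i] = new Thread(() -> {
        for (int j = 0; j < perThread; j++) {
          index.add(1L);
        }
      });
      writers[i].start();
    }
    try {
      // join() makes the writers' updates visible to this thread.
      for (Thread t : writers) {
        t.join();
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while waiting for writers", e);
    }
    return index.result();
  }

  public static void main(String[] args) {
    System.out.println(run(4, 1_000));
  }
}
```

The correctness of this arrangement depends entirely on the caller's discipline, which is the "bother to think about" mentioned above.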

clintropolis · 2021-04-16T23:21:38Z

overall +1 on removing this. This PR is failing some intellij inspections, but we might want to suppress them to accommodate #10001, which does appear to use some of these methods.

gianm · 2021-04-16T23:59:41Z

> overall +1 on removing this. This PR is failing some intellij inspections, but we might want to suppress them to accommodate #10001, which does appear to use some of these methods.

I took a look at #10001, and I don't see usage of getAggVal there, so I'll convert it to a private method and remove the unused params. It does use getAggs though, so I'll suppress the warning on that one.

gianm · 2021-04-17T00:03:35Z

> I took a look at #10001, and I don't see usage of getAggVal there, so I'll convert it to a private method and remove the unused params. It does use getAggs though, so I'll suppress the warning on that one.

Oh, never mind, it's used by the parent class. But the OakIncrementalIndex doesn't support that method anyway: https://github.com/apache/druid/pull/10001/files#diff-8a952057b2e59239cbd4cbff14c3202ea7d8cd85b4bb4c174d9375b00a0e7b40R244-R250

gianm · 2021-04-17T00:41:23Z

Wow, there was a lot of thread to pull on behind the intellij "unused item" inspections. The primitive get methods of BufferAggregator are no longer needed; nor is the BufferPool passed to the groupBy v1 merger. I removed them in the latest commit. I added an "unused" suppression to the stuff in IncrementalIndex that is unused now, but that #10001 may need.

liran-funaro · 2021-04-20T10:49:58Z

> I added an "unused" suppression to the stuff in IncrementalIndex that is unused now, but that #10001 may need.

You can avoid the "unused" annotation by making AggregatorType[] aggs protected.
This might be a more straightforward approach.

I can rebase #10001 after this PR is merged.
Alternatively, if you prefer to merge #10001 before this PR, I can change aggs to protected in #10001. Then you won't have to deal with this at all in this PR.

Please let me know what you think.

gianm · 2021-04-21T22:28:17Z

I'm ok either way @liran-funaro. It depends on which patch is merged first, I think.

liran-funaro

Please note that #10001 (OakIncrementalIndex) needs BufferAggregator to have the same API as Aggregator.
OakIncrementalIndex takes care of the concurrency, so it is OK to remove the concurrency requirements from BufferAggregator.

I would also suggest removing the template modifier from IncrementalIndex<AggregatorType>. It is no longer required:

- private final AggregatorType[] aggs; can be removed. It was only used by OffheapIncrementalIndex.
- iterableWithPostAggregations() should be defined as abstract and be implemented in OnheapIncrementalIndex. OakIncrementalIndex has its own implementation for this method.
- getAggsForRow() is also redundant.
- initAggs() can return void.
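
The shape of the suggested refactoring can be sketched roughly as below. This is a heavily simplified, hypothetical illustration: `SketchIndex` and `SketchOnheapIndex` are invented names, and the method signatures are assumptions that do not match the real IncrementalIndex classes. The point is only that the base class drops its type parameter and its aggregator array, `initAggs` returns nothing, and each subclass owns its aggregation state and implements `iterableWithPostAggregations` itself.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical non-generic base class; names and signatures are NOT Druid's.
abstract class SketchIndex {
  // Returns void: the base class no longer keeps an aggregator array
  // on behalf of its subclasses.
  protected abstract void initAggs(int metricCount);

  abstract void add(long[] metricValues);

  // Abstract here; each implementation decides how to expose its rows.
  abstract Iterable<long[]> iterableWithPostAggregations();
}

// Hypothetical on-heap implementation that owns its aggregation state.
final class SketchOnheapIndex extends SketchIndex {
  private long[] sums;

  SketchOnheapIndex(int metricCount) {
    initAggs(metricCount);
  }

  @Override
  protected void initAggs(int metricCount) {
    sums = new long[metricCount];
  }

  @Override
  void add(long[] metricValues) {
    for (int i = 0; i < sums.length; i++) {
      sums[i] += metricValues[i];
    }
  }

  @Override
  Iterable<long[]> iterableWithPostAggregations() {
    // A single aggregated row, copied so callers cannot mutate internal state.
    List<long[]> rows = new ArrayList<>();
    rows.add(sums.clone());
    return rows;
  }

  public static void main(String[] args) {
    SketchIndex index = new SketchOnheapIndex(2);
    index.add(new long[]{1L, 2L});
    index.add(new long[]{3L, 4L});
    for (long[] row : index.iterableWithPostAggregations()) {
      System.out.println(row[0] + ", " + row[1]);
    }
  }
}
```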

In fact, I already have this modification implemented: #11160.
Please consider merging it before this PR.

liran-funaro · 2021-04-26T10:18:54Z

processing/src/main/java/org/apache/druid/query/aggregation/BufferAggregator.java

-  /**
-   * Returns the float representation of the given aggregate byte array
-   *
-   * Converts the given byte buffer representation into the intermediate aggregate value.
-   *
-   * <b>Implementations must not change the position, limit or mark of the given buffer</b>
-   *
-   * Implementations are only required to support this method if they are aggregations which
-   * have an {@link AggregatorFactory#getType()} ()} of {@link org.apache.druid.segment.column.ValueType#FLOAT}.
-   * If unimplemented, throwing an {@link UnsupportedOperationException} is common and recommended.
-   *
-   * @param buf byte buffer storing the byte array representation of the aggregate
-   * @param position offset within the byte buffer at which the aggregate value is stored
-   * @return the float representation of the aggregate
-   */
-  float getFloat(ByteBuffer buf, int position);
-
-  /**
-   * Returns the long representation of the given aggregate byte array
-   *
-   * Converts the given byte buffer representation into the intermediate aggregate value.
-   *
-   * <b>Implementations must not change the position, limit or mark of the given buffer</b>
-   *
-   * Implementations are only required to support this method if they are aggregations which
-   * have an {@link AggregatorFactory#getType()} of  of {@link org.apache.druid.segment.column.ValueType#LONG}.
-   * If unimplemented, throwing an {@link UnsupportedOperationException} is common and recommended.
-   *
-   * @param buf byte buffer storing the byte array representation of the aggregate
-   * @param position offset within the byte buffer at which the aggregate value is stored
-   * @return the long representation of the aggregate
-   */
-  long getLong(ByteBuffer buf, int position);
-
-  /**
-   * Returns the double representation of the given aggregate byte array
-   *
-   * Converts the given byte buffer representation into the intermediate aggregate value.
-   *
-   * <b>Implementations must not change the position, limit or mark of the given buffer</b>
-   *
-   * Implementations are only required to support this method if they are aggregations which
-   * have an {@link AggregatorFactory#getType()} of  of {@link org.apache.druid.segment.column.ValueType#DOUBLE}.
-   * If unimplemented, throwing an {@link UnsupportedOperationException} is common and recommended.
-   *
-   * The default implementation casts {@link BufferAggregator#getFloat(ByteBuffer, int)} to double.
-   * This default method is added to enable smooth backward compatibility, please re-implement it if your aggregators
-   * work with numeric double columns.
-   *
-   * @param buf byte buffer storing the byte array representation of the aggregate
-   * @param position offset within the byte buffer at which the aggregate value is stored
-   * @return the double representation of the aggregate
-   */
-  default double getDouble(ByteBuffer buf, int position)
-  {
-    return (double) getFloat(buf, position);
-  }
-


It is important for #10001 (OakIncrementalIndex) that BufferAggregator has the same API as Aggregator.
So please keep these methods. Please also keep isNull().
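
For readers unfamiliar with the methods in the diff above, here is a minimal sketch of how such primitive getters behave. `PrimitiveBufferAggregator` is a hypothetical, trimmed-down interface modeled on the Javadoc in the diff, not the full Druid `BufferAggregator`; in particular, passing the value directly to `aggregate` is a simplification made for this example. It shows the default `getDouble` falling back to `getFloat`, an unsupported getter throwing `UnsupportedOperationException`, and absolute reads that leave the buffer's position untouched.

```java
import java.nio.ByteBuffer;

// Hypothetical trimmed-down interface modeled on the diff above.
interface PrimitiveBufferAggregator {
  void init(ByteBuffer buf, int position);

  // Simplification: the value is passed in directly for this example.
  void aggregate(ByteBuffer buf, int position, float value);

  float getFloat(ByteBuffer buf, int position);

  long getLong(ByteBuffer buf, int position);

  // Same fallback as in the diff: cast the float representation to double.
  default double getDouble(ByteBuffer buf, int position) {
    return (double) getFloat(buf, position);
  }
}

// Hypothetical float sum stored at a caller-chosen offset in the buffer.
final class FloatSumSketch implements PrimitiveBufferAggregator {
  @Override
  public void init(ByteBuffer buf, int position) {
    buf.putFloat(position, 0f);
  }

  @Override
  public void aggregate(ByteBuffer buf, int position, float value) {
    buf.putFloat(position, buf.getFloat(position) + value);
  }

  @Override
  public float getFloat(ByteBuffer buf, int position) {
    // Absolute read: does not change the buffer's position, limit, or mark.
    return buf.getFloat(position);
  }

  @Override
  public long getLong(ByteBuffer buf, int position) {
    // Illustrative choice for a getter this sketch does not support.
    throw new UnsupportedOperationException("FloatSumSketch does not expose a long value");
  }
}

final class PrimitiveGetterDemo {
  // Sums the values at offset 8 and reads the result through getDouble().
  static double sumAsDouble(float... values) {
    ByteBuffer buf = ByteBuffer.allocate(16);
    PrimitiveBufferAggregator agg = new FloatSumSketch();
    agg.init(buf, 8);
    for (float v : values) {
      agg.aggregate(buf, 8, v);
    }
    return agg.getDouble(buf, 8);
  }

  // Returns the buffer position after a full init/aggregate/get cycle.
  static int positionAfterUse() {
    ByteBuffer buf = ByteBuffer.allocate(16);
    PrimitiveBufferAggregator agg = new FloatSumSketch();
    agg.init(buf, 8);
    agg.aggregate(buf, 8, 1f);
    agg.getDouble(buf, 8);
    return buf.position();
  }

  public static void main(String[] args) {
    System.out.println(sumAsDouble(1.5f, 2.25f));
    System.out.println(positionAfterUse());
  }
}
```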

gianm · 2021-10-22T20:49:28Z

Per #11124 (comment) I reverted the BufferAggregator and IncrementalIndex API changes mentioned in #11124 (comment).

jihoonson

LGTM. +1 for removing OffheapIncrementalIndex!

jihoonson · 2021-10-25T23:29:58Z

@liran-funaro thank you for taking a look at this PR and sorry for taking a long time to review your PRs. I quickly reviewed #11160 and left one question on it. Since this PR now has enough approvals, would you mind if we merge this PR before #11160 gets merged?

liran-funaro · 2021-10-26T06:43:53Z

Sure. Go ahead.

gianm · 2021-10-26T15:05:52Z

Thanks!

gianm added the Area - Querying label Apr 16, 2021

gianm mentioned this pull request Apr 16, 2021

Vectorized versions of HllSketch aggregators. #11115

Merged

clintropolis added the Design Review label Apr 16, 2021

Remove things that are now unused.

82064d8

Merge branch 'master' into remove-offheap-incremental-index

55b475b

liran-funaro suggested changes Apr 26, 2021

liran-funaro mentioned this pull request Apr 26, 2021

Remove IncrementalIndex template modifier #11160

Merged

gianm added 3 commits October 22, 2021 13:29

Merge branch 'master' into remove-offheap-incremental-index

79985d0

Revert removal of getFloat, getLong, getDouble from BufferAggregator.

0e3c805

OAK-related warnings, suppressions.

687b0f7

Unused item suppressions.

12195a3

clintropolis approved these changes Oct 24, 2021

liran-funaro approved these changes Oct 24, 2021

jihoonson approved these changes Oct 25, 2021

gianm merged commit fc95c92 into apache:master Oct 26, 2021

gianm deleted the remove-offheap-incremental-index branch October 26, 2021 15:05

liran-funaro mentioned this pull request Jan 5, 2022

Remove "offheap" IncrementalIndex from Benchmarks #12121

Closed

abhishekagarwal87 added this to the 0.23.0 milestone May 11, 2022
