Remove OffheapIncrementalIndex and clarify aggregator thread-safety needs. #11124

Merged: gianm merged 7 commits into apache:master from remove-offheap-incremental-index on Oct 26, 2021

gianm
Contributor

@gianm gianm commented Apr 16, 2021

This patch does the following:

  • Removes OffheapIncrementalIndex.
  • Clarifies that Aggregators are required to be thread safe.
  • Clarifies that BufferAggregators and VectorAggregators are not
    required to be thread safe.
  • Removes thread safety code from some DataSketches aggregators that
    had it. (Not all of them did, and that's OK, because it wasn't necessary
    anyway.)
  • Makes enabling "useOffheap" with groupBy v1 an error.
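
To make the contract in the list above concrete, here is a minimal sketch, not Druid's actual code. The class names and the simplified method signatures (values passed directly instead of read from a selector) are assumptions made for illustration. The on-heap aggregator tolerates concurrent callers, while the buffer aggregator relies on its caller to use it from one thread at a time.

```java
import java.nio.ByteBuffer;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical, simplified stand-ins for illustration only; not Druid's real interfaces.
class ThreadSafetyContractDemo
{
  public static void main(String[] args) throws InterruptedException
  {
    // The thread-safe aggregator may be shared across threads.
    SketchLongSumAggregator agg = new SketchLongSumAggregator();
    Thread[] writers = new Thread[4];
    for (int i = 0; i < writers.length; i++) {
      writers[i] = new Thread(() -> {
        for (int j = 0; j < 1000; j++) {
          agg.aggregate(1L);
        }
      });
      writers[i].start();
    }
    for (Thread t : writers) {
      t.join();
    }
    System.out.println(agg.get()); // prints 4000

    // The buffer aggregator is used from a single thread only.
    ByteBuffer buf = ByteBuffer.allocate(Long.BYTES);
    SketchLongSumBufferAggregator bufAgg = new SketchLongSumBufferAggregator();
    bufAgg.init(buf, 0);
    bufAgg.aggregate(buf, 0, 5L);
    bufAgg.aggregate(buf, 0, 7L);
    System.out.println(bufAgg.get(buf, 0)); // prints 12
  }
}

// Aggregator-style class: required to be thread safe, so it uses an atomic counter.
class SketchLongSumAggregator
{
  private final AtomicLong sum = new AtomicLong();

  void aggregate(long value)
  {
    sum.addAndGet(value);
  }

  long get()
  {
    return sum.get();
  }
}

// BufferAggregator-style class: no internal synchronization. The read-modify-write
// below is only correct when the caller guarantees single-threaded access.
class SketchLongSumBufferAggregator
{
  void init(ByteBuffer buf, int position)
  {
    buf.putLong(position, 0L);
  }

  void aggregate(ByteBuffer buf, int position, long value)
  {
    // Absolute reads and writes leave the buffer's position, limit and mark untouched.
    buf.putLong(position, buf.getLong(position) + value);
  }

  long get(ByteBuffer buf, int position)
  {
    return buf.getLong(position);
  }
}
```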

Rationale for removing the offheap incremental index:

  • It is only used in one rare scenario: groupBy v1 (which is non-default)
    in "useOffheap" mode (also non-default). So you have to go pretty deep
    into the wilderness to get this code to activate in production. It is
    never used during ingestion.
  • Its existence complicates developer efforts to reason about how
    aggregators get used, because the way it uses buffer aggregators is so
    different from how every other query engine uses them.
  • It doesn't have meaningful testing.

By the way, I do believe that, given the way the offheap incremental index
works, it didn't actually require buffer aggregators to be thread-safe.
It synchronizes on "aggregate" and doesn't call "get" until it has
stopped calling "aggregate". Nevertheless, this is a bother to think about,
and for the above reasons I think it makes sense to remove the code anyway.
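
The access pattern described in the paragraph above can be sketched roughly as follows. This is an illustration under stated assumptions, not the removed OffheapIncrementalIndex code: the class, the `addTo` helper, and the lock object are hypothetical. Every write happens under one lock, and the read happens only after all writer threads have finished, so the non-thread-safe buffer update is never observed mid-write.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch of "synchronize on aggregate, read only after aggregation stops".
class SynchronizedAggregateDemo
{
  // Not thread safe by itself: a plain read-modify-write on the buffer.
  static void addTo(ByteBuffer buf, int position, long value)
  {
    buf.putLong(position, buf.getLong(position) + value);
  }

  static long aggregateConcurrently(int threads, int rowsPerThread)
  {
    final ByteBuffer buf = ByteBuffer.allocate(Long.BYTES);
    final Object lock = new Object();
    Thread[] workers = new Thread[threads];

    for (int i = 0; i < threads; i++) {
      workers[i] = new Thread(() -> {
        for (int r = 0; r < rowsPerThread; r++) {
          // The caller serializes every "aggregate" call.
          synchronized (lock) {
            addTo(buf, 0, 1L);
          }
        }
      });
      workers[i].start();
    }

    for (Thread worker : workers) {
      try {
        // join() establishes happens-before between the worker's writes and the read below.
        worker.join();
      }
      catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IllegalStateException("interrupted while waiting for workers", e);
      }
    }

    // "get" is called only once no thread is aggregating any more.
    return buf.getLong(0);
  }

  public static void main(String[] args)
  {
    System.out.println(aggregateConcurrently(4, 10_000)); // prints 40000
  }
}
```

With this pattern the thread-safety burden sits entirely on the caller, which is the situation the paragraph describes.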

@clintropolis
Member

overall +1 on removing this. This PR is failing some intellij inspections, but we might want to suppress them to accommodate #10001, which does appear to use some of these methods.

@gianm
Contributor Author

gianm commented Apr 16, 2021

> overall +1 on removing this. This PR is failing some intellij inspections, but we might want to suppress them to accommodate #10001, which does appear to use some of these methods.

I took a look at #10001, and I don't see usage of getAggVal there, so I'll convert it to a private method and remove the unused params. It does use getAggs though, so I'll suppress the warning on that one.

@gianm
Contributor Author

gianm commented Apr 17, 2021

> I took a look at #10001, and I don't see usage of getAggVal there, so I'll convert it to a private method and remove the unused params. It does use getAggs though, so I'll suppress the warning on that one.

Oh, never mind, it's used by the parent class. But OakIncrementalIndex doesn't support that method anyway: https://github.com/apache/druid/pull/10001/files#diff-8a952057b2e59239cbd4cbff14c3202ea7d8cd85b4bb4c174d9375b00a0e7b40R244-R250

@gianm
Contributor Author

gianm commented Apr 17, 2021

Wow, there was a lot of thread to pull on behind the intellij "unused item" inspections. The primitive get methods of BufferAggregator are no longer needed; nor is the BufferPool passed to the groupBy v1 merger. I removed them in the latest commit. I added an "unused" suppression to the stuff in IncrementalIndex that is unused now, but that #10001 may need.

@liran-funaro
Contributor

> I added an "unused" suppression to the stuff in IncrementalIndex that is unused now, but that #10001 may need.

You can avoid the "unused" suppression by declaring AggregatorType[] aggs as protected.
This might be a more straightforward approach.
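
The two options being weighed here might look roughly like the sketch below. All class names are hypothetical and the element type is reduced to `String` for brevity. Whether an inspection actually fires on either form depends on the project's IntelliJ inspection settings, which this sketch does not model.

```java
// Hypothetical sketch contrasting the two approaches; not the real IncrementalIndex.
class ProtectedFieldDemo
{
  public static void main(String[] args)
  {
    SketchSubclassIndex index = new SketchSubclassIndex(new String[]{"count", "sum"});
    System.out.println(index.aggCount()); // prints 2
  }
}

// Option 1: keep the field private and suppress the warning on an accessor
// that nothing in the current code base calls yet.
abstract class SketchIndexWithSuppression
{
  private final String[] aggs;

  SketchIndexWithSuppression(String[] aggs)
  {
    this.aggs = aggs;
  }

  @SuppressWarnings("unused") // kept for a subclass that is expected to arrive later
  protected String[] getAggs()
  {
    return aggs;
  }
}

// Option 2: make the field protected so subclasses reach it directly,
// with no accessor and no suppression.
abstract class SketchIndexWithProtectedField
{
  protected final String[] aggs;

  SketchIndexWithProtectedField(String[] aggs)
  {
    this.aggs = aggs;
  }
}

class SketchSubclassIndex extends SketchIndexWithProtectedField
{
  SketchSubclassIndex(String[] aggs)
  {
    super(aggs);
  }

  int aggCount()
  {
    return aggs.length;
  }
}
```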

I can rebase #10001 after this PR is merged.
Alternatively, if you prefer to merge #10001 before this PR, I can make aggs protected in #10001. Then you won't have to deal with this at all in this PR.

Please let me know what you think.

@gianm
Contributor Author

gianm commented Apr 21, 2021

I'm ok either way @liran-funaro. It depends on which patch is merged first, I think.

Contributor

@liran-funaro liran-funaro left a comment

Please note that #10001 (OakIncrementalIndex) needs BufferAggregator to have the same API as Aggregator.
OakIncrementalIndex takes care of the concurrency, so it is OK to remove the concurrency requirements from BufferAggregator.

I would also suggest removing the generic type parameter from IncrementalIndex<AggregatorType>. It is no longer required:

  • private final AggregatorType[] aggs; can be removed. It was only used by OffheapIncrementalIndex.
  • iterableWithPostAggregations() should be defined as abstract and be implemented in OnheapIncrementalIndex. OakIncrementalIndex has its own implementation for this method.
  • getAggsForRow() is also redundant.
  • initAggs() can return void.
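
The shape suggested by the list above could be sketched as follows. This is only an illustration: the class names, the `add` method, and every signature here are hypothetical simplifications, and the real IncrementalIndex methods take different parameters.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, heavily simplified sketch of a non-generic base class.
class NonGenericIndexDemo
{
  public static void main(String[] args)
  {
    SketchOnheapIndex index = new SketchOnheapIndex();
    index.initAggs(2);
    index.add(0, 3L);
    index.add(1, 4L);
    for (String row : index.iterableWithPostAggregations()) {
      System.out.println(row);
    }
  }
}

abstract class SketchIncrementalIndex
{
  // No <AggregatorType> parameter and no shared aggs array in the base class.

  // Returns void: each subclass stores whatever aggregation state it creates.
  abstract void initAggs(int numMetrics);

  abstract void add(int metric, long value);

  // Abstract in the base class; each subclass decides how to read its own state.
  abstract Iterable<String> iterableWithPostAggregations();
}

class SketchOnheapIndex extends SketchIncrementalIndex
{
  // Aggregation state is private to the subclass.
  private long[] sums;

  @Override
  void initAggs(int numMetrics)
  {
    sums = new long[numMetrics];
  }

  @Override
  void add(int metric, long value)
  {
    sums[metric] += value;
  }

  @Override
  Iterable<String> iterableWithPostAggregations()
  {
    List<String> rows = new ArrayList<>();
    for (int i = 0; i < sums.length; i++) {
      rows.add("metric" + i + "=" + sums[i]);
    }
    return rows;
  }
}
```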

In fact, I already have this modification implemented: #11160.
Please consider merging it before this PR.

Comment on lines 96 to 153
/**
* Returns the float representation of the given aggregate byte array
*
* Converts the given byte buffer representation into the intermediate aggregate value.
*
* <b>Implementations must not change the position, limit or mark of the given buffer</b>
*
* Implementations are only required to support this method if they are aggregations which
* have an {@link AggregatorFactory#getType()} of {@link org.apache.druid.segment.column.ValueType#FLOAT}.
* If unimplemented, throwing an {@link UnsupportedOperationException} is common and recommended.
*
* @param buf byte buffer storing the byte array representation of the aggregate
* @param position offset within the byte buffer at which the aggregate value is stored
* @return the float representation of the aggregate
*/
float getFloat(ByteBuffer buf, int position);

/**
* Returns the long representation of the given aggregate byte array
*
* Converts the given byte buffer representation into the intermediate aggregate value.
*
* <b>Implementations must not change the position, limit or mark of the given buffer</b>
*
* Implementations are only required to support this method if they are aggregations which
* have an {@link AggregatorFactory#getType()} of {@link org.apache.druid.segment.column.ValueType#LONG}.
* If unimplemented, throwing an {@link UnsupportedOperationException} is common and recommended.
*
* @param buf byte buffer storing the byte array representation of the aggregate
* @param position offset within the byte buffer at which the aggregate value is stored
* @return the long representation of the aggregate
*/
long getLong(ByteBuffer buf, int position);

/**
* Returns the double representation of the given aggregate byte array
*
* Converts the given byte buffer representation into the intermediate aggregate value.
*
* <b>Implementations must not change the position, limit or mark of the given buffer</b>
*
* Implementations are only required to support this method if they are aggregations which
* have an {@link AggregatorFactory#getType()} of {@link org.apache.druid.segment.column.ValueType#DOUBLE}.
* If unimplemented, throwing an {@link UnsupportedOperationException} is common and recommended.
*
* The default implementation casts {@link BufferAggregator#getFloat(ByteBuffer, int)} to double.
* This default method is added to enable smooth backward compatibility, please re-implement it if your aggregators
* work with numeric double columns.
*
* @param buf byte buffer storing the byte array representation of the aggregate
* @param position offset within the byte buffer at which the aggregate value is stored
* @return the double representation of the aggregate
*/
default double getDouble(ByteBuffer buf, int position)
{
  return (double) getFloat(buf, position);
}
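
As a rough illustration of how the primitive getters quoted above interact, the sketch below declares a cut-down interface with the same three method signatures and a hypothetical LONG-typed implementation. The interface and class names are assumptions, not Druid's. Because the default `getDouble` delegates to `getFloat`, an implementation that leaves `getFloat` unsupported also leaves `getDouble` unsupported unless it overrides it.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch; only the three getter signatures mirror the quoted code.
class PrimitiveGetterDemo
{
  public static void main(String[] args)
  {
    ByteBuffer buf = ByteBuffer.allocate(16);
    buf.putLong(8, 42L);

    SketchBufferAggregator agg = new SketchLongBufferAggregator();
    System.out.println(agg.getLong(buf, 8)); // prints 42

    try {
      agg.getDouble(buf, 8);
    }
    catch (UnsupportedOperationException e) {
      // The default getDouble called getFloat, which this aggregator does not support.
      System.out.println("getDouble unsupported: " + e.getMessage());
    }
  }
}

interface SketchBufferAggregator
{
  float getFloat(ByteBuffer buf, int position);

  long getLong(ByteBuffer buf, int position);

  // Same default as in the quoted code: cast the float representation to double.
  default double getDouble(ByteBuffer buf, int position)
  {
    return (double) getFloat(buf, position);
  }
}

// A LONG-typed aggregator that supports only getLong.
class SketchLongBufferAggregator implements SketchBufferAggregator
{
  @Override
  public float getFloat(ByteBuffer buf, int position)
  {
    throw new UnsupportedOperationException("not a FLOAT aggregator");
  }

  @Override
  public long getLong(ByteBuffer buf, int position)
  {
    // Absolute read: does not change the buffer's position, limit or mark.
    return buf.getLong(position);
  }
}
```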

Contributor

It is important for #10001 (OakIncrementalIndex) that BufferAggregator has the same API as Aggregator.
So please keep these methods. Also please keep isNull().

@gianm
Contributor Author

gianm commented Oct 22, 2021

Per #11124 (comment) I reverted the BufferAggregator and IncrementalIndex API changes mentioned in #11124 (comment).

Contributor

@jihoonson jihoonson left a comment

LGTM. +1 for removing OffheapIncrementalIndex!

@jihoonson
Contributor

@liran-funaro thank you for taking a look at this PR and sorry for taking a long time to review your PRs. I quickly reviewed #11160 and left one question on it. Since this PR now has enough approvals, would you mind if we merge this PR before #11160 gets merged?

@liran-funaro
Contributor

Sure. Go ahead.

@gianm
Contributor Author

gianm commented Oct 26, 2021

Thanks!

@gianm gianm merged commit fc95c92 into apache:master Oct 26, 2021
@gianm gianm deleted the remove-offheap-incremental-index branch October 26, 2021 15:05
jon-wei pushed a commit to jon-wei/druid that referenced this pull request Nov 22, 2021
* fix type

* Revert "Fix Keyclock auth integration test based on upstream changes (apache#622)"

This reverts commit c1083fb.

* Bump netty4 to 4.1.68; suppress CVE-2021-37136 and CVE-2021-37137 for netty3 (apache#11844)

* bump netty4 to 4.1.68

* suppress CVE-2021-37136 and CVE-2021-37137 for netty3

* license

* add `prometheus-emitter` to distribution (apache#11812)

* add `prometheus-emitter` to distribution

Signed-off-by: Đặng Minh Dũng <dungdm93@live.com>

* add `druid-momentsketch` to distribution

Signed-off-by: Đặng Minh Dũng <dungdm93@live.com>

* Web console: update typescript 4.4 for faster build speeds (apache#11725)

* update typescript

* do not show pagination when there is only one page

* update snapshots

* fix pagination

* Remove OffheapIncrementalIndex and clarify aggregator thread-safety needs. (apache#11124)

* Remove OffheapIncrementalIndex and clarify aggregator thread-safety needs.

* Remove things that are now unused.

* Revert removal of getFloat, getLong, getDouble from BufferAggregator.

* OAK-related warnings, suppressions.

* Unused item suppressions.

Co-authored-by: Đặng Minh Dũng <dungdm93@live.com>
Co-authored-by: Vadim Ogievetsky <vadim@ogievetsky.com>
Co-authored-by: Gian Merlino <gianmerlino@gmail.com>
@abhishekagarwal87 abhishekagarwal87 added this to the 0.23.0 milestone May 11, 2022