-
Notifications
You must be signed in to change notification settings - Fork 3.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reuse IndexedInts returned from DimensionSelector.getRow() implementations #5172
Conversation
@gianm could you please review this? |
I'll also review this PR. |
@jihoonson thanks |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
* {@link io.druid.query.aggregation.AggregateCombiner#fold} should be prepared for that and not storing the object | ||
* returned from this method in their state, assuming that the object will remain unchanged even when the position of | ||
* the selector changes. This may not be the case. | ||
*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like this new contract might this break SelectQuery as there we just call getObject and put that object in an event ?
https://github.com/druid-io/druid/blob/535ec437e925effee920e5643277fde2a1b69175/processing/src/main/java/io/druid/query/select/SelectQueryEngine.java#L313
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this finding. I think I will need to add a method like getObjectNotReusable()
to BaseObjectColumnValueSelector
and use it in SelectQueryEngine
in one of the future PRs, where object reuse is actually implemented.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- I am worried about changing the api semantics for existing apis in a backwards incompatible way. In this case if any extensions are using this api, they might break without any compilation error. I think a better way might be to add another api which returns some Holder object.
- Also, IMO api that reuse objects internally are a bit tricky and are quite fragile to implement. Another way could be to use Druid Sequence or a similar api which do not expose the underlying object and provide a way to accumulate or apply computations to the object.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- This is fair, while developing the actual PR that makes getObject() to return reusable objects I came to the same conclusion, and decided to also rename the "main" method to getReusableObject(), and the second is getNotReusableObject().
Using any "holder" interlayer won't allow to bypass the fact that we inherently going to reuse the same, reusable objects. So I don't see the point of adding such.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Implementors of custom
ObjectStrategy
could always retreat to use not reusable objects, it just won't be that efficient when a column with theirObjectStrategy
is used. Users of this API could also retreat to always usegetNotReusableObject()
if they are unsure.
I don't say that introduction of object reusability doesn't add complexity, it does, but there are no interfaces that allow to hide this complexity, Sequence
, ColumnValueSelector
or any else, because it's still about making immutable objects mutable. Not to say that rewriting everything from ColumnValueSelector
to Sequence
would be a MASSIVE refactoring and it's absolutely not guaranteed that it will make anything clearer. So it doesn't seem realistic to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this PR, I removed this doc update of getObject()
, because it's premature, this PR is about IndexedInts
and DimensionSelector
s. Let's focus on it.
@nishantmonu51 do you have further comments? |
@nishantmonu51 could you please comment? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me, one minor nit : would be nice to add some basic unit tests that test the reference equality for DimensionSelector.getRow and verifies that the object is reused, would prevent from breaking this accidentally in some future PR.
@nishantmonu51 Thanks for review. Opened #5267 |
This PR starts the course towards reusable objects returned
ColumnValueSelector.getXxx()
. This PR makesIndexedInts
returned from allDimensionSelector.getRow()
implementations being reused.This is essential for #4622, but could also reduce garbage creation during querying.