Reuse IndexedInts returned from DimensionSelector.getRow() implementations #5172

Merged: 4 commits, Jan 17, 2018

Conversation

leventov
Member

This PR is a first step towards ColumnValueSelector.getXxx() methods returning reusable objects. It makes all DimensionSelector.getRow() implementations reuse the IndexedInts object they return.

This is essential for #4622, but could also reduce garbage creation during querying.
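
To illustrate the idea, here is a minimal sketch of a selector that hands out the same row object on every call and only changes its contents. The types `SimpleIndexedInts`, `ReusableSingleValueRow` and `ReusingSelector` are hypothetical, simplified stand-ins invented for this example; they are not Druid's actual classes.

```java
// Hypothetical, simplified stand-in for an IndexedInts-like interface.
interface SimpleIndexedInts {
    int size();
    int get(int index);
}

// A mutable single-value row that the selector owns and refills in place.
final class ReusableSingleValueRow implements SimpleIndexedInts {
    private int value;

    void setValue(int value) { this.value = value; }

    @Override public int size() { return 1; }

    @Override public int get(int index) {
        if (index != 0) throw new IndexOutOfBoundsException("index: " + index);
        return value;
    }
}

// Hypothetical selector over an array of dictionary ids, one id per row.
final class ReusingSelector {
    private final int[] dictionaryIds;
    private final ReusableSingleValueRow row = new ReusableSingleValueRow();
    private int offset = 0;

    ReusingSelector(int[] dictionaryIds) { this.dictionaryIds = dictionaryIds; }

    void advance() { offset++; }

    // Returns the same object on every call; no allocation per row.
    SimpleIndexedInts getRow() {
        row.setValue(dictionaryIds[offset]);
        return row;
    }
}
```

Under this scheme, a caller that keeps a reference to an earlier row sees it change once the selector advances and `getRow()` is called again.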

@leventov leventov changed the title from "Reuse IndexedInts in DimensionSelector implementations" to "Reuse IndexedInts returned from DimensionSelector.getRow() implementations" on Dec 18, 2017
@leventov
Member Author

leventov commented Jan 4, 2018

@gianm could you please review this?

@jihoonson
Contributor

I'll also review this PR.

@leventov
Member Author

leventov commented Jan 4, 2018

@jihoonson thanks

Contributor

@jihoonson jihoonson left a comment

LGTM

* {@link io.druid.query.aggregation.AggregateCombiner#fold} should be prepared for that and not storing the object
* returned from this method in their state, assuming that the object will remain unchanged even when the position of
* the selector changes. This may not be the case.
*/
Member

It looks like this new contract might break SelectQuery, since there we just call getObject() and put the returned object into an event?
https://github.com/druid-io/druid/blob/535ec437e925effee920e5643277fde2a1b69175/processing/src/main/java/io/druid/query/select/SelectQueryEngine.java#L313
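
The concern can be sketched as follows, assuming a selector whose getObject() returns one mutable object that it refills for each row. `ReusedObjectSelector` and `EventCollector` are hypothetical names made up for this illustration, and the `StringBuilder` merely stands in for any reused object; this is not the code in SelectQueryEngine.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical selector that reuses one mutable object across rows.
final class ReusedObjectSelector {
    private final String[] values;
    private final StringBuilder reused = new StringBuilder();
    private int offset = 0;

    ReusedObjectSelector(String[] values) { this.values = values; }

    boolean isDone() { return offset >= values.length; }

    void advance() { offset++; }

    // Refills and returns the same StringBuilder on every call.
    Object getObject() {
        reused.setLength(0);
        reused.append(values[offset]);
        return reused;
    }
}

final class EventCollector {
    // Unsafe under a reuse contract: every event references the same object.
    static List<Object> collectByReference(ReusedObjectSelector selector) {
        List<Object> events = new ArrayList<>();
        for (; !selector.isDone(); selector.advance()) {
            events.add(selector.getObject());
        }
        return events;
    }

    // Safe: take an immutable snapshot before the selector moves on.
    static List<Object> collectByCopy(ReusedObjectSelector selector) {
        List<Object> events = new ArrayList<>();
        for (; !selector.isDone(); selector.advance()) {
            events.add(selector.getObject().toString());
        }
        return events;
    }
}
```

With `collectByReference`, all collected events end up showing the value of the last row, which is the kind of silent breakage the comment describes.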

Member Author

Thanks for this finding. I think I will need to add a method like getObjectNotReusable() to BaseObjectColumnValueSelector and use it in SelectQueryEngine in one of the future PRs, where object reuse is actually implemented.

Member

  1. I am worried about changing the semantics of existing APIs in a backwards-incompatible way. If any extensions use this API, they might break without any compilation error. I think a better way might be to add another API that returns some Holder object.
  2. Also, IMO, APIs that reuse objects internally are tricky and quite fragile to implement. Another way could be to use Druid's Sequence or a similar API that does not expose the underlying object and instead provides a way to accumulate or apply computations to it.
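
As a rough illustration of the second suggestion, the sketch below folds values through a caller-supplied function, so the caller never receives a reference to the internal buffer. `RowValueFolder` is a hypothetical class invented for this example and is not Druid's Sequence API; the buffer copy only stands in for decoding a row into a reused object.

```java
import java.util.function.IntBinaryOperator;

// Hypothetical accumulate-style API: callers pass a function and only ever
// see primitive values, never the internally reused buffer.
final class RowValueFolder {
    private final int[][] rows;

    RowValueFolder(int[][] rows) { this.rows = rows; }

    private int maxRowLength() {
        int max = 0;
        for (int[] row : rows) max = Math.max(max, row.length);
        return max;
    }

    // Folds every value of every row into a single int.
    int fold(int initial, IntBinaryOperator fn) {
        int[] buffer = new int[maxRowLength()]; // reused for every row
        int acc = initial;
        for (int[] row : rows) {
            // Stands in for decoding the current row into the reused buffer.
            System.arraycopy(row, 0, buffer, 0, row.length);
            for (int i = 0; i < row.length; i++) {
                acc = fn.applyAsInt(acc, buffer[i]);
            }
        }
        return acc;
    }
}
```

For example, `new RowValueFolder(rows).fold(0, Integer::sum)` sums all values without the caller being able to retain the buffer.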

Member Author

  1. This is fair. While developing the actual PR that makes getObject() return reusable objects, I came to the same conclusion and decided to rename the "main" method to getReusableObject() and to add a second method, getNotReusableObject().

A "holder" layer of any kind won't get around the fact that we are inherently going to reuse the same objects, so I don't see the point of adding one.
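
A minimal sketch of what such a pair of methods could look like is shown below. Only the method names getReusableObject() and getNotReusableObject() come from the discussion; the `TwoModeSelector` interface, the `IntListSelector` class and their behaviour are assumptions made for illustration, not the API of the follow-up PR.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical interface exposing both access modes.
interface TwoModeSelector<T> {
    // May return the same instance on every call; contents change per row.
    T getReusableObject();

    // Always returns a fresh instance that is safe to retain.
    T getNotReusableObject();
}

// Hypothetical selector over rows of ints.
final class IntListSelector implements TwoModeSelector<List<Integer>> {
    private final int[][] rows;
    private final List<Integer> reused = new ArrayList<>();
    private int offset = 0;

    IntListSelector(int[][] rows) { this.rows = rows; }

    void advance() { offset++; }

    @Override public List<Integer> getReusableObject() {
        reused.clear();
        for (int v : rows[offset]) reused.add(v);
        return reused;
    }

    @Override public List<Integer> getNotReusableObject() {
        List<Integer> fresh = new ArrayList<>(rows[offset].length);
        for (int v : rows[offset]) fresh.add(v);
        return fresh;
    }
}
```

A caller that stores values across rows would use getNotReusableObject(), while a caller that consumes each value immediately can use getReusableObject() and avoid the allocation.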

Member Author

  1. Implementors of a custom ObjectStrategy could always fall back to non-reusable objects; it just won't be as efficient when a column with their ObjectStrategy is used. Users of this API could also fall back to always calling getNotReusableObject() if they are unsure.

I'm not saying that introducing object reusability doesn't add complexity. It does, but no interface can hide that complexity, whether Sequence, ColumnValueSelector, or anything else, because it is still about making immutable objects mutable. Besides, rewriting everything from ColumnValueSelector to Sequence would be a MASSIVE refactoring, and there is no guarantee it would make anything clearer. So it doesn't seem realistic to me.

Member Author

In this PR, I removed this doc update of getObject() because it's premature: this PR is about IndexedInts and DimensionSelectors. Let's focus on that.

@jihoonson
Contributor

@nishantmonu51 do you have further comments?

@leventov
Member Author

@nishantmonu51 could you please comment?

Member

@nishantmonu51 nishantmonu51 left a comment

Looks good to me. One minor nit: it would be nice to add some basic unit tests that check reference equality for DimensionSelector.getRow() and verify that the object is reused. That would prevent breaking this accidentally in some future PR.
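
Such a check could look roughly like the sketch below, written without a test framework so that it stays self-contained. `RowReuseCheck` and its parameters are hypothetical; a real test would presumably pass the selector's getRow() and the cursor's advance step, and this is not the test added in the follow-up PR.

```java
import java.util.function.Supplier;

// Hypothetical helper: verifies by reference equality that the row object
// stays the same instance while the position advances.
final class RowReuseCheck {
    static void assertRowReused(Supplier<?> getRow, Runnable advance, int steps) {
        Object first = getRow.get();
        for (int i = 0; i < steps; i++) {
            advance.run();
            Object next = getRow.get();
            // Deliberately != rather than equals(): identity is what matters.
            if (next != first) {
                throw new AssertionError(
                    "getRow() returned a new object after advance " + (i + 1));
            }
        }
    }
}
```

Because the comparison is on identity, an implementation that starts allocating a new but equal row per call would fail this check.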

@leventov
Member Author

leventov commented Jan 17, 2018

@nishantmonu51 Thanks for the review. Opened #5267

@leventov leventov merged commit ad6cdf5 into apache:master Jan 17, 2018
@leventov leventov deleted the reuse-indexed-ints branch January 17, 2018 15:01
@dclim dclim added this to the 0.13.0 milestone Oct 8, 2018
4 participants