Skip to content

Use mutable Map for creating DruidPlanner in DruidViewMacro#16376

Closed
Akshat-Jain wants to merge 3 commits intoapache:masterfrom
Akshat-Jain:use-hashmap-for-createPlanner
Closed

Use mutable Map for creating DruidPlanner in DruidViewMacro#16376
Akshat-Jain wants to merge 3 commits intoapache:masterfrom
Akshat-Jain:use-hashmap-for-createPlanner

Conversation

@Akshat-Jain
Copy link
Contributor

@Akshat-Jain Akshat-Jain commented May 2, 2024

Description

Currently, DruidViewMacro was using Collections.emptyMap(), that is, an immutable map for creating the DruidPlanner. This doesn't sit well with the contract of PlannerContext#queryContextMap which is expected to return a mutable map.

Currently, if anyone were to rely on getting a mutable map as per the contract, it can run into issues in code flows where the planner is being created using immutable map. For example, 500+ tests failed because of adding this line since I assumed the return type must be mutable.

This PR shifts away from the immutable map for creating planner to ensure everyone can rely on the returned map being mutable without breaking any code flows. With this PR, the planner's constructor creates a mutable map even if an immutable map was passed to it.

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

@Akshat-Jain Akshat-Jain force-pushed the use-hashmap-for-createPlanner branch from d235e55 to bcc1db3 Compare May 2, 2024 16:56
@kfaraz
Copy link
Contributor

kfaraz commented May 2, 2024

@Akshat-Jain , could you please include provide more context in the description?

  • Why is the change needed?
  • Without the change, why kind of use cases fail?
  • How have things been working so far?

Also, this seems like a minute difference which a future dev is likely to miss.
Best to comment mentioning why a certain kind of map is needed and also add a test to verify the same.

@Akshat-Jain
Copy link
Contributor Author

@kfaraz Thanks for the review! Answering all questions below:

Why is the change needed?

The contract for PlannerContext#queryContextMap is misleading. It says that it returns a mutable map, but using Collections.emptyMap() for creating the planner ends up violating that. So either the contract should be re-defined, or we shouldn't use immutable map for creating the planner.

Without the change, why kind of use cases fail?

In my PR for loading lookups selectively for MSQ tasks, I had added plannerContext.queryContextMap().putIfAbsent(PlannerContext.CTX_LOOKUPS_TO_LOAD, new HashSet<>()); in SqlResourceCollectorShuttle#visit here since the expectation is that plannerContext.queryContextMap() returns a mutable map. But this ended up failing 500+ tests with Unsupported operation errors since those tests were going through the immutable map code flow.

So the use-cases that would fail are trying to modify the query context map using plannerContext.queryContextMap() in the code flows where PlannerContext's queryContextMap has been initialized as an immutable map.

How have things been working so far?

I guess there wasn't a use-case so far, or we worked around this intricacy somehow? I'm not entirely sure. And I fully agree that if it weren't for my other PR, this change isn't really "needed". It's just that I spent time figuring this out, and would like to prevent the same for any future dev.

and also add a test to verify the same.

Any suggestions on what test can I add for this? I personally think my other PR goes through this code flow, so it would just get tested there. I'm not sure if a test to validate the mutability of a param is needed, or adds much value? Happy to know your thoughts on this though!

could you please include provide more context in the description?

Sure, I'll add some of the details from this answer to the PR description.

@kfaraz
Copy link
Contributor

kfaraz commented May 2, 2024

Thanks for the details, @Akshat-Jain . Based on what you describe, it would make sense to include this change in the other PR itself (where you encounter failing tests) as that would provide the natural context and justification for this change.

@Akshat-Jain
Copy link
Contributor Author

@kfaraz I'm currently discussing the approach in that PR as there was an alternate approach suggestion by Zoltan. But I think this change should go in even if the other PR didn't exist, hence raised a separate PR.

I'm okay either way though.

@kfaraz
Copy link
Contributor

kfaraz commented May 3, 2024

You mentioned that

  • there probably hasn't been a use case for this so far
  • there are tests in that other PR which test this out
  • there are changes in that other PR which require the Map to be mutable

Hence, it would much simpler for other reviewers to understand this change if it came as a part of the other PR.

However, if this needs to go in a separate change, then it must have a test that verifies the mutability of the map.

Also, the correct fix for this is not passing a new HashMap() into PlannerContext. The contract of PlannerContext must not depend on what a caller has provided. In other words, PlannerContext itself must ensure that the map is mutable irrespective of what has been provided. In the constructor, copy over the stuff into a new mutable map.

@Akshat-Jain
Copy link
Contributor Author

@kfaraz Have made the suggested change to shift logic to constructor, and have added a test for the constructor logic (including mutability check).

I'm looking at this change as more of an independent change that the other PR just uncovered. The other PR is just an example use-case of this, not the only one necessarily 😅

Hope this works!

Copy link
Contributor

@kfaraz kfaraz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at this change again, I am still not convinced that this needs to be done (at the moment anyway), especially now that the other PR is not even using the queryContextMap anymore.

Also, I am not entirely convinced that returning a mutable map is the right approach.
e.g. the PlannerContext.queryContextMap() is being passed into a bunch of different objects as is, e.g. in DruidQuery.toScanAndSortQuery() where we pass the map into a WindowOperatorQuery. Passing the original mutable map into this class might have unforeseen side effects if this new object decides to change the contents of the map.

So, I think it is better to try to fix this once we have a real requirement so that we know exactly what needs to be done. Just trying to conform to the javadoc doesn't seem reason enough to me. We might as well just fix up the javadoc to not say immutable.

public void testMutabilityOfQueryContextMap()
{
// Validate that the map is mutable, even though an immutable map was passed during creation.
plannerContext.queryContextMap().put("test", "value");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test doesn't assert anything. Just verifying that put doesn't throw an exception is probably not enough.
A better verification would be to get() the value of the key from the map and check it against the original value used in the put().

@Akshat-Jain
Copy link
Contributor Author

Looking at this change again, I am still not convinced that this needs to be done (at the moment anyway), especially now that the other PR is not even using the queryContextMap anymore.

Also, I am not entirely convinced that returning a mutable map is the right approach. e.g. the PlannerContext.queryContextMap() is being passed into a bunch of different objects as is, e.g. in DruidQuery.toScanAndSortQuery() where we pass the map into a WindowOperatorQuery. Passing the original mutable map into this class might have unforeseen side effects if this new object decides to change the contents of the map.

So, I think it is better to try to fix this once we have a real requirement so that we know exactly what needs to be done. Just trying to conform to the javadoc doesn't seem reason enough to me. We might as well just fix up the javadoc to not say immutable.

@kfaraz This PR is certainly not related to the other PR, like I mentioned before. The other PR just uncovered this intricacy and ran into issues because of it.

We've moved away from query context approach in the other PR, so this PR certainly isn't blocking anything. So I agree that this change isn't really needed right now.

I raised this PR as I felt that this was just a minor unintentional intricacy left unnoticed, which someone would eventually have to deal with.

Changing the javadoc doesn't seem ideal to me, as it's intended to return a mutable map. There's a separate method for returning immutable query context for any use-cases like WindowOperatorQuery that you mentioned:

  /**
   * Return the query context as an immutable object. Use this form
   * when querying the context as it provides type-safe accessors.
   */
  public QueryContext queryContext()
  {
    return QueryContext.of(queryContext);
  }

But since this PR isn't really needed right now, I'll close this PR for now.

@Akshat-Jain Akshat-Jain closed this May 3, 2024
@kfaraz
Copy link
Contributor

kfaraz commented May 5, 2024

I raised this PR as I felt that this was just a minor unintentional intricacy left unnoticed, which someone would eventually have to deal with.
Changing the javadoc doesn't seem ideal to me, as it's intended to return a mutable map.

I agree, if the mutability of the map is really needed, someone will eventually have to deal with it.
But given the current code flow, I am not sure if it is even a requirement.

If you are interested in fixing this, try to go through all the usages of queryContextMap() and see if the returned map is ever actually being updated. If not, we might as well get rid of this method to avoid confusion later. You could update the changes here itself and reopen this PR.

If it turns out from your investigation that we really do need the map to be mutable, then exposing the internal Map<String, Object> for any caller to mutate as they see fit is not appropriate. PlannerContext itself must expose methods such as putInQueryContext() or removeFromQueryContext() and thus be aware of any updates made to the internal map.

Alternatively, the method queryContextMap() could return a MutableQueryContext object which itself exposes put() and remove() methods. Exposing the internal map implementation makes it very difficult to trace back the places which are actually mutating the map.

@Akshat-Jain
Copy link
Contributor Author

see if the returned map is ever actually being updated

@kfaraz It is. Example:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants