Add interface method for returning canonical lookup name #16557

Merged
abhishekrb19 merged 4 commits into apache:master from Akshat-Jain:canonical-lookup-name-getter
Jun 5, 2024

@Akshat-Jain Akshat-Jain commented Jun 5, 2024

Problem

Custom lookup extensions can define their own lookup name syntax, i.e., the name passed to the LOOKUP function can carry extension-specific semantics. With the selective loading of lookups in MSQ, only the lookups referenced in the ingest SQL are loaded.

However, with custom lookup implementations, the lookupName parsed from the query can differ from the configured name. Because of this mismatch, the LookupReferencesManager will not load a lookup specified in the query unless it finds an exact match with what's configured on the Coordinator. As a result, MSQ queries involving such lookups will fail.
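The mismatch can be sketched in a few lines. This is a simplified stand-in for exact-match selective loading, not the actual LookupReferencesManager code; the class name `SelectiveLoadSketch`, the method `lookupsToLoad`, and the bracketed name syntax used in the usage note are all hypothetical.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Simplified stand-in for exact-match selective loading (hypothetical, not Druid code).
final class SelectiveLoadSketch
{
  // Returns the configured lookups whose names exactly match a name parsed from the query.
  static Set<String> lookupsToLoad(Set<String> configured, Set<String> parsedFromQuery)
  {
    Set<String> toLoad = new LinkedHashSet<>(configured);
    toLoad.retainAll(parsedFromQuery);
    return toLoad;
  }
}
```

With a configured lookup named `country_codes` and a query that refers to it as `country_codes[v2]` (an invented syntax), `lookupsToLoad` returns an empty set, so nothing is loaded for that query.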

Description

This PR adds a method for returning the canonical lookup name to the LookupExtractorFactoryContainerProvider interface. Resolving the canonical lookup name is delegated to the underlying LookupExtractorFactoryContainerProvider implementation:

  • For the druid-lookups-cached-global, druid-lookups-cached-single and kafka-extraction-namespace extensions, the canonical name is the same as the lookup name.
  • For other implementations, it offers the flexibility to return a different canonical lookup name for the parsed lookup name if needed.
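As a rough sketch of the two cases above: the interface below is a cut-down stand-in that carries only the new method (the real LookupExtractorFactoryContainerProvider has other methods), and both implementing classes, along with the bracketed suffix syntax, are hypothetical.

```java
// Cut-down stand-in for the provider interface; only the new method is shown.
interface CanonicalNameProviderSketch
{
  String getCanonicalLookupName(String lookupName);
}

// First case: the canonical name is the same as the lookup name.
final class IdentityNameProvider implements CanonicalNameProviderSketch
{
  @Override
  public String getCanonicalLookupName(String lookupName)
  {
    return lookupName;
  }
}

// Second case (hypothetical custom extension): strip an invented "[...]" suffix
// so the parsed name maps back to the configured name.
final class SuffixStrippingNameProvider implements CanonicalNameProviderSketch
{
  @Override
  public String getCanonicalLookupName(String lookupName)
  {
    int idx = lookupName.indexOf('[');
    return idx < 0 ? lookupName : lookupName.substring(0, idx);
  }
}
```

Under these assumptions, a caller would resolve the parsed name through the provider before matching it against the configured lookups.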

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever it would not be obvious to an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

@abhishekrb19 abhishekrb19 left a comment

Thanks, @Akshat-Jain!

/**
 * Returns the canonical lookup name from a lookup name.
 */
String getCanonicalLookupName(String lookupName);

@LakshSingla LakshSingla Jun 5, 2024

naming nit:

  1. This can be getLookupName(String name). "Canonical" is confusing and isn't defined in the Javadoc here. See point (2) below as well.

other comments:

  1. This method isn't marked as PublicApi; however, custom extensions can extend the interface for their own lookup implementations. Please add a default implementation for it.
  2. When should this method be called? As of now, this is a highly specialized method that is called in QueryLookupOperatorConversion. Perhaps it should be called out and named as such. As a developer, it is unclear to me when I must call it and when not. For example, why am I not calling it on https://github.com/apache/druid/blob/master/sql/src/main/java/org/apache/druid/sql/calcite/schema/LookupSchema.java#L61?

@Akshat-Jain Akshat-Jain Jun 5, 2024

@LakshSingla
I had a discussion with @abhishekrb19 about the naming of this method. We settled on getCanonicalLookupName().

Regarding the other comments:

  1. I don't see the other methods marked as PublicApi either, so this seems like an issue unrelated to this PR. Thoughts?
  2. This would be used in a custom extension. I'll raise a follow-up PR once this merges, where the usage should be clearer. But in general, if we add a lookup name with special syntax in the future, this method would be needed.

@abhishekrb19 abhishekrb19 Jun 5, 2024

@LakshSingla, good points. I think we need to expand the Javadoc to clarify the intent of the new method and/or rename it accordingly. We could call it getLookupName(String name), but that's a bit ambiguous imo. Naming, thoughts? 😅

As far as 1 is concerned, the interface isn't annotated with @PublicApi or @ExtensionPoint. Also, PR #9281 changed the method signature and added a new interface method without default implementations. Following that pattern, I believe it's safe to add new methods without providing a default implementation. In general, my understanding is that for custom extensions that directly use or extend interfaces that are not public APIs (i.e., not annotated with PublicApi or ExtensionPoint), the onus is on the developers maintaining the custom implementations to stay in sync with upstream changes.

For 2, LookupSchema powers the SQL view of the lookups configured in the system. For consistency, I think it'd make sense to also call the new function there, as you pointed out. I'm okay with doing that as a follow-up too.
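For reference, the trade-off discussed in point 1 can be sketched as follows. The snippet under review declares the method as abstract; the variant below shows what a default implementation would look like instead. The interface and class names are hypothetical, and the identity default is an assumption about what such a default would return.

```java
// Hypothetical variant of the interface with a default implementation.
interface ProviderWithDefaultSketch
{
  // Assumed default: the canonical name is the lookup name itself.
  default String getCanonicalLookupName(String lookupName)
  {
    return lookupName;
  }
}

// A pre-existing custom implementation keeps compiling without overriding the new method.
final class ExistingCustomProvider implements ProviderWithDefaultSketch
{
}
```

Without the default, a class like `ExistingCustomProvider` would fail to compile until it implemented the new method, which is the upgrade burden described above.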

@abhishekrb19

Merging this PR. The two suggestions from @LakshSingla can be addressed in a follow-up. Thanks for the fix, @Akshat-Jain!

@abhishekrb19 abhishekrb19 merged commit 6d7d2ff into apache:master Jun 5, 2024
@kfaraz kfaraz added this to the 31.0.0 milestone Oct 4, 2024