
Integrate catalog schema validation into planner (WIP) #15711

Closed
wants to merge 7 commits

Conversation

zachjsh
Contributor

@zachjsh zachjsh commented Jan 17, 2024

Description

This PR contains a portion of the changes from the inactive draft PR #13686 by @paul-rogers, which integrates the catalog with the Calcite planner. With these changes, datasource table schemas defined in the catalog are validated against during SQL-based ingestion, so that ingesting into a table with a declared schema produces segments that conform to that schema. If partitioning and clustering are not specified at ingestion time, the defaults defined for the table in the catalog, if any, are used.

TODO: add more tests.
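To illustrate the validation described above, here is a minimal, hypothetical sketch of checking ingest columns against a declared catalog schema. The names (`CatalogValidationSketch`, `validateIngest`) are invented for illustration and are not Druid's actual API; the real implementation operates on Calcite row types inside the planner.

```java
import java.util.Map;

// Hypothetical sketch: reject an ingest whose column types conflict with the
// table schema declared in the catalog. Columns not declared in the catalog
// are allowed through unchanged in this sketch.
class CatalogValidationSketch
{
  // declaredSchema / ingestColumns: column name -> SQL type name.
  static void validateIngest(Map<String, String> declaredSchema,
                             Map<String, String> ingestColumns)
  {
    for (Map.Entry<String, String> col : ingestColumns.entrySet()) {
      String declaredType = declaredSchema.get(col.getKey());
      if (declaredType != null && !declaredType.equals(col.getValue())) {
        throw new IllegalArgumentException(
            "Column [" + col.getKey() + "] has type [" + col.getValue()
            + "] but the catalog declares [" + declaredType + "]");
      }
    }
  }
}
```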

Release note


Key changed/added classes in this PR
  • MyFoo
  • OurBar
  • TheirBaz

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

@zachjsh zachjsh requested a review from jon-wei January 17, 2024 21:47
@github-actions github-actions bot added Area - Batch Ingestion Area - Querying Area - MSQ For multi stage queries - https://github.com/apache/druid/issues/12262 labels Jan 17, 2024
* Instead return what would have been sent to the execution engine.
* The result is a Jackson-serializable query plan.
*/
default Object explain(DruidQuery druidQuery)

Check notice (Code scanning / CodeQL): Useless parameter. The parameter 'druidQuery' is never used.
@@ -94,6 +94,7 @@
QueryMaker buildQueryMakerForInsert(
String targetDataSource,
RelRoot relRoot,
PlannerContext plannerContext
PlannerContext plannerContext,
RelDataType targetType

Check notice (Code scanning / CodeQL): Useless parameter. The parameter 'targetType' is never used.
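The diff above threads a target row type into the insert planner. A minimal sketch of why that is useful: with the declared target type available, the planner can detect source columns that the target table does not declare. The names here (`TargetTypeSketch`, `missingInTarget`) are hypothetical, not Druid's API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: given the target table's declared columns and the
// columns produced by the source query, report source columns absent from
// the target. A real planner would instead compare Calcite RelDataTypes.
class TargetTypeSketch
{
  static List<String> missingInTarget(Map<String, String> targetType,
                                      List<String> sourceColumns)
  {
    List<String> missing = new ArrayList<>();
    for (String c : sourceColumns) {
      if (!targetType.containsKey(c)) {
        missing.add(c);
      }
    }
    return missing;
  }
}
```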
}
{
(
<HOUR>
{
granularity = Granularities.HOUR;
unparseString = "HOUR";
result = SqlLiteral.createCharString(DruidSqlParserUtils.HOUR_GRAIN, getPos());
Member:

I wonder if it would be possible to use SqlLiteral.createSymbol here instead; that could remove the need for the string based matching as well...

Contributor (Author):

using SqlLiteral.createSymbol as you suggested
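The reviewer's suggestion is that Calcite's `SqlLiteral.createSymbol` wraps an enum constant, which avoids matching on strings like `"HOUR"`. A self-contained sketch of the idea, using a plain Java enum (the `Grain` enum and method names here are illustrative, not Druid's or Calcite's types):

```java
// Sketch: symbol (enum) based matching vs. string based matching.
// With an enum, the compiler catches misspelled or unhandled values;
// with strings, a typo only fails at runtime.
class GranularitySymbolSketch
{
  enum Grain { HOUR, DAY, ALL }

  // String-based lookup: throws IllegalArgumentException on a typo.
  static Grain fromString(String s)
  {
    return Grain.valueOf(s);
  }

  // Symbol-based unparse: every Grain maps to its SQL spelling.
  static String unparse(Grain g)
  {
    switch (g) {
      case HOUR: return "HOUR";
      case DAY:  return "DAY";
      default:   return "ALL TIME";
    }
  }
}
```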

// Add the necessary indirection. The type factory used here
// is the Druid one, since the per-query one is not yet available
// here, nor are built-in functions associated with per-query types.
this.operatorTable = new ChainedSqlOperatorTable(
Member:

I wonder if this is new functionality - could it be in a separate PR?

Contributor (Author):

Thanks! Removed this.

@@ -58,7 +58,7 @@ public void testUnparseReplaceAll() throws ParseException
+ "OVERWRITE ALL\n"
+ "SELECT *\n"
+ " FROM \"foo\"\n"
+ "PARTITIONED BY ALL TIME "
+ "PARTITIONED BY 'ALL TIME' "
Member:

Is the result of the unparse valid?

Contributor (Author):

Changed back.

github-actions bot commented Apr 3, 2024

This pull request has been marked as stale due to 60 days of inactivity.
It will be closed in 4 weeks if no further activity occurs. If you think
that's incorrect or this pull request should instead be reviewed, please simply
write any comment. Even if closed, you can still revive the PR at any time or
discuss it on the dev@druid.apache.org list.
Thank you for your contributions.

@github-actions github-actions bot added the stale label Apr 3, 2024
github-actions bot commented May 1, 2024

This pull request/issue has been closed due to lack of activity. If you think that
is incorrect, or the pull request requires review, you can revive the PR at any time.

@github-actions github-actions bot closed this May 1, 2024
Labels
Area - Batch Ingestion Area - MSQ For multi stage queries - https://github.com/apache/druid/issues/12262 Area - Querying stale

2 participants