Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for EXTRACT syntax and converts it to appropriate Pinot expression #9184

Merged
merged 11 commits into from Sep 7, 2022

Conversation

tanmesh
Copy link
Contributor

@tanmesh tanmesh commented Aug 9, 2022

Description

This PR adds support EXTRACT syntax and converts it to its Pinot expression.

This PR will solve the following issue -- #9075

Testing

Verified the desired behavior locally by running CalciteSqlCompilerTest

@tanmesh tanmesh force-pushed the tm/exact_syntax branch 2 times, most recently from 9e508a9 to 80123d6 Compare August 9, 2022 23:41
@tanmesh tanmesh changed the title [WIP] working on the issue to support EXTRACT syntax working on the issue to support EXTRACT syntax Aug 10, 2022
@tanmesh tanmesh changed the title working on the issue to support EXTRACT syntax Add support for EXTRACT syntax and converts it to appropriate Pinot expression Aug 10, 2022
Copy link
Contributor

@Jackie-Jiang Jackie-Jiang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for help contribute this!

EXTRACT can be modeled as a transform function in Pinot, and we want to change the CalciteSqlParser to support its syntax, and parse it to a function: extract(field, expression) where field is a literal.
We already have all kinds of time conversion functions available (see DateTimeTransformFunction), we may add a ExtractTransformFunction to support the functionality

@tanmesh
Copy link
Contributor Author

tanmesh commented Aug 15, 2022

thanks, @Jackie-Jiang for providing context. Very helpful.

I am trying to understand how ExtractTransformFunction as you suggested will be used in the overall flow (write path and read path). Few follow-up ques from my investigation:

  • Do we store the transformed data (after applying transformations as defined here on raw data) into segments or only raw data into segments and apply the transformations on the fly during Pinot query execution?

@Jackie-Jiang
Copy link
Contributor

ExtractTransformFunction will only be applied at query time. If we want to support ingestion transform, we need to add a ScalarFunction for the extract (you may take a look at DateTimeFunctions` class).

To support this feature, we need 2 parts of the changes:

  1. Calcite parser change to support extract syntax and parse the query into extract(field, expression)
  2. Add ExtractTransformFunction or ScalarFunction to support query/ingestion time transform

@tanmesh tanmesh changed the title Add support for EXTRACT syntax and converts it to appropriate Pinot expression [WIP] Add support for EXTRACT syntax and converts it to appropriate Pinot expression Aug 17, 2022
@tanmesh
Copy link
Contributor Author

tanmesh commented Aug 17, 2022

Hey @Jackie-Jiang , I have completed part 1 of the task.

For Part2:
Can you please help me point to where ‘DateTimeTransformFunction’ (or other query time transformers defined here) are getting applied? I was able to locate TransformFunctionFactory but couldn’t locate where transformations are actually getting applied.

@tanmesh tanmesh force-pushed the tm/exact_syntax branch 2 times, most recently from f986bc6 to 0c381c6 Compare August 17, 2022 21:08
Copy link
Contributor

@Jackie-Jiang Jackie-Jiang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TransformFunction is applied in the TransformOperator. You shouldn't need to change that though

@tanmesh tanmesh changed the title [WIP] Add support for EXTRACT syntax and converts it to appropriate Pinot expression Add support for EXTRACT syntax and converts it to appropriate Pinot expression Aug 19, 2022
@tanmesh
Copy link
Contributor Author

tanmesh commented Aug 19, 2022

PTAL @Jackie-Jiang when you get a chance. Thanks!

Copy link
Contributor

@Jackie-Jiang Jackie-Jiang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly good

Copy link
Contributor

@Jackie-Jiang Jackie-Jiang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with minor comments

Copy link
Contributor

@Jackie-Jiang Jackie-Jiang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems there are other formatting issues. Can you please take a look at the linter failure?

Copy link
Contributor

@siddharthteotia siddharthteotia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just briefly went over it since it has already been thoroughly reviewed. LGTM

@siddharthteotia siddharthteotia merged commit 857d277 into apache:master Sep 7, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants