[HUDI-4364]: changes for integrating column stats index into presto-h… #6087
pratyakshsharma wants to merge 2 commits into apache:master
|
@hudi-bot run azure |
|
@pratyakshsharma nice work! |
|
@xiarixiaoyao this is a good question, and something I have been thinking about too. The idea is to build a layer that helps integrate the column stats index with all Java-based engines such as Presto, Trino and Hive. This PR lays the foundation, since we need something like ranges or column domains to be able to filter files using min and max values. A few classes here are inspired by those in Presto, but they are not identical. Since this is just the beginning of this work, I am open to hearing others' thoughts on it. |
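To make the min/max idea concrete, here is a minimal sketch of range-based file pruning. It is not the code in this PR and not Presto's API: `ColumnStatsPruningSketch`, `Range`, `FileColumnStats` and `pruneFiles` are hypothetical names, and it assumes a single `long` column with a closed-interval predicate.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: none of these names come from Hudi or Presto.
final class ColumnStatsPruningSketch {

    /** Closed interval [low, high] that a query predicate requests for one column. */
    static final class Range {
        final long low;
        final long high;

        Range(long low, long high) {
            if (low > high) {
                throw new IllegalArgumentException("low must be <= high");
            }
            this.low = low;
            this.high = high;
        }

        /** True if a file whose column values span [min, max] may contain matching rows. */
        boolean overlaps(long min, long max) {
            return min <= high && max >= low;
        }
    }

    /** Per-file min/max for one column, as a column stats index might expose it (assumed shape). */
    static final class FileColumnStats {
        final String fileName;
        final long minValue;
        final long maxValue;

        FileColumnStats(String fileName, long minValue, long maxValue) {
            this.fileName = fileName;
            this.minValue = minValue;
            this.maxValue = maxValue;
        }
    }

    /** Keeps only files whose [min, max] overlaps the predicate range; the rest can be skipped. */
    static List<String> pruneFiles(List<FileColumnStats> stats, Range predicate) {
        List<String> candidates = new ArrayList<>();
        for (FileColumnStats s : stats) {
            if (predicate.overlaps(s.minValue, s.maxValue)) {
                candidates.add(s.fileName);
            }
        }
        return candidates;
    }

    public static void main(String[] args) {
        List<FileColumnStats> stats = Arrays.asList(
                new FileColumnStats("f1.parquet", 0, 10),
                new FileColumnStats("f2.parquet", 20, 30),
                new FileColumnStats("f3.parquet", 25, 40));
        // Predicate: column BETWEEN 22 AND 26
        System.out.println(pruneFiles(stats, new Range(22, 26)));
    }
}
```

Pruning on min/max can only rule files out: a file that overlaps the range may still contain no matching rows, so the engine still has to apply the predicate while reading.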
|
@pratyakshsharma thanks for taking the time to contribute this! We definitely want to make sure that the code integrating w/ Presto/Trino/Hive is as reusable as possible, and i think we should start thinking about it upfront to avoid the churn of refactoring things back and forth. Given the scope of this integration as well as its impact, i think we'd def go for an RFC for it to make sure we solicit feedback from the community before going too far w/ the implementation. |
|
@alexeykudinkin An epic is filed here - https://issues.apache.org/jira/browse/HUDI-4394. Please note this draft PR is intended as a POC and would work well with Presto. We were actually planning to get this into the 0.12 release. If not, we can target it for 1.0.0 |
|
Got it. Yeah, i don't think we'll be able to make it into 0.12 given that we're planning to do a code freeze next week. And again, i don't think we can go ahead with a project of this size, scope and, more importantly, impact (it'll be affecting all forthcoming execution engines like Flink, Presto, Trino, Hive, etc) w/o an RFC. |
|
Agree with you on this. Let me draft an RFC and we can take it up from there. |
…udi connector