Support for filters in the Druid Delta Lake connector #16288
Some minor nits on the docs
Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>
Partial review.
I have an overarching question, regarding the <, >, <=, >= filters.
I think it would be easier to have the type as "predicate", with a predicate field like ">" and then the LHS and RHS for the predicate, rather than a separate type for each, since that closely resembles what we want to do.
Apart from that, is there a way to write something like the following, without extra effort from the developer to convert it to its normal form?
!(col1 > col2 && col3 < col4)
Co-authored-by: Laksh Singla <lakshsingla@gmail.com>
Thanks for the review, @LakshSingla!
Yeah, as we discussed offline, the Delta Kernel API considers all the expressions, including
Yes, that should be possible! Please see the unit tests
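For illustration, a negated conjunction could be written directly as a `not` filter wrapping an `and` filter, with no manual rewrite into normal form. This is only a sketch: it assumes the field names (`type`, `filter`, `filters`, `column`, `value`) described in the Delta input source docs, the column names and values are hypothetical, and each leaf compares a column against a literal value rather than against another column as in the expression above.

```json
{
  "type": "not",
  "filter": {
    "type": "and",
    "filters": [
      { "type": ">", "column": "col1", "value": "10" },
      { "type": "<", "column": "col3", "value": "20" }
    ]
  }
}
```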
Thanks for the additional feature 🚀
* Delta Lake support for filters.
* Updates
* cleanup comments
* Docs
* Remove Enclosed runner
* Rename
* Cleanup test
* Serde test for the Delta input source and fix jackson annotation.
* Updates and docs.
* Update error messages to be clearer
* Fixes
* Handle NumberFormatException to provide a nicer error message.
* Apply suggestions from code review
* Doc fixes based on feedback
* Yes -> yes in docs; reword slightly.
* Update docs/ingestion/input-sources.md
* Update docs/ingestion/input-sources.md
* Documentation, javadoc and more updates.
* Not with an or expression end-to-end test.
* Break up =, >, >=, <, <= into its own types instead of sub-classing.
---------
Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>
Co-authored-by: Laksh Singla <lakshsingla@gmail.com>
Adds a text box for the Delta filter that can accept an optional JSON object.
This patch adds support for Delta filters in the Delta Lake input source. Before this patch, the Delta Lake connector read every Delta file in the latest snapshot, which could be inefficient. With filters, the connector translates them into Delta predicates and pushes them down to the underlying Delta Kernel, which can then perform data skipping and read only the Delta files that match the filter predicates.
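As a rough sketch of what such a filter might look like on the input source, the fragment below keeps only files that can contain rows with `age >= 20`. The `tablePath` and the `age` column are hypothetical, and the property names (`tablePath`, `filter`, `column`, `value`) are assumed from the Delta input source docs rather than taken from this description.

```json
{
  "type": "delta",
  "tablePath": "/delta-table/foo",
  "filter": {
    "type": ">=",
    "column": "age",
    "value": "20"
  }
}
```

The value is given as a string here on the assumption that the connector parses it according to the column's type.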
Description
An example of a `>=` range filter:
An example of an `AND` filter:
Core changes:
* `DeltaFilter` interface and implementations for `and`, `or`, `not`, `=`, `>`, `>=`, `<`, `<=`.
* `filter` in the `DeltaInputSource`. If supplied, the filters are translated to Delta `Predicate` that is pushed down to the Kernel for pruning out data files.
* `IcebergFilter` where the filters can be an expression tree that translates nicely to underlying Delta Predicate to be used in the Kernel.
* Updated ingestion and delta docs to show the usage of these filter objects.
Test changes
* `employee-delta-table-partitioned-name` partitioned by employee name. Added instructions on how to generate in the test README.
* `PartitionedDeltaTable`. The existing non-partitioned table is moved to `NonPartitionedDeltaTable`. The test classes can further be cleaned up by having a common `DeltaTestTable` interface. I can refactor and clean that up later when we add more test tables as needed.
* `delta/filter` package.
* `DeltaInputSourceSerdeTest` to test out jackson serialization of the input source objects.
Release note
Support for filters in the Delta Lake input source has been added. Utilize the Delta filter to prune unnecessary Delta files and read only data that matches the filter predicates. Please refer to the documentation for instructions on how to use filters during ingestion.
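A compound filter could be sketched along the same lines. The fragment below combines two comparisons with `and`. The table path, the `age` and `name` columns, and the values are hypothetical, and the `filters` list property is assumed from the Delta input source docs.

```json
{
  "type": "delta",
  "tablePath": "/delta-table/foo",
  "filter": {
    "type": "and",
    "filters": [
      { "type": "<=", "column": "age", "value": "40" },
      { "type": ">=", "column": "name", "value": "Employee4" }
    ]
  }
}
```

Filtering on a partition column such as `name` is where pruning should help most, since whole files can be skipped from their partition values alone.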
Key changed/added classes in this PR
* `DeltaFilter` interface and its implementations
* `DeltaInputSource`
This PR has: