-
Notifications
You must be signed in to change notification settings - Fork 28.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-39312][SQL] Use parquet native In predicate for in filter push down #36696
Conversation
cc @wangyum |
if (values.length <= pushDownInFilterThreshold) { | ||
values.distinct.flatMap { v => | ||
makeEq.lift(fieldType).map(_(fieldNames, v)) | ||
}.reduceLeftOption(FilterApi.or) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems a regression, For example:
In(id, 1, 3, 5, 100000)
Before this PR, the pushded predicate is: id = 1 or id = 3 or id = 5 or id = 100000,
the current pushded predicate is: id >= 1 and id <= 100000
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add a test coverage for @wangyum 's case, please, @huaxingao ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will change back to use equal Or equal if the number of values in the In
set <= pushDownInFilterThreshold
. Only use parquet In
predicate if the number of values in the In
set > pushDownInFilterThreshold
. The existing tests coverage should be good, just need to change the expected type back to equal Or equal when the number of values in the In
set <= pushDownInFilterThreshold
.
PartialFunction[ParquetSchemaType, (Array[String], Any) => FilterPredicate] = { | ||
case ParquetBooleanType => | ||
(n: Array[String], v: Any) => | ||
val values = Option(v).map(_.asInstanceOf[Array[Object]]).orNull |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If orNull
is hit, for (value <- values) { ... }
will throw NPE
ornull
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for taking a look. I don't think v
could be null, but this line is actually not needed, so I deleted.
cc @sunchao |
@wangyum is helping me testing this because he has lots of |
Please wait me several days. |
I have tested it more than 2 weeks and no data issue. |
@wangyum Thank you very much for helping me test this! |
What is the next step for us, @huaxingao ? |
@dongjoon-hyun Thanks for the ping. Give me a couple of more days. I want to check one more time before mark this ready for review. |
No problem~ Thank you for informing that, @huaxingao . Take your time. |
Could you update the |
Gentle ping, @huaxingao . |
… > pushDownInFilterThreshold
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for update. Could you check the relevant failures?
[error] Failed tests:
[error] org.apache.spark.sql.execution.datasources.parquet.ParquetV2FilterSuite
[error] org.apache.spark.sql.execution.datasources.parquet.ParquetV1FilterSuite
Could you re-trigger once more, @huaxingao ? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Merged to master. |
Thanks you all very much! |
Thank you, @huaxingao and all! |
What changes were proposed in this pull request?
Since now Parquet supports its native in predicate, we want to simplify the current In filter pushdown using Parquet's native in predicate.
Why are the changes needed?
code enhancement
Does this PR introduce any user-facing change?
No
How was this patch tested?
modify the existing tests