[HUDI-5989] Fix date conversion issue when performing partition pruning on Spark #8298
Co-authored-by: Rex An <bonean131@gmail.com>
@boneanxs for visibility.
@voonhous Can you also attach the file system view log showing what the partition looks like with this fix?
...-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/SparkHoodieTableFileIndex.scala
// TLDR:
// execution order of [B] = pass
// execution order of [B, A] = pass
// execution order of [A] = fail
// execution order of [A, B] = fail
Remove this comment and add it inline where each part, or each combination of execution orders, is being tested.
Sorry, I will remove these; they are just comments that were used to aid debugging.
Leaving them in the final PR would only confuse other engineers working on this component.
I will also simplify the tests so that they only cover the paths that trigger the error.
After the fix, it should look like this:
@codope CI is green. Can you please help review this? Thank you!
Overall looks good. Can we add a test for a custom timezone, since the DataFormat is related?
Done. I am not sure whether what I am doing is correct. CMIIW, changing the timezone should have no impact on the date conversion here.
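To illustrate the timezone point, here is a minimal sketch that uses only `java.time`. It is not the code from this PR: `TimeZoneIndependenceSketch` and `formatUnderZone` are hypothetical names, and the sketch assumes the conversion goes through `LocalDate.ofEpochDay`, which works on a plain day count and does not consult the JVM default timezone. A conversion routed through `java.util.Date` or `SimpleDateFormat` could behave differently.

```scala
import java.time.LocalDate
import java.util.TimeZone

// Hypothetical sketch, not Hudi code: checks that rendering an epoch-day
// value as a date string does not depend on the JVM default timezone.
object TimeZoneIndependenceSketch {

  // Temporarily switch the default timezone, format the day count, then
  // restore the previous default so other code is unaffected.
  def formatUnderZone(days: Int, zoneId: String): String = {
    val original = TimeZone.getDefault
    try {
      TimeZone.setDefault(TimeZone.getTimeZone(zoneId))
      // LocalDate has no timezone component; it only counts calendar days.
      LocalDate.ofEpochDay(days.toLong).toString
    } finally {
      TimeZone.setDefault(original)
    }
  }
}
```

Under that assumption, the same day count yields the same string in every zone, which is what a custom-timezone test would be expected to confirm.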
@danny0405 CI is green. Can we merge this in? Another PR, #8402, has been raised that fixes another issue around this code. Can we merge this in first so that there are no merge conflicts moving forward?
…ng on Spark (apache#8298)

When lazy fetching of partition paths and file slices is used for HoodieFileIndex, dates cannot be converted to the correct string representation. This is because Spark stores dates as integer values representing the number of days that have passed since 1970-01-01. When the partition path is rebuilt, the FS path can be rebuilt incorrectly, leaving a partition empty and therefore making the query result empty or incorrect.

Co-authored-by: Rex An <bonean131@gmail.com>
Change Logs
When lazy fetching of partition paths and file slices is used for HoodieFileIndex, dates cannot be converted to the correct string representation.
This is because Spark stores dates as integer values representing the number of days that have passed since 1970-01-01.
When the partition path is rebuilt, the FS path can be rebuilt incorrectly, leaving a partition empty and therefore making the query result empty or incorrect.
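As a rough illustration of the day-count representation described above, the following sketch converts an epoch-day integer into a `yyyy-MM-dd` partition value. It is not the change made in this PR: the object name, the helper names, and the `dt` column are hypothetical, and the "naive" variant shows only one way a rebuilt path could end up pointing at a directory that does not exist.

```scala
import java.time.LocalDate

// Hypothetical sketch, not Hudi code: shows why a date stored as
// "days since 1970-01-01" must be converted before it is used in a path.
object EpochDayPartitionSketch {

  // Render a day count as an ISO-8601 date string (yyyy-MM-dd).
  def epochDayToPartitionValue(days: Int): String =
    LocalDate.ofEpochDay(days.toLong).toString

  // Assumed failure mode: the raw integer is written into the path as-is.
  def naivePartitionPath(column: String, days: Int): String =
    s"$column=$days"

  // Path built from the converted, human-readable date value.
  def convertedPartitionPath(column: String, days: Int): String =
    s"$column=${epochDayToPartitionValue(days)}"
}
```

For example, with these helpers the day count 19447 becomes `dt=2023-03-31` when converted but `dt=19447` when used raw, and a lookup against the latter would find no files.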
Impact
None
Risk level (write none, low, medium or high below)
None
Documentation Update
Describe any necessary documentation update if there is any new feature, config, or user-facing change
Contributor's checklist