[HUDI-1748] Read operation may fail on MOR table RT view when a write operation is running concurrently #2751
li36909 wants to merge 1 commit into apache:master
cc @nsivabalan, could you help take a look? Thank you.
Codecov Report
@@              Coverage Diff               @@
##             master    #2751        +/-   ##
==============================================
+ Coverage     52.04%   69.73%    +17.69%
+ Complexity     3625      371      -3254
==============================================
  Files           479       54       -425
  Lines         22804     1989     -20815
  Branches       2415      236      -2179
==============================================
- Hits          11868     1387     -10481
+ Misses         9911      471      -9440
+ Partials       1025      131       -894
Flags with carried forward coverage won't be shown.
@nsivabalan I ran the test on Hudi 0.7. Yes, you are right: I started a spark-shell for upserting and queried the same table through the Spark datasource API, and then the problem arose. The cause is clear: during the query, Hudi gets the partitions in MergeOnReadSnapshotRelation and then builds a new file system view in HoodieRealtimeInputFormatUtils.groupLogsByBaseFile. If a write operation happens in between, HoodieRealtimeInputFormatUtils.groupLogsByBaseFile will find some new base files.
Step 1: start a spark-shell for upserting: import org.apache.hudi.QuickstartUtils. val tableName = "hudi_mor_table"
Step 2: run a query in a new spark-shell (when the query hangs at Thread.sleep, start writing a new batch as in step 3).
Step 3: go back to the spark-shell from step 1 and write a new batch.
We can then see that step 2 throws an exception.
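The race described above can be sketched with a toy model. This is only an illustration in plain Java: `Table`, `BaseFile`, `latestView`, `viewAsOf`, and `unplanned` are hypothetical names, not Hudi classes or APIs, and commit times are compared as strings on the assumption that they sort lexicographically.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the reported race. None of these types are Hudi classes.
class SnapshotRaceSketch {

    // A base file identified by its partition and the commit instant that wrote it.
    static final class BaseFile {
        final String partition;
        final String commitTime;

        BaseFile(String partition, String commitTime) {
            this.partition = partition;
            this.commitTime = commitTime;
        }
    }

    // Stand-in for table storage that a writer keeps appending to.
    static final class Table {
        private final List<BaseFile> files = new ArrayList<>();

        void commit(String partition, String commitTime) {
            files.add(new BaseFile(partition, commitTime));
        }

        // A fresh view: every file visible at the moment of the call.
        List<BaseFile> latestView() {
            return new ArrayList<>(files);
        }

        // A pinned view: only files written at or before the given instant.
        List<BaseFile> viewAsOf(String instant) {
            List<BaseFile> visible = new ArrayList<>();
            for (BaseFile f : files) {
                if (f.commitTime.compareTo(instant) <= 0) {
                    visible.add(f);
                }
            }
            return visible;
        }
    }

    // Files present in a later view that the query never planned for.
    // Views share BaseFile instances, so identity-based contains() is enough here.
    static List<BaseFile> unplanned(List<BaseFile> planned, List<BaseFile> later) {
        List<BaseFile> extra = new ArrayList<>();
        for (BaseFile f : later) {
            if (!planned.contains(f)) {
                extra.add(f);
            }
        }
        return extra;
    }

    public static void main(String[] args) {
        Table table = new Table();
        table.commit("2021/03/01", "001");

        // Reader, phase 1: plan the query from the files visible now.
        List<BaseFile> planned = table.latestView();

        // Writer commits a new batch while the reader is still running.
        table.commit("2021/03/02", "002");

        // Reader, phase 2: a second fresh view sees a file the plan does not know about.
        System.out.println("fresh view, unplanned files: "
                + unplanned(planned, table.latestView()).size());

        // Pinning the second view to the planning instant keeps both phases consistent.
        System.out.println("pinned view, unplanned files: "
                + unplanned(planned, table.viewAsOf("001")).size());
    }
}
```

In this toy model the fresh second view reports one unplanned file and the pinned view reports none. Whether the same reasoning applies to the real Hudi code path is a separate question that the model does not settle.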
@li36909 If I understand this correctly, you are saying reading a MOR table

@n3nash yes

Okay, thanks for the confirmation. I will try to reproduce this issue on my end and get back.

cc @n3nash, could you help take a look again? Thank you.

@n3nash: once you verify the issue, please move it to the "ready to review" column in the PR board.

@alexeykudinkin: this PR has been pending for a long time. Can you follow up on this, please? I believe there are steps to reproduce. If it's not a valid issue, we can close the patch and the JIRA.

@alexeykudinkin Is this already fixed by the new input format classes?

@li36909 can you please rebase this on the latest master and validate whether your fix is still valid?

@li36909: gentle ping. Feel free to close if it's not an issue anymore.
bvaradar left a comment
The change would skip file groups that have no base parquet file (the case where log files can be indexed). Also, the file system view should read only committed files. So, unless a finished commit is explicitly reverted (which is not snapshot isolation), this issue should not be seen.
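The two points in this comment can be illustrated with a small sketch. It is a toy in plain Java, not the change in this PR: `FileSlice`, `keepWithBaseFile`, and `keepCommitted` are hypothetical names, and the file names and instants are made up for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy illustration of the review comment. These types are hypothetical, not Hudi classes.
class FileSliceFilterSketch {

    static final class FileSlice {
        final String fileId;
        final String instant;        // commit instant that produced this slice
        final String baseFile;       // null when the slice consists of log files only
        final List<String> logFiles;

        FileSlice(String fileId, String instant, String baseFile, List<String> logFiles) {
            this.fileId = fileId;
            this.instant = instant;
            this.baseFile = baseFile;
            this.logFiles = logFiles;
        }
    }

    // Filtering on "has a base file" silently drops log-only slices.
    static List<FileSlice> keepWithBaseFile(List<FileSlice> slices) {
        List<FileSlice> kept = new ArrayList<>();
        for (FileSlice s : slices) {
            if (s.baseFile != null) {
                kept.add(s);
            }
        }
        return kept;
    }

    // Filtering on commit status keeps log-only slices and hides uncommitted ones.
    static List<FileSlice> keepCommitted(List<FileSlice> slices, Set<String> committedInstants) {
        List<FileSlice> kept = new ArrayList<>();
        for (FileSlice s : slices) {
            if (committedInstants.contains(s.instant)) {
                kept.add(s);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<FileSlice> slices = Arrays.asList(
                new FileSlice("f1", "001", "f1_001.parquet", Arrays.asList(".f1_001.log.1")),
                new FileSlice("f2", "001", null, Arrays.asList(".f2_001.log.1")),      // log-only, committed
                new FileSlice("f3", "002", "f3_002.parquet", new ArrayList<String>())); // write still in flight
        Set<String> committed = new HashSet<>(Arrays.asList("001"));

        for (FileSlice s : keepWithBaseFile(slices)) {
            System.out.println("base-file filter keeps: " + s.fileId);
        }
        for (FileSlice s : keepCommitted(slices, committed)) {
            System.out.println("committed filter keeps: " + s.fileId);
        }
    }
}
```

Here the base-file filter keeps f1 and f3, losing the committed log-only slice f2 while still exposing the in-flight f3. The committed-instant filter keeps f1 and f2.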
Closing this PR for the reasons above.
What is the purpose of the pull request
Solve a read/write concurrency bug on the MOR table RT view.
Brief change log
Verify this pull request
Probabilistic concurrency problems are difficult to test in unit tests. I added a sleep to the getRealtimeSplits method to reproduce the problem reliably and verify the fix.
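As an aside on the testing difficulty: one common alternative to inserting a sleep is to force the interleaving with latches, so the writer always commits between the reader's two listing steps. The sketch below is plain Java over a toy file list. It is not the test in this PR and does not call getRealtimeSplits or any Hudi API; the class and file names are hypothetical.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;

// Toy sketch: deterministic reader/writer interleaving without Thread.sleep.
class DeterministicInterleavingSketch {

    // Returns the file counts seen by the reader's first and second listing.
    static int[] run() {
        List<String> files = new CopyOnWriteArrayList<>();
        files.add("base-001");

        CountDownLatch firstListingDone = new CountDownLatch(1);
        CountDownLatch writeDone = new CountDownLatch(1);
        int[] sizes = new int[2];

        Thread reader = new Thread(() -> {
            sizes[0] = files.size();          // first listing (planning)
            firstListingDone.countDown();     // let the writer proceed
            try {
                writeDone.await();            // wait until the writer has committed
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            sizes[1] = files.size();          // second listing (after the write)
        });

        Thread writer = new Thread(() -> {
            try {
                firstListingDone.await();     // commit only after the first listing
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            files.add("base-002");
            writeDone.countDown();
        });

        reader.start();
        writer.start();
        try {
            reader.join();
            writer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for threads", e);
        }
        return sizes;
    }

    public static void main(String[] args) {
        int[] sizes = run();
        System.out.println("first listing: " + sizes[0] + ", second listing: " + sizes[1]);
    }
}
```

Because each thread blocks on a latch instead of a timer, the reader sees one file in its first listing and two in its second on every run, which makes the interleaving repeatable.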
Committer checklist
Has a corresponding JIRA in PR title & commit
Commit message is descriptive of the change
CI is green
Necessary doc changes done or have another open PR
For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.