[HUDI-5289] Fix writeStatus RDD recalculated in cluster #7373
Conversation
When I put up the fix, I verified that clustering was not triggered multiple times. Can you confirm that you are seeing clustering being triggered twice?
If you keep inspecting the data directory, you will find additional data files only during validation.
Another way to test this: marker-directory reconciliation will delete some additional files if the DAG was triggered twice. If not, you should not see any marker file deletions. A minimal file-counting sketch follows below.
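To make the first check concrete, here is a small sketch (not part of the PR; the table path, partition layout, and base-file extension are assumptions) that counts base files under a partition. Running it while clustering is in flight and again after commit should show extra files only if the DAG was executed twice:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class DataFileCounter {
  public static void main(String[] args) throws Exception {
    // Hypothetical partition path; point this at your Hudi table's partition.
    Path partitionPath = new Path("file:///tmp/hudi_table/2022/12/01");
    FileSystem fs = partitionPath.getFileSystem(new Configuration());
    long baseFiles = 0;
    for (FileStatus status : fs.listStatus(partitionPath)) {
      // Assumes Parquet base files; adjust for your table's base file format.
      if (status.getPath().getName().endsWith(".parquet")) {
        baseFiles++;
      }
    }
    System.out.println("Base files under partition: " + baseFiles);
  }
}
```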
Force-pushed from c762aa3 to a683b28
@nsivabalan Yes, I confirm, and @boneanxs found it too, so I think we should persist it.
I found there was an issue when I first implemented it. But I'm thinking whether we can do …
```diff
@@ -122,6 +123,8 @@ public HoodieWriteMetadata<HoodieData<WriteStatus>> performClustering(final Hood
         .stream();
     JavaRDD<WriteStatus>[] writeStatuses = convertStreamToArray(writeStatusesStream.map(HoodieJavaRDD::getJavaRDD));
     JavaRDD<WriteStatus> writeStatusRDD = engineContext.union(writeStatuses);
+    // Persist writeStatus, since it may be reused
+    writeStatusRDD.persist(StorageLevel.MEMORY_AND_DISK());
```
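To illustrate what the added persist buys, here is a self-contained sketch in plain Spark (not Hudi types; all names here are illustrative). Without persistence, each action re-evaluates the RDD's lineage, so upstream work runs again; with MEMORY_AND_DISK persistence the partitions are computed once:

```java
import java.util.Arrays;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.storage.StorageLevel;
import org.apache.spark.util.LongAccumulator;

public class PersistDemo {
  public static void main(String[] args) {
    JavaSparkContext jsc = new JavaSparkContext(
        new SparkConf().setMaster("local[2]").setAppName("persist-demo"));
    LongAccumulator evals = jsc.sc().longAccumulator("evaluations");

    // Stand-in for the expensive clustering write: count each evaluation.
    JavaRDD<Integer> writeStatusRDD = jsc.parallelize(Arrays.asList(1, 2, 3))
        .map(x -> { evals.add(1); return x; })
        .persist(StorageLevel.MEMORY_AND_DISK());

    writeStatusRDD.count(); // first action computes and caches the partitions
    writeStatusRDD.count(); // second action is served from the cache
    System.out.println("map() evaluations: " + evals.value()); // 3, not 6
    jsc.stop();
  }
}
```

Without the `.persist(...)` call, the accumulator would read 6 after the two actions, which is exactly the double-write behavior this PR reports.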
We can't rely on the cache, since cached data can be invalidated, and we have had experiences where the actual files written did not match what's in writeStatus. Hence, if something fails, we let the DAG get re-triggered here. We added guard rails in BaseCommitActionExecutor by means of cloning.
CC @alexeykudinkin
Are you saying that even on the successful path the DAG is getting triggered twice, or does it happen only on the exception path?
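For context on the cache-invalidation concern: a common Spark remedy when cached blocks might be evicted is reliable checkpointing, which materializes the RDD to stable storage and truncates its lineage. This is only a sketch of that general technique, not something this PR adds; the checkpoint directory is an assumption:

```java
import java.util.Arrays;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.storage.StorageLevel;

public class CheckpointSketch {
  public static void main(String[] args) {
    JavaSparkContext jsc = new JavaSparkContext(
        new SparkConf().setMaster("local[2]").setAppName("checkpoint-sketch"));
    // Hypothetical directory; use a durable (e.g. HDFS) path in production.
    jsc.setCheckpointDir("/tmp/spark-checkpoints");

    JavaRDD<Integer> rdd = jsc.parallelize(Arrays.asList(1, 2, 3))
        .persist(StorageLevel.MEMORY_AND_DISK());
    rdd.checkpoint(); // materialized to stable storage on the next action
    rdd.count();      // triggers computation and the checkpoint write
    rdd.count();      // served from the checkpoint even if the cache is evicted
    jsc.stop();
  }
}
```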
@nsivabalan …
Change Logs
Fix https://issues.apache.org/jira/projects/HUDI/issues/HUDI-5289
There is an earlier patch addressing a similar problem, but even with that patch applied, this issue still exists.
Impact
Could improve clustering performance, since the writeStatus RDD is no longer recomputed when it is reused.
Risk level (write none, low, medium, or high below)
low
Documentation Update
None
Contributor's checklist