[HUDI-7606] Unpersist RDDs after table services, mainly compaction and clustering #11000
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/BaseHoodieWriteClient.java
@@ -268,6 +268,7 @@ public boolean commitStats(String instantTime, HoodieData<WriteStatus> writeStat
commitCallback.call(new HoodieWriteCommitCallbackMessage(
instantTime, config.getTableName(), config.getBasePath(), stats, Option.of(commitActionType), extraMetadata));
}
releaseResources(instantTime);
This code path covers all writers, correct? Both the Spark DS writer and deltastreamer?
Yes. Since SparkRDDWriteClient implements the method and is used by both the Spark DS writer and deltastreamer, we should be good.
I guess we are missing async compaction flows.
hudi/hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/HoodieSparkCompactor.java
Line 61 in 17ea14a
writeClient.commitCompaction(instant.getTimestamp(), compactionMetadata.getCommitMetadata().get(), Option.empty());
->
hudi/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/BaseHoodieWriteClient.java
Line 1045 in 17ea14a
public void commitCompaction(String compactionInstantTime, HoodieCommitMetadata metadata,
->
Line 322 in 17ea14a
protected void completeCompaction(HoodieCommitMetadata metadata, HoodieTable table, String compactionCommitTime) {
I don't think we are unpersisting RDDs in this flow.
The same likely applies to async clustering.
Fixed it for all table services, for both inline and async flows.
good catch btw
…d clustering (#11000) --------- Co-authored-by: rmahindra123 <rmahindra@Rajeshs-MacBook-Pro.local>
Change Logs
Unpersist RDDs after table services. Currently, releaseResources is called only for commits and deltacommits. Tests show that the RDDs persisted by compaction and clustering are never explicitly unpersisted.
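The intent of the change can be illustrated with a small, self-contained Java sketch. It is not Hudi's implementation: the class `ReleaseResourcesSketch`, the per-instant registry, and the `persist`, `completeClustering`, and `persistedCount` helpers are hypothetical stand-ins, and plain integer ids take the place of persisted RDDs. Only the method names `commitStats`, `completeCompaction`, and `releaseResources` are borrowed from the discussion above, with simplified signatures.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a write client; NOT Hudi's actual code.
public class ReleaseResourcesSketch {
    // Ids of cached datasets per instant time (stand-in for persisted RDDs).
    private final Map<String, List<Integer>> persistedByInstant = new HashMap<>();
    private int nextId = 0;

    // Simulates persisting an intermediate dataset while working on an instant.
    public int persist(String instantTime) {
        int id = nextId++;
        persistedByInstant.computeIfAbsent(instantTime, k -> new ArrayList<>()).add(id);
        return id;
    }

    // Drops everything cached for the instant; calling it twice is harmless.
    public void releaseResources(String instantTime) {
        persistedByInstant.remove(instantTime);
    }

    // Regular commit/deltacommit path: releases once the commit work is done.
    public void commitStats(String instantTime) {
        // ... write commit metadata, invoke commit callbacks ...
        releaseResources(instantTime);
    }

    // Table-service completion paths: the sketch adds the same release call here,
    // so inline and async callers both clean up.
    public void completeCompaction(String compactionInstantTime) {
        // ... finalize the compaction commit ...
        releaseResources(compactionInstantTime);
    }

    public void completeClustering(String clusteringInstantTime) {
        // ... finalize the replace commit ...
        releaseResources(clusteringInstantTime);
    }

    // Number of datasets still cached across all instants.
    public int persistedCount() {
        return persistedByInstant.values().stream().mapToInt(List::size).sum();
    }

    public static void main(String[] args) {
        ReleaseResourcesSketch client = new ReleaseResourcesSketch();
        client.persist("001");            // cached during compaction
        client.persist("002");            // cached during clustering
        client.completeCompaction("001");
        client.completeClustering("002");
        System.out.println("still persisted: " + client.persistedCount());
    }
}
```

Placing the release call in the shared completion methods, rather than in each caller, is what lets one change cover both inline and async flows in this sketch. A check like `persistedCount()` mirrors the kind of test mentioned above: after a table service finishes, nothing cached for its instant should remain.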
Impact
Cleans up persisted RDDs without leaving any residue.
Risk level (write none, low medium or high below)
Low
Documentation Update
Describe any necessary documentation update if there is any new feature, config, or user-facing change. If not, put "none".