[SUPPORT] Spark driver memory overflows after using spark streaming for a long time #2215
Comments
@bvaradar Ok, let me verify.
Thanks @hj2016. I have updated the PR to fix this issue as well. Can you please check?
@bvaradar The Spark driver memory is no longer a problem, but the Spark executor memory has the same issue.
Supplement: the HoodieLogFileReader object may also have the same problem.
@hj2016 : Apologies. I missed seeing your comment. I have updated the PR to take care of this. Can you kindly try it? I will go ahead and merge the PR to master once the unit tests pass.
@bvaradar You missed the DiskBasedMap object.
@bvaradar I tried the modification and tested it on version 0.5.2; no problem.
@bvaradar Maybe I jumped the gun and merged the PR? :( Feel free to do a follow-on.
No worries @vinothchandar. @hj2016 : Opened a new PR #2249. Also identified one more place where this could happen and applied a similar fix.
@vinothchandar @bvaradar Hey guys, just to help you: I have the same problem when running a streaming write into a Hudi table for a while. Any updates on the solution? Which versions are affected? I currently use 0.5.3.
The fixes are available in 0.7.0, which is in the process of getting released. The release branch will be cut in a couple of days (https://lists.apache.org/thread.html/r3473e0aea28a2ffb050a650a09f93b8fabdcef0d456d71afba6c2694%40%3Cdev.hudi.apache.org%3E) Balaji.V
@bvaradar I have the same problem in 0.9.0 release |
@wxplovecc I have filed a fix here: #3951
Tips before filing an issue
Have you gone through our FAQs?
Join the mailing list to engage in conversations and get faster support at dev-subscribe@hudi.apache.org.
If you have triaged this as a bug, then file an issue directly.
Describe the problem you faced
3. Analyzing the source code of HoodieLogFormatWriter shows that a new HoodieLogFormatWriter object is constructed every time commit metadata is written. As Spark Streaming submits more and more batches, HoodieLogFormatWriter objects accumulate, and GC does not appear to be able to collect them.
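The accumulation pattern described above can be sketched minimally: a long-lived object keeps a strong reference to every writer created per commit, so even writers that are no longer used can never be garbage-collected. This is an illustrative sketch only; `WriterRegistry`, `getOrCreateWriter`, and `retainedWriters` are hypothetical names, not Hudi's actual API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of the leak pattern: a registry that lives for
// the whole Spark Streaming application and maps each commit to its writer.
class WriterRegistry {
    // Grows by one entry per micro-batch commit and is never cleared,
    // so every writer stays strongly reachable and GC cannot reclaim it.
    private final Map<String, Object> writersByCommit = new HashMap<>();

    Object getOrCreateWriter(String commitTime) {
        return writersByCommit.computeIfAbsent(commitTime, k -> new Object());
    }

    int retainedWriters() {
        return writersByCommit.size();
    }
}
```

Each new micro-batch adds an entry, so memory use grows linearly with the number of commits, which matches the slow driver/executor memory overflow reported in this issue.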
To Reproduce
Steps to reproduce the behavior:
Expected behavior
Environment Description
Hudi version : 0.5.2
Spark version : 2.4.0
Hive version : 2.1.1
Hadoop version : 3.0.0
Storage (HDFS/S3/GCS..) : HDFS
Running on Docker? (yes/no) : no
Additional context
Question 1: Has this problem been fixed in a newer version?
Question 2: The HoodieLogFormatWriter object is referenced by a global object, which prevents GC from reclaiming it. I believe that setting the reference to null after the HoodieLogFormatWriter's close() method runs would fix the problem. Would that have any other impact?
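The fix proposed in Question 2 can be sketched as a null-after-close pattern: once the writer is closed, the long-lived holder drops its strong reference so the writer becomes eligible for garbage collection. This is a hedged sketch under assumed names; `LogWriterHolder`, `setWriter`, `closeWriter`, and `isReleased` are illustrative, not Hudi's actual classes.

```java
// Illustrative holder for a long-lived writer reference (hypothetical names).
class LogWriterHolder {
    private AutoCloseable writer;  // the long-lived field that caused the leak

    void setWriter(AutoCloseable w) {
        this.writer = w;
    }

    void closeWriter() throws Exception {
        if (writer != null) {
            writer.close();
            writer = null;  // drop the reference so GC can reclaim the writer
        }
    }

    boolean isReleased() {
        return writer == null;
    }
}
```

The main risk such a change would need to guard against is a caller using the field after close; making every access go through a method that checks for null (or recreates the writer) keeps that safe.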
Stacktrace