-
Notifications
You must be signed in to change notification settings - Fork 8.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HDFS-16550. Improper cache-size for journal node may cause cluster crash #4209
Conversation
💔 -1 overall
This message was automatically generated. |
I think this change is a bit too restrictive. There may well be valid use cases for setting it above the 90% threshold. For example if you configured a 100GB heap, you really don't need 10GB of non-cache overhead, so you could safely allocate 95GB for the cache. If we want to add fail-fast behavior, I would say it should only apply when Alternatively, you could make the 90% threshold configurable, and point users to a config they can adjust if they really want to exceed it. But I think this may be overkill. |
Thanks @xkrogen for review and comments. Maybe we can do this: What do you think of this? If there is no problem, I will update it in this way. Thanks again. |
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
@tomscut Thanks for involving me. In my case, I think this PR is unnecessary. But we can print some warning logs to prompt the admin if the set memory is too large, such as more than 90% of the heap size. But, if anyone thinks this modification is necessary, I will review it carefully later. |
Thanks @ZanderXu for the review. There are already waring logs, but they are easy to ignore. Because there is no connection between memory and cache size, it's easy to miss when updating configuration. Using proportions is the safer way. |
Yeah, either fixe value or scale value look fine to me. If you want to limit the total cache memory, you need to consider multiple namespace case. As you said, the warn log may be ignored, so you need to take some measures to prevent OOM, such as failure to initialize one namespace if currently cache memory is big enough. So for me, the current logic is enough, and the project admin needs to set a reasonable value according to the number of namespace and the total heap size of journalNode if she/he want to enable this feature. Declare again, my idea does not block this PR. If anyone thinks this modification is necessary, I will review it carefully later. |
I am -1 on the PR as-is. We have publicly exposed the current config
This still does change the default behavior slightly, since before you would get a 1GB cache and now you get |
Thank you for your comments and detailed suggestions. It is a good idea to add a new config. My initial idea was to change the parameter to fraction/ratio, but we can adjust the parameter to keep the cache in 1GB. Then if this feature is released, we add a release note to let users know that the parameters will change after the upgrade. |
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
Hi @xkrogen @ZanderXu , I updated the code according to the suggestion, please take a look. Thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall behavior LGTM, though I think the documentation could be improved.
...ct/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/qjournal/server/JournaledEditsCache.java
Outdated
Show resolved
Hide resolved
...ct/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/qjournal/server/JournaledEditsCache.java
Show resolved
Hide resolved
...adoop-hdfs/src/test/java/org/apache/hadoop/hdfs/qjournal/server/TestJournaledEditsCache.java
Outdated
Show resolved
Hide resolved
...adoop-hdfs/src/test/java/org/apache/hadoop/hdfs/qjournal/server/TestJournaledEditsCache.java
Outdated
Show resolved
Hide resolved
...adoop-hdfs/src/test/java/org/apache/hadoop/hdfs/qjournal/server/TestJournaledEditsCache.java
Outdated
Show resolved
Hide resolved
...adoop-hdfs/src/test/java/org/apache/hadoop/hdfs/qjournal/server/TestJournaledEditsCache.java
Show resolved
Hide resolved
...adoop-hdfs/src/test/java/org/apache/hadoop/hdfs/qjournal/server/TestJournaledEditsCache.java
Outdated
Show resolved
Hide resolved
hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml
Outdated
Show resolved
Hide resolved
Thanks @xkrogen for your thoughtful advice. It makes perfect sense to me. I updated the code. |
💔 -1 overall
This message was automatically generated. |
...ct/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/qjournal/server/JournaledEditsCache.java
Outdated
Show resolved
Hide resolved
LGTM. Will wait for a clean Jenkins run |
💔 -1 overall
This message was automatically generated. |
|
JIRA: HDFS-16550.
For details, please refer to the JIRA.