Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HDFS-16209. Add description for dfs.namenode.caching.enabled #3378

Merged
merged 4 commits into from Sep 8, 2021

Conversation

tomscut
Copy link
Contributor

@tomscut tomscut commented Sep 3, 2021

JIRA: HDFS-16209

Namenode config:
dfs.namenode.write-lock-reporting-threshold-ms=50ms
dfs.namenode.caching.enabled=true (default)

In fact, the caching feature is not used in our cluster, but this switch is turned on by default(dfs.namenode.caching.enabled=true), incurring some additional write lock overhead.

We count the number of write lock warnings in a log file, and find that the number of rescan cache warnings reaches about 32%, which greatly affects the performance of Namenode.
namenode-write-lock

We should set 'dfs.namenode.caching.enabled' to false by default and turn it on when we wants to use it.

@tomscut
Copy link
Contributor Author

tomscut commented Sep 3, 2021

@tasanuma @jojochuang @Hexiaoqiao @ferhui Please help review the change. Thanks a lot.

@tomscut
Copy link
Contributor Author

tomscut commented Sep 3, 2021

Hi @ayushtkn , could you please also take a look. Thank you.

@ayushtkn
Copy link
Member

ayushtkn commented Sep 3, 2021

HDFS-13820, added this configuration to disable the feature, But still it was made to true by default, guess due to compatibility reasons.
Folks using the Cache feature would get impacted with this change, right? they have to now enable this explicitly. There was a proposal on on HDFS-13820

Please implement a way to disable the CacheReplicationMonitor class if there are no paths specified. Adding the first cached path to the NameNode should kick off the CacheReplicationMonitor and when the last one is deleted, the CacheReplicationMonitor should be disabled again.

Is something like this possible?

@tomscut
Copy link
Contributor Author

tomscut commented Sep 3, 2021

HDFS-13820, added this configuration to disable the feature, But still it was made to true by default, guess due to compatibility reasons.
Folks using the Cache feature would get impacted with this change, right? they have to now enable this explicitly. There was a proposal on on HDFS-13820

Please implement a way to disable the CacheReplicationMonitor class if there are no paths specified. Adding the first cached path to the NameNode should kick off the CacheReplicationMonitor and when the last one is deleted, the CacheReplicationMonitor should be disabled again.

Is something like this possible?

Thanks @ayushtkn for your comments.

I have also seen HDFS-13820. But that feature(auto enable or auto disable) is not currently implemented. For new users who may not know this feature(Centralized Cache Management) exists, but it already runs quietly in the background, which incurs performance overhead.

IMO, if we need to use this feature, it makes sense to turn it on and specify the path. What do you think?

@hadoop-yetus
Copy link

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 12m 51s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 markdownlint 0m 0s markdownlint was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
-1 ❌ test4tests 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
_ trunk Compile Tests _
+1 💚 mvninstall 30m 59s trunk passed
+1 💚 compile 1m 22s trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04
+1 💚 compile 1m 18s trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
+1 💚 checkstyle 1m 2s trunk passed
+1 💚 mvnsite 1m 27s trunk passed
+1 💚 javadoc 0m 58s trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04
+1 💚 javadoc 1m 27s trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
+1 💚 spotbugs 3m 11s trunk passed
+1 💚 shadedclient 16m 31s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 1m 9s the patch passed
+1 💚 compile 1m 14s the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04
+1 💚 javac 1m 14s the patch passed
+1 💚 compile 1m 5s the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
+1 💚 javac 1m 5s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 0m 53s the patch passed
+1 💚 mvnsite 1m 13s the patch passed
+1 💚 xml 0m 1s The patch has no ill-formed XML file.
+1 💚 javadoc 0m 48s the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04
+1 💚 javadoc 1m 18s the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
+1 💚 spotbugs 3m 7s the patch passed
+1 💚 shadedclient 16m 8s patch has no errors when building and testing our client artifacts.
_ Other Tests _
-1 ❌ unit 249m 5s /patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt hadoop-hdfs in the patch passed.
+1 💚 asflicense 0m 45s The patch does not generate ASF License warnings.
345m 57s
Reason Tests
Failed junit tests hadoop.fs.TestEnhancedByteBufferAccess
hadoop.hdfs.server.datanode.fsdataset.impl.TestPmemCacheRecovery
hadoop.hdfs.server.datanode.fsdataset.impl.TestCacheByPmemMappableBlockLoader
hadoop.hdfs.server.datanode.TestFsDatasetCacheRevocation
hadoop.hdfs.server.datanode.fsdataset.impl.TestFsDatasetCache
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3378/1/artifact/out/Dockerfile
GITHUB PR #3378
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell xml markdownlint
uname Linux 8f8fcb9e3d81 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 9a6e363
Default Java Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3378/1/testReport/
Max. process+thread count 3095 (vs. ulimit of 5500)
modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3378/1/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0-SNAPSHOT https://yetus.apache.org

This message was automatically generated.

@ferhui
Copy link
Contributor

ferhui commented Sep 3, 2021

As @ayushtkn said, facing the same problem, HDFS-13820 add ability to disable the feature, you can also set it false.
If you change the default value, it's an incompatible change, especially for upgrading(using this feature). Seem that it's not so good.

@tomscut
Copy link
Contributor Author

tomscut commented Sep 3, 2021

As @ayushtkn said, facing the same problem, HDFS-13820 add ability to disable the feature, you can also set it false.
If you change the default value, it's an incompatible change, especially for upgrading(using this feature). Seem that it's not so good.

Thanks @ferhui for your comments.

Maybe we can add a release note for this change. For new users who may not know this feature(Centralized Cache Management) exists, but it already runs quietly in the background. I think it's not a very elegant way.

@virajjasani
Copy link
Contributor

Thanks for the reference and find @tomscut !
I agree to the general opinion of not making incompatible change until absolutely required, and on the other hand, this change looks quite obvious given that write lock overhead is redundant for non-Cache use-cases. If we want to use the feature, enable the config, that sounds right.

However, I believe, as of now, we should provide one fat warning log at appropriate place stating that "please disable this config unless you are using Cache feature and we are going to disable this config by default in 4.0.0 and above releases". And we might also want to reference this Jira for perf degradation case. Thoughts?

Overall, perhaps we might want to wait at least one more major release before disabling this by default rather than making incompatible change on 3.x releases.

@tomscut
Copy link
Contributor Author

tomscut commented Sep 6, 2021

@ayushtkn @ferhui @virajjasani Thank you very much for your comments and suggestions. I think what you are saying is reasonable, we should not change the default value of this parameter. But we can add a caption, as @virajjasani said.

I have changed the parameter description information, please have a look. Thanks again.

@ferhui
Copy link
Contributor

ferhui commented Sep 6, 2021

@tomscut @virajjasani, Thanks. I think It's a good way to add description here.
BTW, change the title of JIRA and PR.

@tomscut tomscut changed the title HDFS-16209. Set dfs.namenode.caching.enabled to false as default HDFS-16209. Add description for dfs.namenode.caching.enabled Sep 6, 2021
@tomscut
Copy link
Contributor Author

tomscut commented Sep 6, 2021

@tomscut @virajjasani, Thanks. I think It's a good way to add description here.
BTW, change the title of JIRA and PR.

Thanks @ferhui for your reply. I changed the title of JIRA and PR.

@virajjasani
Copy link
Contributor

Thanks @tomscut, just one more thing. If you could add a TODO comment in DFSConfigKeys on top of DFS_NAMENODE_CACHING_ENABLED_DEFAULT = true; line stating that TODO: Default value to be set false in 4.0.0 release onwards (HDFS-16209), that would be really great.

@tomscut
Copy link
Contributor Author

tomscut commented Sep 6, 2021

Thanks @tomscut, just one more thing. If you could add a TODO comment in DFSConfigKeys on top of DFS_NAMENODE_CACHING_ENABLED_DEFAULT = true; line stating that TODO: Default value to be set false in 4.0.0 release onwards (HDFS-16209), that would be really great.

Thanks @virajjasani for your suggestion, I added the todo description for it.

@tomscut
Copy link
Contributor Author

tomscut commented Sep 6, 2021

Hi @ayushtkn @ferhui , this PR seems not to trigger compilation, can you help trigger it. Thanks a lot.

Copy link
Contributor

@virajjasani virajjasani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 (non-binding)

@hadoop-yetus
Copy link

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 1m 5s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 markdownlint 0m 0s markdownlint was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
-1 ❌ test4tests 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
_ trunk Compile Tests _
+1 💚 mvninstall 31m 56s trunk passed
+1 💚 compile 1m 27s trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04
+1 💚 compile 1m 20s trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
+1 💚 checkstyle 1m 0s trunk passed
+1 💚 mvnsite 1m 31s trunk passed
+1 💚 javadoc 1m 0s trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04
+1 💚 javadoc 1m 28s trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
+1 💚 spotbugs 3m 7s trunk passed
+1 💚 shadedclient 16m 29s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 1m 15s the patch passed
+1 💚 compile 1m 17s the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04
+1 💚 javac 1m 17s the patch passed
+1 💚 compile 1m 7s the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
+1 💚 javac 1m 7s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 0m 50s the patch passed
+1 💚 mvnsite 1m 12s the patch passed
+1 💚 xml 0m 1s The patch has no ill-formed XML file.
+1 💚 javadoc 0m 46s the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04
+1 💚 javadoc 1m 17s the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
+1 💚 spotbugs 3m 13s the patch passed
+1 💚 shadedclient 18m 30s patch has no errors when building and testing our client artifacts.
_ Other Tests _
-1 ❌ unit 236m 58s /patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt hadoop-hdfs in the patch passed.
+1 💚 asflicense 0m 46s The patch does not generate ASF License warnings.
325m 34s
Reason Tests
Failed junit tests hadoop.hdfs.server.balancer.TestBalancerWithHANameNodes
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3378/2/artifact/out/Dockerfile
GITHUB PR #3378
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell xml markdownlint
uname Linux bb662cbf55f8 4.15.0-136-generic #140-Ubuntu SMP Thu Jan 28 05:20:47 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 1c453e2
Default Java Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3378/2/testReport/
Max. process+thread count 3344 (vs. ulimit of 5500)
modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3378/2/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0-SNAPSHOT https://yetus.apache.org

This message was automatically generated.

@tomscut
Copy link
Contributor Author

tomscut commented Sep 7, 2021

@ayushtkn @ferhui Could you please review again? Thanks.

Copy link
Contributor

@ferhui ferhui left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Copy link
Member

@aajisaka aajisaka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@ferhui ferhui merged commit 5a30695 into apache:trunk Sep 8, 2021
@ferhui
Copy link
Contributor

ferhui commented Sep 8, 2021

@tomscut Thanks for contribution. @ayushtkn @virajjasani @aajisaka @tasanuma Thanks for review! Merged to trunk.

@tomscut
Copy link
Contributor Author

tomscut commented Sep 8, 2021

Thanks @ferhui @ayushtkn @tasanuma @virajjasani @aajisaka for your review and merge.

kiran-maturi pushed a commit to kiran-maturi/hadoop that referenced this pull request Nov 24, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
7 participants