Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HDFS-17285. RBF: Add a safe mode check period configuration #6347

Merged
merged 2 commits into from
Dec 21, 2023

Conversation

LiuGuH
Copy link
Contributor

@LiuGuH LiuGuH commented Dec 12, 2023

JIRA: HDFS-17285. RBF: Add a safe mode check period configuration.

When dfsrouter start, it enters safe mode. And it will cost 1min to leave.

The log is blow:

14:35:23,717 INFO org.apache.hadoop.hdfs.server.federation.router.RouterSafemodeService: Leave startup safe mode after 30000 ms
14:35:23,717 INFO org.apache.hadoop.hdfs.server.federation.router.RouterSafemodeService: Enter safe mode after 180000 ms without reaching the State Store
14:35:23,717 INFO org.apache.hadoop.hdfs.server.federation.router.RouterSafemodeService: Entering safe mode
14:35:24,996 INFO org.apache.hadoop.hdfs.server.federation.router.RouterSafemodeService: Delaying safemode exit for 28721 milliseconds...
14:36:25,037 INFO org.apache.hadoop.hdfs.server.federation.router.RouterSafemodeService: Leaving safe mode after 61319 milliseconds

It depends on these configs.
DFS_ROUTER_SAFEMODE_EXTENSION 30s
DFS_ROUTER_SAFEMODE_EXPIRATION 3min
DFS_ROUTER_CACHE_TIME_TO_LIVE_MS 1min (it is the period for check safe mode)

Because in safe mode dfsrouter will reject write requests, so it should be shorter in check period if refreshCaches is done. And we should remove DFS_ROUTER_CACHE_TIME_TO_LIVE_MS from RouterSafemodeService.

@hadoop-yetus
Copy link

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 52m 16s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+0 🆗 xmllint 0m 1s xmllint was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 1 new or modified test files.
_ trunk Compile Tests _
+1 💚 mvninstall 49m 46s trunk passed
+1 💚 compile 0m 40s trunk passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04
+1 💚 compile 0m 35s trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08
+1 💚 checkstyle 0m 30s trunk passed
+1 💚 mvnsite 0m 40s trunk passed
+1 💚 javadoc 0m 41s trunk passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04
+1 💚 javadoc 0m 30s trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08
+1 💚 spotbugs 1m 22s trunk passed
+1 💚 shadedclient 37m 49s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 0m 30s the patch passed
+1 💚 compile 0m 32s the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04
+1 💚 javac 0m 32s the patch passed
+1 💚 compile 0m 28s the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08
+1 💚 javac 0m 28s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 0m 19s the patch passed
+1 💚 mvnsite 0m 31s the patch passed
+1 💚 javadoc 0m 28s the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04
+1 💚 javadoc 0m 23s the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08
+1 💚 spotbugs 1m 21s the patch passed
+1 💚 shadedclient 38m 1s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 22m 53s hadoop-hdfs-rbf in the patch passed.
+1 💚 asflicense 0m 36s The patch does not generate ASF License warnings.
215m 7s
Subsystem Report/Notes
Docker ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6347/1/artifact/out/Dockerfile
GITHUB PR #6347
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets xmllint
uname Linux 9526337ed5ea 5.15.0-88-generic #98-Ubuntu SMP Mon Oct 2 15:18:56 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 74f7ae0
Default Java Private Build-1.8.0_392-8u392-ga-1~20.04-b08
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_392-8u392-ga-1~20.04-b08
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6347/1/testReport/
Max. process+thread count 2412 (vs. ulimit of 5500)
modules C: hadoop-hdfs-project/hadoop-hdfs-rbf U: hadoop-hdfs-project/hadoop-hdfs-rbf
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6347/1/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@@ -279,6 +279,10 @@ public class RBFConfigKeys extends CommonConfigurationKeysPublic {
FEDERATION_ROUTER_PREFIX + "safemode.expiration";
public static final long DFS_ROUTER_SAFEMODE_EXPIRATION_DEFAULT =
3 * DFS_ROUTER_CACHE_TIME_TO_LIVE_MS_DEFAULT;
public static final String DFS_ROUTER_SAFEMODE_CHECKPERIOD =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are indirectly reducing the safe mode period but what we are really doing is adding a separate argument for this.
I agree that this should happen.
Let's just change the JIRA and PR title to "RBF: Add a safe mode check period configuration".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. Thank you

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we write the unit after the variable? DFS_ROUTER_SAFEMODE_CHECKPERIOD_S?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for review. Test case is in testRouterExitSafemode.

Copy link
Contributor

@slfan1989 slfan1989 Dec 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@LiuGuH Thanks for the contribution! Thanks @goiri for helping with the review.

It would be better if the variable name could indicate the time unit.

This looks better:

  • RBFConfigKeys.DFS_ROUTER_CACHE_TIME_TO_LIVE_MS
  • RBFConfigKeys.DFS_ROUTER_CACHE_TIME_TO_LIVE_MS_DEFAULT

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@slfan1989 Thanks for your advise. Add MS for variable.

@LiuGuH LiuGuH changed the title HDFS-17285. [RBF] Decrease dfsrouter safe mode check period. HDFS-17285. RBF: Add a safe mode check period configuration Dec 13, 2023
@ayushtkn
Copy link
Member

Passing by, what is blocking here? If nothing anyone hitting the merge button?

@slfan1989
Copy link
Contributor

slfan1989 commented Dec 20, 2023

Passing by, what is blocking here? If nothing anyone hitting the merge button?

@ayushtkn Thanks for helping with the review! I will merge this pr to the trunk branch.

@hadoop-yetus
Copy link

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 17m 36s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 1s codespell was not available.
+0 🆗 detsecrets 0m 1s detect-secrets was not available.
+0 🆗 xmllint 0m 1s xmllint was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 1 new or modified test files.
_ trunk Compile Tests _
+1 💚 mvninstall 46m 40s trunk passed
+1 💚 compile 0m 41s trunk passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04
+1 💚 compile 0m 36s trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08
+1 💚 checkstyle 0m 31s trunk passed
+1 💚 mvnsite 0m 41s trunk passed
+1 💚 javadoc 0m 42s trunk passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04
+1 💚 javadoc 0m 30s trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08
+1 💚 spotbugs 1m 21s trunk passed
+1 💚 shadedclient 38m 8s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 0m 31s the patch passed
+1 💚 compile 0m 33s the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04
+1 💚 javac 0m 33s the patch passed
+1 💚 compile 0m 30s the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08
+1 💚 javac 0m 30s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 0m 18s the patch passed
+1 💚 mvnsite 0m 31s the patch passed
+1 💚 javadoc 0m 28s the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04
+1 💚 javadoc 0m 24s the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08
+1 💚 spotbugs 1m 21s the patch passed
+1 💚 shadedclient 37m 53s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 22m 51s hadoop-hdfs-rbf in the patch passed.
+1 💚 asflicense 0m 35s The patch does not generate ASF License warnings.
177m 21s
Subsystem Report/Notes
Docker ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6347/2/artifact/out/Dockerfile
GITHUB PR #6347
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets xmllint
uname Linux 6fc9ed377839 5.15.0-88-generic #98-Ubuntu SMP Mon Oct 2 15:18:56 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / daa3b80
Default Java Private Build-1.8.0_392-8u392-ga-1~20.04-b08
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_392-8u392-ga-1~20.04-b08
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6347/2/testReport/
Max. process+thread count 2230 (vs. ulimit of 5500)
modules C: hadoop-hdfs-project/hadoop-hdfs-rbf U: hadoop-hdfs-project/hadoop-hdfs-rbf
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6347/2/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@slfan1989 slfan1989 merged commit b4fed58 into apache:trunk Dec 21, 2023
4 checks passed
@slfan1989
Copy link
Contributor

@LiuGuH Thank you for your contribution! @goiri @ayushtkn Thanks for helping with the review!

@LiuGuH LiuGuH deleted the router-safemode branch December 21, 2023 09:45
jiajunmao pushed a commit to jiajunmao/hadoop-MLEC that referenced this pull request Feb 6, 2024
)  Contributed by LiuGuH.

Reviewed-by: Inigo Goiri <inigoiri@apache.org>
Reviewed-by: Ayush Saxena <ayushsaxena@apache.org>
Signed-off-by: Shilun Fan <slfan1989@apache.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants