Skip to content

Conversation

@LiuGuH
Copy link
Contributor

@LiuGuH LiuGuH commented Jan 3, 2024

Description of PR

Router should have the ability to to auto msync to a nameservice. And it can ensure router periodically refreshes its record of a namespace's state.

Different from HDFS-17027, this is controled by router itself without configuring with AbstractNNFailoverProxyProvider.
And HDFS-16890 maybe lead to many read requests into active NN at the same time.

This PR provides a new way to implememts auto msync in Router.

@hadoop-yetus
Copy link

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 20s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+0 🆗 xmllint 0m 0s xmllint was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 3 new or modified test files.
_ trunk Compile Tests _
+1 💚 mvninstall 31m 53s trunk passed
+1 💚 compile 0m 22s trunk passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04
+1 💚 compile 0m 19s trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08
+1 💚 checkstyle 0m 17s trunk passed
+1 💚 mvnsite 0m 28s trunk passed
+1 💚 javadoc 0m 27s trunk passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04
+1 💚 javadoc 0m 22s trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08
+1 💚 spotbugs 0m 49s trunk passed
+1 💚 shadedclient 19m 28s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 0m 20s the patch passed
+1 💚 compile 0m 18s the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04
+1 💚 javac 0m 18s the patch passed
+1 💚 compile 0m 16s the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08
+1 💚 javac 0m 16s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 0m 11s the patch passed
+1 💚 mvnsite 0m 21s the patch passed
+1 💚 javadoc 0m 17s the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04
+1 💚 javadoc 0m 17s the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08
+1 💚 spotbugs 0m 48s the patch passed
+1 💚 shadedclient 19m 43s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 19m 20s hadoop-hdfs-rbf in the patch passed.
+1 💚 asflicense 0m 24s The patch does not generate ASF License warnings.
99m 48s
Subsystem Report/Notes
Docker ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6404/1/artifact/out/Dockerfile
GITHUB PR #6404
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets xmllint
uname Linux 715c46e9e3f8 5.15.0-88-generic #98-Ubuntu SMP Mon Oct 2 15:18:56 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / f8183f8
Default Java Private Build-1.8.0_392-8u392-ga-1~20.04-b08
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_392-8u392-ga-1~20.04-b08
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6404/1/testReport/
Max. process+thread count 2620 (vs. ulimit of 5500)
modules C: hadoop-hdfs-project/hadoop-hdfs-rbf U: hadoop-hdfs-project/hadoop-hdfs-rbf
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6404/1/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@slfan1989
Copy link
Contributor

@simbadzina Can you help review this PR? Thank you very much! I see that HDFS-16890 and HDFS-17027 were both completed by you.


@Test
public void testMsync() throws InterruptedException, IOException {
Thread.sleep(msyncInterval);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In unit tests, we'd better not use sleep.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed with GenericTestUtils . Thanks.

@simbadzina
Copy link
Member

@LiuGuH can you expand more on the following

And HDFS-16890 maybe lead to many read requests into active NN at the same time.

I see how multiple concurrent calls can read the same false value from isNamespaceStateIdFresh(nsId) before the accumulator is updated, but the windows for these reads should be very small. I'm wondering if we can instead fix the existing mechanism such that only a single read is sent to the active, vs. adding a new mechanism.

Additionally, the periodic redirection of calls to the active only happens in the case when there are no calls going to the active already so having some reads be sent to the active should not overload it.

@LiuGuH
Copy link
Contributor Author

LiuGuH commented Jan 4, 2024

I'm wondering if we can instead fix the existing mechanism such that only a single read is sent to the active, vs. adding a new mechanism.

Yes, it can. But for only a single read is send to the active, we should add synchronized. And this maybe have performance impact. Add a seperate RouterAutoMsyncService maybe a way to slove it.

Additionally, the periodic redirection of calls to the active only happens in the case when there are no calls going to the active already so having some reads be sent to the active should not overload it.

In most cases, that's true. But I think add RouterAutoMsyncService will be more robustness. Thanks

@simbadzina
Copy link
Member

I'm wondering if we can instead fix the existing mechanism such that only a single read is sent to the active, vs. adding a new mechanism.

Yes, it can. But for only a single read is send to the active, we should add synchronized. And this maybe have performance impact. Add a seperate RouterAutoMsyncService maybe a way to slove it.

Additionally, the periodic redirection of calls to the active only happens in the case when there are no calls going to the active already so having some reads be sent to the active should not overload it.

In most cases, that's true. But I think add RouterAutoMsyncService will be more robustness. Thanks

That is fair. I'll take a closer look at the implementation now that we've discussed the higher level details. Please also address the unit test comment by @slfan1989

@hadoop-yetus
Copy link

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 21s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+0 🆗 xmllint 0m 0s xmllint was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 3 new or modified test files.
_ trunk Compile Tests _
+1 💚 mvninstall 40m 54s trunk passed
+1 💚 compile 0m 22s trunk passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04
+1 💚 compile 0m 20s trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08
+1 💚 checkstyle 0m 19s trunk passed
+1 💚 mvnsite 0m 27s trunk passed
+1 💚 javadoc 0m 29s trunk passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04
+1 💚 javadoc 0m 18s trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08
+1 💚 spotbugs 0m 49s trunk passed
+1 💚 shadedclient 19m 31s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 0m 18s the patch passed
+1 💚 compile 0m 18s the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04
+1 💚 javac 0m 18s the patch passed
+1 💚 compile 0m 18s the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08
+1 💚 javac 0m 18s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 0m 10s the patch passed
+1 💚 mvnsite 0m 22s the patch passed
+1 💚 javadoc 0m 19s the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04
+1 💚 javadoc 0m 17s the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08
+1 💚 spotbugs 0m 52s the patch passed
+1 💚 shadedclient 19m 45s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 19m 24s hadoop-hdfs-rbf in the patch passed.
+1 💚 asflicense 0m 24s The patch does not generate ASF License warnings.
108m 38s
Subsystem Report/Notes
Docker ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6404/2/artifact/out/Dockerfile
GITHUB PR #6404
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets xmllint
uname Linux 8a23deefa7cb 5.15.0-88-generic #98-Ubuntu SMP Mon Oct 2 15:18:56 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / efcc74f
Default Java Private Build-1.8.0_392-8u392-ga-1~20.04-b08
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_392-8u392-ga-1~20.04-b08
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6404/2/testReport/
Max. process+thread count 2631 (vs. ulimit of 5500)
modules C: hadoop-hdfs-project/hadoop-hdfs-rbf U: hadoop-hdfs-project/hadoop-hdfs-rbf
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6404/2/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@slfan1989
Copy link
Contributor

@simbadzina Thanks for helping review the code!

@LiuGuH Thanks for the contribution! but I think it's better if we don't introduce a new one if the original one works. Personally, I'm worried about the synchronization state of a single thread, it doesn't seem to be a good design. If some boundaries occur (such as NN restart, hang, etc.), how do we deal with it? If we are going to design a more complex mechanism to handle boundary issues, I think it is better to use the mechanisms that are already available.

@hadoop-yetus
Copy link

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
_ Prechecks _
+1 💚 dupname 0m 01s No case conflicting files found.
+0 🆗 spotbugs 0m 01s spotbugs executables are not available.
+0 🆗 codespell 0m 01s codespell was not available.
+0 🆗 detsecrets 0m 01s detect-secrets was not available.
+0 🆗 xmllint 0m 01s xmllint was not available.
+1 💚 @author 0m 00s The patch does not contain any @author tags.
+1 💚 test4tests 0m 00s The patch appears to include 3 new or modified test files.
_ trunk Compile Tests _
+1 💚 mvninstall 89m 58s trunk passed
+1 💚 compile 4m 55s trunk passed
+1 💚 checkstyle 4m 31s trunk passed
+1 💚 mvnsite 5m 00s trunk passed
+1 💚 javadoc 4m 40s trunk passed
+1 💚 shadedclient 146m 45s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 2m 53s the patch passed
+1 💚 compile 2m 24s the patch passed
+1 💚 javac 2m 24s the patch passed
+1 💚 blanks 0m 00s The patch has no blanks issues.
+1 💚 checkstyle 2m 02s the patch passed
+1 💚 mvnsite 2m 26s the patch passed
+1 💚 javadoc 2m 07s the patch passed
+1 💚 shadedclient 176m 27s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 asflicense 8m 22s The patch does not generate ASF License warnings.
439m 42s
Subsystem Report/Notes
GITHUB PR #6404
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets xmllint
uname MINGW64_NT-10.0-17763 daf58be70672 3.4.10-87d57229.x86_64 2024-02-14 20:17 UTC x86_64 Msys
Build tool maven
Personality /c/hadoop/dev-support/bin/hadoop.sh
git revision trunk / efcc74f
Default Java Azul Systems, Inc.-1.8.0_332-b09
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/job/PR-6404/1/testReport/
modules C: hadoop-hdfs-project/hadoop-hdfs-rbf U: hadoop-hdfs-project/hadoop-hdfs-rbf
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/job/PR-6404/1/console
versions git=2.44.0.windows.1
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@github-actions
Copy link
Contributor

github-actions bot commented Oct 5, 2025

We're closing this stale PR because it has been open for 100 days with no activity. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you feel like this was a mistake, or you would like to continue working on it, please feel free to re-open it and ask for a committer to remove the stale tag and review again.
Thanks all for your contribution.

@github-actions github-actions bot added the Stale label Oct 5, 2025
@github-actions github-actions bot closed this Oct 6, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants