HDFS-15960 RBF: Router should talk to namenode with security context. #2887

Open
wants to merge 1 commit into
base: trunk
from

bolerio
Contributor

@bolerio bolerio commented Apr 9, 2021

No description provided.

@hadoop-yetus

💔 -1 overall

| Vote | Subsystem | Runtime | Logfile | Comment |
| --- | --- | --- | --- | --- |
| +0 🆗 | reexec | 0m 34s | | Docker mode activated. |
| | | | | _ Prechecks _ |
| +1 💚 | dupname | 0m 0s | | No case conflicting files found. |
| +0 🆗 | codespell | 0m 0s | | codespell was not available. |
| +1 💚 | @author | 0m 0s | | The patch does not contain any @author tags. |
| -1 ❌ | test4tests | 0m 0s | | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
| | | | | _ trunk Compile Tests _ |
| +1 💚 | mvninstall | 34m 39s | | trunk passed |
| +1 💚 | compile | 0m 41s | | trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 |
| +1 💚 | compile | 0m 35s | | trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 |
| +1 💚 | checkstyle | 0m 26s | | trunk passed |
| +1 💚 | mvnsite | 0m 44s | | trunk passed |
| +1 💚 | javadoc | 0m 36s | | trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 |
| +1 💚 | javadoc | 0m 58s | | trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 |
| +1 💚 | spotbugs | 1m 20s | | trunk passed |
| +1 💚 | shadedclient | 14m 32s | | branch has no errors when building and testing our client artifacts. |
| | | | | _ Patch Compile Tests _ |
| +1 💚 | mvninstall | 0m 33s | | the patch passed |
| +1 💚 | compile | 0m 34s | | the patch passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 |
| +1 💚 | javac | 0m 34s | | the patch passed |
| +1 💚 | compile | 0m 29s | | the patch passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 |
| +1 💚 | javac | 0m 29s | | the patch passed |
| +1 💚 | blanks | 0m 0s | | The patch has no blanks issues. |
| +1 💚 | checkstyle | 0m 18s | | the patch passed |
| +1 💚 | mvnsite | 0m 33s | | the patch passed |
| +1 💚 | javadoc | 0m 31s | | the patch passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 |
| +1 💚 | javadoc | 0m 46s | | the patch passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 |
| +1 💚 | spotbugs | 1m 20s | | the patch passed |
| +1 💚 | shadedclient | 14m 48s | | patch has no errors when building and testing our client artifacts. |
| | | | | _ Other Tests _ |
| -1 ❌ | unit | 17m 48s | /patch-unit-hadoop-hdfs-project_hadoop-hdfs-rbf.txt | hadoop-hdfs-rbf in the patch passed. |
| +1 💚 | asflicense | 0m 30s | | The patch does not generate ASF License warnings. |
| | | 94m 43s | | |

| Reason | Tests |
| --- | --- |
| Failed junit tests | hadoop.hdfs.server.federation.router.TestRouterRpcMultiDestination |
| | hadoop.hdfs.server.federation.router.TestRouterRpc |
| | hadoop.hdfs.server.federation.router.TestRouterAllResolver |

| Subsystem | Report/Notes |
| --- | --- |
| Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2887/1/artifact/out/Dockerfile |
| GITHUB PR | #2887 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell |
| uname | Linux ed950788e3c0 4.15.0-136-generic #140-Ubuntu SMP Thu Jan 28 05:20:47 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / ed8da9f |
| Default Java | Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 |
| Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2887/1/testReport/ |
| Max. process+thread count | 2441 (vs. ulimit of 5500) |
| modules | C: hadoop-hdfs-project/hadoop-hdfs-rbf U: hadoop-hdfs-project/hadoop-hdfs-rbf |
| Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2887/1/console |
| versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
| Powered by | Apache Yetus 0.14.0-SNAPSHOT https://yetus.apache.org |

This message was automatically generated.

updateState();
try {
SecurityUtil.doAsCurrentUser(
new PrivilegedExceptionAction<Object>() {
Member

Can this be a lambda?

Contributor Author

Sure, agreed that'll be more readable here.
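
For readers following the lambda suggestion, here is a minimal sketch of the two forms side by side. It does not use Hadoop: `doAsSketch` and `fetchStats` are hypothetical stand-ins (for a helper like `SecurityUtil.doAsCurrentUser` and for the JMX stats call), and the stand-in only runs the action without any security context. Only `java.security.PrivilegedExceptionAction` is a real JDK type here.

```java
import java.io.IOException;
import java.security.PrivilegedExceptionAction;

class DoAsLambdaSketch {

  // Hypothetical stand-in for a doAs-style helper: it only runs the action
  // and maps other checked exceptions to IOException. It does NOT establish
  // any security context, unlike the Hadoop helper used in the patch.
  static <T> T doAsSketch(PrivilegedExceptionAction<T> action)
      throws IOException {
    try {
      return action.run();
    } catch (IOException e) {
      throw e;
    } catch (Exception e) {
      throw new IOException(e);
    }
  }

  // Hypothetical placeholder for the stats call wrapped by the patch.
  static String fetchStats() throws IOException {
    return "stats";
  }

  // Anonymous inner class, in the style of the diff under review.
  static String viaAnonymousClass() throws IOException {
    return doAsSketch(new PrivilegedExceptionAction<String>() {
      @Override
      public String run() throws Exception {
        return fetchStats();
      }
    });
  }

  // The same call as a lambda: PrivilegedExceptionAction declares a single
  // abstract method, so a lambda can implement it.
  static String viaLambda() throws IOException {
    return doAsSketch(() -> fetchStats());
  }

  public static void main(String[] args) throws IOException {
    System.out.println(viaAnonymousClass().equals(viaLambda()));
  }
}
```

Both methods return the same value; the lambda form simply drops the anonymous-class boilerplate.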

Contributor

Thanks @goiri and @bolerio for your comments. I am just concerned about whether it is necessary to run this as the current user here, because the Router logs in when the daemon starts and already executes as the current login user. Did you hit an exception here? Thanks.

});
} catch (IOException e) {
// Generic error that we don't know about
LOG.error("Unexpected exception while communicating with {}: {}",
Member

Can we have a unit test for this?

Contributor Author

Ok, will try to create one. Thanks for checking this out, @goiri. Somehow I missed the notification about your comments. I will follow up with a unit test soon.

Contributor Author

Hi @goiri, following up on this. I was able to create a unit test that reproduces the problem and demonstrates that the patch fixes it. However, there is a challenge.

The failure occurs when the router calls the JMX endpoint, which returns additional stats on top of the basic alive status (the latter is obtained in a separate RPC call). The failure is soft: the router logs the exception and continues without the information it tried to obtain. However, that information is needed later during load balancing, which is how the original bug was discovered.

Now, because the main interface capturing knowledge about a NN on the router side (FederationNamenodeContext) does not contain these stats, there is no way to write a unit test against it. Some unit tests in that area mock this interface, and I modified the mock to include stats, but then I have to downcast to the mock object in the test, which is very ugly.

So the options are: (1) accept this ugly downcast; (2) don't write the test and, if Hadoop eventually has an integration test suite, cover the use case there; or (3) modify FederationNamenodeContext to include the stats (see the MembershipState and MembershipStats classes). My vote would be for (3), as those stats seem essential to the operation of a federated cluster. It would be ok not to make all of the numbers part of the public interface, but the fact that we need stats about resource utilization should be part of the interface.
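
To make the trade-off between options (1) and (3) concrete, here is a rough sketch with entirely hypothetical types. `PlainContext`, `StatsContext`, the mock classes, and `getNumOfFiles` are illustrative names only and are not the actual FederationNamenodeContext API or the patch's test code.

```java
class StatsInterfaceSketch {

  // Option (1): the interface has no stats, so the mock carries them
  // as an extra method that callers cannot see through the interface.
  interface PlainContext {
    String getNamenodeId();
  }

  static class MockPlainContext implements PlainContext {
    @Override
    public String getNamenodeId() {
      return "nn1";
    }

    long getNumOfFiles() {
      return 42L;
    }
  }

  // Option (3): the interface itself exposes the stat.
  interface StatsContext {
    String getNamenodeId();

    long getNumOfFiles();
  }

  static class MockStatsContext implements StatsContext {
    @Override
    public String getNamenodeId() {
      return "nn1";
    }

    @Override
    public long getNumOfFiles() {
      return 42L;
    }
  }

  // With option (1), a test has to downcast to the mock to read the stat.
  static long filesViaDowncast(PlainContext ctx) {
    return ((MockPlainContext) ctx).getNumOfFiles();
  }

  // With option (3), the stat is reachable through the interface.
  static long filesViaInterface(StatsContext ctx) {
    return ctx.getNumOfFiles();
  }

  public static void main(String[] args) {
    System.out.println(filesViaDowncast(new MockPlainContext()));
    System.out.println(filesViaInterface(new MockStatsContext()));
  }
}
```

The downcast in `filesViaDowncast` ties the test to one concrete mock class, whereas `filesViaInterface` works with any implementation of the interface.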

Member

Option (3) sounds reasonable. Do you mind giving it a try in this PR?

@goiri goiri changed the title HDFS-15960 Router should talk to namenode with security context. HDFS-15960 RBF: Router should talk to namenode with security context. May 12, 2021
4 participants