
HDFS-16944. Add audit log for RouterAdminServer to save privileged operation log separately. #5464

Open · wants to merge 1 commit into base: trunk

Conversation

@curie71 (Contributor) commented Mar 8, 2023:

HDFS-16944. We found that in other components (like the NameNode in HDFS or the ResourceManager in YARN), debug logs and audit logs are recorded separately; RouterAdminServer is the exception.

There are lots of simple logs that help with debugging for developers who have access to the source code. And there are also audit logs that record privileged operations with more detailed information, helping system admins understand what happened in a real run.

Here is an example from YARN:

    try {
      // Safety
      userUgi = UserGroupInformation.getCurrentUser();
      user = userUgi.getShortUserName();
    } catch (IOException ie) {
      LOG.warn("Unable to get the current user.", ie); // debug log
      RMAuditLogger.logFailure(user, AuditConstants.SUBMIT_APP_REQUEST,
          ie.getMessage(), "ClientRMService",
          "Exception in submitting application", applicationId, callerContext,
          submissionContext.getQueue()); // audit log
      throw RPCUtil.getRemoteException(ie);
    }

So I suggest adding an audit log for RouterAdminServer to save privileged operation logs separately.
The logger's name could follow one of these existing conventions:

// hadoop security
public static final Logger AUDITLOG =
    LoggerFactory.getLogger(
        "SecurityLogger." + ServiceAuthorizationManager.class.getName());

// namenode
public static final Log auditLog = LogFactory.getLog(
    FSNamesystem.class.getName() + ".audit");

I finally chose the className + ".audit" convention, and I log to AUDITLOG instead of LOG for the privileged operations that call the permission check function checkSuperuserPrivilege.
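If the router adopts such an AUDITLOG, one parser-friendly entry format would mirror the key=value, tab-separated style of the namenode audit log. The sketch below is only illustrative; the class name RouterAuditEntry and its fields are hypothetical and not part of this patch:

```java
// Hedged sketch: a possible format for entries written to the proposed
// AUDITLOG, mirroring the tab-separated key=value style of the HDFS
// namenode audit log so that existing parsers could handle it.
// RouterAuditEntry and its fields are hypothetical illustrations.
class RouterAuditEntry {
    static String format(boolean allowed, String user, String ip, String cmd) {
        return "allowed=" + allowed
            + "\tugi=" + user
            + "\tip=" + ip
            + "\tcmd=" + cmd;
    }
}
```

A stable, machine-parsable format matters because, as discussed later in this thread, audit log formats effectively cannot change once released.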
 

Description of PR

How was this patch tested?

For code changes:

  • Does the title of this PR start with the corresponding JIRA issue id (e.g. 'HADOOP-17799. Your PR title ...')?
  • Object storage: have the integration tests been executed and the endpoint declared according to the connector-specific documentation?
  • If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?
  • If applicable, have you updated the LICENSE, LICENSE-binary, NOTICE-binary files?

@hadoop-yetus commented:

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 57s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
-1 ❌ test4tests 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
_ trunk Compile Tests _
+1 💚 mvninstall 41m 25s trunk passed
+1 💚 compile 0m 43s trunk passed with JDK Ubuntu-11.0.18+10-post-Ubuntu-0ubuntu120.04.1
+1 💚 compile 0m 38s trunk passed with JDK Private Build-1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09
+1 💚 checkstyle 0m 30s trunk passed
+1 💚 mvnsite 0m 41s trunk passed
+1 💚 javadoc 0m 48s trunk passed with JDK Ubuntu-11.0.18+10-post-Ubuntu-0ubuntu120.04.1
+1 💚 javadoc 0m 56s trunk passed with JDK Private Build-1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09
+1 💚 spotbugs 1m 31s trunk passed
+1 💚 shadedclient 23m 29s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 0m 33s the patch passed
+1 💚 compile 0m 37s the patch passed with JDK Ubuntu-11.0.18+10-post-Ubuntu-0ubuntu120.04.1
+1 💚 javac 0m 37s the patch passed
+1 💚 compile 0m 30s the patch passed with JDK Private Build-1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09
+1 💚 javac 0m 30s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
-0 ⚠️ checkstyle 0m 16s /results-checkstyle-hadoop-hdfs-project_hadoop-hdfs-rbf.txt hadoop-hdfs-project/hadoop-hdfs-rbf: The patch generated 2 new + 0 unchanged - 0 fixed = 2 total (was 0)
+1 💚 mvnsite 0m 34s the patch passed
+1 💚 javadoc 0m 33s the patch passed with JDK Ubuntu-11.0.18+10-post-Ubuntu-0ubuntu120.04.1
+1 💚 javadoc 0m 49s the patch passed with JDK Private Build-1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09
+1 💚 spotbugs 1m 22s the patch passed
+1 💚 shadedclient 23m 24s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 21m 1s hadoop-hdfs-rbf in the patch passed.
+1 💚 asflicense 0m 35s The patch does not generate ASF License warnings.
123m 29s
Subsystem Report/Notes
Docker ClientAPI=1.42 ServerAPI=1.42 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5464/1/artifact/out/Dockerfile
GITHUB PR #5464
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets
uname Linux d6d53c4ce83d 4.15.0-200-generic #211-Ubuntu SMP Thu Nov 24 18:16:04 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 78b6013
Default Java Private Build-1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.18+10-post-Ubuntu-0ubuntu120.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5464/1/testReport/
Max. process+thread count 2205 (vs. ulimit of 5500)
modules C: hadoop-hdfs-project/hadoop-hdfs-rbf U: hadoop-hdfs-project/hadoop-hdfs-rbf
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5464/1/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@virajjasani (Contributor) left a comment:

I wonder if these qualify as audit logs. Shouldn't only the initial log for every RPC call be considered audit, rather than the result or internal result state of the given RPC?

Also, we usually have audit logs disabled by default (using NullAppender), i.e. they are at least configured separately from the main application logs.
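For reference, a separately configured, disabled-by-default audit logger is typically wired up in log4j.properties roughly like this. This is only a sketch, assuming the RouterAdminServer.class.getName() + ".audit" naming discussed in this PR:

```properties
# Sketch: route the audit logger to its own appender and disable it by
# default via NullAppender; additivity=false keeps audit entries out of
# the main application log. Logger name assumes the ".audit" convention.
log4j.logger.org.apache.hadoop.hdfs.server.federation.router.RouterAdminServer.audit=INFO,NullAppender
log4j.additivity.org.apache.hadoop.hdfs.server.federation.router.RouterAdminServer.audit=false
log4j.appender.NullAppender=org.apache.log4j.varia.NullAppender
```

Operators who want the audit trail would then swap NullAppender for a file appender of their choice.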

@virajjasani (Contributor) commented:

For RBF, if we really want audit logs, they should cover everything, like add/update mount table entry etc., and not just name service APIs. If we only need auditing for name service RPCs, then we could rather name the logger NameserviceManager.class.getName() + ".audit" instead of RouterAdminServer.class.getName() + ".audit", correct?

@ayushtkn (Member) left a comment:

If we are classifying these as audit logs, we should implement them properly and for all operations, not just selectively.

We should even consider having a subclass of DefaultAuditLogger, like RouterAdminAuditLogger, and then push the entries in a certain format.

@virajjasani (Contributor) commented:

Agreed with the above suggestion of having a new audit logger class and using it for all operations.

Also, Ayush, do you think we should rather have audit logs that record the audit event and not the result as such? For instance, here it seems we are trying to audit the result, like "whether the nameservice was successfully enabled".
Instead, the audit log could be something like "enable nameservice {ns} was called by client {client/ip etc}". Whether or not the nameservice was enabled successfully is something to be saved in internal application logs only rather than the audit log, WDYT?

@virajjasani (Contributor) commented Mar 17, 2023:

On the other hand, while HDFS audit logs report whether the given operation had successful authorization (i.e. whether AccessControlException was thrown), RMAuditLogger seems to report a wider variety of failures in its audit logs. So maybe we don't follow a specific convention for audit as such (I might be wrong), and then it's up to us how to report success/failure of the given operation in RBF overall?

@ayushtkn (Member) commented:

The only policy around audit logs that I am aware of is that they shouldn't change and that parsers should be able to parse them. So if we make the format similar to that of the HDFS ones, that should be good, but it is still not a strict ask.

Regarding logging success: we do log the successful result every time; it is just failures that get logged only for ACE. Why only ACE is for historical reasons, and we can decide here whether we want to go with that approach in RBF as well. Whatever we choose is good; it is just that we can't change it once released.

One main reason for ACE-only failure logging was: we are always interested in successful cases, since they change the state of the FS. Other failures we don't care about because they didn't do anything; for RPC load and the like, use the NameNodeMetrics.

ACE gets special attention as it is a bit alarming: someone who doesn't have access attempted an operation. "Some illegal guy in town trying to do operation X, which he isn't allowed to do."

That is what I know; it is very old stuff, and I will research it more sometime...
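The convention described above, auditing every success but, among failures, only AccessControlException, can be sketched as follows. All names here (AceOnlyAuditSketch, runOperation, the nested exception class) are hypothetical illustrations, not actual Hadoop code:

```java
import java.util.ArrayList;
import java.util.List;

// Hedged sketch of the HDFS audit convention discussed above: successes
// are always audited, but among failures only AccessControlException
// (ACE) is, because an ACE means an unauthorized caller attempted the
// operation. Every name in this class is a hypothetical illustration.
class AceOnlyAuditSketch {
    // Stand-in for org.apache.hadoop.security.AccessControlException.
    static class AccessControlException extends Exception {
        AccessControlException(String msg) { super(msg); }
    }

    final List<String> auditEntries = new ArrayList<>();

    void runOperation(String user, String cmd,
                      boolean authorized, boolean opSucceeds) throws Exception {
        try {
            if (!authorized) {
                throw new AccessControlException(user + " lacks superuser privilege");
            }
            if (!opSucceeds) {
                // Non-ACE failures change nothing in the FS state, so
                // they go to the debug log only (omitted here), never
                // to the audit log.
                throw new IllegalStateException("operation failed");
            }
            // Success always gets an audit entry: the FS state changed.
            auditEntries.add("allowed=true\tugi=" + user + "\tcmd=" + cmd);
        } catch (AccessControlException ace) {
            // ACE is the one failure that is audited: an unauthorized
            // caller attempted a privileged operation.
            auditEntries.add("allowed=false\tugi=" + user + "\tcmd=" + cmd);
            throw ace;
        }
    }
}
```

Note how an ordinary failure leaves no audit trace at all, which matches the rationale above: only state changes and unauthorized attempts are of audit interest.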

@virajjasani (Contributor) commented:

Wow, quite a history behind ACE, thanks a lot for summarizing everything here :)

I also used to wonder why HDFS audit logs only care about auth success/failure rather than overall operation success/failure, but yes, it does make sense to audit whether authorized access was granted for the given operation, as that is more alarming than other failures. (For other failures, there are tons of logs anyway.)
Thanks again!
