Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ZOOKEEPER-1260:Audit logging in ZooKeeper servers. #338

Closed
wants to merge 2 commits into from
Closed

ZOOKEEPER-1260:Audit logging in ZooKeeper servers. #338

wants to merge 2 commits into from

Conversation

arshadmohammad
Copy link
Contributor

Audit logging in ZooKeeper Servers.

@@ -52,6 +52,7 @@ See http://forrest.apache.org/docs/linking.html for more info.
<jmx label="JMX" href="zookeeperJMX.html" />
<observers label="Observers Guide" href="zookeeperObservers.html" />
<reconfig label="Dynamic Reconfiguration" href="zookeeperReconfig.html" />
<reconfig label="Audit Logging" href="zookeeperAuditLogs.html" />
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be something like <audit label="Audit Logging" href="zookeeperAuditLogs.html" /> rather than <reconfig ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes it should be <audit. Corrected it

Copy link
Contributor

@hanm hanm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work. Some quick comments. Also please fix FindBug warnings as they are legitimate.

<emphasis role="bold">zookeeper.audit.enabled</emphasis>)
</para>
<para>
<emphasis role="bold">New in 3.5.3:</emphasis>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be 3.5.4 at least. We've shipped 3.5.3 already :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

corrected it

</imageobject>
</mediaobject>
<para>The audit log captures the detailed information for the operations that are selected to be audited. The audit
information is written as a set of key=value pairs for the following keys</para>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: missing full stop after following keys

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Corrected it

audit.logger=INFO, RFAAUDIT
log4j.logger.org.apache.zookeeper.audit.ZKAuditLogger=${audit.logger}
log4j.additivity.org.apache.zookeeper.audit.ZKAuditLogger=false
log4j.appender.RFAAUDIT=org.apache.log4j.RollingFileAppender
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sure this is a very dumb question, why is this called RFAAUDIT?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is very valid question. By RFAAUDIT i meant Rolling File Audit. Also one A is extra by mistake. I think it is better to change the name to AUDITFILE.

log4j.appender.RFAAUDIT=org.apache.log4j.RollingFileAppender
log4j.appender.RFAAUDIT.File=${zookeeper.log.dir}/${zookeeper.auditlog.file}
log4j.appender.RFAAUDIT.layout=org.apache.log4j.PatternLayout
log4j.appender.RFAAUDIT.layout.ConversionPattern=%d{ISO8601} %p %c{2}: %m%n
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why don't we use the same conversion pattern as the other log types?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Logging is done only from one class, which thread is logging is not important. so only required parameters are logged.

@@ -66,6 +66,7 @@
<li><a href="zookeeperHierarchicalQuorums.html">Hierarchical quorums</a></li>
<li><a href="zookeeperObservers.html">Observers</a> - non-voting ensemble members that easily improve ZooKeeper's scalability</li>
<li><a href="zookeeperReconfig.html">Dynamic Reconfiguration</a> - a guide on how to use dynamic reconfiguration in ZooKeeper</li>
<li><a href="zookeeperAuditLogs.html">Audit Logging</a> - a guide on how to configure audit logs in ZooKeeper Server and what contents are logged.</li>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I don't think we need the "and what contents are logged."

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed

</para>
<para>
<emphasis role="bold">New in 3.5.4:</emphasis>
By default audit logs are disabled. Set to "true" to enable this feature. Default is "false". See
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: We don't need to tell the user what the default is twice.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

corrected.

</articleinfo>
<section id="ch_auditLogs">
<title>ZooKeeper Audit Logs</title>
<para>Apache ZooKeeper supports audit logs form version 3.5.4. By default audit logs are disabled. To enable audit
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"from version"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

corrected

@@ -476,6 +624,33 @@ private boolean connClosedByClient(Request request) {
return request.cnxn == null;
}

private String getHostAddress(ServerCnxn cnxn) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

perhaps we could do this logic in ServerCnxn?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

moved

@@ -54,6 +54,7 @@
*/
static final ByteBuffer closeConn = ByteBuffer.allocate(0);

private static String loginUser = System.getProperty("user.name", "<NA>");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I understand what this represents

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

loginUser is the user who started zk server.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps this can be renamed? Maybe "serverUser" or something to clarify this is not the same thing as the user "logged in" on the client. Or at least a comment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed to serverUser

*/
package org.apache.zookeeper.server.util;

import static org.junit.Assert.*;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: avoid * imports

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done


@Test
public void testGetUserFromAllAuthenticationScheme() {
System.setProperty("zookeeper.authProvider.sasl",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have/need a test for custom authentication schemes?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

only digest authentication is different, rest all are same. That is why not needed.

private ZooKeeper zk;

@Test
public void testAuditLogs() throws Exception {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like this integration test. I think it would be great if it was a little more rigorous.

Would it be possible to use multiple auth providers oruse multiple servers and make sure logs are only appearing on the correct server?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Authentication provider make difference only for user attribute, rest all audit log attributes will remain same.
Test is already running on 3 node cluster. It is validating number of audit entries but in total. Not on which server. In all the cases expected number of entry is either 1 or three. In case of 1 entry test case currently not checking on which specific server entry is expected. But I think it is fine.

@arshadmohammad
Copy link
Contributor Author

Thanks @hanm @afine for reviewing this feature. I have addressed all the comments, Please have a look.

zookeeper.auditlog.threshold=INFO
audit.logger=INFO, AUDITFILE
log4j.logger.org.apache.zookeeper.audit.ZKAuditLogger=${audit.logger}
log4j.additivity.org.apache.zookeeper.audit.ZKAuditLogger=false
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think something like this would be better here (see how HDFS defines audit log properties for example).
log4j.additivity.org.apache.zookeeper.audit = false
or
log4j.additivity.org.apache.zookeeper.audit.enabled = false

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

log4j.additivity.org.apache.zookeeper.audit = false is superset configuration of log4j.additivity.org.apache.zookeeper.audit.ZKAuditLogger=false
As all the all audit logging is done through ZKAuditLogger it is ok to configure only this

audit.logger=INFO, AUDITFILE
log4j.logger.org.apache.zookeeper.audit.ZKAuditLogger=${audit.logger}
log4j.additivity.org.apache.zookeeper.audit.ZKAuditLogger=false
log4j.appender.AUDITFILE=org.apache.log4j.RollingFileAppender
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is the rolling of the audit log file defined? Is it controlled by the log4j.appender.AUDITFILE.Threshold property?

Should we also add something like DailyRollingFileAppender and a date pattern property as another alternative for rolling logs?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rolling of the audit log file is defined by following properties
log4j.appender.RFAAUDIT=org.apache.log4j.RollingFileAppender

Max log file size of 10MB

log4j.appender.RFAAUDIT.MaxFileSize=10MB
log4j.appender.RFAAUDIT.MaxBackupIndex=10

public static final String KEY_VAL_SEPARATOR = "=";
public static final char PAIR_SEPARATOR = '\t';

public static final String OP_START = "serverStart";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should not expose the internal op codes, etc to end users, through API or through the public constant definition included in this patch. We can get rid of the constants definitions here and instead, whenever we want to audit log something we create the operation string on site and directly pass the string to the audit event function call. Something like:
case OpCode.create: subResult = new CreateResult(subTxnResult.path); addSuccessAudit(request, cnxn, "create", subTxnResult.path); break;
This approach seems make the maintain overhead low as we don't have to worry about the correlations between various op codes definitions and the only catch is we might have to duplicate the op code string in different places if we audit logging same thing in different places - but in this case it seems the cost of duplication is low comparing to maintaining explicit op constants.

I also checked HDFS code base and it took a similar approach.

@hanm
Copy link
Contributor

hanm commented Sep 11, 2017

@arshadmohammad Audit logging, when enabled, will for sure degrade performance as it adds additional IO cost. Have you done any sort of performance evaluation so we can roughly get an idea of what kind of performance expectation we should have when audit logging is enabled?

Depends on how the performance degraded on different use cases, you might want to consider adding an async version of audit logging otherwise this feature might not be usable (note I am just guessing on the perf numbers here). Similar case happened on HDFS where an async audit logging was added to avoid degrading performance on hotspots.

@pmoust
Copy link

pmoust commented Apr 23, 2018

@arshadmohammad I'd also be curious as to the perf implications of this, have you run a load test-suite against this?

Copy link
Contributor

@anmolnar anmolnar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would be a very great addition to 3.5 version. I only have a few nitpick and I think it'd also very useful to get some insight into the performance implications of this patch.

* ACL permissions
* @return string representation of permissions
*/
public static String getPermString(int perms) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you considered caching these values in some way?
I see 2 options to make this faster:

  1. dynamic caching of already generated values,
  2. static caching of all possible values in a static hashmap (32 possible values if I'm not mistaken)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good suggestion
selecting dynamic caching

*/
public static String createLog(String user, String operation, String znode, String acl, String session, String ip,
String status) {
ZKAuditLogFormatter fmt = new ZKAuditLogFormatter();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I got it right, this is the only place where you use ZKAuditLogFormatter. Why don't you just use String.format() instead of a custom formatter?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ZKAuditLogFormatter is not putting info for which value is null. I think we can not do with String.format()

break;
}
case OpCode.setACL: {
lastOp = "SETA";
rsp = new SetACLResponse(rc.stat);
err = Code.get(rc.err);
addAuditLog(request, cnxn, AuditConstants.OP_SETACL, rc.path, getACLs(request),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this approach has some performance impact even if audit logging is disabled. The flag gets checked within the method, therefore getACLs() will be evaluated even if there's no need for the result.
Passing only request and calling getACLs() from addAuditLog() would be slightly faster.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

corrected, added one more check at the before making audit log itself
if (ZKAuditLogger.isAuditEnabled()) {
addAuditLog(request, cnxn, AuditConstants.OP_SETACL, rc.path, getACLs(request), err);
}

return ZKUtil.aclToString(setACLRequest.getAcl());
}

private void addFailedTxnAduitLog(Request request) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

corrected

case OpCode.delete:
case OpCode.deleteContainer:
op = AuditConstants.OP_DELETE;
//path = new String(request.request.array());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please the commented line.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed

@anmolnar
Copy link
Contributor

@arshadmohammad Can we move on with this great new feature?

@arshadmohammad
Copy link
Contributor Author

Hi all, please find performance impact analysis in jira comments.

@pmoust
Copy link

pmoust commented Sep 17, 2018

@arshadmohammad hm - there don't seem to be any(?)

@arshadmohammad
Copy link
Contributor Author

@pmoust I was pointing to comments: 1, 2 and 3

@s4s0l
Copy link

s4s0l commented Sep 3, 2019

What is the status of this MR?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
6 participants