New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ZOOKEEPER-1260:Audit logging in ZooKeeper servers. #338
ZOOKEEPER-1260:Audit logging in ZooKeeper servers. #338
Conversation
@@ -52,6 +52,7 @@ See http://forrest.apache.org/docs/linking.html for more info. | |||
<jmx label="JMX" href="zookeeperJMX.html" /> | |||
<observers label="Observers Guide" href="zookeeperObservers.html" /> | |||
<reconfig label="Dynamic Reconfiguration" href="zookeeperReconfig.html" /> | |||
<reconfig label="Audit Logging" href="zookeeperAuditLogs.html" /> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this be something like <audit label="Audit Logging" href="zookeeperAuditLogs.html" />
rather than <reconfig
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes it should be <audit. Corrected it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work. Some quick comments. Also please fix FindBug warnings as they are legitimate.
<emphasis role="bold">zookeeper.audit.enabled</emphasis>) | ||
</para> | ||
<para> | ||
<emphasis role="bold">New in 3.5.3:</emphasis> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be 3.5.4 at least. We've shipped 3.5.3 already :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
corrected it
</imageobject> | ||
</mediaobject> | ||
<para>The audit log captures the detailed information for the operations that are selected to be audited. The audit | ||
information is written as a set of key=value pairs for the following keys</para> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: missing full stop after following keys
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Corrected it
audit.logger=INFO, RFAAUDIT | ||
log4j.logger.org.apache.zookeeper.audit.ZKAuditLogger=${audit.logger} | ||
log4j.additivity.org.apache.zookeeper.audit.ZKAuditLogger=false | ||
log4j.appender.RFAAUDIT=org.apache.log4j.RollingFileAppender |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm sure this is a very dumb question, why is this called RFAAUDIT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is very valid question. By RFAAUDIT i meant Rolling File Audit. Also one A is extra by mistake. I think it is better to change the name to AUDITFILE.
log4j.appender.RFAAUDIT=org.apache.log4j.RollingFileAppender | ||
log4j.appender.RFAAUDIT.File=${zookeeper.log.dir}/${zookeeper.auditlog.file} | ||
log4j.appender.RFAAUDIT.layout=org.apache.log4j.PatternLayout | ||
log4j.appender.RFAAUDIT.layout.ConversionPattern=%d{ISO8601} %p %c{2}: %m%n |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why don't we use the same conversion pattern as the other log types?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Logging is done only from one class, which thread is logging is not important. so only required parameters are logged.
@@ -66,6 +66,7 @@ | |||
<li><a href="zookeeperHierarchicalQuorums.html">Hierarchical quorums</a></li> | |||
<li><a href="zookeeperObservers.html">Observers</a> - non-voting ensemble members that easily improve ZooKeeper's scalability</li> | |||
<li><a href="zookeeperReconfig.html">Dynamic Reconfiguration</a> - a guide on how to use dynamic reconfiguration in ZooKeeper</li> | |||
<li><a href="zookeeperAuditLogs.html">Audit Logging</a> - a guide on how to configure audit logs in ZooKeeper Server and what contents are logged.</li> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I don't think we need the "and what contents are logged."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed
</para> | ||
<para> | ||
<emphasis role="bold">New in 3.5.4:</emphasis> | ||
By default audit logs are disabled. Set to "true" to enable this feature. Default is "false". See |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: We don't need to tell the user what the default is twice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
corrected.
</articleinfo> | ||
<section id="ch_auditLogs"> | ||
<title>ZooKeeper Audit Logs</title> | ||
<para>Apache ZooKeeper supports audit logs form version 3.5.4. By default audit logs are disabled. To enable audit |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"from version"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
corrected
@@ -476,6 +624,33 @@ private boolean connClosedByClient(Request request) { | |||
return request.cnxn == null; | |||
} | |||
|
|||
private String getHostAddress(ServerCnxn cnxn) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
perhaps we could do this logic in ServerCnxn?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
moved
@@ -54,6 +54,7 @@ | |||
*/ | |||
static final ByteBuffer closeConn = ByteBuffer.allocate(0); | |||
|
|||
private static String loginUser = System.getProperty("user.name", "<NA>"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure I understand what this represents
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
loginUser is the user who started zk server.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps this can be renamed? Maybe "serverUser" or something to clarify this is not the same thing as the user "logged in" on the client. Or at least a comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changed to serverUser
*/ | ||
package org.apache.zookeeper.server.util; | ||
|
||
import static org.junit.Assert.*; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: avoid * imports
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
|
||
@Test | ||
public void testGetUserFromAllAuthenticationScheme() { | ||
System.setProperty("zookeeper.authProvider.sasl", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have/need a test for custom authentication schemes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
only digest authentication is different, rest all are same. That is why not needed.
private ZooKeeper zk; | ||
|
||
@Test | ||
public void testAuditLogs() throws Exception { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really like this integration test. I think it would be great if it was a little more rigorous.
Would it be possible to use multiple auth providers oruse multiple servers and make sure logs are only appearing on the correct server?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Authentication provider make difference only for user attribute, rest all audit log attributes will remain same.
Test is already running on 3 node cluster. It is validating number of audit entries but in total. Not on which server. In all the cases expected number of entry is either 1 or three. In case of 1 entry test case currently not checking on which specific server entry is expected. But I think it is fine.
conf/log4j.properties
Outdated
zookeeper.auditlog.threshold=INFO | ||
audit.logger=INFO, AUDITFILE | ||
log4j.logger.org.apache.zookeeper.audit.ZKAuditLogger=${audit.logger} | ||
log4j.additivity.org.apache.zookeeper.audit.ZKAuditLogger=false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think something like this would be better here (see how HDFS defines audit log properties for example).
log4j.additivity.org.apache.zookeeper.audit = false
or
log4j.additivity.org.apache.zookeeper.audit.enabled = false
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
log4j.additivity.org.apache.zookeeper.audit = false is superset configuration of log4j.additivity.org.apache.zookeeper.audit.ZKAuditLogger=false
As all the all audit logging is done through ZKAuditLogger it is ok to configure only this
conf/log4j.properties
Outdated
audit.logger=INFO, AUDITFILE | ||
log4j.logger.org.apache.zookeeper.audit.ZKAuditLogger=${audit.logger} | ||
log4j.additivity.org.apache.zookeeper.audit.ZKAuditLogger=false | ||
log4j.appender.AUDITFILE=org.apache.log4j.RollingFileAppender |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How is the rolling of the audit log file defined? Is it controlled by the log4j.appender.AUDITFILE.Threshold
property?
Should we also add something like DailyRollingFileAppender and a date pattern property as another alternative for rolling logs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rolling of the audit log file is defined by following properties
log4j.appender.RFAAUDIT=org.apache.log4j.RollingFileAppender
Max log file size of 10MB
log4j.appender.RFAAUDIT.MaxFileSize=10MB
log4j.appender.RFAAUDIT.MaxBackupIndex=10
public static final String KEY_VAL_SEPARATOR = "="; | ||
public static final char PAIR_SEPARATOR = '\t'; | ||
|
||
public static final String OP_START = "serverStart"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should not expose the internal op codes, etc to end users, through API or through the public constant definition included in this patch. We can get rid of the constants definitions here and instead, whenever we want to audit log something we create the operation string on site and directly pass the string to the audit event function call. Something like:
case OpCode.create: subResult = new CreateResult(subTxnResult.path); addSuccessAudit(request, cnxn, "create", subTxnResult.path); break;
This approach seems make the maintain overhead low as we don't have to worry about the correlations between various op codes definitions and the only catch is we might have to duplicate the op code string in different places if we audit logging same thing in different places - but in this case it seems the cost of duplication is low comparing to maintaining explicit op constants.
I also checked HDFS code base and it took a similar approach.
@arshadmohammad Audit logging, when enabled, will for sure degrade performance as it adds additional IO cost. Have you done any sort of performance evaluation so we can roughly get an idea of what kind of performance expectation we should have when audit logging is enabled? Depends on how the performance degraded on different use cases, you might want to consider adding an async version of audit logging otherwise this feature might not be usable (note I am just guessing on the perf numbers here). Similar case happened on HDFS where an async audit logging was added to avoid degrading performance on hotspots. |
@arshadmohammad I'd also be curious as to the perf implications of this, have you run a load test-suite against this? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would be a very great addition to 3.5 version. I only have a few nitpick and I think it'd also very useful to get some insight into the performance implications of this patch.
* ACL permissions | ||
* @return string representation of permissions | ||
*/ | ||
public static String getPermString(int perms) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have you considered caching these values in some way?
I see 2 options to make this faster:
- dynamic caching of already generated values,
- static caching of all possible values in a static hashmap (32 possible values if I'm not mistaken)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good suggestion
selecting dynamic caching
*/ | ||
public static String createLog(String user, String operation, String znode, String acl, String session, String ip, | ||
String status) { | ||
ZKAuditLogFormatter fmt = new ZKAuditLogFormatter(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I got it right, this is the only place where you use ZKAuditLogFormatter
. Why don't you just use String.format()
instead of a custom formatter?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ZKAuditLogFormatter is not putting info for which value is null. I think we can not do with String.format()
break; | ||
} | ||
case OpCode.setACL: { | ||
lastOp = "SETA"; | ||
rsp = new SetACLResponse(rc.stat); | ||
err = Code.get(rc.err); | ||
addAuditLog(request, cnxn, AuditConstants.OP_SETACL, rc.path, getACLs(request), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe this approach has some performance impact even if audit logging is disabled. The flag gets checked within the method, therefore getACLs()
will be evaluated even if there's no need for the result.
Passing only request
and calling getACLs()
from addAuditLog()
would be slightly faster.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
corrected, added one more check at the before making audit log itself
if (ZKAuditLogger.isAuditEnabled()) {
addAuditLog(request, cnxn, AuditConstants.OP_SETACL, rc.path, getACLs(request), err);
}
return ZKUtil.aclToString(setACLRequest.getAcl()); | ||
} | ||
|
||
private void addFailedTxnAduitLog(Request request) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
corrected
case OpCode.delete: | ||
case OpCode.deleteContainer: | ||
op = AuditConstants.OP_DELETE; | ||
//path = new String(request.request.array()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please the commented line.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed
@arshadmohammad Can we move on with this great new feature? |
Hi all, please find performance impact analysis in jira comments. |
@arshadmohammad hm - there don't seem to be any(?) |
What is the status of this MR? |
Audit logging in ZooKeeper Servers.