HDDS-6904. Cleanup customer related events in S3G logs#3531
HDDS-6904. Cleanup customer related events in S3G logs#3531adoroszlai merged 6 commits intoapache:masterfrom
Conversation
There was a problem hiding this comment.
Thanks @DuongNguyen0 for working on this. The change looks good, but I have two questions on scope.
I see one more exception stack trace (see below) related to invalid auth headers. Is HDDS-6905 intended to address it, or should it be fixed here?
s3g_1 | 2022-06-18 00:28:29,054 [qtp1305004711-24] ERROR signature.AuthorizationV4HeaderParser: AWS access id shouldn't be empty. credential:/20220618/us-west-1/s3/aws4_request
s3g_1 | Jun 18, 2022 12:28:29 AM org.glassfish.jersey.internal.Errors logErrors
s3g_1 | WARNING: The following warnings have been detected: WARNING: Unknown HK2 failure detected:
s3g_1 | MultiException stack 1 of 1
s3g_1 | javax.ws.rs.WebApplicationException: The authorization header you provided is invalid.
s3g_1 | at org.apache.hadoop.ozone.s3.OzoneClientProducer.wrapOS3Exception(OzoneClientProducer.java:140)
s3g_1 | at org.apache.hadoop.ozone.s3.OzoneClientProducer.getSignature(OzoneClientProducer.java:101)
...
s3g_1 | at org.jboss.weld.injection.StaticMethodInjectionPoint.invoke(StaticMethodInjectionPoint.java:88)
...
s3g_1 | at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:548)
...
s3g_1 | Caused by: org.apache.hadoop.ozone.s3.exception.OS3Exception
s3g_1 | at org.apache.hadoop.ozone.s3.exception.S3ErrorTable.newError(S3ErrorTable.java:139)
s3g_1 | at org.apache.hadoop.ozone.s3.exception.S3ErrorTable.newError(S3ErrorTable.java:126)
s3g_1 | at org.apache.hadoop.ozone.s3.signature.AuthorizationV4HeaderParser.parseCredentials(AuthorizationV4HeaderParser.java:171)
s3g_1 | at org.apache.hadoop.ozone.s3.signature.AuthorizationV4HeaderParser.parseSignature(AuthorizationV4HeaderParser.java:91)
s3g_1 | at org.apache.hadoop.ozone.s3.signature.AWSSignatureProcessor.parseSignature(AWSSignatureProcessor.java:70)
s3g_1 | at org.apache.hadoop.ozone.s3.signature.AWSSignatureProcessor$Proxy$_$$_WeldClientProxy.parseSignature(Unknown Source)
s3g_1 | at org.apache.hadoop.ozone.s3.OzoneClientProducer.getSignature(OzoneClientProducer.java:80)
s3g_1 | ... 114 more
| protected void auditWriteFailure(AuditAction action, Throwable ex) { | ||
| AUDIT.logWriteFailure( | ||
| buildAuditMessageForFailure(action, getAuditParameters(), ex)); | ||
| } | ||
|
|
||
| protected void auditReadFailure(AuditAction action, Exception ex) { | ||
| AUDIT.logReadFailure( | ||
| buildAuditMessageForFailure(action, getAuditParameters(), ex)); | ||
| } |
There was a problem hiding this comment.
Extracting these is a good idea. Did you intend to change only parts of the code which were touched anyway? Would you be converting other call places in a future task? Also, what about audit(Read|Write)Success?
There was a problem hiding this comment.
Yeah, I intended to keep the scope of this PR for the error logging. But actually we can extend it a little bit to success audits to make the code consistent. Good catch.
Thanks for the detailed review, @adoroszlai. The following log line, as well as others in And the other looks like a mishandling of exception and that results in an unhandled exception logging, it should be a part of HDDS-6905. |
|
There is one complication when adding a separate auth failure audit log. S3 gateway only validates if the auth signature is well-formed. The actual check for credentials occurs at OM. We would need to peek into the exception to know if this is related to auth or a different issue if we want to log it as an AUTH exception entry in the audit log. |
Thanks for the clarification. I introduced the new AUTH marker just because at the time of validating signature, we don't know if the operation is either Yet, as you mentioned, if we're to do that, need to do that consistently for the actual credentials check as well. If we don't do that, we can probably infer READ/WRITE base on the web context (e.g. if method is GET/HEAD then READ). However, this is also prone for inconsistence. Also, the logged action is another complication is the logged action, which is also manually put at the endpoint handler. So, if the adit log is issued at in the handler, it appears as But at the time of validating signature, the best we can do is: I guess it depends on how consistent we want to introduce in the AUDIT log. Another refactor would be needed to make audit logging a consistent mechanism in S3G. |
I agree, I think it would be best to not expand the current PR to fix all the issue but to do incrementally and keep the parent Jira open and add tasks as identified. For now, the scope of the changes here look good and it is more important to get the customer initiated logging off the S3G logs first. I will need to discuss the issue you bring up around the fidelity of the action and what details we capture. I don't think it will be possible to cleanly audit all actions centrally without refactoring. |
|
@adoroszlai do you want to give one last look? |
| try { | ||
| signatureInfo = parser.parseSignature(); | ||
| } catch (MalformedResourceException e) { | ||
| AuditMessage message = buildAuthFailureMessage(e); | ||
| AUDIT.logAuthFailure(message); | ||
| throw S3ErrorTable.newError(MALFORMED_HEADER, e.getResource()); | ||
| } | ||
| if (signatureInfo != null) { | ||
| break; | ||
| } |
There was a problem hiding this comment.
I see this exception in the log:
s3g_1 | Jul 11, 2022 10:46:07 PM org.glassfish.jersey.internal.Errors logErrors
s3g_1 | WARNING: The following warnings have been detected: WARNING: Unknown HK2 failure detected:
s3g_1 | MultiException stack 1 of 1
s3g_1 | javax.ws.rs.WebApplicationException: The authorization header you provided is invalid.
s3g_1 | at org.apache.hadoop.ozone.s3.OzoneClientProducer.wrapOS3Exception(OzoneClientProducer.java:141)
s3g_1 | at org.apache.hadoop.ozone.s3.OzoneClientProducer.getSignature(OzoneClientProducer.java:102)
...
s3g_1 | Caused by: org.apache.hadoop.ozone.s3.exception.OS3Exception
s3g_1 | at org.apache.hadoop.ozone.s3.exception.S3ErrorTable.newError(S3ErrorTable.java:139)
s3g_1 | at org.apache.hadoop.ozone.s3.exception.S3ErrorTable.newError(S3ErrorTable.java:126)
s3g_1 | at org.apache.hadoop.ozone.s3.signature.AWSSignatureProcessor.parseSignature(AWSSignatureProcessor.java:86)
instead of the AUTH failure in audit.
Triggered by:
ozone/hadoop-ozone/dist/src/main/smoketest/s3/bucketlist.robot
Lines 43 to 46 in 843fac2
I'm not sure the exception belongs to HDDS-6905, since it is not an unhandled OMException, rather an S3-specific one thrown by S3 Gateway. But the main problem is that it is missing from audit.
There was a problem hiding this comment.
Thanks for looking at the logs. Actually, the audit log was there but the action is not properly named. I've just submit an update to fixed this.
The warning logs is another problem. It's from the fact that S3G parses and links S3 signature to Endpoints using dependency injection. i.e.
@RequestScoped
public class OzoneClientProducer {
@Produces
public S3Auth getSignature() {}
}
public abstract class EndpointBase implements Auditor {
@Inject
private OzoneClient client;
The resolution of S3Auth is done by HK2 (Jersey lightweight IoC framework). When HK2 tries to create an S3Auth instance and gets exceptions for invalid/malformed signatures, it always logs the exceptions as warnings before passing to Jersey to resolve the exceptions to 4xx responses. I found no way to override this behavior.
To avoid the warning logs for those user event exceptions, we can either refactor to avoid using DI for creating and injecting S3Auth, or just disable the those warning logs using log config (note that Jersey just doesn't use log4j but Java Util Logging, so this may be some complication). Think I should log this in another JIRA.
There was a problem hiding this comment.
What changes were proposed in this pull request?
Customer related events, including validation errors, should be logged in Audi logs only. The goal is to avoid a bad client to over load the s3g log and keep s3g log purely for internal issues.
What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-6904
How was this patch tested?
Manual test.