Skip to content

Comments

HDDS-6904. Cleanup customer related events in S3G logs#3531

Merged
adoroszlai merged 6 commits intoapache:masterfrom
duongkame:HDDS-6904
Jul 19, 2022
Merged

HDDS-6904. Cleanup customer related events in S3G logs#3531
adoroszlai merged 6 commits intoapache:masterfrom
duongkame:HDDS-6904

Conversation

@duongkame
Copy link
Contributor

What changes were proposed in this pull request?

Customer related events, including validation errors, should be logged in Audi logs only. The goal is to avoid a bad client to over load the s3g log and keep s3g log purely for internal issues.

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-6904

How was this patch tested?

Manual test.

Copy link
Contributor

@adoroszlai adoroszlai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @DuongNguyen0 for working on this. The change looks good, but I have two questions on scope.

I see one more exception stack trace (see below) related to invalid auth headers. Is HDDS-6905 intended to address it, or should it be fixed here?

s3g_1       | 2022-06-18 00:28:29,054 [qtp1305004711-24] ERROR signature.AuthorizationV4HeaderParser: AWS access id shouldn't be empty. credential:/20220618/us-west-1/s3/aws4_request
s3g_1       | Jun 18, 2022 12:28:29 AM org.glassfish.jersey.internal.Errors logErrors
s3g_1       | WARNING: The following warnings have been detected: WARNING: Unknown HK2 failure detected:
s3g_1       | MultiException stack 1 of 1
s3g_1       | javax.ws.rs.WebApplicationException: The authorization header you provided is invalid.
s3g_1       | 	at org.apache.hadoop.ozone.s3.OzoneClientProducer.wrapOS3Exception(OzoneClientProducer.java:140)
s3g_1       | 	at org.apache.hadoop.ozone.s3.OzoneClientProducer.getSignature(OzoneClientProducer.java:101)
...
s3g_1       | 	at org.jboss.weld.injection.StaticMethodInjectionPoint.invoke(StaticMethodInjectionPoint.java:88)
...
s3g_1       | 	at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:548)
...
s3g_1       | Caused by: org.apache.hadoop.ozone.s3.exception.OS3Exception
s3g_1       | 	at org.apache.hadoop.ozone.s3.exception.S3ErrorTable.newError(S3ErrorTable.java:139)
s3g_1       | 	at org.apache.hadoop.ozone.s3.exception.S3ErrorTable.newError(S3ErrorTable.java:126)
s3g_1       | 	at org.apache.hadoop.ozone.s3.signature.AuthorizationV4HeaderParser.parseCredentials(AuthorizationV4HeaderParser.java:171)
s3g_1       | 	at org.apache.hadoop.ozone.s3.signature.AuthorizationV4HeaderParser.parseSignature(AuthorizationV4HeaderParser.java:91)
s3g_1       | 	at org.apache.hadoop.ozone.s3.signature.AWSSignatureProcessor.parseSignature(AWSSignatureProcessor.java:70)
s3g_1       | 	at org.apache.hadoop.ozone.s3.signature.AWSSignatureProcessor$Proxy$_$$_WeldClientProxy.parseSignature(Unknown Source)
s3g_1       | 	at org.apache.hadoop.ozone.s3.OzoneClientProducer.getSignature(OzoneClientProducer.java:80)
s3g_1       | 	... 114 more

Comment on lines 312 to 320
protected void auditWriteFailure(AuditAction action, Throwable ex) {
AUDIT.logWriteFailure(
buildAuditMessageForFailure(action, getAuditParameters(), ex));
}

protected void auditReadFailure(AuditAction action, Exception ex) {
AUDIT.logReadFailure(
buildAuditMessageForFailure(action, getAuditParameters(), ex));
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extracting these is a good idea. Did you intend to change only parts of the code which were touched anyway? Would you be converting other call places in a future task? Also, what about audit(Read|Write)Success?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I intended to keep the scope of this PR for the error logging. But actually we can extend it a little bit to success audits to make the code consistent. Good catch.

@duongkame
Copy link
Contributor Author

Thanks @DuongNguyen0 for working on this. The change looks good, but I have two questions on scope.

I see one more exception stack trace (see below) related to invalid auth headers. Is HDDS-6905 intended to address it, or should it be fixed here?

Thanks for the detailed review, @adoroszlai.

The following log line, as well as others in AuthorizationV4HeaderParser should be addressed in this CR. It's great that you pointed out. I'll be sending a new version for this.

s3g_1       | 2022-06-18 00:28:29,054 [qtp1305004711-24] ERROR signature.AuthorizationV4HeaderParser: AWS access id shouldn't be empty. credential:/20220618/us-west-1/s3/aws4_request

And the other looks like a mishandling of exception and that results in an unhandled exception logging, it should be a part of HDDS-6905.

Jun 18, 2022 12:28:29 AM org.glassfish.jersey.internal.Errors logErrors
s3g_1       | WARNING: The following warnings have been detected: WARNING: Unknown HK2 failure detected:
s3g_1       | MultiException stack 1 of 1
s3g_1       | javax.ws.rs.WebApplicationException: The authorization header you provided is invalid.
s3g_1       | 	at org.apache.hadoop.ozone.s3.OzoneClientProducer.wrapOS3Exception(OzoneClientProducer.java:140)
s3g_1       | 	at org.apache.hadoop.ozone.s3.OzoneClientProducer.getSignature(OzoneClientProducer.java:101)

@adoroszlai adoroszlai added the s3 S3 Gateway label Jun 28, 2022
@kerneltime
Copy link
Contributor

There is one complication when adding a separate auth failure audit log. S3 gateway only validates if the auth signature is well-formed. The actual check for credentials occurs at OM. We would need to peek into the exception to know if this is related to auth or a different issue if we want to log it as an AUTH exception entry in the audit log.

@duongkame
Copy link
Contributor Author

There is one complication when adding a separate auth failure audit log. S3 gateway only validates if the auth signature is well-formed. The actual check for credentials occurs at OM. We would need to peek into the exception to know if this is related to auth or a different issue if we want to log it as an AUTH exception entry in the audit log.

Thanks for the clarification. I introduced the new AUTH marker just because at the time of validating signature, we don't know if the operation is either READ or WRITE yet. These action/classification is determined manually on-by-one at the endpoint handlers, e.g..

Yet, as you mentioned, if we're to do that, need to do that consistently for the actual credentials check as well.

If we don't do that, we can probably infer READ/WRITE base on the web context (e.g. if method is GET/HEAD then READ). However, this is also prone for inconsistence.

Also, the logged action is another complication is the logged action, which is also manually put at the endpoint handler. So, if the adit log is issued at in the handler, it appears as

INFO | S3GAudit | op=CREATE_BUCKET  {bucket=myBucket} | Some failure inside the handler

But at the time of validating signature, the best we can do is:

INFO | S3GAudit | op=PUT /bucket/myBucket  {bucket=myBucket} | Error parsing signature...

I guess it depends on how consistent we want to introduce in the AUDIT log. Another refactor would be needed to make audit logging a consistent mechanism in S3G.

@kerneltime
Copy link
Contributor

kerneltime commented Jul 12, 2022

There is one complication when adding a separate auth failure audit log. S3 gateway only validates if the auth signature is well-formed. The actual check for credentials occurs at OM. We would need to peek into the exception to know if this is related to auth or a different issue if we want to log it as an AUTH exception entry in the audit log.

Thanks for the clarification. I introduced the new AUTH marker just because at the time of validating signature, we don't know if the operation is either READ or WRITE yet. These action/classification is determined manually on-by-one at the endpoint handlers, e.g..

Yet, as you mentioned, if we're to do that, need to do that consistently for the actual credentials check as well.

If we don't do that, we can probably infer READ/WRITE base on the web context (e.g. if method is GET/HEAD then READ). However, this is also prone for inconsistence.

Also, the logged action is another complication is the logged action, which is also manually put at the endpoint handler. So, if the adit log is issued at in the handler, it appears as

INFO | S3GAudit | op=CREATE_BUCKET  {bucket=myBucket} | Some failure inside the handler

But at the time of validating signature, the best we can do is:

INFO | S3GAudit | op=PUT /bucket/myBucket  {bucket=myBucket} | Error parsing signature...

I guess it depends on how consistent we want to introduce in the AUDIT log. Another refactor would be needed to make audit logging a consistent mechanism in S3G.

I agree, I think it would be best to not expand the current PR to fix all the issue but to do incrementally and keep the parent Jira open and add tasks as identified. For now, the scope of the changes here look good and it is more important to get the customer initiated logging off the S3G logs first. I will need to discuss the issue you bring up around the fidelity of the action and what details we capture. I don't think it will be possible to cleanly audit all actions centrally without refactoring.

@kerneltime
Copy link
Contributor

@adoroszlai do you want to give one last look?

Comment on lines +81 to 90
try {
signatureInfo = parser.parseSignature();
} catch (MalformedResourceException e) {
AuditMessage message = buildAuthFailureMessage(e);
AUDIT.logAuthFailure(message);
throw S3ErrorTable.newError(MALFORMED_HEADER, e.getResource());
}
if (signatureInfo != null) {
break;
}
Copy link
Contributor

@adoroszlai adoroszlai Jul 14, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see this exception in the log:

s3g_1        | Jul 11, 2022 10:46:07 PM org.glassfish.jersey.internal.Errors logErrors
s3g_1        | WARNING: The following warnings have been detected: WARNING: Unknown HK2 failure detected:
s3g_1        | MultiException stack 1 of 1
s3g_1        | javax.ws.rs.WebApplicationException: The authorization header you provided is invalid.
s3g_1        | 	at org.apache.hadoop.ozone.s3.OzoneClientProducer.wrapOS3Exception(OzoneClientProducer.java:141)
s3g_1        | 	at org.apache.hadoop.ozone.s3.OzoneClientProducer.getSignature(OzoneClientProducer.java:102)
...
s3g_1        | Caused by: org.apache.hadoop.ozone.s3.exception.OS3Exception
s3g_1        | 	at org.apache.hadoop.ozone.s3.exception.S3ErrorTable.newError(S3ErrorTable.java:139)
s3g_1        | 	at org.apache.hadoop.ozone.s3.exception.S3ErrorTable.newError(S3ErrorTable.java:126)
s3g_1        | 	at org.apache.hadoop.ozone.s3.signature.AWSSignatureProcessor.parseSignature(AWSSignatureProcessor.java:86)

instead of the AUTH failure in audit.

Triggered by:

List buckets with empty access id
Execute aws configure set aws_access_key_id ''
${result} = Execute AWSS3APICli and checkrc list-buckets 255
Should contain ${result} The authorization header you provided is invalid

I'm not sure the exception belongs to HDDS-6905, since it is not an unhandled OMException, rather an S3-specific one thrown by S3 Gateway. But the main problem is that it is missing from audit.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for looking at the logs. Actually, the audit log was there but the action is not properly named. I've just submit an update to fixed this.

The warning logs is another problem. It's from the fact that S3G parses and links S3 signature to Endpoints using dependency injection. i.e.

@RequestScoped
public class OzoneClientProducer {
  @Produces
  public S3Auth getSignature() {} 
}
public abstract class EndpointBase implements Auditor {

  @Inject
  private OzoneClient client;

The resolution of S3Auth is done by HK2 (Jersey lightweight IoC framework). When HK2 tries to create an S3Auth instance and gets exceptions for invalid/malformed signatures, it always logs the exceptions as warnings before passing to Jersey to resolve the exceptions to 4xx responses. I found no way to override this behavior.

To avoid the warning logs for those user event exceptions, we can either refactor to avoid using DI for creating and injecting S3Auth, or just disable the those warning logs using log config (note that Jersey just doesn't use log4j but Java Util Logging, so this may be some complication). Think I should log this in another JIRA.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @DuongNguyen0 for digging into the exception problem.

I should log this in another JIRA.

👍

@adoroszlai adoroszlai merged commit 2395d04 into apache:master Jul 19, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

s3 S3 Gateway

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants