HADOOP-18183. s3a audit logs to publish range start/end of GET requests in audit header #5110
Conversation
Just one feedback - see comment thread on the code itself.
Following that, I think we're good if Yetus is happy. Thanks @sauraank.
if (rangeValue.length != 2) {
  WARN_INCORRECT_RANGE.warn("Expected range to contain 0 or 2 elements. Got "
      + rangeValue.length + ". Ignoring");
}
We don't actually ignore the value here. We should return, or put the next few lines in an else block. Otherwise we'd get an IndexOutOfBoundsException when we access rangeValue[1] and there's only one element, for example.
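The early-return fix being suggested can be sketched as a standalone snippet. The class and method names below are illustrative only, not the actual LoggingAuditor code:

```java
// Hypothetical sketch of the early-return guard suggested above; names
// are illustrative, not the real LoggingAuditor internals.
public class RangeGuardSketch {

  /** Build the audit range string, or null when the array is malformed. */
  static String rangeForAudit(long[] rangeValue) {
    if (rangeValue == null || rangeValue.length == 0) {
      return null; // no range on the request
    }
    if (rangeValue.length != 2) {
      // warn and genuinely ignore: return before rangeValue[1] is touched,
      // avoiding the IndexOutOfBoundsException described above
      System.err.println("Expected range to contain 0 or 2 elements. Got "
          + rangeValue.length + " elements. Ignoring");
      return null;
    }
    return rangeValue[0] + "-" + rangeValue[1];
  }

  public static void main(String[] args) {
    System.out.println(rangeForAudit(new long[]{100, 200})); // prints 100-200
    System.out.println(rangeForAudit(new long[]{100}));      // prints null
  }
}
```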
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/audit/impl/LoggingAuditor.java
+1 (non-binding), assuming Yetus is happy with the changes
🎊 +1 overall
This message was automatically generated.
Thanks @sauraank for your contribution. I have a few comments. Can you please address them:
- Please fix the style check:
  - ./hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/audit/impl/LoggingAuditor.java:243: /**: First sentence should end with a period. [JavadocStyle]
  - ./hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/audit/AbstractAuditingTest.java:143: /**: First sentence should end with a period. [JavadocStyle]
  - ./hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/audit/TestHttpReferrerAuditHeader.java:315: /**: First sentence should end with a period. [JavadocStyle]
  - ./hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/audit/TestHttpReferrerAuditHeader.java:334: /**: First sentence should end with a period. [JavadocStyle]
- Run the integration tests for S3 and please share the results in the PR.
- Please tick the relevant boxes in the PR description:
  - Does the title of this PR start with the corresponding JIRA issue id (e.g. '[HADOOP-17799](https://issues.apache.org/jira/browse/HADOOP-17799). Your PR title ...')?
  - Object storage: have the integration tests been executed and the endpoint declared according to the connector-specific documentation?
  - If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)?
  - If applicable, have you updated the LICENSE, LICENSE-binary, NOTICE-binary files?
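For reference, the JavadocStyle check flagged above is about the first sentence of a Javadoc comment ending with a period. A minimal illustration (made-up class and method, not the flagged Hadoop code):

```java
// Minimal illustration of the JavadocStyle rule flagged above; the class
// and method here are made up, not the actual Hadoop code.
public class JavadocStyleExample {

  // Fails checkstyle [JavadocStyle] - first sentence has no period:
  //   /**
  //    * Attach the range from the request
  //    */

  /**
   * Attach the range from the request.
   */
  static String attachRange(long start, long end) {
    return start + "-" + end;
  }

  public static void main(String[] args) {
    System.out.println(attachRange(100, 200)); // prints 100-200
  }
}
```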
Oops, didn't realise this doesn't fail CI. Thanks for calling that out, Ashutosh.
Thanks @dannycjones @ashutoshcipher for the review. I have fixed the checkstyle recommendation. I have also added the integration test result for hadoop-aws.
+1 (non-binding) now Yetus is happy incl. checkstyle.
Thanks @dannycjones! Hey @steveloughran can you please review this PR? Thank you!
code looks good; some minor suggestions.
it does need a documentation update too, I'm afraid
}
if (rangeValue.length != 2) {
  WARN_INCORRECT_RANGE.warn("Expected range to contain 0 or 2 elements. Got "
      + rangeValue.length + " elements. Ignoring");
nit: use the slf4j {} expansion for on-demand formatting and resilience to NPEs in toString(). not that important here, but good to keep to the same code style everywhere
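The slf4j {} style keeps the message template and its arguments separate, so formatting is deferred until the message is actually emitted. A toy stand-in (not the real slf4j implementation) showing why it is NPE-resilient:

```java
// Toy stand-in for slf4j's {} expansion (not the real slf4j code): the
// template and argument stay separate, so formatting happens on demand
// and a null argument renders as "null" rather than risking an NPE from
// an eager toString() in string concatenation.
public class Slf4jStyleDemo {

  static String expand(String template, Object arg) {
    return template.replace("{}", String.valueOf(arg)); // String.valueOf is null-safe
  }

  public static void main(String[] args) {
    System.out.println(expand("Got {} elements. Ignoring.", 3));
    System.out.println(expand("Got {} elements. Ignoring.", null));
  }
}
```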
      + rangeValue.length + " elements. Ignoring");
  return;
}
String combinedRangeValue = String.format("bytes=%d-%d", rangeValue[0], rangeValue[1]);
can you make a two byte prefix ("rg"?) and put it in AuditConstants?
- shorter, as we don't know the upper limit on referrer strings and would like to keep it down
- in the file for other connectors to use, and to make it easy to look at what to expect
@steveloughran, can you elaborate on the `rg` prefix? Is it for the query parameter? If so, we already have that as `r` (`PARAM_RANGE`).
`bytes=` is part of the range header value which - following the SDK upgrade - can just be dumped in place, without needing reconstruction like this.
That being said, it does seem a bit of a waste. Maybe we drop it entirely here (and for all connectors) and just include whatever the value of the range header was, minus the `bytes=` prefix.
sorry, i'd missed that. yes, you are right. just include it without the bytes= stuff.
i guess we do have to consider what happens if/when s3 supports multiple ranges, but that would be just something like "1-3,4-7", wouldn't it?
Yes, I think that makes sense.
The only risk I see is if the object store uses a range specified in another unit from bytes. In the V1 SDK, it's hard-coded to bytes - no issue there. In V2, I believe you specify the header itself, so we can warn once if we see something other than the `bytes=` prefix and ignore the header if so. We are the ones specifying the header in the first place anyway, right? (FYI @ahmarsuhail @passaro)
As Steve mentioned, let's update the documentation.
Here are the two most relevant docs I can think of; also try a search in the code base for anything auditor-related.
+1 (non-binding). LGTM, I see you ran the tests again yesterday.
hey @steveloughran - do you have time to give this another pass over the next week?
yeah, looking at it. got distracted by the various crises...
@@ -251,11 +251,10 @@ private void attachRangeFromRequest(AmazonWebServiceRequest request) {
       return;
     }
     if (rangeValue.length != 2) {
-      WARN_INCORRECT_RANGE.warn("Expected range to contain 0 or 2 elements. Got "
-          + rangeValue.length + " elements. Ignoring");
+      WARN_INCORRECT_RANGE.warn("Expected range to contain 0 or 2 elements. Got {} elements. Ignoring.", rangeValue.length);
ok, checkstyle is unhappy. needs splitting.
i will do that myself locally as i do a retest and merge.
Sorry, I always miss that one. I wish it would fail the Yetus approval
LGTM
+1
a couple of minor nits; addressing in #5218 which will be the merge commit. i needed to do a local run anyway to see what the logs look like
…ts. (#5110) The start and end of the range is set in a new audit param "rg", e.g "?rg=100-200" Contributed by Ankit Saurabh
ok, merged in a different branch, backporting with retesting. closing this one as done. thanks!
Description of PR
Added the range to the referrer header for GetObjectRequest. It is of the format "%d-%d".
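Per the merge commit, this value ends up in the referrer header as a new audit param "rg" (e.g. "?rg=100-200"). A hedged sketch of the resulting query string; the class and method names are illustrative, not the actual auditor internals:

```java
// Illustrative only: shows the shape of the "rg" audit parameter described
// in the merge commit ("?rg=100-200"); not the actual LoggingAuditor code.
public class AuditRangeParamSketch {
  static final String PARAM_RANGE = "rg";

  static String withRange(String referrer, long start, long end) {
    return referrer + "?" + PARAM_RANGE + "=" + start + "-" + end;
  }

  public static void main(String[] args) {
    System.out.println(withRange("/bucket/key", 100, 200)); // prints /bucket/key?rg=100-200
  }
}
```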
How was this patch tested?
Added unit tests for it. Tested by running the unit tests and integration tests on hadoop-aws successfully. The integration tests were run in the region eu-west-1 with the command `mvn -Dparallel-tests clean verify`.
NOTE: Initially, the integration tests failed, then succeeded on a retry.
Following is the latest result:
[INFO] Results:
[INFO]
[WARNING] Tests run: 1154, Failures: 0, Errors: 0, Skipped: 182
[INFO]
[INFO]
[INFO] --- maven-failsafe-plugin:3.0.0-M1:integration-test (sequential-integration-tests) @ hadoop-aws ---
[INFO]
[INFO] -------------------------------------------------------
[INFO] Results:
[INFO]
[WARNING] Tests run: 124, Failures: 0, Errors: 0, Skipped: 84
[INFO]
[INFO]
[INFO] --- maven-enforcer-plugin:3.0.0:enforce (depcheck) @ hadoop-aws ---
[INFO]
[INFO] --- maven-failsafe-plugin:3.0.0-M1:verify (default-integration-test) @ hadoop-aws ---
[INFO]
[INFO] --- maven-failsafe-plugin:3.0.0-M1:verify (sequential-integration-tests) @ hadoop-aws ---
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 14:01 min
[INFO] Finished at: 2022-11-23T14:27:39Z
[INFO] Final Memory: 71M/1541M
[INFO] ------------------------------------------------------------------------
For code changes:
If applicable, have you updated the LICENSE, LICENSE-binary, NOTICE-binary files?