
HADOOP-17511. Add audit/telemetry logging to S3A connector #2807

Merged: 3 commits merged from s3/HADOOP-17511-auditing into apache:trunk on May 25, 2021

Conversation

@steveloughran (Contributor) commented Mar 23, 2021

(Squashed and rebased from #2675)

Adds the notion of an AuditSpan, which is created for a given operation; the goal is
to pass it along everywhere.

It's thread local per FS-instance; store operations pick this up in their
constructor from the StoreContext.

The entryPoint() method in S3A FS has been enhanced to initiate the spans.
For this to work, internal code SHALL NOT call those entry points (Done)
and all public API points MUST be declared as entry points.

This is done, with a marker annotation @AuditEntryPoint to indicate it.

The audit span create/deactivate sequence is roughly the same as the duration tracking,
so the two are generally merged: most of the metrics S3AFS collects are now durations.
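
For illustration, the pattern looks roughly like the sketch below. This is a simplified, self-contained model of the idea, not the actual S3AFileSystem code; the class and helper names here are made up.

```java
import java.io.Closeable;
import java.util.concurrent.atomic.AtomicLong;

/**
 * Minimal sketch of the entry-point/span pattern: a public FS operation
 * opens an AuditSpan, the active span is thread-local, and internal
 * operations pick up that span rather than creating their own.
 * Names are illustrative only; this is not the real S3A code.
 */
public class AuditSpanSketch {

  /** Stand-in for the real AuditSpan: deactivated on close(). */
  static final class AuditSpan implements Closeable {
    final String id;
    final String operation;
    AuditSpan(String id, String operation) {
      this.id = id;
      this.operation = operation;
    }
    @Override
    public void close() {
      ACTIVE.remove();
    }
  }

  // Active span for the current thread; in the real design this is
  // per filesystem instance, not a single static.
  private static final ThreadLocal<AuditSpan> ACTIVE = new ThreadLocal<>();
  private static final AtomicLong COUNTER = new AtomicLong();

  /** What an entry point conceptually does: create and activate a span. */
  static AuditSpan entryPoint(String operation) {
    AuditSpan span = new AuditSpan("span-" + COUNTER.incrementAndGet(), operation);
    ACTIVE.set(span);
    return span;
  }

  /** A public API call acting as an audit entry point. */
  public static boolean mkdirs(String path) {
    // span creation and duration tracking share one lifecycle:
    // the try-with-resources closes (deactivates) the span when done.
    try (AuditSpan span = entryPoint("op_mkdirs")) {
      return innerMkdirs(path); // internal code must not re-enter an entry point
    }
  }

  /** Internal operation: picks up the active span, never creates one. */
  private static boolean innerMkdirs(String path) {
    AuditSpan span = ACTIVE.get();
    System.out.println("creating " + path + " in span " + span.id);
    return true;
  }

  public static void main(String[] args) {
    mkdirs("/tmp/example");
  }
}
```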

Part of the isolation into spans means that there are now explicit operations
for mkdirs() and getContentSummary().

The auditing is intended to be a plugin point; currently there is
the LoggingAuditor, which:

  • logs at debug
  • adds an HTTP "referer" header with audit tracing
  • can be set to raise an exception if the SDK is handed an AWS Request and there's no active span (skipped for the multipart upload part and complete calls as TransferManager in the SDK does that out of span).

There is also the NoopAuditor, which does nothing.

A recent change is that we want every span to have a spanID (string, unique across all spans of that FS instance); even the no-op span has unique IDs.
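
For completeness, here is a rough sketch of how a client might wire the auditor up through configuration. The key and class names below are assumptions based on this patch; check the auditing documentation for the authoritative names and defaults.

```java
import org.apache.hadoop.conf.Configuration;

/** Sketch only: option names are assumed, not guaranteed to match the final docs. */
public class AuditConfigSketch {
  public static Configuration auditingConfig() {
    Configuration conf = new Configuration();
    // choose the auditor implementation (LoggingAuditor or NoopAuditor)
    conf.set("fs.s3a.audit.service.classname",
        "org.apache.hadoop.fs.s3a.audit.impl.LoggingAuditor");
    // attach audit context to requests via the HTTP "referer" header
    conf.setBoolean("fs.s3a.audit.referrer.enabled", true);
    // optionally have the auditor reject SDK requests made outside any span
    conf.setBoolean("fs.s3a.audit.reject.out.of.span.operations", false);
    return conf;
  }
}
```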

@steveloughran added labels: enhancement, fs/s3 (changes related to hadoop-aws; submitter must declare test endpoint) on Mar 23, 2021
@steveloughran changed the title from "S3/hadoop 17511 auditing" to "HADOOP-17511. Add audit/telemetry logging to S3A connector" on Mar 23, 2021
@steveloughran (Contributor Author):

It's not merging, and I've over-squashed things into the AWS metrics patch. Will need to unroll it.

@bgaborg left a comment:

Thanks for the walkthrough @steveloughran, and also for working on this.
I started to review the code, but I have some issues with my AWS account right now which will be resolved next week, so I can't run the integration tests until then.

By the way, I found an interesting error which I haven't seen before when running the tests with mvn clean verify -Dparallel-tests -DtestsThreadCount=8 (NullMetadataStore):

[ERROR] Tests run: 2, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 3.307 s <<< FAILURE! - in org.apache.hadoop.fs.s3a.TestListing
[ERROR] testProvidedFileStatusIteratorEnd(org.apache.hadoop.fs.s3a.TestListing)  Time elapsed: 2.922 s  <<< ERROR!
java.lang.NullPointerException
	at org.apache.hadoop.fs.s3a.audit.impl.ActiveAuditManager.createRequestHandlers(ActiveAuditManager.java:223)
	at org.apache.hadoop.fs.s3a.S3AFileSystem.bindAWSClient(S3AFileSystem.java:785)
	at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:478)
	at org.apache.hadoop.fs.s3a.AbstractS3AMockTest.setup(AbstractS3AMockTest.java:61)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:24)
	at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
	at org.junit.rules.ExpectedException$ExpectedExceptionStatement.evaluate(ExpectedException.java:239)
	at org.junit.rules.RunRules.evaluate(RunRules.java:20)
	at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
	at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
	at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
	at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
	at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
	at org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
	at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
	at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
	at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
	at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
	at org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126)
	at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418)

I have a lot of failures like this, and since these aren't even integration tests I'm worried that this may be something new added by this PR.

@bgaborg left a comment:

Please check if the failures are related; they are not happening on trunk.

@steveloughran (Contributor Author):

I'm going to say the failures are related, as it's in the auditor code. Interesting that you saw them and I didn't. Will look at it next week.

@steveloughran force-pushed the s3/HADOOP-17511-auditing branch 2 times, most recently from 29f5f0e to 665bb01 on March 30, 2021 20:05
@steveloughran (Contributor Author):

I'm going to do a squash of the PR and push up, as yetus has completely given up trying to build this

@steveloughran force-pushed the s3/HADOOP-17511-auditing branch 2 times, most recently from 77bd725 to b3fc443 on April 13, 2021 10:05
@apache deleted 11 comments from hadoop-yetus between Apr 14 and Apr 22, 2021
@steveloughran force-pushed the s3/HADOOP-17511-auditing branch 2 times, most recently from 1c7c9ab to b6bb916 on April 23, 2021 17:52
@HyukjinKwon (Member):

cc @mswit-databricks FYI

@hadoop-yetus:

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 0s Docker mode activated.
-1 ❌ patch 0m 18s #2807 does not apply to trunk. Rebase required? Wrong Branch? See https://cwiki.apache.org/confluence/display/HADOOP/How+To+Contribute for help.
Subsystem Report/Notes
GITHUB PR #2807
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2807/25/console
versions git=2.17.1
Powered by Apache Yetus 0.14.0-SNAPSHOT https://yetus.apache.org

This message was automatically generated.

@bogthe (Contributor) commented May 21, 2021:

@steveloughran are any more changes coming in? I'm happy with the state of this CR, 👍 to get it merged

@bgaborg left a comment:

Thanks for the review sessions and for working on this Steve, it grew a little big over time :)
We should merge it before there are more merge conflicts.

LGTM +1

@mehakmeet (Contributor) left a comment:

+1, some minor comments, looks really good.

* @param expected expected value
* @return the expected value.
*/
protected long verifyCounter(final Statistic statistic,
Contributor:

can make this private.

Contributor Author:

leaving protected in case a subclass wants to use it


@Test
public void testSpanActivation() throws Throwable {
// real activation switches spans in the current thead.
Contributor:

typo: "thread"

span2.activate();
assertActiveSpan(span2);
span2.close();

Contributor:

should we close span1 here? Maybe some assertion regarding span1's lifecycle after span2 was closed?

Contributor Author:

ok, added span1 close & assert still in reset span. Did something similarish in the test case underneath (accidentally; I'd navigated to the wrong line)

}

/**
* Creaate the config from {@link AuditTestSupport#loggingAuditConfig()}
Contributor:

typo: "Create"

@Test
public void testFileAccessAllowed() throws Throwable {
describe("Enable checkaccess and verify it works with expected"
+ " statitics");
Contributor:

typo: "statistics"

whenRaw(FILE_STATUS_FILE_PROBE));
}

private String access(final S3AFileSystem fs, final Path path)
Contributor:

Maybe move this to the end of the test, after all @Test.

Contributor Author:

done; added javadocs too.

import static org.apache.hadoop.test.LambdaTestUtils.intercept;

/**
* Test S3A FS Access permit/deny is passed through all the way to the
Contributor:

JavaDocs doesn't seem right for this test.

fs.getBucketLocation();
// which MUST have ended up calling the extension request handler
Assertions.assertThat(SimpleAWSRequestHandler.getInvocationCount())
.describedAs("Invocatin count of plugged in request handler")
Contributor:

typo: "invocation"

}

/**
* Overrride point: create the callbacks for S3AInputStream.
Contributor:

typo: "Override"

@steveloughran (Contributor Author):

Thanks for the reviews, comments, votes etc.
I'll address all of @mehakmeet's little details, push up a rebased/squashed PR to force it through yetus, then merge.

@steveloughran (Contributor Author) commented May 24, 2021:

Tested against AWS London with -Dparallel-tests -DtestsThreadCount=5 -Dmarkers=delete -Dscale; all good, though getting a bit slow (tombstones?).

[INFO]
[WARNING] Tests run: 151, Failures: 0, Errors: 0, Skipped: 17
[INFO]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  37:52 min (Wall Clock)
[INFO] Finished at: 2021-05-24T12:48:00+01:00
[INFO] ------------------------------------------------------------------------

@hadoop-yetus:

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 53s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 2s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 markdownlint 0m 0s markdownlint was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 44 new or modified test files.
_ trunk Compile Tests _
+0 🆗 mvndep 12m 27s Maven dependency ordering for branch
+1 💚 mvninstall 22m 47s trunk passed
+1 💚 compile 22m 30s trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04
+1 💚 compile 19m 9s trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
+1 💚 checkstyle 4m 3s trunk passed
+1 💚 mvnsite 2m 21s trunk passed
+1 💚 javadoc 1m 31s trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04
+1 💚 javadoc 2m 8s trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
+1 💚 spotbugs 3m 40s trunk passed
+1 💚 shadedclient 17m 45s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+0 🆗 mvndep 0m 24s Maven dependency ordering for patch
+1 💚 mvninstall 1m 34s the patch passed
+1 💚 compile 23m 15s the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04
-1 ❌ javac 23m 15s /results-compile-javac-root-jdkUbuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04.txt root-jdkUbuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 generated 3 new + 1995 unchanged - 3 fixed = 1998 total (was 1998)
+1 💚 compile 20m 49s the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
-1 ❌ javac 20m 49s /results-compile-javac-root-jdkPrivateBuild-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10.txt root-jdkPrivateBuild-1.8.0_292-8u292-b10-0ubuntu120.04-b10 with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu120.04-b10 generated 3 new + 1871 unchanged - 3 fixed = 1874 total (was 1874)
+1 💚 blanks 0m 0s The patch has no blanks issues.
-0 ⚠️ checkstyle 3m 58s /results-checkstyle-root.txt root: The patch generated 5 new + 188 unchanged - 5 fixed = 193 total (was 193)
+1 💚 mvnsite 2m 30s the patch passed
+1 💚 xml 0m 2s The patch has no ill-formed XML file.
+1 💚 javadoc 1m 31s the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04
+1 💚 javadoc 1m 39s hadoop-common in the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10.
+1 💚 javadoc 0m 39s hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_292-8u292-b10-0ubuntu120.04-b10 with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu120.04-b10 generated 0 new + 63 unchanged - 25 fixed = 63 total (was 88)
+1 💚 spotbugs 4m 30s the patch passed
+1 💚 shadedclient 19m 7s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 17m 31s hadoop-common in the patch passed.
-1 ❌ unit 2m 37s /patch-unit-hadoop-tools_hadoop-aws.txt hadoop-aws in the patch passed.
+1 💚 asflicense 0m 48s The patch does not generate ASF License warnings.
213m 16s
Reason Tests
Failed junit tests hadoop.fs.s3a.audit.TestHttpReferrerAuditHeader
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2807/26/artifact/out/Dockerfile
GITHUB PR #2807
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell markdownlint xml
uname Linux 45509ec1e952 4.15.0-136-generic #140-Ubuntu SMP Thu Jan 28 05:20:47 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 14d944bc6fe8b0bc94e72be2d9f73a001061d187
Default Java Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2807/26/testReport/
Max. process+thread count 2004 (vs. ulimit of 5500)
modules C: hadoop-common-project/hadoop-common hadoop-tools/hadoop-aws U: .
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2807/26/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0-SNAPSHOT https://yetus.apache.org

This message was automatically generated.

@steveloughran (Contributor Author):

Legit test regression: the code to determine the principal is returning null.

[ERROR] testHeaderComplexPaths(org.apache.hadoop.fs.s3a.audit.TestHttpReferrerAuditHeader)  Time elapsed: 0.006 s  <<< FAILURE!
org.junit.ComparisonFailure: [pr] expected:<"jenkins"> but was:<null>
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at org.apache.hadoop.fs.s3a.audit.AbstractAuditingTest.assertMapContains(AbstractAuditingTest.java:210)
	at org.apache.hadoop.fs.s3a.audit.TestHttpReferrerAuditHeader.testHeaderComplexPaths(TestHttpReferrerAuditHeader.java:135)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
	at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
	at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:288)
	at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:282)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.lang.Thread.run(Thread.java:748)

@steveloughran (Contributor Author):

Somehow the header test had failed on the principal. Changes

  • how the principal is added has changed
  • fixed up the referrer entry, which was no longer a valid URI after Friday's change adding a hadoop/1 prefix.

This wasn't just a yetus failure; I replicated it locally. How did my tests pass? They didn't, but I hadn't noticed because the failsafe test run was happening anyway, and the failure of the unit tests was in a scrolled-off test run I wasn't looking at.

That's not good: I've always expected maven to fail as soon as unit tests do. Will investigate separately

@hadoop-yetus:

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 0s Docker mode activated.
-1 ❌ patch 0m 20s #2807 does not apply to trunk. Rebase required? Wrong Branch? See https://cwiki.apache.org/confluence/display/HADOOP/How+To+Contribute for help.
Subsystem Report/Notes
GITHUB PR #2807
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2807/27/console
versions git=2.17.1
Powered by Apache Yetus 0.14.0-SNAPSHOT https://yetus.apache.org

This message was automatically generated.

Notion of AuditSpan which is created for a given operation; goal is
to pass it along everywhere.

It's thread local per FS-instance; store operations pick this up in their
constructor from the StoreContext.

The entryPoint() method in S3A FS has been enhanced to initiate the spans.
For this to work, internal code SHALL NOT call those entry points (Done)
and all public API points MUST be declared as entry points.

This is done, with a marker attribute @AuditEntryPoint to indicate this.

The audit span create/deactivate sequence is ~the same as the duration tracking
so the operation is generally merged: most of the metrics
S3AFS collects are now durations

Part of the isolation into spans means that there's explicit operations
for mkdirs() and getContentSummary()

The auditing is intended to be a plugin point; currently there is
the LoggingAuditor which

- logs at debug
- adds an HTTP "referer" header with audit tracing
- can be set to raise an exception if the SDK is handed an AWS Request and there's
  no active span (skipped for the multipart upload part and complete calls
  as TransferManager in the SDK does that out of span).

NoopAuditor which:
 - does nothing

Change-Id: If11a2c48b00db530fb6bc1ad363e24b202acb827

HADOOP-17511 Auditing: getContentSummary and dynamic evaluation (wip)

* added getContentSummary as a single span, with some minor speedups
* still trying to do best design to wire up dynamically evaluated attributes

Currently logs should include the current thread ID; we don't yet pick up and
include the thread where the span was created, which is equally important

Change-Id: Ieea88e4228da0ac4761d8c006051cd1095c5fce8

HADOOP-17511. Audit Spans to have unique IDs.

+ an interface ActiveThreadSpanSource to give current thread span.

This is to allow integrations with the AWS SDK &C to query
the active span for an FS and immediately be able to identify span by ID,
for logging etc.

Adding a unique ID to all audit spans and supporting active thread span
(with activate/deactivate)
causes major changes in the no-op code, as suddenly there's a lot more state
there across manager, auditor and span.

will be testable though.

Change-Id: Id4dddcab7b735bd01f3d3b8a8236ff6da8f97671
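
For reference, the shape of that interface is roughly as below; the method name and the use of generics are assumptions, and the real declaration lives in hadoop-common.

```java
/**
 * Sketch of the ActiveThreadSpanSource idea: anything that tracks a
 * per-thread active span (such as an S3A filesystem instance) exposes it,
 * so SDK plugins (credential providers, signers, request handlers) can
 * look up the current span and log its ID.
 */
public interface ActiveThreadSpanSource<T> {

  /**
   * @return the span active in the current thread; with unique IDs even
   * the no-op span returned when nothing is active can be identified.
   */
  T getActiveAuditSpan();
}
```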

HADOOP-17511. Audit review

* SpanIDs must now be unique over time (helps in log analysis)
* All AWS SDK events go to AuditSpan
* FS.access() check also goes to Auditor. This is used by
  Hive

Change-Id: Id1ffffd928f2e274f1bac73109d16e6624ba0e9d

HADOOP-17511. Audit -timestamp, headers and handlers

- timestamp of span creation picked up (epoch millis in UTC) and passed
  in to referrer
- docs on referrer fields
- section on privacy implications
- referrer header can be disabled (privacy...)
- custom request handlers will be created (TODO: tests)

Change-Id: I6e94b43a209eee53748ac14270f318352d512fb8
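
As an aside, once the referrer header ends up in the S3 server access logs, getting the audit fields back out is just URL query parsing. A rough sketch follows; the example header value is hypothetical, and the field names (op, p1, p2, pr) are taken from this patch's tests and docs.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

/** Sketch: recover audit fields from a logged referrer header value. */
public class ReferrerFieldSketch {

  /** Split the query string of the referrer URL into key/value pairs. */
  public static Map<String, String> parseQuery(String referrer) {
    Map<String, String> fields = new HashMap<>();
    String query = URI.create(referrer).getQuery();
    if (query != null) {
      for (String pair : query.split("&")) {
        int eq = pair.indexOf('=');
        fields.put(eq < 0 ? pair : pair.substring(0, eq),
            eq < 0 ? "" : pair.substring(eq + 1));
      }
    }
    return fields;
  }

  public static void main(String[] args) {
    // hypothetical example value, not copied from a real log
    String referrer = "https://audit.example.org/hadoop/1/op_rename/sample-span-id/"
        + "?op=op_rename&p1=src/key&p2=dest/key&pr=alice";
    System.out.println(parseQuery(referrer));
  }
}
```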

HADOOP-17511: Unit test work

There's lots of implicit tests in the functional suites, but this
adds tests for
* binding of both bundled auditors
* adding extra request handlers
* wiring up of context to span
* and to TransferManager lifecycle events
* common context static and dynamic eval
* WiP: parsing back of the http referrer header.

This gives reasonable coverage of what's going on, though
another day's work would round it out.

Change-Id: I6b2d0f1dff223875268c18ded481d9e9fea2f250

HADOOP-17511. Unit and integration tests

* Tests of audit execution and stats update.
* Integration test suite for S3A.access()
* Tuning of implementation classes for ease of testing.
* Exporting auditor from S3AFS.
* More stats collected
* Move AuditCheckingException under CredentialInitializationException so that
  s3a translateException doesn't need changing.
* audit p1 & p2 paths moving to be key only
* auditing docs includes real s3 logs and breakdown of referrer (TODO update)

The main remaining bits of testing would be to take existing code and verify
that the headers got through, especially some of the commit logic and job
ID. Distcp might if the jobID is in the config.

Change-Id: I5723db55ba189f6c400cf29a90aa5605b0d98ad0

HADOOP-17511. Improving auditor binding inside AWS SDK

Audit opname in Span callbacks; always set active span from
request. This allows the active operation to always be
determined, including from any other plugins (Cred provider, signer...)
used in same execution.
This is also why Auditor is now in StoreContext.

Tests for RequestFactory.

Change-Id: I9528253cf21253e14714b838d3a8ae85d52ba8b7

HADOOP-17511. checkstyle and more testing

Change-Id: If12f8204237eb0d79f2edcff03fc45f31b7d196a

HADOOP-17511. Auditing: move AuditSpan and common context to hadoop-common

Small move of code, changes in imports more traumatic

Change-Id: Ide158d884bd7a873e07f0ddaff8334882eb28595

HADOOP-17511. Auditing

* avoiding accidentally deactivating spans
* caching of and switching to active span in rename/delete callbacks
* consistent access/use of AuditorId for FS ID
* which is printed in S3AFileSystem.toString().
* S3Guard Tools doing more for audit; also printing IOStats on -verbose.
* Store Context taking/returning an AuditSpanSource, not the full
  OperationAuditor interface.

Change-Id: Ifc5f807a2d3a329b8a1184dd1fcba63205c1f174

HADOOP-17511. Auditing - marker tool.

Marker tool always uses a span; this is created in the FS,
rather than the marker

Change-Id: I03e31dd58c76a41e8a1b73e958b130ed405a29fe

HADOOP-17511. Auditing -explicit integration tests.

Tests to deliberately create problems and so verify
that things fail as expected.

Change-Id: If2e863cee54aa303c24a3d02174e466c272f24b2

HADOOP-17511. Auditing  code/docs cleanup

* Move audit classes in hadoop-common into their own package
* move ActiveThreadSpanSource interface there and implement in S3AFS.
  that's not for public consumption, but it may be good to have there
  so that abfs can implement the same API.

Change-Id: I1b7d924555a1294f7acb3f47dc613adc32ffb003

HADOOP-17511. Auditing: S3 Log parser Pattern with real tests

Show everything works with a full parse of the output captured from a
log entry of a real test run. This is the most complicated Regexp I
have ever written.

Change-Id: I090b2dcefad9938bea3b95ef717a7cb2e9eea10c

HADOOP-17511 add filtering of header fields; with docs

Change-Id: I0da9487a708b5a8fd700ffddd0290d6c0621f3e2

HADOOP-17511. Audit: add o.a.h.fs.audit package with the public classes.

fix up yetus complaints, where valid.

Change-Id: I98e4f7a9c277c993555db6d62a20f2a00515c5e8

HADOOP-17511 review

* Moved HttpReferrerAuditHeader class
* Added version string to URL
* Explained some design decisions in the architecture doc
* Added mukund's comments

Change-Id: I356e5428c51f74b25584bfb1674296ac193c81d5
Change-Id: Ic6ce36df6a3f6e12ed4ee8b6829460e8e875121b
@steveloughran (Contributor Author):

rebased to trunk again after the AWS region patch from mehakmeet

Change-Id: I10c39db39f62af5606974ff38d497b36dbfb6823
@steveloughran (Contributor Author):

not sure what is up with yetus there. Submitted again, with some updated docs

@hadoop-yetus:

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 1m 1s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 3s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 markdownlint 0m 0s markdownlint was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 44 new or modified test files.
_ trunk Compile Tests _
+0 🆗 mvndep 12m 50s Maven dependency ordering for branch
+1 💚 mvninstall 28m 22s trunk passed
+1 💚 compile 29m 16s trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04
+1 💚 compile 24m 24s trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
+1 💚 checkstyle 4m 39s trunk passed
+1 💚 mvnsite 2m 45s trunk passed
+1 💚 javadoc 1m 45s trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04
+1 💚 javadoc 2m 30s trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
+1 💚 spotbugs 4m 44s trunk passed
+1 💚 shadedclient 20m 49s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+0 🆗 mvndep 0m 27s Maven dependency ordering for patch
+1 💚 mvninstall 1m 56s the patch passed
+1 💚 compile 22m 37s the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04
-1 ❌ javac 22m 37s /results-compile-javac-root-jdkUbuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04.txt root-jdkUbuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 generated 3 new + 1995 unchanged - 3 fixed = 1998 total (was 1998)
+1 💚 compile 19m 5s the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
-1 ❌ javac 19m 5s /results-compile-javac-root-jdkPrivateBuild-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10.txt root-jdkPrivateBuild-1.8.0_292-8u292-b10-0ubuntu120.04-b10 with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu120.04-b10 generated 3 new + 1871 unchanged - 3 fixed = 1874 total (was 1874)
+1 💚 blanks 0m 0s The patch has no blanks issues.
-0 ⚠️ checkstyle 3m 57s /results-checkstyle-root.txt root: The patch generated 5 new + 188 unchanged - 5 fixed = 193 total (was 193)
+1 💚 mvnsite 2m 19s the patch passed
+1 💚 xml 0m 1s The patch has no ill-formed XML file.
+1 💚 javadoc 1m 30s the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04
+1 💚 javadoc 1m 35s hadoop-common in the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10.
+1 💚 javadoc 0m 39s hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_292-8u292-b10-0ubuntu120.04-b10 with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu120.04-b10 generated 0 new + 63 unchanged - 25 fixed = 63 total (was 88)
+1 💚 spotbugs 3m 59s the patch passed
+1 💚 shadedclient 17m 34s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 17m 17s hadoop-common in the patch passed.
+1 💚 unit 2m 48s hadoop-aws in the patch passed.
+1 💚 asflicense 0m 53s The patch does not generate ASF License warnings.
232m 59s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2807/28/artifact/out/Dockerfile
GITHUB PR #2807
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell markdownlint xml
uname Linux d5165efc37e3 4.15.0-136-generic #140-Ubuntu SMP Thu Jan 28 05:20:47 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / beba5f5
Default Java Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2807/28/testReport/
Max. process+thread count 3135 (vs. ulimit of 5500)
modules C: hadoop-common-project/hadoop-common hadoop-tools/hadoop-aws U: .
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2807/28/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0-SNAPSHOT https://yetus.apache.org

This message was automatically generated.

@hadoop-yetus:

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 33s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 3s No case conflicting files found.
+0 🆗 codespell 0m 1s codespell was not available.
+0 🆗 markdownlint 0m 1s markdownlint was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 44 new or modified test files.
_ trunk Compile Tests _
+0 🆗 mvndep 16m 7s Maven dependency ordering for branch
+1 💚 mvninstall 20m 6s trunk passed
+1 💚 compile 20m 54s trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04
+1 💚 compile 18m 10s trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08
+1 💚 checkstyle 3m 48s trunk passed
+1 💚 mvnsite 2m 34s trunk passed
+1 💚 javadoc 1m 46s trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04
+1 💚 javadoc 2m 30s trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08
+1 💚 spotbugs 3m 43s trunk passed
+1 💚 shadedclient 14m 37s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+0 🆗 mvndep 0m 27s Maven dependency ordering for patch
+1 💚 mvninstall 1m 30s the patch passed
+1 💚 compile 20m 8s the patch passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04
-1 ❌ javac 20m 8s /results-compile-javac-root-jdkUbuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04.txt root-jdkUbuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 generated 3 new + 1995 unchanged - 3 fixed = 1998 total (was 1998)
+1 💚 compile 17m 58s the patch passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08
-1 ❌ javac 17m 58s /results-compile-javac-root-jdkPrivateBuild-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08.txt root-jdkPrivateBuild-1.8.0_282-8u282-b08-0ubuntu120.04-b08 with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu120.04-b08 generated 3 new + 1871 unchanged - 3 fixed = 1874 total (was 1874)
-1 ❌ blanks 0m 0s /blanks-eol.txt The patch has 2 line(s) that end in blanks. Use git apply --whitespace=fix <<patch_file>>. Refer https://git-scm.com/docs/git-apply
-0 ⚠️ checkstyle 3m 43s /results-checkstyle-root.txt root: The patch generated 5 new + 188 unchanged - 5 fixed = 193 total (was 193)
+1 💚 mvnsite 2m 31s the patch passed
+1 💚 xml 0m 1s The patch has no ill-formed XML file.
+1 💚 javadoc 1m 46s the patch passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04
+1 💚 javadoc 1m 41s hadoop-common in the patch passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08.
+1 💚 javadoc 0m 49s hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_282-8u282-b08-0ubuntu120.04-b08 with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu120.04-b08 generated 0 new + 63 unchanged - 25 fixed = 63 total (was 88)
+1 💚 spotbugs 4m 4s the patch passed
+1 💚 shadedclient 14m 41s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 17m 0s hadoop-common in the patch passed.
+1 💚 unit 2m 16s hadoop-aws in the patch passed.
+1 💚 asflicense 1m 0s The patch does not generate ASF License warnings.
198m 50s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2807/29/artifact/out/Dockerfile
GITHUB PR #2807
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell markdownlint xml
uname Linux 19f577c6f5fa 4.15.0-136-generic #140-Ubuntu SMP Thu Jan 28 05:20:47 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / abd89a0
Default Java Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2807/29/testReport/
Max. process+thread count 1840 (vs. ulimit of 5500)
modules C: hadoop-common-project/hadoop-common hadoop-tools/hadoop-aws U: .
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2807/29/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0-SNAPSHOT https://yetus.apache.org

This message was automatically generated.

@steveloughran (Contributor Author):

Yetus reports are a bit confused, but the output is good

  • checkstyle warnings are mistaken/unavoidable
  • tests are good

Merging.

@steveloughran steveloughran merged commit 832a3c6 into apache:trunk May 25, 2021
@steveloughran (Contributor Author):

backporting to branch-3.3 if the tests run successfully. Merge has gone in and first test run is happy.

asfgit pushed a commit that referenced this pull request May 25, 2021
The S3A connector supports
"an auditor", a plugin which is invoked
at the start of every filesystem API call,
and whose issued "audit span" provides a context
for all REST operations against the S3 object store.

The standard auditor sets the HTTP Referrer header
on the requests with information about the API call,
such as process ID, operation name, path,
and even job ID.

If the S3 bucket is configured to log requests, this
information will be preserved there and so can be used
to analyze and troubleshoot storage IO.

Contributed by Steve Loughran.

Change-Id: Ic0a105c194342ed2d529833ecc42608e8ba2f258
@steveloughran steveloughran deleted the s3/HADOOP-17511-auditing branch October 15, 2021 19:42
kiran-maturi pushed a commit to kiran-maturi/hadoop that referenced this pull request Nov 24, 2021
jojochuang pushed a commit to jojochuang/hadoop that referenced this pull request May 23, 2023
HADOOP-17511. Add audit/telemetry logging to S3A connector (apache#2807)

The S3A connector supports
"an auditor", a plugin which is invoked
at the start of every filesystem API call,
and whose issued "audit span" provides a context
for all REST operations against the S3 object store.

The standard auditor sets the HTTP Referrer header
on the requests with information about the API call,
such as process ID, operation name, path,
and even job ID.

If the S3 bucket is configured to log requests, this
information will be preserved there and so can be used
to analyze and troubleshoot storage IO.

Contributed by Steve Loughran.

MUST be followed by:

CDPD-28457. HADOOP-17822. fs.s3a.acl.default not working after S3A Audit feature (apache#3249)
CDPD-24982. HADOOP-17801. No error message reported when bucket doesn't exist in S3AFS

Conflicts:
	hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/Listing.java
  hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java
	hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AInputStream.java
	hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AUtils.java
	hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/WriteOperationHelper.java
	hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/AbstractS3ACommitter.java
	hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/impl/AbstractStoreOperation.java
	hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/impl/DeleteOperation.java
	hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/impl/RenameOperation.java
	hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/impl/StoreContext.java
	hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/s3guard/DynamoDBMetadataStore.java
	hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/impl/TestPartialDeleteFailures.java

Mostly related to shaded guava.

this patch really needs CDPD-10473. HADOOP-16645. S3A Delegation Token
extension point to use StoreContext; had to CP a file in, and even then
the auditing may not be complete there. Will revisit, even though
Knox and Ranger will both need a matching change

Change-Id: Ic0a105c194342ed2d529833ecc42608e8ba2f258