
HADOOP-17227. Marker Tool tuning #2254

Merged


steveloughran
Contributor

@steveloughran steveloughran commented Aug 27, 2020

  • move from -expect to -min and -max; this is easier for CLI testing. Plus, it works.
  • in -nonauth mode, even when policy == keep, files not in an auth path
    count as failure.
  • the bucket-info option also prints out the authoritative path, so you have
    a better idea of what is happening
  • reporting of command failure is more informative
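The -min/-max change can be read as an inclusive range check on the number of markers a scan finds. The sketch below shows that idea only; the class name, method name and both exit codes are hypothetical and are not MarkerTool's actual API.

```java
/**
 * Hypothetical sketch of a -min/-max marker count check.
 * The class name, method name and exit codes are illustrative only;
 * they are not MarkerTool's actual API.
 */
final class MarkerCountCheck {

  /** Assumed exit code when the count is acceptable. */
  static final int EXIT_OK = 0;

  /** Assumed exit code when the count is outside [min, max]. */
  static final int EXIT_OUT_OF_RANGE = 1;

  private MarkerCountCheck() {
  }

  /**
   * Check that a marker count lies within an inclusive range.
   * @param count number of markers found by the scan
   * @param min minimum acceptable count, as passed with -min
   * @param max maximum acceptable count, as passed with -max
   * @return EXIT_OK if min <= count <= max, else EXIT_OUT_OF_RANGE
   */
  static int check(final int count, final int min, final int max) {
    if (min > max) {
      // an empty range is a usage error, not a scan failure
      throw new IllegalArgumentException(
          String.format("-min %d is greater than -max %d", min, max));
    }
    return count >= min && count <= max ? EXIT_OK : EXIT_OUT_OF_RANGE;
  }
}
```

Setting -min and -max to the same value asks for an exact count, while a wider range lets a CLI test tolerate some variation in how many markers a workload leaves behind.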

The reason for change #2 is a workflow where you want to audit a directory
even though you are in keep mode and have no authoritative path. You'd expect
-nonauth to report "no auth path", but instead it treats the whole directory
as authoritative.
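The -nonauth rule in change #2 can be sketched as a filter: anything under an authoritative path is acceptable, and everything else is reported as a failure, regardless of the marker policy. This is a minimal sketch under stated assumptions: the names are hypothetical, and paths are plain strings that end in "/", so that "/tables/" does not match "/tables2/".

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of -nonauth classification. Entries under an
 * authoritative path are acceptable; all others count as failures,
 * whatever the directory marker policy is.
 */
final class NonAuthAudit {

  private NonAuthAudit() {
  }

  /** True if path is at or under authPath; both assumed to end in "/". */
  static boolean isUnder(final String path, final String authPath) {
    return path.startsWith(authPath);
  }

  /**
   * Return the entries that are not under any authoritative path.
   * With no auth paths at all, every entry is returned, instead of the
   * whole directory being treated as authoritative.
   */
  static List<String> nonAuthMarkers(final List<String> markers,
      final List<String> authPaths) {
    List<String> failures = new ArrayList<>();
    for (String marker : markers) {
      boolean underAuth = false;
      for (String auth : authPaths) {
        if (isUnder(marker, auth)) {
          underAuth = true;
          break;
        }
      }
      if (!underAuth) {
        failures.add(marker);
      }
    }
    return failures;
  }
}
```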

https://issues.apache.org/jira/browse/HADOOP-17227

@steveloughran
Contributor Author

bucket-info gets improved too, based on experience of working with the CLI:

bin/hadoop s3guard bucket-info s3a://stevel-london/
2020-08-28 15:45:02,624 [main] INFO  impl.DirectoryPolicyImpl (DirectoryPolicyImpl.java:getDirectoryPolicy(193)) - Directory markers will be kept on authoritative paths
Filesystem s3a://stevel-london
Location: eu-west-2
Filesystem s3a://stevel-london is using S3Guard with store DynamoDBMetadataStore{region=eu-west-2, tableName=stevel-london, tableArn=arn:aws:dynamodb:eu-west-2:152813717728:table/stevel-london}
Authoritative Metadata Store: fs.s3a.metadatastore.authoritative=false
Authoritative Path: fs.s3a.authoritative.path=/tables
Qualified Authoritative Paths:
	s3a://stevel-london/tables/

	Metadata time to live: (set in fs.s3a.metadatastore.metadata.ttl) = 00:15:00.000
Metadata Store Diagnostics:
	ARN=arn:aws:dynamodb:eu-west-2:152813717728:table/stevel-london
	billing-mode=per-request
	description=S3Guard metadata store in DynamoDB
	name=stevel-london
	persist.authoritative.bit=true
	read-capacity=0
	region=eu-west-2
	retryPolicy=ExponentialBackoffRetry(maxRetries=9, sleepTime=100 MILLISECONDS)
	size=38538
	sse=DISABLED
	status=ACTIVE
	table={AttributeDefinitions: [{AttributeName: child,AttributeType: S}, {AttributeName: parent,AttributeType: S}],TableName: stevel-london,KeySchema: [{AttributeName: parent,KeyType: HASH}, {AttributeName: child,KeyType: RANGE}],TableStatus: ACTIVE,CreationDateTime: Mon Mar 16 20:21:32 GMT 2020,ProvisionedThroughput: {NumberOfDecreasesToday: 0,ReadCapacityUnits: 0,WriteCapacityUnits: 0},TableSizeBytes: 38538,ItemCount: 300,TableArn: arn:aws:dynamodb:eu-west-2:152813717728:table/stevel-london,TableId: 422878eb-823c-4071-826a-3746b7c8fd18,BillingModeSummary: {BillingMode: PAY_PER_REQUEST,LastUpdateToPayPerRequestDateTime: Mon Mar 16 20:21:32 GMT 2020},}
	write-capacity=0

S3A Client
	Signing Algorithm: fs.s3a.signing-algorithm=(unset)
	Endpoint: fs.s3a.endpoint=s3.eu-west-2.amazonaws.com
	Encryption: fs.s3a.server-side-encryption-algorithm=none
	Input seek policy: fs.s3a.experimental.input.fadvise=normal
	Change Detection Source: fs.s3a.change.detection.source=etag
	Change Detection Mode: fs.s3a.change.detection.mode=server

S3A Committers
	The "magic" committer is supported in the filesystem
	S3A Committer factory class: mapreduce.outputcommitter.factory.scheme.s3a=org.apache.hadoop.fs.s3a.commit.S3ACommitterFactory
	S3A Committer name: fs.s3a.committer.name=directory
	Cluster filesystem staging directory: fs.s3a.committer.staging.tmp.path=tmp/staging
	Local filesystem buffer directory: fs.s3a.buffer.dir=/tmp/hadoop-stevel/s3a
	File conflict resolution: fs.s3a.committer.staging.conflict-mode=append

Security
	Delegation token support is disabled

Security
	The directory marker policy is "authoritative"
	Available Policies: delete, keep, authoritative
	Authoritative paths: fs.s3a.authoritative.path=/tables
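In the output above, fs.s3a.authoritative.path=/tables is listed under "Qualified Authoritative Paths" as s3a://stevel-london/tables/. A minimal sketch of that mapping follows. It assumes the option holds a comma-separated list and that each entry is resolved against the bucket root; giving relative entries a leading "/" is an assumption of this sketch, not necessarily what S3A does. The class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch: turn fs.s3a.authoritative.path values into
 * qualified directory prefixes such as s3a://bucket/tables/.
 * Resolving every entry against the bucket root is an assumption.
 */
final class AuthPathQualifier {

  private AuthPathQualifier() {
  }

  static List<String> qualify(final String fsUri, final String option) {
    // drop any trailing "/" from the filesystem URI
    String base = fsUri.endsWith("/")
        ? fsUri.substring(0, fsUri.length() - 1)
        : fsUri;
    List<String> result = new ArrayList<>();
    for (String raw : option.split(",")) {
      String p = raw.trim();
      if (p.isEmpty()) {
        continue;                  // unset option or stray comma
      }
      if (!p.startsWith("/")) {
        p = "/" + p;               // assumption: relative to bucket root
      }
      if (!p.endsWith("/")) {
        p = p + "/";               // directory prefix, as printed above
      }
      result.add(base + p);
    }
    return result;
  }
}
```

The trailing "/" matters when the qualified paths are later used as prefixes: it stops s3a://stevel-london/tables/ from also matching a sibling such as s3a://stevel-london/tables2/.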

@apache apache deleted a comment from hadoop-yetus Aug 28, 2020
@steveloughran steveloughran added the fs/s3 changes related to hadoop-aws; submitter must declare test endpoint label Aug 28, 2020
@steveloughran
Contributor Author

./hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/s3guard/S3GuardTool.java:28:import java.time.Duration;:8: Unused import - java.time.Duration. [UnusedImports]
./hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/s3guard/S3GuardTool.java:763:    public static final String PURPOSE = "destroy the Metadata Store including its contents": Line is longer than 80 characters (found 92). [LineLength]
./hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/tools/MarkerTool.java:332:            String.format("Argument for %s is not a number: %s", option, value));: Line is longer than 80 characters (found 81). [LineLength]
./hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/tools/MarkerTool.java:435:    /**: First sentence should end with a period. [JavadocStyle]
./hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/tools/MarkerTool.java:622:  private ScanResult failScan(ScanResult result, int code, String message, Object...args) {: Line is longer than 80 characters (found 91). [LineLength]
./hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/AbstractS3ATestBase.java
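The LineLength warning at MarkerTool.java:332 concerns an error message of the form "Argument for %s is not a number: %s". The sketch below shows one way such a parser can report the failing option and value while keeping every line under 80 columns. The class and method names, and the choice of IllegalArgumentException, are assumptions for illustration only.

```java
/**
 * Hypothetical helper: parse a numeric CLI option value, failing with a
 * message that names both the option and the rejected value.
 * Lines are wrapped to stay under the 80-column checkstyle limit.
 */
final class OptionParsing {

  private OptionParsing() {
  }

  static int parseIntOption(final String option, final String value) {
    if (value != null) {
      try {
        return Integer.parseInt(value.trim());
      } catch (NumberFormatException e) {
        throw notANumber(option, value, e);
      }
    }
    // a missing value is reported the same way as a malformed one
    throw notANumber(option, value, null);
  }

  private static IllegalArgumentException notANumber(
      final String option, final String value, final Throwable cause) {
    return new IllegalArgumentException(
        String.format("Argument for %s is not a number: %s",
            option, value),
        cause);
  }
}
```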

@steveloughran
Contributor Author

Ah, the builder API raises checkstyle warnings:

./hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/s3guard/S3GuardTool.java:762:    public static final String PURPOSE = "destroy the Metadata Store including its": Line is longer than 80 characters (found 83). [LineLength]
./hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/tools/MarkerTool.java:933:    public ScanArgsBuilder withSourceFS(final FileSystem sourceFS) {:58: 'sourceFS' hides a field. [HiddenField]
./hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/tools/MarkerTool.java:939:    public ScanArgsBuilder withPath(final Path path) {:48: 'path' hides a field. [HiddenField]
./hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/tools/MarkerTool.java:945:    public ScanArgsBuilder withDoPurge(final boolean doPurge) {:54: 'doPurge' hides a field. [HiddenField]
./hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/tools/MarkerTool.java:951:    public ScanArgsBuilder withMinMarkerCount(final int minMarkerCount) {:57: 'minMarkerCount' hides a field. [HiddenField]
./hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/tools/MarkerTool.java:957:    public ScanArgsBuilder withMaxMarkerCount(final int maxMarkerCount) {:57: 'maxMarkerCount' hides a field. [HiddenField]
./hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/tools/MarkerTool.java:963:    public ScanArgsBuilder withLimit(final int limit) {:48: 'limit' hides a field. [HiddenField]
./hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/tools/MarkerTool.java:969:    public ScanArgsBuilder withNonAuth(final boolean nonAuth) {:54: 'nonAuth' hides a field. [HiddenField]
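The HiddenField warnings above are raised because each withX() parameter has the same name as the field it sets. Renaming the parameters is one way to silence them. The following is a minimal sketch of such a builder; it is not the code that was merged. ScanArgs and Builder are hypothetical names, Path is replaced by String so the example compiles on its own, and the sourceFS setting is omitted for the same reason.

```java
/**
 * Hypothetical sketch of a scan-arguments builder whose setter parameters
 * do not share names with fields, so checkstyle's HiddenField check has
 * nothing to report. Path is replaced by String for self-containment.
 */
final class ScanArgs {

  private final String path;
  private final boolean doPurge;
  private final int minMarkerCount;
  private final int maxMarkerCount;
  private final int limit;
  private final boolean nonAuth;

  private ScanArgs(final Builder builder) {
    this.path = builder.path;
    this.doPurge = builder.doPurge;
    this.minMarkerCount = builder.minMarkerCount;
    this.maxMarkerCount = builder.maxMarkerCount;
    this.limit = builder.limit;
    this.nonAuth = builder.nonAuth;
  }

  String getPath() { return path; }
  boolean isDoPurge() { return doPurge; }
  int getMinMarkerCount() { return minMarkerCount; }
  int getMaxMarkerCount() { return maxMarkerCount; }
  int getLimit() { return limit; }
  boolean isNonAuth() { return nonAuth; }

  /** Each withX() sets one value and returns this builder. */
  static final class Builder {
    private String path;
    private boolean doPurge;
    private int minMarkerCount;
    private int maxMarkerCount;
    private int limit;
    private boolean nonAuth;

    Builder withPath(final String p) {
      this.path = p;
      return this;
    }

    Builder withDoPurge(final boolean purge) {
      this.doPurge = purge;
      return this;
    }

    Builder withMinMarkerCount(final int min) {
      this.minMarkerCount = min;
      return this;
    }

    Builder withMaxMarkerCount(final int max) {
      this.maxMarkerCount = max;
      return this;
    }

    Builder withLimit(final int maxResults) {
      this.limit = maxResults;
      return this;
    }

    Builder withNonAuth(final boolean nonAuthMode) {
      this.nonAuth = nonAuthMode;
      return this;
    }

    ScanArgs build() {
      return new ScanArgs(this);
    }
  }
}
```

An immutable argument object built this way keeps the scan method's signature stable as options are added, which is the motivation for the builder in the first place.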

@steveloughran
Contributor Author

  • for testing, it would be good if we could count objects with -minobjects and -maxobjects options, covering all objects under a path, markers included. This would help verify rename etc., even when S3Guard is enabled.
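The -minobjects/-maxobjects suggestion amounts to counting every key under a prefix, directory markers included, and checking the total against a range. The sketch below works on an already-listed set of keys, leaves out how the store is listed, and treats keys ending in "/" as markers, which is an assumption. ObjectCount and its methods are hypothetical; these options were only proposed in this comment.

```java
import java.util.List;

/**
 * Hypothetical sketch of the suggested -minobjects/-maxobjects idea:
 * count every object under a prefix, directory markers included.
 * Keys ending in "/" are treated as markers (an assumption).
 */
final class ObjectCount {

  private final long objects;
  private final long markers;

  private ObjectCount(final long objectTotal, final long markerTotal) {
    this.objects = objectTotal;
    this.markers = markerTotal;
  }

  long getObjects() { return objects; }
  long getMarkers() { return markers; }

  /** Count the keys under a prefix from an already-listed set of keys. */
  static ObjectCount under(final String prefix, final List<String> keys) {
    long objectTotal = 0;
    long markerTotal = 0;
    for (String key : keys) {
      if (key.startsWith(prefix)) {
        objectTotal++;
        if (key.endsWith("/")) {
          markerTotal++;
        }
      }
    }
    return new ObjectCount(objectTotal, markerTotal);
  }

  /** True if the object count is within [min, max], inclusive. */
  boolean within(final long min, final long max) {
    return objects >= min && objects <= max;
  }
}
```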

@apache apache deleted a comment from hadoop-yetus Sep 3, 2020
@apache apache deleted a comment from hadoop-yetus Sep 3, 2020
Change-Id: Ib310e321e5862957fbd92bebfade93231f92b16f
Change-Id: Iddcefb26a7de0fce0c7b6ae0d679590005cd63b6
* fix checkstyle
* use builder API for passing the (growing) set of params around

Change-Id: I1ce980a4d7d4f5e9ad7f1c7b7fa4c6fd9806b8f1
@hadoop-yetus

🎊 +1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 0m 29s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 markdownlint 0m 1s markdownlint was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 3 new or modified test files.
_ trunk Compile Tests _
+1 💚 mvninstall 29m 2s trunk passed
+1 💚 compile 0m 42s trunk passed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1
+1 💚 compile 0m 37s trunk passed with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01
+1 💚 checkstyle 0m 29s trunk passed
+1 💚 mvnsite 0m 42s trunk passed
+1 💚 shadedclient 15m 7s branch has no errors when building and testing our client artifacts.
+1 💚 javadoc 0m 24s trunk passed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1
+1 💚 javadoc 0m 31s trunk passed with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01
+0 🆗 spotbugs 1m 5s Used deprecated FindBugs config; considering switching to SpotBugs.
+1 💚 findbugs 1m 2s trunk passed
_ Patch Compile Tests _
+1 💚 mvninstall 0m 34s the patch passed
+1 💚 compile 0m 33s the patch passed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1
+1 💚 javac 0m 33s the patch passed
+1 💚 compile 0m 29s the patch passed with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01
+1 💚 javac 0m 29s the patch passed
-0 ⚠️ checkstyle 0m 19s hadoop-tools/hadoop-aws: The patch generated 8 new + 11 unchanged - 0 fixed = 19 total (was 11)
+1 💚 mvnsite 0m 33s the patch passed
+1 💚 whitespace 0m 0s The patch has no whitespace issues.
+1 💚 shadedclient 14m 34s patch has no errors when building and testing our client artifacts.
+1 💚 javadoc 0m 19s the patch passed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1
+1 💚 javadoc 0m 27s the patch passed with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01
+1 💚 findbugs 1m 11s the patch passed
_ Other Tests _
+1 💚 unit 1m 30s hadoop-aws in the patch passed.
+1 💚 asflicense 0m 32s The patch does not generate ASF License warnings.
72m 4s
Subsystem Report/Notes
Docker ClientAPI=1.40 ServerAPI=1.40 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2254/4/artifact/out/Dockerfile
GITHUB PR #2254
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle markdownlint
uname Linux 0ced8146f717 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 5e12dc5
Default Java Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01
checkstyle https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2254/4/artifact/out/diff-checkstyle-hadoop-tools_hadoop-aws.txt
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2254/4/testReport/
Max. process+thread count 413 (vs. ulimit of 5500)
modules C: hadoop-tools/hadoop-aws U: hadoop-tools/hadoop-aws
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2254/4/console
versions git=2.17.1 maven=3.6.0 findbugs=4.0.6
Powered by Apache Yetus 0.13.0-SNAPSHOT https://yetus.apache.org

This message was automatically generated.

@bgaborg

bgaborg commented Sep 3, 2020

Nice improvement Steve, LGTM, +1

Change-Id: I49cc69ad61601fd858005323e73ae5ad7178e82e
@hadoop-yetus

🎊 +1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 0m 30s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 1s No case conflicting files found.
+0 🆗 markdownlint 0m 0s markdownlint was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 3 new or modified test files.
_ trunk Compile Tests _
+1 💚 mvninstall 29m 10s trunk passed
+1 💚 compile 0m 42s trunk passed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1
+1 💚 compile 0m 36s trunk passed with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01
+1 💚 checkstyle 0m 29s trunk passed
+1 💚 mvnsite 0m 42s trunk passed
+1 💚 shadedclient 15m 24s branch has no errors when building and testing our client artifacts.
+1 💚 javadoc 0m 25s trunk passed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1
+1 💚 javadoc 0m 32s trunk passed with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01
+0 🆗 spotbugs 1m 5s Used deprecated FindBugs config; considering switching to SpotBugs.
+1 💚 findbugs 1m 3s trunk passed
_ Patch Compile Tests _
+1 💚 mvninstall 0m 36s the patch passed
+1 💚 compile 0m 34s the patch passed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1
+1 💚 javac 0m 34s the patch passed
+1 💚 compile 0m 33s the patch passed with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01
+1 💚 javac 0m 33s the patch passed
-0 ⚠️ checkstyle 0m 22s hadoop-tools/hadoop-aws: The patch generated 1 new + 11 unchanged - 0 fixed = 12 total (was 11)
+1 💚 mvnsite 0m 37s the patch passed
+1 💚 whitespace 0m 0s The patch has no whitespace issues.
+1 💚 shadedclient 13m 47s patch has no errors when building and testing our client artifacts.
+1 💚 javadoc 0m 19s the patch passed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1
+1 💚 javadoc 0m 26s the patch passed with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01
+1 💚 findbugs 1m 5s the patch passed
_ Other Tests _
+1 💚 unit 1m 31s hadoop-aws in the patch passed.
+1 💚 asflicense 0m 34s The patch does not generate ASF License warnings.
71m 59s
Subsystem Report/Notes
Docker ClientAPI=1.40 ServerAPI=1.40 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2254/5/artifact/out/Dockerfile
GITHUB PR #2254
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle markdownlint
uname Linux 6007ceecd302 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 5c15815
Default Java Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01
checkstyle https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2254/5/artifact/out/diff-checkstyle-hadoop-tools_hadoop-aws.txt
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2254/5/testReport/
Max. process+thread count 413 (vs. ulimit of 5500)
modules C: hadoop-tools/hadoop-aws U: hadoop-tools/hadoop-aws
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2254/5/console
versions git=2.17.1 maven=3.6.0 findbugs=4.0.6
Powered by Apache Yetus 0.13.0-SNAPSHOT https://yetus.apache.org

This message was automatically generated.

@steveloughran steveloughran merged commit 5346cc3 into apache:trunk Sep 4, 2020
asfgit pushed a commit that referenced this pull request Sep 4, 2020
Contributed by Steve Loughran.
@steveloughran steveloughran deleted the s3/HADOOP-17227-markers-expect branch October 15, 2021 19:43
jojochuang pushed a commit to jojochuang/hadoop that referenced this pull request May 23, 2023
Contributed by Steve Loughran.

Change-Id: Ia36f058456db94c7358bc113ef298652445b03d3
Labels
fs/s3 changes related to hadoop-aws; submitter must declare test endpoint
3 participants