Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HADOOP-18930. Make fs.s3a.create.performance a bucket-wide setting #6168

Merged

Conversation

steveloughran
Copy link
Contributor

@steveloughran steveloughran commented Oct 10, 2023

Description of PR

HADOOP-18930. Make fs.s3a.create.performance a bucket-wide settings.

Adds the switch;

How was this patch tested?

doesn't change the tests which will fail when enabled, or provide and explicit way to set in a maven profile.

likely outcome

  • all tests about overwrite fail
  • cost of creation tests fail.

For code changes:

  • Does the title or this PR starts with the corresponding JIRA issue id (e.g. 'HADOOP-17799. Your PR title ...')?
  • Object storage: have the integration tests been executed and the endpoint declared according to the connector-specific documentation?
  • If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?
  • If applicable, have you updated the LICENSE, LICENSE-binary, NOTICE-binary files?

@steveloughran
Copy link
Contributor Author

creation tests fail. propose: parameterize

[ERROR] Tests run: 16, Failures: 3, Errors: 0, Skipped: 4, Time elapsed: 25.039 s <<< FAILURE! - in org.apache.hadoop.fs.contract.s3a.ITestS3AContractCreate
[ERROR] testOverwriteNonEmptyDirectory(org.apache.hadoop.fs.contract.s3a.ITestS3AContractCreate)  Time elapsed: 5.549 s  <<< FAILURE!
java.lang.AssertionError: write of file over dir succeeded
        at org.junit.Assert.fail(Assert.java:89)
        at org.apache.hadoop.fs.contract.AbstractContractCreateTest.testOverwriteNonEmptyDirectory(AbstractContractCreateTest.java:194)
        at org.apache.hadoop.fs.contract.AbstractContractCreateTest.testOverwriteNonEmptyDirectory(AbstractContractCreateTest.java:210)

[ERROR] testOverwriteEmptyDirectory(org.apache.hadoop.fs.contract.s3a.ITestS3AContractCreate)  Time elapsed: 1.51 s  <<< FAILURE!
java.lang.AssertionError: Should be a directory -but isn't: S3AFileStatus{path=s3a://stevel-london/fork-0001/test/testOverwriteEmptyDirectory; isDirectory=false; length=256; replication=1; blocksize=33554432; modification_time=1696965045000; access_time=0; owner=stevel; group=stevel; permission=rw-rw-rw-; isSymlink=false; hasAcl=false; isEncrypted=true; isErasureCoded=false} isEmptyDirectory=FALSE eTag="fba66d3273eac117107832558bc9a363" versionId=0u_U.3vOkC7.7F0kQHv9fHuNAIRxAo3f
        at org.junit.Assert.fail(Assert.java:89)
        at org.junit.Assert.assertTrue(Assert.java:42)
        at org.apache.hadoop.fs.contract.ContractTestUtils.assertIsDirectory(ContractTestUtils.java:585)
        at org.apache.hadoop.fs.contract.ContractTestUtils.assertIsDirectory(ContractTestUtils.java:577)
        at org.apache.hadoop.fs.contract.AbstractFSContractTestBase.assertIsDirectory(AbstractFSContractTestBase.java:352)
        at org.apache.hadoop.fs.contract.AbstractContractCreateTest.testOverwriteEmptyDirectory(AbstractContractCreateTest.java:145)
        at org.apache.hadoop.fs.contract.AbstractContractCreateTest.testOverwriteEmptyDirectory(AbstractContractCreateTest.java:161)

[ERROR] testCreateFileOverExistingFileNoOverwrite(org.apache.hadoop.fs.contract.s3a.ITestS3AContractCreate)  Time elapsed: 1.467 s  <<< FAILURE!
java.lang.AssertionError: writing without overwrite unexpectedly succeeded
        at org.junit.Assert.fail(Assert.java:89)
        at org.apache.hadoop.fs.contract.AbstractContractCreateTest.testCreateFileOverExistingFileNoOverwrite(AbstractContractCreateTest.java:92)
        at org.apache.hadoop.fs.contract.AbstractContractCreateTest.testCreateFileOverExistingFileNoOverwrite(AbstractContractCreateTest.java:106)
)

+ most of ITestCreateFileCost.

      

@steveloughran
Copy link
Contributor Author

+ITestDirectoryMarkerListing

@steveloughran
Copy link
Contributor Author

testing: updated test against s3 london from ide and maven.

@@ -1183,12 +1183,17 @@ private Constants() {

/**
* Flag for create performance.
* This is *not* a configuration option; it is for use in the
* {code createFile()} builder.
* This can be set in the {code createFile()} builder.
Copy link
Contributor

@xinglin xinglin Oct 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we add what "create performance" means or change it to a more meaningful name, such as fs.s3a.create.skip.file.existence.check or fs.s3a.create.overwrite-if-exist, or just as simple as fs.s3a.create.overwrite and then add a comment about what it means?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'd rather not because its the existing openFile option. And it gives us options to do more later, if we can think of

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure, understood. we can keep this option. But can we add a comment about what it does today?

I don't get what it meant by reading "create performance".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should be covered in the openfile docs

@steveloughran steveloughran changed the title HADOOP-18930. Make fs.s3a.create.performance a bucket-wide settings. HADOOP-18930. Make fs.s3a.create.performance a bucket-wide setting Oct 12, 2023
@@ -656,6 +657,11 @@ public void initialize(URI name, Configuration originalConf)
// verify there's no S3Guard in the store config.
checkNoS3Guard(this.getUri(), getConf());

// performance creation flag for code which wants performance
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we move comment here to where this property MACRO is defined? easy to find there.

Copy link
Contributor

@xinglin xinglin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

change lgtm.

@steveloughran steveloughran force-pushed the s3/HADOOP-18930-create-performance branch from 99f879a to f234e28 Compare October 17, 2023 18:54
@steveloughran
Copy link
Contributor Author

lots of work on the tests, including a fix in production code (create("/") must raise IOE)

still ITestS3ADeleteCost failing

@steveloughran steveloughran force-pushed the s3/HADOOP-18930-create-performance branch from 43bed65 to f670582 Compare October 20, 2023 09:46
If fs.s3a.create.performance is set on a bucket
- All file overwrite checks are skipped, even if the caller says
  otherwise.
- All directory existence checks are skipped.
- Marker deletion is *always* skipped.

This eliminates a HEAD and a LIST for every creation.

* New path capability "fs.s3a.create.performance.enabled" true
  if the option is enabled.
* Parameterize ITestS3AContractCreate to expect the different
  outcomes
* Parameterize ITestCreateFileCost similarly, with
  changed cost assertions there.
* create(/) raises an IOE. existing bug only noticed here.

Change-Id: Ie997290254ae6afd5bc1d52789f57a9b68eb536b
@steveloughran steveloughran force-pushed the s3/HADOOP-18930-create-performance branch from f670582 to 7bdf972 Compare October 24, 2023 18:26
@steveloughran steveloughran marked this pull request as ready for review October 24, 2023 18:31
@hadoop-yetus
Copy link

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 36s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+0 🆗 markdownlint 0m 0s markdownlint was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 10 new or modified test files.
_ trunk Compile Tests _
+0 🆗 mvndep 16m 4s Maven dependency ordering for branch
+1 💚 mvninstall 35m 25s trunk passed
+1 💚 compile 18m 13s trunk passed with JDK Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04
+1 💚 compile 17m 9s trunk passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
+1 💚 checkstyle 4m 23s trunk passed
+1 💚 mvnsite 2m 34s trunk passed
+1 💚 javadoc 1m 53s trunk passed with JDK Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04
+1 💚 javadoc 1m 40s trunk passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
+1 💚 spotbugs 3m 49s trunk passed
+1 💚 shadedclient 37m 54s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+0 🆗 mvndep 0m 33s Maven dependency ordering for patch
+1 💚 mvninstall 1m 25s the patch passed
+1 💚 compile 15m 47s the patch passed with JDK Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04
+1 💚 javac 15m 47s the patch passed
+1 💚 compile 16m 7s the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
+1 💚 javac 16m 7s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 4m 19s the patch passed
+1 💚 mvnsite 2m 46s the patch passed
+1 💚 javadoc 2m 1s the patch passed with JDK Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04
+1 💚 javadoc 1m 55s the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
+1 💚 spotbugs 4m 16s the patch passed
+1 💚 shadedclient 35m 14s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 19m 12s hadoop-common in the patch passed.
+1 💚 unit 3m 9s hadoop-aws in the patch passed.
+1 💚 asflicense 1m 12s The patch does not generate ASF License warnings.
255m 57s
Subsystem Report/Notes
Docker ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6168/7/artifact/out/Dockerfile
GITHUB PR #6168
Optional Tests dupname asflicense mvnsite codespell detsecrets markdownlint compile javac javadoc mvninstall unit shadedclient spotbugs checkstyle
uname Linux 1df3f3eea80b 4.15.0-213-generic #224-Ubuntu SMP Mon Jun 19 13:30:12 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 7bdf972
Default Java Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6168/7/testReport/
Max. process+thread count 2503 (vs. ulimit of 5500)
modules C: hadoop-common-project/hadoop-common hadoop-tools/hadoop-aws U: .
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6168/7/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@apache apache deleted a comment from hadoop-yetus Oct 26, 2023
@apache apache deleted a comment from hadoop-yetus Oct 26, 2023
@apache apache deleted a comment from hadoop-yetus Oct 26, 2023
@apache apache deleted a comment from hadoop-yetus Oct 26, 2023
@apache apache deleted a comment from hadoop-yetus Oct 26, 2023
@apache apache deleted a comment from hadoop-yetus Oct 26, 2023
@apache apache deleted a comment from hadoop-yetus Oct 26, 2023
Copy link
Contributor

@mukund-thakur mukund-thakur left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM +1
Looks good to me overall.

@steveloughran steveloughran merged commit 7ec636d into apache:trunk Oct 27, 2023
4 checks passed
ahmarsuhail pushed a commit to ahmarsuhail/hadoop that referenced this pull request Dec 4, 2023
…pache#6168)


If fs.s3a.create.performance is set on a bucket
- All file overwrite checks are skipped, even if the caller says
  otherwise.
- All directory existence checks are skipped.
- Marker deletion is *always* skipped.

This eliminates a HEAD and a LIST for every creation.

* New path capability "fs.s3a.create.performance.enabled" true
  if the option is enabled.
* Parameterize ITestS3AContractCreate to expect the different
  outcomes
* Parameterize ITestCreateFileCost similarly, with
  changed cost assertions there.
* create(/) raises an IOE. existing bug only noticed here.

Contributed by Steve Loughran
ahmarsuhail pushed a commit to ahmarsuhail/hadoop that referenced this pull request Dec 5, 2023
…pache#6168)


If fs.s3a.create.performance is set on a bucket
- All file overwrite checks are skipped, even if the caller says
  otherwise.
- All directory existence checks are skipped.
- Marker deletion is *always* skipped.

This eliminates a HEAD and a LIST for every creation.

* New path capability "fs.s3a.create.performance.enabled" true
  if the option is enabled.
* Parameterize ITestS3AContractCreate to expect the different
  outcomes
* Parameterize ITestCreateFileCost similarly, with
  changed cost assertions there.
* create(/) raises an IOE. existing bug only noticed here.

Contributed by Steve Loughran
ahmarsuhail pushed a commit to ahmarsuhail/hadoop that referenced this pull request Dec 5, 2023
…pache#6168)


If fs.s3a.create.performance is set on a bucket
- All file overwrite checks are skipped, even if the caller says
  otherwise.
- All directory existence checks are skipped.
- Marker deletion is *always* skipped.

This eliminates a HEAD and a LIST for every creation.

* New path capability "fs.s3a.create.performance.enabled" true
  if the option is enabled.
* Parameterize ITestS3AContractCreate to expect the different
  outcomes
* Parameterize ITestCreateFileCost similarly, with
  changed cost assertions there.
* create(/) raises an IOE. existing bug only noticed here.

Contributed by Steve Loughran
jiajunmao pushed a commit to jiajunmao/hadoop-MLEC that referenced this pull request Feb 6, 2024
…pache#6168)


If fs.s3a.create.performance is set on a bucket
- All file overwrite checks are skipped, even if the caller says
  otherwise.
- All directory existence checks are skipped.
- Marker deletion is *always* skipped.

This eliminates a HEAD and a LIST for every creation.

* New path capability "fs.s3a.create.performance.enabled" true
  if the option is enabled.
* Parameterize ITestS3AContractCreate to expect the different
  outcomes
* Parameterize ITestCreateFileCost similarly, with
  changed cost assertions there.
* create(/) raises an IOE. existing bug only noticed here.

Contributed by Steve Loughran
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants