HADOOP-17074 Optimise s3a Listing to be fully asynchronous. #2207

Merged

Conversation

mukund-thakur
Contributor

Tested against a bucket in ap-south-1. All good apart from these known failures:
https://issues.apache.org/jira/browse/HADOOP-17192
https://issues.apache.org/jira/browse/HADOOP-17190

@mukund-thakur
Contributor Author

mukund-thakur commented Aug 10, 2020

Performance result using new test:
2020-08-10 15:04:35,966 [JUnit-testMultiPagesListingPerformanceAndCorrectness] INFO contract.ContractTestUtils (ContractTestUtils.java:end(1847)) - Duration of listing 1000 files using listFiles() api with batch size of 10 including 10ms of processing time for each file: 12,039,952,465 nS
2020-08-10 15:04:52,170 [JUnit-testMultiPagesListingPerformanceAndCorrectness] INFO contract.ContractTestUtils (ContractTestUtils.java:end(1847)) - Duration of listing 1000 files using listStatus() api with batch size of 10 including 10ms of processing time for each file: 16,088,964,963 nS

With these settings, listFiles() (about 12.0 s) is roughly 4 s faster than listStatus() (about 16.1 s).
Results when the same test is run on trunk, which still has synchronous listing:

2020-08-10 15:10:03,815 [JUnit-testMultiPagesListingPerformanceAndCorrectness] INFO contract.ContractTestUtils (ContractTestUtils.java:end(1847)) - Duration of listing 1000 files using listFiles() api with batch size of 10 including 10ms of processing time for each file: 16,722,638,860 nS
2020-08-10 15:10:20,293 [JUnit-testMultiPagesListingPerformanceAndCorrectness] INFO contract.ContractTestUtils (ContractTestUtils.java:end(1847)) - Duration of listing 1000 files using listStatus() api with batch size of 10 including 10ms of processing time for each file: 16,364,577,964 nS

The logs show that without these improvements, listStatus() and listFiles() take about the same time (roughly 16.4 s and 16.7 s).
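To illustrate why an asynchronous listing can overlap the fetch with per-file processing, here is a minimal sketch of a prefetching page iterator. It is not the code in this patch: `PrefetchingIterator`, `fetchPage` and `listAll` are hypothetical names, the "store" is an in-memory stand-in, and an empty page is assumed to signal the end of the listing.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.IntFunction;

public class PrefetchingListingSketch {

  /** Hands out entries page by page, requesting page n+1 while page n is being consumed. */
  static final class PrefetchingIterator<T> implements Iterator<T> {
    private final IntFunction<List<T>> pageFetcher; // assumed: returns an empty list after the last page
    private final ExecutorService pool;
    private CompletableFuture<List<T>> nextPage;
    private Iterator<T> current = Collections.emptyIterator();
    private int nextPageIndex = 0;

    PrefetchingIterator(IntFunction<List<T>> pageFetcher, ExecutorService pool) {
      this.pageFetcher = pageFetcher;
      this.pool = pool;
      this.nextPage = submit(); // start fetching the first page immediately
    }

    private CompletableFuture<List<T>> submit() {
      final int index = nextPageIndex++;
      return CompletableFuture.supplyAsync(() -> pageFetcher.apply(index), pool);
    }

    @Override
    public boolean hasNext() {
      while (!current.hasNext() && nextPage != null) {
        List<T> page = nextPage.join(); // blocks only if the prefetch has not finished yet
        if (page.isEmpty()) {
          nextPage = null;              // end of listing
        } else {
          nextPage = submit();          // request the following page before serving this one
          current = page.iterator();
        }
      }
      return current.hasNext();
    }

    @Override
    public T next() {
      if (!hasNext()) {
        throw new NoSuchElementException();
      }
      return current.next();
    }
  }

  /** In-memory stand-in for one paged list request. */
  static List<String> fetchPage(int pageIndex, int batchSize, int totalFiles) {
    List<String> page = new ArrayList<>();
    int start = pageIndex * batchSize;
    for (int i = start; i < Math.min(start + batchSize, totalFiles); i++) {
      page.add("file-" + i);
    }
    return page;
  }

  /** Drains the iterator; a real caller would process each entry inside the loop. */
  static List<String> listAll(int totalFiles, int batchSize) {
    ExecutorService pool = Executors.newSingleThreadExecutor();
    try {
      Iterator<String> it =
          new PrefetchingIterator<String>(p -> fetchPage(p, batchSize, totalFiles), pool);
      List<String> out = new ArrayList<>();
      while (it.hasNext()) {
        out.add(it.next());
      }
      return out;
    } finally {
      pool.shutdown();
    }
  }

  public static void main(String[] args) {
    System.out.println(listAll(1000, 10).size());
  }
}
```

With an iterator-style API the caller can work on one page while the next is in flight; an API that returns a complete array has to finish every list call before any processing starts.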

CC @steveloughran @mehakmeet @bgaborg

Contributor

@steveloughran steveloughran left a comment

When my IOStatistics patch goes in, the stats could be logged here.

Overall, I like this. Tangible benefits once you start doing some milliseconds of work per file. You were testing remotely, correct? So list time may be pessimistic. But then again, versioned buckets under load with many tombstones may be worse.
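A back-of-the-envelope cost model shows why the benefit depends on the per-file work. This sketch is not part of the patch: the class and method names are hypothetical, the 60 ms list latency is an assumed figure chosen only for illustration, and the asynchronous case is idealised (full pages, at least one page, perfect overlap of fetch and processing).

```java
public class ListingCostModel {

  /** Synchronous listing: every page fetch and every file's processing run one after another. */
  static double syncSeconds(int files, int batchSize, double processMs, double listMs) {
    int pages = (files + batchSize - 1) / batchSize; // ceiling division
    return (files * processMs + pages * listMs) / 1000.0;
  }

  /**
   * Idealised asynchronous listing: wait for the first page, then each later page
   * costs whichever is slower, processing the current page or fetching the next one.
   * The final page still has to be processed after it arrives.
   */
  static double asyncSeconds(int files, int batchSize, double processMs, double listMs) {
    int pages = (files + batchSize - 1) / batchSize;
    double workPerPageMs = batchSize * processMs;
    double overlappedMs = (pages - 1) * Math.max(workPerPageMs, listMs);
    return (listMs + overlappedMs + workPerPageMs) / 1000.0;
  }

  public static void main(String[] args) {
    // 1000 files, pages of 10, 10 ms of work per file, assumed 60 ms per list call
    System.out.printf("sync : %.2f s%n", syncSeconds(1000, 10, 10.0, 60.0));
    System.out.printf("async: %.2f s%n", asyncSeconds(1000, 10, 10.0, 60.0));
  }
}
```

Under these assumed inputs the model gives 16.0 s for the synchronous case and about 10.1 s for the asynchronous one. The measured 12.0 s is above that idealised figure, so the model is better read as a lower bound than as a prediction.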

Test-wise: use WriteOperationHelper via getWriteOperationHelper(); there is no need to make anything else visible.

Production code: I'd rather the new async submit code went into ListOperationCallbacks. That is: no new methods in S3AFileSystem, just the ListOperationsCallbacksImpl taking on more of the work. This stops the FS itself growing.
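The shape being suggested can be sketched as follows. This is an illustration only, not the Hadoop source: `ListingCallbacks`, `FileSystemSketch` and `createListingCallbacks` are hypothetical stand-ins, and listing results are simplified to lists of strings. The point is that the asynchronous submit lives in the callbacks implementation, while the filesystem class keeps only its existing blocking call.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;

public class ListingCallbacksSketch {

  /** Hypothetical callbacks interface that the listing code depends on. */
  interface ListingCallbacks {
    CompletableFuture<List<String>> listObjectsAsync(String prefix);
  }

  /** Hypothetical stand-in for the filesystem class. */
  static final class FileSystemSketch {
    private final Executor executor;

    FileSystemSketch(Executor executor) {
      this.executor = executor;
    }

    /** The existing blocking list call; here it returns canned data. */
    private List<String> listObjects(String prefix) {
      return Arrays.asList(prefix + "/a", prefix + "/b");
    }

    ListingCallbacks createListingCallbacks() {
      return new ListingCallbacksImpl();
    }

    /** The async wrapper lives here, so the filesystem gains no new methods. */
    private final class ListingCallbacksImpl implements ListingCallbacks {
      @Override
      public CompletableFuture<List<String>> listObjectsAsync(String prefix) {
        return CompletableFuture.supplyAsync(() -> listObjects(prefix), executor);
      }
    }
  }

  /** Runs one async list call on the calling thread and waits for the result. */
  static List<String> demo(String prefix) {
    FileSystemSketch fs = new FileSystemSketch(Runnable::run);
    return fs.createListingCallbacks().listObjectsAsync(prefix).join();
  }

  public static void main(String[] args) {
    System.out.println(demo("dir"));
  }
}
```

Because the inner class can reach the private blocking method, nothing extra needs to be made visible on the outer class.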

@@ -1956,6 +1956,14 @@ protected S3ListResult listObjects(S3ListRequest request) throws IOException {
}
}

protected CompletableFuture<S3ListResult> listObjectsAsync(S3ListRequest request) {

Make private unless someone needs to get at these in Mockito tests.

@mukund-thakur
Contributor Author

Fixed all review comments and re-ran the new test. All good.

@hadoop-yetus

🎊 +1 overall

| Vote | Subsystem | Runtime | Comment |
| --- | --- | --- | --- |
| +0 🆗 | reexec | 0m 36s | Docker mode activated. |
| | | | _ Prechecks _ |
| +1 💚 | dupname | 0m 0s | No case conflicting files found. |
| +1 💚 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 💚 | test4tests | 0m 0s | The patch appears to include 2 new or modified test files. |
| | | | _ trunk Compile Tests _ |
| +1 💚 | mvninstall | 33m 37s | trunk passed |
| +1 💚 | compile | 0m 40s | trunk passed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 |
| +1 💚 | compile | 0m 34s | trunk passed with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 |
| +1 💚 | checkstyle | 0m 26s | trunk passed |
| +1 💚 | mvnsite | 0m 43s | trunk passed |
| +1 💚 | shadedclient | 15m 4s | branch has no errors when building and testing our client artifacts. |
| +1 💚 | javadoc | 0m 20s | trunk passed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 |
| +1 💚 | javadoc | 0m 27s | trunk passed with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 |
| +0 🆗 | spotbugs | 1m 5s | Used deprecated FindBugs config; considering switching to SpotBugs. |
| +1 💚 | findbugs | 1m 4s | trunk passed |
| | | | _ Patch Compile Tests _ |
| +1 💚 | mvninstall | 0m 35s | the patch passed |
| +1 💚 | compile | 0m 35s | the patch passed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 |
| +1 💚 | javac | 0m 35s | the patch passed |
| +1 💚 | compile | 0m 27s | the patch passed with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 |
| +1 💚 | javac | 0m 27s | the patch passed |
| -0 ⚠️ | checkstyle | 0m 17s | hadoop-tools/hadoop-aws: The patch generated 6 new + 12 unchanged - 0 fixed = 18 total (was 12) |
| +1 💚 | mvnsite | 0m 32s | the patch passed |
| +1 💚 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 💚 | shadedclient | 13m 38s | patch has no errors when building and testing our client artifacts. |
| +1 💚 | javadoc | 0m 19s | the patch passed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 |
| +1 💚 | javadoc | 0m 25s | the patch passed with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 |
| +1 💚 | findbugs | 1m 10s | the patch passed |
| | | | _ Other Tests _ |
| +1 💚 | unit | 1m 26s | hadoop-aws in the patch passed. |
| +1 💚 | asflicense | 0m 33s | The patch does not generate ASF License warnings. |
| | | 75m 39s | |
| Subsystem | Report/Notes |
| --- | --- |
| Docker | ClientAPI=1.40 ServerAPI=1.40 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2207/2/artifact/out/Dockerfile |
| GITHUB PR | #2207 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux b39672fe98f7 4.15.0-60-generic #67-Ubuntu SMP Thu Aug 22 16:55:30 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | personality/hadoop.sh |
| git revision | trunk / e592ec5 |
| Default Java | Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 |
| checkstyle | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2207/2/artifact/out/diff-checkstyle-hadoop-tools_hadoop-aws.txt |
| Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2207/2/testReport/ |
| Max. process+thread count | 419 (vs. ulimit of 5500) |
| modules | C: hadoop-tools/hadoop-aws U: hadoop-tools/hadoop-aws |
| Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2207/2/console |
| versions | git=2.17.1 maven=3.6.0 findbugs=4.0.6 |
| Powered by | Apache Yetus 0.13.0-SNAPSHOT https://yetus.apache.org |

This message was automatically generated.

@steveloughran
Contributor

LGTM. +1 pending the changes needed to get checkstyle to be (mostly) quiet

nice bit of work here.

@steveloughran
Contributor

...LGTM, let's see what Yetus says

@hadoop-yetus

🎊 +1 overall

| Vote | Subsystem | Runtime | Comment |
| --- | --- | --- | --- |
| +0 🆗 | reexec | 1m 17s | Docker mode activated. |
| | | | _ Prechecks _ |
| +1 💚 | dupname | 0m 0s | No case conflicting files found. |
| +1 💚 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 💚 | test4tests | 0m 0s | The patch appears to include 2 new or modified test files. |
| | | | _ trunk Compile Tests _ |
| +1 💚 | mvninstall | 32m 28s | trunk passed |
| +1 💚 | compile | 0m 53s | trunk passed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 |
| +1 💚 | compile | 0m 42s | trunk passed with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 |
| +1 💚 | checkstyle | 0m 30s | trunk passed |
| +1 💚 | mvnsite | 0m 49s | trunk passed |
| +1 💚 | shadedclient | 19m 27s | branch has no errors when building and testing our client artifacts. |
| +1 💚 | javadoc | 0m 20s | trunk passed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 |
| +1 💚 | javadoc | 0m 30s | trunk passed with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 |
| +0 🆗 | spotbugs | 1m 17s | Used deprecated FindBugs config; considering switching to SpotBugs. |
| +1 💚 | findbugs | 1m 15s | trunk passed |
| | | | _ Patch Compile Tests _ |
| +1 💚 | mvninstall | 0m 37s | the patch passed |
| +1 💚 | compile | 0m 39s | the patch passed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 |
| +1 💚 | javac | 0m 39s | the patch passed |
| +1 💚 | compile | 0m 32s | the patch passed with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 |
| +1 💚 | javac | 0m 32s | the patch passed |
| +1 💚 | checkstyle | 0m 21s | the patch passed |
| +1 💚 | mvnsite | 0m 35s | the patch passed |
| +1 💚 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 💚 | shadedclient | 17m 44s | patch has no errors when building and testing our client artifacts. |
| +1 💚 | javadoc | 0m 19s | the patch passed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 |
| +1 💚 | javadoc | 0m 28s | the patch passed with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 |
| +1 💚 | findbugs | 1m 27s | the patch passed |
| | | | _ Other Tests _ |
| +1 💚 | unit | 1m 44s | hadoop-aws in the patch passed. |
| +1 💚 | asflicense | 0m 35s | The patch does not generate ASF License warnings. |
| | | 84m 53s | |
| Subsystem | Report/Notes |
| --- | --- |
| Docker | ClientAPI=1.40 ServerAPI=1.40 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2207/3/artifact/out/Dockerfile |
| GITHUB PR | #2207 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux fb536f212292 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / 17cd8a1 |
| Default Java | Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 |
| Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2207/3/testReport/ |
| Max. process+thread count | 342 (vs. ulimit of 5500) |
| modules | C: hadoop-tools/hadoop-aws U: hadoop-tools/hadoop-aws |
| Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2207/3/console |
| versions | git=2.17.1 maven=3.6.0 findbugs=4.0.6 |
| Powered by | Apache Yetus 0.13.0-SNAPSHOT https://yetus.apache.org |

This message was automatically generated.

@steveloughran steveloughran merged commit cc64153 into apache:trunk Aug 25, 2020
asfgit pushed a commit that referenced this pull request Aug 25, 2020
Contributed by Mukund Thakur.

Change-Id: I1b0574a0c9ebc0805f285dd5280a00e5add081f1
jojochuang pushed a commit to jojochuang/hadoop that referenced this pull request May 23, 2023
…e#2207)

Contributed by Mukund Thakur.

Change-Id: Iad9832ee75370a1ba289455c91ea0ef65f6a8286