HADOOP-18106: Handle memory fragmentation in S3A Vectored IO #4427

Contributor

@mukund-thakur mukund-thakur commented Jun 10, 2022

…tation.

part of HADOOP-18103.
Handle memory fragmentation in the S3A vectored IO implementation by
allocating smaller buffers, each sized to the range the user requested,
filling them directly from the remote S3 stream, and skipping the
undesired data between ranges.
This patch also aborts active vectored reads when the stream is
closed or unbuffer() is called.
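
A minimal sketch of the idea described above, not the S3A code itself: one stream covers a merged span, a buffer of exactly the requested size is allocated per range, and the bytes between ranges are drained rather than stored. The class name `RangeFillSketch`, the `Range` type and the `fillRanges` helper are hypothetical names introduced only for illustration.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch; not the S3A implementation. */
final class RangeFillSketch {

  /** A requested range: absolute offset and length (illustrative type). */
  static final class Range {
    final long offset;
    final int length;

    Range(long offset, int length) {
      this.offset = offset;
      this.length = length;
    }
  }

  /**
   * Fill one buffer per range from a stream positioned at streamStart.
   * Ranges must be sorted by offset and must not overlap.
   */
  static List<ByteBuffer> fillRanges(InputStream in, long streamStart,
      List<Range> sortedRanges) throws IOException {
    List<ByteBuffer> results = new ArrayList<>();
    long position = streamStart;
    for (Range range : sortedRanges) {
      // Drain the unwanted bytes between the previous range and this one.
      drain(in, range.offset - position);
      // Allocate a buffer of exactly the requested size and fill it.
      byte[] data = new byte[range.length];
      readFully(in, data);
      results.add(ByteBuffer.wrap(data));
      position = range.offset + range.length;
    }
    return results;
  }

  private static void drain(InputStream in, long count) throws IOException {
    long remaining = count;
    while (remaining > 0) {
      long skipped = in.skip(remaining);
      if (skipped > 0) {
        remaining -= skipped;
      } else if (in.read() >= 0) {
        // skip() may return 0 before EOF; fall back to a one-byte read.
        remaining--;
      } else {
        throw new EOFException("EOF while draining gap");
      }
    }
  }

  private static void readFully(InputStream in, byte[] dest)
      throws IOException {
    int filled = 0;
    while (filled < dest.length) {
      int n = in.read(dest, filled, dest.length - filled);
      if (n < 0) {
        throw new EOFException("EOF while filling range buffer");
      }
      filled += n;
    }
  }
}
```

In this sketch peak memory is the sum of the requested range lengths; no buffer the size of the whole merged span is ever allocated.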

Description of PR

How was this patch tested?

Added new tests, ran all existing tests.

For code changes:

  • Does the title of this PR start with the corresponding JIRA issue id (e.g. 'HADOOP-17799. Your PR title ...')?
  • Object storage: have the integration tests been executed and the endpoint declared according to the connector-specific documentation?
  • If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?
  • If applicable, have you updated the LICENSE, LICENSE-binary, NOTICE-binary files?

part of HADOOP-18103.
Add support for a vectored read API for multiple ranges in PositionedReadable.
The default implementation iterates through the ranges and reads each one
synchronously, but the intent is that FSDataInputStream subclasses can provide
more efficient readers, especially in object store implementations.
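
The default behaviour described above can be sketched roughly as follows. This is an illustration under assumptions, not the Hadoop source: `PositionedSource`, `Range` and the static `readVectored` helper are hypothetical stand-ins, and each range simply carries a `CompletableFuture` that is completed once its bytes have been read.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.IntFunction;

/** Hypothetical sketch of a synchronous fallback for vectored reads. */
final class SyncVectoredReadSketch {

  /** Minimal positioned-read abstraction (illustrative only). */
  interface PositionedSource {
    void readFully(long position, byte[] dest, int offset, int length)
        throws IOException;
  }

  /** A range plus the future that will hold its data. */
  static final class Range {
    final long offset;
    final int length;
    final CompletableFuture<ByteBuffer> data = new CompletableFuture<>();

    Range(long offset, int length) {
      this.offset = offset;
      this.length = length;
    }
  }

  /** Read each range in turn on the calling thread. */
  static void readVectored(PositionedSource source, List<Range> ranges,
      IntFunction<ByteBuffer> allocate) {
    for (Range range : ranges) {
      try {
        // The caller chooses heap or direct buffers via the allocator.
        ByteBuffer buffer = allocate.apply(range.length);
        byte[] tmp = new byte[range.length];
        source.readFully(range.offset, tmp, 0, range.length);
        buffer.put(tmp);
        buffer.flip();
        range.data.complete(buffer);
      } catch (IOException | RuntimeException e) {
        // A failure in one range is reported through that range's future.
        range.data.completeExceptionally(e);
      }
    }
  }
}
```

Because results are delivered through futures even in the synchronous case, a subclass can swap in an asynchronous implementation without changing how callers consume the data.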

Also added an implementation in S3A where smaller ranges are merged and
sliced byte buffers are returned to the readers. All the merged ranges are
fetched from S3 asynchronously.

Contributed By: Owen O'Malley and Mukund Thakur
… maxReadSizeForVectorReads (apache#3964)

Part of HADOOP-18103.
Introduce fs.s3a.vectored.read.min.seek.size and fs.s3a.vectored.read.max.merged.size
to configure the minimum seek and maximum read sizes during a vectored IO operation
in the S3A connector. These properties define how the ranges will be merged. To
completely disable merging, set fs.s3a.vectored.read.max.merged.size to 0.

Contributed By: Mukund Thakur
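
As a rough illustration of how a minimum seek size and a maximum merged size can drive range merging, here is a hedged sketch. The class `RangeMergeSketch`, the `Span` type and the exact comparisons (merge when the gap is smaller than `minSeek` and the combined span does not exceed `maxMergedSize`) are assumptions made for this example, not a copy of the S3A logic.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch of merging nearby ranges. */
final class RangeMergeSketch {

  /** Half-open byte span [start, end). */
  static final class Span {
    final long start;
    final long end;

    Span(long start, long end) {
      this.start = start;
      this.end = end;
    }
  }

  /**
   * Merge neighbouring spans when the gap between them is smaller than
   * minSeek and the combined span is no larger than maxMergedSize.
   */
  static List<Span> merge(List<Span> ranges, long minSeek,
      long maxMergedSize) {
    List<Span> sorted = new ArrayList<>(ranges);
    sorted.sort(Comparator.comparingLong((Span s) -> s.start));
    List<Span> merged = new ArrayList<>();
    Span current = null;
    for (Span next : sorted) {
      if (current != null
          && next.start - current.end < minSeek
          && Math.max(current.end, next.end) - current.start
              <= maxMergedSize) {
        // Close enough and small enough: extend the current span.
        current = new Span(current.start, Math.max(current.end, next.end));
      } else {
        if (current != null) {
          merged.add(current);
        }
        current = next;
      }
    }
    if (current != null) {
      merged.add(current);
    }
    return merged;
  }
}
```

With a maximum merged size of 0, no combined span can satisfy the size check in this sketch, so every range is returned unmerged.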
…che#4273)

* HADOOP-18107 Adding scale test for vectored reads for large file

part of HADOOP-18103.
part of HADOOP-18103.
Required for the vectored IO feature. None of the current buffer pool
implementations is complete. ElasticByteBufferPool doesn't use
weak references and could lead to memory leaks, and
DirectBufferPool doesn't support caller preferences for direct
or heap buffers and only has a fixed-length buffer implementation.

Contributed By: Mukund Thakur
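
To make the two gaps named above concrete, here is a small sketch of a pool that holds returned buffers through weak references and lets the caller choose direct or heap buffers of any length. It is an assumption-laden illustration: `WeakBufferPoolSketch`, its `Key` type and its method names are hypothetical and are not the pool added to Hadoop.

```java
import java.lang.ref.WeakReference;
import java.nio.ByteBuffer;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch of a buffer pool backed by weak references. */
final class WeakBufferPoolSketch {

  /** Orders pooled buffers by capacity; sequence keeps keys unique. */
  private static final class Key implements Comparable<Key> {
    final int capacity;
    final long sequence;

    Key(int capacity, long sequence) {
      this.capacity = capacity;
      this.sequence = sequence;
    }

    @Override
    public int compareTo(Key other) {
      int byCapacity = Integer.compare(capacity, other.capacity);
      return byCapacity != 0
          ? byCapacity : Long.compare(sequence, other.sequence);
    }
  }

  private final TreeMap<Key, WeakReference<ByteBuffer>> heap = new TreeMap<>();
  private final TreeMap<Key, WeakReference<ByteBuffer>> direct = new TreeMap<>();
  private long sequence;

  /** Return a buffer with at least the requested capacity. */
  synchronized ByteBuffer getBuffer(boolean isDirect, int length) {
    TreeMap<Key, WeakReference<ByteBuffer>> tree = isDirect ? direct : heap;
    Map.Entry<Key, WeakReference<ByteBuffer>> entry;
    // Stored keys have sequence >= 1, so (length, 0) sorts before them.
    while ((entry = tree.ceilingEntry(new Key(length, 0))) != null) {
      tree.remove(entry.getKey());
      ByteBuffer pooled = entry.getValue().get();
      if (pooled != null) {
        return pooled;
      }
      // The referent was garbage collected; try the next candidate.
    }
    return isDirect
        ? ByteBuffer.allocateDirect(length) : ByteBuffer.allocate(length);
  }

  /** Hand a buffer back; the pool keeps only a weak reference to it. */
  synchronized void putBuffer(ByteBuffer buffer) {
    buffer.clear();
    TreeMap<Key, WeakReference<ByteBuffer>> tree =
        buffer.isDirect() ? direct : heap;
    tree.put(new Key(buffer.capacity(), ++sequence),
        new WeakReference<>(buffer));
  }
}
```

Since the pool never holds a strong reference, an idle buffer can be reclaimed by the garbage collector instead of accumulating; the cost is that a pooled buffer may be gone by the time it is requested again.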
…tation.

part of HADOOP-18103.
Handle memory fragmentation in the S3A vectored IO implementation by
allocating smaller buffers, each sized to the range the user requested,
filling them directly from the remote S3 stream, and skipping the
undesired data between ranges.
This patch also aborts active vectored reads when the stream is
closed or unbuffer() is called.
@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 55s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 1s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 8 new or modified test files.
_ feature-vectored-io Compile Tests _
+0 🆗 mvndep 15m 51s Maven dependency ordering for branch
+1 💚 mvninstall 25m 19s feature-vectored-io passed
+1 💚 compile 23m 20s feature-vectored-io passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1
+1 💚 compile 20m 36s feature-vectored-io passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
+1 💚 checkstyle 4m 22s feature-vectored-io passed
+1 💚 mvnsite 4m 55s feature-vectored-io passed
-1 ❌ javadoc 1m 52s /branch-javadoc-hadoop-common-project_hadoop-common-jdkPrivateBuild-11.0.15+10-Ubuntu-0ubuntu0.20.04.1.txt hadoop-common in feature-vectored-io failed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1.
+1 💚 javadoc 4m 49s feature-vectored-io passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
+1 💚 spotbugs 6m 29s feature-vectored-io passed
+1 💚 shadedclient 21m 50s branch has no errors when building and testing our client artifacts.
-0 ⚠️ patch 22m 22s Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary.
_ Patch Compile Tests _
+0 🆗 mvndep 0m 30s Maven dependency ordering for patch
+1 💚 mvninstall 2m 30s the patch passed
+1 💚 compile 22m 35s the patch passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1
+1 💚 javac 22m 35s root-jdkPrivateBuild-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 generated 0 new + 1815 unchanged - 2 fixed = 1815 total (was 1817)
+1 💚 compile 20m 29s the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
+1 💚 javac 20m 29s root-jdkPrivateBuild-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 generated 0 new + 1689 unchanged - 2 fixed = 1689 total (was 1691)
+1 💚 blanks 0m 0s The patch has no blanks issues.
-0 ⚠️ checkstyle 4m 19s /results-checkstyle-root.txt root: The patch generated 17 new + 80 unchanged - 2 fixed = 97 total (was 82)
+1 💚 mvnsite 5m 1s the patch passed
-1 ❌ javadoc 1m 43s /patch-javadoc-hadoop-common-project_hadoop-common-jdkPrivateBuild-11.0.15+10-Ubuntu-0ubuntu0.20.04.1.txt hadoop-common in the patch failed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1.
+1 💚 javadoc 4m 47s the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
+1 💚 spotbugs 6m 50s the patch passed
+1 💚 shadedclient 22m 4s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 18m 38s hadoop-common in the patch passed.
+1 💚 unit 3m 26s hadoop-aws in the patch passed.
+1 💚 unit 1m 16s hadoop-benchmark in the patch passed.
+1 💚 asflicense 1m 37s The patch does not generate ASF License warnings.
257m 11s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4427/2/artifact/out/Dockerfile
GITHUB PR #4427
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell
uname Linux 0b98eff8abb1 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision feature-vectored-io / 28963f4
Default Java Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4427/2/testReport/
Max. process+thread count 2246 (vs. ulimit of 5500)
modules C: hadoop-common-project/hadoop-common hadoop-tools/hadoop-aws hadoop-tools/hadoop-benchmark U: .
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4427/2/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0-SNAPSHOT https://yetus.apache.org

This message was automatically generated.

@apache apache deleted a comment from hadoop-yetus Jun 14, 2022
Contributor

@steveloughran steveloughran left a comment

LGTM; some minor changes. most important one is making sure

I've realised that it's possible to call

stream.readVectored(...)
unbuffer()
stream.readVectored(...)

and the first set of reads may never notice that unbuffer happened. Do we care? I don't believe so.

however, the javadocs for the unbuffer API should be changed to say

active vector reads must be signalled to stop, and no new queued
reads initiated. no expectation of blocking to await the outcome of these
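
One way to picture the suggested contract (signal active reads to stop, start no new queued reads, do not block) is a shared flag that each queued read checks before it begins. The sketch below is hypothetical: `StoppableVectoredReadsSketch` and its members are illustrative names, the "read" is a stand-in that just returns the range length, and resetting the flag on a new `readVectored` call is an assumption of this example.

```java
import java.io.InterruptedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical sketch of stopping queued vectored reads on unbuffer(). */
final class StoppableVectoredReadsSketch {

  private final AtomicBoolean stopRequested = new AtomicBoolean(false);
  private final Executor executor;

  StoppableVectoredReadsSketch(Executor executor) {
    this.executor = executor;
  }

  /** Queue one task per range; each task checks the stop flag first. */
  List<CompletableFuture<Integer>> readVectored(List<Integer> rangeLengths) {
    // Assumption: a new vectored read re-enables IO after unbuffer().
    stopRequested.set(false);
    List<CompletableFuture<Integer>> results = new ArrayList<>();
    for (int length : rangeLengths) {
      CompletableFuture<Integer> result = new CompletableFuture<>();
      results.add(result);
      executor.execute(() -> {
        if (stopRequested.get()) {
          result.completeExceptionally(
              new InterruptedIOException("read stopped by unbuffer()"));
        } else {
          // Stand-in for fetching the range from the remote store.
          result.complete(length);
        }
      });
    }
    return results;
  }

  /** Signal queued reads to stop; returns without waiting for them. */
  void unbuffer() {
    stopRequested.set(true);
  }
}
```

Because the flag is reset when the next `readVectored` call arrives, tasks from an earlier call that are still queued at that point would proceed as if `unbuffer()` had never happened, which is the situation raised in the comment above.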

@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 1m 55s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 1s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 7 new or modified test files.
_ feature-vectored-io Compile Tests _
+0 🆗 mvndep 14m 54s Maven dependency ordering for branch
+1 💚 mvninstall 30m 40s feature-vectored-io passed
+1 💚 compile 24m 40s feature-vectored-io passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1
+1 💚 compile 20m 45s feature-vectored-io passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
+1 💚 checkstyle 4m 24s feature-vectored-io passed
+1 💚 mvnsite 4m 59s feature-vectored-io passed
-1 ❌ javadoc 1m 51s /branch-javadoc-hadoop-common-project_hadoop-common-jdkPrivateBuild-11.0.15+10-Ubuntu-0ubuntu0.20.04.1.txt hadoop-common in feature-vectored-io failed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1.
+1 💚 javadoc 4m 46s feature-vectored-io passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
+1 💚 spotbugs 6m 29s feature-vectored-io passed
+1 💚 shadedclient 21m 58s branch has no errors when building and testing our client artifacts.
-0 ⚠️ patch 22m 31s Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary.
_ Patch Compile Tests _
+0 🆗 mvndep 0m 28s Maven dependency ordering for patch
+1 💚 mvninstall 2m 30s the patch passed
+1 💚 compile 22m 22s the patch passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1
+1 💚 javac 22m 22s root-jdkPrivateBuild-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 generated 0 new + 1815 unchanged - 2 fixed = 1815 total (was 1817)
+1 💚 compile 20m 42s the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
+1 💚 javac 20m 42s root-jdkPrivateBuild-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 generated 0 new + 1689 unchanged - 2 fixed = 1689 total (was 1691)
+1 💚 blanks 0m 0s The patch has no blanks issues.
-0 ⚠️ checkstyle 4m 14s /results-checkstyle-root.txt root: The patch generated 6 new + 72 unchanged - 0 fixed = 78 total (was 72)
+1 💚 mvnsite 5m 1s the patch passed
-1 ❌ javadoc 1m 44s /patch-javadoc-hadoop-common-project_hadoop-common-jdkPrivateBuild-11.0.15+10-Ubuntu-0ubuntu0.20.04.1.txt hadoop-common in the patch failed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1.
+1 💚 javadoc 4m 48s the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
+1 💚 spotbugs 6m 49s the patch passed
+1 💚 shadedclient 22m 6s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 18m 35s hadoop-common in the patch passed.
+1 💚 unit 3m 31s hadoop-aws in the patch passed.
+1 💚 unit 1m 15s hadoop-benchmark in the patch passed.
+1 💚 asflicense 1m 37s The patch does not generate ASF License warnings.
264m 2s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4427/3/artifact/out/Dockerfile
GITHUB PR #4427
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell
uname Linux 4276af40f219 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision feature-vectored-io / dc5ed31
Default Java Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4427/3/testReport/
Max. process+thread count 1280 (vs. ulimit of 5500)
modules C: hadoop-common-project/hadoop-common hadoop-tools/hadoop-aws hadoop-tools/hadoop-benchmark U: .
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4427/3/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0-SNAPSHOT https://yetus.apache.org

This message was automatically generated.

Contributor

@steveloughran steveloughran left a comment

+1. good to go

@mukund-thakur
Contributor Author

> +1. good to go

Thanks @steveloughran, although the rebased patch is #4445.

@steveloughran
Contributor

can you close this now

@mukund-thakur
Contributor Author

rebased patch which got merged. #4445

@mukund-thakur mukund-thakur deleted the HADOOP-18106-vec-io-memory-fragmentation-latest branch July 11, 2022 18:23