New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HADOOP-18106: Handle memory fragmentation in S3A Vectored IO #4427
HADOOP-18106: Handle memory fragmentation in S3A Vectored IO #4427
Conversation
part of HADOOP-18103. Add support for multiple ranged vectored read api in PositionedReadable. The default iterates through the ranges to read each synchronously, but the intent is that FSDataInputStream subclasses can make more efficient readers especially in object stores implementation. Also added implementation in S3A where smaller ranges are merged and sliced byte buffers are returned to the readers. All the merged ranged are fetched from S3 asynchronously. Contributed By: Owen O'Malley and Mukund Thakur
… maxReadSizeForVectorReads (apache#3964) Part of HADOOP-18103. Introducing fs.s3a.vectored.read.min.seek.size and fs.s3a.vectored.read.max.merged.size to configure min seek and max read during a vectored IO operation in S3A connector. These properties actually define how the ranges will be merged. To completely disable merging set fs.s3a.max.readsize.vectored.read to 0. Contributed By: Mukund Thakur
…che#4273) * HADOOP-18107 Adding scale test for vectored reads for large file part of HADOOP-18103.
part of HADOOP-18103. Required for vectored IO feature. None of current buffer pool implementation is complete. ElasticByteBufferPool doesn't use weak references and could lead to memory leak errors and DirectBufferPool doesn't support caller preferences of direct and heap buffers and has only fixed length buffer implementation. Contributed By: Mukund Thakur
…tation. part of HADOOP-18103. Handling memoroy fragmentation in S3A vectored IO implementation by allocating smaller user range requested size buffers and directly filling them from the remote S3 stream and skipping undesired data in between ranges. This patch also adds aborting active vectored reads when stream is closed or unbuffer is called.
💔 -1 overall
This message was automatically generated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM; some minor changes. most important one is making sure
I've realised that its possible to call
stream.readVectored(...)
unbuffer()
stream.readVectored(...)
and the first set of reads may never noticed that unbuffer happened. Do we care? I don't believe so.
however, the javadocs for the unbuffer API should be changed to say
active vector reads must be signalled to stop, and no new queued
reads initiated. no expectation of blocking to await the outcome of > these
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FileRange.java
Show resolved
Hide resolved
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/RawLocalFileSystem.java
Show resolved
Hide resolved
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/StreamCapabilities.java
Outdated
Show resolved
Hide resolved
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/VectoredReadUtils.java
Outdated
Show resolved
Hide resolved
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/VectoredReadUtils.java
Outdated
Show resolved
Hide resolved
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AInputStream.java
Show resolved
Hide resolved
...hadoop-aws/src/test/java/org/apache/hadoop/fs/contract/s3a/ITestS3AContractVectoredRead.java
Outdated
Show resolved
Hide resolved
...hadoop-aws/src/test/java/org/apache/hadoop/fs/contract/s3a/ITestS3AContractVectoredRead.java
Show resolved
Hide resolved
...tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/scale/AbstractSTestS3AHugeFiles.java
Outdated
Show resolved
Hide resolved
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/impl/FileRangeImpl.java
Show resolved
Hide resolved
2b5d2f0
to
7668f8d
Compare
💔 -1 overall
This message was automatically generated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1. good to go
Thanks @steveloughran . Although this one is the rebased patch #4445 |
dac8c48
to
645bcd6
Compare
can you close this now |
rebased patch which got merged. #4445 |
…tation.
part of HADOOP-18103.
Handling memoroy fragmentation in S3A vectored IO implementation by
allocating smaller user range requested size buffers and directly
filling them from the remote S3 stream and skipping undesired
data in between ranges.
This patch also adds aborting active vectored reads when stream is
closed or unbuffer is called.
Description of PR
How was this patch tested?
Added new tests, ran all existing tests.
For code changes:
LICENSE
,LICENSE-binary
,NOTICE-binary
files?