HADOOP-13560 S3A to support huge file writes and operations -with tests #125
Closed
steveloughran wants to merge 20 commits into apache:branch-2 from steveloughran:s3/HADOOP-13560-5GB-blobs
Conversation
@@ -183,6 +199,8 @@
<include>**/ITestS3AFileSystemContract.java</include>
<include>**/ITestS3AMiniYarnCluster.java</include>
<include>**/ITest*Root*.java</include>
<include>**/ITestS3AFileContextStatistics.java</include>
Moved this line down as it was failing intermittently.
… serve up its statistics
…ing on inside S3A, including a gauge of active request counts, plus more troubleshooting docs. The fast output stream will retry on errors
… are passing tests
Block streaming is in, tested at moderate scale (<100 MB). You can choose buffer-by-RAM (the current fast uploader) or buffer-by-HDD; in a test using SSD and remote S3 I got ~1.38 MB/s bandwidth, and something similar (1.44 MB/s) with RAM. But we shouldn't run out of heap on the HDD option. RAM buffering uses the existing byte arrays, to ease source code migration off FastUpload (which is still there, for now).
* I do plan to add pooled ByteBuffers.
* Add metrics of total and ongoing upload, including tracking what quantity of the outstanding block data has actually been uploaded.
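A minimal sketch of the buffer-mechanism choice described above, assuming the fs.s3a.block.output option named later in this PR; the buffer-selection key and the value names are hypothetical illustrations, not the PR's actual API:

```java
// Hedged sketch: selecting RAM vs. disk buffering for block output.
// "fs.s3a.block.output" is named in this PR; the buffer-selection key
// and the value names below are hypothetical.
import org.apache.hadoop.conf.Configuration;

public class BlockBufferChoice {
  static final String BLOCK_OUTPUT = "fs.s3a.block.output";
  static final String BLOCK_BUFFER = "fs.s3a.block.output.buffer"; // hypothetical key

  /** Pick a buffering strategy name from the configuration. */
  static String chooseBuffer(Configuration conf) {
    if (!conf.getBoolean(BLOCK_OUTPUT, false)) {
      return "classic"; // buffer the whole file to disk, upload on close()
    }
    // "ram" reuses the existing byte-array buffers of the fast uploader;
    // "disk" spools each block to a local file before upload.
    return conf.getTrimmed(BLOCK_BUFFER, "ram");
  }

  public static void main(String[] args) {
    Configuration conf = new Configuration();
    conf.setBoolean(BLOCK_OUTPUT, true);
    System.out.println("buffering via: " + chooseBuffer(conf));
  }
}
```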
…ng the feature. Minor cleanups of code
-supersede the Fast output stream; -run tests, tune outcomes (especially race conditions in multipart operations)
* More debug statements.
* Fixed name of the fs.s3a.block.output option in core-default and docs. Thanks Rajesh!
* More attempts at managing the close() operation rigorously. No evidence this is the cause of the problem Rajesh saw, though.
* Rearranged layout of code in S3ADataBlocks so associated classes are adjacent.
* Retry on multipart commit, adding sleep statements between retries.
* New Progress log for logging progress at debug level in S3A. Why? Because logging events every 8KB gets too chatty when debugging many-MB uploads.
* Gauges of active block uploads wired up.
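A hedged sketch of the "retry on multipart commit with sleep between attempts" pattern mentioned above; the helper, the callable it wraps, and the retry limits are illustrative, not the PR's code:

```java
// Hedged sketch: retrying an operation such as a multipart commit,
// sleeping between attempts. The helper and its parameters are
// illustrative; they are not the PR's actual code.
import java.io.IOException;
import java.util.concurrent.Callable;

public final class RetryingCommit {
  /** Run the operation up to {@code attempts} times (attempts >= 1). */
  static <T> T retry(Callable<T> operation, int attempts, long sleepMillis)
      throws Exception {
    Exception last = null;
    for (int i = 1; i <= attempts; i++) {
      try {
        return operation.call();
      } catch (IOException e) {
        last = e;
        if (i < attempts) {
          Thread.sleep(sleepMillis); // back off before the next attempt
        }
      }
    }
    throw last;
  }
}
```

Usage would look like `retry(() -> client.completeMultipartUpload(request), 3, 5000)`, with `client` and `request` standing in for whatever AWS SDK objects the real code holds.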
steveloughran force-pushed the s3/HADOOP-13560-5GB-blobs branch from 26f3c44 to c342feb on September 21, 2016 22:18
shanthoosh pushed a commit to shanthoosh/hadoop that referenced this pull request on Oct 15, 2019:
…rocessorId Refactoring LocalApplicationRunner s.t. each processor has its own listener instance, instead of a single listener keeping track of all processors. Author: Navina Ramesh <navina@apache.org> Reviewers: Prateek Maheshwari <pmaheshw@linkedin.com>, Xinyu Liu <xiliu@linkedin.com> Closes apache#125 from navina/SAMZA-1213
Adds:
Scale tests for S3A huge file support;
-configurable to bigger sizes in the auth-keys XML or in the build: -Dfs.s3a.scale.test.huge.filesize=1000
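A hedged sketch of how a scale test might pick up that property; only the property name appears in this PR, while the default and the "value is in MB" interpretation are assumptions:

```java
// Hedged sketch: reading the huge-file test size. Only the property
// name is from this PR; the default of 100 and the assumption that
// the value is in megabytes are illustrative.
import org.apache.hadoop.conf.Configuration;

public class HugeFileSize {
  static final String KEY_HUGE_FILESIZE = "fs.s3a.scale.test.huge.filesize";

  static long hugeFileSizeBytes(Configuration conf) {
    long sizeMB = conf.getLong(KEY_HUGE_FILESIZE, 100); // assumed MB units
    return sizeMB * 1024 * 1024;
  }
}
```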
New scalable output stream for writing, S3ABlockOutputStream:
-always saves in incremental blocks as writes proceed; block size == partition size.
-supports the Fast output stream memory buffer code (for regression testing).
-supports a back end which buffers blocks in files, using round-robin disk allocation, so write/read bandwidth is limited to aggregate HDD bandwidth (see the sketch after this list).
-adding extra failure resilience as testing throws up failure conditions (network timeouts, no response from the server on multipart commit, etc.).
-adding instrumentation, including callbacks from the AWS SDK to update gauges and counters (in progress).
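A sketch of the round-robin disk allocation mentioned above, using Hadoop's existing LocalDirAllocator (a real API that rotates across the directories configured under fs.s3a.buffer.dir); the class and file prefix here are illustrative, not necessarily what the PR's back end looks like:

```java
// Sketch: round-robin allocation of per-block buffer files across the
// directories listed under fs.s3a.buffer.dir. LocalDirAllocator is
// real Hadoop API; the class and the "s3ablock" prefix are illustrative.
import java.io.File;
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.LocalDirAllocator;

public class DiskBlockBuffers {
  private final LocalDirAllocator allocator =
      new LocalDirAllocator("fs.s3a.buffer.dir");

  /** Allocate a temp file for one block, rotating across buffer dirs. */
  File newBlockFile(Configuration conf, long blockSize) throws IOException {
    return allocator.createTmpFileForWrite("s3ablock", blockSize, conf);
  }
}
```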
What we have here is essentially something that can replace both the classic "save to file, upload at the end" stream and the fast "store it all in RAM and hope there's space" stream. It should offer incremental upload for faster output of larger files compared to the classic file stream, with the scalability the fast one lacks, and the instrumentation to show what's happening.
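To make that concrete, a hedged usage sketch: enable the block output stream (the fs.s3a.block.output option name comes from this PR) and write a large file through the normal FileSystem API; the bucket name and the multipart size value are placeholders:

```java
// Hedged usage sketch: writing a large object through the block output
// stream. fs.s3a.block.output is named in this PR; the bucket name and
// the multipart size value are placeholders.
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HugeWrite {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    conf.setBoolean("fs.s3a.block.output", true);
    conf.setLong("fs.s3a.multipart.size", 64L * 1024 * 1024); // block == partition size
    FileSystem fs = FileSystem.get(new URI("s3a://example-bucket/"), conf);
    byte[] chunk = new byte[8 * 1024];
    try (FSDataOutputStream out = fs.create(new Path("/huge/file.bin"))) {
      for (int i = 0; i < 128 * 1024; i++) { // ~1 GiB total
        out.write(chunk);                    // blocks upload as they fill
      }
    }
  }
}
```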