HADOOP-16132 Support multipart download in S3AFileSystem #645
Conversation
🎊 +1 overall

This message was automatically generated.
     */
    public static final boolean CHANGE_DETECT_REQUIRE_VERSION_DEFAULT = true;

    public static final String MULTIPART_DOWNLOAD_ENABLED =
should add javadoc for each of these, and add to core-default.xml
Some documentation should be added to hadoop-aws/index.md and hadoop-aws/performance.md about this capability.
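A sketch of what the requested javadoc might look like for the new keys (to be mirrored as entries in core-default.xml and documented in hadoop-aws/index.md). The property value and the default constant below are illustrative assumptions; the diff view truncates the actual value.

```java
// Illustrative sketch only: the javadoc wording and the default value
// are suggestions, not the committed text. The key name follows the
// "fs.s3a.multipartdownload.*" pattern used by the sibling keys in
// this patch.
public final class S3AMultipartDownloadConstants {

  /**
   * Flag to enable multipart download of large objects: {@value}.
   * When enabled, ranged GET requests are issued in parallel and
   * stitched back together in order for the reader.
   */
  public static final String MULTIPART_DOWNLOAD_ENABLED =
      "fs.s3a.multipartdownload.enabled";

  /** Default value of {@link #MULTIPART_DOWNLOAD_ENABLED}: assumed off. */
  public static final boolean DEFAULT_MULTIPART_DOWNLOAD_ENABLED = false;

  private S3AMultipartDownloadConstants() {
  }
}
```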
    public static final String MULTIPART_DOWNLOAD_CHUNK_SIZE =
        "fs.s3a.multipartdownload.chunk-size";
    public static final long DEFAULT_MULTIPART_DOWNLOAD_CHUNK_SIZE = 262_144;
the javadoc should explain the difference between part size and chunk size, also why a new configuration property is needed instead of reusing fs.s3a.multipart.size
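One possible shape for that javadoc, spelling out the upload-part vs download-chunk distinction. The explanatory wording is a suggestion, not the patch author's text; only the key name and default come from the diff.

```java
// Javadoc sketch answering the review comment; wording is illustrative.
public final class ChunkSizeDoc {

  /**
   * Size in bytes of each ranged GET issued by the multipart
   * downloader: {@value}.
   * Distinct from {@code fs.s3a.multipart.size}, which sets the part
   * size of multipart <i>uploads</i>; downloads benefit from a much
   * smaller unit so read() can return data before an entire
   * upload-sized part has arrived, hence a separate property.
   */
  public static final String MULTIPART_DOWNLOAD_CHUNK_SIZE =
      "fs.s3a.multipartdownload.chunk-size";

  /** Default chunk size: 256 KiB. */
  public static final long DEFAULT_MULTIPART_DOWNLOAD_CHUNK_SIZE = 262_144;

  private ChunkSizeDoc() {
  }
}
```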
    public static final String MULTIPART_DOWNLOAD_NUM_THREADS =
        "fs.s3a.multipartdownload.num-threads";
    public static final int DEFAULT_MULTIPART_DOWNLOAD_NUM_THREADS = 8;
why not reuse fs.s3a.threads.max (and ideally, the existing thread pool corresponding to that config)?
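The suggestion, sketched: submit download chunks to one shared bounded pool (in S3AFileSystem that would be the existing executor sized from fs.s3a.threads.max) rather than creating a per-feature pool. All class, field, and method names here are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch of reusing one shared, bounded thread pool for
// download chunks instead of a dedicated multipart-download pool.
public class SharedPoolSketch {

  // Stand-in for the filesystem-wide executor sized from
  // fs.s3a.threads.max.
  private final ExecutorService boundedThreadPool;

  public SharedPoolSketch(int threadsMax) {
    this.boundedThreadPool = Executors.newFixedThreadPool(threadsMax);
  }

  /** Submit one chunk fetch per [start, end) range on the shared pool. */
  public List<Future<byte[]>> submitChunks(List<long[]> ranges) {
    List<Future<byte[]>> futures = new ArrayList<>();
    for (long[] range : ranges) {
      futures.add(boundedThreadPool.submit(() -> fetchRange(range[0], range[1])));
    }
    return futures;
  }

  // Stand-in for the ranged GET; returns a buffer of the range length.
  private byte[] fetchRange(long start, long end) {
    return new byte[(int) (end - start)];
  }

  public void close() {
    boundedThreadPool.shutdown();
  }
}
```

Sharing the pool keeps total S3 client concurrency under one cap instead of two independently tuned ones.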
    String text = String.format("%s %s at %d",
        operation, uri, rangeStart);

    S3Object object = Invoker.once(text, "s3a://" + key + "/" + bucket,
Do the key and bucket need to be switched here? Also, use requestBucket instead of bucket?
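The reviewer's point, made concrete: an s3a URI is `s3a://<bucket>/<key>`, so the concatenation above has the arguments reversed. A minimal helper (hypothetical, for illustration) makes the expected order explicit:

```java
// Illustrative helper only; not part of the patch. Shows the expected
// argument order for an s3a:// URI: bucket first, then key.
public final class S3aUri {

  private S3aUri() {
  }

  /** Build the s3a:// URI for an object. */
  public static String of(String bucket, String key) {
    return "s3a://" + bucket + "/" + key;
  }
}
```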
     */
    public abstract class AbortableInputStream extends InputStream {

      public abstract void abort();
add method javadoc
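A sketch of the requested method javadoc. The wording below is a suggestion based on the usual close()-vs-abort() trade-off for HTTP-backed streams, not the patch author's text.

```java
import java.io.InputStream;

// Javadoc sketch for the abstract abort() method; wording is illustrative.
public abstract class AbortableInputStream extends InputStream {

  /**
   * Abort the stream: break the underlying connection rather than
   * draining the remaining bytes as {@link #close()} would.
   * Cheaper than close() when most of the data is still unread, but
   * the connection cannot be returned to the pool for reuse.
   */
  public abstract void abort();
}
```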
    private final S3ObjectInputStream s3ObjectInputStream;

    public AbortableS3ObjectInputStream(S3ObjectInputStream s3ObjectInputStream) {
add constructor javadoc
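A constructor-javadoc sketch. To keep the example self-contained it wraps a plain java.io.InputStream; in the patch the wrapped type is the AWS SDK's S3ObjectInputStream. The null-check is my addition, not from the diff.

```java
import java.io.IOException;
import java.io.InputStream;

// Sketch only: stand-in for the patch's wrapper class, using
// java.io.InputStream in place of the SDK's S3ObjectInputStream.
public class AbortableS3ObjectInputStream extends InputStream {

  private final InputStream wrapped;

  /**
   * Wrap an object stream so callers can abort the HTTP connection
   * instead of draining it on close.
   * @param wrapped the underlying object stream; must not be null
   */
  public AbortableS3ObjectInputStream(InputStream wrapped) {
    if (wrapped == null) {
      throw new IllegalArgumentException("null stream");
    }
    this.wrapped = wrapped;
  }

  @Override
  public int read() throws IOException {
    return wrapped.read();
  }
}
```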
    public static final String MULTIPART_DOWNLOAD_BUFFER_SIZE =
        "fs.s3a.multipartdownload.buffer-size";
    public static final long DEFAULT_MULTIPART_DOWNLOAD_BUFFER_SIZE = 20_000_000;
How did you determine the optimal value for this? Javadoc for it should specify that the size is "(in bytes)".
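A javadoc sketch covering the "(in bytes)" request. The 20 MB default comes from the patch; the description of what the buffer bounds is my reading of a readahead-style limit, not the author's text.

```java
// Javadoc sketch for the buffer-size key; explanatory wording is
// illustrative, only the key name and default come from the patch.
public final class BufferSizeDoc {

  /**
   * Maximum amount of downloaded-but-unread data, in bytes, that the
   * multipart downloader buffers ahead of the reader before pausing
   * new ranged GETs: {@value}.
   */
  public static final String MULTIPART_DOWNLOAD_BUFFER_SIZE =
      "fs.s3a.multipartdownload.buffer-size";

  /** Default buffer size in bytes: 20 MB. */
  public static final long DEFAULT_MULTIPART_DOWNLOAD_BUFFER_SIZE = 20_000_000;

  private BufferSizeDoc() {
  }
}
```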
@justinuang are you still working on this?
Sorry, I am not anymore. We can close it.
No description provided.