-
Notifications
You must be signed in to change notification settings - Fork 8.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HADOOP-18915. Tune/extend S3A http connection and thread pool settings #6180
HADOOP-18915. Tune/extend S3A http connection and thread pool settings #6180
Conversation
* Get duration. This may be negative; callers must check. | ||
* If the config option is greater than {@code Integer.MAX_VALUE} milliseconds, | ||
* it is set to that max. | ||
* Logs the value for diagnostics. | ||
* @param conf config | ||
* @param name option name | ||
* @param defVal default value | ||
* @param defaultUnit unit of default value | ||
* @return duration. may be negative. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Get duration
-> Get duration object representing value in milliseconds
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's a java Duration class, so no need to specify unit
* @param defaultUnit unit of default value | ||
* @return duration. may be negative. | ||
*/ | ||
private static Duration getDuration(final Configuration conf, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: getDurationInMillis
or getDurationWithMillis
might be better?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nope
3009d67
to
c54eead
Compare
Default time unit for all values is milliseconds Change-Id: Ia0ad86c123b4bd6abcf0b849a7b7a05ece47838c
* Moves to Durations everywhere * Also sets acquisition timeouts * Has a minimum duration to help address confusion between seconds and millis * Coalesces resolution across both clients Change-Id: I61dd41ff9e23a76818c900ddadbfe8ab6cb9cb0c
* Cuts requirement for org.apache.http to be on classpath just to build a URL. * new AWSApiCallTimeoutException to be raised on sdk-related methods, and also gateway timeouts (sdk things this is retryable, see) * new ConfigurationHelper class to help get/set times in configs, and anything else in future (e.g. successor to that bit of S3AUtils) * There's now a minimum duration for operations; this can be explicitly changed in code to allow for tests to fail fast. * ITestConnectionTimeouts to fail fast * Bumped up pool size to 200 * And fixed where the create performance option was inadvertently true. Change-Id: I9fee99c17d241dc4cfc606af5f3a71339e0f14c9
c54eead
to
2bd33e5
Compare
* Add unit test for the default evaluation * Try to come up with good defaults * Update performance doc with settings and a discussion. Change-Id: Ia044af001dca82164ff8b03e512379f54b0b8220
* Use duration strings consistently in s3a settings * Update core-default.xml * performance docs are where all thread and network settings are defined Change-Id: I5cb12219bc9771ad43865755a27a3b81d3fec46e
And switch to these, marking the (now derived) int/long millis/seconds as deprecated. Change-Id: Iaecf01e9d98432ab2ca5210a1b971de929796033
62267a1
to
4bcf27f
Compare
tested: s3 london with took a while to work out why my connection failure test worked standalone and failed in bulk runs, as even after disabling fs instance reuse it still occurred. It was actually prefetching, as this does return connections to the pool! @ahmarsuhail can you look at this? The core change is straightforward:
|
💔 -1 overall
This message was automatically generated. |
Change-Id: I69d74afc14ab021b3490e7b316625d2ce6101cd7
Add test to verify that SDK timeouts are mapped immediately to AWSApiCallTimeoutException (without even having to look internally for a connection timeout), and that these are retried. Tune timeout test so creation of test file doesn't use the brittle FS, as that timed out for me on one run in the IDE. Updated its javadocs. Change-Id: Id56ea4c6a88f1c702739b5b3ff5021c42398dca4
Using Duration throughout the codebase; retaining the older numeric constants in Constants.java but deprecated and not used in s3a main or test code. Change-Id: Ic989e057a425991501dfe20935e4fa1f97bf53d0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good, some minor suggestions.
|
||
/** | ||
* Default value for {@link #MAXIMUM_CONNECTIONS}: {@value}. | ||
* Note this value gets increased over time as more connections are used |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should make this a bit clearer. Currently to me it reads like "connection pool size will increase dynamically", but I guess what we mean is this size is increasing as we've added prefetching and vectoredIO and do more parallel ops
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"future releases are likely to increase this default value"
@@ -183,6 +185,12 @@ public static IOException translateException(@Nullable String operation, | |||
// call considered an sign of connectivity failure | |||
return (EOFException)new EOFException(message).initCause(exception); | |||
} | |||
if (exception instanceof ApiCallTimeoutException | |||
|| exception instanceof ApiCallAttemptTimeoutException) { | |||
// An API call to an AWS service timed out. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
APICallTimeout is timeout of the API call, including retries,
APICallAttemptTimeout is the timeout of the individual HTTP request.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
adding comments around this and that APICallTimeout
is the one we should expect
LOG.debug("Request timeout is too high({} ms). Setting to {} ms instead", | ||
requestTimeoutMillis, Integer.MAX_VALUE); | ||
requestTimeoutMillis = Integer.MAX_VALUE; | ||
if (callTimeout.toMillis() > 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My understanding is that apiCallTimeout is the total time allowed for the API call, including retries. and apiCallAttemptTimeout is the timeout for an individual HTTP request. I think apiCallTimeout should be:
number of SDK retries * callTimeout. will confirm with SDK team
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we've cut back on the api timeouts as it was retrying on unrecoverable problems like UnknownHostException; better to handle ourselves. The only pain point here is the xfer manager, which isn't doing that handling
* @param conf configuration to evaluate | ||
* @return connection settings. | ||
*/ | ||
static ConnectionSettings createApiConnectionSettings(Configuration conf) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's only one connection setting config on the client, and the rest are for the HTTP builders. Can we simplify this by keeping apiCallTimeout out of ConnectionSettings?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've kept in in one place in case they ever need adding. I can add a new ClientConnectionSettings struct there (which would be so much easier in java17)
* All the connection settings, wrapped as a class for use by | ||
* both the sync and async clients, and connection client builder. | ||
*/ | ||
static class ConnectionSettings { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as this constructor is getting quite large, maybe we can use a builder here instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no, its a static struct and a builder is simply a java-lang workaround for the lack of named parameters like scala and python or default arguments like C++. this is just a 7-tuple
| Property | Default | Meaning | | ||
|--------------------------------|---------|------------------------------------------| | ||
| `fs.s3a.threads.max` | `96` | Threads in the thread pool | | ||
| `fs.s3a.threads.keepalivetime` | `60s` | Threads in the thread pool | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo in the meaning for keepalivetime
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually i should cut there
hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/performance.md
Outdated
Show resolved
Hide resolved
| Property | Default | V2 | Meaning | | ||
|-----------------------------------------|---------|:----|-------------------------------------------------------| | ||
| `fs.s3a.threads.max` | `96` | | Threads in the thread pool | | ||
| `fs.s3a.threads.keepalivetime` | `60s` | | Threads in the thread pool | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also typo here
| `fs.s3a.threads.max` | `96` | | Threads in the thread pool | | ||
| `fs.s3a.threads.keepalivetime` | `60s` | | Threads in the thread pool | | ||
| `fs.s3a.executor.capacity` | `16` | | Maximum threads for any single operation | | ||
| `fs.s3a.max.total.tasks` | `16` | | Maximum threads for any single operation | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo in description for total tasks
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AUtils.java
Show resolved
Hide resolved
@@ -87,7 +87,7 @@ public void testEnforceMinDuration() { | |||
.describedAs("10s") | |||
.isEqualTo(s10); | |||
|
|||
// and a null check | |||
// and a null checkNo it's kind of like get out the wayI am |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh, dictation was on...
🎊 +1 overall
This message was automatically generated. |
Added a new 1-tuple for client connection settings. Change-Id: I91008d284f520f63322215155ff3942cc4d5aa7d
...tried to address the review comments. Tested: google cloud london Failure which I'm going to assume is unrelated and leave for later regression testing.
|
+delete cost tests are wrong, presumably because fs binding is single object delete only. would be interesting to think about having the delete cost test skip bulk delete profile if target fs is set up for single object delete only.
|
💔 -1 overall
This message was automatically generated. |
Change-Id: I7e8d42d569676d20ba8b807d4fc2930bff2345b9
💔 -1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
This PR is good to go; checkstyle is about #of args in the value class constructor as we can't switch to java Records without making the leap to java17. |
@virajjasani @ahmarsuhail can I get a final review of this. We do need it for the v2 sdk as it now measures pool acquisition time (good) but times out aggressively if it takes too long (less good). Makes debugging harder too: for my local test setup I'll probably have some longer values, albeit with zero retry attempts. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, LGTM.
Minor suggestion to improve javadocs - Zero default value of timeout means no timeouts (I think)
@@ -345,14 +344,14 @@ private Constants() { | |||
"fs.s3a.connection.request.timeout"; | |||
|
|||
/** | |||
* Default duration of a request before it is timed out: {@value}. | |||
* Default duration of a request before it is timed out: Zero. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: can we mention that zero means no timeouts
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
its mentioned in the comment on the property name
apache#6180) Increases existing pool sizes, as with server scale and vector IO, larger pools are needed fs.s3a.connection.maximum 200 fs.s3a.threads.max 96 Adds new configuration options for v2 sdk internal timeouts, both with default of 60s: fs.s3a.connection.acquisition.timeout fs.s3a.connection.idle.time All the pool/timoeut options are covered in performance.md Moves all timeout/duration options in the s3a FS to taking temporal units (h, m, s, ms,...); retaining the previous default unit (normally millisecond) Adds a minimum duration for most of these, in order to recover from deployments where a timeout has been set on the assumption the unit was seconds, not millis. Uses java.time.Duration throughout the codebase; retaining the older numeric constants in org.apache.hadoop.fs.s3a.Constants for backwards compatibility; these are now deprecated. Adds new class AWSApiCallTimeoutException to be raised on sdk-related methods and also gateway timeouts. This is a subclass of org.apache.hadoop.net.ConnectTimeoutException to support existing retry logic. + reverted default value of fs.s3a.create.performance to false; inadvertently set to true during testing. Contributed by Steve Loughran.
apache#6180) Increases existing pool sizes, as with server scale and vector IO, larger pools are needed fs.s3a.connection.maximum 200 fs.s3a.threads.max 96 Adds new configuration options for v2 sdk internal timeouts, both with default of 60s: fs.s3a.connection.acquisition.timeout fs.s3a.connection.idle.time All the pool/timoeut options are covered in performance.md Moves all timeout/duration options in the s3a FS to taking temporal units (h, m, s, ms,...); retaining the previous default unit (normally millisecond) Adds a minimum duration for most of these, in order to recover from deployments where a timeout has been set on the assumption the unit was seconds, not millis. Uses java.time.Duration throughout the codebase; retaining the older numeric constants in org.apache.hadoop.fs.s3a.Constants for backwards compatibility; these are now deprecated. Adds new class AWSApiCallTimeoutException to be raised on sdk-related methods and also gateway timeouts. This is a subclass of org.apache.hadoop.net.ConnectTimeoutException to support existing retry logic. + reverted default value of fs.s3a.create.performance to false; inadvertently set to true during testing. Contributed by Steve Loughran.
apache#6180) Increases existing pool sizes, as with server scale and vector IO, larger pools are needed fs.s3a.connection.maximum 200 fs.s3a.threads.max 96 Adds new configuration options for v2 sdk internal timeouts, both with default of 60s: fs.s3a.connection.acquisition.timeout fs.s3a.connection.idle.time All the pool/timoeut options are covered in performance.md Moves all timeout/duration options in the s3a FS to taking temporal units (h, m, s, ms,...); retaining the previous default unit (normally millisecond) Adds a minimum duration for most of these, in order to recover from deployments where a timeout has been set on the assumption the unit was seconds, not millis. Uses java.time.Duration throughout the codebase; retaining the older numeric constants in org.apache.hadoop.fs.s3a.Constants for backwards compatibility; these are now deprecated. Adds new class AWSApiCallTimeoutException to be raised on sdk-related methods and also gateway timeouts. This is a subclass of org.apache.hadoop.net.ConnectTimeoutException to support existing retry logic. + reverted default value of fs.s3a.create.performance to false; inadvertently set to true during testing. Contributed by Steve Loughran.
apache#6180) Increases existing pool sizes, as with server scale and vector IO, larger pools are needed fs.s3a.connection.maximum 200 fs.s3a.threads.max 96 Adds new configuration options for v2 sdk internal timeouts, both with default of 60s: fs.s3a.connection.acquisition.timeout fs.s3a.connection.idle.time All the pool/timoeut options are covered in performance.md Moves all timeout/duration options in the s3a FS to taking temporal units (h, m, s, ms,...); retaining the previous default unit (normally millisecond) Adds a minimum duration for most of these, in order to recover from deployments where a timeout has been set on the assumption the unit was seconds, not millis. Uses java.time.Duration throughout the codebase; retaining the older numeric constants in org.apache.hadoop.fs.s3a.Constants for backwards compatibility; these are now deprecated. Adds new class AWSApiCallTimeoutException to be raised on sdk-related methods and also gateway timeouts. This is a subclass of org.apache.hadoop.net.ConnectTimeoutException to support existing retry logic. + reverted default value of fs.s3a.create.performance to false; inadvertently set to true during testing. Contributed by Steve Loughran.
How was this patch tested?
For code changes:
LICENSE
,LICENSE-binary
,NOTICE-binary
files?