Add timeout feature for synchronous api calls #724

zoewangg · 2018-09-24T17:44:06Z

Description

Add a timeout feature for synchronous API calls so that customers can configure how long an API call, or a single API call attempt, may run before it times out.

~~It works the same way as v1.~~ Updated: apiCallTimeout works the same way as in v1. ApiCallAttemptTimeout, on the other hand, behaves differently from v1: it not only aborts the request but also interrupts the thread, just like apiCallTimeout.

ApiCallTimeout: a timeout task is scheduled on a separate thread pool; it interrupts the current thread and aborts the underlying HTTP request when the total request time exceeds the configured timeout.
ApiCallAttemptTimeout: a timeout task is scheduled on a separate thread pool; it interrupts the thread and aborts the HTTP request when the time allowed for a single attempt is up.
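
The mechanism described above can be sketched with the standard library alone. This is a minimal illustration, not the SDK's implementation: `SyncTimeoutSketch`, `Abortable` and `runWithTimeout` are hypothetical names, and the single-threaded scheduler merely stands in for the separate thread pool.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch; these names are not the SDK's internal classes.
final class SyncTimeoutSketch {

    /** Stand-in for the in-flight HTTP request. */
    interface Abortable {
        void abort();
    }

    /**
     * Runs work on the calling thread. If it is still running after timeoutMillis,
     * a timer thread aborts the request and interrupts the caller.
     * Returns true if the timeout task ran.
     */
    static boolean runWithTimeout(Runnable work, Abortable request, long timeoutMillis) {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        Thread caller = Thread.currentThread();
        AtomicBoolean fired = new AtomicBoolean(false);
        ScheduledFuture<?> task = timer.schedule(() -> {
            fired.set(true);
            request.abort();    // abort the underlying request
            caller.interrupt(); // unblock the caller if it is in a blocking call
        }, timeoutMillis, TimeUnit.MILLISECONDS);
        try {
            work.run();
        } finally {
            task.cancel(false); // has no effect if the task already ran
            timer.shutdownNow();
        }
        // A real implementation must also handle the race where the timer fires
        // just as the work completes, leaving a stale interrupt flag behind.
        return fired.get();
    }
}
```

The same helper could back either timer; under this sketch the only difference would be whether it wraps the whole call (including retries) or a single attempt.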

NOTE:
ApiCallAttemptTimeout will not abort the request if the request is stuck in the ResponseTransformer, because the timer task is cancelled in MakeHttpRequestStage and the ResponseTransformer is invoked after that. This differs from the asynchronous timeout, where the timer task is cancelled when the CompletableFuture completes.
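
To make the scoping issue in the note concrete, here is a small simulation, assuming a fast "request" stage followed by a slow "transform" step. `TimerScopeSketch` and `transformInterrupted` are hypothetical names, and the durations are arbitrary.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of timer scope; it does not use any SDK classes.
final class TimerScopeSketch {

    /**
     * Simulates one attempt: a fast "request" stage followed by a slow "transform" step.
     * Returns true if the transform step was interrupted by the attempt timer.
     */
    static boolean transformInterrupted(boolean cancelBeforeTransform) {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        Thread caller = Thread.currentThread();
        ScheduledFuture<?> task =
                timer.schedule(() -> caller.interrupt(), 50, TimeUnit.MILLISECONDS);
        try {
            // "Request" stage: returns immediately in this simulation.
            if (cancelBeforeTransform) {
                task.cancel(false); // timer cancelled before the transformer runs
            }
            Thread.sleep(500);      // slow "transform" step, longer than the timeout
            return false;           // ran to completion: the timeout did not apply
        } catch (InterruptedException e) {
            return true;            // the timer was still armed and interrupted the transform
        } finally {
            task.cancel(false);     // wider scope: timer cancelled only on completion
            timer.shutdownNow();
        }
    }
}
```

With `cancelBeforeTransform` set, the slow step runs unbounded, which mirrors the behaviour the note describes for the synchronous path.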

Testing

Integration tests for timeouts pass.
All integration tests pass.

Screenshots (if appropriate)

Types of changes

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)

Checklist

I have read the CONTRIBUTING document
Local run of mvn install succeeds
My code follows the code style of this project
My change requires a change to the Javadoc documentation
I have updated the Javadoc documentation accordingly
I have read the README document
I have added tests to cover my changes
All new and existing tests passed
A short description of the change has been added to the CHANGELOG

License

I confirm that this pull request can be released under the Apache 2 license

dagnir · 2018-09-24T18:57:28Z

...ore/src/main/java/software/amazon/awssdk/core/client/config/ClientOverrideConfiguration.java

@@ -299,9 +300,14 @@ default Builder retryPolicy(Consumer<RetryPolicy.Builder> retryPolicy) {
         * requests that don't get aborted until several seconds after the timer has been breached. Because of this, the client
         * execution timeout feature should not be used when absolute precision is needed.
         *
+         * <p>
+         * For synchronous streaming operations, customized implemenations of {@link ResponseTransformer} must handle interrupt


nit: just "implementations of".

dagnir · 2018-09-26T17:55:05Z

...sdk-core/src/main/java/software/amazon/awssdk/core/internal/http/timers/SyncTimeoutTask.java

+    }
+
+    SyncTimeoutTask(Thread threadToInterrupt) {
+        this.threadToInterrupt = Validate.paramNotNull(threadToInterrupt, "threadToInterrupt");


This check seems strange to me seeing as the default ctor sets it to null

Right, when using the default ctor, the thread is not required. The check is intended to reject a null threadToInterrupt only when this ctor is used. I've seen a similar pattern in AwsBasicCredentials.

aws-sdk-java-v2/core/auth/src/main/java/software/amazon/awssdk/auth/credentials/AwsBasicCredentials.java

Lines 57 to 68 in 92f35e0

protected AwsBasicCredentials(String accessKeyId, String secretAccessKey) {
    this(accessKeyId, secretAccessKey, true);
}

private AwsBasicCredentials(String accessKeyId, String secretAccessKey, boolean validateCredentials) {
    this.accessKeyId = trimToNull(accessKeyId);
    this.secretAccessKey = trimToNull(secretAccessKey);

    if (validateCredentials) {
        Validate.notNull(this.accessKeyId, "Access key ID cannot be blank.");
        Validate.notNull(this.secretAccessKey, "Secret access key cannot be blank.");
    }

I can probably create another private ctor with a boolean validateThread param to make it clearer.

I guess that sort of makes sense, but I don't think the AwsBasicCredentials case is the same, since the validation behavior there is controlled by another parameter. I just thought it was weird to bother checking, since the class works whether or not the thread is null: there's always a null check before using it.

dagnir · 2018-09-26T18:03:38Z

...tocol-tests/src/test/java/software/amazon/awssdk/protocol/tests/timeout/BaseTimeoutTest.java

+        timeoutExceptionAssertion().accept(() -> retryableCallable().call());
+    }
+
+    public static class SlowBytesResponseTransformer<ResponseT> implements ResponseTransformer<ResponseT, ResponseBytes<ResponseT>> {


Any reason to test with these different specific transformer types as opposed to just some generic "slow" transformer? The tests that use them look identical otherwise.

Yeah, those tests were created to make sure we have InterruptMonitor.checkInterrupted(); in each of the response transformer types, covering the scenario where the request time exceeds the configured timeout right after entering one of our own response transformers.

https://github.com/aws/aws-sdk-java-v2/pull/724/files#diff-66508e304899c9905bc9d659b856b50dR119
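
As a rough illustration of the kind of check being discussed, the sketch below tests the interrupt flag on entry to a transformer. `InterruptCheckSketch` and its methods are hypothetical stand-ins; the SDK's actual `InterruptMonitor` may differ in both exception type and details.

```java
// Hypothetical sketch; not the SDK's InterruptMonitor.
final class InterruptCheckSketch {

    /** Throws if the current thread's interrupt flag is set, clearing the flag. */
    static void checkInterrupted() throws InterruptedException {
        if (Thread.interrupted()) {
            throw new InterruptedException("Interrupted before transform started");
        }
    }

    /** A stand-in transformer that checks for interruption before doing any work. */
    static String transform(String body) throws InterruptedException {
        checkInterrupted();
        return body.toUpperCase();
    }

    /** Convenience wrapper that reports whether transform() was cut short. */
    static boolean transformWasInterrupted(String body) {
        try {
            transform(body);
            return false;
        } catch (InterruptedException e) {
            return true;
        }
    }
}
```

Because the check runs first, a timeout that fires just before the transformer is entered is surfaced immediately instead of after the transformer has done its work.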

dagnir · 2018-09-26T19:53:31Z

...ain/java/software/amazon/awssdk/core/internal/http/pipeline/stages/MakeHttpRequestStage.java


-        return requestCallable.call();
+        try {
+            return requestCallable.call();


In V1, for non-streaming operations, if the request execution timeout is enabled, we count the time to read the full HTTP response towards the timeout (by buffering the entity first). Do we do this in v2? If not, should we?

We don't currently do this in V2, but I guess we can add an isBufferResponse to ExecuteRequest and pass it to the HTTP clients. This seems a bit out of scope for this PR, and I'd suggest creating another PR for this feature.

One of the goals of the new timers is to make the per-attempt and per-call timeouts behave more similarly. So the per-attempt timeout would not just be an abort on the HTTP request but a timer that interrupts as well and is scoped to everything we do per request (unmarshalling, for example). This would eliminate the need for buffering, as we consume the data in unmarshalling, which is subject to the timeout. Is this how the timeouts work (haven't looked at the PR)?

No, the current timeout implementation is the same as v1, so the attempt timeout just aborts the request. I like the idea of scoping it to the entire thing and will update the PR to see how it looks. One concern I have, though, is that now that both the call timeout and the attempt timeout can interrupt the thread, handling InterruptedException can get tricky and we might get more issues with uncleared interrupt status.

Yeah, that's always the tricky part of this feature. We also need to make sure the right exception is thrown, i.e. if the API call attempt and API call timeouts trigger at the same time, then the API call timeout should be thrown as it's non-retryable.
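
The precedence rule in the last comment could look roughly like the following. Everything here is a hypothetical sketch: the exception classes are local stand-ins rather than the SDK's types, and treating the attempt timeout as retryable is an assumption inferred from the comment, not something the thread states outright.

```java
// Hypothetical sketch; the exception classes below are local stand-ins.
final class TimeoutPrecedenceSketch {

    static class SketchTimeoutException extends RuntimeException {
        private final boolean retryable;

        SketchTimeoutException(String message, boolean retryable) {
            super(message);
            this.retryable = retryable;
        }

        boolean isRetryable() {
            return retryable;
        }
    }

    static final class CallTimeout extends SketchTimeoutException {
        CallTimeout() {
            super("API call timed out", false);
        }
    }

    static final class AttemptTimeout extends SketchTimeoutException {
        AttemptTimeout() {
            super("API call attempt timed out", true); // assumed retryable
        }
    }

    /** Maps an interrupt to an exception, checking the call-level timer first. */
    static RuntimeException translateInterrupt(boolean callTimerFired, boolean attemptTimerFired) {
        if (callTimerFired) {
            return new CallTimeout();    // wins even if the attempt timer also fired
        }
        if (attemptTimerFired) {
            return new AttemptTimeout(); // a later attempt may still succeed
        }
        return new IllegalStateException("Interrupted by something other than a timer");
    }
}
```

Checking the call-level timer first means a retry is never started after the overall budget has already been spent.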

dagnir · 2018-09-26T20:11:31Z

...ain/java/software/amazon/awssdk/core/internal/http/pipeline/stages/MakeHttpRequestStage.java

+        timeoutTracker.abortable(requestCallable);
+
+        context.apiCallAttemptTimeoutTracker(timeoutTracker);
+        context.apiCallTimeoutTracker().abortable(requestCallable);


Is this redundant with line 71?

This is intended. Line 71 sets the abortable for ApiCallAttemptTimeoutTracker, whereas line 74 sets the abortable for ApiCallTimeoutTracker.

Ah okay I see it

dagnir · 2018-09-26T20:24:36Z

...tocol-tests/src/test/java/software/amazon/awssdk/protocol/tests/timeout/BaseTimeoutTest.java

+        }
+    }
+
+    public static void wastingTimeInterrupitably() throws InterruptedException {


nit: "interruptibly" misspelled

dagnir · 2018-09-26T20:28:51Z

core/sdk-core/src/main/java/software/amazon/awssdk/core/internal/http/timers/TimerUtils.java

+     * Schedule a {@link TimeoutTask} that aborts the task if not otherwise completed before the given timeout.
+     *
+     * @param timeoutExecutor the executor to execute the {@link TimeoutTask}
+     * @return a {@link TimeoutTracker}


Please add docs for the other params as well. It's hard to tell what isInterruptedThread is without looking at the source. I'd also suggest changing it to interruptCurrentThread

My bad. Will fix.

dagnir · 2018-09-26T20:32:13Z

core/sdk-core/src/main/java/software/amazon/awssdk/core/sync/ResponseTransformer.java

- * InterruptedException} is thrown from a interruptible task, you should either re-interrupt the current thread or throw that
- * {@link InterruptedException} from the {@link #apply(Object, AbortableInputStream)} method. Failure to do these things may
- * prevent the SDK from stopping the request in a timely manner in the event the thread is interrupted externally.
+ * InterruptedException} is thrown from a interruptible task, you should throw that {@link InterruptedException} from the


Did we remove the ability for customers to just interrupt the thread without (re)throwing an InterruptedException?

If the customer just sets the interrupt flag, we rely on InterruptMonitor.checkInterrupted(); to throw InterruptedException, and that can only happen after the transform method returns, so the timeout might be far off.

Not sure what you mean? Can we add a check for Thread.isInterrupted() after we call transform?

To me it's perfectly valid for customers to do this in their transform methods:

MyObject transform() {
    try {
        // something interruptible
    } catch (InterruptedException e) {
        log.error("Transform was interrupted", e);
        Thread.currentThread().interrupt();
        return null;
    }
}

We have an explicit test for this behavior in the S3 module (though it's currently disabled): https://github.com/aws/aws-sdk-java-v2/blob/2.0.0-preview-12/services/s3/src/it/java/software/amazon/awssdk/services/s3/GetObjectFaultIntegrationTest.java#L108-L129

Right, we can add the check after we call transform, but the time when the request actually gets timed out might be far from accurate.

For example, say we set the request timeout to 1s and the transform method takes more than 5s: the request will only throw TimeoutException after 5s, when it reaches InterruptMonitor.checkInterrupted(). Even worse, if the request gets stuck in transform, TimeoutException will never be thrown.

Yeah, I can see it's valid for customers to just re-interrupt the thread, but my point is it's highly possible that they get TimeoutException long after the configured timeout.

But we already state in the doc that customers should throw an InterruptedException if their transform method gets interrupted, so I guess I don't see how that's too different (as far as timing accuracy) if instead of re-throwing the exception, they set the flag and return right away and on the SDK side we have a check for the flag as soon as transform() returns.

Yeah, we should be fine for most of the cases. I was thinking of one edge case where the customers just set the interrupted flag without re-throwing any exception and do some time-consuming task after it.

I will update the docs.
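
The compromise reached in this thread, checking the interrupt flag as soon as the transform returns, might be sketched as below. `PostTransformCheckSketch` and its methods are hypothetical, and a `Supplier` stands in for the customer's transform method.

```java
import java.util.function.Supplier;

// Hypothetical sketch of a caller-side check after the transform returns.
final class PostTransformCheckSketch {

    /** Runs the transform, then converts a re-set interrupt flag into an exception. */
    static <T> T applyThenCheck(Supplier<T> transform) throws InterruptedException {
        T result = transform.get();
        if (Thread.interrupted()) { // also clears the flag
            throw new InterruptedException("Thread was interrupted during transform");
        }
        return result;
    }

    /** Convenience wrapper that reports whether an interrupt was detected. */
    static boolean detectsInterrupt(Supplier<?> transform) {
        try {
            applyThenCheck(transform);
            return false;
        } catch (InterruptedException e) {
            return true;
        }
    }
}
```

A transform that swallows the exception, re-interrupts the thread and returns promptly is detected right away; one that keeps working after setting the flag is only detected once it finally returns, which is the edge case raised above.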

shorea · 2018-10-11T18:34:03Z

...a/software/amazon/awssdk/core/internal/http/pipeline/stages/ApiCallTimeoutTrackingStage.java

+     */
+    private RuntimeException handleInterruptedException(RequestExecutionContext context, InterruptedException e) {
+        if (e instanceof SdkInterruptedException) {
+            ((SdkInterruptedException) e).getResponseStream().ifPresent(r -> invokeSafely(r::close));


I wonder if it would be better to set the current response in RequestExecutionContext and close it on any exception.

We are actually closing inputStream in HandleResponseStage.
This is to close the response stream if the thread gets interrupted before that line.

aws-sdk-java-v2/core/sdk-core/src/main/java/software/amazon/awssdk/core/internal/http/pipeline/stages/HandleResponseStage.java

Line 63 in 28a5f48

closeInputStreamIfNeeded(httpResponse, didRequestFail);

dagnir · 2018-10-11T20:00:54Z

...k-core/src/test/java/software/amazon/awssdk/core/internal/util/ResponseHandlerTestUtils.java

+import software.amazon.awssdk.core.exception.SdkServiceException;
+import software.amazon.awssdk.core.http.HttpResponseHandler;
+
+public class ResponseHandlerTestUtils {


nit final

dagnir · 2018-10-11T23:12:27Z

core/sdk-core/src/main/java/software/amazon/awssdk/core/sync/ResponseTransformer.java

@@ -81,10 +83,14 @@
     */
    default ReturnT apply(ResponseT response, AbortableInputStream inputStream) throws Exception {
        try {
-            return transform(response, inputStream);
-        } catch (RetryableException e) {
+            InterruptMonitor.checkInterrupted();


Any reason we do this here instead of in

aws-sdk-java-v2/core/sdk-core/src/main/java/software/amazon/awssdk/core/client/handler/BaseSyncClientHandler.java

Line 124 in 716acc5

return responseTransformer.apply(resp, response.content().get());

? My concern is if this gets overridden then we lose this behavior

Yeah, same as exception handling logic here. I think we should probably move the whole thing to the SyncClientHandler
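
The concern about overriding can be shown with a toy interface. This is a sketch under assumptions: `Transformer` is a stand-in for the real interface, and `handle` plays the role of the caller-side handler.

```java
// Hypothetical sketch; Transformer is a stand-in, not the SDK's ResponseTransformer.
final class DefaultMethodSketch {

    interface Transformer {
        String transform(String body);

        /** Default entry point that performs the interrupt check before delegating. */
        default String apply(String body) throws InterruptedException {
            checkInterrupted();
            return transform(body);
        }
    }

    static void checkInterrupted() throws InterruptedException {
        if (Thread.interrupted()) {
            throw new InterruptedException("Thread was interrupted");
        }
    }

    /** An implementation that overrides apply() and so silently drops the check. */
    static final class OverridingTransformer implements Transformer {
        @Override
        public String transform(String body) {
            return body;
        }

        @Override
        public String apply(String body) {
            return transform(body);
        }
    }

    /** Caller-side check: runs regardless of how apply() is implemented. */
    static String handle(Transformer transformer, String body) throws InterruptedException {
        checkInterrupted();
        return transformer.apply(body);
    }
}
```

Placing the check in the caller keeps the behaviour intact even when an implementation replaces the default method.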

dagnir · 2018-10-11T23:15:29Z

...sdk-core/src/main/java/software/amazon/awssdk/core/internal/http/timers/SyncTimeoutTask.java

+    public void run() {
+        log.debug(() -> "Timing out, aborting the task");
+        hasExecuted = true;
+        if (!threadToInterrupt.isInterrupted()) {


Is this check necessary?

Nope, will remove

zoewangg requested a review from dagnir September 24, 2018 17:44

dagnir reviewed Sep 26, 2018

shorea reviewed Oct 11, 2018

dagnir reviewed Oct 11, 2018

dagnir approved these changes Oct 15, 2018

zoewangg force-pushed the zoewang-TimeoutForSyncPath branch 2 times, most recently from f2aca09 to a62a2c5 Compare October 15, 2018 18:10

Add apiCallTimeout and apiCallAttemptTimeout for synchronous operations

4ab4f25

zoewangg force-pushed the zoewang-TimeoutForSyncPath branch from a62a2c5 to 4ab4f25 Compare October 15, 2018 18:42

zoewangg merged commit 01247f3 into master Oct 15, 2018

zoewangg deleted the zoewang-TimeoutForSyncPath branch October 15, 2018 19:05

zoewangg mentioned this pull request Nov 27, 2018

Execution and Request timeouts #29

Closed

aws-sdk-java-automation added a commit that referenced this pull request Jan 24, 2020

Merge pull request #724 from aws/staging/e795d56a-1e26-460a-848f-bacb…

2b0dfb8

…637064dd Pull request: release <- staging/e795d56a-1e26-460a-848f-bacb637064dd


