Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add timeout feature for synchronous api calls #724

Merged
merged 1 commit into from
Oct 15, 2018

Conversation

zoewangg
Copy link
Contributor

@zoewangg zoewangg commented Sep 24, 2018

Description

Add timeout feature for synchronous api calls to allow customers to configure how long the api call or api call attempt should timeout if it didn't complete within the configured time.

It works the same way as v1. Updated: the apiCallTimeout works the same way as v1, ApiCallAttemptTimeout, on the other hand, behaves differently compared with v1. It will not only abort the request but also interrupt the thread just like apiCallTimeout.

  • ApiCallTimeout: a timeout task is scheduled to a separate thread pool and it will interrupt and the current thread and abort the underlying http request when the total request time exceeds the configured timeout.
  • ApiCallAttemptTimeout: a timeout task is scheduled to a separate thread pool and it will interrupt the thread and abort the http request when times up.

NOTE:
ApiCallAttemptTimeout will not abort the request if the request is stuck in the ResponseTransformer because the timer task is being cancelled in MakeHttpRequestStage and ResponseTransformer is triggered after that. This is different from asynchronous timeout where the timer task is being cancelled upon the completion of the completable future.

Testing

Integration tests for timeouts are passed.
All integration tests are passed.

Screenshots (if appropriate)

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)

Checklist

  • I have read the CONTRIBUTING document
  • Local run of mvn install succeeds
  • My code follows the code style of this project
  • My change requires a change to the Javadoc documentation
  • I have updated the Javadoc documentation accordingly
  • I have read the README document
  • I have added tests to cover my changes
  • All new and existing tests passed
  • A short description of the change has been added to the CHANGELOG

License

  • I confirm that this pull request can be released under the Apache 2 license

@@ -299,9 +300,14 @@ default Builder retryPolicy(Consumer<RetryPolicy.Builder> retryPolicy) {
* requests that don't get aborted until several seconds after the timer has been breached. Because of this, the client
* execution timeout feature should not be used when absolute precision is needed.
*
* <p>
* For synchronous streaming operations, customized implemenations of {@link ResponseTransformer} must handle interrupt
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: just "implementations of".

}

SyncTimeoutTask(Thread threadToInterrupt) {
this.threadToInterrupt = Validate.paramNotNull(threadToInterrupt, "threadToInterrupt");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This check seems strange to me seeing as the default ctor sets it to null

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, when using the default ctor, the thread is not required. It is intended to avoid null threadToInterrupt only when using this ctor. I've seen similar pattern in AwsBasicCredentials.

protected AwsBasicCredentials(String accessKeyId, String secretAccessKey) {
this(accessKeyId, secretAccessKey, true);
}
private AwsBasicCredentials(String accessKeyId, String secretAccessKey, boolean validateCredentials) {
this.accessKeyId = trimToNull(accessKeyId);
this.secretAccessKey = trimToNull(secretAccessKey);
if (validateCredentials) {
Validate.notNull(this.accessKeyId, "Access key ID cannot be blank.");
Validate.notNull(this.secretAccessKey, "Secret access key cannot be blank.");
}

I can probably create another private ctor adding boolean validateThread param to make it more clear.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess that sort of makes sense, but I don't think the AwsBasicCredentials case is the same since the validation behavior is controlled by another parameter. I just thought it was weird to bother checking since the class works whether or not the thread is null since there's always a null check before using it

timeoutExceptionAssertion().accept(() -> retryableCallable().call());
}

public static class SlowBytesResponseTransformer<ResponseT> implements ResponseTransformer<ResponseT, ResponseBytes<ResponseT>> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason to test with these different specific transformer types as a opposed to just some generic "slow" transformer? The tests that use them look identical otherwise

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, those tests are created to make sure we have InterruptMonitor.checkInterrupted(); in each of the response transformer types to cover the scenarios where the request time exceeds the configured time right after entering our defined response transformer.

https://github.com/aws/aws-sdk-java-v2/pull/724/files#diff-66508e304899c9905bc9d659b856b50dR119


return requestCallable.call();
try {
return requestCallable.call();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In V1, for non-streaming, if request execution timeout is enabled, we count time to read the full HTTP response towards the timeout (by buffering the entity first). Do we this in v2? If not should we?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't currently do this in V2, but I guess we can add a isBufferResponse in ExecuteRequest and pass it to the http clients. This seems a bit out of scope of this PR and I'd say creating another PR for this feature.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One of the goals of the new timers is to make the per attempt and per call timeouts behave more similiarly. So the per attempt timeout would not just be an abort on the HTTP request but a timer that interrupts as well and is scoped to the entirety of the things we do per request (i.e. unmarshalling for example). This would elimnate the need for buffering as we consume the data in unmarshalling which is subject to the timeout. Is this how the timeouts work (haven't looked at the PR)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, the current timeout implementation is the same as v1, so attempt timeout is just aborting the request. I like the idea of scoping it to the entire thing and will update the PR to see how it looks. One concern I have though is now that both call timemout and attempt timeout can interrupt the thread, handling interrupt exception can get tricky and we might get more issues with uncleared interrupt status.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah that's always the tricky part of this feature. We also need to make sure the right exception is thrown. I.E. if api call attempt and api call timeouts trigger at the same time then api call timeout should be thrown as it's non-retryable.

timeoutTracker.abortable(requestCallable);

context.apiCallAttemptTimeoutTracker(timeoutTracker);
context.apiCallTimeoutTracker().abortable(requestCallable);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this redundant with line 71?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is intended. line 71 is setting the abortable for ApiCallAttemptTimeoutTracker whereas line 74 is setting the abortable for ApiCallTimeoutTracker

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah okay I see it

}
}

public static void wastingTimeInterrupitably() throws InterruptedException {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: "interruptibly" mispelled

* Schedule a {@link TimeoutTask} that aborts the task if not otherwise completed before the given timeout.
*
* @param timeoutExecutor the executor to execute the {@link TimeoutTask}
* @return a {@link TimeoutTracker}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add docs for the other params as well. It's hard to tell what isInterruptedThread is without looking at the source. I'd also suggest changing it to interruptCurrentThread

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My bad. will fix.

* InterruptedException} is thrown from a interruptible task, you should either re-interrupt the current thread or throw that
* {@link InterruptedException} from the {@link #apply(Object, AbortableInputStream)} method. Failure to do these things may
* prevent the SDK from stopping the request in a timely manner in the event the thread is interrupted externally.
* InterruptedException} is thrown from a interruptible task, you should throw that {@link InterruptedException} from the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did we remove the ability for customers to just interrupt the thread without (re)throwing an InterruptedException?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the customer just sets the interrupt flag, we are relying on InterruptMonitor.checkInterrupted(); to throw InterruptedException and it can only happen after transform method, so the timeout might be far off.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure what you mean? Can we add a check for Thread.isInterrupted() after we call transform?

To me it's perfectly valid for customers to do this in their transform methods:

MyObject transform() {
    try {
        // something interruptible
    } catch (InterruptedException e) {
        log.error("Transform was interrupted", e);
        Thread.interrupt();
        return null;
    }
}

We have an explicit test for this behavior in the S3 module (though it's currently disabled): https://github.com/aws/aws-sdk-java-v2/blob/2.0.0-preview-12/services/s3/src/it/java/software/amazon/awssdk/services/s3/GetObjectFaultIntegrationTest.java#L108-L129

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, we can add the check after we call transform, but the time when the request actually gets timed out might be far from accurate.

For example, say we set the request timeout to 1s and if the transform method takes more than 5s, the request only will throw TimeoutException after 5s when it goes to InterruptMonitor.checkInterrupted(), or even worse, if the request gets stuck in transform, TimeoutException will never get thrown.

Yeah, I can see it's valid for customers to just re-interrupt the thread, but my point is it's highly possible that they get TimeoutException long after the configured timeout.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But we already state in the doc that customers should throw an InterruptedException if their transform method gets interrupted, so I guess I don't see how that's too different (as far as timing accuracy) if instead of re-throwing the exception, they set the flag and return right away and on the SDK side we have a check for the flag as soon as transform() returns.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, we should be fine for most of the cases. I was thinking of one edge case where the customers just set the interrupted flag without re-throwing any exception and do some time-consuming task after it.

I will update the docs.

*/
private RuntimeException handleInterruptedException(RequestExecutionContext context, InterruptedException e) {
if (e instanceof SdkInterruptedException) {
((SdkInterruptedException) e).getResponseStream().ifPresent(r -> invokeSafely(r::close));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if it would be better to set the current response in RequestExecutionContext and close it on any exception.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are actually closing inputStream in HandleResponseStage.
This is to close the response stream if the thread gets interrupted before that line.

import software.amazon.awssdk.core.exception.SdkServiceException;
import software.amazon.awssdk.core.http.HttpResponseHandler;

public class ResponseHandlerTestUtils {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit final

@@ -81,10 +83,14 @@
*/
default ReturnT apply(ResponseT response, AbortableInputStream inputStream) throws Exception {
try {
return transform(response, inputStream);
} catch (RetryableException e) {
InterruptMonitor.checkInterrupted();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason we do this here instead instead of in

return responseTransformer.apply(resp, response.content().get());
? My concern is if this gets overridden then we lose this behavior

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, same as exception handling logic here. I think we should probably move the whole thing to the SyncClientHandler

public void run() {
log.debug(() -> "Timing out, aborting the task");
hasExecuted = true;
if (!threadToInterrupt.isInterrupted()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this check necessary?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope, will remove

Copy link
Contributor

@dagnir dagnir left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@zoewangg zoewangg force-pushed the zoewang-TimeoutForSyncPath branch 2 times, most recently from f2aca09 to a62a2c5 Compare October 15, 2018 18:10
@zoewangg zoewangg merged commit 01247f3 into master Oct 15, 2018
@zoewangg zoewangg deleted the zoewang-TimeoutForSyncPath branch October 15, 2018 19:05
aws-sdk-java-automation added a commit that referenced this pull request Jan 24, 2020
…637064dd

Pull request: release <- staging/e795d56a-1e26-460a-848f-bacb637064dd
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants