Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add apiCallAttempt timeout and apiCall timeout feature for asynchrounous operations #657

Merged
merged 1 commit into from
Aug 21, 2018

Conversation

zoewangg
Copy link
Contributor

@zoewangg zoewangg commented Aug 10, 2018

Add apiCallAttempt timeout and apiCall timeout feature for asynchronous operations.

  • apiCallTimeout: time out config for the entire api call request (including signing, retry, etc)
  • apiCallAttemptTimeout: time out config for an api call attempt.

I renamed the original timeout classes. Below are the new classes:

  • TimeoutTask interface of the timer task to abort the abortable
    -> AsyncTimeoutTask implementation for asynchronous operation

  • TimeoutTracker interface to track the TimeoutTask and the ScheduledFuture
    -> ApiCallTimeoutTracker default implementations to track Api call time out task AsyncTimeoutTask
    -> NoOpTimeoutTracker when timeout is not enabled

I removed all the existing timer unit tests using the AmazonSyncHttpClient because I'm working on adding more tests using the generated clients and those tests would cover the existing tests.
Second thoughts: will fix them and add them back
Added them back.


TODOs:

  • add more protocol tests with both apiCallTimeout and apiCallAttemptTimeout enabled.
  • add javadocs
  • clean up logs

@zoewangg zoewangg force-pushed the zoewang-timeoutFeature branch 3 times, most recently from 9425025 to a913b3d Compare August 13, 2018 17:59
@zoewangg zoewangg changed the title [DRAFT] Add apiCallAttempt timeout and apiCall timeout feature Add apiCallAttempt timeout and apiCall timeout feature Aug 14, 2018
@dagnir
Copy link
Contributor

dagnir commented Aug 14, 2018

Still going through it but I had a some questions regarding the async timeouts:

What are the situations in which a customer would want to use a timer on an async call? Since we return a CompletableFuture it seems like it's easy for them to do their own timing out if necessary:

e.g.

CompletableFuture<GetObjectResponse> f = s3Client.getObject(request, toFileTransformer);

// do other stuff

f.get(10, TimeUnit.MILLISECONDS);

I think we should also clearly define what it means to timeout an async task since by definition it doesn't happen synchronously with the current thread. For example, does the timer start when we submit the request to async client? Or when the async client actually starts executing the request?

@shorea
Copy link
Contributor

shorea commented Aug 14, 2018

I think timeouts for async is more about releasing resources than customer SLAs. I.E. close the connection and give it back to the pool, don't submit to our future completion executor, etc.

@zoewangg
Copy link
Contributor Author

Customer might still want to timeout if the request is taking too long and could take up resources and there might be other chained operations in the completableFuture that are getting "blocked".

For example:

CompletableFuture.allOf(s3Async.getObject(oneRequest), s3Async.getObject(anotherrequest));

f.get(10, TimeUnit.MILLISECONDS) could work but that's a blocking call.

Side note: Java 9 introduced CompletableFutre.orTimeout https://docs.oracle.com/javase/9/docs/api/java/util/concurrent/CompletableFuture.html#orTimeout-long-java.util.concurrent.TimeUnit- which is pretty cool.

I think we should also clearly define what it means to timeout an async task since by definition it doesn't happen synchronously with the current thread. does the timer start when we submit the request to async client? Or when the async client actually starts executing the request?

Yeah, I'll add more javadocs.

For ApiCallTimeout, timer starts in the signing stage. https://github.com/aws/aws-sdk-java-v2/pull/657/files#diff-427c7fb0c6f7ebb4aa635811a210f908R130

For ApiCallAttemptTimeout, timer starts right before making http requests. https://github.com/aws/aws-sdk-java-v2/pull/657/files#diff-e3a70123328ddb47abe656e045204a99R224

@dagnir
Copy link
Contributor

dagnir commented Aug 14, 2018

@shorea I thought about that as well, but it seems like timeout is the wrong place to hang it on. Resource cleanup should all take place if the customer cancels the future while the request is executing as well right? I'm not sure sure if we do that today, but I don't think we do.

@@ -87,6 +94,10 @@
public static final SdkClientOption<SdkHttpClient> SYNC_HTTP_CLIENT =
new SdkClientOption<>(SdkHttpClient.class);

public static final SdkClientOption<Duration> API_CALL_ATTEMPT_TIME_OUT = new SdkClientOption<>(Duration.class);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: should be TIMEOUT to match spelling of the getter/setter as well as the spelling of timeout in other places.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 good catch. it's too tempting to separate them when they are all uppercase!

}

@Override
public void run() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to do anything special when streaming responses? Do we bubble an exception to the Subscriber?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not that I'm aware of. The exception is bubbled to AsyncResponseTransformer.exceptionOccurred.

We have a test for streaming operation: https://github.com/aws/aws-sdk-java-v2/pull/657/files#diff-a78559873b6762403653c7ac0f1beb9eR103


/**
* @return True if client execution has been aborted by the timer task. False otherwise
*/
boolean hasTimeoutExpired();
boolean isTimedOut();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor: this makes it sound like the task itself has timed out. Maybe something like hasExecuted?

@millems
Copy link
Contributor

millems commented Aug 16, 2018

+1 to cleaning up if the user cancels the future (eg. if they're using Java 9's orTimeout()).

I still think we should support global timeouts within the SDK so that the customer doesn't always have to enforce it at the future level, and can use the same timeout configuration with sync and async.

@@ -73,6 +79,43 @@ protected RequestOverrideConfiguration(Builder<?> builder) {
return apiNames;
}

/**
* The amount of time to wait for the http request to complete before giving up and timing out. A positive value
* enables this feature.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should it be "this value will always be positive, if present"? (same comment below)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yup, fixed

* don't get aborted until several seconds after the timer has been breached. Because of this, the request timeout
* feature should not be used when absolute precision is needed.
*
* @see Builder#apiCallAttemptTimeout(Duration)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we have a section here that differentiates between this and apiCallTimeout? I see there's one down there already.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 fixed

@@ -226,6 +230,7 @@ private SdkClientConfiguration finalizeAsyncConfiguration(SdkClientConfiguration
*/
private SdkClientConfiguration finalizeConfiguration(SdkClientConfiguration config) {
return config.toBuilder()
.option(TIMEOUT_EXECUTOR_SERVICE, resolveTimeoutExecutorService())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Initial thought without looking further in the code first: We already have an internal scheduled executor service for retries. Can we use the same one for timeouts?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, we can, probably need to rename the existingASYNC_RETRY_EXECUTOR_SERVICE to SDK_INTERNAL_EXECUTOR_SERVICE(or something else)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reused the scheduled executor and renamed it to SCHEDULED_EXECUTOR_SERVICE

}

@Override
public Builder toBuilder() {
return new DefaultClientOverrideConfigurationBuilder().advancedOptions(advancedOptions.toBuilder())
.headers(headers)
.retryPolicy(retryPolicy)
.apiCallAttemptTimeout(apiCallAttemptTimeout)
.apiCallAttemptTimeout(apiCallAttemptTimeout)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whoops!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch! fixed

public static ApiCallAttemptTimeoutException create() {
return builder().message("HTTP request execution did not complete before the specified timeout configuration.")
.build();
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we specify the timeout that was configured here as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 updated

return requestConfig.apiCallAttemptTimeout().get().toMillis();
}

if (apiCallAttemptTimeout != null) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems duplicated from elsewhere.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the other place is getApiTimoeut and this is getApiCallAttemptTimeout

@@ -51,6 +51,7 @@ private SdkException handleServiceException(SdkHttpFullResponse response, Execut
try {
return delegate.handle(response, executionAttributes);
} catch (InterruptedException e) {
Thread.currentThread().interrupt();
throw e;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same question here. Where were we not interrupting the thread before? Can we just fix it there instead of re-interrupting here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, reverted the change and will create a backlog item to track those.

@@ -31,7 +31,7 @@ log4j.logger.org.apache.http.wire=INFO
log4j.logger.org.apache.http=DEBUG
# log4j.logger.org.apache.http.wire=DEBUG
#
# log4j.logger.com.amazonaws=DEBUG
log4j.logger.sofeware.amazonaws.awssdk=TRACE
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

software*? needed?

@@ -27,8 +27,9 @@ log4j.appender.A1.layout.ConversionPattern=%d [%t] %-5p %c - %m%n
#log4j.logger.httpclient.wire=DEBUG

# HttpClient 4 Wire Logging
log4j.logger.org.apache.http.wire=DEBUG
log4j.logger.org.apache=TRACE
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these needed?

public static Duration isPositiveIfPresent(Duration duration, String fieldName) {
if (duration == null) {
return null;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe isPositiveOrNull? "Present" implies there's an optional involved, to me. Can isPositive delegate to here now for its positivity check?

I also think the naming convention elsewhere is to include "param" in the method name if the second parameter is a field name. Otherwise it would be a string format for the caller to specify the whole message... but it looks like that's already been broken by isPositive, so nevermind!

Copy link
Contributor

@dagnir dagnir left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might have missed it, but are API call attempt timeouts implement for sync calls?

context.apiCallTimeoutTracker(timeoutTracker);

future.whenComplete((o, t) -> {
if (t != null) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need this check here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need the checks here because we don't want it to timeout if the completableFuture fails exceptionally. I'll add a test for it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the future has already been completed, the timeout task should not have any effect though right? Unless it tries to force the timeout exception by using obtrudeException: https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/CompletableFuture.html#obtrudeException-java.lang.Throwable-

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whoops.. I meant to write if (t == null)(The unit tests succeeded because there were either response unmarshaller expception or ApiCallAttemptTimeoutException got retried). But yeah, I think you are right, we should remove the check here.

public void run() {
log.trace(() -> "Times up. Aborting the request");
hasExecuted = true;
if (completableFuture != null && !completableFuture.isDone()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When would we have a null future?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, it should not be null. I'll add the validation in the constructor.


@Override
public void cancel() {
if (future != null && !future.isDone()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, would we ever have a null future?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yup. same as above, will remove the null check here and add validation in the constructor.

@zoewangg
Copy link
Contributor Author

@dagnir Timeouts for sync were implemented before and we removed them via #494 because timeouts for async were not working back then and we wanted to keep consistent for async and sync.

Hmmm, we are in the same situation now except that this time it's sync that is missing the timeout feature.

Copy link
Contributor

@dagnir dagnir left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some weird linebreaks in the javadocs for apiCallTimeout, apiCallAttemptTimeout getters/setters. Can we fix those as well?


@Override
public void cancel() {
if (!future.isDone()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need this check. If the future is already complete cancel() should have no effect. There's also no guarantee isDone would still be false when we call cancel

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 fixed

public static <T> TimeoutTracker timeCompletableFuture(CompletableFuture<T> completableFuture,
ScheduledExecutorService timeoutExecutor,
SdkClientException exceptionToThrow,
long timeoutInMills) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor: Taking Duration as a param would be more flexible

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmmm. I actually prefer using long here because we don't need another null check here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough; it's an internal util so I' don't feel strongly either way.

*
* @param duration Number to validate
* @param fieldName Field name to display in exception message if not positive.
* @return Duration if positive.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: "or null"

return firstValue;
}

T value = fallbackValue.get();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If fallbackValue is empty get() will throw an exception here. I think we can just return fallbackValue here

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops nm, it's a Supplier, not an Optional. In that case we can just change this to Optional.ofNullable(value)

Copy link
Contributor

@dagnir dagnir left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Can you add a changelog message? Can you also clarify in your commit and and changelog message that it's for async only?

@zoewangg zoewangg changed the title Add apiCallAttempt timeout and apiCall timeout feature Add apiCallAttempt timeout and apiCall timeout feature for asynchrounous operations Aug 21, 2018
@zoewangg zoewangg merged commit 7ab5e15 into master Aug 21, 2018
@zoewangg zoewangg deleted the zoewang-timeoutFeature branch September 13, 2018 17:48
aws-sdk-java-automation added a commit that referenced this pull request Nov 1, 2019
…1d0a417c

Pull request: release <- staging/388a3a8b-bd1c-4999-82b9-975d1d0a417c
.option(SdkClientOption.ASYNC_RETRY_EXECUTOR_SERVICE, Executors.newScheduledThreadPool(1))
.option(SdkAdvancedAsyncClientOption.FUTURE_COMPLETION_EXECUTOR,
Runnable::run)
.option(SdkClientOption.SCHEDULED_EXECUTOR_SERVICE, Executors.newScheduledThreadPool(1))
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isnt this thread pool technically unbounded? The core pool size is 1, but the max pool is int.max.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe the work queue used by the scheduledThreadPool is unbounded. Because the JDK classes only use the core pool size until the work queue is full, and the work queue is unbounded, the maximumPoolSize within the scheduled thread pool should never used.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants