
s3.getObject Unable to execute HTTP request: Timeout waiting for connection from pool #2663

Closed
umutyusuf opened this issue Nov 8, 2021 · 10 comments
Labels
guidance Question that needs advice or information.

Comments


umutyusuf commented Nov 8, 2021

Describe the bug

I'm having a problem while trying to get objects through the S3 client.
com.amazonaws.SdkClientException: Unable to execute HTTP request: Timeout waiting for connection from pool

I'm calling S3 from a Lambda function triggered by EventBridge every 3 minutes, with a max of 1 concurrent instance. After about 5 hours, the exceptions start. This code runs in a single thread.

Here are the code pieces that produce the exception.

  • Creation of S3 instance
public AmazonS3 s3(final AWSCredentialsProvider awsCredentialsProvider) {
        return AmazonS3ClientBuilder.standard()
                .withRegion(Regions.US_EAST_1)
                .withCredentials(awsCredentialsProvider)
                .build();
}
  • S3CrudService
public @NotNull List<S3Object> readAllObjects(@NotNull String bucketName, Integer limit) {
        logger.info("Reading all objects from bucket = {}", bucketName);

        final List<S3Object> s3Objects = new ArrayList<>();
        final ObjectListing objects = s3.listObjects(new ListObjectsRequest(bucketName, null, null, null, limit));
        if (objects == null) {
            logger.warn("{} bucket object has returned null!!", bucketName);
            return Collections.emptyList();
        }
        final List<S3ObjectSummary> objectSummaries = objects.getObjectSummaries();
        if (CollectionUtils.isEmpty(objectSummaries)) {
            logger.warn("Objects are empty in bucket = {}!", bucketName);
            return Collections.emptyList();
        }
        final int size = objectSummaries.size();
        logger.info("Found files of size = {} in bucket = {}", size, bucketName);

        logger.info("STARTED - retrieving all objects in buckets = {} of size = {}", bucketName, size);
        for (final S3ObjectSummary objectSummary : objectSummaries) {
            final String key = objectSummary.getKey();
            if (StringUtils.hasText(key)) {
                final S3Object object = s3.getObject(bucketName, key);
                s3Objects.add(object);
            } else {
                logger.warn("{} bucket file object summary has invalid key for summary = {}", bucketName, objectSummary);
            }
        }
        return s3Objects;
    }

The piece that produces the exception:

public @NotNull List<SomeObject> readAndDeleteCurrentResults(int limit) {
        logger.info("STARTED - Reading results from s3");
        final List<S3Object> s3Objects = s3CrudService.readAllObjects(BUCKET_NAME, limit);
        final List<String> keys = new ArrayList<>();
        final List<RepricingResult> results = new ArrayList<>();
        try {
            logger.info("STARTED - retrieving and deserializing results from files of size = {}", s3Objects.size());
            for (final S3Object object : s3Objects) {
                final String key = object.getKey();
                if (StringUtils.hasText(key)) {
                    results.addAll(parseFile(object));
                    keys.add(key);
                } else {
                    logger.warn("Result file object summary has invalid key for summary = {}", object);
                }
            }
        } catch (Throwable t) {
            logger.error("Error occurred while reading  results from s3", t);
        } finally {
            // ---- REPRICER RESULT
            s3CrudService.deleteKeys(BUCKET_NAME, keys);
        }
        return results;
    }
  • And the method that reads and closes the object
private List<SomeObject> parseFile(@NotNull S3Object object) {
        final String fileKey = object.getKey();
        try (object) {
            try (final GZIPInputStream gzipInputStream = new GZIPInputStream(object.getObjectContent())) {
                final byte[] bytes = IOUtils.toByteArray(gzipInputStream);
                return objectMapper.readValue(bytes, new TypeReference<>() {
                });
            }
        } catch (Throwable t) {
            logger.error("Error while reading result file for key = {}", fileKey, t);
        }
        return Collections.emptyList();
    }
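One way to avoid accumulating open connections — a sketch, not part of the original report, assuming the same fields (s3, logger) and the parseFile method above; readAndParseAllObjects is a hypothetical name — is to fetch, parse, and close each object before requesting the next one, so at most one pooled connection is held at a time instead of one per object in the listing:

```java
public @NotNull List<SomeObject> readAndParseAllObjects(@NotNull String bucketName, Integer limit) {
        final List<SomeObject> results = new ArrayList<>();
        final ObjectListing objects = s3.listObjects(new ListObjectsRequest(bucketName, null, null, null, limit));
        for (final S3ObjectSummary objectSummary : objects.getObjectSummaries()) {
            final String key = objectSummary.getKey();
            if (!StringUtils.hasText(key)) {
                logger.warn("{} bucket object summary has invalid key for summary = {}", bucketName, objectSummary);
                continue;
            }
            // try-with-resources closes this S3Object before the next getObject call,
            // returning its HTTP connection to the pool immediately; parseFile already
            // closes the object too, so the outer close is a harmless second close
            try (final S3Object object = s3.getObject(bucketName, key)) {
                results.addAll(parseFile(object));
            } catch (Exception e) {
                logger.error("Failed to read key = {}", key, e);
            }
        }
        return results;
}
```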

Expected behavior

As you can see from the code pieces, I'm closing every object that's opened; even if exceptions occur, every item in objects should be properly closed. The expected behaviour is that the connections are properly released and new connections can be successfully established.

Current behavior

The exception is thrown after a while (observed to be nearly 5 hours; the function is triggered every 3 minutes). Here are the logs:

Failed to execute CommandLineRunner: java.lang.IllegalStateException
java.lang.IllegalStateException: Failed to execute CommandLineRunner
	at org.springframework.boot.SpringApplication.callRunner(SpringApplication.java:807)
	at org.springframework.boot.SpringApplication.callRunners(SpringApplication.java:788)
	at org.springframework.boot.SpringApplication.run(SpringApplication.java:333)
	at org.springframework.boot.SpringApplication.run(SpringApplication.java:1311)
	at org.springframework.boot.SpringApplication.run(SpringApplication.java:1300)
	at org.springframework.cloud.function.adapter.aws.FunctionInvoker.start(FunctionInvoker.java:85)
	at org.springframework.cloud.function.adapter.aws.FunctionInvoker.<init>(FunctionInvoker.java:67)
	at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
	at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
	at java.base/java.lang.reflect.Constructor.newInstance(Unknown Source)
Caused by: com.amazonaws.SdkClientException: Unable to execute HTTP request: Timeout waiting for connection from pool
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleRetryableException(AmazonHttpClient.java:1207)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1153)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:802)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:770)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:744)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:704)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:686)
	at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:550)
	at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:530)
	at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5267)
	at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5214)
	at com.amazonaws.services.s3.AmazonS3Client.getObject(AmazonS3Client.java:1513)
	at com.amazonaws.services.s3.AmazonS3Client.getObject(AmazonS3Client.java:1367)
	at com.onestart.aws.services.service.impl.S3CrudServiceImpl.readAllObjects(S3CrudServiceImpl.java:58)
	at com.onestart.Result.service.impl.ResultFileServiceImpl.readAndDeleteCurrentResults(ResultFileServiceImpl.java:65)
	at com.onestart.Result.app.ResultAggregateJob.processResults(ResultAggregateJob.java:37)
	at com.onestart.Result.app.ResultApp.lambda$processScheduledEvent$0(ResultApp.java:24)
	at org.springframework.cloud.function.context.catalog.SimpleFunctionRegistry$FunctionInvocationWrapper.invokeConsumer(SimpleFunctionRegistry.java:784)
	at org.springframework.cloud.function.context.catalog.SimpleFunctionRegistry$FunctionInvocationWrapper.doApply(SimpleFunctionRegistry.java:589)
	at org.springframework.cloud.function.context.catalog.SimpleFunctionRegistry$FunctionInvocationWrapper.apply(SimpleFunctionRegistry.java:435)
	at org.springframework.cloud.function.adapter.aws.CustomRuntimeEventLoop.eventLoop(CustomRuntimeEventLoop.java:108)
	at org.springframework.cloud.function.adapter.aws.CustomRuntimeInitializer.lambda$null$0(CustomRuntimeInitializer.java:49)
	at org.springframework.boot.SpringApplication.callRunner(SpringApplication.java:804)
	... 10 more
Caused by: org.apache.http.conn.ConnectionPoolTimeoutException: Timeout waiting for connection from pool
	at org.apache.http.impl.conn.PoolingHttpClientConnectionManager.leaseConnection(PoolingHttpClientConnectionManager.java:316)
	at org.apache.http.impl.conn.PoolingHttpClientConnectionManager$1.get(PoolingHttpClientConnectionManager.java:282)
	at jdk.internal.reflect.GeneratedMethodAccessor134.invoke(Unknown Source)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
	at java.base/java.lang.reflect.Method.invoke(Unknown Source)
	at com.amazonaws.http.conn.ClientConnectionRequestFactory$Handler.invoke(ClientConnectionRequestFactory.java:70)
	at com.amazonaws.http.conn.$Proxy165.get(Unknown Source)
	at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:190)
	at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:186)
	at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
	at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83)
	at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:56)
	at com.amazonaws.http.apache.client.impl.SdkHttpClient.execute(SdkHttpClient.java:72)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1331)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1145)
	... 31 more

Steps to Reproduce

You can only reproduce this by running it for a long time with the same S3 client instance.

Possible Solution

Either the problem is in my code, or the connections are not actually being returned to the pool even though I close them. I've also checked a previously created issue, but it got me nowhere.

Context

No response

AWS Java SDK version used

1.11.973

JDK version used

Java 11 (Corretto)

Operating System and version

Lambda - Java 11 (Corretto)

@umutyusuf umutyusuf added bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels Nov 8, 2021
@debora-ito
Member

@umutyusuf thank you for reaching out.

Is the S3 client shared or is a new instance of the client created with every new request? We recommend reusing the same client instance whenever possible.

The error Timeout waiting for connection from pool indicates that all the connections from the pool are in use and a new request timed out while waiting for a connection to be available.
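For context, the SDK v1 HTTP pool defaults to ClientConfiguration.DEFAULT_MAX_CONNECTIONS (50) connections. If many S3Objects are legitimately held open at once, the ceiling can be raised when the client is built — a minimal configuration sketch; the limit and timeout values are illustrative, not recommendations:

```java
import com.amazonaws.ClientConfiguration;
import com.amazonaws.auth.AWSCredentialsProvider;
import com.amazonaws.regions.Regions;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;

public AmazonS3 s3(final AWSCredentialsProvider awsCredentialsProvider) {
        // Default pool size is 50; raise it if many objects stay open at once.
        final ClientConfiguration clientConfiguration = new ClientConfiguration()
                .withMaxConnections(100)
                .withConnectionTimeout(10_000)   // ms to establish a connection
                .withRequestTimeout(60_000);     // ms for the whole request
        return AmazonS3ClientBuilder.standard()
                .withRegion(Regions.US_EAST_1)
                .withCredentials(awsCredentialsProvider)
                .withClientConfiguration(clientConfiguration)
                .build();
}
```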

If you enable the client-side metrics you can obtain more insight into the performance of your application over time, and perhaps see why the error only occurs after 5 hours. Metrics like HttpClientPoolAvailableCount, HttpClientPoolLeasedCount and HttpClientPoolPendingCount will be especially helpful in this case.

To enable the client-side metrics see the blog post - https://aws.amazon.com/blogs/developer/enabling-metrics-with-the-aws-sdk-for-java/
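The same can also be done in code rather than via the system property — a sketch, where SdkMetricsBootstrap is a hypothetical name and credentialsProvider an assumed application-supplied dependency:

```java
import com.amazonaws.auth.AWSCredentialsProvider;
import com.amazonaws.metrics.AwsSdkMetrics;

final class SdkMetricsBootstrap {
    // Programmatic equivalent of -Dcom.amazonaws.sdk.enableDefaultMetrics.
    // Call before the first AWS request so the collector is installed early.
    static void enable(AWSCredentialsProvider credentialsProvider) {
        AwsSdkMetrics.enableDefaultMetrics();
        // Credentials used to upload the metrics to CloudWatch.
        AwsSdkMetrics.setCredentialProvider(credentialsProvider);
    }
}
```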

@debora-ito debora-ito added response-requested Waiting on additional info or feedback. Will move to "closing-soon" in 5 days. and removed needs-triage This issue or PR still needs to be triaged. labels Nov 8, 2021
@umutyusuf
Author

Hey @debora-ito, thanks for the response. The S3 client is a singleton injected with Spring, so there is only a single instance. I've separated the modules to run on different Lambda functions, so this instance is not shared with other applications. Weirdly, the exception has not occurred in the last 24 hours, but I'll keep watching; if the exception arises again, I'll enable the metrics and share the results.

@github-actions github-actions bot removed the response-requested Waiting on additional info or feedback. Will move to "closing-soon" in 5 days. label Nov 9, 2021
@umutyusuf
Author

umutyusuf commented Nov 17, 2021

The errors started again after 6 days without any changes, so I made a small adjustment to the code after the exceptions started.
I replaced this part

private List<SomeObject> parseFile(@NotNull S3Object object) {
        final String fileKey = object.getKey();
        try (object) {
            try (final GZIPInputStream gzipInputStream = new GZIPInputStream(object.getObjectContent())) {
                final byte[] bytes = IOUtils.toByteArray(gzipInputStream);
                return objectMapper.readValue(bytes, new TypeReference<>() {
                });
            }
        } catch (Throwable t) {
            logger.error("Error while reading result file for key = {}", fileKey, t);
        }
        return Collections.emptyList();
    }

with this

 private List<SomeObject> parseFile(@NotNull S3Object object) {
        final String fileKey = object.getKey();
        try {
            return objectMapper.readValue(new GZIPInputStream(object.getObjectContent()), new TypeReference<>() {
            });
        } catch (Throwable t) {
            logger.error("Error while reading result file for key = {}", fileKey, t);
        }
        return Collections.emptyList();
 }

Here the objectMapper.readValue method properly closes the given input stream (ObjectMapper from the Jackson library).

As I mentioned in the original entry, this application runs on AWS Lambda, meaning the application does not run indefinitely and is restarted roughly every 20 minutes. So I don't expect the connection pool issue to persist across different application instances.

I've also tried to enable the metrics as @debora-ito mentioned by adding an environment variable to lambda function as

Key = JAVA_TOOL_OPTIONS
Value = -Dcom.amazonaws.sdk.enableDefaultMetrics

But I still cannot see the metrics in CloudWatch under AWSSDK/Java. I'll update the issue with the metrics once I'm able to see the graphs and the values you requested, @debora-ito, but until then, I need assistance with the issue.

PS: I have 3 separate applications that use this flow: read from S3, process, and terminate. All three applications use different S3 buckets, but I only face this exception in one of them.

@umutyusuf
Author

Hey @debora-ito, I've collected the values, and the issue has resurfaced. However, I couldn't fit the whole picture into a single graph, so let me know how you'd like to see the metrics. The reason for sharing these separately is that the connection time values jump into the thousands, making the other values unreadable.

After nearly 4 hours, HttpClientGetConnectionTime suddenly jumps to very large values. Here is the screenshot for this value.
[Screenshot: HttpClientGetConnectionTime graph (Nov 18, 2021, 6:48 PM)]

For the same time period, here are the values for

  • HttpClientPoolLeasedCount
  • HttpClientPoolAvailableCount
  • RequestCount (I couldn't see HttpClientPoolPendingCount; I assume RequestCount is not a substitute for it, but it can provide some insight)

[Screenshot: HttpClientPoolLeasedCount, HttpClientPoolAvailableCount and RequestCount graphs (Nov 18, 2021, 6:46 PM)]

As you can see from the graph, the available connection count always caught up with the leased one, which I assume indicates that the connections are released properly. I don't know where the problem is; I'm ready to provide more info about the issue if you require it.

Thanks.

@umutyusuf
Author

I think I found the issue: once the file size exceeds the max thread count, new requests are blocked. Sorry for the inconvenience.

@debora-ito debora-ito added guidance Question that needs advice or information. and removed bug This issue is a bug. labels Nov 25, 2021
@highfestiva

@umutyusuf We're having the same problem, what do you mean by "file size exceeds the max thread count"? I'm only seeing it on larger files.

@umutyusuf
Author

@highfestiva I was trying to read too many objects in parallel, relying on the pool to become available once each file read completed. However, after running tests, once the client's max connection count was exceeded, all the threads blocked and the client halted entirely. I know this is an issue, but I didn't want to pursue it further.
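The failure mode described here can be sketched with a plain Semaphore standing in for the HTTP connection pool — PoolExhaustionDemo, leakyLease and safeProcess are illustrative names, not SDK APIs:

```java
import java.util.concurrent.Semaphore;

public class PoolExhaustionDemo {

    // Leaky pattern: every request leases a "connection" and never releases it,
    // mirroring getObject() calls whose S3Objects are all kept open at once.
    // Returns how many leases succeeded before the pool ran dry (a real pool
    // would block here and eventually throw ConnectionPoolTimeoutException).
    static int leakyLease(int poolSize, int requests) {
        Semaphore pool = new Semaphore(poolSize);
        int leased = 0;
        for (int i = 0; i < requests; i++) {
            if (pool.tryAcquire()) {
                leased++; // never released: the object is never closed
            }
        }
        return leased;
    }

    // Safe pattern: each lease is released (the object closed) before the next
    // request, so the pool never exhausts no matter how many objects there are.
    static int safeProcess(int poolSize, int requests) throws InterruptedException {
        Semaphore pool = new Semaphore(poolSize);
        int completed = 0;
        for (int i = 0; i < requests; i++) {
            pool.acquire();
            try {
                completed++; // read and parse the object here
            } finally {
                pool.release(); // what try-with-resources close() achieves
            }
        }
        return completed;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("leaky: " + leakyLease(3, 5) + " of 5 leased");   // leaky: 3 of 5 leased
        System.out.println("safe:  " + safeProcess(3, 5) + " of 5 completed"); // safe:  5 of 5 completed
    }
}
```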

@philliptaylorpro

Is anyone else able to explain how to programmatically recover from this error? I'm closing everything except the GetObjectRequest itself, but the documentation and example code don't imply I need to close request objects.

Avoiding reading too many files in parallel means I have to reverse engineer the connection pool logic to match it to what I'm doing, which opens a can of worms on correctness. Can the connection pool be restarted?

@will-molloy

We recommend reusing the same client instance whenever possible.

My solution is to create the client per request because, as others have mentioned, there's no way to get the connection pool size and throttle accordingly.

@Bluexin

Bluexin commented Aug 7, 2023

Seems to be a dupe of #1405, which seems to have a better solution than creating new S3 clients (for people stumbling upon this one in the future).
