java.lang.IllegalStateException: Connection pool shut down when refreshing table metadata on s3 #8601
I have encountered the same issue on Apache Iceberg 1.3.1. This exception is raised when reading a parquet data file:

Between Iceberg 1.2.1 and 1.3.0 the underlying HTTP client changed from url-connection-client to the Apache HTTP client (#7119). It's possible that this is causing the issue you're seeing. You could try switching back to url-connection-client to see if that fixes it.
I think I have spotted the problem. It only reproduces when using a web identity token file to authenticate.

`S3FileIO.close` closes the Apache HTTP clients used by both the S3 client and the STS client (which is used by the web identity token file credential provider). The problem is that all `S3FileIO` objects share one and the same credential provider: if any `S3FileIO` object is finalized, the shared credential provider is broken for everyone else. `AwsProperties.java` line 1801 creates the credential provider used by `S3FileIO`, and it is actually a singleton:

```java
public final class DefaultCredentialsProvider {
  private static final DefaultCredentialsProvider DEFAULT_CREDENTIALS_PROVIDER =
      new DefaultCredentialsProvider(builder());
  // ...
  public static DefaultCredentialsProvider create() {
    return DEFAULT_CREDENTIALS_PROVIDER;
  }
  // ...
}
```

A workaround for this problem is to always create a new instance of `DefaultCredentialsProvider`:

```diff
diff --git a/aws/src/main/java/org/apache/iceberg/aws/AwsProperties.java b/aws/src/main/java/org/apache/iceberg/aws/AwsProperties.java
index 9266c83f1..0f182a20c 100644
--- a/aws/src/main/java/org/apache/iceberg/aws/AwsProperties.java
+++ b/aws/src/main/java/org/apache/iceberg/aws/AwsProperties.java
@@ -45,6 +45,7 @@ import software.amazon.awssdk.auth.credentials.AwsBasicCredentials;
 import software.amazon.awssdk.auth.credentials.AwsCredentialsProvider;
 import software.amazon.awssdk.auth.credentials.AwsSessionCredentials;
 import software.amazon.awssdk.auth.credentials.DefaultCredentialsProvider;
+import software.amazon.awssdk.auth.credentials.DefaultCredentialsProvider.Builder;
 import software.amazon.awssdk.auth.credentials.StaticCredentialsProvider;
 import software.amazon.awssdk.awscore.client.builder.AwsClientBuilder;
 import software.amazon.awssdk.awscore.client.builder.AwsSyncClientBuilder;
@@ -1798,7 +1799,8 @@ public class AwsProperties implements Serializable {
       return credentialsProvider(this.clientCredentialsProvider);
     }

-    return DefaultCredentialsProvider.create();
+    Builder builder = DefaultCredentialsProvider.builder();
+    return builder.build();
   }

   private AwsCredentialsProvider credentialsProvider(String credentialsProviderClass) {
```

Currently, I'm still testing this patch to see if it actually resolves the issue. I'll let you know if the problem goes away.
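The singleton hazard described above can be illustrated with a minimal, self-contained sketch. `SharedProvider` and `Client` below are hypothetical stand-ins for `DefaultCredentialsProvider` and the S3/STS clients (no AWS SDK involved); the point is only that closing one borrower of a shared singleton breaks every other borrower, while per-client instances are unaffected:

```java
// Hypothetical stand-ins illustrating the shared-singleton close hazard.
public class SingletonCloseDemo {
  static class SharedProvider implements AutoCloseable {
    private static final SharedProvider INSTANCE = new SharedProvider();
    private boolean closed = false;
    static SharedProvider create() { return INSTANCE; }            // shared singleton
    static SharedProvider fresh() { return new SharedProvider(); } // per-client instance
    String credentials() {
      if (closed) throw new IllegalStateException("Connection pool shut down");
      return "token";
    }
    @Override public void close() { closed = true; }
  }

  static class Client implements AutoCloseable {
    private final SharedProvider provider;
    Client(SharedProvider p) { this.provider = p; }
    String read() { return provider.credentials(); }
    @Override public void close() { provider.close(); } // closes the provider it was given
  }

  public static void main(String[] args) {
    // Singleton case: closing client A breaks client B.
    Client a = new Client(SharedProvider.create());
    Client b = new Client(SharedProvider.create());
    a.close();
    try {
      b.read();
      System.out.println("singleton: ok");
    } catch (IllegalStateException e) {
      System.out.println("singleton: " + e.getMessage());
    }

    // Per-client case: each client owns its own provider, so closing one
    // does not affect the other.
    Client c = new Client(SharedProvider.fresh());
    Client d = new Client(SharedProvider.fresh());
    c.close();
    System.out.println("per-client: " + d.read());
  }
}
```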
The patch mentioned above seems to have fixed this issue. However, it is quite hacky and coupled to the internal implementation of AWS SDK v2, which makes it hard to write unit tests for. Is there any better approach for resolving this issue (other than switching to url-connection-client)?
@Kontinuation thanks for the detailed explanation. Can you confirm that this fixes the issue? I think changing to using
Nice catch. If I understood things correctly, the core issue is that the AWS SDK ties the lifecycle of its clients to their credentials providers. If Iceberg wants clients to be closeable (which is perfectly reasonable), I think it will need to create a new provider for each client. The main drawback is that it will have to refresh credentials for each new client... If AWS supports it, a different HTTP client instance could be passed to the credential provider so it wouldn't be closed along with S3's.

EDIT: When a custom client is explicitly provided to the builder, it's not closed automatically:

```java
SdkHttpClient apacheHttpClient = ApacheHttpClient.create();

// Singletons: Use the s3Client and dynamoDbClient for all requests.
S3Client s3Client =
    S3Client.builder()
        .httpClient(apacheHttpClient)
        .build();
```

https://docs.aws.amazon.com/sdk-for-java/latest/developer-guide/http-configuration-apache.html

But this makes things a bit more complex.
Yes. I confirmed that this really fixed the issue. Using a unique instance of `DefaultCredentialsProvider` per S3 client is also the default behavior of AWS SDK v2 (see `AwsDefaultClientBuilder.resolveCredentials`). I think using
@Kontinuation would you be interested in creating a PR? I think @elkhand also observed the same problem with Flink session cluster mode, where a static global singleton tends to be problematic for object lifecycle management.
Sure. I've submitted a pull request #8677 |
@Kontinuation @stevenzwu I believe this fix was released in 1.4.0 last week, but I am still getting this error in Flink (1.15) Iceberg jobs:
@AkshayWise this fix didn't make it into 1.4.0 unfortunately |
This issue still exists in Iceberg 1.4.1 with Flink 1.17 when the Iceberg catalog is created with `S3FileIO`. If the Iceberg catalog is created without `S3FileIO`, this issue does not occur:
@elkhand can you paste the stacktrace that you're seeing? This would help in seeing whether it's the same or a different issue. |
Sure @nastra, here is the stacktrace:
It does not seem to be the same problem. According to the stacktrace, |
Yup, this is not the same issue, as you can tell from the new stacktrace. I'm working on the fix for this new issue but it is pending some internal company processes. I'll likely submit a PR after Thanksgiving week.
@mas-chen Hi, I'm wondering whether there are any updates or workarounds for the
I had the same issue you had, @elkhand, and @Kontinuation is right. Something is closing the S3 client and I don't know what that is. I ended up writing a custom S3FileIO and using it with my catalog loader; it re-opens the S3 client if it is ever closed. That way I don't have the problem. Note: it does not work with Iceberg 1.3, because the methods this custom S3FileIO overrides are only public since Iceberg 1.4. https://gist.github.com/javrasya/513f838a8af355b51506ca2a2dc1e3d8
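The re-open-on-demand workaround from the gist above can be sketched generically. `Reopening` and `FakeClient` below are hypothetical stand-ins (the real gist overrides methods on Iceberg's `S3FileIO` instead); the wrapper simply rebuilds the client whenever something else has closed it:

```java
import java.util.function.Predicate;
import java.util.function.Supplier;

public class ReopeningDemo {
  /** Hands out a live instance, recreating it if the previous one was closed. */
  static class Reopening<T> {
    private final Supplier<T> factory;
    private final Predicate<T> isClosed;
    private T current;

    Reopening(Supplier<T> factory, Predicate<T> isClosed) {
      this.factory = factory;
      this.isClosed = isClosed;
    }

    synchronized T get() {
      if (current == null || isClosed.test(current)) {
        current = factory.get(); // closed behind our back: re-open
      }
      return current;
    }
  }

  // Hypothetical stand-in for an S3 client whose pool can be shut down.
  static class FakeClient {
    boolean closed = false;
    String read() {
      if (closed) throw new IllegalStateException("Connection pool shut down");
      return "data";
    }
  }

  public static void main(String[] args) {
    Reopening<FakeClient> clients = new Reopening<>(FakeClient::new, c -> c.closed);
    FakeClient first = clients.get();
    first.closed = true;                      // simulate an unrelated finalizer closing it
    System.out.println(clients.get().read()); // transparently re-opened
  }
}
```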
Apache Iceberg version
1.3.1 (latest release)
Query engine
Spark
Please describe the bug 🐞
Hi, we're using Iceberg on a long-running Spark SQL server, and after upgrading from 1.2 to 1.3.1 we noticed that eventually the server starts throwing

java.lang.IllegalStateException: Connection pool shut down

on S3's connection pool. Full stack trace:

I believe this MAY have been caused by #7513, which introduced a finalizer method for org.apache.iceberg.aws.s3.S3FileIO. I'm not entirely sure that this is the cause of this issue, but the finalizer seems problematic since it closes the client instance while not being the sole owner of it. S3FileIO may "leak" the client in these 3 methods: if the caller retains ownership of a returned object (InputFile, OutputFile, etc.), that object may outlive the S3FileIO instance. The finalizer may run, and the S3 client will be closed by the time it's needed.

Again, I'm not sure if that's the case here, since these things are super tricky to track down and I haven't managed to reproduce this locally.
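The ownership leak described above can be reproduced in miniature. `FakeFileIO` and `FakeInputFile` below are hypothetical stand-ins for `S3FileIO` and its returned `InputFile` (an explicit `close()` stands in for the finalizer): the returned object only borrows the client, so once its owner is closed the handle is broken:

```java
// Hypothetical stand-ins: a closeable "file IO" hands out objects that
// borrow its client, so they break once the owner is closed.
public class LeakDemo {
  static class FakeClient implements AutoCloseable {
    private boolean open = true;
    byte[] fetch() {
      if (!open) throw new IllegalStateException("Connection pool shut down");
      return new byte[] {1, 2, 3};
    }
    @Override public void close() { open = false; }
  }

  static class FakeInputFile {
    private final FakeClient client; // borrowed, not owned
    FakeInputFile(FakeClient client) { this.client = client; }
    byte[] readFully() { return client.fetch(); }
  }

  static class FakeFileIO implements AutoCloseable {
    private final FakeClient client = new FakeClient();
    FakeInputFile newInputFile() { return new FakeInputFile(client); }
    @Override public void close() { client.close(); } // what the finalizer effectively does
  }

  public static void main(String[] args) {
    FakeInputFile file;
    try (FakeFileIO io = new FakeFileIO()) {
      file = io.newInputFile(); // caller keeps the returned object
    }                           // io (and its client) closed here
    try {
      file.readFully();         // the "leaked" handle is now broken
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage());
    }
  }
}
```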