-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
AWS: Add S3FileIOAwsClientFactory with s3.client-factory-impl catalog property for S3FileIO #7590
Conversation
import org.apache.iceberg.util.PropertyUtil; | ||
import software.amazon.awssdk.services.s3.S3Client; | ||
|
||
public class S3FileIOAwsClientFactory { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should this be Serializable
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, it will be serialized
*/ | ||
@SuppressWarnings("unchecked") | ||
public static <T> T getS3ClientFactoryImpl(Map<String, String> properties) { | ||
boolean useS3FileIO = PropertyUtil.propertyAsBoolean(properties, CLIENT_FACTORY, false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: useS3FileIO
is slightly misleading, maybe initFactory
or something along those lines?
* @return factory class object | ||
*/ | ||
@SuppressWarnings("unchecked") | ||
public static <T> T getS3ClientFactoryImpl(Map<String, String> properties) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to me it feels like this functionality should live in a slimmed down version of AwsClientFactories
where we don't have compile-time dependencies to dynamo/glue/.. (basically without the DefaultAwsClientFactory
)
Map<String, String> properties = Maps.newHashMap(); | ||
properties.put(S3FileIOAwsClientFactory.CLIENT_FACTORY, "true"); | ||
Object factoryImpl = S3FileIOAwsClientFactory.getS3ClientFactoryImpl(properties); | ||
Assertions.assertThat(factoryImpl instanceof S3FileIOAwsClientFactory) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can be simplified to assertThat(factoryImpl).isInstanceOf(S3FileIOAwsClientFactory.class)
.withFailMessage( | ||
"should instantiate an object of type S3FileIOAwsClientFactory when s3.client-factory-impl is set") | ||
.isTrue(); | ||
assert factoryImpl instanceof S3FileIOAwsClientFactory; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure why we're using assert
here. I believe this would even be always ignored because they have to be explicitly enabled.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will remove this, sorry. I needed this for my older revisions where the compiler was complaining
* set, will load one of {@link org.apache.iceberg.aws.AwsClientFactory} factory classes to | ||
* provide backward compatibility. | ||
*/ | ||
public static final String CLIENT_FACTORY = "s3.client-factory-impl"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
impl
in the property name usually suggests that one would specify an actual implementation class, but here we're using a boolean, which seems inconsistent
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, my original thought is that S3FileIOAwsClientFactory
is a trimmed down version of AwsClientFactories
interface, that only requires s3()
and s3Async()
(once we add async client integration). And there could be a DefaultS3FileIOAwsClientFactory
that is the default implementation of that trimmed down version of the factory. So this should actually not be a boolean config, but takes the impl class name, basically exactly the same experience as the original AwsClientFactories
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
S3FileIOAwsClientFactory
is an interface. And, DefaultS3FileIOAwsClientFactory
will implement S3FileIOAwsClientFactory
. In case the property isn’t set, do we want to fallback to AwsClientFactory.defaultFactory()
?
"should instantiate an object of type S3FileIOAwsClientFactory when s3.client-factory-impl is set") | ||
.isTrue(); | ||
assert factoryImpl instanceof S3FileIOAwsClientFactory; | ||
Assertions.assertThat(((S3FileIOAwsClientFactory) factoryImpl).s3()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will remove this, as CI check is failing while trying to instantiate S3Client in unit test. Added tests to verify instantiating s3 client in TestS3FileIOIntegration
…ory interface and utility class
import org.apache.iceberg.relocated.com.google.common.base.Strings; | ||
import org.apache.iceberg.util.PropertyUtil; | ||
|
||
@SuppressWarnings("checkstyle:HideUtilityClassConstructor") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I believe we typically just add a private no-arg constructor instead of suppressing this
Object factoryImpl = S3FileIOAwsClientFactories.initFactory(properties); | ||
Assertions.assertThat(factoryImpl) | ||
.withFailMessage( | ||
"should instantiate an object of type AwsClientFactory when s3.client-factory-impl is set to false") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks like the "is set to false" doesn't apply anymore
* @return an instance of a factory class | ||
*/ | ||
@SuppressWarnings("unchecked") | ||
public static <T> T initFactory(Map<String, String> properties) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: prefer simple names of possible, can we call this method just initialize?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I think we can mostly follow the pattern in AwsClientFactories:
public static AwsClientFactory defaultFactory() {
return AWS_CLIENT_FACTORY_DEFAULT;
}
public static AwsClientFactory from(Map<String, String> properties) {
String factoryImpl =
PropertyUtil.propertyAsString(
properties, AwsProperties.CLIENT_FACTORY, DefaultAwsClientFactory.class.getName());
return loadClientFactory(factoryImpl, properties);
}
So we use a singleton if default, and don't need to use PropertyUtil.propertyAsString(xxx, null)
which is a bit awkward.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jackye1995 when S3FileIOProperties.CLIENT_FACTORY
is not set, we still want to support loading user provided AwsClientFactory
implementation defined in AwsProperties.CLIENT_FACTORY
catalog property defined here. Thats why I set default value to null
.
For ex, if someone sets AwsProperties.CLIENT_FACTORY
to AssumeRoleAwsClientFactory
. Then, we should support loading that factory impl class, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh I see. Thanks for the explanation!
import org.apache.iceberg.aws.HttpClientProperties; | ||
import software.amazon.awssdk.services.s3.S3Client; | ||
|
||
public class DefaultS3FileIOAwsClientFactory implements S3FileIOAwsClientFactory { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
default class don't need to be public, it can be embedded in the interface class. In general we should minimize the number of public classes and methods.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me!
Thanks for the work! And thanks for the review @nastra |
Subtask of #7516
This PR adds a new client factory class
S3FileIOAwsClientFactory
with catalog propertys3.client-factory-impl
. Whens3.client-factory-impl
is set to true, we instantiateS3FileIOAwsClientFactory
class. If the property wasn't set, we fallback toAwsClientFactory
factory classes.