Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AWS: Add S3FileIOAwsClientFactory with s3.client-factory-impl catalog property for S3FileIO #7590

Merged
merged 8 commits into from
May 16, 2023

Conversation

akshayakp97
Copy link
Contributor

Subtask of #7516

This PR adds a new client factory class S3FileIOAwsClientFactory with catalog property s3.client-factory-impl. When s3.client-factory-impl is set to true, we instantiate S3FileIOAwsClientFactory class. If the property wasn't set, we fallback to AwsClientFactory factory classes.

@github-actions github-actions bot added the AWS label May 12, 2023
import org.apache.iceberg.util.PropertyUtil;
import software.amazon.awssdk.services.s3.S3Client;

public class S3FileIOAwsClientFactory {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this be Serializable?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, it will be serialized

*/
@SuppressWarnings("unchecked")
public static <T> T getS3ClientFactoryImpl(Map<String, String> properties) {
boolean useS3FileIO = PropertyUtil.propertyAsBoolean(properties, CLIENT_FACTORY, false);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: useS3FileIO is slightly misleading, maybe initFactory or something along those lines?

* @return factory class object
*/
@SuppressWarnings("unchecked")
public static <T> T getS3ClientFactoryImpl(Map<String, String> properties) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to me it feels like this functionality should live in a slimmed down version of AwsClientFactories where we don't have compile-time dependencies to dynamo/glue/.. (basically without the DefaultAwsClientFactory)

Map<String, String> properties = Maps.newHashMap();
properties.put(S3FileIOAwsClientFactory.CLIENT_FACTORY, "true");
Object factoryImpl = S3FileIOAwsClientFactory.getS3ClientFactoryImpl(properties);
Assertions.assertThat(factoryImpl instanceof S3FileIOAwsClientFactory)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can be simplified to assertThat(factoryImpl).isInstanceOf(S3FileIOAwsClientFactory.class)

.withFailMessage(
"should instantiate an object of type S3FileIOAwsClientFactory when s3.client-factory-impl is set")
.isTrue();
assert factoryImpl instanceof S3FileIOAwsClientFactory;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure why we're using assert here. I believe this would even be always ignored because they have to be explicitly enabled.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will remove this, sorry. I needed this for my older revisions where the compiler was complaining

* set, will load one of {@link org.apache.iceberg.aws.AwsClientFactory} factory classes to
* provide backward compatibility.
*/
public static final String CLIENT_FACTORY = "s3.client-factory-impl";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

impl in the property name usually suggests that one would specify an actual implementation class, but here we're using a boolean, which seems inconsistent

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, my original thought is that S3FileIOAwsClientFactory is a trimmed down version of AwsClientFactories interface, that only requires s3() and s3Async() (once we add async client integration). And there could be a DefaultS3FileIOAwsClientFactory that is the default implementation of that trimmed down version of the factory. So this should actually not be a boolean config, but takes the impl class name, basically exactly the same experience as the original AwsClientFactories

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

S3FileIOAwsClientFactory is an interface. And, DefaultS3FileIOAwsClientFactory will implement S3FileIOAwsClientFactory. In case the property isn’t set, do we want to fallback to AwsClientFactory.defaultFactory()?

"should instantiate an object of type S3FileIOAwsClientFactory when s3.client-factory-impl is set")
.isTrue();
assert factoryImpl instanceof S3FileIOAwsClientFactory;
Assertions.assertThat(((S3FileIOAwsClientFactory) factoryImpl).s3())
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will remove this, as CI check is failing while trying to instantiate S3Client in unit test. Added tests to verify instantiating s3 client in TestS3FileIOIntegration

import org.apache.iceberg.relocated.com.google.common.base.Strings;
import org.apache.iceberg.util.PropertyUtil;

@SuppressWarnings("checkstyle:HideUtilityClassConstructor")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I believe we typically just add a private no-arg constructor instead of suppressing this

Object factoryImpl = S3FileIOAwsClientFactories.initFactory(properties);
Assertions.assertThat(factoryImpl)
.withFailMessage(
"should instantiate an object of type AwsClientFactory when s3.client-factory-impl is set to false")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like the "is set to false" doesn't apply anymore

* @return an instance of a factory class
*/
@SuppressWarnings("unchecked")
public static <T> T initFactory(Map<String, String> properties) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: prefer simple names of possible, can we call this method just initialize?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I think we can mostly follow the pattern in AwsClientFactories:

public static AwsClientFactory defaultFactory() {
    return AWS_CLIENT_FACTORY_DEFAULT;
  }

  public static AwsClientFactory from(Map<String, String> properties) {
    String factoryImpl =
        PropertyUtil.propertyAsString(
            properties, AwsProperties.CLIENT_FACTORY, DefaultAwsClientFactory.class.getName());
    return loadClientFactory(factoryImpl, properties);
  }

So we use a singleton if default, and don't need to use PropertyUtil.propertyAsString(xxx, null) which is a bit awkward.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jackye1995 when S3FileIOProperties.CLIENT_FACTORY is not set, we still want to support loading user provided AwsClientFactory implementation defined in AwsProperties.CLIENT_FACTORY catalog property defined here. Thats why I set default value to null.

For ex, if someone sets AwsProperties.CLIENT_FACTORY to AssumeRoleAwsClientFactory. Then, we should support loading that factory impl class, right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I see. Thanks for the explanation!

import org.apache.iceberg.aws.HttpClientProperties;
import software.amazon.awssdk.services.s3.S3Client;

public class DefaultS3FileIOAwsClientFactory implements S3FileIOAwsClientFactory {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

default class don't need to be public, it can be embedded in the interface class. In general we should minimize the number of public classes and methods.

Copy link
Contributor

@jackye1995 jackye1995 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me!

@jackye1995
Copy link
Contributor

Thanks for the work! And thanks for the review @nastra

@jackye1995 jackye1995 merged commit 29eebdd into apache:master May 16, 2023
41 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants