-
Notifications
You must be signed in to change notification settings - Fork 4.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Pass-through IcebergIO catalog properties #31726
Conversation
Checks are failing. Will not request review until checks are succeeding. If you'd like to override that behavior, comment |
@Pure | ||
public abstract long getAuthSessionTimeoutMillis(); | ||
public abstract Properties getProperties(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kennknowles any benefits to using java.util.Properties
as opposed to a Map<String, String>
?
R: @kennknowles CC: @BhupiSindhwani |
Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control |
89ee279
to
e9fe5cd
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Java change LGTM (I didn't look line by line but viewed a number of sample lines and this is a good change)
The dependencies I'm a little more worried about. Need to think hard about what becomes a required dependency. We could also put some of the "optional" deps in the ManagedIO layer instead of this internal module.
sdks/java/core/build.gradle
Outdated
@@ -98,7 +98,7 @@ dependencies { | |||
permitUnusedDeclared enforcedPlatform(library.java.google_cloud_platform_libraries_bom) | |||
provided library.java.json_org | |||
implementation library.java.everit_json_schema | |||
implementation library.java.snake_yaml | |||
shadow library.java.snake_yaml |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm afraid of this getting changed again. Is it necessarily part of this change? Is it safe now?
sdks/java/io/iceberg/build.gradle
Outdated
@@ -23,6 +23,8 @@ import java.util.stream.Collectors | |||
plugins { id 'org.apache.beam.module' } | |||
applyJavaNature( | |||
automaticModuleName: 'org.apache.beam.sdk.io.iceberg', | |||
shadowClosure: {}, | |||
validateShadowJar: false, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why false? I think we want to validate it. Otherwise we end up having duplicate classes in multiple jars. Basically we want everything in the jar to be in our namespace, not any other namespace.
sdks/java/io/iceberg/build.gradle
Outdated
@@ -54,11 +56,12 @@ dependencies { | |||
implementation "org.apache.iceberg:iceberg-parquet:$iceberg_version" | |||
implementation "org.apache.iceberg:iceberg-orc:$iceberg_version" | |||
implementation library.java.hadoop_common | |||
// Hadoop GCS filesystem dependencies | |||
runtimeOnly library.java.hadoop_client |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I'm OK with this... it isn't very Java-like to put optional dependencies as mandatory dependencies though. Typically these should also be resolvable via transitive dependencies if we depend on e.g. GCS core which provides the GCS filesystem.
Thanks for taking a look @kennknowles. I see what you mean about including dependencies only if they're mandatory. These dependency changes aren't necessary for this PR so I reverted them so we can get this in. I wonder if we can improve IcebergIO's dependency story in a future PR though. Currently a user writing to a GCS Iceberg table has to include these dependencies:
Maybe we can provide an uber jar just for these dependencies (e.g. |
* Pass-through iceberg catalog properties * add to CHANGES.md * trigger integration test * remove custom configuration; pass catalog name
The current IcebergIO implementation supports only a fixed set of catalog properties. This is an unnecessary limitation for users and in some cases makes the connector unusable (e.g. if an unsupported property is required).
The changes in this PR opens it up so that properties can be specified in a key-store fashion and then simply be passed through to the catalog.
Note: This is a breaking change, but it's worth going ahead with it considering the big upside. This is also a new connector with not many users.