-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add new SnowflakeCatalog implementation to enable directly using Snowflake-managed Iceberg tables #6428
Add new SnowflakeCatalog implementation to enable directly using Snowflake-managed Iceberg tables #6428
Conversation
…#1) Initial read-only Snowflake Catalog implementation built on top of the Snowflake JDBC driver, providing support for basic listing of namespaces, listing of tables, and loading/reads of tables. Auth options are passthrough to the JDBC driver. Co-authored-by: Maninder Parmar <maninder.parmar@snowflake.com> Co-authored-by: Maninder Parmar <maninder.parmar+oss@snowflake.com> Co-authored-by: Dennis Huo <dennis.huo+oss@snowflake.com>
Add JdbcSnowflakeClientTest using mocks; provides full coverage of JdbcSnowflakeClient and entities' ResultSetHandler logic. Also update target Spark runtime versions to be included.
snowflake/src/test/java/org/apache/iceberg/snowflake/SnowflakeCatalogTest.java
Show resolved
Hide resolved
consistency and future interoperability with inheriting from abstact unittest base classes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR @dennishuo, great to have this. I did a more thorough review and my comments are inlined.
snowflake/src/main/java/org/apache/iceberg/snowflake/SnowflakeCatalog.java
Show resolved
Hide resolved
snowflake/src/main/java/org/apache/iceberg/snowflake/SnowflakeCatalog.java
Outdated
Show resolved
Hide resolved
snowflake/src/main/java/org/apache/iceberg/snowflake/SnowflakeCatalog.java
Outdated
Show resolved
Hide resolved
snowflake/src/main/java/org/apache/iceberg/snowflake/SnowflakeCatalog.java
Show resolved
Hide resolved
snowflake/src/test/java/org/apache/iceberg/snowflake/SnowflakeCatalogTest.java
Outdated
Show resolved
Hide resolved
snowflake/src/test/java/org/apache/iceberg/snowflake/SnowflakeCatalogTest.java
Outdated
Show resolved
Hide resolved
snowflake/src/test/java/org/apache/iceberg/snowflake/SnowflakeCatalogTest.java
Outdated
Show resolved
Hide resolved
snowflake/src/test/java/org/apache/iceberg/snowflake/SnowflakeCatalogTest.java
Outdated
Show resolved
Hide resolved
snowflake/src/test/java/org/apache/iceberg/snowflake/SnowflakeCatalogTest.java
Outdated
Show resolved
Hide resolved
-Convert unittests to all use assertj/Assertions for "fluent assertions" -Refactor test injection into overloaded initialize() method -Add test cases for close() propagation -Use CloseableGroup.
SnowflakTableOperations class itself, add test case.
SnowflakeClient/JdbcSnowflakeClient layers and merge SnowflakeTable and SnowflakeSchema into a single SnowflakeIdentifier that also encompasses ROOT and DATABASE level identifiers. A SnowflakeIdentifier thus functions like a type-checked/constrained Iceberg TableIdentifier, and eliminates any tight coupling between a SnowflakeClient and Catalog business logic. Parsing of Namespace numerical levels into a SnowflakeIdentifier is now fully encapsulated in NamespaceHelpers so that callsites don't duplicate namespace-handling/validation logic.
…ssert in favor of assertj's Assertions.
@nastra Thanks for the thorough review and suggestions! Finished applying all your suggestions, including fully converting to |
Interesting, not sure why it removed @danielcweeks when I re-requested review from @nastra (definitely wasn't intentional -- as far as I can tell my repository permissions don't even provide the ability to remove reviewer requests). I wonder if this is possibly another manifestation of community/community#8939 |
snowflake/src/main/java/org/apache/iceberg/snowflake/entities/SnowflakeIdentifier.java
Outdated
Show resolved
Hide resolved
snowflake/src/main/java/org/apache/iceberg/snowflake/SnowflakeCatalog.java
Outdated
Show resolved
Hide resolved
snowflake/src/main/java/org/apache/iceberg/snowflake/SnowflakeCatalog.java
Outdated
Show resolved
Hide resolved
snowflake/src/main/java/org/apache/iceberg/snowflake/SnowflakeCatalog.java
Outdated
Show resolved
Hide resolved
snowflake/src/main/java/org/apache/iceberg/snowflake/SnowflakeCatalog.java
Outdated
Show resolved
Hide resolved
snowflake/src/main/java/org/apache/iceberg/snowflake/SnowflakeCatalog.java
Outdated
Show resolved
Hide resolved
snowflake/src/main/java/org/apache/iceberg/snowflake/SnowflakeClient.java
Outdated
Show resolved
Hide resolved
snowflake/src/main/java/org/apache/iceberg/snowflake/SnowflakeCatalog.java
Outdated
Show resolved
Hide resolved
snowflake/src/main/java/org/apache/iceberg/snowflake/JdbcSnowflakeClient.java
Outdated
Show resolved
Hide resolved
snowflake/src/main/java/org/apache/iceberg/snowflake/SnowflakeIdentifier.java
Outdated
Show resolved
Hide resolved
snowflake/src/main/java/org/apache/iceberg/snowflake/SnowflakeTableMetadata.java
Outdated
Show resolved
Hide resolved
snowflake/src/main/java/org/apache/iceberg/snowflake/SnowflakeTableMetadata.java
Outdated
Show resolved
Hide resolved
snowflake/src/main/java/org/apache/iceberg/snowflake/JdbcSnowflakeClient.java
Show resolved
Hide resolved
@dennishuo #6538 is merged, so you might want to rebase on top of that. I'm good, but I it would be great to have @nastra sign off as well since there are couple comments open. |
Thanks for the note! Successfully merged main. I'll await @nastra 's review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks @dennishuo for working on this. I've commented on a few nits but overall this looks good to me.
It would also be good to have @rdblue review/approve this.
snowflake/src/test/java/org/apache/iceberg/snowflake/NamespaceHelpersTest.java
Outdated
Show resolved
Hide resolved
snowflake/src/main/java/org/apache/iceberg/snowflake/SnowflakeTableMetadata.java
Outdated
Show resolved
Hide resolved
snowflake/src/main/java/org/apache/iceberg/snowflake/SnowflakeTableOperations.java
Show resolved
Hide resolved
snowflake/src/main/java/org/apache/iceberg/snowflake/SnowflakeCatalog.java
Outdated
Show resolved
Hide resolved
* snowflakeLocation is a known non-compatible path syntax but fails to match the expected path | ||
* components for a successful translation. | ||
*/ | ||
public static String snowflakeLocationToIcebergLocation(String snowflakeLocation) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see Azure and GCP paths converted, what about the S3 ones? Are all combinations like s3a, s3n, etc. all compatible natively in Snowflake?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, in situations where Snowflake handles paths coming from externally produced sources, Snowflake tracks a canonical form of it and indeed handles s3a and s3n.
Here, the conversion is for outbound paths produced by Snowflake, where the "s3://" prefix is already used natively, so only Azure and GCS need translation for "standard" scheme mappings accepted by e.g. HadoopFileSystem.
This does remind me though - I've been meaning to open a discussion to see if anyone has thought about maybe adding some ease-of-use hooks for last-mile automatic basic path translations right before they go to FileIO resolution. For example, someone might want s3://somebucket
paths to be rewritten to a viewfs://
base path before letting HadoopFileIO automatically delegate to the right ViewFs impl. Or it could come up in cases where manifest files hold s3://
paths but someone wants everything to go through a corresponding dbfs://
mount point. Do you know of any prior discussions along those lines, and would it be worth opening an issue for broader input?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the clarification, I approved the changes, very excited to see this from Snowflake.
I've been meaning to open a discussion to see if anyone has thought about maybe adding some ease-of-use hooks for last-mile automatic basic path translations right before they go to FileIO resolution.
For HadoopFileIO, i think this is not really needed because typically we see our users already configured HDFS settings to map schemes to whatever file system implementations they would like to use.
I think ResolvingFileIO to some extent already does this kind of translation to some extent, maybe we can extend its functionality at that front.
|
||
static class FileIOFactory { | ||
public FileIO newFileIO(String impl, Map<String, String> properties, Object hadoopConf) { | ||
return CatalogUtil.loadFileIO(impl, properties, hadoopConf); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This factory seems odd since it has no state that is passed through to the loadFileIO
method. Couldn't this just call loadFileIO
directly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was just extracted to preserve the flow of unittest setup to be able to return the pre-configured InMemoryFileIO fake instance rather than having dynamic classloading get an empty one. The alternative there would've been to introduce some static global state in InMemoryFileIO, but that seems prone to causing cross-test-case problems.
I could add a comment or annotation here to explain its existence if that'd make it cleaner.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added code comment to clarify
Looks like some CI tests are failing? Could you check? Maybe need to rebase. |
@jackye1995 Thanks for the heads up! Looks like merging to head fixed it. |
This read-only implementation of the Catalog interface, initially built on top of the Snowflake JDBC driver for the connection layer, enables engines like Spark using the Iceberg Java SDK to be able to consume Snowflake-managed Iceberg Tables via Iceberg Catalog interfaces.
Example, assuming a Snowflake account with a database
iot_data
containing a schemapublic
and a managed Iceberg tablesensor_test_results
:Note that the involvement of a JDBC driver is only incidental, and functionality is different from the
JdbcCatalog
- here, Snowflake itself manages manifest/metadata files and table/snapshot metadata, and this catalog layer facilitates the coordination of metadata-file locations and discovery of the latest table snapshot versions without resorting to file-listing or "directory-name"-listing (for listTables or listNamespaces) like the HadoopCatalog.