Docs: recommendation for packaging uberjars #14292
Could somebody provide some feedback? Thanks a lot!
Iceberg also has modules for adding Iceberg support to processing engines:

* `iceberg-spark` is an implementation of Spark's Datasource V2 API for Iceberg with submodules for each spark versions (use runtime jars for a shaded version)
* When packaging user projects, keep only `iceberg-spark-runtime` in the uberjar. All other functional modules such as `iceberg-core` or `iceberg-parquet` (and their transitive dependencies) should be excluded from uberjar, because some libraries (`parquet`, `avro`, etc) they use may be of versions incompatible with those provided in Spark runtime classpath.
* When packaging user projects, keep only `iceberg-spark-runtime` in the uberjar. All other functional modules such as `iceberg-core` or `iceberg-parquet` (and their transitive dependencies) should be excluded from uberjar, because some libraries (`parquet`, `avro`, etc) they use may be of versions incompatible with those provided in Spark runtime classpath.
* When packaging user projects in an uberjar, only the `iceberg-spark-runtime` jar and potentially one of the storage-specific bundles, such as `iceberg-aws-bundle` or `iceberg-gcp-bundle` are needed. No other Iceberg modules, such as `iceberg-core` or `iceberg-parquet` (and their transitive dependencies) are needed, as this will lead to dependency mismatches/conflicts on the classpath.
we should probably add the same wording for iceberg-flink below
Thanks for the reply. I'd like to use less strong wording for the last phrase "they may lead to dependency mismatches/conflicts on runtime classpath" because with previous versions of iceberg (up to 1.8) I did not encounter such issues with my crooked setup.

We actually have guidance at https://iceberg.apache.org/multi-engine-support/#runtime-jar. The README is not up-to-date. Shall we link to docs site in README like https://github.com/apache/iceberg-python/blob/main/README.md?

thanks for pointing that out @manuzhang. @qinghui-xu can you just add a link to the respective section of the docs then?
dddfc11 to
906145c
Compare
Iceberg also has modules for adding Iceberg support to processing engines:

* `iceberg-spark` is an implementation of Spark's Datasource V2 API for Iceberg with submodules for each spark versions (use runtime jars for a shaded version)
* When packaging user projects in an uberjar, only the `iceberg-spark-runtime` jar and potentially one of the storage-specific bundles, such as `iceberg-aws-bundle` or `iceberg-gcp-bundle` are needed, as suggested by [the documentation](https://iceberg.apache.org/multi-engine-support/#runtime-jar). No other Iceberg modules, such as `iceberg-core` or `iceberg-parquet` (and their transitive dependencies) are needed, as they may lead to dependency version mismatches/conflicts on the runtime classpath.
I think we can remove this entire subsection and rather add a link to the line above which then links to the respective section: ... for each spark version (use [runtime jars](link goes here) for a shaded version)
And then we can do the same for iceberg-flink-runtime below
I agree it seems a bit redundant as information. But I'd like to highlight that we should exclude other modules from runtime classpath (eg. uberjar). Maybe I should put this into the doc instead?
Yes just add it to the linked doc and here please only add the link as otherwise we're duplicating information
Updated the site/docs and added a pointer in the README.
Sometimes iceberg may use versions of parquet, avro, or other libs that are incompatible with what's deployed in Spark runtime. Add a reminder in site docs to highlight such issues. This closes apache#14232
906145c to
8df5a24
Compare
When using Iceberg with these engines, the runtime jar is the only addition to the classpath needed in addition to vendor dependencies.
For example, to use Iceberg with Spark 3.5 and AWS integrations, `iceberg-spark-runtime-3.5_2.12` and AWS SDK dependencies are needed for the Spark installation.

> ℹ️ It's important to make sure that only the runtime jars (plus storage specific bundles if needed, eg. `iceberg-aws-bundle` or `iceberg-gcp-bundle`) are included in the runtime classpath.
how does this page render? Can you provide a screenshot when running the site locally?

Sometimes iceberg may use versions of parquet, avro, or other libs that are incompatible with what's deployed in the engine runtime. Users should package only iceberg-spark-runtime (it provides shaded transitive deps) into the uberjar to avoid such issues.
Highlight this in the site docs.
Close #14232
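
For anyone who wants to check an existing uberjar against this recommendation, here is a minimal sketch, not part of the PR. It lists class files that sit directly under the `org/apache/parquet/` or `org/apache/avro/` packages, which would suggest that an unshaded copy of those libraries was packaged alongside the runtime jar. The function name `find_unshaded_classes`, the constant `CONFLICT_PREFIXES`, and the choice of prefixes are assumptions made for illustration; a hit is only a hint that the build should be reviewed, not proof of a conflict.

```python
import sys
import zipfile

# Assumption: these two package prefixes stand in for the "parquet, avro, etc."
# libraries mentioned in the discussion; extend the tuple for your own build.
CONFLICT_PREFIXES = ("org/apache/parquet/", "org/apache/avro/")


def find_unshaded_classes(jar_path, prefixes=CONFLICT_PREFIXES):
    """Return sorted .class entries whose path starts with one of the prefixes.

    Relocated (shaded) copies live under a different top-level package,
    so they do not match and are not reported.
    """
    with zipfile.ZipFile(jar_path) as jar:
        return sorted(
            name
            for name in jar.namelist()
            if name.endswith(".class") and name.startswith(prefixes)
        )


if __name__ == "__main__":
    # Hypothetical usage: python check_uberjar.py path/to/app-uberjar.jar
    hits = find_unshaded_classes(sys.argv[1])
    for name in hits:
        print(name)
    # Exit non-zero when anything was found so the check can gate a build.
    sys.exit(1 if hits else 0)
```

If the script reports entries, the usual next step would be to exclude modules such as `iceberg-core` or `iceberg-parquet` (and their transitive dependencies) from the uberjar, as described in the docs change above.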