Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GCP: Add bundle jar for GCP-related dependencies #8231

Merged
merged 10 commits into from Aug 8, 2023

Conversation

bryanck
Copy link
Contributor

@bryanck bryanck commented Aug 5, 2023

This PR packages the Iceberg GCP library in a similar way to the Iceberg AWS library. The icberg-gcp project is packaged with the engine runtimes (Spark, Flink, Hive) without the GCP dependencies, similar to how iceberg-aws is included without the AWS dependencies. This has a negligible impact on the engine runtime size.

In the iceberg-gcp project, the GCP dependencies are changed to compile-only. This mirrors how the AWS dependencies are declared in iceberg-aws. While this could impact those using iceberg-gcp today, the thought is that there may not be many heavy users of it, as there was a critical bug in the GCP reader that was only recently fixed and is not in a release yet (#8071). Current users would need to add the GCP dependencies to their build.

Finally, a new iceberg-gcp-bundle project is added that packages the necessary GCP libraries and shades any packages that might conflict with engine libraries. This allows users to simply include two dependencies when using Iceberg on GCP, e.g. for Spark, iceberg-spark-runtime and iceberg-gcp-bundle. (AWS users similarly can specify the AWS bundle, but Google doesn't provide something similar.)

@github-actions github-actions bot added the flink label Aug 7, 2023
@github-actions github-actions bot added the hive label Aug 7, 2023

--------------------------------------------------------------------------------

This binary artifact includes Project Nessie with the following in its NOTICE
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bryanck I think we need to revisit the LICENSE and NOTICE files. If this is just a GCP bundle, it shouldn't include all of the Iceberg and related dependencies. Just the GCP notice.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, thanks. I updated these.

Copy link
Contributor

@nastra nastra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, just left some minor comments on the license file


Group: com.google.code.gson Name: gson Version: 2.10.1
Project URL: https://github.com/google/gson/gson
License: "Apache-2.0";link="https://www.apache.org/licenses/LICENSE-2.0.txt" (Not packaged)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we remove this line maybe?

gcp-bundle/LICENSE Show resolved Hide resolved
--------------------------------------------------------------------------------

Group: org.checkerframework Name: checker-qual Version: 3.33.0
License: MIT (Not packaged)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can probably be removed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I removed these lines.

@nastra nastra merged commit 5150549 into apache:master Aug 8, 2023
41 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants