Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Kafka Connect: Include third party licenses and notices in distribution #10829

Merged
merged 6 commits into from
Oct 24, 2024

Conversation

bryanck
Copy link
Contributor

@bryanck bryanck commented Jul 31, 2024

This PR includes third party licenses and notices in the Kafka Connect sink distribution archives.

The third party licenses and notices were generated using the Gradle License Report plugin, along with a custom report renderer. This is the same plugin used to generate the license and notice files for the AWS, GCP, and Azure bundles. The plugin scans jar and pom file dependencies for license and notice information.

(We might want to consider adding this plugin to the build so we can regenerate the license and notice files more easily.)

@bryanck
Copy link
Contributor Author

bryanck commented Jul 31, 2024

@rdblue @danielcweeks in case you want to take a look, Let me know if you want a different format, I can easily regenerate the files


===============================================================================

The BracketFinder (package org.apache.commons.math3.optimization.univariate)
Copy link
Member

@ajantha-bhat ajantha-bhat Sep 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see any direct dependency on this.

I am not even sure this is a right way to generate notice and license.
Because it looks so much different from spark runtime for example.

Maybe we need to add how to generate notice and license for modules in contribute.md first (also need to mention when do we need to add it, like only for runtime jar modules) and then maybe follow that process.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This includes licenses for all transitive dependencies, which I think is correct given those are packaged in the archive. This is the same tool used to generate the license files for the AWS, GCP, and Azure bundles. The Spark runtime predates those. I was thinking we could use this tool for the other runtimes as well.


--------------------------------------------------------------------------------

This project includes code from Kite, developed at Cloudera, Inc. with
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if this is right. Looking at the Kite NOTICE, it has entries I don't see here. (I didn't check all dependencies. The first one appears to have an issue, so want to make sure this was generated correctly)

Copy link
Contributor Author

@bryanck bryanck Sep 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just appended notices to the root project notice, which has the Kite notice. I could remove that if we don't need it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went ahead and removed the Kite notice.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to keep the Kite notice in if that's part of the root Iceberg notice. My comment was more that I thought it would have more info. @rdblue said we only need direct-dependency notices as we expect other projects to properly note their dependencies.

Is that configurable in the plugin?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK I added that back. sorry for the misunderstanding. We could exclude transitive dependencies, but I don't think that's exactly what we want. The sink has dependencies on other iceberg modules and I think we want to include the direct dependencies of those?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated the licenses and notices to only include direct dependencies of Iceberg modules, this reduced the number of entries quite a bit.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing to note, Iceberg code uses libraries that aren't direct dependencies, e.g. parquet-hadoop which is transitively brought in by parquet-avro.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After a discussion, we decided to include the full transitive list to cover all cases, so I rolled back the change to only include direct dependencies.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is the relevant link we discussed: https://infra.apache.org/licensing-howto.html#deps-of-deps

@RussellSpitzer
Copy link
Member

Reminder to @danielcweeks + @bryanck to finalize reviews on this PR

Copy link
Contributor

@danielcweeks danielcweeks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 LGTM. Thanks @bryanck for pulling it all the transitive dependencies. It would great if we could get a gist or something to document how you did that for future reference.

@danielcweeks danielcweeks merged commit 12ac3ee into apache:main Oct 24, 2024
14 checks passed
@bryanck
Copy link
Contributor Author

bryanck commented Oct 24, 2024

Yes, definitely, I’ll follow up, hopefully with something to make it automated.

Group: javax.annotation Name: javax.annotation-api Version: 1.3.2
Project URL (from manifest): https://javaee.github.io/glassfish
Project URL (from POM): http://jcp.org/en/jsr/detail?id=250
License (from POM): CDDL + GPLv2 with classpath exception - https://github.com/javaee/javax.annotation/blob/master/LICENSE
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's the special exception:

Special exceptions to the GNU GPL (e.g. GNU Classpath) unless otherwise permitted elsewhere on this page.

Group: javax.servlet Name: javax.servlet-api Version: 3.1.0
Project URL (from manifest): https://glassfish.dev.java.net
Project URL (from POM): http://servlet-spec.java.net
License (from POM): CDDL + GPLv2 with classpath exception - https://glassfish.dev.java.net/nonav/public/CDDL+GPL.html
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's also the one with the classpath exception, so we're good there. Thanks for pinging me here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants