Rework of binary distribution licenses #888
All redistributed jars are mentioned in the LICENSE file. If the jar is ASLv2, it is simply listed under the ASLv2 text. If these ASLv2 jars have their own NOTICE files, then the relevant portions are added to our NOTICE file. All jars under a license other than ASLv2 get their own section in LICENSE, which links to the full text of their license. Non-ASLv2 jars generally do not require an update to NOTICE. This change also includes a script, check-binary-license, which checks that the contents of the LICENSE and NOTICE actually match what is bundled in the distribution tarball.
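A rough sketch of the kind of check described above, written in Python purely for illustration (this is not the check-binary-license script itself). It assumes jars are referenced as `lib/<name>.jar` in the LICENSE or NOTICE text, and that the list of bundled jars has already been obtained by listing the tarball. The function names are hypothetical.

```python
import re

# Assumption: LICENSE and NOTICE refer to jars as "lib/<name>.jar".
JAR_PATTERN = re.compile(r"lib/[A-Za-z0-9_.+-]+\.jar")


def jars_mentioned(text):
    """Return the set of lib/*.jar paths mentioned in a LICENSE or NOTICE text."""
    return set(JAR_PATTERN.findall(text))


def check_mentions(bundled_jars, text):
    """Compare the jars shipped in the tarball with the jars mentioned in text.

    Returns (unaccounted, stale): jars that are bundled but not mentioned,
    and jars that are mentioned but not bundled.
    """
    bundled = set(bundled_jars)
    mentioned = jars_mentioned(text)
    return sorted(bundled - mentioned), sorted(mentioned - bundled)
```

Either list being non-empty would be a reason to fail the build: the first catches a bundled jar with no entry, the second catches an entry left behind after a dependency was removed or its version changed.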
|
I've only covered -all for now. I will do -server after. Also, check-binary-license will fail for now, as rocksdb isn't accounted for in the LICENSE file, and the jsr305 artifact version changed since 4.6.0.

@ivankelly this is a huge piece of work, thank you very much. Did you take a look at the maven license plugin? Maybe the most interesting part of the maven plugin is that it can FAIL the build if a non-compliant license is discovered. Apart from this comment, we need to review all the licenses manually as you did. It will take some time, but I think it is worth it.

@eolivelli that plugin looks interesting, but it doesn't do the notice files. Also, I guess it depends on the license stated in the pom, which can be wrong in some cases (jsr305 for example). Do you have any example of it being used with BSD style licenses (where the copyright notice needs to be preserved)?
sijie left a comment:
I don't think it is a good idea to pull in all the licenses and notices from all the dependencies. It will make maintenance and dependency upgrades complicated, and the license files will quickly get out of sync. I think we should look for better solutions than this.

Also, it would be good to create an issue for this and track the discussion there.
This isn't the licenses and notices from all the dependencies, only from the dependencies whose license requires it.
It shouldn't. The check-binary-license script should flag in CI if the license needs to be updated. Updating the license is quick; I checked every dependency we have in one morning.
Again, the check-binary-license script should prevent this.
I don't think there is a better solution than doing it manually. A script can't parse a NOTICE file to see what to bubble up.
Will create an issue.

Issue is #891
|
First, I think it is a good idea to have a check script to do these things. I am not against it. My main concern is about the approach: how we maintain the source of truth for the notice files, and how this script does the verification. Basically you maintain some notice/license files as the source of truth, and have a script verifying the LICENSE file against the maintained notice/license files. How do you guarantee that the notice/license files are always correct? That introduces extra complexity beyond maintaining a LICENSE file. When I say "look for better solutions", I am not saying "do it manually". There can always be better solutions. I am not sure how feasible it is, just thinking here: you can get the artifact (version, name) from the pom files or the assembled package, fetch their licenses/notices from their websites, and verify the LICENSE against their notices. At a minimum, I don't think we should be maintaining any LICENSE and NOTICE files from dependencies, and we shouldn't use our maintained versions as the source of truth for verification.
We assume that, for a given version, the notice and license files will not change, which is a safe assumption to make. I'm dubious as to whether it will be possible to pull in NOTICE files automatically. We would have to pull in every NOTICE file, which isn't really necessary. And then someone would have to check the contents of the pulled in NOTICE file to ensure everything is ok. If we make the pulling of licenses automatic, then they will only ever be checked at release time. At release time, all dependencies need to be checked, and when there's so much to check, people are likely to just give it a quick glance, and +1 it, without actually checking each dependency. I would prefer that the work in manually checking dependencies occurs as part of the development process, each time we update a dependency. At this time, there will be a smaller subset of the dependencies changing, so it can be reviewed more carefully. The submitter will be able to take their time with it, and the reviewer will be able to give each dependency their full attention. Once a license/notice has been updated for a version of the dependency, it shouldn't need to be looked at again (as licenses/notices don't change within a single version).
|
@ivankelly so my question is: if NOTICE is already included in the LICENSE file, why do we need to maintain them twice in different places? Can we just keep one copy in the LICENSE file, and have the check script use something else (e.g. fetching LICENSE and NOTICE from their sources) to verify that the LICENSE file is correct?

Notice isn't included in the license file. Where are you seeing that?
Sorry, I mean the copyrights included in the notice file.

In the submitted change, there are no copyrights in the LICENSE file, though for all non-ASLv2 dependencies, we link to a LICENSE file which does include the copyright for those dependencies. In MIT and BSD the copyright notice is part of the license. ASLv2 has to be handled differently. The copyright is part of the notice file, so it has to be bubbled up there. In a lot of cases, the dependency is another ASF project, so it's already covered by the copyright at the top of the project NOTICE file.

@ivankelly I think the back and forth here is getting a bit confusing. I will just comment on the pull request.
There was a problem hiding this comment.
It is hard for me to comment on the pull request because it includes so much text that my browser barely moves.
Remaining comments here:
- /netty-3.10.1.Final/license: we don't use all of the dependencies listed, and some of the dependencies have already been recorded (commons-logging, log4j, slf4j, protobuf)
- same for netty-4.1.12.Final.
- protobuf license includes parts we don't use.
- scala: there are tons of dependencies we don't include, so we don't need to include their notices and licenses (for example, jline and jquery)
Dependencies of dependencies are already flattened in the binary distribution. We should only include notices and licenses for what is actually included in the binary distribution; otherwise it is going to be hard to maintain NOTICE and LICENSE.
| </includes> | ||
| </fileSet> | ||
| <fileSet> | ||
| <directory>../src/main/resources/deps</directory> |
There was a problem hiding this comment.
We should not maintain the third-party licenses. We should only attach the notices and licenses that are actually required to the NOTICE or LICENSE file. Maintaining this directory makes things complicated, and we should avoid it.
The check script should parse pom files or the assemble tarballs to see what dependencies are included, and fetch their corresponding notices and verify if the NOTICE file includes all dependencies or not and if their licenses are matched and notices are attached.
There was a problem hiding this comment.
I've moved all the NOTICE stuff into the NOTICE, so that doesn't link anywhere.
The ASF licensing recommendations (http://www.apache.org/dev/licensing-howto.html#permissive-deps) actually say to bundle the license file, instead of putting it directly in the LICENSE file, unless the license is very short. I actually prefer it like this.
https://github.com/ivankelly/bookkeeper/blob/license-rework/bookkeeper-dist/src/main/resources/LICENSE-all.bin.txt is easier to read than https://github.com/apache/bookkeeper/blob/master/bookkeeper-dist/src/main/resources/LICENSE-all.bin.txt.
LICENSE-all.bin.txt would get huge if we flattened them all. I'm particularly eager to keep the CDDL out of it, that license is huge.
The check script should parse pom files or the assemble tarballs to see what dependencies are included, and fetch their corresponding notices and
We should check on the final output (i.e. the assembled tarball), as that is what we distribute.
verify if the NOTICE file includes all dependencies or not and if their licenses are matched and notices are attached.
This verification is very hard to do in an automated fashion. How will a machine know that the protobuf license contains stuff that isn't relevant? How will it work out which part of the netty NOTICE needs to be pulled in and which doesn't?
It's not hard for a human, but the human will need guidelines which we should put in the wiki.
I've added a check to the script to check if the bundled license files are linked, and ensure all linked files exist.
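The two-way check mentioned here could look roughly like the sketch below. Python, the `deps/<dir>/<file>` link pattern and the function name are all assumptions made for illustration; this is not the script's actual implementation. Links that point at a directory rather than a file would need extra handling.

```python
import re

# Assumption: bundled license files are referenced in LICENSE as "deps/<dir>/<file>".
LINK_PATTERN = re.compile(r"deps/[A-Za-z0-9_.+/-]+")


def check_links(license_text, bundled_license_files):
    """Check links between LICENSE and the bundled license files in both directions.

    Returns (missing, unlinked): paths linked from LICENSE that do not exist,
    and bundled files that are never linked from LICENSE.
    """
    # Drop a trailing full stop picked up from the end of a sentence.
    linked = {match.rstrip(".") for match in LINK_PATTERN.findall(license_text)}
    existing = set(bundled_license_files)
    return sorted(linked - existing), sorted(existing - linked)
```

Checking both directions means a license file cannot be dropped from the tarball while LICENSE still points at it, and a file cannot be added under deps/ without LICENSE saying which jar it belongs to.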
| of the input file used when generating it. This code is not | ||
| standalone and requires a support library to be linked with it. This | ||
| support library is itself covered by the above license. | ||
| lib/io.netty-netty-3.10.1.Final.jar bundles some 3rd party dependencies. |
There was a problem hiding this comment.
If we don't use the 3rd party dependencies that netty is bundled with, we shouldn't include their licenses and notices, because those dependencies might already be excluded.
There was a problem hiding this comment.
Removed those that aren't relevant. There's a bunch of the notices which refer to code which netty has copied into their own source repo, so I've retained those in the LICENSE file (they're all BSD/MIT).
There was a problem hiding this comment.
Can you point me to links on how those licenses should handle "copied code"?
My interpretation here is that the "copied code" has already been compiled and redistributed as part of the jar, so we should just retain netty's notice, but we don't need their licenses, no?
There was a problem hiding this comment.
Retaining the notice is enough if the code is ASLv2. With BSD/MIT, the notice is the license, so the whole thing has to be redistributed.
| For details see deps/netty-3.19.1.Final/ | ||
| ------------------------------------------------------------------------------------ | ||
| For lib/javax.servlet-javax.servlet-api-3.1.0.jar | ||
| lib/io.netty-netty-all-4.1.12.Final.jar bundles some 3rd party dependencies |
Notice doesn't link to anything now, everything is directly there. check-binary-licenses now ensures that if something is linked from LICENSE, it exists, and if something exists, it is linked from LICENSE.
I've found chrome to suck at github. Moved to the new firefox a couple of weeks ago, and it works much better.
| been derived from the works by JSR-166 EG, Doug Lea, and Jason T. Greene: | ||
|
|
||
| * LICENSE: | ||
| * deps/netty-3.10.1.Final/LICENSE.jsr166y.txt (Public Domain) |
There was a problem hiding this comment.
I am not sure we need to retain these LICENSE files. I would remove them and just reference back to netty.
Also, netty is used by multiple ASF projects, so it would be good if you could point me to any ASF projects that do this. Otherwise it is going to be painful to maintain the licenses of the "compiled code" in our dependencies.
There was a problem hiding this comment.
If we include the code, we should include the licenses. The public domain ones are not so important, but the BSD style ones need to be either here, or in a license file linked from here. Dependencies of dependencies should be treated the same as dependencies [1], which makes sense because if netty is distributing it, then we are distributing it.
[1] http://www.apache.org/dev/licensing-howto.html#deps-of-deps
There was a problem hiding this comment.
If a dependency of a dependency is bundled, it makes sense to include its license. Otherwise it doesn't make sense to me.
I am not sure about code that is "copied" into netty when we bundle netty as a dependency. It is a very complicated case to me. I would suggest following what other ASF projects that use netty do. I am pretty sure there are tons of projects that depend on netty.
If you really feel strongly about using the approach you proposed here, I would suggest reaching out to legal-discuss@ to get a clarification on how to update LICENSE and NOTICE when including a bundle that contains copied-and-modified code. I am not qualified to review whether this is right or not.
From my personal view, I would follow what other ASF projects do, since those have been more or less reviewed and accepted by the ASF. The approach here suddenly pulls in a lot of licenses from dependencies of dependencies, which sounds too complicated to me and it is going to be very hard for the community to maintain and keep them in sync.
There was a problem hiding this comment.
Code copied is actually the simplest case. You are distributing binaries derived from copied code, and if it is BSD/MIT licensed, then you must include the license. Since I have the hadoop NOTICE open, I'll use that as the example:
https://github.com/apache/hadoop/blob/trunk/NOTICE.txt
Netty is there at the top (though the BSD/MIT stuff should be in LICENSE I believe).
I've emailed legal-discuss@, you're cc'd.
Yes, this is a pain, but it's less of a pain than reimplementing all our dependencies ourselves. And it's not that complex. I'll write it up if this goes in, and I don't expect it to be more than 2 or 3 paragraphs.
sounds too complicated to me and it is going to be very hard for the community to maintain and keep them in sync.
The other option is stalled release processes if we leave it up to then to check licenses. That's what I'm trying to get away from with this.
There was a problem hiding this comment.
Since you point to hadoop's NOTICE file as the example, can you please tell me where hadoop keeps the license files of netty's dependencies (e.g. license/LICENSE.jsr166y.txt, license/LICENSE.base64.txt)? I can't find those license files in the hadoop repo or their binary package.
If I understand this correctly, hadoop only adds netty's notice file to its own notice file; it doesn't copy the licenses of netty's dependencies into its repo and redistribute them with its binary package. https://github.com/netty/netty/blob/4.1/NOTICE.txt This approach is exactly what I said here: we don't need to copy the licenses of netty's dependencies, we only need to attach the notices/licenses of the dependencies we bundle.
If you think hadoop is the right example to follow, let's follow what hadoop does (remove all the deps licenses, only include the notice/license of the bundles we distribute). If you think hadoop's notice/license also has problems, let's wait for responses from the legal team.
There was a problem hiding this comment.
pinged the list again.
There was a problem hiding this comment.
@sijie legal's position on this is "if it's bundled, it needs the license"
There was a problem hiding this comment.
@ivankelly I left one comment in the email thread, will wait for a clarification of my understanding there.
| to work with Initial Developer and Contributors to distribute such | ||
| responsibility on an equitable basis. Nothing herein is intended or | ||
| shall be deemed to constitute any admission of liability. | ||
| lib/io.netty-netty-all-4.1.12.Final.jar contains a modified portion of and optionally depends on 'Protocol Buffers', Google's data |
There was a problem hiding this comment.
This is an optional dependency. Why do we need this?
There was a problem hiding this comment.
"contains a modified portion of" implies that they are pulling code directly in.
There was a problem hiding this comment.
I think this falls under the comment above. We need to address that comment if you feel strongly about using this approach here.
This is also a problem I can see here: we are unclear about what our dependencies do and what they changed. We should just include our dependencies' licenses and notices. We should not import the licenses of their dependencies, unless a dependency of a dependency is explicitly included in our bundle.
Lastly, protobuf is already in the notice/license in our bundle. I am not sure why we still need a separate one.
There was a problem hiding this comment.
Again, this is code copied in, since it is modified. As it is copied in, we are distributing it, we should distribute the license for it.
| * https://github.com/google/protobuf | ||
|
|
||
| ------------------------------------------------------------------------------------ | ||
| This product bundles the JSR-305 reference implementation, which is available under |
There was a problem hiding this comment.
We should exclude this jar, since it is a compilation dependency, not a runtime dependency.
There was a problem hiding this comment.
Are you sure? I was under the same impression, but when I looked at the source, the annotations have runtime retention. If we can get away with removing it, I'm all for that.
|
Copied a few notes from the legal-discuss email thread here:
(Justin Mclean) All bundled software licenses need to be added to LICENSE [1]; you can include ALv2 licensed software if you want. If a bundled piece of ALv2 software includes a NOTICE file, you need to look at that and move parts of it to your NOTICE file [3], but try to keep it as simple as needed and not add anything that's not required.
(Justin Mclean) Yep, you need to include anything that is bundled. [1] If any of the source code from Webbit ends up in a release then yes, the BSD license should be added to the release’s LICENSE file. [2] There is no need to have it mentioned in NOTICE as Hadoop have done. [3]
(Justin Mclean) It looks to me that the Hadoop NOTICE file contains far too much information, and while this is not a licensing error, it is best to keep the NOTICE as short as possible [1] as it can affect downstream projects. Permissive licenses such as MIT and BSD should not be mentioned in NOTICE. [2] It also looks like too many copyright lines have been added there (a common mistake); only relocated copyright lines should be mentioned, or lines from other ALv2 NOTICE files. [3]
(Justin Mclean) Generally NOTICE doesn’t contain copyrights, as they are part of the license text and should go in (or be pointed at by) LICENSE. Attributions also do not belong in a NOTICE file and also go in LICENSE. Only relocated headers [4], contents of upstream ALv2 NOTICE files [5], and required 3rd party notices that don’t belong in license [6] need to be in NOTICE. The third inclusion is uncommon and would apply to licenses that have clauses stating that you must link to the original source code, or that changes made to the original source code need to be listed. An advertising clause (if it wasn’t Category X) would also be placed in NOTICE. I covered this in detail a few times in a talk I’ve given at ApacheCon. It was recorded at ApacheCon Miami and may help people in this thread. [7]

@ivankelly after following the discussions in the legal thread, I have a few comments on this PR:
|
LICENSEs and NOTICEs can change between versions. Dependencies can be added or removed. They can pull in code from third parties into their own jar. Without the version, we go back to not having a clue what we are actually shipping. If you don't want to add a burden on contributors when bumping versions, then we shouldn't ship binary distributions. It's as simple as that.
Yes, they're not absolutely needed. However, they're there so that the tool can automatically check that everything we ship in the tarball is mentioned in the LICENSE. Otherwise, this check is 100% manual. There are 73 jars there. That's a lot of room for omissions and mistakes. And people won't check it; they haven't been checking it so far. The URLs are there to make it easier to bump the versions. If a dependency version changes, it should be trivial to find the NOTICE/LICENSE of the newer version.
Server module deps are a subset of all module deps. If I move it to the all module it will mean having duplicate copies of some of the licenses in the source tree. I'll move if you're ok with the dupes.
|
Where is that linked from? It would be good to get a sidebar on the community docs, as there's a lot in there now. This licensing stuff should probably go there too.
|
regarding NOTICE: my comment is about removing the content below circe, because all of them are covered by LICENSE. see Justin's comment:
I don't think you should remove |
I added this link when rewriting the jenkins jobs using dsl, but I probably forgot to add it to the sidebar. It should be a testing guide under community (I will send a PR for it). The licensing part can probably go in the coding guide.
|
Regarding circe, it's code that ASF does not own the copyright on, and it's not the copyright holder submitting it, so there's no relocated copyright. So it's not https://www.apache.org/legal/src-headers.html#headers, but https://www.apache.org/legal/src-headers.html#3party

Moving to community/, there's no sidebar at all for community

@ivankelly the sidebar is for documentation, which contains content that varies between versions/releases. Content that isn't release-specific should go on other pages.

yes, I'm saying it would be useful to have one for community docs as well. not a concern for this PR in any case.

@ivankelly you are right with circe, it is not a relocated copyright. fine with that.

@ivankelly can you check my comment? I think we don't need those copyrights in binary NOTICE files, based on Justin's comments.

@sijie ah, yes, will remove

oh, actually I misunderstood your comments, looking again

I've removed some of the unneeded stuff. The Intel stuff is required, as it's relocated copyrights. For the other projects, I've taken their NOTICE files in complete form.
|
|
||
| * LICENSE: Apache License 2.0 | ||
| * HOMEPAGE: https://github.com/trevorr/circe | ||
| lib/io.netty-netty-all-4.1.12.Final.jar contains a modified portion of 'Apache Harmony', an open source |
There was a problem hiding this comment.
where does this come from?
- I don't see it in netty's notice: https://github.com/netty/netty/blob/netty-4.1.12.Final/NOTICE.txt#L61
- it isn't a relocated copyright. Relocated copyright only applies to source files. http://www.apache.org/legal/src-headers.html#headers
There was a problem hiding this comment.
From the harmony NOTICE.
http://svn.apache.org/viewvc/harmony/enhanced/java/trunk/NOTICE?revision=929253&view=markup
The bit they pull from harmony is parts of java.lang.String, which is from intel.
http://svn.apache.org/viewvc/harmony/enhanced/classlibadapter/trunk/modules/kernel/src/main/java/java/lang/String.java?view=markup#l88
netty should be bubbling this up.
There was a problem hiding this comment.
@ivankelly netty is not an ASF project, and it is not our responsibility to modify a NOTICE file that belongs to another project. We bundle netty 4.1.12, so we should use the NOTICE from 4.1.12. If netty fixes that in a future release and we bump the version, we include the updated NOTICE.
There was a problem hiding this comment.
Apache harmony is an ASF project. netty doesn't include any NOTICE for harmony, and this is wrong. We ship code which belongs to harmony, so we should also ship its notice. That it comes in via netty is largely irrelevant. We can ask about this on legal-discuss@ if you like.
There was a problem hiding this comment.
Please ask. We are not directly bundling harmony; we bundle netty. Harmony is not a dependency of netty, it is source code copied by netty. I don't think we need to include the harmony NOTICE because, as you said, NOTICE can change between versions, so you can't use the latest harmony NOTICE as the NOTICE from when netty introduced the modified code. So we have to stick to whatever NOTICE is shipped by netty.
There was a problem hiding this comment.
I'll email them tomorrow
There was a problem hiding this comment.
I dug down deeper. The String class that netty copies from is originally from IBM,
https://issues.apache.org/jira/browse/HARMONY-14
http://mail-archives.apache.org/mod_mbox/harmony-dev/200511.mbox/browser
They've assigned ASF copyright in the initial contrib though, except for ICU stuff, which isn't part of String, so it turns out there's no mention needed in the notice. Will remove from this patch. Netty NOTICE is still wrong though, will ping them about it.
| The Apache Software Foundation (http://www.apache.org/). | ||
|
|
||
| ------------------------------------------------------------------------------------ | ||
| - lib/io.dropwizard.metrics-metrics-core-3.1.0.jar |
There was a problem hiding this comment.
It is okay and enough to keep versioning in LICENSE. However, I would suggest removing versioning from the NOTICE file: 1) versioning is not legally required in NOTICE; 2) NOTICE should be as short as possible, as it will affect downstream projects. Versioning is going to cause NOTICE updates every time we bump a version.
There was a problem hiding this comment.
The versions are there to facilitate automated checking. If we remove the versions, we can't ensure that what is in the notice file exists in the shipped tarball. It will need to be checked manually, so there is more chance of a -1 on a release candidate.
Also, notices can change between versions, so the rationale for having them in the LICENSE holds for the NOTICE also.
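To illustrate the point about versions, here is a small Python sketch (not taken from the actual script). It compares the jars named in NOTICE with the jars in the tarball, once with versions and once with the version stripped. The rule used to strip a version (everything from the first `-<digit>` onwards) and the function names are assumptions.

```python
import re

# Assumption: the version starts at the first "-<digit>" in the jar name.
VERSION_SUFFIX = re.compile(r"-\d[\w.]*(?:-[\w.]+)*\.jar$")


def strip_version(jar):
    """Drop the version from a jar name, e.g. 'lib/foo-1.2.jar' -> 'lib/foo.jar'."""
    return VERSION_SUFFIX.sub(".jar", jar)


def notice_in_sync(notice_jars, bundled_jars, versioned=True):
    """Return True if NOTICE and the tarball name the same set of jars."""
    key = (lambda jar: jar) if versioned else strip_version
    return {key(j) for j in notice_jars} == {key(j) for j in bundled_jars}


notice = ["lib/io.dropwizard.metrics-metrics-core-3.1.0.jar"]
# Hypothetical version bump, used here only for illustration.
bundled = ["lib/io.dropwizard.metrics-metrics-core-3.2.0.jar"]

flagged_with_versions = not notice_in_sync(notice, bundled, versioned=True)
flagged_without_versions = not notice_in_sync(notice, bundled, versioned=False)
```

With versioned names the bump is flagged, so someone has to look at the new NOTICE. Without versions the same bump passes unnoticed, and the NOTICE has to be re-checked by hand.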
There was a problem hiding this comment.
I know notices can change between versions. My point is that most of the dependencies have versioned references in LICENSE, so automated checking will fail on a dependency change if the versioning is not updated. Automated checking does its job on license/versioning, and the contributors/reviewers should check both license and notice when reviewing it. Automated checking on LICENSE is enough to capture this.
The reason I would suggest removing versions from NOTICE, as the ASF policies suggest and as is common practice, is to avoid unnecessary changes to NOTICE, keep it as brief as possible, and impact downstream projects as little as possible.
There was a problem hiding this comment.
People will do the bare minimum to make their builds pass. If there are no versions in the NOTICE, they won't touch the notice, and we'll end up with -1 on release candidates.
This notice should have no effect on downstream projects. Downstream projects depend on our maven jars, which are covered by the top level NOTICE, not on our binary tarballs.
|
@sijie could we get some movement on this? I've removed the harmony stuff in the end, as I tracked down the original harmony submission and copyright was assigned to ASF

@ivankelly +1

merged. thanks @ivankelly for driving this and making this happen in the 4.7.0 release.