Apache 2.0 boilerplate not recognized as part of LICENSE file #103

Closed
sschuberth opened this Issue Jun 13, 2016 · 18 comments

6 participants
Collaborator

sschuberth commented Jun 13, 2016

For the LICENSE file

Copyright (c) 2016 HERE Europe B.V.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

licensee does not correctly recognize the license (although it correctly shows the attribution):

License file: LICENSE
Attribution: Copyright (c) 2016 HERE Europe B.V.
License: no license
Confidence: %
Method: NilClass

Owner

benbalter commented Jun 13, 2016

See #82 (comment) for some further discussion, but in short, that notice is intended to be in source file headers, not the LICENSE file.

Collaborator

sschuberth commented Jun 13, 2016

Ah, I skimmed that issue, but thought that it was only about the case where the license text is added to the README instead of the LICENSE file, until I read @hyandell's comment. I think it makes sense to keep this as a separate issue as I don't quite agree with the reasoning. The Apache license says

To apply the Apache License to your work, attach the following boilerplate notice [...].

Specifically, it does not say where to attach it. I could argue that I'm attaching the boilerplate to a newly created LICENSE file.

@sschuberth changed the title from "Apache 2.0 license not recognized" to "Apache 2.0 boilerplate not recognized as part of LICENSE file" Jun 13, 2016

Owner

benbalter commented Jun 13, 2016

/cc @mlinksva @bkeepers any strong feelings here?

Collaborator

mlinksva commented Jun 13, 2016

@sschuberth the boilerplate notice itself says (emphasis added):

you may not use this file except in compliance with the License

I don't see how to read that as anything but an instruction to add the notice to files being licensed.

Just commented on #82 that it'd be good to emit a warning if the boilerplate is found in LICENSE as well as README. They are indeed two separate cases, so it seems fine to me to keep both issues open (or close them if we're never going to address this, up to @benbalter).

I imagine addressing this requires 1) having the boilerplate available to recognize and 2) either including the boilerplate among the license texts to match, while knowing it isn't a license, or doing a second pass to recognize license mistakes, starting with this boilerplate.

Regarding (1), the boilerplate could be extracted into an optional field in choosealicense.com (where licensee sources license texts from), or it could be collected somewhere in this (licensee) project.
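
For illustration, the second pass described above could be sketched roughly as follows. This is only a sketch under stated assumptions, not licensee code: the names `APACHE_NOTICE_MARKERS`, `FULL_TEXT_MARKER`, `normalize`, and `boilerplate_warning` are hypothetical, and the marker phrases are taken from the notice quoted at the top of this issue plus the "TERMS AND CONDITIONS" heading of the full Apache 2.0 text.

```ruby
# Hypothetical second-pass check; none of these names come from licensee.
# Phrases from the Apache 2.0 boilerplate notice (matched after lowercasing).
APACHE_NOTICE_MARKERS = [
  /licensed under the apache license, version 2\.0/,
  /you may not use .+? except in compliance with the license/,
  %r{http://www\.apache\.org/licenses/license-2\.0}
].freeze

# Heading that appears in the full license text but not in the notice.
FULL_TEXT_MARKER = /terms and conditions for use, reproduction, and distribution/

# Collapse whitespace and case so line wrapping does not affect matching.
def normalize(text)
  text.downcase.gsub(/\s+/, ' ').strip
end

# Returns a warning string when the text looks like the bare boilerplate
# notice, or nil otherwise (including when it contains the full license).
def boilerplate_warning(text)
  normalized = normalize(text)
  return nil if normalized.match?(FULL_TEXT_MARKER)
  return nil unless APACHE_NOTICE_MARKERS.all? { |marker| normalized.match?(marker) }

  'Found the Apache 2.0 boilerplate notice, not the full license text.'
end
```

Because the second pattern accepts any wording between "you may not use" and "except in compliance", a notice reworded to name the project would be flagged as well. Whether a flagged file should also count as a detected license is exactly the judgement call under discussion here; the sketch only produces the warning.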

Collaborator

sschuberth commented Jun 13, 2016

Very interesting. Basically this means the Apache 2.0 license cannot be applied to files that, for whatever reason, cannot be modified to add the boilerplate, right? Which is very unfortunate. It saves quite a bit of work to only add the license text to a top-level LICENSE file rather than to all current and future source files. Or is any of you aware of a way to properly apply the Apache 2.0 license without adding the boilerplate to each and every file?

Collaborator

mlinksva commented Jun 13, 2016

Basically this means the Apache 2.0 license cannot be applied to files that, for whatever reason, cannot be modified to add the boilerplate, right?

Wrong, it just means it can't be applied using the example boilerplate. The license text makes it clear this is just an example:

"Work" shall mean the work of authorship, whether in Source or Object form, made available under the License, as indicated by a copyright notice that is included in or attached to the work (an example is provided in the Appendix below).

With slightly different wording, the boilerplate or a simpler notice could be used in a README or other appropriate context, like the website footer of http://www.apache.org/:

Copyright © 2016 The Apache Software Foundation, Licensed under the Apache License, Version 2.0.

Contributor

bkeepers commented Jun 13, 2016

There appears to be a fair number of instances where the boilerplate is used in the LICENSE.* file. While the intention may be that it be applied to individual files, if licensing is about intention, then I would interpret putting the boilerplate into a file called LICENSE as intending to license the project under the Apache License. 2¢

Owner

benbalter commented Jun 13, 2016

If licensing is about intention, then I would interpret putting the boilerplate into a file called LICENSE as intending to license the project under the Apache License.

Can I just add "Apache" to the file then? What about "APL"? Obviously more clearer is more better, but where do we want to draw the line as to "we're 90% sure this is their intent"?

Coming at it from another angle, when this initiative started out, one of the problems was that the open source licensing community was incredibly fragmented in terms of practice. There was no standard. While it's easy enough to detect the boilerplate from a technical perspective, not detecting it on GitHub can have a normalizing effect, just as standardizing on a LICENSE file (rather than a reference in the README) can.

Collaborator

sschuberth commented Jun 14, 2016

So I ran a few tests with licensee:

  1. The full Apache 2.0 license text (as added by GitHub's New File functionality) which includes the appendix with the boilerplate template is properly recognized (Confidence: 100%, Method: Licensee::Matchers::Exact).
  2. If the appendix including the boilerplate template is stripped from the above, the Apache 2.0 license still is properly detected (Confidence: 95.85253456221197%, Method: Licensee::Matchers::Dice).
  3. If the LICENSE file only contains the boilerplate text, no license is found (as already known, since that is what this ticket is about). The license is also not recognized if the boilerplate text is slightly modified to say "you may not use this project" or "you may not use this <project name>" instead of "you may not use this file".
  4. If the LICENSE file contains a simple statement like "Copyright © 2016 The Apache Software Foundation, Licensed under the Apache License, Version 2.0." as suggested by @mlinksva, licensee still does not recognize a license at all.

Personally, I'm ok with case 4. not being detected, as there's probably too little text to confidently detect a license.

However, I do believe that case 3. with text that is clearly based on the boilerplate from the appendix should be detected. Following @bkeepers' link I see e.g. text like "you may not use log4jdbc" being used, i.e. the boilerplate has been modified to contain the project name. IMO these cases should be properly detected. If the boilerplate was copied as-is and still contains "you may not use this file", the Apache 2.0 license should still be detected and a warning should be displayed saying that the license probably is not applied correctly.
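
To make the difference between cases 2 and 3 more concrete, here is a minimal sketch of a Sørensen-Dice similarity over word sets. It assumes the Dice matcher named in the output above is based on that coefficient; the tokenization and the names `wordset` and `dice_similarity` are hypothetical and are not licensee's actual implementation.

```ruby
require 'set'

# Split text into a set of lowercase words; punctuation is ignored.
def wordset(text)
  text.downcase.scan(/[a-z0-9]+/).to_set
end

# Sørensen-Dice coefficient of the two word sets, as a percentage:
# 2 * |A ∩ B| / (|A| + |B|) * 100
def dice_similarity(text_a, text_b)
  set_a = wordset(text_a)
  set_b = wordset(text_b)
  total = set_a.size + set_b.size
  return 0.0 if total.zero?

  200.0 * (set_a & set_b).size / total
end

# Two texts of similar size that differ in one word score high.
puts dice_similarity('the quick brown fox', 'the quick red fox') # prints 75.0

# A short text fully contained in a longer one still scores low,
# because the longer text's extra words count against the match.
puts dice_similarity('a b', 'a b c d e f g h') # prints 40.0
```

Under this measure, dropping the appendix from the full text removes only a small share of the words, so the score stays high, whereas a file holding only the short boilerplate can never score high against the full license no matter how many of its words overlap.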

Collaborator

sschuberth commented Jun 14, 2016

On a related note, Apache 2.0 separates the copyright holder from the license text (in contrast to e.g. the MIT license). That is, if you put the full license text into a LICENSE file, you usually put the copyright attribution somewhere else, e.g. in the README.md file. licensee does not currently seem to be able to handle that, i.e. the license is properly detected in that case, but the attribution is missed.

Contributor

bkeepers commented Jun 14, 2016

Can I just add "Apache" to the file then? What about "APL"? Obviously more clearer is more better, but where do we want to draw the line as to "we're 90% sure this is their intent"?

I definitely would draw the line long before freeform text, but when someone is copying and pasting boilerplate out of a license, that seems intentional to me.

I realize this is a huge judgement call, so I'm ok keeping it more strict now and relaxing it later.

Collaborator

mlinksva commented Jun 14, 2016

@benbalter "There was no standard" is pretty uncharitable. Licensee uses the thing that has been standard for decades (full copy of license in LICENSE, COPYING, or similar) and ignores notices outside of the full copy of license in LICENSE, which are indeed less standard (in all of this we're both using "standard" loosely). Now even within the scope of the LICENSE file I agree that licensee should encourage people to do the right thing and put the full text of the license there. And that's the right thing to do even if there are notices elsewhere.

But* I'm not surprised to see people wanting licensee to also look for notices elsewhere, since that's how many licenses expect to be implemented.

@bkeepers I'd interpret intent in the same way but tend to think licensee should be conservative as well as push for best practices. It currently does that kind of passive-aggressively, by just not recognizing e.g. Apache boilerplate in a LICENSE file. I think detecting it and emitting a warning would further the normalizing influence and be more useful to users. But that has to be implemented.

@sschuberth re 3 & 4, licensee simply doesn't recognize things that aren't (mostly, as you point out in 2) a full copy of the license in LICENSE, which I think is the right thing. It'd be nice if in 3 & 4 licensee could detect what the user is trying to do, and tell them how to do it right. 3 & 4 are valid/consistent/sensical notices for Apache License 2.0 works, but they don't belong in the LICENSE file. Regarding detecting/extracting/reporting the attribution line for licenses where the notice is outside the LICENSE file: I understand why you'd want that; it'd be an enhancement, see * above.

Collaborator

sschuberth commented Jun 15, 2016

Can anyone give an estimate of how hard it would be to allow licensee to detect an attribution in a README.md file although the license comes from the LICENSE file? I.e. allow attribution and license to come from different files? Right now it seems to me to be pretty much hard-coded that attribution and license need to come from the same file.

Contributor

hyandell commented Aug 1, 2016

For what it's worth - when hitting the same issue I ended up setting up a custom license called 'Apache Source Header' as I was coming across it often enough.

Owner

benbalter commented Sep 29, 2016

Can anyone give an estimation on how hard it would be to allow licensee to detect an attribution in a README.md file although the license comes from the LICENSE file?

This is currently implemented. See https://github.com/benbalter/licensee/blob/master/docs/customizing.md#matching-project-readme-license-references.

Collaborator

sschuberth commented Sep 29, 2016

Ah. I still was on 8.1.0 which is why the example code from your link did not work out of the box for me. It works starting with 8.2.0 (but I'm using version 8.6.0 now).

Anyway, I have now put the boilerplate from my first post in this issue into a README file and used detect_readme: true, but still no license is detected. Is this expected?

@alexjfisher referenced this issue in github/linguist Nov 18, 2016: github license detection for Apache-2 #3334 (Closed)

bzz commented Dec 16, 2016

Hi, I very quickly went through the thread and could not understand what the consensus of the previous discussion here was. Could somebody summarize it please, if that is not very hard?

Just to bring a data point: there are a number of projects under the Apache Software Foundation like

  • apache/hadoop
  • apache/spark
  • apache/zeppelin

etc., where LICENSE files were changed, following ASF regulations on licenses for bundled dependencies, and so GitHub is not able to recognize them as Apache 2.0.

stale bot commented Apr 6, 2017

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
