Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bsd2 license #73

Closed
wants to merge 8 commits into from
Closed

Bsd2 license #73

wants to merge 8 commits into from

Conversation

clalancette
Copy link
Contributor

@clalancette clalancette commented Apr 26, 2017

This PR adds support for the BSD license into ament_lint. This is necessary for turning on all linting in sensor_msgs, since some of that code is under the BSD 2 license.

fixes #72

CI:
windows: Build Status
linux: Build Status
linux aarch64: Build Status
osx: Build Status

Signed-off-by: Chris Lalancette <clalancette@osrfoundation.org>
This is what is recommended by the BSD license.

Signed-off-by: Chris Lalancette <clalancette@osrfoundation.org>
It is only ever called there, and operates on member variables
in that subclass, so it should be located there.

Signed-off-by: Chris Lalancette <clalancette@osrfoundation.org>
Signed-off-by: Chris Lalancette <clalancette@osrfoundation.org>
@clalancette clalancette added in progress Actively being worked on (Kanban column) in review Waiting for review (Kanban column) and removed in progress Actively being worked on (Kanban column) labels Apr 26, 2017
@clalancette
Copy link
Contributor Author

In terms of review, this is probably easiest to review patch-by-patch, rather than the combined diff (particularly for the regex changes).

@@ -166,7 +166,7 @@ def search_copyright_information(content):
year = '\d{4}'
year_range = '%s-%s' % (year, year)
year_or_year_range = '(?:%s|%s)' % (year, year_range)
pattern = '^[^\n\r]?\s*Copyright\s+(%s(?:,\s*%s)*)\s+([^\n\r]+)$' % \
pattern = '^[^\n\r]?\s*Copyright\s+(%s(?:,\s*%s)*),?\s+([^\n\r]+)$' % \
Copy link
Contributor

@dirk-thomas dirk-thomas Apr 26, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please give an example what wasn't accepted before and is accepted with this change.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. Previous to this regex change, this was accepted:

Copyright 2017 Open Source Robotics Foundation, Inc.

But this wasn't:

Copyright 2017, Open Source Robotics Foundation, Inc.

(note the comma after the year). It is fairly minor, but I think it makes this regex a little more forgiving of the format.

@mikaelarguedas
Copy link
Contributor

Do these CI jobs test any code that has a BSD licence ? e.g https://github.com/ros2/common_interfaces/blob/master/sensor_msgs/include/sensor_msgs/distortion_models.hpp

@@ -166,7 +166,7 @@ def search_copyright_information(content):
year = '\d{4}'
year_range = '%s-%s' % (year, year)
year_or_year_range = '(?:%s|%s)' % (year, year_range)
pattern = '^[^\n\r]?\s*Copyright\s+(%s(?:,\s*%s)*),?\s+([^\n\r]+)$' % \
pattern = '^[^\n\r]?\s*Copyright(?:\s+\(c\))?\s+(%s(?:,\s*%s)*),?\s+([^\n\r]+)$' % \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have to support this case or could we update the code using that.

Copy link
Contributor Author

@clalancette clalancette Apr 26, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, this regex change is to allow a (c) after the word "Copyright". While we could change the code that has this, the BSD license recommends this format:

Copyright (c) <year>, <rights-holder>

(https://en.wikipedia.org/wiki/BSD_licenses#3-clause_license_.28.22BSD_License_2.0.22.2C_.22Revised_BSD_License.22.2C_.22New_BSD_License.22.2C_or_.22Modified_BSD_License.22.29).
Thus, any BSD code we come across is likely to be in that format, so I thought it would be better to support that.

break
else:
self.license_identifier = UNKNOWN_IDENTIFIER

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this function being removed and the logic being duplicated into multiple locations?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, I wasn't entirely sure what to do about this. We need to do something slightly different in this method in the subclass SourceDescriptor than we do in ContributingDescriptor and LicenseDescriptor. Thus, my options seemed to be:

  • Override it in SourceDescriptor
  • Add in some special-case flag for when we wanted to do additional substitutions
  • Copy the code

Given that it's really only 4 lines of code, I opted for the last one. However, I'm open to other suggestions.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure what the override is needed for (since the Apache license has the same structure already) but I would certainly prefer the first or second option.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need to override identify_license() because the SourceDescriptor version of it wants to iterate over the self.copyrights list, which only exists in the SourceDescriptor subclass. I can restore the generic version to FileDescriptor and override it in SourceDescriptor, but it seems like overkill since it is such a simple function.

@clalancette
Copy link
Contributor Author

@mikaelarguedas Good point, no, that CI doesn't have any BSD code. I've been testing locally by adding the ament_lint_auto back into sensor_msgs; I can do a CI with that instead to show this actually working.

Then override it in SourceDescriptor, since we need to do
something different there.

Signed-off-by: Chris Lalancette <clalancette@osrfoundation.org>
@clalancette
Copy link
Contributor Author

New CI, which should test the BSD license in common_interfaces/sensor_msgs now:

windows: Build Status
linux: Build Status
linux aarch64: Build Status
osx: Build Status

@dirk-thomas
Copy link
Contributor

dirk-thomas commented Apr 26, 2017

The replacement of {copyright} in the source file template is not necessary. The Apache header also contains {copyright} and doesn't require the expansion.

Regarding the expansion of {company} I would rather suggest to use a "generic" snippet (see https://opensource.org/licenses/BSD-3-Clause) which refers to "Neither the name of the copyright holder nor the names of its contributors" rather than filling in the company name. (The template also lacks a trailing newline.)

The changes to accept (c) and the extra comma in the copyright line are fine.

All other changes are not necessary and I would rather avoid them if there is no good reason for them. E.g. the FileDescriptor doesn't have the member license_identifier because it never has a license. Only some of the subclasses have that attribute.

@clalancette
Copy link
Contributor Author

Regarding the expansion of {company} I would rather suggest to use a "generic" snippet (see https://opensource.org/licenses/BSD-3-Clause) which refers to "Neither the name of the copyright holder nor the names of its contributors" rather than filling in the company name. (The template also lacks a trailing newline.)

This is the key point, I think. If we are OK with converting all incoming BSD licenses to the generic version, then we can do away with most of the code changes here. Are we OK with doing that?

@tfoote
Copy link
Member

tfoote commented Apr 27, 2017

We shouldn't require changing license text contents to pass the linter beyond possibly stripping trailing whitespace. We can do that for our own copyright, but should not apply changes to anything for which we don't hold the copyright.

I would recommend that we allow basically arbitrary strings in the {company} section.

@dirk-thomas
Copy link
Contributor

I would recommend that we allow basically arbitrary strings in the {company} section.

Sounds good to me.

Signed-off-by: Chris Lalancette <clalancette@osrfoundation.org>
@clalancette
Copy link
Contributor Author

clalancette commented Apr 28, 2017

I just pushed a different approach to this. I changed the BSD2 template to be the generic version, and then I used difflib to compare a ratio between the incoming text and the template. If that is > 95%, we accept that as the license (I'm not wedded to that ratio, it just seemed somewhat reasonable).

I left the move of the identify_copyright() function, and the consolidation of self.license_identifier into FileDescriptor(). Here are my reasons:

  1. identify_copyright() operates on data only present in the SourceDescriptor subclass. Thus, it doesn't belong in a generic super class.
  2. self.license_identifier is set by all of its subclasses. This suggests that it is a property that they all share, and thus it makes sense to me to hoist it up to the super class.

CI:

  • Linux Build Status
  • Linux-aarch64 Build Status
  • macOS Build Status
  • Windows Build Status

for name, license in get_licenses().items():
if content is not None and getattr(license, license_part) == content:
template = getattr(license, license_part).replace('\n', ' ')
tomatch = content.replace('\n', ' ')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unused variable?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yeah, that's a bug. I should be using that in the difflib call.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(it's also the cause of the test failure in the CI)

@@ -44,6 +45,7 @@ def __init__(self, filetype, path):
self.path = path
self.exists = os.path.exists(path)
self.content = None
self.license_identifier = UNKNOWN_IDENTIFIER
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See previous comment why I don't think the base class should have that attribute.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, I guess that is the point I disagree with :). While that base class itself doesn't have a license, all of its sub-classes do (they all call identify_license()). Thus it seems to be something common to all of them, and seems to make sense in the base class. It's similar in many respects to the identify_license() method, in that it is a convenient place to not duplicate the code, rather than it being somehow intrinsic to the base class.

template = getattr(license, license_part).replace('\n', ' ')
tomatch = content.replace('\n', ' ')
x = difflib.SequenceMatcher(a=template, b=content)
if x.ratio() > 0.95:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure this is "good enough". This could result in a wrong detection if he license is just similar enough.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess, though a license that is 95% the same as this would probably be considered the same.

So the real question is whether we should just up this value (98%, say), or do you disagree with the approach?

CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
POSSIBILITY OF SUCH DAMAGE.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please see previous comment. Still lacking a newline at the end.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will fix that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Cutting out" the company name and checking the string before and after should be straight forward. So I am not sure what the diff approach is gaining us.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Basically, what this buys us is the ability to be a lot more forgiving with the license.

In the case of the BSD license, there are at least 3 different variations I've seen, just between looking at the common_interfaces code, the license you linked to at opensource.org, and the license on wikipedia. In the one in the code, it has the name of the company in it, and uses * for each point made. In the one at opensource.org, it says "copyright holder" instead of the company name, and uses numerals instead of stars for each point. In the one on wikipedia, it has the name of the company in it, and uses * for each point.

While I could write some custom parsing code to deal with all of those differences, it would necessarily be BSD-specific, which doesn't seem like an ideal path to go down. The difflib approach is more flexible, and should work for other licenses as well.

Signed-off-by: Chris Lalancette <clalancette@osrfoundation.org>
@tfoote
Copy link
Member

tfoote commented Apr 28, 2017

I don't think that this new approach is appropriate for this use case. Comparing the fraction of match is a great way to get yourself into a lot of trouble. If I add a "not" in any clause, it completely changes the meaning and compliance but does not significantly change the diff. If there's any change outside the standard parameters it's no longer the named license but a variant which needs to be interpreted as it's own entity.

@clalancette clalancette added in progress Actively being worked on (Kanban column) and removed in review Waiting for review (Kanban column) labels May 2, 2017
@clalancette
Copy link
Contributor Author

@tfoote I thought it might be interesting to try this approach, but I agree with your assessment. So my plan will be to split the templated license on the "{company}" string, then look for those substrings (in that order) against the incoming text to test against. Does that sound like a better approach?

This allows us to deal with the somewhat problematic
BSD2 license, where a company name can be embedded in the
middle.

Signed-off-by: Chris Lalancette <clalancette@osrfoundation.org>
@tfoote
Copy link
Member

tfoote commented May 2, 2017

That sounds like a reasonable approach.

@clalancette
Copy link
Contributor Author

clalancette commented May 2, 2017

CI for the new approach:

  • Linux Build Status
  • Linux-aarch64 Build Status
  • macOS Build Status
  • Windows Build Status

@clalancette clalancette added in review Waiting for review (Kanban column) and removed in progress Actively being worked on (Kanban column) labels May 2, 2017
@mikaelarguedas
Copy link
Contributor

once this is merged ros2/common_interfaces@a0234f2 and ros2/common_interfaces#33 need to be reverted

@clalancette
Copy link
Contributor Author

Thanks for the review. I know we usually squash changes together, but in this one, I'd actually prefer for it to go in as 3 different commits: one to change the regex to allow the (c), one to change the regex to allow the comma, and one to add the BSD license. I think this will make following the regex a little easier later on. Does that seem OK?

@dirk-thomas
Copy link
Contributor

Sounds good to me.

@clalancette
Copy link
Contributor Author

I've now merged this (by hand, to split the commits up). Thanks!

@dirk-thomas
Copy link
Contributor

It would be better to rebase the commits on the branch of the PR. That way the actually merged changes are still visible from the PR. Currently the PR doesn't reference the actual changes done on master.

@clalancette
Copy link
Contributor Author

Oh, right, that would be better. Should I go back and do that (force pushing master to remove the commits), or is that advice for the future?

@dirk-thomas
Copy link
Contributor

In this case it I would say it is too late. But you could reference the actual commit hashes which have been applied to master so that future readers find them easily.

@clalancette
Copy link
Contributor Author

This PR ended up applying these commits:

3f65895
77061fd
767257f

mikaelarguedas added a commit to ros2/common_interfaces that referenced this pull request May 16, 2017
mikaelarguedas added a commit to ros2/common_interfaces that referenced this pull request May 16, 2017
* don't ignore copyright test

* make copyright comment comply with ament/ament_lint#73 implementation

* OSRF is the copyright holder
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add in support for the BSD license in the copyright linting
4 participants