Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dependencies with license name provided as MIT can't get identified #137

Open
mervyn-mccreight opened this issue Nov 14, 2022 · 10 comments
Open

Comments

@mervyn-mccreight
Copy link

Here is the output for using e.g. version 3.1.0, I assume it will be the same for 3.2.0 as well:

io.github.g0dkar:qrcode-kotlin-jvm:3.1.0
 - ERROR: Unknown license URL 'https://github.com/g0dkar/qrcode-kotlin/blob/main/LICENSE' is NOT allowed

According to https://mvnrepository.com/artifact/io.github.g0dkar/qrcode-kotlin-jvm/3.1.0 the license name is published as MIT, so I assume it should get identfied as MIT when I'm looking at the fallback logic in licenses.kt.

I used the version 1.6.0 of this plugin to get the described results.

@mervyn-mccreight
Copy link
Author

The same goes for https://mvnrepository.com/artifact/org.amshove.kluent/kluent/1.68

org.amshove.kluent:kluent-common:1.68
 - ERROR: Unknown license URL 'https://github.com/MarkusAmshove/Kluent/blob/master/LICENSE' is NOT allowed

@mervyn-mccreight mervyn-mccreight changed the title License of io.github.g0dkar:qrcode-kotlin-jvm:3.1.0 can't get identified, but the published license is MIT Project with license provided as MIT can't get identified Nov 14, 2022
@mervyn-mccreight mervyn-mccreight changed the title Project with license provided as MIT can't get identified Dependencies with license name provided as MIT can't get identified Nov 14, 2022
@JakeWharton
Copy link
Member

Name-based lookup is only applied if the URL is not present. In both of these cases the URL supersedes that and is (obviously) not a canonical URL for the MIT license.

When we built the library we decided to favor URL since the overwhelming majority linked to an external canonical license copy. I would attempt to reach out to those libraries about switching to the SPDX link or the opensource.org link for the license. In my experience library authors usually understand the argument for pointing at these rather than their own copy.

@mervyn-mccreight
Copy link
Author

mervyn-mccreight commented Nov 14, 2022

Ah, I see.

IMO you should consider taking a look at the name even if it can't map the URL to a license, because referring to the maven documentation about the licenses part of the pom, nothing is really specified for the url-part, but for the name part it is at least recommended to use a SPDX-identifier.

What do you think?

@JakeWharton
Copy link
Member

That recommendation is very new and when sampling a few thousand artifacts almost none matched. I suspect the fact that MIT matches is more luck than intent. The overwhelming majority were "Apache 2.0" instead of the intended "Apache-2.0", for example.

@mervyn-mccreight
Copy link
Author

mervyn-mccreight commented Nov 15, 2022

I understand.

I did not want to criticize the way you came to the conclusion that looking for the license-url with priority makes sense.
I don't mean to replace that - what I think of is more like:

  • look if the URL can be mapped to a license
  • if that does not work (either url is not present, or url is not mappable to a license on the spdx license list) try to map the name interpreted as a possible spdx-id
  • if that also does not work, identify the license as unknown

Its more like an additional fallback to try to increase the detection rate.

@JakeWharton
Copy link
Member

I wouldn't say I'm 100% opposed to it, but I did give this a fair bit of thought when figuring out how the project was going to initially work.

My concern is always that the URL and the name may disagree on the license, and that the Maven POM specification offers no guidance on precedence. I wanted to ensure we never surprised someone so in most cases the code prefers to fail and have a human decide. This is why you can allow a specific URL if you trust it to be stable. Or you can allow a specific coordinate if it uses a custom URL that you do not trust to be stable.

Another thing that was considered early on is attempting to download unknown URLs and match their contents. It was ruled out because the number of custom URLs was very low and things like GitHub/GitLab links involve a ton of surrounding HTML.

@hfhbd
Copy link
Contributor

hfhbd commented Nov 16, 2022

There is a problem with the url though: the official urls are often just a template. For a valid license, you need to adopt the owner and year.
At least this was the feedback I got: pgjdbc/pgjdbc#2514 (comment)

@TheMrMilchmann
Copy link

There is a problem with the url though: the official urls are often just a template. For a valid license, you need to adopt the owner and year.

I would be interested in a version of this plugin that does match by identifier instead for this reason. If this requires more manual overrides, so be it. Without promising anything yet, would you accept a PR that makes the matching strategy configurable or do you consider this to be out-of-scope for this plugin, @JakeWharton?

@JakeWharton
Copy link
Member

I think it still has a lot of behavioral questions to work out. It's not like we can just flip a conditional here. Do we fall back to the URL if the name does not match? What happens if the name and URL match different licenses? I simply have never considered matching on the name and the entire design of the existing logic is around the URL, so I'm not keen to rush into something here.

@TheMrMilchmann
Copy link

First of all, sorry for going quiet for a while, but work and vacation kept me busy 😅.
I fully agree that this is not something that should be rushed, but I did actually have an idea in mind already:

I propose to expose an API that the build author may use to tune the strategy. Specifically, we can mimic the API for configuring metadata sources for repositories. This could look as follows (exact names and syntax are tbd):

licensee {
    licenseSources {       // 1
        licenseName()      // 2
        licenseUrl()       // 3
    }
}

We introduce a LicenseSources interface that may be used to configure the sources from which to infer license information. In the DSL, this is exposed under the licensee extension (1).
The interface defines several methods that may be used to enable certain sources of information. Initially, those would use be license name (2) and license URL (3), but additional sources could be added in the future without breaking configuration or introducing needless complexity.

licenseSources would be a regular function under the licensee extension and probably have the signature fun licenseSources(action: Action<LicenseSources>). If the function is not called, we resort to the current behavior as default.

As an added benefit, we could allow the build author to call licenseSources {} (without adding any sources) to indicate that all license mappings must be specified explicitly. However, I suggest deferring this decision and treating it as a configuration mistake for now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants