-
Notifications
You must be signed in to change notification settings - Fork 32
Add license check to automerge guidelines #344
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
dbe4b95 to
64aa765
Compare
64aa765 to
ed68738
Compare
|
Hmm, just reread #261 (comment) and will need to change the approach here |
dd60211 to
01a8363
Compare
|
@oxinabox suggested using a |
|
the latest commit adds a check to see if there is a If the Project.toml has a license key, we do not try to find a license file ourselves; i.e. we take the Project as a source of truth. This can help if the autodetection goes wrong; the solution is to just put a license identifier in the Project.toml, and then there's no way to confuse it. It's helpful to have such an opt-out because the check is done on every version, so the same package could get repeatedly flagged by accident otherwise. That being said, I do think the autodetection should be pretty good here. Thanks to @oxinabox for helping decide on this logic to use for this mechanism + pointing to the license key in the first place. |
|
The license check requires access to the package code in order to look and see if there is a license there (and to parse the Project.toml to look for a license field). This functionality (looking at the package code) would also be needed for #313. My previous approach (i.e. in the 2nd to last commit) was to add in a In the latest commit, I instead added a This makes @GunnarFarneback, since you designed the new guidelines loop, would you mind taking a look and letting me know what you think? bors try |
|
I would do it like this:
|
|
Ah ok, that sounds good. So when we download the code, we'll use that field to get the tempdir to download it to, and when we use it, we just use that directory too. |
Alternatively doing only 1 and trying to do 2 in a later pull request, since the check in |
Yes, I would be comfortable with this as well. |
|
I am concerned that that might make for a sudden surge in manual work without a simple solution: as soon as we turn this on, new versions will need fixing or manual merging for every package that does not pass the guidelines, which includes ones with ok licenses like Torch.jl. I was hoping we could tell those packages with licenses but who don’t pass the check “just add this field and you won’t need manual merging for every version”, or some other fix that lets them go back to automatic mode. So I’d rather not start with the most strict option and add more options later; I’d rather start with a looser option and tighten it later. To that end, what about removing the project stuff (since that’s controversial), and to compensate, also relaxing the check in two ways?
I think these two requirements will cause most of the false positives, and when we have a better way to deal with false positives, we could add them back in (since I think they do have a legitimate function). |
|
Sure that sounds fine. I agree that long-term, we want to add both of those requirements back in. But we can do that after we have implemented the project override feature. |
Actually, I just realized that LicenseCheck.jl uses a hardcoded list of which SPDX identifiers are OSI-approved: https://github.com/ericphanson/LicenseCheck.jl/blob/main/src/OSI_LICENSES.jl So, even with the So that disadvantage is shared by both the @ericphanson Is it correct that the list of OSI-approved licenses is hard-coded in LicenseCheck.jl? |
|
Why don't we have a Pkg call tomorrow (2pm eastern time, IIRC) to discuss the I've posted in the #pkg-dev channel on Slack. |
|
@ericphanson Maybe hold off on making any changes to the PR for now? If we have a Pkg call tomorrow, and in the call it is decided that we go the |
Yep! The script to do so is in the repo as well so we can update it, but it definitely does need manual updating. Additionally, the list of licenses
Sure, I can probably attend that.
Sounds good, although it probably needs integration tests and some rebasing, and I might not have time to do that until this weekend anyway. |
|
Since we didn't end up having a package call, I'm thinking the next steps are to
Does that sound good? |
|
Yeah. |
|
I add some unit tests that try out the various code paths in What needs to be tested by integration tests then is that the bors try |
|
bors try |
|
Ok, I think this is good to go from my end! Bors can't squash, right? So I should do some git cleaning-up. Do the messages look OK? Too verbose? In case we go back to parsing the Project.toml and my poor git skills ruin the history, here's the code I had a couple commits back: Old code using Project.toml license fieldfunction parse_licenses_from_project(dir)
project_path = joinpath(dir, "Project.toml")
if !isfile(project_path)
project_path = joinpath(dir, "JuliaProject.toml")
end
isfile(project_path) || return String[]
project = TOML.tryparsefile(project_path)
project isa TOML.ParserError && return String[]
project_licenses = get(project, "license", String[])
# it should either be a string or a vector of strings;
# if it's a string, make it a vector of strings
if project_licenses isa AbstractString
return String[project_licenses]
elseif project_licenses isa AbstractVector{<:AbstractString}
return project_licenses
else
# what is it?
@error "`license` field has uncaught type" typeof(project_licenses)
return String[]
end
end
function check_license(dir)
project_licenses = parse_licenses_from_project(dir)
# only run `check_license` when we don't find
# a license in the Project.toml
if isempty(project_licenses)
file_license = LicenseCheck.find_license(dir)
else
file_license = nothing
end
return (; project_licenses=project_licenses, file_license=file_license)
end
function pkgdir_from_depot(depot_path::String, pkg::String)
pkgdirs = readdir(joinpath(depot_path, "packages", pkg); join=true)
always_assert(length(pkgdirs) == 1)
return pkgdirs[1]
end
function meets_version_has_osi_license(working_directory::String,
pkg::String,
version::VersionNumber;
registry_deps::Vector{<:AbstractString} = String[],
depot_path)
pkgdir = pkgdir_from_depot(depot_path, pkg)
if !isdir(pkgdir)
return false, "Could not check license because could not access package code. Perhaps the `Pkg.add` step failed earlier."
end
license_results = check_license(pkgdir)
project_licenses = license_results.project_licenses
# if there's a license in the Project.toml, then we only consider that.
if !isempty(project_licenses)
# possibly a list of licenses were passed
non_osi_licenses = filter(!LicenseCheck.is_osi_approved, lic.licenses)
if isempty(non_osi_licenses)
@info "Found license field in Project.toml with OSI-approved license identifier(s)" project_licenses
return true, ""
else
non_osi_licenses_str = join(non_osi_licenses, ", ")
@error "Found license field in Project.toml but with non-OSI-approved license identifier(s)" project_licenses non_osi_licenses_str
return false, "Non-OSI-approved license(s) $(non_osi_licenses_str) found in Project.toml"
end
end
# If we're here, then there was no `license` field in the Project.toml.
lic = license_results.file_license
percent_covered_threshold = 70
if LicenseCheck.is_osi_approved(lic) && lic.license_file_percent_covered > percent_covered_threshold
@info "Found OSI license(s)"
return true, string("")
# multiple failure modes:
# 1. No license file found at all
# of the contents of the file, so can't be sure what the license is.
elseif lic === nothing
@error "Did not find license file."
return false, "No license file found."
# 2. License file found with OSI licenses, but did not recognize more than 30%
elseif LicenseCheck.is_osi_approved(lic)
@error "Found OSI-approved license(s) in toplevel file $(lic.license_filename), but the recognized licenses cover less than $(percent_covered_threshold)% of the file." licenses_found=lic.licenses_found license_file_percent_covered=lic.license_file_percent_covered
return false, "Could not identify at least $(percent_covered_threshold)% of the contents of the license file."
# 3. License file found, but with one or more non-OSI licenses
else
non_osi_licenses = filter(!LicenseCheck.is_osi_approved, lic.licenses)
@error "Non-OSI-approved license(s) found in toplevel file $(lic.license_filename)." licenses_found=lic.licenses_found license_file_percent_covered=lic.license_file_percent_covered non_osi_licenses
non_osi_licenses_str = join(non_osi_licenses, ", ")
return false, "Non-OSI-approved license(s) $(non_osi_licenses_str) found in license file $(lic.license_filename)."
end
end |
DilumAluthge
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good to go. Yeah I would squash before you bors merge.
684ed9d to
8643a65
Compare
8643a65 to
fa87eb7
Compare
|
bors merge |
fixes #261 by using LicenseCheck.jl to wrap the Go library licensecheck in order to find the most likely license file and check what licenses it contains. LicenseCheck should be registered in a few hours; CI will fail until then.
License-file finding strategy: this is implemented in LicenseCheck.jl. The current strategy is: if there are more than 100 toplevel files, then check any that are in a long hardcoded list (COPYING, LICENSE, GPL.txt, etc). If there are fewer than 100, then check every plaintext file at toplevel. For each file we check,
licensecheckreturns a list of licenses it has found in that file (if any), and the percent of the file covered by those licenses it found (percent_covered). We take the file with the highestpercent_coveredand assume that is the license file of interest.License rules: what I've currently implemented is: for each version being registered (not new packages only), there must be a file of which at least 70% of the contents are covered by identified licenses, and all of those licenses must be OSI-approved (determined by taking the
licensecheck-obtained identifier, and seeing if it is' on a list of SDPX license identifiers which are OSI-approved, via LicenseCheck.jl).Lots of choices in all of this, and maybe these rules will lead to too many falsely-flagged packages, I'm not sure.
The license guideline is enabled by a keyword argument to
run, which is off by default so that this is not breaking. IMO we should just turn it on for General in it's config.update: Now also looks for a
licensekey in the Project.toml and takes that as the source of truth, if it exists.original message
Addresses #261. It doesn’t necessarily close the issue, since this PR just does the first step, which is check if a license file exists at all. Checking the license against a list of OSI compatible licenses via licensee or other tools is left for future work.
The first commit moves a TagBot utility up to be used as a general RegistryCI utility, and the second uses it for a check if a license file exists.
This isn't done yet because it still needs a test for the license check; I'm not totally sure how to do that because it needs an actual PR body to parse and a repo to go look at, and doesn't just care about the commits to the registry itself. I suspect the TagBot tests that use mocking might be what I should look at, but they are currently broken (#322).
This is breaking because I set the license check to default to true, but we could also just set it to default to false and turn it on specifically for General, in which case it would be not breaking. (I did make sure it could be turned off since private registries may not use licenses for all their packages.)
By the way, adding new guidelines to the pipeline is super easy thanks to @GunnarFarneback's recent work!