Follow REUSE best practices for licensing/copyright #8869
Maybe a better approach is to simply generate a separate "DEP5" file at release time (without modifying existing files) and bundle that in the tarball? |
That's my preferred approach. |
The problem with this is that the correctness of the licensing information then depends on a script. It has proved to be much more efficient and stable to put this information as close to the files as possible. Plus, one can assume that many re-users look at a clone of the repo or its GitHub UI and not always at the release tarball. Doing both is ideal, but depending on the build system it is quite tricky to reach 100% REUSE compliance for the built package. It's much simpler to fix this in the source code directly.
We are always going to leave "the correctness" to a script anyway. How else are we going to verify and update?
Why so? Wouldn't compliance be tested in a test case and even in CI? Wouldn't then any breakage be as likely to be fixed no matter the exact way the information is stored?
My thinking is that the REUSE data is not for "users". It's there to be machine detected and parsed, so why not put that data away from the immediate human consumption. The human readable texts are the ones we already have.
What exactly does "100% REUSE compliance" mean? |
Worth noting that the REUSE team plans to deprecate the dep5 format in the future, see fsfe/reuse-tool#244 (comment) |
But AFAICT only to replace it with a different format, so if we can make the tool emit one format we can change it to emit another format. |
Perhaps I misunderstood your idea with the DEP5 file. I think it's not beneficial to the complete transparency of the repo's licensing/copyright situation if this DEP5 file is only present in the tarball. It should be in the repo that serves as the basis for many packagers, users and individual/corporate re-users. The information is most accessible if it's in the file header. You already solved that quite well for a good percentage of the files. In those where there already is a comment header, adding one line with the License Identifier would be the most elegant solution. For the majority of missing files, especially test data, we could just add the
Also, there are some files that don't contain these headers already which someone may also want to use. REUSE would close this gap.
It means that for all files in a repository, humans and machines can find copyright and licensing information. This is in one of these three places:
(and in the future in another place that is more flexible, as some here have noted)
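For the in-file case discussed above (one added line with the License Identifier in an existing comment header), a minimal Python sketch of how a machine might pick that line up could look as follows. This is not the REUSE tool's own parser; the function name `find_license_id`, the 40-line scan limit and the handling of `*/` and `-->` comment closers are assumptions made for illustration.

```python
import re

# Matches the SPDX tag inside a comment line, for example
# " * SPDX-License-Identifier: curl" or "<!-- SPDX-License-Identifier: curl -->".
# Stripping a trailing "*/" or "-->" is an assumption of this sketch.
SPDX_RE = re.compile(r"SPDX-License-Identifier:\s*(\S.*?)\s*(?:\*/|-->)?\s*$")


def find_license_id(text, max_lines=40):
    """Return the license expression from the first tagged line, or None.

    Only the first max_lines lines are scanned (an arbitrary limit chosen
    here), on the assumption that the tag lives in the file header.
    """
    for line in text.splitlines()[:max_lines]:
        match = SPDX_RE.search(line)
        if match:
            return match.group(1)
    return None
```

Files for which this returns None would be the ones that need either a header or an entry elsewhere.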
So how about adding |
Sure, that would be possible. So to clarify:
Do I understand correctly that you don't want commentable files, e.g. Markdown files in |
We also need to update the copyright year ranges in all files we update to please our copyright check script!
I'm worried that it will make them less readable (i.e. user-hostile), plus we would need to handle that for the website too, as we generate web pages from many of the markdown files and such headers would not go well there. The yaml files should probably have headers.
Ah, alright, will check that out then.
Two lines of comments would suffice for them for the bare minimum. As someone who works with REUSE-compliant repos a lot, I personally don't find it too distracting. But again, just listing options ;) Regarding the websites, the markdown comment style would be HTML so that should™ not be a problem. Is there some way to test it? |
The PR is ready to be reviewed. I did this today:
Please have a look whether the files and the licenses make sense to you, thanks! Before merging, I can also clean up the reverted commits if wished. I tried to avoid force-pushes to keep my changes transparent for now. We should probably also discuss whether to add some documentation or so. |
This should be squashed into a single commit before going into master and the resulting commit-sha added to a |
This needs checksrc fixes:
./vtls/gskit.h:22:3: warning: Trailing whitespace (TRAILINGSPACE)
*
^
Please also rebase it (and force-push). "This branch cannot be rebased due to conflicts" |
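To catch this kind of warning before pushing, a small Python sketch along these lines could flag lines that end in spaces or tabs. It is not checksrc itself; the function name `trailing_whitespace` and the 1-based column convention are assumptions of this example.

```python
def trailing_whitespace(text):
    """Yield (line_number, column) for every line ending in spaces or tabs.

    The column is the 1-based position of the first trailing whitespace
    character; that convention is an assumption of this sketch.
    """
    for number, line in enumerate(text.splitlines(), start=1):
        stripped = line.rstrip(" \t")
        if stripped != line:
            yield number, len(stripped) + 1
```

Running it over the output of a header-rewriting script would show whether the script leaves a space after a bare ` *` comment line.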
But after you apply this patch, the script will warn on every file you've updated that does not also say |
Thank you for the useful feedback. Sorry for the trailing whitespaces, the script I was using was falsely producing these. I will also squash and put them to the blame ignore file.
I would fix these again shortly before the merge. Every change on master will most likely introduce new conflicts given the large number of edits I made here.
I am not sure I understand, sorry. I ran copyright.pl in my branch and fixed the newly introduced warnings caused by my edits, so shouldn't we be fine then? In many of the changed files you will see that not only has the License Identifier been added but the copyright range has also changed, now running from the year of the first commit to 2022. Note that some files had headers starting from much earlier, probably because they had just been copied from other files.
09914b4 to e076f04
Starting check jobs. |
I just rebased and squashed all commits and added the large commit to the From my perspective this would be good to go. Note that with any merged PR, the likelihood of a renewed need to resolve conflicts again increases. FYI, here's the output of
|
such a huge patch is hard to review!
Separately from this work: we should probably consider removing the examples with the two separate BSD licenses ( |
To simplify the license situation, as they were the only files in the source tree using these specific BSD-3 clause licenses. For an fopen style API, we recommend instead going https://github.com/curl/fcurl Ref: #8869 Closes #8949
I've emailed @jeroen to ask about relicensing docs/examples/crawler.c to the curl license. The two BSD licensed examples have now been removed. |
OK by me; you have my permission to change the license. |
I know, but I think it's worth tackling the issues in one run even if it's a PITA (and trust me, I hate resolving the conflicts with each rebase as well ;) )
Yes, not in the resulting tarball. If you also wanted to have the released tar file being REUSE compliant (so that one could extract it and run
OK, got it. I will try to revert to the original dates.
Ah no, there was no warning. These files seem to be ignored here: Line 34 in df829a1
I can fix those manually still.
True. And as mentioned, each rebase costs quite some time to resolve conflicts. How about I fix this once after you and the team are fine with the general changes?
I rebased on master so fopen.c and rtsp.c are gone (I also deleted their license text files). I also relicensed crawler.c. The current output is:
|
With permission from Jeroen Ooms URL: #8869 (comment) Closes #8950
Add licensing and copyright information for all files in this repository. This either happens in the file itself as a comment header or in the file `.reuse/dep5`. This commit also adds a GitHub workflow to check pull requests and adapts copyright.pl to the changes.
Thanks for the commits outside this branch!
All that's left would be the tar.gz question (point 1 in my earlier comment). |
Alternatively, we could merge this PR and handle the release tarball in another step? It won't change much with the existing files I guess. |
Thanks! |
@mxmehl can you tell us how we can run this check locally? Isn't there a script/tool that does the check without having to run a docker thing? |
Many thanks for merging, and congratulations for successfully adopting the REUSE best practices!
Sure. There is the REUSE helper tool, a lightweight Python tool that allows you to run the check locally ( For curl however, I scripted most things as the year-range logic as well as the custom header are quite specific. We have some improvements regarding this in the milestone for the next release(s) though. |
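As a rough illustration of a local check without Docker, the Python sketch below wraps the helper tool's `reuse lint` command and reports whether it exited successfully. It assumes the tool is already installed and on the PATH (for example via `pip install reuse`); the function name `is_reuse_compliant` and its parameters are hypothetical and not part of the tool.

```python
import subprocess


def is_reuse_compliant(repo_dir=".", command=("reuse", "lint")):
    """Run the linter inside repo_dir and return True if it exits with 0.

    The command is a parameter so that another checker can be swapped in;
    the default assumes the `reuse` executable is on the PATH.
    """
    result = subprocess.run(
        list(command), cwd=repo_dir, capture_output=True, text=True
    )
    # Echo the linter's report so the caller can see which files are missing info.
    print(result.stdout, end="")
    return result.returncode == 0
```

Calling `is_reuse_compliant("path/to/checkout")` from a test case would give the same pass/fail signal as a CI job running the lint.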
This PR starts with adding some of the copyright/licensing information proposed on the mailing list, with the goal of making curl REUSE compliant and thereby following established community best practices and ISO standards.
The commits are very detailed and contain additional notes with the aim to a) individually showcase the various options for adding copyright/licensing info and b) narrow down the possibilities for how to treat existing copyright headers (specifically 8bf401facc9215737a21a54aa7506a2c5ead2954, db6f607ab8be08cdca50b3b7abc439bccd0ec744 and 79a923991f6dc2736a3d2cc258472bf48071a110).
As requested by @bagder, I used the year ranges everywhere (either keeping the existing ones or, for newly covered files, using the year of the first commit).
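The year-range rule for newly covered files can be sketched in Python as below. This is only an illustration and not the script used for this PR; the function names, the use of `git log --follow` and the `first - last` formatting with spaces around the dash are assumptions.

```python
import subprocess
from datetime import date


def first_commit_year(path, repo_dir="."):
    """Return the year of the oldest commit that touched path."""
    output = subprocess.run(
        ["git", "log", "--follow", "--format=%ad", "--date=format:%Y", "--", path],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    ).stdout.split()
    if not output:
        raise ValueError(f"no commits found for {path}")
    return int(output[-1])  # git log prints the newest commit first


def year_range(first, last=None):
    """Format a copyright year range; a single year when both ends match."""
    last = date.today().year if last is None else last
    return str(first) if first == last else f"{first} - {last}"
```

For a file first committed in 2019 and updated in 2022, `year_range(2019, 2022)` gives `2019 - 2022`.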
Commit 8ae81c2ad8dcd724e94116d107f5396b63668c59 also shows how the BSD-3-Clause licensed file is handled, and 3fea8f484406ec0c6c71cb6bf67851c87cea00cb how a continuous check in the future may be added.
Potential next steps
.reuse/dep5.
Current REUSE status
When running `reuse lint` over this branch, the result is as follows:
Before that we were at: