Updating debian copyright file with cme

Dominique Dumont edited this page Jun 24, 2018 · 9 revisions

In my opinion, creating and maintaining the Debian copyright file is the most boring task required to create a Debian package. Unfortunately, this file is also one of the most important in a package: it specifies some legal aspects regarding the use of the software.

The Debian copyright file is scrutinised by the ftp-masters, the gatekeepers who accept new packages into the Debian project: this file must accurately describe the copyright and licenses of all files of a source package, preferably using a specific syntax. (Kudos to the ftp-masters team: reading copyright files must be even more boring than writing them.)

The content of the copyright file must accurately reflect the license of all files. This license is often specified in the comments of a source file. The licensecheck command can scan source files and report the copyright and licenses declared there. But it does not summarise this information: a copyright line is generated for each file of a package.

Hence a lot of work is still required to get a proper debian/copyright file.

The command cme update dpkg-copyright aims to make this task much easier. When run, the debian/copyright file is created or updated:

  • copyright years are coalesced when possible (e.g. 2001,2002,2003-2005 is changed to 2001-2005)
  • file entries with the same copyright owner and license are grouped; a group of files may be represented with a wildcard (*)
  • license text is filled with actual text for the most popular licenses
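
To make the first point concrete, here is a minimal Python sketch of year coalescing. It is only an illustration of the idea, not the code used by cme; the function name coalesce_years and the output separator are my own choices.

```python
import re

def coalesce_years(spec: str) -> str:
    """Merge adjacent or overlapping years, e.g. '2001,2002,2003-2005' -> '2001-2005'."""
    years = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        m = re.fullmatch(r"(\d{4})(?:\s*-\s*(\d{4}))?", part)
        if m is None:
            raise ValueError(f"unrecognised year spec: {part!r}")
        start = int(m.group(1))
        end = int(m.group(2)) if m.group(2) else start
        lo, hi = sorted((start, end))
        years.update(range(lo, hi + 1))

    # Walk the sorted years and extend the current range while years are consecutive.
    ranges = []
    for year in sorted(years):
        if ranges and year == ranges[-1][1] + 1:
            ranges[-1][1] = year
        else:
            ranges.append([year, year])
    return ", ".join(str(a) if a == b else f"{a}-{b}" for a, b in ranges)

print(coalesce_years("2001,2002,2003-2005"))  # 2001-2005
```

Years that are not consecutive stay separate, so "1999, 2001-2002" is left as two items.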

The command cme update dpkg-copyright relies on licensecheck to mine source files to extract copyright and license information. It often does a good job, but sometimes, the result needs to be improved:

  • some file types are unexpected
  • some files do not contain information
  • some files are not parsed correctly and the legal information contains garbage

Let's see how to improve the results.

For what it's worth, the examples shown below come from the moarvm package.

Getting started

Installation

First, install the packages cme and libconfig-model-dpkg-perl (at least version 2.074).

Sorting out skipped files warnings

Then, run scan-copyrights in your source package directory. The command will probably issue a lot of messages like:

skipped file ./debian/README.source
skipped file ./lib/MAST/Ops.nqp
skipped file ./build/config.h.in
skipped file ./3rdparty/libatomic_ops/config.guess
skipped file ./3rdparty/dyncall/dyncall/dyncall_call_mips_n64_gas.s
skipped file ./3rdparty/dyncall/dyncall/dyncall_call_x64_generic_masm.asm
skipped file ./3rdparty/dyncall/test/call_suite/mk-cases.lua

These warnings are shown by licensecheck when a file type is not parsed. This is usually expected for files like config.guess or README.source, but is definitely a problem for Lua or assembly files (s or asm suffixes). In the first case, we want to suppress the warning. In the latter case, we want to force licensecheck to parse the files.

This can be done with the debian/copyright-scan-patterns.yml file. This file contains a list of suffixes (or patterns) to scan or to skip. This list of patterns is added to the licensecheck default list. For instance, the moarvm package contains something like:

---
check:
  suffixes:
  - asm
  - lua
  - nqp
  - s
  - template
ignore:
  pattern:
  - /debian/
  - Makefile
  - MANIFEST
  - /config(.guess|ure|.h.in)
  suffixes:
  - generic
  - rst
  - jpg
  - txt
  - install
  - M
  - m4

This file forces licensecheck to parse asm, lua, nqp and other files. The files matching a pattern in the ignore section are silently skipped. You should edit a similar file until the list of skipped files shown by scan-copyrights is reduced to a reasonable size (or empty, depending on your definition of "reasonable").
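
To get a feel for which skipped files the example above covers, here is a rough Python sketch. It assumes that patterns are regular expressions searched anywhere in the path, that suffixes are compared with the text after the last dot, and that the ignore section wins over the check section. These are my assumptions for illustration; the real matching rules are described in the Dpkg::Copyright::Scanner man page.

```python
import re

# Values copied from the moarvm example above.
CHECK_SUFFIXES = {"asm", "lua", "nqp", "s", "template"}
IGNORE_PATTERNS = [r"/debian/", r"Makefile", r"MANIFEST", r"/config(.guess|ure|.h.in)"]
IGNORE_SUFFIXES = {"generic", "rst", "jpg", "txt", "install", "M", "m4"}

def suffix(path):
    """Return the text after the last dot of the file name, or '' if there is none."""
    name = path.rsplit("/", 1)[-1]
    return name.rsplit(".", 1)[1] if "." in name else ""

def classify(path):
    """Return 'ignore', 'check' or 'default' (left to the licensecheck default list)."""
    if suffix(path) in IGNORE_SUFFIXES or any(re.search(p, path) for p in IGNORE_PATTERNS):
        return "ignore"
    if suffix(path) in CHECK_SUFFIXES:
        return "check"
    return "default"

for path in ("./debian/README.source",
             "./lib/MAST/Ops.nqp",
             "./build/config.h.in",
             "./3rdparty/dyncall/test/call_suite/mk-cases.lua"):
    print(path, "->", classify(path))
```

With these assumptions, README.source and config.h.in are ignored while the nqp and lua files are scanned.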

You can edit this YAML file with your favourite editor.

You can also use the GUI provided by cme edit dpkg:

[Screenshot: cme-dpkg-scan-cop-patterns]

For more information, please see the Dpkg::Copyright::Scanner man page, section "Selecting or ignoring files to scan".

Filling missing information

Once the skipped files are sorted out, you can re-run the scan-copyrights command. The output may show a list of problematic files:

The following paths are missing information:
- 3rdparty/README.md: missing copyright and license
- 3rdparty/libuv/checksparse.sh: missing copyright
- 3rdparty/libuv/docs/src/conf.py: missing copyright and license
- 3rdparty/libuv/gyp_uv.py: missing copyright and license
- 3rdparty/libuv/src/unix/spinlock.h: missing copyright
- Configure.pl: missing copyright and license
- README.markdown: missing copyright and license
- docs/6model-parametric-extensions.markdown: missing copyright and license
- docs/README.md: missing copyright and license
- lib/MAST/Nodes.nqp: missing copyright and license
- lib/README.md: missing copyright and license
- ports/macports/README.md: missing copyright and license
- tools/ucd2c.pl: missing copyright and license
- tools/update_ops.p6: missing copyright and license
You may want to add a line in debian/fill.copyright.blanks.yml

Information may be missing because the source file does not contain information or because licensecheck failed to parse the file.

For each file, you'll have to read the file and use your best judgement to either ignore the file or provide missing information.

The missing information can be specified in debian/fill.copyright.blanks.yml. Each entry is a pattern, usually a directory name or a complete path, followed by missing information (or a special instruction to skip the file). For instance:

---
3rdparty/dynasm/:
  license: Expat
3rdparty/dyncall/:
  copyright: 2007-2015, Daniel Adler <dadler@uni-goettingen.de>
  license: ISC
3rdparty/libtommath/:
  copyright: Tom St Denis, tomstdenis@gmail.com
  license: dwtfyw-license
3rdparty/libtommath/bn_mp_div.c:
  skip: '1'
docs/moar.pod:
  license: Artistic-2.0
src/:
  comment: Almost no file in src has legal information. This entry provides default
    legal info for all files in there
  copyright: the MoarVM contributors. See the CREDITS file
  license: Artistic-2.0
src/gc/debug.h:
  skip: '1'

Note that these entries are handled as default values: they will always be superseded by information found in files (which may happen when the package is upgraded).
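
The "default value" behaviour can be sketched in Python as follows. This is a simplified model, not the scanner's implementation: it assumes that an entry applies when the path starts with the entry name and that longer (more specific) entries are consulted first. The function name fill_blanks and the path src/core/interp.c are hypothetical.

```python
def fill_blanks(path, scanned, blanks):
    """Return the legal info for path, or None when an entry asks to skip it.

    scanned: info found in the file, e.g. {"license": "Expat"}
    blanks:  entries in the style of debian/fill.copyright.blanks.yml
    """
    info = dict(scanned)
    # Assumption: more specific (longer) entries are consulted first.
    for pattern in sorted(blanks, key=len, reverse=True):
        if not path.startswith(pattern):
            continue
        entry = blanks[pattern]
        if entry.get("skip"):
            return None
        for key in ("copyright", "license"):
            # Defaults only: never overwrite what was found in the file.
            if key in entry and not info.get(key):
                info[key] = entry[key]
    return info

blanks = {
    "3rdparty/libtommath/": {"copyright": "Tom St Denis, tomstdenis@gmail.com",
                             "license": "dwtfyw-license"},
    "3rdparty/libtommath/bn_mp_div.c": {"skip": "1"},
    "src/": {"copyright": "the MoarVM contributors. See the CREDITS file",
             "license": "Artistic-2.0"},
}

# A file without any legal information gets the defaults of src/.
print(fill_blanks("src/core/interp.c", {}, blanks))
# A license found in the file is kept; only the copyright is filled in.
print(fill_blanks("src/core/interp.c", {"license": "Expat"}, blanks))
# A skipped file yields nothing.
print(fill_blanks("3rdparty/libtommath/bn_mp_div.c", {}, blanks))
```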

As before, you can edit this file with your favourite editor or with cme edit dpkg.

For more information, please see section "Filling the blanks" of the Dpkg::Copyright::Scanner man page.

Trying update

Once you're satisfied with the information extracted from the source files, it's time to actually merge this information with the content of the existing debian/copyright file (if any).

First, make sure that your current copyright file is archived in your VCS (be it git, svn or whatever).

Then run cme update dpkg-copyright. Using the content of the source files, this command:

  • updates copyright and license information in existing entries
  • removes entries of removed files or directories
  • adds license text as needed (for known licenses)

Once this command is run, you must check the result and complete any missing information (e.g. license text of unknown licenses).

Fixing wrong entries

Despite the precautions taken above, some entries may still have wrong information. Either:

  • missing license text or comment
  • wrong copyright or license information

You may correct the first kind of error directly in the resulting copyright file. Any re-run of cme update dpkg-copyright will keep this information.

Correcting wrong copyright or license information is more problematic: cme considers that information found in files is more exact than old data found in debian/copyright. Thus, running cme update dpkg-copyright will clobber your manual updates.

You can instruct cme to alter or set specific copyright entries in the debian/fix.scanned.copyright file. Each line of this file will be handled by Config::Model::Loader to modify copyright information.

For instance, if the extracted copyright contains:

Files: *
Copyright: 2014-2015, Adam Kennedy <adamk@cpan.org> "foobar
License: Artistic or GPL-1+

You may add this line to the debian/fix.scanned.copyright file:

! Files:'*' Copyright=~s/\s*".*//
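
The substitution at the end of that line is a Perl-style regular expression. To check what it removes, here is the same substitution applied to the example Copyright value with Python's re module (only the regexp is reproduced here, not the Config::Model::Loader syntax around it):

```python
import re

scanned = '2014-2015, Adam Kennedy <adamk@cpan.org> "foobar'

# Same regexp as in s/\s*".*//: drop everything from the first double quote,
# including the whitespace just before it.
cleaned = re.sub(r'\s*".*', "", scanned, count=1)

print(cleaned)  # 2014-2015, Adam Kennedy <adamk@cpan.org>
```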

As before, you can edit this file with your favourite editor. cme edit dpkg will soon support this file as well.

For more information, please see section "Tweak results" of the Config::Model::Dpkg::Copyright man page.

Bugs

In case of issues, please file a bug against libconfig-model-dpkg-perl.

Common problems

Garbage in copyright information of a file

This is caused by lines like this in the source file:

function foo(c) { ...

The bug is in licensecheck but is very hard to fix since '(c)' has a legal value and is often used to specify copyright.

You should add an entry in fill.copyright.blanks.yml to ignore this file.
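
To see why this is hard to fix, consider the following Python sketch. The regexp is deliberately naive and is not the one used by licensecheck; it only shows that any matcher accepting "(c)" as a copyright marker will also be fooled by a function parameter named c.

```python
import re

# Deliberately naive pattern, NOT the one licensecheck uses:
# treat whatever follows "(c)" as the copyright statement.
NAIVE_COPYRIGHT = re.compile(r"\(c\)\s*(.*)", re.IGNORECASE)

def naive_copyright(line):
    """Return the text following '(c)' on the line, or None if there is no '(c)'."""
    m = NAIVE_COPYRIGHT.search(line)
    return m.group(1).strip() if m else None

print(naive_copyright("// (c) 2007-2015 Daniel Adler"))  # 2007-2015 Daniel Adler
print(naive_copyright("function foo(c) { ..."))          # { ...
```

The second call returns code instead of a copyright owner, which is the kind of garbage that ends up in the copyright file.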

Garbage in copyright information of a directory

This happens when a file that trips the licensecheck bug described above sits in a directory where no other file has a copyright statement. You should use licensecheck to identify the file tripping the bug and add an entry to fill.copyright.blanks.yml to ignore it.

For instance, with a wrong entry like:

Files: source/AYUGens/AY_libayemu/src/*
Copyright: V_Soft and Lion 17 static int Lion17_YM_table [32] = /  V_Soft and Lion 17 static int Lion17_AY_table [16] = /  Hacker KAY
License: LGPL-2+

Run:

$ licensecheck -r --copyright source/AYUGens/AY_libayemu/src/
source/AYUGens/AY_libayemu/src/ay8912.c: UNKNOWN
[Copyright:  V_Soft and Lion 17 static int Lion17_YM_table [32] = /  V_Soft and Lion 17 static int Lion17_AY_table [16] = /  Hacker KAY]

And add this in fill.copyright.blanks.yml:

source/AYUGens/AY_libayemu/src/:
  skip: '1'

More information

See: