This is a simple tool to ingest the raw contents of the crates.io
versions.license field, normalise and split multiply-licensed crates into
their individual licences, and generate JSON files with the individual licence
counts along with a file containing all the pairs of licences seen.
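
As a rough illustration of the splitting and counting step, here is a minimal sketch. It is not the tool's actual code: the function names `split_licences` and `tally` are hypothetical, and the rules are assumptions (`OR`, `AND`, `/` and parentheses are treated as separators, a `WITH` exception is dropped, and a "pair" means two licences appearing together on the same crate).

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Split a raw licence expression into its individual licence identifiers.
/// Assumed rules: "/", "(" and ")" act as whitespace, the operators OR and
/// AND are discarded, and "WITH <exception>" is dropped entirely.
fn split_licences(raw: &str) -> BTreeSet<String> {
    let cleaned: String = raw
        .chars()
        .map(|c| if matches!(c, '/' | '(' | ')') { ' ' } else { c })
        .collect();

    let mut out = BTreeSet::new();
    let mut tokens = cleaned.split_whitespace();
    while let Some(tok) = tokens.next() {
        if tok.eq_ignore_ascii_case("OR") || tok.eq_ignore_ascii_case("AND") {
            continue;
        }
        if tok.eq_ignore_ascii_case("WITH") {
            tokens.next(); // skip the exception identifier that follows WITH
            continue;
        }
        out.insert(tok.to_string());
    }
    out
}

/// Tally per-licence counts and counts for each unordered pair of licences
/// that appear together in one raw expression. Each row is (raw, count),
/// mirroring the two CSV columns.
fn tally(
    rows: &[(&str, u64)],
) -> (BTreeMap<String, u64>, BTreeMap<(String, String), u64>) {
    let mut singles = BTreeMap::new();
    let mut pairs = BTreeMap::new();
    for (raw, count) in rows {
        // The set is sorted, so each pair is always stored in one order.
        let licences: Vec<String> = split_licences(raw).into_iter().collect();
        for (i, a) in licences.iter().enumerate() {
            *singles.entry(a.clone()).or_insert(0) += *count;
            for b in &licences[i + 1..] {
                *pairs.entry((a.clone(), b.clone())).or_insert(0) += *count;
            }
        }
    }
    (singles, pairs)
}

fn main() {
    // Made-up sample rows for illustration only.
    let rows: [(&str, u64); 3] =
        [("MIT OR Apache-2.0", 3), ("MIT/Apache-2.0", 2), ("MIT", 5)];
    let (singles, pairs) = tally(&rows);
    println!("{singles:?}");
    println!("{pairs:?}");
}
```

Using sorted sets means `MIT OR Apache-2.0` and `Apache-2.0 OR MIT` land on the same pair key, which keeps the pair counts from being split across two orderings.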
You'll need to generate a CSV in the right format with the raw licence data.
The easiest way to do this is to download a crates.io database
dump, import it into a
PostgreSQL database, and run this command in psql:

    \copy (select versions.license, count(versions.id) as count from versions inner join default_versions ON default_versions.version_id = versions.id group by license order by count desc) to '/tmp/licences.csv' with (format csv);

This gets all the licences for the current default version of each crate.
From there, you can run the tool with:

    cargo run -- -o out /tmp/licences.csv

And you'll get two JSON files in out.