Interpreting licenses of GBIF registered data

GBIF data licenses

Rationale

Data publishers of the Global Biodiversity Information Facility (GBIF) apply a wide range of licenses to their datasets. This is problematic:

  • Users of the data need to investigate and understand those licenses before they can use the data.
  • Many licenses don't comply with the practices of GBIF and/or open data, limiting the use of the data.

Goal

We want to get an overview of the characteristics of the licenses used in all GBIF registered datasets.

Results

  1. Metadata of all datasets is obtained via the GBIF Registry API and written to datasets.csv:
     make data/generated/datasets.csv
  2. All unique licenses are written to licenses.csv. Rerunning the scripts will append newly found licenses to the file:
     make data/licenses.csv
  3. The characteristics of the licenses are manually interpreted using these guidelines.
  4. The annotated information is merged with the datasets†:
     make data/generated/datasets-annotated.csv
  5. These data are analyzed:
     make analysis
  6. The results are written to standard-license-data.csv and data.js.
  7. The latter is used as the basis for charts, which are displayed from the gh-pages branch.
  8. The results of the analysis were presented in this blog post.
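
As an illustration of steps 1 and 2, here is a minimal sketch of paging through the registry and collecting previously unseen license strings. It is not the code in this repository. The endpoint URL, the `license` field name, and the function names `fetch_page`, `iter_datasets` and `append_new_licenses` are assumptions made for the example. It uses the standard library instead of requests so that it runs without extra installs.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint of the GBIF Registry API for dataset metadata.
API_URL = "https://api.gbif.org/v1/dataset"


def fetch_page(offset, limit):
    """Fetch one page of dataset metadata (performs a network call)."""
    query = urllib.parse.urlencode({"offset": offset, "limit": limit})
    with urllib.request.urlopen(f"{API_URL}?{query}") as response:
        return json.load(response)


def iter_datasets(fetch=fetch_page, limit=100):
    """Yield every dataset record, following the paging flags in each page."""
    offset = 0
    while True:
        page = fetch(offset, limit)
        yield from page["results"]
        # Stop on the last page, or on an empty page to avoid looping forever.
        if page.get("endOfRecords", True) or not page["results"]:
            break
        offset += limit


def append_new_licenses(known, datasets, field="license"):
    """Return `known` followed by license strings not seen before, in order.

    `field` is a hypothetical column name; adjust it to the metadata at hand.
    """
    merged = list(known)
    seen = set(known)
    for dataset in datasets:
        value = dataset.get(field)
        if value and value not in seen:
            seen.add(value)
            merged.append(value)
    return merged
```

Calling `iter_datasets()` without arguments would query the live API. Passing a custom `fetch` function makes it possible to try the paging logic offline.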

† You can easily transform the UUID keys to working URLs as follows:
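
One way to do this in Python is sketched below. It assumes the keys are GBIF dataset or publishing organization UUIDs and that the gbif.org URL patterns shown here apply. The constant names and the `key_to_url` helper are hypothetical.

```python
import uuid

# Assumed URL patterns on the GBIF portal.
DATASET_URL = "https://www.gbif.org/dataset/{key}"
PUBLISHER_URL = "https://www.gbif.org/publisher/{key}"


def key_to_url(key, template=DATASET_URL):
    """Turn a UUID key into a URL, normalising it to lowercase first.

    Raises ValueError if `key` is not a well-formed UUID.
    """
    canonical = str(uuid.UUID(key))
    return template.format(key=canonical)
```

For example, `key_to_url(some_key, PUBLISHER_URL)` builds the publisher page URL for an organization key instead of a dataset page URL.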

Requirements

These are the requirements for running the analysis:

  • Unix make
  • Python
  • requests
  • pandas
  • simplejson

These are the libraries used for the charts:

Disclaimer

This work, especially the manual interpretation of the licenses, is subject to error. We hope to mitigate this by opening up our workflow (including our guidelines) in this repository, but we disclaim any liability for all uses of this work. As new and updated datasets are published to GBIF all the time, our list of datasets (replaced with each analysis) and our list of licenses (extended with each analysis) will become outdated. Check the last commit timestamp of these files to see how recent they are.

License

LICENSE

Preferred citation

Want to use this work in a scholarly publication? You can cite this repository as:

Desmet P, Aelterman B (2013) Interpreting licenses of GBIF registered data. https://github.com/Datafable/gbif-data-licenses (accessed yyyy-mm-dd)