Dataset relationships

bellisk edited this page Sep 12, 2014 · 6 revisions

CKAN allows you to define relationships between datasets (known in CKAN <= 1.4 as 'packages').

Schema

A relationship can be one of the following types (based on the API documentation at http://docs.ckan.org/api/version2.html#model-formats):

  • depends_on
  • dependency_of
  • derives_from
  • has_derivation
  • child_of
  • parent_of
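
The six types read as three pairs, each pair describing the same link from opposite ends. The following is a minimal Python sketch of that idea, assuming the pairing implied by the names (it is not stated on this page); the `INVERSE` table and the `invert` helper are hypothetical names used only for illustration.

```python
# Assumed inverse pairs, inferred from the type names listed above.
INVERSE = {
    "depends_on": "dependency_of",
    "dependency_of": "depends_on",
    "derives_from": "has_derivation",
    "has_derivation": "derives_from",
    "child_of": "parent_of",
    "parent_of": "child_of",
}


def invert(subject, rel_type, obj):
    """Return the same relationship as seen from the object dataset."""
    return obj, INVERSE[rel_type], subject
```

For example, `invert("testing", "derives_from", "osm")` describes the same link from the `osm` side as a `has_derivation` relationship.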

Creating a relationship

Example (API version 2):

 curl -v http://test.ckan.net/api/rest/dataset/testing/derives_from/osm -d '{"subject":"testing", "object":"osm", "type":"derives_from", "comment":"test"}' -H "Authorization: YOUR-API-KEY-HERE"

Example (API version 3):

    # Line continuations keep this a single command; the JSON body is passed to -d.
    curl -v http://test.ckan.net/api/action/package_relationship_create \
      -H "Authorization: YOUR-API-KEY-HERE" \
      -d '{
        "subject": "id-or-name-of-datasource",
        "object": "id-or-name-of-second-datasource",
        "type": "RELATIONSHIP-TYPE",
        "comment": "Some comments about this relationship"
      }'
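
The same API version 3 call can be prepared from Python. This is a sketch using only the standard library, assuming the endpoint and field names shown in the curl example; `build_relationship_request` is a hypothetical helper, and the base URL and API key are placeholders. It builds the request without sending it.

```python
import json
import urllib.request


def build_relationship_request(base_url, api_key, subject, obj, rel_type, comment=""):
    """Build (but do not send) a package_relationship_create request."""
    payload = {
        "subject": subject,   # id or name of the first dataset
        "object": obj,        # id or name of the second dataset
        "type": rel_type,     # one of the relationship types in the schema
        "comment": comment,
    }
    return urllib.request.Request(
        base_url.rstrip("/") + "/api/action/package_relationship_create",
        data=json.dumps(payload).encode("utf-8"),  # a body makes this a POST
        headers={
            "Authorization": api_key,
            "Content-Type": "application/json",
        },
    )


req = build_relationship_request(
    "http://test.ckan.net", "YOUR-API-KEY-HERE",
    "testing", "osm", "derives_from", "test",
)
# To send it:
#     with urllib.request.urlopen(req) as resp:
#         print(json.load(resp))
```

Keeping construction separate from sending makes it easy to inspect the URL, headers, and body before anything reaches the server.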

Examples

Background

See original ticket: http://trac.ckan.org/ticket/253

Overview

This feature associates datasets, for example through parent-child, inheritance, or dependency relations. It helps navigation between datasets in the web interface, and it also provides a mechanism for automatically pulling in dependencies when downloading a dataset, much as software package managers do.
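
The package-management analogy can be sketched as a dependency-first ordering over `depends_on` relationships. This is an illustration only, not CKAN code: it assumes relationships are available as `(subject, type, object)` tuples, and `download_order` and the dataset names are hypothetical.

```python
def download_order(dataset, relationships):
    """Return `dataset` and everything it transitively depends on,
    ordered so that each dependency comes before its dependents."""
    deps = {}
    for subject, rel_type, obj in relationships:
        if rel_type == "depends_on":
            deps.setdefault(subject, []).append(obj)

    order, done = [], set()

    def visit(name, stack):
        if name in stack:
            raise ValueError("dependency cycle involving " + name)
        if name in done:
            return
        for dep in deps.get(name, []):
            visit(dep, stack | {name})
        done.add(name)
        order.append(name)

    visit(dataset, frozenset())
    return order


rels = [
    ("a", "depends_on", "b"),
    ("b", "depends_on", "c"),
    ("a", "depends_on", "c"),
    ("x", "derives_from", "a"),  # ignored: not a dependency
]
print(download_order("a", rels))  # ['c', 'b', 'a']
```

Only `depends_on` is followed here; whether other types such as `derives_from` should trigger a download is a design decision this sketch leaves open.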

Use Cases

1. There are 27 datasets in data.gov.uk relating to Data4NR's Health Poverty Index. There is currently no common link between them, unless you search for 'HPI' (which also brings up House Price Index) or look under the tag 'health' (which also has 600 other results). Each HPI dataset page should link to the other 'sibling' HPI datasets, and to a 'root' dataset that has information about the set. This could be partially achieved using the existing tag or group concepts, but a more explicit, official, and obvious marking of their relationship could be beneficial.

2. ckan.net contains freedict, a collection of translation dictionaries. You could make each dictionary a child dataset and use this system, but it would probably be better to make each dictionary a separate resource in the same dataset. (There are other ideas for marking a resource as making up a 'portion' or the 'whole' of a dataset, to help people download datasets in the software-package style.)

3. OSM has had some Naptan data (bus stops) imported with special permission, i.e. under a more liberal license. It would be useful to show this link on both the OSM and Naptan datasets in CKAN: OSM 'derives from' Naptan, with a comment about the license change. I'm not sure this is useful for automatic download or use of these datasets, but it may aid exploration on the CKAN website and understanding of the provenance of the bus stop data.
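
The 'sibling' navigation in use case 1 could be derived from `child_of` relationships alone. This is a small sketch under the same assumption that relationships are `(subject, type, object)` tuples; `siblings` and the dataset names are hypothetical.

```python
def siblings(dataset, relationships):
    """Return the other datasets that share a parent with `dataset`."""
    parents = {
        obj for subj, rel, obj in relationships
        if subj == dataset and rel == "child_of"
    }
    return sorted({
        subj for subj, rel, obj in relationships
        if rel == "child_of" and obj in parents and subj != dataset
    })


rels = [
    ("hpi-2001", "child_of", "hpi"),
    ("hpi-2002", "child_of", "hpi"),
    ("house-prices", "child_of", "economy"),
]
print(siblings("hpi-2001", rels))  # ['hpi-2002']
```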