In [16]:
from ckanapi import RemoteCKAN
import json

# CKAN API Python Demo Notebook
This notebook demonstrates how to use the [`ckanapi` library](https://github.com/ckan/ckanapi/) to connect to the JO-CREWSnet Data Hub and perform basic actions, such as uploading, downloading, and modifying datasets.

The installation steps below assume that you use [`conda`](https://conda.io/) for environment management. Create a new environment:
```bash
conda create -n jo_crewsnet_data_hub
```

Activate the environment
```bash
conda activate jo_crewsnet_data_hub
```

Install requirements
```bash
conda install --file requirements/conda_requirements -y
pip install -r requirements/pip_requirements
```

## API authentication

Most actions require authentication with an API key. To create one for your account, click your profile button at the top right and select the "API-Tokens" tab.

The credentials should be stored in a file called `credentials.json`. A template file `credentials.json-template` is provided. Simply rename `credentials.json-template` to `credentials.json` and replace the text `fill-in-with-your-api-key` with your API key.

In [17]:
with open('credentials.json') as credentials_file:
    api_key = json.load(credentials_file)['key']
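
The cell above fails with a bare `FileNotFoundError` or `KeyError` if the file is missing or incomplete. A slightly more defensive variant might look like the sketch below. The helper name `load_api_key` is hypothetical and not part of `ckanapi`; the placeholder string is the one from `credentials.json-template`.

```python
import json
from pathlib import Path

# Placeholder text from credentials.json-template, as described above.
PLACEHOLDER = 'fill-in-with-your-api-key'


def load_api_key(path='credentials.json'):
    """Return the API key stored under 'key', with clearer error messages.

    Hypothetical helper, not part of ckanapi.
    """
    credentials_path = Path(path)
    if not credentials_path.exists():
        raise FileNotFoundError(
            f'{path} not found: rename credentials.json-template to credentials.json')
    with credentials_path.open() as credentials_file:
        key = json.load(credentials_file).get('key', '')
    if not key or key == PLACEHOLDER:
        raise ValueError(f'{path} does not contain an API key yet')
    return key
```

With this helper, the cell above could be written as `api_key = load_api_key()`.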

The `ckanapi` package allows you to call the CKAN API, which is documented here: https://docs.ckan.org/en/latest/api/index.html#action-api-reference

The usage pattern is to create a `RemoteCKAN` connector object and call the API functions through its `.action` attribute. The function name after the `.action` attribute in Python should match the function name after the last `.` in the API reference.
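
As an illustration of this naming rule, the sketch below derives the action name from a dotted reference path and looks it up on a connector. The helper names `action_name` and `call_by_reference` are hypothetical; only the `.action` attribute lookup comes from `ckanapi`.

```python
def action_name(reference):
    """Return the part after the last '.', e.g. 'organization_list'."""
    return reference.rsplit('.', 1)[-1]


def call_by_reference(hub, reference, **kwargs):
    """Call the action named by a dotted API reference path on a connector."""
    # Equivalent to writing hub.action.<name>(**kwargs) by hand.
    return getattr(hub.action, action_name(reference))(**kwargs)


# 'ckan.logic.action.get.organization_list' -> hub.action.organization_list()
print(action_name('ckan.logic.action.get.organization_list'))
```

Given a `RemoteCKAN` connector named `hub`, `call_by_reference(hub, 'ckan.logic.action.get.organization_list')` would then be the same as calling `hub.action.organization_list()`.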

In [18]:
# connector for handling API calls
hub = RemoteCKAN('https://data.jo-crewsnet.org', apikey=api_key)

In [19]:
# list organizations (calls https://docs.ckan.org/en/latest/api/index.html#ckan.logic.action.get.organization_list)
hub.action.organization_list()

['mit-lincoln-lab', 'test1']

# Creating a dataset

We will create a demo dataset to show how to use the API.

To create a dataset, use the [`hub.action.package_create()` function](https://docs.ckan.org/en/latest/api/index.html#ckan.logic.action.create.package_create). Datasets are groups of related files, called resources. Resources can include the data itself in multiple file formats, as well as supporting documentation such as data dictionaries, licenses, and usage guides.

In [21]:
output = hub.action.package_create(
            name='demo-dataset',
            private=True,
            author='ckanapi demo',
            author_email='ckanapi@example.org',
            license_id='cc-by',
            notes='This is a plain text description of this demo dataset',
            version='1.0',
            extras=[{'key':'arbitrary_key', 'value':'arbitrary_value'}],
            owner_org='test1')
output

{'author': 'ckanapi demo',
 'author_email': 'ckanapi@example.org',
 'creator_user_id': 'c2cd2a09-1969-4fc2-a713-ee0b4eea3995',
 'id': '4e918926-0895-4be1-ae66-4a518c8b26fd',
 'isopen': True,
 'license_id': 'cc-by',
 'license_title': 'Creative Commons Attribution',
 'license_url': 'http://www.opendefinition.org/licenses/cc-by',
 'maintainer': None,
 'maintainer_email': None,
 'metadata_created': '2024-02-01T13:39:07.763183',
 'metadata_modified': '2024-02-01T13:39:07.763190',
 'name': 'demo-dataset',
 'notes': 'This is a plain text description of this demo dataset',
 'num_resources': 0,
 'num_tags': 0,
 'organization': {'id': '437d3982-e642-482e-82cc-939632882c27',
  'name': 'test1',
  'title': 'test1',
  'type': 'organization',
  'description': 'Test organization for demonstration purposes',
  'image_url': '',
  'created': '2024-02-01T10:45:03.758895',
  'is_organization': True,
  'approval_status': 'approved',
  'state': 'active'},
 'owner_org': '437d3982-e642-482e-82cc-939632882c27',

The call returns a dictionary with information about the newly created dataset, which we have stored in the variable `output`.
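
The `name` passed to `package_create` has to follow the rule given in the CKAN API reference: 2 to 100 characters, using only lowercase alphanumerics, `-` and `_`. A minimal sketch of a helper that derives such a name from a free-text title is shown below; `slugify_dataset_name` is a hypothetical name, not a `ckanapi` function.

```python
import re


def slugify_dataset_name(title, max_length=100):
    """Turn a free-text title into a candidate CKAN dataset name."""
    slug = title.strip().lower()
    # Replace every run of disallowed characters with a single hyphen.
    slug = re.sub(r'[^a-z0-9_-]+', '-', slug).strip('-')
    slug = slug[:max_length].rstrip('-')
    if len(slug) < 2:
        raise ValueError(f'cannot derive a valid dataset name from {title!r}')
    return slug


print(slugify_dataset_name('Demo Dataset (v1.0)'))  # demo-dataset-v1-0
```

The helper only checks the format. Whether the name is still free on the hub is decided by the server when `package_create` is called.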

## Adding resources to a dataset

To add resources to a dataset, we call the [`hub.action.resource_create`](https://docs.ckan.org/en/latest/api/index.html#ckan.logic.action.create.resource_create) function. Note that we need to pass in the `package_id` to associate the resource file with a dataset. The ID of our dataset is available in the `output` variable as `output['id']`. The file to upload should be passed as a Python file object opened with the [`open()` function](https://docs.python.org/3/library/functions.html#open) in `'rb'` mode.

In [23]:
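# open the file in binary mode; the with-block closes it once the upload finishes
with open('dummy_data.csv', 'rb') as data_file:
    resource = hub.action.resource_create(package_id=output['id'],
                                          name='dummy data file upload',
                                          title='Sample data for demo dataset',
                                          upload=data_file)
resource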

{'cache_last_updated': None,
 'cache_url': None,
 'created': '2024-02-01T13:46:09.768732',
 'datastore_active': False,
 'datastore_contains_all_records_of_source_file': False,
 'description': None,
 'format': 'CSV',
 'hash': '',
 'id': 'c470c2a9-e78c-4d6b-9f3e-ded0d691335f',
 'last_modified': '2024-02-01T13:46:09.748993',
 'metadata_modified': '2024-02-01T13:46:09.766117',
 'mimetype': 'text/csv',
 'mimetype_inner': None,
 'name': 'dummy data file-upload',
 'package_id': '4e918926-0895-4be1-ae66-4a518c8b26fd',
 'position': 1,
 'resource_type': None,
 'size': 63,
 'state': 'active',
 'title': 'Sample data for demo dataset',
 'url': 'https://data.jo-crewsnet.org/dataset/4e918926-0895-4be1-ae66-4a518c8b26fd/resource/c470c2a9-e78c-4d6b-9f3e-ded0d691335f/download/dummy_data.csv',
 'url_type': 'upload'}
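
The `url` field in this response is where the uploaded file can be fetched again. For a private dataset the request most likely needs your API key. The sketch below uses only the standard library and assumes that the hub accepts the key in an `Authorization` header, which is the usual CKAN behaviour but is not confirmed here for the Data Hub. Both helper names are hypothetical.

```python
import shutil
import urllib.request


def build_download_request(url, api_key=None):
    """Build a GET request for a resource URL, adding the key if given."""
    request = urllib.request.Request(url)
    if api_key:
        # Assumption: the hub reads the API key from this header.
        request.add_header('Authorization', api_key)
    return request


def download_resource(resource, destination, api_key=None):
    """Stream the file behind resource['url'] to a local path."""
    request = build_download_request(resource['url'], api_key)
    with urllib.request.urlopen(request) as response, open(destination, 'wb') as out:
        shutil.copyfileobj(response, out)
    return destination
```

A call might look like `download_resource(resource_dict, 'dummy_data_copy.csv', api_key)`, where `resource_dict` is a dictionary such as the one returned by `resource_create`.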

## Adding resources later
If you want to add resources to an existing dataset but have forgotten its unique ID, you can look up the package information by name using the [`hub.action.package_show`](https://docs.ckan.org/en/latest/api/index.html#ckan.logic.action.get.package_show) function with the `name_or_id` argument.

In [27]:
package_info = hub.action.package_show(name_or_id='demo-dataset')
package_info

{'author': 'ckanapi demo',
 'author_email': 'ckanapi@example.org',
 'creator_user_id': 'c2cd2a09-1969-4fc2-a713-ee0b4eea3995',
 'id': '4e918926-0895-4be1-ae66-4a518c8b26fd',
 'isopen': True,
 'license_id': 'cc-by',
 'license_title': 'Creative Commons Attribution',
 'license_url': 'http://www.opendefinition.org/licenses/cc-by',
 'maintainer': None,
 'maintainer_email': None,
 'metadata_created': '2024-02-01T13:39:07.763183',
 'metadata_modified': '2024-02-01T13:46:11.406600',
 'name': 'demo-dataset',
 'notes': 'This is a plain text description of this demo dataset',
 'num_resources': 2,
 'num_tags': 0,
 'organization': {'id': '437d3982-e642-482e-82cc-939632882c27',
  'name': 'test1',
  'title': 'test1',
  'type': 'organization',
  'description': 'Test organization for demonstration purposes',
  'image_url': '',
  'created': '2024-02-01T10:45:03.758895',
  'is_organization': True,
  'approval_status': 'approved',
  'state': 'active'},
 'owner_org': '437d3982-e642-482e-82cc-939632882c27',
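
The full `package_show` response also carries a `resources` list (cut off in the output above), with one dictionary per resource in the same shape that `resource_create` returns. A minimal sketch for looking up resource IDs by name, using a hypothetical helper `resource_ids_by_name`:

```python
def resource_ids_by_name(package, name):
    """Return the IDs of all resources in a package dict with the given name."""
    # Resource names need not be unique within a dataset, so return a list.
    return [resource['id']
            for resource in package.get('resources', [])
            if resource.get('name') == name]
```

For example, `resource_ids_by_name(package_info, 'dummy data file upload')` would return the IDs of every resource uploaded under that name.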

## Editing a dataset or resource

There are two mechanisms for editing an existing dataset or resource: `update` and `patch`. `update` replaces the whole record with the fields you provide, so any field you omit is deleted. `patch` changes only the fields you provide and leaves the other fields as they were.
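
The difference can be illustrated locally with plain dictionaries. This is only a model of the behaviour described above, not CKAN's implementation; the field values are shortened from the demo dataset and both function names are hypothetical.

```python
def model_update(existing, provided):
    """Model of update: the stored record becomes exactly what was provided."""
    return dict(provided)


def model_patch(existing, provided):
    """Model of patch: provided fields overwrite, all others are kept."""
    return {**existing, **provided}


stored = {'name': 'demo-dataset', 'private': True, 'version': '1.0'}
change = {'name': 'demo-dataset', 'private': False}

print(model_update(stored, change))  # {'name': 'demo-dataset', 'private': False}
print(model_patch(stored, change))   # {'name': 'demo-dataset', 'private': False, 'version': '1.0'}
```

In the model, `version` survives only the patch, which mirrors what happens to the real dataset in the cells that follow.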

In [31]:
# update the package to make it public; the version field is left out this time, so it is removed from the package
# the resources are not passed in either, so update deletes them as well
hub.action.package_update(id=package_info['id'],
            name='demo-dataset',
            private=False,
            author='ckanapi demo',
            author_email='ckanapi@example.org',
            license_id='cc-by',
            notes='This is a plain text description of this demo dataset, updated',
            extras=[{'key':'arbitrary_key', 'value':'arbitrary_value'}],
            owner_org='test1')

{'author': 'ckanapi demo',
 'author_email': 'ckanapi@example.org',
 'creator_user_id': 'c2cd2a09-1969-4fc2-a713-ee0b4eea3995',
 'id': '4e918926-0895-4be1-ae66-4a518c8b26fd',
 'isopen': True,
 'license_id': 'cc-by',
 'license_title': 'Creative Commons Attribution',
 'license_url': 'http://www.opendefinition.org/licenses/cc-by',
 'maintainer': None,
 'maintainer_email': None,
 'metadata_created': '2024-02-01T13:39:07.763183',
 'metadata_modified': '2024-02-01T14:29:58.984576',
 'name': 'demo-dataset',
 'notes': 'This is a plain text description of this demo dataset, updated',
 'num_resources': 0,
 'num_tags': 0,
 'organization': {'id': '437d3982-e642-482e-82cc-939632882c27',
  'name': 'test1',
  'title': 'test1',
  'type': 'organization',
  'description': 'Test organization for demonstration purposes',
  'image_url': '',
  'created': '2024-02-01T10:45:03.758895',
  'is_organization': True,
  'approval_status': 'approved',
  'state': 'active'},
 'owner_org': '437d3982-e642-482e-82cc-93963

In [32]:
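# re-upload the file; the with-block closes it once the upload finishes
with open('dummy_data.csv', 'rb') as data_file:
    resource = hub.action.resource_create(package_id=package_info['id'],
                                          name='dummy data file re-upload',
                                          title='Sample data for demo dataset',
                                          upload=data_file)
resource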

{'cache_last_updated': None,
 'cache_url': None,
 'created': '2024-02-01T14:30:14.499903',
 'datastore_active': False,
 'datastore_contains_all_records_of_source_file': False,
 'description': None,
 'format': 'CSV',
 'hash': '',
 'id': '194b4a5b-a092-4ee3-b55f-dc8287eb877c',
 'last_modified': '2024-02-01T14:30:14.448480',
 'metadata_modified': '2024-02-01T14:30:14.489651',
 'mimetype': 'text/csv',
 'mimetype_inner': None,
 'name': 'dummy data file re-upload',
 'package_id': '4e918926-0895-4be1-ae66-4a518c8b26fd',
 'position': 0,
 'resource_type': None,
 'size': 63,
 'state': 'active',
 'title': 'Sample data for demo dataset',
 'url': 'https://data.jo-crewsnet.org/dataset/4e918926-0895-4be1-ae66-4a518c8b26fd/resource/194b4a5b-a092-4ee3-b55f-dc8287eb877c/download/dummy_data.csv',
 'url_type': 'upload'}

In [33]:
# let's add the version field back in with patch (passed as a string, as in package_create)
hub.action.package_patch(id=package_info['id'],
                         version='1.1')

{'author': 'ckanapi demo',
 'author_email': 'ckanapi@example.org',
 'creator_user_id': 'c2cd2a09-1969-4fc2-a713-ee0b4eea3995',
 'id': '4e918926-0895-4be1-ae66-4a518c8b26fd',
 'isopen': True,
 'license_id': 'cc-by',
 'license_title': 'Creative Commons Attribution',
 'license_url': 'http://www.opendefinition.org/licenses/cc-by',
 'maintainer': None,
 'maintainer_email': None,
 'metadata_created': '2024-02-01T13:39:07.763183',
 'metadata_modified': '2024-02-01T14:30:15.741164',
 'name': 'demo-dataset',
 'notes': 'This is a plain text description of this demo dataset, updated',
 'num_resources': 1,
 'num_tags': 0,
 'organization': {'id': '437d3982-e642-482e-82cc-939632882c27',
  'name': 'test1',
  'title': 'test1',
  'type': 'organization',
  'description': 'Test organization for demonstration purposes',
  'image_url': '',
  'created': '2024-02-01T10:45:03.758895',
  'is_organization': True,
  'approval_status': 'approved',
  'state': 'active'},
 'owner_org': '437d3982-e642-482e-82cc-93963

## Deleting a dataset
If you need to delete a dataset, use [`package_delete`](https://docs.ckan.org/en/latest/api/index.html#ckan.logic.action.delete.package_delete).

In [34]:
hub.action.package_delete(id=package_info['id'])
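
Note that in CKAN `package_delete` normally marks the dataset as deleted rather than erasing it. Permanent removal is a separate action, `dataset_purge`, which typically requires sysadmin rights. A sketch of a helper covering both cases, assuming your account has the necessary permissions (the name `remove_dataset` is hypothetical):

```python
def remove_dataset(hub, dataset_id, purge=False):
    """Delete a dataset, optionally purging it permanently afterwards."""
    hub.action.package_delete(id=dataset_id)
    if purge:
        # Assumption: the account is allowed to call dataset_purge.
        hub.action.dataset_purge(id=dataset_id)
```

Calling `remove_dataset(hub, package_info['id'])` is equivalent to the cell above; pass `purge=True` only when the dataset should not be recoverable.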