# CKAN API

https://docs.ckan.org/en/2.8/api/index.html

### API Tools

##### Postman

Postman is a sandbox for REST APIs which can generate code snippets in several languages.

* Download: https://www.getpostman.com/
* CKAN API Postman collection: https://github.com/EMN-Data/ckan-api-postman

##### Python

**ckanapi** CLI

https://github.com/ckan/ckanapi

**ckanapi** python package

https://github.com/ckan/ckanapi#ckanapi-python-module

Manual HTTP calls with **requests**

http://docs.python-requests.org/en/master/

`pip install requests`

In [1]:
import requests
import json

# CKAN Structure Overview

The EMN data hubs are built on a [CKAN framework](http://docs.ckan.org/en/ckan-2.7.3/user-guide.html).

The CKAN **web application** has a hierarchical layout. From top to bottom:

* ##### Projects
* ##### Datasets
* ##### Resources

The CKAN **API** has the same structure, but **datasets are called _packages_**, and **project** is often synonymous with **_group_ or _organization_**.

The CKAN API documentation does not explicitly mention projects, but the documentation for _groups_ and _organizations_ applies to them.

# API Use Cases

* Get existing projects, datasets (_packages_), and resources
* Edit dataset or resource level metadata
* Add new resources to datasets
* Add new datasets to projects

# Building Requests

Each request URL follows the same format:

`<datahub>/api/3/action/<action>`

> Examples
>
> `https://datahub.h2awsm.org/api/3/action/project_list`
> `https://datahub.h2awsm.org/api/3/action/resource_show`

### Actions

Actions also follow a consistent format: they start with the entity and end with a verb. Entities are things like **project**, **package**, **resource**, and **revision** (plus many others). Verbs are **list**, **show**, **create**, **update**, **patch**, and **delete**. Together they give you actions such as **`project_list`** or **`resource_show`**, or, to delete a dataset, **`package_delete`**.

The helper below will generate a URI for an **<action\>** that we will use in each request.

In [2]:
# emn_datahub = 'https://datahub.h2awsm.org'
emn_datahub = 'http://192.168.200.147:5000'

# Helper to build a URI for a given API action
action = lambda a: '{}/api/3/action/{}'.format(emn_datahub, a)

action('project_list')

'http://192.168.200.147:5000/api/3/action/project_list'
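
The entity-plus-verb convention can also be sketched as a pair of small functions. This is only an illustration: `action_name` and `action_url` are hypothetical helper names, and some actions (for example `organization_list_for_user`) carry extra words after the verb, so the pattern is a guide rather than a rule.

```python
# Hypothetical helpers illustrating the <entity>_<verb> naming convention.

def action_name(entity, verb):
    """Join an entity and a verb into an action name, e.g. 'resource_show'."""
    return '{}_{}'.format(entity, verb)


def action_url(base_url, entity, verb):
    """Build the full action URL; a trailing slash on base_url is tolerated."""
    return '{}/api/3/action/{}'.format(base_url.rstrip('/'),
                                       action_name(entity, verb))


print(action_url('https://datahub.h2awsm.org', 'package', 'delete'))
```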

### Get your API token

Most API calls require your API token. In general, any time you need to create or modify content, an API token is required. To read or download **public** datasets, no API token is required.

1. Log in to the data hub
2. Click on your user name in the top ribbon
3. Your API token is at the bottom of the left column

In [3]:
# Set your API token
api_token = '63cd53b0-2c24-4c4e-b16d-9aeee6c4da8c'
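
Hard-coding a token in a notebook makes it easy to leak. One alternative, sketched here, reads the token from an environment variable. The variable name `CKAN_API_TOKEN` and the helper `load_api_token` are assumptions chosen for this example, not something CKAN defines.

```python
import os


def load_api_token(env_var='CKAN_API_TOKEN'):
    """Return the token stored in the given environment variable.

    Raises RuntimeError if the variable is unset or empty, so a missing
    token is noticed before any request is sent.
    """
    token = os.environ.get(env_var, '').strip()
    if not token:
        raise RuntimeError(
            'Set the {} environment variable first.'.format(env_var))
    return token
```

With the variable exported in the shell that launches the notebook, `api_token = load_api_token()` would take the place of the literal assignment.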

The CKAN API is RESTful. For the most part, it uses GET and POST. An authorization header is required for both types of requests. POST requests require an additional header to tell the API to expect JSON in the body of the request.

In [4]:
# Include these headers in GET requests
get_headers = {
    'authorization': api_token
}

# Include these headers in POST requests
post_headers = {
    'authorization': api_token,
    'content-type': 'application/json;charset=utf-8'
}

# CKAN API responses

Every CKAN response will be a JSON object.

Example **success** response:

```json
{
    "success": true,
    "result": <array|object>,
    "help": ""
}
```

Example **error** response:

```json
{
    "success": false,
    "error": "",
    "help": ""
}
```
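
Because both shapes share the `success` flag, a small helper can unwrap `result` or raise on failure. This is a minimal sketch: `CkanError` and `unwrap` are hypothetical names, and the `error` value is passed through untouched because its exact shape depends on the failure.

```python
class CkanError(Exception):
    """Raised when a CKAN response reports "success": false."""


def unwrap(payload):
    """Return payload['result'] on success, otherwise raise CkanError."""
    if payload.get('success'):
        return payload['result']
    raise CkanError(payload.get('error'))


print(unwrap({'success': True, 'result': ['api-demo', 'h2'], 'help': ''}))
```

In the walkthrough cells, `unwrap(response.json())` could stand in for `response.json()['result']`, so that a failed call surfaces as an exception instead of a `KeyError`.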

# Walkthrough

## Projects

Remember, projects are called _organizations_ or _groups_ in the CKAN API documentation.

### organization_list

https://docs.ckan.org/en/2.8/api/index.html#ckan.logic.action.get.organization_list

In [5]:
# https://datahub.h2awsm.org/api/3/action/project_list
response = requests.get(action('organization_list'), 
                        headers=get_headers)

projects = response.json()['result']

print(json.dumps(projects, indent=2))

[
  "api-demo",
  "h2"
]


### organization_list_for_user

Alternatively, use `organization_list_for_user` to list the projects in which you have permission to perform a given action.

https://docs.ckan.org/en/2.8/api/index.html#ckan.logic.action.get.organization_list_for_user

In [6]:
params = {'permission': 'create_dataset'}

# https://datahub.h2awsm.org/api/3/action/project_list_for_user
response = requests.get(action('organization_list_for_user'), 
                        headers=get_headers, 
                        params=params)

my_organizations = response.json()['result']

print(json.dumps(my_organizations, indent=2))

[
  {
    "image_display_url": "",
    "capacity": "admin",
    "description": "API Demo organization.",
    "title": "API Demo",
    "created": "2021-07-20T14:27:16.067016",
    "approval_status": "approved",
    "is_organization": true,
    "state": "active",
    "image_url": "",
    "display_name": "API Demo",
    "revision_id": "9c1ffefe-52b4-4f86-8676-c5029a8defc2",
    "type": "organization",
    "id": "88c42410-c064-4338-b25f-2b1d2a7b0701",
    "name": "api-demo"
  },
  {
    "image_display_url": "",
    "capacity": "admin",
    "description": "Two protons and two electrons.",
    "title": "H2",
    "created": "2021-01-19T10:15:32.174722",
    "approval_status": "approved",
    "is_organization": true,
    "state": "active",
    "image_url": "",
    "display_name": "H2",
    "revision_id": "792d9bab-4289-476b-b138-ca41801cefda",
    "type": "organization",
    "id": "5985cb3f-b380-49f3-af51-3d00a41d9c8a",
    "name": "h2"
  }
]


Find the id of the organization titled "API Demo":

In [10]:
api_demo_org_name = 'API Demo'
organizations = [organization['id'] for organization in my_organizations if organization['title'] == api_demo_org_name]
if not organizations:
    # Stop here with a clear message instead of failing on organizations[0]
    raise LookupError(f'No organization named "{api_demo_org_name}" found. Please create an organization named "{api_demo_org_name}".')
organization_id = organizations[0]
print(organization_id)

88c42410-c064-4338-b25f-2b1d2a7b0701


### organization_show

https://docs.ckan.org/en/2.8/api/index.html#ckan.logic.action.get.organization_show

`organization_show` shows the specifications of an organization.

In [11]:
params = {
    'id': organization_id,
    'include_datasets': True,
    'include_users': False
}

# https://datahub.h2awsm.org/api/3/action/project_show
response = requests.get(action('organization_show'), 
                        headers=get_headers, 
                        params=params)

organization = response.json()['result']

print(json.dumps(organization, indent=2))

{
  "display_name": "API Demo",
  "description": "API Demo organization.",
  "image_display_url": "",
  "package_count": 1,
  "created": "2021-07-20T14:27:16.067016",
  "name": "api-demo",
  "is_organization": true,
  "state": "active",
  "extras": [],
  "image_url": "",
  "groups": [],
  "type": "organization",
  "title": "API Demo",
  "revision_id": "9c1ffefe-52b4-4f86-8676-c5029a8defc2",
  "packages": [
    {
      "license_title": null,
      "maintainer": null,
      "relationships_as_object": [],
      "private": true,
      "maintainer_email": "nick.wunder@nrel.gov",
      "num_tags": 0,
      "planets": [],
      "id": "ad0cdaad-94b3-4153-a807-d561f1b2a34f",
      "metadata_created": "2021-07-20T21:42:55.730479",
      "owner_org": "88c42410-c064-4338-b25f-2b1d2a7b0701",
      "metadata_modified": "2021-07-20T21:42:55.730492",
      "author": null,
      "author_email": null,
      "transneptunian_objects": [],
      "state": "active",
      "version": null,
      "license_id":
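
With `include_datasets` set, the datasets arrive under the `packages` key of the result. A short sketch of pulling out their ids follows, using a trimmed-down sample that keeps only fields visible in the output above. `dataset_ids` is a hypothetical helper name.

```python
def dataset_ids(organization, include_private=True):
    """Return the ids of the datasets listed in an organization_show result."""
    return [
        package['id']
        for package in organization.get('packages', [])
        if include_private or not package.get('private', False)
    ]


# Trimmed sample shaped like the organization_show result.
sample = {
    'name': 'api-demo',
    'packages': [
        {'id': 'ad0cdaad-94b3-4153-a807-d561f1b2a34f', 'private': True},
    ],
}

print(dataset_ids(sample))
print(dataset_ids(sample, include_private=False))
```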

## Datasets

Historically, datasets were called "packages" in CKAN, and the CKAN documentation and source code still refer to datasets as packages.

### package_create

If the following code block runs successfully, you will need to delete the dataset it creates before you can run the block again, because dataset names must be unique.

https://docs.ckan.org/en/2.8/api/index.html#ckan.logic.action.create.package_create

You can specify a metadata schema when loading metadata alongside a dataset if the `ckanext-metadata-service` plugin is installed. To do this, set `dataset_schema_key` when creating the package. The value of this key depends on the metadata schemas available in your CKAN installation. When you specify this key, CKAN validates the dataset metadata on the backend.

In [13]:
# owner_org = project id
dataset_metadata = {
    'name': 'api-walkthrough',
    'owner_org': organization_id,
    'maintainer_email': 'nick.wunder@nrel.gov',
    'institution': 'National Renewable Energy Laboratory',
    'dataset_schema_key': 'metadata_planets'
}

# https://datahub.h2awsm.org/api/3/action/package_create
response = requests.post(action('package_create'), 
                         headers=post_headers, 
                         data=json.dumps(dataset_metadata))

if 'result' in response.json():
    new_dataset = response.json()['result']
    print(json.dumps(new_dataset, indent=2))
else:
    print(json.dumps(response.json()))
    print('An error occurred while creating the dataset. Did you try to create the dataset twice?')

{
  "license_title": null,
  "maintainer": null,
  "relationships_as_object": [],
  "private": true,
  "maintainer_email": "nick.wunder@nrel.gov",
  "num_tags": 0,
  "planets": [],
  "id": "691b579c-89f5-4e3c-91b0-32058ffc39da",
  "metadata_created": "2021-07-29T15:05:38.066738",
  "metadata_modified": "2021-07-29T15:05:38.066753",
  "author": null,
  "author_email": null,
  "tags": [],
  "state": "active",
  "version": null,
  "license_id": null,
  "type": "dataset",
  "resources": [],
  "num_resources": 0,
  "transneptunian_objects": [],
  "groups": [],
  "creator_user_id": "24163dfa-e3e7-4106-bfb7-c77af3ac1853",
  "relationships_as_subject": [],
  "organization": {
    "description": "API Demo organization.",
    "created": "2021-07-20T14:27:16.067016",
    "title": "API Demo",
    "name": "api-demo",
    "is_organization": true,
    "state": "active",
    "image_url": "",
    "revision_id": "9c1ffefe-52b4-4f86-8676-c5029a8defc2",
    "type": "organization",
    "id": "88c42410-c064
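
To re-run the `package_create` cell, the dataset it created has to be removed first. The sketch here only assembles the arguments for a `package_delete` call so they can be inspected before anything is sent. `package_delete_request` is a hypothetical helper and `'my-api-token'` is a placeholder. Whether a deleted dataset's name can be reused right away may depend on the CKAN version.

```python
import json


def package_delete_request(base_url, api_token, dataset_id):
    """Build keyword arguments for requests.post() to delete a dataset."""
    return {
        'url': '{}/api/3/action/package_delete'.format(base_url.rstrip('/')),
        'headers': {
            'authorization': api_token,
            'content-type': 'application/json;charset=utf-8',
        },
        # package_delete identifies the dataset by its id
        'data': json.dumps({'id': dataset_id}),
    }


kwargs = package_delete_request('http://192.168.200.147:5000',
                                'my-api-token',
                                '691b579c-89f5-4e3c-91b0-32058ffc39da')
print(kwargs['url'])
```

Passing the result on as `requests.post(**kwargs)` would send the request.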

## Resources

### resource_create

https://docs.ckan.org/en/2.8/api/index.html#ckan.logic.action.create.resource_create

#### Single file upload

Replace the value of `file_name` with the path to a local file that you want to make available on the data hub.

In [14]:
file_name = '/Users/akey/Projects/ckan-api-demo/ckan-api-demo/data/charlotte_perkins_gilman_the_yellow_walpaper.txt'

resource_metadata = {
    'package_id': new_dataset['id'],
    'name': 'charlotte_perkins_gilman_the_yellow_walpaper.txt'
}

# Open the file in a with-block so it is closed once the upload finishes
with open(file_name, 'rb') as f:
    response = requests.post(action('resource_create'),
                             data=resource_metadata,
                             headers=get_headers,
                             files=[('upload', f)])

new_resource = response.json()['result']

print(json.dumps(new_resource, indent=2))

{
  "cache_last_updated": null,
  "cache_url": null,
  "mimetype_inner": null,
  "hash": "",
  "description": "",
  "format": "TXT",
  "url": "http://192.168.200.147:5000/dataset/691b579c-89f5-4e3c-91b0-32058ffc39da/resource/1a170705-24f4-40de-859e-e5b51e2ce1ad/download/charlotte_perkins_gilman_the_yellow_walpaper.txt",
  "created": "2021-07-29T16:18:15.939142",
  "state": "active",
  "package_id": "691b579c-89f5-4e3c-91b0-32058ffc39da",
  "last_modified": "2021-07-29T16:18:15.839125",
  "mimetype": "text/plain",
  "url_type": "upload",
  "position": 0,
  "revision_id": "086bd45b-1d06-4efc-b678-4017b05029b1",
  "size": 51185,
  "datastore_active": false,
  "id": "1a170705-24f4-40de-859e-e5b51e2ce1ad",
  "resource_type": null,
  "name": "charlotte_perkins_gilman_the_yellow_walpaper.txt"
}
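
In the cell above the file name is typed twice, once in the path and once in `name`. A small sketch that derives the resource name from the path keeps the two in step. `resource_metadata_for` is a hypothetical helper, and the path and dataset id in the example are made up.

```python
import os


def resource_metadata_for(file_path, package_id, **extra):
    """Build resource_create metadata, naming the resource after the file."""
    metadata = {
        'package_id': package_id,
        'name': os.path.basename(file_path),
    }
    metadata.update(extra)  # optional extra fields, e.g. description
    return metadata


print(resource_metadata_for('/tmp/data/example.txt', 'my-dataset-id',
                            description='An example file.'))
```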


#### Multiple file upload

In [15]:
def upload(file_name, resource_metadata):
    print('Uploading \n{}'.format(json.dumps(resource_metadata, indent=2)))
    # Open the file in a with-block so it is closed once the upload finishes
    with open(file_name, 'rb') as f:
        r = requests.post(action('resource_create'),
                          data=resource_metadata,
                          headers=get_headers,
                          files=[('upload', f)])
    print('Status: {}\n'.format(r.status_code))
    return r.json()

In [16]:
path = '/Users/akey/Projects/ckan-api-demo/ckan-api-demo/data/books'

from os import listdir
file_names = [f for f in listdir(path) if not f.startswith('.')]
print(file_names)

['jane_austen_emma.txt', 'james_joyce_dubliners.txt', 'james_joyce_ulysses.txt', 'joseph_conrad_heart_of_darkness.txt', 'jane_austen_pride_and_prejudice.txt', 'henrik_ibsen_a_dolls_house.txt']


In [17]:
resources_metadata = []

for file_name in file_names:
    resource_metadata = {
        'package_id': new_dataset['id'],
        'name': file_name,
        'data_tool': 'multi-spectra'
    }
    resources_metadata.append(resource_metadata)

new_resources = []

for resource_metadata in resources_metadata:
    new_resource_obj = upload('{}/{}'.format(path, resource_metadata['name']),
                              resource_metadata=resource_metadata)
    new_resources.append(new_resource_obj)

Uploading 
{
  "package_id": "691b579c-89f5-4e3c-91b0-32058ffc39da",
  "name": "jane_austen_emma.txt",
  "data_tool": "multi-spectra"
}
Status: 200

Uploading 
{
  "package_id": "691b579c-89f5-4e3c-91b0-32058ffc39da",
  "name": "james_joyce_dubliners.txt",
  "data_tool": "multi-spectra"
}
Status: 200

Uploading 
{
  "package_id": "691b579c-89f5-4e3c-91b0-32058ffc39da",
  "name": "james_joyce_ulysses.txt",
  "data_tool": "multi-spectra"
}
Status: 200

Uploading 
{
  "package_id": "691b579c-89f5-4e3c-91b0-32058ffc39da",
  "name": "joseph_conrad_heart_of_darkness.txt",
  "data_tool": "multi-spectra"
}
Status: 200

Uploading 
{
  "package_id": "691b579c-89f5-4e3c-91b0-32058ffc39da",
  "name": "jane_austen_pride_and_prejudice.txt",
  "data_tool": "multi-spectra"
}
Status: 200

Uploading 
{
  "package_id": "691b579c-89f5-4e3c-91b0-32058ffc39da",
  "name": "henrik_ibsen_a_dolls_house.txt",
  "data_tool": "multi-spectra"
}
Status: 200
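
The status codes show that each request went through. The response bodies returned by `upload` also carry the `success` flag and the created resource, so the collected `new_resources` list can be split into successes and failures. A sketch follows, with `summarize_uploads` as a hypothetical helper and sample payloads made up for illustration.

```python
def summarize_uploads(responses):
    """Split upload responses into created resource names and errors."""
    created, errors = [], []
    for payload in responses:
        if payload.get('success'):
            created.append(payload['result']['name'])
        else:
            errors.append(payload.get('error'))
    return created, errors


# Made-up payloads shaped like CKAN success and error responses.
sample_responses = [
    {'success': True, 'result': {'name': 'jane_austen_emma.txt'}},
    {'success': False, 'error': {'message': 'Access denied'}},
]

created, errors = summarize_uploads(sample_responses)
print('created:', created)
print('failed:', len(errors))
```

In the notebook, `summarize_uploads(new_resources)` would give the same breakdown for the real uploads.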



### resource_patch

https://docs.ckan.org/en/2.8/api/index.html#ckan.logic.action.patch.resource_patch

# <font color="red">CAUTION!</font> patch versus update

There are two ways to change existing content: **update** and **patch**. You should nearly always use **patch**, because it changes only the attributes you specify and preserves every other value. **Update** is destructive: it sets the values you specify and deletes the ones you do not. That makes patch the safer option.
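
The difference can be modeled locally with plain dictionaries. This is only an analogy for the behavior described above, not a call to CKAN. `simulate_patch` and `simulate_update` are hypothetical names, and the sample record is made up.

```python
def simulate_patch(existing, changes):
    """Patch: start from the stored record and overlay only the changes."""
    merged = dict(existing)
    merged.update(changes)
    return merged


def simulate_update(existing, changes):
    """Update: the submitted fields replace the stored record wholesale.

    `existing` is ignored on purpose; that is the destructive part.
    """
    return dict(changes)


stored = {'id': 'abc', 'name': 'notes.txt', 'description': 'Original text'}
changes = {'id': 'abc', 'description': 'New text'}

print(simulate_patch(stored, changes))   # 'name' is preserved
print(simulate_update(stored, changes))  # 'name' is gone
```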

In [18]:
resource_metadata = {
    'id': new_resource['id'],
    'description': '## The Yellow Walpaper\n### by Charlotte Perkins Gilman\n_A splendid short story._'
}

request = requests.post(action('resource_patch'),
                        data=json.dumps(resource_metadata),
                        headers=post_headers)

modified_resource = request.json()['result']

print(json.dumps(modified_resource, indent=2))

{
  "cache_last_updated": null,
  "cache_url": null,
  "mimetype_inner": null,
  "hash": "",
  "description": "## The Yellow Walpaper\n### by Charlotte Perkins Gilman\n_A splendid short story._",
  "format": "TXT",
  "url": "http://192.168.200.147:5000/dataset/691b579c-89f5-4e3c-91b0-32058ffc39da/resource/1a170705-24f4-40de-859e-e5b51e2ce1ad/download/charlotte_perkins_gilman_the_yellow_walpaper.txt",
  "created": "2021-07-29T16:18:15.939142",
  "state": "active",
  "package_id": "691b579c-89f5-4e3c-91b0-32058ffc39da",
  "last_modified": "2021-07-29T16:18:15.839125",
  "mimetype": "text/plain",
  "url_type": "upload",
  "position": 0,
  "revision_id": "96c39cb8-df33-4728-b624-6462a1ee92ff",
  "size": 51185,
  "datastore_active": false,
  "id": "1a170705-24f4-40de-859e-e5b51e2ce1ad",
  "resource_type": null,
  "name": "charlotte_perkins_gilman_the_yellow_walpaper.txt"
}
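
Comparing the record before and after the patch confirms that only the intended fields moved. A sketch follows, where `changed_keys` is a hypothetical helper and the sample records are abbreviated from the outputs above.

```python
def changed_keys(before, after):
    """Return the sorted keys whose values differ between two records."""
    keys = set(before) | set(after)
    return sorted(k for k in keys if before.get(k) != after.get(k))


# Abbreviated copies of the resource before and after resource_patch.
before = {
    'name': 'charlotte_perkins_gilman_the_yellow_walpaper.txt',
    'description': '',
    'revision_id': '086bd45b-1d06-4efc-b678-4017b05029b1',
}
after = {
    'name': 'charlotte_perkins_gilman_the_yellow_walpaper.txt',
    'description': '## The Yellow Walpaper',
    'revision_id': '96c39cb8-df33-4728-b624-6462a1ee92ff',
}

print(changed_keys(before, after))
```

In the notebook, `changed_keys(new_resource, modified_resource)` would run the same comparison on the full records.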
