# Harvesting WMS into CKAN
This notebook illustrates harvesting WMS (and WFS) endpoints into a CKAN instance.

## Context
The harvested WMS endpoint belongs to Landgate's Spatial Land Information Program (SLIP). Its layers are authored by partner agencies or by Landgate itself. Each WMS layer may be served by one or several web service endpoints.

### Organisations
From a spreadsheet of agency references, names, and further information, CKAN organisations are initially created and subsequently used as owners of the respective harvested WMS layers.
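
A minimal sketch of turning such a spreadsheet, exported as CSV, into organisation dicts keyed by agency slug. The column names `id`, `name` and `description` and the function `read_org_table` are hypothetical; the real layout of `organisations.csv` and the notebook's `get_org_dict` helper may differ. The output keys `name`, `title` and `description` are standard CKAN organisation fields.

```python
import csv
import io


def read_org_table(csv_text):
    """Map each agency slug to a CKAN organisation dict.

    Assumes (hypothetically) CSV columns named id, name and description.
    """
    orgs = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        slug = row["id"].strip().lower()
        orgs[slug] = {
            "name": slug,                      # CKAN URL slug
            "title": row["name"].strip(),      # human-readable title
            "description": (row.get("description") or "").strip(),
        }
    return orgs


sample = """id,name,description
dfes,Department of Fire & Emergency Services,Example description
pta,Public Transport Authority,
"""
orgs = read_org_table(sample)
print(orgs["dfes"]["title"])
```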

### Topics
The WMS layers are organised by topics, which are created both as CKAN groups and as keywords. Harvested datasets are assigned to the relevant CKAN groups.
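
To illustrate how one topic can end up both as a group membership and as a keyword, here is a sketch of the two CKAN package fields involved, `groups` and `tags`. The topic names and the slug rule in `slugify` are assumptions for illustration, not necessarily what the harvest helpers do.

```python
import re


def slugify(text):
    """Lower-case, replace runs of non-alphanumerics with '-', trim dashes."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def topic_fields(topics):
    """Return the CKAN package fields expressing a layer's topics
    both as group memberships (by group slug) and as keywords (tags)."""
    return {
        "groups": [{"name": slugify(t)} for t in topics],
        "tags": [{"name": t} for t in topics],
    }


fields = topic_fields(["Boundaries", "Land Use"])  # hypothetical topics
print(fields["groups"])  # [{'name': 'boundaries'}, {'name': 'land-use'}]
```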

### Layer names
The WMS layer names contain the layer ID (made up of the agency slug and the layer reference) and the publishing date; these parts are split up during harvesting.
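
The exact naming scheme is not spelled out here, so the sketch below assumes, purely for illustration, names of the form `AGENCY-NNN-YYYYMMDD` (agency slug, three-digit layer reference, publishing date). Both the pattern and `split_layer_name` are hypothetical; names that do not fit the assumed pattern return `None`.

```python
import re
from datetime import date

# Hypothetical pattern: <agency>-<three-digit reference>-<YYYYMMDD>
LAYER_NAME = re.compile(r"^(?P<agency>[A-Za-z]+)-(?P<ref>\d{3})-(?P<date>\d{8})$")


def split_layer_name(name):
    """Split a layer name into layer id, agency slug, reference and date.

    Returns None when the name does not follow the assumed pattern.
    """
    m = LAYER_NAME.match(name)
    if m is None:
        return None
    d = m.group("date")
    agency = m.group("agency").lower()
    return {
        "layer_id": "{0}-{1}".format(agency, m.group("ref")),
        "agency": agency,
        "ref": m.group("ref"),
        "published": date(int(d[:4]), int(d[4:6]), int(d[6:])),
    }


print(split_layer_name("DPAW-001-20150320"))
```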

### Additional resources
Additional web service endpoints, as well as a list of published PDFs with further information, are added as extra resources to the CKAN datasets created from harvested WMS layers.
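
A sketch of the resulting resource list for one dataset, using the standard CKAN resource fields `name`, `url`, `format` and `description`. The helper `build_resources`, its tuple layout and the "Data dictionary" label for PDFs are assumptions for illustration.

```python
def build_resources(layer_name, endpoints, pdf_urls):
    """Build CKAN resource dicts for one dataset (hypothetical helper).

    endpoints: list of (label, service_url, format) tuples, e.g. the public
    WMS plus any further WMS/WFS endpoints serving the same layer.
    pdf_urls: list of URLs of published PDFs describing the layer.
    """
    resources = [
        {
            "name": "{0} ({1})".format(label, fmt),
            "url": url,
            "format": fmt,
            "description": "Layer {0} via {1}".format(layer_name, label),
        }
        for label, url, fmt in endpoints
    ]
    # Each PDF becomes one extra resource
    resources += [
        {"name": "Data dictionary", "url": u, "format": "PDF"} for u in pdf_urls
    ]
    return resources


res = build_resources(
    "forest-blocks",
    [("Public WMS", "https://www2.landgate.wa.gov.au/ows/wmspublic", "WMS")],
    ["https://example.org/forest-blocks.pdf"],  # placeholder URL
)
```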

## CKAN credentials
Sensitive information and related configuration, such as CKAN URLs and credentials, are stored in a separate file.
To use this notebook on your own CKAN instance, write the following contents into a file `secret.py` in the same directory as this notebook:
```
CKAN = {
  "ca":{
    "url": "http://catalogue.alpha.data.wa.gov.au/",
    "key": "your-api-key" 
  },
  "cb":{
    "url": "http://catalogue.beta.data.wa.gov.au/",
    "key": "your-api-key" 
  }
}

SOURCES = {
  "NAME": {
    "proxy": "proxy_url",
    "url": "https://www2.landgate.wa.gov.au/ows/wmspublic"
  },
  ...
}

ARCGIS = {
  "SLIPFUTURE" : {
    "url": "http://services.slip.wa.gov.au/arcgis/rest/services",
    "folders": ["QC", ...]
  },
  ...
}
```
Insert your catalogue names, URLs, and, importantly, your CKAN API keys with write permission.
Next, we import the dictionaries `CKAN` and `SOURCES`.
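
Since the cell below lets you switch between catalogue keys (one of which, `ct`, is not in the template above), a small guard that fails early with a readable message can save a confusing traceback. `select_catalogue` is a hypothetical helper, not part of `ckanapi` or `harvest_helpers`.

```python
def select_catalogue(config, name):
    """Return (url, key) for one entry of a CKAN config dict like the one in secret.py.

    Raises KeyError if the entry is missing, ValueError if it lacks a url or key.
    """
    if name not in config:
        raise KeyError("Catalogue '{0}' not in secret.py; available: {1}".format(
            name, ", ".join(sorted(config))))
    entry = config[name]
    missing = [k for k in ("url", "key") if not entry.get(k)]
    if missing:
        raise ValueError("Catalogue '{0}' lacks: {1}".format(name, ", ".join(missing)))
    return entry["url"], entry["key"]
```

With it, the connection line would read `url, key = select_catalogue(CKAN, "cb")` followed by `ckan = ckanapi.RemoteCKAN(url, apikey=key)`.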

In [2]:
import ckanapi
from harvest_helpers import *
from secret import CKAN, SOURCES

## enable one of (the chosen key must be defined in CKAN in secret.py):
#ckan = ckanapi.RemoteCKAN(CKAN["ct"]["url"], apikey=CKAN["ct"]["key"])
#ckan = ckanapi.RemoteCKAN(CKAN["ca"]["url"], apikey=CKAN["ca"]["key"])
ckan = ckanapi.RemoteCKAN(CKAN["cb"]["url"], apikey=CKAN["cb"]["key"])

print("Using CKAN {0}".format(ckan.address))

Using CKAN http://catalogue.beta.data.wa.gov.au/


### OGC W*S endpoints

In [3]:
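from owslib.wms import WebMapService
from owslib.wfs import WebFeatureService

# Service objects are built from the proxy URLs; the public URLs (*_url) are passed on to get_layer_dict
wmsP = WebMapService(SOURCES["wmspublic"]["proxy"])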
wmsP_url = SOURCES["wmspublic"]["url"]

wmsCM = WebMapService(SOURCES["wmsCsMosaic"]["proxy"])
wmsCM_url = SOURCES["wmsCsMosaic"]["url"]

wmsCC = WebMapService(SOURCES["wmsCsCadastre"]["proxy"])
wmsCC_url = SOURCES["wmsCsCadastre"]["url"]

wfsP = WebFeatureService(SOURCES["wfspublic_4326"]["proxy"])
wfsP_url = SOURCES["wfspublic_4326"]["url"]

wfsCA = WebFeatureService(SOURCES["wfsCsAdmin_4283"]["proxy"])
wfsCA_url = SOURCES["wfsCsAdmin_4283"]["url"]

wfsCC = WebFeatureService(SOURCES["wfsCsCadastre_4283"]["proxy"])
wfsCC_url = SOURCES["wfsCsCadastre_4283"]["url"]

#wfsCT = WebFeatureService(SOURCES["wfsCsTopo_4283"]["proxy"])
#wfsCT_url = SOURCES["wfsCsTopo_4283"]["url"]
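
The service objects are created from the proxy URLs, while the public URLs in the `*_url` variables are what `get_layer_dict` later receives. OWSLib parses each GetCapabilities document and exposes the layers in the `contents` dict, keyed by layer name. To show what that document provides, here is a stdlib-only sketch (not OWSLib's implementation) that lists the named layers and their titles in a WMS 1.1.1 or 1.3.0 capabilities document; the sample XML is made up.

```python
import xml.etree.ElementTree as ET


def _local(tag):
    """Strip an XML namespace, e.g. '{http://www.opengis.net/wms}Layer' -> 'Layer'."""
    return tag.rsplit("}", 1)[-1]


def named_layers(capabilities_xml):
    """Return {layer name: title} for every Layer element that has a Name.

    Works for WMS 1.1.1 (no namespace) and 1.3.0 (wms namespace) documents.
    """
    root = ET.fromstring(capabilities_xml)
    layers = {}
    for el in root.iter():
        if _local(el.tag) != "Layer":
            continue
        name = title = None
        for child in el:  # only direct children describe this layer
            if _local(child.tag) == "Name":
                name = (child.text or "").strip()
            elif _local(child.tag) == "Title":
                title = (child.text or "").strip()
        if name:  # unnamed container layers are skipped
            layers[name] = title
    return layers


sample = """<WMS_Capabilities xmlns="http://www.opengis.net/wms" version="1.3.0">
  <Capability>
    <Layer><Title>Root</Title>
      <Layer><Name>forest-blocks</Name><Title>Forest Blocks</Title></Layer>
    </Layer>
  </Capability>
</WMS_Capabilities>"""
print(named_layers(sample))
```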

### Additional Lookups

In [4]:
pdfs = get_pdf_dict("data-dictionaries.csv")
org_dict = get_org_dict("organisations.csv")
group_dict = get_group_dict(wmsP)

[get_pdf_dict] Reading data-dictionaries.csv...
[get_pdf_dict] Done.
[get_org_dict] Reading organisations.csv...
[get_org_dict] Done.
[get_group_dict] Reading wms...
[get_group_dict] Done.


### Create Organisations and Groups
The next step will create or update CKAN organisations from `organisations.csv`, and CKAN groups from WMS topics.

In [5]:
orgs = upsert_orgs(org_dict, ckan, debug=False)
groups = upsert_groups(group_dict, ckan, debug=False)

[upsert_orgs] Refreshing orgs...
[upsert_org] Upserting organisation Department of Fire & Emergency Services, id dfes
[upsert_org]   Organisation exists, updating...
[upsert_org]   Updated Department of Fire & Emergency Services
[upsert_org] Upserting organisation Department of Education and Training, id det
[upsert_org]   Organisation exists, updating...
[upsert_org]   Updated Department of Education and Training
[upsert_org] Upserting organisation Public Transport Authority, id pta
[upsert_org]   Organisation exists, updating...
[upsert_org]   Updated Public Transport Authority
[upsert_org] Upserting organisation World Wildlife Fund for Nature, id wwf
[upsert_org]   Organisation exists, updating...
[upsert_org]   Updated World Wildlife Fund for Nature
[upsert_org] Upserting organisation Department of Parks and Wildlife, id dpaw
[upsert_org]   Organisation exists, updating...
[upsert_org]   Updated Department of Parks and Wildlife
[upsert_org] Upserting organisation Geoscience Austral
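
The log above follows a common upsert pattern: look the organisation up, then update it if it exists or create it otherwise. Below is a sketch of that pattern with `ckanapi`, whose `NotFound` exception is raised when `organization_show` finds nothing; `upsert_org_sketch` is hypothetical and not the notebook's `upsert_orgs`. Note that CKAN's `organization_update` may drop fields that are not supplied; on CKAN versions that provide `organization_patch`, that call changes only the given fields.

```python
try:
    from ckanapi import NotFound
except ImportError:  # lets the sketch run against a stub without ckanapi installed
    class NotFound(Exception):
        """Stand-in for ckanapi.NotFound."""


def upsert_org_sketch(ckan, org):
    """Create the organisation org["name"], or update it if it already exists.

    ckan is a ckanapi.RemoteCKAN (or anything exposing the same .action calls);
    org is a dict of CKAN organisation fields without an "id" key.
    """
    try:
        existing = ckan.action.organization_show(id=org["name"])
    except NotFound:
        print("Creating organisation {0}".format(org["title"]))
        return ckan.action.organization_create(**org)
    print("Organisation exists, updating {0}".format(org["title"]))
    return ckan.action.organization_update(id=existing["id"], **org)
```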

### Prepare data
The following step prepares a dictionary of dataset metadata, ready to be inserted into CKAN.
It parses the WMS and WFS endpoints and looks up the dictionaries `orgs`, `groups`, and `pdfs`.

This step runs quickly, as it only works with in-memory dictionaries of WMS/WFS layers, organisations and groups (name and id for both), and PDFs (name, id, URL). No API calls to either CKAN or the web services are involved.

In [8]:
l_wmsP = get_layer_dict(wmsP, wmsP_url, ckan, orgs, groups, pdfs, res_format="WMS", debug=False)
l_wmsCC = get_layer_dict(wmsCC, wmsCC_url, ckan, orgs, groups, pdfs, res_format="WMS", debug=False)
l_wmsCM = get_layer_dict(wmsCM, wmsCM_url, ckan, orgs, groups, pdfs, res_format="WMS", debug=False)
l_wfsP = get_layer_dict(wfsP, wfsP_url, ckan, orgs, groups, pdfs, res_format="WFS", debug=False)
l_wfsCA = get_layer_dict(wfsCA, wfsCA_url, ckan, orgs, groups, pdfs, res_format="WFS", debug=False)
l_wfsCC = get_layer_dict(wfsCC, wfsCC_url, ckan, orgs, groups, pdfs, res_format="WFS", debug=False)

[wms_to_dict] No dataset name found, skipping
[wms_to_dict] No dataset name found, skipping
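
What `get_layer_dict` returns is not shown here. As a rough sketch under assumed input shapes, this is how one layer could be turned into a CKAN package dict by resolving its agency against `orgs` and its topics against `groups` (both keyed by slug, holding name and id). `layer_to_package` and the layer keys are hypothetical; layers without a known organisation are skipped by returning `None`.

```python
def layer_to_package(layer, orgs, groups):
    """Assemble a CKAN package dict for one layer (hypothetical input shape).

    layer: {"name", "title", "abstract", "agency", "topics", "resources"}
    orgs, groups: slug -> {"name": ..., "id": ...}
    """
    org = orgs.get(layer["agency"])
    if org is None:
        return None  # no owning organisation known: skip this layer
    return {
        "name": layer["name"],
        "title": layer["title"],
        "notes": layer.get("abstract", ""),
        "owner_org": org["id"],
        # only topics that exist as CKAN groups are linked
        "groups": [{"id": groups[t]["id"]} for t in layer["topics"] if t in groups],
        "resources": list(layer.get("resources", [])),
    }
```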


### Delete old datasets
Note: With great power comes great responsibility. Execute the next cell with care and at your own risk. The deletion itself is commented out; uncomment it only after reviewing `kill_list`.

In [None]:
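import re
# Datasets whose name still ends in the old SLIP layer id slug ("-" plus three digits)
kill_list = [n for n in ckan.action.package_list() if re.match(r".*-[0-9]{3}$", n)]
print("Found {0} obsolete datasets".format(len(kill_list)))
# Uncomment to delete them, after reviewing kill_list:
#killed = [ckan.action.package_delete(id=n) for n in kill_list]
#print("Killed {0} obsolete datasets".format(len(killed)))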
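
Because the pattern is broad, it is worth checking what it matches before enabling the deletion: any name ending in a hyphen followed by exactly three digits is caught, including a current dataset that happens to end that way. The sample names below are made up.

```python
import re

# Equivalent to the kill-list pattern: name ends in "-" followed by exactly three digits
OLD_SLUG = re.compile(r".*-[0-9]{3}$")

names = ["dpaw-001", "forest-blocks", "lga-2015", "native-title-determination"]
obsolete = [n for n in names if OLD_SLUG.match(n)]
print(obsolete)  # ['dpaw-001']
```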

### Update datasets in CKAN
First pass: add the public WMS layers, overwriting the metadata of existing datasets and dropping any existing resources.

In [18]:
p_wmsP = upsert_datasets(l_wmsP, ckan, overwrite_metadata=True, drop_existing_resources=True)
print("{0} datasets created or updated from {1} Public WMS layers".format(len(p_wmsP), len(wmsP.contents)))

Refreshing harvested WMS layer datasets...
[upsert_dataset] Reading WMS layer intensive-land-use-zones
[upsert_dataset]  Layer exists.
  [upsert_dataset]  Existing dataset metadata were updated.
  [upsert_dataset]  Existing resources were replaced with new resources.
[upsert_dataset] Reading WMS layer geomorphic-wetlands-swan-coastal-plain
[upsert_dataset]  Layer exists.
  [upsert_dataset]  Existing dataset metadata were updated.
  [upsert_dataset]  Existing resources were replaced with new resources.
[upsert_dataset] Reading WMS layer forest-disease-risk-areas
[upsert_dataset]  Layer exists.
  [upsert_dataset]  Existing dataset metadata were updated.
  [upsert_dataset]  Existing resources were replaced with new resources.
[upsert_dataset] Reading WMS layer threatened-ecological-communities
[upsert_dataset]  Layer exists.
  [upsert_dataset]  Existing dataset metadata were updated.
  [upsert_dataset]  Existing resources were replaced with new resources.
[upsert_dataset] Reading WMS laye
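
The two switches of `upsert_datasets` can be read as a merge rule between the existing CKAN package and the freshly harvested one. The sketch below is a guess at those semantics based on the log messages, not the helper's code; de-duplicating kept resources by URL is an added assumption so that re-runs do not pile up copies.

```python
def merge_package(existing, harvested, overwrite_metadata, drop_existing_resources):
    """Combine an existing CKAN package dict with freshly harvested values.

    overwrite_metadata: take metadata from the harvest instead of keeping it.
    drop_existing_resources: replace resources instead of appending new ones.
    """
    merged = dict(harvested) if overwrite_metadata else dict(existing)
    new_resources = harvested.get("resources", [])
    if drop_existing_resources:
        merged["resources"] = list(new_resources)
    else:
        # keep old resources and append only those with a URL not seen before
        known = {r["url"] for r in existing.get("resources", [])}
        merged["resources"] = existing.get("resources", []) + [
            r for r in new_resources if r["url"] not in known]
    return merged
```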

Second pass: add the public WFS layers, but keep the metadata and resources of existing datasets. The remaining sources are harvested in the same mode.

In [9]:
p_wfs = upsert_datasets(l_wfsP, ckan, overwrite_metadata=False, drop_existing_resources=False)
print("{0} datasets created or updated from {1} public WFS layers".format(len(p_wfs), len(wfsP.contents)))

Refreshing harvested WMS layer datasets...
[upsert_dataset] Reading WMS layer swan-river-trust-development-control-area
[upsert_dataset]  Layer exists.
  [upsert_dataset]  Existing dataset metadata were not changed.
  [upsert_dataset]  Existing resources were kept, new resources were added.
[upsert_dataset] Reading WMS layer geomorphic-wetlands-swan-coastal-plain
[upsert_dataset]  Layer exists.
  [upsert_dataset]  Existing dataset metadata were not changed.
  [upsert_dataset]  Existing resources were kept, new resources were added.
[upsert_dataset] Reading WMS layer forest-blocks
[upsert_dataset]  Layer exists.
  [upsert_dataset]  Existing dataset metadata were not changed.
  [upsert_dataset]  Existing resources were kept, new resources were added.
[upsert_dataset] Reading WMS layer native-title-determination
[upsert_dataset]  Layer exists.
  [upsert_dataset]  Existing dataset metadata were not changed.
  [upsert_dataset]  Existing resources were kept, new resources were added.
[upsert

In [19]:
p_wmsCC = upsert_datasets(l_wmsCC, ckan, overwrite_metadata=False, drop_existing_resources=False, debug=False)
print("{0} datasets created or updated from {1} Cadastre WMS layers".format(len(p_wmsCC), len(wmsCC.contents)))

Refreshing harvested WMS layer datasets...
[upsert_dataset] Reading WMS layer lodged-cadastre-polygons
[upsert_dataset]  Layer exists.
  [upsert_dataset]  Existing dataset metadata were not changed.
  [upsert_dataset]  Existing resources were kept, new resources were added.
[upsert_dataset] Reading WMS layer lodged-cadastre-lines
[upsert_dataset]  Layer exists.
  [upsert_dataset]  Existing dataset metadata were not changed.
  [upsert_dataset]  Existing resources were kept, new resources were added.
[upsert_dataset] Reading WMS layer lodged-cadastre-points
[upsert_dataset]  Layer exists.
  [upsert_dataset]  Existing dataset metadata were not changed.
  [upsert_dataset]  Existing resources were kept, new resources were added.
[upsert_dataset] Reading WMS layer easements-polygons
[upsert_dataset]  Layer exists.
  [upsert_dataset]  Existing dataset metadata were not changed.
  [upsert_dataset]  Existing resources were kept, new resources were added.
[upsert_dataset] Reading WMS layer easem

In [20]:
p_wfsCC = upsert_datasets(l_wfsCC, ckan, overwrite_metadata=False, drop_existing_resources=False)
print("{0} datasets created or updated from {1} Cadastre WFS layers".format(len(p_wfsCC), len(wfsCC.contents)))

Refreshing harvested WMS layer datasets...
[upsert_dataset] Reading WMS layer cadastre-no-attrib
[upsert_dataset]  Layer exists.
  [upsert_dataset]  Existing dataset metadata were not changed.
  [upsert_dataset]  Existing resources were kept, new resources were added.
[upsert_dataset] Reading WMS layer cadastre-address
[upsert_dataset]  Layer exists.
  [upsert_dataset]  Existing dataset metadata were not changed.
  [upsert_dataset]  Existing resources were kept, new resources were added.
[upsert_dataset] Reading WMS layer cadastre-lodged
[upsert_dataset]  Layer exists.
  [upsert_dataset]  Existing dataset metadata were not changed.
  [upsert_dataset]  Existing resources were kept, new resources were added.
[upsert_dataset] Reading WMS layer easements-points
[upsert_dataset]  Layer exists.
  [upsert_dataset]  Existing dataset metadata were not changed.
  [upsert_dataset]  Existing resources were kept, new resources were added.
[upsert_dataset] Reading WMS layer easement-lines
[upsert_da

In [21]:
p_wfsCA = upsert_datasets(l_wfsCA, ckan, overwrite_metadata=False, drop_existing_resources=False)
print("{0} datasets created or updated from {1} Cadastre Admin WFS layers".format(len(p_wfsCA), len(wfsCA.contents)))

Refreshing harvested WMS layer datasets...
[upsert_dataset] Reading WMS layer localities
[upsert_dataset]  Layer exists.
  [upsert_dataset]  Existing dataset metadata were not changed.
  [upsert_dataset]  Existing resources were kept, new resources were added.
[upsert_dataset] Reading WMS layer districts
[upsert_dataset]  Layer exists.
  [upsert_dataset]  Existing dataset metadata were not changed.
  [upsert_dataset]  Existing resources were kept, new resources were added.
[upsert_dataset] Reading WMS layer mapsheets
[upsert_dataset]  Layer exists.
  [upsert_dataset]  Existing dataset metadata were not changed.
  [upsert_dataset]  Existing resources were kept, new resources were added.
[upsert_dataset] Reading WMS layer local-government-authority-lga-boundaries
[upsert_dataset]  Layer exists.
  [upsert_dataset]  Existing dataset metadata were not changed.
  [upsert_dataset]  Existing resources were kept, new resources were added.
[upsert_dataset] Reading WMS layer townsites
[upsert_dat
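
The remaining-source cells differ only in their inputs, so they could be folded into one loop. `harvest_in_keep_mode` is a hypothetical wrapper; it calls whatever upsert function it is given with the keyword arguments used above and prints the same kind of summary line.

```python
def harvest_in_keep_mode(batches, upsert, ckan):
    """Upsert several prepared layer dicts, keeping existing metadata and resources.

    batches: list of (label, layer_dict, service) tuples; service only needs a
    .contents mapping, as the OWSLib service objects above provide.
    upsert: a function with the call signature used above, e.g. upsert_datasets.
    """
    results = {}
    for label, layers, service in batches:
        done = upsert(layers, ckan, overwrite_metadata=False,
                      drop_existing_resources=False)
        print("{0} datasets created or updated from {1} {2} layers".format(
            len(done), len(service.contents), label))
        results[label] = done
    return results
```

In this notebook that would be a call such as `harvest_in_keep_mode([("public WFS", l_wfsP, wfsP), ("Cadastre WMS", l_wmsCC, wmsCC), ("Cadastre WFS", l_wfsCC, wfsCC), ("Cadastre Admin WFS", l_wfsCA, wfsCA)], upsert_datasets, ckan)`. The Mosaic WMS layers prepared in `l_wmsCM` are not upserted in the cells shown and could be added to the list the same way.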