# Implicit Georeferencing
This workbook derives explicit georeferences (the `spatial` field of a CKAN dataset) from implicit ones: the names of spatial extents mentioned in dataset titles or keywords.


A file `secret.py` (the module the first cell imports `CKAN` from) needs to contain the CKAN config as follows:

```
CKAN = {
  "dpaw-internal":{
    "url": "http://internal-data.dpaw.wa.gov.au/",
    "key": "API-KEY" 
  }
}
```

## Configure CKAN and source

In [1]:
import json

import ckanapi
import requests
from harvest_helpers import *  # expected to provide upsert_datasets
from secret import CKAN

ckan = ckanapi.RemoteCKAN(CKAN["dpaw-internal"]["url"], apikey=CKAN["dpaw-internal"]["key"])
print("Using CKAN {0}".format(ckan.address))

Using CKAN http://internal-data.dpaw.wa.gov.au/


## Spatial extent name-geometry lookup
The fully qualified names and GeoJSON geometries of relevant spatial areas are contained in our custom dataset schema, as the choices of its `spatial` field.

In [2]:
# Build the extent dictionary e: extent name -> GeoJSON geometry (as a JSON string)
url = "https://raw.githubusercontent.com/datawagovau/ckanext-datawagovautheme/dpaw-internal/ckanext/datawagovautheme/datawagovau_dataset.json"
ds = json.loads(requests.get(url).content)
choice_dict = [x for x in ds["dataset_fields"] if x["field_name"] == "spatial"][0]["choices"]
e = dict([(x["label"], json.dumps(x["value"])) for x in choice_dict])
print("Extents: {0}".format(e.keys()))

Extents: [u'IBRA GVD01 Shield', u'IBRA SWA01 Dandaragan Platea', u'IBRA LSD02 Trainor', u'MPA Jurien Bay', u'IBRA ESP01 Fitzgerald', u'IBRA GVD04 Kintore', u'MPA Shoalwater Islands', u'IBRA OVP02 South Kimberley Interzone', u'IBRA NOK02 Berkeley', u'MPA Rowley Shoals', u'IBRA CER01 Mann-Musgrave Block', u'IBRA COO03 Eastern Goldfield', u'IBRA WAR01 Warren', u'IBRA GID01 Lateritic Plain', u'IBRA MAL02 Western Mallee', u'IBRA AVW02 Katanning', u'IBRA PIL04 Roebourne', u'IBRA GID02 Dune Field', u'IBRA LSD01 Rudall', u'IBRA CAR02 Wooramel', u'IBRA YAL01 Edel', u'MPA Swan Estuary', u'IBRA GVD02 Central', u'IBRA TAN01 Tanami Desert', u'IBRA GSD02 Mackay', u'IBRA NUL01 Carlisle', u'IBRA AVW01 Merredin', u'MPA Walpole Nornalup', u'IBRA COO02 Southern Cross', u'IBRA JAF02 Southern Jarrah Forest', u'IBRA VIB01 Keep', u'MPA Eighty Mile Beach', u'IBRA MAL01 Eastern Mallee', u'IBRA GES02 Lesueur Sandplain', u'IBRA DAL02 Pindanland', u'IBRA GAS02 Carnegie', u'IBRA PIL01 Chichester', u'IBRA GAS03 Aug
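
The lookup built in the cell above can be tried offline. The following is a minimal sketch, assuming the schema keeps the shape used there (a `dataset_fields` list whose `spatial` entry carries `choices` with `label` and `value`). The `sample_schema` data and the helper name `extents_from_schema` are hypothetical and not part of the workbook.

```python
import json

# Hypothetical, heavily shortened stand-in for the downloaded schema JSON.
sample_schema = {
    "dataset_fields": [
        {"field_name": "title"},
        {
            "field_name": "spatial",
            "choices": [
                {"label": "MPA Jurien Bay",
                 "value": {"type": "MultiPolygon", "coordinates": []}},
                {"label": "MPA Rowley Shoals",
                 "value": {"type": "MultiPolygon", "coordinates": []}},
            ],
        },
    ]
}


def extents_from_schema(schema):
    """Return {extent name: GeoJSON geometry string} from a schema dict."""
    spatial = [f for f in schema["dataset_fields"]
               if f["field_name"] == "spatial"]
    if not spatial:
        raise KeyError("schema has no 'spatial' field")
    # Geometries are stored as JSON strings, as in the cell above.
    return {c["label"]: json.dumps(c["value"])
            for c in spatial[0].get("choices", [])}


extents = extents_from_schema(sample_schema)
print(sorted(extents.keys()))
```

Raising early when the `spatial` field is missing gives a clearer error than the `IndexError` that indexing an empty list with `[0]` would produce.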

## Name lookups
Relevant areas appear under different synonyms, such as a place name or a marine park acronym. We'll create a list of dicts, each pairing a search term (key "s") with the name of the matching extent (key "i").

In [55]:

# Create a lookup from search term to extent name
# m is a list of dicts with keys "s" (search term) and "i" (extent name, a key of e)
m = [
    {"s":"Eighty", "i":"MPA Eighty Mile Beach"},
    {"s":"EMBMP", "i":"MPA Eighty Mile Beach"},
    {"s":"Camden", "i":"MPA Lalang-garram / Camden Sound"},
    {"s":"LCSMP", "i":"MPA Lalang-garram / Camden Sound"},
    {"s":"Rowley", "i":"MPA Rowley Shoals"},
    {"s":"RSMP", "i":"MPA Rowley Shoals"},
    {"s":"Montebello", "i":"MPA Montebello Barrow"},
    {"s":"MBIMPA", "i":"MPA Montebello Barrow"},
    {"s":"Ningaloo", "i":"MPA Ningaloo"},
    {"s":"NMP", "i":"MPA Ningaloo"},
    {"s":"Shark bay", "i":"MPA Shark Bay Hamelin Pool"},
    {"s":"SBMP", "i":"MPA Shark Bay Hamelin Pool"},
    {"s":"Jurien", "i":"MPA Jurien Bay"},
    {"s":"JBMP", "i":"MPA Jurien Bay"},
    {"s":"Marmion", "i":"MPA Marmion"},
    {"s":"Swan Estuary", "i":"MPA Swan Estuary"},
    {"s":"SEMP", "i":"MPA Swan Estuary"},
    {"s":"Shoalwater", "i":"MPA Shoalwater Islands"},
    {"s":"SIMP", "i":"MPA Shoalwater Islands"},
    {"s":"Ngari", "i":"MPA Ngari Capes"},
    {"s":"NCMP", "i":"MPA Ngari Capes"},
    {"s":"Walpole", "i":"MPA Walpole Nornalup"},
    {"s":"WNIMP", "i":"MPA Walpole Nornalup"}
]
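
Because every "i" value is later used as a key into the extent dictionary, a misspelt extent name would raise a `KeyError` halfway through an update run. A minimal sketch of a check to run beforehand follows; the helper name `unknown_extent_names` and the sample data are hypothetical.

```python
def unknown_extent_names(search_mapping, extents):
    """Return sorted extent names used in the mapping but absent from extents."""
    return sorted({x["i"] for x in search_mapping if x["i"] not in extents})


# Shortened, hypothetical stand-ins for m and e.
sample_m = [
    {"s": "Jurien", "i": "MPA Jurien Bay"},
    {"s": "JBMP", "i": "MPA Jurien Bay"},
    {"s": "Camden", "i": "MPA Lalang-garram / Camden Sound"},
]
sample_e = {"MPA Jurien Bay": '{"type": "MultiPolygon", "coordinates": []}'}

missing = unknown_extent_names(sample_m, sample_e)
print(missing)
```

An empty result for the real `m` and `e` would mean that every search term points at a known extent.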

In [56]:
def add_spatial(dsdict, extent_string, force=False, debug=False):
    """Adds a given spatial extent to a CKAN dataset dict if 
        "spatial" is missing, None, "" or force==True.
    
    Arguments:
        dsdict (ckanapi.action.package_show()) CKAN dataset dict
        extent_string (String) GeoJSON geometry as json.dumps String
        force (Boolean) Whether to force overwriting "spatial"
        debug (Boolean) Debug noise
    
    Returns:
        (dict) The dataset with spatial extent replaced per above rules.
    """ 
    # Always set msg, so that printing it can never hit an unbound name
    if "spatial" not in dsdict:
        overwrite = True
        msg = "Spatial extent not given"
    elif dsdict["spatial"] in (None, ""):
        overwrite = True
        msg = "Spatial extent is empty"
    elif force:
        overwrite = True
        msg = "Spatial extent was overwritten"
    else:
        overwrite = False
        msg = "Spatial extent unchanged"

    if overwrite:
        dsdict["spatial"] = extent_string

    if debug:
        print(msg)
    return dsdict


def restore_extents(search_mapping, extents, ckan, debug=False):
    """Restore spatial extents for datasets
    
    Arguments:
        search_mapping (list) A list of dicts with keys "s" for ckanapi 
            package_search query parameter "q", and key "i" for the name
            of the extent
            e.g.:
            m = [
                {"s":"tags:marinepark_80_mile_beach", "i":"MPA Eighty Mile Beach"},
                ...
            ]
        extents (dict) A dict with key "i" (extent name) and 
            GeoJSON MultiPolygon geometry strings as value, e.g.:
            {u'MPA Eighty Mile Beach': '{"type": "MultiPolygon", "coordinates": [ .... ]}', ...}
        ckan (ckanapi) A ckanapi instance
        debug (boolean) Debug noise
    Returns:
        None. Matching datasets are updated in CKAN through upsert_datasets.
    """
    for x in search_mapping:
        if debug:
            print("\nSearching CKAN with '{0}'".format(x["s"]))
        found = ckan.action.package_search(q=x["s"])["results"]
        if debug:
            print("Found datasets: {0}\n".format([d["title"] for d in found]))
        fixed = [add_spatial(d, extents[x["i"]], force=True, debug=True) for d in found]
        if debug:
            print(fixed, "\n")
        datasets_updated = upsert_datasets(fixed, ckan, debug=False)
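
Since `restore_extents` overwrites existing extents (`force=True`) and writes to CKAN, it can be useful to preview the planned changes first. The following dry-run sketch makes some assumptions: `FakeCKAN` and `preview_extents` are hypothetical names, and the fake search is a plain case-insensitive substring match on titles, which only approximates CKAN's real `package_search`.

```python
class FakeAction(object):
    """Hypothetical stand-in for ckan.action, for offline testing only."""

    def __init__(self, datasets):
        self._datasets = datasets

    def package_search(self, q):
        # Simplification: substring match on the title, not a real Solr query.
        hits = [d for d in self._datasets if q.lower() in d["title"].lower()]
        return {"results": hits}


class FakeCKAN(object):
    def __init__(self, datasets):
        self.action = FakeAction(datasets)


def preview_extents(search_mapping, extents, ckan):
    """Return (dataset title, extent name) pairs without writing to CKAN."""
    planned = []
    for x in search_mapping:
        if x["i"] not in extents:
            continue  # skip unknown extent names instead of raising KeyError
        for d in ckan.action.package_search(q=x["s"])["results"]:
            planned.append((d["title"], x["i"]))
    return planned


fake = FakeCKAN([
    {"title": "Abundance of Australian Sea Lions at Jurien Bay MP"},
    {"title": "Abundance of indicator sharks and rays at Ningaloo MPA"},
])
plan = preview_extents(
    [{"s": "Jurien", "i": "MPA Jurien Bay"},
     {"s": "Rowley", "i": "MPA Rowley Shoals"}],
    {"MPA Jurien Bay": "{}", "MPA Rowley Shoals": "{}"},
    fake,
)
print(plan)
```

Passing the real `ckan` instance instead of `fake` would run the same preview against live search results, still without updating any dataset.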


In [57]:
restore_extents(m, e, ckan)

Spatial extent was overwritten
Spatial extent was overwritten
Spatial extent was overwritten
Spatial extent was overwritten
Spatial extent was overwritten
Spatial extent was overwritten
Spatial extent was overwritten
Spatial extent was overwritten
Spatial extent not given
Spatial extent not given
Refreshing harvested WMS layer datasets...
[upsert_dataset] Reading WMS layer sandy-shoreline-at-eighty-mile-beach-mp
[upsert_dataset]  Layer exists.
  [upsert_dataset]  Existing dataset metadata were updated.
  [upsert_dataset]  Existing resources were replaced with new resources.
[upsert_dataset] Reading WMS layer turtle-monitoring-at-embmpa
[upsert_dataset]  Layer exists.
  [upsert_dataset]  Existing dataset metadata were updated.
  [upsert_dataset]  Existing resources were replaced with new resources.
[upsert_dataset] Reading WMS layer intertidal-infauna-at-eighty-mile-beach-mp
[upsert_dataset]  Layer exists.
  [upsert_dataset]  Existing dataset metadata were updated.
  [upsert_dataset]  E

In [46]:
d = [ckan.action.package_show(id=x) for x in ckan.action.package_list()]

In [48]:
fix = [x["title"] for x in d if "spatial" not in x]

In [49]:
len(fix)

795
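
With this many datasets lacking a `spatial` key, it may help to estimate how many of them the search terms would cover before running any update. The sketch below matches titles against the "s" terms locally. The helper name `match_titles` and the third sample title are hypothetical, and the substring match is an assumption that will differ from CKAN's own search. Short acronyms such as "NMP" may also match inside unrelated words.

```python
def match_titles(titles, search_mapping):
    """Split titles into {title: [extent names]} and a list of unmatched titles."""
    matched, unmatched = {}, []
    for title in titles:
        hits = sorted({x["i"] for x in search_mapping
                       if x["s"].lower() in title.lower()})
        if hits:
            matched[title] = hits
        else:
            unmatched.append(title)
    return matched, unmatched


sample_titles = [
    "Abundance of intertidal invertebrates at the Marmion MP",
    "Abundance of invertebrates in the Ngari Capes MP",
    "A dataset without any place name",  # hypothetical title
]
sample_terms = [
    {"s": "Marmion", "i": "MPA Marmion"},
    {"s": "Ngari", "i": "MPA Ngari Capes"},
    {"s": "NCMP", "i": "MPA Ngari Capes"},
]

matched, unmatched = match_titles(sample_titles, sample_terms)
print(len(matched), len(unmatched))
```

Titles that stay unmatched are candidates for new synonyms in the search term list, or for manual georeferencing.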

In [53]:
d[0]

{u'author': u'Kathy Murray',
 u'author_email': u'kathy.murray@dpaw.wa.gov.au',
 u'creator_user_id': u'7561256b-5317-4d35-a5d5-f4ff37c24e8e',
 u'extras': [],
 u'groups': [],
 u'id': u'387ad1b2-ffb8-486d-aa10-9bec47ba633e',
 u'isopen': True,
 u'license_id': u'cc-by-sa',
 u'license_title': u'Creative Commons Attribution Share-Alike',
 u'license_url': u'http://www.opendefinition.org/licenses/cc-by-sa',
 u'maintainer': u'Florian Mayer',
 u'maintainer_email': u'florian.mayer@dpaw.wa.gov.au',
 u'metadata_created': u'2015-11-17T03:37:14.378717',
 u'metadata_modified': u'2015-11-17T03:37:14.432860',
 u'name': u'2012-aerial-photography-of-the-ngari-capes-marine-protected-area',
 u'notes': u'These images were created to remove the water glint from the original aerial photography image delivered by Landgate over the Busselton and the Leeuwin tiles captured in 2012. This technique did create some errors which have been masked with a land, intertidal and surf zone mask. In the rare case that errors 

In [54]:
fix

[u'2012 aerial photography of the Ngari Capes Marine Protected Area',
 u'Abundance and community composition of marine invertebrates at the Ningaloo MPA',
 u'Abundance and community composition of marine invertebrates at the Rowley Shoals Marine Park',
 u'Abundance of Highly Targeted finfish in shallow water (<10m) at the Montebello-Barrow Islands MPA',
 u'Abundance of Australian Sea Lion pups at Jurien Bay MP',
 u'Abundance of Australian Sea Lions at Jurien Bay MP',
 u'Abundance of seabirds and shorebirds at Walpole and Nornalup Inlets Marine Park',
 u'Abundance of highly targeted finfish in deep water (10-20m) at the Montebello-Barrow Islands MPA',
 u'Abundance of indicator sharks and rays at Ningaloo MPA',
 u'Abundance of intertidal invertebrates at the Marmion MP',
 u'Abundance of intertidal invertebrates at the Shoalwater Islands MP',
 u'Abundance of invertebrates at the Walpole and Nornalup Inlets MP',
 u'Abundance of invertebrates in the Ngari Capes MP',
 u'Abundance of inverteb