# Goose Catalog Python Client Tutorial

In [1]:
from dgcatalog import Stac
from pprint import pprint

All interaction with the catalog is done using a `Stac` object.

For production use you do not need to specify the `url` parameter, because the default catalog is used.  Use the `url` parameter to point to test and development catalogs.

There are two ways to specify GBDX credentials when constructing a `Stac` object.  If you already have a GBDX token you can provide it to the `Stac` constructor using the `token` parameter.  Or you can use the `username` and `password` parameters to specify GBDX credentials.  In this case the constructor calls GBDX to generate a token.  If the password is omitted then the constructor will prompt you for it.

If `verbose` is True then `Stac` methods will print brief messages and web requests and responses to stdout.

The `Stac` object does not handle token expiration.  If you use a `Stac` object long enough that its token expires then you must create a new `Stac` object.
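
One way to cope with token expiration is to wrap construction in a small helper that builds a fresh object once the current one passes a chosen age.  The sketch below is not part of `dgcatalog`: `ExpiringClient`, `factory`, and `max_age_seconds` are hypothetical names, and the real token lifetime is not stated in this tutorial, so the age limit is an assumption you would have to choose yourself.

```python
import time


class ExpiringClient:
    """Rebuild a client object once it is older than max_age_seconds.

    Hypothetical helper, not part of dgcatalog.
    """

    def __init__(self, factory, max_age_seconds, clock=time.monotonic):
        self._factory = factory          # zero-argument callable that builds a client
        self._max_age = max_age_seconds  # assumed limit; choose one below the real token lifetime
        self._clock = clock              # injectable so the helper can be tested without waiting
        self._client = None
        self._created = None

    def get(self):
        """Return the current client, building a new one if it is missing or too old."""
        now = self._clock()
        if self._client is None or now - self._created >= self._max_age:
            self._client = self._factory()
            self._created = now
        return self._client
```

With the client in this tutorial, `factory` could be something like `lambda: Stac(url=service_url, username=..., password=...)`.  Calling `get()` before each batch of requests then returns either the existing object or a newly constructed one with a new token.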

Use one of the following service URLs depending on the environment:

In [2]:
# service_url = 'https://api-test-2.discover.digitalglobe.com/v2/stac'
service_url = 'https://api-dev-2.discover.digitalglobe.com/v2/stac'

In [3]:
stac = Stac(url=service_url, username='super_tester@mailinator.com', verbose=True)

Password:  ············


Requesting token from https://geobigdata.io/auth/v1/oauth/token
Token successfully received.


## Working with catalogs

Every catalog has an associated JSON schema used to validate STAC items when they are added to the catalog.
Associating a JSON schema with a catalog in this way is a DigitalGlobe extension to the STAC specification.

When STAC items are inserted into a catalog they are also validated against a basic STAC item JSON schema,
which verifies they are valid GeoJSON and have the minimum required STAC properties (like `datetime`).  So regardless
of what JSON schema is associated with a catalog this additional validation is always performed.
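
To get a feel for what this basic validation involves, here is a rough local pre-check.  It is only a sketch: the service's actual base schema is not reproduced in this tutorial, so the specific checks below (a `Feature` type, an `id`, a `geometry`, and a `datetime` property) are assumptions based on the description above, and `basic_item_problems` is a hypothetical name.

```python
def basic_item_problems(item):
    """Return a list of problems found by a rough local pre-check (empty if none).

    Hypothetical helper; the checks are assumptions, not the catalog's real schema.
    """
    problems = []
    if item.get('type') != 'Feature':
        problems.append("'type' must be 'Feature'")
    if 'id' not in item:
        problems.append("missing 'id'")
    if 'geometry' not in item:
        problems.append("missing 'geometry'")
    properties = item.get('properties')
    if not isinstance(properties, dict):
        problems.append("'properties' must be an object")
    elif 'datetime' not in properties:
        problems.append("missing 'properties.datetime'")
    return problems
```

A check like this can catch obvious mistakes before a round trip to the service, but the server-side validation remains the authority.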

For this tutorial we simply use the GeoJSON Feature schema.  Since every STAC item is a GeoJSON feature this is suitable for demo purposes.  Later there will be STAC JSON schemas that are more suitable for validating STAC items.

In [4]:
import json
import requests
schema = json.loads(requests.get('http://geojson.org/schema/Feature.json').text)

In [5]:
catalog = {
    'stac_version': '0.6.0',
    'id': 'wv',
    'title': 'DigitalGlobe WV',
    'description': 'DigitalGlobe WV images',
    'links': [
        {
            'rel': 'self',
            'href': 'https://api.discover.digitalglobe.com/v2/stac/catalog/wv04'
        }
    ],
    'stac_item_schema': schema
}
stac.insert_catalog(catalog)

POST: https://api-dev-2.discover.digitalglobe.com/v2/stac/catalog
HTTP Status: 400
Request ID: 66920c65-0180-48c8-9233-18b75eeaf1eb


StacException: A catalog with the ID "wv" already exists (Request ID: 66920c65-0180-48c8-9233-18b75eeaf1eb)

In [6]:
catalog = stac.get_catalog('wv')

Get catalog catalog_id=wv
GET: https://api-dev-2.discover.digitalglobe.com/v2/stac/catalog/wv
HTTP Status: 200
Request ID: f9494d04-7d58-465e-876c-b3f088fca110


In [7]:
pprint(catalog, depth=1)

{'description': 'DigitalGlobe WV images',
 'id': 'wv',
 'links': [...],
 'stac_item_schema': {...},
 'stac_version': '0.6.0',
 'title': 'DigitalGlobe WV'}


Catalogs can be updated.  A catalog's ID cannot be changed but its other properties can, including its schema.
Note that if a catalog's schema is modified, existing items in the catalog are not revalidated against the new schema.

In [8]:
catalog['description'] = 'DigitalGlobe WorldView 4 images'
stac.update_catalog(catalog)

PUT: https://api-dev-2.discover.digitalglobe.com/v2/stac/catalog/wv
HTTP Status: 204
Request ID: 54fa2041-ee69-4bb2-b920-88672b1ad330


## Working with STAC items

For this tutorial we will copy a few catalog records from the DUC database to the Goose database.  We will
use the `duc_get_image` function in the `dgcatalog.tools` module.  It reads an image's catalog
metadata from the DUC catalog service and returns it as a STAC item.

In [9]:
from dgcatalog.tools import duc_get_image, duc_query

In [10]:
item = duc_get_image(image_id='10400100108FCE00')

In [11]:
pprint(item, depth=1)

{'assets': {...},
 'geometry': {...},
 'id': '10400100108FCE00',
 'links': [...],
 'properties': {...},
 'type': 'Feature'}


Inserting a new item into a catalog:

Use `head_item` to perform an HTTP HEAD operation and determine whether a STAC item exists.

In [None]:
stac.head_item()

In [None]:
stac.insert_item(item, 'wv')

In [None]:
item = stac.get_item('10400100108FCE00')

In [None]:
image_ids = ["10200100782DBA00", "103001008817BC00", "102001007FB83A00", "102001007D14CD00", "102001007C528D00"]
items = duc_get_image(image_ids=image_ids)
stac.insert_items(items, 'wv')

Let's create some test data to search on.  Select a month's worth of WV04 images from DUC and insert them into the Goose "wv" catalog:

In [None]:
items = duc_query("collect_time_start >= '2017-01-01' and collect_time_start <= '2017-02-01' and vehicle_name = 'WV04'")

In [None]:
len(items)

In [None]:
stac.insert_items(items, 'wv')

## Working with item attachments

The catalog supports associating an arbitrary JSON object, called its "attachments," with each STAC item.  Properties in the attachments associate metadata with a STAC item that is not included in the item's feature itself.

Some attachment properties may be recognized by the catalog itself.  For now the only such property is "data-access-profile".

When inserting multiple STAC items using a feature collection you can specify an attachments property that is copied to each newly inserted item.

In [9]:
image_ids = ['10500100144DD900', '102001008164D600', '1020010080207100']
items = duc_get_image(image_ids=image_ids)

attachments = {
    'data-access-profile': {
        'policies': [
            {
                'startDate': '2019-02-01T00:00:00Z',
                'endDate': '2019-03-01T00:00:00Z',
                'allow': ['customer.001'],
                'deny': []
            },
            {
                'startDate': '2019-03-01T00:00:00Z',
                'endDate': '9999-12-31T23:59:59Z',
                'allow': ['dataaccess.public'],
                'deny': []
            }
        ]
    }
}

stac.insert_items(items, 'wv', attachments)

POST: https://api-dev-2.discover.digitalglobe.com/v2/stac/catalog/wv/item
HTTP Status: 201
Request ID: bc458039-5854-4b73-9f53-1742a714e8ce


Get the attachments for one of the images just inserted.

In [10]:
att = stac.get_attachments('10500100144DD900', 'wv')

GET: https://api-dev-2.discover.digitalglobe.com/v2/stac/catalog/wv/item/10500100144DD900/attachments
HTTP Status: 200
Request ID: 6cda2f03-2a30-4b2d-99d0-e5a1d7221bca


In [11]:
att

{'data-access-profile': {'policies': [{'deny': [],
    'allow': ['customer.001'],
    'endDate': '2019-03-01T00:00:00Z',
    'startDate': '2019-02-01T00:00:00Z'},
   {'deny': [],
    'allow': ['dataaccess.public'],
    'endDate': '9999-12-31T23:59:59Z',
    'startDate': '2019-03-01T00:00:00Z'}]}}

You can update an item's attachments.  Here we remove the first policy in the array and then save the modified attachments back to the item:

In [12]:
att['data-access-profile']['policies'].pop(0)

{'deny': [],
 'allow': ['customer.001'],
 'endDate': '2019-03-01T00:00:00Z',
 'startDate': '2019-02-01T00:00:00Z'}

In [13]:
stac.update_attachments('10500100144DD900', 'wv', att)

PUT: https://api-dev-2.discover.digitalglobe.com/v2/stac/catalog/wv/item/10500100144DD900/attachments
HTTP Status: 204
Request ID: 39b86b1f-440f-4647-9fc6-e55aa968731c


Read it back to make sure it was updated.

In [14]:
att = stac.get_attachments('10500100144DD900', 'wv')

GET: https://api-dev-2.discover.digitalglobe.com/v2/stac/catalog/wv/item/10500100144DD900/attachments
HTTP Status: 200
Request ID: 50ea8e87-a0f8-4257-a3d9-c6fb08dab14a


In [15]:
att

{'data-access-profile': {'policies': [{'deny': [],
    'allow': ['dataaccess.public'],
    'endDate': '9999-12-31T23:59:59Z',
    'startDate': '2019-03-01T00:00:00Z'}]}}

Each STAC item has its own attachments.  When we inserted the three images above and
included attachments with them the attachments were copied for each STAC item.  Read the
attachments for another of the images to see that its attachments are unchanged.

In [16]:
att = stac.get_attachments('102001008164D600', 'wv')

GET: https://api-dev-2.discover.digitalglobe.com/v2/stac/catalog/wv/item/102001008164D600/attachments
HTTP Status: 200
Request ID: 9cababca-cb0b-4fd7-89df-3e00b784bcbd


In [17]:
att

{'data-access-profile': {'policies': [{'deny': [],
    'allow': ['customer.001'],
    'endDate': '2019-03-01T00:00:00Z',
    'startDate': '2019-02-01T00:00:00Z'},
   {'deny': [],
    'allow': ['dataaccess.public'],
    'endDate': '9999-12-31T23:59:59Z',
    'startDate': '2019-03-01T00:00:00Z'}]}}

In [18]:
stac.delete_attachments('102001008164D600', 'wv')

DELETE: https://api-dev-2.discover.digitalglobe.com/v2/stac/catalog/wv/item/102001008164D600/attachments
HTTP Status: 204
Request ID: c3ef6b3a-c06e-410b-bdaf-2d209ea0a3df


## Searching catalogs

Some notes on searching:
    
* A search may be performed against the entire database or against a particular catalog.
Specify the `catalog_id` parameter in the call to `search` to search against a particular catalog, otherwise the search is against the entire database.
* The maximum number of items returned by any search is 1000.  For larger result sets use multiple calls to `search` with paging and ordering to retrieve the full set of results.
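
The paging and ordering parameters of `search` are not shown in this tutorial, so the sketch below uses a different technique: it splits a date range into smaller windows and calls `search` once per window, using only the `start_datetime` and `end_datetime` parameters.  Because datetime searches treat the start as inclusive and the end as exclusive, adjacent windows do not overlap.  `search_by_windows` is a hypothetical helper, and it assumes that `search` returns a list-like result.

```python
from datetime import datetime, timedelta


def search_by_windows(search, start, end, step, limit=1000, **kwargs):
    """Collect items by calling search(...) once per [window_start, window_end) slice.

    Hypothetical helper; `search` is any callable that accepts start_datetime and
    end_datetime keyword arguments and returns a list-like result.
    """
    items = []
    window_start = start
    while window_start < end:
        window_end = min(window_start + step, end)
        batch = search(start_datetime=window_start, end_datetime=window_end, **kwargs)
        if len(batch) >= limit:
            # A full batch may have been truncated by the server-side cap.
            raise RuntimeError('window %s to %s hit the result limit; use a smaller step'
                               % (window_start, window_end))
        items.extend(batch)
        window_start = window_end
    return items
```

For example, `search_by_windows(stac.search, datetime(2017, 1, 1), datetime(2017, 2, 1), timedelta(days=1), catalog_id='wv')` would issue one request per day.  The helper raises rather than silently returning a window that may be incomplete.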

### Searching by date and time
Note that when searching by datetime the start is inclusive and the end is exclusive.

In [None]:
from datetime import datetime
items = stac.search(catalog_id='wv', start_datetime=datetime(2017, 1, 1), end_datetime=datetime(2017, 1, 2))

In [None]:
stac._last_response.text

### Searching with a property filter

Use the `query` parameter to filter results.  Filtering is performed on the server side against properties in the STAC item's "properties" dictionary.

The general form of a query filter is "property operation value".

* "property" is the name of a STAC item property
* "operation" is one of the following:
    * Comparison operators:  =, !=, <>, <, >, <=, >=
    * "is" and "is not" for comparing with booleans and null.
    * "like" and "not like" for comparing strings with SQL patterns.
    * "in" for comparing with a list of integers or strings.
* "value" is a number, string, boolean, or null.
    * Exponential notation for floating-point values is not supported.
    * Strings are delimited by single quotes.  There is no facility for escaping single quotes inside the string or for any other escape sequences.
    * A boolean value is "true" or "false", specified without quotes, and case-insensitive.
    * A null value is "null", specified without quotes, and case-insensitive.

Filters can be combined using the operators "and" and "or".  The "and" operator takes precedence over the "or" operator.  Parentheses can be used when combining filters with "and" and "or".

String comparisons are case-sensitive.

These are examples of valid query filters:

* vendor = 'DigitalGlobe'
* eo:cloud_cover < 20
* dg:rda_available is true
* dg:rda_available is false
* eo:gsd < 1.5
* eo:epsg is null
* eo:epsg in (32613, 26913, 26914)
* dg:sun_elevation_min < 20 and dg:sun_azimuth_max < 30
* (vendor = 'DigitalGlobe' and eo:platform = 'WORLDVIEW02') or (vendor = 'KOMPSAT' and eo:platform = 'KOMPSAT3A')
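
Since filters are plain strings, it can help to build them from Python values so that the literal rules above are applied consistently.  The following is a minimal sketch, not part of `dgcatalog`; `format_value` and `make_filter` are hypothetical names.  It rejects strings containing single quotes (which cannot be escaped) and renders floats without exponential notation.

```python
from decimal import Decimal


def format_value(value):
    """Render a Python value as a query-filter literal."""
    if value is None:
        return 'null'
    if isinstance(value, bool):  # check before int: bool is a subclass of int
        return 'true' if value else 'false'
    if isinstance(value, int):
        return str(value)
    if isinstance(value, float):
        # Exponential notation is not supported, so force fixed-point output.
        return format(Decimal(repr(value)), 'f')
    if isinstance(value, str):
        if "'" in value:
            raise ValueError('single quotes cannot be escaped in a filter string')
        return "'%s'" % value
    if isinstance(value, (list, tuple)):
        # Used with the "in" operation.
        return '(%s)' % ', '.join(format_value(v) for v in value)
    raise TypeError('unsupported filter value: %r' % (value,))


def make_filter(prop, operation, value):
    """Build a single "property operation value" filter."""
    return '%s %s %s' % (prop, operation, format_value(value))
```

Individual filters can then be combined with ordinary string operations, for example `' and '.join([make_filter('eo:cloud_cover', '<', 20), make_filter('eo:gsd', '<', 1.5)])`.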

It is not an error to specify a property that an item doesn't have, but the item will
not be returned by the query no matter what other filters are provided.

Filtering on properties with nested values is not currently supported.

A property filter is not intended to search on an item's `datetime` property.
Use the `start_datetime` and `end_datetime` search parameters for that.
    

In [None]:
items = stac.search(start_datetime=datetime(2015, 1, 1), end_datetime=datetime(2016, 1, 1), query='eo:cloud_cover < 10')

In [None]:
len(items)

## Bulk loads

In [None]:
from osgeo import gdal
from osgeo import ogr

In [None]:
daShapefile = r"C:\Temp\Voting_Centers_and_Ballot_Sites.shp"

# Open the shapefile and get the definition of its first layer
dataSource = ogr.Open(daShapefile)
daLayer = dataSource.GetLayer(0)
layerDefinition = daLayer.GetLayerDefn()

# Print the name of every attribute field in the layer
for i in range(layerDefinition.GetFieldCount()):
    print(layerDefinition.GetFieldDefn(i).GetName())