# Search records from a CSW catalog


This notebook shows a typical workflow for querying a Catalogue Service for the Web (CSW) and requesting data endpoints that are suitable for download.

* CSW and OWSLib
* Create filter_list
* Query NCI's geonetwork records
* Fetch data catalogue information 

---


- Authors: NCI Virtual Research Environment Team
- Keywords: CSW, geonetwork, data query, search
- Create Date: 2020-Jun
- Lineage/Reference: This tutorial is based on the [IOOS data fetching tutorial](https://ioos.github.io/notebooks_demos/notebooks/2017-12-15-finding_HFRadar_currents/).

---


### About CSW and OWSLib

[Catalogue Service for the Web (CSW)](https://en.wikipedia.org/wiki/Catalogue_Service_for_the_Web) is a standard for exposing a catalogue of geospatial records in XML on the Internet (over HTTP). The catalogue is made up of records that describe geospatial data (e.g. KML), geospatial services (e.g. WMS), and related resources.

OWSLib is a Python package for client programming with Open Geospatial Consortium (OGC) web service (hence OWS) interface standards, and their related content models.
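
Under the hood, a CSW client talks to the server with ordinary HTTP requests. As a rough sketch (nothing is sent over the network here), the snippet below builds the key-value form of a CSW 2.0.2 `GetCapabilities` request for the NCI endpoint used in this notebook. The variable names are illustrative only; OWSLib builds and sends such requests for you.

```python
from urllib.parse import urlencode

# Endpoint used throughout this notebook.
endpoint = "http://geonetwork.nci.org.au/geonetwork/srv/eng/csw"

# Standard key-value parameters of a CSW 2.0.2 GetCapabilities request.
params = {"service": "CSW", "version": "2.0.2", "request": "GetCapabilities"}

# Build the request URL without sending it.
capabilities_url = f"{endpoint}?{urlencode(params)}"
print(capabilities_url)
```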

Below we create a `csw` object that connects to NCI's GeoNetwork CSW endpoint:

In [1]:
from owslib.csw import CatalogueServiceWeb
endpoint = "http://geonetwork.nci.org.au/geonetwork/srv/eng/csw"
csw = CatalogueServiceWeb(endpoint, timeout=60)
csw.identification.type

'CSW'

List the operations supported by the service:

In [2]:
[op.name for op in csw.operations]

['GetCapabilities',
 'DescribeRecord',
 'GetDomain',
 'GetRecords',
 'GetRecordById',
 'Transaction',
 'Harvest']

### Create filter list

Now we build a `filter_list` from a bounding box and a set of search words using `owslib.fes`.

In [3]:
from owslib import fes
from owslib.fes import PropertyIsEqualTo, PropertyIsLike, BBox

# Region: Australia.
min_lon, max_lon = 110, 160
min_lat, max_lat = -45, -5

bbox = [min_lon, min_lat, max_lon, max_lat]

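# Search words.
words = [
    "dataset",
    "climate",
    "CMIP"
]

# Shared keyword arguments for `PropertyIsLike` wildcard filters.
kw = dict(wildCard="*", escapeChar="\\", singleChar="?", propertyname="apiso:AnyText")

# Despite its name, `or_filt` combines the words with a logical AND:
# a record must match every search word.
or_filt = fes.And([fes.PropertyIsEqualTo('csw:AnyText', val) for val in words])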

bbox = fes.BBox(bbox)

filter_list = [
    fes.And(
        [
            bbox,  # bounding box
            or_filt,  # search words (all must match)
        ]
    )
]
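
The bounding box above is given in the order `[min_lon, min_lat, max_lon, max_lat]`, and swapping two values is an easy mistake to make. The sketch below is a hypothetical helper (`make_bbox` is not part of OWSLib) that checks the ordering before the list is handed to `fes.BBox`. It assumes the box does not cross the antimeridian.

```python
def make_bbox(min_lon, min_lat, max_lon, max_lat):
    """Return [min_lon, min_lat, max_lon, max_lat] after basic sanity checks.

    Hypothetical helper, not part of OWSLib. Assumes the box does not
    cross the antimeridian, so min_lon must be smaller than max_lon.
    """
    if not (-180 <= min_lon < max_lon <= 180):
        raise ValueError(f"bad longitude range: {min_lon}, {max_lon}")
    if not (-90 <= min_lat < max_lat <= 90):
        raise ValueError(f"bad latitude range: {min_lat}, {max_lat}")
    return [min_lon, min_lat, max_lon, max_lat]


# Region: Australia, same values as in the cell above.
print(make_bbox(110, -45, 160, -5))
```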

### Query NCI's geonetwork records

Below we define a `get_csw_records` function that calls the OWSLib method `getrecords2` repeatedly to retrieve all the records matching the search criteria specified by `filter_list`.

In [5]:
def get_csw_records(csw, filter_list, pagesize=10, maxrecords=1000):
    """Fetch matching records page by page (`pagesize` records per request)
    until the server reports no further records or `maxrecords` is reached.
    The merged results are stored in `csw.records`.
    """
    from owslib.fes import SortBy, SortProperty

    # Iterate over sorted results.
    sortby = SortBy([SortProperty("dc:title")])
    csw_records = {}
    startposition = 0
    # Loop until one of the `break` conditions below is met.
    while True:
        csw.getrecords2(
            constraints=filter_list,
            startposition=startposition,
            maxrecords=pagesize,
            sortby=sortby,
        )
        csw_records.update(csw.records)
        if csw.results["nextrecord"] == 0:
            break
        startposition += pagesize + 1  # Last one is included.
        if startposition >= maxrecords:
            break
    csw.records.update(csw_records)
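
An alternative way to page through results is to start each request at the position the server reports in `results["nextrecord"]`, rather than computing the next start position locally. The sketch below illustrates that idea against a small in-memory stand-in, so it runs without network access. `FakeCatalogue` and `collect_all` are hypothetical names; the stand-in mimics only the `getrecords2`, `records` and `results` members used above and is not part of OWSLib.

```python
class FakeCatalogue:
    """Hypothetical stand-in for a CSW client, for illustration only."""

    def __init__(self, total):
        self._ids = [f"rec{i:03d}" for i in range(1, total + 1)]
        self.records = {}
        self.results = {}

    def getrecords2(self, startposition=1, maxrecords=10, **kwargs):
        # CSW start positions are 1-based.
        first = max(startposition, 1)
        page = self._ids[first - 1 : first - 1 + maxrecords]
        self.records = {rid: None for rid in page}
        nxt = first + len(page)
        # nextrecord is 0 once no records remain.
        self.results = {"nextrecord": nxt if nxt <= len(self._ids) else 0}


def collect_all(catalogue, pagesize=10, maxrecords=1000):
    """Follow `nextrecord` from page to page and merge the results."""
    collected = {}
    startposition = 1
    while startposition and len(collected) < maxrecords:
        catalogue.getrecords2(startposition=startposition, maxrecords=pagesize)
        collected.update(catalogue.records)
        startposition = catalogue.results["nextrecord"]
    return collected


found = collect_all(FakeCatalogue(total=25), pagesize=10)
print(len(found))
```

Because the next start position comes from the server's own response, no record is skipped or fetched twice, whatever page size is chosen.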

The query returns the records that satisfy all of the filter conditions.

In [6]:
get_csw_records(csw, filter_list, pagesize=10, maxrecords=1000)

records = "\n".join(csw.records.keys())
print("Found {} records.\n".format(len(csw.records.keys())))
for key, value in list(csw.records.items()):
    print(u"[{}]\n{}\n".format(value.title, key))

Found 49 records.

[CSIRO-Mk3-6-0 model output prepared for CMIP5 rcp26]
f6946_9470_6588_4731

[CSIRO-Mk3-6-0 model output prepared for CMIP5 rcp45]
f5323_9527_2057_9046

[CSIRO-Mk3-6-0 model output prepared for CMIP5 rcp60]
f4615_6545_5514_4988

[CSIRO-Mk3-6-0 model output prepared for CMIP5 rcp85]
f9566_7417_8270_9939

[CSIRO-Mk3-6-0 model output prepared for CMIP5 sstClim]
f4638_7453_4754_4897

[CSIRO-Mk3-6-0 model output prepared for CMIP5 sstClim4xCO2]
f5198_1503_3359_1997

[CSIRO-Mk3-6-0 model output prepared for CMIP5 sstClimAerosol]
f7310_7948_2407_3802

[CSIRO-Mk3-6-0 model output prepared for CMIP5 sstClimSulfate]
f9974_7123_7418_9226

[Earth System Grid Federation (ESGF) Australian CMIP6-era Datasets]
f3154_9976_7262_7595

[ACCESS1-0 model output prepared for CMIP5 1pctCO2]
f1126_7442_2531_9007

[ACCESS1-0 model output prepared for CMIP5 abrupt4xCO2]
f6709_3440_4100_9566

[ACCESS1-0 model output prepared for CMIP5 amip]
f3614_5840_1226_9615

[ACCESS1-0 model output prepared 

To narrow the search further, we can add a wildcard text filter. For example, the query below keeps only the CMIP datasets that are published through THREDDS.

In [7]:
filter_list = [
    fes.And(
        [
            bbox,  # bounding box
            or_filt,  # search words (all must match)
            fes.PropertyIsLike(literal="*THREDDS*", **kw),  # must have THREDDS
        ]
    )
]
get_csw_records(csw, filter_list, pagesize=10, maxrecords=1000)

records = "\n".join(csw.records.keys())
print("Found {} records.\n".format(len(csw.records.keys())))
for key, value in list(csw.records.items()):
    print("[{}]\n{}\n".format(value.title, key))

Found 38 records.

[CSIRO-Mk3-6-0 model output prepared for CMIP5 rcp26]
f6946_9470_6588_4731

[CSIRO-Mk3-6-0 model output prepared for CMIP5 rcp45]
f5323_9527_2057_9046

[CSIRO-Mk3-6-0 model output prepared for CMIP5 rcp60]
f4615_6545_5514_4988

[CSIRO-Mk3-6-0 model output prepared for CMIP5 rcp85]
f9566_7417_8270_9939

[CSIRO-Mk3-6-0 model output prepared for CMIP5 sstClim]
f4638_7453_4754_4897

[CSIRO-Mk3-6-0 model output prepared for CMIP5 sstClim4xCO2]
f5198_1503_3359_1997

[CSIRO-Mk3-6-0 model output prepared for CMIP5 sstClimAerosol]
f7310_7948_2407_3802

[CSIRO-Mk3-6-0 model output prepared for CMIP5 sstClimSulfate]
f9974_7123_7418_9226

[ACCESS1-0 model output prepared for CMIP5 1pctCO2]
f1126_7442_2531_9007

[ACCESS1-0 model output prepared for CMIP5 abrupt4xCO2]
f6709_3440_4100_9566

[ACCESS1-0 model output prepared for CMIP5 amip]
f3614_5840_1226_9615

[ACCESS1-0 model output prepared for CMIP5 historical]
f3187_2606_9930_4751

[ACCESS1-0 model output prepared for CMIP5 his

The query now returns fewer records. These are the CMIP datasets that are available through THREDDS.
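
To get a feel for what the `wildCard`, `singleChar` and `escapeChar` settings in `kw` mean, the sketch below translates a `PropertyIsLike`-style pattern into a Python regular expression. `like_to_regex` is a hypothetical helper for local experiments and the sample sentences are made up. The catalogue server does its own matching and may behave differently, for example in case sensitivity.

```python
import re


def like_to_regex(pattern, wild_card="*", single_char="?", escape_char="\\"):
    """Translate a PropertyIsLike-style pattern into a compiled regex.

    Hypothetical helper, not part of OWSLib.
    """
    parts = []
    chars = iter(pattern)
    for ch in chars:
        if ch == escape_char:
            # The character after the escape is taken literally.
            parts.append(re.escape(next(chars, "")))
        elif ch == wild_card:
            parts.append(".*")  # any run of characters, including none
        elif ch == single_char:
            parts.append(".")  # exactly one character
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


rx = like_to_regex("*THREDDS*")
# Made-up sample texts, for illustration only.
print(bool(rx.fullmatch("Data is served through THREDDS at NCI")))  # True
print(bool(rx.fullmatch("Data is served through OPeNDAP only")))  # False
```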

### Fetch data catalogue information 

The easiest way to get more information is to explore the individual records. Here is the abstract of the record "ACCESS1-0 model output prepared for CMIP5 historical"; its subjects are shown further below.

In [8]:
import textwrap

value = csw.records[
    "f3187_2606_9930_4751"
]

print("\n".join(textwrap.wrap(value.abstract)))

historical is an experiment of the CMIP5 - Coupled Model
Intercomparison Project Phase 5 (http://cmip-pcmdi.llnl.gov/cmip5/).
CMIP5 is meant to provide a framework for coordinated climate change
experiments for the next five years and thus includes simulations for
assessment in the AR5 as well as others that extend beyond the AR5.
3.2 historical (3.2 Historical) - Version 1: Simulation of recent past
(1850 to 2005). Impose changing conditions (consistent with
observations).  Experiment design: http://cmip-
pcmdi.llnl.gov/cmip5/docs/Taylor_CMIP5_design.pdf List of output
variables: http://cmip-pcmdi.llnl.gov/cmip5/docs/standard_output.pdf
Output: time series per variable in model grid spatial resolution in
netCDF format Earth System model and the simulation information: CIM
repository  Entry name/title of data are specified according to the
Data Reference Syntax (http://cmip-
pcmdi.llnl.gov/cmip5/docs/cmip5_data_reference_syntax.pdf) as
activity/product/institute/model/experiment/freque

Let's see which attributes of this record are populated:

In [9]:
attrs = [attr for attr in dir(value) if not attr.startswith("_")]
nonzero = [attr for attr in attrs if getattr(value, attr)]
nonzero

['abstract',
 'identifier',
 'identifiers',
 'relation',
 'subjects',
 'title',
 'type',
 'xml']

In [10]:
value.subjects

['Other Chemical Sciences (Environmental Chemistry)',
 'Atmospheric Sciences',
 'Oceanography',
 'Physical Geography and Environmental Geoscience',
 'National Computational Infrastructure (NCI)',
 '0399 - Other Chemical Sciences',
 '0401 - Atmospheric Sciences',
 '0405 - Oceanography',
 '0406 - Physical Geography and Environmental Geoscience',
 None]
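
The subject list mixes plain keywords, entries prefixed with a four-digit code, and a trailing `None`. As a minimal sketch, the snippet below separates the three cases using the values copied from the output above. The `"<4 digits> - <label>"` pattern is an assumption based on this one record and may not hold for others.

```python
import re

# Subjects copied from the output above.
subjects = [
    "Other Chemical Sciences (Environmental Chemistry)",
    "Atmospheric Sciences",
    "Oceanography",
    "Physical Geography and Environmental Geoscience",
    "National Computational Infrastructure (NCI)",
    "0399 - Other Chemical Sciences",
    "0401 - Atmospheric Sciences",
    "0405 - Oceanography",
    "0406 - Physical Geography and Environmental Geoscience",
    None,
]

# Assumed pattern: "<4 digits> - <label>" marks a coded entry.
coded = re.compile(r"^(\d{4}) - (.+)$")

codes = {}
keywords = []
for subject in subjects:
    if not subject:  # skip None and empty strings
        continue
    match = coded.match(subject)
    if match:
        codes[match.group(1)] = match.group(2)
    else:
        keywords.append(subject)

print(codes)
print(keywords)
```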

In [11]:
value.xml

b'<csw:SummaryRecord xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:geonet="http://www.fao.org/geonetwork" xmlns:dct="http://purl.org/dc/terms/" xmlns:csw="http://www.opengis.net/cat/csw/2.0.2" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">\n      <dc:identifier>f3187_2606_9930_4751</dc:identifier>\n      <dc:title>ACCESS1-0 model output prepared for CMIP5 historical</dc:title>\n      <dc:type>dataset</dc:type>\n      <dc:subject>Other Chemical Sciences (Environmental Chemistry)</dc:subject>\n      <dc:subject>Atmospheric Sciences</dc:subject>\n      <dc:subject>Oceanography</dc:subject>\n      <dc:subject>Physical Geography and Environmental Geoscience</dc:subject>\n      <dc:subject>National Computational Infrastructure (NCI)</dc:subject>\n      <dc:subject>0399 - Other Chemical Sciences</dc:subject>\n      <dc:subject>0401 - Atmospheric Sciences</dc:subject>\n      <dc:subject>0405 - Oceanography</dc:subject>\n      <dc:subject>0406 - Physical Geography and Environmental 