# CMIP6 vocabularies with pyessv 

1. pyessv is a pragmatic, simple-to-use vocabulary management tool
2. The pyessv archive has been [seeded](https://github.com/ES-DOC/pyessv-writer/blob/master/sh/write_wcrp_cmip6.py) with CVs pulled from the [WCRP-CMIP CMIP6 CVs](https://github.com/WCRP-CMIP/CMIP6_CVs)
3. pyessv CV data model:  
   3.1  Authority (e.g. WCRP)  
   3.2  Scope (e.g. CMIP6)  
   3.3  Collection (e.g. institution-id)  
   3.4  Term (e.g. noaa-gfdl)  
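The four-level data model can be pictured as a tree in which each node's namespace is built from its ancestors' names, e.g. `wcrp:cmip6:institution-id:noaa-gfdl`. The sketch below is a hypothetical, pure-Python illustration of that idea; the `Node` class and its `add` method are assumptions made for this example and are not pyessv's own classes.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Node:
    """Hypothetical stand-in for a pyessv vocabulary node (not the real class)."""
    canonical_name: str
    # Excluded from repr/eq so that parent <-> child links do not recurse.
    parent: Optional["Node"] = field(default=None, repr=False, compare=False)
    children: Dict[str, "Node"] = field(default_factory=dict)

    def add(self, canonical_name: str) -> "Node":
        # Attach a child one level down: authority -> scope -> collection -> term.
        child = Node(canonical_name, parent=self)
        self.children[canonical_name] = child
        return child

    @property
    def namespace(self) -> str:
        # A node's namespace is its parent's namespace plus its own name, colon-separated.
        if self.parent is None:
            return self.canonical_name
        return f"{self.parent.namespace}:{self.canonical_name}"


wcrp = Node("wcrp")                          # Authority
cmip6 = wcrp.add("cmip6")                    # Scope
institutions = cmip6.add("institution-id")   # Collection
gfdl = institutions.add("noaa-gfdl")         # Term

print(gfdl.namespace)  # wcrp:cmip6:institution-id:noaa-gfdl
```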

## Pre-Requisites

1.  Clone the pyessv-archive repository into your working directory:

        cd YOUR_WORK_DIRECTORY
        git clone https://github.com/ES-DOC/pyessv-archive.git

2.  Create the ES-DOC pyessv folder:

        mkdir -p ~/.esdoc/pyessv

3.  Create a symbolic link for each authority (here, WCRP):

        ln -s YOUR_WORK_DIRECTORY/pyessv-archive/wcrp ~/.esdoc/pyessv
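If you prefer to script steps 2 and 3, the sketch below does the same thing with Python's standard `pathlib`. The helper name `link_authority` and its parameters are hypothetical; it simply mirrors the `mkdir -p` and `ln -s` commands above and is not part of pyessv.

```python
import tempfile
from pathlib import Path


def link_authority(archive_dir: Path, authority: str, esdoc_home: Path) -> Path:
    """Hypothetical helper mirroring the mkdir / ln -s steps (not part of pyessv)."""
    vocab_dir = esdoc_home / "pyessv"
    # Equivalent of `mkdir -p ~/.esdoc/pyessv`.
    vocab_dir.mkdir(parents=True, exist_ok=True)
    link = vocab_dir / authority
    # Equivalent of `ln -s <archive>/<authority> ~/.esdoc/pyessv`; skipped if already present.
    # The target is resolved to an absolute path so the link does not dangle.
    if not link.is_symlink() and not link.exists():
        link.symlink_to((archive_dir / authority).resolve(), target_is_directory=True)
    return link


# Real usage would be something like:
#   link_authority(Path("YOUR_WORK_DIRECTORY/pyessv-archive"), "wcrp", Path.home() / ".esdoc")
# Here it is exercised against a throwaway temporary directory instead.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "pyessv-archive" / "wcrp").mkdir(parents=True)
    link = link_authority(root / "pyessv-archive", "wcrp", root / ".esdoc")
    linked_ok = link.is_symlink() and link.resolve() == (root / "pyessv-archive" / "wcrp").resolve()
```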

## Setup

```python
import pyessv
```

```
2017-05-22T11:54:18.212897 [INFO] :: ES-DOC PYESSV :: Loading vocabularies from /Users/macg/.esdoc/pyessv-archive:
2017-05-22T11:54:18.737631 [INFO] :: ES-DOC PYESSV :: ... loaded: wcrp
```


## Loading

#### Loading by namespace

```python
# Load authority: WCRP (once loaded it is cached).
wcrp = pyessv.load('wcrp')
assert isinstance(wcrp, pyessv.Authority)
```

```python
# Load scope: CMIP6.
cmip6 = pyessv.load('wcrp:cmip6')
assert isinstance(cmip6, pyessv.Scope)
```

```python
# Load collection: institutions.
institutions = pyessv.load('wcrp:cmip6:institution-id')
assert isinstance(institutions, pyessv.Collection)
```

```python
# Load term: NOAA-GFDL.
gfdl = pyessv.load('wcrp:cmip6:institution-id:noaa-gfdl')
assert isinstance(gfdl, pyessv.Term)
```
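Loading by namespace walks down the hierarchy one colon-separated segment at a time. The sketch below illustrates that idea over a hypothetical in-memory archive; `ARCHIVE`, `load` and its `(level, name)` return value are assumptions for illustration, and `functools.lru_cache` merely stands in for pyessv's caching, whose actual mechanism is not described here.

```python
from functools import lru_cache

# Hypothetical in-memory archive: nested dicts keyed by canonical name.
ARCHIVE = {
    "wcrp": {
        "cmip6": {
            "institution-id": {"ipsl": {}, "noaa-gfdl": {}},
        },
    },
}

# The level reached after 1, 2, 3 or 4 namespace segments.
LEVELS = ("authority", "scope", "collection", "term")


@lru_cache(maxsize=None)
def load(namespace: str) -> tuple:
    """Resolve 'authority[:scope[:collection[:term]]]' to (level, name).

    Results are memoised, so repeated loads of the same namespace hit the cache.
    """
    parts = namespace.split(":")
    if len(parts) > len(LEVELS):
        raise ValueError(f"too many namespace segments: {namespace!r}")
    node = ARCHIVE
    for part in parts:
        node = node[part]  # raises KeyError for an unknown name
    return LEVELS[len(parts) - 1], parts[-1]


print(load("wcrp:cmip6:institution-id:noaa-gfdl"))  # ('term', 'noaa-gfdl')
```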

#### Loading by unique identifier

```python
# Load authority by uid.
authority = pyessv.load('331bfd05-63e4-4ae4-9f7f-113933e434b8')
assert authority == wcrp
assert isinstance(authority, pyessv.Authority)
```

```python
# Load scope by uid.
scope = pyessv.load('ba2ccd0c-921e-45d4-a13a-6bbc5bdc7340')
assert scope == cmip6
assert isinstance(scope, pyessv.Scope)
```

```python
# Load collection by uid.
collection = pyessv.load('d48ddea5-9a09-463a-be8d-6bf3379e3d19')
assert collection == institutions
assert isinstance(collection, pyessv.Collection)
```

```python
# Load term by uid.
term = pyessv.load('c3a30ea1-312f-437d-b6ac-1cfe9f0b1689')
assert term == gfdl
assert isinstance(term, pyessv.Term)
```
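Because every node carries both a namespace and a uid, either identifier can reach the same object. A minimal way to support both lookups is a reverse index from uid to namespace, sketched below with the standard `uuid` module. The `nodes` table and the `resolve` helper are hypothetical; the two uids are the ones printed for WCRP and CMIP6 in the Properties section of this notebook.

```python
import uuid

# Hypothetical registry: namespace -> uid.
nodes = {
    "wcrp": uuid.UUID("331bfd05-63e4-4ae4-9f7f-113933e434b8"),
    "wcrp:cmip6": uuid.UUID("ba2ccd0c-921e-45d4-a13a-6bbc5bdc7340"),
}

# Reverse index: uid -> namespace.
by_uid = {uid: namespace for namespace, uid in nodes.items()}


def resolve(identifier: str) -> str:
    """Return the namespace for either a namespace string or a uid string."""
    try:
        key = uuid.UUID(identifier)
    except ValueError:
        # Not a well-formed UUID, so treat it as a namespace.
        if identifier not in nodes:
            raise KeyError(identifier)
        return identifier
    return by_uid[key]  # KeyError for an unknown uid
```

Parsing the string into a `uuid.UUID` first means that upper- and lower-case spellings of the same uid resolve identically.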

## Iteration

#### Simple iteration of vocabulary hierarchy

```python
# Iterate scopes managed by an authority.
for scope in wcrp:
    assert isinstance(scope, pyessv.Scope)

# Iterate collections within a scope.
for collection in cmip6:
    assert isinstance(collection, pyessv.Collection)

# Iterate terms within a collection.
for term in institutions:
    assert isinstance(term, pyessv.Term)
```

#### Iterables are sorted

```python
# Scopes, and the collections within each scope, are yielded in sorted order.
for scope in wcrp:
    print(scope)
    for collection in scope:
        print(collection)
```

```
wcrp:cmip6
wcrp:cmip6:activity-id
...
```

```python
# Terms within a collection are also yielded in sorted order.
for term in institutions:
    print(term)
```

```
wcrp:cmip6:institution-id:awi
wcrp:cmip6:institution-id:bnu
wcrp:cmip6:institution-id:cams
wcrp:cmip6:institution-id:cccma
wcrp:cmip6:institution-id:cccr-iitm
wcrp:cmip6:institution-id:cmcc
wcrp:cmip6:institution-id:cnrm-cerfacs
wcrp:cmip6:institution-id:cola-cfs
wcrp:cmip6:institution-id:csir-csiro
wcrp:cmip6:institution-id:csiro-bom
wcrp:cmip6:institution-id:ec-earth-consortium
wcrp:cmip6:institution-id:fio-ronm
wcrp:cmip6:institution-id:inm
wcrp:cmip6:institution-id:inpe
wcrp:cmip6:institution-id:ipsl
wcrp:cmip6:institution-id:lasg-iap
wcrp:cmip6:institution-id:messy-consortium
wcrp:cmip6:institution-id:miroc
wcrp:cmip6:institution-id:mohc
wcrp:cmip6:institution-id:mpi-m
wcrp:cmip6:institution-id:mri
wcrp:cmip6:institution-id:nasa-giss
wcrp:cmip6:institution-id:ncar
wcrp:cmip6:institution-id:ncc
wcrp:cmip6:institution-id:nerc
wcrp:cmip6:institution-id:nims-kma
wcrp:cmip6:institution-id:noaa-gfdl
wcrp:cmip6:institution-id:noaa-ncep
...
```

#### Iterable access via key

```python
# Access a scope within an authority by key.
assert wcrp['cmip6'] == cmip6
```

```python
# Access a collection within a scope by key.
assert cmip6['institution-id'] == institutions
```

```python
# Access a term within a collection by key.
assert institutions['noaa-gfdl'] == gfdl
```
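Both behaviours shown above, sorted iteration and access by key, map onto ordinary Python container protocols. The hypothetical `Collection` class below (not pyessv's class) sketches how `__iter__` can yield terms sorted by canonical name regardless of insertion order, while `__getitem__` provides key access.

```python
class Collection:
    """Hypothetical container mimicking sorted iteration and key access."""

    def __init__(self, namespace, term_names):
        self.namespace = namespace
        # Store each term's namespace keyed by its canonical name.
        self._terms = {name: f"{namespace}:{name}" for name in term_names}

    def __iter__(self):
        # Yield terms sorted by canonical name, whatever order they were added in.
        for name in sorted(self._terms):
            yield self._terms[name]

    def __getitem__(self, name):
        # Key access by canonical name; raises KeyError for unknown terms.
        return self._terms[name]

    def __len__(self):
        return len(self._terms)


institutions = Collection("wcrp:cmip6:institution-id", ["noaa-gfdl", "ipsl", "awi"])
print(list(institutions))
# ['wcrp:cmip6:institution-id:awi', 'wcrp:cmip6:institution-id:ipsl', 'wcrp:cmip6:institution-id:noaa-gfdl']
```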

## Properties

#### All domain objects have the following standard properties: description, label, name, namespace, raw_name, uid, url

### Authority properties

```python
# Canonical name (always lower cased).
print(wcrp.canonical_name)
```

```
wcrp
```

```python
# Raw name.
print(wcrp.raw_name)
```

```
WCRP
```

```python
# Label for UI purposes.
print(wcrp.label)
```

```
WCRP
```

```python
# Description.
print(wcrp.description)
```

```
World Climate Research Program
```

```python
# Homepage / URL.
print(wcrp.url)
```

```
https://www.wcrp-climate.org/wgcm-overview
```

```python
# Namespace.
print(wcrp.namespace)
```

```
wcrp
```

```python
# Universally unique identifier (assigned at point of creation).
print(wcrp.uid)
```

```
331bfd05-63e4-4ae4-9f7f-113933e434b8
```


### Scope properties

```python
# Canonical name (always lower cased).
print(cmip6.canonical_name)
```

```
cmip6
```

```python
# Raw name.
print(cmip6.raw_name)
```

```
CMIP6
```

```python
# Label for UI purposes.
print(cmip6.label)
```

```
CMIP6
```

```python
# Description.
print(cmip6.description)
```

```
Controlled Vocabularies (CVs) for use in CMIP6
```

```python
# Homepage / URL.
print(cmip6.url)
```

```
https://github.com/WCRP-CMIP/CMIP6_CVs
```

```python
# Namespace.
print(cmip6.namespace)
```

```
wcrp:cmip6
```

```python
# Universally unique identifier (assigned at point of creation).
print(cmip6.uid)
```

```
ba2ccd0c-921e-45d4-a13a-6bbc5bdc7340
```


### Collection properties

```python
# Canonical name (always lower cased).
print(institutions.canonical_name)
```

```
institution-id
```

```python
# Description.
print(institutions.description)
```

```
WCRP CMIP6 CV collection:
```

```python
# Label for UI purposes.
print(institutions.label)
```

```
institution_id
```

```python
# Namespace.
print(institutions.namespace)
```

```
wcrp:cmip6:institution-id
```

```python
# Raw name.
print(institutions.raw_name)
```

```
institution_id
```

```python
# Universally unique identifier (assigned at point of creation).
print(institutions.uid)
```

```
d48ddea5-9a09-463a-be8d-6bf3379e3d19
```

```python
# Homepage / URL (optional).
print(institutions.url)
```

```
None
```
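Note that this collection's raw name (`institution_id`) and canonical name (`institution-id`) differ. Judging only from the examples in this notebook, the canonical name appears to be the raw name lower-cased with underscores replaced by hyphens. The hypothetical helper below encodes that inferred rule; pyessv's actual normalisation may do more than this.

```python
def to_canonical_name(raw_name: str) -> str:
    """Hypothetical rule inferred from this notebook's examples:
    lower-case the raw name and replace underscores with hyphens."""
    return raw_name.strip().lower().replace("_", "-")


# Raw -> canonical pairs taken from the outputs and parse examples in this notebook.
examples = {
    "WCRP": "wcrp",
    "CMIP6": "cmip6",
    "institution_id": "institution-id",
    "NOAA-GFDL": "noaa-gfdl",
    "required_global_attributes": "required-global-attributes",
}
for raw, expected in examples.items():
    assert to_canonical_name(raw) == expected
```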


### Term properties

```python
# Canonical name (always lower cased).
print(gfdl.canonical_name)
```

```
noaa-gfdl
```

```python
# Raw name.
print(gfdl.raw_name)
```

```
NOAA-GFDL
```

```python
# Label for UI purposes.
print(gfdl.label)
```

```
NOAA-GFDL
```

```python
# Description (optional).
print(gfdl.description)
```

```
None
```

```python
# Homepage / URL (optional).
print(gfdl.url)
```

```
None
```

```python
# Creation date.
print(gfdl.create_date)
```

```
2017-03-21 00:00:00+00:00
```

```python
# Namespace (authority:scope:collection:term).
print(gfdl.namespace)
```

```
wcrp:cmip6:institution-id:noaa-gfdl
```

```python
# Universally unique identifier (assigned at point of creation).
print(gfdl.uid)
```

```
c3a30ea1-312f-437d-b6ac-1cfe9f0b1689
```

```python
# Governance status.
print(gfdl.status)
```

```
pending
```

```python
# Collection-relative identifier.
print(gfdl.idx)
```

```
25
```


## Encoding

```python
# Encode authority as a Python dictionary.
assert isinstance(pyessv.encode(wcrp, 'dict'), dict)

# Encode scope as a Python dictionary.
assert isinstance(pyessv.encode(cmip6, 'dict'), dict)

# Encode collection as a Python dictionary.
assert isinstance(pyessv.encode(institutions, 'dict'), dict)

# Encode term as a Python dictionary.
assert isinstance(pyessv.encode(gfdl, 'dict'), dict)
```

```python
# Encode authority as a JSON text blob (a str under Python 3; use basestring under Python 2).
assert isinstance(pyessv.encode(wcrp, 'json'), str)

# Encode scope as a JSON text blob.
assert isinstance(pyessv.encode(cmip6, 'json'), str)

# Encode collection as a JSON text blob.
assert isinstance(pyessv.encode(institutions, 'json'), str)

# Encode term as a JSON text blob.
assert isinstance(pyessv.encode(gfdl, 'json'), str)
```
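Encoding to JSON has to cope with values such as uids and timestamps, which the standard `json` module cannot serialise directly. The sketch below shows one common approach, `dataclasses.asdict` followed by `json.dumps(..., default=str)`. The `Term` dataclass and `encode` function are hypothetical and only echo the `encode(obj, 'dict' | 'json')` call shape used above; they are not pyessv's encoder.

```python
import datetime as dt
import json
import uuid
from dataclasses import asdict, dataclass


@dataclass
class Term:
    """Hypothetical term record holding a few of the properties shown above."""
    canonical_name: str
    raw_name: str
    namespace: str
    uid: uuid.UUID
    create_date: dt.datetime
    status: str


def encode(term: Term, encoding: str = "json"):
    """Return a dict, or a JSON string in which UUIDs and datetimes are rendered via str()."""
    data = asdict(term)
    if encoding == "dict":
        return data
    if encoding == "json":
        return json.dumps(data, default=str, sort_keys=True)
    raise ValueError(f"unsupported encoding: {encoding!r}")


gfdl = Term(
    canonical_name="noaa-gfdl",
    raw_name="NOAA-GFDL",
    namespace="wcrp:cmip6:institution-id:noaa-gfdl",
    uid=uuid.UUID("c3a30ea1-312f-437d-b6ac-1cfe9f0b1689"),
    create_date=dt.datetime(2017, 3, 21, tzinfo=dt.timezone.utc),
    status="pending",
)
blob = encode(gfdl)
```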

## Parsing

#### Parsing canonical name

```python
# Parse authority.
assert pyessv.parse('wcrp') == 'wcrp'

# Parse scope.
assert pyessv.parse('wcrp', 'cmip6') == 'cmip6'

# Parse collection.
assert pyessv.parse('wcrp', 'cmip6', 'institution-id') == 'institution-id'

# Parse term.
assert pyessv.parse('wcrp', 'cmip6', 'institution-id', 'ipsl') == 'ipsl'
```

#### Parsing name originally used at point of creation (returns canonical name)

```python
# Parse authority.
assert pyessv.parse('WCRP') == 'wcrp'

# Parse scope.
assert pyessv.parse('WCRP', 'CMIP6') == 'cmip6'

# Parse collection.
assert pyessv.parse('WCRP', 'CMIP6', 'required_global_attributes') == 'required-global-attributes'

# Parse term.
assert pyessv.parse('WCRP', 'CMIP6', 'required_global_attributes', 'Conventions') == 'conventions'
```

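#### A ParsingError is raised when parsing fails

```python
# Parse invalid authority.
try:
    pyessv.parse('xxx')
except pyessv.ParsingError:
    pass
else:
    raise AssertionError('expected pyessv.ParsingError')
```

```python
# Parse invalid scope.
try:
    pyessv.parse('wcrp', 'xxx')
except pyessv.ParsingError:
    pass
else:
    raise AssertionError('expected pyessv.ParsingError')
```

```python
# Parse invalid collection.
try:
    pyessv.parse('wcrp', 'cmip6', 'xxx')
except pyessv.ParsingError:
    pass
else:
    raise AssertionError('expected pyessv.ParsingError')
```

```python
# Parse invalid term.
try:
    pyessv.parse('wcrp', 'cmip6', 'institution-id', 'xxx')
except pyessv.ParsingError:
    pass
else:
    raise AssertionError('expected pyessv.ParsingError')
```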
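The parsing rules demonstrated above (accept either the canonical name or the name used at creation, return the canonical name, raise on anything else) can be sketched with a simple lookup table. `ParsingError`, `COLLECTIONS` and `parse_collection` below are hypothetical stand-ins that model only one level of the hierarchy; they are not pyessv's parser.

```python
class ParsingError(ValueError):
    """Hypothetical stand-in for pyessv.ParsingError."""


# Hypothetical lookup table: name used at creation -> canonical name.
COLLECTIONS = {
    "institution_id": "institution-id",
    "required_global_attributes": "required-global-attributes",
}


def parse_collection(name: str) -> str:
    """Accept either the canonical or the original name; return the canonical name."""
    if name in COLLECTIONS.values():
        return name
    if name in COLLECTIONS:
        return COLLECTIONS[name]
    raise ParsingError(f"unknown collection: {name!r}")


# An unknown name raises rather than returning a fallback value.
try:
    parse_collection("xxx")
    raised = False
except ParsingError:
    raised = True
```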

#### Pass a `strictness` level in order to parse mixed case, underscores & synonyms

```python
assert pyessv.parse('wCRp', strictness=3) == 'wcrp'
```

```python
assert pyessv.parse('wCRp', 'cMIp6', strictness=3) == 'cmip6'
```

```python
assert pyessv.parse('wCRp', 'cMIp6', 'inSTitutION-id', strictness=3) == 'institution-id'
```

```python
assert pyessv.parse('wCRp', 'cMIp6', 'inSTitutION-id', 'IPsl', strictness=3) == 'ipsl'
```
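Lenient parsing can be pictured as normalising the input before the lookup. The sketch below ignores case, treats underscores as hyphens and then consults a synonym table. It is a hypothetical illustration: `lenient_match` and its `synonyms` argument are assumptions, and the sketch does not reproduce pyessv's numbered `strictness` levels, whose exact semantics are not documented here.

```python
from typing import Dict, Iterable, Optional


def lenient_match(name: str, canonical_names: Iterable[str],
                  synonyms: Optional[Dict[str, str]] = None) -> Optional[str]:
    """Hypothetical relaxed matcher: ignore case, treat '_' as '-', then try synonyms.

    Returns the matching canonical name, or None when nothing matches.
    """
    key = name.lower().replace("_", "-")
    if key in set(canonical_names):
        return key
    if synonyms and key in synonyms:
        return synonyms[key]
    return None
```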

## Regular Expression Collections

```python
# Create a collection specifying a regular expression to be applied against term names.
re_collection = pyessv.create_collection(
    cmip6,
    "test-regex-collection",
    description="Ensemble member",
    term_name_regex=r'^[a-z\-]*$'
)
```

```python
# Create a valid term.
term = pyessv.create_term(re_collection, "abc-def", description="valid-regex-term")
assert pyessv.is_valid(term)
```

```python
# Create an invalid term: raises ValidationError.
try:
    pyessv.create_term(re_collection, "ABC-DEF", description="invalid-regex-term")
except pyessv.ValidationError:
    pass
else:
    raise AssertionError('expected pyessv.ValidationError')
```
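The check behind `term_name_regex` amounts to matching each new term name against the collection's pattern. The sketch below shows that check with Python's `re.fullmatch`; `ValidationError` and `validate_term_name` are hypothetical stand-ins, not pyessv's implementation. Note that the `*` quantifier in the pattern above also admits an empty name.

```python
import re


class ValidationError(ValueError):
    """Hypothetical stand-in for pyessv.ValidationError."""


def validate_term_name(name: str, term_name_regex: str) -> str:
    """Return the name unchanged if it matches the pattern, otherwise raise."""
    if re.fullmatch(term_name_regex, name) is None:
        raise ValidationError(f"{name!r} does not match {term_name_regex!r}")
    return name


regex = r'^[a-z\-]*$'  # Same pattern as the collection above.

# Upper-case letters fall outside [a-z\-], so this name is rejected.
try:
    validate_term_name("ABC-DEF", regex)
    rejected = False
except ValidationError:
    rejected = True
```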

## Template parsing

```python
# Template.
template = 'ciclad/cmip6/{}/{}/{}/{}/afilename.nc1'

# Template collections (one per {} placeholder, in order).
collections = (
    pyessv.load('wcrp:cmip6:institution-id'),
    pyessv.load('wcrp:cmip6:activity-id'),
    pyessv.load('wcrp:cmip6:source-id'),
    pyessv.load('wcrp:cmip6:experiment-id'),
)

# Parser.
parser = pyessv.create_template_parser(template, collections)
```

```python
# Parse a valid path.
parser.parse('ciclad/cmip6/ipsl/dcpp/hadgem3-gc31-ll/dcppc-atl-spg/afilename.nc1')
```

```python
# Parse an invalid path: raises TemplateParsingError.
try:
    parser.parse('ciclad/cmip6/WWW/XXX/YYY/ZZZ/afilename.nc1')
except pyessv.TemplateParsingError:
    pass
else:
    raise AssertionError('expected pyessv.TemplateParsingError')
```
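Conceptually, a template parser turns each `{}` slot into a capture group and then checks every captured path segment against the corresponding collection. The sketch below follows that approach with the standard `re` module; `make_template_parser`, `TemplateParsingError` and the small term sets are hypothetical, and pyessv's own parser may work differently.

```python
import re
from typing import Callable, Dict, Sequence, Set, Tuple


class TemplateParsingError(ValueError):
    """Hypothetical stand-in for pyessv.TemplateParsingError."""


def make_template_parser(
    template: str, collections: Sequence[Tuple[str, Set[str]]]
) -> Callable[[str], Dict[str, str]]:
    """Hypothetical parser: each '{}' slot must hold a term of the matching collection."""
    literal_parts = template.split("{}")
    if len(literal_parts) - 1 != len(collections):
        raise ValueError("need exactly one collection per '{}' slot")
    # Escape the literal text and turn each '{}' into a group capturing one path segment.
    regex = re.compile("([^/]+)".join(re.escape(part) for part in literal_parts))

    def parse(path: str) -> Dict[str, str]:
        match = regex.fullmatch(path)
        if match is None:
            raise TemplateParsingError(f"path does not fit template: {path!r}")
        result = {}
        for (collection_name, terms), value in zip(collections, match.groups()):
            if value not in terms:
                raise TemplateParsingError(f"{value!r} is not a valid {collection_name}")
            result[collection_name] = value
        return result

    return parse


parser = make_template_parser(
    "ciclad/cmip6/{}/{}/{}/{}/afilename.nc1",
    [
        ("institution-id", {"ipsl", "noaa-gfdl"}),
        ("activity-id", {"dcpp"}),
        ("source-id", {"hadgem3-gc31-ll"}),
        ("experiment-id", {"dcppc-atl-spg"}),
    ],
)
parsed = parser("ciclad/cmip6/ipsl/dcpp/hadgem3-gc31-ll/dcppc-atl-spg/afilename.nc1")

# The path has the right shape, but 'WWW' is not an institution, so it is rejected.
try:
    parser("ciclad/cmip6/WWW/XXX/YYY/ZZZ/afilename.nc1")
    rejected = False
except TemplateParsingError:
    rejected = True
```

Compiling the template once and validating segment by segment keeps the structural check (does the path have the right shape?) separate from the vocabulary check (is each segment a known term?), so error messages can say which collection failed.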

```python
# Load the WCRP authority by its unique identifier.
pyessv.load('331bfd05-63e4-4ae4-9f7f-113933e434b8')
```

```
wcrp
```