# CMIP6 vocabularies with pyessv 

1. pyessv is a pragmatic, simple-to-use vocabulary management tool.
2. The pyessv archive has been [seeded](https://github.com/ES-DOC/pyessv-writer/blob/master/sh/write_wcrp_cmip6.py) with CVs pulled from [WCRP-CMIP CMIP6 CVs](https://github.com/WCRP-CMIP/CMIP6_CVs).
3. The pyessv CV data model is built upon the idea of nodes. There are 4 node types:  
   3.1  Authority (e.g. WCRP)  
   3.2  Scope (e.g. CMIP6)  
   3.3  Collection (e.g. institution-id)  
   3.4  Term (e.g. noaa-gfdl)  
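
The four node types above form a simple tree in which a node's namespace is its parent's namespace plus its own canonical name. The sketch below is a conceptual illustration only, not pyessv's implementation: the `Node` class and its fields are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical stand-in for a vocabulary node; pyessv's real classes differ.
@dataclass
class Node:
    kind: str                      # "authority", "scope", "collection" or "term"
    canonical_name: str
    parent: Optional["Node"] = None

    @property
    def namespace(self) -> str:
        # Join the canonical names from the root authority down to this node.
        if self.parent is None:
            return self.canonical_name
        return f"{self.parent.namespace}:{self.canonical_name}"


authority = Node("authority", "wcrp")
scope = Node("scope", "cmip6", authority)
collection = Node("collection", "institution-id", scope)
term = Node("term", "noaa-gfdl", collection)

print(term.namespace)
```

The namespaces built this way have the same colon-delimited shape as the ones passed to `pyessv.load` later in this document.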

## Pre-Requisites

1.  Clone the pyessv-archive repository to the local file system:  
    
    git clone https://github.com/ES-DOC/pyessv-archive.git YOUR_WORK_DIRECTORY/pyessv-archive  
    

2.  Create the ES-DOC pyessv folder (`-p` also creates `~/.esdoc` if it is missing):  

    mkdir -p ~/.esdoc/pyessv
    
3.  Create authority symlinks:

    ln -s YOUR_WORK_DIRECTORY/pyessv-archive/wcrp ~/.esdoc/pyessv
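
For readers who prefer to script the folder and symlink steps, here is a minimal Python sketch of the same idea. The `link_authority` helper is hypothetical (it is not part of pyessv), and the demonstration runs against throw-away temporary directories rather than the real `~/.esdoc` folder.

```python
import tempfile
from pathlib import Path


def link_authority(archive_dir: Path, authority: str, pyessv_dir: Path) -> Path:
    """Symlink one authority folder of the archive into the pyessv folder."""
    pyessv_dir.mkdir(parents=True, exist_ok=True)
    link = pyessv_dir / authority
    # Leave an existing link (or folder) untouched.
    if not (link.is_symlink() or link.exists()):
        link.symlink_to(archive_dir / authority, target_is_directory=True)
    return link


# Demonstrate against temporary directories, not the real home folder.
with tempfile.TemporaryDirectory() as tmp:
    archive = Path(tmp) / "pyessv-archive"
    (archive / "wcrp").mkdir(parents=True)
    link = link_authority(archive, "wcrp", Path(tmp) / ".esdoc" / "pyessv")
    is_link = link.is_symlink()
    resolves = link.resolve() == (archive / "wcrp").resolve()

print(is_link, resolves)
```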

## Setup

In [1]:
# The library auto-initializes upon import.  
import pyessv

2017-12-08T15:05:48.160491 [INFO] :: ESDOC-PYESSV :: Loading vocabularies from /Users/macg/.esdoc/pyessv-archive:
2017-12-08T15:05:49.538367 [INFO] :: ESDOC-PYESSV :: ... loaded: wcrp
2017-12-08T15:05:49.566444 [INFO] :: ESDOC-PYESSV :: ... loaded: esdoc


## Loading

#### Loading by namespace

In [2]:
# Load authority: WCRP.
wcrp = pyessv.load('wcrp')
assert isinstance(wcrp, pyessv.Authority)

In [3]:
# Load scope: CMIP6.
cmip6 = pyessv.load('wcrp:cmip6')
assert isinstance(cmip6, pyessv.Scope)

In [4]:
# Load collection: institutions.
institutions = pyessv.load('wcrp:cmip6:institution-id')
assert isinstance(institutions, pyessv.Collection)

In [5]:
# Load term: NOAA-GFDL.
noaa_gfdl = pyessv.load('wcrp:cmip6:institution-id:noaa-gfdl')
assert isinstance(noaa_gfdl, pyessv.Term)
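
In the cells above, the number of colon-delimited parts in a namespace determines which node type comes back. The following stand-alone sketch makes that mapping explicit; `node_type_of` is a hypothetical helper written for illustration and does not call pyessv.

```python
# Node types by namespace depth: 1 part = authority ... 4 parts = term.
NODE_TYPES = ("authority", "scope", "collection", "term")


def node_type_of(namespace: str) -> str:
    """Return the node type implied by a colon-delimited namespace."""
    parts = namespace.split(":")
    if not 1 <= len(parts) <= len(NODE_TYPES) or not all(parts):
        raise ValueError(f"Invalid namespace: {namespace!r}")
    return NODE_TYPES[len(parts) - 1]


for ns in ("wcrp", "wcrp:cmip6", "wcrp:cmip6:institution-id",
           "wcrp:cmip6:institution-id:noaa-gfdl"):
    print(ns, "->", node_type_of(ns))
```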

#### Loading by unique identifier

In [6]:
# Loading by uid returns the same node as loading by namespace.
assert wcrp == pyessv.load('35804ce8-9754-4c47-b599-e376823307b2')
assert isinstance(wcrp, pyessv.Authority)

In [7]:
assert cmip6 == pyessv.load('18ac371f-12b6-460b-8cd6-e378f51b5414')
assert isinstance(cmip6, pyessv.Scope)

In [8]:
assert institutions == pyessv.load('7d4c8d3b-9d8f-4ffb-8378-50651cdd943f')
assert isinstance(institutions, pyessv.Collection)

In [9]:
assert noaa_gfdl == pyessv.load('56065fc9-08ec-4572-a6a6-52cd9d9d1d68')
assert isinstance(noaa_gfdl, pyessv.Term)

## Iteration

#### Simple iteration of vocabulary hierarchy

In [10]:
# Iterate scopes managed by an authority.
for scope in wcrp:
    assert isinstance(scope, pyessv.Scope)
    
# Iterate collections within a scope.
for collection in cmip6:
    assert isinstance(collection, pyessv.Collection)
    
# Iterate terms within a collection.
for term in institutions:
    assert isinstance(term, pyessv.Term)

#### Iterables are sorted

In [11]:
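# Iterables are sorted at every level: scopes, collections and terms.
# NOTE: the inner loops iterate the cmip6 and institutions objects loaded above
# (not the loop variables), so the same listings repeat for every scope.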
for scope in wcrp:
    print(scope)
    for collection in cmip6:
        print(collection)
        for term in institutions:
            print(term)

wcrp:cmip5
wcrp:cmip6:activity-id
wcrp:cmip6:institution-id:awi
wcrp:cmip6:institution-id:bcc
wcrp:cmip6:institution-id:bnu
wcrp:cmip6:institution-id:cams
wcrp:cmip6:institution-id:cas
wcrp:cmip6:institution-id:cccma
wcrp:cmip6:institution-id:cccr-iitm
wcrp:cmip6:institution-id:cmcc
wcrp:cmip6:institution-id:cnrm-cerfacs
wcrp:cmip6:institution-id:csir-csiro
wcrp:cmip6:institution-id:csiro-bom
wcrp:cmip6:institution-id:ec-earth-consortium
wcrp:cmip6:institution-id:fio-ronm
wcrp:cmip6:institution-id:hammoz-consortium
wcrp:cmip6:institution-id:inm
wcrp:cmip6:institution-id:inpe
wcrp:cmip6:institution-id:ipsl
wcrp:cmip6:institution-id:messy-consortium
wcrp:cmip6:institution-id:miroc
wcrp:cmip6:institution-id:mohc
wcrp:cmip6:institution-id:mpi-m
wcrp:cmip6:institution-id:mri
wcrp:cmip6:institution-id:nasa-giss
wcrp:cmip6:institution-id:ncar
wcrp:cmip6:institution-id:ncc
wcrp:cmip6:institution-id:nerc
wcrp:cmip6:institution-id:nims-kma
wcrp:cmip6:institution-id:niwa
wcrp:cmip6:institution-id:noaa-gfdl
wcrp:cmip6:institution-id:nuist
wcrp:cmip6:institution-id:pcmdi
wcrp:cmip6:institution-id:snu
wcrp:cmip6:institution-id:thu
[... output truncated: the institution-id term listing above is printed again after each remaining collection, e.g. wcrp:cmip6:institution-id, wcrp:cmip6:source-id, wcrp:cmip6:thredds-exclude-variables ...]

#### Key or attribute based access

In [12]:
# Set pointer to a scope within an authority.
assert wcrp['cmip6'] == cmip6
assert wcrp.cmip6 == cmip6

In [13]:
# Set pointer to a collection within a scope.
assert wcrp['cmip6']['institution-id'] == institutions
assert wcrp.cmip6.institution_id == institutions

In [14]:
# Set pointer to a term within a collection.
assert wcrp['cmip6']['institution-id']['noaa-gfdl'] == noaa_gfdl
assert wcrp.cmip6.institution_id.noaa_gfdl == noaa_gfdl
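
The attribute form swaps hyphens for underscores (`institution_id` for `institution-id`) because a hyphen is not valid in a Python identifier. As a rough sketch of how such dual access could be offered, consider the hypothetical `Branch` class below. It is an assumption made for illustration and says nothing about how pyessv implements the feature.

```python
class Branch(dict):
    """Hypothetical container offering key and attribute access to children."""

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails: map the Python
        # identifier (institution_id) to the canonical name (institution-id).
        key = name.replace("_", "-")
        try:
            return self[key]
        except KeyError:
            raise AttributeError(name) from None


terms = Branch({"noaa-gfdl": "term:noaa-gfdl"})
scope = Branch({"institution-id": terms})
authority = Branch({"cmip6": scope})

# Both access styles reach the same child.
assert authority["cmip6"]["institution-id"]["noaa-gfdl"] == "term:noaa-gfdl"
assert authority.cmip6.institution_id.noaa_gfdl == "term:noaa-gfdl"
```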

## Properties

#### All domain objects have the following standard properties: canonical_name, create_date, description, label, namespace, raw_name, uid, url

### Authority properties

In [15]:
# Canonical name (ALWAYS lower cased).
print('Authority canonical_name = {}'.format(wcrp.canonical_name))

# Raw name.
print('Authority raw_name = {}'.format(wcrp.raw_name))

# Label for UI purposes.
print('Authority label = {}'.format(wcrp.label))

# Description.
print('Authority description = {}'.format(wcrp.description))

# Homepage / URL.
print('Authority url = {}'.format(wcrp.url))

# Namespace.
print('Authority namespace = {}'.format(wcrp.namespace))

# Universally unique identifier (assigned at point of creation).
print('Authority uid = {}'.format(wcrp.uid))

# Creation date
print('Authority create_date = {}'.format(wcrp.create_date))

Authority canonical_name = wcrp
Authority raw_name = WCRP
Authority label = WCRP
Authority description = World Climate Research Program
Authority url = https://www.wcrp-climate.org/wgcm-overview
Authority namespace = wcrp
Authority uid = 35804ce8-9754-4c47-b599-e376823307b2
Authority create_date = 2017-06-21 00:00:00+00:00


### Scope properties

In [16]:
# Canonical name (ALWAYS lower cased).
print('Scope canonical_name = {}'.format(cmip6.canonical_name))

# Raw name.
print('Scope raw_name = {}'.format(cmip6.raw_name))

# Label for UI purposes.
print('Scope label = {}'.format(cmip6.label))

# Description.
print('Scope description = {}'.format(cmip6.description))

# Homepage / URL.
print('Scope url = {}'.format(cmip6.url))

# Namespace.
print('Scope namespace = {}'.format(cmip6.namespace))

# Universally unique identifier (assigned at point of creation).
print('Scope uid = {}'.format(cmip6.uid))

# Creation date
print('Scope create_date = {}'.format(cmip6.create_date))

Scope canonical_name = cmip6
Scope raw_name = CMIP6
Scope label = CMIP6
Scope description = Controlled Vocabularies (CVs) for use in CMIP6
Scope url = https://github.com/WCRP-CMIP/CMIP6_CVs
Scope namespace = wcrp:cmip6
Scope uid = 18ac371f-12b6-460b-8cd6-e378f51b5414
Scope create_date = 2017-06-21 00:00:00+00:00


### Collection properties

In [17]:
# Canonical name (ALWAYS lower cased).
print('Collection canonical_name = {}'.format(institutions.canonical_name))

# Raw name.
print('Collection raw_name = {}'.format(institutions.raw_name))

# Label for UI purposes.
print('Collection label = {}'.format(institutions.label))

# Description.
print('Collection description = {}'.format(institutions.description))

# Homepage / URL.
print('Collection url = {}'.format(institutions.url))

# Namespace.
print('Collection namespace = {}'.format(institutions.namespace))

# Universally unique identifier (assigned at point of creation).
print('Collection uid = {}'.format(institutions.uid))

# Creation date
print('Collection create_date = {}'.format(institutions.create_date))

Collection canonical_name = institution-id
Collection raw_name = institution_id
Collection label = Institution ID
Collection description = WCRP CMIP6 CV collection:
Collection url = None
Collection namespace = wcrp:cmip6:institution-id
Collection uid = 7d4c8d3b-9d8f-4ffb-8378-50651cdd943f
Collection create_date = 2017-06-21 00:00:00+00:00


### Term properties

In [18]:
# Canonical name (ALWAYS lower cased).
print('Term canonical_name = {}'.format(noaa_gfdl.canonical_name))

# Raw name.
print('Term raw_name = {}'.format(noaa_gfdl.raw_name))

# Label for UI purposes.
print('Term label = {}'.format(noaa_gfdl.label))

# Description (optional).
print('Term description = {}'.format(noaa_gfdl.description))

# Homepage / URL (optional).
print('Term url = {}'.format(noaa_gfdl.url))

# Namespace.
print('Term namespace = {}'.format(noaa_gfdl.namespace))

# Universally unique identifier (assigned at point of creation).
print('Term uid = {}'.format(noaa_gfdl.uid))

# Governance status.
print('Term status = {}'.format(noaa_gfdl.status))

# Creation date
print('Term create_date = {}'.format(noaa_gfdl.create_date))

Term canonical_name = noaa-gfdl
Term raw_name = NOAA-GFDL
Term label = NOAA-GFDL
Term description = None
Term url = None
Term namespace = wcrp:cmip6:institution-id:noaa-gfdl
Term uid = 56065fc9-08ec-4572-a6a6-52cd9d9d1d68
Term status = pending
Term create_date = 2017-06-21 00:00:00+00:00


## Encoding

In [19]:
# Encode authority as a python dictionary.
assert isinstance(pyessv.encode(wcrp, 'dict'), dict)

# Encode scope as a python dictionary.
assert isinstance(pyessv.encode(cmip6, 'dict'), dict)

# Encode collection as a python dictionary.
assert isinstance(pyessv.encode(institutions, 'dict'), dict)

# Encode term as a python dictionary.
assert isinstance(pyessv.encode(noaa_gfdl, 'dict'), dict)

In [20]:
# basestring only exists on Python 2; fall back to str on Python 3.
try:
    basestring
except NameError:
    basestring = str

# Encode authority as a JSON text blob.
assert isinstance(pyessv.encode(wcrp, 'json'), basestring)

# Encode scope as a JSON text blob (JSON is the default encoding).
assert isinstance(pyessv.encode(cmip6), basestring)

# Encode collection as a JSON text blob.
assert isinstance(pyessv.encode(institutions), basestring)

# Encode term as a JSON text blob.
assert isinstance(pyessv.encode(noaa_gfdl), basestring)

## Parsing

#### Parsing strictness options

In [21]:
# Parsing strictness 0: canonical-name;
assert pyessv.PARSING_STRICTNESS_0 == 0

# Parsing strictness 1: raw-name;
assert pyessv.PARSING_STRICTNESS_1 == 1

# Parsing strictness 2: canonical-name + raw-name;
# NOTE - this is the default;
assert pyessv.PARSING_STRICTNESS_2 == 2

# Parsing strictness 3: 2 + synonyms
assert pyessv.PARSING_STRICTNESS_3 == 3

# Parsing strictness 4: 3 + case-insensitive
assert pyessv.PARSING_STRICTNESS_4 == 4

#### Parsing level 0 - canonical name

In [22]:
assert pyessv.parse('wcrp', strictness=0) == 'wcrp'
assert pyessv.parse('wcrp:cmip6', strictness=0) == 'cmip6'
assert pyessv.parse('wcrp:cmip6:institution-id', strictness=0) == 'institution-id'
assert pyessv.parse('wcrp:cmip6:institution-id:ipsl', strictness=0) == 'ipsl'

#### Parsing level 1 - raw name

In [23]:
assert pyessv.parse('WCRP', strictness=1) == 'wcrp'
assert pyessv.parse('WCRP:CMIP6', strictness=1) == 'cmip6'
assert pyessv.parse('WCRP:CMIP6:institution_id', strictness=1) == 'institution-id'
assert pyessv.parse('WCRP:CMIP6:institution_id:IPSL', strictness=1) == 'ipsl'

#### Parsing level 2 - canonical name | raw name

In [24]:
assert pyessv.parse('WCRP', strictness=2) == 'wcrp'
assert pyessv.parse('WCRP:cmip6', strictness=2) == 'cmip6'
assert pyessv.parse('WCRP:cmip6:institution_id', strictness=2) == 'institution-id'
assert pyessv.parse('WCRP:cmip6:institution_id:IPSL', strictness=2) == 'ipsl'

#### Parsing level 4 - canonical name | raw name | synonyms | case-insensitive

In [25]:
# Parsing strictness 4: 3 + case-insensitive
assert pyessv.parse('wCRp', strictness=4) == 'wcrp'
assert pyessv.parse('wCRp:cMIp6', strictness=4) == 'cmip6'
assert pyessv.parse('wCRp:cMIp6:inSTitutION-id', strictness=4) == 'institution-id'
assert pyessv.parse('wCRp:cMIp6:inSTitutION-id:IPsl', strictness=4) == 'ipsl'
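
The strictness levels above can be read as progressively wider sets of accepted names. The stand-alone sketch below models that reading for a single node. The `matches` function and the synonym value are hypothetical, chosen for illustration, and the real matching rules inside pyessv may differ in detail.

```python
def matches(name, canonical, raw, synonyms=(), strictness=2):
    """Return True if `name` identifies the node at the given strictness."""
    if strictness == 0:
        candidates = {canonical}            # canonical name only
    elif strictness == 1:
        candidates = {raw}                  # raw name only
    else:
        candidates = {canonical, raw}       # level 2: either name
        if strictness >= 3:
            candidates.update(synonyms)     # level 3: plus synonyms
    if strictness >= 4:
        # Level 4 additionally ignores case.
        return name.lower() in {c.lower() for c in candidates}
    return name in candidates


# Hypothetical term data; the synonym is invented for illustration.
ipsl = dict(canonical="ipsl", raw="IPSL",
            synonyms=("institut-pierre-simon-laplace",))

for level, name in ((0, "ipsl"), (1, "IPSL"), (2, "IPSL"), (4, "IPsl")):
    print(level, name, matches(name, strictness=level, **ipsl))
```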

#### A ParsingError is raised (and logged) upon an unsuccessful parse

In [26]:
# Parse invalid authority.
try:
    pyessv.parse('xxx')
except pyessv.ParsingError:
    pass



In [27]:
# Parse invalid scope.
try:
    pyessv.parse('wcrp:xxx')
except pyessv.ParsingError:
    pass



In [28]:
# Parse invalid collection.
try:
    pyessv.parse('wcrp:cmip6:xxx')
except pyessv.ParsingError:
    pass



In [29]:
# Parse invalid term.
try:
    pyessv.parse('wcrp:cmip6:institution-id:xxx')
except pyessv.ParsingError:
    pass



## Regular Expression Collections

In [30]:
# Create a collection specifying a regular expression to be applied against terms.
ensemble_members = pyessv.create_collection(
    cmip6,
    "test-regex-collection", 
    description="Ensemble member",
    term_regex=r'r[0-9]i[0-9]p[0-9]f[0-9]'
)

# Create a valid term.
term = pyessv.create_term(ensemble_members, "r1i1p1f1", description="valid-regex-term")
assert pyessv.is_valid(term)

# Create an invalid term - raises ValidationError.
try:
    pyessv.create_term(ensemble_members, "ABC-DEF", "invalid-regex-term")
except pyessv.ValidationError:
    pass

# Parse a name.
assert pyessv.parse('wcrp:cmip6:test-regex-collection:r1i1p1f1') == 'r1i1p1f1'
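
To see what the `term_regex` pattern accepts, it can be tried directly with Python's `re` module. This sketch assumes the whole name must match the pattern, hence `fullmatch`. Whether pyessv anchors the match the same way is an assumption, and `is_valid_name` is a hypothetical helper.

```python
import re

# Same pattern as the term_regex passed to create_collection above.
ENSEMBLE_MEMBER = re.compile(r'r[0-9]i[0-9]p[0-9]f[0-9]')


def is_valid_name(name: str) -> bool:
    # fullmatch requires the whole name to match, not just a prefix.
    return ENSEMBLE_MEMBER.fullmatch(name) is not None


print(is_valid_name("r1i1p1f1"))   # True
print(is_valid_name("ABC-DEF"))    # False
print(is_valid_name("r10i1p1f1"))  # False: each index is a single digit here
```

Note that the pattern allows exactly one digit per index, so multi-digit ensemble indices would need `[0-9]+` instead.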

## Template parsing

#### Specify a string template plus associated CV collections then create parser.

In [31]:
# Set template.
template = 'ciclad/CMIP6/{}/{}/{}/{}/afilename.nc1'

# Set separator.
separator = '/'

# Set collections.
collections = (
    'wcrp:cmip6:institution-id',
    'wcrp:cmip6:activity-id',
    'wcrp:cmip6:source-id',
    'wcrp:cmip6:experiment-id'
    )

# Set parsing strictness = 1 (raw-name).
strictness = pyessv.PARSING_STRICTNESS_1

# Create parser.
parser = pyessv.create_template_parser(template, collections, strictness, separator)

# Parsing: valid.
# parser.parse('ciclad/CMIP6/ipsl/dcpp/hadgem3-gc31-ll/dcppc-atl-spg/afilename.nc1')
parser.parse('ciclad/CMIP6/IPSL/DCPP/HadGEM3-GC31-LL/dcppC-atl-spg/afilename.nc1')

# Parsing: invalid - raises TemplateParsingError. 
try:
    parser.parse('ciclad/cmip6/WWW/XXX/YYY/ZZZ/afilename.nc1')
except pyessv.TemplateParsingError:
    pass
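
Conceptually, template parsing splits both the template and the candidate string on the separator, then checks every `{}` slot against its collection and every other element literally. The pure-Python sketch below illustrates that idea with tiny hand-written vocabularies. `make_template_parser` is hypothetical and raises a plain `ValueError`; it is not how pyessv implements `create_template_parser`.

```python
def make_template_parser(template, vocabularies, separator="/"):
    """Build a parser that checks each `{}` slot against a set of names."""
    template_parts = template.split(separator)
    slots = [i for i, part in enumerate(template_parts) if part == "{}"]
    if len(slots) != len(vocabularies):
        raise ValueError("Need exactly one vocabulary per template slot")

    def parse(value):
        parts = value.split(separator)
        if len(parts) != len(template_parts):
            raise ValueError(f"Wrong number of elements: {value!r}")
        vocab_iter = iter(vocabularies)
        for expected, actual in zip(template_parts, parts):
            if expected == "{}":
                # Slot: the element must belong to the next vocabulary.
                if actual not in next(vocab_iter):
                    raise ValueError(f"Unknown name: {actual!r}")
            elif expected != actual:
                # Literal: the element must match the template exactly.
                raise ValueError(f"Expected {expected!r}, got {actual!r}")
        return parts

    return parse


# Tiny hand-written vocabularies (raw names only), for illustration.
vocabularies = ({"IPSL"}, {"DCPP"}, {"HadGEM3-GC31-LL"}, {"dcppC-atl-spg"})
parse = make_template_parser("ciclad/CMIP6/{}/{}/{}/{}/afilename.nc1", vocabularies)
parsed = parse("ciclad/CMIP6/IPSL/DCPP/HadGEM3-GC31-LL/dcppC-atl-spg/afilename.nc1")
print(parsed)
```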