# eMAST Vocabulary Service

Before DIVER can be queried, the host URL and API token must be set. This is not needed for queries that use only the Vocab Service, but it is required for queries that combine DIVER and the Vocab Service.

The Vocabulary Service reads the variable definitions directly from a CSV file.
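The VocabService source is not shown in this notebook, but a CSV-backed lookup of this kind can be sketched with Python's standard `csv` module. This is a minimal sketch, not the library's implementation: the column names `Short name`, `Definition` and `Unit` and the helpers `load_vocabulary` and `variable_define` are hypothetical, and the single sample row reuses the `life_form` entry that appears later in this notebook.

```python
import csv
import io

# Hypothetical CSV layout; the real Vocab Service columns may be named differently.
# The Definition field is quoted because it contains commas.
SAMPLE_CSV = """Short name,Definition,Unit
life_form,"Plants life form is a classification that takes in account plants structural architecture (e.g. tree, shrub, herb, grass, fern, forb, etc).",unitless
"""


def load_vocabulary(csv_file):
    """Read the CSV once and index every row by the variable's short name."""
    reader = csv.DictReader(csv_file)
    return {row["Short name"]: row for row in reader}


def variable_define(vocabulary, short_name):
    """Return the definition for a short name (hypothetical helper)."""
    return vocabulary[short_name]["Definition"]


vocabulary = load_vocabulary(io.StringIO(SAMPLE_CSV))
print(variable_define(vocabulary, "life_form"))
```

Indexing the rows by short name once means each later lookup is a dictionary access rather than a scan of the file.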

## Import libraries and set configuration variables


```python
import VocabService as vocab
import diver.hiev as meta
from diver.hiev import pretty_print_json as pjson

# Set the token and host URL if you have not already edited the config.ini file.
# Replace the placeholder with your own DIVER API token.
meta.set_token('YOUR_API_TOKEN')
meta.set_host('https://rdsi-emast4-vm.intersect.org.au')
```
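Hard-coding an API token in a notebook makes it easy to leak when the notebook is shared. A minimal sketch of an alternative uses the standard `configparser` module, assuming a `config.ini` with a `[diver]` section holding `host` and `token` keys, with optional `DIVER_HOST` / `DIVER_TOKEN` environment variables taking precedence. All of these names are hypothetical; check how `diver.hiev` actually reads its configuration before relying on them.

```python
import configparser
import os


def load_diver_settings(path="config.ini", section="diver"):
    """Return (host, token), preferring environment variables over the file.

    The section/key names and the DIVER_HOST / DIVER_TOKEN variables are
    hypothetical, chosen for illustration only.
    """
    config = configparser.ConfigParser()
    config.read(path)  # a missing file is silently skipped
    host = os.environ.get("DIVER_HOST") or config.get(section, "host", fallback=None)
    token = os.environ.get("DIVER_TOKEN") or config.get(section, "token", fallback=None)
    if not host or not token:
        raise ValueError("DIVER host and token must be configured")
    return host, token


# Hypothetical usage with the notebook's client:
# host, token = load_diver_settings()
# meta.set_host(host)
# meta.set_token(token)
```

The returned values can then be passed to `meta.set_host` and `meta.set_token` in place of literal strings.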

The definition of any variable can be fetched by calling ***variableDefine*** with the variable's short name.

```python
print("The definition of life_form: " + vocab.variableDefine('life_form'))
```

```
The definition of life_form: Plants life form is a classification that takes in account plants structural architecture (e.g. tree, shrub, herb, grass, fern, forb, etc). Same species can present different life form as a way to adapt to different environments.
```


The ***Lookup*** function extracts any column for a variable: pass the variable's short name, followed by the name of the column you want returned. The contents of that column are returned as a single string.

```python
print("The units of life_form: " + vocab.Lookup('life_form', 'Unit') + "\n")
print("The units of max_photo_A_sat_light_leaf_area: " + vocab.Lookup('max_photo_A_sat_light_leaf_area', 'Unit'))
```

```
The units of life_form: unitless

The units of max_photo_A_sat_light_leaf_area: µmol CO2 m-2 s-1
```
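A column lookup like `vocab.Lookup` can be sketched over an in-memory table keyed by short name. This is a minimal sketch under assumptions: the helper name `lookup` and the dict-of-dicts layout are hypothetical, and raising a `KeyError` that names the unknown variable or column is just one reasonable way to handle a bad query.

```python
def lookup(vocabulary, short_name, column):
    """Return one column of a variable's row as a string (hypothetical helper)."""
    try:
        row = vocabulary[short_name]
    except KeyError:
        raise KeyError(f"Unknown variable: {short_name!r}") from None
    if column not in row:
        raise KeyError(f"Unknown column: {column!r}")
    return str(row[column])


# In-memory table holding the life_form unit shown in the output above.
vocabulary = {"life_form": {"Unit": "unitless"}}
print("The units of life_form: " + lookup(vocabulary, "life_form", "Unit"))
```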


## There are lots of other functions for returning information from the Vocab Service

Each of the following functions returns a list of strings, because these queries may have multiple matches.

```python
print("Variables in the category Isotopes:\n  " + '\n  '.join(vocab.listVariables('Isotopes')))
print()
print("Categories that leaf_area is in:\n  " + '\n  '.join(vocab.variableCategories('leaf_area')))
print()
print("Labels for max_photo_A_sat_light_leaf_area: \n  " + '\n  '.join(vocab.listLabels('max_photo_A_sat_light_leaf_area')))
print()
print("Labels for quantum_yeld:\n  " + '\n  '.join(vocab.listLabels('quantum_yeld')))
print()
print("Variables with the label 'leaf':\n  " + '\n  '.join(vocab.variablesWithLabel('leaf')))
print()
print("Variables with the word 'carbon' in their definition:\n  " + '\n  '.join(vocab.variablesDefinedWith('carbon')))
```

```
Variables in the category Isotopes:
  carbon_isotope_delta_13
  trans_leaf_area_sat_light
  stomatal_cond_leaf_area_sat_light
  max_height
  huber_value
  max_electron_transport_leaf_area_25_celsius
  leaf_area_index_LAI_vert_photography
  leaf_area_index_LAI_aerial
  leaf_area
  life_form
  leaf_dry_mass
  leaf_dry_mass_per_area
  leaf_width_dim
  midday_leaf_water_potential
  Nitrogen_N_mass
  pre_dawn_leaf_water_potential
  light_use_efficiency
  Phosphorus_P_mass

Categories that leaf_area is in:
  Isotopes

Labels for max_photo_A_sat_light_leaf_area:
  maximum
   photosynthesis
   saturated light
   leaf area

Labels for quantum_yeld:
  leaf
   quantum yield

Variables with the label 'leaf':
  leaf_dry_mass
  leaf_dry_mass_per_area
  leaf_width_dim
  midday_leaf_water_potential
  quantum_yeld

Variables with the word 'carbon' in their definition:
  max_photo_A_sat_light_leaf_area
  max_photo_A_sat_light_CO2_leaf_area
  max_carboxylation_leaf_area
  max_electron_transport_leaf_are
```
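The list-returning queries can be sketched over the same kind of in-memory table, assuming categories and labels are stored as comma-separated strings. That storage format, the column names and the helper names are all assumptions. The output above shows every label after the first with a leading space; this sketch strips whitespace around each item so that exact matches such as the label `leaf` behave predictably. Matching on definitions is done here as a case-insensitive substring test, which may differ from what the service does.

```python
def split_field(value):
    """Split a comma-separated cell into trimmed, non-empty items."""
    return [item.strip() for item in value.split(",") if item.strip()]


def list_labels(vocabulary, short_name):
    return split_field(vocabulary[short_name]["Labels"])


def variable_categories(vocabulary, short_name):
    return split_field(vocabulary[short_name]["Categories"])


def list_variables(vocabulary, category):
    return [name for name, row in vocabulary.items()
            if category in split_field(row["Categories"])]


def variables_with_label(vocabulary, label):
    return [name for name, row in vocabulary.items()
            if label in split_field(row["Labels"])]


def variables_defined_with(vocabulary, word):
    word = word.lower()
    return [name for name, row in vocabulary.items()
            if word in row["Definition"].lower()]


# Partial sample rows reconstructed from the output above;
# empty strings mark fields this notebook does not show.
vocabulary = {
    "leaf_area": {"Categories": "Isotopes", "Labels": "", "Definition": ""},
    "max_photo_A_sat_light_leaf_area": {
        "Categories": "",
        "Labels": "maximum, photosynthesis, saturated light, leaf area",
        "Definition": "",
    },
    "quantum_yeld": {"Categories": "", "Labels": "leaf, quantum yield", "Definition": ""},
    "life_form": {
        "Categories": "Isotopes",
        "Labels": "",
        "Definition": "Plants life form is a classification that takes in account "
                      "plants structural architecture",
    },
}

print(list_labels(vocabulary, "quantum_yeld"))
print(list_variables(vocabulary, "Isotopes"))
```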

## Querying a NetCDF file directly once the file location has been fetched

The ***variablesIn*** function returns a list of the variables contained in a file. It obtains this list by opening the file directly and reading it. The cell below then queries the Vocab Service once per variable to fetch each variable's definition.

```python
variables = vocab.variablesIn('eMAST_eWATCH_day_prec_v1m0_1979_2012_19900701.nc')
print(variables)
print()
for variable in variables:
    print(variable + ": " + vocab.variableDefine(variable))
```

```
RuntimeError: No such file or directory
```
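The `RuntimeError` above most likely arises because the bare filename is resolved relative to the notebook's working directory, where the file does not exist; passing the full path returned by DIVER should avoid it. The sketch below shows one way to read a NetCDF file's variable names while failing with a clearer error. It assumes the `netCDF4` package is installed, and `variables_in` is a hypothetical stand-in for `vocab.variablesIn`, not its actual implementation.

```python
import os


def variables_in(path):
    """Return the names of the variables stored in a NetCDF file.

    Checks the path first so that a missing file raises a FileNotFoundError
    naming the path, before netCDF4 is even imported.
    """
    if not os.path.isfile(path):
        raise FileNotFoundError(f"NetCDF file not found: {path}")
    from netCDF4 import Dataset  # imported lazily; only needed for real files
    with Dataset(path, "r") as dataset:
        return list(dataset.variables)


# Hypothetical usage with a full path taken from a DIVER search result:
# print(variables_in(result['path']))
```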

## Fetching a list of files and their full paths from DIVER

This example searches for files by date range and product (here, experiment `4`); there are many other ways to search for a file. It then displays the metadata that DIVER returns for the first file found.

```python
# Search for files by date range and experiment.
# Many other search arguments are available; for the full list and their format,
# see https://github.com/IntersectAustralia/dc21-doc/blob/master/Search_API.md
#results = meta.search(variables=['latitude'])
#print(results)

results = meta.search(from_date='1990-01-01', to_date='1990-03-01', experiments=['4'], quiet=True)
file_list = []

for file in results:
    print(str(file['file_id']) + ": " + file['path'])
    file_list.append(file['file_id'])

# This list of files could be allocated to a compute node to operate on.
print()
results = meta.search(file_id=file_list[0])
print()
print(results)
```

```
7646: /emast/data/meta/eMAST_eWATCH_day_snow_v1m0_1979_2012_19900302.nc
7647: /emast/data/meta/eMAST_eWATCH_day_snow_v1m0_1979_2012_19900301.nc
7666: /emast/data/meta/eMAST_eWATCH_day_snow_v1m0_1979_2012_19900218.nc
7678: /emast/data/meta/eMAST_eWATCH_day_snow_v1m0_1979_2012_19900227.nc
7679: /emast/data/meta/eMAST_eWATCH_day_snow_v1m0_1979_2012_19900226.nc
7680: /emast/data/meta/eMAST_eWATCH_day_snow_v1m0_1979_2012_19900225.nc
7681: /emast/data/meta/eMAST_eWATCH_day_snow_v1m0_1979_2012_19900224.nc
7682: /emast/data/meta/eMAST_eWATCH_day_snow_v1m0_1979_2012_19900223.nc
7683: /emast/data/meta/eMAST_eWATCH_day_snow_v1m0_1979_2012_19900222.nc
7684: /emast/data/meta/eMAST_eWATCH_day_snow_v1m0_1979_2012_19900221.nc
7685: /emast/data/meta/eMAST_eWATCH_day_snow_v1m0_1979_2012_19900220.nc
7686: /emast/data/meta/eMAST_eWATCH_day_snow_v1m0_1979_2012_19900219.nc
7687: /emast/data/meta/eMAST_eWATCH_day_snow_v1m0_1979_2012_19900217.nc
7688: /emast/data/meta/eMAST_eWATCH_day_snow_v1m0_1979_2012_1990
```
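The search results are a list of dicts; the metadata printed later in this notebook shows fields such as `file_id`, `path` and `start_time`. The listing above is not in date order, so it can help to sort files chronologically before allocating them to compute nodes. A minimal sketch, assuming every result carries an ISO 8601 `start_time`; `files_by_start_time` is a hypothetical helper, and the second sample entry's `start_time` is inferred from its filename rather than taken from DIVER output.

```python
from datetime import datetime


def files_by_start_time(results):
    """Return (file_id, path) pairs sorted by each result's start_time."""
    def start(entry):
        # fromisoformat accepts offsets such as "+11:00" (Python 3.7+)
        return datetime.fromisoformat(entry["start_time"])
    return [(entry["file_id"], entry["path"]) for entry in sorted(results, key=start)]


sample = [
    {"file_id": 7646,
     "path": "/emast/data/meta/eMAST_eWATCH_day_snow_v1m0_1979_2012_19900302.nc",
     "start_time": "1990-03-02T01:00:00+11:00"},
    # start_time for 7666 is assumed from the date in its filename
    {"file_id": 7666,
     "path": "/emast/data/meta/eMAST_eWATCH_day_snow_v1m0_1979_2012_19900218.nc",
     "start_time": "1990-02-18T01:00:00+11:00"},
]

for file_id, path in files_by_start_time(sample):
    print(f"{file_id}: {path}")
```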

## Here are some queries that are needed to support tab completion

Get a list of all the unique variables contained in the results of a search. This means opening every file the search returns and building up a list of their variables. The search here is most likely to be restricted to a single product.

```python
results = meta.search(from_date='1990-01-01', to_date='1990-03-01', experiments=['4'], quiet=True)
variable_list = []

for file in results:
    variables = vocab.variablesIn(file['path'])
    for variable in variables:
        if variable not in variable_list:
            variable_list.append(variable)

print(variable_list)
```

```
RuntimeError: No such file or directory
```
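The `variable not in variable_list` test in the cell above scans the whole list on every check. A minimal sketch of the same order-preserving de-duplication using a dict for constant-time membership; `unique_variables` is a hypothetical helper, and `variables_in` can be any callable that maps a path to a list of names, such as `vocab.variablesIn`.

```python
def unique_variables(paths, variables_in):
    """Collect variable names across files, keeping first-seen order.

    A dict preserves insertion order (Python 3.7+) and gives O(1)
    membership checks, unlike `name not in some_list`.
    """
    seen = {}
    for path in paths:
        for name in variables_in(path):
            seen.setdefault(name, None)
    return list(seen)


# Hypothetical usage with the notebook's objects:
# print(unique_variables([f['path'] for f in results], vocab.variablesIn))
```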

List all the products that contain data for a specified variable. This is primarily needed to drive tab completion.

Build the list of unique variables in each product (using the code above) and index it in a map keyed by product. Because every file has to be opened, this may be slow, so the result should be cached.

```python
# Fetch the list of products using the new API call.
# Iterate over it.

results = meta.search(from_date='1990-01-01', to_date='1990-03-01', experiments=['4'], quiet=True)

print(results)

variable_list = []

for file in results:
    variables = vocab.variablesIn(file['path'])
    for variable in variables:
        if variable not in variable_list:
            variable_list.append(variable)

print(variable_list)
```

```
[{'file_processing_description': '', 'path': '/emast/data/meta/eMAST_eWATCH_day_snow_v1m0_1979_2012_19900302.nc', 'filename': 'eMAST_eWATCH_day_snow_v1m0_1979_2012_19900302.nc', 'format': 'NETCDF', 'experiment_id': 4, 'published_by_id': None, 'created_by_id': 7, 'interval': None, 'file_processing_status': 'CLEANSED', 'published_date': None, 'file_id': 7646, 'published': False, 'start_time': '1990-03-02T01:00:00+11:00', 'id': 'eMAST_eWATCH_day_snow_v1m0_1979_2012__1990-03-02 01:00_1990-03-02 01:00', 'facility_id': 4, 'file_size': 1066696.0, 'url': 'https://rdsi-emast4-vm.intersect.org.au/data_files/7646/download.json', 'created_at': '2015-03-25T16:39:10+11:00', 'end_time': '1990-03-02T01:00:00+11:00', 'updated_at': '2015-03-25T16:39:10+11:00'}, {'file_processing_description': '', 'path': '/emast/data/meta/eMAST_eWATCH_day_snow_v1m0_1979_2012_19900301.nc', 'filename': 'eMAST_eWATCH_day_snow_v1m0_1979_2012_19900301.nc', 'format': 'NETCDF', 'experiment_id': 4, 'published_by_id': None, 'cre

RuntimeError: No such file or directory
```
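The product index described above can be sketched as an inverted map from each variable to the products that contain it, with the per-file reads memoised using `functools.lru_cache`. Treating `experiment_id` as the product key is an assumption based on the search arguments used in this notebook, and `cached` and `build_variable_index` are hypothetical helpers.

```python
from collections import defaultdict
from functools import lru_cache


def cached(variables_in):
    """Wrap a path -> variable-names function so each file is read only once."""
    @lru_cache(maxsize=None)
    def wrapper(path):
        # Return a tuple so callers cannot mutate the cached value
        return tuple(variables_in(path))
    return wrapper


def build_variable_index(results, variables_in, product_key="experiment_id"):
    """Map each variable name to the sorted list of products that contain it.

    Grouping by `experiment_id` is an assumption; use whichever field
    identifies a product in your DIVER search results.
    """
    products_by_variable = defaultdict(set)
    for entry in results:
        product = entry[product_key]
        for name in variables_in(entry["path"]):
            products_by_variable[name].add(product)
    return {name: sorted(products) for name, products in products_by_variable.items()}


# Hypothetical usage with the notebook's objects:
# index = build_variable_index(results, cached(vocab.variablesIn))
# index.get(name, [])  # products offering the variable `name`
```

Keeping the cache on the file-reading function, rather than on the whole index, means a repeated or overlapping search reuses the files that were already opened.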

## This code is only here to autoreload libraries while they are in development

```python
%reload_ext autoreload
%autoreload 2
```

```python
import ipywidgets as widgets
from IPython.display import display

# First get the JSON for all available variables
#results = meta.list_variables(quiet=False)
#print(results)

# Now filter that JSON to get just the 'name' variable (hiev.VAR_NAME or "name" as the second argument)
#names = meta.get_variables(results, meta.VAR_NAME)
#print(names)

# Get a list of data types
#data_types = meta.get_variables(results, meta.VAR_DATA_TYPE)
#print(data_types)

# First dropdown: the common names known to the Vocab Service
first = widgets.Dropdown(
    options=vocab.commonNames(),
    description='Common name:',
)

# Second dropdown: the variables for the currently selected common name
second = widgets.Dropdown(
    options=vocab.listVariables(first.value),
    description='Variable name:',
)

def on_first_change(change):
    # Repopulate the second dropdown whenever a new common name is selected
    second.options = vocab.listVariables(change['new'])

first.observe(on_first_change, names='value')

display(first)
display(second)

# Start-date slider; the numeric range here does not yet correspond to dates
temporal = widgets.FloatSlider(
    value=7.5,
    min=5.0,
    max=10.0,
    step=0.1,
    description='Start Date:',
)

display(temporal)
```

```python
itr = 1

# Unfinished in the original notebook: the loop body was never written
for file in ['first', 'second', 'third', 'fourth']:
    pass
```


