# NVS P07 and P06 to the CF Standard Names XML

This is the first attempt at creating the standard name table from information contained only within NVS (plus some units mappings). Note that I did not use any JSON-LD or RDF library; this directly manipulates the JSON-LD, mostly because of my inexperience with these types of libraries. I found this manipulation to be very easy, with the exception of sorting out the aliases, but I suspect an RDF library would not have helped much with the problem I encountered.

Simple imports to start with. Except for cfunits and requests, this is all standard library. If you try to run this in another environment, you might need to adjust how cfunits finds the udunits2 library (see the `DYLD_LIBRARY_PATH` line below).

In [1]:
import requests as rq
import xml.etree.ElementTree as ET
from datetime import datetime
from collections import defaultdict

# cfunits needs to find udunits2; this path is very specific to macOS and how Homebrew installs things on ARM Macs
import os
os.environ["DYLD_LIBRARY_PATH"] = "/opt/homebrew/Cellar/udunits/2.2.28/lib/"
from cfunits import Units

These are the URIs at NERC for the NVS collections. Note that I'm using the Accept header to ask for JSON-LD rather than a format-specific URI; I think the NVS links on the landing pages use query parameters instead.

In [2]:
P07 = "http://vocab.nerc.ac.uk/collection/P07/current/" # CF Standard Names
P06 = "http://vocab.nerc.ac.uk/collection/P06/current/" # Units

In [3]:
headers = {"Accept": "application/ld+json"}

In [4]:
p07_ld = rq.get(P07, headers=headers).json()
p06_ld = rq.get(P06, headers=headers).json()

In [5]:
p06_ld_by_id = {item["@id"]:item for item in p06_ld["@graph"]}

In [6]:
# extract collection information, separate active names from deprecated ones
collection_info = None
standard_names = {}
deprecated = {}
for node in p07_ld["@graph"]:
    if node["@id"] == P07:
        collection_info = node
    elif node["owl:deprecated"] == "true":
        deprecated[node["@id"]] = node
    elif node["owl:deprecated"] == "false":
        standard_names[node["@id"]] = node
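To make the partitioning above easier to follow, here is a minimal sketch run against a hand-made graph. The `EXAMPLE` URI, the node contents, and the `partition` helper are invented for illustration; they only mimic the keys the cell above relies on (`@id`, `owl:deprecated`) and are not real NVS records. The sketch uses `.get()` so that a node without an `owl:deprecated` key is skipped instead of raising `KeyError`, which is an assumption about how one might harden the loop.

```python
# Hypothetical collection URI and nodes, shaped like the keys used above.
EXAMPLE = "http://example.org/collection/X/current/"
toy_graph = [
    {"@id": EXAMPLE, "owl:versionInfo": "1"},
    {"@id": EXAMPLE + "A/", "owl:deprecated": "false"},
    {"@id": EXAMPLE + "B/", "owl:deprecated": "true"},
    {"@id": EXAMPLE + "C/"},  # no deprecation flag at all
]


def partition(graph, collection_uri):
    """Split a JSON-LD @graph into (collection node, active, deprecated)."""
    collection, active, deprecated = None, {}, {}
    for node in graph:
        flag = node.get("owl:deprecated")  # None when the key is absent
        if node["@id"] == collection_uri:
            collection = node
        elif flag == "true":
            deprecated[node["@id"]] = node
        elif flag == "false":
            active[node["@id"]] = node
    return collection, active, deprecated


collection, active, deprecated_nodes = partition(toy_graph, EXAMPLE)
print(sorted(active), sorted(deprecated_nodes))
```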

In [7]:
# Make xml root
root = ET.Element("standard_name_table", {"xmlns:xsi":"http://www.w3.org/2001/XMLSchema-instance",
 "xsi:noNamespaceSchemaLocation":"https://cfconventions.org/Data/schema-files/cf-standard-name-table-2.0.xsd"})

The XML is ordered as per Appendix B, so the elements must be created in that order. The following are missing from NVS itself:
* first_published
* contact

The institution value differs from what is in the online published XML.

In [8]:
table_version = collection_info["owl:versionInfo"]
version_number = ET.SubElement(root, "version_number")
version_number.text = table_version
conventions = ET.SubElement(root, "conventions")
conventions.text = f"CF-StandardNameTable-{table_version}"

# NVS seems to have only one date, so it is used for both first_published and last_modified
dt = datetime.strptime(collection_info["dc:date"], "%Y-%m-%d %H:%M:%S.%f")
time_str = dt.strftime("%Y-%m-%dT%H:%M:%SZ")
first_published = ET.SubElement(root, "first_published")
first_published.text = time_str
last_modified = ET.SubElement(root, "last_modified")
last_modified.text = time_str

institution = ET.SubElement(root, "institution")
institution.text = collection_info["dc:creator"]

# There is no contact info in NVS
contact = ET.SubElement(root, "contact")
contact.text = "support@ceda.ac.uk"
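The timestamp handling in the cell above can be tried in isolation. This is a minimal sketch: the helper name `nvs_date_to_iso` and the sample value are made up, and appending `Z` assumes the NVS date is in UTC, which the JSON-LD does not state explicitly.

```python
from datetime import datetime


def nvs_date_to_iso(value):
    """Reformat 'YYYY-MM-DD HH:MM:SS.f' as 'YYYY-MM-DDTHH:MM:SSZ'."""
    parsed = datetime.strptime(value, "%Y-%m-%d %H:%M:%S.%f")
    # Fractional seconds are dropped; the trailing Z is an assumption (UTC).
    return parsed.strftime("%Y-%m-%dT%H:%M:%SZ")


sample = "2024-01-19 11:45:56.0"  # made-up value in the dc:date layout
print(nvs_date_to_iso(sample))
```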

The following prepares a mapping from concept ID to unit string. I found the altLabel to be very close to udunits already and only needed a few custom mappings (18 of them). This is used as a lookup table when writing the entry records.

I suspect that using the P06 to QUDT relationship might allow this custom mapping to go away.

**12 of the standard names contain no unit information at all.** See the print output for which ones.
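Most of the custom mappings in `alt_to_udunits` (next cell) follow one pattern: each `/unit^n` becomes `unit-n`, and a leading `#` is dropped. As a possible alternative to listing every case by hand, here is a small sketch that derives those strings. The names `slash_to_exponent` and `NUMERATOR_FIXES` are mine, not part of NVS or cfunits, and the function has only been checked against the slash-style labels in that table; `Dmnless` and `NA` would still need explicit handling.

```python
# Hypothetical helper: numerators that need dropping or renaming.
NUMERATOR_FIXES = {"": None, "#": None, "deg": "degree"}


def slash_to_exponent(label):
    """Turn labels such as '/m^3/s' or 'deg/m' into 'm-3 s-1' or 'degree m-1'."""
    numerator, *denominators = label.split("/")
    numerator = NUMERATOR_FIXES.get(numerator, numerator)
    terms = [numerator] if numerator else []
    for term in denominators:
        base, _, power = term.partition("^")
        terms.append(f"{base}-{power or '1'}")  # no explicit power means 1
    return " ".join(terms)


for label in ["/m", "/m^3/s", "deg/m", "#/m^2", "/Pa/s"]:
    print(repr(label), "->", repr(slash_to_exponent(label)))
```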

In [9]:
names_no_units = []

alt_to_udunits = { # hack to just make things valid for now; QUDT might be the actual way to do this
    "deg": "degree",
    "Dmnless": "1",
    "NA": None,
    # seems udunits2 doesn't like the / notation
    "/m": "m-1",
    "/m^2": "m-2",
    "/s": "s-1",
    "/m^3/s": "m-3 s-1",
    "/m^3": "m-3",
    "/s^2": "s-2",
    "deg/m": "degree m-1",
    "#/m^3": "m-3",
    "deg/s": "degree s-1",
    "/m/sr": "m-1 sr-1",
    "/m^2/s": "m-2 s-1",
    "#/m^2": "m-2",
    "/Pa/s": "Pa-1 s-1",
    "/m/s": "m-1 s-1",
    "/sr": "sr-1",
}

def ensure_list(r):
    if isinstance(r, list):
        return r
    return [r]

canonical_units_dict = {}

for id, name in sorted(standard_names.items(), key=lambda x: x[1]["skos:prefLabel"]["@value"]):
    try:
        related = name["skos:related"]
    except KeyError:
        names_no_units.append(name)
        continue  # nothing related, so there is no unit to look up

    unit = None  # reset so a unit from a previous name cannot leak through
    for concept in ensure_list(related):
        if not concept["@id"].startswith(P06):
            continue
        unit = p06_ld_by_id[concept["@id"]]["skos:altLabel"]
    units = alt_to_udunits.get(unit, unit)
    if units is not None:
        canonical_units_dict[id] = units
for name in names_no_units:
    print(name["@id"], name["skos:prefLabel"]["@value"])

http://vocab.nerc.ac.uk/collection/P07/current/AKK6D0XA/ aerodynamic_resistance
http://vocab.nerc.ac.uk/collection/P07/current/EQUNJT0R/ isotope_ratio_of_18O_to_16O_in_sea_water_excluding_solutes_and_solids
http://vocab.nerc.ac.uk/collection/P07/current/CF12N559/ ocean_salt_x_transport
http://vocab.nerc.ac.uk/collection/P07/current/CF12N560/ ocean_salt_y_transport
http://vocab.nerc.ac.uk/collection/P07/current/OJUDV53W/ ratio_of_volume_extinction_coefficient_to_volume_backwards_scattering_coefficient_by_ranging_instrument_in_air_due_to_ambient_aerosol_particles
http://vocab.nerc.ac.uk/collection/P07/current/4FD2J2GJ/ storm_motion_speed
http://vocab.nerc.ac.uk/collection/P07/current/CF12N787/ tendency_of_sea_water_salinity
http://vocab.nerc.ac.uk/collection/P07/current/CFV8N121/ tendency_of_sea_water_salinity_due_to_advection
http://vocab.nerc.ac.uk/collection/P07/current/CFV8N123/ tendency_of_sea_water_salinity_due_to_horizontal_mixing
http://vocab.nerc.ac.uk/collection/P07/current/CFV

This creates the standard name entries. NVS does not have GRIB or AMIP mappings, so these nodes were added with no value set. There is some discussion about dropping these from the standard name table, so this might not be an issue later.

This also asserts that all the units are valid udunits strings.

In [10]:
# Put all the standard name entries in
for id, name in sorted(standard_names.items(), key=lambda x: x[1]["skos:prefLabel"]["@value"]):
    entry = ET.SubElement(root, "entry", id=name["skos:prefLabel"]["@value"])
    canonical_units = ET.SubElement(entry, "canonical_units")
    if (units := canonical_units_dict.get(id)) is not None:
        canonical_units.text = units
        assert Units(units).isvalid
    grib = ET.SubElement(entry, "grib") # NVS does not have GRIB
    amip = ET.SubElement(entry, "amip") # NVS does not have AMIP
    description = ET.SubElement(entry, "description")
    try:
        description.text = name["skos:definition"]["@value"]
    except TypeError:
        description.text = ""
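To show the shape of a single record produced by the loop above, here is a minimal sketch that builds one entry and prints it. The `make_entry` helper, the name `example_quantity`, and its definition are invented for illustration; only the element names and their order come from the cell above.

```python
import xml.etree.ElementTree as ET


def make_entry(parent, name, units, definition):
    """Append one <entry> with the same child order as the loop above."""
    entry = ET.SubElement(parent, "entry", id=name)
    ET.SubElement(entry, "canonical_units").text = units
    ET.SubElement(entry, "grib")  # left empty: no GRIB mapping available
    ET.SubElement(entry, "amip")  # left empty: no AMIP mapping available
    ET.SubElement(entry, "description").text = definition or ""
    return entry


demo_root = ET.Element("standard_name_table")
make_entry(demo_root, "example_quantity", "m s-1", "An invented definition.")
print(ET.tostring(demo_root, encoding="unicode"))
```

Elements without text, such as `grib` and `amip` here, are serialized as empty elements.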

I found aliases to be a bit of a mess, and this output is probably incomplete. NVS has many deprecated terms. I initially started with the assumption that every deprecated term needs an alias entry, but not every deprecated term has an "isReplacedBy" property. After a bunch of playing around, I found it easiest to go through the non-deprecated standard names, look at their "sameAs" property, and check whether those targets were in the deprecated terms list. Some deprecated terms were replaced by other terms that were themselves deprecated; when writing code that tried to walk this tree/graph, I hit cycles and opted for the following simple but likely incomplete implementation.

In [11]:
# Sort out aliases
aliases = defaultdict(list)
for id, name in sorted(standard_names.items(), key=lambda x: x[1]["skos:prefLabel"]["@value"]):
    #alias = ET.SubElement(root, "alias", id=name["skos:prefLabel"]["@value"])
    sameAs = name["owl:sameAs"]
    if isinstance(sameAs, list):
        for same in sameAs:
            if same["@id"] in deprecated:
                aliases[same["@id"]].append(name["@id"])
    else:
        # in testing this never got hit as a condition
        if sameAs["@id"] in deprecated:
            print(sameAs["@id"])

# loop through deprecated so the alias list is sorted by its id attribute (the prefLabel)
for id, name in sorted(deprecated.items(), key=lambda x: x[1]["skos:prefLabel"]["@value"]):
    if id in aliases:
        alias = ET.SubElement(root, "alias", id=name["skos:prefLabel"]["@value"])
        for entry in aliases[id]:
            entry_id = ET.SubElement(alias, "entry_id")
            entry_id.text = standard_names[entry]["skos:prefLabel"]["@value"]
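The cycles mentioned above are not handled by this notebook. One way they could be handled is to walk the replacement links with a visited set, so each term is expanded at most once. The sketch below is not what the cell above does: `resolve_active`, the `replaced_by` dictionary, and the term names are hypothetical stand-ins for links that would have to be extracted from the "sameAs" or "isReplacedBy" properties first.

```python
def resolve_active(start, replaced_by, active):
    """Return every active term reachable from `start`, tolerating cycles."""
    seen, stack, found = set(), [start], set()
    while stack:
        term = stack.pop()
        if term in seen:
            continue  # already expanded; this is what breaks cycles
        seen.add(term)
        if term in active:
            found.add(term)
            continue
        stack.extend(replaced_by.get(term, []))
    return found


# Hypothetical links: old_a and old_b point at each other, old_c at itself.
replaced_by = {
    "old_a": ["old_b"],
    "old_b": ["old_a", "new_name"],
    "old_c": ["old_c"],
}
active_terms = {"new_name"}

print(resolve_active("old_a", replaced_by, active_terms))
print(resolve_active("old_c", replaced_by, active_terms))
```

A deprecated term that resolves to an empty set has no active replacement and would get no alias entry under this approach.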

In [12]:
ET.indent(root)
tree = ET.ElementTree(root)

In [13]:
tree.write("nvs-to-std-names.xml", xml_declaration=True)