**identifierPropertyMissing.ipynb**

EDAM identifier concept is missing a mandatory or recommended annotation property

**Documentation:** https://github.com/edamontology/edamverify/blob/master/docs/identifierPropertyMissing.md

Set constants for script return values. Load EDAM_dev.owl into an RDF graph, from the path given by the ``EDAM_PATH`` environment variable (default ``../EDAM_dev.owl``).

In [5]:
import os
from rdflib import ConjunctiveGraph, Namespace
import json
from collections import OrderedDict

# Constants for script error reporting as per https://github.com/edamontology/edamverify.
NOERR = "NOERR"
INFO  = "INFO"
WARN  = "WARN"
ERROR = "ERROR"

# Load EDAM_dev.owl into an RDF graph (local path by default; the GitHub URL is kept below as an alternative).
print("Loading graph ...", end="")
g = ConjunctiveGraph()
# Graph.parse() is used here because Graph.load() is deprecated in rdflib.
g.parse(os.environ.get('EDAM_PATH', '../EDAM_dev.owl'), format='xml')
# g.parse('https://raw.githubusercontent.com/edamontology/edamontology/master/EDAM_dev.owl', format='xml')
# g.parse('EDAM_dev.owl', format='xml')
g.bind('edam', Namespace('http://edamontology.org#'))
print("done!")



Loading graph ...done!


https://neuroml.org/neuromlv2|https://neuroml-db.org/ does not look like a valid URI, trying to serialize this will break.


Define a SPARQL query to retrieve the ID, term, and all properties that are mandatory or recommended for Identifier concepts. Run the query.

**NB.1:** BASE is used to define the default namespace (for various elements below).

**NB.2:** A concept is considered to be an identifier if it has a property ``<oboInOwl:inSubset>`` whose value contains the string ``identifiers``, e.g. ``<oboInOwl:inSubset rdf:resource="http://purl.obolibrary.org/obo/edam#identifiers"/>`` or ``<oboInOwl:inSubset>identifiers</oboInOwl:inSubset>``.

**NB.3:** Many of the checks should not apply to "placeholder" Identifier concepts (see https://edamontologydocs.readthedocs.io/en/latest/technical_details.html#concept-types).  The code could be improved to take account of this.
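The rule in NB.2 amounts to a substring test on the ``inSubset`` value. A minimal sketch of that test, using a hypothetical helper ``is_identifier_subset`` (not part of the notebook) and illustrative subset values:

```python
def is_identifier_subset(subset_value):
    """Return True if an oboInOwl:inSubset value marks an Identifier concept.

    Hypothetical helper for illustration: it applies the substring rule from
    NB.2 to the string form of the value, so both a resource URI and a plain
    literal are accepted.
    """
    return "identifiers" in str(subset_value)


# Resource form and literal form both match; an unrelated subset value does not.
print(is_identifier_subset("http://purl.obolibrary.org/obo/edam#identifiers"))
print(is_identifier_subset("identifiers"))
print(is_identifier_subset("http://purl.obolibrary.org/obo/edam#formats"))
```

Because this is a substring match, any subset value that merely contains ``identifiers`` would also be accepted; comparing against the exact subset URI would be stricter.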

In [6]:
# Compile SPARQL query
query_term = """
BASE <http://edamontology.org/>
SELECT ?id ?term ?subsetpath ?documentation ?regex WHERE
{
?id rdfs:label ?term .
?id oboInOwl:inSubset ?subsetpath .
OPTIONAL {?id :documentation ?documentation .}
OPTIONAL {?id :regex ?regex .}
}
"""

# Run SPARQL query and collate results
err = NOERR
errfound = False
report = list()
results = g.query(query_term)

Analyse results of query.

In [7]:
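# Error message building function
# Rank the levels explicitly: the constants are strings and do not sort by severity.
SEVERITY = {NOERR: 0, INFO: 1, WARN: 2, ERROR: 3}

def report_append(err_new, msg):
    """
    Appends a message to the report and raises the overall error level if needed.
    err_new: the error level of this message
    msg: the message to write
    """
    global err
    global errfound
    # Compare by rank; as plain strings, "INFO" > "NOERR" is False.
    if SEVERITY[err_new] > SEVERITY[err]:
        err = err_new
    errfound = True
    report.append(msg)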

report.append('Recommended properties missing from these Identifier concepts:')

for r in results:
    concept_id = str(r['id'])
    term = str(r['term'])
    subsetpath = str(r['subsetpath'])

    # Variables from unmatched OPTIONAL clauses come back as None
    documentation = r['documentation']
    regex = r['regex']

    # Check Identifier concepts only
    if "identifiers" in subsetpath:
        if documentation is None:
            report_append(INFO, "INFO Missing property (documentation) on ::: " + concept_id + ' (' + term + ')')
        if regex is None:
            report_append(INFO, "INFO Missing property (regex) on ::: " + concept_id + ' (' + term + ')')

Write report and return appropriate value.
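The cell below only prints the JSON report, so a script wrapper would still need to turn the status into a return value. A minimal sketch of one way to do that, assuming a hypothetical mapping from the status strings to numeric exit codes (the mapping and the helper ``exit_code_for`` are illustrative, not taken from the notebook):

```python
# Hypothetical mapping from status strings to process exit codes (an assumption).
EXIT_CODES = {"NOERR": 0, "INFO": 1, "WARN": 2, "ERROR": 3}


def exit_code_for(status):
    """Translate a status string into a numeric exit code.

    Unknown status values are treated as errors, so a typo in a status string
    cannot silently pass as success.
    """
    return EXIT_CODES.get(status, EXIT_CODES["ERROR"])


print(exit_code_for("INFO"))
```

In a script, the result could be passed to ``sys.exit()`` after the report has been printed.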

In [8]:
# Write report
# Use ordered dictionary to get consistent listing of properties in the JSON output
if errfound:
    report_dict = OrderedDict([
        ('test_name', 'identifierPropertyMissing'),
        ('status', err),
        ('reason', report),
    ])
else:
    report_dict = OrderedDict([
        ('test_name', 'identifierPropertyMissing'),
        ('status', NOERR),
        ('reason', '-'),
    ])

report_json = json.dumps(report_dict, indent=4)
print(report_json)

{
    "reason": [
        "Recommended properties missing from these Identifier concepts:",
        "INFO Missing property (documentation) on ::: http://edamontology.org/data_1115 (Sequence profile ID)",
        "INFO Missing property (regex) on ::: http://edamontology.org/data_1115 (Sequence profile ID)",
        "INFO Missing property (documentation) on ::: http://edamontology.org/data_2388 (TAIR accession (At gene))",
        "INFO Missing property (regex) on ::: http://edamontology.org/data_2388 (TAIR accession (At gene))",
        "INFO Missing property (documentation) on ::: http://edamontology.org/data_1900 (NCBI locus tag)",
        "INFO Missing property (regex) on ::: http://edamontology.org/data_1900 (NCBI locus tag)",
        "INFO Missing property (documentation) on ::: http://edamontology.org/data_1045 (Species name)",
        "INFO Missing property (regex) on ::: http://edamontology.org/data_1045 (Species name)",
        "INFO Missing property (documentation) on ::: http