**fileExtensionMissingSynonym.ipynb**

An EDAM Format concept is missing a synonym or label matching its file extension.

**Documentation:** https://github.com/edamontology/edamverify/blob/master/docs/fileExtensionMissingSynonym.md

**NB.1: Running the notebook**
The ``EDAM_PATH`` environment variable is passed directly to the graph loader, so it should point to the ``EDAM_dev.owl`` file itself.

If ``EDAM_PATH`` is not set, the notebook falls back to ``'../EDAM_dev.owl'``, so it must then be run from a subdirectory of the directory containing that file.
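
As a minimal sketch of the lookup described above, the helper below mirrors the ``os.environ.get`` call used by the notebook. The name ``resolve_edam_file`` and its parameters are hypothetical and not part of edamverify; the ``environ`` argument exists only so that the fallback can be exercised without touching the real environment.

```python
import os

DEFAULT_EDAM_FILE = "../EDAM_dev.owl"  # same relative fallback as the notebook


def resolve_edam_file(environ=None, default=DEFAULT_EDAM_FILE):
    """Return the value of EDAM_PATH if set, otherwise the relative default."""
    if environ is None:
        environ = os.environ
    return environ.get("EDAM_PATH", default)


if __name__ == "__main__":
    path = resolve_edam_file()
    # Report whether the resolved path exists before trying to parse it.
    print(path, "(found)" if os.path.isfile(path) else "(not found)")
```

Checking the path before parsing gives a clearer message than a parser error when the notebook is started from the wrong directory.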

In [36]:
import os
from rdflib import ConjunctiveGraph, Namespace
import json

# Constants for script return value as per https://github.com/edamontology/edamverify.
NOERR = "NOERR"
INFO  = "INFO"
WARN  = "WARN"
ERROR = "ERROR"

# Load EDAM_dev.owl from the local path (EDAM_PATH, else ../EDAM_dev.owl) into an RDF graph.
print("Loading graph ...", end="")
g = ConjunctiveGraph()
# parse() replaces the older load(), which recent rdflib releases no longer provide.
g.parse(os.environ.get('EDAM_PATH', '../EDAM_dev.owl'), format='xml')
# g.parse('https://raw.githubusercontent.com/edamontology/edamontology/master/EDAM_dev.owl', format='xml')
# g.parse('EDAM_dev.owl', format='xml')
g.bind('edam', Namespace('http://edamontology.org#'))
print("done!")



Loading graph ...done!


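Define a SPARQL query that retrieves the ID, label (term), file extension and exact synonym of each concept, then run it. Only concepts that have all three properties are returned, with one row per combination of file extension and exact synonym.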

**NB:** BASE is used to define the default namespace (for ``:file_extension`` below).

In [37]:
# Define SPARQL query
query_term = """
BASE <http://edamontology.org/>
SELECT ?id ?term ?ext ?exact_syn WHERE
{
?id rdfs:label ?term .
?id :file_extension ?ext .
?id oboInOwl:hasExactSynonym ?exact_syn 
}
"""

# Run SPARQL query and collate results
errfound = False
errs = {}
report = list()
results = g.query(query_term)

Analyse results of query.

In [38]:
# The three separate loops below are verbose, but the logic has been tested and works well:
# 1) create an entry per concept, 2) flag unmatched extensions, 3) build the report.

for r in results :
    id   = str(r['id'])
    errs[id]={}

for r in results :
#    print(str(r['id']), str(r['term']), str(r['ext']), str(r['exact_syn']))
    id   = str(r['id'])
    term = str(r['term']) 
    ext  = str(r['ext'])
    exact_syn = str(r['exact_syn'])

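    # Flag the extension as unmatched unless it equals the label or this synonym.
    if (ext.lower() != exact_syn.lower()) and (ext.lower() != term.lower()):
        if ext not in errs[id]:
            errs[id][ext] = True
    else:
        # A match on any row clears the flag for this extension for good.
        errs[id][ext] = False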

for r in results :
    id   = str(r['id'])
    term = str(r['term'])
    ext  = str(r['ext'])

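    if errs[id][ext]:
        errfound = True
        line = id + ' (' + term + '): ' + ext
        # The query returns one row per synonym, so skip lines already reported.
        if line not in report:
            report.append(line)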
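
The three loops above can also be sketched as a single pass over plain tuples. This is an illustrative alternative rather than the notebook's tested code: the function name ``find_unmatched_extensions`` and the ``example.org`` rows are made up, and the sketch emits one line per (concept, extension) pair instead of one per result row.

```python
from collections import defaultdict


def find_unmatched_extensions(rows):
    """rows: iterable of (concept_id, label, extension, exact_synonym) strings."""
    labels = {}
    # (concept_id, extension) -> True once the label or any synonym matches.
    matched = defaultdict(bool)
    for concept_id, label, ext, exact_syn in rows:
        labels[concept_id] = label
        key = (concept_id, ext)
        names = {label.lower(), exact_syn.lower()}
        matched[key] = matched[key] or ext.lower() in names
    # Same line format as the notebook: "<id> (<label>): <extension>".
    return [
        f"{cid} ({labels[cid]}): {ext}"
        for (cid, ext), ok in matched.items()
        if not ok
    ]


# Hypothetical rows, shaped like the query results (one row per synonym).
rows = [
    ("http://example.org/format_A", "FASTA", "fasta", "FASTA format"),
    ("http://example.org/format_A", "FASTA", "fa", "FASTA format"),
    ("http://example.org/format_B", "Example format", "xyz", "XYZ"),
    ("http://example.org/format_B", "Example format", "xyz", "Another name"),
]
print(find_unmatched_extensions(rows))
```

Here only ``fa`` is reported: ``fasta`` matches the label case-insensitively, and ``xyz`` matches one of the two synonyms of the second concept.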

Write report and return appropriate value.

In [39]:
report_obj = {}
report_obj['test_name'] = 'fileExtensionMissingSynonym'
report_obj['comment'] = 'No <exactSynonym> or <rdfs:label> found matching to <file_extension> for one or more concepts.'

if errfound:
    report_obj['status'] = WARN
    report_obj['reason'] = report
else:
    report_obj['status'] = NOERR

report_json = json.dumps(report_obj, indent=4)
print(report_json)

{
    "comment": "No <exactSynonym> or <rdfs:label> found matching to <file_extension> for one or more concepts.",
    "test_name": "fileExtensionMissingSynonym",
    "status": "NOERR"
}
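
The report step can be wrapped in a small function so that both outcomes are easy to check. This is a sketch under stated assumptions: ``build_report`` is a hypothetical helper, not an edamverify function, and only the ``NOERR`` and ``WARN`` constants from the first cell are reused.

```python
import json

NOERR = "NOERR"
WARN = "WARN"


def build_report(test_name, comment, reasons):
    """Return the report dictionary; 'reason' is present only when lines were flagged."""
    report = {"test_name": test_name, "comment": comment}
    if reasons:
        report["status"] = WARN
        report["reason"] = list(reasons)
    else:
        report["status"] = NOERR
    return report


clean = build_report("fileExtensionMissingSynonym", "Example comment.", [])
flagged = build_report(
    "fileExtensionMissingSynonym",
    "Example comment.",
    ["http://example.org/format_A (FASTA): fa"],  # made-up report line
)
print(json.dumps(flagged, indent=4))
```

Returning the dictionary instead of printing it inside the function leaves the choice of output (stdout, file, test assertion) to the caller.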
