**fileExtensionMissingSynonym.ipynb**

An EDAM Format concept is missing a synonym or label matching its file extension.

**Documentation:** https://github.com/edamontology/edamverify/blob/master/docs/fileExtensionMissingSynonym.md

Set constants for script return values. Load ``EDAM_dev.owl`` into an RDF graph (from a local path by default; loading from the GitHub URL is kept as a commented-out alternative).

**NB.1:** The ``EDAM_PATH`` environment variable should give the location of the ``EDAM_dev.owl`` file. If it is not set, the script falls back to ``'../EDAM_dev.owl'``, which assumes the test is run from a subdirectory of the directory containing the file.

In [1]:
import os
from rdflib import ConjunctiveGraph, Namespace
import json
from collections import OrderedDict

# Constants for script return value as per https://github.com/edamontology/edamverify.
NOERR = "NOERR"
INFO  = "INFO"
WARN  = "WARN"
ERROR = "ERROR"

# Load EDAM_dev.owl into an RDF graph (path from EDAM_PATH, else '../EDAM_dev.owl').
print("Loading graph ...", end="")
g = ConjunctiveGraph()
g.parse(os.environ.get('EDAM_PATH', '../EDAM_dev.owl'), format='xml')
# g.parse('https://raw.githubusercontent.com/edamontology/edamontology/master/EDAM_dev.owl', format='xml')
# g.parse('EDAM_dev.owl', format='xml')
g.bind('edam', Namespace('http://edamontology.org#'))
print("done!")



Loading graph ...done!
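
The path lookup described in NB.1 can be made more explicit. The following is a minimal sketch, not part of edamverify: ``resolve_edam_path`` is a hypothetical helper, and treating an empty ``EDAM_PATH`` as unset is an assumption of this sketch. It fails early with a readable message when the file cannot be found.

```python
import os

def resolve_edam_path(env=None, default="../EDAM_dev.owl"):
    """Return the ontology path taken from EDAM_PATH, or the relative default.

    Hypothetical helper for illustration; not part of edamverify.
    """
    env = os.environ if env is None else env
    # Assumption: an empty EDAM_PATH is treated like an unset one.
    path = env.get("EDAM_PATH") or default
    if not os.path.isfile(path):
        raise FileNotFoundError(
            "EDAM_dev.owl not found at %r; set EDAM_PATH or run from a "
            "subdirectory of the directory that contains the file" % path
        )
    return path
```

The returned path could then be passed to the graph-loading call in place of the inline ``os.environ.get(...)`` expression.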


Define SPARQL query to retrieve ID, term, file extension and exact synonym of concepts. Run the query.

**NB:** BASE is used to define the default namespace (for ``file_extension`` below).

In [2]:
# Compile SPARQL query
query_term = """
BASE <http://edamontology.org/>
SELECT ?id ?term ?ext ?exact_syn WHERE
{
?id rdfs:label ?term .
?id :file_extension ?ext .
?id oboInOwl:hasExactSynonym ?exact_syn 
}
"""

# Run SPARQL query and collate results
errfound = False    
report = list()
results = g.query(query_term)

Analyse results of query.

In [3]:
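for r in results:
    # Each row is one (concept, exact synonym) pair, so a concept with
    # several exact synonyms appears in several rows.
#    print(str(r['id']), str(r['term']), str(r['ext']), str(r['exact_syn']))
    id   = str(r['id'])
    term = str(r['term'])
    ext  = str(r['ext'])
    exact_syn = str(r['exact_syn'])

    # Report the row when the extension matches neither this synonym
    # nor the label (case-insensitive comparison).
    if (ext.lower() != exact_syn.lower()) and (ext.lower() != term.lower()):
        errfound = True
        report.append(id +  ' (' + term + '): ' + ext)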
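
Because the loop checks one (concept, synonym) row at a time, a concept with several exact synonyms can be reported several times. One possible variation, sketched here under stated assumptions, groups the rows per concept first and reports each concept at most once. It assumes that ``|`` separates alternative extensions within a single value (as in ``tsv|tab``) and that every extension should match the label or some exact synonym. This is a stricter reading than the notebook's check, not the notebook's own method. ``find_missing`` and the sample rows are hypothetical.

```python
from collections import defaultdict

def find_missing(rows):
    """Group (id, label, extensions, exact_synonym) rows per concept.

    Returns one report line per concept that has at least one extension
    matching neither its label nor any of its exact synonyms.
    """
    labels = {}
    extensions = defaultdict(set)
    names = defaultdict(set)  # lower-cased label and synonyms per concept
    for concept_id, label, ext, synonym in rows:
        labels[concept_id] = label
        names[concept_id].update({label.lower(), synonym.lower()})
        # Assumption: '|' separates alternative extensions in one value.
        extensions[concept_id].update(
            part.strip().lower() for part in ext.split("|") if part.strip()
        )

    lines = []
    for concept_id in sorted(labels):
        missing = sorted(extensions[concept_id] - names[concept_id])
        if missing:
            lines.append(
                "%s (%s): %s" % (concept_id, labels[concept_id], "|".join(missing))
            )
    return lines

# Hypothetical sample rows, shaped like the query results.
sample_rows = [
    ("http://example.org/format_1", "TSV", "tsv|tab", "Tabular"),
    ("http://example.org/format_1", "TSV", "tsv|tab", "Tab-separated values"),
    ("http://example.org/format_2", "YAML", "yaml", "YAML Ain't Markup Language"),
]
grouped_report = find_missing(sample_rows)
```

With the sample rows, only the first concept is reported, once, and only for the ``tab`` extension, since ``tsv`` and ``yaml`` match their labels.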

Write report and return appropriate value.

In [4]:
# Write report
# Use ordered dictionary to get consistent listing of properties in the JSON output
#report_dict = OrderedDict
#if errfound:
#    report_dict = {
#        'test_name': 'fileExtensionMissingSynonym',
#        'status' : WARN,
#        'reason' : report
#    }
#else:
#    report_dict = {
#        'test_name': 'fileExtensionMissingSynonym',
#        'status' : NOERR,
#        'reason' : '-'
#    }
#report_json = json.dumps(report_dict, indent=4)
#print(report_json)

report_obj = {}
report_obj['test_name'] = 'fileExtensionMissingSynonym'
report_obj['comment'] = 'No <exactSynonym> or <rdfs:label> found matching to <file_extension> for one or more concepts.'

if errfound:
    report_obj['status'] = WARN
    report_obj['reason'] = report
else:
    report_obj['status'] = NOERR

report_json = json.dumps(report_obj, indent=4)
print(report_json)

{
    "test_name": "fileExtensionMissingSynonym",
    "reason": [
        "http://edamontology.org/format_3475 (TSV): tsv|tab",
        "http://edamontology.org/format_3475 (TSV): tsv|tab",
        "http://edamontology.org/format_3750 (YAML): yaml|yml",
        "http://edamontology.org/format_3556 (MHTML): mhtml|mht|eml",
        "http://edamontology.org/format_3556 (MHTML): mhtml|mht|eml",
        "http://edamontology.org/format_3556 (MHTML): mhtml|mht|eml",
        "http://edamontology.org/format_3556 (MHTML): mhtml|mht|eml",
        "http://edamontology.org/format_3556 (MHTML): mhtml|mht|eml",
        "http://edamontology.org/format_3556 (MHTML): mhtml|mht|eml",
        "http://edamontology.org/format_3556 (MHTML): mhtml|mht|eml",
        "http://edamontology.org/format_3789 (XQuery): xq|xqy|xquery",
        "http://edamontology.org/format_3749 (JSON-LD): jsonld",
        "http://edamontology.org/format_3746 (BIOM format): biom",
        "http://edamontology.org/format_1930 (FASTQ):