**wikipediaMisuse.ipynb**

Misuse of Wikipedia links.

**Documentation:** https://github.com/edamontology/edamverify/blob/master/docs/wikiepediaMisuse.md

**NB.1:** The test currently detects the following misuses of Wikipedia links:
* ``<oboInOwl:hasDbXref rdf:resource="http://en.wikipedia.org/wiki/Simplified_molecular_input_line_entry_specification"/>``
* ``<oboInOwl:hasExactSynonym rdf:resource="https://en.wikipedia.org/wiki/Pathology"/>``
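To make the two patterns concrete, here is a stdlib-only sketch that finds them in a small RDF/XML fragment. The notebook itself does not work this way: it uses rdflib and SPARQL below. The fragment, the concept URIs (`http://example.org/...`) and the helper `find_wikipedia_misuse` are hypothetical. The sketch also assumes that each concept is a direct child of `rdf:RDF`, which holds for this sample but not for arbitrary RDF/XML.

```python
import xml.etree.ElementTree as ET

# Namespace URIs for rdf: and oboInOwl: as declared in the fragment below.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
OBO = "http://www.geneontology.org/formats/oboInOwl#"

# Hypothetical fragment reproducing the two misuses listed above.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:owl="http://www.w3.org/2002/07/owl#"
         xmlns:oboInOwl="http://www.geneontology.org/formats/oboInOwl#">
  <owl:Class rdf:about="http://example.org/concept_1">
    <oboInOwl:hasDbXref rdf:resource="http://en.wikipedia.org/wiki/Simplified_molecular_input_line_entry_specification"/>
  </owl:Class>
  <owl:Class rdf:about="http://example.org/concept_2">
    <oboInOwl:hasExactSynonym rdf:resource="https://en.wikipedia.org/wiki/Pathology"/>
  </owl:Class>
</rdf:RDF>"""


def find_wikipedia_misuse(rdf_xml):
    """Return (concept, property, link) for each Wikipedia link used as the
    value of oboInOwl:hasDbXref or oboInOwl:hasExactSynonym."""
    root = ET.fromstring(rdf_xml)
    findings = []
    for concept in root:  # direct children of rdf:RDF, i.e. the owl:Class elements
        about = concept.get(f"{{{RDF}}}about", "")
        for prop in ("hasDbXref", "hasExactSynonym"):
            for el in concept.findall(f"{{{OBO}}}{prop}"):
                # The value may be an rdf:resource attribute or literal text.
                value = el.get(f"{{{RDF}}}resource") or (el.text or "")
                if "wikipedia" in value:
                    findings.append((about, prop, value))
    return findings


for concept, prop, link in find_wikipedia_misuse(SAMPLE):
    print(concept, prop, link)
```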


**NB.2:** The ``EDAM_PATH`` environment variable should be set to the full path of the ``EDAM_dev.owl`` file. If it is not set, the script falls back to ``'../EDAM_dev.owl'``, which assumes the test is run from a subdirectory of the directory that contains that file.
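If a missing ontology file should fail early with a readable message rather than a parser error, the path lookup can be wrapped in a small helper. `resolve_edam_path` is hypothetical and not part of edamverify. Unlike `os.environ.get('EDAM_PATH', default)`, it also treats an empty `EDAM_PATH` as unset.

```python
import os


def resolve_edam_path(default="../EDAM_dev.owl"):
    """Return $EDAM_PATH if it is set and non-empty, otherwise `default`.

    Raises FileNotFoundError if the resolved path is not an existing file.
    """
    path = os.environ.get("EDAM_PATH") or default  # empty string counts as unset
    if not os.path.isfile(path):
        raise FileNotFoundError(
            f"EDAM_dev.owl not found at {path!r}; set EDAM_PATH to the file's location"
        )
    return path
```

The notebook's loading cell could then pass `resolve_edam_path()` to the parser instead of reading the environment variable directly.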

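Set constants for the script's return values, then load ``EDAM_dev.owl`` into an RDF graph from the location given by ``EDAM_PATH``. Loading it directly from GitHub is kept as a commented-out alternative.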
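The four status values are plain strings. One option for keeping them together is a `str`-based `Enum`. This is our own suggestion, not an edamverify convention, and the `Status` name is hypothetical. Because the members subclass `str`, they compare equal to the plain constants and `json.dumps` writes them as their string values, so the JSON report would not change.

```python
import json
from enum import Enum


class Status(str, Enum):
    """Hypothetical grouping of the edamverify status strings."""
    NOERR = "NOERR"
    INFO = "INFO"
    WARN = "WARN"
    ERROR = "ERROR"


# str subclass: equal to the plain string, serialized as the plain string.
print(Status.WARN == "WARN")                # True
print(json.dumps({"status": Status.WARN}))  # {"status": "WARN"}
```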

In [5]:
import os
import json
from rdflib import Graph, Namespace

# Constants for script return value as per https://github.com/edamontology/edamverify.
NOERR = "NOERR"
INFO  = "INFO"
WARN  = "WARN"
ERROR = "ERROR"

# Load EDAM_dev.owl into an RDF graph, from $EDAM_PATH if set, else from ../EDAM_dev.owl.
# Graph.parse() replaces Graph.load(), which was removed in rdflib 6.
print("Loading graph ...", end="")
g = Graph()
g.parse(os.environ.get('EDAM_PATH', '../EDAM_dev.owl'), format='xml')
# To load the latest version from GitHub instead:
# g.parse('https://raw.githubusercontent.com/edamontology/edamontology/master/EDAM_dev.owl', format='xml')
g.bind('edam', Namespace('http://edamontology.org#'))
print("done!")

Loading graph ...done!


https://neuroml.org/neuromlv2|https://neuroml-db.org/ does not look like a valid URI, trying to serialize this will break.


Define a SPARQL query that extracts the ID and term of every concept, plus its exact synonym, database cross-reference (dbxref) and deprecated fields where present. Run the query.
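The two independent `OPTIONAL` clauses matter for the analysis that follows. For a given concept, every matching synonym is combined with every matching dbxref, and a missing value comes back unbound (`None`). A single Wikipedia dbxref can therefore appear in several result rows. The pure-Python emulation below illustrates this for one non-deprecated concept with one label. `optional_rows` and the sample data are hypothetical.

```python
from itertools import product


def optional_rows(concept, label, synonyms, dbxrefs):
    """Emulate the rows returned for one concept by
    ?id rdfs:label ?term . OPTIONAL {...hasExactSynonym...} OPTIONAL {...hasDbXref...}

    Each OPTIONAL contributes its matches, or one unbound value (None) if there
    are none, and the two contributions are combined as a cross product.
    """
    syns = synonyms or [None]
    xrefs = dbxrefs or [None]
    return [(concept, label, s, x) for s, x in product(syns, xrefs)]


rows = optional_rows(
    "http://example.org/concept_1", "Example concept",
    synonyms=["Synonym A", "Synonym B"],
    dbxrefs=["https://en.wikipedia.org/wiki/Example"],
)
for row in rows:
    print(row)  # two rows, both carrying the same Wikipedia dbxref
```

Unless duplicates are filtered, a check that runs once per row reports that dbxref twice.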

In [6]:
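# Define SPARQL query. Prefixes are declared explicitly so the query does not
# depend on the namespace bindings picked up from EDAM_dev.owl.
query_term = """
PREFIX rdfs:     <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl:      <http://www.w3.org/2002/07/owl#>
PREFIX oboInOwl: <http://www.geneontology.org/formats/oboInOwl#>

SELECT ?id ?term ?exact_syn ?dbxref ?deprecated WHERE
{
?id rdfs:label ?term .
OPTIONAL {?id oboInOwl:hasExactSynonym ?exact_syn .}
OPTIONAL {?id oboInOwl:hasDbXref ?dbxref .}
OPTIONAL {?id owl:deprecated ?deprecated .}
}
"""

# Run SPARQL query; the results are analysed in the next cell
errfound = False
report = list()
results = g.query(query_term)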

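Analyse the query results, flagging every non-deprecated concept that uses a Wikipedia link as an exact synonym or a database cross-reference.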
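The notebook's test is a substring match on `"wikipedia"`. That also matches URLs such as `http://example.org/wikipedia_mirror`, and it is case-sensitive. The sketch below uses a stricter alternative. It parses each value as a URL and flags it only if the host is `wikipedia.org` or one of its subdomains, and it reports each (concept, property, link) once. It assumes that only links to `*.wikipedia.org` should count as misuse. The helper names, message format and sample rows are hypothetical.

```python
from urllib.parse import urlparse


def is_wikipedia_link(value):
    """True if `value` is an http(s) URL whose host is wikipedia.org or a subdomain."""
    parsed = urlparse(str(value))
    host = (parsed.hostname or "").lower()
    return parsed.scheme in ("http", "https") and (
        host == "wikipedia.org" or host.endswith(".wikipedia.org")
    )


def wikipedia_misuses(rows):
    """Return one message per (concept, property, link), skipping deprecated concepts.

    `rows` are (concept_id, term, exact_syn, dbxref, deprecated) tuples with None
    for unbound values, mirroring the SPARQL result rows.
    """
    seen = set()
    messages = []
    for concept_id, term, exact_syn, dbxref, deprecated in rows:
        if str(deprecated) == "true":
            continue
        for prop, value in (("hasExactSynonym", exact_syn), ("hasDbXref", dbxref)):
            if value is not None and is_wikipedia_link(value):
                key = (concept_id, prop, str(value))
                if key not in seen:  # the same link can recur across rows
                    seen.add(key)
                    messages.append(
                        f"Wikipedia link in <oboInOwl:{prop}> (not allowed) ::: "
                        f"{concept_id} ({term})   {value}"
                    )
    return messages


rows = [
    # Same dbxref twice, as produced by the OPTIONAL cross product
    ("http://example.org/c1", "Example term", "Synonym A", "https://en.wikipedia.org/wiki/Example", None),
    ("http://example.org/c1", "Example term", "Synonym B", "https://en.wikipedia.org/wiki/Example", None),
    # Deprecated concept: skipped
    ("http://example.org/c2", "Old term", None, "https://en.wikipedia.org/wiki/Old", "true"),
    # Not a Wikipedia host, although the substring test would flag it
    ("http://example.org/c3", "Mirror", "http://example.org/wikipedia_mirror", None, None),
]
for m in wikipedia_misuses(rows):
    print(m)
```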

In [7]:
for r in results:

    concept_id = str(r['id'])
    term       = str(r['term'])
    exact_syn  = str(r['exact_syn'])
    dbxref     = str(r['dbxref'])
    deprecated = str(r['deprecated'])

    # Skip deprecated concepts
    if deprecated == "true":
        continue

    # print(concept_id, "(", term, ")   ", exact_syn, "   ", dbxref)

    # An unbound OPTIONAL value is None, which str() turns into the string "None".
    # The OPTIONAL clauses yield one row per (synonym, dbxref) combination, so the
    # same link can occur in several rows: record each message only once.
    if exact_syn != "None" and "wikipedia" in exact_syn:
        msg = "Wikipedia link in <oboInOwl:hasExactSynonym> (not allowed)::: " + concept_id + "(" + term + ")   " + exact_syn
        if msg not in report:
            errfound = True
            report.append(msg)

    if dbxref != "None" and "wikipedia" in dbxref:
        msg = "Wikipedia link in <oboInOwl:hasDbXref> (not allowed) ::: " + concept_id + "(" + term + ")   " + dbxref
        if msg not in report:
            errfound = True
            report.append(msg)

Write the report as JSON, with ``status`` set to the appropriate return value (``WARN`` if any misuse was found, otherwise ``NOERR``).
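Without an `ORDER BY`, SPARQL does not guarantee the order of result rows, so the order of the `reason` list can change between runs. If the report is compared or diffed across runs, sorting and deduplicating the messages keeps it stable. The sketch below does this in a hypothetical `build_report` function. It keeps the notebook's keys and omits `reason` when nothing was found.

```python
import json

NOERR = "NOERR"
WARN = "WARN"


def build_report(messages, test_name="wikipediaMisuse"):
    """Return the JSON report with a sorted, duplicate-free `reason` list."""
    report = {
        "test_name": test_name,
        "comment": "Misuse of Wikipedia link for one or more concepts.",
        "status": WARN if messages else NOERR,
    }
    if messages:
        report["reason"] = sorted(set(messages))
    # Python 3.7+ dicts keep insertion order, so the keys print in the order above.
    return json.dumps(report, indent=4)


print(build_report(["b message", "a message", "a message"]))
```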

In [8]:
# Write report.
# Since Python 3.7, dicts preserve insertion order, so the JSON keys are listed
# in the order they are set below (an OrderedDict is no longer needed).
report_obj = {}
report_obj['test_name'] = 'wikipediaMisuse'
report_obj['comment'] = 'Misuse of Wikipedia link for one or more concepts.'

if errfound:
    report_obj['status'] = WARN
    report_obj['reason'] = report
else:
    report_obj['status'] = NOERR

report_json = json.dumps(report_obj, indent=4)
print(report_json)

{
    "status": "WARN",
    "test_name": "wikipediaMisuse",
    "reason": [
        "Misuse of wikepedia link for these concepts:",
        "Wikipedia link in <oboInOwl:hasDbXref> (not allowed) ::: http://edamontology.org/format_3748(Linked data format)   https://en.wikipedia.org/wiki/Linked_data",
        "Wikipedia link in <oboInOwl:hasExactSynonym> (not allowed)::: http://edamontology.org/topic_3342(Translational medicine)   https://en.wikipedia.org/wiki/Translational_medicine",
        "Wikipedia link in <oboInOwl:hasDbXref> (not allowed) ::: http://edamontology.org/format_1961(Stockholm format)   http://en.wikipedia.org/wiki/Stockholm_format",
        "Wikipedia link in <oboInOwl:hasExactSynonym> (not allowed)::: http://edamontology.org/topic_2640(Oncology)   https://en.wikipedia.org/wiki/Oncology",
        "Wikipedia link in <oboInOwl:hasExactSynonym> (not allowed)::: http://edamontology.org/topic_3414(Trauma medicine)   https://en.wikipedia.org/wiki/Traumatology",
        "Wikip