**wikipediaLinkMissing.ipynb**

Wikipedia link missing from EDAM Topic concept.

**Documentation:** https://github.com/edamontology/edamverify/blob/master/docs/wikiepediaLinkMissing.md

**NB:** Wikipedia links may be specified in the following ways:
* ``<rdfs:seeAlso>http://en.wikipedia.org/wiki/List_of_file_formats</rdfs:seeAlso>``        
* ``<rdfs:seeAlso rdf:resource="https://en.wikipedia.org/wiki/Information_Hyperlinked_over_Proteins"/>``       
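
When rdflib parses RDF/XML, both forms become ``rdfs:seeAlso`` values. The first is read as a literal and the second as a URI reference, and ``str()`` of either gives the URL text. That is why a substring test for ``"wikipedia"`` catches both. The stdlib-only sketch below shows the same distinction on a hypothetical RDF/XML fragment. The ``example.org`` subjects and the ``see_also_values`` helper are made up for illustration. Real EDAM concepts are ``owl:Class`` elements, and the sketch handles that by iterating over every top-level node rather than only ``rdf:Description``.

```python
import xml.etree.ElementTree as ET

# Standard W3C namespace URIs for RDF and RDFS.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"

# Hypothetical fragment containing both ways of writing a Wikipedia link.
SAMPLE = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:rdfs="{RDFS}">
  <rdf:Description rdf:about="http://example.org/concept_a">
    <rdfs:seeAlso>http://en.wikipedia.org/wiki/List_of_file_formats</rdfs:seeAlso>
  </rdf:Description>
  <rdf:Description rdf:about="http://example.org/concept_b">
    <rdfs:seeAlso rdf:resource="https://en.wikipedia.org/wiki/Information_Hyperlinked_over_Proteins"/>
  </rdf:Description>
</rdf:RDF>"""

def see_also_values(xml_text):
    """Return (subject, value) pairs for every rdfs:seeAlso, in either form."""
    root = ET.fromstring(xml_text)
    pairs = []
    for node in root:  # any top-level element: rdf:Description, owl:Class, ...
        subject = node.get(f"{{{RDF}}}about")
        for see_also in node.findall(f"{{{RDFS}}}seeAlso"):
            # Second form: the link is an rdf:resource attribute (a URI node).
            value = see_also.get(f"{{{RDF}}}resource")
            if value is None:
                # First form: the link is the element's text content (a literal).
                value = (see_also.text or "").strip()
            pairs.append((subject, value))
    return pairs

for subject, value in see_also_values(SAMPLE):
    print(subject, "wikipedia" in value, value)
```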


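Set constants for the script's status values. Load EDAM_dev.owl into an RDF graph from the path in the ``EDAM_PATH`` environment variable if it is set, otherwise from ``../EDAM_dev.owl``. Loading directly from GitHub is left as a commented-out alternative.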

In [5]:
import os
import json
from collections import OrderedDict
from rdflib import Graph, Namespace

# Constants for script error reporting as per https://github.com/edamontology/edamverify.
NOERR = "NOERR"
INFO  = "INFO"
WARN  = "WARN"
ERROR = "ERROR"

# Load EDAM_dev.owl into an RDF graph: use $EDAM_PATH if set, else a local copy.
# Graph.load() was removed in rdflib 6; Graph.parse() is the supported method.
print("Loading graph ...", end="")
g = Graph()
g.parse(os.environ.get('EDAM_PATH', '../EDAM_dev.owl'), format='xml')
# g.parse('https://raw.githubusercontent.com/edamontology/edamontology/master/EDAM_dev.owl', format='xml')
# g.parse('EDAM_dev.owl', format='xml')
# EDAM concept URIs have the form http://edamontology.org/topic_NNNN
g.bind('edam', Namespace('http://edamontology.org/'))
print("done!")

Loading graph ...done!


https://neuroml.org/neuromlv2|https://neuroml-db.org/ does not look like a valid URI, trying to serialize this will break.
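
The cell above reads from ``EDAM_PATH`` or ``../EDAM_dev.owl`` and keeps the GitHub URL as a commented-out alternative. If all three sources should live in one place, a hypothetical helper such as ``resolve_edam_source`` (not part of edamverify) could choose among them. Falling back to GitHub when no local copy exists is an assumption here, not the script's current behaviour.

```python
import os

# Locations taken from the loading cell; the fallback order is an assumption.
DEFAULT_LOCAL = "../EDAM_dev.owl"
GITHUB_RAW = "https://raw.githubusercontent.com/edamontology/edamontology/master/EDAM_dev.owl"

def resolve_edam_source(env=None, exists=os.path.exists):
    """Pick where to load EDAM_dev.owl from (hypothetical helper)."""
    env = os.environ if env is None else env
    # 1. An explicit EDAM_PATH always wins.
    if env.get("EDAM_PATH"):
        return env["EDAM_PATH"]
    # 2. Otherwise use the local checkout if it is present.
    if exists(DEFAULT_LOCAL):
        return DEFAULT_LOCAL
    # 3. Fall back to the copy on GitHub.
    return GITHUB_RAW

print(resolve_edam_source())
```

The result could then be passed straight to ``g.parse(resolve_edam_source(), format='xml')``, since rdflib's ``parse`` accepts either a file path or a URL.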


Define a SPARQL query that extracts the ID and label of every Topic concept, together with its ``rdfs:seeAlso`` and ``owl:deprecated`` values where present. Run the query.

**NB:** The query filters on ``"/topic_"`` rather than ``"topic_"`` so that the property http://edamontology.org/is_topic_of is not matched.
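
A quick check of the filter pattern, using Python's ``re`` as a stand-in for SPARQL's ``regex()`` (the two behave the same for a plain literal pattern like this one). The helper name ``matching`` is illustrative only.

```python
import re

uris = [
    "http://edamontology.org/topic_0089",
    "http://edamontology.org/is_topic_of",
]

def matching(pattern, candidates):
    """Return the candidates in which the regex pattern is found anywhere."""
    return [u for u in candidates if re.search(pattern, u)]

# The leading slash ties "topic_" to the start of the local name.
print(matching("/topic_", uris))  # → ['http://edamontology.org/topic_0089']
# Without it, the property name "is_topic_of" matches as well.
print(matching("topic_", uris))
```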

In [6]:
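# Compile SPARQL query. Prefixes are declared explicitly rather than relying
# on the namespace bindings picked up from the parsed file.
query_term = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl:  <http://www.w3.org/2002/07/owl#>
SELECT ?id ?term ?seealso ?deprecated WHERE
{
?id rdfs:label ?term .
OPTIONAL {?id rdfs:seeAlso ?seealso .}
OPTIONAL {?id owl:deprecated ?deprecated .}
FILTER regex(str(?id), "/topic_")
}
"""
# Dictionaries for results, keyed by concept ID
ids = {}
terms = {}
errs = {}

# Run SPARQL query and collate results
errfound = False
report = list()
results = g.query(query_term)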

Analyse results of query.

In [7]:
report.append("Missing wikipedia link for these Topic concepts:")

for r in results:
    concept_id = str(r['id'])
    term       = str(r['term'])
    seealso    = str(r['seealso'])
    deprecated = str(r['deprecated'])

    # Skip deprecated concepts
    if deprecated == "true":
        continue

    # ids maps each concept ID to itself; only its keys are used later.
    ids[concept_id] = concept_id
    terms[concept_id] = term
    # The query returns one row per rdfs:seeAlso value, so a concept can appear
    # in several rows. Flag it only when first seen, so that a Wikipedia link
    # found in an earlier row is not overwritten by a later row.
    errs.setdefault(concept_id, True)

    # str(None) gives the string "None" where the OPTIONAL value was unbound
    if seealso != "None" and "wikipedia" in seealso:
        errs[concept_id] = False

for key in ids:
    if errs[key]:
        errfound = True
        report.append("Missing wikipedia link ::: " + key + ' (' + terms[key] + ')')
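
Because ``OPTIONAL`` produces one result row per ``rdfs:seeAlso`` value, a concept with several links appears in several rows, and its flag has to combine all of them rather than be reset on each row. The sketch below shows that aggregation in isolation, on made-up rows shaped like ``(str(id), str(term), str(seealso), str(deprecated))``. The function name ``missing_wikipedia`` and the sample IDs are hypothetical.

```python
def missing_wikipedia(rows):
    """Return {concept_id: label} for non-deprecated concepts with no Wikipedia seeAlso.

    An unbound OPTIONAL value appears as the string "None", as str(None) gives.
    """
    labels = {}
    has_link = {}
    for cid, label, seealso, deprecated in rows:
        if deprecated == "true":
            continue  # deprecated concepts are not checked
        labels[cid] = label
        found = seealso != "None" and "wikipedia" in seealso
        # A concept has a link if ANY of its rows mentions Wikipedia.
        has_link[cid] = has_link.get(cid, False) or found
    return {cid: labels[cid] for cid in labels if not has_link[cid]}

rows = [
    ("t1", "Alpha", "https://en.wikipedia.org/wiki/Alpha", "None"),
    ("t1", "Alpha", "http://example.org/alpha", "None"),  # second seeAlso row
    ("t2", "Beta", "None", "None"),                       # no seeAlso at all
    ("t3", "Gamma", "None", "true"),                      # deprecated: skipped
]
print(missing_wikipedia(rows))  # → {'t2': 'Beta'}
```

The result does not depend on row order: ``t1`` is never reported, whichever of its two rows comes first.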

Write the report as JSON with the appropriate status value.
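
The status decision can also be written as a small function, which makes it easy to test. This is a sketch only: ``build_report`` is a hypothetical name, and it assumes the header line is prepended to the findings, as in the analysis step. Because ``json.dumps`` keeps dict insertion order on Python 3.7+, a plain dict is enough there to fix the key order.

```python
import json

# Status constants, mirroring those defined at the top of the notebook.
NOERR = "NOERR"
INFO = "INFO"

HEADER = "Missing wikipedia link for these Topic concepts:"

def build_report(test_name, findings):
    """Return the JSON report for a list of finding strings (hypothetical helper)."""
    report = {
        "test_name": test_name,
        "status": INFO if findings else NOERR,
        # Any finding sets the status to INFO and lists the findings under a header.
        "reason": ([HEADER] + list(findings)) if findings else "-",
    }
    return json.dumps(report, indent=4)

print(build_report("wikipediaLinkMissing", []))
```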

In [8]:
# Write report
# Use an OrderedDict so the properties are listed in a consistent order in the
# JSON output (plain dicts only preserve insertion order from Python 3.7 onwards)
if errfound:
    report_dict = OrderedDict([
        ('test_name', 'wikipediaLinkMissing'),
        ('status', INFO),
        ('reason', report),
    ])
else:
    report_dict = OrderedDict([
        ('test_name', 'wikipediaLinkMissing'),
        ('status', NOERR),
        ('reason', '-'),
    ])

report_json = json.dumps(report_dict, indent=4)
print(report_json)



{
    "test_name": "wikipediaLinkMissing",
    "reason": [
        "Missing wikepedia link for these Topic concepts:",
        "Missing wikipedia link ::: http://edamontology.org/topic_3955 (Fluxomics)",
        "Missing wikipedia link ::: http://edamontology.org/topic_3421 (Surgery)",
        "Missing wikipedia link ::: http://edamontology.org/topic_3534 (Protein binding sites)",
        "Missing wikipedia link ::: http://edamontology.org/topic_3120 (Protein variants)",
        "Missing wikipedia link ::: http://edamontology.org/topic_3324 (Infectious disease)",
        "Missing wikipedia link ::: http://edamontology.org/topic_3293 (Phylogenetics)",
        "Missing wikipedia link ::: http://edamontology.org/topic_3411 (Gynaecology and obstetrics)",
        "Missing wikipedia link ::: http://edamontology.org/topic_0089 (Ontology and terminology)",
        "Missing wikipedia link ::: http://edamontology.org/topic_3954 (Echography)",
        "Missing wikipedia link ::: http://edamontolo