**wikipediaLinkMissing.ipynb**

Wikipedia link missing from EDAM Topic concept.

**Documentation:** https://github.com/edamontology/edamverify/blob/master/docs/wikiepediaLinkMissing.md

**NB:** Wikipedia links may be specified in the following ways:
* ``<rdfs:seeAlso>http://en.wikipedia.org/wiki/List_of_file_formats</rdfs:seeAlso>``        
* ``<rdfs:seeAlso rdf:resource="https://en.wikipedia.org/wiki/Information_Hyperlinked_over_Proteins"/>``       
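
The two forms differ in where the URL is stored: as element text in the first, and as an ``rdf:resource`` attribute in the second. The following is a minimal sketch, using only the Python standard library, of reading both. The helper name ``see_also_urls`` and the concept ID ``topic_0000`` are hypothetical and are not part of this notebook.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS_NS = "http://www.w3.org/2000/01/rdf-schema#"

# Hypothetical fragment (topic_0000 is made up) containing both forms.
SNIPPET = f"""
<owl:Class xmlns:owl="http://www.w3.org/2002/07/owl#"
           xmlns:rdf="{RDF_NS}"
           xmlns:rdfs="{RDFS_NS}"
           rdf:about="http://edamontology.org/topic_0000">
  <rdfs:seeAlso>http://en.wikipedia.org/wiki/List_of_file_formats</rdfs:seeAlso>
  <rdfs:seeAlso rdf:resource="https://en.wikipedia.org/wiki/Information_Hyperlinked_over_Proteins"/>
</owl:Class>
"""

def see_also_urls(xml_text):
    """Return the URL of every rdfs:seeAlso element, whichever form it uses."""
    root = ET.fromstring(xml_text)
    urls = []
    for el in root.iter(f"{{{RDFS_NS}}}seeAlso"):
        resource = el.get(f"{{{RDF_NS}}}resource")  # attribute form
        text = (el.text or "").strip()              # element-text form
        urls.append(resource or text)
    return urls

urls = see_also_urls(SNIPPET)
print(urls)
```

In rdflib the first form is parsed as a literal and the second as a URI reference; calling ``str()`` on either yields the URL, which is why a plain substring test for ``"wikipedia"`` can cover both.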


Set constants for script return values. Load EDAM_dev.owl from GitHub into an RDF graph.

In [17]:
import sys
from rdflib import ConjunctiveGraph, Namespace

# Constants for script error reporting as per https://github.com/edamontology/edamverify.
NOERR = "NOERR"
INFO  = "INFO"
WARN  = "WARN"
ERROR = "ERROR"

# Load EDAM_dev.owl from GitHub into an RDF graph.
# Graph.load() was removed in rdflib 6; parse() is the supported call.
print("Loading graph ...", end="")
g = ConjunctiveGraph()
g.parse('https://raw.githubusercontent.com/edamontology/edamontology/master/EDAM_dev.owl', format='xml')
# g.parse('EDAM_dev.owl', format='xml')
g.bind('edam', Namespace('http://edamontology.org#'))
print("done!")

Loading graph ...done!


Define a SPARQL query to extract the ID, term, and (if available) the ``rdfs:seeAlso`` and ``owl:deprecated`` values of all Topic concepts. Run the query.

**NB:** The filter matches on ``"/topic_"`` rather than ``"topic_"`` so that the property http://edamontology.org/is_topic_of is not picked up.
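
The effect of the leading slash can be checked outside SPARQL. The sketch below uses Python's ``re`` module as a stand-in for the SPARQL ``regex`` function (for a literal pattern like this one, both behave as a substring search). The list of URIs is illustrative only.

```python
import re

# Illustrative URIs: one Topic concept and two property URIs.
uris = [
    "http://edamontology.org/topic_0804",
    "http://edamontology.org/is_topic_of",
    "http://edamontology.org/has_topic",
]

# Without the slash, the property is_topic_of also matches.
loose = [u for u in uris if re.search("topic_", u)]

# With the slash, only URIs whose local name starts with "topic_" match.
strict = [u for u in uris if re.search("/topic_", u)]

print(loose)
print(strict)
```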

In [18]:
# Compile SPARQL query
query_term = """
SELECT ?id ?term ?seealso ?deprecated WHERE
{
?id rdfs:label ?term .
OPTIONAL {?id rdfs:seeAlso ?seealso .}
OPTIONAL {?id owl:deprecated ?deprecated .}
FILTER regex(str(?id), "/topic_")
}
"""
# Declare hash tables for results
ids = {}
terms = {}
errs = {}

# Run SPARQL query (results are collated in the next cell)
errfound = False
report = []
results = g.query(query_term)

Analyse results of query.

In [19]:
report.append("Missing Wikipedia link for these Topic concepts:")

for r in results:

    id         = str(r['id'])
    term       = str(r['term'])
    seealso    = str(r['seealso'])
    deprecated = str(r['deprecated'])

    # Skip deprecated concepts
    if deprecated == "true":
        continue

    # print(id, "(", term, ")   ", seealso)

    # id is assigned to both the key and the value of the 'ids' hash table;
    # later on, only the key is used
    ids[id] = id
    terms[id] = term

    # A concept with several rdfs:seeAlso values yields one row per value,
    # so set the flag only on first sight and never reset it once a
    # Wikipedia link has been found
    errs.setdefault(id, True)

    # str() of an unbound OPTIONAL variable (Python None) is "None"
    if seealso != "None" and "wikipedia" in seealso:
        errs[id] = False

for key in ids:
    if errs[key]:
        errfound = True
        report.append("Missing wikipedia link ::: " + key + ' (' + terms[key] + ')')
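
Because ``rdfs:seeAlso`` is matched in an ``OPTIONAL`` clause and a concept may carry several such values, the query can return more than one row per concept. The per-concept flag should therefore only ever move from "missing" to "found". The following is a minimal sketch of that collation on plain tuples; the rows and the function name ``concepts_missing_wikipedia`` are hypothetical, with ``None`` standing in for an unbound value.

```python
# Hypothetical (id, seealso) rows; None stands for an unbound OPTIONAL value.
rows = [
    ("topic_A", "https://en.wikipedia.org/wiki/A"),
    ("topic_A", "http://example.org/a"),   # second row for the same concept
    ("topic_B", None),                     # no rdfs:seeAlso at all
    ("topic_C", "http://example.org/c"),   # seeAlso present, but not Wikipedia
]

def concepts_missing_wikipedia(rows):
    """Return IDs for which no row carries a Wikipedia URL."""
    has_link = {}
    for concept_id, seealso in rows:
        found = seealso is not None and "wikipedia" in seealso
        # OR with the previous value so a later row cannot undo a hit.
        has_link[concept_id] = has_link.get(concept_id, False) or found
    return [cid for cid, ok in has_link.items() if not ok]

missing = concepts_missing_wikipedia(rows)
print(missing)
```

Here ``topic_A`` is not reported even though its second row has no Wikipedia URL, because the first row already satisfied the check.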

Write report and return appropriate value.

In [20]:
# Print the test result; the sys.exit() calls are left commented out,
# so no exit code is returned
if errfound:
    print('"Test name": ' + '"wikipediaLinkMissing", ' +
          '"Status": ' + '"' + INFO + '", ' +
          '"Reason": ' + '"' + '\n'.join(report) + '"')

    # print("\n".join(report))
    # sys.exit(INFO)
else:
    print('"Test name": ' + '"wikipediaLinkMissing", ' +
          '"Status": ' + '"' + NOERR + '", ' +
          '"Reason": ' + '"-"')

    # print("No issues found.")
    # sys.exit(NOERR)
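
The record printed above is assembled by string concatenation, so newlines or quotes in the report end up unescaped. If the consumer expects JSON (an assumption; the format edamverify actually requires is not stated in this notebook), a sketch using the standard ``json`` module could look like the following. The function name ``format_result`` is hypothetical.

```python
import json

def format_result(test_name, status, report_lines):
    """Serialise one test result as a single-line JSON object."""
    reason = "\n".join(report_lines) if report_lines else "-"
    # json.dumps escapes embedded newlines and quotes in the reason text.
    return json.dumps({"Test name": test_name, "Status": status, "Reason": reason})

line = format_result(
    "wikipediaLinkMissing",
    "INFO",
    [
        "Missing wikipedia link ::: http://edamontology.org/topic_0804 (Immunology)",
        "Missing wikipedia link ::: http://edamontology.org/topic_3421 (Surgery)",
    ],
)
parsed = json.loads(line)
print(line)
```

The output stays on one line and can be read back with ``json.loads``, which restores the original multi-line reason.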



Missing wikepedia link for these Topic concepts:
Missing wikipedia link ::: http://edamontology.org/topic_3892 (Biomolecular simulation)
Missing wikipedia link ::: http://edamontology.org/topic_3071 (Biological databases)
Missing wikipedia link ::: http://edamontology.org/topic_3322 (Respiratory medicine)
Missing wikipedia link ::: http://edamontology.org/topic_0123 (Protein properties)
Missing wikipedia link ::: http://edamontology.org/topic_3421 (Surgery)
Missing wikipedia link ::: http://edamontology.org/topic_0166 (Protein structural motifs and surfaces)
Missing wikipedia link ::: http://edamontology.org/topic_3673 (Whole genome sequencing)
Missing wikipedia link ::: http://edamontology.org/topic_3967 (Immunomics)
Missing wikipedia link ::: http://edamontology.org/topic_3371 (Synthetic chemistry)
Missing wikipedia link ::: http://edamontology.org/topic_0804 (Immunology)
Missing wikipedia link ::: http://edamontology.org/topic_3377 (Safety sciences)
Missing wikipedia link ::: http:/

SystemExit: 3