**wikiepediaLinkMissing.ipynb**

Wikipedia link missing from EDAM Topic concept.

**Documentation:** https://github.com/edamontology/edamverify/blob/master/docs/wikiepediaLinkMissing.md

<oboInOwl:hasDbXref rdf:resource="http://en.wikipedia.org/wiki/Simplified_molecular_input_line_entry_specification"/> 
<rdfs:seeAlso rdf:resource="https://en.wikipedia.org/wiki/Information_Hyperlinked_over_Proteins"/>       
<rdfs:seeAlso>http://en.wikipedia.org/wiki/List_of_file_formats</rdfs:seeAlso>        
<documentation rdf:resource="https://en.wikipedia.org/wiki/Sun_Raster"/> 
<oboInOwl:hasExactSynonym rdf:resource="https://en.wikipedia.org/wiki/Pathology"/>  
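The fragments above carry the link in two forms: as an `rdf:resource` attribute or as the literal text of the element. The sketch below is one way to pick out Wikipedia links in both forms using only the standard library. The `owl:Class` wrapper with its namespace declarations and the helper names `is_wikipedia_url` and `wikipedia_links` are assumptions made for illustration and are not part of the notebook.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Hypothetical wrapper element: the fragments need namespace declarations
# before they can be parsed on their own.
SNIPPET = """
<owl:Class xmlns:owl="http://www.w3.org/2002/07/owl#"
           xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
           xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
           xmlns:oboInOwl="http://www.geneontology.org/formats/oboInOwl#">
  <oboInOwl:hasDbXref rdf:resource="http://en.wikipedia.org/wiki/Simplified_molecular_input_line_entry_specification"/>
  <rdfs:seeAlso rdf:resource="https://en.wikipedia.org/wiki/Information_Hyperlinked_over_Proteins"/>
  <rdfs:seeAlso>http://en.wikipedia.org/wiki/List_of_file_formats</rdfs:seeAlso>
</owl:Class>
"""


def is_wikipedia_url(value):
    """True if the URL's host is wikipedia.org or one of its subdomains."""
    host = urlparse(value.strip()).hostname or ""
    return host == "wikipedia.org" or host.endswith(".wikipedia.org")


def wikipedia_links(xml_text):
    """Return (property, form, URL) for each child element holding a Wikipedia link."""
    found = []
    for element in ET.fromstring(xml_text):
        prop = element.tag.split("}")[-1]  # drop the "{namespace}" prefix
        resource = element.get("{%s}resource" % RDF_NS)
        if resource and is_wikipedia_url(resource):
            found.append((prop, "resource", resource))
        elif element.text and is_wikipedia_url(element.text):
            found.append((prop, "literal", element.text.strip()))
    return found


for prop, form, url in wikipedia_links(SNIPPET):
    print(prop, form, url)
```

Checking the host name rather than searching for the substring "wikipedia" avoids matching URLs that only mention Wikipedia in their path.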
  
 

Set constants for script return values. Load EDAM_dev.owl into an RDF graph (from a local copy; the GitHub URL is kept as a commented-out alternative).

In [8]:
import sys
from rdflib import ConjunctiveGraph, Namespace

# Constants for script return value as per https://github.com/edamontology/edamverify.
NOERR = 0
INFO  = 1
WARN  = 2
ERROR = 3

# Load EDAM_dev.owl into an RDF graph (local copy; the GitHub URL is kept for reference).
# Graph.parse() is used because Graph.load() was removed in rdflib 6.
print("Loading graph ...", end="")
g = ConjunctiveGraph()
# g.parse('https://raw.githubusercontent.com/edamontology/edamontology/master/EDAM_dev.owl', format='xml')
g.parse('EDAM_dev.owl', format='xml')
g.bind('edam', Namespace('http://edamontology.org#'))
print("done!")

Loading graph ...done!
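The four return values form an ordered scale, so one possible way to express them is an `IntEnum`. This is only a sketch: the class name `Severity` and the helper `overall` are hypothetical, and reducing several findings to the most severe one is an assumption about how such codes might be used, not something the notebook does.

```python
from enum import IntEnum


class Severity(IntEnum):
    """Return values in increasing order of severity (same numbers as the constants above)."""
    NOERR = 0
    INFO = 1
    WARN = 2
    ERROR = 3


def overall(levels):
    """Return the most severe level in `levels`, or NOERR when there are no findings."""
    return max(levels, default=Severity.NOERR)


print(overall([Severity.INFO, Severity.WARN]).name)  # prints "WARN"
```

Because `IntEnum` members are integers, a value such as `Severity.ERROR` can still be passed directly to `sys.exit()`.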


Define a SPARQL query that extracts the ID, label, rdfs:seeAlso link and deprecation flag of every Topic concept. Run the query.

In [9]:
# Define SPARQL query
query_term = """
SELECT ?id ?term ?seealso ?deprecated WHERE
{
?id rdfs:label ?term .
OPTIONAL {?id rdfs:seeAlso ?seealso .}
OPTIONAL {?id owl:deprecated ?deprecated .}
FILTER regex(str(?id), "topic_")
}
"""
# Dictionaries for collating results, keyed by concept ID
ids = {}
terms = {}
errs = {}

# Run SPARQL query; the rows are analysed in the next cell
errfound = False
report = []
results = g.query(query_term)

Analyse results of query.

In [10]:
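for r in results:

    id      = str(r['id'])
    term    = str(r['term'])
    seealso = str(r['seealso'])
    deprecated = str(r['deprecated'])

    # Skip deprecated concepts
    if deprecated == "true":
        continue

    print(id, "(", term, ")   ", seealso, "   ", str(r['seealso']))

    ids[id] = id
    terms[id] = term
    # A concept yields one row per rdfs:seeAlso value, so flag it only the
    # first time it is seen; a later row must not undo a link already found
    errs.setdefault(id, True)

    # An unbound value comes back as None, which str() turned into "None" above
    if seealso != "None" and "wikipedia" in seealso:
        errs[id] = False

# Build the report once, after all rows have been processed. Inside the loop,
# every missing concept was reported again on each later row.
for key in ids:
    if errs[key]:
        errfound = True
        report.append("Missing wikipedia link ::: " + key +  ' (' + terms[key] + ')')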

http://edamontology.org/topic_3047 ( Molecular biology )    https://en.wikipedia.org/wiki/Molecular_biology     https://en.wikipedia.org/wiki/Molecular_biology
http://edamontology.org/topic_0209 ( Medicinal chemistry )    https://en.wikipedia.org/wiki/Medicinal_chemistry     https://en.wikipedia.org/wiki/Medicinal_chemistry
http://edamontology.org/topic_3517 ( GWAS study )    https://en.wikipedia.org/wiki/Genome-wide_association_study     https://en.wikipedia.org/wiki/Genome-wide_association_study
http://edamontology.org/topic_1775 ( Function analysis )    None     None
http://edamontology.org/topic_0637 ( Taxonomy )    https://en.wikipedia.org/wiki/Taxonomy_(biology)     https://en.wikipedia.org/wiki/Taxonomy_(biology)
http://edamontology.org/topic_2815 ( Human biology )    https://en.wikipedia.org/wiki/Human_biology     https://en.wikipedia.org/wiki/Human_biology
http://edamontology.org/topic_0219 ( Data submission, annotation and curation )    None     None
http://edamontology.org/t
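Because `rdfs:seeAlso` is matched with `OPTIONAL`, a concept with several links produces several rows, so the check has to be made per concept rather than per row. The sketch below illustrates that grouping on plain tuples, without rdflib. The rows for `topic_0000` and `topic_9999` are invented for illustration, and the function names are hypothetical.

```python
from urllib.parse import urlparse

# Rows shaped like the query result: (id, term, seealso, deprecated).
# The first two mirror the output above; topic_0000 and topic_9999 are made up.
ROWS = [
    ("http://edamontology.org/topic_3047", "Molecular biology",
     "https://en.wikipedia.org/wiki/Molecular_biology", None),
    ("http://edamontology.org/topic_1775", "Function analysis", None, None),
    ("http://edamontology.org/topic_0000", "Example topic",
     "https://en.wikipedia.org/wiki/Example", None),
    ("http://edamontology.org/topic_0000", "Example topic",
     "https://example.org/other", None),
    ("http://edamontology.org/topic_9999", "Retired topic", None, "true"),
]


def has_wikipedia_host(url):
    """True if the URL's host is wikipedia.org or one of its subdomains."""
    host = urlparse(url).hostname or ""
    return host == "wikipedia.org" or host.endswith(".wikipedia.org")


def missing_wikipedia(rows):
    """Return (id, term) for each non-deprecated concept with no Wikipedia link in any row."""
    terms = {}
    linked = set()
    for concept_id, term, seealso, deprecated in rows:
        if deprecated == "true":
            continue
        terms[concept_id] = term
        if seealso is not None and has_wikipedia_host(seealso):
            linked.add(concept_id)
    # Decide only after every row has been seen, so row order does not matter
    return [(cid, term) for cid, term in terms.items() if cid not in linked]


for cid, term in missing_wikipedia(ROWS):
    print("Missing wikipedia link :::", cid, "(" + term + ")")
```

Collecting linked concepts in a set and deciding only after the last row means each concept is reported at most once, regardless of how many `rdfs:seeAlso` values it has.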

Write report and return appropriate value.

In [11]:
# Return exit code (sys.exit raises SystemExit)
if errfound:
    print("Missing Wikipedia link for these Topic concepts:")
    print("\n".join(report))
    sys.exit(ERROR)
else:
    print("No issues found.")
    sys.exit(NOERR)



Missing Wikipedia link for these Topic concepts:
Missing wikipedia link ::: http://edamontology.org/topic_1775 (Function analysis)
Missing wikipedia link ::: http://edamontology.org/topic_0219 (Data submission, annotation and curation)
Missing wikipedia link ::: http://edamontology.org/topic_3382 (Imaging)
Missing wikipedia link ::: http://edamontology.org/topic_3068 (Literatur

SystemExit: 3
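Calling `sys.exit()` raises `SystemExit`, which surfaces as an exception when the cell runs interactively. One possible arrangement, sketched here under the assumption that the report is a list of strings as built above, is to compute the return value in a function and leave the actual exit to the caller. The function name `run_check` is hypothetical.

```python
NOERR = 0
ERROR = 3


def run_check(report):
    """Print the report and return the exit code instead of exiting."""
    if report:
        print("Missing Wikipedia link for these Topic concepts:")
        print("\n".join(report))
        return ERROR
    print("No issues found.")
    return NOERR


code = run_check(
    ["Missing wikipedia link ::: http://edamontology.org/topic_1775 (Function analysis)"]
)
print("exit code:", code)
# In a standalone script, the final step would be: sys.exit(code)
```

Separating the decision from the exit keeps the check usable both in a notebook, where the code can simply be displayed, and in a script, where it becomes the process exit status.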

