## Loading EDAM into an RDFlib graph

In [2]:
from rdflib import ConjunctiveGraph

Here we initialize the graph. 

In [3]:
kg = ConjunctiveGraph()

def print_size():
    print(f"The knowledge graph has {len(kg)} triples")

Here we load the EDAM ontology into the graph. 

In [4]:
# parse() replaces the deprecated load(), which recent rdflib releases no longer provide
kg.parse('http://edamontology.org/EDAM.owl', format='xml')
print_size()

The knowledge graph has 36884 triples


In [22]:
kg.serialize("edam.json", format="json-ld")
kg.serialize("edam.ttl", format="turtle")

<Graph identifier=N9aaf1d9771d44cc4bcfd59fde9d6ce18 (<class 'rdflib.graph.ConjunctiveGraph'>)>

In [6]:
# A single function that loads EDAM and returns the resulting graph object
def load_EDAM():
    g = ConjunctiveGraph()
    # parse() replaces the deprecated load()
    g.parse('http://edamontology.org/EDAM.owl', format='xml')
    return g

G = load_EDAM()
print(len(G))

36884


## Listing the first 100 triples

In [7]:
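from itertools import islice

# Iteration order over the graph is arbitrary, so "first" only means
# the first 100 triples the store happens to yield
for subject, predicate, obj in islice(kg, 100):
    print(f'({subject}, {predicate}, {obj})')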
    

(http://edamontology.org/topic_0659, http://www.geneontology.org/formats/oboInOwl#hasNarrowSynonym, Non-coding RNA)
(http://edamontology.org/operation_2222, http://www.w3.org/2000/01/rdf-schema#subClassOf, http://www.w3.org/2002/07/owl#DeprecatedClass)
(http://edamontology.org/operation_0316, http://www.w3.org/1999/02/22-rdf-syntax-ns#type, http://www.w3.org/2002/07/owl#Class)
(http://edamontology.org/operation_0478, http://www.geneontology.org/formats/oboInOwl#hasDefinition, Model the structure of a protein in complex with a small molecule or another macromolecule.)
(http://edamontology.org/data_1235, http://www.geneontology.org/formats/oboInOwl#inSubset, http://purl.obolibrary.org/obo/edam#edam)
(http://edamontology.org/operation_2405, http://www.w3.org/2000/01/rdf-schema#label, Protein interaction data processing)
(http://edamontology.org/data_1384, http://www.geneontology.org/formats/oboInOwl#inSubset, http://purl.obolibrary.org/obo/edam#data)
(http://edamontology.org/data_2128, ht

In [8]:
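from rdflib.namespace import RDF, RDFS, OWL

i = 0

# Print the labels of OWL classes, stopping once about 100 have been printed.
# subjects() yields the class URIs themselves; triples() would yield whole
# (s, p, o) tuples, which cannot be used as the subject of a second pattern.
for s in kg.subjects(RDF.type, OWL.Class):
    for label in kg.objects(s, RDFS.label):
        print(label)
        i += 1

    if i > 99:
        break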



## Evaluating a basic SPARQL query
Aim: print the label and the URI of each direct subclass of the generic EDAM "Operation" concept (<http://edamontology.org/operation_0004>).

In [9]:
query = """
SELECT ?x ?label WHERE {
    ?x rdfs:subClassOf <http://edamontology.org/operation_0004> . 
    ?x rdfs:label ?label .
}
"""

results = kg.query(query)

for r in results :
    print(f"{r['label']} is identified in EDAM with concept {r['x']}") 

Annotation is identified in EDAM with concept http://edamontology.org/operation_0226
Indexing is identified in EDAM with concept http://edamontology.org/operation_0227
Visualisation is identified in EDAM with concept http://edamontology.org/operation_0337
Data handling is identified in EDAM with concept http://edamontology.org/operation_2409
Prediction and recognition is identified in EDAM with concept http://edamontology.org/operation_2423
Comparison is identified in EDAM with concept http://edamontology.org/operation_2424
Optimisation and refinement is identified in EDAM with concept http://edamontology.org/operation_2425
Modelling and simulation is identified in EDAM with concept http://edamontology.org/operation_2426
Validation is identified in EDAM with concept http://edamontology.org/operation_2428
Mapping is identified in EDAM with concept http://edamontology.org/operation_2429
Design is identified in EDAM with concept http://edamontology.org/operation_2430
Alignment is identifi

## Evaluating another SPARQL query
Aim: print the labels of all direct and indirect subclasses of the *Alignment* operation (class <http://edamontology.org/operation_2928>). The `+` after `rdfs:subClassOf` is a SPARQL property path that follows the relation one or more times.

In [10]:
query = """
SELECT ?x ?label WHERE {
    ?x rdfs:subClassOf+ <http://edamontology.org/operation_2928> . 
    ?x rdfs:label ?label .
}
"""

results = kg.query(query)

for r in results :
    print(f"{r['label']} is a kind of alignment operation") 

Sequence alignment is a kind of alignment operation
Structure-based sequence alignment is a kind of alignment operation
Sequence profile alignment is a kind of alignment operation
Pairwise sequence alignment is a kind of alignment operation
Multiple sequence alignment is a kind of alignment operation
Local alignment is a kind of alignment operation
Global alignment is a kind of alignment operation
Tree-based sequence alignment is a kind of alignment operation
Genome alignment is a kind of alignment operation
Structure alignment is a kind of alignment operation
Pairwise structure alignment is a kind of alignment operation
Multiple structure alignment is a kind of alignment operation
Local structure alignment is a kind of alignment operation
Global structure alignment is a kind of alignment oper
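
To make the difference between a plain `rdfs:subClassOf` pattern and the `rdfs:subClassOf+` path concrete, the sketch below mimics both in plain Python. The hierarchy is hypothetical toy data (the class names and parent links are invented for illustration and are not EDAM's actual structure), and the traversal says nothing about how rdflib evaluates the path internally.

```python
from collections import deque

# Hypothetical toy hierarchy: child -> list of direct parents (not real EDAM data)
SUBCLASS_OF = {
    "SequenceAlignment": ["Alignment"],
    "StructureAlignment": ["Alignment"],
    "PairwiseSequenceAlignment": ["SequenceAlignment"],
    "Visualisation": ["Operation"],
    "Alignment": ["Operation"],
}


def direct_subclasses(parent):
    """Classes exactly one subClassOf step below `parent` (plain rdfs:subClassOf)."""
    return {child for child, parents in SUBCLASS_OF.items() if parent in parents}


def all_subclasses(root):
    """Classes one or more subClassOf steps below `root` (rdfs:subClassOf+)."""
    found = set()
    queue = deque([root])
    while queue:
        current = queue.popleft()
        for child in direct_subclasses(current):
            if child not in found:  # also keeps the walk finite if there is a cycle
                found.add(child)
                queue.append(child)
    return found


print(sorted(direct_subclasses("Alignment")))
# ['SequenceAlignment', 'StructureAlignment']
print(sorted(all_subclasses("Alignment")))
# ['PairwiseSequenceAlignment', 'SequenceAlignment', 'StructureAlignment']
```

Only the second call reaches `PairwiseSequenceAlignment`, which sits two steps below the root, in the same way that only the `+` query returns indirect subclasses.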

## Sorting EDAM operations
Aim: list the subclasses of *Alignment* again, this time sorted alphabetically by label.

In [14]:
query = """
SELECT ?x ?label WHERE {
    ?x rdfs:subClassOf+ <http://edamontology.org/operation_2928> . 
    ?x rdfs:label ?label .
}
ORDER BY ASC(str(?label))
"""

results = kg.query(query)

for r in results :
    print(f"{r['label']} : {r['x']}") 

Fold recognition : http://edamontology.org/operation_0303
Genome alignment : http://edamontology.org/operation_3182
Global alignment : http://edamontology.org/operation_0496
Global structure alignment : http://edamontology.org/operation_0510
Local alignment : http://edamontology.org/operation_0495
Local structure alignment : http://edamontology.org/operation_0509
Multiple sequence alignment : http://edamontology.org/operation_0492
Multiple structure alignment : http://edamontology.org/operation_0504
Pairwise sequence alignment : http://edamontology.org/operation_0491
Pairwise structure alignment : http://edamontology.org/operation_0503
Protein threading : http://edamontology.org/operation_0302
Sequence alignment : http://edamontology.org/operation_0292
Sequence profile alignment : http://edamontology.org/operation_0300
Structure alignment : http://edamontology.org/operation_0295
Structure-based sequence alignment : http://edamontology.org/operation_0294
Tree-based sequence alignment : 

In [26]:
query = """
PREFIX oboInOwl: <http://www.geneontology.org/formats/oboInOwl#>
PREFIX edam: <http://purl.obolibrary.org/obo/edam#> 

SELECT ?x ?label WHERE {
    ?x oboInOwl:inSubset edam:operations . 
    ?x rdfs:label ?label .
}
ORDER BY ASC(str(?x))
"""

results = kg.query(query)

for r in results :
    print(f"{r['label']} : {r['x']}") 

Operation : http://edamontology.org/operation_0004
Query and retrieval : http://edamontology.org/operation_0224
Annotation : http://edamontology.org/operation_0226
Indexing : http://edamontology.org/operation_0227
Sequence generation : http://edamontology.org/operation_0230
Sequence editing : http://edamontology.org/operation_0231
Sequence merging : http://edamontology.org/operation_0232
Sequence conversion : http://edamontology.org/operation_0233
Sequence complexity calculation : http://edamontology.org/operation_0234
Sequence ambiguity calculation : http://edamontology.org/operation_0235
Sequence composition calculation : http://edamontology.org/operation_0236
Repeat sequence analysis : http://edamontology.org/operation_0237
Sequence motif discovery : http://edamontology.org/operation_0238
Sequence motif recognition : http://edamontology.org/operation_0239
Sequence motif comparison : http://edamontology.org/operation_0240
Simulation analysis : http://edamontology.org/operation_0244
S

In [56]:
query = """
PREFIX oboInOwl: <http://www.geneontology.org/formats/oboInOwl#>
PREFIX edam: <http://purl.obolibrary.org/obo/edam#> 

SELECT ?x ?label WHERE {
    ?x rdf:type owl:AnnotationProperty .
    OPTIONAL {?x rdfs:label ?label .}
}
ORDER BY ASC(str(?x))
"""

results = kg.query(query)
annots = []
for r in results :
    print(f"{r['label']} : {r['x']}") 
    annots.append(str(r['x']))

Citation : http://edamontology.org/citation
None : http://edamontology.org/comment_handle
Created in : http://edamontology.org/created_in
deprecation_comment : http://edamontology.org/deprecation_comment
Documentation : http://edamontology.org/documentation
Example : http://edamontology.org/example
File extension : http://edamontology.org/file_extension
hasHumanReadableId : http://edamontology.org/hasHumanReadableId
Information standard : http://edamontology.org/information_standard
deprecation_candidate : http://edamontology.org/is_deprecation_candidate
refactor_candidate : http://edamontology.org/is_refactor_candidate
isdebtag : http://edamontology.org/isdebtag
Media type : http://edamontology.org/media_type
None : http://edamontology.org/next_id
notRecommendedForAnnotation : http://edamontology.org/notRecommendedForAnnotation
Obsolete since : http://edamontology.org/obsolete_since
Old parent : http://edamontology.org/oldParent
Old related : http://edamontology.org/oldRelated
Ontolog
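
Because the label pattern is wrapped in `OPTIONAL`, properties without an `rdfs:label` come back with `?label` unbound, which is why some rows print `None`. One possible way to keep the listing readable is to fall back to the last segment of the URI. The helper `display_name` below is hypothetical (it is not part of rdflib) and is shown on plain values rather than on query results.

```python
def display_name(label, uri):
    """Return the label if there is one, otherwise the last segment of the URI."""
    if label is not None:
        return str(label)
    # Cut after the last '#' or '/', whichever comes later in the URI;
    # if neither is present, rfind gives -1 and the whole string is returned
    cut = max(uri.rfind("#"), uri.rfind("/"))
    return uri[cut + 1:]


print(display_name("Citation", "http://edamontology.org/citation"))
# Citation
print(display_name(None, "http://edamontology.org/comment_handle"))
# comment_handle
```

In the loop over `results`, this could be called as `display_name(r['label'], str(r['x']))`.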

In [57]:
annots

['http://edamontology.org/citation',
 'http://edamontology.org/comment_handle',
 'http://edamontology.org/created_in',
 'http://edamontology.org/deprecation_comment',
 'http://edamontology.org/documentation',
 'http://edamontology.org/example',
 'http://edamontology.org/file_extension',
 'http://edamontology.org/hasHumanReadableId',
 'http://edamontology.org/information_standard',
 'http://edamontology.org/is_deprecation_candidate',
 'http://edamontology.org/is_refactor_candidate',
 'http://edamontology.org/isdebtag',
 'http://edamontology.org/media_type',
 'http://edamontology.org/next_id',
 'http://edamontology.org/notRecommendedForAnnotation',
 'http://edamontology.org/obsolete_since',
 'http://edamontology.org/oldParent',
 'http://edamontology.org/oldRelated',
 'http://edamontology.org/ontology_used',
 'http://edamontology.org/organisation',
 'http://edamontology.org/refactor_comment',
 'http://edamontology.org/regex',
 'http://edamontology.org/repository',
 'http://edamontology.or

In [58]:
from rdflib import Namespace

OBOOTHER = Namespace("http://purl.obolibrary.org/obo/")
kg.bind("oboOther", OBOOTHER, override=True)

# For each annotation property, extract every triple it is the subject of
# and print the result as Turtle
for s in annots:
    q = f"""
CONSTRUCT {{
    <{s}> ?p ?o
}} WHERE {{
    <{s}> ?p ?o
}}
"""
    kg_res = kg.query(q)
    # Skip the @prefix header lines so that only the triples are shown
    for line in kg_res.serialize(format="turtle").decode().split("\n"):
        if "@prefix" not in line:
            print(line)


ns3:citation a owl:AnnotationProperty ;
    rdfs:label "Citation" ;
    ns3:created_in "1.13" ;
    ns1:is_metadata_tag "true" ;
    ns2:hasBroadSynonym "Publication reference" ;
    ns2:hasDefinition "'Citation' concept property ('citation' metadata tag) contains a dereferenceable URI, preferrably including a DOI, pointing to a citeable publication of the given data format." ;
    ns2:hasRelatedSynonym "Publication" ;
    ns2:inSubset "concept_properties" .



<http://edamontology.org/comment_handle> a owl:AnnotationProperty .



<http://edamontology.org/created_in> a owl:AnnotationProperty ;
    rdfs:label "Created in" ;
    ns2:is_metadata_tag "true" ;
    ns1:hasDefinition "Version in which a concept was created." ;
    ns1:inSubset "concept_properties" .



<http://edamontology.org/deprecation_comment> a owl:AnnotationProperty ;
    rdfs:label "deprecation_comment" ;
    ns2:is_metadata_tag "true" ;
    ns1:hasDefinition "A comment explaining why the comment should be or was depr
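
The loop that builds the `CONSTRUCT` queries splices each URI directly into the query text. That is fine for URIs that came out of the graph itself, but a stray `>` or space in the value would produce a malformed (or different) query. Below is a minimal sketch of a guard, assuming we only want to accept strings that are legal inside a SPARQL `<...>` IRI reference. `describe_query` is a hypothetical helper, not an rdflib function.

```python
import re

# Characters the SPARQL grammar forbids inside <...>: < > " { } | ^ ` \
# and anything at or below U+0020 (control characters and space)
_FORBIDDEN = re.compile(r'[<>"{}|^`\\\x00-\x20]')


def describe_query(uri):
    """Build the CONSTRUCT query for one subject, rejecting unsafe URIs."""
    if _FORBIDDEN.search(uri):
        raise ValueError(f"not a valid SPARQL IRI reference: {uri!r}")
    return f"CONSTRUCT {{ <{uri}> ?p ?o }} WHERE {{ <{uri}> ?p ?o }}"


print(describe_query("http://edamontology.org/citation"))
```

For `SELECT` queries, the `initBindings` argument of rdflib's `query` method may be an alternative that avoids string building altogether.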