# Basic intro

Requires ARQ, the SPARQL command-line tool that ships with Apache Jena.

Jena can be installed using Homebrew.

## Table of contents

<div><a href='#cell1'>print all classes and their labels</a></div>
<div><a href='#cell2'>print all properties and their labels</a></div>
<div><a href='#cell3'>print direct subclasses of 'plant morphology trait'</a></div>
<div><a href='#cell4'>count the number of subclasses of a given class</a></div>
<div><a href='#cell5'>find PATO classes used in TO</a></div>
<div><a href='#cell6'>return the category of the TO terms</a></div>
<div><a href='#cell7'>print TO terms with more than 1 category</a></div>
<div><a href='#cell8'>return classes with multiple parents</a></div>
<div><a href='#cell9'>export TO classes and DPs if any</a></div>
<div><a href='#cell10'>return TO deprecated classes</a></div>

<div id='cell1' />
### print all the classes and their labels

%%script bash

arq --data https://raw.githubusercontent.com/Planteome/plant-trait-ontology/master/plant-trait-ontology.obo.owl '
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>  
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>  
PREFIX owl: <http://www.w3.org/2002/07/owl#> 
PREFIX oio: <http://www.geneontology.org/formats/oboInOwl#>  
PREFIX obo: <http://purl.obolibrary.org/obo/>  
        
SELECT ?object ?subject
WHERE {?subject rdfs:label ?object}
'

<div id='cell2' />

### select all the properties and their labels (if any)

In [2]:
%%script bash

arq --data https://raw.githubusercontent.com/Planteome/plant-trait-ontology/master/plant-trait-ontology.obo.owl '
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>  
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>  
PREFIX owl: <http://www.w3.org/2002/07/owl#> 
PREFIX oio: <http://www.geneontology.org/formats/oboInOwl#>  
PREFIX obo: <http://purl.obolibrary.org/obo/> 
        
select distinct ?properties ?name
where{?subject ?properties ?object .
      optional {?properties rdfs:label ?name}}
'        

--------------------------------------------------------
| properties              | name                       |
| owl:onProperty          |                            |
| rdf:type                |                            |
| owl:someValuesFrom      |                            |
| oio:hasOBONamespace     | "has_obo_namespace"        |
| rdfs:subClassOf         |                            |
| oio:id                  |                            |
| oio:hasRelatedSynonym   | "has_related_synonym"      |
| oio:created_by          |                            |
| obo:IAO_0000115         | "definition"               |
| oio:hasDbXref           | "database_cross_reference" |
| owl:equivalentClass     |                            |
| rdfs:label              |                            |
| owl:annotatedTarget     |                            |
| owl:annotatedSource     |                            |
| owl:annotatedProperty   |                            |
| rdf:rest                |    

<div id='cell3' />
### direct subclasses of 'plant morphology trait' 
There are two ways of doing it: filtering on the label or using the class URI.

In [3]:
%%script bash

arq --data https://raw.githubusercontent.com/Planteome/plant-trait-ontology/master/plant-trait-ontology.obo.owl '
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>  
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>  
PREFIX owl: <http://www.w3.org/2002/07/owl#> 
PREFIX oio: <http://www.geneontology.org/formats/oboInOwl#>  
PREFIX obo: <http://purl.obolibrary.org/obo/> 
        
select ?object ?name
where{?object rdfs:subClassOf ?something . 
     ?something rdfs:label "plant morphology trait" . 
     ?object rdfs:label ?name }
        
'

-------------------------------------------------------
| object         | name                               |
| obo:TO_0000839 | "plant structure morphology trait" |
-------------------------------------------------------


In [4]:
%%script bash

arq --data https://raw.githubusercontent.com/Planteome/plant-trait-ontology/master/plant-trait-ontology.obo.owl '
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>  
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>  
PREFIX owl: <http://www.w3.org/2002/07/owl#> 
PREFIX oio: <http://www.geneontology.org/formats/oboInOwl#>  
PREFIX obo: <http://purl.obolibrary.org/obo/> 
        
select ?object ?name
where{?object rdfs:subClassOf obo:TO_0000017 . 
    
     ?object rdfs:label ?name }
        
'

-------------------------------------------------------
| object         | name                               |
| obo:TO_0000839 | "plant structure morphology trait" |
-------------------------------------------------------


<div id='cell4' />
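### count the number of subclasses of a given class
Count the direct and indirect subclasses of 'plant morphology trait' (obo:TO_0000017).

This uses the SPARQL 1.1 property paths feature: rdfs:subClassOf* follows zero or more subClassOf links, so the class itself is also matched.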

In [5]:
%%script bash

arq --data https://raw.githubusercontent.com/Planteome/plant-trait-ontology/master/plant-trait-ontology.obo.owl '
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>  
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>  
PREFIX owl: <http://www.w3.org/2002/07/owl#> 
PREFIX oio: <http://www.geneontology.org/formats/oboInOwl#>  
PREFIX obo: <http://purl.obolibrary.org/obo/> 
        
select (count(distinct ?subject)as ?count)
where {?subject rdfs:subClassOf* obo:TO_0000017 .
      ?subject rdfs:label ?name}
        
'

---------
| count |
| 710   |
---------
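
To make the rdfs:subClassOf* path used above more concrete, here is a minimal Python sketch of the same traversal over a toy hierarchy. Only the TO_0000839 to TO_0000017 link comes from the output shown earlier; the `EX_` names, the `SUBCLASS_OF` list and the `subclasses_star` helper are hypothetical and exist only for illustration.

```python
from collections import defaultdict

# (child, parent) pairs standing in for rdfs:subClassOf triples.
# Only the first pair is taken from this notebook; the EX_ names are made up.
SUBCLASS_OF = [
    ("TO_0000839", "TO_0000017"),
    ("EX_leaf_shape", "TO_0000839"),
    ("EX_leaf_length", "EX_leaf_shape"),
    ("EX_unrelated", "EX_other_root"),
]


def subclasses_star(root, pairs):
    """Return every class that reaches root in zero or more subClassOf steps."""
    children = defaultdict(set)
    for child, parent in pairs:
        children[parent].add(child)

    seen = {root}      # zero steps: the root matches itself
    stack = [root]
    while stack:
        current = stack.pop()
        for child in children[current]:
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


closure = subclasses_star("TO_0000017", SUBCLASS_OF)
print(len(closure))      # 4: the root plus its three descendants
print(len(closure) - 1)  # 3: descendants only, as rdfs:subClassOf+ would match
```

Because the `*` path also matches zero steps, the count reported by the query above presumably includes 'plant morphology trait' itself; using `+` instead of `*` would exclude it.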


<div id='cell5' />
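### find PATO classes used in TO
Filter on the resource URI. The query never binds ?name, so that column stays empty, and without DISTINCT the same class is listed once per matching triple.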

In [6]:
%%script bash

arq --data https://raw.githubusercontent.com/Planteome/plant-trait-ontology/master/plant-trait-ontology.obo.owl '
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>  
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>  
PREFIX owl: <http://www.w3.org/2002/07/owl#> 
PREFIX oio: <http://www.geneontology.org/formats/oboInOwl#>  
PREFIX obo: <http://purl.obolibrary.org/obo/> 
        
select ?subject ?name
where {
    {?subject ?x ?object .
      filter(regex(str(?subject), "PATO"))}
    union 
       {?y ?x ?subject .
        ?subject rdf:type owl:Class . 
      filter(regex(str(?subject), "PATO"))}
}
      
        
'

---------------------------
| subject          | name |
| obo:PATO_0000122 |      |
| obo:PATO_0000921 |      |
| obo:PATO_0000070 |      |
| obo:PATO_0000070 |      |
| obo:PATO_0000921 |      |
| obo:PATO_0000122 |      |
| obo:PATO_0000122 |      |
| obo:PATO_0000122 |      |
| obo:PATO_0000921 |      |
| obo:PATO_0000122 |      |
| obo:PATO_0000122 |      |
| obo:PATO_0000122 |      |
| obo:PATO_0000122 |      |
| obo:PATO_0000122 |      |
| obo:PATO_0000122 |      |
| obo:PATO_0000122 |      |
| obo:PATO_0000122 |      |
| obo:PATO_0000921 |      |
| obo:PATO_0000122 |      |
| obo:PATO_0000122 |      |
| obo:PATO_0000921 |      |
| obo:PATO_0000122 |      |
| obo:PATO_0000122 |      |
| obo:PATO_0000122 |      |
| obo:PATO_0000122 |      |
| obo:PATO_0000070 |      |
| obo:PATO_0000122 |      |
| obo:PATO_0000122 |      |
| obo:PATO_0000122 |      |
| obo:PATO_0000921 |      |
| obo:PATO_0000122 |      |
| obo:PATO_0000122 |      |
| obo:PATO_0000070 |      |
| obo:PATO_0000921 |
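
The filter above relies on regex() doing an unanchored match on the IRI string. The following sketch mimics that behaviour with Python's `re.search` on three IRIs that appear in this notebook. The `sparql_regex` helper is hypothetical, and the stricter `/TO_` pattern is only a suggestion, not something the queries here use.

```python
import re


def sparql_regex(text, pattern):
    """Mimic SPARQL regex(str(?x), pattern) for a plain pattern: an unanchored search."""
    return re.search(pattern, text) is not None


iris = [
    "http://purl.obolibrary.org/obo/PATO_0000122",
    "http://purl.obolibrary.org/obo/TO_0000839",
    "http://purl.obolibrary.org/obo/PO_0009001",
]

# "PATO" only occurs in the PATO IRI.
pato_only = [iri for iri in iris if sparql_regex(iri, "PATO")]

# "TO_" occurs in the TO IRI and also inside "PATO_".
matches_to = [iri for iri in iris if sparql_regex(iri, "TO_")]

# Including the preceding slash keeps the TO IRI only.
to_only = [iri for iri in iris if sparql_regex(iri, "/TO_")]

print(pato_only)
print(matches_to)
print(to_only)
```

This is why a query that selects TO terms with the pattern "TO_" also needs a second filter that excludes "PATO_".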

<div id='cell6' />
### To which category does a TO term belong?

In [7]:
%%script bash
arq --data https://raw.githubusercontent.com/Planteome/plant-trait-ontology/master/plant-trait-ontology.obo.owl '
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>  
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>  
PREFIX owl: <http://www.w3.org/2002/07/owl#> 
PREFIX oio: <http://www.geneontology.org/formats/oboInOwl#>  
PREFIX obo: <http://purl.obolibrary.org/obo/>  
        
SELECT distinct ?to ?label ?tcLabel
WHERE { 
    ?to rdfs:subClassOf* ?topCat .
    ?topCat rdfs:subClassOf <http://purl.obolibrary.org/obo/TO_0000387> .
    ?topCat rdfs:label ?tcLabel .
    ?to rdfs:label ?label 
} 
order by ?topCat
'

------------------------------------------------------------------------------------------------------------------------------
| to             | label                                                              | tcLabel                              |
| obo:TO_0002747 | "10-dehulled grain weight"                                         | "plant morphology trait"             |
| obo:TO_0000591 | "100-dehulled grain weight"                                        | "plant morphology trait"             |
| obo:TO_0000269 | "100-seed weight"                                                  | "plant morphology trait"             |
| obo:TO_0000592 | "1000-dehulled grain weight"                                       | "plant morphology trait"             |
| obo:TO_0000382 | "1000-seed weight"                                                 | "plant morphology trait"             |
| obo:TO_0000529 | "abaxial stomatal frequency"                                       | "plant morphology trait

<div id='cell7' />
### TO terms with more than 1 category

In [8]:
%%script bash
arq --results TSV --data https://raw.githubusercontent.com/Planteome/plant-trait-ontology/master/plant-trait-ontology.obo.owl '
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>  
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>  
PREFIX owl: <http://www.w3.org/2002/07/owl#> 
PREFIX oio: <http://www.geneontology.org/formats/oboInOwl#>  
PREFIX obo: <http://purl.obolibrary.org/obo/>  
        
SELECT ?to ?label (group_concat(distinct ?tcLabel ; separator = " AND ") AS ?topCatTot)
WHERE { 
    ?to rdfs:subClassOf* ?topCat .
    ?topCat rdfs:subClassOf <http://purl.obolibrary.org/obo/TO_0000387> .
    ?topCat rdfs:label ?tcLabel .
    ?to rdfs:label ?label 
} 
group by ?to ?label
having (COUNT(?topCat) > 1)
order by ?topCatTot

'

?to	?label	?topCatTot
<http://purl.obolibrary.org/obo/TO_0000138>	"brown rice protein"	"biochemical trait AND quality trait"
<http://purl.obolibrary.org/obo/TO_0002653>	"endosperm storage protein content"	"biochemical trait AND quality trait"
<http://purl.obolibrary.org/obo/TO_0000107>	"endosperm storage protein-1 content"	"biochemical trait AND quality trait"
<http://purl.obolibrary.org/obo/TO_0000109>	"endosperm storage protein-2 content"	"biochemical trait AND quality trait"
<http://purl.obolibrary.org/obo/TO_0000710>	"globulin protein content"	"biochemical trait AND quality trait"
<http://purl.obolibrary.org/obo/TO_0000410>	"polished rice protein content"	"biochemical trait AND quality trait"
<http://purl.obolibrary.org/obo/TO_0000610>	"soluble to total protein ratio"	"biochemical trait AND quality trait"
<http://purl.obolibrary.org/obo/TO_0000414>	"white rice protein content"	"biochemical trait AND quality trait"
<http://purl.obolibrary.org/obo/TO_0000605>	"hydrogen peroxide conte
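
The GROUP BY / group_concat / HAVING combination used above can be sketched in plain Python. The labels below are taken from the outputs in this notebook, but the `ROWS` list itself (including the deliberate duplicate) and the `multi_category` helper are hypothetical. The sketch counts distinct categories and sorts them before joining, whereas SPARQL does not guarantee the order inside group_concat.

```python
from collections import defaultdict

# (term label, top-level category label) pairs, one per query solution.
ROWS = [
    ("brown rice protein", "biochemical trait"),
    ("brown rice protein", "quality trait"),
    ("100-seed weight", "plant morphology trait"),
    ("brown rice protein", "quality trait"),  # duplicate solution on purpose
]


def multi_category(rows, separator=" AND "):
    """Group categories per term and keep the terms with more than one distinct category."""
    categories = defaultdict(set)
    for label, category in rows:
        categories[label].add(category)
    return {
        label: separator.join(sorted(cats))
        for label, cats in categories.items()
        if len(cats) > 1
    }


result = multi_category(ROWS)
print(result)  # {'brown rice protein': 'biochemical trait AND quality trait'}
```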

<div id='cell8' />
### return classes with multiple parents
Returns the direct parents of each class and the top-level categories of those parents.

In [28]:
%%script bash
arq --results TSV --data https://raw.githubusercontent.com/Planteome/plant-trait-ontology/master/plant-trait-ontology.obo.owl '
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>  
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>  
PREFIX owl: <http://www.w3.org/2002/07/owl#> 
PREFIX oio: <http://www.geneontology.org/formats/oboInOwl#>  
PREFIX obo: <http://purl.obolibrary.org/obo/>  
        
SELECT ?to ?label ?directP ?directPLabel ?topCat ?tcLabel 
WHERE {
    { select ?to ?label
        where {
            ?to rdfs:subClassOf ?directP . 
            ?to rdfs:label ?label . 
            FILTER (!isBlank(?directP))
            FILTER (!regex(str(?directP), "PATO_"))
            } 
            group by ?to ?label 
            having (COUNT(?directP) > 1)
                 
    }
    ?to rdfs:subClassOf ?directP .
    ?directP rdfs:label ?directPLabel .
    ?directP rdfs:subClassOf* ?topCat .
    ?topCat rdfs:subClassOf <http://purl.obolibrary.org/obo/TO_0000387> .
    ?topCat rdfs:label ?tcLabel .
    
}
order by ?to
' > TO_MultipleParents.tsv
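
The TSV written above uses the same layout as the TSV output of the previous query: a header of ?variables, IRIs in angle brackets and literals in double quotes. Here is a minimal sketch for loading such a file in Python. The `clean_term` and `read_sparql_tsv` helpers are hypothetical, the sample rows are copied from the earlier output, and literals with language tags, datatypes or escaped quotes are not handled.

```python
import csv
import io

# Two rows copied from the TSV output shown earlier in this notebook.
SAMPLE_TSV = (
    '?to\t?label\t?topCatTot\n'
    '<http://purl.obolibrary.org/obo/TO_0000138>\t"brown rice protein"\t"biochemical trait AND quality trait"\n'
    '<http://purl.obolibrary.org/obo/TO_0000710>\t"globulin protein content"\t"biochemical trait AND quality trait"\n'
)


def clean_term(term):
    """Strip the <...> around IRIs and the quotes around plain literals."""
    if term.startswith("<") and term.endswith(">"):
        return term[1:-1]
    if len(term) >= 2 and term.startswith('"') and term.endswith('"'):
        return term[1:-1]
    return term


def read_sparql_tsv(handle):
    """Read SPARQL TSV results into a list of dicts keyed by variable name."""
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    header = [name.lstrip("?") for name in next(reader)]
    return [dict(zip(header, (clean_term(cell) for cell in row))) for row in reader]


rows = read_sparql_tsv(io.StringIO(SAMPLE_TSV))
distinct_terms = {row["to"] for row in rows}
print(len(distinct_terms))
```

For the real file, the same function could be called as `read_sparql_tsv(open("TO_MultipleParents.tsv", newline=""))`.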

In [30]:
%%script bash
arq --results TSV --data https://raw.githubusercontent.com/Planteome/plant-trait-ontology/master/plant-trait-ontology.obo.owl '
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>  
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>  
PREFIX owl: <http://www.w3.org/2002/07/owl#> 
PREFIX oio: <http://www.geneontology.org/formats/oboInOwl#>  
PREFIX obo: <http://purl.obolibrary.org/obo/>  
        
SELECT (count(distinct ?to) as ?count)
WHERE {
    { select ?to ?label
        where {
            ?to rdfs:subClassOf ?directP . 
            ?to rdfs:label ?label . 
            FILTER (!isBlank(?directP))
            FILTER (!regex(str(?directP), "PATO_"))
            } 
            group by ?to ?label 
            having (COUNT(?directP) > 1)
                 
    }
    ?to rdfs:subClassOf ?directP .
    ?directP rdfs:label ?directPLabel .
    ?directP rdfs:subClassOf* ?topCat .
    ?topCat rdfs:subClassOf <http://purl.obolibrary.org/obo/TO_0000387> .
    ?topCat rdfs:label ?tcLabel .
    
}
order by ?to
'

?count
104


<div id='cell9' />
### export TO classes and DPs if any
The output file can be used to implement DOSDPs (Dead Simple OWL Design Patterns).

In [6]:
%%script bash
arq --results JSON --data https://raw.githubusercontent.com/Planteome/plant-trait-ontology/master/plant-trait-ontology.obo.owl '
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>  
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>  
PREFIX owl: <http://www.w3.org/2002/07/owl#> 
PREFIX oio: <http://www.geneontology.org/formats/oboInOwl#>  
PREFIX obo: <http://purl.obolibrary.org/obo/>  
        
SELECT ?to ?label ?pato (group_concat(distinct ?extClass ; separator = "|") AS ?dp) (group_concat(distinct ?prop ; separator = "|") AS ?dpProp) 
WHERE {
    ?to rdfs:label ?label .
    FILTER (!regex(str(?to), "PATO_"))
    FILTER (regex(str(?to), "TO_"))
    OPTIONAL { 
        ?to (owl:equivalentClass/(owl:intersectionOf/rdf:rest*/rdf:first))* ?pato .
        ?to (owl:equivalentClass/(owl:intersectionOf/rdf:rest*/rdf:first/owl:someValuesFrom))* ?extClass .
        ?to (owl:equivalentClass/(owl:intersectionOf/rdf:rest*/rdf:first/owl:onProperty))* ?prop
        filter(!isblank(?extClass) && !isblank(?pato))
        filter (?to != ?pato)
        filter (?to != ?extClass)
        filter (?to != ?prop)
    }
    FILTER NOT EXISTS {
        ?to owl:deprecated ?bool .
    }
    
    
}
group by ?to ?label ?pato
order by ?to
' > DP.json
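
The file written above follows the SPARQL 1.1 Query Results JSON Format: a "head" with the variable names and a list of "bindings", where a variable that is unbound in a solution is simply missing from that binding. The sketch below flattens such a document into plain dicts. The sample is hand-written from IRIs that appear in this notebook (the second binding is a made-up case without a pattern), and `flatten` is a hypothetical helper.

```python
import json

# Hand-written sample in the SPARQL JSON results layout; not real query output.
SAMPLE_JSON = """
{
  "head": {"vars": ["to", "label", "pato", "dp", "dpProp"]},
  "results": {"bindings": [
    {"to": {"type": "uri", "value": "http://purl.obolibrary.org/obo/TO_0002728"},
     "label": {"type": "literal", "value": "fruit quality trait"},
     "pato": {"type": "uri", "value": "http://purl.obolibrary.org/obo/TO_0000597"},
     "dp": {"type": "literal", "value": "http://purl.obolibrary.org/obo/PO_0009001"},
     "dpProp": {"type": "literal", "value": "http://purl.obolibrary.org/obo/RO_0000052"}},
    {"to": {"type": "uri", "value": "http://purl.obolibrary.org/obo/TO_0000138"},
     "label": {"type": "literal", "value": "brown rice protein"}}
  ]}
}
"""


def flatten(results, default=""):
    """Turn each binding into {variable: value}, filling unbound variables with default."""
    variables = results["head"]["vars"]
    return [
        {var: binding[var]["value"] if var in binding else default for var in variables}
        for binding in results["results"]["bindings"]
    ]


rows = flatten(json.loads(SAMPLE_JSON))
print(rows[0]["pato"])        # http://purl.obolibrary.org/obo/TO_0000597
print(repr(rows[1]["pato"]))  # '' because ?pato is unbound in the second binding
```

Filling the gaps up front means later code can index every column without first checking whether the key exists.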

In [7]:
# create a tsv file
import json


def pad(values, width):
    """Pad (or cut) a list of cell values to a fixed number of columns."""
    return (values + [""] * width)[:width]


with open('DP.json') as data_file:
    bindings = json.load(data_file)["results"]["bindings"]

header = ["toId", "toName", "pato", "Class 1", "Class 2", "Class 3", "prop 1", "prop 2"]
output = ["\t".join(header)]
for entry in bindings:
    to_id = entry["to"]["value"]
    to_name = entry["label"]["value"]
    pato = ""
    ext_classes = []
    props = []
    # ?pato, ?dp and ?dpProp only carry values when the term has a design pattern
    if "pato" in entry:
        pato = entry["pato"]["value"]
        ext_classes = entry["dp"]["value"].split("|")
        props = entry["dpProp"]["value"].split("|")
    # every row gets the same 8 columns as the header;
    # values beyond 3 classes or 2 properties are dropped
    output.append("\t".join([to_id, to_name, pato] + pad(ext_classes, 3) + pad(props, 2)))

with open('DP.tsv', 'w') as tsvfile:
    tsvfile.write('\n'.join(output))
        

### create one TSV file per PATO term

In [19]:
%%script bash
arq --results JSON --data https://raw.githubusercontent.com/Planteome/plant-trait-ontology/master/plant-trait-ontology.obo.owl --data https://raw.githubusercontent.com/pato-ontology/pato/master/pato.owl '
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>  
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>  
PREFIX owl: <http://www.w3.org/2002/07/owl#> 
PREFIX oio: <http://www.geneontology.org/formats/oboInOwl#>  
PREFIX obo: <http://purl.obolibrary.org/obo/>  
        
SELECT ?to ?label ?pato ?patoLabel (group_concat(distinct ?extClass ; separator = "|") AS ?dp) (group_concat(distinct ?prop ; separator = "|") AS ?dpProp) 
WHERE {
    ?to rdfs:label ?label .
    FILTER (!regex(str(?to), "PATO_"))
    FILTER (regex(str(?to), "TO_"))
    OPTIONAL { 
        ?to (owl:equivalentClass/(owl:intersectionOf/rdf:rest*/rdf:first))* ?pato .
        ?to (owl:equivalentClass/(owl:intersectionOf/rdf:rest*/rdf:first/owl:someValuesFrom))* ?extClass .
        ?to (owl:equivalentClass/(owl:intersectionOf/rdf:rest*/rdf:first/owl:onProperty))* ?prop
        filter(!isblank(?extClass) && !isblank(?pato))
        filter (?to != ?pato)
        filter (?to != ?extClass)
        filter (?to != ?prop)
        OPTIONAL { ?pato rdfs:label ?patoLabel }
    }
    FILTER NOT EXISTS {
        ?to owl:deprecated ?bool .
    }
    
    
}
group by ?to ?label ?pato ?patoLabel
order by DESC(?pato)
' > DP.json

In [24]:
# create one TSV file per PATO term
import json
import os
from collections import defaultdict

with open('DP.json') as data_file:
    bindings = json.load(data_file)["results"]["bindings"]

header = "\t".join(["iri", "iri label", "attribute", "attribute label", "entity", "entity label"])

# group the rows by PATO label
rows_by_pato = defaultdict(list)
for entry in bindings:
    # ?pato and ?patoLabel are OPTIONAL in the query, so a binding may lack them;
    # indexing entry["patoLabel"] directly raises KeyError: 'patoLabel'
    if "pato" not in entry or "patoLabel" not in entry:
        continue
    ext_classes = entry["dp"]["value"].split("|")
    # keep only the traits defined with exactly one external class
    if len(ext_classes) != 1:
        continue
    pato_label = entry["patoLabel"]["value"]
    row = [entry["to"]["value"], entry["label"]["value"],
           entry["pato"]["value"], pato_label, ext_classes[0], ""]
    rows_by_pato[pato_label].append("\t".join(row))

# write one file per PATO label, including the last group
os.makedirs('patterns', exist_ok=True)
for pato_label, rows in rows_by_pato.items():
    with open('patterns/' + pato_label.replace(" ", "_") + '.tsv', 'w') as tsvfile:
        tsvfile.write('\n'.join([header] + rows))

<div id='cell10' />
### get TO deprecated classes 

In [5]:
%%script bash

arq --data https://raw.githubusercontent.com/Planteome/plant-trait-ontology/master/plant-trait-ontology.obo.owl '
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>  
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>  
PREFIX owl: <http://www.w3.org/2002/07/owl#> 
PREFIX oio: <http://www.geneontology.org/formats/oboInOwl#>  
PREFIX obo: <http://purl.obolibrary.org/obo/> 
        
select ?subject
where {
    ?subject owl:deprecated ?bool .
}        
'

------------------
| subject        |
| obo:TO_0000272 |
| obo:TO_0000341 |
| obo:TO_0000171 |
| obo:TO_1000001 |
| obo:TO_0000076 |
| obo:TO_0000066 |
| obo:TO_0001036 |
| obo:TO_0000407 |
| obo:TO_0000174 |
| obo:TO_0000393 |
| obo:TO_0000216 |
| obo:TO_0000037 |
| obo:TO_0000091 |
| obo:TO_0000002 |
| obo:TO_0000239 |
| obo:TO_0000362 |
| obo:TO_0000302 |
| obo:TO_0000256 |
| obo:TO_0000186 |
| obo:TO_0000039 |
| obo:TO_0000334 |
| obo:TO_0000380 |
| obo:TO_0000282 |
| obo:TO_0000596 |
| obo:TO_0000219 |
------------------
