## Wheat-KG SPARQL Endpoint

In [19]:
%endpoint http://d2kab.i3s.unice.fr/sparql

In [20]:
%show 20
# Request whatever format is appropriate for the query type
%format default

# Activate table output
%display table
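
Outside the notebook, the same endpoint can be queried over plain HTTP. The sketch below builds a SPARQL 1.1 Protocol GET request with Python's standard library. It assumes the endpoint accepts the `query` parameter and can return `application/sparql-results+json`, which is not verified here. The helper names `build_request` and `run_select` are illustrative and not part of the notebook.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "http://d2kab.i3s.unice.fr/sparql"  # endpoint set by %endpoint above


def build_request(query: str, endpoint: str = ENDPOINT) -> urllib.request.Request:
    """Build a SPARQL 1.1 Protocol GET request asking for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def run_select(query: str, endpoint: str = ENDPOINT, timeout: float = 30.0) -> list:
    """Send the query and return one {variable: value} dict per result row."""
    with urllib.request.urlopen(build_request(query, endpoint), timeout=timeout) as resp:
        payload = json.load(resp)
    return [
        {name: cell["value"] for name, cell in row.items()}
        for row in payload["results"]["bindings"]
    ]


# Only the request is built here; nothing is sent over the network.
req = build_request("SELECT * WHERE { ?s ?p ?o } LIMIT 20")
print(req.full_url)
```

Calling `run_select` with any of the queries below (prefixed by the corresponding `PREFIX` declarations) would return the rows as Python dictionaries, provided the endpoint is reachable.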

## Prefixes of Used Ontologies and Vocabularies

In [21]:
%prefix rdf:     <http://www.w3.org/1999/02/22-rdf-syntax-ns#> 
%prefix rdfs:    <http://www.w3.org/2000/01/rdf-schema#> 
%prefix xsd:     <http://www.w3.org/2001/XMLSchema#> 
%prefix schema:  <http://schema.org/> 
%prefix owl:     <http://www.w3.org/2002/07/owl#> 
%prefix skos:    <http://www.w3.org/2004/02/skos/core#> 
%prefix oa:      <http://www.w3.org/ns/oa#> 
%prefix ncbi:    <http://identifiers.org/taxonomy/> 
%prefix dct:     <http://purl.org/dc/terms/> 
%prefix frbr:    <http://purl.org/vocab/frbr/core#> 
%prefix fabio:   <http://purl.org/spar/fabio/> 
%prefix obo:     <http://purl.obolibrary.org/obo/> 
%prefix bibo:    <http://purl.org/ontology/bibo/> 
%prefix d2kab:   <http://ns.inria.fr/d2kab/> 
%prefix dc:      <http://purl.org/dc/terms/> 

## CQ 1.

The first SPARQL query allows scientists to retrieve genes that are mentioned proximal to a given phenotype (resistance to leaf rust in this example). The query counts the number of distinct papers in the PubMed corpus in which a gene is cited proximal to the phenotype. The results confirm that Lr34 is one of the genes most frequently mentioned proximal to the resistance to leaf rust phenotype. Lr10, Lr26 and Lr24 also appear near the top of the list.

In [22]:
SELECT ?GeneName (count(distinct ?paper) as ?NbOcc) WHERE {

   ?a1 a oa:Annotation; 
      oa:hasTarget [ oa:hasSource ?source1 ] ;  
      oa:hasBody [ a d2kab:Phenotype; skos:prefLabel "resistance to Leaf Rust"] .

   ?source1 frbr:partOf+ ?paper .
    
   ?a a oa:Annotation ; 
      oa:hasTarget [ oa:hasSource ?source ] ;
      oa:hasBody [ a d2kab:Gene; skos:prefLabel ?GeneName ].

   ?source frbr:partOf+ ?paper.

   ?paper a fabio:ResearchPaper.
}
GROUP BY ?GeneName 
HAVING (count(distinct ?paper) > 1)
ORDER BY DESC(?NbOcc)

GeneName,NbOcc
Lr34,34
Lr10,33
Lr1,33
Lr,24
Lr26,22
Lr24,20
Lr9,19
Lr28,19
Lr21,19
Lr16,18
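
In the query above, "proximal" amounts to co-occurrence within the same paper (both annotation sources are linked to `?paper` through `frbr:partOf+`). The following minimal sketch reproduces the counting logic in plain Python on a handful of made-up annotations. The data and the function name `genes_for_phenotype` are hypothetical and only illustrate `count(distinct ?paper)`, the `HAVING` threshold and the descending sort.

```python
from collections import Counter

# Hypothetical annotations: (paper id, entity type, label). Not taken from the corpus.
annotations = [
    ("paper1", "Phenotype", "resistance to Leaf Rust"),
    ("paper1", "Gene", "Lr34"),
    ("paper1", "Gene", "Lr34"),   # repeated mention in the same paper
    ("paper1", "Gene", "Lr10"),
    ("paper2", "Phenotype", "resistance to Leaf Rust"),
    ("paper2", "Gene", "Lr34"),
    ("paper3", "Gene", "Lr10"),   # no phenotype mention in this paper
]


def genes_for_phenotype(rows, phenotype, min_papers=2):
    """Count, per gene, the distinct papers that also mention the phenotype."""
    # Papers containing at least one annotation of the requested phenotype.
    papers = {p for p, kind, label in rows if kind == "Phenotype" and label == phenotype}
    # Distinct (gene, paper) pairs: repeated mentions in one paper count once.
    pairs = {(label, p) for p, kind, label in rows if kind == "Gene" and p in papers}
    counts = Counter(gene for gene, _ in pairs)
    # Equivalent of HAVING (count(distinct ?paper) > 1) when min_papers is 2.
    kept = [(g, n) for g, n in counts.items() if n >= min_papers]
    return sorted(kept, key=lambda item: item[1], reverse=True)


print(genes_for_phenotype(annotations, "resistance to Leaf Rust"))
```

With this toy data only `Lr34` passes the threshold, because `Lr10` co-occurs with the phenotype in a single paper.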


## CQ 2.

This SPARQL query retrieves genetic markers mentioned proximal to a gene which is in turn mentioned proximal to a phenotype ("resistance to Stripe Rust" in this example) within the same scientific publication. The query returns 30 scientific publications that list several genetic markers related to different genes which are mentioned proximal to the <i>resistance to Stripe Rust</i> phenotype.

In [5]:
SELECT distinct ?GeneName (GROUP_CONCAT(distinct ?marker; SEPARATOR="-") as ?markers) ?paper ?year 
WHERE {

   ?a1 a oa:Annotation ;
      oa:hasTarget [ oa:hasSource ?source1 ] ;
      oa:hasBody [ a d2kab:Gene ; skos:prefLabel ?GeneName] .

   ?source1 frbr:partOf+ ?paper .

   ?a2 a oa:Annotation ;
      oa:hasTarget [ oa:hasSource ?source2 ] ;
      oa:hasBody [ a d2kab:Marker ; skos:prefLabel ?marker ]. 

   ?source2 frbr:partOf+ ?paper .

   ?a3 a oa:Annotation ; 
      oa:hasTarget [ oa:hasSource ?source3 ] ;
      oa:hasBody [ skos:prefLabel "resistance to Stripe Rust"; a d2kab:Phenotype ] .

   ?source3 frbr:partOf+ ?paper . 

   ?paper a fabio:ResearchPaper ;  dct:title ?source3; dct:issued ?year .
   FILTER (?year >= "2010"^^xsd:gYear)
}
GROUP BY ?GeneName ?paper ?year

GeneName,markers,paper,year
Lr52,cfb309-gwm234,https://pubmed.ncbi.nlm.nih.gov/21344185,2011
Yr18,Xbarc98-Xgwm165-Xgwm192,https://pubmed.ncbi.nlm.nih.gov/20848270,2011
Gc,gwm148,https://pubmed.ncbi.nlm.nih.gov/27795677,2016
Yr65,Xgdm33-Xgwm11-Xgwm18-Xgwm413,https://pubmed.ncbi.nlm.nih.gov/25142874,2014
LrW1,cfb309-gwm234,https://pubmed.ncbi.nlm.nih.gov/21344185,2011
Yr,Xbarc8-Xgwm493,https://pubmed.ncbi.nlm.nih.gov/27818611,2015
Yr26,Xbarc187-Xgwm11-Xgwm18,https://pubmed.ncbi.nlm.nih.gov/24487977,2014
Yr47,cfb309-gwm234,https://pubmed.ncbi.nlm.nih.gov/21344185,2011
Yr10,Xgwm273,https://pubmed.ncbi.nlm.nih.gov/26649867,2016
Yr24,Xbarc137-Xbarc187-Xbarc240-Xgwm11-Xgwm18-Xgwm273,https://pubmed.ncbi.nlm.nih.gov/22967144,2012
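
Because `GROUP_CONCAT` packs all markers of a (gene, paper, year) group into one string, downstream code usually needs to split that column again. The sketch below does so for the first three rows of the table above, assuming the results were exported in the CSV layout shown. The function name `split_markers` is illustrative. Note the assumption that no marker label contains the separator itself: a label with a hyphen would be split incorrectly, in which case a different `SEPARATOR` in the query would be safer.

```python
import csv
import io

# First rows of the result table above, in the notebook's CSV layout.
RESULT_CSV = """GeneName,markers,paper,year
Lr52,cfb309-gwm234,https://pubmed.ncbi.nlm.nih.gov/21344185,2011
Yr18,Xbarc98-Xgwm165-Xgwm192,https://pubmed.ncbi.nlm.nih.gov/20848270,2011
Gc,gwm148,https://pubmed.ncbi.nlm.nih.gov/27795677,2016
"""


def split_markers(csv_text, separator="-"):
    """Turn the concatenated `markers` column back into a list per row."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        row["markers"] = row["markers"].split(separator)
        rows.append(row)
    return rows


for row in split_markers(RESULT_CSV):
    print(row["GeneName"], row["year"], row["markers"])
```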


## CQ 2 bis.
The SPARQL query retrieves pairs of scientific publications such that the first publication mentions a given phenotype and a gene, and the second one mentions the same gene along with a genetic marker. To reduce the number of results, the following query retrieves only publications which mention the <i>resistance to Stripe Rust</i> phenotype in their title along with genetic markers and genes in their abstract.

In [6]:
SELECT distinct ?geneName ?paper1 ?marker ?paper2 WHERE {
   {
    SELECT distinct ?geneName ?gene ?paper1 WHERE {
       ?a1 a oa:Annotation ; 
          oa:hasTarget [ oa:hasSource ?source1 ] ;
          oa:hasBody [ skos:prefLabel "resistance to Stripe Rust" ] .

       ?a2 a oa:Annotation ;
          oa:hasTarget [ oa:hasSource ?source2 ] ;
          oa:hasBody ?gene .
          ?gene a d2kab:Gene ; skos:prefLabel ?geneName . 
          ?source1 frbr:partOf+ ?paper1 .
          ?source2 frbr:partOf+ ?paper1 .
          ?paper1 a fabio:ResearchPaper ; dct:title ?source1 .
    }
   }
   ?a3 a oa:Annotation ;
      oa:hasTarget [ oa:hasSource ?source3 ] ;
      oa:hasBody [a d2kab:Marker ; skos:prefLabel ?marker ] .
 
   ?a4 a oa:Annotation ;
      oa:hasTarget [ oa:hasSource ?source4 ] ;
      oa:hasBody ?gene .
 
   ?source3 frbr:partOf+ ?paper2 .
   ?source4 frbr:partOf+ ?paper2 .
   ?paper2 a fabio:ResearchPaper .
   FILTER (URI(?paper1) != URI(?paper2))
}

geneName,paper1,marker,paper2
R2,https://pubmed.ncbi.nlm.nih.gov/15841362,mta9,https://pubmed.ncbi.nlm.nih.gov/12582867
R2,https://pubmed.ncbi.nlm.nih.gov/17989954,mta9,https://pubmed.ncbi.nlm.nih.gov/12582867
Yr18,https://pubmed.ncbi.nlm.nih.gov/23558982,Xgwm295,https://pubmed.ncbi.nlm.nih.gov/15965649
Yr18,https://pubmed.ncbi.nlm.nih.gov/23177146,Xgwm295,https://pubmed.ncbi.nlm.nih.gov/15965649
Yr18,https://pubmed.ncbi.nlm.nih.gov/21104373,Xgwm295,https://pubmed.ncbi.nlm.nih.gov/15965649
Yr18,https://pubmed.ncbi.nlm.nih.gov/20848270,Xgwm295,https://pubmed.ncbi.nlm.nih.gov/15965649
Yr18,https://pubmed.ncbi.nlm.nih.gov/19638674,Xgwm295,https://pubmed.ncbi.nlm.nih.gov/15965649
Yr18,https://pubmed.ncbi.nlm.nih.gov/19638674,Xgwm1220,https://pubmed.ncbi.nlm.nih.gov/15965649
Yr18,https://pubmed.ncbi.nlm.nih.gov/23558982,Xgwm1220,https://pubmed.ncbi.nlm.nih.gov/15965649
Yr18,https://pubmed.ncbi.nlm.nih.gov/21104373,Xgwm1220,https://pubmed.ncbi.nlm.nih.gov/15965649


## CQ 3. 

This SPARQL query allows scientists to retrieve wheat varieties detected in the corpus that are linked to specific phenotypes.

In [10]:
SELECT distinct ?variety ?label ?paper WHERE {

   ?rel1 d2kab:hasVariety ?a1 ; d2kab:hasPhenotype ?a2 .
  
   ?a1 a oa:Annotation ;
      oa:hasTarget [ oa:hasSource ?source1 ] ;  
      oa:hasBody [ a d2kab:Variety ; skos:prefLabel ?variety ] .

   ?a2 a oa:Annotation ;
      oa:hasTarget [ oa:hasSource ?source2 ] ;
      oa:hasBody [ a d2kab:Phenotype ; skos:prefLabel ?label ] .

   ?source1 frbr:partOf+ ?paper .
   ?source2 frbr:partOf+ ?paper . 
   ?paper a fabio:ResearchPaper .
}
ORDER BY ?variety

variety,label,paper
Apache,resistance to Fusarium head blight,https://pubmed.ncbi.nlm.nih.gov/21655994
Apache,resistance to septoria,https://pubmed.ncbi.nlm.nih.gov/21655994
Apache,plant height,https://pubmed.ncbi.nlm.nih.gov/31646363
Apache,resistance to Fusarium head blight,https://pubmed.ncbi.nlm.nih.gov/31646363
Apache,heat resistance,https://pubmed.ncbi.nlm.nih.gov/31646363
Arina,resistance to rust,https://pubmed.ncbi.nlm.nih.gov/24794977
Arina,resistance to Stem Rust,https://pubmed.ncbi.nlm.nih.gov/24794977
Arina,resistance to Leaf Rust,https://pubmed.ncbi.nlm.nih.gov/24173052
Arina,resistance to Leaf Rust,https://pubmed.ncbi.nlm.nih.gov/27659842
Arina,pathogen resistance,https://pubmed.ncbi.nlm.nih.gov/27659842


## CQ 4.

First, we query all phenotypes defined in WTO as sub-classes of the "resistance to a fungal pathogen" class.

In [17]:
SELECT distinct ?phenotype ?phenotypeLabel WHERE {
?e skos:prefLabel "resistance to a fungal pathogen" ; 
   skos:narrower* ?phenotype .
?phenotype skos:prefLabel ?phenotypeLabel
}


phenotype,phenotypeLabel
http://opendata.inrae.fr/wto/v3.0/thesaurus/WTO_0000340,resistance to a fungal pathogen
http://opendata.inrae.fr/wto/v3.0/thesaurus/WTO_0000465,late blight resistance
http://opendata.inrae.fr/wto/v3.0/thesaurus/WTO_0000471,resistance to Alternaria Leaf Blight
http://opendata.inrae.fr/wto/v3.0/thesaurus/WTO_0000474,resistance to Anthracnose
http://opendata.inrae.fr/wto/v3.0/thesaurus/WTO_0000475,resistance to Ascochyta Leaf Spot
http://opendata.inrae.fr/wto/v3.0/thesaurus/WTO_0000476,resistance to Black Point
http://opendata.inrae.fr/wto/v3.0/thesaurus/WTO_0000477,resistance to Bunt
http://opendata.inrae.fr/wto/v3.0/thesaurus/WTO_0000478,resistance to Cephalosporium Leaf Stripe
http://opendata.inrae.fr/wto/v3.0/thesaurus/WTO_0000480,resistance to Ergot
http://opendata.inrae.fr/wto/v3.0/thesaurus/WTO_0000482,resistance to Eyespot
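
The property path `skos:narrower*` matches zero or more `skos:narrower` links, which is why the starting concept itself (WTO_0000340) appears in the results alongside its descendants. The sketch below illustrates this reflexive, transitive traversal on a made-up hierarchy. The concept names and the function `narrower_star` are hypothetical placeholders, not actual WTO content.

```python
# Hypothetical concept hierarchy: concept -> directly narrower concepts.
# The identifiers are placeholders, not actual WTO concepts.
narrower = {
    "root": ["child_a", "child_b"],
    "child_a": ["grandchild"],
    "child_b": [],
    "grandchild": [],
}


def narrower_star(start, edges):
    """Return `start` plus everything reachable via zero or more narrower links."""
    seen = [start]   # zero steps: the start concept itself is always included
    stack = [start]
    while stack:
        concept = stack.pop()
        for child in edges.get(concept, []):
            if child not in seen:   # guard against revisiting shared descendants
                seen.append(child)
                stack.append(child)
    return seen


print(narrower_star("root", narrower))
```

Replacing `*` with `+` in the query would correspond to dropping the start concept from the returned list, as long as the hierarchy has no cycle leading back to it.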


The following SPARQL query implements CQ4 and allows scientists to retrieve publications in which genes are mentioned proximal to phenotypes from a specific class (considering its sub-classes), e.g., all phenotypes related to fungal pathogen resistance. 

In [23]:
SELECT ?GeneName ?LPhenotype ?paper WHERE {

   ?aa1 a oa:Annotation; 
      oa:hasTarget [ oa:hasSource ?source1 ];
      oa:hasBody [ a d2kab:Gene; skos:prefLabel ?GeneName ] .
  
   ?source1 frbr:partOf+ ?paper . 

   ?aa2 a oa:Annotation; 
      oa:hasTarget [ oa:hasSource ?source2 ] ; 
      oa:hasBody ?Phenotype .
   
   ?source2 frbr:partOf+ ?paper .
   
   ?Phenotype a d2kab:Phenotype ; skos:prefLabel ?LPhenotype .
   ?e2 skos:prefLabel "resistance to a fungal pathogen" ; skos:narrower* ?Phenotype .

   ?paper a fabio:ResearchPaper ; dct:title ?source2 .
}
GROUP BY ?paper ?Phenotype
ORDER BY ?Phenotype

GeneName,LPhenotype,paper
H26,resistance to a fungal pathogen,https://pubmed.ncbi.nlm.nih.gov/20128702
H6,resistance to a fungal pathogen,https://pubmed.ncbi.nlm.nih.gov/20128702
H9,resistance to a fungal pathogen,https://pubmed.ncbi.nlm.nih.gov/20128702
H13,resistance to a fungal pathogen,https://pubmed.ncbi.nlm.nih.gov/20128702
r2,resistance to Fusarium head blight,https://pubmed.ncbi.nlm.nih.gov/17426773
Fr,resistance to Fusarium head blight,https://pubmed.ncbi.nlm.nih.gov/25726000
St,resistance to Fusarium head blight,https://pubmed.ncbi.nlm.nih.gov/31881925
B1,resistance to Fusarium head blight,https://pubmed.ncbi.nlm.nih.gov/12671743
Vrn-1,resistance to Fusarium head blight,https://pubmed.ncbi.nlm.nih.gov/32556394
Kb,resistance to Fusarium head blight,https://pubmed.ncbi.nlm.nih.gov/23071572


## Federated Query.
 
This query allows scientists to jointly exploit both KGs to retrieve publications in PubMed and PHB bulletins mentioning the same taxon ("Triticum aestivum" in the example SPARQL query below).
As each corpus uses different semantic resources to annotate taxon entities (NCBI taxonomy in the WheatKG graph, and FCU thesaurus in PHB graph), the query exploits a third KG, TaxRef-LD, to bridge the two. TaxRef-LD is a Linked Data knowledge graph representing TAXREF, the French national taxonomical register for fauna, flora and fungus, which covers mainland France and overseas territories.

In [25]:
%prefix d2kab-bsv:   <http://ontology.inrae.fr/bsv/ontology/>
%prefix dul: <http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#>
%prefix dct:     <http://purl.org/dc/terms/> 
%prefix taxref: <http://taxref.mnhn.fr/lod/property/>

In [41]:
%show all

In [42]:
SELECT distinct ?paper ?bsv ?taxLabel ?fcuCropName ?taxrefClass WHERE {
   {
    SELECT distinct ?paper ?taxon WHERE {       
      ?annot a oa:Annotation; oa:hasTarget [ oa:hasSource ?source ] ; oa:hasBody ?taxon .
      ?taxon a d2kab:Taxon; skos:prefLabel ?label .
      ?source frbr:partOf+ ?paper .
      ?paper a fabio:ResearchPaper ; dct:title ?source .
      FILTER(CONTAINS(?label, "Triticum aestivum"))
    }
    LIMIT 100
    }
    
   SERVICE <http://taxref.i3s.unice.fr/sparql> {
      ?taxrefClass owl:equivalentClass ?taxon ; rdfs:label ?taxLabel . 
   }
   ?fcuCropName taxref:candidateAlignment_eppo|taxref:candidateAlignment_geves ?taxrefClass .  
    
   SERVICE <http://ontology.inrae.fr/bsv/sparql> { 
      ?bsv a d2kab-bsv:Bulletin ; dul:isRealizedBy ?s ; dct:spatial ?w  ; dct:date ?date_bsv .
      ?aa a oa:Annotation ; oa:hasTarget [ oa:hasSource ?s ]  ; oa:hasBody ?fcuCropName .
   }      
}
LIMIT 100

paper,bsv,taxLabel,fcuCropName,taxrefClass
https://pubmed.ncbi.nlm.nih.gov/32448445,http://ontology.inrae.fr/bsv/resources/Q16961/2010/BSV_2_cereales_Normandie_R2011_cle02a37b,Triticum aestivum,http://ontology.inrae.fr/frenchcropusage/Bles_tendres,http://taxref.mnhn.fr/lod/taxon/127692
https://pubmed.ncbi.nlm.nih.gov/32448445,http://ontology.inrae.fr/bsv/resources/Q16961/2010/BSV_9_cereales_Normandie_cle81b469,Triticum aestivum,http://ontology.inrae.fr/frenchcropusage/Bles_tendres,http://taxref.mnhn.fr/lod/taxon/127692
https://pubmed.ncbi.nlm.nih.gov/32448445,http://ontology.inrae.fr/bsv/resources/Q18678265/2019/bsv_gc_mp_n27_16052019_cle046a11,Triticum aestivum,http://ontology.inrae.fr/frenchcropusage/Bles_tendres,http://taxref.mnhn.fr/lod/taxon/127692
https://pubmed.ncbi.nlm.nih.gov/32448445,http://ontology.inrae.fr/bsv/resources/Q13917/2010/pdf_BSV_no14_du_27_mai_2010_cle813bfb-1,Triticum aestivum,http://ontology.inrae.fr/frenchcropusage/Bles_tendres,http://taxref.mnhn.fr/lod/taxon/127692
https://pubmed.ncbi.nlm.nih.gov/32448445,http://ontology.inrae.fr/bsv/resources/Q16994/2011/bsv_grandescultures_20110503_24__cle083e91,Triticum aestivum,http://ontology.inrae.fr/frenchcropusage/Bles_tendres,http://taxref.mnhn.fr/lod/taxon/127692
https://pubmed.ncbi.nlm.nih.gov/32448445,http://ontology.inrae.fr/bsv/resources/Q18677983/2020/BSV05_GC_LOR_S13_2020_cle4419d7,Triticum aestivum,http://ontology.inrae.fr/frenchcropusage/Bles_tendres,http://taxref.mnhn.fr/lod/taxon/127692
https://pubmed.ncbi.nlm.nih.gov/32448445,http://ontology.inrae.fr/bsv/resources/Q13947/2019/BSV_cereales_paille_06_du_26-11-19_cle81855b,Triticum aestivum,http://ontology.inrae.fr/frenchcropusage/Bles_tendres,http://taxref.mnhn.fr/lod/taxon/127692
https://pubmed.ncbi.nlm.nih.gov/32448445,http://ontology.inrae.fr/bsv/resources/Q18678265/2019/bsv_gc_mp_n11_12122019_cle0bba99,Triticum aestivum,http://ontology.inrae.fr/frenchcropusage/Bles_tendres,http://taxref.mnhn.fr/lod/taxon/127692
https://pubmed.ncbi.nlm.nih.gov/32448445,http://ontology.inrae.fr/bsv/resources/Q1152/2014/BSV_AUVERGNE_N_5_du_11_03_14_cle8461ba,Triticum aestivum,http://ontology.inrae.fr/frenchcropusage/Bles_tendres,http://taxref.mnhn.fr/lod/taxon/127692
https://pubmed.ncbi.nlm.nih.gov/32448445,http://ontology.inrae.fr/bsv/resources/Q463/2012/BSV_RA_GC_no03_du_08_03_2012_cle81452d,Triticum aestivum,http://ontology.inrae.fr/frenchcropusage/Bles_tendres,http://taxref.mnhn.fr/lod/taxon/127692
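
The federated query chains three lookups: paper to NCBI taxon (WheatKG), NCBI taxon to TaxRef-LD class (`owl:equivalentClass`), and TaxRef-LD class to FCU crop name (`taxref:candidateAlignment_*`), which is finally matched against bulletin annotations. The sketch below mimics that chain with in-memory dictionaries instead of `SERVICE` calls. All data structures are hypothetical miniatures; only the paper, crop and TaxRef identifiers are copied from the first result row above, while `ncbi:EXAMPLE_TAXON`, `crop:OTHER` and the bulletin identifiers are placeholders.

```python
# Hypothetical miniature versions of the three graphs.
paper_taxon = [("https://pubmed.ncbi.nlm.nih.gov/32448445", "ncbi:EXAMPLE_TAXON")]
taxref_equivalent = {"ncbi:EXAMPLE_TAXON": "http://taxref.mnhn.fr/lod/taxon/127692"}
crop_alignment = {
    "http://taxref.mnhn.fr/lod/taxon/127692": [
        "http://ontology.inrae.fr/frenchcropusage/Bles_tendres"
    ]
}
bulletin_crop = [
    ("bulletin:EXAMPLE_1", "http://ontology.inrae.fr/frenchcropusage/Bles_tendres"),
    ("bulletin:EXAMPLE_2", "crop:OTHER"),
]


def join_papers_and_bulletins(papers, equivalent, alignment, bulletins):
    """Follow taxon -> TaxRef class -> FCU crop -> bulletin for each paper."""
    rows = []
    for paper, taxon in papers:
        taxref_class = equivalent.get(taxon)          # owl:equivalentClass step
        for crop in alignment.get(taxref_class, []):  # candidateAlignment step
            for bulletin, annotated_crop in bulletins:
                if annotated_crop == crop:            # bulletin annotation match
                    rows.append((paper, bulletin, crop, taxref_class))
    return rows


for row in join_papers_and_bulletins(
    paper_taxon, taxref_equivalent, crop_alignment, bulletin_crop
):
    print(row)
```

A paper whose taxon has no TaxRef-LD equivalent, or whose TaxRef-LD class has no aligned crop, simply produces no row, which mirrors the inner-join behaviour of the basic graph patterns in the query.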
