UniProt SPARQL Endpoint: https://sparql.uniprot.org/sparql

In [1]:
%endpoint https://sparql.uniprot.org/sparql
%format JSON
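
Outside the notebook, the two magics above roughly correspond to one HTTP request. The following is a minimal Python sketch, assuming the standard SPARQL 1.1 Protocol (query text passed as the `query` URL parameter, JSON requested through the `Accept` header). The helper name `build_sparql_request` and the placeholder query are illustrative assumptions, and the request is only built here, not sent.

```python
from urllib.parse import parse_qs, urlencode, urlsplit
from urllib.request import Request

ENDPOINT = "https://sparql.uniprot.org/sparql"  # same URL as the %endpoint magic


def build_sparql_request(query: str, endpoint: str = ENDPOINT) -> Request:
    """Build (but do not send) a GET request carrying a SPARQL query."""
    url = endpoint + "?" + urlencode({"query": query})
    # Ask for the SPARQL JSON results format, mirroring %format JSON.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


# Placeholder query, used only to show the URL encoding round trip.
query = "SELECT * WHERE { ?s ?p ?o } LIMIT 1"
request = build_sparql_request(query)
sent_query = parse_qs(urlsplit(request.full_url).query)["query"][0]
```

Passing `request` to `urllib.request.urlopen` would actually submit it to the endpoint.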

Q1: 1 POINT: How many protein records are in UniProt?

In [2]:
PREFIX up: <http://purl.uniprot.org/core/>

SELECT (COUNT (?protein) AS ?ProtCount)

WHERE
{
    ?protein a up:Protein .
}

ProtCount
360157660
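
With `%format JSON`, a result like the one above travels as a SPARQL JSON results document before the notebook renders it. A minimal Python sketch of reading such a document follows, assuming the W3C SPARQL 1.1 JSON results layout. The sample is written by hand around the count shown above (it is not a captured response), and `single_count` is a hypothetical helper name.

```python
import json

# Hand-written sample in the SPARQL 1.1 JSON results layout; the value is the
# ProtCount reported above, not a fresh response from the endpoint.
sample = """
{
  "head": {"vars": ["ProtCount"]},
  "results": {"bindings": [
    {"ProtCount": {"type": "literal",
                   "datatype": "http://www.w3.org/2001/XMLSchema#integer",
                   "value": "360157660"}}
  ]}
}
"""


def single_count(results_json: str) -> int:
    """Return the lone integer from a one-row, one-column result set."""
    doc = json.loads(results_json)
    var = doc["head"]["vars"][0]          # name of the only projected variable
    row = doc["results"]["bindings"][0]   # the only result row
    return int(row[var]["value"])         # values are serialized as strings


prot_count = single_count(sample)
```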


Q2: 1 POINT: How many Arabidopsis thaliana protein records are in UniProt?

In [3]:
PREFIX up:<http://purl.uniprot.org/core/> 
PREFIX taxon:<http://purl.uniprot.org/taxonomy/> 

SELECT (COUNT(DISTINCT ?protein) AS ?ProteinCount)
WHERE 
{
	?protein a up:Protein .
	?protein up:organism taxon:3702 .
}

ProteinCount
136782


Q3: 1 POINT: Retrieve pictures of Arabidopsis thaliana from UniProt.

In [4]:
PREFIX up: <http://purl.uniprot.org/core/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

select distinct ?image
where
{
	?organism up:scientificName "Arabidopsis thaliana" .
	?organism foaf:depiction ?image .
}

image
https://upload.wikimedia.org/wikipedia/commons/3/39/Arabidopsis.jpg
https://upload.wikimedia.org/wikipedia/commons/thumb/6/60/Arabidopsis_thaliana_inflorescencias.jpg/800px-Arabidopsis_thaliana_inflorescencias.jpg


Q4: 1 POINT: What is the description of the enzyme activity of UniProt protein Q9SZZ8?

In [5]:
PREFIX rdfs:<http://www.w3.org/2000/01/rdf-schema#> 
PREFIX up:<http://purl.uniprot.org/core/>
PREFIX uniprotkb:<http://purl.uniprot.org/uniprot/>
PREFIX skos:<http://www.w3.org/2004/02/skos/core#> 

SELECT ?name ?comment ?reaction ?activity
WHERE
{
	uniprotkb:Q9SZZ8 a up:Protein .
	
	uniprotkb:Q9SZZ8 up:annotation  ?annotation . 
	?annotation a up:Function_Annotation .
	?annotation rdfs:comment ?comment .
	
	uniprotkb:Q9SZZ8 up:enzyme ?enzyme .
	?enzyme skos:prefLabel ?name .
	?enzyme up:activity ?activity .
	?activity rdfs:label ?reaction .
}


name,comment,reaction,activity
Beta-carotene 3-hydroxylase,Nonheme diiron monooxygenase involved in the biosynthesis of xanthophylls. Specific for beta-ring hydroxylations of beta-carotene. Has also a low activity toward the beta- and epsilon-rings of alpha-carotene. No activity with acyclic carotenoids such as lycopene and neurosporene. Uses ferredoxin as an electron donor.,Beta-carotene + 4 reduced ferredoxin [iron-sulfur] cluster + 2 H(+) + 2 O(2) = zeaxanthin + 4 oxidized ferredoxin [iron-sulfur] cluster + 2 H(2)O.,http://purl.uniprot.org/enzyme/1.14.15.24#SIPF8A63F68B2741FFE


Q5: 1 POINT: Retrieve the protein IDs and dates of submission for proteins that have been added to UniProt this year. (HINT: Google for “SPARQL FILTER by date”.)

In [11]:
PREFIX up:<http://purl.uniprot.org/core/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

SELECT ?protein ?date
WHERE 
{
    ?protein a up:Protein ;
             up:created ?date .
    FILTER ( ?date >= "2021-01-01"^^xsd:date  && ?date < "2022-01-01"^^xsd:date) .
} 

LIMIT 50  


protein,date
http://purl.uniprot.org/uniprot/A0A1H7ADE3,2021-06-02
http://purl.uniprot.org/uniprot/A0A1V1AIL4,2021-06-02
http://purl.uniprot.org/uniprot/A0A2Z0L603,2021-06-02
http://purl.uniprot.org/uniprot/A0A4J5GG53,2021-04-07
http://purl.uniprot.org/uniprot/A0A6G8SU52,2021-02-10
http://purl.uniprot.org/uniprot/A0A6G8SU69,2021-02-10
http://purl.uniprot.org/uniprot/A0A7C9JLR7,2021-02-10
http://purl.uniprot.org/uniprot/A0A7C9JMZ7,2021-02-10
http://purl.uniprot.org/uniprot/A0A7C9KUQ4,2021-02-10
http://purl.uniprot.org/uniprot/A0A7D4HP61,2021-02-10
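
The filter above hardcodes 2021 as "this year". One way to avoid editing the bounds by hand is to generate the FILTER clause before pasting it into the cell. The following is a small Python sketch under that assumption, where `year_filter` is a hypothetical helper name and the clause assumes the `xsd:` prefix is declared as in the query above.

```python
from datetime import date


def year_filter(year: int, var: str = "?date") -> str:
    """Return a SPARQL FILTER keeping xsd:date values that fall in `year`."""
    start = date(year, 1, 1).isoformat()
    end = date(year + 1, 1, 1).isoformat()  # exclusive upper bound
    return (f'FILTER ( {var} >= "{start}"^^xsd:date '
            f'&& {var} < "{end}"^^xsd:date )')


filter_2021 = year_filter(2021)               # reproduces the bounds used above
filter_now = year_filter(date.today().year)   # bounds for the year the code runs in
```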


Q6: 1 POINT: How many species are in the UniProt taxonomy?

In [12]:
PREFIX up:<http://purl.uniprot.org/core/> 
 
SELECT (COUNT (DISTINCT ?species) AS ?speciescount)
WHERE
{
    ?species a up:Taxon ;
             up:rank up:Species .
}

speciescount
2029846


Q7: 2 POINTS: How many species have at least one protein record? (This might take a long time to execute, so do this one last!)

In [2]:
PREFIX up: <http://purl.uniprot.org/core/>

SELECT (COUNT (DISTINCT ?species) AS ?count)

WHERE
{
    ?protein a up:Protein;
        up:organism ?species .
    ?species a up:Taxon;
        up:rank up:Species .
}

count
1057158


Q8: 3 POINTS: Find the AGI codes and gene names for all Arabidopsis thaliana proteins that have a protein function annotation description that mentions “pattern formation”.

In [2]:
PREFIX up:<http://purl.uniprot.org/core/>
PREFIX taxon:<http://purl.uniprot.org/taxonomy/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX skos:<http://www.w3.org/2004/02/skos/core#>

SELECT ?agicode ?name 
WHERE
{
    ?protein a up:Protein ;
             up:organism ?taxon ;
             up:encodedBy ?gene ; 
             up:annotation  ?annotation . 
    
    ?taxon a up:Taxon ;
           up:scientificName "Arabidopsis thaliana" .

    ?annotation a up:Function_Annotation ; 
                rdfs:comment ?annot_comment . 
    
    ?gene a up:Gene ;
          up:locusName ?agicode ; 
          skos:prefLabel ?name . 

	FILTER CONTAINS(STR(?annot_comment), 'pattern formation') .     
}
 
    

agicode,name
At3g54220,SCR
At4g21750,ATML1
At1g13980,GN
At5g40260,SWEET8
At1g69670,CUL3B
At1g63700,YDA
At2g46710,ROPGAP3
At1g26830,CUL3A
At3g09090,DEX1
At4g37650,SHR


From the MetaNetX metabolic networks for metagenomics database SPARQL Endpoint: https://rdf.metanetx.org/sparql
(This slide deck will make it much easier for you: https://www.metanetx.org/cgi-bin/mnxget/mnxref/MetaNetX_RDF_schema.pdf)

Q9: 4 POINTS: What is the MetaNetX Reaction identifier (starts with “mnxr”) for the UniProt protein uniprotkb:Q18A79?

In [5]:
%endpoint https://rdf.metanetx.org/sparql

In [10]:
PREFIX mnx: <https://rdf.metanetx.org/schema/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX uniprotkb: <http://purl.uniprot.org/uniprot/>

SELECT DISTINCT ?mnxr_lb

WHERE
{
    ?protein mnx:peptXref uniprotkb:Q18A79 .
    ?cata a mnx:CATA;
        mnx:pept ?protein .
    ?gpr mnx:cata ?cata ;
        mnx:reac ?reaction .
    ?reaction a mnx:REAC ; 
        mnx:mnxr ?mnxr .
    ?mnxr rdfs:label ?mnxr_lb .
}

mnxr_lb
MNXR145046
MNXR165934


FEDERATED QUERY - UniProt and MetaNetX

Q10: 5 POINTS: What are the official Gene ID (UniProt calls this a “mnemonic”) and the MetaNetX Reaction identifier (mnxr…..) for the protein that has “Starch synthase” catalytic activity in Clostridium difficile (taxon 272563)?

In [12]:
PREFIX up: <http://purl.uniprot.org/core/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX mnx: <https://rdf.metanetx.org/schema/>
PREFIX uniprotkb: <http://purl.uniprot.org/uniprot/>
PREFIX taxon: <http://purl.uniprot.org/taxonomy/>

SELECT DISTINCT ?ID ?MNXID ?activity
WHERE
{
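  # Part 1 (UniProt): C. difficile proteins whose classification label mentions starch synthase
  service <https://sparql.uniprot.org/sparql> {
    ?protein a up:Protein ;
             up:organism taxon:272563 ;
             up:mnemonic ?ID ;
             up:classifiedWith ?GO .
    ?GO rdfs:label ?activity .
    filter contains(?activity, "starch synthase")
    # Take the accession, i.e. the text after the 32-character uniprotkb: namespace
    bind (substr(str(?protein), 33) as ?prot_ac)
    # Rebuild the protein IRI; CONCAT expects strings, so the prefix goes through STR()
    bind (IRI(CONCAT(STR(uniprotkb:), ?prot_ac)) as ?uniprotRef)
  }
  # Part 2 (MetaNetX): follow peptide -> catalyst -> GPR -> reaction
  service <https://rdf.metanetx.org/sparql> {
    ?pept mnx:peptXref ?uniprotRef .
    ?cata mnx:pept ?pept .
    ?gpr mnx:cata ?cata ;
         mnx:reac ?reac .
    ?reac rdfs:label ?MNXID .
  }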
}
   

ID,MNXID,activity
GLGA_CLOD6,mnxr165934,starch synthase activity
GLGA_CLOD6,mnxr145046c3,starch synthase activity
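
The offset 33 in `SUBSTR(STR(?protein), 33)` can be checked offline: the namespace bound to `uniprotkb:` is 32 characters long and SPARQL string positions are 1-based. The following Python sketch mirrors the two BIND steps; the helper names are hypothetical, and the accession Q18A79 is borrowed from Q9 purely as a sample value.

```python
UNIPROT_NS = "http://purl.uniprot.org/uniprot/"  # namespace bound to uniprotkb:


def accession_from_iri(iri: str) -> str:
    """Mimic SUBSTR(STR(?protein), 33); SPARQL offsets start at 1."""
    return iri[33 - 1:]


def iri_from_accession(accession: str) -> str:
    """Mimic IRI(CONCAT(<namespace string>, ?prot_ac))."""
    return UNIPROT_NS + accession


prefix_length = len(UNIPROT_NS)
protein_iri = UNIPROT_NS + "Q18A79"   # sample accession taken from Q9
accession = accession_from_iri(protein_iri)
rebuilt = iri_from_accession(accession)
```

In this sketch the rebuilt IRI is identical to the one the accession was cut from, so the round trip leaves the value unchanged.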
