
UniProt SPARQL Endpoint: https://sparql.uniprot.org/sparql (note: if you're using YASGUI, you need to set the endpoint's request method to GET)

In [2]:
%endpoint https://sparql.uniprot.org/sparql
%format JSON

Q1: 1 POINT How many protein records are in UniProt?

In [6]:
PREFIX core:<http://purl.uniprot.org/core/> 

SELECT (COUNT(?protein) AS ?Total) 
WHERE{ 
        ?protein a core:Protein .
}

Total
378979161
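The total above lumps reviewed (Swiss-Prot) and unreviewed (TrEMBL) records together. Below is a minimal sketch of a per-status breakdown. It assumes `core:reviewed` is the boolean flag UniProt uses to mark Swiss-Prot entries. Because of the `OPTIONAL`, any records that lack the flag still get counted, in a row with an empty `?reviewed`. It has not been run here, and over roughly 379 million records it may be slow or time out.

```sparql
PREFIX core: <http://purl.uniprot.org/core/>

SELECT ?reviewed (COUNT(?protein) AS ?Total)
WHERE {
  ?protein a core:Protein .
  # Assumed: true for Swiss-Prot, false for TrEMBL; unbound if absent
  OPTIONAL { ?protein core:reviewed ?reviewed . }
}
GROUP BY ?reviewed
```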


Q2: 1 POINT How many Arabidopsis thaliana protein records are in UniProt? 

In [8]:
PREFIX core:<http://purl.uniprot.org/core/> 
PREFIX taxon:<http://purl.uniprot.org/taxonomy/>

SELECT (COUNT(DISTINCT ?protein) AS ?Total)
WHERE{ 
        ?protein a core:Protein .         
        ?protein core:organism taxon:3702 .
}


Total
136447
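`core:organism taxon:3702` only matches records attached directly to the species node. If records annotated to ecotypes or other taxa below *A. thaliana* should also count, one option is to walk the taxonomy with a property path. This sketch assumes UniProt links each taxon to its parent with `rdfs:subClassOf`. The `*` path also matches taxon 3702 itself, so the records counted above are included. It has not been run here.

```sparql
PREFIX core: <http://purl.uniprot.org/core/>
PREFIX taxon: <http://purl.uniprot.org/taxonomy/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT (COUNT(DISTINCT ?protein) AS ?Total)
WHERE {
  ?protein a core:Protein ;
           core:organism ?organism .
  # Zero or more subClassOf steps: taxon 3702 itself plus anything below it
  ?organism rdfs:subClassOf* taxon:3702 .
}
```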


Q3: 1 POINT Retrieve pictures of Arabidopsis thaliana from UniProt.

In [10]:

PREFIX up:<http://purl.uniprot.org/core/> 
PREFIX foaf:<http://xmlns.com/foaf/0.1/>

SELECT ?Name ?image
WHERE {
       ?taxon    foaf:depiction  ?image .
       ?taxon    up:scientificName   ?Name .
       FILTER(CONTAINS(?Name, "Arabidopsis thaliana"))
}

Name,image
Arabidopsis thaliana,https://upload.wikimedia.org/wikipedia/commons/3/39/Arabidopsis.jpg
Arabidopsis thaliana,https://upload.wikimedia.org/wikipedia/commons/thumb/6/60/Arabidopsis_thaliana_inflorescencias.jpg/800px-Arabidopsis_thaliana_inflorescencias.jpg
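The `FILTER(CONTAINS(...))` above may force the endpoint to test many taxon names, and it would also accept any name that merely contains "Arabidopsis thaliana". A tighter sketch binds the taxon by IRI instead. It assumes the depictions are attached to the same `taxon:3702` node used in Q2, which the two results above suggest but which has not been checked here.

```sparql
PREFIX up: <http://purl.uniprot.org/core/>
PREFIX taxon: <http://purl.uniprot.org/taxonomy/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

SELECT ?Name ?image
WHERE {
  # Bind the taxon directly instead of searching names with CONTAINS
  taxon:3702 up:scientificName ?Name ;
             foaf:depiction ?image .
}
```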


Q4: 1 POINT: What is the description of the enzyme activity of UniProt protein Q9SZZ8?

In [11]:
PREFIX core:<http://purl.uniprot.org/core/> 
PREFIX uniprot:<http://purl.uniprot.org/uniprot/> 
PREFIX rdfs:<http://www.w3.org/2000/01/rdf-schema#>

SELECT ?description
WHERE {
  uniprot:Q9SZZ8 a core:Protein ;          
                   core:enzyme ?enzyme .
  ?enzyme core:activity ?activity .        
  ?activity rdfs:label ?description
}

description
all-trans-beta-carotene + 4 H(+) + 2 O2 + 4 reduced [2Fe-2S]-[ferredoxin] = all-trans-zeaxanthin + 2 H2O + 4 oxidized [2Fe-2S]-[ferredoxin].


Q5: 1 POINT: Retrieve the protein IDs and submission dates for 5 proteins that have been added to UniProt this year (HINT: Google for “SPARQL FILTER by date”)

This explanation helped: https://stackoverflow.com/questions/24051435/filter-by-date-range-in-sparql

In [12]:
PREFIX core:<http://purl.uniprot.org/core/> 
PREFIX xsd:<http://www.w3.org/2001/XMLSchema#>

SELECT ?id ?date
WHERE{
  ?protein a core:Protein .
  ?protein core:mnemonic ?id .
  ?protein core:created ?date .
  FILTER (?date > "2023-01-01"^^xsd:dateTime)
} LIMIT 5


id,date


Zero results. However, I think this assignment was originally due in 2022, so here is the same query with a 2022 cutoff:

In [13]:
PREFIX core:<http://purl.uniprot.org/core/> 
PREFIX xsd:<http://www.w3.org/2001/XMLSchema#>

SELECT ?id ?date
WHERE{
  ?protein a core:Protein .
  ?protein core:mnemonic ?id .
  ?protein core:created ?date .
  FILTER (?date > "2022-01-01"^^xsd:dateTime)
} LIMIT 5

id,date
A0A8E0N8L5_ECOLX,2022-01-19
A0A8F9CQZ7_ECOLX,2022-01-19
A0A8F9ICG9_ECOLX,2022-01-19
A0A8F8WH98_PSEAI,2022-01-19
A0A8F9NZK3_PSEAI,2022-01-19
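Instead of hard-coding the year, the query can compute it from `NOW()`, so it keeps meaning "this year" whenever it is run. This is a sketch that has not been run here. It assumes the endpoint accepts `YEAR()` on the plain dates `core:created` appears to hold (e.g. `2022-01-19`). SPARQL 1.1 only defines `YEAR()` for `xsd:dateTime`, so this relies on an engine extension. Early in January it may legitimately return nothing.

```sparql
PREFIX core: <http://purl.uniprot.org/core/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

SELECT ?id ?date
WHERE {
  ?protein a core:Protein ;
           core:mnemonic ?id ;
           core:created ?date .
  # Keep records whose creation year equals the endpoint's current year.
  # A fixed-range alternative comparing date with date would be:
  #   FILTER (?date >= "2023-01-01"^^xsd:date)
  FILTER (YEAR(?date) = YEAR(NOW()))
}
LIMIT 5
```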


Q6: 1 POINT How many species are in the UniProt taxonomy?

In [14]:
PREFIX core:<http://purl.uniprot.org/core/> 

SELECT (COUNT(DISTINCT ?taxon) AS ?Total)
WHERE{
  ?taxon a core:Taxon .
  ?taxon core:rank core:Species
}

Total
1995728


Q7: 2 POINTS How many species have at least one protein record? (this might take a long time to execute, so do this one last!)

In [None]:
PREFIX core: <http://purl.uniprot.org/core/>

SELECT (COUNT(DISTINCT ?species) AS ?Total)
WHERE 
{
    ?protein a core:Protein .           # Every UniProt protein record
    ?protein core:organism ?species .   # The organism each record belongs to
    ?species a core:Taxon .             # Organisms are taxa
    ?species core:rank core:Species .   # Keep only taxa of species rank
}
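The query above only credits a species when a record points to it directly. Records attached to a strain-level taxon, such as *Escherichia coli* (strain K12), would not count toward their species. The sketch below first collects the distinct organisms, which should be far fewer rows than protein records, and then walks up to the species rank. It assumes UniProt links each taxon to its ancestors with `rdfs:subClassOf`. Like the original, it has not been run here and may be slow.

```sparql
PREFIX core: <http://purl.uniprot.org/core/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT (COUNT(DISTINCT ?species) AS ?Total)
WHERE {
  {
    # Distinct organisms first, to avoid carrying every protein row forward
    SELECT DISTINCT ?organism
    WHERE { ?protein a core:Protein ; core:organism ?organism . }
  }
  # The organism itself, or any ancestor of it, that has species rank
  ?organism rdfs:subClassOf* ?species .
  ?species core:rank core:Species .
}
```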


Q8: 3 points: Find the AGI codes and gene names for all Arabidopsis thaliana proteins that have a protein function annotation description that mentions “pattern formation”

In [15]:
PREFIX skos:<http://www.w3.org/2004/02/skos/core#> 
PREFIX rdfs:<http://www.w3.org/2000/01/rdf-schema#>
PREFIX core:<http://purl.uniprot.org/core/> 
PREFIX taxon:<http://purl.uniprot.org/taxonomy/> 

SELECT ?agi_code ?gene_name
WHERE{ 

    ?protein core:organism taxon:3702 .       # From A. thaliana
    ?protein a core:Protein .                 # Is a protein
    ?protein core:annotation ?annotation .    # Has an annotation...
    ?annotation a core:Function_Annotation .  # ...that is a function annotation
    ?annotation rdfs:comment ?description .   # with a description
    ?protein core:encodedBy ?gene .           # Find the gene encoding it
    ?gene core:locusName ?agi_code .          # Get its AGI locus code
    ?gene skos:prefLabel ?gene_name .         # Get its name

    FILTER CONTAINS(?description, "pattern formation")

}

agi_code,gene_name
At1g13980,GN
At3g02130,RPK2
At1g69270,RPK1
At5g37800,RSL1
At1g26830,CUL3A
At1g66470,RHD6
At3g09090,DEX1
At5g55250,IAMT1
At1g63700,YDA
At4g21750,ATML1
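`CONTAINS` is case-sensitive, so a description that opens a sentence with "Pattern formation" would be missed. A protein with several matching annotations or gene names could also produce repeated rows. The sketch below lowercases the text before matching and removes duplicate rows. Whether it returns any extra genes here has not been checked.

```sparql
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX core: <http://purl.uniprot.org/core/>
PREFIX taxon: <http://purl.uniprot.org/taxonomy/>

SELECT DISTINCT ?agi_code ?gene_name
WHERE {
  ?protein a core:Protein ;
           core:organism taxon:3702 ;
           core:annotation ?annotation ;
           core:encodedBy ?gene .
  ?annotation a core:Function_Annotation ;
              rdfs:comment ?description .
  ?gene core:locusName ?agi_code ;
        skos:prefLabel ?gene_name .
  # LCASE makes the substring match case-insensitive
  FILTER (CONTAINS(LCASE(?description), "pattern formation"))
}
```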


Q9: 4 POINTS:  what is the MetaNetX Reaction identifier (starts with “mnxr”) for the UniProt Protein uniprotkb:Q18A79

In [16]:
%endpoint https://rdf.metanetx.org/sparql 

In [18]:

PREFIX meta: <https://rdf.metanetx.org/schema/>
PREFIX uniprot: <http://purl.uniprot.org/uniprot/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT DISTINCT ?pept ?reaction_identifier
WHERE{
    ?pept meta:peptXref uniprot:Q18A79 . 
    ?catalyzes meta:pept ?pept .
    ?gpr meta:cata ?catalyzes ;
         meta:reac ?reaction .
    ?reaction rdfs:label ?reaction_identifier . 
    FILTER CONTAINS(?reaction_identifier, 'mnxr')
}

pept,reaction_identifier
https://rdf.metanetx.org/pept/GLGA_CLOD6,mnxr165934
https://rdf.metanetx.org/pept/GLGA_CLOD6,mnxr145046c3
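The question asks for identifiers that *start* with "mnxr", while `CONTAINS` would accept the string anywhere in the label. Below is a sketch of the stricter filter, not run here. `STR()` reduces the label to a plain string in case it is a typed or language-tagged literal.

```sparql
PREFIX meta: <https://rdf.metanetx.org/schema/>
PREFIX uniprot: <http://purl.uniprot.org/uniprot/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT DISTINCT ?pept ?reaction_identifier
WHERE {
    ?pept meta:peptXref uniprot:Q18A79 .     # MetaNetX peptide cross-referenced to the UniProt entry
    ?catalyzes meta:pept ?pept .              # catalyst that includes this peptide
    ?gpr meta:cata ?catalyzes ;               # association that uses this catalyst...
         meta:reac ?reaction .                # ...and the reaction it links to
    ?reaction rdfs:label ?reaction_identifier .
    # Match only at the start of the label
    FILTER STRSTARTS(STR(?reaction_identifier), "mnxr")
}
```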
