In [1]:
%endpoint http://sparql.uniprot.org/sparql
%format JSON

### 1. How many protein records are in UniProt?

In [60]:
prefix core:<http://purl.uniprot.org/core/> 
select (count(?protein) as ?Total_Proteins)
where{ 
    
    ?protein a core:Protein. 
}


Total_Proteins
378979161


### 2. How many Arabidopsis thaliana protein records are in UniProt?

In [61]:
prefix core:<http://purl.uniprot.org/core/> 
prefix tax:<http://purl.uniprot.org/taxonomy/>

select (count(?protein) as ?Total_Proteins_in_Arabidopsis_thaliana)
where{ 
    ?protein a core:Protein . 
    ?protein core:organism tax:3702 . 
    }


Total_Proteins_in_Arabidopsis_thaliana
136447


### 3. Retrieve pictures of Arabidopsis thaliana from UniProt

In [62]:
prefix foaf: <http://xmlns.com/foaf/0.1/>     
prefix core: <http://purl.uniprot.org/core/>
select ?organism ?Picture_URL                             
where {
    ?taxon  foaf:depiction  ?Picture_URL.       
    ?taxon  core:scientificName ?organism.    
    filter regex(?organism, '^Arabidopsis thaliana$', 'i'). 
# Filter syntax adapted from https://stackoverflow.com/questions/39413155/sparql-on-regex-filter-name
}

organism,Picture_URL
Arabidopsis thaliana,https://upload.wikimedia.org/wikipedia/commons/3/39/Arabidopsis.jpg
Arabidopsis thaliana,https://upload.wikimedia.org/wikipedia/commons/thumb/6/60/Arabidopsis_thaliana_inflorescencias.jpg/800px-Arabidopsis_thaliana_inflorescencias.jpg
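
Exercise 2 already uses the taxon identifier tax:3702 for Arabidopsis thaliana, so the regex scan over every scientific name can probably be avoided by starting from that identifier. A minimal sketch, assuming tax:3702 carries both the foaf:depiction and core:scientificName triples (not re-run here, so no output is shown):

```sparql
prefix foaf: <http://xmlns.com/foaf/0.1/>
prefix core: <http://purl.uniprot.org/core/>
prefix tax: <http://purl.uniprot.org/taxonomy/>

select ?organism ?Picture_URL
where {
    # Start from the known taxon instead of filtering all names with a regex
    tax:3702 foaf:depiction ?Picture_URL ;
             core:scientificName ?organism .
}
```

Binding the subject directly lets the endpoint look up two triples for one resource rather than testing a regular expression against every scientific name.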


### 4. What is the description of the enzyme activity of UniProt Protein Q9SZZ8?

In [63]:
prefix core:<http://purl.uniprot.org/core/> 
prefix uniprot:<http://purl.uniprot.org/uniprot/> 
prefix rdfs:<http://www.w3.org/2000/01/rdf-schema#> 

select ?activity_description
where {
    
    uniprot:Q9SZZ8 a core:Protein ;core:enzyme ?enz.
    ?enz core:activity ?act.       
    ?act rdfs:label ?activity_description.        
}

activity_description
all-trans-beta-carotene + 4 H(+) + 2 O2 + 4 reduced [2Fe-2S]-[ferredoxin] = all-trans-zeaxanthin + 2 H2O + 4 oxidized [2Fe-2S]-[ferredoxin].


### 5. Retrieve the protein IDs and dates of submission for 5 proteins that have been added to UniProt this year (HINT: Google for “SPARQL FILTER by date”)


In [64]:
prefix core:<http://purl.uniprot.org/core/> 
prefix xsd:<http://www.w3.org/2001/XMLSchema#> 


select ?id ?date_of_submission
where{
    ?protein a core:Protein.             
    ?protein core:mnemonic ?id.          
    ?protein core:created ?date_of_submission.        
    filter contains(xsd:string(?date_of_submission),"2022"). # I tried 2023, but the kernel died every time I ran the cell
# contains() usage: https://stackoverflow.com/questions/28628006/sparql-query-for-partial-match-of-statement-contains
# Casting to a string: https://graphdb.ontotext.com/documentation/10.1/sparql-functions-reference.html
} limit 5

id,date_of_submission
A0A8E0N8L5_ECOLX,2022-01-19
A0A8F9CQZ7_ECOLX,2022-01-19
A0A8F9ICG9_ECOLX,2022-01-19
A0A8F8WH98_PSEAI,2022-01-19
A0A8F9NZK3_PSEAI,2022-01-19
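
A typed comparison is an alternative to the substring match above. The sketch below assumes that core:created values are typed as xsd:date and that the endpoint supports ordering comparisons on xsd:date (common, but engine-dependent). It has not been run here, and the year bounds are illustrative:

```sparql
prefix core: <http://purl.uniprot.org/core/>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>

select ?id ?date_of_submission
where {
    ?protein a core:Protein ;
             core:mnemonic ?id ;
             core:created ?date_of_submission .
    # Compare dates as dates: on or after 1 January 2022 and before 1 January 2023
    filter(?date_of_submission >= "2022-01-01"^^xsd:date &&
           ?date_of_submission <  "2023-01-01"^^xsd:date)
} limit 5
```

Changing both bounds moves the window to another year without converting each date to a string first.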


### 6. How many species are in the UniProt taxonomy?

In [65]:
prefix core:<http://purl.uniprot.org/core/> 

select (count(?taxon) as ?Total)
where{
    ?taxon a core:Taxon.          
    ?taxon core:rank core:Species.      
}

Total
1995728


### 7. How many species have at least one protein record? (this might take a long time to execute, so do this one last!)

In [7]:
prefix core: <http://purl.uniprot.org/core/>

select (count(distinct ?species) as ?Species_with_at_least_one_protein_Total)
where {
    
    ?prot a core:Protein.          
    ?prot core:organism ?species.   
    ?species a core:Taxon.             
    ?species core:rank core:Species.   
}

Species_with_at_least_one_protein_Total
1078469


### 8. Find the AGI codes and gene names for all Arabidopsis thaliana proteins that have a protein function annotation description that mentions “pattern formation”

In [2]:
prefix skos:<http://www.w3.org/2004/02/skos/core#> 
prefix core:<http://purl.uniprot.org/core/> 
prefix tax:<http://purl.uniprot.org/taxonomy/> 
prefix rdfs:<http://www.w3.org/2000/01/rdf-schema#> 

select ?AGI ?Name
where{ 
    ?prot a core:Protein.                              
    ?prot core:organism tax:3702.                    
    ?prot core:annotation ?annotation.
    ?annotation a core:Function_Annotation.               
    ?annotation rdfs:comment ?comment.                
    ?prot core:encodedBy ?gene.                        
    ?gene core:locusName ?AGI.                      
    ?gene skos:prefLabel ?Name.                     
    filter regex( ?comment, 'pattern formation','i') .
    
} 

AGI,Name
At1g13980,GN
At3g02130,RPK2
At1g69270,RPK1
At5g37800,RSL1
At1g26830,CUL3A
At1g66470,RHD6
At3g09090,DEX1
At5g55250,IAMT1
At1g63700,YDA
At4g21750,ATML1


### 9. What is the MetaNetX Reaction identifier (starts with “mnxr”) for the UniProt Protein uniprotkb:Q18A79?


In [4]:
%endpoint https://rdf.metanetx.org/sparql  

In [4]:
prefix mnx: <https://rdf.metanetx.org/schema/>
prefix uniprot: <http://purl.uniprot.org/uniprot/>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>

select distinct ?reaction_id # distinct avoids repeated rows
where{
    ?pept mnx:peptXref uniprot:Q18A79 . 
    ?cata mnx:pept ?pept.       
    ?GPR mnx:cata ?cata; mnx:reac ?reaction.              
    ?reaction rdfs:label ?reaction_id. 
    
  
}

reaction_id
mnxr165934
mnxr145046c3


### 10. What are the official locus name and the MetaNetX Reaction identifier (mnxr…..) for the protein that has “glycine reductase” catalytic activity in Clostridium difficile (taxon 272563)? (This must be executed on the https://rdf.metanetx.org/sparql endpoint.)


In [17]:
prefix mnx: <https://rdf.metanetx.org/schema/>
prefix core: <http://purl.uniprot.org/core/>
prefix tax: <http://purl.uniprot.org/taxonomy/>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>

# Variables to return
select distinct ?LocusName ?react_id 
where
{
    # SERVICE lets the query reach other servers regardless of the current endpoint
    service <http://sparql.uniprot.org/sparql> { 
        # Code derived from exercise 8
        ?prot a core:Protein .
        ?prot core:organism tax:272563.
        ?prot core:mnemonic ?LocusName.
        ?prot core:classifiedWith ?Term.
        ?Term rdfs:label ?act.
        FILTER regex( ?act, 'glycine reductase','i') .
    }
    
    service <https://rdf.metanetx.org/sparql> {
        # Code derived from exercise 9
        ?peptide mnx:peptXref ?prot . 
        ?cat mnx:pept ?peptide.
        ?gpr mnx:cata ?cat ;
             mnx:reac ?react .
        ?react rdfs:label ?react_id.
        
  }
} 

LocusName,react_id
Q185M4_CLOD6,mnxr157884c3
Q185M4_CLOD6,mnxr162774c3
Q185M6_CLOD6,mnxr157884c3
Q185M6_CLOD6,mnxr162774c3
Q185M3_CLOD6,mnxr157884c3
Q185M3_CLOD6,mnxr162774c3
Q185M5_CLOD6,mnxr157884c3
Q185M1_CLOD6,mnxr157884c3
Q185M1_CLOD6,mnxr162774c3
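
The query above binds the UniProt mnemonic to ?LocusName. If the gene locus name is wanted instead, the core:encodedBy / core:locusName pattern from exercise 8 could be reused. This is a sketch under that assumption and has not been executed here, so no results are shown. It also assumes the notebook endpoint is still https://rdf.metanetx.org/sparql, so only the UniProt part is wrapped in SERVICE:

```sparql
prefix mnx: <https://rdf.metanetx.org/schema/>
prefix core: <http://purl.uniprot.org/core/>
prefix tax: <http://purl.uniprot.org/taxonomy/>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>

select distinct ?LocusName ?react_id
where {
    service <http://sparql.uniprot.org/sparql> {
        ?prot a core:Protein ;
              core:organism tax:272563 ;
              core:classifiedWith ?Term ;
              core:encodedBy ?gene .
        # Gene locus name, as in exercise 8, instead of the protein mnemonic
        ?gene core:locusName ?LocusName .
        ?Term rdfs:label ?act .
        filter regex(?act, 'glycine reductase', 'i')
    }
    # MetaNetX patterns from exercise 9, evaluated on the current endpoint
    ?peptide mnx:peptXref ?prot .
    ?cat mnx:pept ?peptide .
    ?gpr mnx:cata ?cat ;
         mnx:reac ?react .
    ?react rdfs:label ?react_id .
}
```

Proteins whose gene has no core:locusName triple would drop out of this version; wrapping the ?gene patterns in OPTIONAL would keep them with an unbound ?LocusName.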
