# **Bioinformatics with Jupyter Notebooks for WormBase:**
## **Data Retrieval 3 - Accessing WormBase data through ParaSite**
Welcome to the third Jupyter notebook in the WormBase tutorial series. Over this series of tutorials, we will write code in Python that allows us to retrieve and perform simple analyses with data available on the WormBase sites.

This tutorial will deal with accessing the WormBase ParaSite RESTful API endpoints and downloading any required data. Let's get started!

We start by importing the libraries that are required for this tutorial (`requests` must already be installed in your environment). We also initialise the `server` variable to the WormBase ParaSite main website.

In [None]:
import requests
import sys
import json
server = "https://parasite.wormbase.org"

#### Information about the available data

The cells below show several examples of how to use `/rest-15/info/`. Run the request that suits your needs best and change the variables as needed.

Get the currently available assemblies for a species in json format.

In [None]:
request = requests.get(server + "/rest-15/info/assembly/caenorhabditis_elegans?", 
                       headers = {"Content-Type" : "application/json", "Accept" : ""})

if not request.ok:
  request.raise_for_status()
  sys.exit()
decoded = request.json()
print(json.dumps(decoded, indent = 4))
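
Every request in this tutorial concatenates the server address, the `/rest-15` prefix, an endpoint path and optional parameters separated by `;`. The sketch below collects that pattern in one small function. The names `build_url`, `SERVER` and `API_ROOT` are our own illustrative choices and are not part of the ParaSite API.

```python
from urllib.parse import quote

SERVER = "https://parasite.wormbase.org"
API_ROOT = "/rest-15"  # hypothetical constant; the notebook writes this prefix inline


def build_url(endpoint, **params):
    """Return the full URL for an endpoint, joining query parameters with ';'."""
    # Percent-encode characters such as spaces, but keep '/', ':' and '.' readable.
    path = quote(endpoint.strip("/"), safe="/:.")
    pairs = []
    for key, value in params.items():
        # A list or tuple repeats the key, e.g. feature=gene;feature=exon.
        values = value if isinstance(value, (list, tuple)) else [value]
        pairs.extend(f"{key}={item}" for item in values)
    query = ";".join(pairs)
    return f"{SERVER}{API_ROOT}/{path}" + (f"?{query}" if query else "")


print(build_url("lookup/id/WBGene00008422", expand=1))
print(build_url("overlap/region/caenorhabditis_elegans/I:1-5000",
                feature=["gene", "exon"]))
```

The returned string can be passed to `requests.get` in place of the hand-built `server + "..."` expressions used in the cells of this notebook.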

Get information about the specified sequence region for a species in json format.

In [None]:
request = requests.get(server + "/rest-15/info/assembly/caenorhabditis_elegans/I?", 
                       headers = {"Content-Type" : "application/json", "Accept" : ""})

if not request.ok:
  request.raise_for_status()
  sys.exit()
decoded = request.json()
print(json.dumps(decoded, indent = 4))

Get information about a genome in json format.

In [None]:
request = requests.get(server + "/rest-15/info/genomes/caenorhabditis_elegans?", 
                       headers = {"Content-Type" : "application/json", "Accept" : ""})

if not request.ok:
  request.raise_for_status()
  sys.exit()
decoded = request.json()
print(json.dumps(decoded, indent = 4))

Find information about all genomes in json format.

In [None]:
request = requests.get(server + "/rest-15/info/genomes/?", 
                       headers = {"Content-Type" : "application/json", "Accept" : ""})

if not request.ok:
  request.raise_for_status()
  sys.exit()
decoded = request.json()
print(json.dumps(decoded, indent = 4))

Find information about a genome with a specified assembly in json format.

In [None]:
request = requests.get(server + "/rest-15/info/genomes/assembly/GCA_000002985.3?", 
                       headers = {"Content-Type" : "application/json", "Accept" : ""})

if not request.ok:
  request.raise_for_status()
  sys.exit()
decoded = request.json()
print(json.dumps(decoded, indent = 4))

Find information about all genomes beneath a given node of the taxonomy in json format.

In [None]:
request = requests.get(server + "/rest-15/info/genomes/taxonomy/6239?", 
                       headers = {"Content-Type" : "application/json", "Accept" : ""})

if not request.ok:
  request.raise_for_status()
  sys.exit()
decoded = request.json()
print(json.dumps(decoded, indent = 4))

Get the quality (CEGMA and BUSCO) scores for a specific genome in json format.

In [None]:
request = requests.get(server + "/rest-15/info/quality/caenorhabditis_elegans?", 
                       headers = {"Content-Type" : "application/json", "Accept" : ""})

if not request.ok:
  request.raise_for_status()
  sys.exit()
decoded = request.json()
print(json.dumps(decoded, indent = 4))

Get the WormBase release number in json format.

In [None]:
request = requests.get(server + "/rest-15/info/version/", 
                       headers = {"Content-Type" : "application/json", "Accept" : ""})

if not request.ok:
  request.raise_for_status()
  sys.exit()
decoded = request.json()
print(json.dumps(decoded, indent = 4))

Lists all available species, their aliases, available adaptor groups and data release in json format.

In [None]:
request = requests.get(server + "/rest-15/info/species?division=EnsemblParasite", 
                       headers = {"Content-Type" : "application/json", "Accept" : ""})

if not request.ok:
  request.raise_for_status()
  sys.exit()
decoded = request.json()
print(json.dumps(decoded, indent = 4))
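
The species listing is long, so it can help to filter it before printing. The sketch below assumes that the decoded response is a dictionary with a `species` list whose entries carry a `name` field; check the printed output above before relying on that layout. The function `species_names` and the `example` dictionary are hypothetical stand-ins.

```python
def species_names(decoded, contains=""):
    """Return the sorted species names that contain the given substring."""
    needle = contains.lower()
    return sorted(
        entry["name"]
        for entry in decoded.get("species", [])
        if "name" in entry and needle in entry["name"].lower()
    )


# Hand-written stand-in for `decoded`; a real response is much longer.
example = {
    "species": [
        {"name": "wuchereria_bancrofti_prjeb536"},
        {"name": "caenorhabditis_elegans_prjna13758"},
    ]
}
print(species_names(example, contains="caenorhabditis"))
```

With a real response, `species_names(decoded, contains="caenorhabditis")` would list the matching genome names, which are the values used in URLs such as `/rest-15/lookup/symbol/<species>/<symbol>`.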

#### Looking up information for single and several identifiers

The cells below show several examples of how to use `/rest-15/lookup/`. Run the request that suits your needs best and change the variables as needed.

Find the species and database for a single identifier in json format.

In [None]:
request = requests.get(server + "/rest-15/lookup/id/WBGene00008422?", 
                       headers = {"Content-Type" : "application/json", "Accept" : ""})

if not request.ok:
  request.raise_for_status()
  sys.exit()
decoded = request.json()
print(json.dumps(decoded, indent = 4))

Find the species and database for a single identifier (expanded information) in json format.

In [None]:
request = requests.get(server + "/rest-15/lookup/id/WBGene00008422?expand=1", 
                       headers = {"Content-Type" : "application/json", "Accept" : ""})

if not request.ok:
  request.raise_for_status()
  sys.exit()
decoded = request.json()
print(json.dumps(decoded, indent = 4))

Find the species and database for a single identifier (condensed information) in json format.

In [None]:
request = requests.get(server + "/rest-15/lookup/id/WBGene00008422?format=condensed;db_type=core", 
                       headers = {"Content-Type" : "application/json", "Accept" : ""})

if not request.ok:
  request.raise_for_status()
  sys.exit()
decoded = request.json()
print(json.dumps(decoded, indent = 4))

Find the species and database for several identifiers in json format.

In [None]:
request = requests.post(server + "/rest-15/lookup/id/", 
                        headers = {"Content-Type" : "application/json", "Accept" : "application/json"}, 
                        data = '{ "ids" : ["WBGene00011532", "WBGene00008422" ] }')

if not request.ok:
  request.raise_for_status()
  sys.exit()
decoded = request.json()
print(json.dumps(decoded, indent = 4))
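
The POST body above is a hand-written JSON string, which is easy to break when the list of identifiers changes. A minimal sketch of building the same body from a Python list with `json.dumps` follows; the helper name `ids_payload` is our own.

```python
import json


def ids_payload(ids):
    """Serialise a list of identifiers into the JSON body used for POST lookups."""
    return json.dumps({"ids": list(ids)})


payload = ids_payload(["WBGene00011532", "WBGene00008422"])
print(payload)
```

The resulting string can be passed as `data = payload` to `requests.post`, with the same headers as in the cell above.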

Find the species and database for a symbol in a linked external database in json format.

In [None]:
request = requests.get(server + "/rest-15/lookup/symbol/caenorhabditis_elegans_prjna13758/chaf-1?", 
                       headers = {"Content-Type" : "application/json", "Accept" : ""})

if not request.ok:
  request.raise_for_status()
  sys.exit()
decoded = request.json()
print(json.dumps(decoded, indent = 4))

Find the species and database for a symbol in a linked external database (expanded information) in json format.

In [None]:
request = requests.get(server + "/rest-15/lookup/symbol/caenorhabditis_elegans_prjna13758/chaf-1?expand=1", 
                       headers = {"Content-Type" : "application/json", "Accept" : ""})

if not request.ok:
  request.raise_for_status()
  sys.exit()
decoded = request.json()
print(json.dumps(decoded, indent = 4))

Find the species and database for a symbol in a linked external database (condensed information) in json format.

In [None]:
request = requests.get(server + "/rest-15/lookup/symbol/caenorhabditis_elegans_prjna13758/chaf-1?" \
                       + "format=condensed;db_type=core", 
                       headers = {"Content-Type" : "application/json", "Accept" : ""})

if not request.ok:
  request.raise_for_status()
  sys.exit()
decoded = request.json()
print(json.dumps(decoded, indent = 4))

Find the species and database for a set of symbols in a linked external database in json format.

In [None]:
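# The POST endpoint takes only the species in the URL; the symbols go in the request body.
# Replace the symbols with ones that exist in the chosen genome.
request = requests.post(server + "/rest-15/lookup/symbol/caenorhabditis_elegans_prjna13758", 
                        headers = {"Content-Type" : "application/json", "Accept" : "application/json"}, 
                        data = '{"symbols" : ["Bm994", "chaf-1" ] }')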

if not request.ok:
  request.raise_for_status()
  sys.exit()
decoded = request.json()
print(json.dumps(decoded, indent = 4))

#### Sequence Information from WormBase

The cells below show several examples of how to use `/rest-15/sequence/`. Run the request that suits your needs best and change the variables as needed.

Request DNA sequence by WormBase Gene ID in plain text format.

In [None]:
request = requests.get(server + "/rest-15/sequence/id/WBGene00008422?" , 
                       headers = {"Content-Type" : "text/plain", "Accept" : ""}) 

if not request.ok:
  request.raise_for_status()
  sys.exit() 
print(request.text)

Request genomic sequence by WormBase Gene ID in fasta format.

In [None]:
request = requests.get(server + "/rest-15/sequence/id/WBGene00008422?type=genomic" , 
                       headers = {"Content-Type" : "text/x-fasta", "Accept" : ""}) 

if not request.ok:
  request.raise_for_status()
  sys.exit() 
print(request.text)

Request protein sequences (a gene can have more than one) by WormBase Gene ID in SeqXML format.

In [None]:
request = requests.get(server + "/rest-15/sequence/id/WBGene00008422?multiple_sequences=1;type=protein" , 
                       headers = {"Content-Type" : "text/x-seqxml+xml", "Accept" : ""}) 

if not request.ok:
  request.raise_for_status()
  sys.exit() 
print(request.text)

Request cdna (or cds) transcript sequence by WormBase Entry ID in fasta format.

In [None]:
request = requests.get(server + "/rest-15/sequence/id/E02H9.4?object_type=transcript;type=cdna" , 
                       headers = {"Content-Type" : "text/x-fasta", "Accept" : ""}) 

if not request.ok:
  request.raise_for_status()
  sys.exit() 
print(request.text)

Get the genomic sequence of the specified region of a species in fasta format.

In [None]:
request = requests.get(server + "/rest-15/sequence/region/caenorhabditis_elegans/I:1-5000:1?" , 
                       headers = {"Content-Type" : "text/x-fasta", "Accept" : ""}) 

if not request.ok:
  request.raise_for_status()
  sys.exit() 
print(request.text)
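
The fasta responses are plain text, so a few lines of Python are enough to split them into records for further analysis. The sketch below is a minimal parser, not a replacement for a dedicated library such as Biopython. `parse_fasta` is a hypothetical helper and `sample` is a made-up two-record string standing in for `request.text`.

```python
def parse_fasta(text):
    """Return a dict mapping each fasta header (without '>') to its sequence."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is not None:
            records[header].append(line)
    # Sequences are wrapped over several lines; join them back together.
    return {name: "".join(chunks) for name, chunks in records.items()}


sample = ">region_one\nACGT\nTTGA\n>region_two\nGGCC\n"
for name, sequence in parse_fasta(sample).items():
    print(name, len(sequence))
```

In the notebook, `parse_fasta(request.text)` would be called right after a request that asks for `text/x-fasta`.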

Get the soft-masked genomic sequence of the specified region of a species in fasta format.

In [None]:
request = requests.get(server + "/rest-15/sequence/region/caenorhabditis_elegans/I:1-5000:1?mask=soft" , 
                       headers = {"Content-Type" : "text/x-fasta", "Accept" : ""}) 

if not request.ok:
  request.raise_for_status()
  sys.exit() 
print(request.text)

Request multiple types of sequence by a stable identifier list in json format.

In [None]:
request = requests.post(server + "/rest-15/sequence/id" , 
                        headers = {"Content-Type" : "application/json", "Accept" : "application/json"}, 
                        data = '{"ids" : ["WBGene00011532", "WBGene00008422" ] }') 

if not request.ok:
  request.raise_for_status()
  sys.exit() 
decoded = request.json()
print(json.dumps(decoded, indent = 4))

Get the genomic sequence of the specified region of a species in json format.

In [None]:
request = requests.get(server + "/rest-15/sequence/region/caenorhabditis_elegans/I:1-5000:1?" , 
                       headers = {"Content-Type" : "application/json", "Accept" : ""}) 

if not request.ok:
  request.raise_for_status()
  sys.exit() 
decoded = request.json()
print(json.dumps(decoded, indent = 4))

Get the genomic sequences of multiple regions of a species in json format.

In [None]:
request = requests.post(server + "/rest-15/sequence/region/caenorhabditis_elegans" , 
                        headers = {"Content-Type" : "application/json", "Accept" : "application/json"}, 
                        data = '{"regions" : ["I:1-5000:1", "I:5600..85000"] }') 

if not request.ok:
  request.raise_for_status()
  sys.exit() 
decoded = request.json()
print(json.dumps(decoded, indent = 4))
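
Region strings such as `I:1-5000:1` follow the pattern `name:start-end:strand`, where the strand is optional. The sketch below builds them from numbers and checks the values first, then assembles the POST body. `format_region` is our own illustrative helper, not an API function.

```python
import json


def format_region(seq_region, start, end, strand=None):
    """Build a region string of the form name:start-end[:strand]."""
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    region = f"{seq_region}:{start}-{end}"
    if strand is not None:
        if strand not in (1, -1):
            raise ValueError("strand must be 1 or -1")
        region += f":{strand}"
    return region


regions = [format_region("I", 1, 5000, 1), format_region("I", 5600, 85000)]
payload = json.dumps({"regions": regions})
print(payload)
```

The `payload` string can be sent as the `data` argument of the multi-region POST request shown above.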

#### Map data from cdna, cds or protein coordinates to genomic coordinates

The cells below show several examples of how to use `/rest-15/map/`. Run the request that suits your needs best and change the variables as needed.

Convert from cDNA coordinates to genomic coordinates in json format.

In [None]:
request = requests.get(server + "/rest-15/map/cdna/Y74C9A.3.2/100..300?object_type=transcript" , 
                       headers = {"Content-Type" : "application/json", "Accept" : ""}) 

if not request.ok:
  request.raise_for_status()
  sys.exit() 
decoded = request.json()
print(json.dumps(decoded, indent = 4))

Convert from cds coordinates to genomic coordinates in json format.

In [None]:
request = requests.get(server + "/rest-15/map/cds/Y74C9A.3.2/1..300?object_type=transcript" , 
                       headers = {"Content-Type" : "application/json", "Accept" : ""}) 

if not request.ok:
  request.raise_for_status()
  sys.exit() 
decoded = request.json()
print(json.dumps(decoded, indent = 4))

Convert from protein coordinates to genomic coordinates in json format.

In [None]:
request = requests.get(server + "/rest-15/map/translation/Y74C9A.3.2/1..100?object_type=translation" , 
                       headers = {"Content-Type" : "application/json", "Accept" : ""}) 

if not request.ok:
  request.raise_for_status()
  sys.exit() 
decoded = request.json()
print(json.dumps(decoded, indent = 4))
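
A cDNA, cds or protein range that crosses an exon boundary maps to more than one genomic segment. The sketch below summarises such a result. It assumes the decoded response holds a `mappings` list whose entries have `seq_region_name`, `start`, `end` and `strand` keys; compare with the printed output above before using it. The coordinates in `example` are invented for illustration.

```python
def mapped_segments(decoded):
    """Return (seq_region, start, end, strand, length) for each genomic segment."""
    segments = []
    for mapping in decoded.get("mappings", []):
        length = mapping["end"] - mapping["start"] + 1  # coordinates are inclusive
        segments.append((mapping["seq_region_name"], mapping["start"],
                         mapping["end"], mapping["strand"], length))
    return segments


# Invented coordinates standing in for a real `decoded` response.
example = {
    "mappings": [
        {"seq_region_name": "I", "start": 100, "end": 149, "strand": 1},
        {"seq_region_name": "I", "start": 300, "end": 449, "strand": 1},
    ]
}
segments = mapped_segments(example)
total_length = sum(segment[-1] for segment in segments)
print(segments)
print(total_length)
```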

#### Explore ontology and taxonomy terms

The cells below show several examples of how to use `/rest-15/ontology/`. Run (or, where an alternative is commented out, uncomment) the request that suits your needs best and change the variables as needed.

Reconstruct the entire ancestry of a term from is_a and part_of relationships in json format.

In [None]:
request = requests.get(server + "/rest-15/ontology/ancestors/GO:0005667?" , 
                       headers = {"Content-Type" : "application/json", "Accept" : ""}) 
#request = requests.get(server + "/rest-15/ontology/ancestors/chart/GO:0005667?" , 
#                        headers = {"Content-Type" : "application/json", "Accept" : ""}) 

if not request.ok:
  request.raise_for_status()
  sys.exit() 
decoded = request.json()
print(json.dumps(decoded, indent = 4))

Find all the terms descended from a given term in json format.

In [None]:
request = requests.get(server + "/rest-15/ontology/descendants/GO:0005667?" , 
                       headers = {"Content-Type" : "application/json", "Accept" : ""}) 

if not request.ok:
  request.raise_for_status()
  sys.exit() 
decoded = request.json()
print(json.dumps(decoded, indent = 4))

Search for an ontological term by its namespaced identifier in json format.

In [None]:
request = requests.get(server + "/rest-15/ontology/id/GO:0005667?" , 
                       headers = {"Content-Type" : "application/json", "Accept" : ""}) 

if not request.ok:
  request.raise_for_status()
  sys.exit() 
decoded = request.json()
print(json.dumps(decoded, indent = 4))

Search for a list of ontological terms by their name in json format.

In [None]:
request = requests.get(server + "/rest-15/ontology/name/transcription factor complex?" , 
                       headers = {"Content-Type" : "application/json", "Accept" : ""}) 

if not request.ok:
  request.raise_for_status()
  sys.exit() 
decoded = request.json()
print(json.dumps(decoded, indent = 4))

#### Retrieves features that overlap a given region

The cells below show several examples of how to use `/rest-15/overlap/`. Run the request that suits your needs best and change the variables as needed.

Retrieves features of the requested type (here genes) that overlap the region defined by a gene, in json format.

In [None]:
request = requests.get(server + "/rest-15/overlap/id/WBGene00008422?feature=gene" , 
                       headers = {"Content-Type" : "application/json", "Accept" : ""}) 

if not request.ok:
  request.raise_for_status()
  sys.exit() 
decoded = request.json()
print(json.dumps(decoded, indent = 4))

Retrieves features of several requested types (gene, transcript, cds and exon) that overlap a region, in json format.

In [None]:
request = requests.get(server + "/rest-15/overlap/region/caenorhabditis_elegans/I:1-5000?feature=gene;" + \
                       "feature=transcript;feature=cds;feature=exon" , 
                       headers = {"Content-Type" : "application/json", "Accept" : ""}) 

if not request.ok:
  request.raise_for_status()
  sys.exit() 
decoded = request.json()
print(json.dumps(decoded, indent = 4))
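
When several feature types are requested at once, a quick count per type gives a useful overview before reading the full output. The sketch below assumes the decoded response is a list of dictionaries that each carry a `feature_type` key; the entries in `example` are hand-written placeholders.

```python
from collections import Counter


def count_feature_types(features):
    """Count how many overlapping features of each type were returned."""
    return Counter(feature.get("feature_type", "unknown") for feature in features)


# Hand-written placeholders standing in for a real `decoded` list.
example = [
    {"feature_type": "gene", "id": "EXAMPLE_GENE"},
    {"feature_type": "transcript", "id": "EXAMPLE_TRANSCRIPT"},
    {"feature_type": "exon", "id": "EXAMPLE_EXON_1"},
    {"feature_type": "exon", "id": "EXAMPLE_EXON_2"},
]
counts = count_feature_types(example)
for feature_type, number in sorted(counts.items()):
    print(feature_type, number)
```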

Retrieves protein features of the requested type (here Superfamily) for a specific translation in json format.

In [None]:
request = requests.get(server + "/rest-15/overlap/translation/Y74C9A.3.2?type=Superfamily" , 
                       headers = {"Content-Type" : "application/json", "Accept" : ""}) 

if not request.ok:
  request.raise_for_status()
  sys.exit() 
decoded = request.json()
print(json.dumps(decoded, indent = 4))

#### Extracting all objects linked by cross referencing

The cells below show several examples of how to use `/rest-15/xrefs/`. Run the request that suits your needs best and change the variables as needed.

Looks up an external symbol and returns all objects linked to it in xml format.

In [None]:
request = requests.get(server + "/rest-15/xrefs/symbol/caenorhabditis_elegans_prjna13758/chaf-1?" , 
                       headers = {"Content-Type" : "text/xml", "Accept" : ""}) 

if not request.ok:
  request.raise_for_status()
  sys.exit()
print(request.text)

Looks up an external symbol and returns all objects linked to it in json format.

In [None]:
request = requests.get(server+"/rest-15/xrefs/symbol/caenorhabditis_elegans_prjna13758/chaf-1?" , 
                       headers = {"Content-Type" : "application/json", "Accept" : ""}) 

if not request.ok:
  request.raise_for_status()
  sys.exit() 
decoded = request.json()
print(json.dumps(decoded, indent = 4))

Looks up an external symbol and returns the objects linked to it through a specified external database (here EntrezGene) in json format.

In [None]:
request = requests.get(server + "/rest-15/xrefs/symbol/caenorhabditis_elegans_prjna13758/chaf-1?" + \
                       "external_db=EntrezGene" , 
                       headers = {"Content-Type" : "application/json", "Accept" : ""}) 

if not request.ok:
  request.raise_for_status()
  sys.exit() 
decoded = request.json()
print(json.dumps(decoded, indent = 4))

Perform lookups of identifiers and retrieve their external references in other databases in json format.

In [None]:
request = requests.get(server + "/rest-15/xrefs/id/WBGene00008422?", 
                       headers = {"Content-Type" : "application/json", "Accept" : ""}) 

if not request.ok:
  request.raise_for_status()
  sys.exit() 
decoded = request.json()
print(json.dumps(decoded, indent = 4))

Perform lookups of identifiers and retrieve their external references in a specified database (here GO) in json format.

In [None]:
request = requests.get(server + "/rest-15/xrefs/id/Y74C9A.3.2?object_type=transcript;external_db=GO;all_levels=1", 
                       headers = {"Content-Type" : "application/json", "Accept" : ""}) 

if not request.ok:
  request.raise_for_status()
  sys.exit() 
decoded = request.json()
print(json.dumps(decoded, indent = 4))
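
Cross-reference results mix entries from many external databases, so grouping them by database makes them easier to read. The sketch below assumes each entry in the decoded list has `dbname` and `primary_id` keys; verify this against the printed output above. The identifiers in `example` are placeholders, not real accessions.

```python
from collections import defaultdict


def group_xrefs(xrefs):
    """Group external identifiers by the database they come from."""
    grouped = defaultdict(list)
    for xref in xrefs:
        grouped[xref["dbname"]].append(xref["primary_id"])
    return dict(grouped)


# Placeholder entries standing in for a real `decoded` list.
example = [
    {"dbname": "GO", "primary_id": "EXAMPLE_TERM_1"},
    {"dbname": "EntrezGene", "primary_id": "EXAMPLE_GENE_1"},
    {"dbname": "GO", "primary_id": "EXAMPLE_TERM_2"},
]
grouped = group_xrefs(example)
for dbname, identifiers in sorted(grouped.items()):
    print(dbname, identifiers)
```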

Performs a lookup based upon the primary accession or display label of an external reference in json format.

In [None]:
request = requests.get(server + "/rest-15/xrefs/name/caenorhabditis_elegans_prjna13758/chaf-1?", 
                       headers = {"Content-Type" : "application/json", "Accept" : ""}) 

if not request.ok:
  request.raise_for_status()
  sys.exit() 
decoded = request.json()
print(json.dumps(decoded, indent = 4))

#### Comparative Genomics and GeneTree information

The cells below show several examples of how to use `/rest-15/genetree/` and `/rest-15/homology/`. Run the request that suits your needs best and change the variables as needed.

Retrieves a gene tree dump for a gene tree stable identifier in json format.

In [None]:
request = requests.get(server + "/rest-15/genetree/id/WBGT00000000021204?", 
                       headers = {"Content-Type" : "application/json", "Accept" : ""}) 

if not request.ok:
  request.raise_for_status()
  sys.exit() 
decoded = request.json()
print(json.dumps(decoded, indent = 4))

Retrieves homology information (orthologues) by gene id in json format.

In [None]:
request = requests.get(server + "/rest-15/homology/id/WBGene00008422?", 
                       headers = {"Content-Type" : "application/json", "Accept" : ""}) 

if not request.ok:
  request.raise_for_status()
  sys.exit() 
decoded = request.json()
print(json.dumps(decoded, indent = 4))

Retrieves homology information by gene id from a specified Compara database (here `parasite`) in json format.

In [None]:
request = requests.get(server + "/rest-15/homology/id/WBGene00008422?compara=parasite", 
                       headers = {"Content-Type" : "application/json", "Accept" : ""}) 

if not request.ok:
  request.raise_for_status()
  sys.exit() 
decoded = request.json()
print(json.dumps(decoded, indent = 4))

Retrieves orthologues by gene id, restricted to a target taxon and target species and including cDNA sequences, in json format.

In [None]:
request = requests.get(server + "/rest-15/homology/id/WBGene00008422?target_taxon=6279;sequence=cdna;" + \
                       "target_species=wuchereria_bancrofti_prjeb536;type=orthologues", 
                       headers = {"Content-Type" : "application/json", "Accept" : ""}) 

if not request.ok:
  request.raise_for_status()
  sys.exit() 
decoded = request.json()
print(json.dumps(decoded, indent = 4))
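
The full homology response is deeply nested, so pulling out one row per orthologue makes it easier to inspect. The sketch below assumes the non-condensed layout, in which `data` is a list of entries with a `homologies` list, and each homology has a `type` and a `target` dictionary with `id` and `species`. Treat this layout as an assumption and compare it with the printed output above. The values in `example` are placeholders.

```python
def homology_targets(decoded):
    """Return (type, target species, target id) for every homology in the response."""
    rows = []
    for entry in decoded.get("data", []):
        for homology in entry.get("homologies", []):
            target = homology.get("target", {})
            rows.append((homology.get("type"), target.get("species"), target.get("id")))
    return rows


# Placeholder values standing in for a real `decoded` response.
example = {
    "data": [
        {
            "id": "WBGene00008422",
            "homologies": [
                {"type": "ortholog_one2one",
                 "target": {"species": "wuchereria_bancrofti_prjeb536",
                            "id": "EXAMPLE_TARGET_GENE"}},
            ],
        }
    ]
}
rows = homology_targets(example)
print(rows)
```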

Retrieves orthologue information in condensed format by gene id in json format.

In [None]:
request = requests.get(server + "/rest-15/homology/id/WBGene00008422?format=condensed;type=orthologues", 
                       headers = {"Content-Type" : "application/json", "Accept" : ""}) 

if not request.ok:
  request.raise_for_status()
  sys.exit() 
decoded = request.json()
print(json.dumps(decoded, indent = 4))

Retrieves homology information by symbol in json format.

In [None]:
request = requests.get(server + "/rest-15/homology/symbol/caenorhabditis_elegans_prjna13758/chaf-1?", 
                       headers = {"Content-Type" : "application/json", "Accept" : ""}) 

if not request.ok:
  request.raise_for_status()
  sys.exit() 
decoded = request.json()
print(json.dumps(decoded, indent = 4))

Retrieves orthologues by symbol, restricted to a target taxon and target species and including cDNA sequences, in json format.

In [None]:
request = requests.get(server + "/rest-15/homology/symbol/caenorhabditis_elegans_prjna13758/chaf-1?" + \
                       "target_taxon=6279;sequence=cdna;target_species=wuchereria_bancrofti_prjeb536;" + \
                       "type=orthologues", 
                       headers = {"Content-Type" : "application/json", "Accept" : ""}) 

if not request.ok:
  request.raise_for_status()
  sys.exit() 
decoded = request.json()
print(json.dumps(decoded, indent = 4))

Retrieves orthologue information in condensed format by symbol in json format.

In [None]:
request = requests.get(server + "/rest-15/homology/symbol/caenorhabditis_elegans_prjna13758/chaf-1?" + \
                       "format=condensed;type=orthologues", 
                       headers = {"Content-Type" : "application/json", "Accept" : ""}) 

if not request.ok:
  request.raise_for_status()
  sys.exit() 
decoded = request.json()
print(json.dumps(decoded, indent = 4))

Retrieves a gene tree dump for a gene tree stable identifier in nh format.

In [None]:
request = requests.get(server + "/rest-15/genetree/id/WBGT00000000021204?nh_format=simple", 
                       headers = {"Content-Type" : "text/x-nh", "Accept" : ""}) 

if not request.ok:
  request.raise_for_status()
  sys.exit()
print(request.text)
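
The nh (Newick) output is a single line of nested parentheses, which is hard to read for large trees. As a rough sketch, the leaf labels can be pulled out with a regular expression, since every leaf directly follows a `(` or a `,`. This works for simple trees without quoted labels; a dedicated tree library is a better choice for real analyses. `leaf_names` is a hypothetical helper and `sample` is a toy tree, not ParaSite output.

```python
import re


def leaf_names(newick):
    """Return the leaf labels of a Newick string, in order of appearance."""
    # A leaf label starts right after '(' or ',' and ends before ':', ',', ')' or ';'.
    return re.findall(r"(?<=[(,])\s*([^():,;\s]+)", newick)


sample = "((gene_a:0.1,gene_b:0.2):0.3,gene_c:0.4);"
print(leaf_names(sample))
```

In the notebook, `leaf_names(request.text)` would be called after the request that asks for `text/x-nh`.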

Retrieves a gene tree dump for a gene tree stable identifier in phyloxml-xml format.

In [None]:
request = requests.get(server + "/rest-15/genetree/id/WBGT00000000021204?", 
                       headers = {"Content-Type" : "text/x-phyloxml+xml", "Accept" : ""}) 

if not request.ok:
  request.raise_for_status()
  sys.exit()
print(request.text)

Retrieves a gene tree dump with aligned cdna sequence information for a gene tree stable identifier in phyloxml-xml format.

In [None]:
request = requests.get(server + "/rest-15/genetree/id/WBGT00000000021204?aligned=1;sequence=cdna", 
                       headers = {"Content-Type" : "text/x-phyloxml+xml", "Accept" : ""}) 

if not request.ok:
  request.raise_for_status()
  sys.exit()
print(request.text)

Retrieves a gene tree that contains the stable identifier in phyloxml-xml format.

In [None]:
request = requests.get(server + "/rest-15/genetree/member/id/WBGene00008422?", 
                       headers = {"Content-Type" : "text/x-phyloxml+xml", "Accept" : ""}) 

if not request.ok:
  request.raise_for_status()
  sys.exit()
print(request.text)

Retrieves a gene tree containing the gene identified by a symbol in phyloxml-xml format.

In [None]:
request = requests.get(server + "/rest-15/genetree/member/symbol/caenorhabditis_elegans_prjna13758/chaf-1?", 
                       headers = {"Content-Type" : "text/x-phyloxml+xml", "Accept" : ""}) 

if not request.ok:
  request.raise_for_status()
  sys.exit()
print(request.text)

This is the end of the tutorial on accessing WormBase ParaSite through the RESTful API in several data formats.

In the next tutorial, we will use the WormBase RESTful API to access the essential gene information for any WormBase gene IDs, and replicate the SimpleMine results.

Acknowledgements:
- ParaSite RESTful API (https://parasite.wormbase.org/rest-15)