# Programmatic Access to EMBL-EBI #
### (Exploring Biological Sequences 2019) ###
******





All the resources at EMBL-EBI are freely available and can be explored from https://www.ebi.ac.uk/services. You can access the resources either through a browser interface or programmatically, i.e. using web services. A list of the EMBL-EBI Web Services APIs for data retrieval resources is available at https://bit.ly/EMBL-EBI-APIs

Here are some examples of how to access some of our services programmatically:

## 1. Retrieve data from DbFetch ##

[DbFetch](https://www.ebi.ac.uk/Tools/dbfetch/) provides an easy way to retrieve entries from various databases at the EMBL-EBI in a consistent manner.

You can retrieve data from DbFetch using our web interface at https://www.ebi.ac.uk/Tools/dbfetch/

Required parameters:
1. Database name
2. Sequence ID

Optional parameters:
1. Style (default: html)
2. Format (default: usually the default format of the specified database)



### 1. Retrieve data from DbFetch using REST URL ###

To retrieve a coding sequence entry (for example, AAA59452) from the European Nucleotide Archive (ENA), we can open a browser window and go to the following URL:
https://www.ebi.ac.uk/Tools/dbfetch/dbfetch?db=ena_coding;id=AAA59452

Here the database is `ena_coding` and the sequence ID is `AAA59452`. Adding `style=raw` returns the entry as plain text instead of the default HTML page:

https://www.ebi.ac.uk/Tools/dbfetch/dbfetch?db=ena_coding;id=AAA59452;style=raw
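
The same URLs can be assembled in Python. The following is a minimal sketch rather than an official client: the names `dbfetch_url` and `fetch_entry` are made up for illustration, and it assumes that parameters are joined with `;` exactly as in the URLs shown in this tutorial.

```python
from urllib.request import urlopen

DBFETCH_BASE = "https://www.ebi.ac.uk/Tools/dbfetch/dbfetch"


def dbfetch_url(db, entry_id, style=None, fmt=None):
    """Build a DbFetch URL from the required and optional parameters."""
    params = [("db", db), ("id", entry_id)]
    if style is not None:
        params.append(("style", style))
    if fmt is not None:
        params.append(("format", fmt))
    # The URLs in this tutorial separate parameters with ';' rather than '&'.
    return DBFETCH_BASE + "?" + ";".join(f"{key}={value}" for key, value in params)


def fetch_entry(db, entry_id, style="raw", fmt=None):
    """Download one entry as text (hypothetical helper; needs network access)."""
    with urlopen(dbfetch_url(db, entry_id, style, fmt)) as response:
        return response.read().decode("utf-8")


# Only builds and prints the URL; no request is sent here.
print(dbfetch_url("ena_coding", "AAA59452", style="raw", fmt="fasta"))
```

Calling `fetch_entry("ena_coding", "AAA59452")` should then return the same text as the raw URL above, provided the service is reachable.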

You can also retrieve the data from the command line using the _curl_ or _wget_ commands:

In [1]:
!curl -X GET "https://www.ebi.ac.uk/Tools/dbfetch/dbfetch?db=ena_coding;id=AAA59452;style=raw"

ID   AAA59452; SV 1; linear; genomic DNA; STD; HUM; 4149 BP.
XX
PA   AH002851.2
XX
DT   13-JUN-2016 (Rel. 129, Created)
DT   13-JUN-2016 (Rel. 129, Last updated, Version 1)
XX
DE   Homo sapiens (human) insulin receptor
XX
KW   .
XX
OS   Homo sapiens (human)
OC   Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia;
OC   Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae;
OC   Homo.
XX
RN   [1]
RX   DOI; 10.1073/pnas.86.1.114.
RX   PUBMED; 2911561.
RA   Seino S., Seino M., Nishi S., Bell G.I.;
RT   "Structure of the human insulin receptor gene and characterization of its
RT   promoter";
RL   Proc. Natl. Acad. Sci. U.S.A. 86(1):114-118(1989).
XX
RN   [2]
RX   PUBMED; 2210055.
RA   Seino S., Seino M., Bell G.I.;
RT   "Human insulin-receptor gene. Partial sequence and amplification of exons
RT   by polymerase chain reaction";
RL   Diabetes 39(1):123-128(1990).
XX
DR   MD5; 4c7ab6fae5c07f3becb459d31ae2d7fc.
XX
FH  

In [2]:
!curl -X GET 'https://www.ebi.ac.uk/Tools/dbfetch/dbfetch?db=ena_coding;id=AAA59452;style=raw' -o AAA59452.txt

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  9911    0  9911    0     0  16545      0 --:--:-- --:--:-- --:--:-- 16518


In [None]:
!head -n 30 AAA59452.txt

In [3]:
!curl -X GET 'https://www.ebi.ac.uk/Tools/dbfetch/dbfetch?db=ena_coding;id=AAA59452;style=raw;format=fasta' -o AAA59452.fasta

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  4282    0  4282    0     0   8462      0 --:--:-- --:--:-- --:--:--  8445


In [4]:
!head AAA59452.fasta

>ENA|AAA59452|AAA59452.1 Homo sapiens (human) insulin receptor
ATGGGCACCGGGGGCCGGCGGGGAGCGGCGGCCGCGCCGCTGCTGGTGGCGGTGGCCGCG
CTGCTACTGGGCGCCGCGGGCCACCTGTACCCCGGAGAGGTGTGTCCCGGCATGGATATC
CGGAACAACCTCACTAGGTTGCATGAGCTGGAGAATTGCTCTGTCATCGAAGGACACTTG
CAGATACTCTTGATGTTCAAAACGAGGCCCGAAGATTTCCGAGACCTCAGTTTCCCCAAA
CTCATCATGATCACTGATTACTTGCTGCTCTTCCGGGTCTATGGGCTCGAGAGCCTGAAG
GACCTGTTCCCCAACCTCACGGTCATCCGGGGATCACGACTGTTCTTTAACTACGCGCTG
GTCATCTTCGAGATGGTTCACCTCAAGGAACTCGGCCTCTACAACCTGATGAACATCACC
CGGGGTTCTGTCCGCATCGAGAAGAACAATGAGCTCTGTTACTTGGCCACTATCGACTGG
TCCCGTATCCTGGATTCCGTGGAGGATAATTACATCGTGTTGAACAAAGATGACAACGAG
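
Once the FASTA file has been saved, it can be read back with a few lines of Python. The sketch below is illustrative only: `read_fasta` is a hypothetical helper written for this example (it is not part of any EMBL-EBI client), and it assumes the simple layout shown above, i.e. a header line starting with `>` followed by sequence lines.

```python
def read_fasta(lines):
    """Return a list of (header, sequence) pairs from FASTA-formatted lines."""
    records = []
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Header and first two sequence lines of the entry shown above, as sample input.
sample = """>ENA|AAA59452|AAA59452.1 Homo sapiens (human) insulin receptor
ATGGGCACCGGGGGCCGGCGGGGAGCGGCGGCCGCGCCGCTGCTGGTGGCGGTGGCCGCG
CTGCTACTGGGCGCCGCGGGCCACCTGTACCCCGGAGAGGTGTGTCCCGGCATGGATATC
"""

for header, sequence in read_fasta(sample.splitlines()):
    accession = header.split("|")[1]
    print(accession, len(sequence))
```

To process the downloaded file instead of the inline sample, pass an open file handle, for example `read_fasta(open("AAA59452.fasta"))`.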


### 2. Retrieve data from DbFetch using Web Service Client ###


Python, Perl and Java clients for the EBI Tools Web Services are available at https://github.com/ebi-wp/webservice-clients.

To simplify the process, we can download one of the clients (e.g. the DbFetch client) and run it as an alternative to the curl command.

Get the raw client from GitHub:

In [5]:
!wget https://raw.githubusercontent.com/ebi-wp/webservice-clients/master/python/dbfetch.py

--2019-10-08 10:18:20--  https://raw.githubusercontent.com/ebi-wp/webservice-clients/master/python/dbfetch.py
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.0.133, 151.101.64.133, 151.101.128.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.0.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 13597 (13K) [text/plain]
Saving to: ‘dbfetch.py’


2019-10-08 10:18:20 (1.30 MB/s) - ‘dbfetch.py’ saved [13597/13597]



You can learn more about the available parameters and how to use them by running `python <client_name>.py --help`:

In [6]:
!python dbfetch.py --help

EMBL-EBI EMBOSS WSDbfetch Python Client:

Dbfetch service enables database entry retrieval given a set of entry
identifiers, and a required data format.

Usage:
  python dbfetch.py <method> [arguments...] [--baseUrl <baseUrl>]

A number of methods are available:
  getSupportedDBs       List available databases.
  getSupportedFormats   List available databases with formats.
  getSupportedStyles    List available databases with styles.
  getDbFormats          List formats for a specifed database. Requires <dbName>.
  getFormatStyles       List styles for a specified database and format.
                        Requires <dbName> and <dbFormat>.
  fetchData             Retrive an database entry. See below for details of arguments.
  fetchBatch            Retrive database entries. See below for details of arguments.

Fetching an entry: fetchData
  python dbfetch.py fetchData <dbName:id> [format [style]]

  dbName:id  database name and entry ID or accession (e.g. UNIPROT

The same sequence can be retrieved from ENA in FASTA format using the Python client as follows:

In [7]:
!python dbfetch.py fetchData ena_coding:AAA59452 fasta

>ENA|AAA59452|AAA59452.1 Homo sapiens (human) insulin receptor
ATGGGCACCGGGGGCCGGCGGGGAGCGGCGGCCGCGCCGCTGCTGGTGGCGGTGGCCGCG
CTGCTACTGGGCGCCGCGGGCCACCTGTACCCCGGAGAGGTGTGTCCCGGCATGGATATC
CGGAACAACCTCACTAGGTTGCATGAGCTGGAGAATTGCTCTGTCATCGAAGGACACTTG
CAGATACTCTTGATGTTCAAAACGAGGCCCGAAGATTTCCGAGACCTCAGTTTCCCCAAA
CTCATCATGATCACTGATTACTTGCTGCTCTTCCGGGTCTATGGGCTCGAGAGCCTGAAG
GACCTGTTCCCCAACCTCACGGTCATCCGGGGATCACGACTGTTCTTTAACTACGCGCTG
GTCATCTTCGAGATGGTTCACCTCAAGGAACTCGGCCTCTACAACCTGATGAACATCACC
CGGGGTTCTGTCCGCATCGAGAAGAACAATGAGCTCTGTTACTTGGCCACTATCGACTGG
TCCCGTATCCTGGATTCCGTGGAGGATAATTACATCGTGTTGAACAAAGATGACAACGAG
GAGTGTGGAGACATCTGTCCGGGTACCGCGAAGGGCAAGACCAACTGCCCCGCCACCGTC
ATCAACGGGCAGTTTGTCGAACGATGTTGGACTCATAGTCACTGCCAGAAAGTTTGCCCG
ACCATCTGTAAGTCACACGGCTGCACCGCCGAAGGCCTCTGTTGCCACAGCGAGTGCCTG
GGCAACTGTTCTCAGCCCGACGACCCCACCAAGTGCGTGGCCTGCCGCAACTTCTACCTG
GACGGCAGGTGTGTGGAGACCTGCCCGCCCCCGTACTACCACTTCCAGGACTGGCGCTGT
GTGAACTTCAGCTTCTGCCAGGACCTGCACCACAAATGCAAGAACTCGCGGAGGCAGGGC
TGCCAC

If the above command fails, some dependencies might be missing. See the installation instructions at https://github.com/ebi-wp/webservice-clients

## 2. Run NCBI BLAST+ ##

In addition to data retrieval, EMBL-EBI provides Web Services for popular bioinformatics applications such as NCBI BLAST+, Clustal Omega, InterProScan 5, and HMMER. Programmatic access to these services can be explored from https://www.ebi.ac.uk/Tools/webservices. The common API can be browsed at https://www.ebi.ac.uk/Tools/common/tools/help/

Since data needs to be passed to the server for the application to run (in this case, some input sequence data), the request uses the HTTP POST method.
In this example, we run NCBI BLAST+ against the Swiss-Prot database (`uniprotkb_swissprot`), using a UniProt sequence identifier as the input.

### 1. Run NCBI BLAST+ using REST URL ###


In [8]:
!curl -X POST --header 'Content-Type: application/x-www-form-urlencoded' --header 'Accept: text/plain' -d 'email=test@ebi.ac.uk&program=blastp&stype=protein&sequence=sp:wap_rat&database=uniprotkb_swissprot' 'https://www.ebi.ac.uk/Tools/services/rest/ncbiblast/run'

ncbiblast-R20191008-111833-0999-94082917-p1m

As you can see, in this case we passed `-X POST` instead of `-X GET`. Additionally, we needed to pass the request data with `-d` (or `--data`). The individual parameters are separated by the `&` symbol.
The parameters were:
* `email=test@ebi.ac.uk`
* `program=blastp`
* `stype=protein`
* `sequence=sp:wap_rat`
* `database=uniprotkb_swissprot`
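
For comparison, the same POST request can be prepared with Python's standard library. This is a sketch under stated assumptions, not the official client: `build_run_request` and `submit_job` are names invented for this example, and the response is assumed to be the plain-text job ID, as in the curl output above.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

RUN_URL = "https://www.ebi.ac.uk/Tools/services/rest/ncbiblast/run"

# The same five parameters that were passed to curl with -d.
params = {
    "email": "test@ebi.ac.uk",
    "program": "blastp",
    "stype": "protein",
    "sequence": "sp:wap_rat",
    "database": "uniprotkb_swissprot",
}


def build_run_request(parameters):
    """Create (but do not send) the POST request for a BLAST job."""
    # urlencode joins the pairs with '&' and percent-encodes '@' and ':'.
    body = urlencode(parameters).encode("ascii")
    headers = {
        "Content-Type": "application/x-www-form-urlencoded",
        "Accept": "text/plain",
    }
    # Supplying data makes urllib use the POST method.
    return Request(RUN_URL, data=body, headers=headers)


def submit_job(parameters):
    """Send the request and return the response text (needs network access)."""
    with urlopen(build_run_request(parameters)) as response:
        return response.read().decode("utf-8").strip()


request = build_run_request(params)
print(request.get_method(), request.full_url)
print(request.data.decode("ascii"))
```

Nothing is sent until `submit_job(params)` is called, so the request can be inspected first. Please use your own email address rather than the placeholder when submitting real jobs.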

To retrieve the BLAST output for the previous job (job ID `ncbiblast-R20191008-111833-0999-94082917-p1m`), we need to use a different endpoint, `/result/{jobId}/{resultType}`. In this case we retrieve the default BLAST output format, which is named `out`.

In [9]:
!curl -X GET --header 'Accept: text/plain' 'https://www.ebi.ac.uk/Tools/services/rest/ncbiblast/result/ncbiblast-R20191008-111833-0999-94082917-p1m/out' -o blast.txt

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  203k    0  203k    0     0   133k      0 --:--:--  0:00:01 --:--:--  133k


In [10]:
!head -n 50 blast.txt

BLASTP 2.7.1+


Reference: Stephen F. Altschul, Thomas L. Madden, Alejandro A.
Schaffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J.
Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of
protein database search programs", Nucleic Acids Res. 25:3389-3402.


Reference for composition-based statistics: Alejandro A. Schaffer,
L. Aravind, Thomas L. Madden, Sergei Shavirin, John L. Spouge, Yuri
I. Wolf, Eugene V. Koonin, and Stephen F. Altschul (2001),
"Improving the accuracy of PSI-BLAST protein database searches with
composition-based statistics and other refinements", Nucleic Acids
Res. 29:2994-3005.



Database: uniprotkb_swissprot
           560,823 sequences; 201,585,439 total letters



Query= sp|P01174|WAP_RAT Whey acidic protein OS=Rattus norvegicus OX=10116
GN=Wap PE=1 SV=2

Length=137
                                                                      Score     E
Sequences producing significant alignments:                          
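
A job takes some time to run, so a script usually has to wait before requesting the result. The sketch below shows one possible way to structure this in Python. It makes several assumptions: `result_url` and `wait_until_finished` are hypothetical helpers, `get_status` is a caller-supplied function (how the status is obtained is not covered in this tutorial), `FINISHED` is taken as the final state because that is what the Python client prints on completion, and `RUNNING` is only a placeholder value for the demonstration.

```python
import time

BASE_URL = "https://www.ebi.ac.uk/Tools/services/rest/ncbiblast"


def result_url(job_id, result_type="out"):
    """Build the URL for the /result/{jobId}/{resultType} endpoint."""
    return f"{BASE_URL}/result/{job_id}/{result_type}"


def wait_until_finished(job_id, get_status, interval=5.0, max_checks=60):
    """Call get_status(job_id) repeatedly until it reports FINISHED.

    get_status is a hypothetical callable supplied by the caller.
    Returns True once the job has finished, or False after max_checks tries.
    """
    for _ in range(max_checks):
        if get_status(job_id).strip() == "FINISHED":
            return True
        time.sleep(interval)
    return False


# Demonstration with canned statuses instead of a real server (assumed values).
canned = iter(["RUNNING", "RUNNING", "FINISHED"])
job_id = "ncbiblast-R20191008-111833-0999-94082917-p1m"
if wait_until_finished(job_id, lambda _job: next(canned), interval=0):
    print(result_url(job_id))
```

The printed URL is the one passed to curl above. A real script would also need to handle failed jobs, which this sketch simply treats as not yet finished.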

### 2. Run NCBI BLAST+ using Web Service Client ###

As we did for DbFetch, we can download the clients and perform various sequence analyses using the available bioinformatics applications. In this example we run the same BLAST sequence search that we performed with curl.

In [11]:
# note: we are getting the raw client from GitHub
!wget https://raw.githubusercontent.com/ebi-wp/webservice-clients/master/python/ncbiblast.py

--2019-10-08 10:19:03--  https://raw.githubusercontent.com/ebi-wp/webservice-clients/master/python/ncbiblast.py
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.0.133, 151.101.64.133, 151.101.128.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.0.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 33986 (33K) [text/plain]
Saving to: ‘ncbiblast.py’


2019-10-08 10:19:03 (3.15 MB/s) - ‘ncbiblast.py’ saved [33986/33986]



In [12]:
# the clients can run the job and return outputs on the same call (synchronously as below, or asynchronously)
!python ncbiblast.py --email test@ebi.ac.uk --program blastp --stype protein --sequence sp:wap_rat --database uniprotkb_swissprot --outformat out --outfile wap_rat

JobId: ncbiblast-R20191008-111907-0234-85333878-p1m
FINISHED
Creating result file: wap_rat.out.txt


### Useful links

EMBL-EBI services and data resources: https://www.ebi.ac.uk/services  
EMBL-EBI APIs: https://bit.ly/EMBL-EBI-APIs  
EMBL-EBI Web Services General Documentation: https://www.ebi.ac.uk/Tools/webservices  
Web Service Clients for EBI Tools and EBI Search: https://github.com/ebi-wp/webservice-clients  
RESTful API (SWAGGER) User Interface for EBI Tools: https://www.ebi.ac.uk/Tools/common/tools/help/  
RESTful API (SWAGGER) User Interface for EBI Search: https://www.ebi.ac.uk/ebisearch/swagger.ebi  

**Contact us via Help and Support at https://www.ebi.ac.uk/support/webservices**