# Introduction to EMBL-EBI Web Services

## In this Session
* Quick pointers
* Basic examples on how to retrieve data from a URL
* How to construct REST URLs to fetch data in different formats
* (Optional) using a Python client to retrieve data with Dbfetch
* (Optional) using a Python client to submit a job
* Appendix 
    * Glossary
    * Useful links

## Quick pointers

EMBL-EBI data resources and tools can be explored from the https://www.ebi.ac.uk/services web page. Many of the resources listed provide programmatic access, either via simple download pages (e.g. from an FTP server) or through REST/SOAP APIs.

The Web Production Team has collected a list of EMBL-EBI resources that provide APIs, available at https://bit.ly/EMBL-EBI-APIs

## Basic examples on how to retrieve data from a URL

In this example we will be using [Dbfetch](https://www.ebi.ac.uk/Tools/dbfetch/), which provides an easy way to retrieve entries from various databases at the EMBL-EBI in a consistent manner. It can be accessed from any browser as well as programmatically.

To retrieve a coding sequence entry from the European Nucleotide Archive (ENA), we could open a browser window and try the following URL: https://www.ebi.ac.uk/Tools/dbfetch/dbfetch?db=ena_coding;id=AAA59452

By inspecting the URL, one can see that the queried `db` is named "ena_coding" and the queried `id` is AAA59452. The default view for this query is *html*, since the result is displayed within the webpage. By adding `style=raw` to the previous URL we can see the same result as a raw plain text output: https://www.ebi.ac.uk/Tools/dbfetch/dbfetch?db=ena_coding;id=AAA59452;style=raw

**Note:** different options are separated by ";" when constructing the final URL.
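
The ";" convention described in the note can be sketched as a small helper. This is only an illustration: the function name `build_dbfetch_url` is hypothetical, and the sketch assumes option values are simple identifiers that need no URL encoding.

```python
# Hypothetical helper (not part of Dbfetch): joins query options with ";"
DBFETCH_ENDPOINT = "https://www.ebi.ac.uk/Tools/dbfetch/dbfetch"

def build_dbfetch_url(db, entry_id, **options):
    """Return a Dbfetch URL with db, id and any extra options joined by ';'."""
    pairs = [("db", db), ("id", entry_id)] + list(options.items())
    # Assumes values are plain identifiers, so no percent-encoding is applied
    query = ";".join("%s=%s" % (key, value) for key, value in pairs)
    return DBFETCH_ENDPOINT + "?" + query

print(build_dbfetch_url("ena_coding", "AAA59452", style="raw"))
```

Extra keyword arguments are appended in the order given, so any further option can be added the same way as `style`.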

We can retrieve this entry from the command line using *curl* (or similar tools, such as wget), or from Python using a popular HTTP request module called [requests](http://docs.python-requests.org/en/master/). Both approaches are shown below.

In [None]:
!curl "https://www.ebi.ac.uk/Tools/dbfetch/dbfetch?db=ena_coding;id=AAA59452;style=raw" > AAA59452.embl

In [None]:
!head AAA59452.embl

In [None]:
import requests

dbfetch_url = "https://www.ebi.ac.uk/Tools/dbfetch/"

In [None]:
db = "ena_coding"
ena_id = "AAA59452"

url = dbfetch_url + "dbfetch?style=raw;db=%s;id=%s" % (db, ena_id)
r = requests.get(url)
if r.ok:
    print(r.text)

## How to construct URLs to fetch data in different formats
The previous URL can also specify a particular output `format`. By default, the EMBL format is returned for this database. Other formats, such as fasta, can also be retrieved:
https://www.ebi.ac.uk/Tools/dbfetch/dbfetch?db=ena_coding;id=AAA59452;style=raw;format=fasta

In [None]:
outformat = "fasta"

url = dbfetch_url + "dbfetch?style=raw;db=%s;id=%s;format=%s" % (db, ena_id, outformat)
r = requests.get(url)
if r.ok:
    print(r.text)

In some instances, an API has no `format` or similar option. A common alternative is for the API to accept specific request headers that instruct it to return the output in a particular format, such as JSON.

An example of this can be shown by using [EBI Search](https://www.ebi.ac.uk/ebisearch). Here, we would like to find InterPro cross-references for a particular UniProtKB entry. A typical search in the EBI Search bar would return a webpage URL such as: https://www.ebi.ac.uk/ebisearch/search.ebi?db=interpro7&query=P09211

Using EBI Search's API (at https://www.ebi.ac.uk/ebisearch/swagger.ebi), we could retrieve the same data with requests as follows:

In [None]:
ebisearch_url = "https://www.ebi.ac.uk/ebisearch/ws/rest/"

In [None]:
domain = "uniprot"
uniprotid = "P12345"
xrefdomain = "interpro"

url = ebisearch_url + "%s/entry/%s/xref/%s" % (domain, uniprotid, xrefdomain)
r = requests.get(url)
if r.ok:
    print(r.text)

As you can see, the default output format for this query is XML, but we can also retrieve JSON (and others, e.g. CSV and TSV) by passing an `Accept` header.

In [None]:
import json

r = requests.get(url, headers={'Accept': "application/json"})
if r.ok:
    print(json.dumps(r.json(), sort_keys=True, indent=4))

Let's try getting the values in CSV format:

In [None]:
import pandas as pd
from io import StringIO

r = requests.get(url, headers={'Accept': "text/csv"})
if r.ok:
    print(pd.read_csv(StringIO(r.text)))
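
The header-based format selection used in the cells above can be collected into one lookup. This is a minimal sketch: the helper name `accept_header` is hypothetical, the JSON and CSV media types are taken from the cells above, and the XML and TSV entries are assumptions based on the standard media type names rather than on the EBI Search documentation.

```python
# Media type per output format; "xml" and "tsv" are assumed values
ACCEPT_TYPES = {
    "xml": "application/xml",
    "json": "application/json",
    "csv": "text/csv",
    "tsv": "text/tab-separated-values",
}

def accept_header(fmt):
    """Return a headers dict asking the server for the given output format."""
    try:
        return {"Accept": ACCEPT_TYPES[fmt.lower()]}
    except KeyError:
        raise ValueError("unsupported format: %r" % fmt)

print(accept_header("json"))
```

The returned dictionary can be passed directly as the `headers` argument, e.g. `requests.get(url, headers=accept_header("csv"))`.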

To retrieve the BLAST output of a previously submitted job (example `jobid=ncbiblast-I20190625-094438-0592-62765631-p2m`), we need to use a different endpoint, `/result/{jobId}/{resultType}`. In this case we retrieve the default BLAST output format, which is named `out`.

In [None]:
!curl -X GET --header 'Accept: text/plain' 'https://www.ebi.ac.uk/Tools/services/rest/ncbiblast/result/ncbiblast-I20190625-094438-0592-62765631-p2m/out' -o blast.txt

In [None]:
!head -n 50 blast.txt
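
The `/result/{jobId}/{resultType}` pattern can also be assembled in Python. The sketch below only builds the URL string; the helper name `result_url` is hypothetical, and the base URL is the one used in the curl command above.

```python
TOOLS_REST_URL = "https://www.ebi.ac.uk/Tools/services/rest"

def result_url(tool, job_id, result_type):
    """Fill in the /result/{jobId}/{resultType} pattern for a given tool."""
    return "%s/%s/result/%s/%s" % (TOOLS_REST_URL, tool, job_id, result_type)

job_id = "ncbiblast-I20190625-094438-0592-62765631-p2m"
print(result_url("ncbiblast", job_id, "out"))
```

The resulting string can then be fetched with `requests.get`, in the same way as the Dbfetch URLs earlier in this session.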

## (Optional) using a Python client to retrieve data with Dbfetch

Python, Perl and Java clients are provided for EBI Tools Web Services from https://github.com/ebi-wp/webservice-clients.

To simplify the process, we can download one of the clients (e.g. Dbfetch or BLAST) and run it as an alternative to writing a custom Python script.

In [None]:
# note: we are getting the raw client from GitHub
!wget https://raw.githubusercontent.com/ebi-wp/webservice-clients/master/python/dbfetch.py

One can learn more about available parameters and how to use them by typing `python <client_name>.py --help`

In [None]:
!python dbfetch.py --help

Retrieving the same sequence in fasta format from ENA could be done using the Python client as follows:

In [None]:
!python dbfetch.py fetchData ena_coding:AAA59452 fasta

If the above command fails, some dependencies might be missing. See the installation instructions at https://github.com/ebi-wp/webservice-clients

## (Optional) using a Python client to submit a job

In addition to data retrieval, EMBL-EBI provides Web Services for popular Bioinformatics Applications such as NCBI BLAST+, Clustal Omega, InterProScan 5, and HMMER. Programmatic access to these services can be explored from https://www.ebi.ac.uk/Tools/webservices. The common API can be browsed from https://www.ebi.ac.uk/Tools/common/tools/help/

Since data needs to be passed to the server for the application to run (i.e. some input sequence data, in this case), the request uses the POST HTTP verb. In this example, we run NCBI BLAST+ against the Swiss-Prot database (`uniprotkb_swissprot`), using a UniProt sequence accession number as the input.

Similarly to what we have done for Dbfetch, we can download the clients and perform various sequence analyses using the available Bioinformatics Applications.

In [None]:
# note: we are getting the raw client from GitHub
!wget https://raw.githubusercontent.com/ebi-wp/webservice-clients/master/python/ncbiblast.py

The various parameter options required for submitting a BLAST job are:  
* `--email test@ebi.ac.uk`
* `--program blastp`
* `--stype protein` 
* `--sequence sp:wap_rat`
* `--database uniprotkb_swissprot`

In [None]:
# the clients can run the job and return outputs on the same call (synchronously as below, or asynchronously)
!python ncbiblast.py --email test@ebi.ac.uk --program blastp --stype protein --sequence sp:wap_rat --database uniprotkb_swissprot --outformat out --outfile wap_rat

If the above command fails, some dependencies might be missing. See the installation instructions at https://github.com/ebi-wp/webservice-clients
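
For readers curious about what a job submission looks like at the HTTP level, the sketch below builds (but does not send) a POST request using only the standard library. It rests on assumptions that should be checked against the API documentation at https://www.ebi.ac.uk/Tools/common/tools/help/: the `/run` endpoint and the form field names are assumed to mirror the client options listed above, and the helper name `build_run_request` is hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumed submission endpoint; verify against the API documentation
RUN_URL = "https://www.ebi.ac.uk/Tools/services/rest/ncbiblast/run"

# Field names assumed to match the client options (--email, --program, ...)
params = {
    "email": "test@ebi.ac.uk",
    "program": "blastp",
    "stype": "protein",
    "sequence": "sp:wap_rat",
    "database": "uniprotkb_swissprot",
}

def build_run_request(url, form_fields):
    """Prepare a form-encoded POST request without sending it."""
    body = urlencode(form_fields).encode("utf-8")
    return Request(url, data=body, method="POST")

request = build_run_request(RUN_URL, params)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Nothing is sent to the server here. Passing `request` to `urllib.request.urlopen` would submit it, and if the assumptions hold the response would be expected to contain a job identifier that can be used with the `/result/{jobId}/{resultType}` endpoint.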

## Appendix
### Glossary

**API** - Application Programming Interface  
**CSV** - Comma-separated Values  
**HTTP** - HyperText Transfer Protocol  
**JSON** - JavaScript Object Notation  
**REST** - Representational State Transfer  
**SOAP** - Simple Object Access Protocol  
**TSV** - Tab-separated Values  
**URL** - Uniform Resource Locator  

### Useful links

EMBL-EBI services and data resources: https://www.ebi.ac.uk/services  
EMBL-EBI APIs: https://bit.ly/EMBL-EBI-APIs  
EMBL-EBI Web Services General Documentation: https://www.ebi.ac.uk/Tools/webservices  
Web Service Clients for EBI Tools and EBI Search: https://github.com/ebi-wp/webservice-clients  
RESTful API (SWAGGER) User Interface for EBI Tools: https://www.ebi.ac.uk/Tools/common/tools/help/  
RESTful API (SWAGGER) User Interface for EBI Search: https://www.ebi.ac.uk/ebisearch/swagger.ebi  

**Contact us via Help and Support at https://www.ebi.ac.uk/support/webservices**