# How to use the EBI Metagenomics API

The EMG REST API (https://www.ebi.ac.uk/metagenomics/api/) provides an easy-to-use set of top-level resources, such as studies, samples, runs, experiment-types, biomes and annotations, that let users access metagenomics data in simple JSON format (JSON is a syntax for storing and exchanging data). Retrieving the data is as simple as sending an HTTP request. Each response is a JSON object that contains the resource type and the associated object identifier (id), together with its attributes. Where appropriate, relationships and links to other resources are provided.
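To make the response layout concrete, the sketch below walks a JSON document shaped as described above: a resource with a type, an id, attributes, relationships and links. The payload is a hand-written, abbreviated stand-in, not a real API response. The top-level `data` key and the nesting follow the JSON:API convention, which is an assumption based on the `jsonapi_client` library used later in this document, and the helper name `summarise_resource` is hypothetical.

```python
import json

# Hand-written, abbreviated payload for illustration only (not a real response).
EXAMPLE_RESPONSE = """
{
  "data": {
    "type": "studies",
    "id": "ERP005831",
    "attributes": {"accession": "ERP005831"},
    "relationships": {
      "samples": {
        "links": {
          "related": "https://www.ebi.ac.uk/metagenomics/api/v0.2/studies/ERP005831/samples"
        }
      }
    },
    "links": {
      "self": "https://www.ebi.ac.uk/metagenomics/api/v0.2/studies/ERP005831"
    }
  }
}
"""


def summarise_resource(document):
    """Return the type, id, attribute names and relationship names of a resource."""
    resource = document["data"]
    return {
        "type": resource["type"],
        "id": resource["id"],
        "attributes": sorted(resource.get("attributes", {})),
        "relationships": sorted(resource.get("relationships", {})),
    }


summary = summarise_resource(json.loads(EXAMPLE_RESPONSE))
print(summary)
```

A real response is likely to carry many more attributes and relationships; the point here is only where each piece of information sits in the document.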

We have utilised an interactive documentation framework (Swagger UI) to visualise and simplify interaction with the API’s resources via an HTML interface. Detailed explanations of the purpose of all resources, along with many examples, are provided to guide end-users. Documentation on how to use the endpoints is available at https://www.ebi.ac.uk/metagenomics/api/docs/.

# Browse API

### Task 1

Find metagenomic studies

Answer: https://www.ebi.ac.uk/metagenomics/api/v0.2/studies?experiment_type=metagenomic

### Task 2

Find oceanic metagenomic samples taken from latitude >= 70° (N)

Answer: https://www.ebi.ac.uk/metagenomics/api/v0.2/samples?experiment_type=metagenomic&lineage=root%3AEnvironmental%3AAquatic%3AMarine%3AOceanic&latitude_gte=70
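The answer URLs above do not have to be typed by hand. As a minimal sketch, the code below assembles the Task 2 URL from a dictionary of filters with `urlencode`, which also takes care of escaping the colons in the lineage. The helper name `build_url` is hypothetical and is not part of any API client.

```python
from urllib.parse import urlencode

API_BASE = 'https://www.ebi.ac.uk/metagenomics/api/v0.2/'


def build_url(resource, params=None):
    """Join the API base, a resource name and URL-encoded filter parameters."""
    query = urlencode(params or {})
    return API_BASE + resource + ('?' + query if query else '')


# Filters for Task 2; dictionaries keep insertion order on Python 3.7+.
task2_url = build_url('samples', {
    'experiment_type': 'metagenomic',
    'lineage': 'root:Environmental:Aquatic:Marine:Oceanic',
    'latitude_gte': 70,
})
print(task2_url)
```

Keeping the filters in a dictionary makes it easy to change one condition, such as the latitude threshold, without editing the URL string.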

# Write scripts

### Import Python modules

In [1]:
from pandas import DataFrame

try:
    from urllib import urlencode
except ImportError:
    from urllib.parse import urlencode

In [2]:
from jsonapi_client import Session, Filter

API_BASE = 'https://www.ebi.ac.uk/metagenomics/api/v0.2/'

### Get study and list samples with biome

Get study: https://www.ebi.ac.uk/metagenomics/api/v0.2/studies/ERP005831

List samples: https://www.ebi.ac.uk/metagenomics/api/v0.2/studies/ERP005831/samples

To query a different study, change the study accession, for example to ERP002497.

In [3]:
df = DataFrame(columns=('sample name', 'biome', 'lineage'))
df.index.name = 'accession'

with Session(API_BASE) as s:
    study = s.get('studies', 'ERP005831').resource
    for sample in study.samples:
        biome = sample.biome
        df.loc[sample.accession] = [sample.sample_name, biome.biome_name, biome.lineage]

df

accession,sample name,biome,lineage
ERS456668,Tocil Lake surface sediment (12C),Sediment,root:Environmental:Aquatic:Freshwater:Lentic:S...
ERS456669,Hunts Mill field soil (12C),Agricultural,root:Environmental:Terrestrial:Soil:Loam:Agric...


### List runs

Get sample: https://www.ebi.ac.uk/metagenomics/api/v0.2/samples/ERS667565

List runs: https://www.ebi.ac.uk/metagenomics/api/v0.2/samples/ERS667565/runs

In [4]:
df = DataFrame(columns=('instrument platform', 'instrument model', 'analysis pipeline'))
df.index.name = 'accession'

with Session(API_BASE) as s:
    sample = s.get('samples', 'ERS667565').resource
    for run in sample.runs:
        df.loc[run.accession] = [
            run.instrument_platform, run.instrument_model,
            ", ".join([p.release_version for p in run.pipelines])
        ]

df

accession,instrument platform,instrument model,analysis pipeline
ERR867951,ILLUMINA,Illumina MiSeq,4.0
ERR867950,ILLUMINA,Illumina MiSeq,4.0
ERR771104,ILLUMINA,Illumina MiSeq,"2.0, 4.0"


### List sample metadata

Get sample: https://www.ebi.ac.uk/metagenomics/api/v0.2/samples/ERS488919

List sample metadata: https://www.ebi.ac.uk/metagenomics/api/v0.2/samples/ERS488919/metadata

In [5]:
def format_unit(unit):
    # Decode any HTML entities in the unit; a missing unit becomes "".
    import html
    return html.unescape(unit) if unit else ""

df = DataFrame(columns=('metadata key', 'value', 'unit'))

with Session(API_BASE) as s:
    sample = s.get('samples', 'ERS488919').resource
    print(sample.sample_name, sample.accession)
            
    for i, m in enumerate(sample.metadata):
        df.loc[i] = [
            m.var_name, m.var_value,
            format_unit(m.unit)
        ]

df

TARA_20100318T1133Z_039_EVENT_PUMP_P_D_(25 m)_BACT_NUC-DNA(100L)_W1.6-20_TARA_B100000105 ERS488919


index,metadata key,value,unit
0,temperature,26.812225,°C
1,project name,Tara Oceans expedition (2009-2013),
2,geographic location (depth),25,m
3,environmental package,water,
4,instrument model,Illumina HiSeq 2000,
5,ENA checklist,ENA TARA (ERC000030),
6,latitude end,18.5679,DD
7,longitude end,66.4581,DD
8,marine region,,
9,protocol label,BACT_NUC-DNA(100L)_W1.6-20,
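The `format_unit` helper in the cell above relies on `html.unescape` from the standard library. The short sketch below shows what that call does on its own. The input string `'&deg;C'` is an assumed example of an HTML-escaped unit, not a value copied from the API, and `decode_unit` is a hypothetical stand-alone name.

```python
import html


def decode_unit(unit):
    """Decode HTML entities in a unit string; treat a missing unit as empty."""
    return html.unescape(unit) if unit else ""


print(decode_unit("&deg;C"))    # the entity is decoded to the degree sign
print(decode_unit("m"))         # plain text passes through unchanged
print(repr(decode_unit(None)))  # a missing unit becomes an empty string
```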


### List functional annotations

Gene Ontology (GO) terms derived from InterPro matches: https://www.ebi.ac.uk/metagenomics/api/v0.2/runs/SRR1047054/pipelines/2.0/go-slim

In [6]:
df = DataFrame(columns=('category', 'description', 'annotation counts'))
df.index.name = 'GO term'

with Session(API_BASE) as s:
    f = Filter('page_size=100')
    for ann in s.iterate('runs/SRR1047054/pipelines/2.0/go-slim', f):
        df.loc[ann.accession] = [
            ann.lineage, ann.description, ann.count
        ]
df

GO term,category,description,annotation counts
GO:0000156,molecular_function,two-component response regulator activity,2.0
GO:0000160,biological_process,phosphorelay signal transduction system,50.0
GO:0000166,molecular_function,nucleotide binding,316.0
GO:0000746,biological_process,conjugation,3.0
GO:0000902,biological_process,cell morphogenesis,5.0
GO:0000988,molecular_function,protein binding transcription factor activity,17.0
GO:0001071,molecular_function,nucleic acid binding transcription factor acti...,47.0
GO:0002376,biological_process,immune system process,0.0
GO:0003676,molecular_function,nucleic acid binding,218.0
GO:0003774,molecular_function,motor activity,2.0


### List oceanic metagenomic samples collected at temperatures between 1°C and 5°C

List samples: https://www.ebi.ac.uk/metagenomics/api/v0.2/biomes/root:Environmental:Aquatic:Marine:Oceanic/samples?experiment_type=metagenomic&metadata_key=temperature&metadata_value_gte=1&metadata_value_lte=5

In [7]:
def get_metadata(metadata, key='temperature'):
    import html
    for m in metadata:
        if m.var_name.lower() == key.lower():
            value = m.var_value
            unit = html.unescape(m.unit) if m.unit else ""
            return "{value} {unit}".format(value=value, unit=unit)
    return None

depth_label = 'geographic location (depth)'
temp_label = 'temperature'
df = DataFrame(columns=('sample name', 'biome', 'temperature', 'depth', 'location', 'latitude'))
df.index.name = 'accession'

with Session(API_BASE) as s:
    params = {
        'experiment_type': 'metagenomic',
        'metadata_key': 'temperature',
        'metadata_value_gte': 1,
        'metadata_value_lte': 5,
        'latitude_gte': 0,
        'include': 'biome,metadata',
    }
    f = Filter(urlencode(params))
    for sample in s.iterate('biomes/root:Environmental:Aquatic:Marine/samples', f):
        df.loc[sample.accession] = [
            sample.sample_name, sample.biome.biome_name,
            get_metadata(sample.metadata, temp_label),
            get_metadata(sample.metadata, depth_label),
            sample.geo_loc_name, sample.latitude
        ]
df

accession,sample name,biome,temperature,depth,location,latitude
SRS981327,BP_381106-1,Marine,4.4889,,USA: Gulf of Mexico,28.7051
SRS981330,BP_381105-1,Marine,5.5365,,USA: Gulf of Mexico,28.7051
SRS981311,BP_381111-2,Marine,4.4199,,USA: Gulf of Mexico,28.6632
SRS981308,BP_381112-2,Marine,4.281,,USA: Gulf of Mexico,28.6632
SRS981307,BP_381112-3,Marine,4.281,,USA: Gulf of Mexico,28.6632
SRS981326,BP_381106-2,Marine,4.4889,,USA: Gulf of Mexico,28.7051
SRS981325,BP_381106-3,Marine,4.4889,,USA: Gulf of Mexico,28.7051
SRS981309,BP_381112-1,Marine,4.281,,USA: Gulf of Mexico,28.6632
SRS981322,BP_381107-3,Marine,4.2599,,USA: Gulf of Mexico,28.7051
SRS981310,BP_381111-3,Marine,4.4199,,USA: Gulf of Mexico,28.6632


### Export to CSV

Get study: https://www.ebi.ac.uk/metagenomics/api/v0.2/studies/ERP005831

In [8]:
import csv

# newline="" lets the csv module control line endings (Python 3).
with open("test_export.csv", "w", newline="") as csvfile:
    with Session(API_BASE) as s:
        fieldnames = ['study', 'sample', 'biome', 'lineage', 'longitude', 'latitude']
        writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
        writer.writeheader()
        study = s.get('studies', 'ERP005831').resource
        for sample in study.samples:
            biome = sample.biome
            row = {
                'study': study.accession,
                'sample': sample.accession,
                'biome': biome.biome_name,
                'lineage': biome.lineage,
                'longitude': sample.longitude,
                'latitude': sample.latitude
            }
            writer.writerow(row)

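# DataFrame.from_csv was removed in pandas 1.0; read_csv with index_col=0 replaces it.
from pandas import read_csv
df = read_csv('test_export.csv', index_col=0)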
df

study,sample,biome,lineage,longitude,latitude
ERP005831,ERS456668,Sediment,root:Environmental:Aquatic:Freshwater:Lentic:S...,-1.56,52.38
ERP005831,ERS456669,Agricultural,root:Environmental:Terrestrial:Soil:Loam:Agric...,-1.61,52.19
