# How to use the EBI Metagenomics API

The EMG REST API (https://www.ebi.ac.uk/metagenomics/api/) provides an easy-to-use set of top-level resources, such as studies, samples, runs, experiment-types, biomes and annotations, that let users access metagenomics data in simple JSON format (JSON is a syntax for storing and exchanging data). Retrieving the data is as simple as sending an HTTP request. Each response is a JSON object that contains the resource type, the associated object identifier (id) and its attributes. Where appropriate, relationships and links to other resources are provided.

We have utilised an interactive documentation framework (Swagger UI) to visualise and simplify interaction with the API’s resources via an HTML interface. Detailed explanations of the purpose of all resources, along with many examples, are provided to guide end-users. Documentation on how to use the endpoints is available at https://www.ebi.ac.uk/metagenomics/api/docs/.
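
To make the response layout described above concrete, the sketch below parses a hand-written JSON document shaped like a single-resource response. The payload is hypothetical: the attribute names and values are assumptions chosen for illustration and are not copied from real API output.

```python
import json

# Hypothetical payload, written by hand for illustration only; a real
# response may carry different attributes, relationships and links.
payload = """
{
  "data": {
    "type": "samples",
    "id": "ERS667565",
    "attributes": {"accession": "ERS667565", "sample-name": "example sample"},
    "relationships": {
      "runs": {
        "links": {
          "related": "https://www.ebi.ac.uk/metagenomics/api/v0.2/samples/ERS667565/runs"
        }
      }
    },
    "links": {
      "self": "https://www.ebi.ac.uk/metagenomics/api/v0.2/samples/ERS667565"
    }
  }
}
"""

resource = json.loads(payload)["data"]

# Every resource object carries a type, an identifier and its attributes.
resource_type = resource["type"]
resource_id = resource["id"]
attributes = resource["attributes"]

# Relationships point at other resources through links.
runs_url = resource["relationships"]["runs"]["links"]["related"]

print(resource_type, resource_id)
print(runs_url)
```

The `jsonapi_client` session used in the cells below hides this layer and exposes attributes and relationships as Python attributes instead.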

### Import Python modules

In [None]:
from pandas import DataFrame

try:
    from urllib import urlencode
except ImportError:
    from urllib.parse import urlencode

In [None]:
from jsonapi_client import Session, Filter

API_BASE = 'https://www.ebi.ac.uk/metagenomics/api/v0.2/'

### Get study and list of the samples with biome

Get study: https://www.ebi.ac.uk/metagenomics/api/v0.2/studies/ERP005831

List samples: https://www.ebi.ac.uk/metagenomics/api/v0.2/studies/ERP005831/samples

In [None]:
df = DataFrame(columns=('sample name', 'biome', 'lineage'))
df.index.name = 'accession'

with Session(API_BASE) as s:
    study = s.get('studies', 'ERP005831').resource
    for sample in study.samples:
        biome = sample.biome
        df.loc[sample.accession] = [sample.sample_name, biome.biome_name, biome.lineage]

df

### List runs

Get sample: https://www.ebi.ac.uk/metagenomics/api/v0.2/samples/ERS667565

List runs: https://www.ebi.ac.uk/metagenomics/api/v0.2/samples/ERS667565/runs

In [None]:
df = DataFrame(columns=('instrument platform', 'instrument model', 'analysis pipeline'))
df.index.name = 'accession'

with Session(API_BASE) as s:
    sample = s.get('samples', 'ERS667565').resource
    for run in sample.runs:
        df.loc[run.accession] = [
            run.instrument_platform, run.instrument_model,
            ", ".join([p.release_version for p in run.pipelines])
        ]

df

### List sample metadata

Get sample: https://www.ebi.ac.uk/metagenomics/api/v0.2/samples/ERS488919

List sample metadata: https://www.ebi.ac.uk/metagenomics/api/v0.2/samples/ERS488919/metadata

In [None]:
def format_unit(unit):
    import html
    # Decode HTML entities in the unit; return an empty string if no unit is set
    return html.unescape(unit) if unit else ""

df = DataFrame(columns=('metadata key', 'value', 'unit'))

with Session(API_BASE) as s:
    sample = s.get('samples', 'ERS488919').resource
    print(sample.sample_name, sample.accession)

    for i, m in enumerate(sample.metadata):
        df.loc[i] = [
            m.var_name, m.var_value,
            format_unit(m.unit)
        ]

df
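
The unit formatting step above can be tried in isolation. This is a minimal sketch that assumes units may arrive with HTML entities in them; the input strings are made up for the example and are not taken from a real sample.

```python
import html


def format_unit(unit):
    """Decode HTML entities in a unit string; return "" when no unit is set."""
    return html.unescape(unit) if unit else ""


# Made-up inputs: an entity-encoded unit, a plain unit and a missing unit.
degrees = format_unit("&deg;C")
metres = format_unit("m")
missing = format_unit(None)

print(degrees, metres, repr(missing))
```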

### List functional annotations

Gene Ontology (GO) terms derived from InterPro matches: https://www.ebi.ac.uk/metagenomics/api/v0.2/runs/ERR263024/pipelines/1.0/go-slim

In [None]:
df = DataFrame(columns=('category', 'description', 'annotation counts'))
df.index.name = 'GO term'

with Session(API_BASE) as s:
    f = Filter('page_size=100')
    for ann in s.iterate('runs/SRR1047054/pipelines/2.0/go-slim', f):
        df.loc[ann.accession] = [
            ann.lineage, ann.description, ann.count
        ]
df
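
Iterating over a paginated collection, as in the cell above, conceptually means fetching a page, yielding its items and following the link to the next page until there is none. The sketch below shows that idea offline. The page contents, the `fetch` helper and the URL keys are hypothetical stand-ins, and the source does not show how `jsonapi_client` implements `iterate` internally.

```python
# Hypothetical pages keyed by URL, standing in for HTTP responses.
FAKE_PAGES = {
    "page-1": {
        "data": [{"id": "term-1"}, {"id": "term-2"}],
        "links": {"next": "page-2"},
    },
    "page-2": {
        "data": [{"id": "term-3"}],
        "links": {"next": None},
    },
}


def fetch(url):
    """Stand-in for an HTTP GET that returns a parsed JSON document."""
    return FAKE_PAGES[url]


def iterate(url):
    """Yield every item of a collection by following 'next' links."""
    while url:
        page = fetch(url)
        for item in page["data"]:
            yield item
        url = page["links"].get("next")


ids = [item["id"] for item in iterate("page-1")]
print(ids)
```

A larger page size reduces the number of requests needed to walk the same collection.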

### List oceanic metagenomic samples collected at a temperature between 1°C and 5°C

List samples: https://www.ebi.ac.uk/metagenomics/api/v0.2/biomes/root:Environmental:Aquatic:Marine:Oceanic/samples?experiment_type=metagenomic&metadata_key=temperature&metadata_value_gte=1&metadata_value_lte=5

In [None]:
def get_metadata(metadata, key='temperature'):
    import html
    for m in metadata:
        if m.var_name.lower() == key.lower():
            value = m.var_value
            unit = html.unescape(m.unit) if m.unit else ""
            return "{value} {unit}".format(value=value, unit=unit)
    return None

depth_label = 'geographic location (depth)'
temp_label = 'temperature'
df = DataFrame(columns=('sample name', 'biome', 'temperature', 'depth', 'location', 'latitude'))
df.index.name = 'accession'

with Session(API_BASE) as s:
    params = {
        'experiment_type': 'metagenomic',
        'metadata_key': 'temperature',
        'metadata_value_gte': 1,
        'metadata_value_lte': 5,
        'latitude_gte': 0,
        'include': 'biome,metadata',
    }
    f = Filter(urlencode(params))
    for sample in s.iterate('biomes/root:Environmental:Aquatic:Marine/samples', f):
        df.loc[sample.accession] = [
            sample.sample_name, sample.biome.biome_name,
            get_metadata(sample.metadata, temp_label),
            get_metadata(sample.metadata, depth_label),
            sample.geo_loc_name, sample.latitude
        ]
df
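
To see how the filter dictionary turns into the query string of a request URL, `urlencode` can be called on its own. This sketch uses only the four parameters that appear in the example URL above and assumes Python 3.7 or later, where dictionaries keep insertion order.

```python
from urllib.parse import urlencode

# Same filter parameters as in the example URL above.
params = {
    'experiment_type': 'metagenomic',
    'metadata_key': 'temperature',
    'metadata_value_gte': 1,
    'metadata_value_lte': 5,
}

# urlencode converts values to strings and joins key=value pairs with '&'.
query = urlencode(params)
print(query)
```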

### Export to CSV

Get study: https://www.ebi.ac.uk/metagenomics/api/v0.2/studies/ERP005831

In [None]:
import csv

with open("test_export.csv", "w", newline="") as csvfile:
    with Session(API_BASE) as s:
        fieldnames = ['study', 'sample', 'biome', 'lineage', 'longitude', 'latitude']
        writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
        writer.writeheader()
        study = s.get('studies', 'ERP005831').resource
        for sample in study.samples:
            biome = sample.biome
            row = {
                'study': study.accession,
                'sample': sample.accession,
                'biome': biome.biome_name,
                'lineage': biome.lineage,
                'longitude': sample.longitude,
                'latitude': sample.latitude
            }
            writer.writerow(row)

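# DataFrame.from_csv was removed in pandas 1.0; read_csv with index_col=0 is the closest equivalent
from pandas import read_csv
df = read_csv('test_export.csv', index_col=0)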
df
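
The export step can also be checked without touching the network or the file system. The following sketch writes a few rows to an in-memory buffer with `csv.DictWriter` and reads them back with `csv.DictReader`. The rows are placeholder values invented for the example, not real study data.

```python
import csv
import io

fieldnames = ['study', 'sample', 'biome']

# Placeholder rows, invented for illustration.
rows = [
    {'study': 'STUDY1', 'sample': 'SAMPLE1', 'biome': 'Marine'},
    {'study': 'STUDY1', 'sample': 'SAMPLE2', 'biome': 'Soil'},
]

# newline='' stops the buffer from translating the line endings csv writes.
buffer = io.StringIO(newline='')
writer = csv.DictWriter(buffer, fieldnames=fieldnames)
writer.writeheader()
writer.writerows(rows)

# Rewind and read the rows back; DictReader takes keys from the header line.
buffer.seek(0)
read_back = list(csv.DictReader(buffer))
print(read_back)
```

Note that values read back from CSV are always strings, so numeric columns such as longitude and latitude would need converting after loading.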