# How to use the EBI Metagenomics API

The EMG REST API (https://www.ebi.ac.uk/metagenomics/api/latest/) provides an easy-to-use set of top-level resources, such as studies, samples, runs, experiment-types, biomes and annotations, that let users access metagenomics data in JSON, a lightweight syntax for storing and exchanging data. Retrieving data is as simple as sending an HTTP request. Each response is a JSON object that contains the resource type and the associated object identifier (id) together with its attributes. Where appropriate, relationships and links to other resources are provided.
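To make that response structure concrete, here is a minimal sketch that parses a hand-written document with the shape described above (type, id, attributes, relationships, links). The payload is illustrative only: the attribute name, its value and the exact set of keys are assumptions, not a captured API response.

```python
import json

# Hand-written example payload; the attribute value is a placeholder,
# and the key layout is an assumption, not real API output.
payload = """
{
  "data": {
    "type": "studies",
    "id": "ERP009004",
    "attributes": {"study-name": "Example study"},
    "relationships": {
      "samples": {
        "links": {
          "related": "https://www.ebi.ac.uk/metagenomics/api/latest/studies/ERP009004/samples"
        }
      }
    },
    "links": {
      "self": "https://www.ebi.ac.uk/metagenomics/api/latest/studies/ERP009004"
    }
  }
}
"""

doc = json.loads(payload)
resource = doc["data"]

# Resource type and identifier
print(resource["type"], resource["id"])

# Each relationship points at another resource that can be fetched in turn
for name, rel in resource["relationships"].items():
    print(name, "->", rel["links"]["related"])
```

Following the `related` link of a relationship is how one would move from a study to its samples without building the URL by hand.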

We have utilised an interactive documentation framework (Swagger UI) to visualise and simplify interaction with the API’s resources via an HTML interface. Detailed explanations of the purpose of all resources, along with many examples, are provided to guide end-users. Documentation on how to use the endpoints is available at https://www.ebi.ac.uk/metagenomics/api/docs/.

# Browse API

### Task 1

Find marine studies

Answer:
1. https://www.ebi.ac.uk/metagenomics/api/latest/studies?lineage=root%3AEnvironmental%3AAquatic%3AMarine
2. https://www.ebi.ac.uk/metagenomics/api/latest/biomes/root:Environmental:Aquatic:Marine/studies
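The two answers above reach the same studies by different routes: one filters the `studies` collection, the other starts from the biome. As a small sketch, both URLs can be assembled with the standard library; note that the colons in the lineage are percent-encoded as `%3A` only when the lineage is passed as a query parameter.

```python
from urllib.parse import urlencode

API_BASE = "https://www.ebi.ac.uk/metagenomics/api/latest/"
lineage = "root:Environmental:Aquatic:Marine"

# Form 1: filter the studies collection by lineage (query parameter, so ':' becomes %3A)
by_filter = API_BASE + "studies?" + urlencode({"lineage": lineage})

# Form 2: address the biome first, then list the studies that belong to it
by_biome = API_BASE + "biomes/" + lineage + "/studies"

print(by_filter)
print(by_biome)
```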

### Task 2

Find oceanic metagenomic samples taken at latitudes >= 70° N

Answer:
1. https://www.ebi.ac.uk/metagenomics/api/latest/samples?experiment_type=metagenomic&lineage=root%3AEnvironmental%3AAquatic%3AMarine%3AOceanic&latitude_gte=70

2. https://www.ebi.ac.uk/metagenomics/api/latest/experiment-types/metagenomic/samples?experiment_type=&biome_name=&lineage=root%3AEnvironmental%3AAquatic%3AMarine%3AOceanic&geo_loc_name=&latitude_gte=70
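The second answer carries several parameters with no value (`experiment_type=`, `biome_name=`, `geo_loc_name=`). One way to see which filters are actually set is to parse the query string, as sketched below. That the server ignores the empty parameters is an assumption here, suggested by the two answers being offered as equivalent.

```python
from urllib.parse import urlsplit, parse_qsl

url = (
    "https://www.ebi.ac.uk/metagenomics/api/latest/experiment-types/metagenomic/samples"
    "?experiment_type=&biome_name=&lineage=root%3AEnvironmental%3AAquatic%3AMarine%3AOceanic"
    "&geo_loc_name=&latitude_gte=70"
)

# parse_qsl drops blank values by default and decodes %3A back to ':'
active_filters = dict(parse_qsl(urlsplit(url).query))
print(active_filters)
```

Only `lineage` and `latitude_gte` remain, while the experiment type is expressed through the `experiment-types/metagenomic` path segment instead of a query parameter.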

# Write scripts

### Import Python modules

In [None]:
from pandas import DataFrame

try:
    from urllib import urlencode
except ImportError:
    from urllib.parse import urlencode

In [None]:
from jsonapi_client import Session, Filter

API_BASE = 'https://www.ebi.ac.uk/metagenomics/api/latest/'

### Get study

Get study: https://www.ebi.ac.uk/metagenomics/api/latest/studies/ERP009004

In [None]:
with Session(API_BASE) as s:
    study = s.get('studies', 'ERP009004').resource
    print('Study name:', study.study_name)
    print('Study abstract:', study.study_abstract)
    for biome in study.biomes:
        print('Biome:', biome.biome_name, biome.lineage)

### List samples with biomes for the given study

Get study: https://www.ebi.ac.uk/metagenomics/api/latest/studies/ERP001736

List samples: https://www.ebi.ac.uk/metagenomics/api/latest/studies/ERP001736/samples


Fetch samples for the given study accession: https://www.ebi.ac.uk/metagenomics/api/latest/samples?study_accession=ERP001736


In [None]:
df = DataFrame(columns=('sample name', 'lineage', 'biome', 'feature', 'material'))
df.index.name = 'accession'

with Session(API_BASE) as s:
    params = {
        'study_accession': 'ERP001736',
        'page_size': 100,
    }
    f = Filter(urlencode(params))
    for sample in s.iterate('samples', f):
        df.loc[sample.accession] = [
            sample.sample_name,
            sample.biome.id,
            sample.environment_biome,
            sample.environment_feature,
            sample.environment_material
        ]
df
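`s.iterate` walks through all pages of a listing, with `page_size` controlling how many items come back per request. The sketch below illustrates the general idea of following pagination links. It assumes each page exposes a `links.next` URL, as in the JSON:API pagination convention. It is not a description of how `jsonapi_client` is implemented, and `fetch` is a hypothetical callable standing in for the HTTP request.

```python
def iterate_resources(fetch, url):
    """Yield every item under 'data', following 'links.next' until it is absent or null.

    `fetch` is a hypothetical callable: it takes a URL and returns the decoded JSON document.
    """
    while url:
        doc = fetch(url)
        yield from doc.get("data", [])
        url = (doc.get("links") or {}).get("next")


# Stand-in for HTTP: two fake pages keyed by URL (placeholder data, not API output)
pages = {
    "samples?page=1": {
        "data": [{"id": "S1"}, {"id": "S2"}],
        "links": {"next": "samples?page=2"},
    },
    "samples?page=2": {
        "data": [{"id": "S3"}],
        "links": {"next": None},
    },
}

ids = [item["id"] for item in iterate_resources(pages.__getitem__, "samples?page=1")]
print(ids)
```

Passing the fetch function in as an argument keeps the loop testable without network access; a real version would issue an HTTP GET and decode the JSON body.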

### List samples with biomes and metadata for the given study

Samples for the given study accession: https://www.ebi.ac.uk/metagenomics/api/latest/samples?study_accession=ERP001736


In [None]:
import html

def get_metadata(metadata, key):
    """Return '<value> <unit>' for the first entry whose key matches (case-insensitive), else None."""
    for m in metadata:
        if m['key'].lower() == key.lower():
            value = m['value']
            unit = html.unescape(m['unit']) if m['unit'] else ""
            return "{value} {unit}".format(value=value, unit=unit)
    return None

depth_label = 'geographic location (depth)'
temp_label = 'temperature'
df = DataFrame(columns=('sample name', 'biome', 'temperature', 'depth', 'longitude', 'latitude'))
df.index.name = 'accession'

with Session(API_BASE) as s:
    params = {
        'study_accession': 'ERP001736',
        'include': 'biome',
        'page_size': 100,
    }
    f = Filter(urlencode(params))
    for sample in s.iterate('samples', f):
        df.loc[sample.accession] = [
            sample.sample_name, sample.biome.id,
            get_metadata(sample.sample_metadata, temp_label),
            get_metadata(sample.sample_metadata, depth_label),
            sample.longitude, sample.latitude
        ]
df
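To see what the metadata lookup does without calling the API, the sketch below repeats the helper's logic so that it runs standalone and applies it to a made-up metadata list. The entries are placeholders in the key/value/unit shape the helper expects; real samples may carry different keys and units.

```python
import html

def get_metadata(metadata, key):
    # Case-insensitive lookup of one metadata entry, formatted as "<value> <unit>"
    for m in metadata:
        if m['key'].lower() == key.lower():
            unit = html.unescape(m['unit']) if m['unit'] else ""
            return "{value} {unit}".format(value=m['value'], unit=unit)
    return None

# Made-up entries for illustration only
sample_metadata = [
    {'key': 'Temperature', 'value': '4.5', 'unit': '&deg;C'},
    {'key': 'geographic location (depth)', 'value': '10', 'unit': 'm'},
]

temperature = get_metadata(sample_metadata, 'temperature')
depth = get_metadata(sample_metadata, 'geographic location (depth)')
salinity = get_metadata(sample_metadata, 'salinity')

print(temperature)  # HTML entity in the unit is decoded
print(depth)
print(salinity)     # no matching key
```

The lookup returns `None` for a missing key, which is why some cells in the resulting table can be empty.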

### List runs

Get sample: https://www.ebi.ac.uk/metagenomics/api/latest/samples/ERS1871412

List runs: https://www.ebi.ac.uk/metagenomics/api/latest/samples/ERS1871412/runs

In [None]:
df = DataFrame(columns=('instrument platform', 'instrument model', 'analysis pipeline'))
df.index.name = 'accession'

with Session(API_BASE) as s:
    sample = s.get('samples', 'ERS1871412').resource
    for run in sample.runs:
        df.loc[run.accession] = [
            run.instrument_platform, run.instrument_model,
            ", ".join([p.release_version for p in run.pipelines])
        ]

df

### List sample metadata

Get sample with metadata: https://www.ebi.ac.uk/metagenomics/api/latest/samples/ERS488919

In [None]:
def format_unit(unit):
    import html
    return html.unescape(unit) if unit else ""

df = DataFrame(columns=('metadata key', 'value', 'unit'))

with Session(API_BASE) as s:
    sample = s.get('samples', 'ERS488919').resource
    print(sample.sample_name, sample.accession)

    for i, m in enumerate(sample.sample_metadata):
        df.loc[i] = [
            m['key'], m['value'],
            format_unit(m['unit'])
        ]

df

### List organisms

Organisms: https://www.ebi.ac.uk/metagenomics/api/latest/runs/ERR263024/pipelines/1.0/taxonomy

The code below queries a different run, ERR771106, and lists its LSU taxonomy from pipeline 4.0.

In [None]:
df = DataFrame(columns=('parent', 'domain', 'rank', 'reads'))
df.index.name = 'Organism'

with Session(API_BASE) as s:
    f = Filter('page_size=250')
    for ann in s.iterate('runs/ERR771106/pipelines/4.0/taxonomy/lsu', f):
        df.loc[ann.name] = [
            ann.parent, ann.domain, ann.rank, ann.count
        ]
df.sort_values('reads', ascending=False)

### List functional annotations

Gene Ontology (GO) terms derived from InterPro matches: https://www.ebi.ac.uk/metagenomics/api/latest/runs/ERR263024/pipelines/1.0/go-slim

The code below queries a different run, SRR1047054, analysed with pipeline 2.0.

In [None]:
df = DataFrame(columns=('category', 'description', 'annotation counts'))
df.index.name = 'GO term'

with Session(API_BASE) as s:
    f = Filter('page_size=100')
    for ann in s.iterate('runs/SRR1047054/pipelines/2.0/go-slim', f):
        df.loc[ann.accession] = [
            ann.lineage, ann.description, ann.count
        ]
df

### List marine metagenomic samples collected at temperatures between 1°C and 5°C

List samples: https://www.ebi.ac.uk/metagenomics/api/latest/biomes/root:Environmental:Aquatic:Marine/samples?experiment_type=metagenomic&metadata_key=temperature&metadata_value_gte=1&metadata_value_lte=5

In [None]:
import html

def get_metadata(metadata, key='temperature'):
    """Return '<value> <unit>' for the first entry whose key matches (case-insensitive), else None."""
    for m in metadata:
        if m['key'].lower() == key.lower():
            value = m['value']
            unit = html.unescape(m['unit']) if m['unit'] else ""
            return "{value} {unit}".format(value=value, unit=unit)
    return None

depth_label = 'geographic location (depth)'
temp_label = 'temperature'
df = DataFrame(columns=('sample name', 'biome', 'temperature', 'depth', 'location', 'latitude'))
df.index.name = 'accession'

with Session(API_BASE) as s:
    params = {
        'experiment_type': 'metagenomic',
        'metadata_key': 'temperature',
        'metadata_value_gte': 1,
        'metadata_value_lte': 5,
        'latitude_gte': 0,  # extra filter not present in the URL above: latitude >= 0 only
        'include': 'biome',
    }
    f = Filter(urlencode(params))
    for sample in s.iterate('biomes/root:Environmental:Aquatic:Marine/samples', f):
        df.loc[sample.accession] = [
            sample.sample_name, sample.biome.id,
            get_metadata(sample.sample_metadata, temp_label),
            get_metadata(sample.sample_metadata, depth_label),
            sample.geo_loc_name, sample.latitude
        ]
df

### Export to CSV

Get study: https://www.ebi.ac.uk/metagenomics/api/latest/studies/ERP005831

In [None]:
import csv

# newline="" lets the csv module control line endings itself
with open("output.csv", "w", newline="") as csvfile:
    with Session(API_BASE) as s:
        fieldnames = ['study', 'sample', 'biome', 'lineage', 'longitude', 'latitude']
        writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
        writer.writeheader()
        study = s.get('studies', 'ERP005831').resource
        for sample in study.samples:
            biome = sample.biome
            row = {
                'study': study.accession,
                'sample': sample.accession,
                'biome': biome.biome_name,
                'lineage': biome.lineage,
                'longitude': sample.longitude,
                'latitude': sample.latitude
            }
            writer.writerow(row)

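from pandas import read_csv

# Load the exported file back, using the first column (study) as the index
df = read_csv('output.csv', index_col=0)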
df
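The export can also be tried without network access. The sketch below writes placeholder rows (not real API data) with the same field names to an in-memory buffer and reads them back. It shows two properties of the `csv` module worth keeping in mind: `None` values, such as missing coordinates, are written as empty fields, and everything comes back as strings.

```python
import csv
import io

fieldnames = ['study', 'sample', 'biome', 'lineage', 'longitude', 'latitude']

# Placeholder rows standing in for values fetched from the API
rows = [
    {'study': 'STUDY1', 'sample': 'SAMPLE1', 'biome': 'Marine',
     'lineage': 'root:Environmental:Aquatic:Marine',
     'longitude': 10.5, 'latitude': 70.1},
    {'study': 'STUDY1', 'sample': 'SAMPLE2', 'biome': 'Marine',
     'lineage': 'root:Environmental:Aquatic:Marine',
     'longitude': None, 'latitude': None},
]

# In-memory stand-in for the output file
buffer = io.StringIO(newline='')
writer = csv.DictWriter(buffer, fieldnames=fieldnames)
writer.writeheader()
writer.writerows(rows)

# Read the same content back, one dict per row
buffer.seek(0)
read_back = list(csv.DictReader(buffer))

print(read_back[0]['latitude'])        # numeric value returned as a string
print(repr(read_back[1]['latitude']))  # None was written as an empty field
```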