APIs (Application Programming Interfaces) allow programmatic access to data resources.
If you need to fetch and analyse metadata for many Samples,
you can write a script that retrieves it through an API instead of browsing the website.

The [HoloFood Data Portal API]({{< var portal.root_url >}}/api)
gives programmatic access to the HoloFood Samples and their metadata,
as well as URLs for where datasets are stored in public archives.

## Canonical URLs
Throughout the API, `canonical_url` fields are returned that point to the canonical database entry,
i.e. the authoritative source, for each data object.

These are:

- The [European Nucleotide Archive Browser](https://www.ebi.ac.uk/ena/browser/home) for Samples and Projects.
- [MGnify](https://www.ebi.ac.uk/metagenomics) for metagenomic-derived analyses and MAGs (metagenome-assembled genomes).
- [MetaboLights](https://www.ebi.ac.uk/metabolights/) for metabolomic analyses.
- The websites of various partner institutions and registries, where an [IRI](https://en.wikipedia.org/wiki/Internationalized_Resource_Identifier) has been supplied with a metadata entry.
- The HoloFood Data Portal itself for "Annotations", which are documents hosted only by this website.
- The HoloFood Data Portal itself for viral annotations.
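Because `canonical_url` can point at several different archives, it can be handy to label each URL by where it points. The sketch below is a hypothetical helper, `archive_for`, that matches URL prefixes taken from the links in the list above. It assumes the canonical URLs returned by the API begin with those prefixes, which the API does not guarantee.

```python
# Hypothetical helper: prefixes copied from the archive links listed above.
# That canonical_url values always start with these prefixes is an assumption.
ARCHIVE_PREFIXES = {
    'ENA': 'https://www.ebi.ac.uk/ena/browser/',
    'MGnify': 'https://www.ebi.ac.uk/metagenomics',
    'MetaboLights': 'https://www.ebi.ac.uk/metabolights/',
}


def archive_for(canonical_url):
    """Return a label for the archive that a canonical_url points to."""
    if not canonical_url:
        return None  # canonical_url can be null, e.g. for markers without an IRI
    for archive, prefix in ARCHIVE_PREFIXES.items():
        if canonical_url.startswith(prefix):
            return archive
    return 'other'  # partner institutions, registries, or the portal itself


print(archive_for('https://www.ebi.ac.uk/ena/browser/view/PRJEB39110'))  # ENA
```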

## API Endpoints and Playground

The ["Browsable API Playground"]({{< var portal.root_url >}}/api) is the best place to discover the API.
Find this under the `API` navigation item on the data portal.

The browsable API lets you see the endpoints and their response schemas.
Under an endpoint, press `Try it out` to see the output for a specific query.

In brief, the top-level endpoints are:

### `/api/samples`
List samples, or fetch details about a specific sample (like its metadata).

### `/api/annotations`
List summary analyses published on the data portal.

### `/api/genome-catalogues`
List MAG catalogues, fetch details about a catalogue, or list the MAGs within a catalogue.

### `/api/viral-catalogues`
List viral catalogues, fetch details about a catalogue, or list the fragments within a catalogue.
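When scripting against several endpoints, a small URL builder keeps the paths consistent. This is a minimal sketch with a hypothetical `api_url` helper. It assumes detail URLs follow the `/api/<endpoint>/<identifier>` pattern used by `/api/samples/SAMEA14099422`; check the Playground for the exact paths of the other endpoints. `https://example.org` stands in for the portal's root URL.

```python
from urllib.parse import urlencode


def api_url(root_url, endpoint, identifier=None, page=None):
    """Build an API URL (hypothetical helper).

    Assumes list endpoints live at <root>/api/<endpoint> and detail endpoints
    at <root>/api/<endpoint>/<identifier>, as for /api/samples/SAMEA14099422.
    """
    url = f"{root_url.rstrip('/')}/api/{endpoint}"
    if identifier is not None:
        url += f"/{identifier}"
    if page is not None:
        url += '?' + urlencode({'page': page})  # paginated list endpoints
    return url


print(api_url('https://example.org', 'samples', 'SAMEA14099422'))
# https://example.org/api/samples/SAMEA14099422
```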

## Using the API
### From the command line

Use a command line tool like [cURL](https://curl.se/) to query the API.
Responses are in [JSON](https://www.json.org/json-en.html) format.

For example, to list samples (results are paginated, so this returns the first page):
```bash
curl {{< var portal.root_url >}}/api/samples
```

To find the `Acetic acid` metadata associated with sample `SAMEA14099422`, fetch the sample's details and filter them with [`jq`](https://stedolan.github.io/jq/), a command-line JSON processor:
```bash
# -s silences curl's progress meter when piping into jq
curl -s {{< var portal.root_url >}}/api/samples/SAMEA14099422 | jq '.structured_metadata | .[] | select(.marker.name == "Acetic acid")'
```
which returns:
```json
{
  "marker": {
    "name": "Acetic acid",
    "type": "FATTY ACIDS",
    "canonical_url": null
  },
  "measurement": "62.30155",
  "units": "umol/g digesta"
}
```
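The same filter can be written in Python once the sample's JSON is loaded. A minimal sketch with a hypothetical `markers_named` function; `sample` here is a trimmed copy of the response shown above so the snippet runs offline. Note that `measurement` arrives as a string, so convert it (e.g. with `float`) before doing arithmetic.

```python
def markers_named(sample, name):
    """Return the structured_metadata entries whose marker name equals `name`
    (the Python equivalent of the jq filter above)."""
    return [
        entry
        for entry in sample.get('structured_metadata', [])
        if entry['marker']['name'] == name
    ]


# Live version (needs network access and requests installed):
#   sample = requests.get('{{< var portal.root_url >}}/api/samples/SAMEA14099422').json()
sample = {
    'structured_metadata': [
        {
            'marker': {'name': 'Acetic acid', 'type': 'FATTY ACIDS', 'canonical_url': None},
            'measurement': '62.30155',
            'units': 'umol/g digesta',
        },
    ],
}
print(markers_named(sample, 'Acetic acid')[0]['measurement'])  # 62.30155
```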

### From Python
More realistically, you can use Python to fetch a list of samples into a [Pandas DataFrame](https://pandas.pydata.org/).

:::{.callout-tip}
## Packages required
```bash
pip install requests pandas
```
:::

```python
import requests
import pandas as pd

# Fetch the first page of samples
samples = requests.get('{{< var portal.root_url >}}/api/samples')
samples.raise_for_status()  # stop early on an HTTP error
```

```python
samples_df = pd.json_normalize(samples.json()['items'])
samples_df.head(3)
```

| | accession | title | system | has_metagenomics | canonical_url | metagenomics_url | project.accession | project.title | project.canonical_url |
|---|---|---|---|---|---|---|---|---|---|
| 0 | SAMEA14099422 | CB03.16F1a | chicken | False | https://www.ebi.ac.uk/ena/browser/view/SAMEA14... | | PRJEB39110 | HoloFood Chicken - MAG Catalogue from Caecum c... | https://www.ebi.ac.uk/ena/browser/view/PRJEB39110 |
| 1 | SAMEA7025234 | CA01.10F1a | chicken | False | https://www.ebi.ac.uk/ena/browser/view/SAMEA70... | | PRJEB39110 | HoloFood Chicken - MAG Catalogue from Caecum c... | https://www.ebi.ac.uk/ena/browser/view/PRJEB39110 |
| 2 | SAMEA7025235 | CA02.14F1a | chicken | False | https://www.ebi.ac.uk/ena/browser/view/SAMEA70... | | PRJEB39110 | HoloFood Chicken - MAG Catalogue from Caecum c... | https://www.ebi.ac.uk/ena/browser/view/PRJEB39110 |
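`pd.json_normalize` flattens nested JSON objects into dotted column names, which is where columns such as `project.accession` in the table above come from. A minimal illustration using two items modelled on rows from this page, trimmed to a few fields (not a full API response):

```python
import pandas as pd

# Two items shaped like entries in the /api/samples 'items' list
items = [
    {'accession': 'SAMEA14099422', 'system': 'chicken',
     'project': {'accession': 'PRJEB39110'}},
    {'accession': 'SAMEA7678309', 'system': 'salmon',
     'project': {'accession': 'PRJEB41657'}},
]

# The nested 'project' object becomes a 'project.accession' column
df = pd.json_normalize(items)
print(df['project.accession'].tolist())  # ['PRJEB39110', 'PRJEB41657']
```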


:::{.callout-important}
## Paginated data

However, this is not all of the data: the API returns results one page at a time, and we have only fetched the first page.

The `?page=` URL query parameter retrieves subsequent pages.
:::

```python
len(samples_df)
```
```
100
```

The API response does tell us how many items there are *in total*:

```python
samples.json()['count']
```
```
429
```
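From the total `count` and the page size, the number of requests needed is a ceiling division. The page size of 100 used below is what the first page returned above, an observed value rather than a documented one.

```python
import math


def pages_needed(count, page_size):
    """Number of pages required to retrieve `count` items at `page_size` per page."""
    return math.ceil(count / page_size)


# 429 samples in total, 100 per page (as observed above)
print(pages_needed(429, 100))  # 5
```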

```python
samples_endpoint_base = '{{< var portal.root_url >}}/api/samples'
```

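Request each page in turn until every item has been fetched, then combine the pages into one DataFrame:

```python
page = 1
pages = []  # one DataFrame per page of results

while True:
    print(f'Fetching {page=}')

    # Equivalent to requesting f'{samples_endpoint_base}?page={page}'
    response = requests.get(samples_endpoint_base, params={'page': page})
    response.raise_for_status()
    samples_page = response.json()
    pages.append(pd.json_normalize(samples_page['items']))

    # Stop once every item has been fetched (or, defensively, on an empty page)
    fetched = sum(len(p) for p in pages)
    if fetched >= samples_page['count'] or not samples_page['items']:
        break
    page += 1

samples_df = pd.concat(pages, ignore_index=True)
```
```
Fetching page=1
Fetching page=2
Fetching page=3
Fetching page=4
Fetching page=5
```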

```python
len(samples_df)
```
```
429
```
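The loop above can also be generalised into a reusable generator, which is convenient if you page through other list endpoints too. This is a sketch with a hypothetical `iter_items` function that takes any `fetch_page(page)` callable returning the decoded JSON for one page (a dict with `items` and `count`, as `/api/samples` returns). A fake in-memory fetcher is included so the control flow can be checked without network access.

```python
def iter_items(fetch_page):
    """Yield every item from a paginated endpoint (hypothetical helper)."""
    page, seen = 1, 0
    while True:
        data = fetch_page(page)
        items = data['items']
        yield from items
        seen += len(items)
        # Stop when everything has been seen, or defensively on an empty page
        if not items or seen >= data['count']:
            return
        page += 1


# Usage with the real API (not run here):
#   fetch = lambda p: requests.get(samples_endpoint_base, params={'page': p}).json()
#   samples_df = pd.json_normalize(list(iter_items(fetch)))

def fake_fetch(page):
    """Stand-in for the API: serves 5 items, 2 per page."""
    all_items = [{'accession': f'ITEM{i}'} for i in range(5)]
    start = (page - 1) * 2
    return {'count': len(all_items), 'items': all_items[start:start + 2]}


print(len(list(iter_items(fake_fetch))))  # 5
```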

#### Find all *salmon* samples

```python
salmon = samples_df[samples_df.system == 'salmon']
print(len(salmon))
salmon.head(3)
```
```
359
```

| | accession | title | system | has_metagenomics | canonical_url | metagenomics_url | project.accession | project.title | project.canonical_url |
|---|---|---|---|---|---|---|---|---|---|
| 68 | SAMEA7678309 | SA01.05C1a | salmon | False | https://www.ebi.ac.uk/ena/browser/view/SAMEA76... | | PRJEB41657 | HoloFood Salmon - Metagenomic DNA | https://www.ebi.ac.uk/ena/browser/view/PRJEB41657 |
| 69 | SAMEA7678310 | SB10.13C1a | salmon | False | https://www.ebi.ac.uk/ena/browser/view/SAMEA76... | | PRJEB41657 | HoloFood Salmon - Metagenomic DNA | https://www.ebi.ac.uk/ena/browser/view/PRJEB41657 |
| 70 | SAMEA7687880 | SA01.01C1a | salmon | False | https://www.ebi.ac.uk/ena/browser/view/SAMEA76... | | PRJEB41657 | HoloFood Salmon - Metagenomic DNA | https://www.ebi.ac.uk/ena/browser/view/PRJEB41657 |
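The same boolean indexing extends to counting and to combining filters. The sketch below uses `demo_df`, a small hypothetical stand-in built from rows shown in the tables above so it runs on its own; on the full `samples_df` the same expressions apply unchanged. All the rows shown have `has_metagenomics` set to `False`, so the combined filter is empty here.

```python
import pandas as pd

# Hypothetical stand-in for samples_df, using rows copied from the tables above
demo_df = pd.DataFrame({
    'accession': ['SAMEA14099422', 'SAMEA7025234', 'SAMEA7678309', 'SAMEA7678310'],
    'system': ['chicken', 'chicken', 'salmon', 'salmon'],
    'has_metagenomics': [False, False, False, False],
})

# Number of samples per host system
print(demo_df['system'].value_counts().to_dict())

# Combine conditions with & (each comparison wrapped in parentheses)
salmon_with_mg = demo_df[(demo_df.system == 'salmon') & demo_df.has_metagenomics]
print(len(salmon_with_mg))  # 0 in this stand-in
```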
