<div style="max-width:1200px"><img src="../_resources/mgnify_banner.png" width="100%"></div>

<img src="../_resources/mgnify_logo.png" width="200px">

# Search for MGnify Studies or Samples, using MGnifyR

The [MGnify API](https://www.ebi.ac.uk/metagenomics/api/v1) returns data and relationships as JSON. 
[MGnifyR](https://github.com/beadyallen/MGnifyR) is a package to help you read MGnify data into your R analyses.

**This example shows you how to search MGnify Studies or Samples.**

You can find all of the other "API endpoints" using the [Browsable API interface in your web browser](https://www.ebi.ac.uk/metagenomics/api/v1).
This interface also lets you inspect the kinds of Filters that can be created for each list.

This is an interactive code notebook (a Jupyter Notebook).
To run this code, click into each cell and press the ▶ button in the top toolbar, or press `shift+enter`.

---

In [None]:
library(IRdisplay)
display_markdown(file = '../_resources/mgnifyr_help.md')

Load packages:

In [None]:
library(vegan)
library(ggplot2)
library(phyloseq)
library(MGnifyR)

mg <- mgnify_client(usecache = TRUE, cache_dir = '/tmp/mgnify_cache')

## Contents
- [Example: Find Polar Samples](#Example:-find-Polar-samples)
- [Example: Find Wastewater Studies](#Example:-find-Wastewater-studies)
- [More filters to try](#More-filters-to-try:)
- [Example: Filtering Samples both API-side and client-side](#Example:-adding-additional-filters-to-the-data-frame)

### Documentation for `mgnify_query`

In [None]:
?mgnify_query

## Example: find Polar samples 
In these examples we set `maxhits=1` to retrieve only the first page of results. You can change the limit or set it to `-1` to retrieve all samples matching the query.

In [None]:
samps_np <- mgnify_query(mg, "samples", latitude_gte=88, maxhits=1)
samps_sp <- mgnify_query(mg, "samples", latitude_lte=-88, maxhits=1)
samps_polar <- rbind(samps_np, samps_sp)

In [None]:
head(samps_polar)
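`rbind()` only works when both data frames have the same columns. If two queries ever return different sets of metadata columns, a small helper can pad the missing ones with `NA` first. The sketch below uses only base R; `rbind_fill` is a hypothetical helper (not part of MGnifyR), and the mock data frames and their values are invented stand-ins for `samps_np` and `samps_sp`.

```R
# Hypothetical helper (not part of MGnifyR): row-bind two data frames whose
# columns only partly overlap, filling absent columns with NA.
rbind_fill <- function(a, b) {
  all_cols <- union(names(a), names(b))
  for (col in setdiff(all_cols, names(a))) a[[col]] <- rep(NA, nrow(a))
  for (col in setdiff(all_cols, names(b))) b[[col]] <- rep(NA, nrow(b))
  rbind(a[, all_cols, drop = FALSE], b[, all_cols, drop = FALSE])
}

# Mock stand-ins for samps_np and samps_sp (accessions and values are invented).
mock_np <- data.frame(accession = c("MOCK1", "MOCK2"),
                      latitude = c("89.1", "88.5"),
                      stringsAsFactors = FALSE)
mock_sp <- data.frame(accession = "MOCK3",
                      latitude = "-89.0",
                      depth = "10",
                      stringsAsFactors = FALSE)

# mock_np has no depth column, so its rows get NA there.
mock_polar <- rbind_fill(mock_np, mock_sp)
mock_polar
```

With real results you would call `rbind_fill(samps_np, samps_sp)` in place of the plain `rbind()`.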

## Example: find Wastewater studies

In [None]:
studies_ww <- mgnify_query(mg, "studies", biome_name="wastewater", maxhits=1)

In [None]:
head(studies_ww)

## More filters to try:

### Samples by location

```R
more_northerly_than <- mgnify_query(mg, "samples", latitude_gte=88, maxhits=1)

more_southerly_than <- mgnify_query(mg, "samples", latitude_lte=-88, maxhits=1)

more_easterly_than <- mgnify_query(mg, "samples", longitude_gte=170, maxhits=1)

more_westerly_than <- mgnify_query(mg, "samples", longitude_lte=170, maxhits=1)

at_location <- mgnify_query(mg, "samples", geo_loc_name="usa", maxhits=1)
```
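Location filters can also be combined client-side once the samples are in a data frame. As a minimal sketch, the block below selects samples within 10 degrees of the antimeridian, which needs an "east of 170 OR west of -170" condition. It assumes the returned data frame has a `longitude` column holding strings; the mock data frame and its values are invented for illustration.

```R
# Mock stand-in for a samples data frame; coordinates are assumed to arrive as strings.
mock_samples <- data.frame(
  accession = c("MOCK1", "MOCK2", "MOCK3", "MOCK4"),
  longitude = c("175.2", "-178.9", "12.0", NA),
  stringsAsFactors = FALSE
)

# Convert to numbers before comparing; a missing longitude stays NA.
lon <- as.numeric(mock_samples$longitude)

# East of 170 OR west of -170. which() drops rows where the comparison is NA.
near_antimeridian <- which(lon >= 170 | lon <= -170)
mock_near <- mock_samples[near_antimeridian, ]
mock_near
```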

### Samples by biome
```R
biome_within_wastewater <- mgnify_query(mg, "samples", biome_name="wastewater", maxhits=1)
```

### Samples by metadata
Samples carry a large number of metadata key:value pairs, because authors submit these alongside their samples to the ENA.

If you know how to specify the metadata key:value query for the samples you're interested in, you can use this form to find matching Samples:

```R
from_ex_smokers <- mgnify_query(mg, "samples", metadata_key="smoker", metadata_value="ex-smoker", maxhits=-1)
```
To find `metadata_key`s and values, it is best to browse the [interactive API Browser](https://www.ebi.ac.uk/metagenomics/api/v1/samples) and use the `Filters` button to construct queries interactively at first.

### Studies by centre name
```R
from_smithsonian <- mgnify_query(mg, "studies", centre_name="Smithsonian", maxhits=-1)
```


---


## Example: adding additional filters to the data frame

First, fetch some samples from the Lentic biome. We can specify the entire Biome lineage, too.

In [None]:
lentic_samples <- mgnify_query(mg, "samples", biome_name="root:Environmental:Aquatic:Lentic", usecache=TRUE)

Now, also filter by depth *within* the returned results, using normal R syntax.

In [None]:
depth_numeric <- as.numeric(lentic_samples$depth)  # We must convert data from MGnifyR (always strings) to numerical format.
depth_numeric[is.na(depth_numeric)] <- 0.0  # If depth data is missing, assume it is surface-level.
lentic_subset <- lentic_samples[depth_numeric >= 25 & depth_numeric <= 50, ]  # Filter to samples collected between 25 m and 50 m down.
lentic_subset
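If you would rather not assume that a missing depth means surface-level, one alternative is to leave missing values as `NA` and let the filter skip them. The sketch below runs on a mock data frame (invented accessions and depths standing in for `lentic_samples`) and uses only base R.

```R
# Mock stand-in for lentic_samples; depth values arrive as strings.
mock_lentic <- data.frame(
  accession = c("MOCK1", "MOCK2", "MOCK3", "MOCK4"),
  depth = c("30", NA, "75", "not provided"),
  stringsAsFactors = FALSE
)

# Non-numeric strings become NA; suppressWarnings() hides the coercion warning.
depth_numeric <- suppressWarnings(as.numeric(mock_lentic$depth))

# which() ignores NA comparisons, so samples without a usable depth are left out.
in_range <- which(depth_numeric >= 25 & depth_numeric <= 50)
mock_subset <- mock_lentic[in_range, ]
mock_subset
```

For the 25 m to 50 m range both approaches select the same rows. They differ only when the range includes 0, where zero-filled samples would be counted as surface samples.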