# Demo: Retrieving MGnify tomato samples

<!-- [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ebi-metagenomics/mgnipy/blob/main/docs/tutorial/demo-mgnify-tomatoes.ipynb) -->

In this tutorial we demonstrate how MGniPy can be used to retrieve tomato rhizosphere metagenomics data available on MGnify. For now, only sample metadata is retrieved.

## Searching for studies

We can use a `Mgnifier` to search a given resource:
- `biomes` to look for studies
- `studies` to look for samples
- `samples` to look for runs/assemblies
- `runs` to look for analyses
- `analyses` to look for results? 


In [1]:
from mgnipy import Mgnifier

You can pass query parameters either as a dict to `params` or as keyword arguments. For now, refer to the [MGnify API docs](https://www.ebi.ac.uk/metagenomics/api/docs/) for the accepted keyword arguments, or inspect the attribute `Mgnifier.supported_kwargs`.

In [2]:
# Initialise the Mgnifier (no request is made at this point)
glass = Mgnifier(
    resource='biomes',
    lineage="root:Host-associated:Plants:Rhizosphere",
    search="tomato",
    page_size=3
)

print(glass)

Mgnifier instance for MGnify biomes metadata
----------------------------------------
Base URL: https://www.ebi.ac.uk/metagenomics/api/
API Version: v1
Parameters: {'lineage': 'root:Host-associated:Plants:Rhizosphere', 'search': 'tomato', 'page_size': 3}
Checkpoint Directory: None
Request URL: https://www.ebi.ac.uk/metagenomics/api/v1/biomes/?lineage=root%3AHost-associated%3APlants%3ARhizosphere&search=tomato&page_size=3



The `Mgnifier` has only been initialised; no request has been made yet.
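
The Request URL printed in the summary can be reproduced with the standard library, which may help when checking what a given set of parameters will ask the API for. The sketch below is not MGniPy's implementation; `build_request_url` is a hypothetical helper, and the base URL and API version are copied from the printed output.

```python
from urllib.parse import urlencode

# Values copied from the printed Mgnifier summary.
BASE_URL = "https://www.ebi.ac.uk/metagenomics/api/"
API_VERSION = "v1"


def build_request_url(resource: str, params: dict) -> str:
    """Hypothetical helper: join base URL, version, resource and query string."""
    query = urlencode(params)  # percent-encodes ':' as %3A
    return f"{BASE_URL}{API_VERSION}/{resource}/?{query}"


params = {
    "lineage": "root:Host-associated:Plants:Rhizosphere",
    "search": "tomato",
    "page_size": 3,
}
url = build_request_url("biomes", params)
print(url)
```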

Next, we must plan or preview before carrying out the full request for all result pages.

In [3]:
glass.plan()

Planning the API call with params:
{'lineage': 'root:Host-associated:Plants:Rhizosphere', 'search': 'tomato', 'page_size': 3}
Acquiring meta for 3 biomes per page...
Request URL: https://www.ebi.ac.uk/metagenomics/api/v1/biomes/?lineage=root%3AHost-associated%3APlants%3ARhizosphere&search=tomato&page_size=3
Response status code: 200
Total pages to retrieve: 2
Total records to retrieve: 6
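
The page count reported by the plan is consistent with dividing the record count by `page_size` and rounding up. A minimal sketch of that arithmetic, assuming this is how the total is derived (`pages_needed` is a hypothetical helper, not part of MGniPy):

```python
import math


def pages_needed(total_records: int, page_size: int) -> int:
    """Number of pages required to hold `total_records` at `page_size` per page."""
    return math.ceil(total_records / page_size)


# 6 records at 3 per page, as in the plan output
pages = pages_needed(6, 3)
print(pages)
```

A record count that is not a multiple of the page size still needs one extra, partially filled page, which is why the result is rounded up rather than truncated.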


Previewing is essentially the same as planning, but it also returns the first page of results as a `pandas.DataFrame`.

In [4]:
glass.preview()

Previewing Page 1 of 2 pages (6 records)...


Unnamed: 0,type,id,links,samples-count,bioproject,accession,is-private,last-update,secondary-accession,centre-name,...,study-abstract,study-name,data-origination,analyses.links.related,publications.links.related,biomes.links.related,biomes.data,geocoordinates.links.related,samples.links.related,downloads.links.related
0,studies,MGYS00006231,{'self': 'https://www.ebi.ac.uk/metagenomics/a...,114,PRJEB55060,MGYS00006231,False,2025-07-05T06:19:56,ERP139927,EMG,...,The Third Party Annotation (TPA) assembly was ...,EMG produced TPA metagenomics assembly of PRJN...,SUBMITTED,https://www.ebi.ac.uk/metagenomics/api/v1/stud...,https://www.ebi.ac.uk/metagenomics/api/v1/stud...,https://www.ebi.ac.uk/metagenomics/api/v1/stud...,"[{'type': 'biomes', 'id': 'root:Host-associate...",https://www.ebi.ac.uk/metagenomics/api/v1/stud...,https://www.ebi.ac.uk/metagenomics/api/v1/stud...,https://www.ebi.ac.uk/metagenomics/api/v1/stud...
1,studies,MGYS00006230,{'self': 'https://www.ebi.ac.uk/metagenomics/a...,1,PRJEB55057,MGYS00006230,False,2023-06-15T13:46:24,ERP139923,EMG,...,The Third Party Annotation (TPA) assembly was ...,EMG produced TPA metagenomics assembly of PRJN...,SUBMITTED,https://www.ebi.ac.uk/metagenomics/api/v1/stud...,https://www.ebi.ac.uk/metagenomics/api/v1/stud...,https://www.ebi.ac.uk/metagenomics/api/v1/stud...,"[{'type': 'biomes', 'id': 'root:Host-associate...",https://www.ebi.ac.uk/metagenomics/api/v1/stud...,https://www.ebi.ac.uk/metagenomics/api/v1/stud...,https://www.ebi.ac.uk/metagenomics/api/v1/stud...
2,studies,MGYS00006205,{'self': 'https://www.ebi.ac.uk/metagenomics/a...,12,PRJEB55224,MGYS00006205,False,2023-04-20T21:07:45,ERP140107,EMG,...,The Third Party Annotation (TPA) assembly was ...,EMG produced TPA metagenomics assembly of PRJN...,SUBMITTED,https://www.ebi.ac.uk/metagenomics/api/v1/stud...,https://www.ebi.ac.uk/metagenomics/api/v1/stud...,https://www.ebi.ac.uk/metagenomics/api/v1/stud...,"[{'type': 'biomes', 'id': 'root:Host-associate...",https://www.ebi.ac.uk/metagenomics/api/v1/stud...,https://www.ebi.ac.uk/metagenomics/api/v1/stud...,https://www.ebi.ac.uk/metagenomics/api/v1/stud...


Nice. Now let's collect the rest of the records (the one remaining page).

In [5]:
import asyncio

# collect() is awaited; top-level `await` works inside Jupyter/IPython
study_meta = await glass.collect()
display(study_meta)

https://www.ebi.ac.uk/metagenomics/api/v1/biomes/?lineage=root%3AHost-associated%3APlants%3ARhizosphere&search=tomato&page_size=3


100%|██████████| 1/1 [00:00<00:00,  7.43it/s]


Unnamed: 0,type,id,links,samples-count,bioproject,accession,is-private,last-update,secondary-accession,centre-name,...,study-abstract,study-name,data-origination,analyses.links.related,publications.links.related,biomes.links.related,biomes.data,geocoordinates.links.related,samples.links.related,downloads.links.related
0,studies,MGYS00006231,{'self': 'https://www.ebi.ac.uk/metagenomics/a...,114,PRJEB55060,MGYS00006231,False,2025-07-05T06:19:56,ERP139927,EMG,...,The Third Party Annotation (TPA) assembly was ...,EMG produced TPA metagenomics assembly of PRJN...,SUBMITTED,https://www.ebi.ac.uk/metagenomics/api/v1/stud...,https://www.ebi.ac.uk/metagenomics/api/v1/stud...,https://www.ebi.ac.uk/metagenomics/api/v1/stud...,"[{'type': 'biomes', 'id': 'root:Host-associate...",https://www.ebi.ac.uk/metagenomics/api/v1/stud...,https://www.ebi.ac.uk/metagenomics/api/v1/stud...,https://www.ebi.ac.uk/metagenomics/api/v1/stud...
1,studies,MGYS00006230,{'self': 'https://www.ebi.ac.uk/metagenomics/a...,1,PRJEB55057,MGYS00006230,False,2023-06-15T13:46:24,ERP139923,EMG,...,The Third Party Annotation (TPA) assembly was ...,EMG produced TPA metagenomics assembly of PRJN...,SUBMITTED,https://www.ebi.ac.uk/metagenomics/api/v1/stud...,https://www.ebi.ac.uk/metagenomics/api/v1/stud...,https://www.ebi.ac.uk/metagenomics/api/v1/stud...,"[{'type': 'biomes', 'id': 'root:Host-associate...",https://www.ebi.ac.uk/metagenomics/api/v1/stud...,https://www.ebi.ac.uk/metagenomics/api/v1/stud...,https://www.ebi.ac.uk/metagenomics/api/v1/stud...
2,studies,MGYS00006205,{'self': 'https://www.ebi.ac.uk/metagenomics/a...,12,PRJEB55224,MGYS00006205,False,2023-04-20T21:07:45,ERP140107,EMG,...,The Third Party Annotation (TPA) assembly was ...,EMG produced TPA metagenomics assembly of PRJN...,SUBMITTED,https://www.ebi.ac.uk/metagenomics/api/v1/stud...,https://www.ebi.ac.uk/metagenomics/api/v1/stud...,https://www.ebi.ac.uk/metagenomics/api/v1/stud...,"[{'type': 'biomes', 'id': 'root:Host-associate...",https://www.ebi.ac.uk/metagenomics/api/v1/stud...,https://www.ebi.ac.uk/metagenomics/api/v1/stud...,https://www.ebi.ac.uk/metagenomics/api/v1/stud...
0,studies,MGYS00006208,{'self': 'https://www.ebi.ac.uk/metagenomics/a...,1,PRJEB55232,MGYS00006208,False,2023-04-20T18:58:52,ERP140115,EMG,...,The Third Party Annotation (TPA) assembly was ...,EMG produced TPA metagenomics assembly of PRJN...,SUBMITTED,https://www.ebi.ac.uk/metagenomics/api/v1/stud...,https://www.ebi.ac.uk/metagenomics/api/v1/stud...,https://www.ebi.ac.uk/metagenomics/api/v1/stud...,"[{'type': 'biomes', 'id': 'root:Host-associate...",https://www.ebi.ac.uk/metagenomics/api/v1/stud...,https://www.ebi.ac.uk/metagenomics/api/v1/stud...,https://www.ebi.ac.uk/metagenomics/api/v1/stud...
1,studies,MGYS00006204,{'self': 'https://www.ebi.ac.uk/metagenomics/a...,12,PRJEB55219,MGYS00006204,False,2023-04-20T17:57:11,ERP140102,EMG,...,The Third Party Annotation (TPA) assembly was ...,EMG produced TPA metagenomics assembly of PRJN...,SUBMITTED,https://www.ebi.ac.uk/metagenomics/api/v1/stud...,https://www.ebi.ac.uk/metagenomics/api/v1/stud...,https://www.ebi.ac.uk/metagenomics/api/v1/stud...,"[{'type': 'biomes', 'id': 'root:Host-associate...",https://www.ebi.ac.uk/metagenomics/api/v1/stud...,https://www.ebi.ac.uk/metagenomics/api/v1/stud...,https://www.ebi.ac.uk/metagenomics/api/v1/stud...
2,studies,MGYS00001012,{'self': 'https://www.ebi.ac.uk/metagenomics/a...,1,PRJNA258487,MGYS00001012,False,2019-11-07T16:58:06,SRP045621,nanjing agricultural university,...,Healthy and bacterial wilted tomato plants rhi...,Rhizosphere soil Metagenome,HARVESTED,https://www.ebi.ac.uk/metagenomics/api/v1/stud...,https://www.ebi.ac.uk/metagenomics/api/v1/stud...,https://www.ebi.ac.uk/metagenomics/api/v1/stud...,"[{'type': 'biomes', 'id': 'root:Host-associate...",https://www.ebi.ac.uk/metagenomics/api/v1/stud...,https://www.ebi.ac.uk/metagenomics/api/v1/stud...,https://www.ebi.ac.uk/metagenomics/api/v1/stud...
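
The bare `await` in the cell above relies on the event loop that Jupyter/IPython already runs. In a plain Python script there is no such loop, so the call would need to be wrapped, for example with `asyncio.run`. The sketch below uses a hypothetical stand-in coroutine, `fake_collect`, in place of `glass.collect()` so that it runs without network access:

```python
import asyncio


async def fake_collect() -> list:
    """Hypothetical stand-in for an awaitable such as `glass.collect()`."""
    await asyncio.sleep(0)  # hand control to the event loop once
    return ["page 1", "page 2"]


async def main() -> list:
    # Inside a coroutine, `await` works just as it does in the notebook cell.
    return await fake_collect()


# asyncio.run creates an event loop, runs main() to completion, then closes the loop.
result = asyncio.run(main())
print(result)
```

In a script, replacing `fake_collect()` with `glass.collect()` should behave the same way, assuming `collect()` is an ordinary coroutine.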


## Getting the samples from the studies

We will use `Samplifier`, which accepts a `presearch`: it automatically takes the accessions returned by the presearch and collects the samples associated with them.

A presearch with a `Mgnifier` is not required, though :)

You can also pass known accessions/study_accessions as a kwarg to the `Samplifier`.

In [6]:
from mgnipy.metadata import Samplifier

samplify = Samplifier(
    presearch=glass,
    page_size=5
)
    
print(samplify)

Mgnifier instance for MGnify samples metadata
----------------------------------------
Base URL: https://www.ebi.ac.uk/metagenomics/api/
API Version: v1
Parameters: {'page_size': 5}
Checkpoint Directory: None
Request URL: https://www.ebi.ac.uk/metagenomics/api/v1/samples/?page_size=5

Repeating params: study_accession: ['MGYS00006231', 'MGYS00006230', 'MGYS00006205', 'MGYS00006208', 'MGYS00006204', 'MGYS00001012']
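
The `Repeating params` line suggests that the base parameters are reused once per study accession. As an illustration only (this is an assumption about the behaviour, not MGniPy's code, and `per_study_urls` is a hypothetical helper), the per-study request URLs could be built like this:

```python
from urllib.parse import urlencode

# Samples endpoint as shown in the printed Request URL.
SAMPLES_URL = "https://www.ebi.ac.uk/metagenomics/api/v1/samples/"

# Accessions copied from the "Repeating params" line.
study_accessions = [
    "MGYS00006231", "MGYS00006230", "MGYS00006205",
    "MGYS00006208", "MGYS00006204", "MGYS00001012",
]


def per_study_urls(accessions: list, page_size: int) -> dict:
    """Hypothetical helper: one request URL per study accession."""
    urls = {}
    for acc in accessions:
        # Shared parameters first, then the repeating parameter for this study.
        query = urlencode({"page_size": page_size, "study_accession": acc})
        urls[acc] = f"{SAMPLES_URL}?{query}"
    return urls


urls = per_study_urls(study_accessions, page_size=5)
for acc, url in urls.items():
    print(acc, url)
```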


Next, we plan or preview again before collecting the sample data. Here, `preview()` returns a dict of DataFrames, one for each value of the repeating parameter.

In [7]:
preview_dict = samplify.preview()

preview_dict['MGYS00006231']

Plan not yet checked. Running now...
Multiplanner with presearch conditions...
Planning for study_accession: MGYS00006231...
Planning the API call with params:
{'page_size': 5, 'study_accession': 'MGYS00006231'}
Acquiring meta for 5 samples per page...
Request URL: https://www.ebi.ac.uk/metagenomics/api/v1/samples/?page_size=5&study_accession=MGYS00006231
Response status code: 200
Total pages to retrieve: 23
Total records to retrieve: 114

Planning for study_accession: MGYS00006230...
Planning the API call with params:
{'page_size': 5, 'study_accession': 'MGYS00006230'}
Acquiring meta for 5 samples per page...
Request URL: https://www.ebi.ac.uk/metagenomics/api/v1/samples/?page_size=5&study_accession=MGYS00006230
Response status code: 200
Total pages to retrieve: 1
Total records to retrieve: 1

Planning for study_accession: MGYS00006205...
Planning the API call with params:
{'page_size': 5, 'study_accession': 'MGYS00006205'}
Acquiring meta for 5 samples per page...
Request URL: https:/

Unnamed: 0,type,id,links,longitude,biosample,latitude,sample-metadata,accession,analysis-completed,collection-date,...,sample-alias,host-tax-id,species,last-update,biome.data.type,biome.data.id,biome.links.related,studies.links.related,studies.data,runs.links.related
0,samples,SRS11333406,{'self': 'https://www.ebi.ac.uk/metagenomics/a...,5.67,SAMN24114800,51.98,"[{'key': 'geographic location (longitude)', 'v...",SRS11333406,,,...,Moneymaker_4_metag,286530,,2025-07-05T06:19:56,biomes,root:Host-associated:Plants:Rhizosphere,https://www.ebi.ac.uk/metagenomics/api/v1/biom...,https://www.ebi.ac.uk/metagenomics/api/v1/samp...,"[{'type': 'studies', 'id': 'MGYS00006231', 'li...",https://www.ebi.ac.uk/metagenomics/api/v1/samp...
1,samples,SRS11333423,{'self': 'https://www.ebi.ac.uk/metagenomics/a...,5.67,SAMN24114696,51.98,"[{'key': 'geographic location (longitude)', 'v...",SRS11333423,,,...,P207metag,286530,,2025-07-05T05:02:30,biomes,root:Host-associated:Plants:Rhizosphere,https://www.ebi.ac.uk/metagenomics/api/v1/biom...,https://www.ebi.ac.uk/metagenomics/api/v1/samp...,"[{'type': 'studies', 'id': 'MGYS00006231', 'li...",https://www.ebi.ac.uk/metagenomics/api/v1/samp...
2,samples,SRS11333439,{'self': 'https://www.ebi.ac.uk/metagenomics/a...,5.67,SAMN24114757,51.98,"[{'key': 'geographic location (longitude)', 'v...",SRS11333439,,,...,P276-2metag,286530,,2025-07-05T03:10:58,biomes,root:Host-associated:Plants:Rhizosphere,https://www.ebi.ac.uk/metagenomics/api/v1/biom...,https://www.ebi.ac.uk/metagenomics/api/v1/samp...,"[{'type': 'studies', 'id': 'MGYS00006231', 'li...",https://www.ebi.ac.uk/metagenomics/api/v1/samp...
3,samples,SRS11333435,{'self': 'https://www.ebi.ac.uk/metagenomics/a...,5.67,SAMN24114700,51.98,"[{'key': 'geographic location (longitude)', 'v...",SRS11333435,,,...,P211metag,286530,,2025-07-05T01:35:44,biomes,root:Host-associated:Plants:Rhizosphere,https://www.ebi.ac.uk/metagenomics/api/v1/biom...,https://www.ebi.ac.uk/metagenomics/api/v1/samp...,"[{'type': 'studies', 'id': 'MGYS00006231', 'li...",https://www.ebi.ac.uk/metagenomics/api/v1/samp...
4,samples,SRS11333459,{'self': 'https://www.ebi.ac.uk/metagenomics/a...,5.67,SAMN24114775,51.98,"[{'key': 'geographic location (longitude)', 'v...",SRS11333459,,,...,P294metag,286530,,2025-07-05T00:32:51,biomes,root:Host-associated:Plants:Rhizosphere,https://www.ebi.ac.uk/metagenomics/api/v1/biom...,https://www.ebi.ac.uk/metagenomics/api/v1/samp...,"[{'type': 'studies', 'id': 'MGYS00006231', 'li...",https://www.ebi.ac.uk/metagenomics/api/v1/samp...


If the preview looks good and you want to collect everything, call `collect()` without specifying `study_accession`; otherwise, pass the specific study accessions from the list above that you want to collect.

In [10]:
tomato_samples = await samplify.collect(study_accession=['MGYS00006231'])
# check out the results
tomato_samples['MGYS00006231'].head(10)

  0%|          | 0/22 [00:00<?, ?it/s]

100%|██████████| 22/22 [00:10<00:00,  2.17it/s]
  acc: pd.concat([None if df.empty else df for df in dfs])


Unnamed: 0,type,id,links,longitude,biosample,latitude,sample-metadata,accession,analysis-completed,collection-date,...,sample-alias,host-tax-id,species,last-update,biome.data.type,biome.data.id,biome.links.related,studies.links.related,studies.data,runs.links.related
0,samples,SRS11333406,{'self': 'https://www.ebi.ac.uk/metagenomics/a...,5.67,SAMN24114800,51.98,"[{'key': 'geographic location (longitude)', 'v...",SRS11333406,,,...,Moneymaker_4_metag,286530.0,,2025-07-05T06:19:56,biomes,root:Host-associated:Plants:Rhizosphere,https://www.ebi.ac.uk/metagenomics/api/v1/biom...,https://www.ebi.ac.uk/metagenomics/api/v1/samp...,"[{'type': 'studies', 'id': 'MGYS00006231', 'li...",https://www.ebi.ac.uk/metagenomics/api/v1/samp...
1,samples,SRS11333423,{'self': 'https://www.ebi.ac.uk/metagenomics/a...,5.67,SAMN24114696,51.98,"[{'key': 'geographic location (longitude)', 'v...",SRS11333423,,,...,P207metag,286530.0,,2025-07-05T05:02:30,biomes,root:Host-associated:Plants:Rhizosphere,https://www.ebi.ac.uk/metagenomics/api/v1/biom...,https://www.ebi.ac.uk/metagenomics/api/v1/samp...,"[{'type': 'studies', 'id': 'MGYS00006231', 'li...",https://www.ebi.ac.uk/metagenomics/api/v1/samp...
2,samples,SRS11333439,{'self': 'https://www.ebi.ac.uk/metagenomics/a...,5.67,SAMN24114757,51.98,"[{'key': 'geographic location (longitude)', 'v...",SRS11333439,,,...,P276-2metag,286530.0,,2025-07-05T03:10:58,biomes,root:Host-associated:Plants:Rhizosphere,https://www.ebi.ac.uk/metagenomics/api/v1/biom...,https://www.ebi.ac.uk/metagenomics/api/v1/samp...,"[{'type': 'studies', 'id': 'MGYS00006231', 'li...",https://www.ebi.ac.uk/metagenomics/api/v1/samp...
3,samples,SRS11333435,{'self': 'https://www.ebi.ac.uk/metagenomics/a...,5.67,SAMN24114700,51.98,"[{'key': 'geographic location (longitude)', 'v...",SRS11333435,,,...,P211metag,286530.0,,2025-07-05T01:35:44,biomes,root:Host-associated:Plants:Rhizosphere,https://www.ebi.ac.uk/metagenomics/api/v1/biom...,https://www.ebi.ac.uk/metagenomics/api/v1/samp...,"[{'type': 'studies', 'id': 'MGYS00006231', 'li...",https://www.ebi.ac.uk/metagenomics/api/v1/samp...
4,samples,SRS11333459,{'self': 'https://www.ebi.ac.uk/metagenomics/a...,5.67,SAMN24114775,51.98,"[{'key': 'geographic location (longitude)', 'v...",SRS11333459,,,...,P294metag,286530.0,,2025-07-05T00:32:51,biomes,root:Host-associated:Plants:Rhizosphere,https://www.ebi.ac.uk/metagenomics/api/v1/biom...,https://www.ebi.ac.uk/metagenomics/api/v1/samp...,"[{'type': 'studies', 'id': 'MGYS00006231', 'li...",https://www.ebi.ac.uk/metagenomics/api/v1/samp...
0,samples,SRS11333409,{'self': 'https://www.ebi.ac.uk/metagenomics/a...,5.67,SAMN24114803,51.98,"[{'key': 'geographic location (longitude)', 'v...",SRS11333409,,,...,Pimpinellifolium_1_metag,286530.0,,2025-07-04T07:01:56,biomes,root:Host-associated:Plants:Rhizosphere,https://www.ebi.ac.uk/metagenomics/api/v1/biom...,https://www.ebi.ac.uk/metagenomics/api/v1/samp...,"[{'type': 'studies', 'id': 'MGYS00006231', 'li...",https://www.ebi.ac.uk/metagenomics/api/v1/samp...
1,samples,SRS11333413,{'self': 'https://www.ebi.ac.uk/metagenomics/a...,5.67,SAMN24114806,51.98,"[{'key': 'geographic location (longitude)', 'v...",SRS11333413,,,...,Pimpinellifolium_4_metag,286530.0,,2025-07-04T05:30:44,biomes,root:Host-associated:Plants:Rhizosphere,https://www.ebi.ac.uk/metagenomics/api/v1/biom...,https://www.ebi.ac.uk/metagenomics/api/v1/samp...,"[{'type': 'studies', 'id': 'MGYS00006231', 'li...",https://www.ebi.ac.uk/metagenomics/api/v1/samp...
2,samples,SRS11333447,{'self': 'https://www.ebi.ac.uk/metagenomics/a...,5.67,SAMN24114764,51.98,"[{'key': 'geographic location (longitude)', 'v...",SRS11333447,,,...,P283metag,286530.0,,2025-07-04T03:00:21,biomes,root:Host-associated:Plants:Rhizosphere,https://www.ebi.ac.uk/metagenomics/api/v1/biom...,https://www.ebi.ac.uk/metagenomics/api/v1/samp...,"[{'type': 'studies', 'id': 'MGYS00006231', 'li...",https://www.ebi.ac.uk/metagenomics/api/v1/samp...
3,samples,SRS11333504,{'self': 'https://www.ebi.ac.uk/metagenomics/a...,5.67,SAMN24114787,51.98,"[{'key': 'geographic location (longitude)', 'v...",SRS11333504,,,...,P307metag,286530.0,,2025-07-04T01:30:01,biomes,root:Host-associated:Plants:Rhizosphere,https://www.ebi.ac.uk/metagenomics/api/v1/biom...,https://www.ebi.ac.uk/metagenomics/api/v1/samp...,"[{'type': 'studies', 'id': 'MGYS00006231', 'li...",https://www.ebi.ac.uk/metagenomics/api/v1/samp...
4,samples,SRS11333479,{'self': 'https://www.ebi.ac.uk/metagenomics/a...,5.67,SAMN24114735,51.98,"[{'key': 'geographic location (longitude)', 'v...",SRS11333479,,,...,P253metag,286530.0,,2025-07-04T00:05:02,biomes,root:Host-associated:Plants:Rhizosphere,https://www.ebi.ac.uk/metagenomics/api/v1/biom...,https://www.ebi.ac.uk/metagenomics/api/v1/samp...,"[{'type': 'studies', 'id': 'MGYS00006231', 'li...",https://www.ebi.ac.uk/metagenomics/api/v1/samp...
