# METASPACE bulk reannotation

v1.1 - [Changelog](changelog_bulk_reannotation.md)

This notebook shows how to reannotate multiple METASPACE datasets against a new database.

### Setup

Before running this notebook, ensure that you have [set up your API key](https://metaspace2020.readthedocs.io/en/latest/content/examples/fetch-dataset-annotations.html#Connect-to-the-sm-server) for METASPACE!

In [1]:
from metaspace import SMInstance

In [2]:
sm = SMInstance()

If you want to reannotate all datasets within a project, you can download the project's metadata as a CSV file:

![](project_export.png)


...which you can then import into this notebook to get the dataset IDs:

In [8]:
import pandas as pd

metadata = pd.read_csv("/Users/alberto-mac/Documents/DA_ESPORTARE/LOCAL_EMBL_FILES/scratch/projects/gastrosome_processing_full/spacem/spacem_datasets_paths_filtered.csv")
metadata

| Unnamed: 0 | datasetId | datasetName | condition | well | slide | MALDI_size | PreMaldi_res | PostMaldi_res | group | submitter | ... | maldiMatrix | analyzer | resPower400 | polarity | uploadDateTime | FDR@10% | database | opticalImage | metaspace_download_dir_path | path |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2021-10-27_00h20m47s | 2021-28-09_Gastrosome_Slide6Drugs_Well8_150x15... | Drugs | 8 | 6 | 150 | 0.64 | 0.64 | ♡EMBL♡ | Mohammed Shahraz | ... | DHB | Orbitrap | 98995 | positive | 2021-10-26T18:20:47.944000 | 132 | SwissLipids - 2018-02-02 | https://metaspace2020.eu/fs/raw_optical_images... | /Users/alberto-mac/EMBL_ATeam/projects/gastros... | /scratch/bailoni/projects/gastrosome_processin... |
| 1 | 2021-10-27_00h05m07s | 2021-28-09_Gastrosome_Slide5Feeding_Well3_150x... | Feeding | 3 | 5 | 150 | 0.64 | 0.64 | ♡EMBL♡ | Mohammed Shahraz | ... | DHB | Orbitrap | 98995 | positive | 2021-10-26T18:05:07.978000 | 107 | SwissLipids - 2018-02-02 | https://metaspace2020.eu/fs/raw_optical_images... | /Users/alberto-mac/EMBL_ATeam/projects/gastros... | /scratch/bailoni/projects/gastrosome_processin... |
| 2 | 2021-10-27_23h59m41s | 2021-28-09_Gastrosome_Slide1control_well7_100x... | Control | 7 | 1 | 100 | 0.64 | 0.64 | ♡EMBL♡ | Mohammed Shahraz | ... | DHB | Orbitrap | 98995 | positive | 2021-10-27T23:59:41.744511 | 94 | CoreMetabolome - v3 | No optical image | /Users/alberto-mac/EMBL_ATeam/projects/gastros... | /scratch/bailoni/projects/gastrosome_processin... |
| 3 | 2021-10-27_23h59m25s | 2021-28-09_Gastrosome_Slide1control_well8_100x... | Control | 8 | 1 | 100 | 0.64 | 0.64 | ♡EMBL♡ | Mohammed Shahraz | ... | DHB | Orbitrap | 98995 | positive | 2021-10-27T23:59:25.751249 | 113 | SwissLipids - 2018-02-02 | No optical image | /Users/alberto-mac/EMBL_ATeam/projects/gastros... | /scratch/bailoni/projects/gastrosome_processin... |
| 4 | 2021-10-27_00h32m38s | 2021-28-09_Gastrosome_Slide1control_well4_150x... | Control | 4 | 1 | 150 | 0.64 | 0.64 | ♡EMBL♡ | Mohammed Shahraz | ... | DHB | Orbitrap | 98995 | positive | 2021-10-27T00:32:39.557240 | 148 | SwissLipids - 2018-02-02 | No optical image | /Users/alberto-mac/EMBL_ATeam/projects/gastros... | /scratch/bailoni/projects/gastrosome_processin... |
| 5 | 2021-10-27_00h20m58s | 2021-28-09_Gastrosome_Slide6Drugs_Well4_150x15... | Drugs | 4 | 6 | 150 | 0.64 | 0.64 | ♡EMBL♡ | Mohammed Shahraz | ... | DHB | Orbitrap | 98995 | positive | 2021-10-27T00:20:59.427535 | 74 | CoreMetabolome - v3 | No optical image | /Users/alberto-mac/EMBL_ATeam/projects/gastros... | /scratch/bailoni/projects/gastrosome_processin... |
| 6 | 2021-10-27_00h16m49s | 2021-28-09_Gastrosome_Slide6Drugs_Well3_150x15... | Drugs | 3 | 6 | 150 | 0.64 | 0.64 | ♡EMBL♡ | Mohammed Shahraz | ... | DHB | Orbitrap | 98995 | positive | 2021-10-27T00:16:49.937781 | 34 | CoreMetabolome - v3 | No optical image | /Users/alberto-mac/EMBL_ATeam/projects/gastros... | /scratch/bailoni/projects/gastrosome_processin... |
| 7 | 2021-10-27_00h09m40s | 2021-28-09_Gastrosome_Slide5Feeding_Well8_150x... | Feeding | 8 | 5 | 150 | 0.64 | 0.64 | ♡EMBL♡ | Mohammed Shahraz | ... | DHB | Orbitrap | 98995 | positive | 2021-10-27T00:09:40.949112 | 93 | SwissLipids - 2018-02-02 | No optical image | /Users/alberto-mac/EMBL_ATeam/projects/gastros... | /scratch/bailoni/projects/gastrosome_processin... |
| 8 | 2021-10-27_00h03m04s | 2021-28-09_Gastrosome_Slide5Feeding_Well7_150x... | Feeding | 7 | 5 | 150 | 0.64 | 0.64 | ♡EMBL♡ | Mohammed Shahraz | ... | DHB | Orbitrap | 98995 | positive | 2021-10-27T00:03:04.917123 | 69 | SwissLipids - 2018-02-02 | No optical image | /Users/alberto-mac/EMBL_ATeam/projects/gastros... | /scratch/bailoni/projects/gastrosome_processin... |
| 9 | 2021-10-26_23h23m07s | 2021-28-09_Gastrosome_Slide1control_well3_100x... | Control | 3 | 1 | 100 | 0.64 | 0.64 | ♡EMBL♡ | Mohammed Shahraz | ... | DHB | Orbitrap | 98995 | positive | 2021-10-26T23:23:07.395537 | 65 | SwissLipids - 2018-02-02 | No optical image | /Users/alberto-mac/EMBL_ATeam/projects/gastros... | /scratch/bailoni/projects/gastrosome_processin... |


In [9]:
# Only select the datasets that failed (rows 4, 5, 7 and 9):
metadata = metadata.loc[metadata.index.isin([7, 4, 9, 5]), :]
# To exclude rows instead, invert the mask, e.g.:
# metadata_subset = metadata.loc[~metadata.index.isin([6]), :]

datasets = metadata.datasetId.to_list()
datasets

['2021-10-27_00h32m38s',
 '2021-10-27_00h20m58s',
 '2021-10-27_00h09m40s',
 '2021-10-26_23h23m07s']
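Because the IDs come from a hand-edited CSV, it may be worth checking them before submitting anything. The sketch below splits a list into well-formed and suspicious entries and drops duplicates. The timestamp pattern is an assumption inferred from the IDs shown above, not an official METASPACE specification, and `split_valid_ids` is a hypothetical helper name.

```python
import re

# Pattern inferred from IDs such as "2021-10-27_00h32m38s" (an assumption,
# not taken from the METASPACE documentation).
DATASET_ID_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}_\d{2}h\d{2}m\d{2}s")


def split_valid_ids(candidates):
    """Return (valid, invalid) lists, preserving order and dropping duplicates."""
    valid, invalid, seen = [], [], set()
    for raw in candidates:
        ds_id = str(raw).strip()
        if ds_id in seen:
            continue  # skip repeated IDs so a dataset is not submitted twice
        seen.add(ds_id)
        if DATASET_ID_PATTERN.fullmatch(ds_id):
            valid.append(ds_id)
        else:
            invalid.append(ds_id)
    return valid, invalid


# The last two entries simulate a duplicate row and an empty CSV cell read as "nan".
valid_ids, invalid_ids = split_valid_ids(
    ["2021-10-27_00h32m38s", "2021-10-27_00h20m58s", "2021-10-27_00h32m38s", "nan"]
)
print(valid_ids)
print(invalid_ids)
```

In the notebook, this could be applied as `datasets, rejected = split_valid_ids(datasets)`, with `rejected` inspected by hand before continuing.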

### Selecting the database for reannotation

If you are unsure which ID corresponds to the database you want to reannotate against, you can determine it based on its name and version:

In [11]:
new_db_id = sm.database(name="Gastrosome_singlecell_intraions", version="v1").id
# new_db_id = sm.database(name="Gastrosome_singlecell_intraions_2", version="v1").id
new_db_id

579

<div class="alert alert-info"> 

**Note:** If this returns nothing, the database/version does not exist!

</div>
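To fail early instead of carrying an empty value into later cells, the lookup result can be passed through a small guard. This is a minimal sketch that assumes, as the note suggests, that a missing database yields `None`. If your client version raises an error instead, the guard is unnecessary. `require_database_id` is a hypothetical helper, and the `SimpleNamespace` object only stands in for the object returned by `sm.database(...)`.

```python
from types import SimpleNamespace


def require_database_id(database, name, version):
    """Return database.id, or raise a descriptive error if the lookup found nothing."""
    if database is None:
        raise LookupError(
            f"No database named {name!r} with version {version!r} was found"
        )
    return database.id


# Stand-in for a successful lookup; in the notebook you would pass
# sm.database(name=..., version=...) as the first argument instead.
found = SimpleNamespace(id=579)
print(require_database_id(found, "Gastrosome_singlecell_intraions", "v1"))

# Stand-in for a failed lookup (assumed to be None).
try:
    require_database_id(None, "Gastrosome_singlecell_intraions", "v2")
except LookupError as err:
    print(err)
```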

Once you do have your database's ID, enter it here:

In [6]:
# new_db_id = 579 # (38 is CoreMetabolome v3)

<div class="alert alert-info"> 

**Note:** The dataset(s) will be reannotated against the new database **in addition to the ones they are already annotated against.**

</div>
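In practice this means the complete list of database IDs is submitted, not only the new one. A minimal sketch of that list-building step, written as a pure function so it can be checked without contacting METASPACE; `plan_database_update` is a hypothetical helper name, and 38 and 579 are simply the example IDs mentioned in this notebook.

```python
def plan_database_update(current_ids, new_db_id):
    """Return the full list of database IDs to submit, or None if nothing would change."""
    if new_db_id in current_ids:
        return None
    # The full list is submitted, so existing IDs are kept alongside the new one.
    return list(current_ids) + [new_db_id]


print(plan_database_update([38], 579))       # [38, 579]
print(plan_database_update([38, 579], 579))  # None
```

Returning `None` for the no-op case makes it easy to skip datasets that already include the database.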

### Submitting datasets for reannotation

In [7]:
for ds_id in datasets:
    ds = sm.dataset(id=ds_id)
    print(ds.name)
    # IDs of the databases this dataset is already annotated against
    database_ids = [db["id"] for db in ds.database_details]
    if new_db_id not in database_ids:
        # Submit the existing databases plus the new one
        new_databases = database_ids + [new_db_id]
        print("Adding new db...")
        sm.update_dataset_dbs(ds.id, new_databases, ds.adducts)
    else:
        print("Dataset has already been annotated against this database!")

2021-28-09_Gastrosome_Slide5Feeding_Well8_150x150_a29ss25_DHBpos
Dataset has already been annotated against this database!
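When many datasets are submitted, a single failing dataset would stop the loop above. The sketch below is one possible variant that records a status per dataset and carries on. It uses only the calls already shown in this notebook (`sm.dataset`, `database_details`, `update_dataset_dbs`). `submit_reannotation` and `FakeSM` are hypothetical names, and `FakeSM` with its `ds-*` IDs exists only so the loop can be exercised offline.

```python
from types import SimpleNamespace


def submit_reannotation(sm, dataset_ids, new_db_id):
    """Submit each dataset and return {dataset_id: status} instead of stopping on errors."""
    results = {}
    for ds_id in dataset_ids:
        try:
            ds = sm.dataset(id=ds_id)
            database_ids = [db["id"] for db in ds.database_details]
            if new_db_id in database_ids:
                results[ds_id] = "already annotated"
                continue
            sm.update_dataset_dbs(ds.id, database_ids + [new_db_id], ds.adducts)
            results[ds_id] = "submitted"
        except Exception as exc:  # broad on purpose: record the failure and continue
            results[ds_id] = f"failed: {exc}"
    return results


class FakeSM:
    """Hypothetical stand-in for SMInstance, used only to run the loop offline."""

    def __init__(self, datasets):
        self._datasets = datasets
        self.updates = []

    def dataset(self, id):
        if id not in self._datasets:
            raise KeyError(f"unknown dataset {id}")
        return self._datasets[id]

    def update_dataset_dbs(self, dataset_id, database_ids, adducts):
        self.updates.append((dataset_id, database_ids, adducts))


fake = FakeSM({
    "ds-a": SimpleNamespace(id="ds-a", database_details=[{"id": 38}], adducts=["+H"]),
    "ds-b": SimpleNamespace(
        id="ds-b", database_details=[{"id": 38}, {"id": 579}], adducts=["+H"]
    ),
})
summary = submit_reannotation(fake, ["ds-a", "ds-b", "ds-missing"], new_db_id=579)
for ds_id, status in summary.items():
    print(ds_id, "->", status)
```

With a real connection, the call would be `submit_reannotation(sm, datasets, new_db_id)`, and any entries marked as failed can be retried afterwards.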


Once METASPACE has finished reannotating your datasets, open SpaceM again, load the reannotated dataset, and move to the Dataset Reprocessing step, where you can now select the new database.