# Showcase: `isogroup` Python package

## Targeted processing to annotate isotopic clusters
The aim is to annotate a dataset from an untargeted labeling experiment using an in-house database.

In [None]:
from isocor.base import LabelledChemical
from isogroup.base.feature import Feature
from isogroup.base.sample import Sample
from isogroup.base.cluster import Cluster
from isogroup.base.database import Database
from isogroup.base.targeted_experiment import Experiment
import pandas as pd

## Instantiation of an *isotopic database*

**A database object** is built from a table that contains the following columns:
- "metabolite" : name of the metabolite in your database
- "rt" : retention time of the compound in your analytical method
- "formula" : chemical formula of the metabolite
- "charge" : charge of the detected ion
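For illustration, a database with this layout can be built directly with pandas. The metabolites, retention times and charges below are hypothetical placeholders (not values from the isogroup test data); the `;` separator matches the one used to read the database file in the cell that follows.

```python
import io

import pandas as pd

# Hypothetical in-house database with the four expected columns.
# rt and charge values are placeholders, not measured data.
db_example = pd.DataFrame(
    {
        "metabolite": ["alanine", "lactate"],
        "rt": [120.0, 150.0],
        "formula": ["C3H7NO2", "C3H6O3"],
        "charge": [-1, -1],
    }
)

# Round-trip through a ";"-separated text buffer, as a CSV file would be
buffer = io.StringIO()
db_example.to_csv(buffer, sep=";", index=False)
buffer.seek(0)
db_roundtrip = pd.read_csv(buffer, sep=";")
print(db_roundtrip)
```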

i) Give the path to your database file and run the cell.

In [None]:
db_data = pd.read_csv("data/database_test.csv", sep=";")

# Displays the first lines of the database for inspection
db_data.head()

ii) Define the tracer of your experiment (e.g. "13C", "15N"...) \
iii) Run the cell to create your database. It returns all isotopologues (and their masses) of the metabolites in your database.

In [None]:
database = Database(dataset=db_data, tracer="13C")

# Print the theoretical features
for feature in database.features:
    print(feature)
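What the database generates for a `13C` tracer can be pictured with a small calculation: a metabolite with n carbons has n + 1 carbon isotopologues (M+0 to M+n), each heavier than the previous one by the 13C/12C mass difference. The sketch below is a simplified stand-in, not isogroup's implementation; it assumes the formula is that of the detected ion, handles only C, H, N, O, P and S, and ignores isotopes of the other elements. The function names are hypothetical.

```python
import re

# Monoisotopic masses (u) of a few common elements
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503207,
    "N": 14.0030740048,
    "O": 15.99491461956,
    "P": 30.97376163,
    "S": 31.97207100,
}
DELTA_13C = 13.00335483507 - 12.0  # mass shift added by each 13C atom
ELECTRON = 0.00054857990946  # electron mass (u)


def parse_formula(formula: str) -> dict[str, int]:
    """Parse a simple formula such as 'C3H7NO2' into element counts."""
    counts: dict[str, int] = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return counts


def isotopologue_mz(formula: str, charge: int) -> list[float]:
    """m/z of the M+0 ... M+n 13C isotopologues (n = number of carbons).

    Assumption: `formula` is the ion formula and `charge` is non-zero.
    A negative ion carries extra electrons, a positive ion lacks them.
    """
    counts = parse_formula(formula)
    mono = sum(MONOISOTOPIC[el] * n for el, n in counts.items())
    n_carbons = counts.get("C", 0)
    return [
        (mono + i * DELTA_13C - charge * ELECTRON) / abs(charge)
        for i in range(n_carbons + 1)
    ]


# Deprotonated lactate, [C3H5O3]-: M+0 to M+3
for i, value in enumerate(isotopologue_mz("C3H5O3", -1)):
    print(f"M+{i}: {value:.4f}")
```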

## Instantiation of the *targeted experiment*

### Open the dataset
The **dataset** with the experimental features must contain the following columns:
- the feature identifier (id)
- the mass-to-charge ratio (mz)
- the retention time (rt)
- one intensity column per sample
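To make the expected layout concrete, here is a toy feature table built in memory with the same columns and the same `(mz, rt, id)` index used when loading the dataset in the next cell. The feature ids, m/z values and intensities are illustrative only; the sample names reuse those that appear later in this notebook.

```python
import pandas as pd

# Hypothetical feature table: one row per feature, one intensity column per sample
data_example = pd.DataFrame(
    {
        "id": ["F1", "F2", "F3"],
        "mz": [89.0244, 90.0278, 91.0311],
        "rt": [150.2, 150.4, 150.1],
        "C12_TP_1": [1.2e6, 3.1e4, 0.0],
        "C13_TP_1": [4.0e5, 2.2e5, 5.1e5],
    }
)

# Same MultiIndex as the one set on the real dataset
data_example = data_example.set_index(["mz", "rt", "id"])
print(data_example)
```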

i) Give the path to your dataset file (e.g. output file of XCMS, MZmine...) \
ii) Run the cell

In [None]:
data = pd.read_csv("data/dataset_test.txt", sep="\t")
data = data.set_index(["mz", "rt", "id"])

# Displays the first lines of your dataset for inspection
data.head()

### Create the experiment

It proceeds in the following steps:

1. Initialization of experimental features from the dataset. A **feature** is defined from mass spectrometry data as an (mz, rt, intensity) triplet and is specific to each sample in the dataset.
2. Annotation of experimental features using your database, within given tolerances on mz and rt.
3. Creation of clusters from annotated features. Each cluster carries supplementary information indicating whether it is complete (all isotopologues retrieved) or not.
4. Export of dataframes.
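Step 1 amounts to turning the wide table (one intensity column per sample) into one (mz, rt, intensity) record per feature and per sample. A pandas sketch of that reshaping, assuming the column layout described above; it only illustrates the idea and is not how `Experiment` builds its features internally. Values are illustrative.

```python
import pandas as pd

# Wide table in the dataset layout (illustrative values)
wide = pd.DataFrame(
    {
        "mz": [89.0244, 90.0278],
        "rt": [150.2, 150.4],
        "id": ["F1", "F2"],
        "C12_TP_1": [1.2e6, 3.1e4],
        "C13_TP_1": [4.0e5, 2.2e5],
    }
).set_index(["mz", "rt", "id"])

# One row per (feature, sample) pair: a sample-specific feature
long = wide.reset_index().melt(
    id_vars=["mz", "rt", "id"], var_name="sample", value_name="intensity"
)
print(long)
```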

i) Give your database, your dataset and the tracer of your experiment \
ii) Run the cell: it returns your experiment object.

In [None]:
experiment = Experiment(dataset=data, database=database, tracer="13C")

### Annotate the dataset 

i) Set the mz tolerance (in ppm) and the rt tolerance (in seconds) \
ii) Run the cell. It returns the experimental features with their potential annotations and the mz and rt errors relative to your database.

In [None]:
experiment.annotate_experiment(mz_tol=5, rt_tol=10)
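The tolerance check behind `mz_tol` and `rt_tol` can be sketched as follows: the m/z error is expressed in parts per million relative to the theoretical value, and the rt error as an absolute difference in seconds. This is a simplified illustration assuming a feature matches when both errors are within tolerance; the exact matching rules in isogroup (sign conventions, handling of several candidates) may differ. The function names are hypothetical.

```python
def ppm_error(mz_obs: float, mz_theo: float) -> float:
    """Signed m/z error in parts per million, relative to the theoretical m/z."""
    return (mz_obs - mz_theo) / mz_theo * 1e6


def within_tolerance(
    mz_obs: float,
    rt_obs: float,
    mz_theo: float,
    rt_theo: float,
    mz_tol: float = 5.0,
    rt_tol: float = 10.0,
) -> bool:
    """True if the feature lies within mz_tol (ppm) and rt_tol (s) of the theoretical values."""
    mz_ok = abs(ppm_error(mz_obs, mz_theo)) <= mz_tol
    rt_ok = abs(rt_obs - rt_theo) <= rt_tol
    return mz_ok and rt_ok


# Theoretical M+0 of deprotonated lactate vs. two observed features
print(within_tolerance(89.0245, 151.0, 89.02442, 150.0))  # True: about 0.9 ppm, 1 s
print(within_tolerance(89.0260, 151.0, 89.02442, 150.0))  # False: about 18 ppm
```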

Optional: display the samples of your experiment.

In [None]:
# Display the samples of your experiment
for sample in experiment.samples:
    print(sample)
print()

# Display the annotated features for each sample
#for sample, feature in experiment.samples.items():
#    print(sample, feature, end="\n\n")


### Get annotated clusters


A **cluster** is a group of features.\
The annotated clusters are obtained by grouping features according to their annotation (i.e. the metabolite they are assigned to).

In [None]:
experiment.clusterize()
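Grouping by annotation can be illustrated with plain Python: features annotated with the same metabolite end up in the same cluster, each keeping its isotopologue index. The annotated features below are hypothetical, and the sketch assumes each feature carries a single annotation; a real dataset may contain features with several candidate annotations, which this sketch does not handle.

```python
from collections import defaultdict

# Hypothetical annotated features: (feature id, metabolite, isotopologue index)
annotated = [
    ("F1", "lactate", 0),
    ("F2", "lactate", 1),
    ("F3", "lactate", 3),
    ("F7", "alanine", 0),
]

# Features sharing the same metabolite annotation form one cluster
clusters: dict[str, list[tuple[str, int]]] = defaultdict(list)
for feature_id, metabolite, isotopologue in annotated:
    clusters[metabolite].append((feature_id, isotopologue))

for metabolite, members in clusters.items():
    print(metabolite, members)
```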

To display the annotated clusters of a specific sample: \
change the sample name and run the cell below.

In [None]:
# Display the annotated clusters for a specific sample
print(experiment.clusters["C12_TP_1"], end="\n\n")

# Display the annotated clusters for all samples
#print(experiment.clusters)

### Create dataframes and export tables

#### Feature export

Export a dataframe containing all the features of your dataset, with their potential annotations (metabolite and isotopologue) and the calculated mz and rt errors.

In [None]:
df = experiment.export_features()

# Print the head of your dataframe for inspection
df.head()

To export a TSV file, provide a path and a filename.\
Run the cell.

In [None]:
# Export the dataframe
experiment.export_features("data/df_feature.tsv")

To export a TSV file for a specific sample, provide a path, a filename and a sample name.\
Run the cell.

In [None]:
# Export the dataframe for a specific sample
experiment.export_features("data/df_feature_sample.tsv", sample_name="C13_TP_1")

#### Cluster export

Export a dataframe containing all the annotated clusters built from annotated features.\
The dataframe contains supplementary information on each cluster, such as its status:
- ok if the cluster is complete
- incomplete if some isotopologues are missing
- .....
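The ok/incomplete distinction can be sketched as a set comparison, assuming a `13C` tracer where a metabolite with n carbons is expected to yield the isotopologues M+0 to M+n. The function name and return shape are hypothetical, and this sketch covers only these two statuses.

```python
def cluster_status(n_carbons: int, found: set[int]) -> tuple[str, list[int]]:
    """Return ("ok" or "incomplete", missing isotopologue indices) for a 13C cluster."""
    expected = set(range(n_carbons + 1))  # M+0 ... M+n
    missing = sorted(expected - found)
    return ("ok" if not missing else "incomplete", missing)


print(cluster_status(3, {0, 1, 3}))     # ('incomplete', [2])
print(cluster_status(3, {0, 1, 2, 3}))  # ('ok', [])
```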

In [None]:
df_cluster = experiment.export_clusters()

# Print the head of your dataframe for inspection
df_cluster.head()

To export a TSV file, provide a path and a filename.\
Run the cell.

In [None]:
# Export the clusters dataframe
experiment.export_clusters(filename="data/df_cluster.tsv")

To export a TSV file for a specific sample, provide a path, a filename and a sample name.\
Run the cell.

In [None]:
# Export the clusters dataframe for a specific sample
experiment.export_clusters("data/cluster_summary_sample.tsv", sample_name="C13_TP_1")

#### Cluster summary

To export a summary of the characteristics of each cluster (e.g. id, name, features, isotopologues, status...): \
give a path and a filename\
and run the cell.

In [None]:
experiment.clusters_summary(filename="data/test_cluster_summary.tsv")
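A comparable per-cluster summary can be assembled from annotated features with a pandas `groupby`, shown here only to illustrate the kind of table involved. The input features, column names and carbon counts are hypothetical, and the actual columns written by `clusters_summary()` may differ.

```python
import pandas as pd

# Hypothetical annotated features (one row per feature)
features = pd.DataFrame(
    {
        "feature_id": ["F1", "F2", "F3", "F7", "F8", "F9", "F10"],
        "metabolite": ["lactate"] * 3 + ["alanine"] * 4,
        "isotopologue": [0, 1, 3, 0, 1, 2, 3],
    }
)
n_carbons = {"lactate": 3, "alanine": 3}  # hypothetical, taken from the formulas

# One row per cluster: its features and the isotopologues retrieved
summary = features.groupby("metabolite").agg(
    features=("feature_id", list),
    isotopologues=("isotopologue", lambda s: sorted(s.tolist())),
)

# A cluster is complete if every isotopologue M+0 ... M+n was found
summary["status"] = [
    "ok" if set(isos) == set(range(n_carbons[name] + 1)) else "incomplete"
    for name, isos in summary["isotopologues"].items()
]

# to_csv without a path returns the TSV text instead of writing a file
print(summary.to_csv(sep="\t"))
```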