# BioDendro Quick Start Pipeline

The BioDendro pipeline automates the process of binning and hierarchically clustering
MSMS spectra based on the presence of common ions.

The quick start example runs the whole pipeline, with default and/or user-defined parameters, in a single function call.

You can then inspect the clusters in-line or work offline from the exported results.

A step-by-step analysis is available in the `longer-workflow.ipynb` notebook in the BioDendro repository.
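
To give a feel for what "binning" ions and comparing spectra by "the presence of common ions" can mean, here is a minimal standard-library sketch. It is an illustration under stated assumptions, not BioDendro's implementation: the gap-based binning rule, the helper names `bin_mz` and `bin_index`, and the two spectra are all hypothetical.

```python
def bin_mz(values, threshold):
    """Group m/z values into bins (assumed rule, for illustration only):
    after sorting, a value joins the current bin if it lies within
    `threshold` of the previous value; otherwise it starts a new bin."""
    bins = []
    for mz in sorted(values):
        if bins and mz - bins[-1][-1] <= threshold:
            bins[-1].append(mz)
        else:
            bins.append([mz])
    return bins


# Hypothetical fragment ions (m/z) for two made-up analytes.
spectra = {
    "analyte_A": [100.001, 150.500, 200.100],
    "analyte_B": [100.003, 200.102, 250.750],
}

# Bin every ion from every spectrum together.
all_mz = [mz for ions in spectra.values() for mz in ions]
bins = bin_mz(all_mz, threshold=0.004)


def bin_index(mz):
    """Return the index of the bin whose range contains `mz`."""
    return next(i for i, b in enumerate(bins) if b[0] <= mz <= b[-1])


# Presence/absence: the set of bins in which each analyte has an ion.
presence = {name: {bin_index(mz) for mz in ions} for name, ions in spectra.items()}
shared = presence["analyte_A"] & presence["analyte_B"]

print(len(bins), sorted(shared))  # prints: 4 [0, 2]
```

The two analytes share two of the four bins, which is the kind of overlap a presence-based clustering works from.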

In [None]:
# Load modules

import os
import plotly
import BioDendro
import copy

#### To get a list of possible parameters and defaults you can run `help(BioDendro.pipeline)`.

In [None]:
help(BioDendro.pipeline)

The main `pipeline` function runs the full pipeline (i.e. reading files, clustering, and plotting).
Minimum requirements are an MGF file and a component list.
Note that by default, the results are saved to a folder in your current working directory named `results_<datetime>`, where `<datetime>` is the current date followed by the time of day in `hhmmss` format.
This is to avoid overwriting data in multiple runs.
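
As a rough illustration of why a timestamped folder name avoids collisions between runs, the sketch below builds such a name with the standard library. The helper `timestamped_dir` and the exact `%Y%m%d_%H%M%S` layout are assumptions made for this example; the folder name BioDendro actually generates may be laid out differently.

```python
from datetime import datetime


def timestamped_dir(prefix="results", now=None):
    """Build a folder name such as results_20240131_090507.
    The date/time layout here is an assumption, not BioDendro's format."""
    now = now or datetime.now()
    return f"{prefix}_{now.strftime('%Y%m%d_%H%M%S')}"


# A fixed time is used so the output is reproducible.
print(timestamped_dir(now=datetime(2024, 1, 31, 9, 5, 7)))  # results_20240131_090507
```

Two runs started at least a second apart get different names, so neither overwrites the other.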

In [None]:
# Run the complete BioDendro pipeline

tree = BioDendro.pipeline(
    "Fireflies_MSMS.mgf",
    "Fireflies_component_list.txt",
    clustering_method="braycurtis",
    scaling=True,
    filtering=True,
    eps=0.001,
    bin_threshold=0.004,
    height=1000,
)
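
The call above selects `"braycurtis"` as the clustering method. For readers unfamiliar with it, the following is a plain-Python sketch of the standard Bray-Curtis dissimilarity between two vectors. The function and the example vectors are illustrative; how BioDendro computes distances internally is not shown here.

```python
def bray_curtis(u, v):
    """Bray-Curtis dissimilarity: sum(|u_i - v_i|) / sum(|u_i + v_i|).
    0 means identical profiles; 1 means nothing in common
    (for non-negative vectors)."""
    num = sum(abs(a - b) for a, b in zip(u, v))
    den = sum(abs(a + b) for a, b in zip(u, v))
    return num / den if den else 0.0


# Hypothetical presence/absence vectors over four ion bins.
a = [1, 1, 1, 0]
b = [1, 0, 1, 1]

print(round(bray_curtis(a, b), 3))  # 0.333
print(bray_curtis(a, a))            # 0.0
print(bray_curtis([1, 0], [0, 1]))  # 1.0
```

On binary presence/absence data this measure coincides with the Sørensen-Dice dissimilarity, so analytes that share most of their ion bins score close to 0.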

#### The pipeline also returns a `Tree` object, which stores most of the results.

You can scrutinise individual analytes or clusters in-line below, or open the exported results in the newly created `results_<datetime>` folder in your current working directory.

#### Find analytes and their clusters below

The results folder contains .csv and .png files of all clusters. This information can also be viewed in-line.

In [None]:
# return the number of the cluster to which the queried analyte belongs
tree.cluster_map["Ppyr_hemolymph_extract_533.238464355468_15.101331"]
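
The analyte identifier used as the key above appears to combine a sample name, a precursor m/z, and a retention time, separated by underscores. Assuming that layout holds (it is inferred from this one example, not from BioDendro's documentation), a hypothetical helper such as `split_analyte_id` can pull the parts apart:

```python
def split_analyte_id(analyte_id):
    """Split '<sample>_<mz>_<retention time>' into its parts.
    The layout is an assumption inferred from the example identifier."""
    # Split from the right because the sample name itself contains underscores.
    sample, mz, rt = analyte_id.rsplit("_", 2)
    return sample, float(mz), float(rt)


sample, mz, rt = split_analyte_id("Ppyr_hemolymph_extract_533.238464355468_15.101331")
print(sample)  # Ppyr_hemolymph_extract
print(mz, rt)
```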

In [None]:
# for visualising the ion table of your cluster of interest
tree.cluster_table(cluster=7)

In [None]:
# for plotting the histogram of your cluster of interest
%matplotlib inline
tree.cluster_hist(cluster=7)

#### If the dendrogram cutoff level was unsuitable for your data, you can apply a new level and scrutinise the new clusters below.

In [None]:
# Show the number of clusters before adjustment
print("BEFORE: Cutoff:", tree.cutoff, "n clusters:", len(set(tree.clusters)))

# Apply a new cutoff to a copy so that the original tree is left unchanged
new_tree = copy.deepcopy(tree)
new_tree.cut_tree(cutoff=0.8)

# Show number of clusters after adjustment
print("AFTER: Cutoff:", new_tree.cutoff, "n clusters:", len(set(new_tree.clusters)))
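
To see why changing the cutoff changes the number of clusters, consider this toy sketch. It swaps in single-linkage clustering, implemented with a small union-find, because that is easy to write with the standard library; it is not the linkage BioDendro necessarily uses, and the points and the helper `n_clusters_at` are hypothetical.

```python
from itertools import combinations


def n_clusters_at(points, dist, cutoff):
    """Count single-linkage clusters when the tree is cut at `cutoff`:
    two points end up together if a chain of pairwise distances,
    each <= cutoff, connects them."""
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the cluster representative (with path halving).
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(points)), 2):
        if dist(points[i], points[j]) <= cutoff:
            parent[find(i)] = find(j)

    return len({find(i) for i in range(len(points))})


# Hypothetical one-dimensional data with absolute difference as the distance.
points = [0.0, 0.1, 0.5, 0.55, 2.0]
distance = lambda a, b: abs(a - b)

counts = [n_clusters_at(points, distance, c) for c in (0.01, 0.2, 0.6)]
print(counts)  # [5, 3, 2]
```

Raising the cutoff merges more branches and yields fewer, larger clusters; lowering it does the opposite.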

In [None]:
# Write out the plots and tables for the new clusters.
# Use a new directory name so the original results are not overwritten;
# exist_ok=False raises FileExistsError if the directory already exists.
os.makedirs("results-cutoff-08", exist_ok=False)
new_tree.write_summaries(path="results-cutoff-08")
new_tree.plot(filename="results-cutoff-08/simple_dendrogram.html", width=900, height=1200);
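
Because `os.makedirs(..., exist_ok=False)` raises `FileExistsError` when the cell is re-run, one option is to fall back to a numbered directory name. The helper `make_fresh_dir` below is a hypothetical convenience, not part of BioDendro; the demonstration uses a temporary directory so it does not touch your results.

```python
import os
import tempfile


def make_fresh_dir(path):
    """Create `path`; if it already exists, try path-1, path-2, ...
    Returns the directory that was actually created."""
    candidate, n = path, 1
    while True:
        try:
            os.makedirs(candidate, exist_ok=False)
            return candidate
        except FileExistsError:
            candidate = f"{path}-{n}"
            n += 1


# Demonstrate inside a throwaway directory.
base = os.path.join(tempfile.mkdtemp(), "results-cutoff-08")
first = make_fresh_dir(base)   # creates .../results-cutoff-08
second = make_fresh_dir(base)  # name taken, creates .../results-cutoff-08-1
print(os.path.basename(first), os.path.basename(second))
```

The returned path could then be passed to `write_summaries(path=...)` in place of the hard-coded name.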

#### View the original (iplot1) and new (iplot2) cutoffs below

In [None]:
# Enable inline plotly output, then view the original dendrogram (iplot1)
plotly.offline.init_notebook_mode(connected=True)
iplot1 = tree.plot(width=800, height=900)
plotly.offline.iplot(iplot1)

In [None]:
# View the dendrogram with the new cutoff (iplot2)
iplot2 = new_tree.plot(width=800, height=900)
plotly.offline.iplot(iplot2)

#### Scrutinise the new clusters for your analytes below

In [None]:
# return the number of the cluster to which the queried analyte belongs
new_tree.cluster_map["Ppyr_hemolymph_extract_533.238464355468_15.101331"]
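
After re-cutting, it can be useful to check which analytes still share a cluster with your analyte of interest. The sketch below assumes `cluster_map` behaves like a mapping from analyte name to cluster number, supporting `[]` lookup and `.items()`. The helper `cluster_mates` and the two small maps are hypothetical stand-ins for `tree.cluster_map` and `new_tree.cluster_map`.

```python
def cluster_mates(cluster_map, analyte):
    """Return the other analytes assigned to the same cluster as `analyte`."""
    target = cluster_map[analyte]
    return sorted(
        name for name, cluster in cluster_map.items()
        if cluster == target and name != analyte
    )


# Hypothetical cluster assignments before and after changing the cutoff.
old_map = {"a1": 7, "a2": 7, "a3": 7, "a4": 3}
new_map = {"a1": 120, "a2": 120, "a3": 121, "a4": 40}

before = cluster_mates(old_map, "a1")
after = cluster_mates(new_map, "a1")
lost = sorted(set(before) - set(after))

print(before, after, lost)  # ['a2', 'a3'] ['a2'] ['a3']
```

Cluster numbers themselves are not comparable between the two cuts, which is why the comparison is made on membership rather than on the numbers.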

In [None]:
# for visualising the ion table of your cluster of interest
new_tree.cluster_table(cluster=120)

In [None]:
# for plotting the histogram of your cluster of interest
%matplotlib inline
new_tree.cluster_hist(cluster=179)