SIMILE

SIMILE (Significant Interrelation of MS/MS Ions via Laplacian Embedding) is a Python library for interrelating fragmentation spectra with significance estimation. It is robust to multiple differences in chemical structure. The method is described in the Nature Communications manuscript.

New in V2:

Precursor-based neutral loss difference counts can be used in addition to the original m/z difference counts
Maximum weight matching replaces the original monotonic alignment method and improves performance
Multiple matching is available in addition to the original pairwise matching, for fragment-centric analyses
Multiple comparison statistics
Much faster mass delta counting and significance testing
Matching ions report summarizing all scores and mass deltas with metadata

Installation

Use the conda package manager to create an environment from environment-base.yml, which covers the minimum requirements. Alternatively, use environment.yml to also run the example notebook.

conda env create -f environment-base.yml

Python dependencies

python3 (pinned to 3.7 currently due to non-SIMILE bugs)
numpy
scipy
pandas

Usage

import simile as sml

# Generate fragmentation similarity matrix
S, spec_ids = sml.similarity_matrix(mzs, pmzs=pmzs, tolerance=tolerance)

# Generate max weight matching for similarity matrix
M = sml.multiple_match(S, spec_ids)

# Generate pro/con comparison matrix such that 
# symmetric matches are 1 (pro) and
# asymmetric matches are -1 (con)
C = sml.sym_compare(M, spec_ids)

# Calculate significance of max weight matching between fragment ions
# for all combinations of spectra
spec_scores, pval, null_dist = sml.z_test(S, M, C, spec_ids, return_dist=True, log_size=5)

# Report back mass deltas and scores for simile comparison
df = sml.matching_ions_report(S, M, C, mzs, pmzs)
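
The matching step above can be illustrated on its own. The following is a minimal sketch of maximum weight bipartite matching between the fragment ions of two spectra, assuming a small made-up score block. The helper `max_weight_match` is hypothetical and is not part of the SIMILE API; it uses SciPy's `linear_sum_assignment` and may differ from what `sml.multiple_match` does internally.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def max_weight_match(scores):
    """Hypothetical helper: one-to-one matching that maximizes the total score.

    scores[i, j] is an assumed similarity between ion i of spectrum A
    and ion j of spectrum B.
    """
    scores = np.asarray(scores, dtype=float)
    # Solve the assignment problem, maximizing instead of minimizing.
    rows, cols = linear_sum_assignment(scores, maximize=True)
    # Drop assignments that carry no positive similarity.
    keep = scores[rows, cols] > 0
    pairs = list(zip(rows[keep].tolist(), cols[keep].tolist()))
    total = float(scores[rows, cols][keep].sum())
    return pairs, total


# Illustrative scores only: 2 ions in spectrum A, 3 ions in spectrum B.
example_scores = [[0.9, 0.1, 0.0],
                  [0.2, 0.8, 0.3]]
pairs, total = max_weight_match(example_scores)
```

Unlike a monotonic alignment, this kind of matching does not require matched ions to appear in the same m/z order in both spectra.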

Theory

At its core, SIMILE contributes two concepts to the analysis of tandem mass spectrometry:

1. A similarity measure between fragment ions based on the fragmentation process. This similarity measure is defined to satisfy two expected properties of the fragmentation process:

(a) Fragment ions are similar if the difference in mass between them is common.
(b) Fragment ions are similar if their ancestor and descendant fragment ions are similar.

Property (a) is satisfied by constructing a transition matrix with row-normalized mass difference frequencies as transition probabilities. This corresponds to a "shortest path" distance between fragment ions.
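
As a rough illustration of property (a), the sketch below builds a row-normalized transition matrix from mass difference frequencies within a single list of fragment m/z values. The helper name `transition_matrix`, the binning of differences by `tolerance`, and the example values are all assumptions for illustration; SIMILE's own counting (for example across spectra, or with neutral losses) may differ.

```python
import numpy as np


def transition_matrix(mzs, tolerance=0.01):
    """Hypothetical helper: row-normalized mass-difference frequencies."""
    mzs = np.asarray(mzs, dtype=float)
    n = mzs.size
    # Each unordered pair of ions once (upper triangle, no diagonal).
    i, j = np.triu_indices(n, k=1)
    # Bin the absolute mass differences to the given tolerance.
    bins = np.round(np.abs(mzs[i] - mzs[j]) / tolerance).astype(int)
    # Frequency of each binned difference over all pairs.
    _, inverse, counts = np.unique(bins, return_inverse=True, return_counts=True)
    freq = np.zeros((n, n))
    freq[i, j] = counts[inverse]
    freq = freq + freq.T  # the same difference links both directions
    # Row-normalize so that each row is a probability distribution.
    row_sums = freq.sum(axis=1, keepdims=True)
    return np.divide(freq, row_sums, out=np.zeros_like(freq), where=row_sums > 0)


# Illustrative m/z values: the difference of 18 occurs twice, so the
# transitions it connects receive more weight than the others.
P = transition_matrix([100.0, 118.0, 136.0, 150.0])
```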

Property (b) is satisfied by converting the transition matrix of (a) into the pseudo-inverse of its (normalized) Laplacian. This corresponds to an "average commute time" distance between fragment ions with the following intuition: if, rather than taking the shortest path between x and y, we meander about according to the transition matrix, how long will it take on average to wander from x to y and back to x?

This notion of "average commute time" distance captures property (b) because if x and y are similar, then their parents and children are similar; and if fragment ions are similar, then the transition probability between them is high. In other words, the paths walked when meandering between x and y are enriched with the ancestors and descendants of x and y. Therefore, if x and y share no (or few) ancestors or descendants, then the time to meander between them is comparatively longer than if they do.
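
The commute time idea can be sketched for a small undirected graph. The code below assumes a symmetric, non-negative weight matrix `W` in which every node has positive degree, forms the normalized Laplacian, takes its pseudo-inverse, and converts it to average commute times. The helper `commute_times` is hypothetical and only illustrates the general technique, not SIMILE's exact embedding.

```python
import numpy as np


def commute_times(W):
    """Hypothetical helper: average commute times from a symmetric weight matrix."""
    W = np.asarray(W, dtype=float)
    d = W.sum(axis=1)               # node degrees (assumed > 0)
    inv_sqrt_d = 1.0 / np.sqrt(d)
    # Normalized Laplacian: I - D^{-1/2} W D^{-1/2}
    L_sym = np.eye(len(W)) - inv_sqrt_d[:, None] * W * inv_sqrt_d[None, :]
    # Pseudo-inverse, since the Laplacian is singular.
    K = np.linalg.pinv(L_sym)
    # Rescale by degrees so that differences behave like those of the
    # unnormalized Laplacian pseudo-inverse.
    G = inv_sqrt_d[:, None] * K * inv_sqrt_d[None, :]
    g = np.diag(G)
    # Commute time = graph volume * effective resistance.
    return d.sum() * (g[:, None] + g[None, :] - 2.0 * G)


# Path graph 0 - 1 - 2 with unit weights: the two end nodes take longer
# to commute between than either does with the middle node.
W = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0]])
C = commute_times(W)
```

In this toy graph the two end nodes share no direct edge, so the walk between them is longer, which mirrors the intuition that ions with few shared ancestors or descendants are further apart.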

2. A null distribution for spectral similarity which leverages intraspectral comparisons to add confidence to interspectral comparisons.

Using an outdated analogy for the fragmentation process, fragment ions are generated from "parent" ions and generate "child" ions. We can extend this analogy to include "sibling" ions by noting that siblings are more similar to each other than to their parents or children.

By leveraging SIMILE's fragment ion similarity measure, which conforms to this analogy, we can ask how likely it is that the fragment ions matched between fragmentation spectra by SIMILE are siblings. Taking this line of reasoning to its natural conclusion yields a null distribution, generated by permuting intra- and interspectral fragment similarity scores, from which p-values are computed.
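
A generic permutation test conveys the flavor of this null distribution. The sketch below assumes two made-up arrays of scores (interspectral and intraspectral), a one-sided difference-of-means statistic, and a hypothetical helper `permutation_pvalue`. It is not the statistic computed by `sml.z_test`, only an illustration of permuting pooled scores to obtain a p-value.

```python
import numpy as np


def permutation_pvalue(inter_scores, intra_scores, n_perm=999, seed=0):
    """Hypothetical helper: one-sided permutation test on a difference of means."""
    rng = np.random.default_rng(seed)
    inter = np.asarray(inter_scores, dtype=float)
    intra = np.asarray(intra_scores, dtype=float)
    observed = inter.mean() - intra.mean()
    pooled = np.concatenate([inter, intra])
    null = np.empty(n_perm)
    for k in range(n_perm):
        # Shuffle the inter/intra labels by permuting the pooled scores.
        shuffled = rng.permutation(pooled)
        null[k] = shuffled[:inter.size].mean() - shuffled[inter.size:].mean()
    # Add-one correction keeps the p-value strictly positive.
    pval = (1 + np.count_nonzero(null >= observed)) / (n_perm + 1)
    return observed, pval, null


# Illustrative scores only.
inter = [5.0, 6.0, 7.0, 5.5, 6.5]
intra = [1.0, 2.0, 1.5, 0.5, 2.5]
observed, pval, null = permutation_pvalue(inter, intra)
```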

Ongoing work on multiple comparison is exploring whether the asymmetry of the SIMILE max weight matching matrix can serve as an alternative way to generate a null distribution.

Contributing

Pull requests are welcome.

For major changes, please open an issue first to discuss what you would like to change.

License

Modified BSD

Acknowledgements

The development of SIMILE was made possible by:
