<h1>In-silico fragmentation annotation with SIRIUS</h1>

SIRIUS is an in-silico fragmentation tool written in Java, which can be found at http://bio.informatik.uni-jena.de/software/sirius/. 

In this notebook, we write a wrapper script that calls SIRIUS, passing it our MS1 and MS2 dataframes (before they are input to LDA). The result is an additional column ('annotation') in each dataframe, produced by running in-silico fragmentation with SIRIUS.

In [1]:
%load_ext autoreload
%autoreload 2
%matplotlib inline

import os
import sys
basedir = '../'
sys.path.append(basedir)

from IPython.display import display
import pandas as pd
import visualisation.sirius.sirius_wrapper as sir

<h2>Load ms1 and ms2 dataframes</h2>

In [2]:
ms1_filename = '../input/final/Beer_3_full1_5_2E5_pos_ms1.csv'
ms1 = pd.read_csv(ms1_filename, index_col=0)

In [3]:
ms2_filename = '../input/final/Beer_3_full1_5_2E5_pos_ms2.csv'
ms2 = pd.read_csv(ms2_filename, index_col=0)

<h2>Run SIRIUS on the MS1 and MS2 dataframes</h2>

SIRIUS annotates each MS1 peak using the MS2 fragmentation pattern of its child peaks. We set SIRIUS not to use isotopic pattern information, which is not available here: we do not know which peaks are the isotopic peaks of each MS1 peak, although we could try to find them.
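To make the parent/child relationship concrete, here is a minimal sketch of how the MS2 fragments could be collected for each MS1 peak. It assumes the `peakID` and `MSnParentPeakID` columns shown in the tables further down; the helper name `fragments_by_parent` and the toy dataframes are hypothetical and are not part of `sirius_wrapper`, whose internals are not shown in this notebook.

```python
import pandas as pd

# Toy MS1/MS2 tables using the column names that appear in this notebook.
ms1_toy = pd.DataFrame({
    'peakID': [6, 11, 17],
    'mz': [72.080799, 72.080818, 76.039316],
})
ms2_toy = pd.DataFrame({
    'peakID': [7, 8, 9, 10, 12, 13],
    'MSnParentPeakID': [6, 6, 6, 6, 11, 11],
    'mz': [53.039211, 55.054646, 57.057762, 72.080719, 55.054623, 72.080650],
    'intensity': [0.050747, 1.0, 0.391897, 0.679171, 0.008443, 1.0],
})


def fragments_by_parent(ms1_df, ms2_df):
    """Hypothetical helper: map each MS1 peakID to its (mz, intensity) fragments."""
    grouped = {
        int(parent_id): list(zip(children['mz'], children['intensity']))
        for parent_id, children in ms2_df.groupby('MSnParentPeakID')
    }
    # MS1 peaks without any MS2 children get an empty fragment list.
    return {int(pid): grouped.get(int(pid), []) for pid in ms1_df['peakID']}


spectra = fragments_by_parent(ms1_toy, ms2_toy)
for parent_id, fragments in spectra.items():
    print('parent peakID=%d, no. of fragments=%d' % (parent_id, len(fragments)))
```

Each entry of `spectra` is one fragmentation pattern of the kind that would be handed to an annotation tool for a single parent peak.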

In [4]:
sirius_exec = '/home/joewandy/linux64/sirius' # path to the executable

In [7]:
# This would probably run about 1000x faster if we called the appropriate methods in the SIRIUS class directly,
# rather than creating a temporary input file and passing it to SIRIUS through a subprocess, as we do now.
annot_ms1, annot_ms2 = sir.annotate_sirius(ms1, ms2, sirius_exec, sirius_platform='orbitrap', verbose=False)

Annotating parent peakID=1, mass=70.0651651321, intensity=16431632.0, no. of fragments=2
Annotating parent peakID=4, mass=70.065190596, intensity=602081.6875, no. of fragments=1
Annotating parent peakID=6, mass=72.0807993876, intensity=1067314.75, no. of fragments=4
 - 4 fragments annotated
Annotating parent peakID=11, mass=72.0808183493, intensity=1025769.125, no. of fragments=2
 - 2 fragments annotated
Annotating parent peakID=14, mass=73.0647880231, intensity=925079.4375, no. of fragments=2
 - 2 fragments annotated
Annotating parent peakID=17, mass=76.0393161822, intensity=1047793.6875, no. of fragments=2
Annotating parent peakID=20, mass=76.0756988501, intensity=1636355.75, no. of fragments=3
 - 1 fragments annotated
Annotating parent peakID=26, mass=81.0335144645, intensity=766053.0, no. of fragments=2
 - 2 fragments annotated
Annotating parent peakID=29, mass=83.0602445276, intensity=705789.625, no. of fragments=3
 - 3 fragments annotated
Annotating parent peakID=33, mass=84.0443

The last line in the cell output above shows the total no. of MS1/MS2 peaks annotated:

> Total annotations MS1=1390/1588, MS2=10612/16221
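The same totals can be recomputed from the returned dataframes by counting the rows whose 'annotation' entry is filled in. The sketch below is only an illustration on toy data: the helper `annotation_coverage` is hypothetical, and it assumes that unannotated peaks carry a missing value or an empty string in the 'annotation' column.

```python
import pandas as pd


def annotation_coverage(df, column='annotation'):
    """Hypothetical helper: return (number of annotated rows, total rows)."""
    values = df[column]
    # Treat both missing values and blank strings as "not annotated".
    annotated = values.notna() & (values.astype(str).str.strip() != '')
    return int(annotated.sum()), len(df)


# Toy stand-in for annot_ms1, mirroring the first rows shown in this notebook.
toy_ms1 = pd.DataFrame({
    'peakID': [1, 4, 6, 11, 14],
    'annotation': [None, None, 'C4H9N', 'C4H9N', 'C4H8O'],
})

n_annotated, n_total = annotation_coverage(toy_ms1)
print('Total annotations MS1=%d/%d' % (n_annotated, n_total))
```

Applied to `annot_ms1` and `annot_ms2`, the same call would give the two ratios reported by the wrapper, provided the assumption about empty entries holds.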

<h2>Show the results</h2>

The 'annotation' columns in the dataframes below were produced from the SIRIUS output. We have not yet checked how well these annotations compare with results from, e.g., MassBank or NIST. SIRIUS also returns multiple results (corresponding to different potential formulae/fragmentation trees) with different scores; for now, we keep only the one with the highest score.
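Keeping only the top-scoring result per peak can be sketched with a pandas group-by. This is an illustration rather than the wrapper's actual code: the table layout, the column names `formula` and `score`, the helper `best_candidate`, and all values are made up, since the format of the SIRIUS output is not shown in this notebook.

```python
import pandas as pd

# Made-up candidate table: several candidate results per parent peak.
# The 'formula' and 'score' columns and their values are placeholders.
candidates = pd.DataFrame({
    'peakID': [6, 6, 20, 20, 20],
    'formula': ['candidate_A', 'candidate_B', 'candidate_C', 'candidate_D', 'candidate_E'],
    'score': [12.5, 3.1, 8.0, 9.4, 1.2],
})


def best_candidate(cands):
    """Hypothetical helper: keep the highest-scoring formula for each peakID."""
    # idxmax returns, for every group, the row label of the maximum score.
    best_rows = cands.loc[cands.groupby('peakID')['score'].idxmax()]
    return best_rows.set_index('peakID')['formula']


best = best_candidate(candidates)
print(best)
```

If lower-ranked candidates ever need to be inspected, the full `candidates` table can be kept alongside `best` instead of being discarded.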

In [8]:
display(annot_ms1)

Unnamed: 0,peakID,MSnParentPeakID,msLevel,rt,mz,intensity,Sample,GroupPeakMSn,CollisionEnergy,annotation
1,1,0,1,578.503,70.065165,16431632.0000,1,0,0,
4,4,0,1,652.517,70.065191,602081.6875,1,0,0,
6,6,0,1,566.043,72.080799,1067314.7500,1,0,0,C4H9N
11,11,0,1,1210.110,72.080818,1025769.1250,1,0,0,C4H9N
14,14,0,1,468.470,73.064788,925079.4375,1,0,0,C4H8O
17,17,0,1,656.240,76.039316,1047793.6875,1,0,0,
20,20,0,1,1027.660,76.075699,1636355.7500,1,0,0,C3H9NO
26,26,0,1,562.308,81.033514,766053.0000,1,0,0,C5H4O
29,29,0,1,632.557,83.060245,705789.6250,1,0,0,C4H6N2
33,33,0,1,486.476,84.044375,2827970.7500,1,0,0,C4H5NO


In [9]:
display(annot_ms2)

Unnamed: 0,peakID,MSnParentPeakID,msLevel,rt,mz,intensity,Sample,GroupPeakMSn,CollisionEnergy,fragment_bin_id,loss_bin_id,annotation
2,2,1,2,578.503,53.002529,0.005552,1,0,0,53.00259,,
3,3,1,2,578.503,70.065056,1.000000,1,0,0,70.06514,,
5,5,4,2,652.517,70.065071,1.000000,1,0,0,70.06514,,
7,7,6,2,566.043,53.039211,0.050747,1,0,0,53.03893,,C4H4
8,8,6,2,566.043,55.054646,1.000000,1,0,0,55.05466,17.02617,C4H6
9,9,6,2,566.043,57.057762,0.391897,1,0,0,57.05774,,C3H6N
10,10,6,2,566.043,72.080719,0.679171,1,0,0,72.08070,,C4H9N
12,12,11,2,1210.110,55.054623,0.008443,1,0,0,55.05466,17.02617,C4H6
13,13,11,2,1210.110,72.080650,1.000000,1,0,0,72.08070,,C4H9N
15,15,14,2,468.470,55.054657,1.000000,1,0,0,55.05466,18.01013,C4H6


<h2>Save the files</h2>

In [None]:
ms1_filename = '../input/final/Beer_3_full1_5_2E5_pos_ms1_annotated.csv'
ms2_filename = '../input/final/Beer_3_full1_5_2E5_pos_ms2_annotated.csv'
annot_ms1.to_csv(ms1_filename)
annot_ms2.to_csv(ms2_filename)