This notebook calculates natural abundance (NA) corrected intensities and fractional enrichment for LC-MS data, where the ppm tolerance varies with molecular mass. The example shows a dataset with a D2 label:

 - D_Sample_Input_Simple.xlsx - demo raw MS intensity file containing intensities for C10H17N3O6S, simulated using combinatorics by treating C13 as indistinguishable from O17 and N15 as indistinguishable from S34

In [1]:
import pandas as pd
import numpy as np
import re

from corna.inputs import maven_parser as parser
import corna.constants as const
from corna.helpers import replace_negatives_in_column, merge_multiple_dfs
from corna.algorithms.nacorr_lcms import na_correction
from corna.postprocess import fractional_enrichment


Read the raw file and merge it with sample metadata if present. This example runs without sample metadata.

In [2]:
#raw_df = pd.read_excel('D_Sample_Input_Simple_one.xlsx')
raw_df = pd.read_excel('C_Sample_Input_Simple_three.xlsx')
sample_metadata = pd.DataFrame()

merged_df, iso_tracer_data, element_list = parser.read_maven_file(raw_df, sample_metadata)
merged_df.head()

Unnamed: 0,Name,Label,Formula,Sample,Intensity,Unlabeled Fragment
0,glucose-6-phosphate,C12 PARENT,C6H13O9P,A12_1,85751.28,glucose-6-phosphate
1,glucose-6-phosphate,C13-label-1,C6H13O9P,A12_1,12179.52,glucose-6-phosphate
2,glucose-6-phosphate,C13-label-2,C6H13O9P,A12_1,345720.09,glucose-6-phosphate
3,glucose-6-phosphate,C13-label-3,C6H13O9P,A12_1,12830.81,glucose-6-phosphate
4,glucose-6-phosphate,C13-label-4,C6H13O9P,A12_1,15879.57,glucose-6-phosphate


The ppm tolerance is computed from the instrument resolution using the formula from Su, Xiaoyang et al., 2017:

\begin{equation*}
\frac{\Delta m}{m} = 1.66 \times \frac{m^{1/2}}{\text{MinimalNominalResolution} \times \sqrt{200}} \times 10^6
\end{equation*}

Different vendors use different formulas for this relationship. The conversion from resolution to ppm used here follows the formula above, but it can be modified as needed.
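
As a minimal sketch of the conversion described above, the formula can be written as a small Python function. The name `resolution_to_ppm` and its parameters are hypothetical and are not part of corna; the default of 200 for `resolution_mass` mirrors the √200 term in the formula.

```python
import math


def resolution_to_ppm(mass, nominal_resolution, resolution_mass=200.0):
    """Return the ppm tolerance at `mass` (hypothetical helper, not a corna API).

    Implements delta_m / m = 1.66 * sqrt(mass)
                             / (nominal_resolution * sqrt(resolution_mass)) * 1e6
    """
    return (1.66 * math.sqrt(mass)
            / (nominal_resolution * math.sqrt(resolution_mass)) * 1e6)


# Figures quoted later in this notebook: mass 307, resolution 293808.
ppm = resolution_to_ppm(mass=307, nominal_resolution=293808)
print(round(ppm, 2))  # roughly 7 ppm
```

For an instrument whose resolution scales differently with mass, only the body of this function would need to change.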

The dictionary below contains natural abundance values for the common isotopes found in nature. It can be defined by the user, or the default values from the package can be used. The format of the dictionary is:

{E: [M0, M1, ..., Mn]}, where E is the element symbol and the natural abundance fractions are listed in order of increasing mass. For example:

In [3]:
# User-defined NA values, taken from AccuCor

na_dict = {'C': [0.9893, 0.0107],
           'H':[0.999885, 0.000115],
           'N':[0.99636, 0.00364],
           'O':[0.99757, 0.00038, 0.00205],
           'S':[0.9493, 0.00762, 0.0429]}
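
Because a user-defined dictionary is easy to mistype, a quick sanity check can help before running the correction. The sketch below is an illustration only: `check_na_dict` is a hypothetical helper, not a corna function, and the tolerance of 1e-3 is an assumption chosen because rounded abundance values (for example the sulfur entries above) do not sum to exactly 1.

```python
na_dict = {'C': [0.9893, 0.0107],
           'H': [0.999885, 0.000115],
           'N': [0.99636, 0.00364],
           'O': [0.99757, 0.00038, 0.00205],
           'S': [0.9493, 0.00762, 0.0429]}


def check_na_dict(na_dict, tol=1e-3):
    """Return {element: total} for entries whose fractions look wrong.

    An entry is flagged if any fraction is negative or if the fractions
    do not sum to 1 within `tol` (hypothetical check, not part of corna).
    """
    problems = {}
    for element, fractions in na_dict.items():
        total = sum(fractions)
        if any(f < 0 for f in fractions) or abs(total - 1.0) > tol:
            problems[element] = total
    return problems


problems = check_na_dict(na_dict)
print(problems)  # an empty dict means every element passed
```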

Perform NA correction, using the dictionary above for the NA values. For an Orbitrap, with a molecular mass of 307 (our input compound) and a resolution of 293808, the ppm is ~7 according to the formula from Su, Xiaoyang et al., 2017:

\begin{equation*}
\frac{\Delta m}{m} = 1.66 \times \frac{m^{1/2}}{\text{MinimalNominalResolution} \times \sqrt{200}} \times 10^6
\end{equation*}

This value is our ppm_user_input.

In [4]:
na_corr_df, ele_corr_dict = na_correction(merged_df, iso_tracers=['C13'], eleme_corr={},
                                          na_dict=na_dict, autodetect=True, res=250000, 
                                          res_mw=200, instrument='orbitrap')
print(ele_corr_dict)
na_corr_df = replace_negatives_in_column(na_corr_df, const.NA_CORRECTED_WITH_ZERO, const.NA_CORRECTED_COL)
na_corr_df

The ppm requirement is at the boderline for C6H14O12P2:H
{u'fructose-1-6-bisphosphate': {'O17': 3.0, 'H': 1.0, 'O18': 1.0}, u'glucose-6-phosphate': {'O17': 2.0, 'H': 0.0, 'O18': 0.0}, u'6-phospho-D-gluconate': {'O17': 2.0, 'H': 0.0, 'O18': 0.0}}


Unnamed: 0,Name,Formula,Sample,NA Corrected,Intensity,Label,NA Corrected with zero
0,6-phospho-D-gluconate,C6H13O10P,A12_1,312.513495,285.51,C12 PARENT,312.513495
1,6-phospho-D-gluconate,C6H13O10P,A12_1,585.738845,560.53,C13-label-1,585.738845
2,6-phospho-D-gluconate,C6H13O10P,A12_1,26465.791458,24736.69,C13-label-2,26465.791458
3,6-phospho-D-gluconate,C6H13O10P,A12_1,-581.263485,615.21,C13-label-3,0.0
4,6-phospho-D-gluconate,C6H13O10P,A12_1,587.444685,561.98,C13-label-4,587.444685
5,6-phospho-D-gluconate,C6H13O10P,A12_1,-14.716933,0.0,C13-label-5,0.0
6,6-phospho-D-gluconate,C6H13O10P,A12_1,0.095178,0.0,C13-label-6,0.095178
7,fructose-1-6-bisphosphate,C6H14O12P2,A12_1,9717.878725,8834.07,C12 PARENT,9717.878725
8,fructose-1-6-bisphosphate,C6H14O12P2,A12_1,9742.643205,9580.26,C13-label-1,9742.643205
9,fructose-1-6-bisphosphate,C6H14O12P2,A12_1,400430.740367,372706.12,C13-label-2,400430.740367
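
The output above shows that negative NA corrected values appear as 0.0 in the 'NA Corrected with zero' column. The following pandas sketch reproduces that behaviour on three rows copied from the table; it is an assumed equivalent for illustration and is not the actual implementation of `replace_negatives_in_column`.

```python
import pandas as pd

# Three rows copied from the 6-phospho-D-gluconate output above.
df = pd.DataFrame({
    'Label': ['C13-label-2', 'C13-label-3', 'C13-label-4'],
    'NA Corrected': [26465.791458, -581.263485, 587.444685],
})

# Clip negative corrected intensities to zero; other values are unchanged.
df['NA Corrected with zero'] = df['NA Corrected'].clip(lower=0)
print(df)
```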


Calculate fractional enrichments, merge all data into a single table and save it as 'C13_accucor_output.csv'.

In [5]:
frac_enr_df = fractional_enrichment(na_corr_df)
frac_enr_df

Unnamed: 0,Sample,Name,Label,Formula,Pool_total,Fractional enrichment
0,A12_1,6-phospho-D-gluconate,C12 PARENT,C6H13O10P,27951.583661,0.011181
1,A12_1,6-phospho-D-gluconate,C13-label-1,C6H13O10P,27951.583661,0.020955
2,A12_1,6-phospho-D-gluconate,C13-label-2,C6H13O10P,27951.583661,0.946844
3,A12_1,6-phospho-D-gluconate,C13-label-3,C6H13O10P,27951.583661,0.0
4,A12_1,6-phospho-D-gluconate,C13-label-4,C6H13O10P,27951.583661,0.021017
5,A12_1,6-phospho-D-gluconate,C13-label-5,C6H13O10P,27951.583661,0.0
6,A12_1,6-phospho-D-gluconate,C13-label-6,C6H13O10P,27951.583661,3e-06
7,A12_1,fructose-1-6-bisphosphate,C12 PARENT,C6H14O12P2,445753.571644,0.021801
8,A12_1,fructose-1-6-bisphosphate,C13-label-1,C6H14O12P2,445753.571644,0.021857
9,A12_1,fructose-1-6-bisphosphate,C13-label-2,C6H14O12P2,445753.571644,0.898323
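
The numbers in the table above are consistent with each fractional enrichment being the zero-replaced NA corrected intensity divided by the pool total for that sample and metabolite. The sketch below recomputes the 6-phospho-D-gluconate rows with plain pandas. It is inferred from the output values and is not the corna implementation of `fractional_enrichment`.

```python
import pandas as pd

# 'NA Corrected with zero' values for 6-phospho-D-gluconate, copied from above.
df = pd.DataFrame({
    'Sample': ['A12_1'] * 7,
    'Name': ['6-phospho-D-gluconate'] * 7,
    'Label': ['C12 PARENT'] + ['C13-label-%d' % i for i in range(1, 7)],
    'NA Corrected with zero': [312.513495, 585.738845, 26465.791458,
                               0.0, 587.444685, 0.0, 0.095178],
})

# Pool total: sum over all isotopologues of one metabolite in one sample.
df['Pool_total'] = (df.groupby(['Sample', 'Name'])['NA Corrected with zero']
                      .transform('sum'))

# Fractional enrichment: share of the pool carried by each isotopologue.
df['Fractional enrichment'] = df['NA Corrected with zero'] / df['Pool_total']
print(df[['Label', 'Pool_total', 'Fractional enrichment']])
```

By construction, the fractions within each sample and metabolite sum to 1.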


In [6]:
output_df = merge_multiple_dfs([merged_df, na_corr_df, frac_enr_df])
output_df

Unnamed: 0,Name,Label,Formula,Sample,Intensity_x,Unlabeled Fragment,NA Corrected,Intensity_y,NA Corrected with zero,Pool_total,Fractional enrichment
0,glucose-6-phosphate,C12 PARENT,C6H13O9P,A12_1,85751.28,glucose-6-phosphate,93633.540139,85751.28,93633.540139,485781.056187,0.192748
1,glucose-6-phosphate,C13-label-1,C6H13O9P,A12_1,12179.52,glucose-6-phosphate,6827.916611,12179.52,6827.916611,485781.056187,0.014056
2,glucose-6-phosphate,C13-label-2,C6H13O9P,A12_1,345720.09,glucose-6-phosphate,368893.318704,345720.09,368893.318704,485781.056187,0.759382
3,glucose-6-phosphate,C13-label-3,C6H13O9P,A12_1,12830.81,glucose-6-phosphate,-3486.466231,12830.81,0.0,485781.056187,0.0
4,glucose-6-phosphate,C13-label-4,C6H13O9P,A12_1,15879.57,glucose-6-phosphate,16423.686501,15879.57,16423.686501,485781.056187,0.033809
5,glucose-6-phosphate,C13-label-5,C6H13O9P,A12_1,0.0,glucose-6-phosphate,-408.324105,0.0,0.0,485781.056187,0.0
6,6-phospho-D-gluconate,C12 PARENT,C6H13O10P,A12_1,285.51,6-phospho-D-gluconate,312.513495,285.51,312.513495,27951.583661,0.011181
7,6-phospho-D-gluconate,C13-label-1,C6H13O10P,A12_1,560.53,6-phospho-D-gluconate,585.738845,560.53,585.738845,27951.583661,0.020955
8,6-phospho-D-gluconate,C13-label-2,C6H13O10P,A12_1,24736.69,6-phospho-D-gluconate,26465.791458,24736.69,26465.791458,27951.583661,0.946844
9,6-phospho-D-gluconate,C13-label-3,C6H13O10P,A12_1,615.21,6-phospho-D-gluconate,-581.263485,615.21,0.0,27951.583661,0.0
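
The `Intensity_x` and `Intensity_y` columns in the output above suggest that `Intensity` is present in more than one input frame but is not one of the merge keys. This is an inference from the output, not from the corna source. A minimal pandas sketch of a chained merge on explicit keys follows; the key list and the step that drops the duplicated column are assumptions for illustration.

```python
from functools import reduce

import pandas as pd

keys = ['Name', 'Label', 'Formula', 'Sample']  # assumed merge keys
base = {'Name': ['6-phospho-D-gluconate'] * 2,
        'Label': ['C12 PARENT', 'C13-label-1'],
        'Formula': ['C6H13O10P'] * 2,
        'Sample': ['A12_1'] * 2}

raw = pd.DataFrame(dict(base, Intensity=[285.51, 560.53]))
corrected = pd.DataFrame(dict(base, Intensity=[285.51, 560.53],
                              **{'NA Corrected': [312.513495, 585.738845]}))
enriched = pd.DataFrame(dict(base,
                             **{'Fractional enrichment': [0.011181, 0.020955]}))

# Drop the repeated Intensity column so the merge adds no _x/_y suffixes.
frames = [raw, corrected.drop(columns='Intensity'), enriched]
merged = reduce(lambda left, right: pd.merge(left, right, on=keys, how='left'),
                frames)
print(merged.columns.tolist())
```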


In [7]:
output_df.to_csv('C13_accucor_output.csv')