This notebook calculates natural abundance (NA) corrected intensities and fractional enrichment for LC-MS data at a given mass spectrometer resolution. The example uses a dataset with dual H2 and N15 labels:

 - N15H2_purine.xlsx - demo raw MS intensity file containing intensities for C5H4N4, simulated on the assumption that C13 is indistinguishable from H2 but resolved from N15
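
To make the "indistinguishable from H2 but resolved from N15" statement concrete, here is a rough sketch that compares isotope mass shifts against a resolution limit. The limit formula (1.66 * mw^1.5 / (res * sqrt(res_mw)), an Orbitrap-style scaling) is an assumption borrowed from the NA correction literature and is not confirmed to be what corna uses internally. The helper name `min_resolvable_diff` is hypothetical, and the neutral monoisotopic mass is used instead of the ion m/z for simplicity.

```python
import math

# Isotope masses in Da (standard reference values, rounded).
MASS = {
    'C12': 12.0,       'C13': 13.00335484,
    'H1': 1.00782503,  'H2': 2.01410178,
    'N14': 14.00307401, 'N15': 15.00010890,
}

def min_resolvable_diff(mw, res, res_mw):
    """Assumed criterion (not taken from corna): smallest mass
    difference in Da that is resolved at mass mw, for an instrument
    with resolution res specified at mass res_mw."""
    return 1.66 * mw ** 1.5 / (res * math.sqrt(res_mw))

# Mass shift caused by swapping one atom for its heavy isotope.
shift = {
    'C13': MASS['C13'] - MASS['C12'],
    'H2': MASS['H2'] - MASS['H1'],
    'N15': MASS['N15'] - MASS['N14'],
}

# Neutral monoisotopic mass of C5H4N4.
mw = 5 * MASS['C12'] + 4 * MASS['H1'] + 4 * MASS['N14']
limit = min_resolvable_diff(mw, res=24500, res_mw=200)

c13_vs_h2 = abs(shift['C13'] - shift['H2'])
c13_vs_n15 = abs(shift['C13'] - shift['N15'])

print('limit (Da):', round(limit, 5))
print('C13 vs H2 resolved: ', c13_vs_h2 > limit)
print('C13 vs N15 resolved:', c13_vs_n15 > limit)
```

Under these assumptions the limit is close to 0.0063 Da. The C13 and H2 peaks are about 0.0029 Da apart and merge, while C13 and N15 are about 0.0063 Da apart and sit right at the edge, which is consistent with the borderline message printed by the correction step in this notebook.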

In [1]:
import pandas as pd
import numpy as np
import re

from corna.inputs import maven_parser as parser
import corna.constants as const
from corna.helpers import replace_negatives_in_column, merge_multiple_dfs
from corna.algorithms.nacorr_lcms import na_correction
from corna.postprocess import fractional_enrichment


Read the raw file and merge it with the sample metadata, if present. This example runs without sample metadata, so an empty DataFrame is passed instead.

In [2]:
raw_df = pd.read_excel('N15H2_purine.xlsx')
sample_metadata = pd.DataFrame()

merged_df, iso_tracer_data, element_list = parser.read_maven_file(raw_df, sample_metadata)
merged_df.head()

Unnamed: 0,Name,Label,Formula,Sample,Intensity,Unlabeled Fragment
0,Cpd2,C12 PARENT,C5H4N4,Sample 1,0.2761,Cpd2
1,Cpd2,N15-label-1,C5H4N4,Sample 1,0.0033,Cpd2
2,Cpd2,N15-label-2,C5H4N4,Sample 1,0.0002,Cpd2
3,Cpd2,H2-label-1,C5H4N4,Sample 1,0.0168,Cpd2
4,Cpd2,H2N15-label-1-1,C5H4N4,Sample 1,0.6568,Cpd2


The natural abundance dictionary holds the natural abundance values of the common isotopes of each element. It can be defined by the user, or the default values from the package can be used. The format of the dictionary is:

{E:[M0, M1, ..Mn]} where E is the element symbol and the natural abundance fractions are listed in order of increasing isotope mass. For example:

In [3]:
# user-defined natural abundance fractions, lightest isotope first
na_dict = {'C': [0.9889, 0.0111],
           'H': [0.99985, 0.00015],
           'N': [0.9964, 0.0036],
           'O': [0.9976, 0.0004, 0.002],
           'S': [0.950, 0.0076, 0.0424]}
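
Since each list holds fractions for a single element, a user-defined dictionary can be sanity-checked before it is passed on. The sketch below flags elements whose fractions do not sum to 1. `check_na_dict` is a hypothetical helper written for this illustration and is not part of corna.

```python
import math

na_dict = {'C': [0.9889, 0.0111],
           'H': [0.99985, 0.00015],
           'N': [0.9964, 0.0036],
           'O': [0.9976, 0.0004, 0.002],
           'S': [0.950, 0.0076, 0.0424]}

def check_na_dict(na_dict, tol=1e-6):
    """Return the elements whose fractions do not sum to 1 within tol."""
    return [element for element, fractions in na_dict.items()
            if not math.isclose(sum(fractions), 1.0, abs_tol=tol)]

bad_elements = check_na_dict(na_dict)
print(bad_elements)  # an empty list means every element sums to 1
```

For the dictionary above the list is empty. An entry such as {'X': [0.5, 0.4]} would be reported.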

In [4]:
na_corr_df, ele_corr_dict = na_correction(merged_df, iso_tracers=['H2', 'N15'], res_type='autodetect', na_dict=na_dict, autodetect=True, 
                                          res=24500,res_mw=200, instrument='orbitrap')
na_corr_df = replace_negatives_in_column(na_corr_df, const.NA_CORRECTED_WITH_ZERO, const.NA_CORRECTED_COL)
na_corr_df


The ppm requirement is at the boderline for {'H': 4, 'C': 5, 'N': 4}:C


Unnamed: 0,Name,Formula,Sample,NA Corrected,Intensity,Label,NA Corrected with zero
0,Cpd2,C5H4N4,Sample 1,0.2963673,0.2761,C12 PARENT,0.2963673
1,Cpd2,C5H4N4,Sample 1,-0.0007382025,0.0033,N15-label-1,0.0
2,Cpd2,C5H4N4,Sample 1,0.0001980653,0.0002,N15-label-2,0.0001980653
3,Cpd2,C5H4N4,Sample 1,-1.452678e-06,0.0,H2N15-label-0-3,0.0
4,Cpd2,C5H4N4,Sample 1,2.647378e-09,0.0,H2N15-label-0-4,2.647378e-09
5,Cpd2,C5H4N4,Sample 1,0.001222173,0.0168,H2-label-1,0.001222173
6,Cpd2,C5H4N4,Sample 1,0.7021516,0.6568,H2N15-label-1-1,0.7021516
7,Cpd2,C5H4N4,Sample 1,-0.0003498087,0.0068,H2N15-label-1-2,0.0
8,Cpd2,C5H4N4,Sample 1,-2.478126e-05,0.0,H2N15-label-1-3,0.0
9,Cpd2,C5H4N4,Sample 1,6.098626e-08,0.0,H2N15-label-1-4,6.098626e-08
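
The 'NA Corrected with zero' column in the table above differs from 'NA Corrected' only where the corrected value is negative. A minimal sketch of that step in plain Python is shown here; it is an illustration of the idea, not the code behind `replace_negatives_in_column`. With pandas, `Series.clip(lower=0)` gives the same result on a column.

```python
def replace_negatives(values):
    """Return a copy of values with every negative entry set to 0.0."""
    return [max(v, 0.0) for v in values]

# First four 'NA Corrected' values from the table above.
na_corrected = [0.2963673, -0.0007382025, 0.0001980653, -1.452678e-06]
na_corrected_with_zero = replace_negatives(na_corrected)
print(na_corrected_with_zero)  # [0.2963673, 0.0, 0.0001980653, 0.0]
```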


Calculate the fractional enrichment, merge all data into a single DataFrame, and save it as 'N15H2_purine_out.csv'.

In [5]:
frac_enr_df = fractional_enrichment(na_corr_df)
frac_enr_df

Unnamed: 0,Sample,Name,Label,Formula,Pool_total,Fractional enrichment
0,Sample 1,Cpd2,C12 PARENT,C5H4N4,1.001843,0.295822
1,Sample 1,Cpd2,N15-label-1,C5H4N4,1.001843,0.0
2,Sample 1,Cpd2,N15-label-2,C5H4N4,1.001843,0.0001977009
3,Sample 1,Cpd2,H2N15-label-0-3,C5H4N4,1.001843,0.0
4,Sample 1,Cpd2,H2N15-label-0-4,C5H4N4,1.001843,2.642507e-09
5,Sample 1,Cpd2,H2-label-1,C5H4N4,1.001843,0.001219924
6,Sample 1,Cpd2,H2N15-label-1-1,C5H4N4,1.001843,0.7008598
7,Sample 1,Cpd2,H2N15-label-1-2,C5H4N4,1.001843,0.0
8,Sample 1,Cpd2,H2N15-label-1-3,C5H4N4,1.001843,0.0
9,Sample 1,Cpd2,H2N15-label-1-4,C5H4N4,1.001843,6.087406e-08
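
The numbers above are consistent with fractional enrichment being the zero-replaced NA corrected intensity divided by Pool_total, where Pool_total would be the sum of those intensities over all isotopologues of the compound in a sample (the displayed tables show only part of the rows). The sketch below works under that assumption. `fractional_enrichment_sketch` is a hypothetical helper, and the three-label input is made-up illustrative data, not values from this dataset.

```python
def fractional_enrichment_sketch(corrected):
    """corrected: dict mapping label -> non-negative corrected intensity.
    Returns (fractions, pool_total), assuming pool_total is the plain sum."""
    pool_total = sum(corrected.values())
    if pool_total == 0:
        # Avoid dividing by zero for an empty pool.
        return {label: 0.0 for label in corrected}, 0.0
    fractions = {label: value / pool_total
                 for label, value in corrected.items()}
    return fractions, pool_total

# Illustrative values only.
toy = {'C12 PARENT': 0.3, 'N15-label-1': 0.0, 'H2N15-label-1-1': 0.7}
fractions, pool_total = fractional_enrichment_sketch(toy)
print(fractions, pool_total)

# Check the assumed relation against two rows reported above.
reported_pool_total = 1.001843
parent_ratio = 0.2963673 / reported_pool_total   # table reports 0.295822
dual_ratio = 0.7021516 / reported_pool_total     # table reports 0.7008598
print(round(parent_ratio, 6), round(dual_ratio, 6))
```

By construction the fractions for one compound in one sample sum to 1 whenever the pool is non-empty.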


In [6]:
output_df = merge_multiple_dfs([merged_df, na_corr_df, frac_enr_df])
output_df

Unnamed: 0,Name,Label,Formula,Sample,Intensity_x,Unlabeled Fragment,NA Corrected,Intensity_y,NA Corrected with zero,Pool_total,Fractional enrichment
0,Cpd2,C12 PARENT,C5H4N4,Sample 1,0.2761,Cpd2,0.296367,0.2761,0.296367,1.001843,0.295822
1,Cpd2,N15-label-1,C5H4N4,Sample 1,0.0033,Cpd2,-0.000738,0.0033,0.0,1.001843,0.0
2,Cpd2,N15-label-2,C5H4N4,Sample 1,0.0002,Cpd2,0.000198,0.0002,0.000198,1.001843,0.000198
3,Cpd2,H2-label-1,C5H4N4,Sample 1,0.0168,Cpd2,0.001222,0.0168,0.001222,1.001843,0.00122
4,Cpd2,H2N15-label-1-1,C5H4N4,Sample 1,0.6568,Cpd2,0.702152,0.6568,0.702152,1.001843,0.70086
5,Cpd2,H2N15-label-1-2,C5H4N4,Sample 1,0.0068,Cpd2,-0.00035,0.0068,0.0,1.001843,0.0
6,Cpd2,H2-label-2,C5H4N4,Sample 1,0.0002,Cpd2,-0.000238,0.0002,0.0,1.001843,0.0
7,Cpd2,H2N15-label-2-1,C5H4N4,Sample 1,0.0389,Cpd2,0.001874,0.0389,0.001874,1.001843,0.00187
8,Cpd2,H2N15-label-2-2,C5H4N4,Sample 1,0.0003,Cpd2,-0.00011,0.0003,0.0,1.001843,0.0
9,Cpd2,H2N15-label-3-1,C5H4N4,Sample 1,0.0006,Cpd2,-0.000366,0.0006,0.0,1.001843,0.0


In [7]:
output_df.to_csv('N15H2_purine_out.csv')