## **IPython script to calculate NA Correction and Fractional Enrichment for a demo dataset.**

Prerequisite Knowledge:

**Natural Abundance Correction** - Natural abundance (NA) refers to the abundance of the isotopes of a chemical element as naturally found on the planet. In an analysis, the observed intensity includes a contribution from natural isotopic abundance that needs to be corrected. This process is referred to as NA Correction.

**Pool Total** - The sum of the intensities of all isotopologues of a metabolite, that is, across every number of labeled atoms of the isotope element.

**Fractional enrichment** - The intensity of each isotopologue of a metabolite normalized to the range 0 to 1. In the output at the end of this notebook, this is the NA corrected intensity divided by the pool total.
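
The two definitions above can be illustrated with a small sketch. The sample name, labels and intensities here are made up for illustration. Treating negative corrected intensities as zero is an assumption based on the output shown at the end of this notebook; it is not a statement about how corna implements the calculation.

```python
import pandas as pd

# Toy NA corrected intensities for one metabolite in one sample (made-up numbers).
toy = pd.DataFrame({
    "Sample": ["s1", "s1", "s1", "s1"],
    "Label": ["M+0", "M+1", "M+2", "M+3"],
    "NA Corrected": [900.0, 80.0, -5.0, 20.0],
})

# Assumption: negative corrected intensities are treated as zero.
clipped = toy["NA Corrected"].clip(lower=0)

# Pool total: sum of the intensities of all isotopologues within a sample.
toy["Pool_total"] = clipped.groupby(toy["Sample"]).transform("sum")

# Fractional enrichment: each intensity as a fraction of the pool total (0 to 1).
toy["Fractional enrichment"] = clipped / toy["Pool_total"]

print(toy)
```

With this definition the fractional enrichments of a metabolite within one sample add up to 1.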

**Welcome to the interactive Polly IPython Notebook.**

With this interactive Polly notebook you can calculate NA corrected intensities as well as fractional enrichment for an LC-MS/MS input file. Information on some of the functions and files used:

 - corna - package that performs NA Correction.
 - raw_intensity_file_msms.csv - demo raw intensity file.
 - multiquant_parser.merge_mq_metadata - merges the MultiQuant files with the metadata files.
 - multiquant_parser.add_mass_and_no_of_atoms_info_frm_label - from the Label column, adds the molecular mass, isotopic mass, total number of atoms and number of labeled atoms for the parent as well as the isotope fragment.
 - fractional_enrichment - calculates fractional enrichment for the NA corrected dataframe.

In [1]:
import pandas as pd
import numpy as np
import re

import corna.constants as const
from corna.helpers import get_isotope_na
from corna.inputs.column_conventions import multiquant 
from corna.inputs import multiquant_parser
from corna.postprocess import fractional_enrichment

**Define the input file paths and the natural abundance values of the elements.**

In [2]:
raw_df= pd.read_csv('raw_intensity_file_msms.csv')
metadata_df= pd.read_excel('metadata_msms.xlsx')
sample_metadata = None
isBackground= False
isotope_dict= const.ISOTOPE_NA_MASS
REQUIRED_COL= [multiquant.FORMULA, multiquant.LABEL, multiquant.NAME, multiquant.SAMPLE, multiquant.COHORT,
                     multiquant.MQ_FRAGMENT, multiquant.INTENSITY, multiquant.PARENT_FORM,const.NA_CORRECTED_COL]

**Merge the raw intensity dataframe with the metadata, and with the sample metadata if background correction is to be performed.**

In [3]:
msms_df, list_of_replicates = multiquant_parser.merge_mq_metadata(raw_df, metadata_df, sample_metadata)



**From the Label column, add the molecular mass, isotopic mass, total number of atoms and number of labeled atoms for the parent as well as the isotope fragment.**

In [4]:
final_df, isotracer= multiquant_parser.add_mass_and_no_of_atoms_info_frm_label(msms_df)
print(final_df)
intensity_col= const.INTENSITY_COL

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  self.obj[key] = _infer_fill_value(value)
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  self.obj[item] = s


                              Sample                     Cohort Name  \
0     Filename_ABC.wiff (sample 426)       A. [13C-glc] Cohort1 0min   
1     Filename_ABC.wiff (sample 427)       B. [13C-glc] Cohort1 5min   
2     Filename_ABC.wiff (sample 428)      C. [13C-glc] Cohort1 15min   
3     Filename_ABC.wiff (sample 429)      D. [13C-glc] Cohort1 30min   
4     Filename_ABC.wiff (sample 430)      E. [13C-glc] Cohort1 60min   
5     Filename_ABC.wiff (sample 431)     F. [13C-glc] Cohort1 120min   
6     Filename_ABC.wiff (sample 432)     G. [13C-glc] Cohort1 240min   
7     Filename_ABC.wiff (sample 433)  H. [6,6-DD-glc] Cohort1 240min   
8     Filename_ABC.wiff (sample 434)       I. [13C-glc] Cohort2 0min   
9     Filename_ABC.wiff (sample 435)       J. [13C-glc] Cohort2 5min   
10    Filename_ABC.wiff (sample 436)      K. [13C-glc] Cohort2 15min   
11    Filename_ABC.wiff (sample 437)      L. [13C-glc] Cohort2 30min   
12    Filename_ABC.wiff (sample 438)      M. [13C-glc] Cohort2 6

**Get the natural abundance value (na) of the isotracer present in the compound.**

In [5]:
final_df[const.NA_CORRECTED_COL]=0.0
output_df= pd.DataFrame()
metab_dict={}

na= get_isotope_na(isotracer[0], isotope_dict)

**PARENT_NUM_ATOMS - Total number of atoms of the isotracer element in the parent formula.**

**DAUGHTER_NUM_ATOMS - Total number of atoms of the isotracer element in the daughter formula.**

**PARENT_NUM_LABELED_ATOMS - Number of labeled atoms of the isotracer element in the parent formula.**

**DAUGHTER_NUM_LABELED_ATOMS - Number of labeled atoms of the isotracer element in the daughter formula.**

In [6]:
final_df['A']=(1 + na * (final_df[const.PARENT_NUM_ATOMS]-final_df[const.PARENT_NUM_LABELED_ATOMS]))
final_df['B']= na * ((final_df[const.PARENT_NUM_ATOMS]-final_df[const.DAUGHTER_NUM_ATOMS]) -\
                         (final_df[const.PARENT_NUM_LABELED_ATOMS]-final_df[const.DAUGHTER_NUM_LABELED_ATOMS]-1))
final_df['C']=  na * (final_df[const.DAUGHTER_NUM_ATOMS]-final_df[const.DAUGHTER_NUM_LABELED_ATOMS]+1)
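
As a worked illustration of the three coefficients, the sketch below evaluates them for the citrate transitions 191/111 and 192/111 (parent C6H7O7 with 6 carbons, daughter C5H3O3 with 5 carbons). The column names are hypothetical stand-ins for the corna constants, and `NA_C13 = 0.0111` is an assumed value: it is chosen because it reproduces the first row of the final output (7.698357e+06 × 1.0666 ≈ 8.211068e+06), not because it was read from corna.

```python
import pandas as pd

NA_C13 = 0.0111  # assumed illustrative value; the notebook obtains `na` from corna

# Hypothetical column names; the notebook uses corna.constants for the real ones.
df = pd.DataFrame({
    "parent_atoms": [6, 6],        # carbons in C6H7O7
    "daughter_atoms": [5, 5],      # carbons in C5H3O3
    "parent_labeled": [0, 1],      # transitions 191/111 and 192/111
    "daughter_labeled": [0, 0],
})

# Same arithmetic as the cell above, written against the hypothetical columns.
df["A"] = 1 + NA_C13 * (df["parent_atoms"] - df["parent_labeled"])
df["B"] = NA_C13 * ((df["parent_atoms"] - df["daughter_atoms"])
                    - (df["parent_labeled"] - df["daughter_labeled"] - 1))
df["C"] = NA_C13 * (df["daughter_atoms"] - df["daughter_labeled"] + 1)

print(df[["A", "B", "C"]])
```

Under these assumptions the unlabeled transition gets A = 1.0666, B = 0.0222 and C = 0.0666, and the transition with one labeled parent carbon gets A = 1.0555, B = 0.0111 and C = 0.0666.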

**Drop columns not required for processing.**

In [7]:
final_df.drop([const.PARENT_MASS_MOL, const.DAUGHTER_MASS_MOL, const.PARENT_NUM_ATOMS,
                const.DAUGHTER_NUM_ATOMS, const.DAUGHTER_NUM_LABELED_ATOMS, const.PARENT_NUM_LABELED_ATOMS], axis=1, inplace=True)

**Create a dictionary that maps each sample to its fragment intensities, keyed by (parent isotopic mass, daughter isotopic mass), of the form:**

        {'SAMPLE 2_10':{
            (191, 111): 2345.75, (192, 111):5644.847
            }
        }

In [8]:
for samp in final_df.Sample.unique():
    
    metab_df = final_df[final_df[multiquant.SAMPLE]==samp]
    frag_dict={}
    for index, row in metab_df.iterrows():
        frag_dict[(row[const.PARENT_MASS_ISO],row[const.DAUGHTER_MASS_ISO])]=row[intensity_col]
    
    metab_dict[samp]= frag_dict
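
The same nested dictionary can also be built with `groupby`, which avoids filtering the dataframe once per sample. This is only a sketch of an alternative: the column names and the third row of data are hypothetical, and the notebook's own loop is what the later cells rely on.

```python
import pandas as pd

# Hypothetical column names and partly made-up intensities.
toy = pd.DataFrame({
    "Sample": ["SAMPLE 2_10", "SAMPLE 2_10", "SAMPLE 2_11"],
    "parent_mass": [191, 192, 191],
    "daughter_mass": [111, 111, 111],
    "Intensity": [2345.75, 5644.847, 1200.0],
})

metab_dict = {}
for sample, group in toy.groupby("Sample"):
    # Pair each (parent mass, daughter mass) key with its intensity.
    keys = zip(group["parent_mass"], group["daughter_mass"])
    metab_dict[sample] = dict(zip(keys, group["Intensity"]))

print(metab_dict["SAMPLE 2_10"][(191, 111)])
```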

**In each sample, correct the intensity of each daughter fragment one by one, using the intensities of the transitions whose parent is one mass unit lighter (M-1).**

In [9]:
for samp in final_df.Sample.unique():
    # Work on a copy so the assignment below does not target a slice of final_df.
    metab_df = final_df[final_df[multiquant.SAMPLE]==samp].copy()
    frag= metab_dict[samp]

    for index, row in metab_df.iterrows():
        m_n= row[const.DAUGHTER_MASS_ISO]
        m_1_n= row[const.PARENT_MASS_ISO]-1
        m_n_1= row[const.DAUGHTER_MASS_ISO]-1
        intensity_m_n= row[intensity_col]
        # A transition that was not measured contributes an intensity of 0.
        try:
            intensity_m_1_n= frag[m_1_n, m_n]
        except KeyError:
            intensity_m_1_n=0
        try:
            intensity_m_1_n_1= frag[m_1_n, m_n_1]
        except KeyError:
            intensity_m_1_n_1= 0

        corrected= intensity_m_n * row['A']  - intensity_m_1_n * row['B'] -\
                                                intensity_m_1_n_1 * row['C']
        # DataFrame.set_value was removed in pandas 1.0; .at sets a single cell.
        metab_df.at[index, const.NA_CORRECTED_COL] = corrected

    # DataFrame.append was removed in pandas 2.0; pd.concat replaces it.
    output_df= pd.concat([output_df, metab_df])
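
To make the arithmetic of the loop easier to follow, here is a dictionary-only sketch of the correction for a single sample. The function name `correct_sample` and the `coeffs` structure are hypothetical. The two raw intensities are taken from the merged table at the end of this notebook, and the coefficients assume `na = 0.0111`, a value inferred from that table rather than read from corna.

```python
def correct_sample(frag, coeffs):
    """Return NA corrected intensities for one sample.

    frag   : {(parent_mass, daughter_mass): raw intensity}
    coeffs : {(parent_mass, daughter_mass): (A, B, C)}
    """
    corrected = {}
    for (parent, daughter), intensity in frag.items():
        a, b, c = coeffs[(parent, daughter)]
        # Transitions that were not measured contribute zero,
        # matching the KeyError fallback in the notebook loop.
        lighter_same_daughter = frag.get((parent - 1, daughter), 0.0)
        lighter_both = frag.get((parent - 1, daughter - 1), 0.0)
        corrected[(parent, daughter)] = (
            intensity * a - lighter_same_daughter * b - lighter_both * c
        )
    return corrected


# Raw intensities of Citrate 191/111 and 192/111 in sample 426.
frag = {(191, 111): 7.698357e6, (192, 111): 1.038591e5}
# Coefficients (A, B, C) under the assumed na = 0.0111.
coeffs = {
    (191, 111): (1.0666, 0.0222, 0.0666),
    (192, 111): (1.0555, 0.0111, 0.0666),
}

result = correct_sample(frag, coeffs)
print(result)
```

Under these assumptions the two results agree with the NA Corrected values reported for these transitions (8.211068e+06 and 2.417146e+04) to within rounding of the displayed figures.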

**Filter the output dataframe to extract the required columns.**

In [10]:
output_df= output_df.filter(REQUIRED_COL)
print(output_df)

      Formula            Label              Name  \
0      C5H3O3  C13_191.0_111.0   Citrate 191/111   
144    C5H3O3  C13_192.0_111.0   Citrate 191/111   
288    C5H3O3  C13_192.0_112.0   Citrate 191/111   
432    C5H3O3  C13_193.0_112.0   Citrate 191/111   
576    C5H3O3  C13_193.0_113.0   Citrate 191/111   
720    C5H3O3  C13_194.0_113.0   Citrate 191/111   
864    C5H3O3  C13_194.0_114.0   Citrate 191/111   
1008   C5H3O3  C13_195.0_114.0   Citrate 191/111   
1152   C5H3O3  C13_195.0_115.0   Citrate 191/111   
1296   C5H3O3  C13_196.0_115.0   Citrate 191/111   
1440   C5H3O3  C13_196.0_116.0   Citrate 191/111   
1584   C5H3O3  C13_197.0_116.0   Citrate 191/111   
1728    C4H3O   C13_191.0_67.0    Citrate 191/67   
1872    C4H3O   C13_192.0_67.0    Citrate 191/67   
2016    C4H3O   C13_192.0_68.0    Citrate 191/67   
2160    C4H3O   C13_193.0_67.0    Citrate 191/67   
2304    C4H3O   C13_193.0_68.0    Citrate 191/67   
2448    C4H3O   C13_193.0_69.0    Citrate 191/67   
2592    C4H3

**Calculate Fractional Enrichment**

In [11]:
fractional_enriched_df = fractional_enrichment(output_df)
print(fractional_enriched_df)

                              Sample              Name            Label  \
0     Filename_ABC.wiff (sample 426)   Citrate 191/111  C13_191.0_111.0   
1     Filename_ABC.wiff (sample 426)   Citrate 191/111  C13_192.0_111.0   
2     Filename_ABC.wiff (sample 426)   Citrate 191/111  C13_192.0_112.0   
3     Filename_ABC.wiff (sample 426)   Citrate 191/111  C13_193.0_112.0   
4     Filename_ABC.wiff (sample 426)   Citrate 191/111  C13_193.0_113.0   
5     Filename_ABC.wiff (sample 426)   Citrate 191/111  C13_194.0_113.0   
6     Filename_ABC.wiff (sample 426)   Citrate 191/111  C13_194.0_114.0   
7     Filename_ABC.wiff (sample 426)   Citrate 191/111  C13_195.0_114.0   
8     Filename_ABC.wiff (sample 426)   Citrate 191/111  C13_195.0_115.0   
9     Filename_ABC.wiff (sample 426)   Citrate 191/111  C13_196.0_115.0   
10    Filename_ABC.wiff (sample 426)   Citrate 191/111  C13_196.0_116.0   
11    Filename_ABC.wiff (sample 426)   Citrate 191/111  C13_197.0_116.0   
12    Filename_ABC.wiff (

In [12]:
df= pd.merge(output_df, fractional_enriched_df, on=['Label', 'Sample', 'Name', 'Formula'])
df

| | Formula | Label | Name | Sample | Cohort Name | Component Name | Intensity | Parent_formula | NA Corrected | Pool_total | Fractional enrichment |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | C5H3O3 | C13_191.0_111.0 | Citrate 191/111 | Filename_ABC.wiff (sample 426) | A. [13C-glc] Cohort1 0min | Citrate 191/111 | 7.698357e+06 | C6H7O7 | 8.211068e+06 | 8.281799e+06 | 0.991459 |
| 1 | C5H3O3 | C13_192.0_111.0 | Citrate 191/111 | Filename_ABC.wiff (sample 426) | A. [13C-glc] Cohort1 0min | Citrate 192/111 | 1.038591e+05 | C6H7O7 | 2.417146e+04 | 8.281799e+06 | 0.002919 |
| 2 | C5H3O3 | C13_192.0_112.0 | Citrate 191/111 | Filename_ABC.wiff (sample 426) | A. [13C-glc] Cohort1 0min | Citrate 192/112 | 4.080285e+05 | C6H7O7 | 3.415317e+03 | 8.281799e+06 | 0.000412 |
| 3 | C5H3O3 | C13_193.0_112.0 | Citrate 191/111 | Filename_ABC.wiff (sample 426) | A. [13C-glc] Cohort1 0min | Citrate 193/112 | 5.970184e+03 | C6H7O7 | -4.058034e+03 | 8.281799e+06 | 0.000000 |
| 4 | C5H3O3 | C13_193.0_113.0 | Citrate 191/111 | Filename_ABC.wiff (sample 426) | A. [13C-glc] Cohort1 0min | Citrate 193/113 | 5.817237e+04 | C6H7O7 | 4.263876e+04 | 8.281799e+06 | 0.005148 |
| 5 | C5H3O3 | C13_194.0_113.0 | Citrate 191/111 | Filename_ABC.wiff (sample 426) | A. [13C-glc] Cohort1 0min | Citrate 194/113 | 0.000000e+00 | C6H7O7 | -9.107895e+02 | 8.281799e+06 | 0.000000 |
| 6 | C5H3O3 | C13_194.0_114.0 | Citrate 191/111 | Filename_ABC.wiff (sample 426) | A. [13C-glc] Cohort1 0min | Citrate 194/114 | 2.364000e+03 | C6H7O7 | 5.055812e+02 | 8.281799e+06 | 0.000061 |
| 7 | C5H3O3 | C13_195.0_114.0 | Citrate 191/111 | Filename_ABC.wiff (sample 426) | A. [13C-glc] Cohort1 0min | Citrate 195/114 | 0.000000e+00 | C6H7O7 | -2.624040e+01 | 8.281799e+06 | 0.000000 |
| 8 | C5H3O3 | C13_195.0_115.0 | Citrate 191/111 | Filename_ABC.wiff (sample 426) | A. [13C-glc] Cohort1 0min | Citrate 195/115 | 0.000000e+00 | C6H7O7 | -5.248080e+01 | 8.281799e+06 | 0.000000 |
| 9 | C5H3O3 | C13_196.0_115.0 | Citrate 191/111 | Filename_ABC.wiff (sample 426) | A. [13C-glc] Cohort1 0min | Citrate 196/115 | 0.000000e+00 | C6H7O7 | 0.000000e+00 | 8.281799e+06 | 0.000000 |
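
As a quick consistency check on the rows displayed above, the sketch below recomputes the pool total and the fractional enrichment from the NA Corrected column. It assumes that negative corrected intensities count as zero, which is inferred from the displayed numbers rather than taken from corna's source. Only the ten displayed rows are used, so agreement is to within rounding.

```python
# Values copied from the "NA Corrected" column of the rows displayed above.
na_corrected = [
    8.211068e+06, 2.417146e+04, 3.415317e+03, -4.058034e+03, 4.263876e+04,
    -9.107895e+02, 5.055812e+02, -2.624040e+01, -5.248080e+01, 0.0,
]
pool_total_shown = 8.281799e+06  # "Pool_total" column of the same rows

# Assumption: negative corrected intensities are treated as zero.
clipped = [max(value, 0.0) for value in na_corrected]
pool_total = sum(clipped)
fractions = [value / pool_total for value in clipped]

print(pool_total)
print(fractions)
```

The recomputed pool total differs from the displayed 8.281799e+06 by less than one intensity unit, and the recomputed fractions agree with the Fractional enrichment column to the displayed precision.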
