# Improving the chemical profiling of complex natural extracts by joint 13C NMR and LC-HRMS2 analysis and the querying of in silico generated chemical databases.

#### Julien Cordonnier,a,b Simon Remy,b* Alexis Kotland,d Ritchy Leroy,b Pierre Darme,a,b Benjamin Bertaux,b Charlotte Sayagh,b Agathe Martinez,b Nicolas Borie,b Jane Hubert,d Dominique Aubert,a,c Isabelle Villena,a,c Jean-Marc Nuzillard,b Jean-Hugues Renault b*

##### a University of Reims Champagne Ardenne, ESCAPE EA7510, 51097 Reims, France 
##### b University of Reims Champagne Ardenne, CNRS, ICMR 7312, 51097 Reims, France  
##### c University of Reims Champagne Ardenne, CRB National reference Centre on Toxoplasmosis, 51097 Reims, France
##### d NatExplore, 51140 Prouilly, France
##### *Correspondence should be addressed to S.R. (simon.remy@univ-reims.fr)


# CATHEDRAL

## This script was developed to cross-validate annotations produced by different annotation tools (FBMN, NAP, SIRIUS, CaraMel) during the dereplication workflow.

#### This workflow was developed by PhD student Julien Cordonnier and the CSN lab (ICMR - UMR CNRS 7312 - Université de Reims Champagne Ardenne, FRANCE).




Depending on which annotation tools you have used, some of the following steps may not be necessary.

Please check before starting:
  - that you run the following script in an RDKit environment. If not, take a look at https://www.rdkit.org/docs/Install.html#how-to-install-rdkit-with-conda 
  - that all the necessary packages are available in the environment (pandas and tqdm must be installed; math, glob and stat ship with Python).
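
The checklist above can be verified programmatically. The following is a minimal sketch, not part of the original workflow: the helper name `missing_packages` is hypothetical, and the list of third-party packages is taken from the import cell of this notebook.

```python
import importlib.util


def missing_packages(names):
    """Return the package names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Third-party packages imported by this notebook; the others ship with Python.
required = ["pandas", "tqdm", "rdkit"]
missing = missing_packages(required)
if missing:
    print("Please install: " + ", ".join(missing))
else:
    print("All required packages are available.")
```

`find_spec` only looks a package up without importing it, so the check is fast and has no side effects.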

##### /!\ The only 4 paths you will have to replace are: /!\
 - the path to your recap file  **/!\ This file should be placed in the current directory /!\**
 - the path to your SIRIUS project folder (must end with /*)
 - the path to your 13C NMR candidates file
 - the path where the comparison results will be saved

##### These paths must be set in the cell below. 
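
Before running the rest of the notebook, it may help to check the four paths in one go. The sketch below is an optional addition under stated assumptions: `check_paths` is a hypothetical helper, and it only tests the constraints listed above (files exist, the SIRIUS path ends with `/*`, the output folder exists).

```python
import glob
import os


def check_paths(recap_file, sirius_pattern, nmr_file, output_file):
    """Return a list of problems; an empty list means every path looks usable."""
    problems = []
    if not os.path.isfile(recap_file):
        problems.append("recap file not found: " + recap_file)
    if not sirius_pattern.endswith("/*"):
        problems.append("SIRIUS project path must end with /*")
    elif not glob.glob(sirius_pattern):
        problems.append("SIRIUS project folder is empty or missing: " + sirius_pattern)
    if not os.path.isfile(nmr_file):
        problems.append("13C NMR candidates file not found: " + nmr_file)
    # The results file does not exist yet, so only its parent folder is checked.
    out_dir = os.path.dirname(output_file) or "."
    if not os.path.isdir(out_dir):
        problems.append("output folder does not exist: " + out_dir)
    return problems
```

Once the four variables are defined, `check_paths(recap_file_path, sirus_project_path, CNMR_candidates_path, resume_file_path)` should return an empty list.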


In [1]:
recap_file_path = './recap230106.csv'
sirus_project_path ='C:/Users/jcrdnr/Desktop/Experimental_Data_Larix_decidua/sirius_190922/*'
CNMR_candidates_path ='C:/Users/jcrdnr/Desktop/Experimental_Data_Larix_decidua/smile_23_nmr_iso.txt'
resume_file_path ='C:/Users/jcrdnr/Desktop/article_larix/230413/SI/Scripts/df_resume_confidence_230814.tsv'

In [2]:
# recap_file_path = input('Please, enter the path for the recap.csv file. ')
# sirus_project_path = input('Please, enter the path to your SIRIUS project. (must end with /*) ')
# CNMR_candidates_path = input('Please, enter the path for the 13C NMR candidates file. ')
# resume_file_path = input('Please, enter the path to save the comparisons resume. ')

## → First step, import the required packages

In [3]:
import math
import pandas as pd
import rdkit
import os
import glob
import sys
import stat
from tqdm import tqdm
from rdkit.Chem import AllChem as Chem

## → Create an empty dictionary that will summarize the cross-validation results

In [4]:
Dict_sumup = {
    'Feature_SU':{},
    'm/z':{},
    'Rt_SU':{},
    'GNPS_SU':{},
    'NAP_SU':{},
    'MolNetEnhancer_SU':{},
    'SIRIUS_SU':{},
    'Not_Matching_Tool_Annotation_SU':{},
    'Is_NMR_Annotated_SU':{},
    '3rd_Tool_SU':{},
    'Molecular_Name_SU':{},
    'Confidence_Level':{}
                }
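
Each key of this dictionary is a column, and each inner dictionary maps a row number to a value. As an illustration only, the sketch below shows one way to append a row to such a structure while filling unspecified columns with `None`. The helpers `new_summary` and `add_row` are hypothetical and are not used elsewhere in this notebook.

```python
COLUMNS = [
    'Feature_SU', 'm/z', 'Rt_SU', 'GNPS_SU', 'NAP_SU', 'MolNetEnhancer_SU',
    'SIRIUS_SU', 'Not_Matching_Tool_Annotation_SU', 'Is_NMR_Annotated_SU',
    '3rd_Tool_SU', 'Molecular_Name_SU', 'Confidence_Level',
]


def new_summary():
    """Create an empty column -> {row: value} dictionary."""
    return {column: {} for column in COLUMNS}


def add_row(summary, values):
    """Append one row; columns missing from `values` are set to None."""
    unknown = set(values) - set(summary)
    if unknown:
        raise KeyError("unknown column(s): " + ", ".join(sorted(unknown)))
    row = len(summary['Feature_SU'])  # next free row number
    for column in summary:
        summary[column][row] = values.get(column)
    return row


summary = new_summary()
add_row(summary, {'Feature_SU': 190, 'GNPS_SU': 'CCMSLIB00005739139', 'Confidence_Level': '9'})
```

Writing all columns in a single call keeps every inner dictionary the same length, which matters when the summary is later converted to a table.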

## /!\ Open the recap metadata file that merges the metadata coming from IIMN, FBMN, NAP, MolNetEnhancer /!\

## /!\ This file should be placed in the current directory /!\
The file name is taken from recap_file_path, defined in the first cell.

The final *confidence score* is set to 0 by default.

In [5]:
df = pd.read_csv(recap_file_path, sep=';')
df['All_Tools_confidence'] = 0 

If you want to visualize the dataframe, run the cell below.

In [6]:
df.head()

Unnamed: 0,Adduct,AllGroups,Analog:Adduct,Analog:Compound_Name,Analog:Compound_Source,Analog:Data_Collector,Analog:GNPSLibraryURL,Analog:IIN Best Ion=Library Adduct,Analog:INCHI,Analog:Instrument,...,sum(precursor intensity),sum.precursor.intensity.,superclass,tags,TIC_Query,UniqueFileSources,Unnamed: 25,UpdateWorkflowName,PRED,All_Tools_confidence
0,,,,,,,,,,,...,4117.484887,,,,,,,,3,0
1,,,,,,,,,,,...,41457.6672,,,,,,,,3,0
2,,,,,,,,,,,...,5492.948748,,,,,,,,3,0
3,M+H,,,,,,,,,,...,21911.78675,,Phenylpropanoids and polyketides,,4141.0,,,UPDATE-SINGLE-ANNOTATED-BRONZE,1,0
4,,,,,,,,,,,...,1111.107542,,,,,,,,3,0


## 1. If your data contains GNPS annotations through the FBMN and/or IIMN, you should run the following cells.
This will select the corresponding features (PRED value = 1).

**/!\**  If your data contains NAP annotations, note that the present script handles **at most 3 MetFrag candidates** **/!\**

**/!\** Fusion and Consensus candidates will not be considered **/!\**

In [7]:
# .copy() makes df1 an independent dataframe, so the new columns below are
# added without triggering pandas' SettingWithCopyWarning.
df1=df.loc[df['PRED']==1].copy()
if len(df1['MetFragID'].tolist()) != 0:
    df1[['MetFragID1', 'MetFragID2', 'MetFragID3']] = df1['MetFragID'].str.split(',', expand=True)
    df1[['MetFragSMILES1', 'MetFragSMILES2', 'MetFragSMILES3']] = df1['MetFragSMILES'].str.split(',', expand=True)
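
The cell above assumes that splitting the MetFrag fields yields exactly three columns. As a hedged illustration of the "at most 3 candidates" rule, the sketch below splits one comma-separated field into three slots, padding with `None` and dropping extra entries. The function `split_candidates` is hypothetical, and the example values are placeholders.

```python
def split_candidates(field, max_candidates=3):
    """Split a comma-separated field into exactly `max_candidates` slots."""
    # Missing values arrive as None or NaN; NaN is the only value not equal to itself.
    if field is None or field != field:
        return [None] * max_candidates
    parts = [part.strip() for part in str(field).split(',')]
    parts = parts[:max_candidates]  # ignore candidates beyond the limit
    return parts + [None] * (max_candidates - len(parts))


print(split_candidates('ID_A,ID_B'))
print(split_candidates('ID_A,ID_B,ID_C,ID_D'))
```

Applied row by row, such a helper always produces three values, whatever the number of candidates reported for a feature.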

### 1.1 If your data contains NAP MetFrag annotations, please run the following cell to compare them to FBMN / IIMN annotations.

In [8]:
list_ID1=[]
for i in range(len(df1['shared name'].tolist())):
    
    
    if '.' in df1['Smiles'].tolist()[i]:
        smiles = df1['Smiles'].tolist()[i].split('.')[0]
    else:
        smiles = str(df1['Smiles'].tolist()[i])
#     print(smiles)
    mref=Chem.MolFromSmiles(smiles)
    iref=Chem.MolToInchi(mref)
    ikref=Chem.rdinchi.InchiToInchiKey(iref).split('-')[0]
    
    
    
    if df1['MetFragSMILES1'].tolist()[i] != None:
        try:

            m1=Chem.MolFromSmiles(df1['MetFragSMILES1'].tolist()[i])
            i1=Chem.MolToInchi(m1)
            ik1=Chem.rdinchi.InchiToInchiKey(i1).split('-')[0]
            if ik1 == ikref:
                LID1 = 'MetFrag_1_'+df1['MetFragID1'].tolist()[i]
                list_ID1.append((int(df1['shared name'].tolist()[i]),LID1, df1['MetFragSMILES1'].tolist()[i]))
        except:
            continue
        
            
    if df1['MetFragSMILES2'].tolist()[i] != None:   
        try:

            m2=Chem.MolFromSmiles(df1['MetFragSMILES2'].tolist()[i])
            i2=Chem.MolToInchi(m2)
            ik2=Chem.rdinchi.InchiToInchiKey(i2).split('-')[0]
            if ik2 == ikref:
                LID2= 'MetFrag_2_'+df1['MetFragID2'].tolist()[i]
                list_ID1.append((int(df1['shared name'].tolist()[i]), LID2, df1['MetFragSMILES2'].tolist()[i]))
            
        except:
            continue
        
        
    if df1['MetFragSMILES3'].tolist()[i] != None:   
        try:

            m3=Chem.MolFromSmiles(df1['MetFragSMILES3'].tolist()[i])
            i3=Chem.MolToInchi(m3)
            ik3=Chem.rdinchi.InchiToInchiKey(i3).split('-')[0]
            if ik3 == ikref:
                LID3 = 'MetFrag_3_'+df1['MetFragID3'].tolist()[i]
                # int() must wrap the feature ID only, not the whole tuple
                list_ID1.append((int(df1['shared name'].tolist()[i]), LID3, df1['MetFragSMILES3'].tolist()[i]))
        except:
            continue

If you want to visualize the comparison results, run the cell below. 

Results are presented as a tuple (FeatureID, MetFrag candidate, SMILES of the candidate). 

In [9]:
list_ID1

[(190, 'MetFrag_2_LTS0004651', 'O=c1c(O)c(-c2ccc(O)c(O)c2)oc2cc(O)cc(O)c12'),
 (51,
  'MetFrag_1_LTS0066122',
  'Oc1cc(O)c2c(c1)O[C@H](c1ccc(O)c(O)c1)[C@H](O)[C@H]2c1c(O)cc(O)c2c1O[C@H](c1ccc(O)c(O)c1)[C@@H](O)C2'),
 (283, 'MetFrag_1_LTS0155822', 'O=c1c(O)c(-c2ccc(O)cc2)oc2cc(O)cc(O)c12'),
 (367, 'MetFrag_1_LTS0155822', 'O=c1c(O)c(-c2ccc(O)cc2)oc2cc(O)cc(O)c12'),
 (60,
  'MetFrag_1_LTS0265245',
  'Oc1cc(O)c2c(c1)O[C@H](c1ccc(O)c(O)c1)[C@H](O)C2'),
 (136,
  'MetFrag_1_LTS0066122',
  'Oc1cc(O)c2c(c1)O[C@H](c1ccc(O)c(O)c1)[C@H](O)[C@H]2c1c(O)cc(O)c2c1O[C@H](c1ccc(O)c(O)c1)[C@@H](O)C2'),
 (237,
  'MetFrag_1_LTS0186298',
  'CC1OC(Oc2c(-c3ccc(O)c(O)c3)oc3cc(O)cc(O)c3c2=O)C(O)C(O)C1O'),
 (220, 'MetFrag_2_LTS0004651', 'O=c1c(O)c(-c2ccc(O)c(O)c2)oc2cc(O)cc(O)c12')]
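
The comparison in section 1.1 relies on the first block of the InChIKey, which encodes the atom connectivity only, so two stereoisomers are treated as the same 2D structure. The sketch below illustrates this with plain strings. The first blocks come from the outputs of this notebook, whereas the second blocks of `key_a` and `key_b` are placeholders, not real hashes.

```python
def first_block(inchikey):
    """Return the connectivity block of an InChIKey (text before the first '-')."""
    return inchikey.split('-')[0]


def same_2d_structure(key_a, key_b):
    """True when two InChIKeys describe the same connectivity, ignoring stereochemistry."""
    return first_block(key_a) == first_block(key_b)


# Placeholder stereo blocks (AAAAAAAAAA / BBBBBBBBBB) stand for two stereoisomers.
key_a = 'PFTAWBLQPZVEMU-AAAAAAAAAA-N'
key_b = 'PFTAWBLQPZVEMU-BBBBBBBBBB-N'
key_c = 'REFJWTPEDVJJIY-UHFFFAOYSA-N'

print(same_2d_structure(key_a, key_b))
print(same_2d_structure(key_a, key_c))
```

This is why a match at this level confirms a planar structure but cannot distinguish stereoisomers.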

### 1.2 If your data were annotated with SIRIUS (following the format of version 5.5.3), you can compare these annotations to the FBMN / IIMN ones.
Please run the following cells.

##### /!\ Open the corresponding SIRIUS project folder /!\

In [10]:
files = glob.glob(sirus_project_path)
list_folder = [file for file in files if '.' not in str(file)]
print('There are '+ str(len(list_folder)) + ' features in your SIRIUS project. ')

There are 1915 features in your SIRIUS project. 
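
In the next cell, each feature is linked to its SIRIUS folder through the end of the folder name, which must be `_` followed by the feature ID. The sketch below reproduces this rule on hypothetical folder names. `folders_for_feature` is an illustrative helper, not a function of this notebook.

```python
def folders_for_feature(folders, feature_id):
    """Return the folders whose name ends with '_<feature ID>'."""
    suffix = '_' + str(int(feature_id))
    return [folder for folder in folders if folder.endswith(suffix)]


# Hypothetical folder names, for illustration only.
folders = ['proj/12_sample_19', 'proj/47_sample_190', 'proj/3_sample_1190']

print(folders_for_feature(folders, 190))
print(folders_for_feature(folders, 19))
```

The leading underscore in the suffix is what prevents feature 190 from matching a folder ending in `1190`. Note also that the cell above keeps only paths that contain no `.` character, so a dot anywhere in the project path would exclude every folder.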


In [11]:
list_sirius_inchikey=[]
for i in range(len(df1['shared name'].tolist())):
    

    
    if '.' in df1['Smiles'].tolist()[i]:
        smiles = df1['Smiles'].tolist()[i].split('.')[0]
    else:
        smiles = df1['Smiles'].tolist()[i]
    mref=Chem.MolFromSmiles(smiles)
    iref=Chem.MolToInchi(mref)
    ikref=Chem.rdinchi.InchiToInchiKey(iref).split('-')[0]

    
    

    filein =[x for x in list_folder if  str('_' + str(int(df1['shared name'].tolist()[i]))) == x[-len(str('_' + str(int(df1['shared name'].tolist()[i])))):]]
    
    if len(filein) != 0:
        
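        # Caution: on Windows os.chmod only toggles the read-only flag, but on Linux/macOS
        # this mode (read for "others" only) removes the owner's access to the folder;
        # skip this line on those systems.
        os.chmod(filein[0], stat.S_IROTH)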

        if os.path.exists(filein[0]+'/structure_candidates.tsv'):
           
            df_sirius_formula = pd.read_csv(filein[0]+'/structure_candidates.tsv', sep='\t')


            for z in range(len(df_sirius_formula['InChIkey2D'].tolist())):
                
                if df_sirius_formula['InChIkey2D'].tolist()[z] == ikref:
                    aa = str(df_sirius_formula['InChIkey2D'].tolist()[z]) + '_structure_' + str(df_sirius_formula['molecularFormula'].tolist()[z]) +  df_sirius_formula['adduct'].tolist()[z] 

                    list_sirius_inchikey.append((int(df1['shared name'].tolist()[i]),aa))

If you want to visualize the comparison results, run the cell below. 

Results are presented as a tuple (FeatureID, InChIKey of the matching candidate + chemical formula + adduct type). 

In [12]:
list_sirius_inchikey

[(343, 'REFJWTPEDVJJIY_structure_C15H10O7[M + H]+'),
 (190, 'REFJWTPEDVJJIY_structure_C15H10O7[M + H]+'),
 (422, 'ZQHJXKYYELWEOK_structure_C20H28O3[M + H]+'),
 (51, 'XFZJEEAOWLFHDH_structure_C30H26O12[M + H]+'),
 (488, 'ZQHJXKYYELWEOK_structure_C20H28O3[M + H]+'),
 (283, 'IYRMWMYZSQPJKC_structure_C15H10O6[M + H]+'),
 (367, 'IYRMWMYZSQPJKC_structure_C15H10O6[M + H]+'),
 (182, 'IKGXIBQEEMLURG_structure_C27H30O16[M + H]+'),
 (867, 'ZQHJXKYYELWEOK_structure_C20H28O3[M + H]+'),
 (334, 'ZQHJXKYYELWEOK_structure_C20H28O3[M + H]+'),
 (426, 'ZQHJXKYYELWEOK_structure_C20H28O3[M + H]+'),
 (60, 'PFTAWBLQPZVEMU_structure_C15H14O6[M + H]+'),
 (136, 'XFZJEEAOWLFHDH_structure_C30H26O12[M + H]+'),
 (720, 'ZQHJXKYYELWEOK_structure_C20H28O3[M + H]+'),
 (569, 'ZQHJXKYYELWEOK_structure_C20H28O3[M + H]+'),
 (237, 'OXGUCUVFOIWWQJ_structure_C21H20O11[M + H]+'),
 (220, 'REFJWTPEDVJJIY_structure_C15H10O7[M + H]+'),
 (1366, 'ZQHJXKYYELWEOK_structure_C20H28O3[M + H]+'),
 (280, 'OVSQVDMCBVZWGM_structure_C21H20O12[
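
Each SIRIUS match is stored as a single string of the form `<InChIKey>_structure_<formula><adduct>`. The sketch below shows one way to split such a label back into its three parts. `parse_sirius_label` is a hypothetical helper and is not called elsewhere in the notebook.

```python
def parse_sirius_label(label):
    """Split '<InChIKey>_structure_<formula><adduct>' into its three parts."""
    inchikey_2d, _, rest = label.partition('_structure_')
    bracket = rest.find('[')  # the adduct starts at the first '['
    if bracket == -1:
        return inchikey_2d, rest, ''
    return inchikey_2d, rest[:bracket], rest[bracket:]


print(parse_sirius_label('REFJWTPEDVJJIY_structure_C15H10O7[M + H]+'))
```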

### 1.3 If your data were annotated with FBMN / IIMN, NAP and SIRIUS (following the format of version 5.5.3), you can compare them all together.
Please run the following cells.

#### 1.3.1 Which nodes share only a GNPS_NAP or only a GNPS_SIRIUS annotation?

In [13]:
common=[z[0] for z in list_ID1 if z[0] in [x[0] for x in list_sirius_inchikey]] 
not_common_NAP_pro=[z[0] for z in list_ID1 if z[0] not in [x[0] for x in list_sirius_inchikey]]
not_common_SIRIUS_pro=[z[0] for z in list_sirius_inchikey if z[0] not in [x[0] for x in list_ID1]]   

print(str('Features that have only GNPS_NAP common annotation are: ' + str(not_common_NAP_pro)) + '\n' +  str('Features that have only GNPS_SIRIUS common annotation are: ' + 
str(not_common_SIRIUS_pro)))

Features that have only GNPS_NAP common annotation are: []
Features that have only GNPS_SIRIUS common annotation are: [343, 422, 488, 182, 867, 334, 426, 720, 569, 1366, 280, 239]
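
The three lists computed in this step are a partition of the feature IDs. The sketch below expresses the same logic with sets, using the feature IDs printed earlier in this notebook (the last SIRIUS ID is taken from the result above, because the list display is truncated). The function name `partition_features` is hypothetical.

```python
def partition_features(nap_ids, sirius_ids):
    """Return (common, only_nap, only_sirius), keeping the input order."""
    nap_set, sirius_set = set(nap_ids), set(sirius_ids)
    common = [i for i in nap_ids if i in sirius_set]
    only_nap = [i for i in nap_ids if i not in sirius_set]
    only_sirius = [i for i in sirius_ids if i not in nap_set]
    return common, only_nap, only_sirius


nap_ids = [190, 51, 283, 367, 60, 136, 237, 220]
sirius_ids = [343, 190, 422, 51, 488, 283, 367, 182, 867, 334, 426,
              60, 136, 720, 569, 237, 220, 1366, 280, 239]

common, only_nap, only_sirius = partition_features(nap_ids, sirius_ids)
print(only_nap)
print(only_sirius)
```

Building the sets once avoids rebuilding a list of IDs for every membership test, which matters for projects with many features.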


#### 1.3.2 Which nodes share a GNPS_NAP_SIRIUS annotation?
Results are presented as a tuple: (FeatureID, GNPS matched reference spectrum, MetFrag candidate, InChIKey + chemical formula + adduct type)

In [14]:
common_sumup=[]
for a in common:
    gnps = df1['SpectrumID'].tolist()[df1['shared name'].tolist().index(a)]
    for x in list_ID1:
        if x[0] == a:
            b=x[1]
    for y in list_sirius_inchikey:
        if y[0] ==a:
            c = y[1]
    common_sumup.append((int(a),gnps,b,c))

common_sumup

[(190,
  'CCMSLIB00005739139',
  'MetFrag_2_LTS0004651',
  'REFJWTPEDVJJIY_structure_C15H10O7[M + H]+'),
 (51,
  'CCMSLIB00005742589',
  'MetFrag_1_LTS0066122',
  'XFZJEEAOWLFHDH_structure_C30H26O12[M + H]+'),
 (283,
  'CCMSLIB00005748053',
  'MetFrag_1_LTS0155822',
  'IYRMWMYZSQPJKC_structure_C15H10O6[M + H]+'),
 (367,
  'CCMSLIB00005749366',
  'MetFrag_1_LTS0155822',
  'IYRMWMYZSQPJKC_structure_C15H10O6[M + H]+'),
 (60,
  'CCMSLIB00005742701',
  'MetFrag_1_LTS0265245',
  'PFTAWBLQPZVEMU_structure_C15H14O6[M + H]+'),
 (136,
  'CCMSLIB00005742589',
  'MetFrag_1_LTS0066122',
  'XFZJEEAOWLFHDH_structure_C30H26O12[M + H]+'),
 (237,
  'CCMSLIB00000085852',
  'MetFrag_1_LTS0186298',
  'OXGUCUVFOIWWQJ_structure_C21H20O11[M + H]+'),
 (220,
  'CCMSLIB00005739139',
  'MetFrag_2_LTS0004651',
  'REFJWTPEDVJJIY_structure_C15H10O7[M + H]+')]

##### Write the results to the summary (Dict_sumup).

In [15]:
numerous=-1
for x in range(len(common_sumup)):
    numerous=numerous+1
    Dict_sumup['Feature_SU'][numerous]=int(common_sumup[x][0])
    Dict_sumup['Rt_SU'][numerous]=None
    Dict_sumup['GNPS_SU'][numerous]=common_sumup[x][1]
    Dict_sumup['NAP_SU'][numerous]=str(common_sumup[x][2].split('_')[2] + ' MF' +  str(common_sumup[x][2].split('_')[1]))
    Dict_sumup['MolNetEnhancer_SU'][numerous]=None
    Dict_sumup['SIRIUS_SU'][numerous]=common_sumup[x][3].split('_')[0] + ' ' + common_sumup[x][3].split('_')[2]
    Dict_sumup['Not_Matching_Tool_Annotation_SU'][numerous]=None
    Dict_sumup['Is_NMR_Annotated_SU'][numerous]=None
    Dict_sumup['3rd_Tool_SU'][numerous]=None
    Dict_sumup['Molecular_Name_SU'][numerous]=None
    Dict_sumup['Confidence_Level'][numerous]=str(9) 
    

##### Apply confidence score = 9 to all nodes that have GNPS_NAP_SIRIUS common annotation

In [16]:
for x in common_sumup:
    # Boolean-mask .loc assignment avoids chained indexing (SettingWithCopyWarning)
    df.loc[df['shared name'] == x[0], 'All_Tools_confidence'] = 9


#### 1.3.3 Get the information for the features that only have a GNPS_NAP common annotation (cf. 1.3.1)

In [17]:
GNPS_NAP_common_sumup=[]
for a in not_common_NAP_pro:
    gnps = df1['SpectrumID'].to_list()[df1['shared name'].to_list().index(a)]
    for x in list_ID1:
        if x[0] == a:
            b=x[1]
            c=x[2]

    GNPS_NAP_common_sumup.append((a,gnps,b,c))

if len(GNPS_NAP_common_sumup) == 0:
    print('There are no common annotations between GNPS and NAP workflows.')
else:
    print(f'There are {len(GNPS_NAP_common_sumup)} features that only have common annotations between GNPS and NAP workflows.' + '\n')
    for x in GNPS_NAP_common_sumup:
        print(x)

There are no common annotations between GNPS and NAP workflows.


##### Write the results to the summary (Dict_sumup).

In [18]:
for x in range(len(GNPS_NAP_common_sumup)):
    numerous=numerous+1
    Dict_sumup['Feature_SU'][numerous]=int(GNPS_NAP_common_sumup[x][0])
    Dict_sumup['Rt_SU'][numerous]=None
    Dict_sumup['GNPS_SU'][numerous]=GNPS_NAP_common_sumup[x][1]
    Dict_sumup['NAP_SU'][numerous]=str(GNPS_NAP_common_sumup[x][2].split('_')[2] + ' MF' +  str(GNPS_NAP_common_sumup[x][2].split('_')[1]))
    Dict_sumup['MolNetEnhancer_SU'][numerous]=None
    Dict_sumup['SIRIUS_SU'][numerous]=None
    Dict_sumup['Not_Matching_Tool_Annotation_SU'][numerous]='SIRIUS'
    Dict_sumup['Is_NMR_Annotated_SU'][numerous]=None
    Dict_sumup['3rd_Tool_SU'][numerous]=None
    Dict_sumup['Molecular_Name_SU'][numerous]=None
    Dict_sumup['Confidence_Level'][numerous]=str(12) 
    

##### Apply confidence score = 12 to all nodes that only have GNPS_NAP common annotation

In [19]:
for x in GNPS_NAP_common_sumup:
    df.loc[df['shared name'] == x[0], 'All_Tools_confidence'] = 12

#### 1.3.4 Get the information for the features that only have a GNPS_SIRIUS common annotation (cf. 1.3.1)

In [20]:
GNPS_SIRIUS_common_sumup=[]
for a in not_common_SIRIUS_pro:
    gnps = df1['SpectrumID'].tolist()[df1['shared name'].tolist().index(a)]

    for y in list_sirius_inchikey:
        if y[0] ==a:
            c = y[1]
    GNPS_SIRIUS_common_sumup.append((a,gnps,c))
if len(GNPS_SIRIUS_common_sumup) == 0:
    print('There is no common annotations between GNPS and SIRIUS workflows.')
else:
    print(f'There are {len(GNPS_SIRIUS_common_sumup)} features that only have common annotation between GNPS and SIRIUS workflows' + '\n')
    for x in GNPS_SIRIUS_common_sumup:
        print(x)

There are 12 features that only have common annotation between GNPS and SIRIUS workflows

(343, 'CCMSLIB00005745873', 'REFJWTPEDVJJIY_structure_C15H10O7[M + H]+')
(422, 'CCMSLIB00006570644', 'ZQHJXKYYELWEOK_structure_C20H28O3[M + H]+')
(488, 'CCMSLIB00006570644', 'ZQHJXKYYELWEOK_structure_C20H28O3[M + H]+')
(182, 'CCMSLIB00000222082', 'IKGXIBQEEMLURG_structure_C27H30O16[M + H]+')
(867, 'CCMSLIB00006570661', 'ZQHJXKYYELWEOK_structure_C20H28O3[M + H]+')
(334, 'CCMSLIB00006570644', 'ZQHJXKYYELWEOK_structure_C20H28O3[M + H]+')
(426, 'CCMSLIB00006570644', 'ZQHJXKYYELWEOK_structure_C20H28O3[M + H]+')
(720, 'CCMSLIB00006570644', 'ZQHJXKYYELWEOK_structure_C20H28O3[M + H]+')
(569, 'CCMSLIB00006570644', 'ZQHJXKYYELWEOK_structure_C20H28O3[M + H]+')
(1366, 'CCMSLIB00006570644', 'ZQHJXKYYELWEOK_structure_C20H28O3[M + H]+')
(280, 'CCMSLIB00005739276', 'OVSQVDMCBVZWGM_structure_C21H20O12[M + H]+')
(239, 'CCMSLIB00005749366', 'IYRMWMYZSQPJKC_structure_C15H10O6[M + H]+')


##### Write the results to the summary (Dict_sumup).

In [21]:
for x in range(len(GNPS_SIRIUS_common_sumup)):
    numerous=numerous+1
    Dict_sumup['Feature_SU'][numerous]=int(GNPS_SIRIUS_common_sumup[x][0])
    Dict_sumup['Rt_SU'][numerous]=None
    Dict_sumup['GNPS_SU'][numerous]=GNPS_SIRIUS_common_sumup[x][1]
    Dict_sumup['NAP_SU'][numerous]=None
    Dict_sumup['MolNetEnhancer_SU'][numerous]=None
    Dict_sumup['SIRIUS_SU'][numerous]=str(GNPS_SIRIUS_common_sumup[x][2].split('_')[0] + ' ' + GNPS_SIRIUS_common_sumup[x][2].split('_')[2])
    Dict_sumup['Not_Matching_Tool_Annotation_SU'][numerous]='NAP'
    Dict_sumup['Is_NMR_Annotated_SU'][numerous]=None
    Dict_sumup['3rd_Tool_SU'][numerous]=None
    Dict_sumup['Molecular_Name_SU'][numerous]=None
    Dict_sumup['Confidence_Level'][numerous]=str(10) 

##### Apply confidence score = 10 to all nodes that only have GNPS_SIRIUS common annotation

In [22]:
for x in GNPS_SIRIUS_common_sumup:
    # Boolean-mask .loc assignment avoids chained indexing (SettingWithCopyWarning)
    df.loc[df['shared name'] == x[0], 'All_Tools_confidence'] = 10


#### 1.3.5 Get the information for the features that only have a NAP_SIRIUS common annotation 


In [23]:
all_common = []
for x in common:
    all_common.append(x)
for x in not_common_NAP_pro:
    all_common.append(x)
for x in not_common_SIRIUS_pro:
    all_common.append(x)
    
not_common=[x for x in df1['shared name'].tolist() if x not in all_common ]

In [24]:
list_ID2=[]
list_ID3=[]
list_ID4=[]
for x in not_common:  

    ik1,ik2,ik3='', '',''
    if df1['MetFragSMILES1'].tolist()[df1['shared name'].tolist().index(x)] != None:
        try:
             
            m1=Chem.MolFromSmiles(df1['MetFragSMILES1'].tolist()[df1['shared name'].tolist().index(x)])
            i1=Chem.MolToInchi(m1)
            ik1=Chem.rdinchi.InchiToInchiKey(i1).split('-')[0]
            

        except:
            continue
        
            
    if df1['MetFragSMILES2'].tolist()[df1['shared name'].tolist().index(x)]!= None:   
        try:
            
            m2=Chem.MolFromSmiles(df1['MetFragSMILES2'].tolist()[df1['shared name'].tolist().index(x)])
            i2=Chem.MolToInchi(m2)
            ik2=Chem.rdinchi.InchiToInchiKey(i2).split('-')[0]
            

            
        except:
            continue
        
        
    if df1['MetFragSMILES3'].tolist()[df1['shared name'].tolist().index(x)]!= None:   
        try:
             
            m3=Chem.MolFromSmiles(df1['MetFragSMILES3'].tolist()[df1['shared name'].tolist().index(x)])
            i3=Chem.MolToInchi(m3)
            ik3=Chem.rdinchi.InchiToInchiKey(i3).split('-')[0]
            

        except:
            continue
    list_ID3.append(x)
    list_ID2.append([ik1, ik2, ik3])
    
    LMF1 = df1['MetFragID1'].tolist()[df1['shared name'].tolist().index(x)]
    LMF2 = df1['MetFragID2'].tolist()[df1['shared name'].tolist().index(x)]
    LMF3 = df1['MetFragID3'].tolist()[df1['shared name'].tolist().index(x)]
    list_ID4.append([LMF1, LMF2, LMF3])

In [25]:
NAP_SIRIUS_sumup=[]
list_sirius_inchikey3=[]
for ID in list_ID3: 

    filein =[x for x in list_folder if  str('_' + str(int(ID))) == x[-len(str('_' + str(int(ID)))):]]
    if len(filein)!=0:
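        # Caution: on Windows os.chmod only toggles the read-only flag, but on Linux/macOS
        # this mode (read for "others" only) removes the owner's access to the folder;
        # skip this line on those systems.
        os.chmod(filein[0], stat.S_IROTH)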

        if os.path.exists(filein[0]+'/structure_candidates.tsv'):
            df_sirius_formula = pd.read_csv(filein[0]+'/structure_candidates.tsv', sep='\t')

            aa=[]
            for z in range(len(df_sirius_formula['InChIkey2D'].tolist())):
                if df_sirius_formula['InChIkey2D'].tolist()[z] in list_ID2[list_ID3.index(ID)]:
                    aa1 = (str('MetFragSMILES_' + str(list_ID2[list_ID3.index(ID)].index(df_sirius_formula['InChIkey2D'].tolist()[z])+1) + '_' +  list_ID4[list_ID3.index(ID)][list_ID2[list_ID3.index(ID)].index(df_sirius_formula['InChIkey2D'].tolist()[z])] +' , '+ str(df_sirius_formula['InChIkey2D'].tolist()[z]) + '_structure_' + str(df_sirius_formula['molecularFormula'].tolist()[z]) + df_sirius_formula['adduct'].tolist()[z] ))

                    list_sirius_inchikey3.append(ID)
                    aa.append([aa1])
                else:
                    aa1=''

                
            if len(aa) !=0:
                NAP_SIRIUS_sumup.append((int(ID), aa))


if len(NAP_SIRIUS_sumup) == 0 :
    print('There is no feature that only has common annotations between NAP and SIRIUS.')
else:
    print(f'There are {len(NAP_SIRIUS_sumup)} features that only have NAP_SIRIUS common annotation, with a different FBMN, IIMN candidate.' + '\n')
    for x in NAP_SIRIUS_sumup:
        print(x)

There are 16 features that only have NAP_SIRIUS common annotation, with a different FBMN, IIMN candidate.

(1453, [['MetFragSMILES_2_LTS0199022 , MJYADMFNVYHSAP_structure_C30H48O4[M + H]+'], ['MetFragSMILES_1_LTS0198220 , IQWUFDBPSLWCGM_structure_C30H48O4[M + H]+']])
(666, [['MetFragSMILES_2_LTS0142664 , BYQLYGRDILHOFF_structure_C20H30O3[M + H]+'], ['MetFragSMILES_3_LTS0092318 , WUENWZUJMIZJPA_structure_C20H30O3[M + H]+'], ['MetFragSMILES_1_LTS0120852 , GDAJTKZIGQRGOD_structure_C20H30O3[M + H]+']])
(1373, [['MetFragSMILES_2_LTS0240388 , RGXBCSRGWBMBCF_structure_C30H46O4[M + H]+'], ['MetFragSMILES_1_LTS0160555 , UPHZQTDFAGIZQK_structure_C30H46O4[M + H]+'], ['MetFragSMILES_3_LTS0266507 , VGVAABHRCNAZRM_structure_C30H46O4[M + H]+']])
(750, [['MetFragSMILES_1_LTS0088733 , NGBRPGLXCQJIPU_structure_C20H26O2[M + H]+']])
(1268, [['MetFragSMILES_3_LTS0193403 , FMUNNDDBCLRMSL_structure_C30H50O2[M + H]+'], ['MetFragSMILES_2_LTS0008250 , FVWJYYTZTCVBKE_structure_C30H50O2[M + H]+'], ['MetFragSMILES

##### Write the results to the summary (Dict_sumup).

In [26]:
for x in range(len(NAP_SIRIUS_sumup)):
    numerous=numerous+1
    Dict_sumup['Feature_SU'][numerous]=int(NAP_SIRIUS_sumup[x][0])
    Dict_sumup['Rt_SU'][numerous]=None
    Dict_sumup['GNPS_SU'][numerous]=None

    # Each match is stored as ['MetFragSMILES_<rank>_<ID> , <InChIKey>_structure_<formula><adduct>']
    list_MF_NAP_SIRIUS=[]
    list_SIRIUS_NAP_SIRIUS=[]
    for match in NAP_SIRIUS_sumup[x][1]:
        parts = match[0].split(' , ')
        nap_part, sirius_part = parts[0], parts[1]
        # '<MetFrag ID> MF<rank>'
        list_MF_NAP_SIRIUS.append(nap_part.split('_')[2] + ' MF' + nap_part.split('_')[1])
        # '<InChIKey> <formula><adduct>'
        list_SIRIUS_NAP_SIRIUS.append(sirius_part.split('_')[0] + ' ' + sirius_part.split('_')[2])
    Dict_sumup['NAP_SU'][numerous]=list_MF_NAP_SIRIUS
    Dict_sumup['SIRIUS_SU'][numerous]=list_SIRIUS_NAP_SIRIUS

    Dict_sumup['MolNetEnhancer_SU'][numerous]=None
    Dict_sumup['Not_Matching_Tool_Annotation_SU'][numerous]='GNPS'
    Dict_sumup['Is_NMR_Annotated_SU'][numerous]=None
    Dict_sumup['3rd_Tool_SU'][numerous]=None
    Dict_sumup['Molecular_Name_SU'][numerous]=None
    Dict_sumup['Confidence_Level'][numerous]='11+' 

##### Apply confidence score = 11+ to all nodes that only have NAP_SIRIUS common annotation, with different FBMN, IIMN candidate

In [27]:
# Cast to object first: the column holds integers so far and '11+' is a string
df['All_Tools_confidence'] = df['All_Tools_confidence'].astype(object)
for x in NAP_SIRIUS_sumup:
    df.loc[df['shared name'] == x[0], 'All_Tools_confidence'] = '11+'
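
The confidence levels assigned in sections 1.3.2 to 1.3.5 depend only on which tools agree on a structure. The sketch below gathers the values used in the code cells of this notebook into a single lookup table. The names `CONFIDENCE_BY_AGREEMENT` and `confidence_level` are hypothetical, and levels are kept as strings because `'11+'` is not a number.

```python
# Values taken from the code cells above; '0' is the default score set at load time.
CONFIDENCE_BY_AGREEMENT = {
    frozenset({'GNPS', 'NAP', 'SIRIUS'}): '9',
    frozenset({'GNPS', 'SIRIUS'}): '10',
    frozenset({'NAP', 'SIRIUS'}): '11+',
    frozenset({'GNPS', 'NAP'}): '12',
}


def confidence_level(agreeing_tools):
    """Return the confidence level for a collection of agreeing tool names."""
    return CONFIDENCE_BY_AGREEMENT.get(frozenset(agreeing_tools), '0')


print(confidence_level(['SIRIUS', 'GNPS']))
print(confidence_level(['NAP']))
```

Using `frozenset` keys makes the lookup independent of the order in which the tools are listed.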


#### 1.4 Compare the Mass annotations (FBMN / IIMN, NAP, SIRIUS) to the 13C NMR ones (CaraMel)

##### /!\ The 13C NMR candidates must be stored in a .txt file, one line per candidate (compound name and its SMILES, separated by a single space), no header /!\


In [28]:
df_smiles=pd.read_csv(CNMR_candidates_path,sep=' ', header=None)
df_smiles['InchiKey']=None
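
Because the file is read with a single space as separator, a compound name that itself contains a space would shift the SMILES into the wrong column. The sketch below is an optional pre-check of the file format using only the standard library. `read_candidates` is a hypothetical helper, not part of the original workflow.

```python
def read_candidates(path):
    """Read 'name SMILES' lines and return a list of (name, smiles) tuples."""
    candidates = []
    with open(path, encoding='utf-8') as handle:
        for number, line in enumerate(handle, start=1):
            line = line.strip()
            if not line:
                continue  # ignore empty lines
            fields = line.split(' ')
            if len(fields) != 2:
                raise ValueError(
                    f"line {number}: expected 'name SMILES', got {len(fields)} fields"
                )
            candidates.append((fields[0], fields[1]))
    return candidates
```

Names such as `acetic_acid` or `glycerol-monoacetate`, as used in this dataset, satisfy this format because they avoid spaces.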

#### 1.4.1 Get the corresponding InChIKey for each candidate 

In [29]:
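for x in range(len(df_smiles[0])):
    m=Chem.MolFromSmiles(df_smiles[1].tolist()[x])
    i=Chem.MolToInchi(m)
    # Keep only the first InChIKey block (2D connectivity, no stereochemistry)
    ik=Chem.rdinchi.InchiToInchiKey(i).split('-')[0]
    # .loc avoids chained assignment (SettingWithCopyWarning)
    df_smiles.loc[x, 'InchiKey'] = ik

df_smiles
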
Unnamed: 0,0,1,InchiKey
0,catechin,C1[C@@H]([C@H](OC2=CC(=CC(=C21)O)O)C3=CC(=C(C=...,PFTAWBLQPZVEMU
1,epicatechin,C1[C@H]([C@H](OC2=CC(=CC(=C21)O)O)C3=CC(=C(C=C...,PFTAWBLQPZVEMU
2,quercetin-3-rhamnoside,C[C@@]1([H])[C@@](O)([H])[C@](O)([H])[C@](O)([...,OXGUCUVFOIWWQJ
3,acetic_acid,CC(=O)O,QTBSBXVTEAMEQO
4,astringin,C1=CC(=C(C=C1/C=C/C2=CC(=CC(=C2)O[C@H]3[C@@H](...,PERPNFLGJXUDDW
5,piceatannol-3-o-glucoside,C1=CC(=C(C=C1/C=C/C2=CC(=CC(=C2)O)O)O[C@H]3[C@...,UMGCIIXWEFTPOC
6,glucosyl-frambinone,CC(=O)CCC1=CC=C(C=C1)OC2C(C(C(C(O2)CO)O)O)O,IDONYWHRKBUDOR
7,glycerol-monoacetate,CC(=O)OCC(CO)O,KMZHZAAOEWVPSE
8,lavandoside,COC1=C(C=CC(=C1)/C=C/C(=O)O)O[C@H]2[C@@H]([C@H...,IEMIRSXOYFWPFD
9,glucosyl-trans-paracoumaric-acid,C1=CC(=CC=C1/C=C/C(=O)O)O[C@H]2[C@@H]([C@H]([C...,LJFYQZQUAULRDF


#### 1.4.2 Compare the features that have GNPS_NAP_SIRIUS common annotations to 13C NMR annotations

In [30]:
def compare_to_nmr_annotation1(df_smiles, common_sumup):
    list_ID=[]
    if len(common_sumup) == 0:
        print('There was no corresponding feature for this comparison.')
    else:
        
        df_common_sumup =pd.DataFrame(common_sumup)
        df_common_sumup['inchikey']=None
        df_common_sumup['formula']=None
        for x in range(len(df_common_sumup[0])):
            df_common_sumup.loc[x, 'inchikey'] = df_common_sumup[3][x].split('_')[0]
            df_common_sumup.loc[x, 'formula'] = df_common_sumup[3][x].split('_')[2]

        for x in df_smiles['InchiKey'].tolist():
            if x in df_common_sumup['inchikey'].tolist():
                txt = df_common_sumup[0].loc[df_common_sumup['inchikey']==x].tolist()[0], df_common_sumup[1].loc[df_common_sumup['inchikey']==x].tolist()[0], str(df_common_sumup[2].loc[df_common_sumup['inchikey']==x].tolist()[0].split('_')[2] + '  MF' + str(df_common_sumup[2].loc[df_common_sumup['inchikey']==x].tolist()[0].split('_')[1]))  , x, df_common_sumup['formula'].loc[df_common_sumup['inchikey']==x].tolist()[0], df_smiles[0][df_smiles['InchiKey'].tolist().index(x)]

                list_ID.append((txt))
        if len(list_ID) == 0 :
            print('No GNPS_NAP_SIRIUS common annotation features  are confirmed by 13C NMR workflow.')
        else:
            print(f'There are {len(list_ID)} features GNPS_NAP_SIRIUS common annotation feature that are confirmed by 13C NMR workflow.' + '\n')
            for x in list_ID:
                print(x)
    return list_ID

In [31]:
xa=[]
xa = compare_to_nmr_annotation1(df_smiles, common_sumup)

There are 3 features GNPS_NAP_SIRIUS common annotation feature that are confirmed by 13C NMR workflow.

(60, 'CCMSLIB00005742701', 'LTS0265245  MF1', 'PFTAWBLQPZVEMU', 'C15H14O6[M + H]+', 'catechin')
(60, 'CCMSLIB00005742701', 'LTS0265245  MF1', 'PFTAWBLQPZVEMU', 'C15H14O6[M + H]+', 'catechin')
(237, 'CCMSLIB00000085852', 'LTS0186298  MF1', 'OXGUCUVFOIWWQJ', 'C21H20O11[M + H]+', 'quercetin-3-rhamnoside')
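
The catechin row appears twice above because catechin and epicatechin share the same first InChIKey block, and the name is looked up through the first occurrence of that block. The sketch below is a possible alternative, not the behaviour of `compare_to_nmr_annotation1`: it groups every NMR candidate name under its 2D InChIKey. The data are a subset of the values printed in this notebook, and `match_features_to_nmr` is a hypothetical helper.

```python
def match_features_to_nmr(features, nmr_candidates):
    """Return (feature ID, 2D InChIKey, [NMR candidate names]) for each confirmed feature."""
    names_by_key = {}
    for name, key in nmr_candidates:
        names_by_key.setdefault(key, []).append(name)
    return [(fid, key, names_by_key[key]) for fid, key in features if key in names_by_key]


# (feature ID, 2D InChIKey) pairs from the GNPS_NAP_SIRIUS results
features = [(190, 'REFJWTPEDVJJIY'), (51, 'XFZJEEAOWLFHDH'), (283, 'IYRMWMYZSQPJKC'),
            (367, 'IYRMWMYZSQPJKC'), (60, 'PFTAWBLQPZVEMU'), (136, 'XFZJEEAOWLFHDH'),
            (237, 'OXGUCUVFOIWWQJ'), (220, 'REFJWTPEDVJJIY')]
# Subset of the 13C NMR candidates listed in section 1.4.1
nmr_candidates = [('catechin', 'PFTAWBLQPZVEMU'), ('epicatechin', 'PFTAWBLQPZVEMU'),
                  ('quercetin-3-rhamnoside', 'OXGUCUVFOIWWQJ')]

for row in match_features_to_nmr(features, nmr_candidates):
    print(row)
```

With this grouping, feature 60 is reported once, with both stereoisomers listed as possible NMR candidates.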




#### 1.4.3 Compare the features that only have GNPS_NAP common annotations to 13C NMR annotations

In [32]:
def compare_to_nmr_annotation2a(df_smiles, common_sumup):
    list_ID=[]
    if len(common_sumup) == 0:
        print('There were no corresponding feature for this comparison.')
    else:
        df_common_sumup =pd.DataFrame(common_sumup)
        df_common_sumup['inchikey']=None
        for x in range(len(df_common_sumup[0])):
            m=Chem.MolFromSmiles(df_common_sumup[3].tolist()[x])
            i=Chem.MolToInchi(m)
            ik=Chem.rdinchi.InchiToInchiKey(i).split('-')[0]
            df_common_sumup.loc[x, 'inchikey'] = ik

        for x in df_smiles['InchiKey'].tolist():
            if x in df_common_sumup['inchikey'].tolist():
                txt = df_common_sumup[0].loc[df_common_sumup['inchikey']==x].tolist()[0], df_common_sumup[1].loc[df_common_sumup['inchikey']==x].tolist()[0], str(df_common_sumup[2].loc[df_common_sumup['inchikey']==x].tolist()[0].split('_')[2] + '  MF' + str(df_common_sumup[2].loc[df_common_sumup['inchikey']==x].tolist()[0].split('_')[1])), df_smiles[0][df_smiles['InchiKey'].tolist().index(x)]

                list_ID.append((txt))

        if len(list_ID) == 0 :
            print('No GNPS_NAP common annotation features  are confirmed by 13C NMR workflow.')
        else:
            print(f'There are {len(list_ID)} features GNPS_NAP common annotation feature that are confirmed by 13C NMR workflow.' + '\n')
            for x in list_ID:
                print(x)
            
    return list_ID

In [33]:
xb1=[]
xb1 = compare_to_nmr_annotation2a(df_smiles, GNPS_NAP_common_sumup)

There were no corresponding feature for this comparison.
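
The comparisons in this section all rely on the same idea: an MS annotation is considered confirmed when the first block of its InChIKey is also found among the 13C NMR candidates. A minimal, dependency-free sketch of that lookup is given below. `confirm_by_nmr` and its argument layout are hypothetical and not part of the script, and the second and third InChIKey blocks are placeholders.

```python
def first_block(inchikey):
    """Return the first InChIKey block, which encodes the atom connectivity."""
    return inchikey.split('-')[0]


def confirm_by_nmr(ms_annotations, nmr_candidates):
    """Keep the MS annotations whose first InChIKey block is an NMR candidate.

    ms_annotations: iterable of (feature_id, inchikey) tuples (hypothetical layout)
    nmr_candidates: dict mapping a first InChIKey block to a compound name
    """
    confirmed = []
    for feature_id, inchikey in ms_annotations:
        block = first_block(inchikey)
        if block in nmr_candidates:
            confirmed.append((feature_id, block, nmr_candidates[block]))
    return confirmed


# First blocks and names taken from the outputs of this notebook.
nmr_candidates = {'PFTAWBLQPZVEMU': 'catechin',
                  'NFWKVWVWBFBAOV': 'dehydroabietic-acid'}
# The trailing blocks are placeholders; feature 9999 is a made-up non-match.
ms_annotations = [(60, 'PFTAWBLQPZVEMU-XXXXXXXXXX-N'),
                  (9999, 'AAAAAAAAAAAAAA-XXXXXXXXXX-N')]
confirmed = confirm_by_nmr(ms_annotations, nmr_candidates)
print(confirmed)
```

Using a dictionary for the NMR candidates makes each lookup independent of the number of candidates, whereas the functions of this notebook rebuild a list for every test.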


#### 1.4.4 Compare the features that only have GNPS_SIRIUS common annotations to 13C NMR annotations

In [34]:
def compare_to_nmr_annotation2b(df_smiles, common_sumup):
    list_ID=[]
    if len(common_sumup) == 0:
        print('There were no corresponding feature for this comparison.')
    else:
        df_common_sumup =pd.DataFrame(common_sumup)
        df_common_sumup['inchikey']=None
        df_common_sumup['formula']=None
        for x in range(len(df_common_sumup[0])):
            # .at writes in place and avoids chained assignment
            df_common_sumup.at[x, 'inchikey'] = df_common_sumup[2][x].split('_')[0]
            df_common_sumup.at[x, 'formula'] = df_common_sumup[2][x].split('_')[2]

        for x in df_smiles['InchiKey'].tolist():
            if x in df_common_sumup['inchikey'].tolist():
                txt = df_common_sumup[0].loc[df_common_sumup['inchikey']==x].tolist()[0], df_common_sumup[1].loc[df_common_sumup['inchikey']==x].tolist()[0], x, df_common_sumup['formula'].loc[df_common_sumup['inchikey']==x].tolist()[0], df_smiles[0][df_smiles['InchiKey'].tolist().index(x)]

                list_ID.append((txt))
        if len(list_ID) == 0 :
            print('There is no matching annotation with 13C NMR.')
        else:
            print(f'There are {len(list_ID)} features GNPS_SIRIUS common annotation feature that are confirmed by 13C NMR workflow.' + '\n')
            for x in list_ID:
                print(x)
    return list_ID

In [35]:
xb2=[]
xb2 = compare_to_nmr_annotation2b(df_smiles, GNPS_SIRIUS_common_sumup)

There is no matching annotation with 13C NMR.


#### 1.4.5 Compare the features that only have NAP_SIRIUS common annotations (but with a different FBMN/IIMN candidate) to 13C NMR annotations

In [36]:
def compare_to_nmr_annotation3a(df_smiles, common_sumup):
    list_ID=[]
    if len(common_sumup) == 0 :
        print('There is no matching annotation with 13C NMR.')
    else:
        df_common_sumup =pd.DataFrame(common_sumup)

        df_common_sumup['inchikey']=None
        df_common_sumup['formula']=None
        df_common_sumup['MF']=None

        for x in range(len(df_common_sumup[0])):
            list_key = []
            list_formula=[]
            list_MF=[]
            for i in range(len(df_common_sumup[1][x])):
                list_key.append(df_common_sumup[1][x][i][0].split(' , ')[1].split('_')[0])
                list_formula.append(df_common_sumup[1][x][i][0].split(' , ')[1].split('_')[2])
                list_MF.append(str(df_common_sumup[1][x][i][0].split(' , ')[0].split('_')[2] + ' MF' + str(df_common_sumup[1][x][i][0].split(' , ')[0].split('_')[1])) )

            # .at stores each list in a single cell and avoids chained assignment
            df_common_sumup.at[x, 'inchikey'] = list_key
            df_common_sumup.at[x, 'formula'] = list_formula
            df_common_sumup.at[x, 'MF'] = list_MF

            for y in df_smiles['InchiKey'].tolist():
                if y in df_common_sumup['inchikey'][x]:
                    txt = df_common_sumup[0][x], df_common_sumup['MF'][x][df_common_sumup['inchikey'][x].index(y)], y,  df_common_sumup['formula'][x][df_common_sumup['inchikey'][x].index(y)], df_smiles[0][df_smiles['InchiKey'].tolist().index(y)]

                    list_ID.append((txt))
        if len(list_ID) == 0:
            print('There is no matching annotation with 13C NMR.')
        else:
            print(f'There are {len(list_ID)} features NAP_SIRIUS common annotation feature that are confirmed by 13C NMR workflow.' + '\n')
            for x in list_ID:
                print(x)
    return list_ID

In [37]:
xc1=[]
xc1 = compare_to_nmr_annotation3a(df_smiles, NAP_SIRIUS_sumup)

There are 1 features NAP_SIRIUS common annotation feature that are confirmed by 13C NMR workflow.

(1121, 'LTS0034017 MF2', 'NFWKVWVWBFBAOV', 'C20H28O2[M + H]+', 'dehydroabietic-acid')


## 2. If your data do not contain GNPS annotations through FBMN and/or IIMN, but have at least one MetFrag candidate through NAP, you should run the following cells.
This will select the corresponding features (PRED value = 2).

**/!\**  The present script works with a **maximum of 3 MetFrag candidates** **/!\**

**/!\** Fusion and Consensus candidates will not be considered **/!\**
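
The three-candidate limit reflects how the comma-separated `MetFragID` and `MetFragSMILES` fields are split into three fixed columns. The sketch below shows the same splitting for a single field in plain Python. `split_candidates` is a hypothetical helper, and raising an error when more than three candidates are present is an assumption of this sketch, not the behaviour of the script.

```python
def split_candidates(field, n_max=3):
    """Split a comma-separated field into exactly n_max slots, padding with None."""
    parts = field.split(',') if field else []
    if len(parts) > n_max:
        raise ValueError(f'{len(parts)} candidates found, at most {n_max} are supported')
    return parts + [None] * (n_max - len(parts))


# LOTUS identifiers taken from the outputs of this notebook.
two_candidates = split_candidates('LTS0034017,LTS0151331')
print(two_candidates)
```

Padding the missing slots with `None` keeps the later `is not None` style checks valid for features that have fewer than three candidates.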

In [38]:
df1 = df.loc[df['PRED'] == 2].copy()  # explicit copy, so the new columns are added to df1 without warnings
df1[['MetFragID1', 'MetFragID2', 'MetFragID3']] = df1['MetFragID'].str.split(',', expand=True)
df1[['MetFragSMILES1', 'MetFragSMILES2', 'MetFragSMILES3']] = df1['MetFragSMILES'].str.split(',', expand=True)

##### Get the list of untreated nodes

In [39]:
not_common=[x for x in df1['shared name'].tolist() if x not in all_common ]
not_common = [x for x in not_common if math.isnan(x) == False]
print(f'There are still {len(not_common)} features to treat now.')

There are still 150 features to treat now.
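
The selection of the untreated nodes can be sketched without pandas as follows. It is only an illustration: `untreated_features` is a hypothetical name, and the feature numbers in the example are chosen for the demonstration.

```python
import math


def untreated_features(shared_names, already_annotated):
    """Return the feature IDs that are neither NaN nor already annotated."""
    done = set(already_annotated)  # set membership is faster than list membership
    remaining = []
    for name in shared_names:
        if isinstance(name, float) and math.isnan(name):
            continue  # rows without a feature ID are skipped
        if name not in done:
            remaining.append(name)
    return remaining


remaining = untreated_features([5, 30, float('nan'), 60], already_annotated=[60])
print(f'There are still {len(remaining)} features to treat now.')
```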


If you want to visualize the features, run the cell below.

In [40]:
for x in sorted(not_common):
    try: 
        print(f'Feature n {int(x)}')
    except:
        continue


Feature n 5
Feature n 30
Feature n 81
Feature n 102
Feature n 113
Feature n 124
Feature n 129
Feature n 147
Feature n 158
Feature n 165
Feature n 230
Feature n 276
Feature n 279
Feature n 284
Feature n 288
Feature n 318
Feature n 336
Feature n 420
Feature n 423
Feature n 434
Feature n 450
Feature n 485
Feature n 501
Feature n 520
Feature n 529
Feature n 536
Feature n 538
Feature n 541
Feature n 542
Feature n 543
Feature n 552
Feature n 555
Feature n 559
Feature n 561
Feature n 565
Feature n 571
Feature n 572
Feature n 586
Feature n 590
Feature n 592
Feature n 604
Feature n 612
Feature n 615
Feature n 616
Feature n 619
Feature n 624
Feature n 649
Feature n 655
Feature n 657
Feature n 665
Feature n 673
Feature n 674
Feature n 688
Feature n 690
Feature n 691
Feature n 706
Feature n 716
Feature n 717
Feature n 748
Feature n 754
Feature n 760
Feature n 787
Feature n 798
Feature n 816
Feature n 821
Feature n 828
Feature n 849
Feature n 871
Feature n 872
Feature n 873
Feature n 888
Feature n 

#### 2.1 Get the information for the features that only have a NAP_SIRIUS common annotation, without any GNPS annotation.


In [41]:
list_ID2=[]  # first InChIKey blocks of MetFrag candidates 1 to 3, one list per feature
list_ID3=[]  # feature IDs
list_ID4=[]  # MetFrag IDs of candidates 1 to 3, one list per feature
shared_names = df1['shared name'].tolist()
for x in not_common:
    row = shared_names.index(x)
    keys = ['', '', '']
    try:
        for n in range(3):
            smiles = df1[f'MetFragSMILES{n + 1}'].tolist()[row]
            if smiles is not None:
                m = Chem.MolFromSmiles(smiles)
                i = Chem.MolToInchi(m)
                # keep only the first InChIKey block (atom connectivity)
                keys[n] = Chem.rdinchi.InchiToInchiKey(i).split('-')[0]
    except Exception:
        # as before, a candidate that cannot be converted discards the whole feature
        continue
    list_ID3.append(x)
    list_ID2.append(keys)
    list_ID4.append([df1[f'MetFragID{n + 1}'].tolist()[row] for n in range(3)])

In [42]:
NAP_SIRIUS_sumup=[]
list_sirius_inchikey3=[]
if len(list_ID3) == 0:
    print('There is no feature for the comparison.')
else:
    for ID in list_ID3: 
        ID=int(ID)

        filein =[x for x in list_folder if  str('_' + str(ID)) == x[-len(str('_' + str(ID))):]]
        if len(filein)!=0:
            os.chmod(filein[0], os.stat(filein[0]).st_mode | stat.S_IROTH)  # add read access without dropping the existing permission bits

            if os.path.exists(filein[0]+'/structure_candidates.tsv'):
                df_sirius_formula = pd.read_csv(filein[0]+'/structure_candidates.tsv', sep='\t')

                aa=[]
                for z in range(len(df_sirius_formula['InChIkey2D'].tolist())):
                    if df_sirius_formula['InChIkey2D'].tolist()[z] in list_ID2[list_ID3.index(ID)]:
                        aa1 = (str('MetFragSMILES_' + str(list_ID2[list_ID3.index(ID)].index(df_sirius_formula['InChIkey2D'].tolist()[z])+1) + '_' +  list_ID4[list_ID3.index(ID)][list_ID2[list_ID3.index(ID)].index(df_sirius_formula['InChIkey2D'].tolist()[z])] +' , '+ str(df_sirius_formula['InChIkey2D'].tolist()[z]) + '_structure_' + str(df_sirius_formula['molecularFormula'].tolist()[z]) + df_sirius_formula['adduct'].tolist()[z] ))

                        list_sirius_inchikey3.append(ID)
                        aa.append([aa1])
                    else:
                        aa1=''


                if len(aa) !=0:
                    NAP_SIRIUS_sumup.append((ID, aa))

    if len(NAP_SIRIUS_sumup) == 0 :
        print('There is no common annotation between NAP and SIRIUS workflows for features without any GNPS candidate.')
    else:
        print(f'There are {len(NAP_SIRIUS_sumup)} features without any GNPS candidate that have common annotation between NAP and SIRIUS workflow.' + '\n')
        for x in NAP_SIRIUS_sumup:
            print(x)

There are 109 features without any GNPS candidate that have common annotation between NAP and SIRIUS workflow.

(501, [['MetFragSMILES_1_LTS0013335 , NYEXXEJYGVAGEE_structure_C19H28O2[M + H]+'], ['MetFragSMILES_2_LTS0024768 , PIJPSWNOKIPSCP_structure_C19H28O2[M + H]+']])
(1111, [['MetFragSMILES_1_LTS0064099 , YCLCHPWRGSDZKL_structure_C20H28O[M + H]+'], ['MetFragSMILES_2_LTS0035059 , ISHVJVXYPLFKAL_structure_C20H28O[M + H]+']])
(1006, [['MetFragSMILES_1_LTS0254324 , MXPXAZNVQUWDFH_structure_C20H26O4[M + H]+'], ['MetFragSMILES_3_LTS0046848 , MXCOJKLBLFWFNI_structure_C20H26O4[M + H]+'], ['MetFragSMILES_2_LTS0002654 , PKAIECBWQZFYRP_structure_C20H26O4[M + H]+']])
(565, [['MetFragSMILES_1_LTS0171116 , SGCHZBKQDFNHSL_structure_C19H26O3[M + H]+']])
(619, [['MetFragSMILES_1_LTS0088733 , NGBRPGLXCQJIPU_structure_C20H26O2[M + H]+']])
(1272, [['MetFragSMILES_2_LTS0034017 , NFWKVWVWBFBAOV_structure_C20H28O2[M + H]+'], ['MetFragSMILES_3_LTS0151331 , PRZSMDYEVUSNJM_structure_C20H28O2[M + H]+'], ['Met
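
The cell above locates the SIRIUS result folder of a feature by checking that the folder path ends with `_<feature ID>`. The sketch below isolates that test with `str.endswith`. `find_feature_folder` is a hypothetical helper and the folder names are invented for the example, since the real names depend on the SIRIUS project.

```python
def find_feature_folder(folders, feature_id):
    """Return the first folder whose path ends with '_<feature_id>', or None."""
    suffix = f'_{int(feature_id)}'
    for folder in folders:
        if folder.endswith(suffix):
            return folder
    return None


# Invented folder names: only the trailing '_<feature ID>' matters here.
folders = ['project/3_sample_1501', 'project/7_sample_501']
match_501 = find_feature_folder(folders, 501)
match_50 = find_feature_folder(folders, 50)
print(match_501, match_50)
```

The leading underscore in the suffix is what keeps feature 501 from matching the folder of feature 1501.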

##### Get the detailed results for the features with common NAP_SIRIUS annotations
##### (Feature ID, matching MetFrag position + LOTUS_ID, InChIKey + chemical formula + adduct type)

In [43]:
NAP_SIRIUS_sumup

[(501,
  [['MetFragSMILES_1_LTS0013335 , NYEXXEJYGVAGEE_structure_C19H28O2[M + H]+'],
   ['MetFragSMILES_2_LTS0024768 , PIJPSWNOKIPSCP_structure_C19H28O2[M + H]+']]),
 (1111,
  [['MetFragSMILES_1_LTS0064099 , YCLCHPWRGSDZKL_structure_C20H28O[M + H]+'],
   ['MetFragSMILES_2_LTS0035059 , ISHVJVXYPLFKAL_structure_C20H28O[M + H]+']]),
 (1006,
  [['MetFragSMILES_1_LTS0254324 , MXPXAZNVQUWDFH_structure_C20H26O4[M + H]+'],
   ['MetFragSMILES_3_LTS0046848 , MXCOJKLBLFWFNI_structure_C20H26O4[M + H]+'],
   ['MetFragSMILES_2_LTS0002654 , PKAIECBWQZFYRP_structure_C20H26O4[M + H]+']]),
 (565,
  [['MetFragSMILES_1_LTS0171116 , SGCHZBKQDFNHSL_structure_C19H26O3[M + H]+']]),
 (619,
  [['MetFragSMILES_1_LTS0088733 , NGBRPGLXCQJIPU_structure_C20H26O2[M + H]+']]),
 (1272,
  [['MetFragSMILES_2_LTS0034017 , NFWKVWVWBFBAOV_structure_C20H28O2[M + H]+'],
   ['MetFragSMILES_3_LTS0151331 , PRZSMDYEVUSNJM_structure_C20H28O2[M + H]+'],
   ['MetFragSMILES_1_LTS0057168 , IQHBZJPFGJKKJI_structure_C20H28O2[M + H]+']]

In [44]:
for x in NAP_SIRIUS_sumup:
    print(x[0])
    list_MF_NAP_SIRIUS=[]
    list_SIRIUS_NAP_SIRIUS=[]
    for i in range(len(x[1])):
        list_MF_NAP_SIRIUS.append(str(x[1][i][0].split(' , ')[0].split('_')[2] + ' MF' + str(x[1][i][0].split(' , ')[0].split('_')[1])))
        list_SIRIUS_NAP_SIRIUS.append(str(x[1][i][0].split(' , ')[1].split('_')[0] + ' ' +str(x[1][i][0].split(' , ')[1].split('_')[2])))
        print(list_MF_NAP_SIRIUS)

501
['LTS0013335 MF1']
['LTS0013335 MF1', 'LTS0024768 MF2']
1111
['LTS0064099 MF1']
['LTS0064099 MF1', 'LTS0035059 MF2']
1006
['LTS0254324 MF1']
['LTS0254324 MF1', 'LTS0046848 MF3']
['LTS0254324 MF1', 'LTS0046848 MF3', 'LTS0002654 MF2']
565
['LTS0171116 MF1']
619
['LTS0088733 MF1']
1272
['LTS0034017 MF2']
['LTS0034017 MF2', 'LTS0151331 MF3']
['LTS0034017 MF2', 'LTS0151331 MF3', 'LTS0057168 MF1']
873
['LTS0092634 MF1']
828
['LTS0210076 MF1']
1068
['LTS0201039 MF2']
['LTS0201039 MF2', 'LTS0049064 MF1']
['LTS0201039 MF2', 'LTS0049064 MF1', 'LTS0004634 MF3']
1080
['LTS0002592 MF1']
896
['LTS0184224 MF1']
['LTS0184224 MF1', 'LTS0185646 MF2']
['LTS0184224 MF1', 'LTS0185646 MF2', 'LTS0150156 MF3']
657
['LTS0108097 MF1']
787
['LTS0046650 MF3']
['LTS0046650 MF3', 'LTS0203405 MF1']
['LTS0046650 MF3', 'LTS0203405 MF1', 'LTS0243815 MF2']
590
['LTS0108097 MF1']
1454
['LTS0270354 MF1']
['LTS0270354 MF1', 'LTS0212919 MF2']
529
['LTS0194549 MF1']
552
['LTS0135808 MF2']
['LTS0135808 MF2', 'LTS0083162 M
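
Each entry of `NAP_SIRIUS_sumup` packs four pieces of information into one string, which the cells of this section unpack with chained `split` calls. The sketch below does the same unpacking once, into named fields. `NapSiriusMatch` and `parse_entry` are hypothetical names, and the sketch assumes that neither the LOTUS ID nor the formula contains an underscore.

```python
from typing import NamedTuple


class NapSiriusMatch(NamedTuple):
    metfrag_rank: int      # position of the MetFrag candidate (1, 2 or 3)
    lotus_id: str          # LOTUS identifier of the MetFrag candidate
    inchikey_block: str    # first InChIKey block reported by SIRIUS
    formula_adduct: str    # molecular formula followed by the adduct type


def parse_entry(entry):
    """Parse 'MetFragSMILES_<rank>_<LOTUS ID> , <block>_structure_<formula><adduct>'."""
    nap_part, sirius_part = entry.split(' , ')
    _, rank, lotus_id = nap_part.split('_')
    inchikey_block, _, formula_adduct = sirius_part.split('_')
    return NapSiriusMatch(int(rank), lotus_id, inchikey_block, formula_adduct)


# Entry copied from the output of feature 1272.
match = parse_entry('MetFragSMILES_2_LTS0034017 , NFWKVWVWBFBAOV_structure_C20H28O2[M + H]+')
label = f'{match.lotus_id} MF{match.metfrag_rank}'
print(label, match.inchikey_block, match.formula_adduct)
```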

##### Write results to the summary file

In [45]:
for x in range(len(NAP_SIRIUS_sumup)):
    numerous=numerous+1
    Dict_sumup['Feature_SU'][numerous]=int(NAP_SIRIUS_sumup[x][0])
    Dict_sumup['Rt_SU'][numerous]=None
    Dict_sumup['GNPS_SU'][numerous]=None
    

    list_MF_NAP_SIRIUS=[]
    list_SIRIUS_NAP_SIRIUS=[]
    for i in range(len(NAP_SIRIUS_sumup[x][1])):
        list_MF_NAP_SIRIUS.append(str(NAP_SIRIUS_sumup[x][1][i][0].split(' , ')[0].split('_')[2] + ' MF' + str(NAP_SIRIUS_sumup[x][1][i][0].split(' , ')[0].split('_')[1])))      
        list_SIRIUS_NAP_SIRIUS.append(str(NAP_SIRIUS_sumup[x][1][i][0].split(' , ')[1].split('_')[0] + ' ' +str(NAP_SIRIUS_sumup[x][1][i][0].split(' , ')[1].split('_')[2])))
        Dict_sumup['NAP_SU'][numerous]= list_MF_NAP_SIRIUS
        Dict_sumup['SIRIUS_SU'][numerous]= list_SIRIUS_NAP_SIRIUS 
        
        
    Dict_sumup['MolNetEnhancer_SU'][numerous]=None
   
    Dict_sumup['Not_Matching_Tool_Annotation_SU'][numerous]='GNPS'
    Dict_sumup['Is_NMR_Annotated_SU'][numerous]=None
    Dict_sumup['3rd_Tool_SU'][numerous]=None
    Dict_sumup['Molecular_Name_SU'][numerous]=None
    Dict_sumup['Confidence_Level'][numerous]='11' 

##### Apply confidence score = 11 to all nodes that only have NAP_SIRIUS common annotation, without any GNPS candidate

In [46]:
for x in NAP_SIRIUS_sumup:
    # .at writes in place and avoids chained assignment
    df.at[df['shared name'].tolist().index(x[0]), 'All_Tools_confidence'] = 11


#### 2.2 Compare the features that only have NAP_SIRIUS common annotations, without any GNPS candidate, to 13C NMR annotations

In [47]:
def compare_to_nmr_annotation3b(df_smiles, common_sumup):
    list_ID=[]
    if len(common_sumup) == 0:
        print('There are no features for the comparison.')
    else:
        df_common_sumup =pd.DataFrame(common_sumup)

        df_common_sumup['inchikey']=None
        df_common_sumup['formula']=None
        df_common_sumup['MF']=None

        for x in range(len(df_common_sumup[0])):
            list_key = []
            list_formula=[]
            list_MF=[]
            for i in range(len(df_common_sumup[1][x])):
                list_key.append(df_common_sumup[1][x][i][0].split(' , ')[1].split('_')[0])
                list_formula.append(df_common_sumup[1][x][i][0].split(' , ')[1].split('_')[2])
                list_MF.append(str(df_common_sumup[1][x][i][0].split(' , ')[0].split('_')[2] + ' MF' + str(df_common_sumup[1][x][i][0].split(' , ')[0].split('_')[1])) )

            # .at stores each list in a single cell and avoids chained assignment
            df_common_sumup.at[x, 'inchikey'] = list_key
            df_common_sumup.at[x, 'formula'] = list_formula
            df_common_sumup.at[x, 'MF'] = list_MF

            for y in df_smiles['InchiKey'].tolist():
                if y in df_common_sumup['inchikey'][x]:
                    txt = df_common_sumup[0][x], df_common_sumup['MF'][x][df_common_sumup['inchikey'][x].index(y)], y,  df_common_sumup['formula'][x][df_common_sumup['inchikey'][x].index(y)], df_smiles[0][df_smiles['InchiKey'].tolist().index(y)]

                    list_ID.append((txt))
        if len(list_ID) == 0 :
            print('There is no NAP_SIRIUS common annotations feature, without any GNPS candidate, that was confirmed by the 13C NMR workflow.' + '\n')
        else:
            print(f'There are {len(list_ID)} features without any GNPS candidate, but with common NAP_SIRIUS annotations, that were confirmed by the 13C NMR workflow.' + '\n')
            for x in list_ID:
                print(x)
    return list_ID

In [48]:
xc2=[]
xc2 = compare_to_nmr_annotation3b(df_smiles, NAP_SIRIUS_sumup)

There are 6 features without any GNPS candidate, but with common NAP_SIRIUS annotations, that were confirmed by the 13C NMR workflow.

(1272, 'LTS0034017 MF2', 'NFWKVWVWBFBAOV', 'C20H28O2[M + H]+', 'dehydroabietic-acid')
(336, 'LTS0269975 MF1', 'RNDNBGULZNCSNB', 'C30H22O10[M + H]+', 'larixinol')
(674, 'LTS0034017 MF2', 'NFWKVWVWBFBAOV', 'C20H28O2[M + H]+', 'dehydroabietic-acid')
(276, 'LTS0269975 MF1', 'RNDNBGULZNCSNB', 'C30H22O10[M + H]+', 'larixinol')
(1185, 'LTS0229171 MF2', 'MSWJSDLNPCSSNW', 'C20H26O3[M + H]+', '7-oxodehydroabietic-acid')
(849, 'LTS0034017 MF3', 'NFWKVWVWBFBAOV', 'C20H28O2[M + H]+', 'dehydroabietic-acid')
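
Several of the six confirmed features in section 2.2 point to the same compound. A short sketch for grouping the confirmed features by compound name is given below. `group_by_compound` is a hypothetical helper that is not part of the script, and the tuples are copied from the output of `compare_to_nmr_annotation3b`.

```python
from collections import defaultdict


def group_by_compound(confirmed):
    """Map each compound name (last tuple item) to its feature IDs (first item)."""
    groups = defaultdict(list)
    for match in confirmed:
        feature_id, compound = match[0], match[-1]
        groups[compound].append(feature_id)
    return dict(groups)


confirmed_features = [
    (1272, 'LTS0034017 MF2', 'NFWKVWVWBFBAOV', 'C20H28O2[M + H]+', 'dehydroabietic-acid'),
    (336, 'LTS0269975 MF1', 'RNDNBGULZNCSNB', 'C30H22O10[M + H]+', 'larixinol'),
    (674, 'LTS0034017 MF2', 'NFWKVWVWBFBAOV', 'C20H28O2[M + H]+', 'dehydroabietic-acid'),
    (276, 'LTS0269975 MF1', 'RNDNBGULZNCSNB', 'C30H22O10[M + H]+', 'larixinol'),
    (1185, 'LTS0229171 MF2', 'MSWJSDLNPCSSNW', 'C20H26O3[M + H]+', '7-oxodehydroabietic-acid'),
    (849, 'LTS0034017 MF3', 'NFWKVWVWBFBAOV', 'C20H28O2[M + H]+', 'dehydroabietic-acid'),
]
groups = group_by_compound(confirmed_features)
for compound, features in groups.items():
    print(compound, features)
```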


## 3. Now, you would like to compare the remaining features, which do not have a common candidate through at least 2 different mass spectrometry workflows, to the 13C NMR workflow.

##### To select the corresponding features for the comparison, you must select those with a confidence score = 0

In [49]:
df_final = df.loc[df['All_Tools_confidence'] == 0].copy()  # explicit copy, so the columns added below do not raise warnings

#### 3.1 Prepare the data for the comparison

In [50]:
df_final['Smiles_inchikey']=''

df_final['MetFragSMILES1_inchikey']=''
df_final['MetFragSMILES2_inchikey']=''
df_final['MetFragSMILES3_inchikey']=''

df_final['SIRIUS_inchikey']=''
df_final['SIRIUS_formula']=''

In [51]:
df_final[['MetFragID1', 'MetFragID2', 'MetFragID3']] = df['MetFragID'].str.split(',', expand=True)
df_final[['MetFragSMILES1', 'MetFragSMILES2', 'MetFragSMILES3']] = df['MetFragSMILES'].str.split(',', expand=True)

#### 3.2 Get the corresponding InChIkey for the candidates coming from the corresponding annotation workflow (GNPS, NAP)

In [52]:
def draw_inchikey(df_final, cat):

    list_smiles=[]
    for x in range(len(df_final[cat].tolist())):
        if str(df_final[cat].tolist()[x]) != 'nan' and str(df_final[cat].tolist()[x]) != 'None':
            list_smiles.append(df_final['shared name'].tolist()[x])
    
    list_smiles= [x for x in list_smiles if math.isnan(x) == False]

    for x in list_smiles:

        ik=''

        m=Chem.MolFromSmiles(df_final[cat].loc[df_final['shared name'] == x].tolist()[0])
        i=Chem.MolToInchi(m)
        ik=Chem.rdinchi.InchiToInchiKey(i).split('-')[0]
        print(f'Feature n° {int(x)} : {ik}')
        # a single .loc call writes into df_final itself instead of a temporary copy
        df_final.loc[df_final['shared name'] == x, cat + '_inchikey'] = ik


#### 3.2.1 Get the InChIKey for GNPS workflow from the remaining features

In [53]:
draw_inchikey(df_final, 'Smiles')

Feature n° 483 : UVKYPKNCQJIGKV
Feature n° 1243 : BMVRDJXDSMPZGO
Feature n° 629 : DAWSYIQAGQMLFS
Feature n° 1099 : JRNSSSJKIGAFCT
Feature n° 1117 : WKKBRRFSRMDTJB
Feature n° 326 : FQWLMRXWKZGLFI
Feature n° 1278 : VIKNJXKGJWUCNN
Feature n° 521 : VFLDPWHFBUODDF
Feature n° 1075 : HFGSQOYIOKBQOW
Feature n° 111 : KGGUASRIGLRPAX
Feature n° 1358 : MDZKJHQSJHYOHJ
Feature n° 1042 : ZQHJXKYYELWEOK
Feature n° 1401 : YSEVFKWFUGTGAQ
Feature n° 1195 : MUMGGOZAMZWBJJ
Feature n° 1012 : KLMZPLYXGZZBCX
Feature n° 805 : ZQHJXKYYELWEOK
Feature n° 685 : MUMGGOZAMZWBJJ
Feature n° 823 : JRNSSSJKIGAFCT
Feature n° 722 : RBQNDQOKFICJGL
Feature n° 391 : VIKNJXKGJWUCNN
Feature n° 882 : OKJCFMUGMSVJBG
Feature n° 1405 : OKJCFMUGMSVJBG
Feature n° 301 : BPGBDEHBHGXYDZ
Feature n° 414 : VIKNJXKGJWUCNN
Feature n° 642 : UVKZSORBKUEBAZ
Feature n° 557 : BLGXFZZNTVWLAY
Feature n° 908 : OKJCFMUGMSVJBG
Feature n° 537 : RBOXVHNMENFORY
Feature n° 440 : OKJCFMUGMSVJBG
Feature n° 1028 : MUMGGOZAMZWBJJ
Feature n° 895 : FXKCXGBBUBC

#### 3.2.2 Get the InChIKey for the NAP workflow (MetFrag candidates 1, 2 and 3) for the remaining features

In [54]:
draw_inchikey(df_final, 'MetFragSMILES1')

Feature n° 124 : PERPNFLGJXUDDW
Feature n° 538 : NWPUHDAIOGMKFI
Feature n° 624 : KVQQCXYORPHUQU
Feature n° 420 : AXKQOCLPWRXCRI
Feature n° 973 : PGZCJOPTDHWYES
Feature n° 913 : ABGXDYHSMIYRIC
Feature n° 983 : UGAGPNKCDRTDHP
Feature n° 113 : PUGXDKPZBZICDX
Feature n° 1309 : FKCPLBHSZGVMNG
Feature n° 1043 : FOARYHMYWPXOBW
Feature n° 450 : KVQQCXYORPHUQU
Feature n° 1117 : GDAJTKZIGQRGOD
Feature n° 521 : MVIYWFBLVAFZID
Feature n° 559 : VDPJWHMYWDZZGX
Feature n° 673 : VBEKTMIFJPKWJA
Feature n° 940 : OJSKJQFODPKTBT
Feature n° 102 : MOJZMWJRUKIQGL
Feature n° 571 : QBAITYMIZWFOLG
Feature n° 520 : VETWBGGPKLAQQE
Feature n° 1183 : QWIXXDJUHXOKPB
Feature n° 1031 : FZMONKRUPXQHKO
Feature n° 1029 : UKOKENMLXFNPEJ
Feature n° 1041 : LNWOKEZJIRLIDO
Feature n° 30 : MOJZMWJRUKIQGL
Feature n° 1646 : KUMASSYJGMPIHK
Feature n° 872 : UGAGPNKCDRTDHP
Feature n° 816 : AXKQOCLPWRXCRI
Feature n° 1772 : KUMASSYJGMPIHK
Feature n° 1116 : VZCCETWTMQHEPK
Feature n° 592 : WSSLVRHDEOVNKI
Feature n° 434 : FKCPLBHSZGVMNG

Feature n° 798 : UQDRWSKYMXNNHX
Feature n° 942 : IQHBZJPFGJKKJI
Feature n° 81 : MOJZMWJRUKIQGL
Feature n° 655 : ZJGMUVOCFLKRTL
Feature n° 1645 : RHCAOVDBFCREAC

In [55]:
draw_inchikey(df_final, 'MetFragSMILES2')

Feature n° 538 : OWUWKGDMPZLWFL
Feature n° 624 : ZPCDXXMKLPRLRL
Feature n° 420 : MSWJSDLNPCSSNW
Feature n° 973 : YWNVUSYLDSLXLI
Feature n° 913 : DETZLZBJHDSRCR
Feature n° 113 : PESNXVJICRFESF
Feature n° 1309 : PORHOKHIMOFMMH
Feature n° 450 : ZPCDXXMKLPRLRL
Feature n° 1117 : FTCCXIDYXDLLRK
Feature n° 521 : PKORXOLYTWDULG
Feature n° 673 : REMAFSUYXZQVFM
Feature n° 940 : YZXBAPSDXZZRGB
Feature n° 571 : IGUDTNVZIOWVIV
Feature n° 520 : FBZSMLWLLPEEKP
Feature n° 1183 : PMKRDHYIJLQVRT
Feature n° 1031 : JXHQWTYFUSHCGX
Feature n° 1041 : QWIXXDJUHXOKPB
Feature n° 816 : MSWJSDLNPCSSNW
Feature n° 1116 : DTOSIQBPPRVQHS
Feature n° 592 : IIWNDLDEVPJIBT
Feature n° 434 : PORHOKHIMOFMMH
Feature n° 894 : PORHOKHIMOFMMH
Feature n° 615 : ALGYTGOYQATWBA
Feature n° 706 : REMAFSUYXZQVFM
Feature n° 871 : SECPZKHBENQXJG
Feature n° 485 : FBZSMLWLLPEEKP
Feature n° 942 : NFWKVWVWBFBAOV
Feature n° 655 : QGBVBYUDRGXELK
Feature n° 1645 : JLQCIOOODNXJEK


In [56]:
draw_inchikey(df_final, 'MetFragSMILES3')

Feature n° 538 : TVDMUSYVWJLIDK
Feature n° 420 : ISECDNAMJMNAHZ
Feature n° 973 : ZHMKECHJAPXWCT
Feature n° 913 : CQTHQCHQGAZGNF
Feature n° 1309 : MLBYBBUZURKHAW
Feature n° 1117 : WUENWZUJMIZJPA
Feature n° 521 : IXORZMNAPKEEDV
Feature n° 673 : BGKHCLZFGPIKKU
Feature n° 940 : BTAURFWABMSODR
Feature n° 571 : QMTZBWPBIQZLBJ
Feature n° 1183 : LNWOKEZJIRLIDO
Feature n° 1041 : PMKRDHYIJLQVRT
Feature n° 816 : ISECDNAMJMNAHZ
Feature n° 1116 : HXQHFNIKBKZGRP
Feature n° 434 : RWWVEQKPFPXLGL
Feature n° 894 : AYZMGUHJPVGYEB
Feature n° 615 : CUXCAMYFFUWHQV
Feature n° 706 : BGKHCLZFGPIKKU
Feature n° 942 : PRZSMDYEVUSNJM
Feature n° 655 : DEMNMQDWPCIOLA
Feature n° 1645 : ZXOSMGNNWJODHY


#### 3.3 Compare the annotations of the corresponding tool (GNPS, NAP) for the remaining features to the ones coming from the 13C NMR workflow

In [57]:
def compare_to_nmr_annotation4a(df_smiles, df_common_sumup, cat, pre, suf):
    list_ID1=[]
    if len(df_common_sumup[cat].tolist()) == 0 :
        print('There are no annotations left to compare.')
    else:
        for x in range(len(df_common_sumup[cat].tolist())):

            for y in df_smiles['InchiKey'].tolist():
                if y in df_common_sumup[cat].tolist()[x]:
                    txt = int(df_common_sumup['shared name'].tolist()[x]), str(df_common_sumup[pre].tolist()[x] + ' ' + suf), df_smiles[0].tolist()[df_smiles['InchiKey'].tolist().index(y)]

                    list_ID1.append((txt))
        if len(list_ID1) == 0:
            print(f'There is no remaining candidate from {cat} matching with the 13C NMR workflow.')
        else:
            print(f'There are {len(list_ID1)} remaining candidates from {cat} that match with the 13C NMR workflow.' + '\n')
            for x in list_ID1:
                print(x)
    return list_ID1

#### 3.3.1 Compare the annotations from the GNPS workflow for the remaining features to the 13C NMR ones

In [58]:
x0 = compare_to_nmr_annotation4a(df_smiles, df_final, 'Smiles_inchikey','SpectrumID', 'GNPS')

There is no remaining candidate from Smiles_inchikey matching with the 13C NMR workflow.


##### Write results to the summary file

In [59]:
for x in x0:
    numerous=numerous+1
    Dict_sumup['Feature_SU'][numerous]=int(x[0])
    Dict_sumup['Rt_SU'][numerous]=None
    Dict_sumup['GNPS_SU'][numerous]=x[1]
    Dict_sumup['NAP_SU'][numerous]=None
    Dict_sumup['MolNetEnhancer_SU'][numerous]=None
    Dict_sumup['SIRIUS_SU'][numerous]=None
    Dict_sumup['Not_Matching_Tool_Annotation_SU'][numerous]='NAP, SIRIUS'
    Dict_sumup['Is_NMR_Annotated_SU'][numerous]= 1
    Dict_sumup['3rd_Tool_SU'][numerous]=None
    Dict_sumup['Molecular_Name_SU'][numerous]=x[-1]
    Dict_sumup['Confidence_Level'][numerous]=str(5) 

##### Apply confidence score = 5 to nodes with the GNPS remaining candidates that match with the 13C NMR annotations

In [60]:
for x in x0:
    df['All_Tools_confidence'][df['shared name'].to_list().index(x[0])] = 5

#### 3.3.2 Compare the remaining annotations from NAP (MetFrag candidates 1, 2 and 3) to the 13C NMR workflow

#### 3.3.2.1 Compare the annotations from NAP - MetFrag candidate 1, for the remaining features, to the 13C NMR workflow

In [61]:
x1 =compare_to_nmr_annotation4a(df_smiles, df_final, 'MetFragSMILES1_inchikey', 'MetFragID1' , 'MF1')

There are 1 remaining candidates from MetFragSMILES1_inchikey that match with the 13C NMR workflow.

(124, 'LTS0251851 MF1', 'astringin')


##### Write results in the resume file

In [62]:
for x in x1:
    numerous=numerous+1
    Dict_sumup['Feature_SU'][numerous]=int(x[0])
    Dict_sumup['Rt_SU'][numerous]=None
    Dict_sumup['GNPS_SU'][numerous]=None
    Dict_sumup['NAP_SU'][numerous]=x[1]
    Dict_sumup['MolNetEnhancer_SU'][numerous]=None
    Dict_sumup['SIRIUS_SU'][numerous]=None
    Dict_sumup['Not_Matching_Tool_Annotation_SU'][numerous]='GNPS, SIRIUS'
    Dict_sumup['Is_NMR_Annotated_SU'][numerous]= 1
    Dict_sumup['3rd_Tool_SU'][numerous]=None
    Dict_sumup['Molecular_Name_SU'][numerous]=x[2]
    Dict_sumup['Confidence_Level'][numerous]=str(7)
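
The cells that write results repeat the same eleven assignments, with only a few values changing between tools. As a possible simplification, the sketch below wraps them in a helper. The name `add_sumup_row` is hypothetical, and so is the assumption that `Dict_sumup` is a dictionary of `{row number: value}` dictionaries; the column names and the example tuple are taken from the cells and output above.

```python
# Column names taken from the cells above; the structure (a dict of
# {row number: value} dicts) is an assumption about Dict_sumup.
COLUMNS = ['Feature_SU', 'Rt_SU', 'GNPS_SU', 'NAP_SU', 'MolNetEnhancer_SU',
           'SIRIUS_SU', 'Not_Matching_Tool_Annotation_SU', 'Is_NMR_Annotated_SU',
           '3rd_Tool_SU', 'Molecular_Name_SU', 'Confidence_Level']

def add_sumup_row(dict_sumup, row_number, feature, name, confidence, **tool_columns):
    """Hypothetical helper: fill one row, leaving unspecified columns at None."""
    for column in COLUMNS:
        dict_sumup[column][row_number] = None
    dict_sumup['Feature_SU'][row_number] = int(feature)
    dict_sumup['Is_NMR_Annotated_SU'][row_number] = 1
    dict_sumup['Molecular_Name_SU'][row_number] = name
    dict_sumup['Confidence_Level'][row_number] = str(confidence)
    # Tool-specific columns, e.g. NAP_SU or Not_Matching_Tool_Annotation_SU
    for column, value in tool_columns.items():
        dict_sumup[column][row_number] = value

sumup = {column: {} for column in COLUMNS}
row = 0
for match in [(124, 'LTS0251851 MF1', 'astringin')]:
    row += 1
    add_sumup_row(sumup, row, match[0], match[2], 7,
                  NAP_SU=match[1],
                  Not_Matching_Tool_Annotation_SU='GNPS, SIRIUS')
print(sumup['NAP_SU'][row], sumup['Confidence_Level'][row])
```

With such a helper, each tool would need a single call per match, which reduces the risk of copy-paste differences between cells.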

##### Apply confidence score = 7 to nodes with the remaining NAP - MetFrag candidate 1 annotations that match the 13C NMR annotations

In [63]:
for x in x1:
    df['All_Tools_confidence'][df['shared name'].tolist().index(x[0])] = 7

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df['All_Tools_confidence'][df['shared name'].tolist().index(x[0])] = 7
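
The warning above is raised by chained indexing (`df[column][row] = value`), where pandas cannot guarantee that the assignment reaches the original DataFrame. A minimal sketch of the single-step `.loc` form follows; the toy table `df_nodes` and its values are placeholders, and only the column names and the example tuple come from the cells above.

```python
import pandas as pd

# Toy node table; the column names come from the cells above, the values do not.
df_nodes = pd.DataFrame({'shared name': [124, 420, 816],
                         'All_Tools_confidence': [0, 0, 0]})

matches = [(124, 'LTS0251851 MF1', 'astringin')]
for match in matches:
    # One indexing operation on the DataFrame itself:
    # rows selected by a boolean mask, column selected by label.
    df_nodes.loc[df_nodes['shared name'] == match[0], 'All_Tools_confidence'] = 7

print(df_nodes['All_Tools_confidence'].tolist())
```

Note that the mask updates every row carrying the feature ID, whereas `.tolist().index(...)` only addresses the first one; the two are equivalent as long as feature IDs are unique.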


#### 3.3.2.2 Compare the annotations from NAP - MetFrag candidate 2, for the remaining features, to the 13C NMR workflow

In [64]:
x2=compare_to_nmr_annotation4a(df_smiles, df_final, 'MetFragSMILES2_inchikey', 'MetFragID2', 'MF2')

There are 3 remaining candidates from MetFragSMILES2_inchikey that match with the 13C NMR workflow.

(420, 'LTS0229171 MF2', '7-oxodehydroabietic-acid')
(816, 'LTS0229171 MF2', '7-oxodehydroabietic-acid')
(942, 'LTS0034017 MF2', 'dehydroabietic-acid')


##### Write results in the resume file

In [65]:
for x in x2:
    numerous=numerous+1
    Dict_sumup['Feature_SU'][numerous]=int(x[0])
    Dict_sumup['Rt_SU'][numerous]=None
    Dict_sumup['GNPS_SU'][numerous]=None
    Dict_sumup['NAP_SU'][numerous]=x[1]
    Dict_sumup['MolNetEnhancer_SU'][numerous]=None
    Dict_sumup['SIRIUS_SU'][numerous]=None
    Dict_sumup['Not_Matching_Tool_Annotation_SU'][numerous]='GNPS, SIRIUS'
    Dict_sumup['Is_NMR_Annotated_SU'][numerous]= 1
    Dict_sumup['3rd_Tool_SU'][numerous]=None
    Dict_sumup['Molecular_Name_SU'][numerous]=x[2]
    Dict_sumup['Confidence_Level'][numerous]=str(7)

##### Apply confidence score = 7 to nodes with the remaining NAP - MetFrag candidate 2 annotations that match the 13C NMR annotations

In [66]:
for x in x2:
    df['All_Tools_confidence'][df['shared name'].tolist().index(x[0])] = 7

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df['All_Tools_confidence'][df['shared name'].tolist().index(x[0])] = 7


#### 3.3.2.3 Compare the annotations from NAP - MetFrag candidate 3, for the remaining features, to the 13C NMR workflow

In [67]:
x3=compare_to_nmr_annotation4a(df_smiles, df_final, 'MetFragSMILES3_inchikey', 'MetFragID3', 'MF3')

There is no remaining candidate from MetFragSMILES3_inchikey matching with the 13C NMR workflow.


##### Write results in the resume file

In [68]:
for x in x3:
    numerous=numerous+1
    Dict_sumup['Feature_SU'][numerous]=int(x[0])
    Dict_sumup['Rt_SU'][numerous]=None
    Dict_sumup['GNPS_SU'][numerous]=None
    Dict_sumup['NAP_SU'][numerous]=x[1]
    Dict_sumup['MolNetEnhancer_SU'][numerous]=None
    Dict_sumup['SIRIUS_SU'][numerous]=None
    Dict_sumup['Not_Matching_Tool_Annotation_SU'][numerous]='GNPS, SIRIUS'
    Dict_sumup['Is_NMR_Annotated_SU'][numerous]= 1
    Dict_sumup['3rd_Tool_SU'][numerous]=None
    Dict_sumup['Molecular_Name_SU'][numerous]=x[2]
    Dict_sumup['Confidence_Level'][numerous]=str(7)

##### Apply confidence score = 7 to nodes with the remaining NAP - MetFrag candidate 3 annotations that match the 13C NMR annotations

In [69]:
for x in x3:
    df['All_Tools_confidence'][df['shared name'].tolist().index(x[0])] = 7

#### 3.4 Configure the resume file

In [70]:
df_final = df_final.reset_index()
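
As a reminder of what `reset_index()` does, the toy sketch below filters a table and then resets its index. The table and its values are placeholders. Presumably the reset is needed here because the next loop addresses rows by position `0..n-1`, but that reading is an assumption.

```python
import pandas as pd

df_toy = pd.DataFrame({'shared name': [124, 420, 816, 942]})

# After filtering, the surviving rows keep their original labels 1, 2, 3.
df_filtered = df_toy[df_toy['shared name'] > 200]
print(df_filtered.index.tolist())

# reset_index() restores labels 0..n-1 and moves the old ones to an 'index' column.
df_reset = df_filtered.reset_index()
print(df_reset.index.tolist())
print(df_reset.columns.tolist())
```

Passing `drop=True` to `reset_index` discards the old labels instead of keeping them in an extra column.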

#### 3.5 Compare the SIRIUS annotations for the remaining features to the 13C NMR workflow ones

#### 3.5.1 Get the InChIKeys of the candidates proposed by the SIRIUS annotation workflow
##### /!\ This step may take a while, so it may be time for a coffee. /!\

In [71]:
for i in tqdm(range(len(df_final['shared name'].tolist())), unit ='Feature', desc ="SIRIUS features annotations recovered:" ):
    aa = ''
    feature = df_final['shared name'].tolist()[i]
    # Keep the SIRIUS folders whose name ends with '_<feature ID>'; features without an ID (NaN) match nothing.
    filein = [x for x in list_folder if not math.isnan(feature) and x.endswith('_' + str(int(feature)))]
    if len(filein) != 0:
        os.chmod(filein[0], stat.S_IROTH)

        if os.path.exists(filein[0] + '/structure_candidates.tsv'):
            df_sirius_formula = pd.read_csv(filein[0] + '/structure_candidates.tsv', sep='\t')

            # All 2D InChIKeys proposed by SIRIUS for this feature
            aa = df_sirius_formula['InChIkey2D'].tolist()
            df_final['SIRIUS_inchikey'][i] = aa

            # Concatenate each molecular formula with its adduct
            list_formula_adduct = [f + a for f, a in zip(df_sirius_formula['molecularFormula'].tolist(), df_sirius_formula['adduct'].tolist())]
            df_final['SIRIUS_formula'][i] = list_formula_adduct
            

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_final['SIRIUS_inchikey'][i] = aa
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_final['SIRIUS_formula'][i] = list_formula_adduct
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_final['SIRIUS_inchikey'][i] = aa
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_final['SIRIUS_formula'][i] = list_formula
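
The warnings above come from storing a list in a single cell through chained indexing. As a possible alternative, the sketch below reads a toy candidates table and stores the lists with `.at`. The in-memory TSV, the InChIKey-like strings, the formulas and the table `df_features` are placeholders; only the column names `InChIkey2D`, `molecularFormula`, `adduct`, `SIRIUS_inchikey` and `SIRIUS_formula` come from the cell above.

```python
import io
import pandas as pd

# Toy stand-in for a SIRIUS structure_candidates.tsv; values are placeholders.
tsv = ("InChIkey2D\tmolecularFormula\tadduct\n"
       "AAAAAAAAAAAAAA\tC6H12O6\t[M+H]+\n"
       "BBBBBBBBBBBBBB\tC6H12O6\t[M+Na]+\n")
df_sirius = pd.read_csv(io.StringIO(tsv), sep='\t')

inchikeys = df_sirius['InChIkey2D'].tolist()
# Element-wise string concatenation of the two columns.
formula_adduct = (df_sirius['molecularFormula'] + df_sirius['adduct']).tolist()

df_features = pd.DataFrame({'shared name': [124.0]})
# Object-dtype columns can hold one Python list per cell.
df_features['SIRIUS_inchikey'] = pd.Series([None] * len(df_features), dtype=object)
df_features['SIRIUS_formula'] = pd.Series([None] * len(df_features), dtype=object)

# .at sets a single cell addressed by row label and column label.
df_features.at[0, 'SIRIUS_inchikey'] = inchikeys
df_features.at[0, 'SIRIUS_formula'] = formula_adduct
print(df_features.at[0, 'SIRIUS_formula'])
```

Because `.at` is a single indexing operation on the DataFrame, it writes to the original object and does not go through an intermediate copy.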

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_final['SIRIUS_inchikey'][i] = aa
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_final['SIRIUS_formula'][i] = list_formula

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_final['SIRIUS_inchikey'][i] = aa
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_final['SIRIUS_formula'][i] = list_formula_adduct
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_final['SIRIUS_inchikey'][i] = aa
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_final['SIRIUS_formula'][i] = list_formula

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_final['SIRIUS_inchikey'][i] = aa
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_final['SIRIUS_formula'][i] = list_formula_adduct
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_final['SIRIUS_inchikey'][i] = aa
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_final['SIRIUS_formula'][i] = list_formula

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_final['SIRIUS_inchikey'][i] = aa
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_final['SIRIUS_formula'][i] = list_formula_adduct
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_final['SIRIUS_inchikey'][i] = aa
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_final['SIRIUS_formula'][i] = list_formula

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_final['SIRIUS_inchikey'][i] = aa
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_final['SIRIUS_formula'][i] = list_formula_adduct
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_final['SIRIUS_inchikey'][i] = aa
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_final['SIRIUS_formula'][i] = list_formula

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_final['SIRIUS_inchikey'][i] = aa
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_final['SIRIUS_formula'][i] = list_formula_adduct
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_final['SIRIUS_inchikey'][i] = aa
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_final['SIRIUS_formula'][i] = list_formula

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_final['SIRIUS_inchikey'][i] = aa
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_final['SIRIUS_formula'][i] = list_formula_adduct
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_final['SIRIUS_inchikey'][i] = aa
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_final['SIRIUS_formula'][i] = list_formula

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_final['SIRIUS_inchikey'][i] = aa
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_final['SIRIUS_formula'][i] = list_formula_adduct
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_final['SIRIUS_inchikey'][i] = aa
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_final['SIRIUS_formula'][i] = list_formula

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_final['SIRIUS_inchikey'][i] = aa
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_final['SIRIUS_formula'][i] = list_formula_adduct
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_final['SIRIUS_inchikey'][i] = aa
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_final['SIRIUS_formula'][i] = list_formula

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_final['SIRIUS_inchikey'][i] = aa
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_final['SIRIUS_formula'][i] = list_formula_adduct
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_final['SIRIUS_inchikey'][i] = aa
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_final['SIRIUS_formula'][i] = list_formula

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_final['SIRIUS_inchikey'][i] = aa
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_final['SIRIUS_formula'][i] = list_formula_adduct
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_final['SIRIUS_inchikey'][i] = aa
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_final['SIRIUS_formula'][i] = list_formula

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_final['SIRIUS_inchikey'][i] = aa
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_final['SIRIUS_formula'][i] = list_formula_adduct
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_final['SIRIUS_inchikey'][i] = aa
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_final['SIRIUS_formula'][i] = list_formula

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_final['SIRIUS_inchikey'][i] = aa
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_final['SIRIUS_formula'][i] = list_formula_adduct
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_final['SIRIUS_inchikey'][i] = aa
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_final['SIRIUS_formula'][i] = list_formula

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_final['SIRIUS_inchikey'][i] = aa
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_final['SIRIUS_formula'][i] = list_formula_adduct
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_final['SIRIUS_inchikey'][i] = aa
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_final['SIRIUS_formula'][i] = list_formula

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_final['SIRIUS_inchikey'][i] = aa
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_final['SIRIUS_formula'][i] = list_formula_adduct
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_final['SIRIUS_inchikey'][i] = aa
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_final['SIRIUS_formula'][i] = list_formula

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_final['SIRIUS_inchikey'][i] = aa
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_final['SIRIUS_formula'][i] = list_formula_adduct
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_final['SIRIUS_inchikey'][i] = aa
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_final['SIRIUS_formula'][i] = list_formula

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_final['SIRIUS_inchikey'][i] = aa
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_final['SIRIUS_formula'][i] = list_formula_adduct
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_final['SIRIUS_inchikey'][i] = aa
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_final['SIRIUS_formula'][i] = list_formula

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_final['SIRIUS_inchikey'][i] = aa
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_final['SIRIUS_formula'][i] = list_formula_adduct
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_final['SIRIUS_inchikey'][i] = aa
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_final['SIRIUS_formula'][i] = list_formula

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_final['SIRIUS_inchikey'][i] = aa
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_final['SIRIUS_formula'][i] = list_formula_adduct
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_final['SIRIUS_inchikey'][i] = aa
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_final['SIRIUS_formula'][i] = list_formula

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_final['SIRIUS_inchikey'][i] = aa
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_final['SIRIUS_formula'][i] = list_formula_adduct
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_final['SIRIUS_inchikey'][i] = aa
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_final['SIRIUS_formula'][i] = list_formula

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_final['SIRIUS_inchikey'][i] = aa
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_final['SIRIUS_formula'][i] = list_formula_adduct
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_final['SIRIUS_inchikey'][i] = aa
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_final['SIRIUS_formula'][i] = list_formula

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_final['SIRIUS_inchikey'][i] = aa
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_final['SIRIUS_formula'][i] = list_formula_adduct
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_final['SIRIUS_inchikey'][i] = aa
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_final['SIRIUS_formula'][i] = list_formula

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_final['SIRIUS_inchikey'][i] = aa
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_final['SIRIUS_formula'][i] = list_formula_adduct
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_final['SIRIUS_inchikey'][i] = aa
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_final['SIRIUS_formula'][i] = list_formula

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_final['SIRIUS_inchikey'][i] = aa
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_final['SIRIUS_formula'][i] = list_formula_adduct
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_final['SIRIUS_inchikey'][i] = aa
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_final['SIRIUS_formula'][i] = list_formula

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_final['SIRIUS_inchikey'][i] = aa
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_final['SIRIUS_formula'][i] = list_formula_adduct
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_final['SIRIUS_inchikey'][i] = aa
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_final['SIRIUS_formula'][i] = list_formula

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_final['SIRIUS_inchikey'][i] = aa
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_final['SIRIUS_formula'][i] = list_formula_adduct
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_final['SIRIUS_inchikey'][i] = aa
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_final['SIRIUS_formula'][i] = list_formula

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_final['SIRIUS_inchikey'][i] = aa
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_final['SIRIUS_formula'][i] = list_formula_adduct
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_final['SIRIUS_inchikey'][i] = aa
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_final['SIRIUS_formula'][i] = list_formula

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_final['SIRIUS_inchikey'][i] = aa
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_final['SIRIUS_formula'][i] = list_formula_adduct
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_final['SIRIUS_inchikey'][i] = aa
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_final['SIRIUS_formula'][i] = list_formula

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_final['SIRIUS_inchikey'][i] = aa
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_final['SIRIUS_formula'][i] = list_formula_adduct
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy

#### 3.5.2 Compare the SIRIUS annotations of the remaining features to those from the 13C NMR workflow

In [72]:
def compare_to_nmr_annotation4b(df_smiles, df_common_sumup, cat):
    """Match the remaining SIRIUS candidates against the 13C NMR candidates.

    Returns a list of tuples:
    (feature ID, InChIKey first block, SIRIUS formula and adduct, 13C NMR candidate name).
    """
    list_ID1=[]
    if len(df_common_sumup) == 0:
        print('There is no corresponding feature for this comparison.')
    else:
        for x in range(len(df_common_sumup[cat].tolist())):
            # Test every 13C NMR InChIKey against the SIRIUS candidates of feature x
            for y in df_smiles['InchiKey'].tolist():

                if y in df_common_sumup[cat][x]:
                    # The formula is read at the position of the matching candidate
                    txt = int(df_common_sumup['shared name'].tolist()[x]), y,  df_common_sumup['SIRIUS_formula'][x][df_common_sumup[cat][x].index(y)], df_smiles[0].tolist()[df_smiles['InchiKey'].tolist().index(y)]

                    list_ID1.append((txt))
        if len(list_ID1) == 0 :
            print('There is no match between remaining SIRIUS annotations and 13C NMR workflow ones.')
        else:
            print(f'There are {len(list_ID1)} features with a remaining SIRIUS annotation that match with one of the 13C NMR workflow.' + '\n')
            for x in list_ID1:
                print(x)
    return list_ID1

In [73]:
x4=compare_to_nmr_annotation4b(df_smiles, df_final, 'SIRIUS_inchikey')

There are 42 features with a remaining SIRIUS annotation that match with one of the 13C NMR workflow.

(250, 'PFTAWBLQPZVEMU', 'C15H14O6[M - H2O + H]+', 'catechin')
(250, 'PFTAWBLQPZVEMU', 'C15H14O6[M - H2O + H]+', 'catechin')
(209, 'OYHQOLUKZRVURQ', 'C18H32O2[M + Na]+', 'linoleic-acid')
(935, 'NFWKVWVWBFBAOV', 'C20H28O2[M - H2O + H]+', 'dehydroabietic-acid')
(700, 'KSEBMYQBYZTDHS', 'C10H10O4[M - H2O + H]+', 'ferulic-acid')
(743, 'NFWKVWVWBFBAOV', 'C20H28O2[M - H2O + H]+', 'dehydroabietic-acid')
(1253, 'NFWKVWVWBFBAOV', 'C20H28O2[M + Na]+', 'dehydroabietic-acid')
(1099, 'MXYATHGRPJZBNA', 'C20H30O2[M - H4O2 + H]+', 'isopimaric-acid')
(76, 'PFTAWBLQPZVEMU', 'C15H14O6[M - H2O + H]+', 'catechin')
(76, 'PFTAWBLQPZVEMU', 'C15H14O6[M - H2O + H]+', 'catechin')
(1230, 'KSEBMYQBYZTDHS', 'C10H10O4[M - H2O + H]+', 'ferulic-acid')
(958, 'OYHQOLUKZRVURQ', 'C18H32O2[M - H4O2 + H]+', 'linoleic-acid')
(460, 'MSWJSDLNPCSSNW', 'C20H26O3[M - H2O + H]+', '7-oxodehydroabietic-acid')
(856, 'NFWKVWVWBFBAOV', '
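
The list printed above contains some tuples twice (for example feature 250). If one row per unique match is preferred, the same comparison can be sketched with a dictionary lookup and a `seen` set. This is a minimal sketch on toy data, not the notebook's implementation: `nmr_candidates`, `sirius_candidates` and `match_candidates` are hypothetical names, and the data layout is an assumption.

```python
# Toy data for illustration only; names and layout are assumptions.
# InChIKey first block -> 13C NMR candidate name
nmr_candidates = {
    "PFTAWBLQPZVEMU": "catechin",
    "OYHQOLUKZRVURQ": "linoleic-acid",
}

# feature ID -> list of (InChIKey first block, formula with adduct)
sirius_candidates = {
    250: [("PFTAWBLQPZVEMU", "C15H14O6[M - H2O + H]+"),
          ("PFTAWBLQPZVEMU", "C15H14O6[M - H2O + H]+")],
    209: [("OYHQOLUKZRVURQ", "C18H32O2[M + Na]+")],
    999: [("AAAAAAAAAAAAAA", "C6H12O6[M + H]+")],
}


def match_candidates(sirius, nmr):
    """Return unique (feature, inchikey, formula, name) tuples in first-seen order."""
    seen = set()
    matches = []
    for feature, candidates in sirius.items():
        for inchikey, formula in candidates:
            name = nmr.get(inchikey)  # None when the key is not an NMR candidate
            if name is None:
                continue
            hit = (feature, inchikey, formula, name)
            if hit not in seen:  # skip exact duplicates
                seen.add(hit)
                matches.append(hit)
    return matches


matches = match_candidates(sirius_candidates, nmr_candidates)
for hit in matches:
    print(hit)
```

The dictionary lookup also avoids rebuilding the InChIKey list for every feature, which may matter for large SIRIUS projects.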

##### Write the results to the summary file
##### Assign confidence score = 6 to nodes whose SIRIUS candidates match the 13C NMR annotations

In [74]:
for x in x4:
    numerous=numerous+1
    Dict_sumup['Feature_SU'][numerous]=int(x[0])
    Dict_sumup['Rt_SU'][numerous]=None
    Dict_sumup['GNPS_SU'][numerous]=None
    Dict_sumup['NAP_SU'][numerous]=None
    Dict_sumup['MolNetEnhancer_SU'][numerous]=None
    Dict_sumup['SIRIUS_SU'][numerous]=str(x[1] + ' ' + x[2])
    Dict_sumup['Not_Matching_Tool_Annotation_SU'][numerous]='GNPS, NAP'
    Dict_sumup['Is_NMR_Annotated_SU'][numerous]= 1
    Dict_sumup['3rd_Tool_SU'][numerous]=None
    Dict_sumup['Molecular_Name_SU'][numerous]=x[-1]
    Dict_sumup['Confidence_Level'][numerous]=str(6)
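
The same eleven assignments are repeated in every "write results" cell of this notebook. A small helper could make each cell state only the columns that differ. The sketch below is an assumption, not part of the original workflow: `add_summary_row` and `SUMMARY_COLUMNS` are hypothetical names, and the column list is copied from the cell above (the real `Dict_sumup` may hold additional columns).

```python
# Columns written by the cell above; the real Dict_sumup may contain more.
SUMMARY_COLUMNS = [
    'Feature_SU', 'Rt_SU', 'GNPS_SU', 'NAP_SU', 'MolNetEnhancer_SU',
    'SIRIUS_SU', 'Not_Matching_Tool_Annotation_SU', 'Is_NMR_Annotated_SU',
    '3rd_Tool_SU', 'Molecular_Name_SU', 'Confidence_Level',
]


def add_summary_row(dict_sumup, row_id, values):
    """Write one row into a dict of dicts; columns missing from `values` get None."""
    unknown = set(values) - set(SUMMARY_COLUMNS)
    if unknown:
        raise KeyError(f'Unknown column(s): {sorted(unknown)}')
    for column in SUMMARY_COLUMNS:
        dict_sumup.setdefault(column, {})[row_id] = values.get(column)


# Toy usage mirroring the cell above (confidence score 6).
toy_sumup = {}
toy_row = -1
toy_x4 = [(250, 'PFTAWBLQPZVEMU', 'C15H14O6[M - H2O + H]+', 'catechin')]
for x in toy_x4:
    toy_row += 1
    add_summary_row(toy_sumup, toy_row, {
        'Feature_SU': int(x[0]),
        'SIRIUS_SU': x[1] + ' ' + x[2],
        'Not_Matching_Tool_Annotation_SU': 'GNPS, NAP',
        'Is_NMR_Annotated_SU': 1,
        'Molecular_Name_SU': x[-1],
        'Confidence_Level': str(6),
    })
print(toy_sumup['SIRIUS_SU'])
```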

## 4. Final step: compare the annotation from the remaining Mass tool, which did not match any other tool

#### 4.1 Compare the last Mass annotation results (GNPS or NAP) to the 13C NMR candidates

In [75]:
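def compare_3rd_tool_NAP_or_GNPS(job_tupple,cat):
    """Compare the SMILES stored in column `cat` to the 13C NMR candidates (InChIKey first block)."""
    name_out=[]
    
    if len(job_tupple) !=0:
        for x in job_tupple:
            m = Chem.MolFromSmiles(df[cat].loc[df['shared name'] == x[0]].tolist()[0])
            if m is None:
                # RDKit could not parse this SMILES: skip the feature
                continue
            ic=Chem.MolToInchi(m)
            ik=Chem.rdinchi.InchiToInchiKey(ic).split('-')[0]
            # `in` on a pandas Series tests the index labels, so compare against the values
            if ik in df_smiles['InchiKey'].tolist():
                name_out.append((x[0], df_smiles[0].loc[df_smiles['InchiKey'] == ik].tolist()[0], str(cat+'_3rd_tool_that_didnot_match')))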
        if len(name_out) == 0 :
            print('There is no match for this last comparison.')
        else:
            print(f'There are {len(name_out)} features with the last Mass tool annotation that match a 13C NMR candidate.' + '\n')
            for x in name_out:
                print(x)
    else:
        print('There is no corresponding feature for this last comparison.')
        
    return name_out    
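
The function above keeps only the first block of each InChIKey before comparing. That block encodes the atom connectivity, so stereoisomers share it and a match identifies a 2D structure rather than a specific stereoisomer. The sketch below illustrates this with plain string handling; `first_block` is a hypothetical helper, and the two full InChIKeys are given as illustrative examples for (+)-catechin and (-)-epicatechin that should be checked against a database before being reused.

```python
def first_block(inchikey):
    """Return the 14-letter connectivity block of a standard InChIKey."""
    block = inchikey.strip().split('-')[0]
    if len(block) != 14 or not block.isalpha() or not block.isupper():
        raise ValueError(f'Not an InChIKey: {inchikey!r}')
    return block


# Illustrative keys: two stereoisomers with the same connectivity.
catechin = 'PFTAWBLQPZVEMU-DZGCQCFKSA-N'
epicatechin = 'PFTAWBLQPZVEMU-UKRRQHHQSA-N'

same_skeleton = first_block(catechin) == first_block(epicatechin)
print(first_block(catechin), same_skeleton)
```

In practice this means a feature annotated as "catechin" through this comparison could equally be one of its stereoisomers; the 13C NMR data would be needed to tell them apart.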

#### 4.1.1 If the last Mass tool to compare is GNPS

In [76]:
xc1o = compare_3rd_tool_NAP_or_GNPS(xc1,'Smiles')

There is no match for this last comparison.





##### Write the results to the summary file
##### Assign confidence score = 13 to nodes whose GNPS candidate matches the 13C NMR annotations

In [77]:
for x in xc1o:
    numerous=numerous+1
    Dict_sumup['Feature_SU'][numerous]=int(x[0])
    Dict_sumup['Rt_SU'][numerous]=None
    Dict_sumup['GNPS_SU'][numerous]=None
    Dict_sumup['NAP_SU'][numerous]=None
    Dict_sumup['MolNetEnhancer_SU'][numerous]=None
    Dict_sumup['SIRIUS_SU'][numerous]=None
    Dict_sumup['Not_Matching_Tool_Annotation_SU'][numerous]=None
    Dict_sumup['Is_NMR_Annotated_SU'][numerous]= 1
    Dict_sumup['3rd_Tool_SU'][numerous]=str(x[1] + ' ' + x[2])
    Dict_sumup['Molecular_Name_SU'][numerous]=None
    Dict_sumup['Confidence_Level'][numerous]=str(13)

#### 4.1.2 If the last Mass tool to compare is NAP - MetFrag candidate 1, 2 and 3

#### 4.1.2.1 For MetFrag candidate 1

In [78]:
xb1o_a = compare_3rd_tool_NAP_or_GNPS(xb1,'MetFragSMILES1')

There is no corresponding feature for this last comparison.


##### Write the results to the summary file
##### Assign confidence score = 15 to nodes whose MetFrag candidate 1 matches the 13C NMR annotations

In [79]:
for x in xb1o_a:
    numerous=numerous+1
    Dict_sumup['Feature_SU'][numerous]=int(x[0])
    Dict_sumup['Rt_SU'][numerous]=None
    Dict_sumup['GNPS_SU'][numerous]=None
    Dict_sumup['NAP_SU'][numerous]=None
    Dict_sumup['MolNetEnhancer_SU'][numerous]=None
    Dict_sumup['SIRIUS_SU'][numerous]=None
    Dict_sumup['Not_Matching_Tool_Annotation_SU'][numerous]=None
    Dict_sumup['Is_NMR_Annotated_SU'][numerous]= 1
    Dict_sumup['3rd_Tool_SU'][numerous]=str(x[1] + ' ' + x[2])
    Dict_sumup['Molecular_Name_SU'][numerous]=None
    Dict_sumup['Confidence_Level'][numerous]=str(15)

#### 4.1.2.2 For MetFrag candidate 2

In [80]:
xb1o_b = compare_3rd_tool_NAP_or_GNPS(xb1,'MetFragSMILES2')

There is no corresponding feature for this last comparison.


##### Write the results to the summary file
##### Assign confidence score = 15 to nodes whose MetFrag candidate 2 matches the 13C NMR annotations

In [81]:
for x in xb1o_b:
    numerous=numerous+1
    Dict_sumup['Feature_SU'][numerous]=int(x[0])
    Dict_sumup['Rt_SU'][numerous]=None
    Dict_sumup['GNPS_SU'][numerous]=None
    Dict_sumup['NAP_SU'][numerous]=None
    Dict_sumup['MolNetEnhancer_SU'][numerous]=None
    Dict_sumup['SIRIUS_SU'][numerous]=None
    Dict_sumup['Not_Matching_Tool_Annotation_SU'][numerous]=None
    Dict_sumup['Is_NMR_Annotated_SU'][numerous]= 1
    Dict_sumup['3rd_Tool_SU'][numerous]=str(x[1] + ' ' + x[2])
    Dict_sumup['Molecular_Name_SU'][numerous]=None
    Dict_sumup['Confidence_Level'][numerous]=str(15)

#### 4.1.2.3 For MetFrag candidate 3

In [82]:
xb1o_c = compare_3rd_tool_NAP_or_GNPS(xb1,'MetFragSMILES3')

There is no corresponding feature for this last comparison.


##### Write the results to the summary file
##### Assign confidence score = 15 to nodes whose MetFrag candidate 3 matches the 13C NMR annotations

In [83]:
for x in xb1o_c:
    numerous=numerous+1
    Dict_sumup['Feature_SU'][numerous]=int(x[0])
    Dict_sumup['Rt_SU'][numerous]=None
    Dict_sumup['GNPS_SU'][numerous]=None
    Dict_sumup['NAP_SU'][numerous]=None
    Dict_sumup['MolNetEnhancer_SU'][numerous]=None
    Dict_sumup['SIRIUS_SU'][numerous]=None
    Dict_sumup['Not_Matching_Tool_Annotation_SU'][numerous]=None
    Dict_sumup['Is_NMR_Annotated_SU'][numerous]= 1
    Dict_sumup['3rd_Tool_SU'][numerous]=str(x[1] + ' ' + x[2])
    Dict_sumup['Molecular_Name_SU'][numerous]=None
    Dict_sumup['Confidence_Level'][numerous]=str(15)
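
The three MetFrag cells above differ only in the column name, and all three assign the same confidence score (15). They could therefore be driven by one loop. The sketch below is only an illustration under assumptions: `collect_metfrag_matches` is a hypothetical helper and `fake_compare` is a stand-in for `compare_3rd_tool_NAP_or_GNPS`, which needs RDKit and the notebook's dataframes.

```python
METFRAG_COLUMNS = ['MetFragSMILES1', 'MetFragSMILES2', 'MetFragSMILES3']


def collect_metfrag_matches(compare, features, columns=METFRAG_COLUMNS):
    """Call compare(features, column) for each column and concatenate the results."""
    matches = []
    for column in columns:
        matches.extend(compare(features, column))
    return matches


def fake_compare(features, column):
    """Stand-in comparison: pretends that only MetFrag candidate 2 matches."""
    if not column.endswith('2'):
        return []
    return [(f, 'some-name', column + '_3rd_tool_that_didnot_match') for f in features]


all_matches = collect_metfrag_matches(fake_compare, [12, 34])
for match in all_matches:
    print(match)
```

Each tuple keeps the column it came from in its last element, so the candidate rank is not lost when the three result lists are merged.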

#### 4.1.3 If the last Mass tool to compare is SIRIUS

In [84]:
def compare_3rd_tool_SIRIUS():
    name_out=[]
    if len(xb2) !=0:
        for y in xb2:
    #         print(y)
            filein =[x for x in list_folder if  str('_' + str(y[0])) == x[-len('_' + str(y[0])):]]
            if len(filein)!=0:
                os.chmod(filein[0], stat.S_IROTH)

                if os.path.exists(filein[0]+'/structure_candidates.tsv'):
                    df_sirius_formula = pd.read_csv(filein[0]+'/structure_candidates.tsv', sep='\t')
                    sirius_inchikey_list=df_sirius_formula['InChIkey2D'].tolist()
#                     print(sirius_inchikey_list)
                    for ik in sirius_inchikey_list:
                        if ik in df_smiles['InchiKey'].tolist():
                            name_out.append((y[0],df_smiles[0].loc[df_smiles['InchiKey'] == ik].tolist()[0], 'SIRIUS_3rd_tool_that_didnot_match'))
        if len(name_out) == 0 :
            print('There is no match for this last comparison.')
        else:
            print(f'There are {len(name_out)} features with the last Mass tool annotation that match a 13C NMR candidate.' + '\n')
            for x in name_out:
                print(x)
    else:
        print('There is no corresponding feature for this last comparison.')
    return name_out
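
The function above locates a SIRIUS feature folder by checking that its path ends with `_<feature ID>`, then reads the `InChIkey2D` column of `structure_candidates.tsv`. The sketch below reproduces these two steps with the standard library on a temporary toy project. The helper names, the folder names (`0_sample_5`, `1_sample_15`) and the file content are assumptions made for the example, not real SIRIUS output.

```python
import csv
import os
import tempfile


def find_feature_folder(folders, feature_id):
    """Return the first folder whose path ends with '_<feature_id>', else None."""
    suffix = '_' + str(feature_id)
    hits = [folder for folder in folders if folder.endswith(suffix)]
    return hits[0] if hits else None


def read_inchikeys(folder, column='InChIkey2D'):
    """Read one column of structure_candidates.tsv; empty list if the file is absent."""
    path = os.path.join(folder, 'structure_candidates.tsv')
    if not os.path.exists(path):
        return []
    with open(path, newline='') as handle:
        return [row[column] for row in csv.DictReader(handle, delimiter='\t')]


with tempfile.TemporaryDirectory() as project:
    # Toy project: two feature folders, only one with structure candidates.
    for name in ('0_sample_5', '1_sample_15'):
        os.mkdir(os.path.join(project, name))
    tsv = os.path.join(project, '1_sample_15', 'structure_candidates.tsv')
    with open(tsv, 'w', newline='') as handle:
        handle.write('InChIkey2D\tmolecularFormula\nPFTAWBLQPZVEMU\tC15H14O6\n')

    folders = sorted(os.path.join(project, n) for n in os.listdir(project))
    folder_15 = os.path.basename(find_feature_folder(folders, 15))
    keys_15 = read_inchikeys(find_feature_folder(folders, 15))
    keys_5 = read_inchikeys(find_feature_folder(folders, 5))

print(folder_15, keys_15, keys_5)
```

Because the suffix includes the underscore, feature 5 does not pick up the folder of feature 15.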

In [85]:
xb2o = compare_3rd_tool_SIRIUS()

There is no corresponding feature for this last comparison.


##### Write the results to the summary file
##### Assign confidence score = 14 to nodes whose SIRIUS candidates match the 13C NMR annotations

In [86]:
for x in xb2o:
    numerous=numerous+1
    Dict_sumup['Feature_SU'][numerous]=int(x[0])
    Dict_sumup['Rt_SU'][numerous]=None
    Dict_sumup['GNPS_SU'][numerous]=None
    Dict_sumup['NAP_SU'][numerous]=None
    Dict_sumup['MolNetEnhancer_SU'][numerous]=None
    Dict_sumup['SIRIUS_SU'][numerous]=None
    Dict_sumup['Not_Matching_Tool_Annotation_SU'][numerous]=None
    Dict_sumup['Is_NMR_Annotated_SU'][numerous]= 1
    Dict_sumup['3rd_Tool_SU'][numerous]=str(x[1] + ' ' + x[2])
    Dict_sumup['Molecular_Name_SU'][numerous]=None
    Dict_sumup['Confidence_Level'][numerous]=str(14)

## 5. Finalize the results file

In [87]:
df_resume= pd.DataFrame.from_dict(Dict_sumup)
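
`Dict_sumup` is a dictionary of dictionaries: the outer keys are column names and the inner keys are row numbers. With its default orientation, `pd.DataFrame.from_dict` turns the outer keys into columns and the inner keys into the index. A minimal sketch on toy data (`toy_sumup` is a hypothetical three-column stand-in; the values of the second row are invented for the example):

```python
import pandas as pd

# Toy stand-in for Dict_sumup: column name -> {row number -> value}
toy_sumup = {
    'Feature_SU': {0: 190, 1: 51},
    'SIRIUS_SU': {0: 'REFJWTPEDVJJIY C15H10O7[M + H]+', 1: None},
    'Confidence_Level': {0: '9', 1: '9'},
}

# Default orient='columns': outer keys become columns, inner keys become the index.
toy_resume = pd.DataFrame.from_dict(toy_sumup)
print(toy_resume)
```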

#### 5.1 Add the corresponding 13C NMR candidate(s) name to the final results

In [90]:
# Confidence level assigned to each list of features matched to a 13C NMR candidate
nmr_confidence_levels = [
    (xa, 1), (xb1, 4), (xb2, 2), (xc1, '3+'), (xc2, 3),
    (x0, 5), (x1, 7), (x2, 7), (x3, 7), (x4, 6),
]

for matches, level in nmr_confidence_levels:
    for x in matches:
        features = df_resume['Feature_SU'].tolist()
        if x[0] in features:
            # Update the first row of this feature; .loc avoids chained assignment
            row = df_resume.index[features.index(x[0])]
            df_resume.loc[row, 'Is_NMR_Annotated_SU'] = 1
            df_resume.loc[row, 'Molecular_Name_SU'] = x[-1]
            df_resume.loc[row, 'Confidence_Level'] = level


#### 5.2 Add the corresponding NPClassifier information, m/z and retention time (Rt) of each feature to the final results

In [91]:
for x in df_resume['Feature_SU'].tolist():
    if x in df['shared name'].tolist():
        # Row of feature x in the summary (i) and in the feature table (j);
        # both frames are assumed to keep their default 0..n-1 index
        i = df_resume['Feature_SU'].tolist().index(x)
        j = df['shared name'].tolist().index(x)
        df_resume.loc[i, 'MolNetEnhancer_SU'] = df.loc[j, 'npclassifier_superclass']
        df_resume.loc[i, 'Rt_SU'] = df.loc[j, 'RTConsensus']
        df_resume.loc[i, 'm/z'] = df.loc[j, 'row m/z']
df_resume.rename(columns={'MolNetEnhancer_SU': 'NPClassifier_superclass_SU'}, inplace=True)

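The same per-feature lookup can also be expressed without an explicit loop. The sketch below is one possible vectorized variant using `Series.map`, shown on toy data. The frames `features` and `summary` are hypothetical stand-ins for `df` and `df_resume`, and feature 999 is invented to show what happens when a feature has no match.

```python
import pandas as pd

# Toy stand-ins for df (feature table) and df_resume (summary).
features = pd.DataFrame({
    'shared name': [190, 51, 60],
    'npclassifier_superclass': ['Flavonoids', 'Flavonoids', 'Flavonoids'],
    'RTConsensus': [4.5092, 2.6637, 2.9239],
    'row m/z': [303.050409, 579.150185, 291.086742],
})
summary = pd.DataFrame({'Feature_SU': [60, 190, 999]})

# Index the feature table by feature ID (first occurrence wins), then look up
# each column for every row of the summary at once.
lookup = features.drop_duplicates('shared name').set_index('shared name')
summary['NPClassifier_superclass_SU'] = summary['Feature_SU'].map(lookup['npclassifier_superclass'])
summary['Rt_SU'] = summary['Feature_SU'].map(lookup['RTConsensus'])
summary['m/z'] = summary['Feature_SU'].map(lookup['row m/z'])
print(summary)
```

One difference from the loop: features absent from the feature table receive `NaN` here, whereas the loop leaves any existing value untouched.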


##### To visualize the final results, run the cell below.

In [92]:
df_resume

| Unnamed: 0 | Feature_SU | m/z | Rt_SU | GNPS_SU | NAP_SU | NPClassifier_superclass_SU | SIRIUS_SU | Not_Matching_Tool_Annotation_SU | Is_NMR_Annotated_SU | 3rd_Tool_SU | Molecular_Name_SU | Confidence_Level |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 190 | 303.050409 | 4.5092 | CCMSLIB00005739139 | LTS0004651 MF2 | Flavonoids | REFJWTPEDVJJIY C15H10O7[M + H]+ | | | | | 9 |
| 1 | 51 | 579.150185 | 2.6637 | CCMSLIB00005742589 | LTS0066122 MF1 | Flavonoids | XFZJEEAOWLFHDH C30H26O12[M + H]+ | | | | | 9 |
| 2 | 283 | 287.055699 | 5.1329 | CCMSLIB00005748053 | LTS0155822 MF1 | Flavonoids | IYRMWMYZSQPJKC C15H10O6[M + H]+ | | | | | 9 |
| 3 | 367 | 287.055521 | 7.0902 | CCMSLIB00005749366 | LTS0155822 MF1 | Flavonoids | IYRMWMYZSQPJKC C15H10O6[M + H]+ | | | | | 9 |
| 4 | 60 | 291.086742 | 2.9239 | CCMSLIB00005742701 | LTS0265245 MF1 | Flavonoids | PFTAWBLQPZVEMU C15H14O6[M + H]+ | | 1.0 | | catechin | 1 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 186 | 2014 | 301.216685 | 21.6678 | | | | NFWKVWVWBFBAOV C20H28O2[M + H]+ | GNPS, NAP | 1.0 | | dehydroabietic-acid | 6 |
| 187 | 1194 | 177.055194 | 17.1851 | | | | KSEBMYQBYZTDHS C10H10O4[M - H2O + H]+ | GNPS, NAP | 1.0 | | ferulic-acid | 6 |
| 188 | 1617 | 177.055104 | 20.0258 | | | | KSEBMYQBYZTDHS C10H10O4[M - H2O + H]+ | GNPS, NAP | 1.0 | | ferulic-acid | 6 |
| 189 | 1160 | 303.231609 | 16.8989 | | | | MXYATHGRPJZBNA C20H30O2[M + H]+ | GNPS, NAP | 1.0 | | isopimaric-acid | 6 |


# /!\ Write the results file that summarizes all the above comparisons /!\

The file is written to the path stored in `resume_file_path`; change that variable if you want to save the results elsewhere.

In [93]:
# to_csv returns None when given a path, so the result is not assigned
df_resume.to_csv(resume_file_path, sep='\t', index=False)
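As a quick sanity check, a tab-separated file written this way can be read back with the same separator. The sketch below does this on a toy frame. The temporary `out_path` is a hypothetical location used only for the example, not the notebook's `resume_file_path`.

```python
import os
import tempfile

import pandas as pd

# Toy stand-in for df_resume (values are invented for illustration).
toy = pd.DataFrame({
    'Feature_SU': [60, 2014],
    'Molecular_Name_SU': ['catechin', 'dehydroabietic-acid'],
    'Confidence_Level': [1, 6],
})

# Hypothetical output location inside a fresh temporary directory.
out_path = os.path.join(tempfile.mkdtemp(), 'resume_example.tsv')
toy.to_csv(out_path, sep='\t', index=False)

# Read the file back with the same separator to confirm the round trip.
reloaded = pd.read_csv(out_path, sep='\t')
print(reloaded)
```

Passing `index=False` keeps the row index out of the file, so no extra `Unnamed: 0` column appears when the file is reloaded.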

#### Get the annotations found only by the NMR (CaraMel) workflow

In [94]:
CaraMel_single_annotation = [(x,df_smiles[1].tolist()[df_smiles[0].tolist().index(x)]) for x in df_smiles[0].tolist() if x not in df_resume['Molecular_Name_SU'].unique().tolist()]
for x in CaraMel_single_annotation:
    print(x)

('epicatechin', 'C1[C@H]([C@H](OC2=CC(=CC(=C21)O)O)C3=CC(=C(C=C3)O)O)O')
('acetic_acid', 'CC(=O)O')
('piceatannol-3-o-glucoside', 'C1=CC(=C(C=C1/C=C/C2=CC(=CC(=C2)O)O)O[C@H]3[C@@H]([C@H]([C@@H]([C@H](O3)CO)O)O)O)O')
('glucosyl-frambinone', 'CC(=O)CCC1=CC=C(C=C1)OC2C(C(C(C(O2)CO)O)O)O')
('glycerol-monoacetate', 'CC(=O)OCC(CO)O')
('lavandoside', 'COC1=C(C=CC(=C1)/C=C/C(=O)O)O[C@H]2[C@@H]([C@H]([C@@H]([C@H](O2)CO)O)O)O')
('glucosyl-trans-paracoumaric-acid', 'C1=CC(=CC=C1/C=C/C(=O)O)O[C@H]2[C@@H]([C@H]([C@@H]([C@H](O2)CO)O)O)O')
('compound-a', 'OC1=C(CC(O)=O)C(OC(C2=CC=C(O)C=C2)=O)=CC(O)=C1')
('dianthoside', 'CC1=C(C(=O)C=CO1)O[C@H]2[C@@H]([C@H]([C@@H]([C@H](O2)CO)O)O)O')
('arabinosyl-glucosyl-myrtenic-acid', 'CC1([C@H]2C(C(O[C@H]3[C@@H]([C@H]([C@@H]([C@H](O3)CO[C@H]4[C@@H]([C@H]([C@H](CO4)O)O)O)O)O)O)=O)=CC[C@@H]1C2)C')
('rhamnosyl-glucosyl-myrtenic-acid', 'CC1(C)C2C(C(O[C@H]3[C@H](O)[C@@H](O)[C@H](O)[C@@H](CO[C@H]4[C@H](O)[C@H](O)[C@@H](O)[C@H](C)O4)O3)=O)=CCC1C2')
('oleic-acid', 'CCCCCC
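
The selection of NMR-only annotations amounts to a set difference between the 13C NMR candidate names and the names already present in the summary. A minimal sketch on toy data follows. The names `nmr_candidates` and `annotated` are hypothetical stand-ins for `df_smiles` and `df_resume['Molecular_Name_SU']`.

```python
import pandas as pd

# Toy stand-in for df_smiles: column 0 holds the name, column 1 the SMILES.
nmr_candidates = pd.DataFrame([
    ['ferulic-acid', 'COC1=C(C=CC(=C1)/C=C/C(=O)O)O'],
    ['acetic_acid', 'CC(=O)O'],
    ['glycerol-monoacetate', 'CC(=O)OCC(CO)O'],
])
# Toy stand-in for the Molecular_Name_SU column (None = not annotated).
annotated = pd.Series(['ferulic-acid', None, 'catechin'])

# Build the set of known names once, then keep the candidates outside it.
already_annotated = set(annotated.dropna())
nmr_only = [
    (name, smiles)
    for name, smiles in zip(nmr_candidates[0], nmr_candidates[1])
    if name not in already_annotated
]
for pair in nmr_only:
    print(pair)
```

Using a set makes each membership test constant-time, which matters little here but helps with large candidate lists.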

This is a summary of the list names used for the different comparisons:
- xa → gnps_sirius_nap common annotation features
- xb1 → gnps_nap common annotation features
- xb2 → gnps_sirius common annotation features
- xc1 → nap_sirius common annotation features, with a different gnps candidate
- xc2 → nap_sirius common annotation features, without any gnps candidate

- x0 → gnps_13CNMR, if the features do not have a common annotation between at least 2 different mass spectrometry annotation workflows
- x1 → MetFrag1_13CNMR, if the features do not have a common annotation between at least 2 different mass spectrometry annotation workflows
- x2 → MetFrag2_13CNMR, if the features do not have a common annotation between at least 2 different mass spectrometry annotation workflows
- x3 → MetFrag3_13CNMR, if the features do not have a common annotation between at least 2 different mass spectrometry annotation workflows
- x4 → sirius_13CNMR, if the features do not have a common annotation between at least 2 different mass spectrometry annotation workflows

- xb1o_a → MetFrag1_13CNMR, if the features have a common annotation between gnps and sirius
- xb1o_b → MetFrag2, if the features have a common annotation between gnps and sirius
- xb1o_c → MetFrag3, if the features have a common annotation between gnps and sirius
- xb2o → SIRIUS, if the features have a common annotation between gnps and nap
- xc1o → GNPS, if the features have a common annotation between nap and sirius
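
To make the first group of lists (xa, xb1, xb2, xc1, xc2) more concrete, the sketch below shows how such agreement groups could be derived with plain set operations. It assumes each tool's annotation has already been reduced to a comparable key per feature (for example the first InChIKey block). The dictionaries, the `agree` helper and the placeholder keys are hypothetical, and the sets stand in for the lists actually built by the notebook.

```python
# Hypothetical per-tool annotations: feature ID -> comparable structure key.
# Keys made of a repeated letter are placeholders invented for illustration.
gnps = {190: 'REFJWTPEDVJJIY', 51: 'XFZJEEAOWLFHDH',
        7: 'AAAAAAAAAAAAAA', 9: 'EEEEEEEEEEEEEE'}
nap = {190: 'REFJWTPEDVJJIY', 51: 'XFZJEEAOWLFHDH',
       8: 'BBBBBBBBBBBBBB', 9: 'DDDDDDDDDDDDDD'}
sirius = {190: 'REFJWTPEDVJJIY', 51: 'CCCCCCCCCCCCCC', 7: 'AAAAAAAAAAAAAA',
          8: 'BBBBBBBBBBBBBB', 9: 'DDDDDDDDDDDDDD'}

def agree(a, b):
    """Features that both tools annotate with the same key."""
    return {f for f in a.keys() & b.keys() if a[f] == b[f]}

xa = agree(gnps, nap) & agree(gnps, sirius)   # all three tools agree
xb1 = agree(gnps, nap) - xa                   # GNPS and NAP agree, SIRIUS does not
xb2 = agree(gnps, sirius) - xa                # GNPS and SIRIUS agree, NAP does not
nap_sirius = agree(nap, sirius) - xa          # NAP and SIRIUS agree, GNPS does not
xc1 = {f for f in nap_sirius if f in gnps}    # GNPS proposes a different candidate
xc2 = nap_sirius - xc1                        # GNPS proposes no candidate

for name, group in [('xa', xa), ('xb1', xb1), ('xb2', xb2), ('xc1', xc1), ('xc2', xc2)]:
    print(name, sorted(group))
```

Each feature ends up in at most one group, which mirrors the mutually exclusive cases listed above.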