CDExcelMessenger.py

CDExcelMessenger.py is the Python module that contains the functions for passing data between an Excel file and a Compound Discoverer (CD) Results file. The module depends on pandas and was written with Python 3.9.12.
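Before running any of the functions below, it can help to confirm that the environment is set up. The sketch below is a hypothetical helper (check_environment is not part of CDExcelMessenger) that uses only the standard library to check the Python version and whether pandas can be imported. Treating 3.9 as a minimum is an assumption, since the module is only stated to have been written with 3.9.12.

```python
import importlib.util
import sys


# Hypothetical environment check; not part of CDExcelMessenger.
def check_environment(min_python=(3, 9)):
    """Return a list of problems; an empty list means the basics are in place."""
    problems = []
    # sys.version_info[:2] is a (major, minor) tuple, so tuples compare directly
    if sys.version_info[:2] < min_python:
        problems.append(
            f"Python {min_python[0]}.{min_python[1]}+ assumed, found {sys.version.split()[0]}"
        )
    # find_spec returns None when a top-level package cannot be located
    if importlib.util.find_spec("pandas") is None:
        problems.append("pandas is not installed")
    return problems


issues = check_environment()
print("\n".join(issues) if issues else "Environment looks OK")
```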


tidyData() function

This function expects an Excel file with a "Compounds" sheet and a "Meta" sheet containing the columns 'Filename', 'SampleType', 'SampleID', 'Order', and 'Batch'.

This function will convert the Compounds table and Meta table into the TidyData format with a Data sheet and a Peak sheet.
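The input requirements above can be checked before calling tidyData(). This is a hypothetical pre-flight check (check_tidy_input is not part of the module) that works on plain lists of sheet names and Meta column names; it assumes tidyData() matches names exactly, including case and spacing.

```python
# Hypothetical pre-flight check for tidyData() input; not part of CDExcelMessenger.
REQUIRED_SHEETS = ["Compounds", "Meta"]
REQUIRED_META_COLUMNS = ["Filename", "SampleType", "SampleID", "Order", "Batch"]


def check_tidy_input(sheet_names, meta_columns):
    """Return a list of problems; an empty list means the documented layout is present."""
    problems = []
    for sheet in REQUIRED_SHEETS:
        if sheet not in sheet_names:
            problems.append(f"missing sheet: {sheet}")
    # Column names are compared exactly (assumption: case and spacing matter)
    present = set(meta_columns)
    for column in REQUIRED_META_COLUMNS:
        if column not in present:
            problems.append(f"missing Meta column: {column}")
    return problems


print(check_tidy_input(["Compounds", "Meta"], ["Filename", "SampleType", "SampleID", "Order"]))
# ['missing Meta column: Batch']
```

With pandas, the two inputs could come from pd.ExcelFile(excelFilePath).sheet_names and pd.read_excel(excelFilePath, sheet_name="Meta", nrows=0).columns.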

In [None]:
import CDExcelMessenger

# Set this variable to the path of your Excel file
excelFilePath = "C:/Users/CIMCB/Desktop/dataBeforeTidy.xlsx"

# Columns to keep (keys) and the names to use for those columns (values).
# If you wrap a key in '<' and '>', this function will match every column
# whose name starts with the text between the brackets.
colsToKeepDict = {
    "Idx": "Idx",
    "UID": "UID",
    "Name": "Name",
    "Checked": "Checked",
    "Tags": "Tags",
    "Formula": "Formula",
    "RT [min]": "RT",
    "Calc. MW": "MW", 
    "MS2": "MS2",
    "# ChemSpider Results": "ChemSpiderRes",
    "# mzVault Results": "MzVaultRes",
    "# mzCloud Results": "MzCloudRes",
    "mzCloud Best Match": "mzCloudMatch",
    "mzVault Best Match": "mzVaultMatch",
    "<Mass List Match: >": "mzList_"
}

# Set "CIMCBlib" to True to split peak names that start with ECU into two columns.
# Set "MSHit" to True to get the number of rows that have an MS2 hit
# and to create the MS2Hit column.
# Set "mzmatch" to your chosen mzVault and mzCloud match threshold.
# Set "UIDPrefix" to the prefix you would like to use for the UID values.
optionsDict = {
    "CIMCBlib": True,
    "MSHit": True,
    "mzmatch": 70,
    "UIDPrefix": "M"
}

CDExcelMessenger.tidyData(excelFilePath, colsToKeepDict, optionsDict)
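The '<...>' convention in colsToKeepDict can be pictured as follows. This is a hypothetical sketch (resolve_columns is not the module's own code) that assumes the text after a matched prefix is kept and appended to the new name, so a column such as "Mass List Match: MyList" (a made-up name) would become "mzList_MyList". Check the output of tidyData() to see how your version actually names these columns.

```python
# Hypothetical illustration of the "<prefix>" rule in colsToKeepDict;
# not the module's actual implementation.
def resolve_columns(columns, cols_to_keep):
    """Map original column names to new names.

    Keys wrapped in '<' and '>' match every column starting with the text
    between the brackets; the matched prefix is swapped for the new name
    (assumption). Other keys must match a column name exactly.
    """
    mapping = {}
    for key, new_name in cols_to_keep.items():
        if key.startswith("<") and key.endswith(">"):
            prefix = key[1:-1]
            for col in columns:
                if col.startswith(prefix):
                    mapping[col] = new_name + col[len(prefix):]
        elif key in columns:
            mapping[key] = new_name
    return mapping


columns = ["Name", "RT [min]", "Area", "Mass List Match: MyList"]
spec = {"Name": "Name", "RT [min]": "RT", "<Mass List Match: >": "mzList_"}
print(resolve_columns(columns, spec))
# {'Name': 'Name', 'RT [min]': 'RT', 'Mass List Match: MyList': 'mzList_MyList'}
```

With pandas, such a mapping could be applied as df[list(mapping)].rename(columns=mapping), which keeps only the listed columns and renames them.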


updateCDResultsFile() function

updateCDResultsFile() imports data from an Excel file into a CD Results file. It can add new columns to the CD compound table or update certain existing ones (Tags, Checked, Name, or any column previously added by this function).

Several columns are added to the CD compound table along the way. A 'Cleaned' column flags the rows of the compound table that were found in the Excel file. An editable 'Notes' column is added by default. A non-editable 'originalName' column holds a copy of the Name column, so you can update Name while keeping the original names. An MSI column is added based on whether certain Tags are checked. If you pass 'tagList', the function will also update the Tag names and values in CD.

If your Excel file does not have a 'compoundID' column, this function will add one, so make sure you are not already using a column called compoundID for something else. Keep 'compoundID' in the Excel file to speed up later runs of this function.
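The 'Cleaned' and 'originalName' bookkeeping described above can be sketched with plain dictionaries standing in for rows of the CD compound table. mark_cleaned and the row layout are hypothetical, and keeping an existing originalName on repeat runs is an assumption about how the original names are preserved.

```python
# Hypothetical sketch: plain dicts stand in for rows of the CD compound table.
def mark_cleaned(cd_rows, excel_ids, id_key="compoundID"):
    """Return copies of cd_rows with 'Cleaned' and 'originalName' filled in."""
    found = set(excel_ids)
    updated = []
    for row in cd_rows:
        new_row = dict(row)  # leave the input rows untouched
        # Flag rows whose ID also appears in the Excel Peak sheet
        new_row["Cleaned"] = row[id_key] in found
        # Copy Name only if originalName is not already set (assumed behavior on re-runs)
        new_row.setdefault("originalName", row["Name"])
        updated.append(new_row)
    return updated


rows = [{"compoundID": 1, "Name": "Glucose"}, {"compoundID": 2, "Name": "Alanine"}]
print(mark_cleaned(rows, [1]))
```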



In [2]:
import CDExcelMessenger

# Set these variables to the paths of the CD Results file and the Excel file.
cdResultsFilePath = "C:/Users/CIMCB/Desktop/data.cdResult"
excelFilePath = "C:/Users/CIMCB/Desktop/data.xlsx"

# Set this variable to the name of your Peak sheet in the Excel file
peakSheetName = "Peak"

# List of column names in the Excel file that you would like to update/add to the CD Results file.
excelColList = ["UID", "CIMCBlib", "Name", "qcRSD", "dRatio", "blankRatio", "Notes", "Checked"]

# If you include "Tags" in the tag list,
# CD will receive the tags found in the Excel Tags column (separated by ';').
# You can also use boolean/binary Excel columns as tags by including their column names.
# Any other names in the tag list will still be added to the CD tags.
# You can have up to 15 tags.
tagList = ["ms2Hit", "goodPeakShape", "goodRT", "queryMS", "msError"]

# Update the CD results file with Excel data.
CDExcelMessenger.updateCDResultsFile(
    cdResultsFilePath, 
    excelFilePath, 
    peakSheetName, 
    excelColList = excelColList, 
    tagList = tagList
)



Importing data from C:/Users/CIMCB/Desktop/data.xlsx into C:/Users/CIMCB/Desktop/data.cdResult
Tag names and visibility updated in C:/Users/CIMCB/Desktop/data.cdResult
Column: Cleaned added to C:/Users/CIMCB/Desktop/data.cdResult
Column: originalName added to C:/Users/CIMCB/Desktop/data.cdResult
Column: MSI added to C:/Users/CIMCB/Desktop/data.cdResult
Column: UID added to C:/Users/CIMCB/Desktop/data.cdResult
Column: UID updated
Column: Name updated
Column: Checked updated
Column: Notes added to C:/Users/CIMCB/Desktop/data.cdResult
Column: Notes updated
Column: Tags updated
C:/Users/CIMCB/Desktop/data.cdResult updated
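The tagList rules above (a ';'-delimited Tags column, boolean/binary columns used as tags, extra names added as-is, and a limit of 15 tags) can be sketched as follows. row_tags and all_tag_names are hypothetical helpers working on plain dictionaries, and applying the 15-tag limit to the total number of distinct tag names is an assumption.

```python
MAX_TAGS = 15  # the limit stated for tagList


# Hypothetical helpers; not part of CDExcelMessenger.
def row_tags(row, tag_list):
    """Return the tag names that are checked for one Excel row (a dict)."""
    checked = []
    for name in tag_list:
        if name == "Tags":
            # Expand the ';'-delimited Tags cell into individual tag names
            text = row.get("Tags") or ""
            checked.extend(t.strip() for t in str(text).split(";") if t.strip())
        elif row.get(name) in (True, 1):
            # Boolean/binary columns count as checked when True or 1
            checked.append(name)
    return checked


def all_tag_names(rows, tag_list):
    """Collect every distinct tag name in first-seen order, enforcing MAX_TAGS."""
    names = []
    for name in tag_list:
        # Plain names become tags even if no Excel column backs them
        if name != "Tags" and name not in names:
            names.append(name)
    if "Tags" in tag_list:
        for row in rows:
            for tag in row_tags(row, ["Tags"]):
                if tag not in names:
                    names.append(tag)
    if len(names) > MAX_TAGS:
        raise ValueError(f"{len(names)} tags requested; at most {MAX_TAGS} are allowed")
    return names


rows = [{"Tags": "good; review", "ms2Hit": 1}, {"Tags": "", "goodRT": True}]
print(all_tag_names(rows, ["Tags", "ms2Hit", "goodRT", "queryMS"]))
# ['ms2Hit', 'goodRT', 'queryMS', 'good', 'review']
```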


updateExcelFile() function

updateExcelFile() imports data from a CD Results file into an Excel file. Some data in the CD compound table cannot be exported with this function (e.g. Area). If you import the Tags data into the Excel file, this function also adds a column for each individual tag, or updates those columns if they already exist. An MSI column will be added to the Excel file based on whether certain Tags are checked. This function also gives you the option of removing rows that have been Checked in CD.

If your Excel file does not have a 'compoundID' column, this function will add one, so make sure you are not already using a column called compoundID for something else. Keep 'compoundID' in the Excel file to speed up later runs of this function.
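The per-tag columns and the removeCheckedRows option can be pictured with the sketch below, which works on plain dictionaries rather than the module's own data structures. expand_tags is a hypothetical helper; it assumes one True/False column per tag name, which is consistent with the "Column: A added" style messages in the example output but is not taken from the module's code.

```python
# Hypothetical sketch of per-tag columns and removeCheckedRows; not the module's code.
def expand_tags(rows, tag_names, remove_checked=False):
    """Return new row dicts with one True/False column per tag name.

    rows: dicts with a ';'-delimited 'Tags' string and a 'Checked' flag.
    remove_checked: drop rows whose 'Checked' value is True or 1.
    """
    out = []
    for row in rows:
        if remove_checked and row.get("Checked") in (True, 1):
            continue
        present = {t.strip() for t in str(row.get("Tags") or "").split(";") if t.strip()}
        new_row = dict(row)
        for name in tag_names:
            new_row[name] = name in present
        out.append(new_row)
    return out


rows = [
    {"UID": "M1", "Tags": "A;C", "Checked": False},
    {"UID": "M2", "Tags": "B", "Checked": True},
]
print(expand_tags(rows, ["A", "B", "C"], remove_checked=True))
```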

In [1]:
import CDExcelMessenger

# Set these variables to the paths of the CD Results file and the Excel file.
cdResultsFilePath = "C:/Users/CIMCB/Desktop/data.cdResult"
excelFilePath = "C:/Users/CIMCB/Desktop/data.xlsx"

# Set these variables to the names of the Peak and Data sheets in the Excel file.
# The Data sheet is optional.
peakSheetName = "Peak"
dataSheetName = "Data"

# List of columns in the Excel file that you would like to update. 
excelColList = ["Tags", "Checked", "Notes"]

# Set this variable to True if you want to remove rows from the Excel file 
# that have been checked in CD (default is False)
removeCheckedRows = True

# Set this variable to True to append the new sheets to the Excel file
# (the new sheet names will be given the suffix 'Appended').
# If you set it to False, your existing Excel sheets will be overwritten (default is False).
appendSheets = True

# Update the Excel file with CD compound data 
CDExcelMessenger.updateExcelFile(
    cdResultsFilePath, 
    excelFilePath, 
    peakSheetName, 
    dataSheetName = dataSheetName,
    excelColList = excelColList,
    removeCheckedRows = removeCheckedRows,
    appendSheets = appendSheets
)



Importing data from C:/Users/CIMCB/Desktop/data.cdResult into C:/Users/CIMCB/Desktop/data.xlsx
Column: MSI added to C:/Users/CIMCB/Desktop/data.xlsx
Column: compoundID added to C:/Users/CIMCB/Desktop/data.xlsx
Column: compoundID added to C:/Users/CIMCB/Desktop/data.cdResult
Column: A added to C:/Users/CIMCB/Desktop/data.xlsx
Column: B added to C:/Users/CIMCB/Desktop/data.xlsx
Column: C added to C:/Users/CIMCB/Desktop/data.xlsx
Column: D added to C:/Users/CIMCB/Desktop/data.xlsx
Column: E added to C:/Users/CIMCB/Desktop/data.xlsx
Column: Tags updated in C:/Users/CIMCB/Desktop/data.xlsx
Column: Checked updated in C:/Users/CIMCB/Desktop/data.xlsx
Dropped rows from peak sheet
C:/Users/CIMCB/Desktop/data.xlsx updated
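With appendSheets = True, the updated sheets are written under new names ending in 'Appended'. The sketch below shows one way such a name could be built; appended_sheet_name is hypothetical, and joining the suffix without a separator (e.g. 'PeakAppended') is an assumption. It also respects Excel's 31-character limit on sheet names, which a long sheet name plus the suffix could otherwise exceed.

```python
EXCEL_SHEET_NAME_LIMIT = 31  # Excel rejects sheet names longer than 31 characters


# Hypothetical helper; the real suffix format used by updateExcelFile() may differ.
def appended_sheet_name(sheet_name, suffix="Appended"):
    """Build the name of the appended copy of a sheet, e.g. 'Peak' -> 'PeakAppended'."""
    name = sheet_name + suffix
    if len(name) > EXCEL_SHEET_NAME_LIMIT:
        # Trim the original part so the suffix stays visible
        name = sheet_name[: EXCEL_SHEET_NAME_LIMIT - len(suffix)] + suffix
    return name


print(appended_sheet_name("Peak"))
# PeakAppended
```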
