CDExcelMessenger.py

CDExcelMessenger.py is a Python module whose functions pass data between an Excel file and a Compound Discoverer (CD) Results file. The module depends on pandas.


tidyData() function

This function expects an Excel file with a "Compounds" sheet,
a "Meta" sheet with the columns 'Filename', 'SampleType', 'SampleID', 'Order', and 'Batch',
a "Data" sheet with the columns 'Idx', 'Filename' and the transposed areas (labeled 'M1' to 'Mn'),
and a "Peak" sheet with the columns 'Idx', 'Name', 'Label' and other columns from the Compounds sheet.
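
Before calling tidyData(), it can help to confirm that the workbook has the layout described above. The following is a minimal sketch of such a check, not part of CDExcelMessenger: the helper name find_layout_problems and the REQUIRED_COLUMNS table are hypothetical, and the sketch works on plain lists of column names so that it runs without an Excel file.

```python
# Required sheets and columns, taken from the description above.
# "Compounds" has no documented required columns, so only its presence is checked.
REQUIRED_COLUMNS = {
    "Compounds": [],
    "Meta": ["Filename", "SampleType", "SampleID", "Order", "Batch"],
    "Data": ["Idx", "Filename"],
    "Peak": ["Idx", "Name", "Label"],
}


def find_layout_problems(sheet_columns):
    """Return a list of problems; an empty list means the layout looks fine.

    sheet_columns maps each sheet name to the list of column names it contains.
    (Hypothetical helper, not a CDExcelMessenger function.)
    """
    problems = []
    for sheet, required in REQUIRED_COLUMNS.items():
        if sheet not in sheet_columns:
            problems.append(f"missing sheet: {sheet}")
            continue
        present = set(sheet_columns[sheet])
        for column in required:
            if column not in present:
                problems.append(f"sheet {sheet} is missing column: {column}")
    return problems


example = {
    "Compounds": ["Name", "Formula"],
    "Meta": ["Filename", "SampleType", "SampleID", "Order", "Batch"],
    "Data": ["Idx", "Filename", "M1", "M2"],
    "Peak": ["Idx", "Name"],  # 'Label' deliberately left out
}
print(find_layout_problems(example))
```

With pandas, pd.read_excel(path, sheet_name=None) returns a dictionary of DataFrames keyed by sheet name, so the input for this check can be built as {name: list(df.columns) for name, df in sheets.items()}.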

In [1]:
import CDExcelMessenger

excelFilePath = "C:/Users/CIMCB/Desktop/dataBeforeTidy.xlsx"

# Columns to keep, mapped to the names you would like to use for them.
# If you wrap a name in '<' and '>',
# the function looks for every column that starts with that name.
colsToKeepDict = {
    "Idx": "Idx",
    "UID": "UID",
    "Name": "Name",
    "Checked": "Checked",
    "Tags": "Tags",
    "Formula": "Formula",
    "RT [min]": "RT",
    "Calc. MW": "MW", 
    "MS2": "MS2",
    "# ChemSpider Results": "ChemSpiderRes",
    "# mzVault Results": "MzVaultRes",
    "# mzCloud Results": "MzCloudRes",
    "mzCloud Best Match": "mzCloudMatch",
    "mzVault Best Match": "mzVaultMatch",
    "<Mass List Match: >": "mzList_"
}

# Set "CIMCBlib" to True if you want to split peak names that start with ECU.
# Set "MSHit" to True if you want to get the sum of rows that have an MS2 hit
# and to create the MS2Hit column.
# Set "mzmatch" to your chosen mzVault and mzCloud match threshold.
# Set "UIDPrefix" to the prefix you would like to have in the UID values.
optionsDict = {
    "CIMCBlib": True,
    "MSHit": True,
    "mzmatch": 70,
    "UIDPrefix": "M"
}

CDExcelMessenger.tidyData(excelFilePath, colsToKeepDict, optionsDict)


C:/Users/CIMCB/Desktop/dataBeforeTidy.xlsx updated

Stats:
19 peaks with MS2 spectra
10 MassList hits
19 mzVault hits
14 mzCloud hits
19 unique hits
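
The '<' and '>' rule used in colsToKeepDict can be illustrated with a short sketch. This is not the module's implementation: select_columns is a hypothetical helper, the "Mass List Match: ..." column names are invented for the example, and the way a wildcard match is renamed (new prefix plus the rest of the original name) is an assumption.

```python
def select_columns(available, cols_to_keep):
    """Map each matching source column to its new name.

    Plain keys must match a column exactly. Keys wrapped in '<' and '>'
    match every column that starts with the enclosed text.
    ASSUMPTION: for a wildcard match, the new name is the new prefix
    followed by the rest of the original column name.
    """
    renamed = {}
    for key, new_name in cols_to_keep.items():
        if key.startswith("<") and key.endswith(">"):
            prefix = key[1:-1]  # text between '<' and '>'
            for column in available:
                if column.startswith(prefix):
                    renamed[column] = new_name + column[len(prefix):]
        elif key in available:
            renamed[key] = new_name
    return renamed


# Hypothetical column names for illustration only.
columns = [
    "Idx",
    "RT [min]",
    "Area: QC1",
    "Mass List Match: HMDB",
    "Mass List Match: LipidMaps",
]
keep = {"Idx": "Idx", "RT [min]": "RT", "<Mass List Match: >": "mzList_"}
print(select_columns(columns, keep))
```

Columns that match no key, such as "Area: QC1" here, are simply left out of the result.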


updateCDResultsFile() function

updateCDResultsFile() imports data from an Excel file into a CD Results file. It can add new columns to the CD compound table or update existing ones: Tags, Checked, Name, or any column this function added previously. Each run adds a 'Cleaned' column to the CD compound table that flags the rows found in the Excel file. If a 'Cleaned' column already exists from an earlier run, it is renamed 'OldCleaned' before the new 'Cleaned' column is added. If your Excel file has no 'CD_ID' column, the function adds one. Keep 'CD_ID' in the Excel file to improve performance on later runs.
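
The 'Cleaned' and 'OldCleaned' behaviour can be pictured with a small sketch on plain dictionaries. It is an illustration only: flag_cleaned is a hypothetical helper, rows are assumed to be matched on 'CD_ID', the flag is assumed to be a boolean, and on a third run this sketch overwrites 'OldCleaned', which the description above does not specify.

```python
def flag_cleaned(compound_rows, excel_ids):
    """Flag compound rows that also appear in the Excel file.

    A 'Cleaned' value left by an earlier run is moved to 'OldCleaned'
    before the new flag is written.
    ASSUMPTION: rows are matched on 'CD_ID' and the flag is a boolean.
    """
    excel_ids = set(excel_ids)
    for row in compound_rows:
        if "Cleaned" in row:
            row["OldCleaned"] = row.pop("Cleaned")
        row["Cleaned"] = row["CD_ID"] in excel_ids
    return compound_rows


rows = [{"CD_ID": 1}, {"CD_ID": 2}, {"CD_ID": 3}]
flag_cleaned(rows, [1, 2])  # first run: rows 1 and 2 are in the Excel file
flag_cleaned(rows, [2, 3])  # second run: rows 2 and 3 are in the Excel file
for row in rows:
    print(row)
```

After the second call, 'OldCleaned' holds the flags from the first run and 'Cleaned' holds the flags from the second.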

In [3]:
import CDExcelMessenger

# Set these variables to the paths of the CD Results file and the Excel file.
cdResultsFilePath = "C:/Users/CIMCB/Desktop/data.cdResult"
excelFilePath = "C:/Users/CIMCB/Desktop/data.xlsx"

# Set this variable to the name of your Peak sheet in the Excel file
peakSheetName = "Peak"

# List of column names in the Excel file that you would like to update/add to the CD Results file.
updateColNameList = ["UID", "CIMCBlib", "Name", "qcRSD", "dRatio", "blankRatio", "Checked", "Notes"]

# If you include "Tags" in the tag list, CD will contain the tags found in the Tags Excel column.
# You can use boolean/binary Excel columns as tags if you include the names of those columns.
# Any other names included in the tag list will contain default values.
# You can have up to 15 tags.
tagList = ["Tags", "ms2Hit", "goodPeakShape", "goodRT", "queryMS", "msError", "aTag"]

# Update the CD results file with Excel data.
CDExcelMessenger.updateCDResultsFile(
    cdResultsFilePath, 
    excelFilePath, 
    peakSheetName, 
    updateColNameList = updateColNameList, 
    tagList = tagList
)

#TO DO: default column (msi)(0-4 values, or null)


Importing data from C:/Users/CIMCB/Desktop/data.xlsx into C:/Users/CIMCB/Desktop/data.cdResult
Tag names and visibility updated in C:/Users/CIMCB/Desktop/data.cdResult
Column: UID updated
Column: Name updated
Column: Checked updated
Column: Notes updated
Column: Tags updated
C:/Users/CIMCB/Desktop/data.cdResult updated
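
The tag list rules from the cell above can be summarised in a short sketch. This is a hedged illustration rather than the module's code: classify_tags is a hypothetical helper, the source labels it returns are invented, and applying the 15-tag limit to the whole list, including "Tags", is an assumption.

```python
MAX_TAGS = 15  # limit stated in the notebook cell above


def classify_tags(tag_list, excel_columns):
    """Report where the values of each tag would come from.

    ASSUMPTION: the 15-tag limit applies to the list as passed,
    including the special "Tags" entry.
    """
    if len(tag_list) > MAX_TAGS:
        raise ValueError(f"{len(tag_list)} tags given, at most {MAX_TAGS} allowed")
    sources = {}
    for tag in tag_list:
        if tag == "Tags":
            sources[tag] = "Tags column"      # tags read from the Tags Excel column
        elif tag in excel_columns:
            sources[tag] = "boolean column"   # a boolean/binary Excel column
        else:
            sources[tag] = "default values"   # no matching Excel column
    return sources


tags = ["Tags", "ms2Hit", "goodPeakShape", "aTag"]
excel_columns = ["Idx", "Name", "Tags", "ms2Hit"]
print(classify_tags(tags, excel_columns))
```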


updateExcelFile() function

updateExcelFile() imports data from a CD Results file into an Excel file. Some data in the CD compound table (e.g. Area) cannot be exported with this function. If your Excel file has no 'CD_ID' column, the function adds one. Keep this column in the Excel file to improve performance on later runs.

In [2]:
import CDExcelMessenger

# Set these variables to the paths of the CD Results file and the Excel file.
cdResultsFilePath = "C:/Users/CIMCB/Desktop/data.cdResult"
excelFilePath = "C:/Users/CIMCB/Desktop/data.xlsx"

# Set this variable to the name of your Peak sheet in the Excel file
peakSheetName = "Peak"

# List of columns in the Excel file that you would like to update. 
updateColNameList = ["Checked", "Tags", "Notes"]

# Set this value to True if you want to remove rows from the Excel file 
# that have been checked in CD
removeCheckedRows = True

# Update the Excel file with CD compound data 
CDExcelMessenger.updateExcelFile(
    cdResultsFilePath, 
    excelFilePath, 
    peakSheetName, 
    updateColNameList = updateColNameList,
    removeCheckedRows = removeCheckedRows
)

#TO DO: create msi 0-4


Importing data from C:/Users/CIMCB/Desktop/data.cdResult into C:/Users/CIMCB/Desktop/data.xlsx
Column: ms2Hit updated in C:/Users/CIMCB/Desktop/data.xlsx
Column: goodPeakShape added to C:/Users/CIMCB/Desktop/data.xlsx
Column: goodRT added to C:/Users/CIMCB/Desktop/data.xlsx
Column: queryMS added to C:/Users/CIMCB/Desktop/data.xlsx
Column: msError added to C:/Users/CIMCB/Desktop/data.xlsx
Column: aTag added to C:/Users/CIMCB/Desktop/data.xlsx
Column: Checked updated in C:/Users/CIMCB/Desktop/data.xlsx
Column: Tags updated in C:/Users/CIMCB/Desktop/data.xlsx
Column: Notes added to C:/Users/CIMCB/Desktop/data.xlsx
Dropped rows that were Checked
C:/Users/CIMCB/Desktop/data.xlsx updated
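
The effect of updateColNameList and removeCheckedRows can be sketched on plain dictionaries. This is an illustration under stated assumptions, not the module's implementation: update_from_cd is a hypothetical helper, rows are assumed to be matched on 'CD_ID', and 'Checked' is assumed to hold a boolean.

```python
def update_from_cd(excel_rows, cd_rows, update_cols, remove_checked_rows=False):
    """Copy the chosen columns from CD rows into the matching Excel rows.

    ASSUMPTION: rows are matched on 'CD_ID'. When remove_checked_rows is
    True, rows whose 'Checked' value is truthy after the update are dropped.
    """
    cd_by_id = {row["CD_ID"]: row for row in cd_rows}
    updated = []
    for row in excel_rows:
        merged = dict(row)  # leave the input rows untouched
        cd_row = cd_by_id.get(row["CD_ID"])
        if cd_row is not None:
            for col in update_cols:
                if col in cd_row:
                    merged[col] = cd_row[col]
        if remove_checked_rows and merged.get("Checked"):
            continue
        updated.append(merged)
    return updated


excel_rows = [
    {"CD_ID": 1, "Name": "M1", "Checked": False},
    {"CD_ID": 2, "Name": "M2", "Checked": False},
]
cd_rows = [
    {"CD_ID": 1, "Checked": True, "Notes": "confirmed"},
    {"CD_ID": 2, "Checked": False, "Notes": "recheck RT"},
]
print(update_from_cd(excel_rows, cd_rows, ["Checked", "Notes"], remove_checked_rows=True))
```

In this example the row with CD_ID 1 is dropped because it was checked in CD, and the remaining row gains a 'Notes' value.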
