CPH and ML 08/2022

# MOCCA data analysis of cyanation well plate screening

MOCCA is a tool for analyzing *High-Performance Liquid Chromatography–Diode Array Detection* (HPLC–DAD) datasets recorded for reaction (process) control. It requires only the HPLC–DAD raw data and some basic user input.

## Reaction and case study background

This case study deals with a screening of discrete reaction parameters on a well plate and its data analysis using MOCCA. The chosen reaction is a palladium-catalyzed cyanation of 2-chlorotoluene yielding o-tolunitrile. We screened all possible combinations of seven different O-protected cyanohydrins, four different bases, and three different ligands. Experimental details and a discussion of the results are provided within the manuscript.
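As a quick sanity check of the design, the full factorial can be enumerated: seven cyanohydrins times four bases times three ligands gives 84 reaction wells. These can be arranged as 7 plate rows (one per cyanohydrin) by 12 columns (one per base/ligand pair). The labels below are the ones used in the plate layout later in this notebook. The base-major ordering (bases as the outer loop, ligands as the inner loop) is inferred from that layout list and is an assumption of this sketch.

```python
from itertools import product

# labels as they appear in the plate layout later in this notebook
cyanohydrins = ["9a", "9b", "9c", "9d", "9e", "9f", "9g"]
bases = ["DBU", "TMG", "DMAP", "DIPEA"]
ligands = ["XPhos", "tBuXPhos", "CM-Phos"]

# one plate column per base/ligand pair (bases as outer loop, ligands as inner loop)
base_ligand_pairs = [f"{base}/{ligand}" for base, ligand in product(bases, ligands)]

# one well per (cyanohydrin, base/ligand) combination: 7 x 12 = 84
conditions = list(product(cyanohydrins, base_ligand_pairs))

print(len(base_ligand_pairs), len(conditions))  # 12 84
```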

## Imports

In [1]:
# folders handling
import os
from glob import glob

# user interaction
from mocca.user_interaction.campaign import HplcDadCampaign
from mocca.user_interaction.user_objects import Gradient
from mocca.user_interaction.user_objects import Compound
from mocca.user_interaction.user_objects import InternalStandard
from mocca.user_interaction.user_objects import HplcInput
from mocca.user_interaction.settings import Settings

# reporting
from mocca.report.main import report

# optional: imports for creating separate reports
# from mocca.report.hplc_input import report_hplc_input
# from mocca.report.gradient import report_gradients
# from mocca.report.chromatograms import report_chroms
# from mocca.report.bad_chromatograms import report_bad_chroms
# from mocca.report.compound_tracking import report_comp_tracking
# from mocca.report.peak_library import report_peak_library
# from mocca.report.compound_library import report_comp_library
# from mocca.report.calibration_library import report_calib_library
# from mocca.report.deconvolution import report_deconvolution

%load_ext autoreload
%autoreload 2

### Cyanation data folder handling

The data for this notebook can be found in mocca/notebooks/cyanation_data. It was recorded on a Shimadzu system that automatically exported the raw data of each run to a .txt file.

In [6]:
# get the directory of this notebook (__file__ is not defined inside Jupyter,
# so the current working directory is used, which is the notebook folder by default)
ipynb_path = os.getcwd()

# add the path to the data folder
cyan_data_path = os.path.join(ipynb_path, "cyanation_data")

# find all Shimadzu/LabSolutions HPLC data files (.txt extension)
folders = glob(os.path.join(cyan_data_path, '*.txt'))
# sort numerically by the run number at the end of the file name (e.g. ..._12.txt)
folders = sorted(folders, key=lambda x: int(x.split('_')[-1][:-4]))
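
The numeric sort key matters because a plain string sort would place `sample_10` before `sample_4`. The key used here takes the last `_`-separated token and strips the four-character `.txt` extension. A slightly more explicit variant, which does not depend on the extension length, is sketched below. The helper name `sample_number` is hypothetical; the example file names are taken from the folder listing further below.

```python
import os


def sample_number(path: str) -> int:
    """Return the trailing run number of a file such as '09072021_sample_12.txt'."""
    stem, _ext = os.path.splitext(os.path.basename(path))  # '09072021_sample_12', '.txt'
    return int(stem.rsplit("_", 1)[-1])


files = ["09072021_sample_10.txt", "09072021_sample_4.txt", "09072021_sample_5.txt"]
print(sorted(files))                     # lexicographic: sample_10 comes first
print(sorted(files, key=sample_number))  # numeric: sample_4, sample_5, sample_10
```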

In [7]:
for folder in folders:
    print(os.path.basename(folder))

09072021_sample_4.txt
09072021_sample_5.txt
09072021_sample_6.txt
09072021_sample_7.txt
09072021_sample_8.txt
09072021_sample_9.txt
09072021_sample_10.txt
09072021_sample_11.txt
09072021_sample_12.txt
09072021_sample_13.txt
09072021_sample_14.txt
09072021_sample_15.txt
09072021_sample_16.txt
09072021_sample_17.txt
09072021_sample_18.txt
09072021_sample_19.txt
09072021_sample_20.txt
09072021_sample_21.txt
09072021_sample_22.txt
09072021_sample_23.txt
09072021_sample_24.txt
09072021_sample_25.txt
09072021_sample_26.txt
09072021_sample_27.txt
09072021_sample_28.txt
09072021_sample_29.txt
09072021_sample_30.txt
09072021_sample_31.txt
09072021_sample_32.txt
09072021_sample_33.txt
09072021_sample_34.txt
09072021_sample_35.txt
09072021_sample_36.txt
09072021_sample_37.txt
09072021_sample_38.txt
09072021_sample_39.txt
09072021_sample_40.txt
09072021_sample_41.txt
09072021_sample_42.txt
09072021_sample_43.txt
09072021_sample_44.txt
09072021_sample_45.txt
09072021_sample_46.txt
09072021_sample_4

### Campaign initialization

In [8]:
cyan_campaign = HplcDadCampaign()

We create a Gradient object as well as an InternalStandard object. The same gradient run was used for the baseline correction of all recorded chromatograms within this screening. The same amount of internal standard was added to all reaction runs as well as the calibration standards of the substrate and product.

In [9]:
gradient = Gradient(next(folder for folder in folders if "gradient" in folder))
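
`next(...)` on a generator expression returns the first matching path. It raises a bare `StopIteration` if no path matches and silently ignores any further matches. Matching on the full path can also pick up keywords that occur in directory names. The following hypothetical helper, `find_single`, is one way to make the lookup stricter. It matches only on the file name and fails with a clear message if there are no matches or more than one. The file names in the usage lines are made up for illustration.

```python
import os


def find_single(paths, keyword):
    """Return the only path whose file name contains `keyword`.

    Raises ValueError if no file or more than one file matches.
    """
    # match on the file name only, so that directory names cannot cause false hits
    matches = [p for p in paths if keyword in os.path.basename(p)]
    if len(matches) != 1:
        raise ValueError(f"expected exactly one file containing {keyword!r}, found {len(matches)}")
    return matches[0]


# hypothetical file names, for illustration only
files = ["data/09072021_gradient_1.txt", "data/09072021_sample_4.txt", "data/09072021_sample_5.txt"]
print(find_single(files, "gradient"))  # data/09072021_gradient_1.txt
```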

In [10]:
internal_standard = InternalStandard('tetralin', 0.06094)

### User input for calibration runs

We manually enter the concentrations (molar amounts) of the calibration standards.

In [11]:
ArCl_concs = [0.0603, 0.04422]
ArCN_concs = [0.05955, 0.04367]
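
MOCCA handles calibration internally, and this notebook does not show its exact model. As a generic illustration of internal-standard calibration, the sketch below fits the area ratio (analyte/internal standard) against the concentration ratio with a line through the origin. It then inverts the fit for an unknown sample. The concentrations are the tetralin amount and the 2-chlorotoluene standards entered in this notebook. The peak areas are hypothetical values generated inside the sketch, not measured data.

```python
def fit_response_factor(conc_ratios, area_ratios):
    """Least-squares slope k of area_ratio = k * conc_ratio for a line through the origin."""
    numerator = sum(x * y for x, y in zip(conc_ratios, area_ratios))
    denominator = sum(x * x for x in conc_ratios)
    return numerator / denominator


def quantify(area_analyte, area_istd, conc_istd, k):
    """Invert the calibration: c_analyte = (A_analyte / A_istd) / k * c_istd."""
    return area_analyte / area_istd / k * conc_istd


conc_istd = 0.06094            # tetralin amount entered above
standards = [0.0603, 0.04422]  # 2-chlorotoluene calibration standards entered above
area_istd = 70000.0            # hypothetical tetralin peak area
# hypothetical analyte peak areas, generated with k = 1.5 purely for illustration
area_analyte = [1.5 * (c / conc_istd) * area_istd for c in standards]

k = fit_response_factor([c / conc_istd for c in standards],
                        [a / area_istd for a in area_analyte])
print(round(k, 6))                                                   # 1.5
print(round(quantify(area_analyte[1], area_istd, conc_istd, k), 6))  # 0.04422
```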

We create an HplcInput object for the internal standard run and add it to the campaign.

In [12]:
istd_run = HplcInput(next(folder for folder in folders if "istd" in folder),
                     gradient, compound=Compound('tetralin', is_istd=True))
cyan_campaign.add_hplc_input(istd_run)

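We create HplcInput objects for each calibration run and add them to the campaign. The runs of the protected cyanohydrins 9a and 9d are added without a concentration, so these compounds can be recognized in the reaction runs but not quantified.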

In [13]:
ArCl_folders = [folder for folder in folders if "educt" in folder]
for i, folder in enumerate(ArCl_folders):
    compound = Compound('2-chlorotoluene', ArCl_concs[i])
    exp = HplcInput(folder, gradient, compound=compound, istd=internal_standard)
    cyan_campaign.add_hplc_input(exp)

ArCN_folders = [folder for folder in folders if "product" in folder]
for i, folder in enumerate(ArCN_folders):
    compound = Compound('o-tolunitrile', ArCN_concs[i])
    exp = HplcInput(folder, gradient, compound=compound, istd=internal_standard)
    cyan_campaign.add_hplc_input(exp)

cn_source_a_folder = next(folder for folder in folders if "cnsource_a" in folder)
compound = Compound('protected_cyanohydrin_9a')
exp = HplcInput(cn_source_a_folder, gradient, compound=compound, istd=internal_standard)
cyan_campaign.add_hplc_input(exp)

cn_source_d_folder = next(folder for folder in folders if "cnsource_d" in folder)
compound = Compound('protected_cyanohydrin_9d')
exp = HplcInput(cn_source_d_folder, gradient, compound=compound, istd=internal_standard)
cyan_campaign.add_hplc_input(exp)

### User input for reaction runs
We create HplcInput objects for each reaction run and add them to the campaign.

In [14]:
for folder in [folder for folder in folders if "sample" in folder]:
    exp = HplcInput(folder, gradient, istd=internal_standard)
    cyan_campaign.add_hplc_input(exp)

### Settings for data processing

In [15]:
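settings = Settings('labsolutions',                       # raw data format of the Shimadzu LabSolutions export
                    absorbance_threshold=500,             # minimum absorbance for a peak to be picked
                    wl_high_pass=215,                     # discard wavelengths below 215 nm
                    peaks_high_pass=1, peaks_low_pass=5,  # only keep peaks eluting between 1 and 5 min
                    spectrum_correl_thresh=0.99,          # spectral correlation required to assign peaks to the same compound
                    relative_distance_thresh=0.0025)      # max. relative retention time difference for the same compound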
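
`spectrum_correl_thresh=0.99` sets how similar two UV/Vis spectra must be before their peaks are treated as the same compound. The sketch below assumes the similarity measure is a Pearson correlation coefficient. This is an assumption about MOCCA's internals, not something shown in this notebook. The spectra are toy vectors, not data from this screening. Because correlation compares spectral shape, a diluted sample of the same compound still matches.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def same_compound(spec_a, spec_b, thresh=0.99):
    # scaling a spectrum (e.g. by concentration) does not change its correlation
    return pearson(spec_a, spec_b) >= thresh


reference = [0.1, 0.4, 0.9, 0.6, 0.2]     # toy absorbance spectrum
diluted = [0.5 * x for x in reference]    # same band shape, half the intensity
other = [0.9, 0.6, 0.2, 0.1, 0.4]         # different band shape
print(same_compound(reference, diluted), same_compound(reference, other))  # True False
```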

### Data processing

In [16]:
%%time
cyan_campaign.process_all_hplc_input(settings)

CPU times: total: 1h 9min
Wall time: 19min 48s


Save the campaign as a .pkl file.

In [17]:
# the campaign is saved to the notebook folder or, alternatively, to a given path
cyan_campaign.save_campaign('cyan_campaign.pkl', remove_raw_data=False)

Load the campaign from a .pkl file.

In [14]:
cyan_campaign.load_campaign('cyan_campaign.pkl')
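
Processing took about 20 minutes of wall time, so saving the processed campaign avoids repeating that step. The `.pkl` extension suggests Python's standard `pickle` serialization; this is an assumption, not verified against MOCCA's source. Under that assumption, the save/load round trip corresponds to the sketch below. A plain dictionary, `toy_campaign`, stands in for the campaign object, and the helper names are hypothetical.

```python
import os
import pickle
import tempfile


def save_pickle(obj, path):
    """Serialize obj to path with the standard pickle module."""
    with open(path, "wb") as fh:
        pickle.dump(obj, fh)


def load_pickle(path):
    """Read back an object written by save_pickle."""
    with open(path, "rb") as fh:
        return pickle.load(fh)


# hypothetical stand-in for a processed campaign
toy_campaign = {"name": "cyanation", "processed": True, "chroms": [{"path": "sample_4.txt", "peaks": []}]}

path = os.path.join(tempfile.mkdtemp(), "toy_campaign.pkl")
save_pickle(toy_campaign, path)
restored = load_pickle(path)
print(restored == toy_campaign)  # True
```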

### Reporting

In [18]:
# directory in which the reports are saved
cyan_report_path = os.path.join(ipynb_path, "cyanation reports")

In [20]:
report(cyan_campaign, cyan_report_path)

Report saved to ./C:\Users\GMJAZ\OneDrive - Bayer\Personal Data\GitHub\mocca\notebooks\cyanation reports\hplc_input.html.

Report saved to ./C:\Users\GMJAZ\OneDrive - Bayer\Personal Data\GitHub\mocca\notebooks\cyanation reports\gradient.html.

Report saved to ./C:\Users\GMJAZ\OneDrive - Bayer\Personal Data\GitHub\mocca\notebooks\cyanation reports\peak_library.html.

Report saved to ./C:\Users\GMJAZ\OneDrive - Bayer\Personal Data\GitHub\mocca\notebooks\cyanation reports\chromatograms.html.

No chromatograms given or all chromatograms are good data!

Report saved to ./C:\Users\GMJAZ\OneDrive - Bayer\Personal Data\GitHub\mocca\notebooks\cyanation reports\compound_tracking.html.

Report saved to ./C:\Users\GMJAZ\OneDrive - Bayer\Personal Data\GitHub\mocca\notebooks\cyanation reports\deconvolution.html.

Report saved to ./C:\Users\GMJAZ\OneDrive - Bayer\Personal Data\GitHub\mocca\notebooks\cyanation reports\compound_library.html.

Report saved to ./C:\Users\GMJAZ\OneDrive - Bayer\Personal Data\GitHub\mocca\notebooks\cyanation reports\calibration_library.html.


### Customized data analysis

MOCCA cannot cover every data analysis need of its users. We therefore expect users to perform customized analyses independently of MOCCA's reporting function.

As an example, we show how the yields and conversions obtained by screening on a 96-well plate can be extracted from the MOCCA campaign and visualized. Additionally, the relative intensity of a side product, labelled unknown_3, is calculated and visualized. Note that only 84 wells were used for reaction runs; the remaining wells were used for calibration.
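The extraction code below expresses both quantities relative to the initial substrate amount `c0`: conversion = (c0 - c_ArCl) / c0 and yield = c_ArCN / c0. If no 2-chlorotoluene peak is found, the substrate is counted as fully converted. If no o-tolunitrile peak is found, the yield is zero. The standalone sketch below reproduces that logic. The function name and the input numbers are illustrative only.

```python
from typing import Optional, Tuple


def conversion_and_yield(c0: float, c_substrate: Optional[float],
                         c_product: Optional[float]) -> Tuple[float, float]:
    """Return (conversion, yield) as fractions of the initial substrate amount c0.

    A missing substrate peak (None) counts as full conversion,
    a missing product peak (None) as zero yield.
    """
    conversion = 1.0 if c_substrate is None else (c0 - c_substrate) / c0
    yield_ = 0.0 if c_product is None else c_product / c0
    return conversion, yield_


print(conversion_and_yield(0.06, 0.03, 0.015))  # (0.5, 0.25)
print(conversion_and_yield(0.06, None, 0.03))   # (1.0, 0.5)
```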

In [21]:
import pandas as pd
import altair as alt

We extract the results from the campaign object and collect all relevant information in a dataframe.

In [22]:
c0 = 0.06  # molar amount of substrate (2-chlorotoluene) used in each reaction
results = {
    '2-chlorotoluene': [],
    'o-tolunitrile': [],
    'tetralin': [],
    'column': [],
    'row': [],
    'path': [],
    'conversion': [],
    'yield': [],
    'unknown_3': [],
    'unknown_3_rel_area': []
}
for chrom in cyan_campaign.chroms:
    # reaction runs are the only runs without an assigned compound
    if not chrom.experiment.compound:
        # defaults if a compound is not detected: full conversion, no product, no side product
        results['2-chlorotoluene'].append(0)
        results['o-tolunitrile'].append(0)
        results['tetralin'].append(0)
        results['conversion'].append(1)
        results['yield'].append(0)
        results['unknown_3'].append(0)
        results['unknown_3_rel_area'].append(0)  # recomputed later from the integrals
        results['path'].append(chrom.experiment.path)
        # overwrite the defaults with the values of the detected peaks
        for peak in chrom.peaks:
            if peak.compound_id == '2-chlorotoluene':
                results['2-chlorotoluene'][-1] = peak.concentration
                results['conversion'][-1] = (c0 - peak.concentration) / c0
            elif peak.compound_id == 'o-tolunitrile':
                results['o-tolunitrile'][-1] = peak.concentration
                results['yield'][-1] = peak.concentration / c0
            elif peak.compound_id == 'tetralin':
                results['tetralin'][-1] = peak.integral  # internal standard peak area
            elif peak.compound_id == 'unknown_3':
                results['unknown_3'][-1] = peak.integral  # side product peak area

In [23]:
# add row and column indices: reaction samples are numbered from 4 and fill the plate row by row, 12 wells per row
for path in results['path']:
    num = int(path.split('_')[-1][:-4]) - 4
    row, col = divmod(num, 12)
    results['row'].append(str(row + 1))
    results['column'].append(str(col + 1))
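
The index arithmetic above assumes that the reaction samples are numbered consecutively from 4 and fill the plate row by row with 12 wells per row. Under that assumption, each sample number maps to a 1-based plate position, as the hypothetical helper `well_position` shows. Sample 86, for example, lands in row 7, column 11, which matches the last row of the table below.

```python
from typing import Tuple

FIRST_SAMPLE = 4    # first reaction sample number (offset used in the notebook)
WELLS_PER_ROW = 12  # one column per base/ligand combination


def well_position(sample_number: int) -> Tuple[int, int]:
    """Map a sample number to a 1-based (row, column) plate position."""
    row, col = divmod(sample_number - FIRST_SAMPLE, WELLS_PER_ROW)
    return row + 1, col + 1


for n in (4, 15, 16, 86):
    print(n, well_position(n))
# 4 (1, 1)
# 15 (1, 12)
# 16 (2, 1)
# 86 (7, 11)
```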

In [24]:
#convert dictionary into pandas dataframe
results_df = pd.DataFrame(results)
row_order = [str(val) for val in list(range(1, 13))]
results_df['unknown_3_rel_area'] = results_df['unknown_3'] / results_df['tetralin']
results_df



| | 2-chlorotoluene | o-tolunitrile | tetralin | column | row | path | conversion | yield | unknown_3 | unknown_3_rel_area |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.031222 | 0.026152 | 72908.483002 | 1 | 1 | `C:\Users\GMJAZ\OneDrive - Bayer\Personal Data\...` | 0.479633 | 0.435863 | 339527.424758 | 4.656899 |
| 1 | 0.053005 | 0.003441 | 69100.487023 | 2 | 1 | `C:\Users\GMJAZ\OneDrive - Bayer\Personal Data\...` | 0.116577 | 0.057352 | 301732.937144 | 4.366582 |
| 2 | 0.056189 | 0.000000 | 66686.723268 | 3 | 1 | `C:\Users\GMJAZ\OneDrive - Bayer\Personal Data\...` | 0.063519 | 0.000000 | 288470.985884 | 4.325763 |
| 3 | 0.000000 | 0.040743 | 72699.519576 | 4 | 1 | `C:\Users\GMJAZ\OneDrive - Bayer\Personal Data\...` | 1.000000 | 0.679048 | 279534.968465 | 3.845073 |
| 4 | 0.000000 | 0.029075 | 66779.207989 | 5 | 1 | `C:\Users\GMJAZ\OneDrive - Bayer\Personal Data\...` | 1.000000 | 0.484581 | 246759.246137 | 3.695151 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 79 | 0.059307 | 0.000000 | 68115.138326 | 8 | 7 | `C:\Users\GMJAZ\OneDrive - Bayer\Personal Data\...` | 0.011549 | 0.000000 | 0.000000 | 0.000000 |
| 80 | 0.059044 | 0.000000 | 77373.684722 | 9 | 7 | `C:\Users\GMJAZ\OneDrive - Bayer\Personal Data\...` | 0.015931 | 0.000000 | 0.000000 | 0.000000 |
| 81 | 0.044453 | 0.000000 | 162296.594456 | 10 | 7 | `C:\Users\GMJAZ\OneDrive - Bayer\Personal Data\...` | 0.259115 | 0.000000 | 0.000000 | 0.000000 |
| 82 | 0.052634 | 0.000000 | 64807.098116 | 11 | 7 | `C:\Users\GMJAZ\OneDrive - Bayer\Personal Data\...` | 0.122758 | 0.000000 | 0.000000 | 0.000000 |


We add the protected cyanohydrin and the base/ligand combination used in each well to the results table.

In [25]:
# plate rows: protected cyanohydrins; plate columns: base/ligand combinations
protected_cyanohydrins = ['9a', '9b', '9c', '9d', '9e', '9f', '9g']
base_ligand_combinations = ['DBU/XPhos', 'DBU/tBuXPhos', 'DBU/CM-Phos',
                            'TMG/XPhos', 'TMG/tBuXPhos', 'TMG/CM-Phos',
                            'DMAP/XPhos', 'DMAP/tBuXPhos', 'DMAP/CM-Phos',
                            'DIPEA/XPhos', 'DIPEA/tBuXPhos', 'DIPEA/CM-Phos']



In [27]:
# map the 1-based plate row/column indices to the screened conditions
results_df['protected_cyanohydrin'] = [protected_cyanohydrins[int(r) - 1] for r in results_df['row']]
results_df['B/L'] = [base_ligand_combinations[int(c) - 1] for c in results_df['column']]
results_df

| | 2-chlorotoluene | o-tolunitrile | tetralin | column | row | path | conversion | yield | unknown_3 | unknown_3_rel_area | protected_cyanohydrin | B/L |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.031222 | 0.026152 | 72908.483002 | 1 | 1 | `C:\Users\GMJAZ\OneDrive - Bayer\Personal Data\...` | 0.479633 | 0.435863 | 339527.424758 | 4.656899 | 9a | DBU/XPhos |
| 1 | 0.053005 | 0.003441 | 69100.487023 | 2 | 1 | `C:\Users\GMJAZ\OneDrive - Bayer\Personal Data\...` | 0.116577 | 0.057352 | 301732.937144 | 4.366582 | 9a | DBU/tBuXPhos |
| 2 | 0.056189 | 0.000000 | 66686.723268 | 3 | 1 | `C:\Users\GMJAZ\OneDrive - Bayer\Personal Data\...` | 0.063519 | 0.000000 | 288470.985884 | 4.325763 | 9a | DBU/CM-Phos |
| 3 | 0.000000 | 0.040743 | 72699.519576 | 4 | 1 | `C:\Users\GMJAZ\OneDrive - Bayer\Personal Data\...` | 1.000000 | 0.679048 | 279534.968465 | 3.845073 | 9a | TMG/XPhos |
| 4 | 0.000000 | 0.029075 | 66779.207989 | 5 | 1 | `C:\Users\GMJAZ\OneDrive - Bayer\Personal Data\...` | 1.000000 | 0.484581 | 246759.246137 | 3.695151 | 9a | TMG/tBuXPhos |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 79 | 0.059307 | 0.000000 | 68115.138326 | 8 | 7 | `C:\Users\GMJAZ\OneDrive - Bayer\Personal Data\...` | 0.011549 | 0.000000 | 0.000000 | 0.000000 | 9g | DMAP/tBuXPhos |
| 80 | 0.059044 | 0.000000 | 77373.684722 | 9 | 7 | `C:\Users\GMJAZ\OneDrive - Bayer\Personal Data\...` | 0.015931 | 0.000000 | 0.000000 | 0.000000 | 9g | DMAP/CM-Phos |
| 81 | 0.044453 | 0.000000 | 162296.594456 | 10 | 7 | `C:\Users\GMJAZ\OneDrive - Bayer\Personal Data\...` | 0.259115 | 0.000000 | 0.000000 | 0.000000 | 9g | DIPEA/XPhos |
| 82 | 0.052634 | 0.000000 | 64807.098116 | 11 | 7 | `C:\Users\GMJAZ\OneDrive - Bayer\Personal Data\...` | 0.122758 | 0.000000 | 0.000000 | 0.000000 | 9g | DIPEA/tBuXPhos |


We convert the extracted yield and conversion values to percentages and round them for visualization.
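For measured values, exact halves are rare, but pandas rounding delegates to NumPy. NumPy, like Python's built-in `round`, sends exact halves to the nearest even number, so 12.5 % becomes 12. If conventional round-half-up display is preferred, the `decimal` module is one option. The helper names below are illustrative, and 0.435863 is the first yield value from the table above.

```python
from decimal import ROUND_HALF_UP, Decimal


def percent_half_even(fraction: float) -> float:
    # built-in round(): exact halves go to the nearest even integer
    return float(round(fraction * 100))


def percent_half_up(fraction: float) -> float:
    # Decimal(str(x)) uses the shortest repr of the float; halves are rounded away from zero
    value = Decimal(str(fraction)) * 100
    return float(value.quantize(Decimal("1"), rounding=ROUND_HALF_UP))


for f in (0.125, 0.375, 0.435863):
    print(f, percent_half_even(f), percent_half_up(f))
# 0.125 12.0 13.0
# 0.375 38.0 38.0
# 0.435863 44.0 44.0
```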

In [28]:
# convert to %-values and round; copy first so that results_df keeps the fractions
results_rounded = results_df.copy()
results_rounded['yield'] = (results_rounded['yield'] * 100).round(0)
results_rounded['conversion'] = (results_rounded['conversion'] * 100).round(0)

The yield and conversion values are plotted as a heatmap.
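Conceptually, each heatmap is a pivot of the long-format table, with one row per protected cyanohydrin and one column per base/ligand pair. Each cell holds the plotted value. The stdlib sketch below performs that reshaping with a hypothetical helper `pivot` and toy records, not results from this screening. An explicit order list decides the sequence of rows and columns, just as the `sort` argument of `alt.X` does for the chart axis. Wells without data stay empty (`None`).

```python
from typing import Dict, List, Optional, Sequence, Tuple


def pivot(records: Sequence[dict], row_key: str, col_key: str, value_key: str,
          row_order: Sequence[str], col_order: Sequence[str]) -> List[List[Optional[float]]]:
    """Arrange long-format records into a row_order x col_order grid (None = no data)."""
    lookup: Dict[Tuple[str, str], float] = {(r[row_key], r[col_key]): r[value_key] for r in records}
    return [[lookup.get((row, col)) for col in col_order] for row in row_order]


# toy records, for illustration only
records = [
    {"protected_cyanohydrin": "9a", "B/L": "DBU/XPhos", "yield": 40.0},
    {"protected_cyanohydrin": "9a", "B/L": "TMG/XPhos", "yield": 70.0},
    {"protected_cyanohydrin": "9b", "B/L": "DBU/XPhos", "yield": 10.0},
]
grid = pivot(records, "protected_cyanohydrin", "B/L", "yield",
             row_order=["9a", "9b"], col_order=["DBU/XPhos", "TMG/XPhos"])
print(grid)  # [[40.0, 70.0], [10.0, None]]
```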

In [29]:
# sort the x axis in plate column order (the B/L labels are not the strings '1'...'12')
heatmap = alt.Chart(results_rounded, title = 'Yield (%)').mark_rect().encode(
            x=alt.X('B/L', sort=base_ligand_combinations, axis=alt.Axis(title=None)),
            y=alt.Y('protected_cyanohydrin', axis=alt.Axis(title=None)),
            color=alt.Color('yield', title="Yield (%)", scale=alt.Scale(domain=[0, 100], scheme='viridis', reverse=True))
            )
# text layer with the rounded yield of each well
text = alt.Chart(results_rounded, title = 'Yield').mark_text().encode(
            x=alt.X('B/L', sort=base_ligand_combinations),
            y=alt.Y('protected_cyanohydrin'),
            text='yield'
            )

display(heatmap + text)

In [30]:
# sort the x axis in plate column order (the B/L labels are not the strings '1'...'12')
heatmap = alt.Chart(results_rounded, title = 'Conversion').mark_rect().encode(
            x=alt.X('B/L', sort=base_ligand_combinations, axis=alt.Axis(title=None)),
            y=alt.Y('protected_cyanohydrin', axis=alt.Axis(title=None)),
            color=alt.Color('conversion', title="Conversion (%)", scale=alt.Scale(domain=[0, 100], scheme='viridis', reverse=True))
            )
# optional text layer; display(heatmap + text) overlays the values
text = alt.Chart(results_rounded, title = 'Conversion').mark_text().encode(
            x=alt.X('B/L', sort=base_ligand_combinations),
            y=alt.Y('protected_cyanohydrin'),
            text='conversion'
            )

display(heatmap)

The relative intensity of the side product 'unknown_3' (its peak area divided by the tetralin peak area of the same run) is plotted as a heatmap.
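Normalizing the side-product area to the internal-standard area of the same run compensates for run-to-run differences such as injection volume. A run in which no tetralin peak was detected would have a zero denominator. Dividing the dataframe columns then yields `inf` or `NaN`. The hypothetical helper below guards against this. Returning `None` for such runs is a design choice of this sketch. The example areas are taken from the first row of the table above.

```python
from typing import Optional


def relative_area(area: float, istd_area: float) -> Optional[float]:
    """Peak area normalized to the internal-standard area; None if no internal standard was found."""
    if istd_area <= 0:
        return None
    return area / istd_area


print(round(relative_area(339527.424758, 72908.483002), 4))  # 4.6569
print(relative_area(0.0, 68115.138326))                      # 0.0
print(relative_area(1000.0, 0.0))                            # None
```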

In [31]:
# sort the x axis in plate column order (the B/L labels are not the strings '1'...'12')
heatmap = alt.Chart(results_rounded, title = 'Unknown_3').mark_rect().encode(
            x=alt.X('B/L', sort=base_ligand_combinations, axis=alt.Axis(title=None)),
            y=alt.Y('protected_cyanohydrin', axis=alt.Axis(title=None)),
            color=alt.Color('unknown_3_rel_area', title="rel. integral", scale=alt.Scale(scheme='viridis', reverse=True))
            )
# optional text layer; display(heatmap + text) overlays the values
text = alt.Chart(results_rounded, title = 'Unknown_3').mark_text().encode(
            x=alt.X('B/L', sort=base_ligand_combinations),
            y=alt.Y('protected_cyanohydrin'),
            text='unknown_3_rel_area'
            )

display(heatmap)
