CPH 02/04/2022

# MOCCA data analysis on cyanation wellplate screening

MOCCA is a tool for the analysis of *High-Performance Liquid Chromatography–Diode Array Detection* (HPLC–DAD) datasets that are recorded for reaction (process) control. It requires only the HPLC–DAD raw data and some basic user input for the data analysis.

## Reaction and case study background

This case study investigates a screening of discrete reaction parameters on a well plate. The chosen reaction is a palladium-catalyzed cyanation of 2-chlorotoluene yielding o-tolunitrile. We investigate the reaction with seven different cyanide precursors (protected cyanohydrins), four different bases, and three different ligands.

## Imports

In [1]:
# folders handling
import os
from glob import glob

# user interaction
from mocca.user_interaction.campaign import HplcDadCampaign
from mocca.user_interaction.user_objects import Gradient
from mocca.user_interaction.user_objects import Compound
from mocca.user_interaction.user_objects import InternalStandard
from mocca.user_interaction.user_objects import HplcInput
from mocca.user_interaction.settings import Settings

# customized data analysis
import datetime
import time

%load_ext autoreload
%autoreload 2

## Cyanation data folder handling

The data corresponding to this notebook can be found in mocca -> notebooks -> cyanation_data. The data was recorded on a Shimadzu system with an automatic export of the raw data to a .txt file.

In [2]:
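# get the directory of this notebook (the string "__file__" is resolved
# relative to the current working directory, which in Jupyter is the notebook folder)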
ipynb_path = os.path.dirname(os.path.realpath("__file__"))

# add the path to the test data folder
cyan_data_path = os.path.join(ipynb_path, "cyanation_data")

# find all exported HPLC-DAD raw data files (.txt file extension) and sort them by run number
folders = glob(cyan_data_path + '/*' + '.txt') 
folders = sorted(folders, key=lambda x: int(x.split('_')[-1][:-4]))
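
The numeric sort key matters because a plain lexicographic sort would place `sample_10` before `sample_4`. The following minimal sketch illustrates the difference on a few made-up file names that mimic the naming pattern of this dataset. The helper name `run_number` is introduced here only for illustration and is not part of MOCCA.

```python
import os

def run_number(path):
    """Return the integer between the last underscore and the file extension."""
    stem, _ext = os.path.splitext(os.path.basename(path))
    return int(stem.split('_')[-1])

# made-up file names that mimic the naming pattern used in this notebook
example_files = [
    '09072021_sample_10.txt',
    '09072021_sample_4.txt',
    '09072021_sample_21.txt',
]

lexicographic = sorted(example_files)            # compares character by character
numeric = sorted(example_files, key=run_number)  # compares the trailing run number

print(lexicographic)
print(numeric)
```

Using `os.path.splitext` instead of slicing off the last four characters keeps the key independent of the length of the file extension.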

In [3]:
for folder in folders:
    print(os.path.basename(folder))

09072021_sample_4.txt
09072021_sample_5.txt
09072021_sample_6.txt
09072021_sample_7.txt
09072021_sample_8.txt
09072021_sample_9.txt
09072021_sample_10.txt
09072021_sample_11.txt
09072021_sample_12.txt
09072021_sample_13.txt
09072021_sample_14.txt
09072021_sample_15.txt
09072021_sample_16.txt
09072021_sample_17.txt
09072021_sample_18.txt
09072021_sample_19.txt
09072021_sample_20.txt
09072021_sample_21.txt
09072021_sample_22.txt
09072021_sample_23.txt
09072021_sample_24.txt
09072021_sample_25.txt
09072021_sample_26.txt
09072021_sample_27.txt
09072021_sample_28.txt
09072021_sample_29.txt
09072021_sample_30.txt
09072021_sample_31.txt
09072021_sample_32.txt
09072021_sample_33.txt
09072021_sample_34.txt
09072021_sample_35.txt
09072021_sample_36.txt
09072021_sample_37.txt
09072021_sample_38.txt
09072021_sample_39.txt
09072021_sample_40.txt
09072021_sample_41.txt
09072021_sample_42.txt
09072021_sample_43.txt
09072021_sample_44.txt
09072021_sample_45.txt
09072021_sample_46.txt
09072021_sample_4

## Campaign initialization

In [4]:
cyan_campaign = HplcDadCampaign()

We create a Gradient object from the gradient run. It is shared by all runs of the campaign.

In [5]:
gradient = Gradient(next(folder for folder in folders if "gradient" in folder))
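
The `next(...)` expression picks the first path that contains a keyword and raises a bare `StopIteration` if nothing matches, which can be hard to interpret. A more defensive lookup could look like the sketch below. `find_run` is a hypothetical helper (not part of MOCCA), and the example paths are invented. Unlike the cell above, it matches against the file name only, not the full path.

```python
import os

def find_run(paths, keyword):
    """Return the first path whose file name contains keyword, or raise a clear error."""
    match = next((p for p in paths if keyword in os.path.basename(p)), None)
    if match is None:
        raise FileNotFoundError(f"no run file containing '{keyword}' found")
    return match

# invented paths for illustration only
example_paths = ['data/example_gradient_1.txt', 'data/example_istd_2.txt']

gradient_path = find_run(example_paths, 'gradient')
print(gradient_path)
```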

All runs in this campaign (except the tetralin compound run itself) use tetralin as the internal standard, which is represented by an InternalStandard object.

In [6]:
internal_standard = InternalStandard('tetralin', 0.06094)

### User input for calibration runs

We transfer the concentration values of the calibration standards.

In [7]:
ArCl_concs = [0.0603, 0.04422]
ArCN_concs = [0.05955, 0.04367]

First, we create a HplcInput object for the internal standard run and add it to the campaign.

In [8]:
istd_run = HplcInput(next(folder for folder in folders if "istd" in folder),
                     gradient, compound=Compound('tetralin', is_istd=True))
cyan_campaign.add_hplc_input(istd_run)

We create an HplcInput object for each calibration run and add it to the campaign.

In [9]:
ArCl_folders = [folder for folder in folders if "educt" in folder]
for i, folder in enumerate(ArCl_folders):
    compound = Compound('2-chlorotoluene', ArCl_concs[i])
    exp = HplcInput(folder, gradient, compound=compound, istd=internal_standard)
    cyan_campaign.add_hplc_input(exp)

ArCN_folders = [folder for folder in folders if "product" in folder]
for i, folder in enumerate(ArCN_folders):
    compound = Compound('o-tolunitrile', ArCN_concs[i])
    exp = HplcInput(folder, gradient, compound=compound, istd=internal_standard)
    cyan_campaign.add_hplc_input(exp)

cn_source_a_folder = next(folder for folder in folders if "cnsource_a" in folder)
compound = Compound('cn_source_a')
exp = HplcInput(cn_source_a_folder, gradient, compound=compound, istd=internal_standard)
cyan_campaign.add_hplc_input(exp)

cn_source_d_folder = next(folder for folder in folders if "cnsource_d" in folder)
compound = Compound('cn_source_d')
exp = HplcInput(cn_source_d_folder, gradient, compound=compound, istd=internal_standard)
cyan_campaign.add_hplc_input(exp)
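
To illustrate the general idea behind calibration with an internal standard, here is a generic sketch. It is not MOCCA's implementation. It assumes that the detector response is proportional to concentration (a line through the origin) and that each analyte peak area is normalized by the internal standard area of the same run. The concentrations are the 2-chlorotoluene values from above, whereas all peak areas are invented.

```python
# concentrations of the 2-chlorotoluene calibration standards (from the notebook)
concs = [0.0603, 0.04422]

# invented peak integrals, for illustration only
analyte_areas = [120000.0, 88000.0]
istd_areas = [70000.0, 70000.0]

# normalize each analyte area by the internal standard area of the same run
ratios = [a / s for a, s in zip(analyte_areas, istd_areas)]

# least-squares slope of a line through the origin: ratio = slope * concentration
slope = sum(c * r for c, r in zip(concs, ratios)) / sum(c * c for c in concs)

def concentration(analyte_area, istd_area):
    """Convert a normalized peak area back into a concentration."""
    return (analyte_area / istd_area) / slope

print(round(concentration(60000.0, 70000.0), 4))
```

Normalizing by the internal standard compensates for run-to-run variations such as differences in injection volume, because both peaks are affected in the same way.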

### User input for reaction runs

In [10]:
for folder in [folder for folder in folders if "sample" in folder]:
    exp = HplcInput(folder, gradient, istd=internal_standard)
    cyan_campaign.add_hplc_input(exp)

### Settings for data processing

In [11]:
settings = Settings('labsolutions',
                    absorbance_threshold = 500, wl_high_pass = 215, 
                    peaks_high_pass = 1, peaks_low_pass = 5,
                    spectrum_correl_thresh=0.99, relative_distance_thresh=0.0025)
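
As a rough illustration of what a spectral correlation threshold such as `spectrum_correl_thresh=0.99` expresses, the sketch below computes the Pearson correlation between two UV spectra and compares it with the threshold. This is an assumption about the general idea, not a description of MOCCA's internal peak-matching code, and the spectra are invented.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)

# invented absorbance values at five wavelengths
reference = [10.0, 40.0, 90.0, 60.0, 20.0]
same_shape = [2 * a for a in reference]       # same compound, higher concentration
other_shape = [90.0, 60.0, 20.0, 10.0, 40.0]  # differently shaped spectrum

threshold = 0.99  # value of spectrum_correl_thresh used above
print(pearson(reference, same_shape) >= threshold)
print(pearson(reference, other_shape) >= threshold)
```

The correlation is insensitive to the overall intensity of a spectrum, so the same compound at different concentrations still yields a value close to 1.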

### Data processing

In [12]:
%%time
cyan_campaign.process_all_hplc_input(settings)

CPU times: user 1h 25min 46s, sys: 2min 2s, total: 1h 27min 49s
Wall time: 11min 20s


## Reporting

In [13]:
cyan_report_path = os.path.join(ipynb_path, "cyanation_reports")

In [20]:
cyan_campaign.generate_reports(cyan_report_path)

Report saved to .//Users/haascp/Documents/GitHub/mocca/notebooks/cyanation_reports/report_hplc_input.html.

Report saved to .//Users/haascp/Documents/GitHub/mocca/notebooks/cyanation_reports/report_gradient.html.

Report saved to .//Users/haascp/Documents/GitHub/mocca/notebooks/cyanation_reports/report_peak_db.html.

Report saved to .//Users/haascp/Documents/GitHub/mocca/notebooks/cyanation_reports/report_chroms.html.

No chromatograms given or all chromatograms are good data!


Report saved to .//Users/haascp/Documents/GitHub/mocca/notebooks/cyanation_reports/report_runs.html.

Report saved to .//Users/haascp/Documents/GitHub/mocca/notebooks/cyanation_reports/report_parafac.html.

Report saved to .//Users/haascp/Documents/GitHub/mocca/notebooks/cyanation_reports/report_quali_comp_db.html.

Report saved to .//Users/haascp/Documents/GitHub/mocca/notebooks/cyanation_reports/report_quant_comp_db.html.

## Customized data analysis by the user

This data analysis tool cannot cover every data analysis need of its potential users. Therefore, we expect users to perform customized data analysis independently of the tool.

Here we give an example of how the yield across a 96-well plate can be visualized.

In [15]:
import pandas as pd
import altair as alt

Extract the results from the campaign object.

In [16]:
c0 = 0.06  # c0: initial substrate concentration (same units as the calibration concentrations)
results = {
    '2-chlorotoluene': [],
    'o-tolunitrile': [],
    'tetralin': [],
    'column': [],
    'row': [],
    'path': [],
    'conversion': [],
    'yield': []
}
for i, chrom in enumerate(cyan_campaign.chroms):
    if not chrom.experiment.compound:
        results['2-chlorotoluene'].append(0)
        results['o-tolunitrile'].append(0)
        results['tetralin'].append(0)
        results['conversion'].append(1)
        results['yield'].append(0)
        results['path'].append(chrom.experiment.path)
        for peak in chrom.peaks:
            if peak.compound_id == '2-chlorotoluene':
                results['2-chlorotoluene'][-1] = peak.concentration
                results['conversion'][-1] = (c0 - peak.concentration) / c0
            if peak.compound_id == 'o-tolunitrile':
                results['o-tolunitrile'][-1] = peak.concentration
                results['yield'][-1] = peak.concentration / c0
            if peak.compound_id == 'tetralin':
                results['tetralin'][-1] = peak.integral
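
The conversion and yield formulas used in the loop can be checked by hand. The sketch below applies them to the rounded concentrations shown in the first row of the results table of this notebook. The function names `conversion` and `reaction_yield` are introduced only for this illustration.

```python
c0 = 0.06  # initial substrate value used in the notebook

def conversion(c_substrate, c0):
    """Fraction of substrate consumed."""
    return (c0 - c_substrate) / c0

def reaction_yield(c_product, c0):
    """Fraction of substrate converted into product."""
    return c_product / c0

# rounded concentrations of 2-chlorotoluene and o-tolunitrile in the first well
conv = conversion(0.031222, c0)
yld = reaction_yield(0.026152, c0)

print(f"conversion: {conv:.3f}, yield: {yld:.3f}")
# → conversion: 0.480, yield: 0.436
```

Note the defaults in the loop above: if no 2-chlorotoluene peak is found, the conversion stays at 1, and if no o-tolunitrile peak is found, the yield stays at 0.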

In [17]:
for path in results['path']:
    num = int(path.split('_')[-1][:-4]) - 4
    results['column'].append(str(num // 12 + 1))
    results['row'].append(str(num - (num // 12 * 12) + 1))
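
The index arithmetic above can be written more compactly with `divmod`. The sketch below reproduces the same mapping from sample number to the `column` and `row` labels used in this notebook. `well_position` and its parameter names are hypothetical and chosen for illustration only.

```python
def well_position(sample_number, first_sample=4, wells_per_line=12):
    """Map a sample number to the ('column', 'row') labels used in the notebook."""
    num = sample_number - first_sample          # zero-based well index
    line, pos = divmod(num, wells_per_line)     # same as num // 12 and num % 12
    return str(line + 1), str(pos + 1)

print(well_position(4))   # first sample
print(well_position(15))  # last well of the first line
print(well_position(16))  # first well of the second line
```

Since the first sample file is numbered 4, the offset of 4 maps it to the first well of the plate.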

In [18]:
results_df = pd.DataFrame(results)
row_order = [str(val) for val in list(range(1, 13))]
results_df

| | 2-chlorotoluene | o-tolunitrile | tetralin | column | row | path | conversion | yield |
|---|---|---|---|---|---|---|---|---|
| 0 | 0.031222 | 0.026152 | 72908.483002 | 1 | 1 | /Users/haascp/Documents/GitHub/mocca/notebooks... | 0.479633 | 0.435863 |
| 1 | 0.053005 | 0.003441 | 69100.487022 | 1 | 2 | /Users/haascp/Documents/GitHub/mocca/notebooks... | 0.116577 | 0.057354 |
| 2 | 0.056189 | 0.000000 | 66686.723268 | 1 | 3 | /Users/haascp/Documents/GitHub/mocca/notebooks... | 0.063519 | 0.000000 |
| 3 | 0.000000 | 0.040743 | 72699.519576 | 1 | 4 | /Users/haascp/Documents/GitHub/mocca/notebooks... | 1.000000 | 0.679048 |
| 4 | 0.000000 | 0.029075 | 66779.207989 | 1 | 5 | /Users/haascp/Documents/GitHub/mocca/notebooks... | 1.000000 | 0.484581 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 79 | 0.059307 | 0.000000 | 68115.138325 | 7 | 8 | /Users/haascp/Documents/GitHub/mocca/notebooks... | 0.011549 | 0.000000 |
| 80 | 0.059044 | 0.000000 | 77373.684722 | 7 | 9 | /Users/haascp/Documents/GitHub/mocca/notebooks... | 0.015931 | 0.000000 |
| 81 | 0.044454 | 0.000000 | 162296.594456 | 7 | 10 | /Users/haascp/Documents/GitHub/mocca/notebooks... | 0.259098 | 0.000000 |
| 82 | 0.052634 | 0.000000 | 64807.098114 | 7 | 11 | /Users/haascp/Documents/GitHub/mocca/notebooks... | 0.122758 | 0.000000 |


In [19]:
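# work on a copy so that re-running this cell does not rescale results_df again
results_rounded = results_df.copy()
results_rounded['yield'] = results_rounded['yield'] * 100  # fraction -> percent
results_rounded = results_rounded.round({'yield': 0})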
#display(results_rounded['yield'])

heatmap = alt.Chart(results_rounded, title = 'Yield').mark_rect().encode(
            x=alt.X('row', sort=row_order),
            y=alt.Y('column'),
            color=alt.Color('yield', title="Yield")
            )
text = alt.Chart(results_rounded, title = 'Yield').mark_text().encode(
            x=alt.X('row', sort=row_order),
            y=alt.Y('column'),
            text='yield'
            )

display(heatmap + text)