CPH 02/11/2022

# MOCCA data analysis on Knoevenagel case study

MOCCA is a tool for the analysis of *High-Performance Liquid Chromatography–Diode Array Detection* (HPLC–DAD) datasets recorded for reaction (process) control. It needs only the HPLC–DAD raw data and some basic user input.

## Reaction and case study background

This case study was designed to verify that manual analysis of non-overlapping data and analysis with MOCCA give the same results. We perform four different Knoevenagel condensations in four HPLC vials directly in the autosampler of an Agilent LC system. We use four different benzaldehydes (benzaldehyde, 4-chlorobenzaldehyde, 4-methoxybenzaldehyde, 4-dimethylaminobenzaldehyde) and malononitrile as substrates, with piperidine as the basic catalyst.

To highlight MOCCA's ability to decompose impure peaks, we compare the results of the single reactions with the results recorded for a 1:1:1:1 mixture of all four reactions.

## Imports

In [1]:
# folders handling
import os
from glob import glob

# user interaction
from mocca.user_interaction.campaign import HplcDadCampaign
from mocca.user_interaction.user_objects import Gradient
from mocca.user_interaction.user_objects import Compound
from mocca.user_interaction.user_objects import InternalStandard
from mocca.user_interaction.user_objects import HplcInput
from mocca.user_interaction.settings import Settings

# reporting
from mocca.report.hplc_input import report_hplc_input
from mocca.report.chroms import report_chroms
from mocca.report.results import report_runs
from mocca.report.parafac import report_parafac
from mocca.report.peaks import report_peaks
from mocca.report.quali_comps import report_quali_comps
from mocca.report.quant_comps import report_quant_comps

# customized data analysis
import datetime
import time

%load_ext autoreload
%autoreload 2

## Knoevenagel data folder handling

The data for this notebook can be found in mocca -> notebooks -> mix_knoevenagel_data. The data was recorded on an Agilent system on two different days. Therefore, we have two different gradient runs.

In [2]:
# get path of this notebook
ipynb_path = os.path.dirname(os.path.realpath("__file__"))

# add the path to the test data folder
knoev_data_path = os.path.join(ipynb_path, "mix_knoevenagel_data")

# find all folders containing Agilent HPLC data (.D file extension)
folders = glob(os.path.join(knoev_data_path, '*.D'))
folders = sorted(folders)

In [3]:
for folder in folders:
    print(os.path.basename(folder))

2022-01-26_17-13-23_gradient.D
2022-01-26_17-21-29_ba_1.D
2022-01-26_17-30-02_ba_0.75.D
2022-01-26_17-38-33_ba_0.5.D
2022-01-26_17-47-02_ba_0,25.D
2022-01-26_17-55-37_cl_1.D
2022-01-26_18-04-10_cl_0.75.D
2022-01-26_18-12-42_cl_0.5.D
2022-01-26_18-21-11_cl_0,25.D
2022-01-26_18-29-43_ome_1.D
2022-01-26_18-38-15_ome_0.75.D
2022-01-26_18-46-45_ome_0.5.D
2022-01-26_18-55-17_ome_0,25.D
2022-01-26_19-03-52_nme2_1.D
2022-01-26_19-12-24_nme2_0.75.D
2022-01-26_19-20-54_nme2_0.5.D
2022-01-26_19-29-25_nme2_0,25.D
2022-01-26_19-43-27_gradient.D
2022-01-26_19-48-52_ba_1.D
2022-01-26_19-54-51_ba_0.75.D
2022-01-26_20-00-51_ba_0.5.D
2022-01-26_20-06-51_ba_0,25.D
2022-01-26_20-12-55_cl_1.D
2022-01-26_20-18-55_cl_0.75.D
2022-01-26_20-24-53_cl_0.5.D
2022-01-26_20-30-51_cl_0,25.D
2022-01-26_20-36-52_ome_1.D
2022-01-26_20-43-08_ome_0.75.D
2022-01-26_20-49-05_ome_0.5.D
2022-01-26_20-55-06_ome_0,25.D
2022-01-26_21-01-10_nme2_1.D
2022-01-26_21-07-11_nme2_0.75.D
2022-01-26_21-13-09_nme2_0.5.D
2022-01-26_21-19-0
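
Because every folder name starts with a timestamp, sorting the names also sorts the runs chronologically, and the index-based slices used below rely on that order. The following is a minimal sketch of how such a name could be split into its timestamp and label. `parse_folder_name` and `FOLDER_PATTERN` are hypothetical helpers and not part of MOCCA. The label is returned unchanged, so both the `0.75` and the `0,25` spelling seen in the listing are kept as they are.

```python
import re
from datetime import datetime

# Hypothetical helper, not part of MOCCA: splits an Agilent folder name of the
# form "<date>_<time>_<label>.D" into its timestamp and label.
FOLDER_PATTERN = re.compile(r"^(\d{4}-\d{2}-\d{2}_\d{2}-\d{2}-\d{2})_(.+)\.D$")


def parse_folder_name(name):
    """Return (datetime, label) for a folder basename, or None if it does not match."""
    match = FOLDER_PATTERN.match(name)
    if match is None:
        return None
    stamp = datetime.strptime(match.group(1), "%Y-%m-%d_%H-%M-%S")
    return stamp, match.group(2)


names = [
    "2022-01-26_17-13-23_gradient.D",
    "2022-01-26_17-21-29_ba_1.D",
    "2022-01-26_17-47-02_ba_0,25.D",
]
for name in names:
    print(parse_folder_name(name))
```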

## Campaign initialization 5 min runs

In [4]:
knoev_campaign_5 = HplcDadCampaign()

### User input for calibration runs

We transfer the concentration values of the calibration standards.

In [5]:
ba_concs = [0.965, 0.761, 0.482, 0.231]
cl_concs = [1.01, 0.744, 0.478, 0.234]
ome_concs = [0.990, 0.723, 0.455, 0.231]
nme2_concs = [0.982, 0.719, 0.468, 0.229]

We create the first Gradient object for all calibration runs.

In [6]:
gradient_calib_5 = Gradient(folders[0])

We create an HplcInput object for each calibration run and add it to the campaign.

In [7]:
for i, folder in enumerate(folders[1:5]):
    compound = Compound('benzaldehyde', ba_concs[i])
    exp = HplcInput(folder, gradient_calib_5, compound=compound)
    knoev_campaign_5.add_hplc_input(exp)

for i, folder in enumerate(folders[5:9]):
    compound = Compound('4-chlorobenzaldehyde', cl_concs[i])
    exp = HplcInput(folder, gradient_calib_5, compound=compound)
    knoev_campaign_5.add_hplc_input(exp)

for i, folder in enumerate(folders[9:13]):
    compound = Compound('4-methoxybenzaldehyde', ome_concs[i])
    exp = HplcInput(folder, gradient_calib_5, compound=compound)
    knoev_campaign_5.add_hplc_input(exp)

for i, folder in enumerate(folders[13:17]):
    compound = Compound('4-dimethylaminobenzaldehyde', nme2_concs[i])
    exp = HplcInput(folder, gradient_calib_5, compound=compound)
    knoev_campaign_5.add_hplc_input(exp)
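
The four loops above follow the same pattern: four consecutive folders per compound, each paired with one concentration. As a sketch of how this bookkeeping could be written once, the block below builds (folder, compound name, concentration) triples from a dictionary. `calibration_plan` and the placeholder folder names are hypothetical; no MOCCA objects are created here.

```python
# Hypothetical stand-in for the sorted folder list: index 0 is the gradient run,
# followed by four calibration runs per compound.
folders = ["gradient.D"] + [f"run_{i}.D" for i in range(1, 17)]

calibration_concs = {
    "benzaldehyde": [0.965, 0.761, 0.482, 0.231],
    "4-chlorobenzaldehyde": [1.01, 0.744, 0.478, 0.234],
    "4-methoxybenzaldehyde": [0.990, 0.723, 0.455, 0.231],
    "4-dimethylaminobenzaldehyde": [0.982, 0.719, 0.468, 0.229],
}


def calibration_plan(folders, concs_by_compound, first_index=1):
    """Pair each calibration folder with its compound name and concentration."""
    plan = []
    start = first_index
    for name, concs in concs_by_compound.items():
        block = folders[start:start + len(concs)]
        if len(block) != len(concs):
            raise ValueError(f"not enough folders for {name}")
        plan.extend(zip(block, [name] * len(concs), concs))
        start += len(concs)
    return plan


plan = calibration_plan(folders, calibration_concs)
for folder, name, conc in plan:
    print(folder, name, conc)
```

Each triple could then be turned into `Compound(name, conc)` and `HplcInput(folder, gradient, compound=compound)` exactly as in the cell above. The length check makes a missing calibration folder fail early instead of silently shifting all later assignments.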

### User input for 5 minute gradient mix reaction runs

First, we generate the Gradient object of the second day.

In [8]:
gradient_react_5 = Gradient(folders[-1])

Then, we generate HplcInput objects for all reaction runs and add them to the campaign.

In [9]:
react_folders_5 = folders[35:-2][::2]

In [10]:
react_folders_5

['/Users/haascp/Documents/GitHub/mocca/notebooks/mix_knoevenagel_data/2022-01-28_12-06-47_mix.D',
 '/Users/haascp/Documents/GitHub/mocca/notebooks/mix_knoevenagel_data/2022-01-28_12-46-39_mix.D',
 '/Users/haascp/Documents/GitHub/mocca/notebooks/mix_knoevenagel_data/2022-01-28_13-26-31_mix.D',
 '/Users/haascp/Documents/GitHub/mocca/notebooks/mix_knoevenagel_data/2022-01-28_14-06-24_mix.D',
 '/Users/haascp/Documents/GitHub/mocca/notebooks/mix_knoevenagel_data/2022-01-28_14-46-10_mix.D',
 '/Users/haascp/Documents/GitHub/mocca/notebooks/mix_knoevenagel_data/2022-01-28_15-26-02_mix.D',
 '/Users/haascp/Documents/GitHub/mocca/notebooks/mix_knoevenagel_data/2022-01-28_16-05-56_mix.D',
 '/Users/haascp/Documents/GitHub/mocca/notebooks/mix_knoevenagel_data/2022-01-28_16-45-44_mix.D']
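
The slice `folders[35:-2][::2]` is compact but easy to misread. Judging from the indices used in this notebook, the sorted list appears to hold 34 entries from the calibration day, then the mix runs alternating between the two methods, and finally two gradient runs. The sketch below reproduces that layout with a toy list; all names in it are hypothetical placeholders.

```python
# Toy stand-in for the sorted folder list (hypothetical names); only the layout
# matters: 34 calibration-day entries, alternating mix runs, two gradient runs.
calib = [f"calib_{i}" for i in range(34)]
mix = [f"mix_{kind}_{i}" for i in range(8) for kind in ("2min", "5min")]
gradients = ["gradient_2min", "gradient_5min"]
folders = calib + mix + gradients

# Drop the two trailing gradient runs, then take every second mix run.
react_folders_2 = folders[34:-2][::2]  # starts at the first mix run
react_folders_5 = folders[35:-2][::2]  # starts one entry later

print(react_folders_2)
print(react_folders_5)
```

If the acquisition order ever changes, index-based slicing like this silently picks the wrong runs, so printing the selected folders, as done in the cell above, is a useful check.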

In [11]:
for folder in react_folders_5:
    exp = HplcInput(folder, gradient_react_5)
    knoev_campaign_5.add_hplc_input(exp)

### Settings for data processing

In [12]:
settings_5 = Settings('chemstation',
                    absorbance_threshold = 500, wl_high_pass = 215, 
                    peaks_high_pass = 1, peaks_low_pass = 4.5,
                    spectrum_correl_thresh=0.99, relative_distance_thresh=0.01)
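
To give an intuition for a spectral correlation threshold of 0.99, the sketch below compares synthetic UV spectra by their Pearson correlation coefficient. This is an illustration only: that the threshold is applied to a Pearson correlation is an assumption, MOCCA's internal computation is not shown in this notebook, and `band` and `spectra_match` are hypothetical helpers working on made-up Gaussian bands.

```python
import numpy as np

# Synthetic spectra (hypothetical data): Gaussian absorption bands on a
# wavelength grid starting at 215 nm.
wavelengths = np.linspace(215, 400, 186)


def band(center, width):
    """Gaussian-shaped absorption band on the wavelength grid."""
    return np.exp(-0.5 * ((wavelengths - center) / width) ** 2)


def spectra_match(spec_a, spec_b, threshold=0.99):
    """True if the Pearson correlation of two spectra reaches the threshold."""
    correlation = np.corrcoef(spec_a, spec_b)[0, 1]
    return correlation >= threshold


reference = band(250, 15)
same_compound = 0.4 * band(250, 15)  # same shape, lower concentration
other_compound = band(285, 15)       # shifted absorption maximum

print(spectra_match(reference, same_compound))
print(spectra_match(reference, other_compound))
```

A correlation-based comparison is insensitive to concentration, since scaling a spectrum does not change its shape, which is why the diluted spectrum still matches the reference.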

### Data processing

In [13]:
knoev_campaign_5.process_all_hplc_input(settings_5)

## Reporting

In [14]:
knoev_report_path_5 = os.path.join(ipynb_path, "mix_knoevenagel_reports", "5_min")

In [15]:
report_hplc_input(knoev_campaign_5.hplc_runs, knoev_report_path_5)
report_chroms(knoev_campaign_5.chroms, knoev_campaign_5.settings, knoev_report_path_5)
report_runs(knoev_campaign_5.chroms, knoev_campaign_5.quali_comp_db, knoev_campaign_5.quant_comp_db, knoev_report_path_5)
report_parafac(knoev_campaign_5.chroms, knoev_report_path_5)
report_peaks(knoev_campaign_5.peak_db, knoev_report_path_5)
report_quali_comps(knoev_campaign_5.quali_comp_db, knoev_report_path_5)
report_quant_comps(knoev_campaign_5.quant_comp_db, knoev_report_path_5)

Report saved to .//Users/haascp/Documents/GitHub/mocca/notebooks/mix_knoevenagel_reports/5_min/report_hplc_input.html. To upload and share your report, create a free Datapane account by running `!datapane signup`.

Report saved to .//Users/haascp/Documents/GitHub/mocca/notebooks/mix_knoevenagel_reports/5_min/report_chroms.html. To upload and share your report, create a free Datapane account by running `!datapane signup`.

Report saved to .//Users/haascp/Documents/GitHub/mocca/notebooks/mix_knoevenagel_reports/5_min/report_runs.html. To upload and share your report, create a free Datapane account by running `!datapane signup`.

Report saved to .//Users/haascp/Documents/GitHub/mocca/notebooks/mix_knoevenagel_reports/5_min/report_peak_db.html. To upload and share your report, create a free Datapane account by running `!datapane signup`.

Report saved to .//Users/haascp/Documents/GitHub/mocca/notebooks/mix_knoevenagel_reports/5_min/report_quali_comp_db.html. To upload and share your report, create a free Datapane account by running `!datapane signup`.

Report saved to .//Users/haascp/Documents/GitHub/mocca/notebooks/mix_knoevenagel_reports/5_min/report_quant_comp_db.html. To upload and share your report, create a free Datapane account by running `!datapane signup`.

## Campaign initialization 2 min runs

In [16]:
knoev_campaign_2 = HplcDadCampaign()

### User input for calibration runs

We transfer the concentration values of the calibration standards.

In [17]:
ba_concs = [0.965, 0.761, 0.482, 0.231]
cl_concs = [1.01, 0.744, 0.478, 0.234]
ome_concs = [0.990, 0.723, 0.455, 0.231]
nme2_concs = [0.982, 0.719, 0.468, 0.229]

We create the Gradient object for the 2 minute calibration runs.

In [18]:
gradient_calib_2 = Gradient(folders[17])

We create an HplcInput object for each calibration run and add it to the campaign.

In [19]:
for i, folder in enumerate(folders[18:22]):
    compound = Compound('benzaldehyde', ba_concs[i])
    exp = HplcInput(folder, gradient_calib_2, compound=compound)
    knoev_campaign_2.add_hplc_input(exp)

for i, folder in enumerate(folders[22:26]):
    compound = Compound('4-chlorobenzaldehyde', cl_concs[i])
    exp = HplcInput(folder, gradient_calib_2, compound=compound)
    knoev_campaign_2.add_hplc_input(exp)

for i, folder in enumerate(folders[26:30]):
    compound = Compound('4-methoxybenzaldehyde', ome_concs[i])
    exp = HplcInput(folder, gradient_calib_2, compound=compound)
    knoev_campaign_2.add_hplc_input(exp)

for i, folder in enumerate(folders[30:34]):
    compound = Compound('4-dimethylaminobenzaldehyde', nme2_concs[i])
    exp = HplcInput(folder, gradient_calib_2, compound=compound)
    knoev_campaign_2.add_hplc_input(exp)

### User input for 2 minute gradient mix reaction runs

First, we generate the Gradient object of the second day.

In [20]:
gradient_react_2 = Gradient(folders[-2])

Then, we generate HplcInput objects for all reaction runs and add them to the campaign.

In [21]:
react_folders_2 = folders[34:-2][::2]

In [22]:
react_folders_2

['/Users/haascp/Documents/GitHub/mocca/notebooks/mix_knoevenagel_data/2022-01-28_12-00-35_mix.D',
 '/Users/haascp/Documents/GitHub/mocca/notebooks/mix_knoevenagel_data/2022-01-28_12-40-25_mix.D',
 '/Users/haascp/Documents/GitHub/mocca/notebooks/mix_knoevenagel_data/2022-01-28_13-20-16_mix.D',
 '/Users/haascp/Documents/GitHub/mocca/notebooks/mix_knoevenagel_data/2022-01-28_14-00-11_mix.D',
 '/Users/haascp/Documents/GitHub/mocca/notebooks/mix_knoevenagel_data/2022-01-28_14-39-58_mix.D',
 '/Users/haascp/Documents/GitHub/mocca/notebooks/mix_knoevenagel_data/2022-01-28_15-19-49_mix.D',
 '/Users/haascp/Documents/GitHub/mocca/notebooks/mix_knoevenagel_data/2022-01-28_15-59-43_mix.D',
 '/Users/haascp/Documents/GitHub/mocca/notebooks/mix_knoevenagel_data/2022-01-28_16-39-32_mix.D']

In [23]:
for folder in react_folders_2:
    exp = HplcInput(folder, gradient_react_2)
    knoev_campaign_2.add_hplc_input(exp)

### Settings for data processing

In [24]:
settings_2 = Settings('chemstation',
                      absorbance_threshold = 500, wl_high_pass = 215, 
                      peaks_high_pass = 1, peaks_low_pass = 3.2,
                      spectrum_correl_thresh=0.99, relative_distance_thresh=0.01)

### Data processing

In [25]:
knoev_campaign_2.process_all_hplc_input(settings_2)

## Reporting

In [26]:
knoev_report_path_2 = os.path.join(ipynb_path, "mix_knoevenagel_reports", "2_min")

In [27]:
report_hplc_input(knoev_campaign_2.hplc_runs, knoev_report_path_2)
report_chroms(knoev_campaign_2.chroms, knoev_campaign_2.settings, knoev_report_path_2)
report_runs(knoev_campaign_2.chroms, knoev_campaign_2.quali_comp_db, knoev_campaign_2.quant_comp_db, knoev_report_path_2)
report_parafac(knoev_campaign_2.chroms, knoev_report_path_2)
report_peaks(knoev_campaign_2.peak_db, knoev_report_path_2)
report_quali_comps(knoev_campaign_2.quali_comp_db, knoev_report_path_2)
report_quant_comps(knoev_campaign_2.quant_comp_db, knoev_report_path_2)

Report saved to .//Users/haascp/Documents/GitHub/mocca/notebooks/mix_knoevenagel_reports/2_min/report_hplc_input.html. To upload and share your report, create a free Datapane account by running `!datapane signup`.

Report saved to .//Users/haascp/Documents/GitHub/mocca/notebooks/mix_knoevenagel_reports/2_min/report_chroms.html. To upload and share your report, create a free Datapane account by running `!datapane signup`.

Report saved to .//Users/haascp/Documents/GitHub/mocca/notebooks/mix_knoevenagel_reports/2_min/report_runs.html. To upload and share your report, create a free Datapane account by running `!datapane signup`.

Report saved to .//Users/haascp/Documents/GitHub/mocca/notebooks/mix_knoevenagel_reports/2_min/report_parafac.html. To upload and share your report, create a free Datapane account by running `!datapane signup`.

Report saved to .//Users/haascp/Documents/GitHub/mocca/notebooks/mix_knoevenagel_reports/2_min/report_peak_db.html. To upload and share your report, create a free Datapane account by running `!datapane signup`.

Report saved to .//Users/haascp/Documents/GitHub/mocca/notebooks/mix_knoevenagel_reports/2_min/report_quali_comp_db.html. To upload and share your report, create a free Datapane account by running `!datapane signup`.

Report saved to .//Users/haascp/Documents/GitHub/mocca/notebooks/mix_knoevenagel_reports/2_min/report_quant_comp_db.html. To upload and share your report, create a free Datapane account by running `!datapane signup`.

## Customized data analysis by the user

MOCCA cannot cover every data analysis need of its users. Therefore, we expect users to carry out customized data analysis outside the tool.

Here we give an example of what a reaction kinetics analysis could look like. First, we have to determine the reaction times. For that, we noted the time at which all components were combined in the HPLC vial, and we use the time stamps in the folder names to calculate the reaction times.

In [28]:
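def timestamp(path):
    """Return the time stamp encoded in a folder name as seconds since the epoch."""
    x = os.path.basename(path)
    # drop the trailing label (e.g. "_mix.D"), then split into date and time
    date, times = x[:x.rfind('_')].split('_')
    date = list(map(int, date.split('-')))
    times = list(map(int, times.split('-')))
    return time.mktime(datetime.datetime(date[0], date[1], date[2], times[0], times[1], times[2]).timetuple())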

In [29]:
# start times
mix_time = time.mktime(datetime.datetime(2022, 1, 28, 11, 33, 30).timetuple())
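
As a cross-check of the time arithmetic, the same reaction time can be obtained by subtracting `datetime` objects directly. The sketch below is an alternative formulation, not the method used in this notebook. `reaction_time_s` is a hypothetical helper and assumes the folder name starts with a 19-character `YYYY-MM-DD_HH-MM-SS` stamp.

```python
import os
from datetime import datetime


def reaction_time_s(path, start):
    """Seconds between `start` and the time stamp at the beginning of a folder name."""
    name = os.path.basename(path)
    # The first 19 characters hold "YYYY-MM-DD_HH-MM-SS".
    stamp = datetime.strptime(name[:19], "%Y-%m-%d_%H-%M-%S")
    return (stamp - start).total_seconds()


mix_start = datetime(2022, 1, 28, 11, 33, 30)
t = reaction_time_s("mix_knoevenagel_data/2022-01-28_12-06-47_mix.D", mix_start)
print(t, t / 60)  # seconds and minutes
```

For the first 5 minute mix run this gives 1997 s, that is 33 min 17 s after mixing. The result is in seconds, so it has to be divided by 60 before it is reported in minutes.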

In [30]:
ba_5 = []  # list of time, conc tuples
cl_5 = []  # list of time, conc tuples
ome_5 = []  # list of time, conc tuples
nme2_5 = []  # list of time, conc tuples
ba_2 = []  # list of time, conc tuples
cl_2 = []  # list of time, conc tuples
ome_2 = []  # list of time, conc tuples
nme2_2 = []  # list of time, conc tuples

chroms_5 = [chrom for chrom in knoev_campaign_5.chroms if not chrom.experiment.compound]
chroms_2 = [chrom for chrom in knoev_campaign_2.chroms if not chrom.experiment.compound]
for chrom in chroms_5:
    t = timestamp(chrom.experiment.path) - mix_time
    if "benzaldehyde" in chrom:
        ba_5.append((t, chrom["benzaldehyde"].concentration))
    else:
        ba_5.append((t, 0))
    if "4-chlorobenzaldehyde" in chrom:
        cl_5.append((t, chrom["4-chlorobenzaldehyde"].concentration))
    else:
        cl_5.append((t, 0))
    if "4-methoxybenzaldehyde" in chrom:
        ome_5.append((t, chrom["4-methoxybenzaldehyde"].concentration))
    else:
        ome_5.append((t, 0))
    if "4-dimethylaminobenzaldehyde" in chrom:
        nme2_5.append((t, chrom["4-dimethylaminobenzaldehyde"].concentration))
    else:
        nme2_5.append((t, 0))

for chrom in chroms_2:
    t = timestamp(chrom.experiment.path) - mix_time
    if "benzaldehyde" in chrom:
        ba_2.append((t, chrom["benzaldehyde"].concentration))
    else:
        ba_2.append((t, 0))
    if "4-chlorobenzaldehyde" in chrom:
        cl_2.append((t, chrom["4-chlorobenzaldehyde"].concentration))
    else:
        cl_2.append((t, 0))
    if "4-methoxybenzaldehyde" in chrom:
        ome_2.append((t, chrom["4-methoxybenzaldehyde"].concentration))
    else:
        ome_2.append((t, 0))
    if "4-dimethylaminobenzaldehyde" in chrom:
        nme2_2.append((t, chrom["4-dimethylaminobenzaldehyde"].concentration))
    else:
        nme2_2.append((t, 0))
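
The two loops above repeat the same lookup for each compound and campaign. A more compact version could be sketched as follows. `collect_traces` is a hypothetical helper, and the chromatograms are replaced by plain dictionaries of peak-like objects with made-up concentrations. The only assumptions are that `name in chrom` and `chrom[name].concentration` work as they do in the cell above.

```python
from types import SimpleNamespace

COMPOUNDS = ["benzaldehyde", "4-chlorobenzaldehyde",
             "4-methoxybenzaldehyde", "4-dimethylaminobenzaldehyde"]


def collect_traces(runs, compounds=COMPOUNDS):
    """Build {compound: [(time, conc), ...]}, using 0 where a compound is absent.

    `runs` is a list of (time, chrom) pairs; `chrom` only needs to support
    `name in chrom` and `chrom[name].concentration`.
    """
    traces = {name: [] for name in compounds}
    for t, chrom in runs:
        for name in compounds:
            conc = chrom[name].concentration if name in chrom else 0
            traces[name].append((t, conc))
    return traces


# Hypothetical stand-ins for chromatograms: plain dicts of peak-like objects.
runs = [
    (1997.0, {"benzaldehyde": SimpleNamespace(concentration=0.5)}),
    (4389.0, {"benzaldehyde": SimpleNamespace(concentration=0.4),
              "4-chlorobenzaldehyde": SimpleNamespace(concentration=0.3)}),
]
traces = collect_traces(runs)
print(traces["benzaldehyde"])
```

Keeping the traces in one dictionary per campaign avoids eight separate list variables and makes it harder to append to the wrong list.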


    

In [31]:
import altair as alt
import numpy as np
import pandas as pd
from mocca.visualization.utils import round_to_n
from sklearn.linear_model import LinearRegression

In [32]:
def plot_reaction_data(reaction_data, title=""):
    x, y = zip(*reaction_data)
    x = list(x)
    y = [1 / val for val in y]

    linear_model = LinearRegression()
    x_fit = np.array(x).reshape((-1,1))
    linear_model.fit(x_fit, y)
    score = linear_model.score(x_fit, y)

    slope = linear_model.coef_[0]
    y_intercept = linear_model.intercept_

    # visualization
    curve_annotation_formula = f"y = {round_to_n(slope, 6)} x + {round_to_n(y_intercept, 3)}"
    curve_annotation_accuracy = f"R\u00B2 = {round(score, 4)}"

    xlabel='Reaction time (s)'  # time.mktime differences are in seconds
    ylabel='Inverse Concentration (1/mM)'

    df_scatter = pd.DataFrame({
        'x': x,
        'y': y
    })

    df_line = pd.DataFrame({
        'x': x,
        'y': [item * slope + y_intercept for item in x]
    })

    scatter = alt.Chart(df_scatter, title=title).mark_circle(size=80, opacity=1).encode(
        x=alt.X(df_scatter.columns[0], axis=alt.Axis(title=xlabel)),
        y=alt.Y(df_scatter.columns[1], axis=alt.Axis(title=ylabel)),
        tooltip=[alt.Tooltip(df_scatter.columns[0], title=xlabel), 
                 alt.Tooltip(df_scatter.columns[1], title=ylabel)]
    ).interactive()

    chart = alt.Chart(df_line).mark_line(color='black').encode(
            x=alt.X(df_line.columns[0], scale = alt.Scale(domain = (0.9 * np.min(x), np.max(x) * 1.1))),
            y=alt.Y(df_line.columns[1], scale = alt.Scale(domain = (y[0] - 0.01 * x[0] * slope, 
                                                                    y[-1] + 0.01 * x[-1] * slope)))
        )

    if len(df_scatter[df_scatter.columns[0]]) > 1:
        annotation_x_loc = (min(df_scatter[df_scatter.columns[0]]) +
                            (max(df_scatter[df_scatter.columns[0]] -
                                 min(df_scatter[df_scatter.columns[0]]))
                             * 0.1))
        annotation_y_loc_1 = (min(df_scatter[df_scatter.columns[1]]) +
                              (max(df_scatter[df_scatter.columns[1]] -
                                   min(df_scatter[df_scatter.columns[1]]))
                               * 0.95))
        annotation_y_loc_2 = (min(df_scatter[df_scatter.columns[1]]) +
                              (max(df_scatter[df_scatter.columns[1]] -
                                   min(df_scatter[df_scatter.columns[1]]))
                               * 0.75))
    else:
        annotation_x_loc = df_scatter[df_scatter.columns[0]][0] * 0.98
        annotation_y_loc_1 = df_scatter[df_scatter.columns[1]][0] * 0.95
        annotation_y_loc_2 = df_scatter[df_scatter.columns[1]][0] * 0.9

    annotation_formula = alt.Chart({'values':[{'x': annotation_x_loc,
                                               'y': annotation_y_loc_1}]}).mark_text(
        text=curve_annotation_formula, align='left'
    ).encode(
        x='x:Q', y='y:Q'
    )

    annotation_accuracy = alt.Chart({'values':[{'x': annotation_x_loc,
                                               'y': annotation_y_loc_2}]}).mark_text(
        text=curve_annotation_accuracy, align='left'
    ).encode(
        x='x:Q', y='y:Q'
    )

    fig = chart + scatter + annotation_formula + annotation_accuracy
    fig = fig.configure_axis(
            grid=False,
            titleFontSize = 16,
            titleFontWeight='normal'
        ).configure_view(
            strokeWidth=0
        )
    return fig
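
Plotting the inverse concentration against time and fitting a straight line corresponds to the integrated second-order rate law, 1/c(t) = 1/c0 + k·t, so the slope estimates the rate constant k and the intercept estimates 1/c0. The sketch below shows the fit on synthetic data with NumPy instead of scikit-learn. `fit_inverse_concentration` is a hypothetical helper, and the values of `k` and `c0` are made up. It also skips points with zero concentration, which cannot be inverted.

```python
import numpy as np


def fit_inverse_concentration(reaction_data):
    """Fit 1/c = slope * t + intercept, skipping points with c <= 0."""
    points = [(t, c) for t, c in reaction_data if c > 0]
    t = np.array([p[0] for p in points], dtype=float)
    inv_c = 1.0 / np.array([p[1] for p in points], dtype=float)
    slope, intercept = np.polyfit(t, inv_c, 1)
    return slope, intercept


# Synthetic data (hypothetical values) generated from 1/c = 1/c0 + k * t.
k, c0 = 2.0e-4, 0.5
times = [600.0, 1200.0, 1800.0, 2400.0]
data = [(t, 1.0 / (1.0 / c0 + k * t)) for t in times]
data.append((3000.0, 0))  # a run where the compound was not found

slope, intercept = fit_inverse_concentration(data)
print(slope, intercept)
```

The units of the slope follow from the inputs: with times in seconds and concentrations in mM, the slope is in 1/(mM·s).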

In [33]:
plot_reaction_data(ba_5, "Kinetics benzaldehyde 5 minutes gradient")

In [34]:
plot_reaction_data(cl_5, "Kinetics 4-chlorobenzaldehyde 5 minutes gradient")

In [35]:
plot_reaction_data(ome_5, "Kinetics 4-methoxybenzaldehyde 5 minutes gradient")

In [36]:
plot_reaction_data(nme2_5, "Kinetics 4-dimethylaminobenzaldehyde 5 minutes gradient")

In [37]:
plot_reaction_data(ba_2, "Kinetics benzaldehyde 2 minutes gradient")

In [38]:
plot_reaction_data(cl_2, "Kinetics 4-chlorobenzaldehyde 2 minutes gradient")

In [39]:
plot_reaction_data(ome_2, "Kinetics 4-methoxybenzaldehyde 2 minutes gradient")

In [40]:
plot_reaction_data(nme2_2, "Kinetics 4-dimethylaminobenzaldehyde 2 minutes gradient")