# pH-rate Data AutoFitter

This notebook imports pH-rate data from multiple files and fits each data set to a kinetic model.

The data being plotted are from "On the Rearrangement in Dioxane/Water of (*Z*)-Arylhydrazones of 5-Amino-3-benzoyl-1,2,4-oxadiazole into (2-Aryl-5-phenyl-2*H*-1,2,3-triazol-4-yl)ureas: Substituent Effects on the Different Reaction Pathways." F. D'Anna, V. Frenna, G. Macaluso, S. Marullo, S. Morganti, V. Pace, D. Spinelli, R. Spisani, C. Tavani, *J. Org. Chem.*, **2006**, *71*, 5616-5624. https://doi.org/10.1021/jo0605849

The data is found in tables within the supplementary material at https://ndownloader.figstatic.com/files/4775281
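The code below expects each table to be transcribed into a CSV file (named like `1b-data.csv`) whose index column is `pS+` and whose first data column header records the power of ten used in the paper, either `k(A,R)x10^5` or `k(A,R)x10^4`. The notebook matches those two headers literally. As a hedged sketch of a more general alternative, the exponent could be parsed from the header with a regular expression; `scale_factor_from_header` is a hypothetical helper, not part of the notebook.

```python
import re


def scale_factor_from_header(header: str) -> float:
    """Return the multiplier that converts tabulated values back to k.

    A header such as "k(A,R)x10^5" means the column lists k x 10^5,
    so each value must be multiplied by 10**-5 to recover k.
    """
    # Look for "x10^n" (n may be negative) at the end of the header
    match = re.search(r"x10\^(-?\d+)\s*$", header)
    if match is None:
        raise ValueError(f"Cannot find an 'x10^n' scale in header {header!r}")
    exponent = int(match.group(1))
    return 10.0 ** (-exponent)


print(scale_factor_from_header("k(A,R)x10^5"))
print(scale_factor_from_header("k(A,R)x10^4"))
```

With this helper, any header ending in `x10^n` is handled without adding another `elif` branch, and an unexpected header stops the run with a readable error.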

This notebook imports the pH-rate profile data for each reactant and curve fits each profile to obtain kinetic parameters. It plots each profile, exports each plot as a PDF file, and then combines the fitted parameters with Hammett substituent constants read from an LFER data table.

The outputs are the PDF files and a CSV file of the kinetic parameters.

## Set Up Tools and Data Locations

Here the libraries are imported and the global variables are set: the locations of the data files, the plot style sheets and the Hammett (LFER) tables, and the name of the results file.

In [19]:
##############################################################
### Set up libraries and global variables
##############################################################

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as stats
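from scipy.optimize import curve_fit
from IPython.display import display   # built in under Jupyter; imported so the code also runs elsewhere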

github_location = "https://raw.githubusercontent.com/blinkletter/4410PythonNotebooks/main/Class_23/data/"
#github_location = "./data/"
github_location_styles = "https://raw.githubusercontent.com/blinkletter/LFER-QSAR/main/styles/"
github_location_LFER_tables = "https://raw.githubusercontent.com/blinkletter/LFER-QSAR/main/data/"

result_file_name = "results.csv"

## Read Data Tables and Make Plots

Here each data set is read in and fit to the model, each profile is plotted and saved as a PDF file, and the fitted parameters are collected in a dataframe.
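Written out, the rate law coded in `model()` (with $[\mathrm{H^+}] = 10^{-\mathrm{pS^+}}$ and $K_W = 10^{-14}$, as set in the code) is

$$
k_\mathrm{obs} = \left(k_\mathrm{OH}\frac{K_W}{[\mathrm{H^+}]} + k_\mathrm{H_2O} + k_\mathrm{H}\frac{[\mathrm{H^+}]}{K_a}\right)\frac{K_a}{K_a + [\mathrm{H^+}]}
$$

and `curve_fit` fits $\log_{10} k_\mathrm{obs}$ to the measured $\log_{10} k$. When $[\mathrm{H^+}] \gg K_a$ and the $k_\mathrm{H}$ term dominates, this tends to a plateau at $k_\mathrm{H}$; when $[\mathrm{H^+}] \ll K_a$ it reduces to $k_\mathrm{OH}K_W/[\mathrm{H^+}] + k_\mathrm{H_2O} + k_\mathrm{H}[\mathrm{H^+}]/K_a$, giving slopes of +1 and -1 where the $k_\mathrm{OH}$ and $k_\mathrm{H}$ terms dominate and a plateau at $k_\mathrm{H_2O}$ between them. A minimal sketch that checks these limiting slopes numerically with a copy of the same expression follows; the parameter values are illustrative only (chosen so the regimes are well separated) and are not fitted results.

```python
import numpy as np


def log_kobs(pH, Ka, kOH, kH2O, kH):
    """Copy of the notebook's model(): log10(k_obs) as a function of pH (pS+)."""
    KW = 10**(-14)
    H = 10**(-pH)
    k_obs = (kOH * (KW / H) + kH2O + kH * (H / Ka)) * (Ka / (Ka + H))
    return np.log10(k_obs)


# Illustrative parameters (not fitted values): pKa = -1 and well-separated regimes
params = {"Ka": 10.0, "kOH": 1e-4, "kH2O": 1e-7, "kH": 0.1}


def local_slope(pH, h=0.01):
    """Central-difference estimate of d(log10 k_obs)/d(pH)."""
    return (log_kobs(pH + h, **params) - log_kobs(pH - h, **params)) / (2 * h)


for pH in (-4.0, 2.0, 8.0, 14.0):
    print(f"pS+ = {pH:5.1f}   slope = {local_slope(pH):+.2f}")
```

The four printed slopes should be close to 0, -1, 0 and +1, matching the low-pS+ plateau, the acid-catalysed region, the water plateau and the hydroxide region.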

In [20]:
##############################################################
### Define a function that calculates the equation being
### used for the model that will be fit to the data.
##############################################################

def model(pH, Ka = 0.1, kOH=0.1, kH2O = 0.1, kH = 0.1):    
    """
    model(pH, Ka, kOH, kH2O, kH)
    pH is an array of pH (here pS+) values; Ka, kOH, kH2O and kH are the fit parameters
    returns an array of log10(k_obs) values
    """
    KW = 10**(-14)
    H = 10**(-pH)
    k_obs = (kOH * (KW/H) + kH2O + kH * (H/Ka))*(Ka/(Ka + H))
    return(np.log10(k_obs))

##############################################################
### Declare list of data files to be used
##############################################################


data_files_info = (["1b-data.csv", "H"],
                   ["1c-data.csv", "p-OCH3"],
                   ["1d-data.csv", "m-CH3"],
                   ["1e-data.csv", "p-CH3"],
                   ["1f-data.csv", "m-Cl"],
                   ["1g-data.csv", "p-Cl"],
                   ["1h-data.csv", "m-Br"],
                   ["1i-data.csv", "p-Br"],
                   ["1j-data.csv", "p-CN"],
                   ["1k-data.csv", "m-NO2"],
                   ["1l-data.csv", "p-NO2"] 
                   )


##############################################################
### Create empty lists to collect data from line fits
##############################################################

substituent_list = []
molecule_list = []
file_list = []
Ka_list = []
kOH_list = []
kH2O_list = []
kH_list = []
Ka_sd_list = []
kOH_sd_list = []
kH2O_sd_list = []
kH_sd_list = []

##############################################################
### Perform a curve fit for each data file, collect results in 
### lists, plot each data set and export pdf files
##############################################################

for line in data_files_info:
    datafile_name = line[0]
    substituent = line[1]
    molecule_code = datafile_name[0:2]

    df = pd.read_csv(github_location + datafile_name, 
                 delimiter = ",", 
                 skipinitialspace=True, 
                 index_col="pS+", 
                 comment = "#") 

    #########################################################
    ### Determine scale from column header name           ###
    #########################################################
    
    colname = df.columns[0]
    if colname == "k(A,R)x10^5":        # values are tabulated as k x 10^5
        factor = 10**-5
    elif colname == "k(A,R)x10^4":      # values are tabulated as k x 10^4
        factor = 10**-4
    else:
        # Stop here with a clear message instead of a KeyError further down
        raise ValueError(f"{datafile_name}: unrecognized rate column header {colname!r}")

    ###############################
    ### Calculations            ###
    ###############################

    df["log_k"] = np.log10(df[colname] * factor)   # rescale to k, then take log10

    ###############################
    ### Curve Fit to model      ###
    ###############################

    x = df.index
    y = df["log_k"]

    lower_bounds = [0.01, 0.0, 0.0, 0.0]            # lower bounds for Ka, kOH, kH2O, kH
    upper_bounds = [2,10000,10000,10000]           # upper bounds for Ka, kOH, kH2O, kH
    bounds_list = (lower_bounds, upper_bounds)


    parameters, pcov = curve_fit(model, x, y, bounds = bounds_list)  # Curve fit the model to the x,y data using bounding limits


    [Ka, kOH, kH2O, kH] = parameters
    perr = np.sqrt(np.diag(pcov))

    #######################################################
    ### Print out Parameters and standard deviations    ###
    #######################################################
    if False:                      # change to True to print the parameters of each fit
        print(f"Compound {molecule_code}")
        print(f"Ka = {Ka:0.2G} +/- {perr[0]:0.2G}")
        print(f"kOH = {kOH:0.3G} +/- {perr[1]:0.3G}")
        print(f"kH2O = {kH2O:0.3G} +/- {perr[2]:0.3G}")
        print(f"kH = {kH:0.3G} +/- {perr[3]:0.3G}")
        print(f"pKa = {-np.log10(Ka):0.2f} \n")

    substituent_list.append(substituent)
    file_list.append(datafile_name)
    molecule_list.append(molecule_code)
    Ka_list.append(Ka)
    kOH_list.append(kOH)
    kH2O_list.append(kH2O)
    kH_list.append(kH)
    Ka_sd_list.append(perr[0])
    kOH_sd_list.append(perr[1])
    kH2O_sd_list.append(perr[2])
    kH_sd_list.append(perr[3])


    step = 0.1
    #x1 = np.arange(np.min(x), np.max(x)+step, step)
    x1 = np.arange(-2, 14 + step, step)   # make an array of points to calculate y-values from
    y1 = model(x1, Ka, kOH, kH2O, kH)     # Calculate those y-values using the model

    style_file = "tufte.mplstyle"
    plt.style.use(github_location_styles + style_file)        
    
    fig, ax = plt.subplots(nrows=1, ncols=1, figsize=(5,4))  
    ax.margins(x=.07, y=.07, tight=True)      # add 7% empty space around outside of plot area   
    
    ############################
    ### Set labels and scale ###
    ############################
    
    ax.set(
        # title="pH rate profile",
        ylabel=r"$\log{k_{obs}}$",
        xlabel=r"$pS^+$",
        xlim=[-2, 14],
        ylim=[-7, 0],
    )
    
    #########################################
    ### Plot the data                     ###
    #########################################
    
    
    ax.vlines([1, 3.8, 11.5], -6, -1, colors="lightgray", linewidth=0.5)   # faint vertical guide lines
    
    ax.scatter(x,y, s=64, color="white",  edgecolors = "none", zorder=2)
    ax.scatter(x,y, s=32, color="black",  edgecolors = "none", zorder=2)
    ax.scatter(x,y, s=16, color="white",  edgecolors = "none", alpha = 1.0, zorder=2)
    
    
    #########################################
    ### Plot the line fit.                ###
    #########################################
    
    ax.plot(x1, y1, color='black', zorder=0, linewidth=0.7)
    ax.text(4, -3, molecule_code + "  " + substituent)   # label the plot with compound and substituent


    fig.savefig("plot"+ molecule_code +".pdf")   # use this to save the figure in PDF format
#    plt.show()                 # uncomment to display each plot in the notebook
    plt.close()



    

##############################################

df1 = pd.DataFrame(data = {'Substituent':substituent_list,
                           'file':file_list, 
                           'molecule':molecule_list, 
                           'Ka':Ka_list, 
                           'kOH':kOH_list, 
                           'kH2O':kH2O_list, 
                           'kH':kH_list, 
                           'Ka_sd':Ka_sd_list, 
                           'kOH_sd':kOH_sd_list, 
                           'kH2O_sd':kH2O_sd_list, 
                           'kH_sd':kH_sd_list})
display(df1)



|   | Substituent | file | molecule | Ka | kOH | kH2O | kH | Ka_sd | kOH_sd | kH2O_sd | kH_sd |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | H | 1b-data.csv | 1b | 0.559668 | 1.310814 | 6.619635e-06 | 0.001909 | 0.100101 | 0.047005 | 3.460165e-07 | 0.00024 |
| 1 | p-OCH3 | 1c-data.csv | 1c | 0.468178 | 2.176537 | 1.244626e-05 | 0.002334 | 0.077796 | 0.063239 | 4.12148e-07 | 0.000304 |
| 2 | m-CH3 | 1d-data.csv | 1d | 0.591199 | 1.367986 | 8.77574e-06 | 0.002069 | 0.090153 | 0.039806 | 3.291872e-07 | 0.000252 |
| 3 | p-CH3 | 1e-data.csv | 1e | 0.452471 | 1.563831 | 1.197328e-05 | 0.001968 | 0.055863 | 0.039574 | 3.418705e-07 | 0.000182 |
| 4 | m-Cl | 1f-data.csv | 1f | 1.023844 | 10.732163 | 2.539285e-06 | 0.000777 | 0.297195 | 0.401754 | 1.372109e-07 | 0.000187 |
| 5 | p-Cl | 1g-data.csv | 1g | 0.807553 | 6.305972 | 3.822435e-06 | 0.000963 | 0.231589 | 0.281435 | 2.267198e-07 | 0.000228 |
| 6 | m-Br | 1h-data.csv | 1h | 0.959064 | 11.644248 | 2.417489e-06 | 0.000689 | 0.260561 | 0.437624 | 1.245552e-07 | 0.000152 |
| 7 | p-Br | 1i-data.csv | 1i | 0.973985 | 7.725533 | 3.286769e-06 | 0.000961 | 0.336623 | 0.352477 | 2.02091e-07 | 0.000276 |
| 8 | p-CN | 1j-data.csv | 1j | 1.100969 | 143.937412 | 8.246206e-07 | 0.000222 | 0.368725 | 5.779481 | 6.531463e-08 | 6.2e-05 |
| 9 | m-NO2 | 1k-data.csv | 1k | 1.356778 | 75.301965 | 9.372255e-07 | 0.000299 | 0.625847 | 3.557652 | 7.173111e-08 | 0.00012 |
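The fitted $K_a$ values can also be reported as p$K_a$ (the optional print block computes `-np.log10(Ka)`), but that conversion does not carry the uncertainty across. A minimal sketch, assuming first-order (linear) error propagation, $\sigma_{\mathrm{p}K_a} = \sigma_{K_a}/(K_a \ln 10)$, adds both quantities to a dataframe shaped like `df1` (columns `Ka` and `Ka_sd`); `add_pKa_columns` is a hypothetical helper.

```python
import numpy as np
import pandas as pd


def add_pKa_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with pKa and its first-order standard deviation.

    pKa = -log10(Ka) and |d(pKa)/d(Ka)| = 1 / (Ka * ln 10), so
    sd(pKa) = Ka_sd / (Ka * ln 10).
    """
    out = df.copy()
    out["pKa"] = -np.log10(out["Ka"])
    out["pKa_sd"] = out["Ka_sd"] / (out["Ka"] * np.log(10))
    return out


# Small illustrative frame (made-up values, not the fits above)
demo = pd.DataFrame({"Ka": [1.0, 0.1], "Ka_sd": [0.1, 0.01]})
print(add_pKa_columns(demo))
```

In the notebook, `add_pKa_columns(df1)` would add the two columns for every compound. Some fitted $K_a$ values carry large relative uncertainties (for 1k, 0.63 on 1.36), so the linear approximation is only a rough guide there.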


## Combine with Hammett Data and Write Result Table

Read in the table of Hammett parameters, combine it with the fitted parameters collected above, trim unneeded columns, and export the combined dataframe as a CSV file.
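`fill_sigma()` replaces a missing σ+ or σ− value with the ordinary σ value, and `pd.concat(..., axis=1, join="inner")` keeps only the substituents present in both tables. A minimal sketch of the same two steps on toy tables (all values and the substituent labels `X`, `Y`, `Z` are made up for illustration), using pandas' vectorised `fillna` as an alternative to the row-by-row loop:

```python
import numpy as np
import pandas as pd

# Illustrative Hammett table (made-up values); NaN marks a missing s_plus or s_minus
hammett = pd.DataFrame(
    {
        "sigma": [0.00, 0.30, 0.60],
        "s_plus": [0.00, np.nan, 0.50],
        "s_minus": [np.nan, np.nan, 0.90],
    },
    index=pd.Index(["H", "X", "Y"], name="Substituent"),
)

# Vectorised alternative to fill_sigma(): fall back to sigma wherever a value is missing
for col in ["s_plus", "s_minus"]:
    hammett[col] = hammett[col].fillna(hammett["sigma"])

# Illustrative fit results; "Z" has no entry in the Hammett table
fits = pd.DataFrame(
    {"kOH": [1.0, 2.0, 3.0]},
    index=pd.Index(["H", "X", "Z"], name="Substituent"),
)

# Column-wise concatenation with an inner join keeps only substituents found in both
combined = pd.concat([hammett, fits], axis=1, join="inner")
print(combined)
```

This is why a substituent that is missing from the chosen LFER table silently drops out of the combined results.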

In [21]:
#################################################################
### a function to fill in sigma for empty spaces in s+ and s- 
#################################################################
def fill_sigma(df):     
    for z in df.index:
        if np.isnan(df.loc[z,"s_plus"]):
            df.loc[z,"s_plus"] = df.loc[z,"sigma"]
        if np.isnan(df.loc[z,"s_minus"]):
            df.loc[z,"s_minus"] = df.loc[z,"sigma"]
    return(df)

################################################################################
### Read Hammett data set. The fields are separated by commas; comments are enabled  
################################################################################

LFER_Data = "LFER_HanschLeoTaft.csv"   # Choose which set of Hammett parameters you prefer
#LFER_Data = "LFER_Williams.csv"

Filename = github_location_LFER_tables + LFER_Data

df2 = pd.read_csv(Filename, 
                 delimiter = ",", 
                 skipinitialspace=True, 
                 index_col="Substituent", 
                 comment = "#") 

########################################################
### Fill across sigma values and select substituents 
########################################################

df2 = fill_sigma(df2)

###############################
### Remove unneeded columns 
###############################
 
df2 = df2.drop(columns=["TABLE V", "TABLE I"])   # trim "LFER_HanschLeoTaft.csv" data
#df2 = df2.drop(columns=["Page"])                # trim "LFER_Williams.csv" data


########################################################
### Combine data sets
########################################################

df3 = df1.set_index('Substituent')


#display(df1)
#display(df2)

result = pd.concat([df2, df3], axis=1, join="inner")

### Trim unneeded data
#result.drop(labels = ["Unnamed: 0", "file"],axis = 1, inplace=True) 

result.sort_values(by=['sigma'], inplace=True)

display(result)

result.to_csv(result_file_name, float_format ="%.4G")


| Substituent | sigma | s_plus | s_minus | file | molecule | Ka | kOH | kH2O | kH | Ka_sd | kOH_sd | kH2O_sd | kH_sd |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| p-OCH3 | -0.27 | -0.78 | -0.26 | 1c-data.csv | 1c | 0.468178 | 2.176537 | 1.244626e-05 | 0.002334 | 0.077796 | 0.063239 | 4.12148e-07 | 0.000304 |
| p-CH3 | -0.17 | -0.31 | -0.17 | 1e-data.csv | 1e | 0.452471 | 1.563831 | 1.197328e-05 | 0.001968 | 0.055863 | 0.039574 | 3.418705e-07 | 0.000182 |
| m-CH3 | -0.07 | -0.07 | -0.07 | 1d-data.csv | 1d | 0.591199 | 1.367986 | 8.77574e-06 | 0.002069 | 0.090153 | 0.039806 | 3.291872e-07 | 0.000252 |
| H | 0.0 | 0.0 | 0.0 | 1b-data.csv | 1b | 0.559668 | 1.310814 | 6.619635e-06 | 0.001909 | 0.100101 | 0.047005 | 3.460165e-07 | 0.00024 |
| p-Br | 0.23 | 0.15 | 0.25 | 1i-data.csv | 1i | 0.973985 | 7.725533 | 3.286769e-06 | 0.000961 | 0.336623 | 0.352477 | 2.02091e-07 | 0.000276 |
| p-Cl | 0.23 | 0.11 | 0.19 | 1g-data.csv | 1g | 0.807553 | 6.305972 | 3.822435e-06 | 0.000963 | 0.231589 | 0.281435 | 2.267198e-07 | 0.000228 |
| m-Cl | 0.37 | 0.37 | 0.37 | 1f-data.csv | 1f | 1.023844 | 10.732163 | 2.539285e-06 | 0.000777 | 0.297195 | 0.401754 | 1.372109e-07 | 0.000187 |
| m-Br | 0.39 | 0.39 | 0.39 | 1h-data.csv | 1h | 0.959064 | 11.644248 | 2.417489e-06 | 0.000689 | 0.260561 | 0.437624 | 1.245552e-07 | 0.000152 |
| p-CN | 0.66 | 0.66 | 1.0 | 1j-data.csv | 1j | 1.100969 | 143.937412 | 8.246206e-07 | 0.000222 | 0.368725 | 5.779481 | 6.531463e-08 | 6.2e-05 |
| m-NO2 | 0.71 | 0.71 | 0.71 | 1k-data.csv | 1k | 1.356778 | 75.301965 | 9.372255e-07 | 0.000299 | 0.625847 | 3.557652 | 7.173111e-08 | 0.00012 |
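With each rate constant paired with σ, σ+ and σ−, one possible next step (an assumption here, not necessarily the analysis the authors performed) is a Hammett plot of $\log_{10} k$ against a σ scale, $\log_{10} k = \rho\sigma + \log_{10} k_0$. A minimal sketch using `scipy.stats.linregress`; the function name `hammett_fit` is hypothetical, and the choice of rate constant and σ column is left to the user.

```python
import numpy as np
from scipy import stats


def hammett_fit(sigma, k):
    """Least-squares fit of log10(k) = rho * sigma + log10(k0).

    Returns (rho, rho_stderr, log10_k0, r_squared).
    """
    x = np.asarray(sigma, dtype=float)
    y = np.log10(np.asarray(k, dtype=float))
    fit = stats.linregress(x, y)
    return fit.slope, fit.stderr, fit.intercept, fit.rvalue**2


# Usage inside this notebook, where `result` is the combined dataframe above:
# rho, rho_sd, log_k0, r2 = hammett_fit(result["sigma"], result["kOH"])
# Swap in result["s_plus"] or result["s_minus"] to compare sigma scales.
```

Because `linregress` treats every point equally, this ignores the standard deviations in the `_sd` columns; a weighted fit would be needed to account for them.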
