# Taft Plot Investigations

This workbook contains the code and imports the data to explore how Taft used acid- and base-catalyzed ester hydrolysis to develop the Taft steric and electronic parameters.

## Plot $pK_a$ vs $\sigma^*$

Plot $pK_a$ vs $\sigma^*$ to confirm that Taft $\sigma^*$ values track with Hammett $\sigma$ values. The slope should be near one if the substituent parameters were scaled properly (recall that Taft used the Hammett $\rho$ value of 2.3 when setting his $\sigma^*$ values).

In [9]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as stats

github_location = "https://raw.githubusercontent.com/blinkletter/4410PythonNotebooks/main/Class_17/data/"
github_location_styles = "https://raw.githubusercontent.com/blinkletter/LFER-QSAR/main/styles/"
github_location_LFER_tables = "https://raw.githubusercontent.com/blinkletter/LFER-QSAR/main/data/"

sigmatype = "s_plus"    # change to "sigma", "s_plus", or "s_minus"


###################################################################
### a function to fill in sigma for empty spaces in s+ and s-   ###
###################################################################

def fill_sigma(df):
    # Wherever s_plus or s_minus is missing, fall back to the plain sigma value
    for z in df.index:
        if np.isnan(df.loc[z,"s_plus"]):
            df.loc[z,"s_plus"] = df.loc[z,"sigma"]
        if np.isnan(df.loc[z,"s_minus"]):
            df.loc[z,"s_minus"] = df.loc[z,"sigma"]
    return df


################################################################################
### Read data set. The fields are separated by commas; comments are enabled  ###
################################################################################

#LFER_file = "LFER_HanschLeoTaft.csv"
LFER_file = "LFER_Williams.csv"

data_set = pd.read_csv(github_location_LFER_tables + LFER_file,
                 delimiter = ",", 
                 skipinitialspace=True, 
                 index_col="Substituent",   
                 comment = "#") 


########################################################
### Fill across sigma values and select substituents ###
########################################################

data_set=fill_sigma(data_set)
print(data_set)
### Remove unneeded columns
if LFER_file == "LFER_HanschLeoTaft.csv":
    data_set.drop(labels = ["TABLE V", "TABLE I"],      # Trim "LFER_HanschLeoTaft.csv" data
                  axis = 1,
                  inplace = True)
elif LFER_file == "LFER_Williams.csv":
    data_set.drop(labels = ["Page"],                    # Trim "LFER_Williams.csv" data
                  axis = 1,
                  inplace = True)
else:
    print("ERROR: No filename")


#####################################################################################
### Read the table data. The fields are separated by commas; comments are enabled ###
#####################################################################################

data_file = "17-Table_4.csv"

table_data_df = pd.read_csv(github_location + data_file,
                 delimiter = ",", 
                 skipinitialspace=True, 
#                 index_col="Substituent",    # Can't use Substituent as index - duplicate entries in data series
                 comment = "#") 

### Join the two dataframes according to the index column (Substituent)
#df = pd.concat([table_data_df, data_set], axis=1, join="inner")

#df.sort_values(by=[sigmatype], inplace=True)    # sort according to sigma so we can pick the left-most and right-most points more easily

#df["logk"] = np.log10(df["k"])


print(table_data_df)

             sigma  s_plus  s_minus Page
Substituent                             
m-Br          0.39    0.39     0.39  259
p-Br          0.23    0.15     0.25  259
m-C6H5        0.06    0.06     0.06  278
p-C6H5       -0.01    0.02    -0.18  278
m-CCCH3       0.10    0.10     0.10  265
...            ...     ...      ...  ...
o-CH3         0.29    0.29     0.29   46
o-Cl          1.26    1.26     1.26   46
o-NO2         2.03    2.03     2.03   46
o-Br          1.35    1.35     1.35   46
o-I           1.34    1.34     1.34   46

[67 rows x 4 columns]
    Formula  pKa_COOH  pKa_CH2OH Substituent  Number_of_substituents
0       CH3      4.76      16.00           H                     1.0
1     CH2Cl      2.86      14.31        m-Cl                     1.0
2     CH2Br      2.86        NaN        m-Br                     1.0
3     CHCl2      1.29      12.89        m-Cl                     2.0
4      CCl3      0.65      12.24        m-Cl                     3.0
5      CBr3      0.66        N
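
The row-by-row loop in `fill_sigma` can also be expressed with `Series.fillna`, which accepts another Series and aligns it on the index. The following is a minimal sketch rather than the notebook's own code: the name `fill_sigma_vectorized` and the two-row `demo` frame are hypothetical, the numbers are copied from the printed table above, and the positions of the missing values are assumed for illustration.

```python
import numpy as np
import pandas as pd


def fill_sigma_vectorized(df):
    """Return a copy of df where missing s_plus / s_minus fall back to sigma."""
    out = df.copy()
    for column in ("s_plus", "s_minus"):
        # fillna with a Series fills each gap from the same row of "sigma"
        out[column] = out[column].fillna(out["sigma"])
    return out


# Hypothetical demo frame; the NaN placement is an assumption.
demo = pd.DataFrame(
    {"sigma": [0.39, 0.23], "s_plus": [np.nan, 0.15], "s_minus": [np.nan, 0.25]},
    index=pd.Index(["m-Br", "p-Br"], name="Substituent"),
)

filled = fill_sigma_vectorized(demo)
print(filled)
```

Unlike the loop version, this sketch returns a new frame and leaves its input untouched, which makes it safer to re-run a cell.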

In [10]:
##########################################################################
### A little program to copy in the sigma values for each substituent. ###
### Perhaps someday I will make it a proper function for general use.  ###
##########################################################################

# We are doing this because the substituent column contains double entries for many substituents and cannot be used as an index column.

table_data_df["sigma"] = np.nan     # Create new columns in the dataframe to reference
table_data_df["s_plus"] = np.nan
table_data_df["s_minus"] = np.nan

for x in table_data_df.index:
    substituent_name = table_data_df["Substituent"][x]   # Get the name of the substituent in row x
    print(substituent_name)
    sigma_series = data_set.loc[substituent_name]    # get the set of sigma values from the Hammett data
    table_data_df["sigma"][x] = sigma_series["sigma"]    # insert the Hammett parameters 
    table_data_df["s_plus"][x] = sigma_series["s_plus"]
    table_data_df["s_minus"][x] = sigma_series["s_minus"]
    
#########################################################################
### Perform calculations on columns and add new columns with results  ###
#########################################################################

display(table_data_df)

H
m-Cl
m-Br
m-Cl
m-Cl
m-Br
m-F
m-Cl
m-C6H5
m-CN
m-OCH3
nan


A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  table_data_df["sigma"][x] = sigma_series["sigma"]    # insert the Hammett parameters
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  table_data_df["s_plus"][x] = sigma_series["s_plus"]
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  table_data_df["s_minus"][x] = sigma_series["s_minus"]


KeyError: nan
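
The warnings above come from chained indexing (`table_data_df["sigma"][x] = ...`), and the `KeyError: nan` arises when a row with no substituent label is looked up in `data_set`. One way to avoid both is a left merge on the substituent name, which tolerates duplicate labels on the left and leaves unmatched rows as NaN. This is a sketch under assumptions, not the notebook's method: `demo_sigma` and `demo_table` are hypothetical stand-ins for `data_set` and `table_data_df`, filled with values shown in the outputs above, and the last row (formula `"unlabelled"`) is invented to mimic the missing substituent.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for data_set: sigma values indexed by substituent name.
demo_sigma = pd.DataFrame(
    {"sigma": [0.0, 0.37, 0.39],
     "s_plus": [0.0, 0.37, 0.39],
     "s_minus": [0.0, 0.37, 0.39]},
    index=pd.Index(["H", "m-Cl", "m-Br"], name="Substituent"),
)

# Hypothetical stand-in for table_data_df: repeated labels and one missing label.
demo_table = pd.DataFrame(
    {"Formula": ["CH3", "CH2Cl", "CH2Br", "CHCl2", "unlabelled"],
     "Substituent": ["H", "m-Cl", "m-Br", "m-Cl", np.nan]},
)

# A left merge keeps every row of demo_table, in order. Rows whose label
# is missing from demo_sigma (including NaN) receive NaN sigma values.
merged = demo_table.merge(
    demo_sigma.reset_index(),
    how="left",
    on="Substituent",
    validate="many_to_one",   # raises if a substituent appears twice in demo_sigma
)
print(merged)
```

If a loop is preferred, writing each value with a single indexer such as `table_data_df.loc[x, "sigma"] = ...` avoids the chained-assignment warning, and a `pd.isna(substituent_name)` check before the lookup skips unlabelled rows.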

In [5]:
table_data_df["Substituent"][x]

'H'

In [8]:
data_set.loc[0]

Substituent    m-Br
sigma          0.39
s_plus         0.39
s_minus        0.39
Name: 0, dtype: object
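
The output above, where `Substituent` appears as an ordinary field and the row is named `0`, indicates a frame with the default integer index. Once the file is read with `index_col="Substituent"`, `.loc` expects substituent labels and `.iloc` is the positional accessor. A small sketch of the difference, using a hypothetical two-row `demo` frame with values from the printed table:

```python
import pandas as pd

# Hypothetical frame indexed by substituent name, as read_csv would
# produce with index_col="Substituent".
demo = pd.DataFrame(
    {"sigma": [0.39, 0.23], "s_plus": [0.39, 0.15], "s_minus": [0.39, 0.25]},
    index=pd.Index(["m-Br", "p-Br"], name="Substituent"),
)

first_by_position = demo.iloc[0]   # first row, regardless of labels
by_label = demo.loc["m-Br"]        # row whose index label is "m-Br"

# The label 0 does not exist in a string index, so .loc[0] raises KeyError.
try:
    demo.loc[0]
    label_zero_found = True
except KeyError:
    label_zero_found = False

print(first_by_position)
print("label 0 found:", label_zero_found)
```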

In [11]:
table_data_df.dropna()

  Formula  pKa_COOH  pKa_CH2OH Substituent  Number_of_substituents  sigma  s_plus  s_minus
0     CH3      4.76      16.00           H                     1.0   0.00    0.00     0.00
1   CH2Cl      2.86      14.31        m-Cl                     1.0   0.37    0.37     0.37
3   CHCl2      1.29      12.89        m-Cl                     2.0   0.37    0.37     0.37
4    CCl3      0.65      12.24        m-Cl                     3.0   0.37    0.37     0.37
6     CF3      0.00      12.37         m-F                     3.0   0.34    0.34     0.34
8    C6H5      4.31      15.40      m-C6H5                     1.0   0.06    0.06     0.06
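
With the complete rows in hand, the regression itself can be sketched with `scipy.stats.linregress`. The block below is an illustration under stated assumptions, not the notebook's final analysis: the six rows are retyped from the table above, the column name `sigma_total` is hypothetical, and the substituent effects are assumed to be additive (the tabulated `sigma` multiplied by `Number_of_substituents`). Note that the `sigma` column holds Hammett values looked up by substituent, not Taft $\sigma^*$ values.

```python
import pandas as pd
from scipy import stats

# Rows retyped from the dropna() output above.
rows = pd.DataFrame(
    {"Formula": ["CH3", "CH2Cl", "CHCl2", "CCl3", "CF3", "C6H5"],
     "pKa_COOH": [4.76, 2.86, 1.29, 0.65, 0.00, 4.31],
     "Number_of_substituents": [1.0, 1.0, 2.0, 3.0, 3.0, 1.0],
     "sigma": [0.00, 0.37, 0.37, 0.37, 0.34, 0.06]},
)

# Assumption: substituent effects add, so the total is count * sigma.
rows["sigma_total"] = rows["Number_of_substituents"] * rows["sigma"]

# Ordinary least-squares line through (sigma_total, pKa_COOH).
fit = stats.linregress(rows["sigma_total"], rows["pKa_COOH"])
print(f"slope = {fit.slope:.2f}, intercept = {fit.intercept:.2f}, "
      f"r^2 = {fit.rvalue**2:.3f}")
```

Electron-withdrawing groups lower the $pK_a$, so the fitted slope is expected to be negative. The same `rows["sigma_total"]` and `rows["pKa_COOH"]` columns can be passed to `plt.scatter` to draw the plot.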
