Dr Oliviero Andreussi, olivieroandreuss@boisestate.edu

Boise State University, Department of Chemistry and Biochemistry

# Fitting and Data Analysis for the UV-Vis Particle in a Box Experiment {-}

Before we start, let us import the main modules that we will need for this lecture. You may see some new modules in the list below; we will describe them in the relevant sections.

In [None]:
# @title Notebook Setup { display-mode: "form" }
# Import the main modules used in this worksheet
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.linear_model import LinearRegression
# Load the google drive with your files 
from google.colab import drive
drive.mount('/content/drive')
# The following needs to be the path of the folder with all your data files in .csv format
base_path = '/content/drive/MyDrive/'

Set the local path, even though we will not need to read any files in this notebook.

In [None]:
# @title Set Local Path { display-mode: "form" }
# The following needs to be the path of the folder with all your collected data in .csv format
local_path="Colab Notebooks/ParticleBox_Data/" # @param {type:"string"}
path = base_path+local_path

## Visualize the Systems {-}

The following modules need to be installed on Colab. We won't need them much for this analysis, but they offer many useful features for chemistry programming.

In [None]:
# @title Install and load RDKit { display-mode: "form" }
!pip install rdkit
from rdkit import Chem
from rdkit.Chem import Draw
!pip install cirpy
import cirpy

In particular, we can use them to draw the molecules in our experiments. Here are the CAS numbers (note that CirPy cannot find the SMILES for one of the molecules, so you will need to pass it directly):
* 977-96-8
* 605-91-4
* 4727-49-5
* 14187-31-6
* 4727-50-8
* 18300-31-7 ('[I-].CCN1C=CC(=CC=CC=CC2=CC=[N+](C3=CC=CC=C23)CC)C4=CC=CC=C41')
* 2197-01-5
* 905-97-5
* 514-73-8
* 3071-70-3
* 905-96-4
* 14806-50-9

In [None]:
# @title Choose the molecule to draw { display-mode: "form" }
input = '14806-50-9' # @param {type:"string"}
input_type = 'cas' # @param ["smiles", "name", "cas"] {allow-input: true}
if input_type != 'smiles' :
    smiles=cirpy.resolve( input, 'smiles')
else:
    smiles=input
img = Draw.MolToImage( Chem.MolFromSmiles(smiles), size=(300, 300) )
display(img)

Also, if you want to know the name of one of the molecules, you can use CirPy as follows:

In [None]:
input = '905-96-4'
cirpy.resolve( input, 'names')

## Fit the Absorption Wavelengths {-}

Let's start by defining a dictionary for some of the molecules in our experiments. NOTE: you will need to enter your own values of the absorption wavelengths, `lambda_exp`.

Apart from this quantity, these dictionaries will contain some basic information about your molecule (`hasO` and `hasS` refer to the presence of oxygen or sulfur atoms), but you can (and should) add any additional characteristic of the molecule that you think may affect the absorption wavelength. If you add more features, make sure to add them to every dye and take care to avoid typos in the keys.

In [None]:
dye0 = {'cas' : '977-96-8', 'p' : 3, 'hasO' : 0, 'hasS' : 0, 'lambda_exp' : 520}
dye1 = {'cas' : '605-91-4', 'p' : 5, 'hasO' : 0, 'hasS' : 0, 'lambda_exp' : 600}
dye2 = {'cas' : '4727-49-5', 'p' : 7, 'hasO' : 0, 'hasS' : 0, 'lambda_exp' : 590}

As seen in previous notebooks, we can convert a list of dictionaries into a `pandas.DataFrame`, which will be convenient for math operations and for producing tables.

In [None]:
dyes = [dye0, dye1, dye2]
dyes_data = pd.DataFrame(dyes)

In [None]:
print(dyes_data.to_markdown())

## Python Functions {-}

Here we introduce the Python way to define functions. We will use the following function to compute the relation between the absorption wavelength and the conjugation length of the dyes. 

In [None]:
def lambda_FE(p,alpha=0.):
    """
    Function to compute the absorption wavelength (in nm)
    as a function of the number of carbon atoms in the conjugated chain
    and an optional parameter alpha that accounts for some wiggle room
    due to the aromatic rings
    """
    return 63.7*(p+3+alpha)**2/(p+4)

Given the function above, we can compute a new column of the dataframe with the predicted wavelengths in a single command.

In [None]:
dyes_data['lambda_FE']=lambda_FE(dyes_data['p'])
dyes_data

We can visualize the results as usual

In [None]:
plt.scatter(dyes_data['p'],dyes_data['lambda_exp'])
plt.plot(dyes_data['p'],dyes_data['lambda_FE'],color='red')
plt.show()

Given our experimental and predicted values of lambda, we can compute the sum of squares as follows:

In [None]:
TSS=np.sum((dyes_data['lambda_exp']-dyes_data['lambda_exp'].mean())**2)
print("The total sum of squares is {:10.4f} ".format(TSS))

In [None]:
RSS=np.sum((dyes_data['lambda_exp']-dyes_data['lambda_FE'])**2)
print("The residual sum of squares is {:10.4f} ".format(RSS))

In [None]:
R2=1-(RSS/TSS)
print("The coefficient of determination R2 for the particle in a box model is {:5.4f} ".format(R2))
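
The sums of squares above also give a simple way to tune the optional `alpha` parameter of `lambda_FE`. The following is a minimal sketch, assuming a plain grid scan over `alpha` (the grid range, the helper `r_squared`, and the variable names are illustrative choices, not part of the original worksheet). The wavelengths are the same placeholder values used in the dictionaries above, not measurements.

```python
import numpy as np

def lambda_FE(p, alpha=0.0):
    """Free-electron wavelength (nm), same formula as in the cell above."""
    return 63.7 * (p + 3 + alpha) ** 2 / (p + 4)

def r_squared(y_exp, y_model):
    """Coefficient of determination, R2 = 1 - RSS/TSS (hypothetical helper)."""
    y_exp = np.asarray(y_exp, dtype=float)
    y_model = np.asarray(y_model, dtype=float)
    rss = np.sum((y_exp - y_model) ** 2)
    tss = np.sum((y_exp - y_exp.mean()) ** 2)
    return 1.0 - rss / tss

# Placeholder data (NOT measurements): replace with your own p and lambda_exp
p_values = np.array([3, 5, 7])
lambda_exp = np.array([520.0, 600.0, 590.0])

# Scan alpha on a grid (assumed range) and keep the value with the smallest RSS
alpha_grid = np.linspace(-2.0, 2.0, 401)
rss_grid = np.array([np.sum((lambda_exp - lambda_FE(p_values, a)) ** 2)
                     for a in alpha_grid])
best_alpha = alpha_grid[np.argmin(rss_grid)]

print("Best alpha on the grid: {:6.3f}".format(best_alpha))
print("R2 with alpha = 0:      {:6.4f}".format(r_squared(lambda_exp, lambda_FE(p_values))))
print("R2 with best alpha:     {:6.4f}".format(r_squared(lambda_exp, lambda_FE(p_values, best_alpha))))
```

A grid scan is only one option; it is used here because it needs nothing beyond NumPy and makes the dependence of RSS on `alpha` easy to plot.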

## Huckel Model for Conjugated Molecules {-}

While for the Full Report you will only need to model your experiments with a Free Electron (Particle in a Box) model, here we will try to understand how a slightly more advanced quantum mechanical model can be used to predict absorption wavelengths. The Huckel model is a semi-empirical quantum mechanical model developed to describe conjugated systems. While the model relies on some adjustable parameters and involves several possibly strong assumptions, it reproduces some key properties of conjugated and aromatic molecules.

You can learn more on the theory and assumptions of the model in the lectures and in the following online resources: 
* MIT Physical Chemistry on [The Huckel Molecular Orbital Theory](https://dspace.mit.edu/bitstream/handle/1721.1/120336/5-61-fall-2013/contents/lecture-notes/MIT5_61F13_Lecture27-28.pdf)
* P-Chem Lab from Duke University [The Huckel Approximation](https://chem.libretexts.org/Courses/Duke_University/CHEM310L_-_Physical_Chemistry_I_Lab_Manual/04%3A_Absorption_Spectrum_of_Conjugated_Dyes/4.07%3A_Appendix_B_-_The_Huckel_Approximation)
* Columbia Notes on [The Huckel Approximation](http://www.columbia.edu/itc/chemistry/chem-c2407_archive/recitations/huckel.pdf)


The key component of the Huckel model is the fact that the molecular orbitals are uniquely determined by the topology of the conjugated network. The model considers only the $p_z$ orbital of each conjugated atom and builds molecular orbitals from them. As a further approximation, only atomic orbitals on directly bonded atoms are allowed to 'interact'. Finally, we assume that all atoms and bonds are equivalent, so that the corresponding elements of the molecular Hamiltonian are identical. The Hamiltonian of the system can thus be represented as a matrix, where the diagonal elements $H_{ii}$ have identical values ($\alpha$), while the off-diagonal elements $H_{ij}$ are different from zero (and equal to $\beta$) only if there is a bond between atom $i$ and atom $j$. For carbon-based molecules, the model only relies on the two parameters $\alpha$ and $\beta$. However, the difference in energy between the electronic states only depends on the latter parameter, which is usually estimated to be $\beta\approx-70.4\; kcal/mol$.

In [None]:
# Simplest case: two conjugated atoms (e.g., ethylene)
n_conjugated = 2
alpha = -2
beta = -1
topology = np.zeros((n_conjugated,n_conjugated))
diagonal = np.ones(n_conjugated)*alpha
offdiagonal = np.ones(n_conjugated-1)*beta
topology = topology + np.diag(diagonal,0) + np.diag(offdiagonal,1) + np.diag(offdiagonal,-1)
print(topology)

Molecular orbitals are given by the linear combinations of atomic orbitals that make the Hamiltonian diagonal and minimize the energy of the system. Diagonalization of the Hamiltonian is a linear algebra problem (eigenvalue problem). We can use numpy to compute eigenvalues (a.k.a. orbital energies) and eigenvectors (a.k.a. the values of the coefficients that enter the definition of the molecular orbitals).

In [None]:
eigenvalues,eigenvectors=np.linalg.eig(topology)
print(eigenvalues)
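
Because the Huckel Hamiltonian is a real symmetric matrix, `np.linalg.eigh` can be used as an alternative to `np.linalg.eig`: it returns the eigenvalues already sorted in ascending order, together with orthonormal eigenvectors stored as columns. The sketch below, which is not part of the original worksheet, applies it to the two-atom case and checks the result against the known values $\alpha\pm\beta$. The variable names are illustrative.

```python
import numpy as np

alpha, beta = -2.0, -1.0

# Two-atom Huckel Hamiltonian (same matrix as the n_conjugated = 2 cell above)
H = np.array([[alpha, beta],
              [beta, alpha]])

# eigh is meant for symmetric matrices; eigenvalues come out in ascending order
energies, coefficients = np.linalg.eigh(H)

print("Orbital energies:", energies)
print("Bonding orbital coefficients:    ", coefficients[:, 0])
print("Antibonding orbital coefficients:", coefficients[:, 1])

# For beta < 0 the bonding level is alpha + beta, the antibonding one alpha - beta
print("Matches alpha +/- beta:", np.allclose(energies, [alpha + beta, alpha - beta]))
```

The overall sign of each eigenvector is arbitrary, so only the relative signs of the two coefficients within a column are meaningful (same sign for the bonding orbital, opposite signs for the antibonding one).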

For the more complicated case of butadiene (four conjugated carbon atoms) we can build the Huckel Hamiltonian as follows:

In [None]:
n_conjugated = 4
alpha = -2
beta = -1
topology = np.zeros((n_conjugated,n_conjugated))
diagonal = np.ones(n_conjugated)*alpha
offdiagonal = np.ones(n_conjugated-1)*beta
topology = topology + np.diag(diagonal,0) + np.diag(offdiagonal,1) + np.diag(offdiagonal,-1)
print(topology)

which corresponds to the following orbital energies and coefficients

In [None]:
eigenvalues,eigenvectors=np.linalg.eig(topology)
print(eigenvalues)

Note that the energies above all differ from the starting energy of the $p_z$ orbitals, which was arbitrarily set to $\alpha=-2$ in the code above (in units where $\beta=-1$). For this kind of Hamiltonian, the eigenvalues can be expressed with the analytical formula $E_j=\alpha+2\beta\cos\left(\frac{\pi}{N+1}j\right)$, with $j=1,2,\dots ,N$. For longer conjugated chains, assuming that we have one electron for each atomic orbital (so the number of electrons is $N$ and only the first $N/2$ orbitals are filled), we can estimate the energy difference associated with the absorption maximum as the HOMO-LUMO gap, $\Delta E=E_{N/2+1}-E_{N/2}=-4\beta\sin\left(\frac{\pi}{2}\frac{1}{N+1}\right)$. For large $N$ we can use $\sin x\approx x$ and $N+1\approx N$, so that, when accounting for the different units, this energy corresponds to a wavelength of $\lambda^{HMO}\approx-\frac{28585\;(nm\cdot kcal/mol)}{2\pi\beta\;(kcal/mol)}N$. As for the free electron model, the Huckel Molecular Orbital approach predicts an absorption wavelength that is linear in the number of conjugated centers. The tunable parameter in this case is the $\beta$ constant. 
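
The analytical expressions above can be checked against a direct diagonalization. The following is a minimal sketch, assuming a linear chain of equivalent atoms and an even number of electrons; the function names are hypothetical, while the conversion constant and the value of $\beta$ are the ones quoted in the text.

```python
import numpy as np

HC_KCAL_NM = 28585.0   # conversion constant quoted in the text (nm * kcal/mol)
BETA_KCAL = -70.4      # beta estimate quoted in the text (kcal/mol)

def chain_hamiltonian(n, alpha, beta):
    """Huckel matrix for a linear chain of n equivalent atoms."""
    return alpha * np.eye(n) + beta * (np.eye(n, k=1) + np.eye(n, k=-1))

def chain_energies(n, alpha, beta):
    """Analytical energies E_j = alpha + 2 beta cos(pi j / (n + 1)), j = 1..n."""
    j = np.arange(1, n + 1)
    return alpha + 2.0 * beta * np.cos(np.pi * j / (n + 1))

def chain_gap(n, beta):
    """Analytical HOMO-LUMO gap, -4 beta sin(pi / (2 (n + 1))), for even n."""
    return -4.0 * beta * np.sin(0.5 * np.pi / (n + 1))

# Compare the formula with the numerical eigenvalues of the four-atom chain
n = 4
numerical = np.linalg.eigvalsh(chain_hamiltonian(n, -2.0, -1.0))  # ascending order
analytical = np.sort(chain_energies(n, -2.0, -1.0))
print("Formula matches diagonalization:", np.allclose(numerical, analytical))

# With n electrons, the n/2 lowest levels are filled: HOMO is index n/2 - 1
gap_numerical = numerical[n // 2] - numerical[n // 2 - 1]
print("Gap (numerical, analytical):", gap_numerical, chain_gap(n, -1.0))

# Wavelengths from the exact gap expression, using beta in kcal/mol
for n_atoms in (4, 6, 8, 10):
    wavelength = HC_KCAL_NM / chain_gap(n_atoms, BETA_KCAL)
    print("N = {:2d}   lambda = {:7.1f} nm".format(n_atoms, wavelength))
```

The loop uses the exact sine expression rather than the large-$N$ linear approximation, so the two can be compared directly for short chains.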

However, the Huckel model can be naturally extended to account for the aromatic rings and, with additional parameters, to include heteroatoms. For example, it is worth looking at the results of the Huckel model for a benzene molecule.

In [None]:
n_conjugated = 6
alpha = -2.
beta = -1.
topology = np.zeros((n_conjugated,n_conjugated))
diagonal = np.ones(n_conjugated)*alpha
offdiagonal = np.ones(n_conjugated-1)*beta
topology[0,n_conjugated-1]=beta
topology[n_conjugated-1,0]=beta
topology = topology + np.diag(diagonal,0) + np.diag(offdiagonal,1) + np.diag(offdiagonal,-1)
print(topology)

Note that in the following we re-order the results of the eigenvalue problem so that the states appear in order of decreasing energy (from the highest, least stable, to the lowest, most stable).

In [None]:
eigenvalues,eigenvectors=np.linalg.eig(topology)
idx = eigenvalues.argsort()[::-1]   
eigenvalues = eigenvalues[idx]
eigenvectors = eigenvectors[:,idx]
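
As a sanity check on the sorted energies, the ring can be compared with the standard Huckel result for a cycle of $N$ equivalent atoms, $E_k=\alpha+2\beta\cos\left(\frac{2\pi k}{N}\right)$ with $k=0,1,\dots,N-1$. This formula is not given in the text above and is brought in here only for comparison. The sketch below rebuilds the benzene matrix independently, with illustrative variable names.

```python
import numpy as np

n = 6
alpha, beta = -2.0, -1.0

# Ring topology: each atom i is bonded to atom (i + 1) modulo n
ring = alpha * np.eye(n)
for i in range(n):
    j = (i + 1) % n
    ring[i, j] = beta
    ring[j, i] = beta

# Numerical energies, reversed to go from highest to lowest as in the cell above
numerical = np.linalg.eigvalsh(ring)[::-1]

# Analytical ring energies, sorted the same way
k = np.arange(n)
analytical = np.sort(alpha + 2.0 * beta * np.cos(2.0 * np.pi * k / n))[::-1]

print("Numerical: ", np.round(numerical, 6))
print("Analytical:", np.round(analytical, 6))
print("Match:", np.allclose(numerical, analytical))
```

The two pairs of degenerate levels ($\alpha+\beta$ and $\alpha-\beta$) are the reason the eigenvectors of benzene are not unique: any rotation within a degenerate pair is an equally valid set of orbitals.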

We can use matplotlib to visualize the coefficients in terms of their signs and magnitudes as follows.

In [None]:
from matplotlib.colors import BoundaryNorm
bounds = np.arange(np.min(eigenvectors),np.max(eigenvectors),.05)
cmap = plt.get_cmap('seismic')
idx=np.searchsorted(bounds,0)
bounds=np.insert(bounds,idx,0)
norm = BoundaryNorm(bounds, cmap.N)
plt.imshow(eigenvectors,interpolation='none',norm=norm,cmap=cmap)
plt.colorbar()
plt.show()


If you have time, you could try to build a topology matrix for a few of the molecules involved in the experiments and compare the HOMO-LUMO energy difference with the formula reported above for a linear conjugated chain.
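
One possible starting point for this exercise is sketched below, assuming that every conjugated atom is treated as an equivalent carbon (no heteroatom parameters) and that the molecule is described by a list of bonded atom pairs. The helpers `huckel_matrix` and `homo_lumo_gap` are hypothetical names introduced for illustration, and the two example topologies (a six-atom chain and a six-atom ring) are not the dyes from the experiment.

```python
import numpy as np

def huckel_matrix(n_atoms, bonds, alpha=-2.0, beta=-1.0):
    """Build a Huckel matrix from a list of (i, j) bonded atom pairs (0-based)."""
    H = np.zeros((n_atoms, n_atoms))
    np.fill_diagonal(H, alpha)
    for i, j in bonds:
        H[i, j] = beta
        H[j, i] = beta
    return H

def homo_lumo_gap(H, n_electrons):
    """HOMO-LUMO gap, filling the lowest orbitals with two electrons each."""
    energies = np.linalg.eigvalsh(H)      # ascending order
    n_occupied = n_electrons // 2
    return energies[n_occupied] - energies[n_occupied - 1]

# Example topologies: a linear chain of six atoms and a six-membered ring
chain_bonds = [(i, i + 1) for i in range(5)]
ring_bonds = chain_bonds + [(5, 0)]

beta = -1.0
gap_chain = homo_lumo_gap(huckel_matrix(6, chain_bonds, beta=beta), n_electrons=6)
gap_ring = homo_lumo_gap(huckel_matrix(6, ring_bonds, beta=beta), n_electrons=6)

# Linear-chain formula from the text, for comparison
gap_formula = -4.0 * beta * np.sin(0.5 * np.pi / (6 + 1))

print("Chain gap (numerical): {:6.4f}".format(gap_chain))
print("Chain gap (formula):   {:6.4f}".format(gap_formula))
print("Ring gap (numerical):  {:6.4f}".format(gap_ring))
```

For a real dye you would number the conjugated atoms, list the bonds between them, and pass the number of $\pi$ electrons; the gap comes out in the same units as `beta`.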