## Computational Chemistry for Experimentalists
## Module 4: Conformational Analysis

Most experiments probe ensembles of roughly Avogadro's number of molecules, not a single structure of a single molecule. Modeling conformational ensembles can therefore be critical for modeling real experiments. This module discusses some ideas and pitfalls in conformational analysis.

This first block imports all of the necessary Python modules. If any of them are not installed, the imports will fail. Ignore the warnings from the "under testing" PySCF properties modules.

In [None]:
from rdkit import Chem
from rdkit.Chem import Draw
from rdkit.Chem import AllChem
from pyscf import gto,scf
from pyscf.tools import cubegen 
import py3Dmol
import numpy 
import matplotlib.pyplot as plt
from pyscf.geomopt.geometric_solver import optimize
from pyscf.prop import nmr 

### Example 1: NMR shielding

Compute the NMR shieldings of acetaldehyde at the MMFF-optimized geometry.

In [None]:
m=Chem.MolFromSmiles('CC=O')
m2=Chem.AddHs(m)
img=Draw.MolToImage(m2)
img

In [None]:
AllChem.EmbedMolecule(m2)
AllChem.MMFFOptimizeMolecule(m2)
elementsCO = [atom.GetSymbol() for atom in m2.GetAtoms()]
coordinates = m2.GetConformer().GetPositions()
atoms = [(element, coordinate) for element, coordinate in zip(elementsCO, coordinates)]

m3 = gto.Mole(basis="3-21g")
m3.atom = atoms
m3.build();
mf=scf.RHF(m3)
mf.kernel()
mn=nmr.RHF(mf)
BsCO=mn.kernel()
print('Atom types',elementsCO)
print(BsCO.shape)
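The cell above hands the RDKit geometry to PySCF as a list of (element, coordinates) pairs, a pattern that recurs throughout this module. A minimal sketch of that step as a reusable helper is shown below. The name `to_atom_list` and the water coordinates are illustrative assumptions, not part of RDKit or PySCF. Coordinates are taken to be in Ångström, which is what RDKit conformers store and what PySCF assumes by default.

```python
def to_atom_list(symbols, coords):
    """Pair element symbols with (x, y, z) coordinates in Angstrom.

    Hypothetical helper: returns the same list-of-pairs layout that the
    cells in this notebook assign to the `atom` attribute of gto.Mole.
    """
    if len(symbols) != len(coords):
        raise ValueError('need one coordinate triple per element symbol')
    return [(sym, tuple(float(x) for x in xyz))
            for sym, xyz in zip(symbols, coords)]

# Illustrative water geometry (made-up values, Angstrom)
symbols = ['O', 'H', 'H']
coords = [[0.000, 0.000, 0.117],
          [0.000, 0.757, -0.469],
          [0.000, -0.757, -0.469]]
atoms = to_atom_list(symbols, coords)
print(atoms[0])
```

With RDKit objects, `symbols` would come from `atom.GetSymbol()` and `coords` from `GetConformer().GetPositions()`, as in the cell above.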

In [None]:
print('Atom types',elementsCO)
# Isotropic shielding of each atom: one third of the trace of its 3x3 tensor
for i in range(len(elementsCO)):
    print(numpy.trace(BsCO[i])/3)

Compute the shieldings of tetramethylsilane (TMS), the reference for proton chemical shifts.

In [None]:
m5=Chem.MolFromSmiles('C[Si](C)(C)C')
m6=Chem.AddHs(m5)
img=Draw.MolToImage(m6)
img

In [None]:
AllChem.EmbedMolecule(m6)
AllChem.MMFFOptimizeMolecule(m6)
elementsTMS = [atom.GetSymbol() for atom in m6.GetAtoms()]
coordinates = m6.GetConformer().GetPositions()
atoms = [(element, coordinate) for element, coordinate in zip(elementsTMS, coordinates)]
m3 = gto.Mole(basis="3-21g")
m3.atom = atoms
m3.build();
mf=scf.RHF(m3)
mf.kernel()
mn=nmr.RHF(mf)
BsTMS=mn.kernel()

Compute the chemical shift of each proton in acetaldehyde, relative to TMS, from the isotropic total shieldings.

In [None]:
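# Isotropic shielding of one TMS proton (atom 6) serves as the reference
TMSref=numpy.trace(BsTMS[6])/3
deltas=[]
print('TMS reference isotropic 1H shielding: %.4f'%(TMSref))
print('Chemical shifts of acetaldehyde protons relative to TMS:')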
for iat in range(len(elementsCO)):
    deltas.append(0)
    if(elementsCO[iat]=='H'):
        val=numpy.trace(BsCO[iat])/3
        delta=TMSref-val
        deltas[iat]=delta
        print('%2d %2s %.3f'%(iat,elementsCO[iat],delta))
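The cell above applies two relations: the isotropic shielding is one third of the trace of the 3x3 shielding tensor, and the chemical shift is the reference shielding minus the nuclear shielding. A minimal sketch on made-up tensors, independent of PySCF, may make the arithmetic easier to check. The helper names `isotropic` and `shift_from_reference` and all numbers are illustrative assumptions.

```python
def isotropic(tensor):
    """Isotropic shielding: one third of the trace of a 3x3 tensor."""
    return sum(tensor[k][k] for k in range(3)) / 3.0

def shift_from_reference(sigma_ref, sigma):
    """Chemical shift relative to a reference: delta = sigma_ref - sigma."""
    return sigma_ref - sigma

# Made-up shielding tensors in ppm (not computed values)
ref_tensor = [[33.0, 0.0, 0.0],
              [0.0, 33.0, 0.0],
              [0.0, 0.0, 33.0]]
h_tensor = [[25.0, 1.0, 0.0],
            [1.0, 22.0, 0.0],
            [0.0, 0.0, 22.0]]

sigma_ref = isotropic(ref_tensor)   # (33 + 33 + 33) / 3
sigma_h = isotropic(h_tensor)       # (25 + 22 + 22) / 3
delta_h = shift_from_reference(sigma_ref, sigma_h)
print('%.3f' % delta_h)
```

Off-diagonal tensor elements do not enter the isotropic value. A less shielded nucleus (smaller shielding) therefore has a larger, more downfield chemical shift.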

Display the optimized acetaldehyde structure, labeled by atom index, to compare the chemical shifts of the different protons.

In [None]:
mb=Chem.MolToMolBlock(m2)
p=py3Dmol.view(width=400,height=400)
p.addModel(mb,'sdf')
p.setStyle({'stick':{},'sphere':{"scale":0.3}})
#p.addPropertyLabels('index',{elem:{'H'}},{fontColor:'black'})
#p.addPropertyLabels("index",{not:{elem:'H'}}, {fontColor:'black',font: 'sans-serif', fontSize: 28, showBackground:false,alignment:'center'});
#p.addPropertyLabels("atom","",{'showBackground':'false','alignment': 'center'})
p.addPropertyLabels("index","",{'showBackground':'false','alignment': 'center'})
p.zoomTo()
p.render()

The in-plane methyl proton 4 has a chemical shift different from the out-of-plane methyl protons 3 and 5. 

### Example 2: Reaction Energies

Choosing reasonable conformations of reactants and products can be essential for computing accurate reaction energies. Here we consider a simple example, the gas-phase proton affinity of ethylenediamine (en). Choosing different conformations for the reactant, or for the product, can dramatically change the predicted proton affinity.

First we use the RDKit EmbedMultipleConfs conformational search to find two low-lying conformations of the neutral reactant. The lowest-energy conformation has an internal hydrogen bond; the other conformation is extended.

In [None]:
l=Chem.MolFromSmiles('NCCN')
l2=Chem.AddHs(l)
lconfs=AllChem.EmbedMultipleConfs(l2,numConfs=100)
print('Number of conformers: %d'%(len(lconfs)))
uniqueEs=[]
uniqueIDs=[]
for confid in lconfs:
    AllChem.MMFFOptimizeMolecule(l2,confId=confid)
    ff = AllChem.MMFFGetMoleculeForceField(l2, AllChem.MMFFGetMoleculeProperties(l2), confId=confid)
    E=ff.CalcEnergy()
    keep=1
    for Eold in uniqueEs:
        if((E-Eold)**2<0.1):
            keep=0
    if(keep>0):
        uniqueEs.append(E)
        uniqueIDs.append(confid)
        
# Sort by energy 
lEs=[(x,y) for x,y in sorted(zip(uniqueEs,uniqueIDs))]
Emin=lEs[0][0]
print('Lowest energy: %.4f'%(Emin))
    
p = py3Dmol.view(width=600,height=200,viewergrid=(1,2))
for i in range(2):
    ss=lEs[i]
    confid=ss[1]
    E=ss[0]
    DE=(E-Emin)
    DElabel='%.2f'%(DE)
    p.addModel(Chem.MolToMolBlock(l2,confId=confid), 'sdf',viewer=(0,i))
    p.addLabel(DElabel,{'inFront':True,'fontColor':'black','backgroundColor':'white'},viewer=(0,i))
    p.setStyle({'stick':{},'sphere':{"scale":0.3}},viewer=(0,i))
p.zoomTo()
#p.update()
p.render()
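The cell above treats two conformers as duplicates when their squared MMFF energy difference is below 0.1, which corresponds to an absolute difference of about 0.32 in the energy units of the force field. A minimal sketch of that filter on a plain list of made-up energies is shown below. The function name `unique_by_energy` and the sample values are illustrative assumptions.

```python
import math

def unique_by_energy(energies, tol_sq=0.1):
    """Keep an energy only if it differs from every previously kept energy
    by (dE)**2 >= tol_sq; return (energy, index) pairs sorted by energy."""
    kept = []
    for idx, e in enumerate(energies):
        if all((e - e_old) ** 2 >= tol_sq for e_old, _ in kept):
            kept.append((e, idx))
    return sorted(kept)

# Made-up conformer energies, listed in the order they were generated
energies = [1.50, 0.02, 0.00, 1.45, 3.20]
unique = unique_by_energy(energies)
print(unique)
print('Equivalent |dE| threshold: %.3f' % math.sqrt(0.1))
```

Note that this filter keeps the first conformer encountered in each energy window, not necessarily the lowest one, and that an energy-only criterion can merge structurally different conformers that happen to have similar energies.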


Next, we use the RDKit EmbedMultipleConfs search to find low-lying conformations of the protonated molecule. Again, the lowest-lying conformation has an internal hydrogen bond from the protonated amine, but now the next-lowest conformation is much higher in energy.

In [None]:
c=Chem.MolFromSmiles('NCC[NH3+]')
c2=Chem.AddHs(c)
img=Draw.MolToImage(c2)
img

In [None]:
cconfs=AllChem.EmbedMultipleConfs(c2,numConfs=100)
print('Number of conformers: %d'%(len(cconfs)))
uniqueEs=[]
uniqueIDs=[]
for confid in cconfs:
    AllChem.MMFFOptimizeMolecule(c2,confId=confid)
    ff = AllChem.MMFFGetMoleculeForceField(c2, AllChem.MMFFGetMoleculeProperties(c2), confId=confid)
    E=ff.CalcEnergy()
    keep=1
    for Eold in uniqueEs:
        if((E-Eold)**2<0.1):
            keep=0
    if(keep>0):
        uniqueEs.append(E)
        uniqueIDs.append(confid)
        
# Sort by energy 
cEs=[(x,y) for x,y in sorted(zip(uniqueEs,uniqueIDs))]
Emin=cEs[0][0]
print('Lowest energy: %.4f'%(Emin))
    
p = py3Dmol.view(width=600,height=200,viewergrid=(1,2))
for i in range(2):
    ss=cEs[i]
    confid=ss[1]
    E=ss[0]
    DE=(E-Emin)
    DElabel='%.2f'%(DE)
    p.addModel(Chem.MolToMolBlock(c2,confId=confid), 'sdf',viewer=(0,i))
    p.addLabel(DElabel,{'inFront':True,'fontColor':'black','backgroundColor':'white'},viewer=(0,i))
    p.setStyle({'stick':{},'sphere':{"scale":0.3}},viewer=(0,i))
p.zoomTo()
#p.update()
p.render()


Next we use PySCF to compute the ground-state energies of the two low-lying reactant geometries and the two low-lying product geometries. Note that the ground-state electronic energy of the other reactant, an isolated gas-phase H+ cation, is zero because it has no electrons.

In [None]:
le = [atom.GetSymbol() for atom in l2.GetAtoms()]
lc1 = l2.GetConformer(lEs[0][1]).GetPositions()
lc2 = l2.GetConformer(lEs[1][1]).GetPositions()
lcde=lEs[1][0]-lEs[0][0]
ac1 = [(element, coordinate) for element, coordinate in zip(le, lc1)]
ac2 = [(element, coordinate) for element, coordinate in zip(le, lc2)]

lm1 = gto.Mole(atom=ac1,basis="3-21g").build()
lm2 = gto.Mole(atom=ac2,basis="3-21g").build()

print(lEs[0][0])
lm1f=scf.RHF(lm1)
lm1f.kernel() 
lm1o=optimize(lm1f)
lm1of=scf.RHF(lm1o)
lm1of.kernel()
print(lEs[1][0])
lm2f=scf.RHF(lm2)
lm2o=optimize(lm2f)
lm2of=scf.RHF(lm2o)
lm2of.kernel()
lEs2=[]
lEs2.append(lm1of.e_tot)
lEs2.append(lm2of.e_tot)
print('Reactant geometry energy difference (kcal/mol)')
print('RDKit: %.2f  PySCF: %.2f'%(lcde,627.5095*(lm2of.e_tot-lm1of.e_tot)))

In [None]:
ce = [atom.GetSymbol() for atom in c2.GetAtoms()]
cc1 = c2.GetConformer(cEs[0][1]).GetPositions()
cc2 = c2.GetConformer(cEs[1][1]).GetPositions()
ccde=cEs[1][0]-cEs[0][0]
ac1 = [(element, coordinate) for element, coordinate in zip(ce, cc1)]
ac2 = [(element, coordinate) for element, coordinate in zip(ce, cc2)]

cm1 = gto.Mole(atom=ac1,basis="3-21g",charge=1).build()
cm2 = gto.Mole(atom=ac2,basis="3-21g",charge=1).build()

cm1f=scf.RHF(cm1)
cm1o=optimize(cm1f)
cm1of=scf.RHF(cm1o)
cm1of.kernel()

cm2f=scf.RHF(cm2)
cm2o=optimize(cm2f)
cm2of=scf.RHF(cm2o)
cm2of.kernel()
cEs2=[]
cEs2.append(cm1of.e_tot)
cEs2.append(cm2of.e_tot)
print('Product geometry energy difference (kcal/mol)')
print('RDKit: %.2f  PySCF: %.2f'%(ccde,627.5095*(cm2of.e_tot-cm1of.e_tot)))

Finally, we compute four possible values of the gas-phase proton affinity, using the four possible combinations of conformations. The *correct* answer is the first value, which uses the most stable reactant conformation and the most stable product conformation. Note that the proton affinity is the negative of the reaction energy for

en + H(+) ---> en-H(+) 

Positive values of the proton affinity correspond to stable proton binding. For comparison, the gas-phase proton affinity of NH3 has been measured as 204 kcal/mol (853.6 kJ/mol); see https://webbook.nist.gov/cgi/cbook.cgi?ID=C7664417
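As a minimal sketch of the sign convention, the proton affinity can be written as the negative of the reaction energy, with the energy of the bare proton set to zero. The helper name `proton_affinity_kcal` and the energies below are made-up illustrations, not computed values. The sketch uses electronic energies only, so zero-point and thermal corrections that enter a measured proton affinity are not included.

```python
HARTREE_TO_KCAL = 627.5095  # conversion factor used elsewhere in this notebook

def proton_affinity_kcal(e_base, e_protonated, e_proton=0.0):
    """Proton affinity in kcal/mol from energies in hartree.

    PA = -(E[BH+] - E[B] - E[H+]); a bare proton has no electrons,
    so its electronic energy defaults to zero.
    """
    return -HARTREE_TO_KCAL * (e_protonated - e_base - e_proton)

# Made-up total energies in hartree, for illustration only
e_en = -188.000
e_enH = -188.400
pa = proton_affinity_kcal(e_en, e_enH)
print('Proton affinity: %.1f kcal/mol' % pa)
```

A protonated species that is lower in energy than the neutral base gives a positive proton affinity, matching the convention stated above.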

In [None]:
print("Proton affinities (kcal/mol)")
print("Product Reactant E(RDKit) E(PySCF)")
for i in range(2):
    for j in range(2):
        # cEs and lEs hold (energy, conformer ID) pairs: pick the pair first, then the energy
        print('%d %d %6.1f %6.1f'%(i,j,-1.0*(cEs[i][0]-lEs[j][0]),-627.5095*(cEs2[i]-lEs2[j])))

Your assignment for this module is as follows:

(1) Plot the computed NMR spectrum for acetaldehyde, and compare to experiment. Make two plots: one using the chemical shifts as computed, and one assuming that the methyl group rotation is fast on the NMR timescale. Which plot better fits experiment? 
    
(2) Compute 5 low-energy structures of protonated 2-aminoethanol, and compute the gas-phase proton affinity of each structure. What are the ranges of proton affinity and of pKa obtained from the different structures?

For 50 points extra credit, fix the RDKit geometry optimization of the charged species to correctly account for the charge. Once this is fixed, the RDKit proton affinity should be near 250 kcal/mol, not near 0 kcal/mol. 