# SI: Caffeine Solvation in Electrolyte Solutions
Stefan Hervø-Hansen<sup>a,b</sup>, Nobuyuki Matubayasi<sup>b,*</sup>, Mikael Lund<sup>a,*</sup>.<br><br>
<sup>a</sup> Division of Theoretical Chemistry, Department of Chemistry, Lund University, Lund SE 221 00, Sweden.<br> <sup>b</sup> Division of Chemical Engineering, Graduate School of Engineering Science, Osaka University, Toyonaka, Osaka 560-8531, Japan.<br>
<sup>*</sup> To whom correspondence may be addressed: nobuyuki@cheng.es.osaka-u.ac.jp and mikael.lund@teokem.lu.se.

## Introduction

We present a study of the solvation free energy of caffeine in electrolyte solutions using the energy representation description in combination with all-atom simulations.

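The Setschenow coefficient, $k_s$, is defined on the $\ln$ scale as

$$ \ln\frac{S_0}{S} = \ln\gamma = k_s c_s = \beta\left(\mu^{ex} - \mu^{ex}_0\right)$$

where $S_0$ and $S$ are the solubilities in pure water and in an electrolyte solution of concentration $c_s$, $\gamma$ is the activity coefficient of caffeine, $\mu^{ex}$ and $\mu^{ex}_0$ are the excess chemical potentials in the electrolyte solution and in pure water, and $\beta = 1/k_BT$. With this sign convention, a positive $k_s$ corresponds to _salting out_.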

_Note:_ The $\log_{10}$ scale is also often used in the literature; coefficients on the two scales differ by a factor of $\ln 10$.
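As a small worked example of the scale conversion, the sketch below converts a Setschenow coefficient from the $\ln$ scale to the $\log_{10}$ scale and evaluates the corresponding solubility ratio. It assumes the convention $\ln(S_0/S) = k_s c_s$, with $S_0$ the solubility in pure water. The function names and the value of `k_s_ln` are illustrative only and are not results of this study.

```python
import math

def ln_to_log10(k_s_ln):
    """Convert a Setschenow coefficient from the ln scale to the log10 scale."""
    return k_s_ln / math.log(10)

def solubility_ratio(k_s_ln, c_s):
    """Return S/S_0 at salt concentration c_s (mol/l) for k_s on the ln scale (l/mol).

    Assumes ln(S_0/S) = k_s * c_s, i.e. a positive k_s means salting out.
    """
    return math.exp(-k_s_ln * c_s)

k_s_ln = 0.20  # illustrative value in l/mol, NOT a result of this study
c_s = 1.0      # mol/l

print('k_s on the log10 scale:', ln_to_log10(k_s_ln))
print('S/S_0 at 1 mol/l:', solubility_ratio(k_s_ln, c_s))
```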

## Methods & Materials
Molecular dynamics simulations were conducted using the OpenMM (7.4.0)[<sup>1</sup>](#fn1) software package extended with the openmmtools[<sup>2</sup>](#fn2) and parmed[<sup>3</sup>](#fn3) packages. Caffeine was described with a GROMOS (ffGF53a6) derived Kirkwood-Buff force field with adjustments to the partial charges and geometrical parameters, which has previously been able to reproduce experimental solubilities of caffeine in water and urea solutions[<sup>4</sup>](#fn4). Water was described with the SPC/E[<sup>5</sup>](#fn5) force field, and optimized ion parameters were used for alkali cations and halide anions[<sup>6</sup>](#fn6). The isothermal-isobaric ensemble was sampled using a combination of a geodesic Langevin integrator[<sup>7</sup>](#fn7) and a Monte Carlo barostat[<sup>8,</sup>](#fn8)[<sup>9</sup>](#fn9). The trajectories were analysed using MDTraj[<sup>10</sup>](#fn10) for structural properties, while ERmod[<sup>11</sup>](#fn11) was used for the calculation of solvation free energies.


## References
1. <span id="fn1"> Eastman P, et al. (2017) OpenMM 7: Rapid development of high performance algorithms for molecular dynamics. PLOS Computational Biology 13(7):e1005659.</span><br>
2. <span id="fn2"> https://github.com/choderalab/openmmtools</span><br>
3. <span id="fn3"> https://github.com/ParmEd/ParmEd </span><br>
4. <span id="fn4"> Sanjeewa R, Weerasinghe S (2010) Development of a molecular mechanics force field for caffeine to investigate the interactions of caffeine in different solvent media. Journal of Molecular Structure: THEOCHEM 944(1–3):116–123. </span><br>
5. <span id="fn5"> Berendsen HJC, Grigera JR, Straatsma TP (1987) The missing term in effective pair potentials. The Journal of Physical Chemistry 91(24):6269–6271. </span><br>
6. <span id="fn6"> Heyda J, Vincent JC, Tobias DJ, Dzubiella J, Jungwirth P (2010) Ion Specificity at the Peptide Bond: Molecular Dynamics Simulations of N-Methylacetamide in Aqueous Salt Solutions. The Journal of Physical Chemistry B 114(2):1213–1220. </span><br>
7. <span id="fn7"> Leimkuhler B, Matthews C (2016) Efficient molecular dynamics using geodesic integration and solvent–solute splitting. Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences 472(2189):20160138. </span><br>
8. <span id="fn8"> Chow K-H, Ferguson DM (1995) Isothermal-isobaric molecular dynamics simulations with Monte Carlo volume sampling. Computer Physics Communications 91(1–3):283–289. </span><br>
9. <span id="fn9"> Åqvist J, Wennerström P, Nervall M, Bjelic S, Brandsdal BO (2004) Molecular dynamics simulations of water and biomolecules with a Monte Carlo constant pressure algorithm. Chemical Physics Letters 384(4–6):288–294. </span><br>
10. <span id="fn10"> McGibbon RT, et al. (2015) MDTraj: A Modern Open Library for the Analysis of Molecular Dynamics Trajectories. Biophysical Journal 109(8):1528–1532. </span><br>
11. <span id="fn11"> Sakuraba S, Matubayasi N (2014) ERmod: Fast and versatile computation software for solvation free energy with approximate theory of solutions. Journal of Computational Chemistry 35(21):1592–1608. </span><br>

### Import of Python modules

In [None]:
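# Notebook dependent libs
import math
import os
import numpy as np
import matplotlib.pyplot as plt
import mdtraj as md
import parmed as pmd
import yaml  # PyYAML; needed to read `_results.yml` in the analysis cells

# Simulation specific libs
import sys
from simtk.openmm import app
import simtk.openmm as mm
import openmmtools as mmtools
from parmed.openmm import StateDataReporter

# Conversion factor from kcal/mol to units of kT, used in the analysis cells.
# Assumes T = 298.15 K, the temperature set in the OpenMM scripts.
kcal_to_kT = 1.0 / (1.987204e-3 * 298.15)

homedir='/home/stefan/Caffeine_solubility'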

### Simulation settings

In [None]:
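# Simulation states: 'iso' = isolated caffeine, 'sol' = caffeine in solution, 'aqs' = solution without caffeine.
# 'Nsteps' is the number of MD steps and 'OutFreq' the number of steps between saved frames.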
states = {'iso': {'Nsteps': 50000000, 'OutFreq': 50},
          'sol': {'Nsteps': 1000000,  'OutFreq': 50},
          'aqs': {'Nsteps': 500000,   'OutFreq': 500}}

salts = {'NaCl': {'Cation': 'Na', 'Anion': 'Cl'},
         'NaI' : {'Cation': 'Na', 'Anion': 'I' }
        }

# Particle numbers for each nominal salt concentration (mol/l), assuming the salts behave ideally
salt_concentrations = {0.00: {'Caffeine': 1, 'Water': 2760, 'Cation':0,   'Anion':0},
                       0.25: {'Caffeine': 1, 'Water': 2760, 'Cation':16,  'Anion':16},
                       0.50: {'Caffeine': 1, 'Water': 2760, 'Cation':27,  'Anion':27},
                       0.55: {'Caffeine': 1, 'Water': 2760, 'Cation':30,  'Anion':30},
                       0.73: {'Caffeine': 1, 'Water': 2760, 'Cation':40,  'Anion':40},
                       1.00: {'Caffeine': 1, 'Water': 2760, 'Cation':55,  'Anion':55},
                       1.25: {'Caffeine': 1, 'Water': 2760, 'Cation':69,  'Anion':69},   # Check avg volume from sim.
                       1.50: {'Caffeine': 1, 'Water': 2760, 'Cation':83,  'Anion':83},   # Check avg volume from sim.
                       1.75: {'Caffeine': 1, 'Water': 2760, 'Cation':96,  'Anion':96},   # Check avg volume from sim.
                       2.00: {'Caffeine': 1, 'Water': 2760, 'Cation':110, 'Anion':110}   # Check avg volume from sim.
                      }
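The ion counts above can be cross-checked against the nominal concentrations. The sketch below assumes that the nominal molarity refers to the 45 Å cubic box used in the Packmol input for the initial configuration; this is an inference from the numbers, not something stated in the notebook, and the average volume from the NPT simulation may differ. The function name `nominal_molarity` is hypothetical.

```python
AVOGADRO = 6.02214076e23  # particles per mol

def nominal_molarity(n_ion_pairs, box_edge_angstrom=45.0):
    """Molar concentration of n_ion_pairs in a cubic box of the given edge length.

    1 Angstrom = 1e-9 dm, and 1 dm^3 = 1 litre.
    """
    volume_litre = (box_edge_angstrom * 1e-9) ** 3
    return n_ion_pairs / (AVOGADRO * volume_litre)

# Ion-pair counts taken from the salt_concentrations dictionary
for nominal, n_pairs in [(0.50, 27), (1.00, 55), (2.00, 110)]:
    print('nominal {:.2f} mol/l -> {:.3f} mol/l in the initial box'.format(
        nominal, nominal_molarity(n_pairs)))
```

Under this assumption, 55 ion pairs correspond to approximately 1 mol/l, which is consistent with the table.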

### INPUT FILES: make PDB and .TOP files

In [None]:
%cd -q $homedir

for concentration, Nparticles in salt_concentrations.items():
    conc = '{0:.2f}'.format(concentration)
    %cd -q $homedir/Simulations/NaCl/$conc
    
    # Packmol Input
    packmol_script="""
tolerance 2.0
filetype pdb
output Caffeine_NaCl_sol.pdb
add_box_sides 1.0

structure ../../../PDB_files/single-caffeine-molecule.pdb
        number {N_caffeine}
        fixed 22.5 22.5 22.5 0. 0. 0.
        centerofmass
end structure

structure ../../../PDB_files/water.pdb
        number {N_wat}
        inside cube 0. 0. 0. 45.
end structure

{salt}structure ../../../PDB_files/Na.pdb
{salt}        number {N_Na}
{salt}        inside cube 0. 0. 0. 45.
{salt}end structure

{salt}structure ../../../PDB_files/Cl.pdb
{salt}        number {N_Cl}
{salt}        inside cube 0. 0. 0. 45.
{salt}end structure
"""
    with open('packmol.in', 'w') as text_file:
        # fix for no salt:
        if concentration:
            salt=''
        else:
            salt='#'
        text_file.write(packmol_script.format(N_caffeine=Nparticles['Caffeine'], N_wat=Nparticles['Water'],
                                              N_Na=Nparticles['Cation'], N_Cl=Nparticles['Anion'], salt=salt))
    !packmol < packmol.in

    # Topology input
    topology="""
[ system ]
; Name
Caffeine in NaCl {conc} M aqueous solution.

[ defaults ]
; nbfunc        comb-rule       gen-pairs       fudgeLJ fudgeQQ
1               3               yes             0.5     0.8333

; Include all atomtypes
#include "/home/stefan/Caffeine_solubility/force_fields/atomtypes_spc.itp"
#include "/home/stefan/Caffeine_solubility/force_fields/atomtypes_ions.itp"
#include "/home/stefan/Caffeine_solubility/force_fields/atomtypes_caffeine.itp"

; Include all topologies
#include "/home/stefan/Caffeine_solubility/force_fields/ions.itp"
#include "/home/stefan/Caffeine_solubility/force_fields/spce.itp"
#include "/home/stefan/Caffeine_solubility/force_fields/caffeine-KBFF.itp"

[ molecules ]
; Compound         #mols
2S09               {N_caffeine}
SOL                {N_wat}
{salt}Na                 {N_Na}
{salt}Cl                 {N_Cl}
"""
    with open('Caffeine_NaCl_sol.top', 'w') as text_file:
        # fix for no salt:
        if concentration:
            salt=''
        else:
            salt=';'
        text_file.write(topology.format(conc=conc, salt=salt,
                                        N_caffeine=Nparticles['Caffeine'], N_wat=Nparticles['Water'],
                                        N_Na=Nparticles['Cation'], N_Cl=Nparticles['Anion']))
    
    # Solvated state: caffeine in the (salt) solution
    mol = pmd.load_file('Caffeine_NaCl_sol.top', xyz='Caffeine_NaCl_sol.pdb')
    mol.save('Caffeine_NaCl_sol.top', overwrite=True)
    
    # Isolated state: strip everything except caffeine (residue 2S09)
    mol.strip('!:2S09')
    mol.save('Caffeine_NaCl_iso.top', overwrite=True)
    mol.save('Caffeine_NaCl_iso.pdb', overwrite=True)
    
    # Aqueous state: reload the full system and strip caffeine
    mol = pmd.load_file('Caffeine_NaCl_sol.top', xyz='Caffeine_NaCl_sol.pdb')
    mol.strip(':2S09')
    mol.save('Caffeine_NaCl_aqs.top', overwrite=True)
    mol.save('Caffeine_NaCl_aqs.pdb', overwrite=True)
    
    print('Wrote initial configurations and topology files to '+os.getcwd())

## Molecular dynamics simulations
All-atom molecular dynamics simulations were conducted on pure liquids using the OPLS-aa force field in combination with the OpenMM 7.3.1 software package extended with the openmmtools package. The initial configurations were created using the Packmol software. The simulations were run on the Aurora supercomputer in Lund. The following code therefore provides the simulation settings (which molecules to simulate, system size, and temperature), followed by an OpenMM run script (`openMM_{state}.py`) containing the simulation setup, including constraints, barostat, thermostat, simulation length, and calculated properties, and finally a submit script (`submit_{state}.pbs`) for clusters operated with PBS.

### Simulation setup using OpenMM

In [None]:
%cd -q $homedir
N_simulations = 0

for concentration in salt_concentrations:
    conc = '{0:.2f}'.format(concentration)
    %cd -q $homedir/Simulations/NaCl/$conc
    for state, settings in states.items():
        
        openmm_script="""
# Imports
import sys
import os
from simtk.openmm import app
import simtk.openmm as mm
import openmmtools as mmtools
from parmed import load_file, unit as u
from parmed.openmm import StateDataReporter

print('Loading initial configuration and topology')
init_conf = load_file('Caffeine_NaCl_{state}.top', xyz='Caffeine_NaCl_{state}.pdb')

# Creating system
print('Creating OpenMM System')
system = init_conf.createSystem(nonbondedMethod=app.PME, ewaldErrorTolerance=0.0005,
                                nonbondedCutoff=1.2*u.nanometers, constraints=app.HBonds)

# Calculating total mass of system
total_mass = 0
for i in range(system.getNumParticles()):
    total_mass += system.getParticleMass(i).value_in_unit(u.dalton)
total_mass *= u.dalton
                                                    
# Freeze the isolated solute: OpenMM integrators do not move particles with zero mass
state = '{state}'
if state == 'iso':
    for i in range(system.getNumParticles()):
        system.setParticleMass(i, 0)

# Temperature-coupling by geodesic Langevin integrator (NVT)
integrator = mmtools.integrators.GeodesicBAOABIntegrator(K_r = 3,
                                                         temperature = 298.15*u.kelvin,
                                                         collision_rate = 1.0/u.picoseconds,
                                                         timestep = 2.0*u.femtoseconds
                                                        )

# Pressure-coupling by a Monte Carlo Barostat (NPT)
if state != 'iso':
    system.addForce(mm.MonteCarloBarostat(1*u.bar, 298.15*u.kelvin, 25))

platform = mm.Platform.getPlatformByName('CUDA')
properties = {{'CudaPrecision': 'mixed', 'CudaDeviceIndex': '0'}}

# Create the Simulation object
sim = app.Simulation(init_conf.topology, system, integrator, platform, properties)

# Set the particle positions
sim.context.setPositions(init_conf.positions)

# Minimize the energy
print('Minimizing energy')
if state != 'iso':
    sim.minimizeEnergy(tolerance=1*u.kilojoule/u.mole, maxIterations=500000)
    
# Draw initial MB velocities
sim.context.setVelocitiesToTemperature(298.15*u.kelvin)

# Set up the reporters
sim.reporters.append(app.StateDataReporter('output_{state}.dat', {outFreq}, totalSteps={Nsteps},
    time=True, potentialEnergy=True, kineticEnergy=True, temperature=True, density=True, systemMass=total_mass,
    remainingTime=True, speed=True, separator='\t'))

# Set up trajectory reporter
sim.reporters.append(app.DCDReporter('trajectory_{state}.dcd', {outFreq}, append=False))

# Run dynamics
print('Running dynamics! (NPT)')
sim.step({Nsteps}) 
"""

        with open('openMM_{state}.py'.format(state=state), 'w') as text_file:
            text_file.write(openmm_script.format(state=state, Nsteps=settings['Nsteps'], outFreq=settings['OutFreq']))
            N_simulations+=1
    print('Wrote openMM_<state>.py files to '+os.getcwd())

print('Simulations about to be submitted: {}'.format(N_simulations))
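The `# Check avg volume from sim.` comments in the settings call for comparing the nominal concentration with the one realised in the NPT run. Since the run script reports the density, one possible check is sketched below. It assumes the tab-separated layout written by the `StateDataReporter` configured above, with a header field containing the word `Density`, and a total system mass in dalton such as `total_mass` in the run script. The function names, the 20 % equilibration discard, and the sample numbers are hypothetical.

```python
import csv

def mean_density(lines, discard_fraction=0.2):
    """Average the density column of tab-separated reporter output.

    `lines` is any iterable of text lines (for example an open file).
    The first `discard_fraction` of the samples is skipped as equilibration.
    """
    reader = csv.reader(lines, delimiter='\t')
    header = next(reader)
    # Locate the density column by name instead of assuming its position
    col = next(i for i, name in enumerate(header) if 'Density' in name)
    values = [float(row[col]) for row in reader if row]
    start = int(len(values) * discard_fraction)
    return sum(values[start:]) / len(values[start:])

def molarity(n_ion_pairs, total_mass_dalton, density_g_per_ml):
    """Salt molarity from the number of ion pairs, system mass, and density.

    V [l] = (M / N_A) / rho / 1000 and c = n / (N_A * V), so N_A cancels.
    """
    return 1000.0 * n_ion_pairs * density_g_per_ml / total_mass_dalton

# Synthetic sample in the assumed layout (NOT simulation output)
sample = ['#"Time (ps)"\t"Density (g/mL)"',
          '0.1\t1.00', '0.2\t1.02', '0.3\t1.04', '0.4\t1.04', '0.5\t1.06']
rho = mean_density(sample)
print('mean density:', rho)
print('molarity:', molarity(55, 53000.0, rho))
```

For a real run, `sample` would be replaced by an open file handle such as `open('output_sol.dat')`.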

## Submit script

In [None]:
for concentration in salt_concentrations:
    conc = '{0:.2f}'.format(concentration)
    %cd -q $homedir/Simulations/NaCl/$conc
    for state in states:
        
        submit_script="""#!/bin/bash
#PBS -l nodes=1:ppn=36:nu-G02:gpus=1  # 1 node, 36 cores, GPU node nu-G02, 1 gpu.
#PBS -N {conc}_M_NaCl_{state}         # Name of job
#PBS -e run_{state}.err               # error output
#PBS -o run_{state}.out               # output file name

source ~/.bashrc
source ~/.bash_profile
cd {path}

python openMM_{state}.py"""

        with open('submit_{state}.pbs'.format(state=state), 'w') as text_file:
            text_file.write(submit_script.format(conc=conc, state=state, path=os.getcwd()))
#    !qsub submit_iso.pbs   # WARNING! ONLY NEED TO SUBMIT ONE AS THIS IS THE SAME FOR ALL EXPERIMENTS! THE ONE SUBMITTED IS FOUND IN /home/stefan/Caffeine_solubility/Simulations/NaCl/0.00
    !qsub submit_sol.pbs
    !qsub submit_aqs.pbs

## Load and plot solvation free energies

The results from the ERmod analysis are stored in the file `_results.yml`, which we now load and plot. The results are obtained from 50 ns long MD simulations of caffeine in electrolyte solutions.
We see that both NaCl and NaI lead to _salting out_, i.e. the solvation free energy of caffeine is increased compared to pure water.
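Because salting out corresponds to a solvation free energy that increases with salt concentration, a Setschenow coefficient on the $\ln$ scale can be estimated from the slope of the solvation free energy versus concentration, divided by $k_BT$. The sketch below does this with a weighted straight-line fit, using weights $1/\sigma^2$ (the same weighting that `np.polyfit` applies with `w=1/error`). It assumes free energies in kcal/mol and $T = 298.15$ K as in the run scripts. The function names are hypothetical and the data are placeholders, not simulation results.

```python
R_KCAL = 1.987204e-3   # gas constant in kcal/(mol K)
TEMPERATURE = 298.15   # K, as set in the OpenMM scripts

def weighted_slope(x, y, sigma):
    """Slope of a straight-line least-squares fit with weights 1/sigma**2."""
    w = [1.0 / s**2 for s in sigma]
    sw = sum(w)
    swx = sum(wi * xi for wi, xi in zip(w, x))
    swy = sum(wi * yi for wi, yi in zip(w, y))
    swxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    swxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    return (sw * swxy - swx * swy) / (sw * swxx - swx**2)

def setschenow_coefficient(conc, mu_kcal, sigma_kcal, temperature=TEMPERATURE):
    """Setschenow coefficient (l/mol, ln scale) from free energies in kcal/mol."""
    slope = weighted_slope(conc, mu_kcal, sigma_kcal)  # kcal/mol per mol/l
    return slope / (R_KCAL * temperature)              # in units of kT per mol/l

# Placeholder data (NOT simulation results): a perfectly linear trend
conc = [0.0, 0.5, 1.0, 2.0]                 # mol/l
mu = [-10.0 + 0.1 * c for c in conc]        # kcal/mol
sigma = [0.05, 0.05, 0.05, 0.05]            # kcal/mol

print('k_s =', setschenow_coefficient(conc, mu, sigma), 'l/mol')
```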

In [None]:
with open('_results.yml') as f: # open structured result file (YAML)
    
    r = yaml.load(f, Loader=yaml.Loader)
    
    fig, (ax1, ax2) = plt.subplots(1,2, sharex=True)
    fig.set_size_inches(10,4.5)
    fig.tight_layout()

    for d in r['salts']: # loop over all salt types
        conc   = np.array(d['conc'])     # molar conc.
        mu     = np.array(d['muexcess']) # excess chem. pot.
        error  = np.array(d['error'])    # error on mu
        mu0    = mu[0]                   # in pure water
        gamma  = np.exp( (mu-mu0)*kcal_to_kT )          # activity coefficient
        fit    = np.polyfit(conc, mu, 1, w=1/error)     # weighted linear fit; slope in kcal/mol per mol/l
        fit_fn = np.poly1d(fit)
        #print("Setschenow coefficient ("+d['label']+") =", fit[0]*kcal_to_kT)  # slope converted to kT per mol/l
        ax1.errorbar(conc, mu, yerr=error, fmt='o', c=d['color'], alpha=0.6, ms=10)
        ax1.plot(conc, fit_fn(conc), label=d['label'], c=d['color'], lw=2, ms=10)
        ax2.plot(conc, gamma, label=d['label'], c=d['color'], lw=2, marker='o', ms=10)

    ax1.set_title('Solvation Free Energy')
    ax2.set_title('Activity Coefficient')
    ax1.set_xlabel('Salt concentration (mol/l)')
    ax1.set_ylabel(r'$\Delta G_{sol}$ (kcal/mol)')  # raw string avoids invalid escape sequences
    ax2.set_xlabel('Salt concentration (mol/l)')
    ax2.set_ylabel(r'$\gamma$')
    ax1.legend(loc=2, frameon=False, fontsize='large')
plt.savefig('solvation.pdf', bbox_inches='tight')

## Free Energy Decomposition by Solvent and Co-solutes

The ERmod output also contains information about how the solvent and co-solutes contribute to the total free energy. This is plotted below for the case of 2 molar salt.

In [None]:
with open('_results.yml') as f: # open structured result file (YAML)
    r = yaml.load(f, Loader=yaml.Loader)
    fig, ax = plt.subplots()
    fig.set_size_inches(6,6)
    width=0.3
    offset=0
    index = np.arange(4)
    
    for d in r['salts']: # loop over all salt types
        mu = d['decomposition'][0]['mu']
        error = d['decomposition'][0]['error']
        ax.bar( index+offset, mu,
                color=d['color'], label=d['label'], width=width,
                alpha=0.7, yerr=error, capsize=2)
        offset = offset + width
ax.set_xticks(index + width / 2)
ax.set_xticklabels(('total', 'water', 'cations', 'anions'))
ax.set_ylabel(r'$\Delta G_{sol}$ (kcal/mol)')  # raw string avoids invalid escape sequences
#ax.set_title('Free energy decomposition (2 M)')
ax.legend(loc=0, frameon=False, fontsize='large')
plt.savefig('solvent-decomposition.pdf', bbox_inches='tight')
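The two plotting cells read `_results.yml` through a fixed set of keys. As a reading aid, the sketch below writes out the layout implied by that access pattern as the equivalent Python structure, together with a small consistency check. The key names are taken from the cells above; the numbers are placeholders rather than simulation results, and `check_layout` is a hypothetical helper, not part of ERmod.

```python
# Placeholder numbers only; these are NOT simulation results.
example_results = {
    'salts': [
        {
            'label': 'NaCl',
            'color': 'tab:blue',
            'conc': [0.0, 1.0, 2.0],         # mol/l
            'muexcess': [-1.0, -0.9, -0.8],  # kcal/mol
            'error': [0.1, 0.1, 0.1],        # kcal/mol
            'decomposition': [
                # order used for the bar plot: total, water, cations, anions
                {'mu': [-0.8, -1.0, 0.1, 0.1],
                 'error': [0.1, 0.1, 0.1, 0.1]},
            ],
        },
    ]
}

def check_layout(results, n_components=4):
    """Raise ValueError if the list lengths do not match what the plots expect."""
    for salt in results['salts']:
        n_points = len(salt['conc'])
        if len(salt['muexcess']) != n_points or len(salt['error']) != n_points:
            raise ValueError('conc, muexcess and error must have equal length')
        first = salt['decomposition'][0]
        if len(first['mu']) != n_components or len(first['error']) != n_components:
            raise ValueError('decomposition must have one value per component')
    return True

print(check_layout(example_results))
```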

## Decomposition by Caffeine Motifs

Todo...