
Add support for AMBER as a free-energy perturbation engine #272

Merged
merged 42 commits into from Apr 12, 2024

Conversation

lohedges
Contributor

This PR adds support for using AMBER as an FEP engine, using either pmemd or pmemd.cuda. This ports the code from the Exscientia sandpit into the core, making sure that it is as interoperable as possible. Thankfully this wasn't too painful since I had already done the work of writing all of the AMBER output parsing in the sandpit, so could copy most of it straight across.

The major changes are:

  • Added support for using a reference system for position restraints to all engines. (Previously, the system that is passed to the process was used as the reference.)
  • Added a custom find_exe function to the AMBER process to aid finding a supported AMBER executable for the requested simulation protocol. This can also handle different variants of the pmemd executables via globbing.
  • Improved AMBER configuration options for pmemd and pmemd.cuda. There are now tests that I can run locally in order to validate that single-point energies agree for sander, pmemd, and pmemd.cuda. (More on this later.)
  • Improved support for passing additional kwargs through to the Process objects when setting up FEP simulations.
  • Added a hidden somd1_compatibility option to AMBER and GROMACS for FEP comparisons.
  • Added a test of AMBER FEP analysis to the test suite. (The code for the analysis was already written; I just needed to add some data to test that it works reproducibly.)
  • Updated the hydration free-energy tutorial to also use AMBER as an example backend.
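To illustrate the executable lookup mentioned above, here is a minimal sketch of what a find_exe helper could look like. This is not the actual BioSimSpace implementation; the function name is taken from the description above, and the glob-pattern approach is an assumption based on the note about handling pmemd variants (e.g. pmemd.cuda_SPFP):

```python
import glob
import os


def find_exe(bin_dir, patterns):
    """Return the first supported AMBER executable found in bin_dir.

    Hypothetical sketch: each entry in 'patterns' is a glob pattern
    (e.g. "pmemd.cuda*"), so precision variants such as pmemd.cuda_SPFP
    or pmemd.cuda_DPFP are matched too. Patterns are tried in order of
    preference.
    """
    for pattern in patterns:
        for match in sorted(glob.glob(os.path.join(bin_dir, pattern))):
            # Only accept regular files that are executable.
            if os.path.isfile(match) and os.access(match, os.X_OK):
                return match
    raise FileNotFoundError(
        f"No supported AMBER executable found in '{bin_dir}' matching {patterns}"
    )
```

A caller would pass patterns ordered by preference for the requested protocol, e.g. `["pmemd.cuda*", "pmemd", "sander"]` for a free-energy protocol on a GPU node.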

The main challenge with the PR was getting things to work reliably with pmemd and pmemd.cuda, in particular for vacuum simulations. One thing that I hadn't realised was that the Exscientia sandpit code is actually overloaded by private internal functionality that I don't have access to, hence some configuration options (on which I was basing my port) are not what is used in practice. The code also doesn't work as is, i.e. in the way in which a regular BioSimSpace user would interface with it (how the other engines work).

With regards to differences between pmemd and pmemd.cuda, the main pain is that pmemd.cuda doesn't support implicit solvent for vacuum simulations so you need to make sure to add a simulation box and run with a cutoff instead. (pmemd can run with no box and an "infinite" cutoff.) From my single-point energy tests I found that I can get essentially perfect agreement in energy between pmemd and pmemd.cuda, except for vacuum FEP simulations. For my test system I find that the energy components are:

pmemd:

   NSTEP       ENERGY          RMS            GMAX         NAME    NUMBER
      1       3.0582E+00     1.1813E+01     2.4354E+01     C           1

 BOND    =        0.0328  ANGLE   =        2.5203  DIHED      =        0.1513
 VDWAALS =        0.0000  EEL     =        0.0000  HBOND      =        0.0000
 1-4 VDW =        0.0426  1-4 EEL =        0.3112  RESTRAINT  =        0.0000
 DV/DL  =        12.8088

pmemd.cuda:

   NSTEP       ENERGY          RMS            GMAX         NAME    NUMBER
      1       3.7337E+00     1.1813E+01     2.4354E+01     C           1

 BOND    =        0.0328  ANGLE   =        2.5203  DIHED      =        0.1513
 VDWAALS =       -0.0010  EEL     =       -0.0000  HBOND      =        0.0000
 1-4 VDW =        0.1185  1-4 EEL =        0.9118  RESTRAINT  =        0.0000
 DV/DL  =        12.1272

From this it is clear that the only difference is in non-bonded 1-4 VDW and 1-4 EEL terms.
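As a quick sanity check (all values copied from the two tables above), the bonded terms are identical and the differences in the non-bonded components sum to the difference in total energy, so the discrepancy is wholly accounted for by the non-bonded terms:

```python
# Non-bonded components from the two energy tables above (kcal/mol).
pmemd = {"VDWAALS": 0.0000, "EEL": 0.0000,
         "1-4 VDW": 0.0426, "1-4 EEL": 0.3112}
cuda = {"VDWAALS": -0.0010, "EEL": -0.0000,
        "1-4 VDW": 0.1185, "1-4 EEL": 0.9118}

# Sum of per-component differences versus difference in total ENERGY.
component_diff = sum(cuda[k] - pmemd[k] for k in pmemd)
total_diff = 3.7337e00 - 3.0582e00  # ENERGY row of each table

print(round(component_diff, 4), round(total_diff, 4))  # 0.6755 0.6755
```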

At first I thought that this difference was because of performing one simulation using a periodic box with a cutoff, and the other with no box and no cutoff. However, I then checked single-point energies for an ethane-->methanol merged molecule using two approaches:

  1. Perform a regular single-point calculation using the extracted lambda=0 state via BioSimSpace.Protocol.Minimisation(steps=1). For non-FEP simulations, both pmemd and pmemd.cuda support use of implicit solvent with no box. Here the results are in agreement:

pmemd:

   NSTEP       ENERGY          RMS            GMAX         NAME    NUMBER
      1       5.0487E+00     1.3658E+01     4.3557E+01     C1          1

 BOND    =        0.3937  ANGLE   =        3.4642  DIHED      =        0.2291
 VDWAALS =        0.0000  EEL     =        0.0000  HBOND      =        0.0000
 1-4 VDW =        0.0521  1-4 EEL =        0.9096  RESTRAINT  =        0.0000

pmemd.cuda:

   NSTEP       ENERGY          RMS            GMAX         NAME    NUMBER
      1       5.0487E+00     1.3658E+01     4.3557E+01     C1          1

 BOND    =        0.3937  ANGLE   =        3.4642  DIHED      =        0.2291
 VDWAALS =        0.0000  EEL     =        0.0000  EGB        =        0.0000
 1-4 VDW =        0.0521  1-4 EEL =        0.9096  RESTRAINT  =        0.0000
  2. Do the same using the AMBER FEP code, via BioSimSpace.Protocol.FreeEnergyMinimisation(steps=1). Here the results differ (as we saw earlier), but it is clear that despite pmemd.cuda having the different setup (now using a box with a cutoff) it's actually the pmemd result that is the outlier, i.e. pmemd.cuda does agree with the non-FEP result.

pmemd:

   NSTEP       ENERGY          RMS            GMAX         NAME    NUMBER
      1       2.5251E+00     1.8820E+01     4.3557E+01     C1          1

 BOND    =        0.3937  ANGLE   =        1.7375  DIHED      =        0.0514
 VDWAALS =        0.0000  EEL     =        0.0000  HBOND      =        0.0000
 1-4 VDW =        0.0338  1-4 EEL =        0.3088  RESTRAINT  =        0.0000
 DV/DL  =        10.4548

pmemd.cuda:

   NSTEP       ENERGY          RMS            GMAX         NAME    NUMBER
      1       3.1434E+00     1.8820E+01     4.3557E+01     C1          1

 BOND    =        0.3937  ANGLE   =        1.7375  DIHED      =        0.0514
 VDWAALS =       -0.0010  EEL     =        0.0001  HBOND      =        0.0000
 1-4 VDW =        0.0521  1-4 EEL =        0.9096  RESTRAINT  =        0.0000
 DV/DL  =         9.8312

Clearly something in the non-bonded calculation differs between pmemd and pmemd.cuda, but only for vacuum FEP simulations. Not being an expert here, I'm wondering if there is some additional configuration option that I need to set. (I've looked around, but can't see anything.) I believe that pmemd was used for previous hydration free-energy calculations, e.g. some of @jmichel80's, so it would be good to know whether this was an issue then. It could be that the code is broken, in which case we should probably only use pmemd.cuda for FEP in vacuum when it's available. (I have also checked that this discrepancy gives a difference in dG when performing a vacuum hydration free-energy leg, i.e. it doesn't somehow cancel out.) Let me know what you think.

Checklist:

  • I confirm that I have merged the latest version of devel into this branch before issuing this pull request (e.g. by running git pull origin devel): [y]
  • I confirm that I have added a test for any new functionality in this pull request: [y]
  • I confirm that I have added documentation (e.g. a new tutorial page or detailed guide) for any new functionality in this pull request: [y]
  • I confirm that I have permission to release this code under the GPL3 license: [y]

Suggested reviewers:

@chryswoods

@lohedges
Contributor Author

It appears that all OpenMM tests are failing on Windows. Will debug tomorrow.

@lohedges
Contributor Author

Okay, it seems that I can get the correct electrostatics for pmemd by using gti_add_sc=1. According to the manual, this should be the default, but clearly that isn't the case for pmemd. (Wouldn't be the first thing that was wrong.) I can get the correct angle and dihedral energies using gti_bat_sc=1.
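For reference, a minimal sketch of an mdin &cntrl block for a vacuum FEP single point with these settings. The mask and lambda values are placeholders for illustration only; the two gti_* lines are the point:

```
 &cntrl
   imin=1, maxcyc=1,            ! single-point minimisation
   icfe=1, ifsc=1,              ! free energy with softcore
   clambda=0.0,
   timask1=':1', timask2=':2',  ! placeholder perturbed-region masks
   scmask1=':1', scmask2=':2',
   gti_add_sc=1,                ! documented default, but must be set explicitly
   gti_bat_sc=1,                ! needed for correct angle/dihedral energies
 /
```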

No idea about the Windows failure. It passes on all other platforms. All I can think is that os.path.splitext isn't doing what I think, although it's used elsewhere in code that's also tested on Windows 🤷‍♂️

@lohedges
Contributor Author

lohedges commented Apr 10, 2024

Okay, the Windows error with the OpenMM tests is:

---------------------------- Captured stdout call -----------------------------
Starting %PREFIX%\Library\bin\sire_python.exe: number of threads equals 4
None
  File "C:\Users\RUNNER~1\AppData\Local\Temp\tmp0j8ahz0t/test_script.py", line 21
    prm = parmed.load_file('C:\Users\RUNNER~1\AppData\Local\Temp\tmp0j8ahz0t/test.prm7', 'C:\Users\RUNNER~1\AppData\Local\Temp\tmp0j8ahz0t/test.rst7')
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \UXXXXXXXX escape

The tests in the Exscientia sandpit pass since the code doesn't use the full path to the file, so doesn't run into the unicode issue. I made some modifications to handle the new restraint files, so I'll need to check the logic for the other engines too and avoid using the absolute path.
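The failure can be reproduced without AMBER at all: in a normal (non-raw) Python string literal, `\U` begins a `\UXXXXXXXX` unicode escape, so a Windows path embedded verbatim in a generated script fails to even compile. A minimal illustration (the path here mimics the one in the traceback):

```python
# A Windows path pasted into a generated script as a plain string
# literal: "\U" in 'C:\Users' is parsed as a truncated unicode escape.
bad = r"path = 'C:\Users\RUNNER~1\test.prm7'"
try:
    compile(bad, "<generated>", "exec")
except SyntaxError as exc:
    print("SyntaxError:", exc.msg)

# Emitting the path with repr() escapes the backslashes, so the
# generated script compiles cleanly.
good = "path = " + repr("C:\\Users\\RUNNER~1\\test.prm7")
compile(good, "<generated>", "exec")
```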

Contributor

@chryswoods left a comment


The changes are good, and can be approved.

However, I think the root cause of your unicode issue is that you are constructing file paths manually. I used to do this as well, but it ended up really biting me on Windows because my file paths ended up with a mixture of forward and backward slashes (as your paths do in the error). This really confuses Windows, and led to lots of random behaviour, especially when sire was launched from different shells (i.e. cmd versus PowerShell versus whatever JupyterLab provided).

The (annoying) solution is to replace all manual path construction with calls to os.path.join. For example, see src/sire/_load.py (or search for this in other files).

@lohedges
Copy link
Contributor Author

Yes, I've used os.path.join elsewhere where other issues were found. It's been on my TODO list to go through and update everywhere for consistency. I'll see how much more needs changing.

@lohedges
Copy link
Contributor Author

lohedges commented Apr 11, 2024

In reality, it would probably be best to use pathlib throughout, but it didn't exist when we started.
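For illustration, pathlib composes paths with the correct separator for the host platform, which sidesteps the mixed-slash problem entirely (the file names below are made up):

```python
from pathlib import Path

# Path objects are joined with "/" and render with the platform's
# native separator, so no hand-built strings mixing "/" and "\".
work_dir = Path("tmp") / "process"
prm7 = work_dir / "test.prm7"

print(prm7.name)    # test.prm7
print(prm7.suffix)  # .prm7
```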

@lohedges lohedges merged commit fdff786 into devel Apr 12, 2024
5 checks passed
@lohedges lohedges deleted the feature_amber_fep branch April 12, 2024 12:08
Labels: enhancement (New feature or request)