Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Espaloma bonded terms discrepancies #209

Open
laurenwinkler opened this issue Feb 13, 2024 · 15 comments
Open

Espaloma bonded terms discrepancies #209

laurenwinkler opened this issue Feb 13, 2024 · 15 comments

Comments

@laurenwinkler
Copy link

Hi Espaloma Devs,

I am exploring the use of Espaloma (with RdKit under the hood) to describe a protein + ligand complex.

When comparing system XML files from an (unsolvated) Espaloma described system vs one described by amber/protein.ff14SB.xml and GAFF, I noticed that ligands described with Espaloma contain more than (ligand count) choose 2 additional number of bonded terms in the section of the output XML file. For example, if the ligand has let's say 12 atoms, the XML of a system where the ligand is parameterized with Espaloma has:
additional_number__terms > 3 x (12 choose 2)

If the ligand described with Espaloma is in a complex, is it intentional to add nonphysical harmonic bonds between the ligand and the protein?

I've attached the script used to build the Espaloma systems as well as the input files. I should also note that I am using:
openff-interchange=0.3.18
openff-toolkit=0.14.3
openmm=8.0.0
espaloma=0.3.2
rdkit=2023.03.3
python=3.10

Any insight would be helpful. Thanks in advanced.
Espaloma-test.zip

@laurenwinkler
Copy link
Author

Also adding the system XML files here.
Amber-gaff.xml is with GAFF describing the ligand and amber/ff14SB.xml describing the protein
Espaloma-lig-only.xml is with Espaloma describing the ligand and ambre/ff14SB.xml describing the protein
Espaloma-full-system.xml is with both ligand and protein described using the EspalomaTemplateGenerator
system-xml-files.zip

@ijpulidos
Copy link
Contributor

ijpulidos commented Feb 14, 2024

@laurenwinkler Hi, thank you for your interest in using Espaloma. I can reproduce your results using both RDkit and Openeye as cheminformatics backends.

As far as I can tell, Espaloma is behaving as expected, since the protein sdf has 2632 bonds, and that's exactly what we get in the system XML file for the full system (complex with Espaloma). Considering the extra 21 bonds from the ligand file, of course.

That makes me suspicious about the protein.sdf file, how was this file created? If this was created from a PDB file, maybe it can be helpful if you share the PDB file from which the SDF was created Thanks!

@kntkb
Copy link
Contributor

kntkb commented Feb 14, 2024

@laurenwinkler Hi! Thank you for bringing this issue to our attention. It's something I haven't encountered before. Building on @ijpulidos's suggestion, perhaps creating a system using Amber/GAFF force fields with a PDB file for the protein may help debug the problem.

Sorry, never mind. I got confused.

@laurenwinkler
Copy link
Author

Thank you for you response. I have attached the protein pdb file as well as the script used to convert the pdb to an SDF.

Would you mind sharing the script you used to produce the results as well as the XML file?
Uploading protein.zip…

@ijpulidos
Copy link
Contributor

@laurenwinkler I think the zip file wasn't successfully uploaded, can you retry uploading it? Thanks!

@laurenwinkler
Copy link
Author

My apologies!

protein.zip

@pitmanme
Copy link

pitmanme commented Feb 16, 2024

Just following up, could you share the script that you guys used to produce an XML file that had the correct bond terms for a complex with both protein and ligand described by Espaloma and no OE backend? @ijpulidos @kntkb

If you look at the XML files in Lauren's first comment (system-xml-files) the complex system (espaloma-full-system.xml) has 2653 bonds where as amber-gaff.xml and espaloma-lig-only.xml have 1323 bonds. If you have a way to start from the PDB file and have this work, our issue may be solved.

@kntkb
Copy link
Contributor

kntkb commented Feb 16, 2024

The extra bonds in espaloma-full-system.xml seems to come from bonds that involve hydrogens. For example, there are bonds between particles 2 (heavy atom) and 3 (hydrogen atom) in espaloma-full-system.xml but not in espaloma-lig-only.xml and amber-gaff.xml.

I also realized that espaloma-full-system.xml is missing the <Constraints> information for bonds that involve hydrogens. The bonds involved with hydrogens from the <Bonds> section should be in <Constraints>. I guess the EspalomaTemplateGenerator doesn't have a way to work cooperatively with constraint arguments (e.g., app.HBonds) passed to openmm.app.ForceField.

Although this is not elegant, a workaround could involve writing a parser to transfer the <Bonds> information to <Constraints> after the XML file is created. Alternatively, one could update the solvated topology, generated by modeller.addSolvent, by deleting the bonds involving hydrogens and then adding that information back as constraints to the system.

There is a sample script here that starts from a PDB file but requires OE in the backend.

@ijpulidos
Copy link
Contributor

@kntkb Great catch! Yes, I agree, this explains the issue. Maybe what we want is for the EspalomaTemplateGenerator to have a way to respect the constraints, but we need a way to make it general for both proteins and small (non-protein) molecules. Thanks!

@laurenwinkler
Copy link
Author

Thank you for your help. We redid the XML file generation for espaloma and amber and now they have the same bond count. I've attached the system XML files as well as our scripts and inputs. Would you be able to verify if the XML files look correct to you?
Espaloma_system_test.zip

@diogomart
Copy link

Not sure this is related, but in the hope it helps, it may be worth testing openmmforcefields 0.11.2 and 0.12.0. We noticed differences in espaloma charges of symmetric atoms and @nbruciaferri observed that 0.12.0 fixed such asymmetries.

@kntkb
Copy link
Contributor

kntkb commented Feb 21, 2024

@diogomart Ah, yes. The asymmetric charges are expected when using openmmforcefields 0.11.2 because it uses the old espaloma model (0.2.2). This issue was resolved in openmmforcefields 0.12.0 which now uses the improved espaloma model (0.3.x) as default.

@pitmanme
Copy link

pitmanme commented Feb 21, 2024

Thanks for the pointers @diogomart @kntkb

Thank you for your help. We redid the XML file generation for espaloma and amber and now they have the same bond count. I've attached the system XML files as well as our scripts and inputs. Would you be able to verify if the XML files look correct to you? Espaloma_system_test.zip

We would love to get to the bottom of if Espaloma is able to run with rdkit under the hood so we can benchmark. If you get the chance to check out these XML files, we would like to know if you still see an issue with them?

If you don't see any problems, this issue is resolved on our end.

@kntkb
Copy link
Contributor

kntkb commented Feb 23, 2024

Thank you for your help. We redid the XML file generation for espaloma and amber and now they have the same bond count. I've attached the system XML files as well as our scripts and inputs. Would you be able to verify if the XML files look correct to you? Espaloma_system_test.zip

If the intention was to parameterize the system using amber14SB for the protein and espaloma for the ligand, then the XML file appears to be correct. However, if the goal was to parameterize both the protein and ligand with espaloma, then the XML file has not been created as intended.

@kntkb
Copy link
Contributor

kntkb commented Feb 23, 2024

The proteins are parameterized with amber, not espaloma, because you are specifying amber/protein.ff14SB.xml in output_espaloma_complex.py. If you want to parameterize with espaloma, you will need to remove amber/protein.ff14SB.xml. However, I think this might cause an error if the OpenEye Toolkits are not installed in the backend.

As mentioned earlier, I believe you could use RDKit as an alternative, but you would need to use SDF format for the protein and transfer the information to after creating the XML file, as mentioned earlier.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants