Strip water molecules from all topology/coordinate files in current database #21

davidlmobley · 2015-04-17T17:18:50Z

Because the files were curated manually, not all topology and coordinate files contain water molecules. In addition, I just found out (from Sereina Riniker; e-mail excerpt below) that some of them contain TIP4P-Ew water molecules rather than TIP3P. Again, this is a result of gathering the topology/coordinate files by hand (in some cases by students). The best long-term solution is to regenerate all topology/coordinate files from the original source data (Issue #20), but an interim solution is simply to strip all water molecules from the existing topology/coordinate files.

Riniker's e-mail said this, in part:
"Regarding the [input files] I noticed two things which I thought you might like to know if you do not already. In the most recent version v0.31, I encountered 78 molecules where the GROMACS coordinate file .gro does not contain the solvent coordinates. In addition, there are 23 molecules where the solvent model in the coordinate file is not TIP3P (it contains 4 coordinates per solvent molecule). I attach the list of molecule numbers in case you would like to have a look at them."

The compound ID numbers for setups with TIP4P are:
1323538
1728386
186894
1873346
1875719
1923244
2005792
2049967
20524
2068538
2178600
2972906
3053621
3727287
3738859
4035953
511661
5157661
525934
5449201
8427539
9055303
9979854

And those for setups with no water are:
1034539
1160109
1469079
172879
1893815
1905088
1944394
2126135
2316618
242480
2484519
2492140
2613240
2636578
2659552
2844990
2845466
2850833
2960202
2972345
3040612
3083321
3211679
3265457
3269819
3359593
3515580
3686115
3802803
3976574
4149784
4371692
4479135
4587267
4603202
4613090
4678740
4689084
486214
4936555
5003962
5006685
5282042
5371840
5456566
5510474
5538249
5561855
5616693
5917842
6102880
6190089
6195751
6198745
628951
6359156
667278
6688723
6935906
7239499
7417968
7676709
7913234
8052240
819018
8208692
8311303
8337722
8823527
8827942
8883511
9257453
9510785
9653690
9717937
9741965
9821936
9897248
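
Editorial sketch: the two problems Riniker describes (no solvent coordinates, or four coordinates per solvent molecule) can be detected directly from a .gro file by counting atoms per water residue. The following is a minimal sketch, not the procedure actually used to build the lists above. It assumes the standard fixed-width .gro layout (residue number in columns 1-5, residue name in columns 6-10) and that waters carry one of the residue names in `WATER_RESNAMES`, which is a guess; the function name is hypothetical.

```python
from itertools import groupby

# Assumed water residue names; adjust to match the files at hand.
WATER_RESNAMES = {"SOL", "HOH", "WAT"}


def classify_gro_water(gro_text):
    """Return 'no water', '3-site', '4-site', or 'mixed/unknown' for a .gro file."""
    lines = gro_text.splitlines()
    n_atoms = int(lines[1])            # line 2 of a .gro file holds the atom count
    atom_lines = lines[2:2 + n_atoms]  # the final line (box vectors) is excluded

    sizes = set()
    # Group consecutive atom lines that share residue number + residue name
    # (the first 10 fixed-width columns), so each group is one residue.
    for key, group in groupby(atom_lines, key=lambda line: line[:10]):
        if key[5:10].strip() in WATER_RESNAMES:
            sizes.add(len(list(group)))

    if not sizes:
        return "no water"
    if sizes == {3}:
        return "3-site"
    if sizes == {4}:
        return "4-site"
    return "mixed/unknown"
```

A 4-site result only says the model has four coordinates per molecule (as TIP4P-Ew does); it does not by itself identify which 4-site model was used.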

jchodera · 2015-04-17T17:25:21Z

Which topology/coordinate files in particular are of interest? The Amber ones?

I might have some time to make progress on this today.

davidlmobley · 2015-04-17T17:51:12Z

This should be the GROMACS ones, as I always solvated things after
converting to GROMACS.

jchodera · 2015-04-17T18:05:12Z

Is a workflow in which we first solvate in AMBER tleap and then use acpype to convert to gromacs acceptable, or would that generate undesirable topology files?

Also, if there's already an issue on the preferred way to generate these files, my apologies; feel free to just post a pointer.

davidlmobley · 2015-04-17T19:00:49Z

I have not validated whether acpype handles box conversions properly. (At one point in the past, it did not.) So normally I just prep the molecule itself in AMBER and then solvate in GROMACS. Do you know? (We should create an issue on GitHub to lay out the protocol for regenerating everything from the source data. I'm working on figuring out who in my lab can go ahead and do this, but as noted that's a separate issue; the most immediate solution is just to strip the waters.)

jchodera · 2015-04-17T19:07:35Z

I don't think we can invest any time in trying to fix up manually curated files with throwaway scripts. If we do put time into this, it has to be to establish automated pipelines that build this from the ground up.

Creating a workflow to create unsolvated and solvated AMBER prmtop/inpcrd files and convert to gromacs via acpype would be pretty easy if we find this acceptable for now. There are other options too, such as using OpenMM to solvate and write a PDB file and then converting directly to AMBER and gromacs, but that might be a bit trickier right now. Eventually, these protocols can be reworked to use tools like gaff2xml once the public API is stable.

Info on acpype testing is here:
https://code.google.com/p/acpype/wiki/TestingAcpypeAmb2gmx
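
Editorial sketch: the tleap-then-acpype workflow proposed above could be scripted roughly as follows. This is an illustration under stated assumptions, not the pipeline that was eventually adopted. The function names, the input file names (`<name>.mol2`, `<name>.frcmod`), and the 12.0 Å padding are hypothetical. Which `leaprc` files must be sourced so that GAFF and the `TIP3PBOX` unit are available depends on the AmberTools version, so they are passed in as a parameter. The code only builds the tleap input and the command lines; it does not run anything, and it does not address the box-conversion concern raised earlier in the thread.

```python
def tleap_script(name, leaprc_files=("leaprc.gaff",), water_box="TIP3PBOX", padding=12.0):
    """Build tleap input that writes unsolvated and solvated prmtop/inpcrd files.

    leaprc_files is an assumption: it must include whatever your AmberTools
    version needs to define GAFF and the chosen water box unit.
    """
    lines = ["source %s" % f for f in leaprc_files]
    lines += [
        "loadamberparams %s.frcmod" % name,        # hypothetical input file
        "mol = loadmol2 %s.mol2" % name,           # hypothetical input file
        # Save the unsolvated system first, before the unit is modified.
        "saveamberparm mol %s_vacuum.prmtop %s_vacuum.inpcrd" % (name, name),
        # Add a water box with the given padding (in Angstrom) around the solute.
        "solvatebox mol %s %.1f" % (water_box, padding),
        "saveamberparm mol %s_solvated.prmtop %s_solvated.inpcrd" % (name, name),
        "quit",
    ]
    return "\n".join(lines) + "\n"


def conversion_commands(name):
    """Return the command lines to run tleap and then convert each system with acpype."""
    commands = [["tleap", "-f", "%s.leap" % name]]
    for state in ("vacuum", "solvated"):
        commands.append([
            "acpype",
            "-p", "%s_%s.prmtop" % (name, state),
            "-x", "%s_%s.inpcrd" % (name, state),
        ])
    return commands
```

Writing `tleap_script("phenol")` to `phenol.leap` and then running each entry of `conversion_commands("phenol")` (for example with `subprocess.run`) would produce both the unsolvated and solvated systems from the same source files.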

jchodera · 2015-04-17T19:09:09Z

See #22

davidlmobley · 2015-04-17T19:22:26Z

For now, we absolutely ought to be doing the same thing we (in my group)
have always done for these which is to create AMBER files exactly as you
describe and convert to GROMACS. If you think you have time to do so today,
that's awesome. Otherwise I can put a student on it shortly.

(And, if my student for some reason takes a while to get this done, I'm not
ruling out that I will whip out a one-off script to just quickly strip the
waters so at least everything is consistent - since effectively that's what
Sereina is having to do right now anyway.)
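
Editorial sketch: a one-off water-stripping script of the kind mentioned above might look like the following. It is a minimal sketch, not the script anyone in the thread actually wrote. It assumes the standard fixed-width .gro layout, that waters use one of the residue/molecule names in `WATER_NAMES` (a guess), and that the topology lists water counts in its `[ molecules ]` section. Function names are hypothetical.

```python
# Assumed water residue/molecule names; adjust to match the files at hand.
WATER_NAMES = {"SOL", "HOH", "WAT"}


def strip_gro_waters(gro_text):
    """Return .gro text with water atoms removed and the atom count updated."""
    lines = gro_text.splitlines()
    title = lines[0]
    n_atoms = int(lines[1])
    atom_lines = lines[2:2 + n_atoms]
    box_line = lines[2 + n_atoms]
    # Residue name lives in columns 6-10 of each atom line.
    kept = [line for line in atom_lines if line[5:10].strip() not in WATER_NAMES]
    # Atom numbers are left as-is; this assumes waters follow the solute,
    # so the remaining numbering stays contiguous.
    return "\n".join([title, "%5d" % len(kept)] + kept + [box_line]) + "\n"


def strip_top_waters(top_text):
    """Return topology text with water entries removed from [ molecules ].

    Any water [ moleculetype ] definition or #include is left in place;
    only the molecule count that refers to it is dropped.
    """
    out = []
    section = None
    for line in top_text.splitlines():
        content = line.split(";", 1)[0].strip()  # ignore trailing comments
        if content.startswith("[") and content.endswith("]"):
            section = content[1:-1].strip().lower()
        elif section == "molecules" and content:
            if content.split()[0] in WATER_NAMES:
                continue  # drop the water count line
        out.append(line)
    return "\n".join(out) + "\n"
```

Both functions work on text already read from disk, so they can be applied file by file across the database and the results compared against the originals before anything is overwritten.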

davidlmobley · 2017-01-30T16:54:43Z

This was resolved by the full rebuild of the database for version 0.5, in #28.

davidlmobley closed this as completed Jan 30, 2017
