
Strip water molecules from all topology/coordinate files in current database #21

Closed
davidlmobley opened this issue Apr 17, 2015 · 8 comments


@davidlmobley
Member

Because of prior manual curation, not all topology and coordinate files contain water molecules. In addition, I just found out from Sereina Riniker (e-mail excerpt below) that some of them contain TIP4P-Ew water molecules rather than TIP3P. Again, this is a result of the topology/coordinate files having been gathered manually (in some cases by students). The best long-term solution is to regenerate all topology/coordinate files from the original source data (Issue #20), but an interim solution is simply to strip all water molecules from the existing topology/coordinate files.

Riniker's e-mail said this, in part:
"Regarding the [input files] I noticed two things which I thought you might like to know if you do not already. In the most recent version v0.31, I encountered 78 molecules where the GROMACS coordinate file .gro does not contain the solvent coordinates. In addition, there are 23 molecules where the solvent model in the coordinate file is not TIP3P (it contains 4 coordinates per solvent molecule). I attach the list of molecule numbers in case you would like to have a look at them."

The compound ID numbers for setups with TIP4P are:
1323538
1728386
186894
1873346
1875719
1923244
2005792
2049967
20524
2068538
2178600
2972906
3053621
3727287
3738859
4035953
511661
5157661
525934
5449201
8427539
9055303
9979854

And those for setups with no water are:
1034539
1160109
1469079
172879
1893815
1905088
1944394
2126135
2316618
242480
2484519
2492140
2613240
2636578
2659552
2844990
2845466
2850833
2960202
2972345
3040612
3083321
3211679
3265457
3269819
3359593
3515580
3686115
3802803
3976574
4149784
4371692
4479135
4587267
4603202
4613090
4678740
4689084
486214
4936555
5003962
5006685
5282042
5371840
5456566
5510474
5538249
5561855
5616693
5917842
6102880
6190089
6195751
6198745
628951
6359156
667278
6688723
6935906
7239499
7417968
7676709
7913234
8052240
819018
8208692
8311303
8337722
8823527
8827942
8883511
9257453
9510785
9653690
9717937
9741965
9821936
9897248
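
The two checks described in the e-mail (no solvent coordinates, or four sites per solvent molecule) could be automated roughly as follows. This is only a sketch, not the script that produced the lists above: it assumes the standard fixed-width `.gro` layout and that water residues are named `SOL`, `WAT`, or `HOH`, and the function names are made up for illustration.

```python
# Assumed water residue names; adjust to match the files being checked.
WATER_RESNAMES = {"SOL", "WAT", "HOH"}


def water_sites_per_molecule(gro_text, water_resnames=WATER_RESNAMES):
    """Return the set of atom counts found per water residue in a .gro file."""
    lines = gro_text.splitlines()
    n_atoms = int(lines[1])  # line 2 of a .gro file holds the atom count
    sizes = []
    previous = None
    for line in lines[2:2 + n_atoms]:
        # Columns 1-5: residue number, columns 6-10: residue name.
        key = (line[0:5].strip(), line[5:10].strip())
        if key[1] not in water_resnames:
            previous = None
            continue
        if key == previous:
            sizes[-1] += 1  # same water molecule as the previous atom line
        else:
            sizes.append(1)  # first atom of a new water molecule
            previous = key
    return set(sizes)


def classify_water(gro_text):
    """Label a .gro file as 'no water', '3-site', '4-site', or 'mixed'."""
    sizes = water_sites_per_molecule(gro_text)
    if not sizes:
        return "no water"
    if sizes == {3}:
        return "3-site"
    if sizes == {4}:
        return "4-site"
    return "mixed"
```

Calling `classify_water(open(path).read())` on each `.gro` file would flag anything that is not `"3-site"`. A 4-site result only says the model has four sites per molecule; it does not by itself distinguish TIP4P from TIP4P-Ew.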

@jchodera
Contributor

Which topology/coordinate files in particular are of interest? The Amber ones?

I might have some time to make progress on this today.

@davidlmobley
Member Author

This should be the GROMACS ones, as I always solvated things after converting to GROMACS.


@jchodera
Contributor

Is a workflow in which we first solvate in AMBER tleap and then use acpype to convert to gromacs acceptable, or would that generate undesirable topology files?

Also, if there's already an issue on the preferred way to generate these files, my apologies---feel free to just post a pointer.

@davidlmobley
Member Author

davidlmobley commented Apr 17, 2015 via email

@jchodera
Contributor

I don't think we can invest any time in trying to fix up manually curated files with throwaway scripts. If we do put time into this, it has to be to establish automated pipelines that build this from the ground up.

Creating a workflow to create unsolvated and solvated AMBER prmtop/inpcrd files and convert to gromacs via acpype would be pretty easy if we find this acceptable for now. There are other options too, such as using OpenMM to solvate and write a PDB file and then converting directly to AMBER and gromacs, but that might be a bit trickier right now. Eventually, these protocols can be reworked to use tools like gaff2xml once the public API is stable.

Info on acpype testing is here:
https://code.google.com/p/acpype/wiki/TestingAcpypeAmb2gmx
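
For concreteness, the tleap-then-acpype workflow described above might look something like the sketch below. It only builds the tleap input text and the command lines and does not run anything. The file names, the 12.0 Å padding, the `leaprc` list, and both function names are assumptions made for illustration, not settings taken from this database.

```python
def tleap_script(mol2, frcmod, prefix, padding=12.0, leaprc=("leaprc.gaff",)):
    """Build tleap input that writes unsolvated and solvated prmtop/inpcrd files.

    All arguments are illustrative assumptions. Depending on the AmberTools
    version, a water leaprc (for example leaprc.water.tip3p) may also need to
    be listed in `leaprc` so that TIP3PBOX is defined.
    """
    lines = ["source %s" % name for name in leaprc]
    lines += [
        "loadamberparams %s" % frcmod,
        "mol = loadmol2 %s" % mol2,
        # Save the unsolvated system first, then solvate and save again.
        "saveamberparm mol %s_vacuum.prmtop %s_vacuum.inpcrd" % (prefix, prefix),
        "solvatebox mol TIP3PBOX %.1f" % padding,
        "saveamberparm mol %s_solvated.prmtop %s_solvated.inpcrd" % (prefix, prefix),
        "quit",
    ]
    return "\n".join(lines) + "\n"


def build_commands(prefix, script_name="build.in"):
    """Return the command lines for tleap and for the acpype AMBER-to-GROMACS step."""
    return [
        ["tleap", "-f", script_name],
        ["acpype", "-p", prefix + "_solvated.prmtop",
         "-x", prefix + "_solvated.inpcrd"],
    ]
```

With both tools installed, one would write `tleap_script(...)` to `build.in` and pass each entry of `build_commands(...)` to `subprocess.run`. The same acpype call on the `_vacuum` files would give the unsolvated GROMACS pair.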

@jchodera
Contributor

See #22

@davidlmobley
Member Author

davidlmobley commented Apr 17, 2015

For now, we absolutely ought to do the same thing we (in my group) have always done for these, which is to create AMBER files exactly as you describe and convert them to GROMACS. If you think you have time to do so today, that's awesome. Otherwise I can put a student on it shortly.

(And if my student for some reason takes a while to get this done, I'm not ruling out that I will whip out a one-off script to just quickly strip the waters so that at least everything is consistent, since effectively that's what Sereina is having to do right now anyway.)
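
A one-off water-stripping script of the kind mentioned here could be as small as the sketch below. It is an illustration under stated assumptions, not the script used for this database: it assumes the fixed-width `.gro` layout, water residues or molecules named `SOL`, `WAT`, or `HOH`, and a `[ molecules ]` section in the `.top` file. Both function names are hypothetical.

```python
# Assumed water residue/molecule names; adjust to match the actual files.
WATER_NAMES = {"SOL", "WAT", "HOH"}


def strip_water_from_gro(gro_text, water_names=WATER_NAMES):
    """Return .gro text with water atoms removed and the atom count updated."""
    lines = gro_text.splitlines()
    title, n_atoms = lines[0], int(lines[1])
    atom_lines = lines[2:2 + n_atoms]
    box_line = lines[2 + n_atoms]
    # Columns 6-10 of each atom line hold the residue name.
    kept = [l for l in atom_lines if l[5:10].strip() not in water_names]
    # Rewrite the atom-number field (columns 16-20) so numbering stays contiguous.
    renumbered = [l[:15] + "%5d" % (i % 100000) + l[20:]
                  for i, l in enumerate(kept, start=1)]
    return "\n".join([title, "%5d" % len(kept)] + renumbered + [box_line]) + "\n"


def strip_water_from_molecules(top_text, water_names=WATER_NAMES):
    """Return .top text with water entries dropped from the [ molecules ] section."""
    out = []
    section = None
    for line in top_text.splitlines():
        stripped = line.split(";", 1)[0].strip()  # ignore trailing comments
        if stripped.startswith("[") and stripped.endswith("]"):
            section = stripped[1:-1].strip().lower()
        elif section == "molecules" and stripped:
            if stripped.split()[0] in water_names:
                continue  # skip the water count line
        out.append(line)
    return "\n".join(out) + "\n"
```

The topology function only edits `[ molecules ]`; any water model include or `[ moleculetype ]` definition stays in the file, which should be harmless when no water molecules are listed. The box line is kept unchanged, so the stripped `.gro` still carries the original box dimensions.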

@davidlmobley
Member Author

This was resolved by the full rebuild of the database for version 0.5, in #28.
