Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug filtering large system of waters #276

Closed
stefdoerr opened this issue Mar 13, 2017 · 21 comments
Closed

Bug filtering large system of waters #276

stefdoerr opened this issue Mar 13, 2017 · 21 comments
Assignees

Comments

@stefdoerr
Copy link
Contributor

stefdoerr commented Mar 13, 2017

Seems to be related with bond calculation. Since at each successive step if finds more molecules as water and keeps removing. Found at loro:/tmp/test-htmd/structure.pdb. Works find with prmtop I guess because it contains the correct bonds.

In [21]: mol = Molecule('./structure.pdb')

In [22]: mol.numAtoms
Out[22]: 302263

In [23]: mol.filter('not water')

In [24]: mol.numAtoms
Out[24]: 205575

In [25]: mol.filter('not water')

In [26]: mol.numAtoms
Out[26]: 108888

In [27]: mol.filter('not water')

In [28]: mol.numAtoms
Out[28]: 12201

In [29]: mol.filter('not water')

In [30]: mol.numAtoms
Out[30]: 7414

In [33]: mol = Molecule('./structure.prmtop')

In [34]: mol.filter('not water')

In [35]: mol.numAtoms
Out[35]: 7414
@stefdoerr
Copy link
Contributor Author

Also prmtop filtering takes a reaaaally long time. Would be good to profile it.

@j3mdamas
Copy link
Contributor

wow, it's really weird, no?
I mean, the water VMD selection should be independent of bonding, and based on residue name

@stefdoerr
Copy link
Contributor Author

It's getting more and more weird.
For example if you try to view that PDB through HTMD, HTMD writes a PSF and XTC and then loads it into VMD. However VMD 1.9.1 loads it fine, yet 1.9.2 throws following error:

Info) Using plugin psf for structure file /tmp/tmpzjnym6a6.psf
ERROR) BaseMolecule: Cannot add atom; namelist index value too large.
ERROR) Recompile VMD with larger index types.
ERROR) Atom namelist index values at time of overflow:
ERROR)   nameindex: 18
ERROR)   typeindex: 0
ERROR)   resnameindex: 25
ERROR)   segnameindex: 32768
ERROR)   altlocindex: 0
ERROR)   chainindex: 3
ERROR) MolFilePlugin: file load aborted
ERROR) molecule_structure: Unable to read structure for molecule 0
ERROR) molecule_structure: severe error indicated by plugin aborting loading of molecule 0
ERROR) Loading of startup molecule files aborted.

Have to find out if this is a regression in VMD and if it's also related to the bond guessing. Might also be two separate issues though

@stefdoerr
Copy link
Contributor Author

Ok this is a bug in our PSFwriter which overflows fields. Will fix now

@stefdoerr
Copy link
Contributor Author

Fixed the PSF writing bug
6167686

Was independent of bond guessing bug which will need more investigation now

@stefdoerr
Copy link
Contributor Author

stefdoerr commented Mar 14, 2017

This is just bound to be fun:

In [52]: mol = Molecule('./structure.pdb')

In [53]: np.sum(mol.atomselect('water and name O'))
Out[53]: 32229

In [54]: np.sum(mol.atomselect('water'))
Out[54]: 96688

In [55]: 32229*3
Out[55]: 96687

In [56]: mol = Molecule('./structure.prmtop')

In [57]: np.sum(mol.atomselect('water'))
Out[57]: 294849

@j3mdamas
Copy link
Contributor

do you need help, @stefdoerr ?

@stefdoerr
Copy link
Contributor Author

The issue is very probably in the C code of the atomselect, so it will be tricky. Feel free to look at it of course but I think it will be quite some work.

@stefdoerr
Copy link
Contributor Author

Works fine in VMD

>Main< (test-htmd) 44 % mol new structure.pdb autobonds off
1
>Main< (test-htmd) 45 % set x [atomselect top "water"]
atomselect1
>Main< (test-htmd) 46 % $x num
294849

@j3mdamas
Copy link
Contributor

also, we need to create tests for atomselect

@stefdoerr
Copy link
Contributor Author

Ok found the issue. It's not related to bonds at all since these are apparently not used for the "water" selection (you can do it with 0 coordinates everywhere). It's related to the segids.

When we read tleap PDBs I interpret the TER commands as ends of segments and create corresponding segids (which VMD does not do). If I do mol.segid[:] = '' it works. Also the prmtop works because it doesn't have segids defined.

The issue here is that VMD C code probably assumes that segids are 2 characters as per PDB format. In our case though with all those waters, we have enough TERs to reach triple-digit segids.

The only way I can think of to deal with this is to check number of letters in segids and if they go over 2 drop them and throw a warning during atomselect that the segid was not used.

@j3mdamas
Copy link
Contributor

that sounds like a good idea.

it's still weird to be how water depends on segid, as it shouldn't

@stefdoerr
Copy link
Contributor Author

Ok so I was partially wrong about my 2 character assumption. It works up to around 10.000+ segids and then it breaks. I assume we don't work much with such huge systems and hence didn't notice till now.

Works great if I just use alternating resids 1 and 0 so that might be a better idea compared to totally dropping them although I can't prove it.

@j3mdamas
Copy link
Contributor

Segids or resids?

@stefdoerr
Copy link
Contributor Author

Breaks between 20000 and 50000 segids. Go figure. I will stop here trying to find the exact limit.

@stefdoerr
Copy link
Contributor Author

@j3mdamas segids

@j3mdamas
Copy link
Contributor

that is one of weirdest bugs ever...

but how would the alternating resids thing work?

@stefdoerr
Copy link
Contributor Author

0 until a change in segid happens then 1 until a change happens etc.
Some programs only care about the changes in segid and not absolute values.
For the moment I'll just drop the segid. Seems less risky than alternating.

@j3mdamas
Copy link
Contributor

I agree.

@stefdoerr
Copy link
Contributor Author

Done, fixed. Pushing it now.

In [2]: mol = Molecule('./structure.pdb')
2017-03-15 17:00:43,136 - htmd.molecule.readers - WARNING - Reading PDB file with more than 99999 atoms. Bond information can be wrong.

In [3]: mol.filter('not water')
2017-03-15 17:00:48,173 - htmd.molecule.vmdparser - WARNING - More than 3 characters were used for segnames. Due to limitations in VMD atomselect segids will be dropped for the atomselection.
2017-03-15 17:00:49,565 - htmd.molecule.molecule - INFO - Removed 294849 atoms. 7414 atoms remaining in the molecule.

@stefdoerr
Copy link
Contributor Author

06304a6

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants