Bug filtering large system of waters #276

stefdoerr · 2017-03-13T13:28:24Z

Seems to be related with bond calculation. Since at each successive step if finds more molecules as water and keeps removing. Found at loro:/tmp/test-htmd/structure.pdb. Works find with prmtop I guess because it contains the correct bonds.

In [21]: mol = Molecule('./structure.pdb')

In [22]: mol.numAtoms
Out[22]: 302263

In [23]: mol.filter('not water')

In [24]: mol.numAtoms
Out[24]: 205575

In [25]: mol.filter('not water')

In [26]: mol.numAtoms
Out[26]: 108888

In [27]: mol.filter('not water')

In [28]: mol.numAtoms
Out[28]: 12201

In [29]: mol.filter('not water')

In [30]: mol.numAtoms
Out[30]: 7414

In [33]: mol = Molecule('./structure.prmtop')

In [34]: mol.filter('not water')

In [35]: mol.numAtoms
Out[35]: 7414

The text was updated successfully, but these errors were encountered:

stefdoerr · 2017-03-13T13:32:27Z

Also prmtop filtering takes a reaaaally long time. Would be good to profile it.

j3mdamas · 2017-03-13T13:45:04Z

wow, it's really weird, no?
I mean, the water VMD selection should be independent of bonding, and based on residue name

stefdoerr · 2017-03-14T16:06:00Z

It's getting more and more weird.
For example if you try to view that PDB through HTMD, HTMD writes a PSF and XTC and then loads it into VMD. However VMD 1.9.1 loads it fine, yet 1.9.2 throws following error:

Info) Using plugin psf for structure file /tmp/tmpzjnym6a6.psf
ERROR) BaseMolecule: Cannot add atom; namelist index value too large.
ERROR) Recompile VMD with larger index types.
ERROR) Atom namelist index values at time of overflow:
ERROR)   nameindex: 18
ERROR)   typeindex: 0
ERROR)   resnameindex: 25
ERROR)   segnameindex: 32768
ERROR)   altlocindex: 0
ERROR)   chainindex: 3
ERROR) MolFilePlugin: file load aborted
ERROR) molecule_structure: Unable to read structure for molecule 0
ERROR) molecule_structure: severe error indicated by plugin aborting loading of molecule 0
ERROR) Loading of startup molecule files aborted.

Have to find out if this is a regression in VMD and if it's also related to the bond guessing. Might also be two separate issues though

stefdoerr · 2017-03-14T16:11:48Z

Ok this is a bug in our PSFwriter which overflows fields. Will fix now

stefdoerr · 2017-03-14T16:27:15Z

Fixed the PSF writing bug
6167686

Was independent of bond guessing bug which will need more investigation now

stefdoerr · 2017-03-14T16:30:12Z

This is just bound to be fun:

In [52]: mol = Molecule('./structure.pdb')

In [53]: np.sum(mol.atomselect('water and name O'))
Out[53]: 32229

In [54]: np.sum(mol.atomselect('water'))
Out[54]: 96688

In [55]: 32229*3
Out[55]: 96687

In [56]: mol = Molecule('./structure.prmtop')

In [57]: np.sum(mol.atomselect('water'))
Out[57]: 294849

j3mdamas · 2017-03-14T16:36:15Z

do you need help, @stefdoerr ?

stefdoerr · 2017-03-14T16:37:30Z

The issue is very probably in the C code of the atomselect, so it will be tricky. Feel free to look at it of course but I think it will be quite some work.

stefdoerr · 2017-03-14T16:51:03Z

Works fine in VMD

>Main< (test-htmd) 44 % mol new structure.pdb autobonds off
1
>Main< (test-htmd) 45 % set x [atomselect top "water"]
atomselect1
>Main< (test-htmd) 46 % $x num
294849

j3mdamas · 2017-03-14T16:55:14Z

also, we need to create tests for atomselect

stefdoerr · 2017-03-15T15:34:43Z

Ok found the issue. It's not related to bonds at all since these are apparently not used for the "water" selection (you can do it with 0 coordinates everywhere). It's related to the segids.

When we read tleap PDBs I interpret the TER commands as ends of segments and create corresponding segids (which VMD does not do). If I do mol.segid[:] = '' it works. Also the prmtop works because it doesn't have segids defined.

The issue here is that VMD C code probably assumes that segids are 2 characters as per PDB format. In our case though with all those waters, we have enough TERs to reach triple-digit segids.

The only way I can think of to deal with this is to check number of letters in segids and if they go over 2 drop them and throw a warning during atomselect that the segid was not used.

j3mdamas · 2017-03-15T15:47:51Z

that sounds like a good idea.

it's still weird to be how water depends on segid, as it shouldn't

stefdoerr · 2017-03-15T15:48:55Z

Ok so I was partially wrong about my 2 character assumption. It works up to around 10.000+ segids and then it breaks. I assume we don't work much with such huge systems and hence didn't notice till now.

Works great if I just use alternating resids 1 and 0 so that might be a better idea compared to totally dropping them although I can't prove it.

j3mdamas · 2017-03-15T15:52:49Z

Segids or resids?

stefdoerr · 2017-03-15T15:53:34Z

Breaks between 20000 and 50000 segids. Go figure. I will stop here trying to find the exact limit.

stefdoerr · 2017-03-15T15:53:44Z

@j3mdamas segids

j3mdamas · 2017-03-15T15:56:21Z

that is one of weirdest bugs ever...

but how would the alternating resids thing work?

stefdoerr · 2017-03-15T15:57:56Z

0 until a change in segid happens then 1 until a change happens etc.
Some programs only care about the changes in segid and not absolute values.
For the moment I'll just drop the segid. Seems less risky than alternating.

j3mdamas · 2017-03-15T15:58:49Z

I agree.

stefdoerr · 2017-03-15T16:01:54Z

Done, fixed. Pushing it now.

In [2]: mol = Molecule('./structure.pdb')
2017-03-15 17:00:43,136 - htmd.molecule.readers - WARNING - Reading PDB file with more than 99999 atoms. Bond information can be wrong.

In [3]: mol.filter('not water')
2017-03-15 17:00:48,173 - htmd.molecule.vmdparser - WARNING - More than 3 characters were used for segnames. Due to limitations in VMD atomselect segids will be dropped for the atomselection.
2017-03-15 17:00:49,565 - htmd.molecule.molecule - INFO - Removed 294849 atoms. 7414 atoms remaining in the molecule.

stefdoerr · 2017-03-15T16:09:00Z

06304a6

stefdoerr added the bug:medium label Mar 13, 2017

stefdoerr self-assigned this Mar 13, 2017

stefdoerr mentioned this issue Mar 13, 2017

Running adaptive sampling using AMBER on a cluster using SLURM or PBS #255

Closed

stefdoerr closed this as completed Mar 15, 2017

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Bug filtering large system of waters #276

Bug filtering large system of waters #276

stefdoerr commented Mar 13, 2017 •

edited

Loading

stefdoerr commented Mar 13, 2017

j3mdamas commented Mar 13, 2017

stefdoerr commented Mar 14, 2017

stefdoerr commented Mar 14, 2017

stefdoerr commented Mar 14, 2017

stefdoerr commented Mar 14, 2017 •

edited

Loading

j3mdamas commented Mar 14, 2017

stefdoerr commented Mar 14, 2017

stefdoerr commented Mar 14, 2017

j3mdamas commented Mar 14, 2017

stefdoerr commented Mar 15, 2017

j3mdamas commented Mar 15, 2017

stefdoerr commented Mar 15, 2017

j3mdamas commented Mar 15, 2017

stefdoerr commented Mar 15, 2017

stefdoerr commented Mar 15, 2017

j3mdamas commented Mar 15, 2017

stefdoerr commented Mar 15, 2017

j3mdamas commented Mar 15, 2017

stefdoerr commented Mar 15, 2017

stefdoerr commented Mar 15, 2017

Bug filtering large system of waters #276

Bug filtering large system of waters #276

Comments

stefdoerr commented Mar 13, 2017 • edited Loading

stefdoerr commented Mar 13, 2017

j3mdamas commented Mar 13, 2017

stefdoerr commented Mar 14, 2017

stefdoerr commented Mar 14, 2017

stefdoerr commented Mar 14, 2017

stefdoerr commented Mar 14, 2017 • edited Loading

j3mdamas commented Mar 14, 2017

stefdoerr commented Mar 14, 2017

stefdoerr commented Mar 14, 2017

j3mdamas commented Mar 14, 2017

stefdoerr commented Mar 15, 2017

j3mdamas commented Mar 15, 2017

stefdoerr commented Mar 15, 2017

j3mdamas commented Mar 15, 2017

stefdoerr commented Mar 15, 2017

stefdoerr commented Mar 15, 2017

j3mdamas commented Mar 15, 2017

stefdoerr commented Mar 15, 2017

j3mdamas commented Mar 15, 2017

stefdoerr commented Mar 15, 2017

stefdoerr commented Mar 15, 2017

stefdoerr commented Mar 13, 2017 •

edited

Loading

stefdoerr commented Mar 14, 2017 •

edited

Loading