Inconsistent dtype for universe.dimensions #2190

PicoCentauri · 2019-01-30T10:29:20Z

Expected behavior

The dtype of the dimensions for a standard (not in-memory) universe should be consistent with that of an in-memory universe. This is particularly important for low-level C functions such as make_whole, which fails for the in-memory trajectory because it requires a float.

Actual behavior

The dimensions of the standard universe are stored as float (float32), while those of the in-memory universe are stored as double (float64).

Code to reproduce the behavior

import MDAnalysis as mda
from MDAnalysis.tests.datafiles import PSF, DCD

u = mda.Universe(PSF, DCD)
u2 = mda.Universe(PSF, DCD, in_memory=True)

print(u.dimensions.dtype)
print(u2.dimensions.dtype)

Current version of MDAnalysis

I'm using Python 3.6 and the dev version of MDAnalysis on macOS.

jbarnoud · 2019-01-30T10:57:46Z

The problem seems to be solvable by changing the dtype on
https://github.com/MDAnalysis/mdanalysis/blob/develop/package/MDAnalysis/coordinates/memory.py#L375

and on
https://github.com/MDAnalysis/mdanalysis/blob/develop/package/MDAnalysis/coordinates/memory.py#L378

with maybe some caution with
https://github.com/MDAnalysis/mdanalysis/blob/develop/package/MDAnalysis/coordinates/memory.py#L239

PicoCentauri · 2019-01-30T11:20:11Z

With these changes it looks good. I also found these lines:

https://github.com/MDAnalysis/mdanalysis/blob/develop/package/MDAnalysis/core/universe.py#L491
https://github.com/MDAnalysis/mdanalysis/blob/develop/package/MDAnalysis/core/universe.py#L661

richardjgowers · 2019-01-30T15:42:24Z

I'm pretty sure that we needed doubles at some point for triclinic pbc, so I think make_whole is wrong here

zemanj · 2019-01-30T16:14:32Z

@richardjgowers AFAIK triclinic PBC are handled exclusively in lib.distances, and that uses float32 boxes.
IIRC all routines in lib.distances use lib.util.check_box(), which does the type conversion to float32 regardless of input dtype.

You're right regarding precision problems with PBC transformations. Such problems can occur even with orthorhombic boxes, leading to atom positions ending up exactly on the upper box boundary instead of on the lower one. The reason is that the box inverse is not precise enough in single precision. I encountered that when testing a wrap-unwrap-wrap cycle.

richardjgowers · 2019-01-30T16:18:12Z

Hmm yeah you're right, box is put in as a `coordinate*` which is float. Maybe I'm remembering that the box inverse had to be double...

zemanj · 2019-02-06T01:56:37Z

make_whole() will be fixed for double precision dimensions in PR #2189. However, that is just a workaround for the dtype consistency problem. I found that a universe created with mda.Universe.empty() also has a double precision box. Maybe that's related to the construction of an in-memory Universe?

richardjgowers · 2019-02-06T15:25:11Z

@zemanj yeah it's probably just a non specific call to np.zeros if I had to guess.. Universe.empty is using MemoryReader but it might be making the arrays manually
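For illustration, NumPy allocates float64 whenever no dtype is given, so an unqualified np.zeros call would be enough to produce a double-precision box. A minimal check (the variable names are placeholders, not MDAnalysis code):

```python
import numpy as np

# np.zeros defaults to float64 when no dtype is passed.
box_default = np.zeros(6)
# Passing the dtype explicitly gives a single-precision array.
box_single = np.zeros(6, dtype=np.float32)

print(box_default.dtype)  # float64
print(box_single.dtype)   # float32
```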

zemanj · 2019-02-11T11:39:00Z

I think we need a decision here... Should we

1. enforce Universe.dimensions to always be a float32 ndarray or
2. just make sure we always cast it as float32 whenever it's used?

If we opt for 2., I think we can close this.

jbarnoud · 2019-02-11T12:55:10Z

Opt 1 is easier to enforce: we can add a test in the base reader test class that asserts that universe.dimensions is float32. Any reader not following that rule would cause a test failure. Is there a reason to have float64 for the box?
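Such a test could look roughly like the sketch below. `FakeReader` and `check_dimensions_dtype` are hypothetical stand-ins, not the actual base reader test class; a real test would run the same assertion against every reader.

```python
import numpy as np


class FakeReader:
    """Hypothetical stand-in for a trajectory reader (not MDAnalysis code)."""

    def __init__(self, dtype):
        # Six box parameters: lx, ly, lz, alpha, beta, gamma.
        self.dimensions = np.array([10, 10, 10, 90, 90, 90], dtype=dtype)


def check_dimensions_dtype(reader):
    """Fail if the reader exposes a box that is not float32."""
    assert reader.dimensions.dtype == np.float32, (
        "dimensions must be float32, got {}".format(reader.dimensions.dtype)
    )


# A float32 reader passes silently.
check_dimensions_dtype(FakeReader(np.float32))

# A float64 reader trips the assertion.
try:
    check_dimensions_dtype(FakeReader(np.float64))
except AssertionError as err:
    print("caught:", err)
```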

zemanj · 2019-02-11T13:24:52Z

Universe.dimensions boils down to TimeStep._unitcell, which is default-initialized with np.zeros((6), np.float32). It might be easiest to add automatic dtype conversion to np.float32 in the TimeStep.dimensions setter, which would then read

@dimensions.setter
def dimensions(self, box):
    self._unitcell[:] = box.astype(np.float32, copy=False)

Ideally, we add a validation check here so that one cannot supply an invalid box.
That check could go into lib.util, which is imported in coordinates.base anyway.

EDIT:
We still have to check if TimeStep._init_unitcell() is overwritten with something undesirable in any of the trajectory readers.
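A self-contained sketch of the setter idea, using a hypothetical `MiniTimestep` class in place of the real TimeStep. It uses np.asarray instead of .astype so that lists and tuples are accepted as well; that substitution is an assumption of this sketch, not part of the proposal above.

```python
import numpy as np


class MiniTimestep:
    """Hypothetical, stripped-down stand-in for TimeStep."""

    def __init__(self):
        # Same default as described above: six float32 zeros.
        self._unitcell = np.zeros(6, dtype=np.float32)

    @property
    def dimensions(self):
        return self._unitcell

    @dimensions.setter
    def dimensions(self, box):
        # np.asarray also converts lists and tuples; an ndarray that is
        # already float32 is passed through without a copy.
        self._unitcell[:] = np.asarray(box, dtype=np.float32)


ts = MiniTimestep()
ts.dimensions = np.array([10.0, 10.0, 10.0, 90.0, 90.0, 90.0])  # float64 input
print(ts.dimensions.dtype)  # float32
```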

zemanj · 2019-02-11T14:52:03Z

A check for a valid box could look as follows:

If all values are zero, return np.zeros(6, dtype=np.float32).
If any of the values is not finite (+/- np.nan or +/- np.inf), raise a ValueError.
If any angle is negative or greater than or equal to 360 degrees, apply angle = angle % 360.0.
If all lengths and angles are greater than zero and all angles are strictly smaller than 180 degrees:
- If the sum of any two angles is greater than the remaining one, return the box.
- Otherwise, raise a ValueError.
If any of the lengths or angles is zero:
- Check that for every zero length, the corresponding pair of angles is also zero (indicates a 2d- or 1d system). If any of the following conditions is not met, raise a ValueError:
  - If lx == 0, beta and gamma must be zero.
  - If ly == 0, alpha and gamma must be zero.
  - If lz == 0, alpha and beta must be zero.
  - If alpha == 0, at least one of ly or lz must be zero.
  - If beta == 0, at least one of lx or lz must be zero.
  - If gamma == 0, at least one of lx or ly must be zero.
If any length is negative or any angle is greater than 180 degrees, transform the box to matrix representation and back using
box = lib.mdamath.triclinic_box(*lib.mdamath.triclinic_vectors(box)) and run the above checks again. Otherwise, return the box.

Maybe one could also change the order of the tests and first check if a valid matrix representation exists, that might simplify things a bit.

Moreover, if any of the values is changed, a corresponding warning should be raised.
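A partial sketch of how the first few checks might look. The function name `validate_box` is hypothetical, and the sketch deliberately leaves out the 1d/2d cases and the round trip through the matrix representation; those inputs end in a NotImplementedError here.

```python
import warnings

import numpy as np


def validate_box(box):
    """Partial sketch of the box check; name and scope are hypothetical.

    Covers: all-zero boxes, non-finite values, angle wrapping with a
    warning, and the angle-sum test for fully positive boxes.
    Not covered: 1d/2d systems and the matrix round trip.
    """
    box = np.array(box, dtype=np.float64)  # work on a private copy
    if box.shape != (6,):
        raise ValueError("box must have six entries")
    if not np.any(box):
        return np.zeros(6, dtype=np.float32)
    if not np.all(np.isfinite(box)):
        raise ValueError("box contains non-finite values")

    # Wrap angles into [0, 360) and warn if that changed anything.
    wrapped = box[3:] % 360.0
    if not np.array_equal(wrapped, box[3:]):
        warnings.warn("box angles were wrapped into [0, 360)")
        box[3:] = wrapped

    alpha, beta, gamma = box[3:]
    if np.all(box > 0) and np.all(box[3:] < 180):
        # The sum of any two angles must exceed the remaining one.
        if (alpha + beta > gamma and alpha + gamma > beta
                and beta + gamma > alpha):
            return box.astype(np.float32)
        raise ValueError("box angles do not describe a valid cell")

    raise NotImplementedError("remaining cases are left out of this sketch")


print(validate_box([10, 10, 10, 90, 90, 90]).dtype)  # float32
```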

richardjgowers · 2019-02-11T17:00:21Z

@jbarnoud I think maybe you could argue that round tripping via MDAnalysis shouldn't mangle precision, so if we get data as float64 we should keep it that way, but we don't do that for positions, so why for box?

@zemanj enforcing option 1 seems possible like you've outlined. There's not many (any?) readers that redefine _init_unitcell iirc. I think the idea behind that was that each reader can store the ._unitcell in the native format, then .dimensions is the format MDA expects.

richardjgowers · 2019-02-11T17:01:15Z

@zemanj using np.nan is an interesting idea for expressing semi periodic systems, it is missing data essentially? Or maybe 0.0 is better, ie this dimension is flat...

zemanj · 2019-02-11T17:48:23Z

@richardjgowers

> I think the idea behind that was that each reader can store the ._unitcell in the native format, then .dimensions is the format MDA expects.

If that's the case, then we should not convert the dtype in the dimensions setter but in the getter.
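A sketch of the getter-side alternative, again with a hypothetical `NativeTimestep` class: `_unitcell` keeps whatever dtype the reader stores, and only the `dimensions` property converts to float32.

```python
import numpy as np


class NativeTimestep:
    """Hypothetical timestep that keeps the box in the reader's own dtype."""

    def __init__(self, unitcell):
        # Stored as given, e.g. float64 from a double-precision format.
        self._unitcell = np.asarray(unitcell)

    @property
    def dimensions(self):
        # Convert on the way out; copy=False avoids a copy when the
        # stored array is already float32.
        return self._unitcell.astype(np.float32, copy=False)


ts = NativeTimestep(np.array([10.0, 10.0, 10.0, 90.0, 90.0, 90.0]))
print(ts._unitcell.dtype)   # float64
print(ts.dimensions.dtype)  # float32
```

One consequence of this design: whenever a conversion takes place, the returned array is a copy, so in-place edits to it do not reach `_unitcell`.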

> using np.nan is an interesting idea for expressing semi periodic systems, it is missing data essentially? Or maybe 0.0 is better, ie this dimension is flat...

We might run into problems with matrix representations if we choose np.nan for absent periodicity.
If we want periodic boundary conditions in fewer than three dimensions, we probably won't get around having an extra kind of "periodicity mask" that is applied to the box before applying PBC.
Think of 3d systems with walls, for example. There, you want distances to be non-periodic only in the direction normal to the walls, so setting the dimensions to zero in that direction is correct for distance calculations. If you want to compute the volume, however, setting that dimension to zero (or np.nan) will be wrong.
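The "periodicity mask" idea could be sketched as follows for an orthorhombic box. The function name `minimum_image` and the `periodic` argument are hypothetical and do not exist in MDAnalysis; the point is only that the mask switches off wrapping per axis while the full box stays available for the volume.

```python
import numpy as np


def minimum_image(delta, lengths, periodic):
    """Apply the minimum-image convention only along periodic axes.

    delta    -- displacement vector(s), shape (..., 3)
    lengths  -- orthorhombic box lengths, shape (3,), all nonzero
    periodic -- boolean mask, True where the axis is periodic
    """
    delta = np.asarray(delta, dtype=np.float64)
    lengths = np.asarray(lengths, dtype=np.float64)
    shift = lengths * np.round(delta / lengths)
    # Non-periodic axes get no shift, so displacements there are unchanged.
    return delta - np.where(periodic, shift, 0.0)


lengths = np.array([10.0, 10.0, 20.0])
periodic = np.array([True, True, False])  # walls normal to z

wrapped = minimum_image([9.0, -6.0, 15.0], lengths, periodic)
print(wrapped)         # x and y are wrapped, z is left alone
print(lengths.prod())  # the volume still uses the full z length
```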

richardjgowers · 2019-02-11T19:01:32Z

@zemanj sure but I also don't like having it in the setter because we have too many astype calls :) Really we should act like numpy and handle whatever dtypes and return the appropriate dtypes, eg C++ templates/Cython fused types etc. But that's not a small PR to suggest.

Maybe just hacking the setter/init to force a float32 dtype is a nice fix for today's needs.

zemanj mentioned this issue Feb 28, 2019

Sp box #2213

Merged

4 tasks

zemanj closed this as completed in #2213 Feb 28, 2019

zemanj mentioned this issue Mar 1, 2019

Universe.dimensions: enhancements, guarantees, automatic conversions #2214

Open

15 tasks
