NCDFReader velocities can sometimes be improperly scaled #2323

IAlibay · 2019-08-09T10:08:28Z

Overview

So whilst I was working on a new NCRST reader, I noticed that the NCDFReader appears to not be properly handling the reading of velocities, leading to values that don't adhere to MDAnalysis velocity units standards (usually Angstrom per AKMA time instead of Angstrom per ps).

A little bit of background:

As per the AMBER NetCDF convention (http://ambermd.org/netcdf/nctraj.xhtml) all trajectory data variables can have a scale_factor value associated with them, which is essentially a factor by which stored data should be multiplied by on read. From what I can tell, for most variables the default behaviour of AMBER MD engines is to forgo the use of scale_factor attributes, storing the values as they are in the NetCDF file. The one exception to this is velocities, where the values are stored in units of Angstrom per AKMA time unit and given a scale_factor of 20.455 in order to obtain Angstrom per ps values.

MDAnalysis' NCDFReader current approach to scale_factor, is to not handle them and throw a NotImplementedError. However, the current check for this is only done on the coordinates data variable rather than all of them (see lines 518-519), which means that velocities get read irrespective of whether or not a scale_factor attribute exists and are always treated as values of units Angstrom per ps.

Proposed fix

I am currently working on a temporary hot-fix for this. What I propose to do is to extend the scale_factor check to all NetCDF file variables, if it exists for the velocities data and is equal to 20.455 overwrite self.units['velocity'] from Angstrom/ps to Angstrom/AKMA. Otherwise, throw a NotImplementedError.

In the long run, I will see what I can do to write a clean NCDFReader implementation that fully implements handling of scale_factor.

Expected behavior

MDAnalysis universe data should always adhere to the units defined here: https://www.mdanalysis.org/mdanalysis/documentation_pages/units.html

Actual behavior

Velocities can by off by a factor of scale_factor from standard units.

Code to reproduce the behavior

[1]: import MDAnalysis as mda

[2]: u = mda.Universe('../../ace_tip3p.parm7', 'mdcrd.nc')

# Values stored in universe (should be A/ps)
[3]: u.atoms[0].velocity
[3]: array([-0.477188  , -0.8108647 , -0.24172044], dtype=float32)

[4]: from scipy.io import netcdf

[5]: read_f = netcdf.netcdf_file('mdcrd.nc')

# This is how the data is stored in the file
[6]: read_f.variables['velocities'][0][0]
[6]: array([-0.477188  , -0.8108647 , -0.24172044], dtype=float32)

# This is the scale factor
[7]: read_f.variables['velocities'].scale_factor
[7]: 20.455

# This is the intended velocity units (post scaling)
[8]: read_f.variables['velocities'].units
[8]: b'angstrom/picosecond'

Currently version of MDAnalysis

Which version are you using? (run python -c "import MDAnalysis as mda; print(mda.__version__)")
0.19.3-dev
Which version of Python (python -V)?
3.7.3
Which operating system?
Ubuntu 18.04 LTS

Test files

MDA_VelNetCDF_Issue.zip

The text was updated successfully, but these errors were encountered:

richardjgowers · 2019-08-09T13:41:03Z

A fix here sounds good.

In general, handling of units by MDAnalysis can be improved. I think we try and coerce into the formats you listed, but it's often difficult to know what units things were saved in, so often we just pass through the raw values...

IAlibay · 2019-08-09T14:36:01Z

Unfortunately this is one of those super obscure trajectory format things, I probably would have never noticed if I hadn't spent too long looking at the convention docs.

In cases where the input units aren't specified in the trajectory (or it's API), it might be overkill but maybe we could just throw a warning to users telling them the assumptions being made? It's probably a big undertaking, but if there's a consensus on how best to do it, I'm willing to have a go at it for the trajectory formats I'm more familiar with.

…#2326) * Fixes #2323 * Now checks for presence of scale_factor in all AMBER NetCDF convention variables * Adds support for scale_factor values, set in NCDFReader __init__ and multiplied through in _read_frame. * assumes/uses AKMA velocity scaling by default (as produced by AMBER) * Implements several new exception checks, mainly revolving around ensuring that units are properly adhered to. * Also included in this PR is part framework for the introduction of an NCRST file reader (singleframe version of the NCDFReader for .ncrst files). These can be seen in the _NCDFGenerator class of test_netcdf.py * Update changelog * DeprecationWarning (1.0) if NCDF file uses units "degrees" instead of correct "degree" for attribute "cell_angle" (MDA < 0.20.0 would write degrees (see #2327), we will now correct these units and from 1.0 onwards raise an error if we encounter degrees instead of degree)

orbeckst added Component-Readers Format-Amber defect labels Aug 9, 2019

IAlibay mentioned this issue Aug 15, 2019

Fixes velocity units and adds support for scale_factors in NCDFReader #2326

Merged

4 tasks

orbeckst closed this as completed in #2326 Aug 19, 2019

IAlibay mentioned this issue Jan 13, 2020

Release 1.0 #2443

Closed

6 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

NCDFReader velocities can sometimes be improperly scaled #2323

NCDFReader velocities can sometimes be improperly scaled #2323

IAlibay commented Aug 9, 2019

richardjgowers commented Aug 9, 2019

IAlibay commented Aug 9, 2019

NCDFReader velocities can sometimes be improperly scaled #2323

NCDFReader velocities can sometimes be improperly scaled #2323

Comments

IAlibay commented Aug 9, 2019

richardjgowers commented Aug 9, 2019

IAlibay commented Aug 9, 2019