Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NCDFReader velocities can sometimes be improperly scaled #2323

Closed
IAlibay opened this issue Aug 9, 2019 · 2 comments · Fixed by #2326
Closed

NCDFReader velocities can sometimes be improperly scaled #2323

IAlibay opened this issue Aug 9, 2019 · 2 comments · Fixed by #2326

Comments

@IAlibay
Copy link
Member

IAlibay commented Aug 9, 2019

Overview

So whilst I was working on a new NCRST reader, I noticed that the NCDFReader appears to not be properly handling the reading of velocities, leading to values that don't adhere to MDAnalysis velocity units standards (usually Angstrom per AKMA time instead of Angstrom per ps).

A little bit of background:

As per the AMBER NetCDF convention (http://ambermd.org/netcdf/nctraj.xhtml) all trajectory data variables can have a scale_factor value associated with them, which is essentially a factor by which stored data should be multiplied by on read. From what I can tell, for most variables the default behaviour of AMBER MD engines is to forgo the use of scale_factor attributes, storing the values as they are in the NetCDF file. The one exception to this is velocities, where the values are stored in units of Angstrom per AKMA time unit and given a scale_factor of 20.455 in order to obtain Angstrom per ps values.

MDAnalysis' NCDFReader current approach to scale_factor, is to not handle them and throw a NotImplementedError. However, the current check for this is only done on the coordinates data variable rather than all of them (see lines 518-519), which means that velocities get read irrespective of whether or not a scale_factor attribute exists and are always treated as values of units Angstrom per ps.

Proposed fix

I am currently working on a temporary hot-fix for this. What I propose to do is to extend the scale_factor check to all NetCDF file variables, if it exists for the velocities data and is equal to 20.455 overwrite self.units['velocity'] from Angstrom/ps to Angstrom/AKMA. Otherwise, throw a NotImplementedError.

In the long run, I will see what I can do to write a clean NCDFReader implementation that fully implements handling of scale_factor.

Expected behavior

MDAnalysis universe data should always adhere to the units defined here: https://www.mdanalysis.org/mdanalysis/documentation_pages/units.html

Actual behavior

Velocities can by off by a factor of scale_factor from standard units.

Code to reproduce the behavior

[1]: import MDAnalysis as mda

[2]: u = mda.Universe('../../ace_tip3p.parm7', 'mdcrd.nc')

# Values stored in universe (should be A/ps)
[3]: u.atoms[0].velocity
[3]: array([-0.477188  , -0.8108647 , -0.24172044], dtype=float32)

[4]: from scipy.io import netcdf

[5]: read_f = netcdf.netcdf_file('mdcrd.nc')

# This is how the data is stored in the file
[6]: read_f.variables['velocities'][0][0]
[6]: array([-0.477188  , -0.8108647 , -0.24172044], dtype=float32)

# This is the scale factor
[7]: read_f.variables['velocities'].scale_factor
[7]: 20.455

# This is the intended velocity units (post scaling)
[8]: read_f.variables['velocities'].units
[8]: b'angstrom/picosecond'

Currently version of MDAnalysis

  • Which version are you using? (run python -c "import MDAnalysis as mda; print(mda.__version__)")
    0.19.3-dev
  • Which version of Python (python -V)?
    3.7.3
  • Which operating system?
    Ubuntu 18.04 LTS

Test files

MDA_VelNetCDF_Issue.zip

@richardjgowers
Copy link
Member

A fix here sounds good.

In general, handling of units by MDAnalysis can be improved. I think we try and coerce into the formats you listed, but it's often difficult to know what units things were saved in, so often we just pass through the raw values...

@IAlibay
Copy link
Member Author

IAlibay commented Aug 9, 2019

Unfortunately this is one of those super obscure trajectory format things, I probably would have never noticed if I hadn't spent too long looking at the convention docs.

In cases where the input units aren't specified in the trajectory (or it's API), it might be overkill but maybe we could just throw a warning to users telling them the assumptions being made? It's probably a big undertaking, but if there's a consensus on how best to do it, I'm willing to have a go at it for the trajectory formats I'm more familiar with.

orbeckst pushed a commit that referenced this issue Aug 19, 2019
…#2326)

* Fixes #2323 
* Now checks for presence of scale_factor in all AMBER NetCDF convention variables
* Adds support for scale_factor values, set in NCDFReader __init__ and multiplied through in _read_frame.
* assumes/uses AKMA velocity scaling by default (as produced by AMBER)
* Implements several new exception checks, mainly revolving around ensuring that units are properly adhered to.
* Also included in this PR is part framework for the introduction of an NCRST file reader (singleframe version of the NCDFReader for .ncrst files). These can be seen in the _NCDFGenerator class of test_netcdf.py
* Update changelog
* DeprecationWarning (1.0) if NCDF file uses units "degrees" instead of correct "degree" for attribute "cell_angle" (MDA < 0.20.0 would write degrees (see #2327), we will now correct these units and from 1.0 onwards raise an error if we encounter degrees instead of degree)
@IAlibay IAlibay mentioned this issue Jan 13, 2020
6 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants