Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NCDFReader velocities can sometimes be improperly scaled #2323

Open
IAlibay opened this issue Aug 9, 2019 · 2 comments

Comments

@IAlibay
Copy link
Contributor

commented Aug 9, 2019

Overview

So whilst I was working on a new NCRST reader, I noticed that the NCDFReader appears to not be properly handling the reading of velocities, leading to values that don't adhere to MDAnalysis velocity units standards (usually Angstrom per AKMA time instead of Angstrom per ps).

A little bit of background:

As per the AMBER NetCDF convention (http://ambermd.org/netcdf/nctraj.xhtml) all trajectory data variables can have a scale_factor value associated with them, which is essentially a factor by which stored data should be multiplied by on read. From what I can tell, for most variables the default behaviour of AMBER MD engines is to forgo the use of scale_factor attributes, storing the values as they are in the NetCDF file. The one exception to this is velocities, where the values are stored in units of Angstrom per AKMA time unit and given a scale_factor of 20.455 in order to obtain Angstrom per ps values.

MDAnalysis' NCDFReader current approach to scale_factor, is to not handle them and throw a NotImplementedError. However, the current check for this is only done on the coordinates data variable rather than all of them (see lines 518-519), which means that velocities get read irrespective of whether or not a scale_factor attribute exists and are always treated as values of units Angstrom per ps.

Proposed fix

I am currently working on a temporary hot-fix for this. What I propose to do is to extend the scale_factor check to all NetCDF file variables, if it exists for the velocities data and is equal to 20.455 overwrite self.units['velocity'] from Angstrom/ps to Angstrom/AKMA. Otherwise, throw a NotImplementedError.

In the long run, I will see what I can do to write a clean NCDFReader implementation that fully implements handling of scale_factor.

Expected behavior

MDAnalysis universe data should always adhere to the units defined here: https://www.mdanalysis.org/mdanalysis/documentation_pages/units.html

Actual behavior

Velocities can by off by a factor of scale_factor from standard units.

Code to reproduce the behavior

[1]: import MDAnalysis as mda

[2]: u = mda.Universe('../../ace_tip3p.parm7', 'mdcrd.nc')

# Values stored in universe (should be A/ps)
[3]: u.atoms[0].velocity
[3]: array([-0.477188  , -0.8108647 , -0.24172044], dtype=float32)

[4]: from scipy.io import netcdf

[5]: read_f = netcdf.netcdf_file('mdcrd.nc')

# This is how the data is stored in the file
[6]: read_f.variables['velocities'][0][0]
[6]: array([-0.477188  , -0.8108647 , -0.24172044], dtype=float32)

# This is the scale factor
[7]: read_f.variables['velocities'].scale_factor
[7]: 20.455

# This is the intended velocity units (post scaling)
[8]: read_f.variables['velocities'].units
[8]: b'angstrom/picosecond'

Currently version of MDAnalysis

  • Which version are you using? (run python -c "import MDAnalysis as mda; print(mda.__version__)")
    0.19.3-dev
  • Which version of Python (python -V)?
    3.7.3
  • Which operating system?
    Ubuntu 18.04 LTS

Test files

MDA_VelNetCDF_Issue.zip

@richardjgowers

This comment has been minimized.

Copy link
Member

commented Aug 9, 2019

A fix here sounds good.

In general, handling of units by MDAnalysis can be improved. I think we try and coerce into the formats you listed, but it's often difficult to know what units things were saved in, so often we just pass through the raw values...

@IAlibay

This comment has been minimized.

Copy link
Contributor Author

commented Aug 9, 2019

Unfortunately this is one of those super obscure trajectory format things, I probably would have never noticed if I hadn't spent too long looking at the convention docs.

In cases where the input units aren't specified in the trajectory (or it's API), it might be overkill but maybe we could just throw a warning to users telling them the assumptions being made? It's probably a big undertaking, but if there's a consensus on how best to do it, I'm willing to have a go at it for the trajectory formats I'm more familiar with.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants
You can’t perform that action at this time.