Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

How to handle different units across MD codes? #10

Closed
jhenin opened this issue Jun 13, 2013 · 34 comments
Closed

How to handle different units across MD codes? #10

jhenin opened this issue Jun 13, 2013 · 34 comments
Assignees
Labels
documentation Only affects manual and other docs

Comments

@jhenin
Copy link
Member

jhenin commented Jun 13, 2013

At this point, we follow the NAMD conventions of Angstroms, kcal/mol, angles in degrees. Is that acceptable for other codes? How flexible can and should we make that?

@akohlmey
Copy link
Member

you could use the system that lammps is using, where it has multiple "unit-sets" that define scaling factors, e.g. to get a force or an energy from positions and masses.
http://git.icms.temple.edu/git/?p=lammps-ro.git;a=blob;f=src/update.cpp;h=ad588429375fd857fbb904313af4164526f47a62;hb=HEAD#l116

for codes (e.g. gromacs. namd) that have their own fixed unit sets, you can identify easily which of the available unit sets is supposed to be used during the initialization of the proxy class, otherwise, it can be communicated.

i would consider angles (and dihedrals, impropers) a different thing, since - unlike positions, forces, masses and energies - those are not communicated across the embedding MD code and colvars. so it is mostly important to document carefully how those are defined, in case there could be ambiguities and - perhaps - offer a unit specifier in the input, but that is probably more trouble than worth it. cp2k does that and it is a bit of a mess and confusing to many people.

probably the best strategy to proceed is to first make a list of what codes you want to interface to, and decide how much effort you want to invest. for a code like LAMMPS, for example, we can also make the interface refuse to work, if an incompatible unit set it used.

@giacomofiorin
Copy link
Member

I think the angles can safely stay in degrees. Using radians is impractical for inputting "exact" angles such as 90°, 120°, etc. Multiples of pi are more convenient, but have the same disadvantages as degrees to evaluate functions.

Regarding the unit of length and energy: we aren't following the NAMD convention in a hard-coded way. Unless anyone has changed the code everywhere and not committed yet, those two units are the same the MD program uses: if LAMMPS switches units, so will the colvars module. (Unless, Axel, the unit conversion is only done during I/O?)

The unit_angstrom() proxy function is only used to convert hard-coded default values into the units of the hosting program (e.g. nm, au, etc).

@jhenin
Copy link
Member Author

jhenin commented Jun 13, 2013

Right now the documentation mentions Angstroms and kcal/mol all the time. This would have to be adapted somehow.

@giacomofiorin
Copy link
Member

That is true. But it is only a drawback of the documentation.

The only default energy constants are metadynamics' hillWeight, where I put a strong warning against blindingly using the default, and harmonic's forceConstant, for which at this point we can even remove the default value of 1 "kcal/mol" and let it be defined explicitly.

For the doc: how about adding a small subsection of "General parameters and input/output files", where we define what the units of length and energy are with a macro, and then replace \AA and kcal/mol in the following with that? I don't have extensive experience with all the units in LAMMPS, and I like scaled units more than the average chemist does: so you two please figure out how to name these macros and how to document them. I'm also open to rediscuss the scaled force constant of "harmonic", for as long as the discussion converges quickly.

Whatever is decided, I'll use it to change how the hillWeight option in metadynamics is treated.

@giacomofiorin giacomofiorin added documentation Only affects manual and other docs and removed enhancement question labels Jun 8, 2016
@giacomofiorin
Copy link
Member

giacomofiorin commented Jun 8, 2016

Solved by 80b8acf

@giacomofiorin
Copy link
Member

The various extended-Lagrangian options are not compatible with multiple unit systems, and while they are intuitive in the fs-based ones (NAMD, LAMMPS with units real), their values are not easy to convert to other systems to override default values.

@jhenin
Copy link
Member Author

jhenin commented Apr 6, 2017

Situation clarified a little by 376e173. The only limitation at this point is extendedLangevinDamping required in ps-1, which I think is not too much of a constraint. We take advantage of the proxy converting internal time units to fs.

@jhenin jhenin closed this as completed Apr 6, 2017
@giacomofiorin
Copy link
Member

What about extendedTimeConstant, which assumes fs?

\item %
  \keydef
    {extendedTimeConstant}{%
    \texttt{colvar}}{%
    Oscillation period of the fictitious particle (fs)}{%
    positive decimal}{%
    \texttt{200}}{%

Also, the ps-fs issue is a small thing but it is still rather NAMD-specific. In the LAMMPS version of the Langevin thermostat:
http://lammps.sandia.gov/doc/fix_langevin.html
1/gamma is the parameter, given in time units (which is fs for the "units real" system). There were already a couple of people using the Langevin thermostat in Colvars with LAMMPS, and those did read the manual carefully to know the difference. But confusion is easy, unfortunately.

@giacomofiorin giacomofiorin reopened this Apr 6, 2017
@jhenin
Copy link
Member Author

jhenin commented Apr 7, 2017

I could have the proxy expose a conversion factor for time units, and then take user input in whatever LAMMPS is setup to use. That would break current inputs that provide fs and ps-1, though. We need a good medium to warn our LAMMPS users of this change.

@giacomofiorin
Copy link
Member

For breaking with a previous convention the cleanest way would be to have a new keyword replace the old one (as we did with outputTotalForce vs outputSystemForce). LAMMPS users generally write their own input files and can edit them if needed, especially for a seldom used feature (which you are now proposing to use more often for eABF).

As one possibility, the proxy already provides the integration time step through colvarproxy::dt(), which could be used as a time unit by expressing relevant time constants in numbers of MD steps (see e.g. targetNumSteps) as the values of the new keywords. For NAMD, the unit of time is always fs, and it is even possible to automatically convert the values from the old keywords to the new keywords and preserve full backward compatibility. (You don't want to break an ongoing simulation of step5_assembly.psf...).

I propose the following new keywords:

  • extendedTimeConstant (in fs) -> extendedTimePeriod (in # of steps or in fs)
  • extendedLangevinDamping (in ps^-1) -> extendedLangevinDampTime (in # of steps or in fs)
    where I'm OK with either steps or fs.

Once all time constants are at least in the same unit, I can add support for other units options for LAMMPS if needed, and disable the older keywords. Few have used them yet, and I can simply warn them as courtesy so that they do not run into an error condition at runtime.

@jhenin
Copy link
Member Author

jhenin commented Apr 7, 2017

Sounds good. I'm sure I've simulated step5_assembly.psf before... who am I to judge.

I think the "fs (NAMD) or LAMMPS time unit" option is ideal. But then I would introduce the new keywords at the same time as support for the LAMMPS units, so that the new keywords have their final meaning from the onset. That's something I can do btw, it's not the more arcane parts of the LAMMPS proxy.

@giacomofiorin
Copy link
Member

If you use steps, then there is nothing to do for LAMMPS.

If you keep using fs, the LAMMPS proxy will need to get the unit style from the LAMMPS object. The Update object (pointer is stored in the LAMMPS object) has a unit_style member that can be accessed (see e.g. in finish.cpp).

@jhenin
Copy link
Member Author

jhenin commented Apr 7, 2017

I see the proxy already accesses _lmp->force->femtosecond. My understanding is that LAMMPS takes care of setting that according to unit_style, right?

@giacomofiorin
Copy link
Member

Theoretically yes (see update.cpp or ask @akohlmey).

But depending on what operations we do with the timestep I'm not sure if it's always safe to count on it for e.g. units si...

This is why I'd like to disable the default values for the new keywords in LAMMPS, if you do keep fs units for them.

@jhenin
Copy link
Member Author

jhenin commented Apr 7, 2017

The only usage we make of timestep is within the extended Lagrangian integrator, the plan is to keep it that way. And the unit would be 'time unit' in general, and fs only in NAMD.

I'm ok to have no default value - I can always recommend one in the doc, specifying its unit.

@giacomofiorin
Copy link
Member

If you are removing the default values the issue is most definitely closed with the commit that does it.

@giacomofiorin
Copy link
Member

Hi, can you push any commits done for this on a branch? I think that with the refactored colvarproxy class, unit handling would go nicely in the colvarproxy_system class.

@jhenin
Copy link
Member Author

jhenin commented Nov 19, 2018

This issue has gained a new degree of urgency with the Gromacs proxy. The current strategy that mostly uses the back-end's units both internally and for IO purposes makes it complicated to run the tests and compare their results.

At this point my favored approach would be to create options to specify units for I/O, but keep the back-end units internally to avoid the overhead of unit conversion when talking to the back-end.
Anyway we approach it, CVM will have to become more explicitly aware of the back-end units to perform the conversions (including energy and force units).

@giacomofiorin
Copy link
Member

Well, the introduction of different units for I/O will also need to be tested carefully, requiring comparable work to the testing of a new interface.

Can you try running the tests in the native units of the host code, and then load the trajectories using the plot_colvars_traj.py script? Converting the units by multiplying the resulting NumPy arrays by the appropriate conversion factor. That can be automated to some extent and become prohibitive only if you want to run the full array of tests, which is mostly testing internal features, not interface code.

As a general rule, I'd like to avoid major code changes without first exploring alternatives to spiff for automated yet accurate testing.

@giacomofiorin giacomofiorin mentioned this issue Jan 31, 2019
5 tasks
@giacomofiorin
Copy link
Member

(Continuing from PR #190)

I think it's best to keep using the same units as the MD engine: it would be a nightmare to switch, and also to deal with script callbacks. More in general, Colvars has so far been well integrated in each engine (with varying depth) and for the most part interchangeable with its other features.  Specifying a separate unit system for runtime objects would lead to an independent code, which is linked to the main engine following something akin a multi-physics simulation: this use case is already well covered by PLUMED.

The issue of moving input files from one code to another is real, but is mostly arising now thanks to the Dashboard, which enables researchers to write complex inputs without quantitative and technical skills sufficient to convert their input's units with small effort.

It should be possible, then, to reuse config files that work in VMD in other codes with different units. There are possibly two alternatives to support it:

  • Specify a unit system when loading configuration strings only (defaulting to the one of the MD engine). This requires a bit of work to handle, but the places where to handle are clear: these are the calls to get_keyval() and the coordinate file loading function (mostly the internal XYZ one: LAMMPS lacks a PDB reader, and I'm sure the GROMACS one handles unit conversions). After Parser improvements #196 is merged, adding this should be manageable. I'm thinking of a units flag that is only valid in the scope of the string or file that contains it:
units vmd  # This seems a well-defined identifier to me
colvar { ... usual syntax ... }
harmonic { ... same ... }
  • The other option is to keep input parsing as it is, but handle the conversion in the Dashboard, via a mix of Tcl and C++ code. This is arguably more maintainable (bugs and omissions can be caught and fixed by upgrading the Dashboard itself without waiting for a new VMD release), but also probably more work.

@jhenin
Copy link
Member Author

jhenin commented Jan 31, 2019

I'm thinking of a units flag that is only valid in the scope of the string or file that contains it

This sounds like a very good idea to me, because that way the files remain portable. I prefer it to the other option, where you'd have to dedicate a file specifically to Gromacs when saving it, and then if you read it again in VMD you'll get another unit conflict.

So yeah, I vote for option #1 hands down. It's also extensible, reasonably future-proof.

As to the name "vmd", I'm not as sure because vmd doesn't really care about energy units. How about two arguments, length and energy?
units nm kJ/mol

@jhenin
Copy link
Member Author

jhenin commented Feb 27, 2019

Decision reached from in-person conversation: detect incompatible unit systems with units keyword, throw error.

Rationale: we have no mechanism to do unit conversion internally, because we have no way to track the dimensionality of our numerical parameters (especially with user-defined functions).

@jhenin jhenin self-assigned this Oct 17, 2019
@jhenin
Copy link
Member Author

jhenin commented Oct 17, 2019

This question is arising again now that we have a user base trying to make VMD and Gromacs interoperate.

@giacomofiorin
Copy link
Member

Interestingly, this issue is approximately as old as this one:
openmm/openmm#204
and it's quite telling an issue with the unit system is still open in a heavily developed software package.

Having said that, let's assume that a new units keyword takes care of this issue for good (will it?). Here are some proposed values for that keyword:

  • units gromacs
  • units lj
  • units metal
  • units namd
  • units real
  • units vmd
    (Listing here for brevity only the three most relevant unit systems from LAMMPS, but it's easy to support all the others). https://lammps.sandia.gov/doc/units.html

The logic could be the following:

  • With no value, units defaults to the value provided by the colvarproxy implementation (hard-coded for most backends, dynamic for LAMMPS).
  • When a value is provided, an error is raised unless the value coincides with the default.
  • vmd and namd should be treated as synonymous in my opinion, but units real for LAMMPS is slightly different (e.g. electric field) but as far as current input values supported by Colvars they are the same. So that's the dilemma...

When a solution is agreeable, tt's probably best to take the opportunity to also update the values of fundamental constants, such as boltzmann(), and just follow the value from the MD engine. See here: openmm/openmm#2427
Uniformity between MD engines is a long way to go, but uniformity between Colvars and them should be much easier (and clearer for the user).

@jhenin
Copy link
Member Author

jhenin commented Oct 22, 2019

This is roughly what I'm tending towards. With the exception of VMD, which I think can actually handle different unit systems. That's key to letting Gromacs users benefit from the VMD interface, that is the reason I'm getting into this again.

@giacomofiorin
Copy link
Member

In the units branch I see changes that go towards supporting using a different unit system than the MD engine. Obviously, you marked these as WIP for now. But even when finished, it will be difficult to handle all cases, including e.g. callbacks to functions that could rely on either unit system.

How about beginning from just using the units keyword in the configuration and state file to detect incompatibilities? It needs to make it into at least one stable version for each engine before it can begin to be relied upon by the Dashboard.

@jhenin
Copy link
Member Author

jhenin commented Nov 26, 2019

While working on the fairly simple-minded technical solution in #295 I've come across a possible issue in the current master branch. The XYZ reader of colvarmodule does not do any coordinate conversion, implicitly assuming that the coordinates are in the Colvars length unit. However the closest thing there is to a convention, as far as I can see, is that XYZ files have coordinates in Angstrom. VMD, used by different communities, will read them and assume Angstroms.

The Gromacs integration branch already has a change where cvm::load_coords_xyz does the conversion from Angstrom to the current unit. Once merged this would also affect LAMMPS. I suspect that a majority of LAMMPS/Colvars users work in Angstroms, but I'm not sure. My favored course of action would be to change the XYZ parser to always convert from Angstrom, but maybe communicate that to LAMMPS users through some channel.

@giacomofiorin @akohlmey Thoughts?

@jhenin
Copy link
Member Author

jhenin commented Nov 26, 2019

How about beginning from just using the units keyword in the configuration and state file to detect incompatibilities? It needs to make it into at least one stable version for each engine before it can begin to be relied upon by the Dashboard.

With an updated VMD and not-updated MD engines, the worst case scenario is that the units keyword written by the Dashboard must be deleted manually for the MD engine to accept the input. I think this is necessary, as it forces (mostly Gromacs) users to check the compatibility of units manually, until Colvars bundled with the MD engine can do it for them. This will not annoy users who leave the default VMD units: in that case the Dashboard does not write the units keyword.

@akohlmey
Copy link
Member

Once merged this would also affect LAMMPS. I suspect that a majority of LAMMPS/Colvars users work in Angstroms, but I'm not sure. My favored course of action would be to change the XYZ parser to always convert from Angstrom, but maybe communicate that to LAMMPS users through some channel.

@giacomofiorin @akohlmey Thoughts?

In LAMMPS one can query the current unit setting and there are some conversion factors already available. Thus the LAMMPS colvars proxy already contains code that assembles a conversion factor from native LAMMPS units to angstrom, the time step converted into femtoseconds and the value of the Boltzmann constant in native units.

@jhenin
Copy link
Member Author

jhenin commented Nov 26, 2019

the LAMMPS colvars proxy already contains code

@akohlmey It does, but currently the XYZ parser does not use that code, implicitly assuming that coordinates in the XYZ files do not need to be converted. I'm proposing to change that.

@giacomofiorin
Copy link
Member

In principle, forcing the XYZ reader to only use Angstrom units seems reasonable. But before making a final decision it's best to consider also the inline coordinate reader (e.g. refPositions vs. refPositionsFile). I doubt that it would make sense to make that option use Angstrom units as well.

Note: while I am strongly in favor of having well-defined units for each file format, the XYZ format is used in several more codes than VMD; VMD probably just has the most users. In some cases, atomic units are used (although I agree that's not good practice).

So in summary I'd be OK with putting the coordinate conversion of the XYZ reader into master as part of #295. Just want to make sure that it's OK to leave the inline reader as is. I doubt it's used very much because it is most useful only for very few atoms: however, it could also serve the purpose to provide coordinates in "native" units if Angstrom is not useful.

@jhenin
Copy link
Member Author

jhenin commented Nov 26, 2019

My reasoning about the inline reader is that coordinates specified within a config string should be in the same unit system as everything else within that same config string. Whereas XYZ files can be used independently, and usually come from a different workflow.

@giacomofiorin
Copy link
Member

I pushed a commit to #295 that also contains a warning. To avoid flooding the output, the warning is printed only upon first use of cvm::load_coords_xyz(). What do you think?

@jhenin
Copy link
Member Author

jhenin commented Dec 6, 2019

Dealt with by #295 and #303

@jhenin jhenin closed this as completed Dec 6, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Only affects manual and other docs
Projects
None yet
Development

No branches or pull requests

3 participants