Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Long term solution for units in AuxReaders and in general #3792

Open
hmacdope opened this issue Aug 26, 2022 · 9 comments
Open

Long term solution for units in AuxReaders and in general #3792

hmacdope opened this issue Aug 26, 2022 · 9 comments
Labels
Component-Aux interoperability important for making MDAnalysis work with other packages and tools
Milestone

Comments

@hmacdope
Copy link
Member

Is your feature request related to a problem?

Some discussions on #3749 brought up conversion of units in the AuxReaders. Our current solution used in the coordinate readers is to hardcode the units for the relevant format in and then convert while required.

While this may be an OK solution for the coordinates where the units are relatively simple, the more exotic units that are possible to read in an EDR file for example make this process more complicated. This is a general problem for all proposed AuxReaders, as they are generally aimed at providing an interface to those "more exotic" pieces of information obtained from a simulation.

Additionally this solution doesn't scale well to packages like LAMMPs where you can use up to 8 different unit systems. We will probably need consider if some kind of units package is the best way to deal with this long term.

Use of unyt or pint where raised in the related issue #2230

@hmacdope hmacdope added Component-Aux interoperability important for making MDAnalysis work with other packages and tools labels Aug 26, 2022
@hmacdope hmacdope mentioned this issue Aug 26, 2022
9 tasks
@hmacdope
Copy link
Member Author

Personally I think we will need to move to a units management system at some point.

@BFedder
Copy link
Contributor

BFedder commented Aug 26, 2022

We'd also need to decide if more exotic units should be converted or not. Some of the units that can occur in EDR files are

  • kJ/mol
  • K
  • bar
  • unitless
  • nm
  • nm ^ 3
  • kg / m^3
  • bar * nm
  • nm / ps
  • D

and a few of these are not yet mentioned in our docs.

What do we want to do with pressures and dipole moments, for example?

@hmacdope
Copy link
Member Author

hmacdope commented Aug 26, 2022

Most should be derivable from our base units, eg pressure is Force/Area so should be kJ/molA^3? And dipole moment eA. But I agree this is a bit of a challenge and we should enhance docs whatever we decide.

@hmacdope
Copy link
Member Author

I am going to add this as a discussion point for 3.0 ? If we are going to change to a units package 3.0 would be a good time to do it.

@hmacdope hmacdope changed the title Long term solution for units in AuxReaders Long term solution for units in AuxReaders and in general Aug 30, 2022
@hmacdope hmacdope added this to the Release 3.0 milestone Aug 30, 2022
@hmacdope
Copy link
Member Author

hmacdope commented Sep 1, 2022

Xarray just transitioned to using pint. Check out their blog on the subject.

@mattwthompson
Copy link
Contributor

Compared to my comment in the other thread - long enough ago I forget having written it - I'd recommend Pint as the one solution to reach for. unyt is not abandoned or anything but its maintenance has shifted from the original author to a team of other engineers in their organization. More importantly, its adoption is lower than Pint's, which I think has exceeded critical mass, has integration with tons of other packages, etc.

Something else to consider is that Pint makes it easy to create your own unit registry, if you need exotic stuff not explicitly included in the default pint.UnitRegistry. OpenFF has its own units package which is basically a slightly-custom defaults.txt with a little bit of machinery to expose it at the module level. This makes it easy do add shorthands like the various molar kJ/cal units, etc. The major downside of this approach is that unit registries are not directly compatible with each other (so I think an extra conversion layer would be needed):

>>> from openff.units import unit as unit1
>>> import pint
>>> unit2 = pint.UnitRegistry()
>>> 1.0 * unit1.meter + 1.0 * unit2.meter
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/mattthompson/miniconda3/envs/openff-interchange-env/lib/python3.9/site-packages/pint/quantity.py", line 1154, in __add__
    return self._add_sub(other, operator.add)
  File "/Users/mattthompson/miniconda3/envs/openff-interchange-env/lib/python3.9/site-packages/pint/quantity.py", line 139, in wrapped
    return f(self, *args, **kwargs)
  File "/Users/mattthompson/miniconda3/envs/openff-interchange-env/lib/python3.9/site-packages/pint/quantity.py", line 1032, in _add_sub
    if not self._check(other):
  File "/Users/mattthompson/miniconda3/envs/openff-interchange-env/lib/python3.9/site-packages/pint/util.py", line 846, in _check
    raise ValueError(
ValueError: Cannot operate with Quantity and Quantity of different registries.

I'm a bit curious, though, is the proposal to change most uses of numpy.ndarray to pint.Quantity? That'd be a pretty significant change. If so, some other things that come to mind are

  • Some amount of NEP 18 is not implemented in Pint. numpy.linalg.norm comes to mind as one. I don't have a great sense of what is and isn't implemented - most of the basic stuff I do is supported.
  • Some stuff can be slow. I haven't yet figured out why my code is slow, so I can't be much more helpful than that, but when I profile stuff I write that makes heavy use of Quantity objects I find a lot of time is spent doing conversions. I'm probably doing some stuff the wrong way - but performance would something to pay attention to while working on an implementation.

If I'm assuming incorrectly and the idea is to use Pint or something when doing I/O or conversions but keep the core data as numpy.ndarrays with implicit units ... then I guess disregard half of this.

@hmacdope
Copy link
Member Author

hmacdope commented Sep 5, 2022

Thanks @mattwthompson!

Super useful info and good to get a perspective from an ecosystem that uses pint regularly.

Interesting comments about speed, given what you have said I imagine we won't switch to pint.Quantity but will just use for I/O, conversions and tagging objects with units.

@IAlibay
Copy link
Member

IAlibay commented Sep 5, 2022

Yeah, so I think in discussions with @orbeckst @richardjgowers and @lilyminium we were also quite wary of the extra costs, I think there was maybe some rough idea where we would keep .positions as-is but offer a new pint-ed attribute or something along those lines? (someone might remember that conversation better)

@lilyminium
Copy link
Member

My objections lay mainly along having to contort code to fit operations that pint doesn't support, similar to what @mattwthompson's mentioned. Given that we return a copy of a numpy array every time we request .positions speed hasn't been my main concern, especially since you can access the underlying _/magnitude reasonably easily.

xarray has gotten around some of this with their numpy support (np.linalg.norm strips units but does work), so just using their data types could be a cool direction to look into.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Component-Aux interoperability important for making MDAnalysis work with other packages and tools
Projects
None yet
Development

No branches or pull requests

5 participants