Support units in binary operations #541

danielhuppmann · 2021-05-31T17:01:05Z

Please confirm that this PR has done the following:

Tests Added
Documentation Added
~~Name of contributors Added to AUTHORS.rst~~
Description in RELEASE_NOTES.md Added

Description of PR

This PR changes the implementation of the binary ops to support units (via iam-units and pint).

Default behaviour:

Cast values and units to a pint.Quantity using the iam-units registry and perform operations with pint unit handling (ignore_units=False)
Warning: Because units are cast to pint before the operation is performed, the returned unit may be formatted differently from the input.
In some of the tests, EJ/yr is transformed to exajoule / year (in full format) or EJ / a (in compact format, implemented here). This is overridden for some methods (addition, multiplication, division) if all units are identical.
Passing a pint quantity as an argument works as well, e.g., df.add("Primary Energy", Quantity(2, "EJ/yr"), "new").
pint will raise appropriate errors if it doesn't know how to handle the units.
In line with common practice in the IAM community, a dimensionless quantity is marked with the unit "" (empty string) rather than the word dimensionless (this survives a round trip to and from pint).

Alternative over-ride:

A user can set ignore_units=True, in which case the operations work only on the values and set the unit to "unknown" (per a suggestion by @znicholls in #536, "Automated unit handling in operations"). If ignore_units is a string, the unit is set to that string instead.
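The decision about the unit of the result can be sketched as a small function. The name `resolve_unit` and its signature are hypothetical, and the fast-pass rules are an assumption: the sketch covers only addition (shared unit kept) and division (identical units cancel to ""), and returns None whenever the unit should be left to pint.

```python
from typing import Optional, Sequence, Union


def resolve_unit(
    method: str,
    units: Sequence[str],
    ignore_units: Union[bool, str] = False,
) -> Optional[str]:
    """Pick the unit of the result of a binary operation (hypothetical helper).

    Returns None when the unit should be left to pint.
    """
    if ignore_units is True:
        # operate on the values only and mark the unit as unknown
        return "unknown"
    if isinstance(ignore_units, str):
        # a string passed as ignore_units is used as the unit of the result
        return ignore_units
    # assumed fast-pass rules: identical units need no round trip through pint
    if len(set(units)) == 1:
        if method == "add":
            return units[0]
        if method == "divide":
            return ""  # identical units cancel; "" marks a dimensionless result
    # otherwise, cast to pint.Quantity and let pint derive the unit
    return None


print(resolve_unit("add", ["EJ/yr", "GWh/yr"], ignore_units=True))
print(resolve_unit("add", ["EJ/yr", "EJ/yr"]))
```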

Known issues:

pandas fillna does not work with non-numeric values (in particular not with pint.Quantity).
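One possible workaround, sketched here as an assumption rather than taken from the PR, is to fill gaps on the plain numeric values before anything is cast to pint.Quantity. The helper `align_and_fill` is hypothetical and uses dicts keyed by year as stand-ins for pandas Series.

```python
from typing import Dict, Optional, Tuple


def align_and_fill(
    a: Dict[int, float],
    b: Dict[int, float],
    fillna: Optional[float] = None,
) -> Dict[int, Tuple[float, float]]:
    """Align two timeseries on their years and fill gaps with a plain number.

    Units would only be attached to the aligned pairs after this step, so the
    fill value never has to be a pint.Quantity.
    """
    pairs = {}
    for year in sorted(set(a) | set(b)):
        x, y = a.get(year, fillna), b.get(year, fillna)
        if x is None or y is None:
            continue  # no fill value given: drop this data point
        pairs[year] = (x, y)
    return pairs


print(align_and_fill({2020: 1.0, 2030: 2.0}, {2020: 0.5}, fillna=0.0))
```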

Side note:

I simplified the test for the apply() function.

To-dos in later PRs (it's already quite a beast):

Make the fillna feature work with pint.
Add ignore_units to the apply() function.
Support custom pint registry instances.
Pass unit formatting options to pint as kwargs.

closes #537
closes #535

codecov · 2021-05-31T17:07:33Z

Codecov Report

Merging #541 (52101b0) into main (a4c6c71) will increase coverage by 0.0%.
The diff coverage is 98.2%.

```diff
@@          Coverage Diff          @@
##            main    #541   +/-   ##
=====================================
  Coverage   93.5%   93.5%
=====================================
  Files         47      47
  Lines       5167    5228   +61
=====================================
+ Hits        4832    4891   +59
- Misses       335     337    +2
```

| Impacted Files | Coverage Δ | |
|---|---|---|
| pyam/_ops.py | `95.0% <95.7%> (-1.0%)` | ⬇️ |
| pyam/core.py | `92.7% <100.0%> (+<0.1%)` | ⬆️ |
| pyam/units.py | `92.9% <100.0%> (ø)` | |
| tests/test_ops.py | `100.0% <100.0%> (ø)` | |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

Last update a4c6c71...52101b0.

danielhuppmann · 2021-06-02T09:53:03Z

pinging @znicholls @gidden @pjuergens @coroa @khaeru - anyone have time for a review or comments on the strategy?

pyam/_ops.py

pjuergens · 2021-06-02T13:47:39Z

So it's just a comment and not a full review. I hope some of the others have time for an in-depth review, because I'd really like to use this feature :)

danielhuppmann · 2021-06-02T16:55:28Z

Thanks for the suggestion, @pjuergens - implemented it and updated the unit tests, because shorter units are generally preferable, I think. Also updated the description of this PR to reflect this change.

However, note that the underlying issue remains, because a user will generally not receive the same notation for a unit - in our case, EJ / yr is now returned as EJ / a. I don't see that as a problem, just something to be aware of...

znicholls

Looks good to me. My major concern would be the casting of each individual value to and from pint.Quantity. That looks like it could be an expensive operation once the data reaches a moderate size. We could test that, or we could just not worry about it for now.
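One way to address the cost concern, sketched here as an assumption and not as what the PR does, is to cast once per distinct unit rather than once per value. The conversion table `FACTORS_TO_EJ_PER_YR` and the helper `convert_per_unit` are hypothetical; in practice the factors could come from pint, e.g. `ureg.Quantity(1, unit).to("EJ/yr").magnitude`.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical conversion factors to a common target unit (EJ/yr).
FACTORS_TO_EJ_PER_YR = {"EJ/yr": 1.0, "PJ/yr": 1e-3}


def convert_per_unit(rows: List[Tuple[str, float]]) -> Tuple[List[float], int]:
    """Convert (unit, value) rows to EJ/yr, looking up each distinct unit once.

    Returns the converted values (in input order) and the number of lookups.
    """
    positions: Dict[str, List[int]] = defaultdict(list)
    for i, (unit, _value) in enumerate(rows):
        positions[unit].append(i)

    result = [0.0] * len(rows)
    lookups = 0
    for unit, indices in positions.items():
        factor = FACTORS_TO_EJ_PER_YR[unit]  # one "cast" per unit, not per value
        lookups += 1
        for i in indices:
            result[i] = rows[i][1] * factor
    return result, lookups


print(convert_per_unit([("EJ/yr", 2.0), ("PJ/yr", 500.0), ("EJ/yr", 1.0)]))
```

The number of conversions then scales with the number of distinct units, which is usually small, instead of with the number of data points.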

znicholls · 2021-06-02T23:11:57Z

pyam/_ops.py

```diff
+    therefore a pint.Quantity is transformed to a pd.Series of quantities.
+    """
+    if isinstance(a, Quantity):
+        return pd.Series([a] * len(b), index=b.index), b
```


We should probably add explicit tests for this.

pint-pandas is an attempt to support something like this. At the moment I don't think it works for pyam because the entire series has to have the same unit, but maybe something to keep in mind for the future.

danielhuppmann · 2021-06-03T06:34:24Z

Thanks for the comment @znicholls! I had a look at pint-pandas, but as you write, it assumes that an entire column in a dataframe has the same unit. This is not something that pyam satisfies, not even on a per-variable level: a user could currently have a scenario ensemble where "Primary Energy" is in "EJ" for some scenarios and in "GWh" for others. Something to discuss for a pyam v2.0 release...
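Whether a dataset could even fit a one-unit-per-column model can be checked by collecting the units used for each variable. This is a hypothetical stdlib sketch: the rows are simplified (model, scenario, variable, unit) tuples standing in for the index of a pyam IamDataFrame, and `variables_with_mixed_units` is not a pyam function.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# Simplified stand-in for an index row: (model, scenario, variable, unit)
Row = Tuple[str, str, str, str]


def variables_with_mixed_units(rows: Iterable[Row]) -> Dict[str, Set[str]]:
    """Return the variables that use more than one unit across the ensemble."""
    units: Dict[str, Set[str]] = defaultdict(set)
    for _model, _scenario, variable, unit in rows:
        units[variable].add(unit)
    return {var: found for var, found in units.items() if len(found) > 1}


rows = [
    ("model_a", "scen_1", "Primary Energy", "EJ"),
    ("model_a", "scen_2", "Primary Energy", "GWh"),
    ("model_a", "scen_1", "Population", "million"),
]
print(variables_with_mixed_units(rows))
```

Any variable returned here could not be stored in a column that carries a single unit without first converting its values.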

About speed and casting, I'd follow the strategy I was taught by @gidden - first make it work, then make it fast...

znicholls · 2021-06-03T06:39:53Z

> About speed and casting, I'd follow the strategy I was taught by @gidden - first make it work, then make it fast...

It's a great strategy!

pjuergens

Basically just two minor comments concerning documentation. As far as I can tell, the code looks good to me. Thanks for the work!

pyam/_ops.py

pyam/core.py

Co-authored-by: pjuergens <74722312+pjuergens@users.noreply.github.com>

pjuergens

In my opinion it's good to go :)

danielhuppmann added 3 commits May 31, 2021 14:52

Use internal _data directly in units conversion module

d8edad1

Update units in addition tests

1dfaf74

Implement operations on units, deactivate fillna tests

7bb1333

danielhuppmann self-assigned this May 31, 2021

danielhuppmann added data back-end Anything related to the (timeseries) data back end implementation data-ops labels May 31, 2021

danielhuppmann added 5 commits June 2, 2021 09:37

Add ignore_units feature to all binary-ops

e90c575

Activate more tests

215e2aa

Implement fast-pass for some methods and re-activate fill-na tests

4897bdd

Add to release notes

7756593

Appease stickler

d377db8

danielhuppmann marked this pull request as ready for review June 2, 2021 09:42

One more pass at clean-up

84814cd

pjuergens reviewed Jun 2, 2021


pyam/_ops.py (outdated, resolved)

danielhuppmann added 2 commits June 2, 2021 18:48

Implement suggestion by @pjuergens to use compact pint unit format

3fffc95

Make black

1bccef6

znicholls reviewed Jun 2, 2021


pjuergens suggested changes Jun 4, 2021


pyam/_ops.py (outdated, resolved)

pyam/core.py (resolved)

danielhuppmann and others added 2 commits June 4, 2021 12:06

Add notes about unit-formatting to docstrings per comment by @pjuergens

b45f6ad

Minor edit by @pjuergens

a5b9614

Co-authored-by: pjuergens <74722312+pjuergens@users.noreply.github.com>

pjuergens approved these changes Jun 4, 2021


Add a See Also section to all ops functions and aggregate[_region]()

52101b0

danielhuppmann merged commit 8d8aa6b into IAMconsortium:main Jun 8, 2021

danielhuppmann deleted the ops/units branch June 8, 2021 03:54

danielhuppmann mentioned this pull request Jun 10, 2021

Automated unit handling in operations #536

Closed
