Support units in binary operations #541

Merged
danielhuppmann merged 14 commits into IAMconsortium:main from ops/units
Jun 8, 2021

@danielhuppmann danielhuppmann commented May 31, 2021

Please confirm that this PR has done the following:

  • Tests Added
  • Documentation Added
  • Name of contributors Added to AUTHORS.rst
  • Description in RELEASE_NOTES.md Added

Description of PR

This PR changes the implementation of the binary ops to support units (via iam-units and pint).

Default behaviour:

  • Cast values and units to a pint.Quantity using the iam-units registry and perform operations with pint unit handling (ignore_units=False)
    Warning: Casting units to pint quantities before performing operations may change how units are formatted in the result.
    In some of the tests, EJ/yr is transformed to exajoule / year (in full format) or EJ / a (in compact format, implemented here). This is overridden for some methods (addition, multiplication, division) if all units are identical.
  • Passing a pint quantity as argument will work, e.g., df.add("Primary Energy", Quantity(2, "EJ/yr"), "new").
  • pint will raise appropriate errors if it doesn't know how to handle the units.
  • In line with common practice in the IAM community, a dimensionless quantity is marked as unit "" (empty string) rather than the word dimensionless (this works in a round-trip to-from-to pint).

Alternative override:

Known issues:

  • pandas fillna does not work with non-numeric values (in particular not with pint.Quantity).

Side note:

  • I simplified the test for the apply() function.

To-dos in later PRs (it's already quite a beast):

  • Make the fillna feature work with pint.
  • Add ignore_units to the apply() function.
  • Support custom pint registry instances.
  • Pass unit formatting options to pint as kwargs.

closes #537
closes #535

codecov bot commented May 31, 2021

Codecov Report

Merging #541 (52101b0) into main (a4c6c71) will increase coverage by 0.0%.
The diff coverage is 98.2%.

@@          Coverage Diff          @@
##            main    #541   +/-   ##
=====================================
  Coverage   93.5%   93.5%           
=====================================
  Files         47      47           
  Lines       5167    5228   +61     
=====================================
+ Hits        4832    4891   +59     
- Misses       335     337    +2     

Impacted Files      Coverage Δ
pyam/_ops.py        95.0% <95.7%> (-1.0%) ⬇️
pyam/core.py        92.7% <100.0%> (+<0.1%) ⬆️
pyam/units.py       92.9% <100.0%> (ø)
tests/test_ops.py   100.0% <100.0%> (ø)

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@danielhuppmann danielhuppmann self-assigned this May 31, 2021
@danielhuppmann danielhuppmann added the data back-end (anything related to the (timeseries) data back end implementation) and data-ops labels May 31, 2021
@danielhuppmann danielhuppmann marked this pull request as ready for review June 2, 2021 09:42
@danielhuppmann
Member Author

pinging @znicholls @gidden @pjuergens @coroa @khaeru - anyone have time for a review or comments on the strategy?

pyam/_ops.py (outdated review thread, resolved)
@pjuergens
Contributor

So it's just a comment and not a full review. I hope some of the others have time for an in-depth review, because I'd really like to use this feature :)

@danielhuppmann
Member Author

Thanks for the suggestion, @pjuergens - implemented it and updated the unit tests, because shorter units are generally preferable, I think. Also updated the description of this PR to reflect this change.

However, note that the underlying issue remains, because a user will generally not receive the same notation for a unit - in our case, EJ / yr is now returned as EJ / a. I don't see that as a problem, just something to be aware of...

Collaborator

@znicholls znicholls left a comment

Looks good to me. My major concern would be the casting of each individual value to/from pint.Quantity. That looks like it could be an expensive operation as soon as the data is a moderate size. We could test that, or we could just not worry about it for now.

    therefore a pint.Quantity is transformed to a pd.Series of quantities.
    """
    if isinstance(a, Quantity):
        return pd.Series([a] * len(b), index=b.index), b
Collaborator

We should probably add explicit tests for this.

pint-pandas is an attempt to support something like this. At the moment I don't think it works for pyam because the entire series has to have the same unit, but maybe something to keep in mind for the future.

@danielhuppmann
Member Author

Thanks for the comment @znicholls! I had a look at pint-pandas, but as you write, it assumes that an entire column in a dataframe has the same unit. pyam does not guarantee this, not even on a per-variable level: a user could currently have a scenario ensemble where "Primary Energy" is in EJ for some scenarios and in GWh for others. Something to discuss for a pyam v2.0 release...

About speed and casting, I'd follow the strategy I was taught by @gidden - first make it work, then make it fast...

@znicholls
Collaborator

About speed and casting, I'd follow the strategy I was taught by @gidden - first make it work, then make it fast...

It's a great strategy!

Contributor

@pjuergens pjuergens left a comment

Basically just two minor comments concerning documentation. As far as I can tell the coding looks good to me. Thanks for the work!

pyam/_ops.py (outdated review thread, resolved)
pyam/core.py (review thread, resolved)
danielhuppmann and others added 2 commits June 4, 2021 12:06
Co-authored-by: pjuergens <74722312+pjuergens@users.noreply.github.com>
Contributor

@pjuergens pjuergens left a comment

In my opinion it's good to go :)

@danielhuppmann danielhuppmann merged commit 8d8aa6b into IAMconsortium:main Jun 8, 2021
@danielhuppmann danielhuppmann deleted the ops/units branch June 8, 2021 03:54