-
Notifications
You must be signed in to change notification settings - Fork 119
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support units in binary operations #541
Conversation
Codecov Report
@@ Coverage Diff @@
## main #541 +/- ##
=====================================
Coverage 93.5% 93.5%
=====================================
Files 47 47
Lines 5167 5228 +61
=====================================
+ Hits 4832 4891 +59
- Misses 335 337 +2
Continue to review full report at Codecov.
|
pinging @znicholls @gidden @pjuergens @coroa @khaeru - anyone have time for a review or comments on the strategy? |
So it's just a comment and not a full review. I hope some of the others have time for an in-depth-review, cause I'd really like to use this feature :) |
Thanks for the suggestion, @pjuergens - implemented it and updated the unit tests, because shorter units are generally preferable, I think. Also updated the description of this PR to reflect this change. However, note that the underlying issue remains, because a user will generally not receive the same notation for a unit - in our case, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me. My major concern would be the casting of each individual value to/from pint.Quantity. That looks like it could be an expensive operation as soon as the data is a moderate size. We could test that, or we could just not worry about it for now.
therefore a pint.Quantity is transformed to a pd.Series of quantities. | ||
""" | ||
if isinstance(a, Quantity): | ||
return pd.Series([a] * len(b), index=b.index), b |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should probably add explicit tests for this.
Pint-pandas is an attempt to support something like this. At the moment I don't think it works for pyam because the entire series has to have the same unit, but maybe something to keep in mind for future.
Thanks for the comment @znicholls! I had a look at pint-pandas, but as you write, this assumes that an entire column in a dataframe has the same unit - this is not something that pyam satisfies, not even on a per-variable level. A user could currently have a scenario ensemble where "Primary Energy" is in EJ for some scenarios and "GWh" in others. Something to discuss for a pyam v2.0 release... About speed and casting, I'd follow the strategy I was taught by @gidden - first make it work, then make it fast... |
It's a great strategy! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Basically just two minor comments concerning documentation. As far as I can tell the coding looks good to me. Thanks for the work!
Co-authored-by: pjuergens <74722312+pjuergens@users.noreply.github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In my opinion it's good to go :)
Please confirm that this PR has done the following:
Name of contributors Added to AUTHORS.rstDescription of PR
This PR changes the implementation of the binary ops to support units (via iam-units and pint).
Default behaviour:
pint.Quantity
using the iam-units registry and perform operations with pint unit handling (ignore_units=False
)Warning: By casting units to pints and then performing operations, the return-format may look different.
In some of the tests,
EJ/yr
is transformed toexajoule / year
(in full format) orEJ / a
(in compact format, implemented here). This is over-ridden for some methods (addition, multiplication, division) if all units are identical.df.add("Primary Energy", Quantity(2, "EJ/yr"), "new")
.""
(empty string) rather than the worddimensionless
(this works in a round-trip to-from-to pint).Alternative over-ride:
ignore_units=True
, in which case the operations will work just on the values and setunit
to "unknown" (per suggestion by @znicholls in Automated unit handling in operations #536 (comment)) or to the value ofignore_units
(if it's a string).Known issues:
Side note:
apply()
function.To-dos in later PRs (it's already quite a beast):
ignore_units
to theapply()
function.closes #537
closes #535