-
Notifications
You must be signed in to change notification settings - Fork 11
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor defs to avoid defining non-quantities; expand GWP conversions #19
Conversation
Thanks @khaeru - a few questions as I try to wrap my head around this...
|
I'm still not sure, but was going to ask for such a pointer to the current usage in pyam so this could be adjusted to make it usable. So thanks! If I understand correctly, the user must supply the context arg if factor is None and they wish to do a conversion using a GWP metric—is that right? args = [_reg[to]] if context is None else [_reg[to], context] Or put another way, if they are converting emissions species using GWPs and fail to supply context, this will not work?
Yes, I think it could be as simple as adding units/iam_units/data/emissions/emissions.txt Lines 35 to 38 in e04629a
…but, will test. I also need to add |
Correct, converting a species in pyam to CO2e without context raises a
Yes, please! |
Taking a thought from an inline comment to the larger discussion: how would an application like message_ix or pyam keep track of the species when it's not part of the unit...? What is the intended implementation there? |
In message_ix reporting, at least, any quantity relating to emissions has an for (species, unit), data in all_data.groupby(['emission', 'unit']):
# Use vector conversion for the entire column
result = convert_gwp('AR5GWP100',
(data['value'].values, unit),
species, 'CO2e')
# Produce whatever kind of output is desired This groups all_data in the largest possible chunks for application of common factors. |
CO2e and C are now added. |
Also per #16 (comment), here are all the occurences of 'units' in scmdata for reference. |
Pretty sure that this will not be sufficient... If you only record the species and the unit (just the physical quantity), it is not possible to track what the reference is. For example, for carbon dioxide emissions, there is no consensus whether to have the quantity measured in CO2 or C. Similar aspects apply to the HFC species (which are all accounted converted to HFC23-equivalent). |
You're right, and I'm realizing (and adding this to the README now) there are actually as many as three properties:
In various usage that I've seen (could be others):
This package shouldn't try to accommodate all current/possible ways of arranging that information in downstream code—only to make sure the conversion is done correctly. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is obviously an excellent PR from a Python implementation perspective.
But (for the record) I think it is a terribly bad design decision and detrimental to the long-term sustainability of the repository:
TL;DR: it's making everything complicated, potentially confuses users, and requires a lot of extra hoops in downstream applications - for no (relevant) reason.
- There is no fundamental reason why defining emission species as base units is a bad idea other than adherence to VIM definitions and intellectual purity.
- Users think of "units" as a combination of a physical and a nominal property, so forcing them to wrap their head around a separation will simply cause confusion.
- The new docstrings explicitly tell users that "to avoid ambiguity, code handling GHG quantities should also track and output these nominal properties". Which is only necessary because of the pure-but-complicated approach. There is a risk that users will not follow this advice, leading to exactly the kind of errors that using pint plus this repository was intended to mitigate.
- The new implementation does not use standard, out-of-the-box pint functions like the context for the first gwp-conversion-implementation, but (is forced to) introduce new functions.
- The new functions are expertly implemented and exhaustively documented - but I see the risk that only a Python expert will be able to make any further changes, reducing the sustainability of this project.
- Applications using this repository like pyam (see this PR) cannot apply pint out-of-the-box, like previously just passing a context arg and letting pint figure out what to do. Instead, they need to introduce custom code before calling the new
iam-units
custom function. In the referenced PR, this extends the codebase from 2 lines applicable for any pint-context to 15 lines of code that require extensive documentation to be intelligible. - The customisation required in the downstream-applications limits forward-compatibility. If the same level of purity will be applied to passenger-kilometres and other non-VIM-satisfying conventions, pyam, message_ix, etc. will again need to be changed to use the new custom conversion functions. Whereas the "context" approach doesn't require any downstream changes.
@khaeru, you have the lead on this repo, so I'll only add this as a comment and not as Request changes - your call how to proceed.
@danielhuppmann thanks for these comments! To explain further, the motivation here is the same worries about users:
In particular, in the iTEM community there have been repeated, circular discussions triggered by confusion over definitions and measurement. The root cause of these was the choice to adopt a data format that is a variant of the IAMC format, and to shoehorn many, conceptually-different things into 'variable' and 'unit' strings in an inconsistent way. Analysis codes are thus very long and convoluted. This is the same cause underlying the need for extensive 'variable'-name mappings to move data between different databases like AR5, ADVANCE, SR15, AR6, etc., when the things measured are often the same. These experiences are why I see promise in applying a carefully-designed data model like SDMX (or at least its logic), in which concepts are clearly separated. Then, for instance:
Similarly, in iTEM, we will use concepts (via dimensions and attributes) as far as possible to capture whether The chief goal of this overarching direction is to clarify the meaning of data and prevent downstream confusion and associated costs/headaches. As an incidental benefit, when these non-unit properties are handled explicitly, units can just be units. As for implementation, I think of at least three kinds of code:
It's good and appropriate for ixmp, pyam, etc. (2) to do some hand-holding for their users (3), depending on their level of knowledge. That can take different forms:
Low-level libraries (1) can be more opinionated. In the advice "code handling GHG quantities should…" the word "code" mainly applies to (2). IAMconsortium/pyam#361 does this, and the MESSAGE stack is moving in the same direction. But these packages can/should make their own decisions about how forgiving to be of their users, per the above. "code" using this package might also be (3). e.g. if someone uses this package in their research code directly (not via ixmp or pyam). Then they are (choose to be) encouraged to think of origin species, metric, and equivalent species, and how to track them. I guess the tl;dr of the response is, "discipline is useful in downstream packages; we can still choose to be forgiving in user code." |
We agree on the key principles:
We have different views when it comes to implementation. The current implementation allows the following:
I think that this obviously satisfies (1) and (2), and there are many things related to units that can be confusing, but methane emissions expressed as "unit" This implementation also automatically returns an error when trying to convert emissions without providing a context, so safeguarding against wrong use. The only "downside" is the violation (I'd call it flexible interpretation) that species are defined as base units. The new implementation defines a new function
which is really not more or less intuitive than the previous one and does not satisfy the three items above better. The only "improvement" is the purity of not defining species as a base unit - but it comes at the expense of substantially more elaborate code in this (what used to be low-level) utility package and any downstream packages. tl;dr: there is a trade-off between discipline (adherence to high-level principles) and simplicity. I'm still convinced that the design decision in this PR unnecessarily overcomplicates this utility and hinders maintainability of the code and any downstream work going forward. |
- Also install numpy for CI
I appreciate this concern. I've also taken the time to read the code in openscm-units, and it seems to me this design achieves a superset of features with less code and complexity, and a simpler UX. As you say at #19 (review), I do take responsibility for these additions. If the functions need adjustment and the code/comments do not, per se, show a clear way to write PRs, I commit to making the changes (please open issues), or to explain the code to those who would make them. As for code downstream, IAMconsortium/pyam#361 preserves and expands current pyam functionality; and using these functions in iiasa/message_data#116 has been straightforward. I'll be happy to provide advice (on short notice) for how best to use this code in pyam, and certainly to expand the test suite here to ensure there is a clearly-defined and stable API for pyam and other client packages to run against. |
Our first functions! This will close #12 and #11.
The docstrings, README, and new tests show the supported functionality:
units/iam_units/__init__.py
Lines 17 to 37 in e04629a
units/iam_units/test_all.py
Lines 97 to 119 in e04629a