Refactor defs to avoid defining non-quantities; expand GWP conversions #19

Merged: 22 commits, Apr 12, 2020

Contributor

@khaeru khaeru commented Apr 6, 2020

Our first functions! This will close #12 and #11.

The docstrings, README, and new tests show the supported functionality:

def convert_gwp(metric, quantity, *species):
    """Convert *quantity* between emissions *species* with a GWP *metric*.

    Parameters
    ----------
    metric : 'SARGWP100' or 'AR4GWP100' or 'AR5GWP100'
        Metric conversion factors to use.
    quantity : str or pint.Quantity or tuple
        Quantity to convert. If a tuple, the arguments are passed to the
        :class:`pint.Quantity` constructor.
    species : sequence of str, length 1 or 2
        Output, or input and output emissions species, e.g. ('CH4', 'CO2') to
        convert mass of CH₄ to GWP-equivalent mass of CO₂. If only the output
        species is provided, *quantity* must contain the name of the input
        species in some location, e.g. 'tonne CH4 / year'.

    Returns
    -------
    pint.Quantity
        `quantity` converted from the input to output species.
    """

import numpy as np
import pytest
from numpy.testing import assert_array_almost_equal

from iam_units import convert_gwp, registry

# EMI_DATA: (metric, expected_value) pairs, defined elsewhere in the test module


@pytest.mark.parametrize('units', ['t {}', 'Mt {}', 'Mt {} / yr'])
@pytest.mark.parametrize('metric, expected_value', EMI_DATA)
def test_convert_gwp(units, metric, expected_value):
    # Bare masses can be converted
    qty = registry.Quantity(1.0, units.format(''))
    expected = registry(f'{expected_value} {units}')
    assert convert_gwp(metric, qty, 'CH4', 'CO2') == expected

    # '[mass] [speciesname] (/ [time])' can be converted; the input species is
    # extracted from the *qty* argument
    qty = '1.0 ' + units.format('CH4')
    expected = registry(f'{expected_value} {units}')
    assert convert_gwp(metric, qty, 'CO2') == expected

    # Tuple of (vector magnitude, unit expression) can be converted where the
    # unit expression contains the input species name
    arr = [1.0, 2.5, 0.1]
    qty = (arr, units.format('CH4'))
    assert_array_almost_equal(
        convert_gwp(metric, qty, 'CO2').magnitude,
        np.array(arr) * expected_value,
    )
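
For readers without the package at hand, the calling convention described in the docstring (one or two species arguments, with the input species read from the unit expression when only the output is given) can be sketched in plain Python. This is a toy illustration, not the code in this PR: `convert_gwp_sketch`, `GWP_CH4` and `KNOWN_SPECIES` are hypothetical names, quantities are simplified to `(magnitude, units)` tuples, and the CH4 factors are the commonly cited 100-year values from the respective IPCC reports.

```python
# Commonly cited 100-year GWP factors for CH4 -> CO2; assumed for illustration
GWP_CH4 = {"SARGWP100": 21.0, "AR4GWP100": 25.0, "AR5GWP100": 28.0}
KNOWN_SPECIES = {"CH4", "CO2"}


def convert_gwp_sketch(metric, quantity, *species):
    """Toy conversion; *quantity* is a (magnitude, units) tuple."""
    magnitude, units = quantity
    if len(species) == 2:
        species_in, species_out = species
    elif len(species) == 1:
        # Only the output species was given: look for the input species
        # among the whitespace-separated tokens of the unit expression
        (species_out,) = species
        found = [tok for tok in units.split() if tok in KNOWN_SPECIES]
        if len(found) != 1:
            raise ValueError(f"cannot infer input species from {units!r}")
        species_in = found[0]
        # Drop the species name so that only physical units remain
        units = " ".join(tok for tok in units.split() if tok != species_in)
    else:
        raise ValueError("species must have length 1 or 2")

    if species_in == species_out:
        factor = 1.0
    elif (species_in, species_out) == ("CH4", "CO2"):
        factor = GWP_CH4[metric]
    else:
        raise NotImplementedError(f"{species_in} -> {species_out}")
    return magnitude * factor, units


print(convert_gwp_sketch("AR4GWP100", (1.0, "Mt"), "CH4", "CO2"))
print(convert_gwp_sketch("AR5GWP100", (2.0, "Mt CH4 / yr"), "CO2"))
```

The second call shows the one-species form: the input species is taken from the unit expression and removed from the returned units.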

@khaeru khaeru self-assigned this Apr 6, 2020
@khaeru khaeru linked an issue Apr 6, 2020 that may be closed by this pull request
@danielhuppmann
Member

Thanks @khaeru - a few questions as I try to wrap my head around this...

  1. I guess having a specific function and the existing context-implementation in parallel is extra hassle and scope for confusion, so do you intend to deprecate the context-gwp-conversion?

  2. If yes - does that mean that any tool using this registry will need to implement a switch in its implementation (if emissions/GWP, use convert_gwp, else do the standard Quantity().to())?

    This would obviously complicate things in applications relative to the current approach... See pyam implementation for reference here, which is agnostic and simply passes on the context arg if provided - and lets pint figure out what to do.

  3. The new implementation is not able to deal with synonyms (which is a really neat feature of pint), not even "CO2e" which is the actual unit after GWP-conversion, not CO2. Is that on the to-do list?

@khaeru
Contributor Author

khaeru commented Apr 7, 2020

  1. I guess having a specific function and the existing context-implementation in parallel is extra hassle and scope for confusion, so do you intend to deprecate the context-gwp-conversion?
  2. If yes - does that mean that any tool using this registry will need to implement a switch in its implementation (if emissions/GWP, use convert_gwp, else do the standard Quantity().to())?

I'm still not sure, but was going to ask for such a pointer to the current usage in pyam so this could be adjusted to make it usable. So thanks! If I understand correctly, the user must supply the context arg if factor is None and they wish to do a conversion using a GWP metric—is that right?

        args = [_reg[to]] if context is None else [_reg[to], context]

Or put another way, if they are converting emissions species using GWPs and fail to supply context, this will not work?

  3. The new implementation is not able to deal with synonyms (which is a really neat feature of pint), not even "CO2e" which is the actual unit after GWP-conversion, not CO2. Is that on the to-do list?

Yes, I think it could be as simple as adding a_CO2e = 1.0 here:

# Dummy base unit used for GWP conversions of [mass] -> [mass]
_gwp = [_GWP]
a_CO2 = 1.0

…but, will test. I also need to add C for Carbon.

@danielhuppmann
Member

Or put another way, if they are converting emissions species using GWPs and fail to supply context, this will not work?

Correct, converting a species in pyam to CO2e without context raises a pint.DimensionalityError. This is tested here.

I also need to add C for Carbon.

Yes, please!

@danielhuppmann
Member

Taking a thought from an inline comment to the larger discussion: how would an application like message_ix or pyam keep track of the species when it's not part of the unit...? What is the intended implementation there?

@khaeru
Contributor Author

khaeru commented Apr 7, 2020

how would an application like message_ix or pyam keep track of the species when it's not part of the unit

In message_ix reporting, at least, any quantity relating to emissions has an emission or e dimension that contains the species name. So supposing a pd.DataFrame all_data with 'value', 'emission', and 'unit' columns:

for (species, unit), data in all_data.groupby(['emission', 'unit']):
    # Use vector conversion for the entire column
    result = convert_gwp('AR5GWP100',
                         (data['value'].values, unit),
                         species, 'CO2e')
    # Produce whatever kind of output is desired

This groups all_data in the largest possible chunks for application of common factors.
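
The same grouping idea can be sketched without pandas or this package. Everything here is a stand-in: `rows` is hypothetical data mirroring the 'value', 'emission' and 'unit' columns, and `FACTOR_TO_CO2E` holds assumed AR5 100-year factors rather than values read from the registry.

```python
from collections import defaultdict

# Hypothetical rows standing in for the 'value', 'emission', 'unit' columns
rows = [
    {"value": 1.0, "emission": "CH4", "unit": "Mt / yr"},
    {"value": 2.5, "emission": "CH4", "unit": "Mt / yr"},
    {"value": 4.0, "emission": "CO2", "unit": "Mt / yr"},
]

# Assumed AR5 100-year factors to CO2-equivalent, for illustration only
FACTOR_TO_CO2E = {"CH4": 28.0, "CO2": 1.0}

# Collect values that share a species and a unit ...
groups = defaultdict(list)
for row in rows:
    groups[(row["emission"], row["unit"])].append(row["value"])

# ... so that a single factor is looked up once per group
converted = {}
for (species, unit), values in groups.items():
    factor = FACTOR_TO_CO2E[species]
    converted[(species, unit)] = [value * factor for value in values]

print(converted)
```

Each (species, unit) pair needs exactly one factor lookup, however many rows it covers.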

@khaeru
Contributor Author

khaeru commented Apr 7, 2020

CO2e and C are now added.

@khaeru
Contributor Author

khaeru commented Apr 7, 2020

I'm still not sure, but was going to ask for such a pointer to the current usage in pyam so this could be adjusted to make it usable.

Also per #16 (comment), here are all the occurrences of 'units' in scmdata for reference.

@danielhuppmann
Member

In message_ix reporting, at least, any quantity relating to emissions has an emission or e dimension that contains the species name.

Pretty sure that this will not be sufficient... If you only record the species and the unit (just the physical quantity), it is not possible to track what the reference is. For example, for carbon dioxide emissions, there is no consensus on whether to measure the quantity in CO2 or C. Similar considerations apply to the HFC species (which are all accounted for as HFC23-equivalent).
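
To make the CO2-versus-C ambiguity concrete, here is a small worked example. It assumes the conventional rounded molar masses (44 g/mol for CO2, 12 g/mol for C); the function names are hypothetical.

```python
# Conventional rounded molar masses in g/mol (an assumption of this sketch)
MASS_CO2 = 44.0
MASS_C = 12.0


def c_to_co2(mass_c):
    """Mass of CO2 containing *mass_c* of carbon."""
    return mass_c * MASS_CO2 / MASS_C


def co2_to_c(mass_co2):
    """Mass of carbon contained in *mass_co2* of CO2."""
    return mass_co2 * MASS_C / MASS_CO2


# The same emissions, reported against the two different references
print(c_to_co2(3.0))  # 3 Mt C expressed as Mt CO2
print(co2_to_c(11.0))  # 11 Mt CO2 expressed as Mt C
```

A bare "3 Mt" or "11 Mt" with species "carbon dioxide" could mean either, which is why the reference has to be recorded somewhere.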

@khaeru
Copy link
Contributor Author

khaeru commented Apr 7, 2020

If you only record the species and the unit (just the physical quantity), it is not possible to track what the reference is. For example, for carbon dioxide emissions, there is no consensus on whether to measure the quantity in CO2 or C. Similar considerations apply to the HFC species (which are all accounted for as HFC23-equivalent).

You're right, and I'm realizing (and adding this to the README now) there are actually as many as three properties:

  1. Original emissions species,
  2. Species in which the GWP-equivalent quantities are expressed (sometimes CO2, C, or HFC23, as mentioned), and
  3. The metric used for conversion.

In various usage that I've seen (could be others):

  • (1) is included in a 'variable' column (along with other things) and in a 'unit' column (along with the units) when no GWP is applied; (2) and (3) are not applicable.
  • If converted, (1) is in a 'variable' column; a 'unit' column contains (2) (instead of (1) as above).
    • Sometimes (3) is explicit, e.g. somewhere in a 'variable' column along with (1).
    • Sometimes (3) is omitted/implicit.
  • In the MESSAGE reporting we have:
    • (1) as an e dimension,
    • (3) as a gwp dimension (see iiasa/message_data#90),
    • We'll probably end up handling (2) by requiring that it is always the same internally, or by adding it as another dimension with a constant value.
    • We convert to the above format(s) in order to pass data to existing tools.

This package shouldn't try to accommodate all current/possible ways of arranging that information in downstream code—only to make sure the conversion is done correctly.
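
As one possible arrangement in downstream code, the three properties listed above could travel with the magnitude and the physical unit in a small record type. This is a sketch only: `EmissionsRecord` and its field names are hypothetical and not part of this package.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EmissionsRecord:
    """Hypothetical container for a value plus its nominal properties."""

    magnitude: float
    unit: str  # physical unit only, e.g. 'Mt / yr'
    species: str  # (1) original emissions species
    equivalent: Optional[str] = None  # (2) species of the GWP-equivalent
    metric: Optional[str] = None  # (3) GWP metric used, if any

    def label(self) -> str:
        """Render the value with its nominal properties spelled out."""
        if self.equivalent is None:
            return f"{self.magnitude} {self.unit} [{self.species}]"
        return (
            f"{self.magnitude} {self.unit} "
            f"[{self.species} as {self.equivalent}, {self.metric}]"
        )


raw = EmissionsRecord(1.0, "Mt / yr", "CH4")
converted = EmissionsRecord(28.0, "Mt / yr", "CH4", "CO2", "AR5GWP100")
print(raw.label())
print(converted.label())
```

When no GWP is applied, (2) and (3) stay empty, matching the first usage pattern described above.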

Member

@danielhuppmann danielhuppmann left a comment

This is obviously an excellent PR from a Python implementation perspective.

But (for the record) I think it is a terribly bad design decision and detrimental to the long-term sustainability of the repository:

TL;DR: it makes everything complicated, potentially confuses users, and requires a lot of extra hoops in downstream applications - for no (relevant) reason.

  1. There is no fundamental reason why defining emission species as base units is a bad idea other than adherence to VIM definitions and intellectual purity.
  2. Users think of "units" as a combination of a physical and a nominal property, so forcing them to wrap their head around a separation will simply cause confusion.
  3. The new docstrings explicitly tell users that "to avoid ambiguity, code handling GHG quantities should also track and output these nominal properties". Which is only necessary because of the pure-but-complicated approach. There is a risk that users will not follow this advice, leading to exactly the kind of errors that using pint plus this repository was intended to mitigate.
  4. The new implementation does not use standard, out-of-the-box pint features, such as the context used in the first GWP conversion implementation, but is forced to introduce new functions.
  5. The new functions are expertly implemented and exhaustively documented - but I see the risk that only a Python expert will be able to make any further changes, reducing the sustainability of this project.
  6. Applications using this repository like pyam (see this PR) cannot apply pint out-of-the-box, like previously just passing a context arg and letting pint figure out what to do. Instead, they need to introduce custom code before calling the new iam-units custom function. In the referenced PR, this extends the codebase from 2 lines applicable for any pint-context to 15 lines of code that require extensive documentation to be intelligible.
  7. The customisation required in the downstream applications limits forward-compatibility. If the same level of purity is applied to passenger-kilometres and other non-VIM-satisfying conventions, pyam, message_ix, etc. will again need to be changed to use the new custom conversion functions, whereas the "context" approach doesn't require any downstream changes.

@khaeru, you have the lead on this repo, so I'll only add this as a comment and not as Request changes - your call how to proceed.

@khaeru
Contributor Author

khaeru commented Apr 9, 2020

@danielhuppmann thanks for these comments!

To explain further, the motivation here is the same worry about users:

Users think of "units" as a combination of a physical and a nominal property.

There is a risk that users will not follow [the] advice that […] to avoid ambiguity, code handling GHG quantities should also track and output these nominal properties.

In particular, in the iTEM community there have been repeated, circular discussions triggered by confusion over definitions and measurement. The root cause of these was the choice to adopt a data format that is a variant of the IAMC format, and to shoehorn many, conceptually-different things into 'variable' and 'unit' strings in an inconsistent way. Analysis codes are thus very long and convoluted. This is the same cause underlying the need for extensive 'variable'-name mappings to move data between different databases like AR5, ADVANCE, SR15, AR6, etc., when the things measured are often the same.

These experiences are why I see promise in applying a carefully-designed data model like SDMX (or at least its logic), in which concepts are clearly separated. Then, for instance:

  • A concept 'GWP metric' can be clearly defined, once.
  • In particular applications, according to need:
    • an attribute for the metric concept can be attached to an entire data set, or
    • a dataset can have a dimension for the metric concept, so it can contain observations with the same quantity converted into CO₂-equivalents by different metrics.
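
The two options can be sketched with plain dictionaries. This is only a loose illustration of the attribute-versus-dimension distinction, not an SDMX implementation; the layouts, key names and the `metric_of` helper are all hypothetical.

```python
# Layout A: the metric concept is an attribute of the whole data set
dataset_a = {
    "attributes": {"gwp_metric": "AR5GWP100", "unit": "Mt / yr"},
    # Keys are (species, year)
    "observations": {("CH4", 2020): 28.0, ("CH4", 2030): 56.0},
}

# Layout B: the metric concept is a dimension, so one data set can hold the
# same quantity converted with different metrics
dataset_b = {
    "attributes": {"unit": "Mt / yr"},
    # Keys are (species, year, gwp_metric)
    "observations": {
        ("CH4", 2020, "AR4GWP100"): 25.0,
        ("CH4", 2020, "AR5GWP100"): 28.0,
    },
}


def metric_of(dataset, key):
    """Return the GWP metric that applies to the observation at *key*."""
    if "gwp_metric" in dataset["attributes"]:
        return dataset["attributes"]["gwp_metric"]
    # Otherwise the metric is the last element of the observation key
    return key[-1]


print(metric_of(dataset_a, ("CH4", 2020)))
print(metric_of(dataset_b, ("CH4", 2020, "AR4GWP100")))
```

In both layouts the unit stays a purely physical unit, and the metric is recoverable for every observation.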

Similarly, in iTEM, we will use concepts (via dimensions and attributes) as far as possible to capture whether 1000 km / year (magnitude, unit) are travelled by a person, vehicle, or even tonne cargo (different values for a concept of 'thing transported'). (per point 6, there are no standard metrics for e.g. occupancy—km / year of persons divided by km / year of a vehicle—so that will not go into this repo/package. I may, as mentioned in Slack, remove the base units if I can't convince myself they fit the VIM.)

The chief goal of this overarching direction is to clarify the meaning of data and prevent downstream confusion and associated costs/headaches. As an incidental benefit, when these non-unit properties are handled explicitly, units can just be units.


As for implementation, I think of at least three kinds of code:

  1. Low-level utility libraries, like this one.
  2. Scientific packages like ixmp and pyam.
  3. Researchers' project- or paper-specific analysis codes.

It's good and appropriate for ixmp, pyam, etc. (2) to do some hand-holding for their users (3), depending on their level of knowledge. That can take different forms:

  • Silently handle their customary behaviour (without objecting that it's 'impure'); pyam does this, and still will after IAMconsortium/pyam#361 ("Use iam_units.convert_gwp").
  • Encourage/educate them on how to be more careful and rigorous. We do this often e.g.:
    • by forcing pyam users to pick a specific GWP metric instead of using one by default, we create a small opportunity for them to think, "Which one do I want to use? Which is appropriate for my analysis?"
    • by requiring ixmp users to enter a 'unit' column with parameter data and pre-add units to a Platform, we create opportunities for them to think about units in the first place. (Without the prompt, they might not.)

Low-level libraries (1) can be more opinionated. In the advice "code handling GHG quantities should…" the word "code" mainly applies to (2). IAMconsortium/pyam#361 does this, and the MESSAGE stack is moving in the same direction. But these packages can/should make their own decisions about how forgiving to be of their users, per the above.

"Code" using this package might also be (3), e.g. if someone uses this package directly in their research code (not via ixmp or pyam). Then they are (or choose to be) encouraged to think of origin species, metric, and equivalent species, and how to track them.


I guess the tl;dr of the response is, "discipline is useful in downstream packages; we can still choose to be forgiving in user code."

@danielhuppmann
Member

We agree on the key principles:

  1. it should be possible to use functions from the low-level utility directly in any code
  2. users should be "encouraged to think of origin species, metric, and equivalent species, and how to track them."
  3. too many concepts are shoehorned into a variable-units convention in the IAMC framework in a non-harmonised manner

We have different views when it comes to implementation.

The current implementation allows the following:

registry('1.23 Mt CH4').to('Gt CO2e', 'gwp_AR4GWP100')

I think that this obviously satisfies (1) and (2). There are many things related to units that can be confusing, but methane emissions expressed in the "unit" Gt CO2e is really not one of them, so I think that (3) is also satisfied (in the sense of not over-shoehorning).

This implementation also automatically raises an error when trying to convert emissions without providing a context, thus safeguarding against incorrect use.

The only "downside" is the violation (I'd call it a flexible interpretation) of the VIM definitions, in that species are defined as base units.

The new implementation defines a new function

convert_gwp('AR4GWP100', '1.23 Mt', 'CH4', 'CO2e')

which is really not more or less intuitive than the previous one and does not satisfy the three items above better. The only "improvement" is the purity of not defining species as a base unit - but it comes at the expense of substantially more elaborate code in this (what used to be low-level) utility package and any downstream packages.

tl;dr: there is a trade-off between discipline (adherence to high-level principles) and simplicity. I'm still convinced that the design decision in this PR unnecessarily overcomplicates this utility and hinders maintainability of the code and any downstream work going forward.

@khaeru
Contributor Author

khaeru commented Apr 12, 2020

tl;dr: there is a trade-off between discipline (adherence to high-level principles) and simplicity. I'm still convinced that the design decision in this PR unnecessarily overcomplicates this utility and hinders maintainability of the code and any downstream work going forward.

I appreciate this concern. I've also taken the time to read the code in openscm-units, and it seems to me this design achieves a superset of features with less code and complexity, and a simpler UX.

As you say at #19 (review), I do take responsibility for these additions. If the functions need adjustment and the code/comments do not, per se, show a clear way to write PRs, I commit to making the changes (please open issues), or to explain the code to those who would make them.

As for code downstream, IAMconsortium/pyam#361 preserves and expands current pyam functionality; and using these functions in iiasa/message_data#116 has been straightforward. I'll be happy to provide advice (on short notice) for how best to use this code in pyam, and certainly to expand the test suite here to ensure there is a clearly-defined and stable API for pyam and other client packages to run against.

Successfully merging this pull request may close these issues.

  • Refactor defs to avoid defining non-quantities
  • Test GWP conversions with vectors
2 participants