Refactor defs to avoid defining non-quantities; expand GWP conversions #19

khaeru · 2020-04-06T19:19:14Z

Our first functions! This will close #12 and #11.

The docstrings, README, and new tests show the supported functionality:

Lines 17 to 37 in e04629a

    def convert_gwp(metric, quantity, *species):
        """Convert *quantity* between emissions *species* with a GWP *metric*.

        Parameters
        ----------
        metric : 'SARGWP100' or 'AR4GWP100' or 'AR5GWP100'
            Metric conversion factors to use.
        quantity : str or pint.Quantity or tuple
            Quantity to convert. If a tuple, the arguments are passed to the
            :class:`pint.Quantity` constructor.
        species : sequence of str, length 1 or 2
            Output, or input and output emissions species, e.g. ('CH4', 'CO2') to
            convert mass of CH₄ to GWP-equivalent mass of CO₂. If only the output
            species is provided, *quantity* must contain the name of the input
            species in some location, e.g. 'tonne CH4 / year'.

        Returns
        -------
        pint.Quantity
            `quantity` converted from the input to output species.
        """

units/iam_units/test_all.py

Lines 97 to 119 in e04629a

    @pytest.mark.parametrize('units', ['t {}', 'Mt {}', 'Mt {} / yr'])
    @pytest.mark.parametrize('metric, expected_value', EMI_DATA)
    def test_convert_gwp(units, metric, expected_value):
        # Bare masses can be converted
        qty = registry.Quantity(1.0, units.format(''))
        expected = registry(f'{expected_value} {units}')
        assert convert_gwp(metric, qty, 'CH4', 'CO2') == expected

        # '[mass] [speciesname] (/ [time])' can be converted; the input species is
        # extracted from the *qty* argument
        qty = '1.0 ' + units.format('CH4')
        expected = registry(f'{expected_value} {units}')
        assert convert_gwp(metric, qty, 'CO2') == expected

        # Tuple of (vector magnitude, unit expression) can be converted where
        # the unit expression contains the input species name
        arr = [1.0, 2.5, 0.1]
        qty = (arr, units.format('CH4'))
        assert_array_almost_equal(
            convert_gwp(metric, qty, 'CO2').magnitude,
            np.array(arr) * expected_value,
        )
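
For readers following along, here is a minimal sketch of how the two calling conventions from the docstring (one species or two) could be told apart. It is not the code in this PR: the name `split_species`, the `SPECIES` tuple, and the regular expression are all hypothetical, and the sketch works on plain strings rather than on pint objects.

```python
import re

# Hypothetical species names for illustration only; the package generates
# its own regular expression for emissions species.
SPECIES = ("CH4", "CO2e", "CO2", "N2O")
SPECIES_RE = re.compile(r"\b(" + "|".join(SPECIES) + r")\b")


def split_species(unit_expr, *species):
    """Return (bare unit expression, input species, output species)."""
    if len(species) == 2:
        # Input and output species are given explicitly, so the unit
        # expression is expected to be a bare mass or mass rate.
        return unit_expr.strip(), species[0], species[1]
    if len(species) != 1:
        raise ValueError("species must have length 1 or 2")

    match = SPECIES_RE.search(unit_expr)
    if match is None:
        raise ValueError(f"no input species found in {unit_expr!r}")

    # Drop the species name and collapse the whitespace it leaves behind.
    bare = " ".join(SPECIES_RE.sub("", unit_expr, count=1).split())
    return bare, match.group(1), species[0]


print(split_species("Mt", "CH4", "CO2"))          # explicit input and output
print(split_species("tonne CH4 / year", "CO2"))   # input taken from the units
```

The second call corresponds to the 'tonne CH4 / year' case in the docstring: the species name is removed from the expression so that what remains is a purely physical unit.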

danielhuppmann · 2020-04-06T21:47:25Z

Thanks @khaeru - a few questions as I try to wrap my head around this...

I guess having a specific function and the existing context-implementation in parallel is extra hassle and scope for confusion, so do you intend to deprecate the context-gwp-conversion?
If yes - does that mean that any tool using this registry will need to implement a switch in its implementation (if emissions/gwp, use convert_gwp, else do the standard Quantity().to())?

This would obviously complicate things in applications relative to the current approach... See the pyam implementation for reference here, which is agnostic and simply passes on the context arg if provided - and lets pint figure out what to do.
The new implementation is not able to deal with synonyms (which is a really neat feature of pint), not even "CO2e" which is the actual unit after GWP-conversion, not CO2. Is that on the to-do list?

khaeru · 2020-04-07T07:10:41Z

I guess having a specific function and the existing context-implementation in parallel is extra hassle and scope for confusion, so do you intend to deprecate the context-gwp-conversion?

If yes - does that mean that any tool using this registry will need to implement a switch in its implementation (if emissions/gwp, use convert_gwp, else do the standard Quantity().to())?

I'm still not sure, but was going to ask for such a pointer to the current usage in pyam so this could be adjusted to make it usable. So thanks! If I understand correctly, the user must supply the context arg if factor is None and they wish to do a conversion using a GWP metric—is that right?

        args = [_reg[to]] if context is None else [_reg[to], context]

Or put another way, if they are converting emissions species using GWPs and fail to supply context, this will not work?

The new implementation is not able to deal with synonyms (which is a really neat feature of pint), not even "CO2e" which is the actual unit after GWP-conversion, not CO2. Is that on the to-do list?

Yes, I think it could be as simple as adding a_CO2e = 1.0 here:

units/iam_units/data/emissions/emissions.txt

Lines 35 to 38 in e04629a

    # Dummy base unit used for GWP conversions of [mass] -> [mass]
    _gwp = [_GWP]
    a_CO2 = 1.0

…but, will test. I also need to add C for Carbon.

danielhuppmann · 2020-04-07T07:47:29Z

Or put another way, if they are converting emissions species using GWPs and fail to supply context, this will not work?

Correct, converting a species in pyam to CO2e without context raises a pint.DimensionalityError. This is tested here.

I also need to add C for Carbon.

Yes, please!

danielhuppmann · 2020-04-07T08:11:37Z

Taking a thought from an inline comment to the larger discussion: how would an application like message_ix or pyam keep track of the species when it's not part of the unit...? What is the intended implementation there?

khaeru · 2020-04-07T10:05:13Z

how would an application like message_ix or pyam keep track of the species when it's not part of the unit

In message_ix reporting, at least, any quantity relating to emissions has an emission or e dimension that contains the species name. So supposing a pd.DataFrame all_data with 'value', 'emission', and 'unit' columns:

for (species, unit), data in all_data.groupby(['emission', 'unit']):
    # Use vector conversion for the entire column
    result = convert_gwp('AR5GWP100',
                         (data['value'].values, unit),
                         species, 'CO2e')
    # Produce whatever kind of output is desired

This groups all_data in the largest possible chunks for application of common factors.
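
To make the chunking idea concrete without pandas or pint, here is a self-contained sketch under stated assumptions. The function `convert_rows` and the `AR5GWP100` table are hypothetical stand-ins (they are not part of iam_units), and the factors are the commonly cited AR5 100-year values, entered by hand for illustration.

```python
from collections import defaultdict

# Illustrative AR5 GWP100 factors, entered by hand; not read from iam_units.
AR5GWP100 = {"CO2": 1.0, "CH4": 28.0, "N2O": 265.0}


def convert_rows(rows, factors):
    """Group (value, emission, unit) rows and convert each group at once.

    Returns a dict mapping (emission, unit) to the list of converted values,
    so one factor is looked up per group rather than per row.
    """
    groups = defaultdict(list)
    for value, emission, unit in rows:
        groups[(emission, unit)].append(value)

    return {
        (emission, unit): [v * factors[emission] for v in values]
        for (emission, unit), values in groups.items()
    }


rows = [(1.0, "CH4", "Mt"), (2.0, "N2O", "Mt"), (0.5, "CH4", "Mt")]
print(convert_rows(rows, AR5GWP100))
```

With the real function, the list comprehension would be replaced by a single vectorized `convert_gwp` call per group, as in the loop above.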

khaeru · 2020-04-07T10:07:31Z

CO2e and C are now added.

khaeru · 2020-04-07T10:31:00Z

I'm still not sure, but was going to ask for such a pointer to the current usage in pyam so this could be adjusted to make it usable.

Also per #16 (comment), here are all the occurrences of 'units' in scmdata for reference.

danielhuppmann · 2020-04-07T11:16:25Z

In message_ix reporting, at least, any quantity relating to emissions has an emission or e dimension that contains the species name.

Pretty sure that this will not be sufficient... If you only record the species and the unit (just the physical quantity), it is not possible to track what the reference is. For example, for carbon dioxide emissions, there is no consensus whether to have the quantity measured in CO2 or C. Similar aspects apply to the HFC species (which are all accounted for after conversion to HFC23-equivalent).

khaeru · 2020-04-07T13:34:10Z

If you only record the species and the unit (just the physical quantity), it is not possible to track what the reference is. For example, for carbon dioxide emissions, there is no consensus whether to have the quantity measured in CO2 or C. Similar aspects apply to the HFC species (which are all accounted for after conversion to HFC23-equivalent).

You're right, and I'm realizing (and adding this to the README now) there are actually as many as three properties:

1. Original emissions species,
2. Species in which the GWP-equivalent quantities are expressed (sometimes CO2, C, or HFC23, as mentioned), and
3. The metric used for conversion.

In various usage that I've seen (could be others):

(1) is included in a 'variable' column (along with other things) and in a 'unit' column (along with the units) when no GWP is applied; (2) and (3) are not applicable.
If converted, (1) is in a 'variable' column; a 'unit' column contains (2) (instead of (1) as above).
- Sometimes (3) is explicit, e.g. somewhere in a 'variable' column along with (1).
- Sometimes (3) is omitted/implicit.
In the MESSAGE reporting we have:
- (1) as an e dimension,
- (3) as a gwp dimension (see iiasa/message_data#90),
- We'll probably end up handling (2) by requiring that it is always the same internally; or by adding it as another dimension with a constant value.
- We convert to the above format(s) in order to pass data to existing tools.

This package shouldn't try to accommodate all current/possible ways of arranging that information in downstream code—only to make sure the conversion is done correctly.
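
In downstream code, the three properties listed above could be carried explicitly next to the magnitude and the physical unit. The following is only a sketch of one possible arrangement: the class `EmissionsRecord` and its field names are hypothetical and are not part of this package, pyam, or message_ix.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EmissionsRecord:
    value: float
    unit: str                         # physical unit only, e.g. "Mt / yr"
    species: str                      # (1) original emissions species
    equivalent: Optional[str] = None  # (2) species the equivalent is expressed in
    metric: Optional[str] = None      # (3) GWP metric used for the conversion

    def __post_init__(self):
        # (2) and (3) only make sense together: either both set or both unset.
        if (self.equivalent is None) != (self.metric is None):
            raise ValueError("equivalent and metric must be given together")

    def describe(self) -> str:
        if self.metric is None:
            return f"{self.value} {self.unit} of {self.species}"
        return (
            f"{self.value} {self.unit} of {self.equivalent} "
            f"(from {self.species}, {self.metric})"
        )


print(EmissionsRecord(1.0, "Mt", "CH4").describe())
print(EmissionsRecord(28.0, "Mt", "CH4", "CO2e", "AR5GWP100").describe())
```

Keeping `unit` free of species names is the point of the sketch: the nominal properties live in their own fields, and a package can still render them into 'variable' and 'unit' strings when handing data to existing tools.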

danielhuppmann

This is obviously an excellent PR from a Python implementation perspective.

But (for the record) I think it is a terribly bad design decision and detrimental to the long-term sustainability of the repository:

TL;DR: it's making everything complicated, potentially confuses users, and requires a lot of extra hoops in downstream applications - for no (relevant) reason.

There is no fundamental reason why defining emission species as base units is a bad idea other than adherence to VIM definitions and intellectual purity.
Users think of "units" as a combination of a physical and a nominal property, so forcing them to wrap their head around a separation will simply cause confusion.
The new docstrings explicitly tell users that "to avoid ambiguity, code handling GHG quantities should also track and output these nominal properties". Which is only necessary because of the pure-but-complicated approach. There is a risk that users will not follow this advice, leading to exactly the kind of errors that using pint plus this repository was intended to mitigate.
The new implementation does not use standard, out-of-the-box pint functions (as the first GWP conversion implementation did with contexts), but is forced to introduce new functions.
The new functions are expertly implemented and exhaustively documented - but I see the risk that only a Python expert will be able to make any further changes, reducing the sustainability of this project.
Applications using this repository like pyam (see this PR) cannot apply pint out-of-the-box, like previously just passing a context arg and letting pint figure out what to do. Instead, they need to introduce custom code before calling the new iam-units custom function. In the referenced PR, this extends the codebase from 2 lines applicable for any pint-context to 15 lines of code that require extensive documentation to be intelligible.
The customisation required in the downstream-applications limits forward-compatibility. If the same level of purity will be applied to passenger-kilometres and other non-VIM-satisfying conventions, pyam, message_ix, etc. will again need to be changed to use the new custom conversion functions. Whereas the "context" approach doesn't require any downstream changes.

@khaeru, you have the lead on this repo, so I'll only add this as a comment and not as Request changes - your call how to proceed.

khaeru · 2020-04-09T10:13:15Z

@danielhuppmann thanks for these comments!

To explain further, the motivation here is the same worry about users:

Users think of "units" as a combination of a physical and a nominal property.

There is a risk that users will not follow [the] advice that […] to avoid ambiguity, code handling GHG quantities should also track and output these nominal properties.

In particular, in the iTEM community there have been repeated, circular discussions triggered by confusion over definitions and measurement. The root cause of these was the choice to adopt a data format that is a variant of the IAMC format, and to shoehorn many, conceptually-different things into 'variable' and 'unit' strings in an inconsistent way. Analysis codes are thus very long and convoluted. This is the same cause underlying the need for extensive 'variable'-name mappings to move data between different databases like AR5, ADVANCE, SR15, AR6, etc., when the things measured are often the same.

These experiences are why I see promise in applying a carefully-designed data model like SDMX (or at least its logic), in which concepts are clearly separated. Then, for instance:

A concept 'GWP metric' can be clearly defined, once.
In particular applications, according to need:
- an attribute for the metric concept can be attached to an entire data set, or
- a dataset can have a dimension for the metric concept, so it can contain observations with the same quantity converted into CO₂-equivalents by different metrics.
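
As a rough illustration of the two options (plain dictionaries, not an SDMX implementation), the metric can sit either on the data set as an attribute or on each observation as a dimension. All names here (`metric_for`, `gwp_metric`, the dictionary layout) are assumptions made for the sketch; the values are 1 Mt of CH₄ expressed in Mt CO₂-equivalent with the commonly cited AR4 and AR5 100-year factors.

```python
def metric_for(dataset, observation):
    """Return the GWP metric that applies to one observation.

    A per-observation dimension takes precedence; otherwise the
    data-set-level attribute is used.
    """
    if "gwp_metric" in observation:
        return observation["gwp_metric"]
    return dataset["attributes"]["gwp_metric"]


# Option 1: the metric is an attribute attached to the entire data set.
by_attribute = {
    "attributes": {"gwp_metric": "AR5GWP100"},
    "observations": [{"species": "CH4", "value": 28.0}],
}

# Option 2: the metric is a dimension, so one data set can hold the same
# quantity converted by different metrics.
by_dimension = {
    "attributes": {},
    "observations": [
        {"species": "CH4", "gwp_metric": "AR4GWP100", "value": 25.0},
        {"species": "CH4", "gwp_metric": "AR5GWP100", "value": 28.0},
    ],
}

for ds in (by_attribute, by_dimension):
    for obs in ds["observations"]:
        print(obs["species"], metric_for(ds, obs), obs["value"])
```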

Similarly, in iTEM, we will use concepts (via dimensions and attributes) as far as possible to capture whether 1000 km / year (magnitude, unit) are travelled by a person, vehicle, or even tonne cargo (different values for a concept of 'thing transported'). (per point 6, there are no standard metrics for e.g. occupancy—km / year of persons divided by km / year of a vehicle—so that will not go into this repo/package. I may, as mentioned in Slack, remove the base units if I can't convince myself they fit the VIM.)

The chief goal of this overarching direction is to clarify the meaning of data and prevent downstream confusion and associated costs/headaches. As an incidental benefit, when these non-unit properties are handled explicitly, units can just be units.

As for implementation, I think of at least three kinds of code:

1. Low-level utility libraries, like this one.
2. Scientific packages like ixmp and pyam.
3. Researchers' project- or paper-specific analysis codes.

It's good and appropriate for ixmp, pyam, etc. (2) to do some hand-holding for their users (3), depending on their level of knowledge. That can take different forms:

Silently handle their customary behaviour (without objecting that it's 'impure'). pyam does this, and still will after IAMconsortium/pyam#361 ("Use iam_units.convert_gwp").
Encourage/educate them on how to be more careful and rigorous. We do this often e.g.:
- by forcing pyam users to pick a specific GWP metric instead of using one by default, we create a small opportunity for them to think, "Which one do I want to use? Which is appropriate for my analysis?"
- by requiring ixmp users to enter a 'unit' column with parameter data and pre-add units to a Platform, we create opportunities for them to think about units in the first place. (Without the prompt, they might not.)

Low-level libraries (1) can be more opinionated. In the advice "code handling GHG quantities should…" the word "code" mainly applies to (2). IAMconsortium/pyam#361 does this, and the MESSAGE stack is moving in the same direction. But these packages can/should make their own decisions about how forgiving to be of their users, per the above.

"Code" using this package might also be (3), e.g. if someone uses this package in their research code directly (not via ixmp or pyam). Then they are (or choose to be) encouraged to think of origin species, metric, and equivalent species, and how to track them.

I guess the tl;dr of the response is, "discipline is useful in downstream packages; we can still choose to be forgiving in user code."

danielhuppmann · 2020-04-09T13:49:55Z

We agree on the key principles:

1. it should be possible to use functions from the low-level utility directly in any code
2. users should be "encouraged to think of origin species, metric, and equivalent species, and how to track them"
3. too many concepts are shoehorned into a variable-units convention in the IAMC framework in a non-harmonised manner

We have different views when it comes to implementation.

The current implementation allows the following:

registry('1.23 Mt CH4').to('Gt CO2e', 'gwp_AR4GWP100')

I think that this obviously satisfies (1) and (2), and there are many things related to units that can be confusing, but methane emissions expressed as "unit" Gt CO2e is really not one of them - so I think that (3) is also satisfied (in the sense of not over-shoehorning).

This implementation also automatically returns an error when trying to convert emissions without providing a context, thereby safeguarding against incorrect use.

The only "downside" is the violation (I'd call it flexible interpretation) that species are defined as base units.

The new implementation defines a new function

convert_gwp('AR4GWP100', '1.23 Mt', 'CH4', 'CO2e')

which is really not more or less intuitive than the previous one and does not satisfy the three items above better. The only "improvement" is the purity of not defining species as a base unit - but it comes at the expense of substantially more elaborate code in this (what used to be low-level) utility package and any downstream packages.

tl;dr: there is a trade-off between discipline (adherence to high-level principles) and simplicity. I'm still convinced that the design decision in this PR unnecessarily overcomplicates this utility and hinders maintainability of the code and any downstream work going forward.

khaeru · 2020-04-12T20:52:39Z

tl;dr: there is a trade-off between discipline (adherence to high-level principles) and simplicity. I'm still convinced that the design decision in this PR unnecessarily overcomplicates this utility and hinders maintainability of the code and any downstream work going forward.

I appreciate this concern. I've also taken the time to read the code in openscm-units, and it seems to me this design achieves a superset of features with less code and complexity, and a simpler UX.

As you say at #19 (review), I do take responsibility for these additions. If the functions need adjustment and the code/comments do not, per se, show a clear way to write PRs, I commit to making the changes (please open issues), or to explain the code to those who would make them.

As for code downstream, IAMconsortium/pyam#361 preserves and expands current pyam functionality; and using these functions in iiasa/message_data#116 has been straightforward. I'll be happy to provide advice (on short notice) for how best to use this code in pyam, and certainly to expand the test suite here to ensure there is a clearly-defined and stable API for pyam and other client packages to run against.

khaeru self-assigned this Apr 6, 2020

khaeru linked an issue Apr 6, 2020 that may be closed by this pull request

Test GWP conversions with vectors #11

Closed

khaeru force-pushed the issue/12 branch from 9e6c7f9 to e04629a Compare April 6, 2020 19:24

danielhuppmann reviewed Apr 6, 2020

iam_units/emissions.py (resolved)

danielhuppmann reviewed Apr 6, 2020

iam_units/test_all.py (outdated, resolved)

khaeru force-pushed the issue/12 branch from 5d51366 to e6ee6c1 Compare April 7, 2020 09:43

danielhuppmann reviewed Apr 8, 2020

khaeru added 12 commits April 9, 2020 18:19

Implement convert_gwp and generator for input data files

99f643a

Add generated data files for GWP metrics

fd00cf5

Expand test_convert_gwp to same set of values as test_units_emissions

7cfd948

Handle [mass]/[time] quantities in convert_gwp

50c0174

Also generate an importable regex for emissions species

27dd5b5

Automatically extract input species for convert_gwp

64701c1

Handle tuples as input to convert_gwp

4cf3cd4

Update developers' notes

c7c175d

Bump max complexity; mea culpa

c65e350

- Also install numpy for CI

Add code comments in generated emissions.py

b301a98

Add a_CO2e and test

f74ef7a

Add a_C, a_Ce and tests

c421300

khaeru added 10 commits April 9, 2020 18:20

Expand comments, decrement max complexity

d675e87

Add format_mass() function and tests

7691787

Add new methods to README

b2e3a54

Explicit check for mag is None in convert_gwp

c15180c

Remove implementation of GHG species as base units

522c49f

- For developing IAMconsortium/pyam#361

Remove mention of 'C' from README, old context files

ced37d8

Adjust __all__

d02847e

Roll checks.csv into test_all.py to eliminating parsing code

6ac2676

Update README

4fa0f4d

Add emissions flux and test

8b67b75

khaeru force-pushed the issue/12 branch from 73c68ce to 8b67b75 Compare April 9, 2020 17:03

khaeru merged commit c3a800d into master Apr 12, 2020

khaeru deleted the issue/12 branch April 12, 2020 20:53

khaeru added a commit to khaeru/pyam that referenced this pull request Apr 12, 2020

Remove CI pin to IAMconsortium/units#19 branch

d2e05b9

danielhuppmann mentioned this pull request Apr 14, 2020

Bump units from 2dfb706 to c3a800d IAMconsortium/pyam#366

Closed

gidden mentioned this pull request May 24, 2022

supporting emissions units in operations IAMconsortium/pyam#666

Open
