CO2-eq units (plus design decisions) #9

Closed
gidden opened this issue Mar 10, 2020 · 13 comments

@gidden
Member

gidden commented Mar 10, 2020

We were getting quite deep in the weeds in a conversation over #7. My suggestion is that we move that to an issue to hash out opinions that are not directly related to the implementation in the PR.

@khaeru opined (with example implementation here):

My comments to @danielhuppmann, repeated here for the benefit of others:

In the case of GWP, thinking out loud:

  • Warming per se is measured as [temperature] e.g. of global mean surface air.

    • It can also be measured as radiative forcing, e.g. [power] / [length]² (W/m²).
  • These changes (measured in [temperature] or [power]/[length]²) are caused by (under certain assumptions) a certain amount of carbon.

  • Carbon in turn can be measured in [mass] or [substance].

  • So one way (not the common one, I'll get to that) to define a context would be like this. I am copying here the syntax used by Pint:

    [mass] -> [temperature]: value * gwp
    
  • In this case, one supplies the gwp for carbon per se, i.e. in [temperature]/[mass], e.g. kelvin per gigatonne (the exact units don't matter). (Also, per this, maybe the parameter should be a for radiative efficiency; TBD.)

  • Other GHGs are converted to [temperature] by their own gwp parameter.

  • In layman's thinking, GWP is roughly [mass] of "CO2 that would cause equivalent warming to a certain [mass] of CH4".

  • i.e. these both have the same physical quantity [mass], but they describe different things:

    • One is a certain amount of CH4.
    • The other is a hypothetical amount of CO2 that would cause (…etc.)
  • To convert [mass] of CH4 to [mass] of "CO2 that would cause (…etc.)", one could:

    1. Convert [mass] CH4 to [temperature] using the gwp for CH4.
    2. Convert [temperature] to [mass] using the gwp for CO2.
  • This conversion is clear and explicit because the [temperature] is the same measurement of the same system.

I would prefer to use some variant of this approach, rather than what's in this commit, because it avoids abusing Pint by introducing physical quantities that are not actually physical quantities. (Using the definition “A physical quantity is a property of a material or system that can be quantified by measurement,” [mass] or [substance] (number of molecules) are properties of a system, e.g. a certain lump of carbon. “[carbon]” is the system itself, not a property thereof.)
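
For readers following along, the two-step conversion described above can be sketched in plain Python. This is only a rough illustration, not the implementation linked in the thread: the `WARMING_PER_MT` table and all function names are hypothetical, and the factors are placeholders in arbitrary temperature units whose ratio (28) was chosen to match the AR5GWP100 figure for CH4 that appears later in this thread.

```python
# Illustrative warming per unit mass, in arbitrary temperature units per Mt.
# These are placeholders, NOT physical values; only their ratio matters here.
WARMING_PER_MT = {"CO2": 1.0, "CH4": 28.0}


def mass_to_warming(mass_mt, species):
    """Step 1: [mass] of a species -> [temperature] using that species' factor."""
    return mass_mt * WARMING_PER_MT[species]


def warming_to_mass(warming, species):
    """Step 2: [temperature] -> [mass] of a species using the inverse factor."""
    return warming / WARMING_PER_MT[species]


def co2_equivalent(mass_mt, species):
    """Mass of CO2 that would cause the same warming as `mass_mt` of `species`."""
    return warming_to_mass(mass_to_warming(mass_mt, species), "CO2")


print(co2_equivalent(2.0, "CH4"))  # 56.0
```

The intermediate [temperature] value is the same measurement of the same system in both steps, which is what makes the chain explicit.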

@znicholls identified an already implemented solution here

@danielhuppmann expressed some reservations regarding the heaviness implied by using that approach.

The conversation to be decided here is:

  • do we have a preferred approach?
  • if we like the approach by scmdata, are we ok with derivative effects? (moving units to a package)
@gidden
Member Author

gidden commented Mar 10, 2020

@khaeru could you please comment on the approach in scmdata? My understanding is that it separates the concerns between the Registry and a UnitConverter which avoids the 'abuse' of Pint.

@gidden
Member Author

gidden commented Mar 10, 2020

And I should note that, outside of design concerns, I am generally in favor of utilizing existing implementations when it makes sense, especially if we can grow the user/dev community for specific tools. So in general, the more we can share between the IAM and SCM teams tool-wise, the better (many hands making lighter work and all..).

@danielhuppmann
Member

I believe there is a larger issue than the technical implementation lurking in the shadows:

@znicholls stop me if I'm completely wrong here, but afaik there is no unique value of a gwp that could be used to convert across species, e.g., CH4 to CO2e. The values depend on:

  • the assumed timespan of observation (usually GWP-100, but other durations are used)
  • the state of the literature over time (usually using IPCC reports as markers)

So strictly speaking, it doesn't make sense to convert species without specifying the reference.

I think that the behaviour of any implementation should be:

convert CH4 to CO2e -> raises an error or returns None
convert CH4 to CO2e using a specific metric e.g. AR5-GWP100 -> returns a value

However, I don't see that implemented in scmdata? So is my understanding wrong or did you hard-code the reference?

If I'm correct, then we should use pint-contexts to choose the metric.
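
A minimal sketch of the behaviour proposed above, in plain Python rather than Pint. Everything here is hypothetical (`GWP_FACTORS`, `MetricRequiredError`, `to_co2e`); the CH4 factors are the ones quoted for each metric later in this thread.

```python
# CH4 -> CO2e factors per metric, as quoted in this thread.
GWP_FACTORS = {
    "SARGWP100": {"CH4": 21.0},
    "AR4GWP100": {"CH4": 25.0},
    "AR5GWP100": {"CH4": 28.0},
}


class MetricRequiredError(ValueError):
    """Hypothetical error: converting across species needs an explicit metric."""


def to_co2e(value, species, metric=None):
    """Convert `value` of `species` to CO2e, refusing to guess a metric."""
    if species == "CO2":
        return value  # no metric needed for CO2 itself
    if metric is None:
        raise MetricRequiredError(
            f"cannot convert {species} to CO2e without a metric, e.g. 'AR5GWP100'"
        )
    factors = GWP_FACTORS[metric]  # an unknown metric name raises KeyError
    if species not in factors:
        raise ValueError(f"{metric} defines no conversion for {species}")
    return value * factors[species]


print(to_co2e(1.0, "CH4", metric="AR5GWP100"))  # 28.0
```

Raising instead of returning None keeps a silently wrong number out of downstream results, which seems closer to the intent here.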

@znicholls

My understanding is that it separates the concerns between the Registry and a UnitConverter which avoids the 'abuse' of Pint.

Unfortunately we still had to more or less create a new units system in Pint. The UnitConverter is basically just a cache to make Pint slightly faster in the case where you're doing lots of conversions (say constantly passing information back and forth between an SCM and some other model). UnitConverter could be removed if we made a new repository and felt the extra complexity wasn't worth it.
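
To illustrate the caching idea only (this is not scmdata's actual UnitConverter), here is a rough sketch in which the conversion factor is looked up once and then reused. `CachedConverter` and `slow_lookup` are hypothetical names, and the sketch assumes purely multiplicative conversions.

```python
class CachedConverter:
    """Look up a conversion factor once, then reuse it for every value."""

    def __init__(self, source, target, factor_lookup):
        self.source = source
        self.target = target
        # The (potentially slow) lookup happens exactly once, here.
        self._factor = factor_lookup(source, target)

    def convert(self, value):
        return value * self._factor


calls = []


def slow_lookup(source, target):
    """Stand-in for an expensive registry conversion; records each call."""
    calls.append((source, target))
    table = {("Mt CH4 / yr", "Gt CH4 / yr"): 1e-3}
    return table[(source, target)]


converter = CachedConverter("Mt CH4 / yr", "Gt CH4 / yr", slow_lookup)
results = [converter.convert(v) for v in (1.0, 2.0, 3.0)]
print(len(calls))  # 1
```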

@znicholls stop me if I'm completely wrong here, but afaik there is no unique value of a gwp that could be used to convert across species, e.g., CH4 to CO2e

Spot on.

So is my understanding wrong or did you hard-code the reference?

Hard-coded i.e. the user only has access to the metric conversions we've defined. The convention we've used so far is <IPCC report><metric><time horizon> e.g. SARGWP100. The metric conversions are loaded here (it's done like this so they can be lazy loaded i.e. we only read off disk when we need to).
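
As a small illustration of the naming convention, the sketch below splits a name such as SARGWP100 into its three parts. `parse_metric_name` is a hypothetical helper, not part of scmdata, and it assumes the metric part is always the literal string GWP.

```python
import re

# <IPCC report><metric><time horizon>, e.g. "SARGWP100" or "AR5GWP100".
_METRIC_NAME = re.compile(r"^(?P<report>.+?)(?P<metric>GWP)(?P<horizon>\d+)$")


def parse_metric_name(name):
    """Return (report, metric, horizon in years) for a metric name."""
    match = _METRIC_NAME.match(name)
    if match is None:
        raise ValueError(f"not a '<report><metric><horizon>' name: {name!r}")
    return match["report"], match["metric"], int(match["horizon"])


print(parse_metric_name("SARGWP100"))  # ('SAR', 'GWP', 100)
print(parse_metric_name("AR5GWP100"))  # ('AR5', 'GWP', 100)
```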

I think that the behaviour of any implementation should be

I agree and it's how scmdata's unit registry behaves (see below). The keys are exactly what you've mentioned (I think):

  1. you have to specify the metric conversion, if you try to do it without a context then a dimensionality error is raised (also using just 'GWP' won't work as no such context is defined in scmdata)
  2. if you specify a metric, and the conversion exists within that metric, then it all proceeds happily
  3. if you specify a metric, and the conversion doesn't exist within that metric (e.g. you try to convert aerosol emissions to CO2-eq), then a dimensionality error is raised

illustrate-units.txt

>>> import traceback
>>> from pint.errors import DimensionalityError
>>> import scmdata
>>> 
>>> 
>>> scmdata.__version__
'0.4.0'
>>> 
>>> 
>>> ur = scmdata.units.unit_registry
>>> 
>>> 
>>> # 'standard' conversions are all fine
... ch4 = 1 * ur("Mt CH4 / yr")
>>> ch4.to("Gt CH4 / day")
<Quantity(2.737850787132101e-06, 'CH4 * gigametric_ton / day')>
>>> 
>>> 
>>> # if you try to convert CH4 to CO2 without a context, it fails
... try:
...     ch4.to("Gt CO2 / yr")
... except DimensionalityError:
...     traceback.print_exc(limit=0, chain=False)
... 
Traceback (most recent call last):
pint.errors.DimensionalityError: Cannot convert from 'CH4 * megametric_ton / a' ([mass] * [methane] / [time]) to 'CO2 * gigametric_ton / a' ([carbon] * [mass] / [time])
>>> # with a valid context, it returns a result
... with ur.context("AR4GWP100"):
...     ch4.to("Mt CO2 / yr")
... 
<Quantity(25.0, 'CO2 * megametric_ton / a')>
>>> # the result depends on the metric
... with ur.context("AR5GWP100"):
...     ch4.to("Mt CO2 / yr")
... 
<Quantity(28.0, 'CO2 * megametric_ton / a')>
>>> with ur.context("SARGWP100"):
...     ch4.to("Mt CO2 / yr")
... 
<Quantity(21.0, 'CO2 * megametric_ton / a')>
>>> 
>>> # simply 'GWP' is not a context
... try:
...     with ur.context("GWP"):
...         ch4.to("Mt CO2 / yr")
... except KeyError:
...     traceback.print_exc(limit=0, chain=False)
... 
Traceback (most recent call last):
KeyError: 'GWP'
>>> 
>>> so2 = 1 * ur("Mt S / yr")
>>> so2.to("Mt SO2/yr")
<Quantity(2.0, 'SO2 * megametric_ton / a')>
>>> 
>>> # if there is no valid conversion, a dimensionality error appears (context or not)
... try:
...     so2.to("Gt CO2 / yr")
... except DimensionalityError:
...     traceback.print_exc(limit=0, chain=False)
... 
Traceback (most recent call last):
pint.errors.DimensionalityError: Cannot convert from 'S * megametric_ton / a' ([mass] * [sulfur] / [time]) to 'CO2 * gigametric_ton / a' ([carbon] * [mass] / [time])
>>> try:
...     with ur.context("AR5GWP100"):
...         so2.to("Mt CO2 / yr")
... except DimensionalityError:
...     traceback.print_exc(limit=0, chain=False)
... 
Traceback (most recent call last):
pint.errors.DimensionalityError: Cannot convert from 'S * megametric_ton / a' ([mass] * [sulfur] / [time]) to 'CO2 * megametric_ton / a' ([carbon] * [mass] / [time])

@danielhuppmann
Member

thanks for the clarification @znicholls! this level of sophistication wasn't clear to me from the source code and the docs...

@znicholls

Our docs definitely leave something to be desired...

@khaeru mentioned this issue Mar 11, 2020
@znicholls

Sorry if I'm missing something, but does #10 mean that the idea of a shared repository is off the table?

@danielhuppmann
Member

Sorry if @khaeru and I moved too quickly here - but let me flip the question back: is there some use case where the light-weight approach followed in #10 is overly constraining so that it would make sense to disentangle part of scm-data into another shared resource?

@znicholls

is there some use case where the light-weight approach followed in #10 is overly constraining so that it would make sense to disentangle part of scm-data into another shared resource?

I don't know, but I know we aimed for these light weight approaches in scmdata and abandoned them to avoid a) always reading off disk and b) problems related to compound units. This makes me nervous about the approach here (to be clear, the nerves don't mean it won't work, it's just not obvious to me that it will work yet).

The reason I was suggesting a shared repository was to make it clear that we all have some ownership (we'd all be contributors with the same access rights) and hence make clear that everyone's time would be worth investing. At the moment I have no stake in this project so it doesn't make sense for me to put time into it, especially given that we already have a fully tested solution. A shared repository might not suit you though, which is totally fine, I would just like a straight yes/no answer.

@danielhuppmann
Member

Frankly, I don't see the need for a(nother) shared repo at this point. scmdata has a solution that works, and this repo should be kept for as long as possible in a state that it can be used by any application just with pint.get_application_registry().load_definitions(<path/to/definitions.txt>).

But I don't know how deep the rabbit hole goes and we might need a more elaborate solution in the future...

About shared ownership - we migrated and renamed this repo partly because of your comment (as an extension of your suggestion iam-units, to show that it is not just relevant for IIASA).

@gidden
Member Author

gidden commented Mar 12, 2020

Hi all - before we collectively decide to abandon a shared repo/resource (see prior reasoning why I do see it as a benefit), can I ask a few clarifying questions?

Specifically for @znicholls:

  1. I would assume reading from disc here happens once per execution - is that true and is that overhead enough to be a concern?

  2. I'm not sure I grok the compound units issue - is this not addressed by Pint contexts?

  3. But in any case, my understanding is that part of scmdata is not only definitions but also optimizations (speeding up certain processes) - so would it be possible to have definitions defined in a shared resource (e.g., here) and optimizations needed defined separately?

  4. Even if it was possible, would the scmdata devs be interested in that design, given that it will be non-zero work to implement and you were first movers here?

@znicholls

znicholls commented Mar 12, 2020

  1. I would assume reading from disc here happens once per execution - is that true and is that overhead enough to be a concern?

Yes it would happen once and no I don't think it's enough to be a concern right now. My only hesitation is that if you want to do this for a lot of gases you could end up with rather big files (but I'd be happy to experiment with that).

2. I'm not sure I grok the compound units issue - is this not addressed by Pint contexts?

There are a few little bugs that need hacks to get around. For example, Pint isn't that smart about dealing with spaces in units: if you add the following test to #10, it will fail as shown.

# new test
def test_conversion_tC_per_year(registry):
    1 * registry("tC / yr")

$ pytest test.py::test_conversion_tC_per_year -r a
============================= test session starts ==============================
platform darwin -- Python 3.7.4, pytest-5.3.5, py-1.8.1, pluggy-0.13.1
rootdir: /Users/znicholls/Documents/AGCEC/Misc/units
collected 1 item

test.py F                                                                [100%]

=================================== FAILURES ===================================
_________________________ test_conversion_tC_per_year __________________________

registry = <pint.registry.UnitRegistry object at 0x117f64410>

    def test_conversion_tC_per_year(registry):
>       1 * registry("tC / yr")

test.py:92: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

...

>           raise UndefinedUnitError(name_or_alias)
E           pint.errors.UndefinedUnitError: 'tC' is not defined in the unit registry

venv/lib/python3.7/site-packages/pint/registry.py:626: UndefinedUnitError

That means you have to define all the compound units (without spaces) explicitly, or always write units with spaces (which looks weird in plots). Most of the Pint-related code in scmdata is dealing with these sorts of problems. So I guess what I was hoping is that we could start with what is in scmdata (given it circumvents most such issues) and then improve docs and simplify from there (so that the repo becomes maintainable by all). I'm happy to do that, I just want it to be of interest before I put in the couple of hours it'll take me to set things up.
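
One possible workaround, sketched here purely for illustration (it is not what scmdata does), is to normalise compact strings such as "tC" into the spaced form before handing them to the registry. `SPECIES` and `expand_compound_unit` are hypothetical; the sketch assumes tokens are separated by whitespace and would mis-split unrelated units that happen to end in a species symbol (e.g. "degC").

```python
# Species symbols mentioned in this thread, longest first so "CO2" wins over "C".
SPECIES = sorted(["CO2", "CH4", "SO2", "C", "S"], key=len, reverse=True)


def expand_compound_unit(unit):
    """Insert a space before a trailing species, e.g. "tC / yr" -> "t C / yr"."""
    expanded = []
    for token in unit.split():
        for species in SPECIES:
            prefix = token[: -len(species)]
            # Only split when something (the mass unit) precedes the species.
            if token.endswith(species) and prefix:
                token = f"{prefix} {species}"
                break
        expanded.append(token)
    return " ".join(expanded)


print(expand_compound_unit("tC / yr"))  # t C / yr
```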

3. so would it be possible to have definitions defined in a shared resource (e.g., here) and optimizations needed defined separately?

Yes definitely

4. Even if it was possible, would the scmdata devs be interested in that design, given that it will be non-zero work to implement and you were first movers here?

Yes, I think there'd be particular value in having more eyes rather than not (same as your initial comment). I think the value of combining the doc skills of this repo with the understanding of the units, gases and metrics available in scmdata (plus its handling of Pint features) would be of benefit to all. I'm also excited by the idea of working out how to integrate units with pandas a bit more (there is an effort here but it needs people to start trying to use it to see how it breaks).

@gidden
Member Author

gidden commented Mar 12, 2020

Ok, thanks @znicholls. Let's then see how this repo progresses and when, as there is interest and sufficient progress, we might be able to make that leap. I'll now close this issue as I think we've discussed as much as we can (others please feel free to reopen as needed).

Just an aside on the units and spaces issue - a thought:

What if we added a helper function in pyam, like squeeze_units(), which did the logic so you could do something like

df.squeeze_units().line_plot()

Anyway, for a future discussion =)
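
As a rough idea of what the string logic behind such a helper might look like (an assumption only; no squeeze_units() exists in pyam as far as this thread shows), the hypothetical `squeeze_unit` below simply drops all whitespace from a unit string for display purposes.

```python
def squeeze_unit(unit):
    """Drop all whitespace from a unit string, e.g. for plot labels."""
    return "".join(unit.split())


print(squeeze_unit("Mt CO2 / yr"))  # MtCO2/yr
```

A DataFrame-level method would presumably apply this to the unit column and leave the values untouched.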
