Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a function to aggregate a variable to a region from subregions #207

Merged
merged 24 commits into from Apr 29, 2019

Conversation

danielhuppmann
Copy link
Member

@danielhuppmann danielhuppmann commented Mar 7, 2019

Please confirm that this PR has done the following:

  • Tests Added
  • Documentation Added
  • Description in RELEASE_NOTES.md Added

Description of PR

This PR refactors the check_aggregate_regions() function into a separate aggregate_region() function and a check_...() function to use it in scenario postprocessing (calculating regional values rather than just checking that the aggregation is correct).

The important issue here is that any variables components that are only defined at the region (e.g., 'World') should be added to the regional total. Say that CO2 emissions from air travel are only accounted for at the global level, the following output should be returned:

Input

region variable unit 2020
Europe Emissions|CO2 GtCO2 5
Africa Emissions|CO2 GtCO2 3
World Emissions|CO2|Air Travel GtCO2 1

Expected output

region variable unit 2020
World Emissions|CO2 GtCO2 9

Refactoring as part of this PR

While implementing, there were a number of issues that I believed should be improved even though they break the API.

@znicholls, you implemented the first version of this, can you mark the items below if you agree with the change?

  • all [check_]aggregate[_regions]() functions: refactor units to unit for closer resemblance to df.filter()
  • the check_aggregate...() functions return the index columns in the standard IAMC-order
    (i.e, [model, scenario, region, variable, unit])
  • the function check_aggregate_regions() was renamed to check_aggregate_region() because it only ever checks one region at a time
  • the function check_aggregate_regions() takes a kwarg subregions for an optional list of regions to aggregate and a kwarg components for variable components to be included at the region level
    (before, it was not possible to select custom variable components and components referred to the custom subregions, differing from the use in check_aggregate()
  • _apply_filters() interprets col=None as no filter applied, i.e, all True (before, it would return all False). This streamlines passing the unit filter from the [check_]aggregate[_regions]() and is in line with how pandas treats slice(None).
  • _apply_filters() takes kwargs, not a dict

@coveralls
Copy link

Coverage Status

Coverage increased (+0.1%) to 84.785% when pulling 52bd4b2 on danielhuppmann:aggregate_region into 381c4f6 on IAMconsortium:master.

@coveralls
Copy link

coveralls commented Mar 7, 2019

Coverage Status

Coverage increased (+0.08%) to 91.918% when pulling e38afc5 on danielhuppmann:aggregate_region into 7d695a6 on IAMconsortium:master.

@gidden
Copy link
Member

gidden commented Mar 7, 2019

Hey @danielhuppmann quick first question. I think we already have an aggregate_regions style function in map_region. Is there overlap here? Happy to use any suggestions you have to improve/change it. Plus I see the docstring is definitely wrong in its summary...

pyam/core.py Show resolved Hide resolved
@znicholls
Copy link
Collaborator

* `_apply_filters()` takes `kwargs`, not a `dict`

Probably need to ask @gidden about this one I think?

Otherwise looks very nice to me

@danielhuppmann
Copy link
Member Author

danielhuppmann commented Mar 11, 2019

Hey @danielhuppmann quick first question. I think we already have an aggregate_regions style function in map_region. Is there overlap here? Happy to use any suggestions you have to improve/change it. Plus I see the docstring is definitely wrong in its summary...

  1. why do you think that the docstring is wrong?

  2. well, I didn't have map_regions() on my radar... dug into it a bit now.

    One could use aggregate_region() iteratively to get map_regions(), see the example in footnote [1].
    Whether it's worth the effort, I can't tell...

    main distinction

    • map_regions()
      • implicitly assumes that the variable tree is complete
      • supports different mappings of subregions to regions per model
      • returns an average (?) over each subregion, see footnote [2]
      • takes a table with two columns as input, e.g.,
        pd.DataFrame([['NAF', 'R5MAF'], ['ME', 'R5MAF']], columns=['region', 'r5_region'])
    • aggregate_region()
      • intended to build up a region "tree" similar to a variable tree (calling it recursively over a variable tree and region hierarchy from bottom up)
      • implicitly assumes a common (or at least not conflicting) region names
      • includes variable components that are only defined at the region level (not the subregions), see Input and Expected Output in the description of this PR.
      • takes the mapping as kwargs, e.g. region='R5MAF', subregions=['NAF', 'ME'] (see footnote [1])

[1] e.g., for df = IamDataFrame(REG_DF).filter(model='IMAGE') where REG_DF is from conftest.py

df.aggregate_region(variable='Primary Energy', region='R5MAF', subregions=['NAF', 'ME'],
                    components=[])

is similar to (except for the "average" issue in [2])

df.map_regions('r5_region').timeseries()

where region='R5MAF', subregions=['NAF', 'ME'] could also be read dynamically from run_control() as is implemented in map_regions().

[2] calling IamDataFrame(REG_DF).map_regions('r5_region').timeseries(), does NOT return the World timeseries.

@gidden
Copy link
Member

gidden commented Mar 29, 2019

I meant map_regions docstring is wrong, sorry! =)

@gidden
Copy link
Member

gidden commented Mar 29, 2019

Ok, just getting back to this now. I agree that there are differences between the two, but my suspicion here is that we should harmonize them into one function (or two, but different).

The original goal of map_regions was to be able to take either a many-to-one or one-to-many mapping and plot them. Perhaps it would be better to break this into two different functions, one to aggregate, and one to "paint" (for lack of a better word). What do you think? In principle, it should be relatively easy to update the tests/tutorials that go in "that direction" (many-to-one).

What do you think?

@danielhuppmann
Copy link
Member Author

Agree that the two could, maybe should, be refactored into a data-manipulation and a “paint” function. But...

aggregate_regions() has the extra feature to automatically search for and add subcategories of the one variable and one region that it operates on that are not defined in the subregions.

map_regions() just aggregates all variables without consideration of the variable tree and whatever region mapping it gets - or it loads a default region mapping from file and does some model-dependent best-guessing how to apply the mapping.

Merging these two features will require a lot kwargs...

I did go down that rabbit hole to harmonise them and I did not see a light after an hour or two, so I abandoned the effort. We can venture into it again together next week...

@gidden
Copy link
Member

gidden commented Apr 23, 2019

Per in-person discussions, we have tentitavely agreed to break this into two functions:

  1. aggregate_regions() for the many-to-one mapping
  2. downscale_regions() for the one-to-many mapping

@danielhuppmann
Copy link
Member Author

@gidden, merge conflicts resolved, should be good to be merged once the CI passes

pyam/core.py Show resolved Hide resolved
pyam/core.py Show resolved Hide resolved
pyam/core.py Outdated Show resolved Hide resolved
@znicholls
Copy link
Collaborator

Just catching up on this now

Per in-person discussions, we have tentitavely agreed to break this into two functions:

  1. aggregate_regions() for the many-to-one mapping
  2. downscale_regions() for the one-to-many mapping

So in this PR you add aggregate_regions(), in a future PR we will add downscale_regions()?

@danielhuppmann
Copy link
Member Author

Just catching up on this now

Per in-person discussions, we have tentitavely agreed to break this into two functions:

  1. aggregate_regions() for the many-to-one mapping
  2. downscale_regions() for the one-to-many mapping

So in this PR you add aggregate_regions(), in a future PR we will add downscale_regions()?

Yes (but renaming the functions to aggregate_region() and check_aggregate_region() because it is only one region being treated).

And as suggested in the inline discussion above, we could refactor the existing map_regions() to take a region-mapping and then call the functions iteratively as required.

pyam/core.py Outdated Show resolved Hide resolved
@gidden
Copy link
Member

gidden commented Apr 29, 2019

hey @danielhuppmann. it turns out the geopandas dep is more complicated than initially thought - it seems some underlying datasets have changed (perhaps ISO names? I would need to dig further)... For the moment, can you try updating the CI conda installs to geopandas<0.5.0 and I can make an issue

appveyor.yml Outdated Show resolved Hide resolved
.travis.yml Outdated Show resolved Hide resolved
@danielhuppmann
Copy link
Member Author

force-pushed to get rid of my own very sorry efforts at implementing @gidden's suggestion

cherrypicked from #226

@danielhuppmann danielhuppmann mentioned this pull request Apr 29, 2019
@gidden gidden merged commit 4077929 into IAMconsortium:master Apr 29, 2019
@danielhuppmann
Copy link
Member Author

Thanks for the assist with geopandas, @gidden!

I made an issue for the refactoring of map_regions() (#225) as discussed. So should be good to go!

@gidden
Copy link
Member

gidden commented Apr 29, 2019

Woo, thanks so much @danielhuppmann. can you please add that conversation to the map_region() issue?

@gidden
Copy link
Member

gidden commented Apr 29, 2019

hah, beat me to it =)

@danielhuppmann danielhuppmann deleted the aggregate_region branch April 29, 2019 11:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants