Partial region aggregation #99
I think that implementation was too easy... Best to extend the test, I guess...
Co-authored-by: Daniel Huppmann <dh@dergelbesalon.at>
I added deep copies for both
This test would have failed without the use of the deep copy as we would have removed Primary Energy from the list of variables that need to be aggregated. One more point remains though: I deliberately put the numbers so that for common_region_A which is comprised of region_A and region_B (as defined in |
In the end this was a bit trickier than I thought, including all edge cases. Nonetheless, here's my first prototype for partial region aggregation with the comparison feature between model native and aggregated results. Points addressed
Differences found between model native and aggregated results
                                                          model native  aggregated
model scenario region          variable       unit  year
m_a   s_a      common_region_A Primary Energy EJ/yr 2005             1           4
                                                    2010             2           6
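The comparison behind a log message like the one above can be sketched with plain pandas. This is only an illustration, not the code from this PR: the values for region_A and region_B are assumed (chosen so that their sum gives the aggregated 4 and 6 shown in the log), and summation is assumed as the aggregation method.

```python
import pandas as pd

IDX = ["model", "scenario", "region", "variable", "unit", "year"]

# Assumed input: region_A/region_B values are hypothetical, the
# common_region_A values (1 and 2) are the model native ones from the log.
df = pd.DataFrame(
    [
        ["m_a", "s_a", "region_A", "Primary Energy", "EJ/yr", 2005, 1],
        ["m_a", "s_a", "region_A", "Primary Energy", "EJ/yr", 2010, 2],
        ["m_a", "s_a", "region_B", "Primary Energy", "EJ/yr", 2005, 3],
        ["m_a", "s_a", "region_B", "Primary Energy", "EJ/yr", 2010, 4],
        ["m_a", "s_a", "common_region_A", "Primary Energy", "EJ/yr", 2005, 1],
        ["m_a", "s_a", "common_region_A", "Primary Energy", "EJ/yr", 2010, 2],
    ],
    columns=IDX + ["value"],
)

common_region = "common_region_A"
constituents = ["region_A", "region_B"]

# Aggregate the constituent regions (sum) and label the result as the common region.
aggregated = (
    df[df["region"].isin(constituents)]
    .groupby(["model", "scenario", "variable", "unit", "year"], as_index=False)["value"]
    .sum()
    .assign(region=common_region)
    .set_index(IDX)["value"]
)

# Data reported natively by the model at the common-region level.
provided = df[df["region"] == common_region].set_index(IDX)["value"]

# Put both side by side and keep only rows present in both that disagree.
comparison = pd.concat({"model native": provided, "aggregated": aggregated}, axis=1)
overlap = comparison.dropna()
differences = overlap[overlap["model native"] != overlap["aggregated"]]

if not differences.empty:
    print("Differences found between model native and aggregated results")
    print(differences)
```

For floating-point data an approximate comparison (for example `numpy.isclose`) would be more robust than the strict inequality used here.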
Open points
Thanks @phackstock - the mechanics of the implementation look great, but I have a few implementation suggestions...
Thanks for the suggestions @danielhuppmann, I updated the code accordingly. I also added a check for the warning message in Coming back to the open points I raised in my comment two days ago:
Thanks, looks good!
Nice, fine to leave here.
Not quite sure what you mean, maybe make an issue and tackle later...
Now that you specifically ask... It might make sense to change the implementation strategy. Do all aggregation first, then do the comparison. I gave it a try, see phackstock#2 Additional benefits: faster performance (because filtering and comparison only once, not in every iteration), and only one warning message per model.
The suggestion would also take care of this.
The suggestion would also take care of it, because it filters to all valid variables provided at a common region.
Thanks a lot for the PR @danielhuppmann, we can continue the discussion over there. |
…egation-alternative Alternative implementation for partial region aggregation
Updates

Tests
Added unit tests for partial region aggregation covering the following cases:
Tests also check logging output.

Compare & merge function
@danielhuppmann I took your compare & merge for provided and aggregated results and put it in a dedicated function. Now

Adjust pyam log level
In order to get rid of some warnings from pyam regarding empty data frames as a result of the comparison between provided and aggregated results, I extended the context manager

Open points
From my side the only thing that's remaining is to update the docs. I would put this here: https://nomenclature-iamc.readthedocs.io/en/latest/usage.html#regionprocessor, if that works for you @danielhuppmann.
When writing the documentation I stumbled upon a possible bug in the way partial region aggregation is performed.

- Variable B:
    unit: EJ/yr
    region-aggregation:
      - Variable B:
          method: max

with the following model mapping:

model: m_a
common_regions:
  - World:
    - region_A
    - region_B

then providing data such as this:

[
    ["m_a", "s_a", "region_A", "Variable B", "EJ/yr", 1, 2],
    ["m_a", "s_a", "region_B", "Variable B", "EJ/yr", 3, 4],
    ["m_a", "s_a", "World", "Variable B", "EJ/yr", 4, 6]
]

and performing region aggregation would give us, according to our "provided data takes precedence over aggregated"-rule:
which would be incorrect as
What are your thoughts on this @danielhuppmann? |
Added some documentation about partial region aggregation. As touched on before, the docs will probably need another re-work soon, but that's a separate issue. |
Re the potential bug or inconsistency in the partial region aggregation... I do not think that this is a bug or problem (from a technical-implementation point of view). If "Variable B" is defined such that the maximum across regions should be reported at the "World" level, and a team reports the sum instead of the max, this is clearly a reporting error. The package can help identify reporting errors by writing a warning to the log, but it is not the purpose of the package to be smarter than the user... |
Ah yes fair enough, then it should be all good. The warning is written already anyway. |
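To make the discussion above concrete, here is a small pandas sketch using the data from the example. It is an illustration only and does not call the package itself; the year labels 2005 and 2010 are assumed, since the original snippet does not name the two value columns.

```python
import pandas as pd

# Data from the example above; the year labels 2005 and 2010 are assumed.
years = [2005, 2010]
data = pd.DataFrame(
    [
        ["m_a", "s_a", "region_A", "Variable B", "EJ/yr", 1, 2],
        ["m_a", "s_a", "region_B", "Variable B", "EJ/yr", 3, 4],
        ["m_a", "s_a", "World", "Variable B", "EJ/yr", 4, 6],
    ],
    columns=["model", "scenario", "region", "variable", "unit"] + years,
)

regional = data[data["region"].isin(["region_A", "region_B"])]
provided_world = data.loc[data["region"] == "World", years].iloc[0]

# "Variable B" is defined with method "max", so this is the expected World value.
aggregated_max = regional[years].max()
# The provided World value happens to match the sum instead.
aggregated_sum = regional[years].sum()

# Provided data is kept, but the mismatch with the max-aggregation is flagged.
mismatch = provided_world != aggregated_max
if mismatch.any():
    print("Provided World data differs from max-aggregated data in:", list(mismatch[mismatch].index))
```

In this sketch the provided World values equal the regional sum rather than the maximum, which is exactly the kind of reporting error that a warning in the log helps to identify.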
Looks good to me!
PS: We should revisit the docs and harmonize the notation (provided data vs. original) in the not-too-distant future...
Agreed on the docs part, I'm also not super happy with the distinction between "Getting Started" and "Usage". I find myself looking for things, not knowing in which part they are. |
closes #96.
New Feature
This PR implements partial region aggregation.
Partial region aggregation happens when a common region is already being reported from a model natively.
This can encompass all variables or only a subset. Additionally this also works for variables that are only reported on this region level.
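The three cases described here (a variable that is both provided and aggregated, one that is only aggregated, and one that is only reported at the common-region level) can be sketched with pandas `combine_first`. This is a hedged illustration of the "provided data takes precedence" rule discussed in this PR, not the actual implementation; all values and the variable names other than Primary Energy are hypothetical.

```python
import pandas as pd

IDX = ["region", "variable", "year"]


def to_series(rows):
    """Build a value Series indexed by region, variable and year."""
    return pd.DataFrame(rows, columns=IDX + ["value"]).set_index(IDX)["value"]


# Hypothetical data reported natively at the common region.
provided = to_series(
    [
        ["common_region_A", "Primary Energy", 2005, 1.0],  # also aggregated below
        ["common_region_A", "Emissions|CO2", 2005, 1.5],  # only reported at this level
    ]
)

# Hypothetical results of aggregating the constituent regions.
aggregated = to_series(
    [
        ["common_region_A", "Primary Energy", 2005, 4.0],
        ["common_region_A", "Final Energy", 2005, 7.0],  # not provided natively
    ]
)

# Provided values win wherever they exist; aggregated values fill the gaps.
merged = provided.combine_first(aggregated)
print(merged)
```

Here Primary Energy keeps the provided value, Final Energy is filled from the aggregation, and Emissions|CO2 is carried over unchanged because it only exists at the common-region level.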
Example
Say we have model native results that look like this:
The corresponding model mapping looks like this:
The region processor now applies the following logic:
The resulting data frame then contains the following:
Open Points