Add an option to skip existing intermediate variables when aggregating recursively #532

pjuergens · 2021-05-12T08:00:31Z

Please confirm that this PR has done the following:

Tests Added
Documentation Added
Name of contributors Added to AUTHORS.rst
Description in RELEASE_NOTES.md Added

Description of PR

Add an option to skip existing intermediate variables when aggregating recursively, as discussed in #525

codecov · 2021-05-12T08:10:05Z

Codecov Report

Merging #532 (25edae0) into main (8d8aa6b) will increase coverage by 0.0%.
The diff coverage is 100.0%.

@@          Coverage Diff          @@
##            main    #532   +/-   ##
=====================================
  Coverage   93.5%   93.5%           
=====================================
  Files         47      48    +1     
  Lines       5228    5257   +29     
=====================================
+ Hits        4891    4920   +29     
  Misses       337     337

Impacted Files	Coverage Δ
pyam/_aggregate.py	`99.0% <100.0%> (+<0.1%)`	⬆️
pyam/_compare.py	`100.0% <100.0%> (ø)`
pyam/core.py	`92.6% <100.0%> (-0.1%)`	⬇️
tests/conftest.py	`100.0% <100.0%> (ø)`
tests/test_feature_aggregate.py	`98.8% <100.0%> (+<0.1%)`	⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

pjuergens · 2021-05-12T09:55:08Z

I don't understand why stickler-ci is failing, as it doesn't tell me the line where the PR differs from the Black code style.

pyam/core.py

danielhuppmann · 2021-05-12T11:56:21Z

You can commit my suggestion directly to your branch; this should appease Stickler. Going forward, I recommend that you install a Black linter utility on your machine and follow this workflow:

- write the code
- commit the changes
- run the linter
- commit the changes, looking carefully at what parts of the code were changed by the linter
- then push to GitHub

After a while, you'll not need to run the linter anymore because you'll be coding black naturally...
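
Regarding the question of which lines differ from the Black style: a minimal sketch, assuming Black is installed in the local Python environment, of how to ask Black to report the changes it would make without writing them. The helper names `black_diff_command` and `show_black_diff` are made up for this illustration; `--check` and `--diff` are Black command-line options.

```python
import subprocess
import sys


def black_diff_command(path):
    """Build the command that asks Black to report, not apply, its changes."""
    # --check: return a non-zero exit code if a file would be reformatted
    # --diff: print the would-be changes as a diff instead of writing them
    return [sys.executable, "-m", "black", "--check", "--diff", path]


def show_black_diff(path):
    """Run Black on `path` and return (is_clean, diff_text)."""
    result = subprocess.run(
        black_diff_command(path), capture_output=True, text=True
    )
    # exit code 0 means Black would not change anything in `path`
    return result.returncode == 0, result.stdout
```

Calling `show_black_diff("pyam/core.py")` locally would then show the lines that the linter wants to change. The result may still differ from Stickler if the two use different Black versions.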

pjuergens · 2021-05-12T12:16:32Z

thanks, that did the job :)

danielhuppmann · 2021-05-13T15:24:38Z

Looking at the code implementation in a bit more detail, two thoughts:

- Having thought about this feature again, I think it makes more sense not to add an extra kwarg skip_intermediate but instead to do the following:
  - if an intermediate variable exists, check that the values are in line with the aggregate of its components (rather than simply skipping the computation) to ensure internal consistency along the variable hierarchy
  - if it doesn't exist, compute the aggregate of the components and append the data
- I'm a bit concerned that the proposed implementation leads to unexpected behavior if a user has an IamDataFrame with multiple scenarios, where one scenario (scen_a) has some intermediate variables and another (scen_b) has only the lower-level components. With the proposed implementation, scen_b would not be complete.

The (in my opinion expected) behavior could be tested by adding the following line
```
df2.aggregate("Secondary Energy|Electricity|Wind", append=True)
```
in the function test_aggregate_recursive() before appending df2 to df.

This way, the existing test could be modified and you wouldn't need to add an almost-duplicate test.
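
To make the validate-or-append behavior concrete, here is a rough sketch in plain Python. It is not the pyam implementation: the data is a toy dictionary keyed by (scenario, variable) with made-up numbers, the function name `aggregate_and_validate` is hypothetical, and `math.isclose` stands in for whatever comparison pyam would use.

```python
import math

# Toy data keyed by (scenario, variable); the numbers are made up.
data = {
    ("scen_a", "Secondary Energy|Electricity|Wind|Onshore"): 3.0,
    ("scen_a", "Secondary Energy|Electricity|Wind|Offshore"): 1.0,
    ("scen_a", "Secondary Energy|Electricity|Wind"): 4.0,  # intermediate exists
    ("scen_b", "Secondary Energy|Electricity|Wind|Onshore"): 2.0,
    ("scen_b", "Secondary Energy|Electricity|Wind|Offshore"): 0.5,
}


def aggregate_and_validate(data, variable):
    """Sum the direct components of `variable` per scenario.

    Existing aggregates are validated, missing ones are appended.
    """
    prefix = variable + "|"
    totals = {}
    for (scenario, var), value in data.items():
        # a direct component sits exactly one level below `variable`
        if var.startswith(prefix) and "|" not in var[len(prefix):]:
            totals[scenario] = totals.get(scenario, 0.0) + value

    result = dict(data)
    for scenario, total in totals.items():
        key = (scenario, variable)
        if key in data:
            # existing intermediate variable: check it, do not overwrite it
            if not math.isclose(data[key], total):
                raise ValueError(f"Inconsistent aggregate for {key}")
        else:
            # missing intermediate variable: append the computed aggregate
            result[key] = total
    return result


completed = aggregate_and_validate(data, "Secondary Energy|Electricity|Wind")
```

In this sketch, scen_a keeps its existing value after the check passes, and scen_b gets the computed aggregate, so both scenarios end up complete.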

tests/test_feature_aggregate.py

pjuergens · 2021-05-14T09:44:52Z

I changed the implementation from comparing variables beforehand to comparing indices after aggregating, which should now be able to deal with different variable sets in different scenarios, regions or even years.

Concerning your first thought: I'd prefer not to check internal consistency while skipping intermediate variables but rather let the user check it afterwards. When explicitly using this feature, it should be clear that the intermediate variables are not aggregated by pyam and should be checked. Running check_internal_consistency then gives the user more information about which variables differ and whether this is due to rounding errors. In my opinion, that is more useful for potential debugging.
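
As an illustration of comparing indices after aggregating, here is a simplified sketch in plain Python rather than the actual code in this PR. Rows are keyed by (scenario, region, variable, year), the values are made up, and `append_missing` is a hypothetical helper name.

```python
# Keys are (scenario, region, variable, year); the values are made up.
existing = {
    ("scen_a", "World", "Primary Energy", 2005): 10.0,
}
aggregated = {
    ("scen_a", "World", "Primary Energy", 2005): 9.0,
    ("scen_a", "World", "Primary Energy", 2010): 12.0,
    ("scen_b", "World", "Primary Energy", 2005): 7.0,
}


def append_missing(existing, aggregated):
    """Return `existing` plus only those aggregated rows whose key is new."""
    # dictionary key views support set operations such as difference
    missing_keys = aggregated.keys() - existing.keys()
    return {**existing, **{key: aggregated[key] for key in missing_keys}}


merged = append_missing(existing, aggregated)
```

Because the comparison is done per row, a value that is missing only for one scenario or one year is still filled in. Note that the existing value for scen_a in 2005 is kept as-is even though it differs from the computed aggregate, so a consistency check has to be run separately.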

pjuergens · 2021-05-14T11:18:37Z

It seems like stickler uses slightly different definitions of black than the built-in black-linter in Spyder.

danielhuppmann · 2021-05-14T11:52:54Z

Thanks for continuing to work on this, sorry for the issues with different versions of black...

Re your comment:

> When explicitly using this feature it should be clear that the intermediate variables are not aggregated by pyam and should be checked.

There is a tradeoff here:

- advanced users will know what they are doing - for them, the increased efficiency of skipping validation is a bonus
- for non-expert users, pyam should have sensible guardrails to prevent them from doing stupid things or running into unexpected behavior - for them, pyam should raise loud and specific error messages

And either way, first running recursive aggregation and then checking consistency is not an efficient approach, because it requires computing a lot of data twice.

So let me modify my earlier suggestion:

- drop the skip_intermediate keyword argument - skipping should be the default, but with validation
- allow recursive to be a boolean (as it is now) or a string "skip-validate" - and skip the validation in that case

So your expert-level function could be called with
```
df.aggregate(<variable>, recursive="skip-validate")
```
but novice users would get an error if the hierarchy is not consistent.
In core.py, you only have to change if recursive is True: to if recursive:, then the recursive aggregation will be selected whenever recursive is truthy, i.e., if it's True, the string "skip-validate", or (later) a non-empty dictionary.

Later extensions could then also do something like recursive={"rtol": 0.2}, where the contents of the dictionary are passed to np.isclose() in the validation to avoid errors if data is almost identical but not quite.
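
A rough sketch of how such a recursive argument could be interpreted, under the assumptions of this suggestion. The names `parse_recursive` and `values_match` are hypothetical and not part of pyam, the dictionary form is only the possible later extension, and `values_match` is a plain-Python stand-in that follows the documented formula and default tolerances of np.isclose().

```python
def parse_recursive(recursive):
    """Return (aggregate_recursively, validate, tolerance_kwargs)."""
    if not recursive:
        # False (or any other falsy value): no recursive aggregation
        return False, False, {}
    if recursive == "skip-validate":
        # expert mode: aggregate recursively without checking existing data
        return True, False, {}
    if isinstance(recursive, dict):
        # hypothetical extension: tolerances forwarded to the comparison
        return True, True, dict(recursive)
    if recursive is True:
        # default: aggregate recursively and validate existing data
        return True, True, {}
    raise ValueError(f"Unknown value for recursive: {recursive!r}")


def values_match(existing, computed, rtol=1e-05, atol=1e-08):
    """Scalar check: |existing - computed| <= atol + rtol * |computed|."""
    return abs(existing - computed) <= atol + rtol * abs(computed)


do_recursive, validate, kwargs = parse_recursive({"rtol": 0.2})
```

With a relative tolerance of 0.2, an existing value of 10.0 would pass against a computed aggregate of 11.0, while it would fail with the default tolerances.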

pjuergens · 2021-05-14T15:46:15Z

I implemented my approach for the aggregation check. I will change the interface by dropping the skip_intermediate keyword after the weekend.

danielhuppmann

Just one minor in-line comment for now - will need to check the approach more thoroughly later...

pyam/_aggregate.py

danielhuppmann

Thanks @pjuergens for the initiative!

pjuergens added 2 commits May 12, 2021 09:57

implemented skip_intermediate option

17b132d

Updated release notes

bc064d9

pjuergens added 2 commits May 12, 2021 10:10

Test added

6a30ec6

Bugfix in tests

0ec0804

pjuergens marked this pull request as draft May 12, 2021 09:13

pjuergens added 3 commits May 12, 2021 11:16

styleguide checked

2b0ddaa

fix Style guide

a7d1d4e

finally code style

b4509e5

pjuergens marked this pull request as ready for review May 12, 2021 09:55

danielhuppmann assigned pjuergens May 12, 2021

danielhuppmann self-requested a review May 12, 2021 10:49

danielhuppmann reviewed May 12, 2021

pyam/core.py (outdated)

fix style guide

7b4e266

Co-authored-by: Daniel Huppmann <dh@dergelbesalon.at>

pjuergens added 2 commits May 14, 2021 11:23

changed implementation to deal with different scenarios

f939a8e

added test for different scenarios

a452f2e

danielhuppmann reviewed May 14, 2021

tests/test_feature_aggregate.py (outdated)

fixed test

39ea9bf

pjuergens marked this pull request as draft May 14, 2021 09:47

pjuergens added 6 commits May 14, 2021 12:24

skip intermediate variable also at highest level

5a2ced2

Bugfix

144b39d

Bugfix test

d6db93a

again bugfix test

cbf4eeb

formatting black

0a5b53d

fix black style

0d7c11c

pjuergens added 2 commits May 14, 2021 17:42

added aggregation check

b904073

black style guide

9ab914e

pjuergens added 4 commits May 19, 2021 14:26

changed interface of recursive aggregation

0b9c26f

fixed black style

7edbdda

automated black style with spyder

7a18b43

change back to stickler black style

cc581c8

pjuergens marked this pull request as ready for review May 19, 2021 13:31

danielhuppmann reviewed May 20, 2021

pyam/_aggregate.py (outdated)

pjuergens and others added 8 commits May 25, 2021 09:23

Update pyam/_aggregate.py

70b1d70

Co-authored-by: Daniel Huppmann <dh@dergelbesalon.at>

Merge branch 'main' into intermediate-aggregate

d0ad181

Switch order to ['left', 'right'] in returned object from compare()

364fe01

Move internal implementation of compare to own module

66cabcb

Save _data as pd.Series in swap_time_for_year()

c7d7168

Implement a once-through aggregate-and-validate method

d188e00

Move recursive-aggregation data to conftest.py

d43351d

Add validation that recursive aggregation fails if data is inconsistent

e3775bc

danielhuppmann mentioned this pull request May 29, 2021

Refactor implementation for (skipping) validation pjuergens/pyam#2

Merged

danielhuppmann and others added 4 commits May 29, 2021 12:59

Fix the test of the compare function (changed order of cols)

b4f65f4

Fix calling the internal compare function

1d386fa

Merge pull request #2 from danielhuppmann/intermediate-aggregate-alt

f6464a8

Refactor implementation for (skipping) validation

Merge branch 'main' into intermediate-aggregate

25edae0

danielhuppmann approved these changes Jun 10, 2021

danielhuppmann merged commit 412fcd8 into IAMconsortium:main Jun 10, 2021

danielhuppmann mentioned this pull request Jun 30, 2021

recursive aggregation with partially existing aggregated variables #525

Closed
