Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

clean-up of aggregation features #315

Merged
merged 57 commits into from Jan 13, 2020

Conversation

danielhuppmann
Copy link
Member

@danielhuppmann danielhuppmann commented Dec 31, 2019

Please confirm that this PR has done the following:

  • Tests Added
  • Documentation Added
  • Description in RELEASE_NOTES.md Added

Description of PR

This PR implements a number of clean-ups following PRs #305 and #312:

  • the return type of aggregate() and aggregate_region() is changed to an IamDataFrame instance (per suggestion by @jkikstra), previously a timeseries-dataframe
  • the return type of check_aggregate() and check_aggregate_region() is changed to a pd.DataFrame with both expected and actual value (previously only the expected value)
  • adds an equals() function (originally used to make tests easier)
  • adds an pyam.testing.assert_frames_equal() function (to make tests easier)
  • the tutorial for aggregation, downscaling and consistency checking is reworked (now includes description of regional components, weighted average, setting variables as list, downscaling)
  • the aggregation tests are completely refactored (because of the changed return types, it was easier to completely rewrite rather than figure out how to salvage)

pyam/core.py Outdated Show resolved Hide resolved
pyam/core.py Outdated Show resolved Hide resolved
tests/test_feature_aggregate.py Outdated Show resolved Hide resolved
Copy link
Collaborator

@znicholls znicholls left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tl;dr - I don't think this is the best path forward, but I also know you have deadlines and I'm contributing less and less so don't want to be a blocker hence like @gidden I'll put request changes but ignore if you want.


I definitely haven't done the most thorough review. Given the PR has ~1 000 lines of changes that would take more time than I have unfortunately. In general though I think this makes lots of good changes, but I'm concerned about the aggregation feature testing.

the aggregation tests are completely refactored (because of the changed return types, it was easier to completely rewrite rather than figure out how to salvage)

I think this is a really bad idea and think the effort of putting tests back in is worth it. My plan a) would be to do it in this PR but making an issue and addressing it later could also be fine. To explain why I think this: checking aggregation and internal consistency is hard because there are lots of edge cases (mainly bunker related...) and because you want it to work on big datasets so you need some (admittedly annoyingly large) test sets. The existing tests had covered a lot of that and just throwing that away risks it never coming back (or bugs re-appearing when you really wish they wouldn't e.g. in the middle of checking AR6 data).

I like all the other changes to return types etc. and think they'll make things way easier to use. The only other thing that I would reconsider is that check_internal_consistency has components=True for check_aggregate_region which is the opposite to the default of check_aggregate_region, I would find this very confusing as a new user. I would make components=True the default for `check_aggregate_region.

pyam/aggregate.py Outdated Show resolved Hide resolved
pyam/aggregate.py Outdated Show resolved Hide resolved
pyam/aggregate.py Outdated Show resolved Hide resolved
pyam/aggregate.py Outdated Show resolved Hide resolved
pyam/core.py Outdated Show resolved Hide resolved
pyam/core.py Outdated Show resolved Hide resolved
pyam/testing.py Outdated Show resolved Hide resolved
@gidden
Copy link
Member

gidden commented Jan 10, 2020

@danielhuppmann I share @znicholls opinion on the test refactor here (also not blocking but with a strong preference).

I may not grok the gritty details, but is it possible to write a wrapper function in the tests that would translate the newly returned IamDataFrame back to a pd.DataFrame in the test's expected format and then apply that in each test a la pdt.assert_frame_equals(pd_to_pyam_test(obs), exp)?

@danielhuppmann
Copy link
Member Author

re the comments by @znicholls and @gidden:

  1. sorry that this is such a big refactoring - I divided it into a suite of PRs to make it manageable, but it's still huge...

  2. API of check_internal_consistency(): I added a kwarg components so that the behaviour can be controlled and passed through to check_aggregate_region(). However, as to whether we should use True or False as default, detecting components that exist only at the region-level is one use case out of many (it's relevant mainly in the emissions domain). And it would interfere with using (weighted) average or other methods which are equally valid and relevant, so I changed that to False with a purpose.

  3. about the tests: I agree that just dropping tests is a really bad idea, but the suite of tests were written so efficiently (a dozen different tests with hardly any documentation about their intent, all feeding into one master testing function with a bunch of if-else-logic to do the comparison), it just wasn't possible for me to disentangle this.
    @gidden, your approach sounds smart but if you look at what was actually tested (length of indices, etc.) this won't work. The new test suite with assert_frame_equal() is far more explicit.
    @znicholls, if you can spell out for each of the removed tests what their intent was, I'll be happy to re-introduce them in a next step.

@francescolovat
Copy link

Hi @danielhuppmann,

I've been through the pyam_first_steps.ipynb tutorial again.

It really looks nice. You've implemented all our suggestions in a very detailed way. I was really enjoyable to follow the steps in the notebook.

I have no further comments to add to it (I'll keep in mind some formatting features you included, e.g. the blue boxes, to potentially include them also in the MESSAGEix tutorials).

I regards of the previous conversation about the refactoring of tests, I have little experience with them. I'm still getting familiar with pandas.util.testing methods. Matt's suggestions of a wrapper function looks promising, however it is also a matter of the time left before the openmod meeting.

@znicholls
Copy link
Collaborator

znicholls commented Jan 11, 2020

  1. sorry that this is such a big refactoring - I divided it into a suite of PRs to make it manageable, but it's still huge...

All good there's always a nasty conflict between time pressure and moving slowly enough that everyone can keep up (you're striking a good balance)

2. API of check_internal_consistency()

All makes sense thanks for spelling it out.

3. about the tests

fair, they were definitely not as clear as they should have been (I have since learnt that whilst this 'efficiency' approach looks good, it's actually a terrible idea as it's not explicit enough about what is going on). Let's park this in #317 for now

@znicholls znicholls mentioned this pull request Jan 11, 2020
pyam/_aggregate.py Outdated Show resolved Hide resolved
pyam/core.py Outdated Show resolved Hide resolved
pyam/core.py Outdated Show resolved Hide resolved
@gidden
Copy link
Member

gidden commented Jan 13, 2020

Ok all, will merge this now that everything is approved/reviewed/marked for future issues. Thanks @danielhuppmann for the huge effort here and all reviewers!

@gidden gidden merged commit 412ff2a into IAMconsortium:master Jan 13, 2020
@danielhuppmann danielhuppmann deleted the cleanup/aggregation branch January 13, 2020 10:10
@danielhuppmann danielhuppmann mentioned this pull request Feb 16, 2020
3 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants