clean-up of aggregation features #315
change return type to `IamDataFrame`
tl;dr: I don't think this is the best path forward, but I also know you have deadlines and I'm contributing less and less, so I don't want to be a blocker. Hence, like @gidden, I'll request changes, but ignore that if you want.
I definitely haven't done the most thorough review. Given the PR has ~1,000 lines of changes, that would take more time than I have, unfortunately. In general, though, I think this makes lots of good changes, but I'm concerned about the aggregation feature testing.
> the aggregation tests are completely refactored (because of the changed return types, it was easier to completely rewrite rather than figure out how to salvage)
I think this is a really bad idea and think the effort of putting tests back in is worth it. My plan a) would be to do it in this PR but making an issue and addressing it later could also be fine. To explain why I think this: checking aggregation and internal consistency is hard because there are lots of edge cases (mainly bunker related...) and because you want it to work on big datasets so you need some (admittedly annoyingly large) test sets. The existing tests had covered a lot of that and just throwing that away risks it never coming back (or bugs re-appearing when you really wish they wouldn't e.g. in the middle of checking AR6 data).
I like all the other changes to return types etc. and think they'll make things way easier to use. The only other thing that I would reconsider is that `check_internal_consistency` has `components=True` for `check_aggregate_region`, which is the opposite of the default of `check_aggregate_region`. I would find this very confusing as a new user. I would make `components=True` the default for `check_aggregate_region`.
Co-Authored-By: Zeb Nicholls <zebedee.nicholls@climate-energy-college.org>
@danielhuppmann I share @znicholls's opinion on the test refactor here (also not blocking but with a strong preference). I may not grok the gritty details, but is it possible to write a wrapper function in the tests that would translate the newly returned
…` (per comment by @znicholls)
re the comments by @znicholls and @gidden:
Hi @danielhuppmann, I've been through the It really looks nice. You've implemented all our suggestions in a very detailed way. It was really enjoyable to follow the steps in the notebook. I have no further comments to add to it (I'll keep in mind some formatting features you included, e.g. the blue boxes, to potentially include them also in the In regard to the previous conversation about the refactoring of tests, I have little experience with them. I'm still getting familiar with
All good, there's always a nasty conflict between time pressure and moving slowly enough that everyone can keep up (you're striking a good balance).
All makes sense, thanks for spelling it out.
Fair, they were definitely not as clear as they should have been (I have since learnt that whilst this 'efficiency' approach looks good, it's actually a terrible idea as it's not explicit enough about what is going on). Let's park this in #317 for now.
Ok all, will merge this now that everything is approved/reviewed/marked for future issues. Thanks @danielhuppmann for the huge effort here and all reviewers!
Please confirm that this PR has done the following:
Description of PR
This PR implements a number of clean-ups following PRs #305 and #312:
- the return type of `aggregate()` and `aggregate_region()` is changed to an `IamDataFrame` instance (per suggestion by @jkikstra), previously a timeseries-dataframe
- the return type of `check_aggregate()` and `check_aggregate_region()` is changed to a `pd.DataFrame` with both expected and actual value (previously only the expected value)
- `equals()` function (originally used to make tests easier)
- `pyam.testing.assert_frames_equal()` function (to make tests easier)
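
To illustrate the idea of a check that reports both values for each mismatch, here is a minimal plain-Python sketch. It is not the pyam implementation: the function name `check_aggregate_sketch`, the dictionary layout, the keys `"reported"` and `"summed"`, and the tolerance are all assumptions made for illustration.

```python
import math


def check_aggregate_sketch(reported, components, rtol=1e-5):
    """Compare a reported aggregate with the sum of its components.

    Hypothetical helper, not part of pyam. `reported` maps year -> value,
    `components` maps component name -> {year: value}. Returns a dict with
    one entry per year where the two values disagree, holding both values.
    """
    mismatches = {}
    for year, reported_value in reported.items():
        # Sum all components for this year; a missing year counts as zero.
        summed = sum(series.get(year, 0.0) for series in components.values())
        if not math.isclose(reported_value, summed, rel_tol=rtol):
            mismatches[year] = {"reported": reported_value, "summed": summed}
    return mismatches


# Made-up example data: the 2010 aggregate does not match its components.
reported = {2005: 6.0, 2010: 9.0}
components = {
    "Primary Energy|Coal": {2005: 4.0, 2010: 5.0},
    "Primary Energy|Gas": {2005: 2.0, 2010: 3.0},
}
diff = check_aggregate_sketch(reported, components)
print(diff)
```

Returning both values means a test or a user can see how far off an aggregate is, not only which rows failed.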