Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a validation check to rename() #202

Merged
merged 7 commits into from Feb 27, 2019

Conversation

danielhuppmann
Copy link
Member

@danielhuppmann danielhuppmann commented Feb 26, 2019

Please confirm that this PR has done the following:

  • Tests Added
  • Documentation Added
  • Description in RELEASE_NOTES.md Added

Description of PR

This PR adds a validation check to df.rename() to prevent accidentally renaming and aggregating existing and renamed variables.

In terms of the new unit test test_rename_duplicates(), the user may not be (actively) aware that a variable test_3 exists in the IamDataFrame and not actually want to aggregate test_1 and test_3. With the new feature, the user will receive an error message as default behaviour and has to actively override the validation step.

closes #182

pyam/core.py Outdated Show resolved Hide resolved
tests/test_feature_append_rename_convert.py Outdated Show resolved Hide resolved
@danielhuppmann
Copy link
Member Author

@gidden, ready for review if you have a minute...

@coveralls
Copy link

coveralls commented Feb 26, 2019

Coverage Status

Coverage increased (+0.08%) to 84.686% when pulling 57115e7 on danielhuppmann:rename_wich_check into bc5fcbc on IAMconsortium:master.

pyam/core.py Outdated
@@ -580,6 +589,7 @@ def rename(self, mapping=None, inplace=False, append=False, **kwargs):
# renaming is only applied where a filter matches for all given columns
rows = ret._apply_filters(filters)
idx = ret.meta.index.isin(_make_index(ret.data[rows]))
_data = ret.data.copy()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we need to make a copy here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In case check_duplicates=True and a ValueError is raised, we don't want any changes to the IamDataFrame to persist.

But I guess _data = ret.data.copy() if check_duplicates else ret.data would be slightly more efficient, right?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. Can you add a warning in the docstring stating that this makes a copy as well? Also, are you sure we want that to be True by default? What are the pro/cons?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added the more elaborate only-make-interim-copy-if-check_duplicates and an inline comment to explain that the interim copy is kept until after the validation check.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, are you sure we want that to be True by default? What are the pro/cons?

I think that the more careful behaviour (check_duplicates == True) should be the default. Users should explicitly have to override the sanity check when (as in the unit test) renaming timeseries test_1 to test_3 and aggregating with existing timeseries data of test_3.

@gidden
Copy link
Member

gidden commented Feb 27, 2019

looks good, thanks @danielhuppmann !

@gidden gidden merged commit 381c4f6 into IAMconsortium:master Feb 27, 2019
@danielhuppmann danielhuppmann deleted the rename_wich_check branch February 28, 2019 16:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Overly ambitious groupby().sum()´ in rename()`
4 participants