Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve performance of pyam.concat() #510

Merged
merged 11 commits into from Mar 22, 2021

Conversation

danielhuppmann
Copy link
Member

@danielhuppmann danielhuppmann commented Mar 21, 2021

Please confirm that this PR has done the following:

  • Tests Added
  • Documentation Added
  • Name of contributors Added to AUTHORS.rst
  • Description in RELEASE_NOTES.md Added

Description of PR

This PR refactors the implementation of pyam.concat() to use pd.concat() on the timeseries data rather than iteratively calling IamDataFrame.append().

closes #500

@danielhuppmann danielhuppmann self-assigned this Mar 21, 2021
@danielhuppmann danielhuppmann mentioned this pull request Mar 21, 2021
4 tasks
@danielhuppmann
Copy link
Member Author

@pjuergens, it would be great if you could try out this implementation for your use cases and report any unexpected issues. Also, it would be interesting to hear about the performance improvements in a real-world application.

@codecov
Copy link

codecov bot commented Mar 21, 2021

Codecov Report

Merging #510 (7c1b183) into main (ebceba1) will increase coverage by 0.0%.
The diff coverage is 100.0%.

Impacted file tree graph

@@          Coverage Diff          @@
##            main    #510   +/-   ##
=====================================
  Coverage   93.1%   93.1%           
=====================================
  Files         43      43           
  Lines       4907    4926   +19     
=====================================
+ Hits        4569    4588   +19     
  Misses       338     338           
Impacted Files Coverage Δ
pyam/core.py 92.5% <100.0%> (+<0.1%) ⬆️
tests/conftest.py 100.0% <100.0%> (ø)
tests/test_core.py 100.0% <100.0%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ebceba1...7c1b183. Read the comment docs.

@danielhuppmann danielhuppmann marked this pull request as ready for review March 22, 2021 08:31
@pjuergens
Copy link
Contributor

@danielhuppmann thanks for working on this issue. I'll try to test the implementation today.

Concerning performance-improvement in real-world-application: We have a code in which we aggregate a lot of hourly-resolved timeseries. The bottleneck there was the pyam.append-function. Switching from append to concat on major parts of the code with my original implementation I could reduce the calculation time from 24 to 15 minutes. I'll have a deeper look at it again today when I test the new implementation.

@pjuergens
Copy link
Contributor

my script ran without any issues, thanks again!

Copy link
Collaborator

@coroa coroa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I can't find any faults in that implementation. Thanks for the nice speed-up!

The next low-hanging fruit on the speed-up quest is to add a (non-copying!?) fast-path to __init__ for a pd.Series in perfect condition, ie an index with the right names that is unique and maybe other conditions that one would have to extract from format_data to avoid having to reindex.

@danielhuppmann
Copy link
Member Author

At some point, it will be so fast that you don't even have time for coffee while your scripts are running 😜...

@danielhuppmann danielhuppmann merged commit 32b06f7 into IAMconsortium:main Mar 22, 2021
@danielhuppmann danielhuppmann deleted the performance/concat branch March 22, 2021 18:06
coroa added a commit to coroa/pyam that referenced this pull request Mar 29, 2021
coroa added a commit that referenced this pull request Apr 13, 2021
* Fix iterable regression in concat introduced by #510

* Cast to list instead of to iterable

* Update pyam/core.py

Co-authored-by: Daniel Huppmann <dh@dergelbesalon.at>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

concat uses pandas append instead of concat
3 participants