Improve execution speed of rdtools.degradation_classical_decomposition
#371
Conversation
Also, I took the liberty of making a 2.1.6 whatsnew file for this. Happy to change to whatever the release plan is, or feel free to just push changes yourself :)
rdtools/degradation.py

```python
energy_ma.append(np.nan)
energy_ma = df['energy_normalized'].rolling('365d', center=True).mean()
has_full_year = (df['years'] > df['years'][0] + 0.5) & (df['years'] < df['years'][-1] - 0.5)
```
does it make a difference that the old method used `>=` and `<=`?
For daily values, I think it does not matter. Because this code defines a year to be 365 days, with daily values the closest you can get to 0.5 is 0.49863014 or 0.50136986, so there is no difference between `<` and `<=`.
For some sub-daily cases (e.g. 12 hour intervals), then it would matter, so I have updated to match previous behavior. Thanks! But it also made me realize that the old method of using floats for the datetime equality checks produced inconsistent behavior. For example, with 12-hour inputs, the moving window is sometimes length 730 and sometimes length 731 using the loop approach. In contrast, the pandas windows are always length 730. So although the codes give identical results for daily inputs, they can give slightly different results for sub-daily. The difference is very minor but I point it out for completeness.
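The inconsistency described above can be demonstrated with a small standalone sketch (illustrative setup, not rdtools code): with 12-hour sampling and a 365-day "year", the inclusive float comparisons put the window edges exactly on samples, so the loop-style window length depends on floating-point rounding, while a pandas time-based window has a consistent length at every interior timestamp.

```python
import numpy as np
import pandas as pd

# 4 years of 12-hour samples; float "years" defined with a 365-day year,
# mirroring the convention discussed above
times = pd.date_range('2018-01-01', periods=4 * 730, freq='12h')
years = (times - times[0]) / pd.Timedelta('365d')

center = len(times) // 2  # a timestamp well inside the series
# loop-style inclusive window: both edges land exactly on samples, so the
# count is 730 or 731 depending on floating-point rounding of yr +/- 0.5
in_window = (years >= years[center] - 0.5) & (years <= years[center] + 0.5)
print('loop window length:', in_window.sum())

# pandas time-based centered window (requires pandas >= 1.3): the window
# length is the same at every interior timestamp on a regular grid
counts = pd.Series(1.0, index=times).rolling('365d', center=True).count()
interior = counts.iloc[400:-400]  # drop edge-affected timestamps
print('distinct pandas window lengths:', interior.unique())
```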
I think it makes sense to update the minimum pandas version to 1.3. Looks like #373 needs a more recent minimum version as well.
Done. As is often the case with increasing minimum versions, it required increasing some others as well.
LGTM. Thanks @kandersolar and @mikofski
- [ ] Code changes have been evaluated for compatibility/integration with TrendAnalysis
- [ ] New functions added to `__init__.py`
- [ ] API.rst is up to date, along with other sphinx docs pages
- [ ] Example notebooks are rerun and differences in results scrutinized

`rdtools.degradation_classical_decomposition` is rather slow for large inputs (~10 seconds for a 6-year daily dataset). The large runtime is caused by two computational bottlenecks: the moving average calculation and the M-K trend test. The current implementations of these calculations use python loops and are straightforward to replace with vectorized pandas/numpy operations. Doing this speeds up the overall `rdtools.degradation_classical_decomposition` runtime by a couple orders of magnitude.

The following table compares runtimes (values in seconds), along with their ratio, for various input lengths (number of years of daily values).
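A rough timing harness in the same spirit (not the benchmark used for the table; function and variable names here are illustrative) compares a python-loop centered moving average against pandas' vectorized rolling mean on several years of daily data:

```python
import time
import numpy as np
import pandas as pd

def loop_moving_average(series, window_days=365):
    # O(n * window) python loop, similar in spirit to the old implementation
    times = series.index
    half = pd.Timedelta(days=window_days) / 2
    out = []
    for t in times:
        mask = (times >= t - half) & (times <= t + half)
        out.append(series[mask].mean())
    return pd.Series(out, index=times)

n_years = 6
idx = pd.date_range('2015-01-01', periods=365 * n_years, freq='D')
series = pd.Series(1 + 0.01 * np.random.randn(len(idx)), index=idx)

t0 = time.perf_counter()
slow = loop_moving_average(series)
t_loop = time.perf_counter() - t0

t0 = time.perf_counter()
fast = series.rolling('365d', center=True).mean()  # pandas >= 1.3
t_fast = time.perf_counter() - t0

print(f'loop: {t_loop:.3f}s, pandas: {t_fast:.4f}s')
```

The two results can differ negligibly at the series edges because of window-edge inclusion, but the speed gap grows quickly with input length since the loop version is quadratic in practice.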
Here is some code to verify that the new implementations produce output equivalent to the current implementations:
MK-test
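The original verification code is not reproduced in this excerpt; the following is a hedged sketch of the kind of check described, comparing an O(n²) python-loop Mann-Kendall S statistic against a vectorized numpy equivalent (function names are illustrative, not rdtools internals):

```python
import numpy as np

def mk_s_loop(x):
    # S = sum over all pairs i < j of sign(x[j] - x[i]), via nested loops
    n = len(x)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            s += np.sign(x[j] - x[i])
    return s

def mk_s_vectorized(x):
    # Same statistic from a broadcasted pairwise sign matrix;
    # diffs[i, j] = sign(x[j] - x[i]), and triu(k=1) keeps only i < j
    x = np.asarray(x)
    diffs = np.sign(x[None, :] - x[:, None])
    return np.triu(diffs, k=1).sum()

rng = np.random.default_rng(0)
x = rng.normal(size=200)
assert mk_s_loop(x) == mk_s_vectorized(x)
print('loop and vectorized S statistics match')
```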
Moving average
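The original snippet is likewise not included here; a hedged sketch of the equivalence check for daily data, following the `rolling('365d', center=True)` and `has_full_year` pattern from the diff above (variable names are illustrative):

```python
import numpy as np
import pandas as pd

idx = pd.date_range('2015-01-01', periods=365 * 6, freq='D')
energy = pd.Series(1 + 0.001 * np.arange(len(idx)), index=idx)

# loop implementation over float years (365-day year), inclusive edges
years = (idx - idx[0]) / pd.Timedelta('365d')
loop_ma = []
for yr in years:
    mask = (years >= yr - 0.5) & (years <= yr + 0.5)
    loop_ma.append(energy.values[mask].mean())
loop_ma = pd.Series(loop_ma, index=idx)

# vectorized implementation (pandas >= 1.3 for centered offset windows)
pandas_ma = energy.rolling('365d', center=True).mean()

# restrict to timestamps with a full year of data on both sides,
# mirroring the has_full_year filter in the diff above
full = (years > years[0] + 0.5) & (years < years[-1] - 0.5)
assert np.allclose(loop_ma[full], pandas_ma[full])
print('moving averages match on the full-year region')
```

For daily inputs the ±0.5-year edges never land exactly on a sample, so the inclusive-vs-exclusive comparison question discussed above does not arise and the two implementations select identical windows.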