Optimize dseries.rolling.mean() #611
Current performance:
name | nthreads | type | size | median |
---|---|---|---|---|
Series.rolling.mean | 1 | Python | 200000 | 1.298 |
Series.rolling.mean | 1 | SDC | 200000 | 1.019 |
Series.rolling.mean | 4 | Python | 200000 | 1.418 |
Series.rolling.mean | 4 | SDC | 200000 | 0.469 |
Python 1 / SDC 4 = 2.77
Scaling across threads is now enabled.
Current performance:
name | nthreads | type | size | median |
---|---|---|---|---|
Series.rolling.mean | 1 | Python | 800000 | 4.135 |
Series.rolling.mean | 1 | SDC | 800000 | 0.84 |
Series.rolling.mean | 4 | Python | 800000 | 4.009 |
Series.rolling.mean | 4 | SDC | 800000 | 0.279 |
Python 1 / SDC 1 = 4.923
Python 1 / SDC 4 = 14.821
Scaling across threads is now enabled.
if numpy.isnan(result):
    result = value / nfinite
else:
    result = ((nfinite - 1) * result + value) / nfinite
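The lines under review are an incremental mean update: when a new finite value arrives, the previous mean of `nfinite - 1` values is rescaled and the new value is folded in. A minimal pure-Python sketch of that step, assuming `nfinite` has already been incremented to include the new value (the helper name `put_mean` is hypothetical and not taken from the patch):

```python
import math


def put_mean(value, nfinite, result):
    """Fold `value` into a running mean.

    `nfinite` is assumed to count the finite values seen so far,
    including `value` itself. `result` is the previous mean, or NaN
    if no finite value has been seen yet.
    """
    if math.isnan(result):
        return value / nfinite
    return ((nfinite - 1) * result + value) / nfinite


# Feed three values one at a time.
result, nfinite = math.nan, 0
for value in [2.0, 4.0, 9.0]:
    nfinite += 1
    result = put_mean(value, nfinite, result)
```

After the loop, `result` holds the mean of the three values.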
We could probably introduce a new function, something like get_result, which for mean would be:
def get_mean(nfinite, result):
    if nfinite:
        return result / nfinite
    return numpy.nan
And then call it inside result_or_nan:
@sdc_register_jitable
def result_or_nan(get, nfinite, minp, result):
    """Get result taking into account min periods."""
    if nfinite < minp:
        return numpy.nan
    return get(nfinite, result)
For mean we can do the recalculation at the put/pop stage, but for more complex functions (std, var) it is not that easy.
Hopefully it won't hurt our performance.
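To make the suggestion concrete, here is a minimal pure-Python sketch of how `get_mean` and `result_or_nan` could sit on top of `put`/`pop` steps that only maintain a running sum and a count of finite values. The `put`, `pop`, and `rolling_mean` bodies are assumptions written for illustration, not the code from the patch, and the `@sdc_register_jitable` decorator is left out so the example runs without SDC.

```python
import math


def get_mean(nfinite, result):
    """Turn the accumulated sum into a mean."""
    if nfinite:
        return result / nfinite
    return math.nan


def result_or_nan(get, nfinite, minp, result):
    """Get result taking into account min periods."""
    if nfinite < minp:
        return math.nan
    return get(nfinite, result)


def put(value, nfinite, result):
    """Add a value entering the window (hypothetical helper)."""
    if math.isfinite(value):
        return nfinite + 1, result + value
    return nfinite, result


def pop(value, nfinite, result):
    """Remove a value leaving the window (hypothetical helper)."""
    if math.isfinite(value):
        return nfinite - 1, result - value
    return nfinite, result


def rolling_mean(values, window, minp):
    """Sequential rolling mean built from put/pop and result_or_nan."""
    nfinite, total = 0, 0.0
    out = []
    for i, value in enumerate(values):
        nfinite, total = put(value, nfinite, total)
        if i >= window:
            nfinite, total = pop(values[i - window], nfinite, total)
        out.append(result_or_nan(get_mean, nfinite, minp, total))
    return out


means = rolling_mean([1.0, 2.0, math.nan, 4.0, 5.0], window=3, minp=2)
```

Keeping only the sum in `put`/`pop` and deferring the division to `get_mean` is what would let a different `get` function (for example one for std or var) reuse the same `result_or_nan` wrapper.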
The latest patch set recommended by @AlexanderKalistratov actually improved performance:
name | nthreads | type | size | median |
---|---|---|---|---|
Series.rolling.mean | 1 | Python | 800000 | 4.836 |
Series.rolling.mean | 1 | SDC | 800000 | 0.496 |
Series.rolling.mean | 4 | Python | 800000 | 4.648 |
Series.rolling.mean | 4 | SDC | 800000 | 0.256 |
Python 1 / SDC 1 = 9.75
SDC 1 / SDC 4 = 1.938
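The two ratios above are plain quotients of the medians in the table. A short sketch that reproduces them (the dictionary layout and variable names are illustrative only):

```python
# Medians copied from the table above, keyed by (type, nthreads).
medians = {
    ("Python", 1): 4.836,
    ("SDC", 1): 0.496,
    ("Python", 4): 4.648,
    ("SDC", 4): 0.256,
}

# Single-threaded speedup of SDC over Python.
speedup_vs_python = medians[("Python", 1)] / medians[("SDC", 1)]

# Speedup of SDC going from 1 to 4 threads.
sdc_thread_scaling = medians[("SDC", 1)] / medians[("SDC", 4)]
```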
Previous implementation results:
Optimized implementation results:
The optimized implementation runs up to ~45 times faster than the previous one and up to ~5 times faster than Python. It does not scale with the number of threads: prange is not used at all, because the variable nfinite (the number of finite values) is shared by all threads.
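One possible way around the shared `nfinite` is to split the output range into chunks and give each chunk its own local sum and count, warmed up from the elements just before the chunk. The sketch below is an assumption about how that could look, not a description of what this PR does. It is pure Python, all function names are hypothetical, and the ordinary `for` loop over chunks stands in for a parallel loop such as `prange`.

```python
import math


def rolling_mean_chunk(values, window, minp, start, stop):
    """Rolling mean for output positions [start, stop) with chunk-local state."""
    # First element that can fall inside the window of position `start`.
    first = max(0, start - window + 1)
    nfinite, total = 0, 0.0
    out = []
    for i in range(first, stop):
        value = values[i]
        if math.isfinite(value):
            nfinite += 1
            total += value
        # Drop the element leaving the window, if this chunk ever added it.
        j = i - window
        if j >= first:
            old = values[j]
            if math.isfinite(old):
                nfinite -= 1
                total -= old
        if i >= start:
            ok = nfinite >= minp and nfinite > 0
            out.append(total / nfinite if ok else math.nan)
    return out


def rolling_mean_chunked(values, window, minp, nchunks):
    """Compute the rolling mean chunk by chunk; chunks are independent."""
    n = len(values)
    size = -(-n // nchunks)  # ceiling division
    out = []
    for c in range(nchunks):  # stands in for a parallel loop
        start = min(c * size, n)
        stop = min(start + size, n)
        out.extend(rolling_mean_chunk(values, window, minp, start, stop))
    return out


data = [1.0, 2.0, math.nan, 4.0, 5.0]
chunked = rolling_mean_chunked(data, window=3, minp=2, nchunks=2)
```

The trade-off is that each chunk repeats up to `window - 1` additions during warm-up, which is cheap when chunks are much longer than the window.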