Overload df.rolling.cov, add perf.test #547

densmirn · 2020-01-30T11:32:43Z

No description provided.


kozlov-alexey · 2020-01-31T00:10:00Z

sdc/datatypes/common_functions.py

+        min_length = min(arr_len, other_arr_len)
+        length = max(arr_len, other_arr_len) if size == 'max' else min_length
+
+        aligned_arr = numpy.array([numpy.nan] * length)


You could probably use numpy.repeat(numpy.nan, length), but I doubt it's parallelized, so numpy.array might turn out to be faster.
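For reference, here is a minimal plain-NumPy sketch of the alignment step from the diff above, using numpy.repeat instead of building a Python list first. The helper name align_with_nan is hypothetical, and the copy step is an assumption: the diff only shows how length and the NaN buffer are computed, not how the values are filled in afterwards.

```python
import numpy


def align_with_nan(arr, other_len, size='max'):
    """Hypothetical helper (not SDC's actual code): copy the first
    min(len(arr), other_len) values of `arr` into a new float64 array of
    length max(...) if size == 'max', else min(...); other slots stay NaN."""
    arr_len = len(arr)
    min_length = min(arr_len, other_len)
    length = max(arr_len, other_len) if size == 'max' else min_length

    # Same result as numpy.array([numpy.nan] * length), but without first
    # building a Python list with `length` elements.
    aligned_arr = numpy.repeat(numpy.nan, length)
    # Assumption: only the overlapping prefix is copied from the input.
    aligned_arr[:min_length] = arr[:min_length]
    return aligned_arr


print(align_with_nan(numpy.array([1.0, 2.0]), 4).tolist())  # [1.0, 2.0, nan, nan]
```

numpy.full(length, numpy.nan) is another standard NumPy way to get the same NaN buffer; how it compares under SDC was not measured in this thread.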

I measured res = numpy.array([numpy.nan] * 10 ** 8) vs. numpy.repeat(numpy.nan, 10 ** 8) with SDC_AUTO_PARALLEL = True:

| name | nthreads | type | size | median | min | max | compile | boxing |
|---|---|---|---|---|---|---|---|---|
| numpy.array | 1 | SDC | 100000000 | 1.009 | 1.002 | 1.022 | 0.098 | 0.090 |
| numpy.array | 4 | SDC | 100000000 | 1.232 | 1.013 | 2.265 | 0.088 | 0.097 |
| numpy.repeat | 1 | SDC | 100000000 | 0.420 | 0.397 | 0.493 | 0.159 | 5.0e-06 |
| numpy.repeat | 4 | SDC | 100000000 | 0.392 | 0.386 | 0.403 | 0.155 | 1.7e-06 |

I don't see any scaling with the number of threads at all, and numpy.repeat is clearly faster than numpy.array here (median 0.420 vs. 1.009 on 1 thread, 0.392 vs. 1.232 on 4 threads). So let me replace numpy.array with numpy.repeat.

I measured performance using the SDC performance test system, see:
master...densmirn:perf/array_vs_repeat
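To get a rough feel for the same comparison outside the SDC performance test system, a plain-NumPy harness with the standard timeit module could look like the sketch below. It does not compile anything with SDC and does not use SDC_AUTO_PARALLEL, so absolute numbers and threading behavior will differ from the table above; the default size is reduced to 10 ** 6 to keep memory modest. The function names are illustrative only.

```python
import timeit

import numpy


def build_with_array(n):
    # Builds a Python list of n floats first, then converts it to an ndarray.
    return numpy.array([numpy.nan] * n)


def build_with_repeat(n):
    # Fills the ndarray directly, without an intermediate Python list.
    return numpy.repeat(numpy.nan, n)


def compare(n=10 ** 6, repeats=5):
    """Return the best-of-`repeats` wall time in seconds for each constructor."""
    results = {}
    for name, func in (('numpy.array', build_with_array),
                       ('numpy.repeat', build_with_repeat)):
        timer = timeit.Timer(lambda: func(n))
        results[name] = min(timer.repeat(repeat=repeats, number=1))
    return results


if __name__ == '__main__':
    for name, seconds in compare().items():
        print(f'{name}: {seconds:.4f} s')
```

Taking the minimum over several repeats reduces noise from other processes; the SDC perf tests additionally report compile and boxing time, which this interpreted sketch has no equivalent for.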

kozlov-alexey · 2020-01-31T00:28:50Z

sdc/datatypes/hpat_pandas_dataframe_rolling_functions.py

+    kws = {'other': 'None', 'pairwise': 'None', 'ddof': '1'}
+
+    if none_other:
+        return gen_df_rolling_method_other_none_impl('_df_cov', self, kws=kws)


I think we should add a comment here to explain the need for _df_cov (or maybe add docstrings to the cov and _df_cov overloads).

Let me add a comment here.
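As a cross-check for the ddof handling in the kws above, a naive reference for the rolling covariance of one column pair might look like this. The helper name naive_rolling_cov is hypothetical; it assumes NaN-free input and the default min_periods (equal to the window size). It is compared against plain pandas, where DataFrame.rolling(...).cov(other) pairs columns by name when pairwise is left as None.

```python
import numpy as np
import pandas as pd


def naive_rolling_cov(x, y, window, ddof=1):
    """Hypothetical reference: rolling covariance of two equal-length 1-D
    arrays. Positions before the first full window are NaN (assumes no NaNs
    in the input and min_periods == window)."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    out = np.full(len(x), np.nan)
    for end in range(window, len(x) + 1):
        xs = x[end - window:end]
        ys = y[end - window:end]
        # Sum of co-deviations divided by (N - ddof), N being the window size.
        out[end - 1] = np.sum((xs - xs.mean()) * (ys - ys.mean())) / (window - ddof)
    return out


df = pd.DataFrame({'A': [1.0, 2.0, 4.0, 7.0, 11.0], 'B': [2.0, 1.0, 3.0, 5.0, 4.0]})
other = pd.DataFrame({'A': [0.5, 1.5, 1.0, 2.0, 3.5], 'B': [1.0, 2.0, 2.0, 3.0, 5.0]})

# With `other` given and pairwise=None, pandas matches columns by name.
expected = df.rolling(3).cov(other=other)
for col in df.columns:
    ref = naive_rolling_cov(df[col], other[col], window=3)
    assert np.allclose(ref, expected[col].to_numpy(), equal_nan=True)
```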

kozlov-alexey

I'm OK with this implementation, but it would be good to clarify pandas' motivation for having different implementations. Should we open an issue?

densmirn · 2020-01-31T10:16:04Z

> I'm OK with this implementation, but it would be good to clarify what's pandas motivation on having different implementations. Should we maybe open an issue?

Maybe, but we are currently based on pandas 0.25, so I think we should first check this behavior on pandas 1.0 before opening an issue. To be honest, I would prefer to spend time on other, more useful work, if you don't mind. @kozlov-alexey, what do you think?

kozlov-alexey · 2020-01-31T11:02:15Z

> To be honest I would like to spend time on other useful activity if you do not mind. @kozlov-alexey what do you think about that?

@densmirn Of course, I agree. Let's treat this as low priority.
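For context on the other=None path, here is how plain pandas behaves in recent releases; pandas 0.25, which SDC was based on at the time, is assumed to behave the same but that was not verified here. With other=None and pairwise=None, Rolling.cov treats pairwise as True and returns every column pair under a MultiIndex, while pairwise=False pairs each column only with itself, which is the rolling variance.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1.0, 2.0, 4.0, 7.0, 11.0],
                   'B': [2.0, 1.0, 3.0, 5.0, 4.0]})
roll = df.rolling(3)

# other=None, pairwise=None: pairwise=True is used, so every column pair is
# computed and rows are indexed by (original index, column name).
pairwise = roll.cov()
print(pairwise.shape)  # (10, 2)

# other=None, pairwise=False: each column is paired only with itself,
# which gives the rolling variance with the same ddof (1 by default).
diagonal = roll.cov(pairwise=False)
assert np.allclose(diagonal.to_numpy(), roll.var().to_numpy(), equal_nan=True)
```

The two result shapes differ (5 x 2 vs. 10 x 2 here), which is one reason the other=None case may need separate handling in an overload.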


Overload df.rolling.cov, add perf.test

bd11839

densmirn requested review from AlexanderKalistratov and kozlov-alexey January 30, 2020 11:32

densmirn added the Waiting on CI label Jan 30, 2020

densmirn added 4 commits January 30, 2020 16:08

Merge branch 'master' of https://github.com/IntelPython/sdc into feat…

b629b58

…ure/df_rolling_cov

Add missing perf.tests for df.rolling

c6c66c2

Fix issue with name of the method in exception msg

ab7bb5d

Merge branch 'master' of https://github.com/IntelPython/sdc into feat…

3b1adb8

…ure/df_rolling_cov

kozlov-alexey reviewed Jan 31, 2020


kozlov-alexey approved these changes Jan 31, 2020


densmirn added 5 commits January 31, 2020 14:44

Minor fixes for df.rolling.cov

76f5b7f

Merge branch 'master' of https://github.com/IntelPython/sdc into feat…

0149bf7

…ure/df_rolling_cov

Merge branch 'master' of https://github.com/IntelPython/sdc into feat…

9ce0c56

…ure/df_rolling_cov

Merge branch 'master' of https://github.com/IntelPython/sdc into feat…

8660b17

…ure/df_rolling_cov

Merge branch 'master' of https://github.com/IntelPython/sdc into feat…

f33a972

…ure/df_rolling_cov

densmirn added Ready for Review and removed Waiting on CI labels Feb 14, 2020

densmirn merged commit b5bbbaa into IntelPython:master Feb 14, 2020

densmirn deleted the feature/df_rolling_cov branch June 9, 2020 12:12
