add reduce #366

Rubtsowa · 2019-12-05T09:48:22Z

No description provided.

pep8speaks · 2019-12-05T09:48:26Z

Hello @Rubtsowa! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2019-12-27 10:54:22 UTC

densmirn · 2019-12-05T09:59:04Z

sdc/datatypes/hpat_pandas_dataframe_functions.py

-from sdc.datatypes.hpat_pandas_dataframe_types import DataFrameType
-
-
-@overload_method(DataFrameType, 'count')


Why did you remove the overload?

Looks like this can be done in a separate PR.

This overload_method for count is not used, so it is OK to remove it. The implementation of the sdc_pandas_dataframe_reduce_columns function looks like preparation for another PR, #363.
I'm not sure about that.

kozlov-alexey · 2019-12-05T23:23:15Z

sdc/datatypes/hpat_pandas_dataframe_functions.py

+    loc_vars = {}
+    func_text = '\n'.join(func_lines)
+
+    exec(func_text, {'hpat': sdc, 'np': numpy}, loc_vars)


@Rubtsowa I think we should stick to the new name:

Suggested change

exec(func_text, {'hpat': sdc, 'np': numpy}, loc_vars)

exec(func_text, {'sdc': sdc, 'np': numpy}, loc_vars)

and use sdc.hiframes instead of hpat.hiframes in the generated function text too.
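To illustrate why the rename has to happen in both places, here is a minimal sketch. It assumes a stand-in object `fake_sdc` and a toy `double` function, neither of which exists in SDC. The key used in the globals dict passed to `exec` is the only name the generated text can resolve, so renaming one side without the other fails at call time.

```python
import types

# Stand-in for the real `sdc` package (hypothetical, for illustration only).
fake_sdc = types.SimpleNamespace(double=lambda x: 2 * x)

# Generated function text that refers to the module by the name `sdc`.
func_text = '\n'.join([
    'def _impl(x):',
    '  return sdc.double(x)',
])

# The dict key 'sdc' matches the name used inside func_text.
loc_vars = {}
exec(func_text, {'sdc': fake_sdc}, loc_vars)
result = loc_vars['_impl'](21)

# With the old key 'hpat' the text still compiles, but the lookup of
# `sdc` fails as soon as the generated function is called.
stale_vars = {}
exec(func_text, {'hpat': fake_sdc}, stale_vars)
try:
    stale_vars['_impl'](21)
    stale_name_failed = False
except NameError:
    stale_name_failed = True
```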

kozlov-alexey · 2019-12-05T23:31:47Z

sdc/datatypes/hpat_pandas_dataframe_functions.py

+    for i, d in enumerate(data_args):
+        line = '  {} = hpat.hiframes.api.init_series(hpat.hiframes.pd_dataframe_ext.get_dataframe_data(df, {}))'
+        func_lines.append(line.format(d + '_S', i))
+        func_lines.append('  {} = {}.{}()'.format(d + '_O', d + '_S', name))


Why do we call the Series method without any arguments?
You have to actually forward the arguments passed to the DF call into the Series method, and also take care of how they map to each other (e.g. for DF.mean and Series.mean the arguments are the same, but will that be so for all functions?)

Although the arguments of df.mean and series.mean are the same, they can't be forwarded straightforwardly:

axis: for a Series only index (0) is supported.

numeric_only: not supported by Series at all, and shouldn't be forwarded.

level: at the moment we are not storing Series in the data frame, so there is no hierarchical index and the level parameter doesn't make sense for the underlying Series, although it still makes sense for the DataFrame. On the other hand, hierarchical indexes are not currently supported for DataFrame either.

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.mean.html
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.mean.html
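The filtering described in this thread could be sketched roughly as follows. The helper name `select_series_params` and the constant `DF_ONLY_PARAMS` are hypothetical and are not part of the PR; the set of dropped names simply follows the three parameters discussed above, and `skipna` is used as an example of a parameter that both `DataFrame.mean` and `Series.mean` accept.

```python
# Parameters of the DataFrame method that, per the discussion above,
# should not be forwarded to the per-column Series call.
DF_ONLY_PARAMS = ('axis', 'numeric_only', 'level')


def select_series_params(df_params):
    """Return the (name, default) pairs that may be forwarded to Series."""
    return [(key, value) for key, value in df_params
            if key not in DF_ONLY_PARAMS]


# Signature parameters of DataFrame.mean, written as (name, default) pairs.
df_mean_params = [
    ('axis', None),
    ('skipna', None),
    ('level', None),
    ('numeric_only', None),
]

series_params = select_series_params(df_mean_params)
```

Only `skipna` survives the filter here; a real implementation would likely need a per-method mapping, since not every DataFrame/Series pair shares parameter names.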

kozlov-alexey · 2019-12-05T23:35:34Z

sdc/datatypes/hpat_pandas_dataframe_functions.py

+    n_cols = len(saved_columns)
+    data_args = tuple('data{}'.format(i) for i in range(n_cols))
+    all_params = ['df']
+    for key, value in params:


A blank line, e.g. before the block that generates the func text, wouldn't hurt.
Also, n_cols is used only once and can probably be removed altogether.

kozlov-alexey

Please apply comments and fix forwarding of arguments into the inner Series call.

kozlov-alexey · 2019-12-16T11:48:11Z

sdc/datatypes/hpat_pandas_dataframe_functions.py

+    func_definition = 'def _reduce_impl({}):'.format(', '.join(all_params))
+    func_lines = [func_definition]
+    for i, d in enumerate(data_args):
+        line = '  {} = sdc.hiframes.api.init_series(sdc.hiframes.pd_dataframe_ext.get_dataframe_data(df, {}))'


Suggested change

line = '  {} = sdc.hiframes.api.init_series(sdc.hiframes.pd_dataframe_ext.get_dataframe_data(df, {}))'

line = '  {} = sdc.hiframes.api.init_series(sdc.hiframes.pd_dataframe_ext.get_dataframe_data(all_params[0], {}))'

densmirn · 2019-12-16T12:56:10Z

sdc/datatypes/hpat_pandas_dataframe_functions.py


-    if not (isinstance(axis, types.Omitted) or axis == 0):
-        raise TypingError("{} 'axis' unsupported. Given: {}".format(_func_name, axis))
+def sdc_pandas_dataframe_reduce_columns(df, name, params_s, params_df):


Looks like params_df is not used at all.

densmirn · 2019-12-16T13:00:43Z

sdc/datatypes/hpat_pandas_dataframe_functions.py

+    for key, value in params_s:
+        all_params.append('{}={}'.format(key, value))
+    ap = all_params.copy()
+    ap.pop(0)


I vote for avoiding pop. Let's keep all parameters and the last one separate from the start.

densmirn · 2019-12-16T13:02:28Z

sdc/datatypes/hpat_pandas_dataframe_functions.py

+    for i, d in enumerate(data_args):
+        line = '  {} = sdc.hiframes.api.init_series(sdc.hiframes.pd_dataframe_ext.get_dataframe_data(df, {}))'
+        func_lines.append(line.format(d + '_S', i))
+        func_lines.append('  {} = {}.{}({})'.format(d + '_O', d + '_S', name, par))


func_lines.append('  {} = {}.{}({})'.format(d + '_O', d + '_S', name, par)) -> func_lines.append('  {}_O = {}_S.{}({})'.format(d, d, name, par))
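As a quick check that the suggested rewrite is purely cosmetic, the two format calls can be compared directly. The values of `d`, `name` and `par` below are made-up sample inputs.

```python
# Sample inputs (illustrative values only).
d, name, par = 'data0', 'mean', 'skipna=skipna'

# Original form: suffixes are concatenated before formatting.
before = '  {} = {}.{}({})'.format(d + '_O', d + '_S', name, par)

# Suggested form: suffixes live in the template itself.
after = '  {}_O = {}_S.{}({})'.format(d, d, name, par)
```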

kozlov-alexey · 2019-12-16T13:31:39Z

sdc/datatypes/hpat_pandas_dataframe_functions.py


-    if not (isinstance(numeric_only, types.Omitted) or numeric_only is False):
-        raise TypingError("{} 'numeric_only' unsupported. Given: {}".format(_func_name, axis))
+    for key, value in params_s:


Why do we use params_s (are these the parameters of the Series function?) to create the DF function signature?

…reduce

kozlov-alexey · 2019-12-18T14:25:33Z

sdc/datatypes/hpat_pandas_dataframe_functions.py


-    if not (isinstance(axis, types.Omitted) or axis == 0):
-        raise TypingError("{} 'axis' unsupported. Given: {}".format(_func_name, axis))
+def sdc_pandas_dataframe_reduce_columns(df, name, params):


Suggested change

def sdc_pandas_dataframe_reduce_columns(df, name, params):

def sdc_pandas_dataframe_reduce_columns(df, name, series_call_params):

kozlov-alexey · 2019-12-18T14:40:37Z

sdc/datatypes/hpat_pandas_dataframe_functions.py

-        raise TypingError("{} 'numeric_only' unsupported. Given: {}".format(_func_name, axis))
+    for key, value in params:
+        all_params.append('{}={}'.format(key, value))
+    ap = all_params.copy()


I'm not sure the copy of all_params is needed. Also, it would be good to have a comment here to indicate that this relies on the fact that we use the Series params to generate the signature of the DF method, i.e. something like:

# This relies on parameters part of the signature of Series method called below being the same
# as for the corresponding DataFrame method
series_call_params_str = '{}'.format(', '.join(all_params[1:]))
func_definition = 'def _reduce_impl({}):'.format(', '.join(all_params))

AlexanderKalistratov · 2019-12-18T22:16:47Z

sdc/datatypes/hpat_pandas_dataframe_functions.py

+    for key, value in params:
+        all_params.append('{}={}'.format(key, value))
+    ap = all_params.copy()
+    par = '{}'.format(', '.join(ap[1:]))


That's not valid. The params passed to the DataFrame method and the params passed to the Series method are different. You shouldn't pass the axis or numeric_only params to the Series.

Also, you are currently always passing the default parameter values to the Series!
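The second point can be made concrete with a small sketch. `FakeSeries` is a hypothetical stand-in (not pandas, not SDC) whose `mean` honours a `skipna` flag. Rendering the call as `name=default` bakes the default into the generated body, whereas `name=name` forwards whatever the caller passed.

```python
class FakeSeries:
    """Minimal stand-in for a Series (illustration only)."""

    def __init__(self, values):
        self.values = values

    def mean(self, skipna=True):
        present = [v for v in self.values if v is not None]
        if not skipna and len(present) != len(self.values):
            return None  # a missing value poisons the result
        return sum(present) / len(present)


params = [('skipna', True)]
signature = ', '.join('{}={}'.format(k, v) for k, v in params)

# Problematic: the call repeats 'skipna=True', ignoring the argument.
baked = '\n'.join([
    'def _baked(s, {}):'.format(signature),
    '  return s.mean({})'.format(signature),
])

# Forwarding: the call uses 'skipna=skipna', i.e. the caller's value.
call_args = ', '.join('{0}={0}'.format(k) for k, _ in params)
forwarded = '\n'.join([
    'def _forwarded(s, {}):'.format(signature),
    '  return s.mean({})'.format(call_args),
])

ns = {}
exec(baked, {}, ns)
exec(forwarded, {}, ns)

s = FakeSeries([1.0, None, 3.0])
baked_result = ns['_baked'](s, skipna=False)
forwarded_result = ns['_forwarded'](s, skipna=False)
```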

AlexanderKalistratov · 2019-12-18T22:19:42Z

sdc/datatypes/hpat_pandas_dataframe_functions.py

+        all_params.append('{}={}'.format(key, value))
+    ap = all_params.copy()
+    par = '{}'.format(', '.join(ap[1:]))
+    func_definition = 'def _reduce_impl({}):'.format(', '.join(all_params))


Please use f-strings here and everywhere:

func_definition = f"def _reduce_impl({', '.join(all_params)}):"

Or, better:

func_params = ', '.join(all_params)
func_definition = f'def _reduce_impl({func_params}):'
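A runnable version of the second form, as a sketch with made-up sample parameters. Joining beforehand also sidesteps a portability issue: before Python 3.12 an f-string cannot reuse its own quote character inside a replacement field, so the one-line variant needs a different outer quote.

```python
# Sample parameter list (illustrative values only).
all_params = ['df', 'axis=None', 'skipna=None']

# Join first, then interpolate; no quotes are nested inside the f-string.
func_params = ', '.join(all_params)
func_definition = f'def _reduce_impl({func_params}):'

# The str.format spelling used in the PR produces the same text.
format_definition = 'def _reduce_impl({}):'.format(', '.join(all_params))
```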

sdc/datatypes/hpat_pandas_dataframe_functions.py

…reduce

kozlov-alexey · 2019-12-26T14:07:52Z

sdc/__init__.py

    Overload Numba function to allow call SDC pass in Numba compiler pipeline
    Functions are:
    - Numba DefaultPassBuilder define_nopython_pipeline()



I believe that since @AlexanderKalistratov's PR with rewrites, this should be commented out. Please pay attention while rebasing.

kozlov-alexey · 2019-12-26T14:11:59Z

sdc/datatypes/hpat_pandas_dataframe_functions.py


-    saved_columns = df.columns
-    data_args = tuple('data{}'.format(i) for i in range(len(saved_columns)))
+def _dataframe_reduce_columns_codegen(func_name, func_params, series_params, columns):


Can you please add a multi-line comment at the top of this function (not a docstring) with a short example of what func_text will look like (for example, for mean and a DF with 1 column)?

I'm still seeing a docstring here instead of a comment.
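For reference, the kind of example requested above could look roughly like the output of the sketch below. It reuses the line templates quoted earlier in this review, but `codegen_sketch` itself and the tuple `return` statement are assumptions made for illustration; the final shape of the merged function's return value is not shown in this thread.

```python
def codegen_sketch(func_name, func_params, series_params, columns):
    """Hypothetical codegen built from the templates quoted in this review."""
    lines = ['def _reduce_impl({}):'.format(', '.join(func_params))]
    results = []
    for i, _ in enumerate(columns):
        d = 'data{}'.format(i)
        # Wrap column i of the dataframe argument into a Series.
        lines.append(
            '  {}_S = sdc.hiframes.api.init_series('
            'sdc.hiframes.pd_dataframe_ext.get_dataframe_data({}, {}))'
            .format(d, func_params[0], i))
        # Call the reduction on that Series, forwarding the Series params.
        lines.append('  {}_O = {}_S.{}({})'.format(d, d, func_name, series_params))
        results.append(d + '_O')
    # Assumption: results are returned as a plain tuple.
    lines.append('  return ({},)'.format(', '.join(results)))
    return '\n'.join(lines)


func_text = codegen_sketch('mean', ['df', 'skipna=None'], 'skipna=skipna', ['A'])
print(func_text)
```

For `mean` and a single column this prints a four-line function: the `def _reduce_impl(df, skipna=None):` header, one `data0_S = ...` line, one `data0_O = data0_S.mean(skipna=skipna)` line, and the return.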

kozlov-alexey

LGTM

…reduce

AlexanderKalistratov · 2019-12-26T18:04:18Z

@Rubtsowa please fix style issues:

Hello @Rubtsowa! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

In the file sdc/__init__.py:

Line 63:121: E501 line too long (122 > 120 characters)
Line 64:121: E501 line too long (143 > 120 characters)

In the file sdc/datatypes/hpat_pandas_dataframe_functions.py:

Line 57:1: E302 expected 2 blank lines, found 0

In the file sdc/hiframes/pd_dataframe_ext.py:

Line 1633:1: E402 module level import not at top of file

Comment last updated at 2019-12-26 14:32:57 UTC

AlexanderKalistratov · 2019-12-26T18:06:08Z

sdc/tests/test_dataframe.py

        pd.testing.assert_series_equal(hpat_func(n), test_impl(n))

-    @skip_numba_jit
+    @skip_sdc_jit('SDC pipeline does not support arguments for Series.count()')


It doesn't feel right that count stops working with the SDC pipeline.

AlexanderKalistratov · 2019-12-26T18:20:12Z

sdc/datatypes/hpat_pandas_dataframe_functions.py


-    saved_columns = df.columns
-    data_args = tuple('data{}'.format(i) for i in range(len(saved_columns)))
+def _dataframe_reduce_columns_codegen(func_name, func_params, series_params, columns):


I'm still seeing a docstring here instead of a comment.

AlexanderKalistratov · 2019-12-26T18:22:06Z

sdc/hiframes/pd_dataframe_ext.py

    return _impl
+
+
+from sdc.datatypes.hpat_pandas_dataframe_functions import *


It shouldn't be here

…reduce

PokhodenkoSA · 2019-12-27T10:00:36Z

sdc/__init__.py

+    # numba.compiler.DefaultPassBuilder.define_nopython_pipeline
    # numba.compiler.DefaultPassBuilder.define_nopython_pipeline = \
-    #     sdc.datatypes.hpat_pandas_dataframe_pass.sdc_nopython_pipeline_lite_register
+    # sdc.datatypes.hpat_pandas_dataframe_pass.sdc_nopython_pipeline_lite_register
+


Revert these changes.

PokhodenkoSA · 2019-12-27T10:01:04Z

sdc/datatypes/hpat_pandas_dataframe_functions.py


 from numba import types
 from numba.extending import (overload, overload_method, overload_attribute)
+from sdc.hiframes.pd_dataframe_ext import DataFrameType


Move imports from sdc after imports from numba.

PokhodenkoSA · 2019-12-27T10:04:36Z

sdc/datatypes/hpat_pandas_dataframe_functions.py

    """
    Pandas DataFrame method :meth:`pandas.DataFrame.count` implementation.

    .. only:: developer

-        Test: python -m sdc.runtests sdc.tests.test_dataframe.TestDataFrame.test_count
+    Test: python -m sdc.runtests sdc.tests.test_dataframe.TestDataFrame.test_count
+    Test: python -m sdc.runtests sdc.tests.test_dataframe.TestDataFrame.test_count1

    Parameters
    -----------
    self: :class:`pandas.DataFrame`
-        input arg
+    input arg
    axis:
-        *unsupported*
+    *unsupported*
    level:
-        *unsupported*
+    *unsupported*
    numeric_only:
-        *unsupported*
+    *unsupported*

    Returns
    -------
    :obj:`pandas.Series` or `pandas.DataFrame`
-            returns: For each column/row the number of non-NA/null entries. If level is specified returns a DataFrame.
+    for each column/row the number of non-NA/null entries. If level is specified returns a DataFrame.
    """


The documentation does not follow the format used in other functions. See #455 and the docs of other functions.

PokhodenkoSA · 2019-12-27T10:11:01Z

sdc/hiframes/pd_dataframe_ext.py

+if not sdc.config.config_pipeline_hpat_default:
+    from sdc.datatypes.hpat_pandas_dataframe_functions import *


Please take into account the other @overload_method(DataFrameType, 'count') in pd_dataframe_ext.py at line ~1564. You should switch from the old overload to the new one, not just switch on the new one.
@densmirn do we have something like the _non_hpat_pipeline_attrs list in pd_series_ext.py, but for DataFrame?

add reduce

6669842

Rubtsowa requested a review from densmirn December 5, 2019 09:48

Rubtsowa requested a review from kozlov-alexey December 5, 2019 09:51

add reduce

911de85

densmirn reviewed Dec 5, 2019

densmirn closed this Dec 5, 2019

densmirn reopened this Dec 5, 2019

kozlov-alexey reviewed Dec 5, 2019

kozlov-alexey suggested changes Dec 5, 2019

kozlov-alexey added the Waiting on author label Dec 5, 2019

Rubtsowa and others added 3 commits December 6, 2019 18:23

change

573860a

change hpat->sdc, change input parameters

6fe14d5

Merge branch 'master' into df_reduce

5befff5

kozlov-alexey reviewed Dec 16, 2019

densmirn reviewed Dec 16, 2019

kozlov-alexey reviewed Dec 16, 2019

Rubtsowa added 2 commits December 17, 2019 10:03

change

028e048

Merge branch 'df_reduce' of https://github.com/Rubtsowa/hpat into df_…

0120972

…reduce

Rubtsowa added Ready for Review and removed Waiting on author labels Dec 18, 2019

Rubtsowa requested a review from kozlov-alexey December 18, 2019 10:10

kozlov-alexey reviewed Dec 18, 2019

add example for check reduce

0369775

AlexanderKalistratov reviewed Dec 18, 2019

Rubtsowa added 2 commits December 19, 2019 13:30

division into 2 functions

e080db8

add selection of parameters

e617ce4

Rubtsowa added 12 commits December 20, 2019 14:49

comment string in __init__

1df6dec

import ovetload for DF

864a26c

resolve conflict

0351469

unskip test

b9f35a9

Merge branch 'master' of https://github.com/IntelPython/hpat into df_…

6a944bf

…reduce

correction allocation params

bd439e3

correction default parametrs

9e2f45d

unskiped test, added input parameters for series

702b78a

Merge branch 'master' of https://github.com/IntelPython/hpat into df_…

a13a104

…reduce

delete print, skip with SDC_CONFIG_PIPELINE=1, not work with arguments

2d05dfc

Merge branch 'master' of https://github.com/IntelPython/hpat into df_…

ea0d9a0

…reduce

Merge branch 'master' of https://github.com/IntelPython/hpat into df_…

122f526

…reduce

kozlov-alexey reviewed Dec 26, 2019

comment string in __init__

5b26671

kozlov-alexey reviewed Dec 26, 2019

kozlov-alexey approved these changes Dec 26, 2019

Rubtsowa added 2 commits December 26, 2019 17:32

commented function

e39f58a

Merge branch 'master' of https://github.com/IntelPython/hpat into df_…

c2a108b

…reduce

AlexanderKalistratov suggested changes Dec 26, 2019

Rubtsowa added 5 commits December 27, 2019 10:02

Merge branch 'master' of https://github.com/IntelPython/hpat into df_…

367d882

…reduce

fixed style issues

ee64830

change

d4215cf

Merge branch 'master' of https://github.com/IntelPython/hpat into df_…

6a0de00

…reduce

Merge branch 'master' of https://github.com/IntelPython/hpat into df_…

8ff2214

…reduce

Rubtsowa added the Coverage decreased label Dec 27, 2019

AlexanderKalistratov approved these changes Dec 27, 2019

AlexanderKalistratov merged commit 923b23a into IntelPython:master Dec 27, 2019

PokhodenkoSA reviewed Dec 30, 2019

Rubtsowa deleted the df_reduce branch April 7, 2020 07:05
