Refactor Series.median() in a new style via np.median #228

kozlov-alexey · 2019-10-15T08:58:54Z

No description provided.

Merge from public repo

densmirn · 2019-10-15T10:07:14Z

hpat/datatypes/hpat_pandas_series_functions.py

+    -----------
+    self: :obj:`pandas.Series`
+          input series
+    axis: axis for the function to be applied on, default None


Please specify the type of the parameter.

densmirn · 2019-10-15T10:08:54Z

hpat/datatypes/hpat_pandas_series_functions.py

+        raise TypingError(
+            '{} The object must be a pandas.series. Given self: {}'.format(_func_name, self))
+
+    if not isinstance(self.dtype, (types.Integer, types.Float)):


Consider using the shortcut types.Number for (types.Integer, types.Float).

densmirn · 2019-10-15T10:10:54Z

hpat/datatypes/hpat_pandas_series_functions.py

+    self: :obj:`pandas.Series`
+          input series
+    axis: axis for the function to be applied on, default None
+         *unsupported*


Please add checks for the unsupported parameters and raise an exception when an unsupported value is actually passed. You could also cover these cases with unit tests.
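One way to read this request, sketched in plain Python rather than in HPAT's overload code. The names median_stub and UnsupportedParamsTest are hypothetical, and treating level and numeric_only as unsupported alongside axis is an assumption of this sketch; in the PR itself the check would sit in the typing part of the overload and raise TypingError.

```python
import statistics
import unittest


def median_stub(values, axis=None, level=None, numeric_only=None):
    """Hypothetical stand-in for a function whose docstring marks some
    parameters as *unsupported* (assumed here: axis, level, numeric_only)."""
    unsupported = {'axis': axis, 'level': level, 'numeric_only': numeric_only}
    for name, value in unsupported.items():
        if value is not None:
            # Fail loudly instead of silently ignoring the argument.
            raise ValueError(
                'median_stub(): parameter {!r} is unsupported. Given: {!r}'.format(name, value))
    if not values:
        return float('nan')
    return float(statistics.median(values))


class UnsupportedParamsTest(unittest.TestCase):
    def test_defaults_work(self):
        self.assertEqual(median_stub([3, 1, 2]), 2.0)

    def test_unsupported_axis_raises(self):
        with self.assertRaises(ValueError):
            median_stub([3, 1, 2], axis=0)
```

The second test is the kind of unit test the comment asks for: it pins down that passing an unsupported value is an error rather than a no-op.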

densmirn · 2019-10-15T10:12:37Z

hpat/tests/test_series.py

+        '''Verifies median implementation with default skipna=True argument on a series with NA values'''
+        def test_impl(S):
+            res = S.median()
+            print(res)


Please remove the print if it was used for debugging. I don't think we need it here.

densmirn · 2019-10-15T10:13:06Z

hpat/tests/test_series.py

+        hpat_func = hpat.jit(test_impl)
+
+        S = pd.Series([2., 3., 5., np.nan, 5., 6., 7.])
+        self.assertEqual(hpat_func(S, ), test_impl(S))


typo: hpat_func(S, ) -> hpat_func(S)

densmirn · 2019-10-15T10:14:32Z

hpat/tests/test_series.py

+        '''Verifies median implementation with skipna=False on a series with NA values'''
+        def test_impl(S):
+            res = S.median(skipna=False)
+            print(res)


Is this a debugging print? If so, please remove it.

PokhodenkoSA · 2019-10-15T11:50:11Z

hpat/tests/test_series.py

+        # TODO: both return values are 'nan', but HPAT's is not np.nan, hence checking with
+        # assertIs() doesn't work - check if it's Numba relatated
+        S2 = pd.Series([2., 3., 5., np.nan, 5., 6., 7.])
+        self.assertTrue(np.isnan(hpat_func(S2)) and np.isnan(test_impl(S2)))


Suggested change

-        self.assertTrue(np.isnan(hpat_func(S2)) and np.isnan(test_impl(S2)))
+        self.assertEqual(np.isnan(hpat_func(S2)), np.isnan(test_impl(S2)))

This gives more information if the assertion fails.
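For context on the TODO in the quoted test: NaN never compares equal to itself, and two NaN results produced by different code paths are normally distinct objects, so neither assertEqual nor assertIs on the raw values is reliable. Note also that comparing only the isnan flags passes when neither value is NaN. A minimal sketch using only the standard library (math.isnan stands in for np.isnan; the helper name assert_both_nan_or_equal is hypothetical and not part of HPAT):

```python
import math
import unittest


def assert_both_nan_or_equal(case, actual, expected):
    """Hypothetical helper: pass if both values are NaN, or if they are equal."""
    # First check that the two results agree on being NaN; on failure
    # unittest reports both booleans, e.g. "True != False".
    case.assertEqual(math.isnan(actual), math.isnan(expected))
    if not math.isnan(expected):
        # Neither value is NaN here, so a plain equality check is meaningful.
        case.assertEqual(actual, expected)


class NanComparisonDemo(unittest.TestCase):
    def test_nan_results(self):
        a = float('nan')
        b = float('nan')
        self.assertNotEqual(a, b)   # NaN != NaN under IEEE 754 rules
        self.assertIsNot(a, b)      # two separately created float objects
        assert_both_nan_or_equal(self, a, b)

    def test_mismatch_is_reported(self):
        with self.assertRaises(AssertionError):
            assert_both_nan_or_equal(self, float('nan'), 5.0)
```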

shssf · 2019-10-15T19:38:10Z

hpat/tests/test_series.py

+        S2 = pd.Series([2., 3., 5., np.nan, 5., 6., 7.])
+        self.assertTrue(np.isnan(hpat_func(S2)) and np.isnan(test_impl(S2)))
+
+    @unittest.skip('HPAT distribution is not working (new-style impl issue)')


This is because you deleted this capability (see above).

shssf · 2019-10-15T19:38:50Z

hpat/hiframes/series_kernels.py

    'nsmallest_default': lambda A, name: hpat.hiframes.api.init_series(hpat.hiframes.api.nlargest(A, 5, False, lt_f), None, name),
    'head': lambda A, I, k, name: hpat.hiframes.api.init_series(A[:k], None, name),
    'head_index': lambda A, I, k, name: hpat.hiframes.api.init_series(A[:k], I[:k], name),
-    'median': lambda A: hpat.hiframes.api.median(A),


Do not delete this; comment it out if you need to. In this PR this code is still needed for parallelism.

shssf · 2019-10-15T19:39:01Z

hpat/hiframes/pd_series_ext.py

-    @bound_function("series.median")
-    def resolve_median(self, ary, args, kws):
-        assert not kws
-        dtype = ary.dtype
-        # median converts integer output to float
-        dtype = types.float64 if isinstance(dtype, types.Integer) else dtype
-        return signature(dtype, *args)
-


Do not delete this; comment it out if you need to. In this PR this code is still needed for parallelism.

@shssf But that's just the same typing we already get from the overload. Keeping it makes no sense if we don't use the old-style implementation from series_kernels.py, i.e. hpat.hiframes.api.median.

shssf · 2019-10-15T19:42:20Z

hpat/hiframes/hiframes_typed.py


        if func_name in ('std', 'nunique', 'describe', 'isna',
-                         'isnull', 'median', 'idxmin', 'idxmax', 'unique'):
+                         'isnull', 'idxmin', 'idxmax', 'unique'):


I usually do the following: https://github.com/IntelPython/hpat/pull/189/files#diff-f13c21aac60fb8f7ae3c9527239d1e17R990
instead of removing this name from this tuple.

_not_series_array_attrs = ['resolve_median', ...] just prevents series.median from being replaced with array.median. As far as I can see, we cannot keep both approaches just by adding 'resolve_median' to the list. Please correct me if I'm wrong.

@shssf @densmirn No, if we keep 'median' here we'll invoke the old-style implementation (so commenting out hpat.hiframes.api.median in series_kernels would cause a lowering error). It's not possible to keep both implementations (unless one or the other is selected based on the configured pipeline). In any case, with the new style the parallel test is broken, even if I call the old-style hpat.hiframes.api.median function from the overload itself.

shssf · 2019-10-15T19:43:20Z

hpat/datatypes/hpat_pandas_series_functions.py

+        raise TypingError(
+            '{} The object must be a pandas.series. Given self: {}'.format(_func_name, self))
+
+    if not isinstance(self.dtype, (types.Integer, types.Float)):


Suggested change

-    if not isinstance(self.dtype, (types.Integer, types.Float)):
+    if not isinstance(self._data.dtype, (types.Integer, types.Float)):

Minor correction: self._data.dtype -> self.data.dtype

@shssf But why is that better? self.dtype is less typing, and the two should always be the same.

shssf · 2019-10-15T19:44:02Z

hpat/datatypes/hpat_pandas_series_functions.py

+            '{} The function only applies to elements that are all numeric. Given data type: {}'.format(_func_name, self.dtype))
+
+    def hpat_pandas_series_median_impl(self, axis=None, skipna=True, level=None, numeric_only=None):
+        if skipna:


I see no checks for the type of skipna. What happens if it is None?

Corrected by adding checks.
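To make the expected behaviour concrete, here is a pure-Python reference for what median should return for each skipna value on the data used in this PR's tests. The function name reference_median is hypothetical; the actual overload is compiled by Numba and, per the PR title, built on np.median. Rejecting any non-bool skipna (including None) is an assumption of this sketch, and pandas itself may treat None differently.

```python
import math
import statistics


def reference_median(values, skipna=True):
    """Hypothetical pure-Python reference, not the HPAT implementation."""
    if not isinstance(skipna, bool):
        # Assumption of this sketch: reject None and other non-bool values.
        raise TypeError('skipna must be a bool, given: {!r}'.format(skipna))
    if skipna:
        # Drop NaN entries before computing the median.
        values = [v for v in values if not math.isnan(v)]
    elif any(math.isnan(v) for v in values):
        # With skipna=False a single NaN makes the result NaN.
        return math.nan
    if not values:
        return math.nan
    return float(statistics.median(values))


data = [2., 3., 5., math.nan, 5., 6., 7.]
print(reference_median(data))                # 5.0
print(reference_median(data, skipna=False))  # nan
```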

densmirn · 2019-10-18T05:13:55Z

hpat/datatypes/hpat_pandas_series_functions.py

+    -----------
+    self: :obj:`pandas.Series`
+          input series
+    axis: {0 or `index`, None}, default None


Let's specify the type of the parameter as :obj:<type>.

densmirn · 2019-10-18T05:25:54Z

hpat/datatypes/hpat_pandas_series_functions.py

+        raise TypingError(
+            '{} The object must be a pandas.series. Given self: {}'.format(_func_name, self))
+
+    if not isinstance(self.dtype, types.Number):


@shssf in the end, what do you think about how to get the dtype from self: self.dtype or self.data.dtype? BTW, to me dtype means "data type", so data.dtype reads as "data data type". I vote for the shorter option.

densmirn · 2019-10-18T05:32:59Z

hpat/datatypes/hpat_pandas_series_functions.py

+    if not (isinstance(axis, (types.Integer, types.UnicodeType, types.Omitted)) or axis is None):
+        raise TypingError('{} The axis must be an Integer or a String. Currently unsupported. Given: {}'.format(_func_name, axis))
+
+    if not (isinstance(skipna, (types.Boolean, types.Omitted)) or skipna == True):


Do you really need to check skipna == True?

Yes, without it the check fails, because the check is evaluated several times during typing. If the skipna argument is omitted, then in one pass skipna is a types.Omitted and in another pass it is a plain Python bool (type(skipna) is bool), so the second part of the check is needed to avoid raising an exception.

Maybe an additional check for a Python bool would help?

@densmirn I think this is a bug in Numba. Without the second check it will not work.

densmirn · 2019-10-18T05:37:44Z

hpat/hiframes/hiframes_typed.py

            return self._replace_func(func, [data], pre_nodes=nodes)

+        if func_name == 'median':
+            return [assign]


This line will never be reached, because 'median' is handled above. Moreover, if func_name doesn't match any condition in this function, return [assign] is the default anyway.

@densmirn Yes, it was needed only when 'median' was removed from the tuple above.
It'll be removed in the next commit.

shssf · 2019-10-20T22:26:55Z

hpat/datatypes/hpat_pandas_series_functions.py

+        raise TypingError(
+            '{} The object must be a pandas.series. Given self: {}'.format(_func_name, self))
+
+    if not isinstance(self.dtype, types.Number):


Suggested change

-    if not isinstance(self.dtype, types.Number):
+    if not isinstance(self.data.dtype, types.Number):

kozlov-alexey added 3 commits October 8, 2019 13:05

Merge pull request #22 from IntelPython/master

9eb2e92

Merge from public repo

Merge pull request #25 from IntelPython/master

686cc9a

Merge from public repo

Merge pull request #26 from IntelPython/master

f1a48b3

Merge from public repo

kozlov-alexey requested review from AlexanderKalistratov, Hardcode84, densmirn, fschlimb and shssf and removed request for shssf October 15, 2019 08:58

densmirn reviewed Oct 15, 2019

PokhodenkoSA reviewed Oct 15, 2019

kozlov-alexey requested a review from shssf October 15, 2019 16:13

kozlov-alexey added the [WIP] Work in progress label Oct 15, 2019

shssf suggested changes Oct 15, 2019

kozlov-alexey force-pushed the feature/refactor_series_median branch 2 times, most recently from bb60683 to ca9915c Compare October 17, 2019 23:44

densmirn reviewed Oct 18, 2019

Refactor Series.median() in a new style via np.median()

2484e03

kozlov-alexey force-pushed the feature/refactor_series_median branch from ca9915c to 2484e03 Compare October 18, 2019 10:14

PokhodenkoSA mentioned this pull request Oct 18, 2019

Implement Series.sum() in new style #203

Merged

shssf reviewed Oct 20, 2019

shssf approved these changes Oct 20, 2019

Merge branch 'master' into feature/refactor_series_median

a8e2c28

shssf merged commit a03a273 into IntelPython:master Oct 21, 2019
