-
Couldn't load subscription status.
- Fork 62
Refactor Series.median() in a new style via np.median #228
Refactor Series.median() in a new style via np.median #228
Conversation
Merge from public repo
Merge from public repo
Merge from public repo
| ----------- | ||
| self: :obj:`pandas.Series` | ||
| input series | ||
| axis: axis for the function to be applied on, default None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please specify type of the parameter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
| raise TypingError( | ||
| '{} The object must be a pandas.series. Given self: {}'.format(_func_name, self)) | ||
|
|
||
| if not isinstance(self.dtype, (types.Integer, types.Float)): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can consider shortcut types.Number for (types.Integer, types.Float).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you.
| self: :obj:`pandas.Series` | ||
| input series | ||
| axis: axis for the function to be applied on, default None | ||
| *unsupported* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add check of unsupported parameters and raise exception in case of parameter is really unsupported. Moreover you could cover such cases by unit tests.
hpat/tests/test_series.py
Outdated
| '''Verifies median implementation with default skipna=True argument on a series with NA values''' | ||
| def test_impl(S): | ||
| res = S.median() | ||
| print(res) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please remove print if it was used for debugging. I don't think we need that here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you.
hpat/tests/test_series.py
Outdated
| hpat_func = hpat.jit(test_impl) | ||
|
|
||
| S = pd.Series([2., 3., 5., np.nan, 5., 6., 7.]) | ||
| self.assertEqual(hpat_func(S, ), test_impl(S)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo: hpat_func(S, ) -> hpat_func(S)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Corrected.
hpat/tests/test_series.py
Outdated
| '''Verifies median implementation with skipna=False on a series with NA values''' | ||
| def test_impl(S): | ||
| res = S.median(skipna=False) | ||
| print(res) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it debuggable print? If it's so please remove.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed.
hpat/tests/test_series.py
Outdated
| # TODO: both return values are 'nan', but HPAT's is not np.nan, hence checking with | ||
| # assertIs() doesn't work - check if it's Numba relatated | ||
| S2 = pd.Series([2., 3., 5., np.nan, 5., 6., 7.]) | ||
| self.assertTrue(np.isnan(hpat_func(S2)) and np.isnan(test_impl(S2))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| self.assertTrue(np.isnan(hpat_func(S2)) and np.isnan(test_impl(S2))) | |
| self.assertEqual(np.isnan(hpat_func(S2)), np.isnan(test_impl(S2))) |
It will give more info in case of error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
hpat/tests/test_series.py
Outdated
| S2 = pd.Series([2., 3., 5., np.nan, 5., 6., 7.]) | ||
| self.assertTrue(np.isnan(hpat_func(S2)) and np.isnan(test_impl(S2))) | ||
|
|
||
| @unittest.skip('HPAT distribution is not working (new-style impl issue)') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is because you delete this ability (see above)
| 'nsmallest_default': lambda A, name: hpat.hiframes.api.init_series(hpat.hiframes.api.nlargest(A, 5, False, lt_f), None, name), | ||
| 'head': lambda A, I, k, name: hpat.hiframes.api.init_series(A[:k], None, name), | ||
| 'head_index': lambda A, I, k, name: hpat.hiframes.api.init_series(A[:k], I[:k], name), | ||
| 'median': lambda A: hpat.hiframes.api.median(A), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do not delete this. Comment it out if you need. In this PR this code is needed for parallelism.
| @bound_function("series.median") | ||
| def resolve_median(self, ary, args, kws): | ||
| assert not kws | ||
| dtype = ary.dtype | ||
| # median converts integer output to float | ||
| dtype = types.float64 if isinstance(dtype, types.Integer) else dtype | ||
| return signature(dtype, *args) | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do not delete this. Comment it out if you need. In this PR this code is needed for parallelism.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@shssf But that's just typing the same we've got from the overload. Keeping it has no sense - if we don't use old-style implementation from series_kernels.py, i.e. hpat.hiframes.api.median.
hpat/hiframes/hiframes_typed.py
Outdated
|
|
||
| if func_name in ('std', 'nunique', 'describe', 'isna', | ||
| 'isnull', 'median', 'idxmin', 'idxmax', 'unique'): | ||
| 'isnull', 'idxmin', 'idxmax', 'unique'): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
usually I do following https://github.com/IntelPython/hpat/pull/189/files#diff-f13c21aac60fb8f7ae3c9527239d1e17R990
instead removing this name from this loop
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_not_series_array_attrs = ['resolve_median', ...] just avoids replacement of series.median with array.median. As I see we are not able to keep both approaches just by adding 'resolve_median' to the list. Please correct me if I'm wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@shssf @densmirn No, if we keep 'median' here we'll use and invoke the old-style implementation (so commenting hpat.hiframes.api.median in series_kernels would cause lowering error). It's not possible to keep both implementations (unless one or the other is selected basing on configured pipeline). But anyway, in the new-style the parallel test is broken, even if I call the old-style hpat.hiframes.api.median function from the overload itself.
| raise TypingError( | ||
| '{} The object must be a pandas.series. Given self: {}'.format(_func_name, self)) | ||
|
|
||
| if not isinstance(self.dtype, (types.Integer, types.Float)): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| if not isinstance(self.dtype, (types.Integer, types.Float)): | |
| if not isinstance(self._data.dtype, (types.Integer, types.Float)): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor correctness: self._data.dtype -> self.data.dtype
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@shssf But why is that better (that's less typing and they should always be the same)?
| '{} The function only applies to elements that are all numeric. Given data type: {}'.format(_func_name, self.dtype)) | ||
|
|
||
| def hpat_pandas_series_median_impl(self, axis=None, skipna=True, level=None, numeric_only=None): | ||
| if skipna: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see no checks for skipna type. What happened if it is None?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Corrected by adding checks.
bb60683 to
ca9915c
Compare
| ----------- | ||
| self: :obj:`pandas.Series` | ||
| input series | ||
| axis: {0 or `index`, None}, default None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's specify type of the parameter as :obj:<type>.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done .
| raise TypingError( | ||
| '{} The object must be a pandas.series. Given self: {}'.format(_func_name, self)) | ||
|
|
||
| if not isinstance(self.dtype, types.Number): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@shssf eventually what do you think about extracting dtype from self? self.dtype or self.data.dtype. BTW for me dtype means "data type", data.dtype means "data data dtype". I vote for shorter option.
| if not (isinstance(axis, (types.Integer, types.UnicodeType, types.Omitted)) or axis is None): | ||
| raise TypingError('{} The axis must be an Integer or a String. Currently unsupported. Given: {}'.format(_func_name, axis)) | ||
|
|
||
| if not (isinstance(skipna, (types.Boolean, types.Omitted)) or skipna == True): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you exactly need to check skipna == True?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, without it the check will fail, because the check is passed several times during typing. So if skipna argument is omitted during one pass skipna will have types.Omitted and during the other pass it will have type(skipna)=bool, hence the second part of the check is needed to work properly and not raise exception.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe additional check on Python bool can help?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@densmirn I think this is a bug in Numba. Without second check it will not work.
hpat/hiframes/hiframes_typed.py
Outdated
| return self._replace_func(func, [data], pre_nodes=nodes) | ||
|
|
||
| if func_name == 'median': | ||
| return [assign] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You will achieve the line never, because 'median' is processed above. Moreover if func_name doesn't match any condition in this function return [assign] is by default.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@densmirn Yes, it was needed only when 'median' was removed from the tuple above.
It'll be removed in the next commit.
ca9915c to
2484e03
Compare
| raise TypingError( | ||
| '{} The object must be a pandas.series. Given self: {}'.format(_func_name, self)) | ||
|
|
||
| if not isinstance(self.dtype, types.Number): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| if not isinstance(self.dtype, types.Number): | |
| if not isinstance(self.data.dtype, types.Number): |
No description provided.