This repository was archived by the owner on Feb 2, 2024. It is now read-only.

@kozlov-alexey
Contributor

No description provided.

@kozlov-alexey kozlov-alexey requested review from AlexanderKalistratov, Hardcode84, densmirn, fschlimb and shssf and removed request for shssf October 15, 2019 08:58
-----------
self: :obj:`pandas.Series`
    input series
axis: axis for the function to be applied on, default None
Contributor

Please specify the type of the parameter.

Contributor Author

Done.

raise TypingError(
    '{} The object must be a pandas.series. Given self: {}'.format(_func_name, self))

if not isinstance(self.dtype, (types.Integer, types.Float)):
Contributor

You could use types.Number as a shortcut for (types.Integer, types.Float).

Contributor Author

Thank you.

self: :obj:`pandas.Series`
    input series
axis: axis for the function to be applied on, default None
    *unsupported*
Contributor

Please add a check for unsupported parameters and raise an exception when an unsupported parameter is actually passed. You could also cover such cases with unit tests.
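A minimal sketch of what such a guard could look like, written in plain Python rather than as Numba typing code. The function name median_with_guards and the use of ValueError are assumptions for illustration only; the code in this PR raises TypingError during typing instead.

```python
import statistics


def median_with_guards(values, axis=None, level=None, numeric_only=None):
    """Hypothetical stand-in for a median that rejects unsupported arguments."""
    # Parameters documented as *unsupported* must be left at their defaults.
    unsupported = {'axis': axis, 'level': level, 'numeric_only': numeric_only}
    for name, value in unsupported.items():
        if value is not None:
            raise ValueError(
                'median(): parameter {!r} is unsupported. Given: {!r}'.format(name, value))
    return statistics.median(values)
```

A unit test for the unsupported case would then only need to assert that passing, for example, axis=0 raises.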

'''Verifies median implementation with default skipna=True argument on a series with NA values'''
def test_impl(S):
    res = S.median()
    print(res)
Contributor

Please remove the print if it was only used for debugging. I don't think we need it here.

Contributor Author

Thank you.

hpat_func = hpat.jit(test_impl)

S = pd.Series([2., 3., 5., np.nan, 5., 6., 7.])
self.assertEqual(hpat_func(S, ), test_impl(S))
Contributor

typo: hpat_func(S, ) -> hpat_func(S)

Contributor Author

Corrected.

'''Verifies median implementation with skipna=False on a series with NA values'''
def test_impl(S):
    res = S.median(skipna=False)
    print(res)
Contributor

Is this a debugging print? If so, please remove it.

Contributor Author

Removed.

# TODO: both return values are 'nan', but HPAT's is not np.nan, hence checking with
# assertIs() doesn't work - check if it's Numba related
S2 = pd.Series([2., 3., 5., np.nan, 5., 6., 7.])
self.assertTrue(np.isnan(hpat_func(S2)) and np.isnan(test_impl(S2)))
Contributor

Suggested change
- self.assertTrue(np.isnan(hpat_func(S2)) and np.isnan(test_impl(S2)))
+ self.assertEqual(np.isnan(hpat_func(S2)), np.isnan(test_impl(S2)))

It will give more info in case of error.
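To illustrate the difference in failure output, here is a small standalone sketch that uses only unittest and math (no HPAT involved; the values a and b are made up). Note that the two assertions are not equivalent: the assertEqual form also passes when neither value is NaN.

```python
import math
import unittest

case = unittest.TestCase()  # bare instance, used only for its assert methods


def failure_message(assertion, *args):
    """Return the AssertionError text produced by an assertion, or None if it passes."""
    try:
        assertion(*args)
    except AssertionError as exc:
        return str(exc)
    return None


a, b = float('nan'), 1.0  # made-up results: one is NaN, the other is not

# assertTrue only sees the combined boolean, so the message cannot say which side failed.
msg_true = failure_message(case.assertTrue, math.isnan(a) and math.isnan(b))

# assertEqual sees both sides and reports them separately.
msg_equal = failure_message(case.assertEqual, math.isnan(a), math.isnan(b))

# With two non-NaN values assertEqual passes (False == False),
# whereas the original assertTrue form would fail.
msg_both_finite = failure_message(case.assertEqual, math.isnan(1.0), math.isnan(2.0))
```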

Contributor Author

Done.
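On the TODO in the snippet above: an identity check fails because NaN is an ordinary float value, not a singleton, so a NaN produced elsewhere is generally a different object from np.nan. A quick illustration with plain Python floats (variable names are illustrative, and the identity result reflects CPython behaviour):

```python
import math

nan_a = float('nan')
nan_b = float('nan')  # a second, separately created NaN object

same_object = nan_a is nan_b      # False in CPython: two distinct float objects
equal_values = nan_a == nan_b     # False: NaN never compares equal, even to itself
both_nan = math.isnan(nan_a) and math.isnan(nan_b)  # True: the reliable check
```

This is why the test falls back to an isnan-based comparison instead of assertIs() or a direct equality check.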

S2 = pd.Series([2., 3., 5., np.nan, 5., 6., 7.])
self.assertTrue(np.isnan(hpat_func(S2)) and np.isnan(test_impl(S2)))

@unittest.skip('HPAT distribution is not working (new-style impl issue)')
Contributor

This is because you deleted this capability (see above).

'nsmallest_default': lambda A, name: hpat.hiframes.api.init_series(hpat.hiframes.api.nlargest(A, 5, False, lt_f), None, name),
'head': lambda A, I, k, name: hpat.hiframes.api.init_series(A[:k], None, name),
'head_index': lambda A, I, k, name: hpat.hiframes.api.init_series(A[:k], I[:k], name),
'median': lambda A: hpat.hiframes.api.median(A),
Contributor

Do not delete this. Comment it out if you need to. In this PR this code is needed for parallelism.

Comment on lines 686 to 697
@bound_function("series.median")
def resolve_median(self, ary, args, kws):
    assert not kws
    dtype = ary.dtype
    # median converts integer output to float
    dtype = types.float64 if isinstance(dtype, types.Integer) else dtype
    return signature(dtype, *args)

Contributor

Do not delete this. Comment it out if you need to. In this PR this code is needed for parallelism.

Contributor Author

@shssf But that only provides the same typing we already get from the overload. Keeping it makes no sense if we don't use the old-style implementation from series_kernels.py, i.e. hpat.hiframes.api.median.
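For context on the "median converts integer output to float" comment in resolve_median: with an even number of integer elements the median is the mean of the two middle values and need not be an integer, so a statically typed implementation has to settle on a float return type. A plain-Python sketch (the helper name median_as_float is hypothetical and unrelated to the HPAT code):

```python
def median_as_float(values):
    """Median of a non-empty sequence of numbers, always returned as a float."""
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2:
        # Odd length: the single middle element, converted to float
        return float(ordered[mid])
    # Even length: true division of the two middle elements yields a float
    return (ordered[mid - 1] + ordered[mid]) / 2
```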


  if func_name in ('std', 'nunique', 'describe', 'isna',
-                  'isnull', 'median', 'idxmin', 'idxmax', 'unique'):
+                  'isnull', 'idxmin', 'idxmax', 'unique'):
Contributor

I usually do the following instead of removing the name from this list:
https://github.com/IntelPython/hpat/pull/189/files#diff-f13c21aac60fb8f7ae3c9527239d1e17R990

Contributor

_not_series_array_attrs = ['resolve_median', ...] only prevents series.median from being replaced with array.median. As far as I can see, we cannot keep both approaches just by adding 'resolve_median' to the list. Please correct me if I'm wrong.

Contributor Author

@shssf @densmirn No, if we keep 'median' here we'll invoke the old-style implementation (so commenting out hpat.hiframes.api.median in series_kernels would cause a lowering error). It's not possible to keep both implementations (unless one or the other is selected based on the configured pipeline). In any case, the parallel test is broken in the new style, even if I call the old-style hpat.hiframes.api.median function from the overload itself.

raise TypingError(
    '{} The object must be a pandas.series. Given self: {}'.format(_func_name, self))

if not isinstance(self.dtype, (types.Integer, types.Float)):
Contributor

Suggested change
- if not isinstance(self.dtype, (types.Integer, types.Float)):
+ if not isinstance(self._data.dtype, (types.Integer, types.Float)):

Contributor

Minor correction: self._data.dtype -> self.data.dtype

Contributor Author

@shssf But why is that better? self.dtype is less typing, and the two should always be the same.

'{} The function only applies to elements that are all numeric. Given data type: {}'.format(_func_name, self.dtype))

def hpat_pandas_series_median_impl(self, axis=None, skipna=True, level=None, numeric_only=None):
    if skipna:
Contributor

I see no check for the type of skipna. What happens if it is None?

Contributor Author

Corrected by adding checks.
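The skipna behaviour that the two tests above exercise can be summarised in a plain-Python sketch. This is not the PR's implementation: the function name median_skipna is made up, raising TypeError for a non-bool skipna is an assumption about what "adding checks" could mean, and the semantics are assumed to follow pandas (NaN values are dropped when skipna is true; the result is NaN when skipna is false and any NaN is present).

```python
import math
import statistics


def median_skipna(values, skipna=True):
    """Hypothetical reference for median with pandas-like skipna handling."""
    # Reject anything that is not a real bool, including None
    if not isinstance(skipna, bool):
        raise TypeError('skipna must be a bool. Given: {!r}'.format(skipna))

    if skipna:
        # Drop NaN values before computing the median
        data = [v for v in values if not math.isnan(v)]
    elif any(math.isnan(v) for v in values):
        # A single NaN poisons the result when it is not skipped
        return math.nan
    else:
        data = list(values)

    # Nothing left to aggregate: report NaN rather than raising
    return statistics.median(data) if data else math.nan
```

With the series used in the tests, [2., 3., 5., nan, 5., 6., 7.], the default call yields 5.0 and skipna=False yields NaN.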

@kozlov-alexey kozlov-alexey force-pushed the feature/refactor_series_median branch 2 times, most recently from bb60683 to ca9915c Compare October 17, 2019 23:44
-----------
self: :obj:`pandas.Series`
    input series
axis: {0 or `index`, None}, default None
Contributor

Let's specify the type of the parameter as :obj:<type>.

Contributor Author

Done.

raise TypingError(
    '{} The object must be a pandas.series. Given self: {}'.format(_func_name, self))

if not isinstance(self.dtype, types.Number):
Contributor

@shssf In the end, what do you think about how to extract dtype from self: self.dtype or self.data.dtype? BTW, to me dtype means "data type", so data.dtype reads as "data data type". I vote for the shorter option.

if not (isinstance(axis, (types.Integer, types.UnicodeType, types.Omitted)) or axis is None):
    raise TypingError('{} The axis must be an Integer or a String. Currently unsupported. Given: {}'.format(_func_name, axis))

if not (isinstance(skipna, (types.Boolean, types.Omitted)) or skipna == True):
Contributor

Do you really need to check skipna == True?

Contributor Author

Yes, without it the check will fail, because the check runs several times during typing. If the skipna argument is omitted, it has type types.Omitted during one pass, while during another pass type(skipna) is bool. The second part of the check is therefore needed so that no exception is raised.

Contributor

Maybe an additional check for a Python bool would help?

Contributor

@densmirn I think this is a bug in Numba. Without the second check it will not work.
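The forms the skipna argument can take during typing, as described in this thread, can be modelled with stand-in classes. Everything below is hypothetical: Boolean and Omitted are toy classes that only mimic the roles of types.Boolean and types.Omitted, not the real Numba types. The sketch also shows how the suggested Python bool check differs from skipna == True.

```python
class Boolean:
    """Toy stand-in for a Numba boolean type instance (explicitly passed argument)."""


class Omitted:
    """Toy stand-in for an omitted argument that carries its default value."""

    def __init__(self, value):
        self.value = value


def accepts_with_equality(skipna):
    # Mirrors the check in the PR: typed forms, or the raw default value True
    return isinstance(skipna, (Boolean, Omitted)) or skipna == True  # noqa: E712


def accepts_with_bool_check(skipna):
    # Variant suggested in review: typed forms, or any raw Python bool
    return isinstance(skipna, (Boolean, Omitted)) or isinstance(skipna, bool)
```

Both variants accept the typed forms and the raw default True. They differ on other raw values: skipna == True also lets the integer 1 through (because 1 == True in Python) and rejects a raw False, while the isinstance-based variant accepts exactly the two bool values.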

return self._replace_func(func, [data], pre_nodes=nodes)

if func_name == 'median':
    return [assign]
Contributor

This line will never be reached, because 'median' is handled above. Moreover, if func_name doesn't match any condition in this function, return [assign] is the default anyway.

Contributor Author

@densmirn Yes, it was needed only when 'median' was removed from the tuple above.
It'll be removed in the next commit.

raise TypingError(
    '{} The object must be a pandas.series. Given self: {}'.format(_func_name, self))

if not isinstance(self.dtype, types.Number):
Contributor

Suggested change
- if not isinstance(self.dtype, types.Number):
+ if not isinstance(self.data.dtype, types.Number):

@shssf shssf merged commit a03a273 into IntelPython:master Oct 21, 2019
4 participants