Skip to content
This repository has been archived by the owner on Feb 2, 2024. It is now read-only.

Implement idxmin #216

Merged
merged 13 commits into from
Oct 20, 2019
Merged

Implement idxmin #216

merged 13 commits into from
Oct 20, 2019

Conversation

1e-to
Copy link
Contributor

@1e-to 1e-to commented Oct 11, 2019

No description provided.

etotmeni added 3 commits October 9, 2019 16:37
3 tests doesnt work until fix index
@@ -1096,6 +1096,57 @@ def hpat_pandas_series_ge_impl(self, other):
raise TypingError('{} The object must be a pandas.series and argument must be a number. Given: {} and other: {}'.format(_func_name, self, other))


@overload_method(SeriesType, 'idxmin')
def hpat_pandas_series_idxmin(self, axis=None, skipna=True):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@1e-to I see in documentation the interface is a bit diffrent. It looks like self, axis=0, skipna=True, *args The last argument (*args)is not used but it might be good to put it in the function parameters

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed


Returns
-------
:obj:`pandas.Series` or :obj:`int` or :obj:`float`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like description is wrong. The method returns label of selected index. I'm not sure but it looks like it returns a value (corresponded type) of the index.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

_func_name = 'Method idxmin().'

if not isinstance(self, SeriesType):
raise TypingError('{} The object must be a pandas.series. Given self: {}'.format(_func_name, self))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given self -> Given

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

if not isinstance(self, SeriesType):
raise TypingError('{} The object must be a pandas.series. Given self: {}'.format(_func_name, self))

if not isinstance(self.dtype, types.Number):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really focused on numeric data in series? No strings array supported?
Also, I would vote to avoid self.dtype using because I see it ambiguous. I think it is better to use self._data.dtype instead.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like to correct a little bit: self._data.dtype -> self.data.dtype. self has not attribute _data on typing level.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

@@ -241,6 +241,7 @@ def __init__(self, dmm, fe_type):
make_attribute_wrapper(SeriesType, 'name', '_name')


from hpat.datatypes.hpat_pandas_series_functions import *
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this line should not be moved here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

@@ -866,7 +866,7 @@ def _run_call_series(self, assign, lhs, rhs, series_var, func_name):
return self._replace_func(func, [data], pre_nodes=nodes)

if func_name in ('std', 'nunique', 'describe', 'isna',
'isnull', 'median', 'idxmin', 'idxmax', 'unique'):
'isnull', 'median', 'idxmax', 'unique'):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it will not work (didn't check). if parallel tests are not working, please leave this line as in original, and add resolve_idxmin value to the array as here https://github.com/IntelPython/hpat/pull/191/files#diff-f13c21aac60fb8f7ae3c9527239d1e17R990

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Initially, there were no parallel tests for this function, should I write them?
resolve_idxmin need to dont overlap arrays methods, but idxmin for array does not exist

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, let's wait test system results

@@ -530,6 +530,6 @@ def gt_f(a, b):
'head_index': lambda A, I, k, name: hpat.hiframes.api.init_series(A[:k], I[:k], name),
'median': lambda A: hpat.hiframes.api.median(A),
# TODO: handle NAs in argmin/argmax
'idxmin': lambda A: A.argmin(),
# 'idxmin': lambda A: A.argmin(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you need to leave it as in original

return S.idxmin()
hpat_func = hpat.jit(test_impl)

S = pd.Series([1, 2, 3], [4, 45, 14])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please see this comment #217 (comment)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add 1 test and fix another

-add args
-update docs
-change check self._data.dtype
-add test for all
hpat_func = hpat.jit(test_impl)

S = pd.Series([1, 2, 3], [4, 45, 14])
print(hpat_func(S))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need this in test?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed


@unittest.skip("Need index fix")
def test_series_idxmin(self):
def test_series_idxmin_impl(S):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's better to stick to the common name for the jitted func - test_impl.


test_input_data = data_simple + data_extra

for input_data in data_simple:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are the first two loops for testing default index in Series? Then they should make one separate test. You will have another compilation of test_func on the last two loops anyway, so there'll be no benefit to put this all in one test.

[6, 6.1, 2.2, 1, 3, 3, 2.2, 1, 2],
]

data_extra = [[np.nan, np.nan, np.nan, np.nan],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's better to have one 'series_data' list (instead of data_simple + data_extra) with it's elements covering all possible situations (i.e. containing numbers, NaNs or both, in different combinations). There's no obvious benefit of using loops in unittests - better identify and cover different cases yourself.


for input_data in data_simple:
for index_data in data_simple:
S = pd.Series(input_data, index_data)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to test index_data at all? Does our implementation depend on the indexes?

Comment on lines 1106 to 1109
Test: python -m hpat.runtests hpat.tests.test_series.TestSeries.test_series_idxmin1
Test: python -m hpat.runtests hpat.tests.test_series.TestSeries.test_series_idxmin_str
Test: python -m hpat.runtests hpat.tests.test_series.TestSeries.test_series_idxmin_int
Test: python -m hpat.runtests hpat.tests.test_series.TestSeries.test_series_idxmin_no
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please update the list of tests in docstring according to the real list.


if not isinstance(self.data.dtype, types.Number):
raise TypingError(
'{} Currently function supports only numeric values. Given data type: {}'.format(_func_name, self.dtype))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

self.dtype -> self.data.dtype


if not isinstance(skipna, (types.Omitted, types.Boolean, bool)):
raise TypingError(
'{} The parameter must be a boolean type. Given type skipna: {}'.format(_func_name, type(skipna)))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

type(skipna) -> skipna


return numpy.argmin(self._data)

return hpat_pandas_series_idxmin_impl
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could do return once outside of if-else statement.

Comment on lines 1602 to 1606
[1.1, 0.3, 2.1, 1, 3, 0.3, 2.1, 1.1, 2.2],
[6, 6.1, 2.2, 1, 3, 0, 2.2, 1, 2],
[6, 6, 2, 1, 3, np.inf, np.nan, np.nan, np.nan],
[3., 5.3, np.nan, np.nan, np.inf, np.inf, 4.4, 3.7, 8.9]
]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please fix indentation.


hpat_func = hpat.jit(test_impl)

test_input_data = []
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems the list is used nowhere.


hpat_func = hpat.jit(test_impl)

test_input_data = []
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems the list is used nowhere.

Copy link
Contributor

@shssf shssf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Index fixed. Please review tests again - it looks like they have NaN != NaN issues

@shssf shssf merged commit 584d9f3 into IntelPython:master Oct 20, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants