Implement idxmin #216

1e-to · 2019-10-11T16:19:22Z

No description provided.

3 tests don't work until the index is fixed

shssf · 2019-10-15T18:39:23Z

hpat/datatypes/hpat_pandas_series_functions.py

@@ -1096,6 +1096,57 @@ def hpat_pandas_series_ge_impl(self, other):
    raise TypingError('{} The object must be a pandas.series and argument must be a number. Given: {} and other: {}'.format(_func_name, self, other))


+@overload_method(SeriesType, 'idxmin')
+def hpat_pandas_series_idxmin(self, axis=None, skipna=True):


@1e-to I see in the documentation that the interface is a bit different. It looks like self, axis=0, skipna=True, *args. The last argument (*args) is not used, but it might be good to put it in the function parameters.

shssf · 2019-10-15T18:42:17Z

hpat/datatypes/hpat_pandas_series_functions.py

+
+    Returns
+    -------
+    :obj:`pandas.Series` or :obj:`int` or :obj:`float`


It looks like the description is wrong. The method returns the label of the selected index entry. I'm not sure, but it looks like it returns a value of the index (with the index's type).
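
To illustrate the point about the return value, here is a minimal pure-Python sketch. It is not the HPAT implementation, and idxmin_label is a hypothetical helper. It returns the index label of the smallest value rather than the value itself or its position:

```python
def idxmin_label(values, index):
    """Return the label from `index` that corresponds to the smallest value."""
    if len(values) != len(index):
        raise ValueError("values and index must have the same length")
    if not values:
        raise ValueError("cannot take idxmin of an empty sequence")
    # Position of the first minimum, then map that position to its label.
    position = min(range(len(values)), key=lambda i: values[i])
    return index[position]


# Data taken from the test in this PR: values [1, 2, 3], index [4, 45, 14].
label = idxmin_label([1, 2, 3], [4, 45, 14])
```

With that data the result is the label 4, so the documented return type should follow the type of the index.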

shssf · 2019-10-15T18:42:41Z

hpat/datatypes/hpat_pandas_series_functions.py

+    _func_name = 'Method idxmin().'
+
+    if not isinstance(self, SeriesType):
+        raise TypingError('{} The object must be a pandas.series. Given self: {}'.format(_func_name, self))


Given self -> Given

shssf · 2019-10-15T18:44:53Z

hpat/datatypes/hpat_pandas_series_functions.py

+    if not isinstance(self, SeriesType):
+        raise TypingError('{} The object must be a pandas.series. Given self: {}'.format(_func_name, self))
+
+    if not isinstance(self.dtype, types.Number):


Are we really focused only on numeric data in series? Are string arrays not supported?
Also, I would vote to avoid using self.dtype because I find it ambiguous. I think it is better to use self._data.dtype instead.

I would like to correct this a little: self._data.dtype -> self.data.dtype. self has no attribute _data at the typing level.

shssf · 2019-10-15T18:45:36Z

hpat/hiframes/pd_series_ext.py

@@ -241,6 +241,7 @@ def __init__(self, dmm, fe_type):
 make_attribute_wrapper(SeriesType, 'name', '_name')


+from hpat.datatypes.hpat_pandas_series_functions import *


This line should not be moved here.

shssf · 2019-10-15T18:48:22Z

hpat/hiframes/hiframes_typed.py

@@ -866,7 +866,7 @@ def _run_call_series(self, assign, lhs, rhs, series_var, func_name):
            return self._replace_func(func, [data], pre_nodes=nodes)

        if func_name in ('std', 'nunique', 'describe', 'isna',
-                         'isnull', 'median', 'idxmin', 'idxmax', 'unique'):
+                         'isnull', 'median', 'idxmax', 'unique'):


I think it will not work (I didn't check). If the parallel tests are not working, please leave this line as in the original and add a resolve_idxmin value to the array, as here: https://github.com/IntelPython/hpat/pull/191/files#diff-f13c21aac60fb8f7ae3c9527239d1e17R990

Initially, there were no parallel tests for this function. Should I write them?
resolve_idxmin is needed to avoid overlapping array methods, but idxmin does not exist for arrays.

No, let's wait for the test system results.

shssf · 2019-10-15T18:48:52Z

hpat/hiframes/series_kernels.py

@@ -530,6 +530,6 @@ def gt_f(a, b):
    'head_index': lambda A, I, k, name: hpat.hiframes.api.init_series(A[:k], I[:k], name),
    'median': lambda A: hpat.hiframes.api.median(A),
    # TODO: handle NAs in argmin/argmax
-    'idxmin': lambda A: A.argmin(),
+    # 'idxmin': lambda A: A.argmin(),


I think you need to leave it as in the original.

shssf · 2019-10-15T18:50:04Z

hpat/tests/test_series.py

+            return S.idxmin()
+        hpat_func = hpat.jit(test_impl)
+
+        S = pd.Series([1, 2, 3], [4, 45, 14])


please see this comment #217 (comment)

Add 1 test and fix another

- add args
- update docs
- change check self._data.dtype
- add test for all

kozlov-alexey · 2019-10-16T13:13:11Z

hpat/tests/test_series.py

+        hpat_func = hpat.jit(test_impl)
+
+        S = pd.Series([1, 2, 3], [4, 45, 14])
+        print(hpat_func(S))


Do we need this in the test?

kozlov-alexey · 2019-10-16T13:28:46Z

hpat/tests/test_series.py

+
+    @unittest.skip("Need index fix")
+    def test_series_idxmin(self):
+        def test_series_idxmin_impl(S):


I think it's better to stick to the common name for the jitted func - test_impl.

kozlov-alexey · 2019-10-16T13:32:14Z

hpat/tests/test_series.py

+
+        test_input_data = data_simple + data_extra
+
+        for input_data in data_simple:


Are the first two loops for testing the default index in Series? Then they should be one separate test. You will have another compilation of test_func on the last two loops anyway, so there is no benefit in putting all of this in one test.

kozlov-alexey · 2019-10-16T13:37:31Z

hpat/tests/test_series.py

+                       [6, 6.1, 2.2, 1, 3, 3, 2.2, 1, 2],
+                       ]
+
+        data_extra = [[np.nan, np.nan, np.nan, np.nan],


I think it's better to have one 'series_data' list (instead of data_simple + data_extra) with its elements covering all possible situations (i.e. containing numbers, NaNs, or both, in different combinations). There's no obvious benefit to using loops in unit tests; it's better to identify and cover the different cases yourself.

kozlov-alexey · 2019-10-16T13:38:49Z

hpat/tests/test_series.py

+
+        for input_data in data_simple:
+            for index_data in data_simple:
+                S = pd.Series(input_data, index_data)


Why do we need to test index_data at all? Does our implementation depend on the indexes?

densmirn · 2019-10-16T15:26:25Z

hpat/datatypes/hpat_pandas_series_functions.py

+        Test: python -m hpat.runtests hpat.tests.test_series.TestSeries.test_series_idxmin1
+        Test: python -m hpat.runtests hpat.tests.test_series.TestSeries.test_series_idxmin_str
+        Test: python -m hpat.runtests hpat.tests.test_series.TestSeries.test_series_idxmin_int
+        Test: python -m hpat.runtests hpat.tests.test_series.TestSeries.test_series_idxmin_no


Please update the list of tests in docstring according to the real list.

densmirn · 2019-10-16T15:26:52Z

hpat/datatypes/hpat_pandas_series_functions.py

+
+    if not isinstance(self.data.dtype, types.Number):
+        raise TypingError(
+            '{} Currently function supports only numeric values. Given data type: {}'.format(_func_name, self.dtype))


self.dtype -> self.data.dtype

densmirn · 2019-10-16T15:28:16Z

hpat/datatypes/hpat_pandas_series_functions.py

+
+    if not isinstance(skipna, (types.Omitted, types.Boolean, bool)):
+        raise TypingError(
+            '{} The parameter must be a boolean type. Given type skipna: {}'.format(_func_name, type(skipna)))


type(skipna) -> skipna
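
The reason for this: at typing time skipna is already a type object, so type(skipna) reports the class of that type object rather than the type the user passed. A small sketch with a stand-in class (FakeBoolean is hypothetical and only mimics how a typing-level object might print; it is not a Numba class):

```python
class FakeBoolean:
    """Stand-in for a typing-level type object (hypothetical, not Numba)."""

    def __str__(self):
        return "bool"


skipna = FakeBoolean()

# Formatting type(skipna) prints the class of the type object itself.
msg_with_type = "Given skipna: {}".format(type(skipna))

# Formatting skipna directly prints the readable type name.
msg_direct = "Given skipna: {}".format(skipna)
```

Only the second message tells the user which type was actually given.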

densmirn · 2019-10-16T15:29:27Z

hpat/datatypes/hpat_pandas_series_functions.py

+
+            return numpy.argmin(self._data)
+
+        return hpat_pandas_series_idxmin_impl


You could return once, outside of the if-else statement.
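
A sketch of the suggested shape, assuming the overload picks between two inner implementations. The skipna branches here are illustrative pure Python, not the PR's code, and make_idxmin_impl is a hypothetical name:

```python
import math


def make_idxmin_impl(skipna=True):
    """Pick an implementation in the branches, return it once at the end."""
    if skipna:
        def impl(data):
            # Ignore NaN entries when searching for the minimum position.
            candidates = [i for i, v in enumerate(data) if not math.isnan(v)]
            if not candidates:
                return -1  # sentinel for "all NaN", used in this sketch only
            return min(candidates, key=lambda i: data[i])
    else:
        def impl(data):
            # Plain position of the minimum; NaN handling is out of scope here.
            return min(range(len(data)), key=lambda i: data[i])

    # Single return shared by both branches.
    return impl
```

Both branches bind the same local name, so the function needs only one return statement.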

densmirn · 2019-10-16T15:30:51Z

hpat/tests/test_series.py

+                       [1.1, 0.3, 2.1, 1, 3, 0.3, 2.1, 1.1, 2.2],
+                       [6, 6.1, 2.2, 1, 3, 0, 2.2, 1, 2],
+                       [6, 6, 2, 1, 3, np.inf, np.nan, np.nan, np.nan],
+                       [3., 5.3, np.nan, np.nan, np.inf, np.inf, 4.4, 3.7, 8.9]
+                       ]


Please fix indentation.

densmirn · 2019-10-16T15:31:48Z

hpat/tests/test_series.py

+
+        hpat_func = hpat.jit(test_impl)
+
+        test_input_data = []


Seems the list is used nowhere.

densmirn · 2019-10-16T15:31:57Z

hpat/tests/test_series.py

+
+        hpat_func = hpat.jit(test_impl)
+
+        test_input_data = []


Seems the list is used nowhere.

shssf

Index fixed. Please review tests again - it looks like they have NaN != NaN issues
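
On the NaN point: float('nan') == float('nan') is False, so a plain equality assertion fails when both the reference result and the jitted result are NaN. One possible way to compare results in a test, shown as a stdlib-only sketch (same_result is a hypothetical helper, not something from the HPAT test suite):

```python
import math


def same_result(expected, actual):
    """Equality check that treats two float NaNs as matching."""
    both_float = isinstance(expected, float) and isinstance(actual, float)
    if both_float and math.isnan(expected) and math.isnan(actual):
        return True
    return expected == actual


nan = float("nan")
plain_equal = (nan == nan)           # False: NaN never equals itself
nan_aware = same_result(nan, nan)    # True: both sides are NaN
```

If NumPy is available in the tests, numpy.testing.assert_equal is an alternative, since it also treats NaNs in matching positions as equal.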

etotmeni added 3 commits October 9, 2019 16:37

wip

0693c7b

wip

3ffdf2a

Implement idxmin

9a35161

3 tests don't work until the index is fixed

1e-to requested review from shssf, densmirn, akharche and kozlov-alexey October 11, 2019 16:19

shssf suggested changes Oct 15, 2019

This was referenced Oct 15, 2019

Implement series.cumsum() in new style #192

Merged

Implement series.var() in new style #220

Merged

Add some fixes:

07c0725

- add args
- update docs
- change check self._data.dtype
- add test for all

kozlov-alexey reviewed Oct 16, 2019

Fixed tests

957f2d3

kozlov-alexey reviewed Oct 16, 2019

Redesign tests

883da23

densmirn reviewed Oct 16, 2019

etotmeni and others added 3 commits October 16, 2019 19:09

Fix docs and parameters

a8ccdb9

Add comment to not working test (with index)

5b01b92

Merge branch 'master' into implement_idxmin

f2cd21d

shssf approved these changes Oct 18, 2019

Merge branch 'master' into implement_idxmin

02b6fa5

shssf suggested changes Oct 18, 2019

etotmeni and others added 3 commits October 18, 2019 15:51

Fix test's check for nan

c8f6d1b

PR216. Minor style changes. skipna not implemented

3d0d810

Merge branch 'master' into implement_idxmin

5340737

shssf approved these changes Oct 20, 2019

shssf merged commit 584d9f3 into IntelPython:master Oct 20, 2019
