Series.argsort() / Series.sort_values() #248

1e-to · 2019-10-22T15:39:51Z

No description provided.

akharche · 2019-10-22T20:48:00Z

hpat/datatypes/hpat_pandas_series_functions.py

+        input arg
+    axis: :obj:`int`
+        Has no effect but is accepted for compatibility with numpy.
+    kind: {‘mergesort’, ‘quicksort’, ‘heapsort’}, default ‘quicksort’


Did this pass the style check? I guess it was copied from somewhere that uses typographic quotes.

akharche · 2019-10-22T20:57:02Z

hpat/datatypes/hpat_pandas_series_functions.py

+
+        return hpat_pandas_series_argsort_impl
+
+    def hpat_pandas_series_argsort_impl(self, axis=0, kind='quicksort', order=None):


The implementations are almost identical. Could we combine these solutions?

No, we can't, because in the case with an index we need to return (data, index).

densmirn · 2019-10-25T11:34:58Z

hpat/datatypes/hpat_pandas_series_functions.py

+    Parameters
+    -----------
+    self: :class:`pandas.Series`
+        input arg


input arg -> input series

densmirn · 2019-10-25T11:36:13Z

hpat/datatypes/hpat_pandas_series_functions.py

+        Has no effect but is accepted for compatibility with numpy.
+    kind: {‘mergesort’, ‘quicksort’, ‘heapsort’}, default ‘quicksort’
+        Choice of sorting algorithm. See np.sort for more information. ‘mergesort’ is the only stable algorithm
+    order: None


Could you add the type of the parameter to the docstring?

densmirn · 2019-10-25T11:40:06Z

hpat/datatypes/hpat_pandas_series_functions.py

+        raise TypingError('{} Currently function supports only numeric values. Given data type: {}'.format(_func_name,
+                                                                                             self.data.dtype))
+
+    if not (isinstance(axis, types.Omitted) or isinstance(axis, types.Integer) or axis == 0):


Actually, axis can be given as a string, not only as an integer.

In the pandas docs, axis can only be an integer.

axis : {0 or ‘index’}, default 0. The value ‘index’ is accepted for compatibility
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.sort_values.html
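
For illustration, here is a minimal plain-Python sketch of an axis check that accepts both spellings quoted above. `normalize_axis` is a hypothetical helper, not part of HPAT, and it deliberately ignores the Numba typing layer (`types.Omitted`, `types.Integer`, string literals) that the real overload has to handle.

```python
def normalize_axis(axis=0):
    """Map the axis spellings listed in the pandas docs (0 or 'index') to 0.

    Hypothetical helper, for illustration only.
    """
    # bool is a subclass of int and False == 0, so reject it explicitly
    if isinstance(axis, bool):
        raise ValueError('Unsupported axis: {!r}'.format(axis))
    if axis == 0 or axis == 'index':
        return 0
    raise ValueError('Unsupported axis: {!r}'.format(axis))


print(normalize_axis())         # 0
print(normalize_axis('index'))  # 0
```

Any other value, such as `1` or `'columns'`, raises `ValueError` in this sketch.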

densmirn · 2019-10-25T11:41:52Z

hpat/tests/test_series.py

+
+        hpat_func = hpat.jit(test_impl)
+
+        data_test = [[6, 6, 2, 1, 3, 3, 2, 1, 2],


I propose using the globally defined data as input in tests: f2435b0#diff-deca39d332649cea819383154a5d2cb3R39-R62

densmirn · 2019-10-25T11:43:16Z

hpat/datatypes/hpat_pandas_series_functions.py

+        *unsupported*
+    kind: {'mergesort', 'quicksort', 'heapsort'}, default 'quicksort'
+        Choice of sorting algorithm. See np.sort for more information. 'mergesort' is the only stable algorithm
+        *unsupported, uses python func - sorted()*


Please move *unsupported* after the description of the parameter.

densmirn · 2019-10-25T11:44:20Z

hpat/datatypes/hpat_pandas_series_functions.py

+       Test: python -m hpat.runtests hpat.tests.test_series.TestSeries.test_series_sort_values1
+       Test: python -m hpat.runtests hpat.tests.test_series.TestSeries.test_series_sort_values2
+       Test: python -m hpat.runtests hpat.tests.test_series.TestSeries.test_series_sort_values_index1
+       Test: python -m hpat.runtests hpat.tests.test_series.TestSeries.test_series_sort_values_noidx
+       Test: python -m hpat.runtests hpat.tests.test_series.TestSeries.test_series_sort_values_idx
+       Test: python -m hpat.runtests hpat.tests.test_series.TestSeries.test_series_sort_values_parallel1


You could use a pattern to cover many tests with a single line in the docstring:
python -m hpat.runtests -k hpat.tests.test_series.TestSeries.test_series_sort_values*

densmirn · 2019-10-25T11:47:41Z

hpat/tests/test_series.py

+        data_test = [[6, 6, 2, 1, 3, 3, 2, 1, 2],
+                     [1.1, 0.3, 2.1, 1, 3, 0.3, 2.1, 1.1, 2.2],
+                     [6, 6.1, 2.2, 1, 3, 0, 2.2, 1, 2],
+                     [6, 6, 2, 1, 3, np.nan, np.nan, np.nan, np.nan],
+                     [3., 5.3, np.nan, np.nan, 33.2, 56.3, 4.4, 3.7, 8.9],
+                     ['a', 's', 'dd', 'm', 'll', '345', 'xrt', 'kd', 'qq'],
+                     ['dh', 'a', '', 'cv', 'b', '', 'b', 'b', 'p']
+                     ]


You could use globally defined data as input data in tests: f2435b0#diff-deca39d332649cea819383154a5d2cb3R39-R62

densmirn · 2019-10-30T13:39:30Z

hpat/datatypes/hpat_pandas_series_functions.py

-            index = numpy.arange(len(self._data))
-            my_index = numpy.arange(len(self._data))
-            used_index = numpy.full((len(self._data)), -1)
+            indices = numpy.arange(len(self._data))


Looks like indices is not required and index_result is enough. Correct me if I'm wrong.

index_result has type int, but our index can have another type, so we need an array of the same type as self.index (in my case, indices).

densmirn · 2019-10-30T13:41:13Z

hpat/datatypes/hpat_pandas_series_functions.py

-            index = self._index
-            my_index = numpy.arange(len(self._data))
-            used_index = numpy.full((len(self._data)), -1)
+            indices = self._index.copy()


The same, why do you need indices? Isn't it enough to have index_result?

densmirn · 2019-10-30T13:42:19Z

hpat/tests/test_series.py

@@ -65,6 +65,12 @@
    '大处着眼，小处着手。',
 ]

+test_global_input_data_unicode_kind1 = [


As far as I know, this list already exists in the master branch. Please rebase.

densmirn · 2019-10-30T13:45:08Z

hpat/tests/test_series.py

+    return np.random.choice(rands_chars, size=nchars * size).view((np.str_, nchars))
+
+
+def gen_frand_array(size, min=-100, max=100):


Please use a seed so that the data can be reproduced.

I think it is a bad idea to generate input randomly.
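
One possible way to reconcile the two remarks is a generator whose output is fully determined by a seed. The sketch below keeps the `gen_frand_array` name and signature from the hunk; the `seed` parameter and the use of a private `RandomState` are assumptions, not what the PR contains.

```python
import numpy as np


def gen_frand_array(size, min=-100, max=100, seed=0):
    """Return `size` floats drawn uniformly from [min, max).

    The `seed` parameter is an assumption added for this sketch.
    """
    # A private generator leaves NumPy's global random state untouched.
    rng = np.random.RandomState(seed)
    return (max - min) * rng.random_sample(size) + min


first = gen_frand_array(5)
second = gen_frand_array(5)
assert np.array_equal(first, second)  # same seed, same data
```

With a fixed seed, a failing test can be rerun on exactly the same input.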

densmirn · 2019-10-30T13:46:08Z

hpat/datatypes/hpat_pandas_series_functions.py

+    axis: :obj:`int`
+        Has no effect but is accepted for compatibility with numpy.
+        *unsupported*
+    kind: {'mergesort', 'quicksort', 'heapsort'}, default: 'quicksort'


Please describe the type of the parameter, not only its values.

densmirn · 2019-10-30T13:47:20Z

hpat/tests/test_series.py

+    float_list = (max - min) * np.random.sample(size) + min
+    return float_list


Why not return (max - min) * np.random.sample(size) + min?

@densmirn I have at least one explanation: it is easier to debug a return value when it is stored in a separate variable. You cannot see the result of an expression if it is written directly in the return statement.
So, I would say it is good practice to keep float_list and return it afterwards.

Actually, you can always debug the result outside the function.

shssf

A different perspective needs to be addressed in this PR.
If you have to use the same algorithm (with minor differences) in different branches, it is a good idea to create a _methodname_algo function and use it here to avoid massive code duplication.
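
A rough sketch of that suggestion in plain NumPy, outside of Numba: one shared core plus two thin, distinctly named branch functions. All names here (`_argsort_algo`, `argsort_noindex_impl`, `argsort_index_impl`) are hypothetical. In a real Numba overload the helper would presumably also have to be compiled (for example with `numba.extending.register_jitable`), which is not shown.

```python
import numpy


def _argsort_algo(data):
    """Shared core (hypothetical name) used by both branches."""
    return numpy.argsort(data, kind='mergesort')


def argsort_noindex_impl(data):
    # Branch without an index: only the positions are returned.
    return _argsort_algo(data)


def argsort_index_impl(data, index):
    # Branch with an index: positions plus the untouched index labels.
    return _argsort_algo(data), index


positions, labels = argsort_index_impl(numpy.array([3, 1, 2]),
                                       numpy.array(['a', 'b', 'c']))
print(positions.tolist())  # [1, 2, 0]
```

The two branches then differ only in what they return, which makes the difference between them visible at a glance.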

shssf · 2019-10-31T17:51:12Z

hpat/datatypes/hpat_pandas_series_functions.py

+
+    .. only:: developer
+
+       Test: python -m -k hpat.runtests hpat.tests.test_series.TestSeries.test_series_argsort*


I tried to use it:

(base) $ python -m -k hpat.runtests hpat.tests.test_series.TestSeries.test_series_argsort*
/bin/python: No module named -k
(base) $ python -m hpat.runtests hpat.tests.test_series.TestSeries.test_series_argsort*
E
======================================================================
ERROR: test_series_argsort* (unittest.loader._FailedTest)
----------------------------------------------------------------------
AttributeError: type object 'TestSeries' has no attribute 'test_series_argsort*'
----------------------------------------------------------------------
Ran 1 test in 0.000s
FAILED (errors=1)

You don't have to list all tests for this functionality here. One or two good tests of your choice are enough.

I was mistaken; the correct call to run tests by mask is: python -m hpat.runtests -k hpat.tests.test_series.TestSeries.test_series_argsort*
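
To see why the `-k` form selects tests while the bare wildcard is treated as a literal attribute name, here is a small standalone sketch using only the standard `unittest` module. It assumes `hpat.runtests` hands `-k` to `unittest` unchanged; the `TestSeries` class below is a stand-in with empty tests, not the real test suite.

```python
import unittest


class TestSeries(unittest.TestCase):
    # Stand-in tests; the names mimic those discussed in this thread.
    def test_series_argsort1(self):
        pass

    def test_series_argsort2(self):
        pass

    def test_series_sort_values1(self):
        pass


loader = unittest.TestLoader()
# Same matching that `-k` uses: fnmatch-style patterns are compared with
# the fully qualified test name, hence the leading '*'.
loader.testNamePatterns = ['*test_series_argsort*']
suite = loader.loadTestsFromTestCase(TestSeries)
print(suite.countTestCases())  # 2
```

Without `-k`, the argument is resolved as an attribute of `TestSeries`, which is what produced the `AttributeError` in the session above.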

shssf · 2019-10-31T17:52:50Z

hpat/datatypes/hpat_pandas_series_functions.py

+        raise TypingError('{} The object must be a pandas.series. Given: {}'.format(_func_name, self))
+
+    if not isinstance(self.data.dtype, types.Number):
+        raise TypingError('{} Currently function supports only numeric values. Given data type: {}'.format(_func_name,


Suggested change

raise TypingError('{} Currently function supports only numeric values. Given data type: {}'.format(_func_name,

raise TypingError('{} Non-numeric type unsupported. Given: {}'.format(_func_name,

It is shorter, isn't it?

shssf · 2019-10-31T17:56:03Z

hpat/datatypes/hpat_pandas_series_functions.py

+        na = 0
+        for i in series_data.isna():
+            if i:
+                na += 1
+        id = 0
+        i = 0
+        list_no_nan = numpy.empty(len(self._data) - na)
+        for bool_value in series_data.isna():
+            if not bool_value:
+                list_no_nan[id] = self._data[i]
+                id += 1
+            i += 1
+        sort_no_nan = numpy.argsort(list_no_nan)
+        ne_na = sort[:len(sort) - na]
+        num = 0
+        result = numpy.full((len(self._data)), -1)
+        for i in numpy.sort(ne_na):
+            result[i] = sort_no_nan[num]
+            num += 1


This algorithm is hard to follow from my perspective. I think it could be simpler.
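
As one possible simplification, the result the loops above appear to build (NaN positions get -1, every other position gets the next entry of the argsort of the NaN-free values) can be written with a boolean mask. This is a plain NumPy sketch with a hypothetical function name, not the code that was merged, and it has not been checked against Numba's supported NumPy subset.

```python
import numpy


def argsort_skip_nan(data):
    """NaN positions get -1; the rest get the argsort of the NaN-free values.

    Hypothetical helper mirroring what the reviewed loops seem to compute.
    """
    not_nan = ~numpy.isnan(data)
    result = numpy.full(len(data), -1)
    # argsort over the NaN-free values, written back at the non-NaN positions
    result[not_nan] = numpy.argsort(data[not_nan])
    return result


result = argsort_skip_nan(numpy.array([3.0, numpy.nan, 1.0, 2.0]))
print(result.tolist())  # [1, -1, 2, 0]
```

The explicit counters (`na`, `id`, `num`) disappear because the mask carries the same information.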

shssf · 2019-10-31T17:57:35Z

hpat/datatypes/hpat_pandas_series_functions.py

+            for i in range(len(result_index)):
+                find = 0
+                for search in cycle:
+                    check = 0
+                    for j in used_index:
+                        if my_index[search] == j:
+                            check = 1
+                    if (self._data[search] == result[i]) and check == 0 and find == 0:
+                        result_index[i] = index[search]
+                        used_index[i] = my_index[search]
+                        find = 1
+            na = 0
+            for i in self.isna():
+                if i:
+                    na += 1
+            num = 0
+            for i in self.isna():
+                j = len(result_index) - na
+                if i and used_index[j] == -1:
+                    result_index[j] = index[num]
+                    used_index[j] = my_index[num]
+                    na -= 1
+                num += 1


I think NumPy could be used more heavily here to reduce the complexity of the code.
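
A sketch of what that could look like, assuming plain NumPy arrays for the data and the index: a single stable `argsort` replaces the nested search loops, and because NumPy orders NaN after all numbers, the separate NaN pass is not needed. The function name is hypothetical and the snippet is not taken from the PR.

```python
import numpy


def sort_values_with_index(data, index):
    """Return (sorted_data, reordered_index); NaN values end up last."""
    # numpy.argsort places NaN after all numbers; mergesort keeps ties stable
    order = numpy.argsort(data, kind='mergesort')
    return data[order], index[order]


values, labels = sort_values_with_index(
    numpy.array([2.0, numpy.nan, 1.0, 2.0]),
    numpy.array([10, 11, 12, 13]),
)
print(labels.tolist())  # [12, 10, 13, 11]
```

The stable sort keeps equal values in their original order, so the two entries equal to 2.0 keep labels 10 and 13 in that order.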

shssf · 2019-10-31T17:58:43Z

hpat/datatypes/hpat_pandas_series_functions.py

 @overload_method(SeriesType, 'dropna')
 def hpat_pandas_series_dropna(self, axis=0, inplace=False):
    """
    Pandas Series method :meth:`pandas.Series.dropna` implementation.
-


The blank line should stay here so that the documentation is formed properly.

shssf · 2019-10-31T17:59:05Z

hpat/datatypes/hpat_pandas_series_functions.py

@@ -2556,3 +2814,4 @@ def hpat_pandas_series_dropna_impl(self, axis=0, inplace=False):
        return pandas.Series(data, index, self._name)

    return hpat_pandas_series_dropna_impl
+


this line should not be here :-)

shssf · 2019-10-31T18:00:53Z

hpat/tests/test_series.py

+    return np.random.choice(rands_chars, size=nchars * size).view((np.str_, nchars))
+
+
+def gen_frand_array(size, min=-100, max=100):


I think it is a bad idea to generate input randomly.

shssf · 2019-10-31T18:06:39Z

hpat/datatypes/hpat_pandas_series_functions.py

+        raise TypingError('{} Unsupported parameters. Given axis: {}'.format(_func_name, axis))
+
+    if not isinstance(self.index, types.NoneType):
+        def hpat_pandas_series_argsort_impl(self, axis=0, kind='quicksort', order=None):


All these branches look big. I think it is good practice to put a comment inside each such branch with a short explanation and the name of the test that covers it.
The first question asked here will be "what are the differences between these branches?"

Also, please don't reuse the same function name. It looks like you use hpat_pandas_series_argsort_impl for every branch.

pep8speaks · 2019-11-01T13:10:39Z

Hello @1e-to! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

In the file hpat/datatypes/hpat_pandas_series_functions.py:

Line 2789:49: E128 continuation line under-indented for visual indent
Line 2826:49: E128 continuation line under-indented for visual indent
Line 2845:49: E128 continuation line under-indented for visual indent
Line 2882:49: E128 continuation line under-indented for visual indent
Line 3061:50: W292 no newline at end of file

In the file hpat/tests/test_series.py:

Line 76:1: E302 expected 2 blank lines, found 1
Line 367:5: E303 too many blank lines (2)

Comment last updated at 2019-11-05 20:09:28 UTC

Add functional and 3 tests

d658914

1e-to requested review from shssf, akharche, kozlov-alexey and densmirn October 22, 2019 15:39

1e-to changed the title ~~Add functional and 3 tests~~ Series.argsort() Add functional and 3 tests Oct 22, 2019

akharche reviewed Oct 22, 2019

Add Series.sort_values() and tests

b7d4481

1e-to changed the title ~~Series.argsort() Add functional and 3 tests~~ Series.argsort() / Series.sort_values() Oct 25, 2019

densmirn suggested changes Oct 25, 2019

densmirn added the Waiting on author label Oct 25, 2019

etotmeni and others added 5 commits October 28, 2019 13:20

Fix docs

6f40be4

Started redesigning tests

bcfc98c

Merge branch 'master' into implement_argsort

780d49e

Optimized algo + redesign tests

9e5bc36

Merge branch 'master' into implement_argsort

e961c5c

1e-to added Ready for Review and removed Waiting on author labels Oct 30, 2019

densmirn reviewed Oct 30, 2019

densmirn added Waiting on author and removed Ready for Review labels Oct 30, 2019

Fix docs + test

d9a796e

1e-to added Ready for Review and removed Waiting on author labels Oct 30, 2019

1e-to added 2 commits October 31, 2019 19:30

Merge branch 'master' into implement_argsort

a7257b5

Merge branch 'master' into implement_argsort

5e9e1bf

shssf suggested changes Oct 31, 2019

shssf removed the Ready for Review label Oct 31, 2019

shssf added the Waiting on author label Oct 31, 2019

etotmeni added 2 commits November 1, 2019 16:08

Some fixes + optimization argsort algo

2e96009

Merge remote-tracking branch 'origin/implement_argsort' into implement_argsort

3863bf1

Merge branch 'master' into implement_argsort

0730e37

1e-to added [WIP] Work in progress and removed Waiting on author labels Nov 1, 2019

etotmeni and others added 3 commits November 5, 2019 13:23

Fix func names

844d2c4

Merge remote-tracking branch 'origin/implement_argsort' into implement_argsort

4195272

Merge branch 'master' into implement_argsort

de7ff34

1e-to added Ready for Review and removed [WIP] Work in progress labels Nov 5, 2019

Merge branch 'master' into implement_argsort

19714a7

shssf approved these changes Nov 5, 2019

Merge branch 'master' into implement_argsort

5f13093

shssf merged commit a62195d into IntelPython:master Nov 5, 2019

