-
Notifications
You must be signed in to change notification settings - Fork 62
Series.argsort() / Series.sort_values() #248
Conversation
input arg | ||
axis: :obj:`int` | ||
Has no effect but is accepted for compatibility with numpy. | ||
kind: {‘mergesort’, ‘quicksort’, ‘heapsort’}, default ‘quicksort’ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whether it passed style check? I guess it was copied from somewhere with specific quotes formatting
|
||
return hpat_pandas_series_argsort_impl | ||
|
||
def hpat_pandas_series_argsort_impl(self, axis=0, kind='quicksort', order=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The implementation is almost identical. Could we combine this solutions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No we dont, because in cause with index we need to return (data, index)
Parameters | ||
----------- | ||
self: :class:`pandas.Series` | ||
input arg |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
input arg
-> input series
Has no effect but is accepted for compatibility with numpy. | ||
kind: {‘mergesort’, ‘quicksort’, ‘heapsort’}, default ‘quicksort’ | ||
Choice of sorting algorithm. See np.sort for more information. ‘mergesort’ is the only stable algorithm | ||
order: None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add type of the parameter in docstring?
raise TypingError('{} Currently function supports only numeric values. Given data type: {}'.format(_func_name, | ||
self.data.dtype)) | ||
|
||
if not (isinstance(axis, types.Omitted) or isinstance(axis, types.Integer) or axis == 0): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually axis
can be represented as string
, not only integer
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In pandas' docs axis can be only integer
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
axis : {0 or ‘index’}, default 0. The value ‘index’ is accepted for compatibility
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.sort_values.html
hpat/tests/test_series.py
Outdated
|
||
hpat_func = hpat.jit(test_impl) | ||
|
||
data_test = [[6, 6, 2, 1, 3, 3, 2, 1, 2], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I propose to use globally defined data as input data in tests: f2435b0#diff-deca39d332649cea819383154a5d2cb3R39-R62
*unsupported* | ||
kind: {'mergesort', 'quicksort', 'heapsort'}, default 'quicksort' | ||
Choice of sorting algorithm. See np.sort for more information. 'mergesort' is the only stable algorithm | ||
*unsupported, uses python func - sorted()* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please move *unsupported
after description of the parameter.
Test: python -m hpat.runtests hpat.tests.test_series.TestSeries.test_series_sort_values1 | ||
Test: python -m hpat.runtests hpat.tests.test_series.TestSeries.test_series_sort_values2 | ||
Test: python -m hpat.runtests hpat.tests.test_series.TestSeries.test_series_sort_values_index1 | ||
Test: python -m hpat.runtests hpat.tests.test_series.TestSeries.test_series_sort_values_noidx | ||
Test: python -m hpat.runtests hpat.tests.test_series.TestSeries.test_series_sort_values_idx | ||
Test: python -m hpat.runtests hpat.tests.test_series.TestSeries.test_series_sort_values_parallel1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could use pattern to represent many tests via one line in docstring:
python -m hpat.runtests -k hpat.tests.test_series.TestSeries.test_series_sort_values*
hpat/tests/test_series.py
Outdated
data_test = [[6, 6, 2, 1, 3, 3, 2, 1, 2], | ||
[1.1, 0.3, 2.1, 1, 3, 0.3, 2.1, 1.1, 2.2], | ||
[6, 6.1, 2.2, 1, 3, 0, 2.2, 1, 2], | ||
[6, 6, 2, 1, 3, np.nan, np.nan, np.nan, np.nan], | ||
[3., 5.3, np.nan, np.nan, 33.2, 56.3, 4.4, 3.7, 8.9], | ||
['a', 's', 'dd', 'm', 'll', '345', 'xrt', 'kd', 'qq'], | ||
['dh', 'a', '', 'cv', 'b', '', 'b', 'b', 'p'] | ||
] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could use globally defined data as input data in tests: f2435b0#diff-deca39d332649cea819383154a5d2cb3R39-R62
index = numpy.arange(len(self._data)) | ||
my_index = numpy.arange(len(self._data)) | ||
used_index = numpy.full((len(self._data)), -1) | ||
indices = numpy.arange(len(self._data)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like indices
is not required and index_result
is enough. Correct me if I'm wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
index_result has type int, and our indexes can be other type, we need to create array of type equal self.index (in my case indices)
index = self._index | ||
my_index = numpy.arange(len(self._data)) | ||
used_index = numpy.full((len(self._data)), -1) | ||
indices = self._index.copy() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The same, why do you need indices
? Isn't it enough to have index_result
?
hpat/tests/test_series.py
Outdated
@@ -65,6 +65,12 @@ | |||
'大处着眼,小处着手。', | |||
] | |||
|
|||
test_global_input_data_unicode_kind1 = [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As known the list exists in master
branch. Please do rebase.
return np.random.choice(rands_chars, size=nchars * size).view((np.str_, nchars)) | ||
|
||
|
||
def gen_frand_array(size, min=-100, max=100): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use seed to be able to reproduce the data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is a bad idea to generate input randomly
axis: :obj:`int` | ||
Has no effect but is accepted for compatibility with numpy. | ||
*unsupported* | ||
kind: {'mergesort', 'quicksort', 'heapsort'}, default: 'quicksort' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please describe type of the parameter, not only values.
hpat/tests/test_series.py
Outdated
float_list = (max - min) * np.random.sample(size) + min | ||
return float_list |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not return (max - min) * np.random.sample(size) + min
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@densmirn I have, at least, one explanation - it is easier to debug return value if you have it in separate value. You an not see result of equations if it written in return
operator.
So, I would say this is a good practice to have float_list
and use it in return later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually you are always able to debug the result outside of the function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need to address diffrent perspective in this PR
If you have to use same algorithm (with minor differences) for diffrent branches - it is good idea to create some _methodname_algo
function and use it here to avoid massive code duplication.
|
||
.. only:: developer | ||
|
||
Test: python -m -k hpat.runtests hpat.tests.test_series.TestSeries.test_series_argsort* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried to use it:
(base) $ python -m -k hpat.runtests hpat.tests.test_series.TestSeries.test_series_argsort*
/bin/python: No module named -k
(base) $ python -m hpat.runtests hpat.tests.test_series.TestSeries.test_series_argsort*
E
======================================================================
ERROR: test_series_argsort* (unittest.loader._FailedTest)
----------------------------------------------------------------------
AttributeError: type object 'TestSeries' has no attribute 'test_series_argsort*'
----------------------------------------------------------------------
Ran 1 test in 0.000s
FAILED (errors=1)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You don't have to list all tests for this functionality here. one or two good test are enough by your choice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was mistaken, the correct call tests for the mask: python -m hpat.runtests -k hpat.tests.test_series.TestSeries.test_series_argsort*
raise TypingError('{} The object must be a pandas.series. Given: {}'.format(_func_name, self)) | ||
|
||
if not isinstance(self.data.dtype, types.Number): | ||
raise TypingError('{} Currently function supports only numeric values. Given data type: {}'.format(_func_name, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
raise TypingError('{} Currently function supports only numeric values. Given data type: {}'.format(_func_name, | |
raise TypingError('{} Non-numeric type unsupported. Given: {}'.format(_func_name, |
Is it shorter, isn't it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
na = 0 | ||
for i in series_data.isna(): | ||
if i: | ||
na += 1 | ||
id = 0 | ||
i = 0 | ||
list_no_nan = numpy.empty(len(self._data) - na) | ||
for bool_value in series_data.isna(): | ||
if not bool_value: | ||
list_no_nan[id] = self._data[i] | ||
id += 1 | ||
i += 1 | ||
sort_no_nan = numpy.argsort(list_no_nan) | ||
ne_na = sort[:len(sort) - na] | ||
num = 0 | ||
result = numpy.full((len(self._data)), -1) | ||
for i in numpy.sort(ne_na): | ||
result[i] = sort_no_nan[num] | ||
num += 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is ambiguous algorithm from my perspective. I think it might be easier.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
for i in range(len(result_index)): | ||
find = 0 | ||
for search in cycle: | ||
check = 0 | ||
for j in used_index: | ||
if my_index[search] == j: | ||
check = 1 | ||
if (self._data[search] == result[i]) and check == 0 and find == 0: | ||
result_index[i] = index[search] | ||
used_index[i] = my_index[search] | ||
find = 1 | ||
na = 0 | ||
for i in self.isna(): | ||
if i: | ||
na += 1 | ||
num = 0 | ||
for i in self.isna(): | ||
j = len(result_index) - na | ||
if i and used_index[j] == -1: | ||
result_index[j] = index[num] | ||
used_index[j] = my_index[num] | ||
na -= 1 | ||
num += 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think Numpy can be used more intensively here to reduce complexity of the code.
@overload_method(SeriesType, 'dropna') | ||
def hpat_pandas_series_dropna(self, axis=0, inplace=False): | ||
""" | ||
Pandas Series method :meth:`pandas.Series.dropna` implementation. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the line should be here to form documentation properly
@@ -2556,3 +2814,4 @@ def hpat_pandas_series_dropna_impl(self, axis=0, inplace=False): | |||
return pandas.Series(data, index, self._name) | |||
|
|||
return hpat_pandas_series_dropna_impl | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this line should not be here :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
return np.random.choice(rands_chars, size=nchars * size).view((np.str_, nchars)) | ||
|
||
|
||
def gen_frand_array(size, min=-100, max=100): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is a bad idea to generate input randomly
raise TypingError('{} Unsupported parameters. Given axis: {}'.format(_func_name, axis)) | ||
|
||
if not isinstance(self.index, types.NoneType): | ||
def hpat_pandas_series_argsort_impl(self, axis=0, kind='quicksort', order=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All these branches looks big. I think this is a good practice to insert a comments inside such branch with short explanations and test name for this branch.
First question will be asked here is "what differences between these branches?"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also, please don't use same name of functions. it looks like you use hpat_pandas_series_argsort_impl
for any branch
Hello @1e-to! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:
Comment last updated at 2019-11-05 20:09:28 UTC |
No description provided.