Argmin/argmax + nanargmin/nanargmax #580
sdc/functions/numpy_like.py
Outdated
    if isinstance(dtype, types.Number):
        def sdc_nanargmin_impl(self):
            res = max_ref
            position = max_
What is max_?
sdc/functions/numpy_like.py
Outdated
    position = max_
    length = len(self)
    for i in prange(length):
        if min(res, self[i]) == self[i]:
That looks weird.
    if not isnan(self[i]):
        res = min(res, self[i])
        if res == self[i]:
            position = min(position, i)
The Python version is 10000 times faster on one thread? Am I getting it right?
@PokhodenkoSA I can't find previous measurements of argmax/argmin anywhere. Do we have any?
Yes, that's right. Perhaps Python's min(x, y) finds the first NaN and returns it right away.
We now use the argmin that is labeled "Numba" in the results table.
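For context on the NaN remark, here is a small sketch of how Python's built-in `min` treats NaN in plain CPython. Whether the jitted code behaves identically is not confirmed in this thread, and the helper name `is_candidate` is made up for illustration.

```python
import math

nan = float("nan")

# min() keeps its first argument unless a later one compares smaller,
# and every ordering comparison with NaN evaluates to False.
a = min(1.0, nan)  # nan < 1.0 is False, so 1.0 is kept
b = min(nan, 1.0)  # 1.0 < nan is False, so nan is kept


def is_candidate(res, x):
    """Hypothetical helper mirroring the diff's check: min(res, x) == x."""
    return min(res, x) == x
```

With this check, a NaN element is never accepted as a candidate, and once `res` itself is NaN no later element can replace it, so the outcome depends on argument order.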
sdc/functions/numpy_like.py
Outdated
    position = max_
    length = len(self)
    for i in prange(length):
        if not isnan(self[i]):
Let's try it this way:
    result_is_nan = False
    if not isnan(self[i]):
        if not result_is_nan:
            res = min(res, self[i])
            if res == self[i]:
                position = min(position, i)
    else:
        position = min(position, i)
        result_is_nan = True
No, we need separate positions for the NaN result and for the non-NaN result:
    result_is_nan = False
    nanposition = max_
    if not isnan(self[i]):
        if not result_is_nan:
            res = min(res, self[i])
            if res == self[i]:
                position = min(position, i)
    else:
        nanposition = min(nanposition, i)
        result_is_nan = True
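The two-position idea can be sketched in plain Python as follows. This is a sequential illustration only, not the PR's jitted code: the function name `argmin_nan_aware` is hypothetical, `len(values)` stands in for the `max_` sentinel, and the assumed semantics are that the first NaN index wins whenever a NaN is present.

```python
import math


def argmin_nan_aware(values):
    """Index of the minimum, or of the first NaN if any NaN is present."""
    if len(values) == 0:
        raise ValueError("argmin of an empty sequence")
    res = math.inf
    position = len(values)      # sentinel larger than any valid index
    nanposition = len(values)   # separate sentinel for the NaN case
    result_is_nan = False
    for i, x in enumerate(values):
        if math.isnan(x):
            # min() keeps this order-independent, as a parallel loop needs
            nanposition = min(nanposition, i)
            result_is_nan = True
        elif not result_is_nan:
            if x < res:
                res, position = x, i
            elif x == res:
                position = min(position, i)
    return nanposition if result_is_nan else position
```

Keeping `nanposition` apart from `position` means a NaN seen late in the scan cannot corrupt the index already recorded for the smallest regular value.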
Can we test also without |
sdc/functions/numpy_like.py
Outdated
    res = min_ref
    pos = max_int64
    for j in range(chunk.start, chunk.stop):
        if max(res, self[j]) == self[j]:
Since we are using chunks now, can't we simplify it just to:
    if self[j] > res:
        res = self[j]
        pos = j
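A minimal sketch of why the strict comparison is enough once the data is split into chunks, assuming NaN-free input. The name `argmax_chunked` and the `n_chunks` parameter are hypothetical, and the chunks run sequentially here, whereas the PR processes them in parallel.

```python
def argmax_chunked(values, n_chunks=4):
    """Argmax over contiguous chunks, then a reduction over chunk results."""
    n = len(values)
    if n == 0:
        raise ValueError("argmax of an empty sequence")
    chunk_size = -(-n // n_chunks)  # ceiling division
    partial = []
    for start in range(0, n, chunk_size):
        stop = min(start + chunk_size, n)
        res, pos = values[start], start
        # Indices only grow inside a chunk, so a strict ">" keeps the
        # first occurrence without needing min(pos, j).
        for j in range(start + 1, stop):
            if values[j] > res:
                res = values[j]
                pos = j
        partial.append((res, pos))
    # Chunks are visited in index order, so strict ">" again keeps the
    # earliest position among equal maxima.
    best_res, best_pos = partial[0]
    for res, pos in partial[1:]:
        if res > best_res:
            best_res, best_pos = res, pos
    return best_pos
```

The tie-breaking `min(pos, j)` is only needed when elements can be visited out of order; within a sequential chunk the first hit is already the smallest index.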
Could you please consider whether the argmin/argmax implementation can be generalized in the same way it was done in #541? @PokhodenkoSA has the details.
Any performance numbers?
        CE(type_='Numba', code='data.astype(np.int64)', jitted=True),
        CE(type_='SDC', code='sdc.functions.numpy_like.astype(data, np.int64)', jitted=True),
    ], usecase_params='data'),
    TC(name='nanargmin', size=[10 ** 7], call_expr=[
Why didn't you add Numba cases for np.nanargmin and np.nanargmax?
Because they don't exist. Numba doesn't have these methods.
    if isinstance(self.data, StringArrayType):
        def hpat_pandas_series_idxmax_impl(self, axis=None, skipna=None):
            if skipna is None:
                _skipna = True
            else:
                raise ValueError("Method idxmax(). Unsupported parameter 'skipna'=False with str data")

            return numpy.argmax(self._data)

        return hpat_pandas_series_idxmax_impl

    def hpat_pandas_series_idxmax_impl(self, axis=None, skipna=None):
        # return numpy.argmax(self._data)
        if skipna is None:
            _skipna = True
        else:
            _skipna = skipna

        if _skipna:
            return numpy_like.nanargmax(self._data)

        return numpy_like.argmax(self._data)
I would create a variable none_index = isinstance(self.index, types.NoneType) or self.index is None, then make common implementations hpat_pandas_series_idxmax_impl and hpat_pandas_series_idxmax_impl for both cases, with index and without index. In the implementations I would add something like this:
    if none_index == True:  # noqa
        return result
    else:
        return self._index[int(result)]
The same applies to idxmin.
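The suggestion can be illustrated with a plain-Python sketch in which lists stand in for the Series data and index. The function name `idxmax_sketch` is hypothetical, and the real implementation would make this decision at compile time inside the overload rather than at run time.

```python
def idxmax_sketch(data, index=None):
    """Return the label of the first maximum, or its position if no index."""
    if len(data) == 0:
        raise ValueError("idxmax of an empty sequence")
    # max() with a key returns the first maximal element, i.e. the
    # smallest position among ties.
    result = max(range(len(data)), key=data.__getitem__)
    none_index = index is None
    if none_index:
        return result
    return index[int(result)]
```

A single code path computes the position, and only the last step differs between the two cases, which is what lets both variants share one implementation.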
sdc/functions/numpy_like.py
Outdated
    if reduce_op(res, self[j]) == self[j]:
        if not isnan(self[j]):
            if res == self[j]:
                pos = min(pos, j)
            else:
                pos = j
                res = self[j]
Let's reduce the indentation:
    if reduce_op(res, self[j]) != self[j]:
        continue
    if isnan(self[j]):
        continue
    if res == self[j]:
        pos = min(pos, j)
    else:
        pos = j
        res = self[j]
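For reference, a self-contained sketch of the flattened loop with early `continue` statements, written as a generic NaN-skipping reduction. The name `nanarg_reduce` is hypothetical, `None` replaces the PR's `min_ref`/`max_ref` sentinels, and raising on all-NaN input is an assumption of this sketch.

```python
import math


def nanarg_reduce(values, reduce_op):
    """Position of the reduce_op extreme (min or max), ignoring NaN."""
    res = None
    pos = -1
    for j, x in enumerate(values):
        if math.isnan(x):
            continue
        if res is None:
            res, pos = x, j  # first non-NaN element seeds the result
            continue
        if reduce_op(res, x) != x:
            continue  # x does not beat the current result
        if res == x:
            pos = min(pos, j)  # tie: keep the earliest position
        else:
            pos = j
            res = x
    if pos < 0:
        raise ValueError("all-NaN or empty input")
    return pos
```

Passing `min` gives nanargmin-like behavior and passing `max` gives nanargmax-like behavior, so one loop body serves both functions.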
sdc/functions/numpy_like.py
Outdated
    if reduce_op(general_res, arr_res[i]) == arr_res[i]:
        if general_res == arr_res[i]:
Let's reduce the indentation.
    def sdc_impl(a):
        return numpy_like.argmin(a)

    sdc_func = self.jit(sdc_impl)
Let's combine that to:
    @self.jit
    def sdc_impl(a):
        return numpy_like.argmin(a)
This change applies to the other tests as well.
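The two spellings are equivalent, which a small sketch can show. `FakeTestCase` and its `jit` method are hypothetical stand-ins for the project's test base class and do no compilation, and a pure-Python argmin replaces `numpy_like.argmin`.

```python
import functools


class FakeTestCase:
    """Hypothetical stand-in for a test class exposing a jit() helper."""

    def jit(self, func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            return func(*args, **kwargs)
        return wrapper

    def run_both(self, a):
        # Variant 1: define the function, then wrap it explicitly.
        def impl(a):
            return min(range(len(a)), key=a.__getitem__)
        wrapped_by_call = self.jit(impl)

        # Variant 2: wrap at definition time with decorator syntax.
        @self.jit
        def wrapped_by_decorator(a):
            return min(range(len(a)), key=a.__getitem__)

        return wrapped_by_call(a), wrapped_by_decorator(a)
```

The decorator form removes the intermediate name and keeps the wrapped function next to its definition, which is why it shortens each test.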
@1e-to, there is a conflict.
