Skip to content
This repository was archived by the owner on Feb 2, 2024. It is now read-only.

Conversation

@1e-to
Copy link
Contributor

@1e-to 1e-to commented Feb 7, 2020

image

@pep8speaks
Copy link

pep8speaks commented Feb 7, 2020

Hello @1e-to! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2020-02-19 12:48:19 UTC

if isinstance(dtype, types.Number):
def sdc_nanargmin_impl(self):
res = max_ref
position = max_
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is max_?

position = max_
length = len(self)
for i in prange(length):
if min(res, self[i]) == self[i]:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That looks weird.

if not isnan(self[i]):
    res = min(res, self[i])
    if res == self[i]:
        position = min(position, i)

@AlexanderKalistratov
Copy link
Collaborator

Python version is 10000 times faster on one thread? Am I getting it right?

@AlexanderKalistratov
Copy link
Collaborator

@PokhodenkoSA I can't find previous measurements of argmax/argmin anywhere. Do we have it?

@1e-to
Copy link
Contributor Author

1e-to commented Feb 7, 2020

Python version is 10000 times faster on one thread? Am I getting it right?

Yes, it is, perhaps the python finds and returns the first nan by function min (x,y)

I can't find previous measurements of argmax/argmin anywhere. Do we have it?

Now we use argmin that named "Numba" at the table of results

position = max_
length = len(self)
for i in prange(length):
if not isnan(self[i]):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's try it this way:

result_is_nan = False
if not isnan(self[i]):
    if not result_is_nan:
        res = min(res, self[i])
        if res == self[i]:
            position = min(position, i)
else:
    position = min(position, i)
    result_is_nan = True

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, we need different position for nan result and for not nan result:

result_is_nan = False
nanposition = max_
if not isnan(self[i]):
    if not result_is_nan:
        res = min(res, self[i])
        if res == self[i]:
            position = min(position, i)
else:
    nanposition = min(nanposition, i)
    result_is_nan = True

@AlexanderKalistratov
Copy link
Collaborator

Python version is 10000 times faster on one thread? Am I getting it right?

Yes, it is, perhaps the python finds and returns the first nan by function min (x,y)

Can we test also without nans?

res = min_ref
pos = max_int64
for j in range(chunk.start, chunk.stop):
if max(res, self[j]) == self[j]:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we are using chunks now, can't we simplify it just to:

if self[j] > res:
    res = self[j]
    pos = j

@AlexanderKalistratov
Copy link
Collaborator

Could you please think if you can generalize argmin/argmax implementation in the same way it was done here: #541

@PokhodenkoSA has details

@AlexanderKalistratov
Copy link
Collaborator

any performance numbers?

@1e-to
Copy link
Contributor Author

1e-to commented Feb 19, 2020

any performance numbers?
Skipna = True and False has equal results:
image

CE(type_='Numba', code='data.astype(np.int64)', jitted=True),
CE(type_='SDC', code='sdc.functions.numpy_like.astype(data, np.int64)', jitted=True),
], usecase_params='data'),
TC(name='nanargmin', size=[10 ** 7], call_expr=[
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did not you add cases for Numba for np.nanargmin and np.nanargmax?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because its not exist. Numba dont have this methods

Comment on lines 2985 to 3006
if isinstance(self.data, StringArrayType):
def hpat_pandas_series_idxmax_impl(self, axis=None, skipna=None):
if skipna is None:
_skipna = True
else:
raise ValueError("Method idxmax(). Unsupported parameter 'skipna'=False with str data")

return numpy.argmax(self._data)

return hpat_pandas_series_idxmax_impl

def hpat_pandas_series_idxmax_impl(self, axis=None, skipna=None):
# return numpy.argmax(self._data)
if skipna is None:
_skipna = True
else:
_skipna = skipna

if _skipna:
return numpy_like.nanargmax(self._data)

return numpy_like.argmax(self._data)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would create variable none_index = isinstance(self.index, types.NoneType) or self.index is None, then make common implementations hpat_pandas_series_idxmax_impl and hpat_pandas_series_idxmax_impl for both cases with index and without index. In the implementations I would add something like that:

if none_index == True:  # noqa
    return result
else:
    self._index[int(result)]

The same is applicable for idxmin.

Comment on lines 197 to 203
if reduce_op(res, self[j]) == self[j]:
if not isnan(self[j]):
if res == self[j]:
pos = min(pos, j)
else:
pos = j
res = self[j]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's decrease indentations:

if reduce_op(res, self[j]) != self[j]:
    continue
if isnan(self[j]):
    continue
if res == self[j]:
    pos = min(pos, j)
else:
    pos = j
    res = self[j]

Comment on lines 210 to 211
if reduce_op(general_res, arr_res[i]) == arr_res[i]:
if general_res == arr_res[i]:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's decrease indentations.

Comment on lines +247 to +250
def sdc_impl(a):
return numpy_like.argmin(a)

sdc_func = self.jit(sdc_impl)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's combine that to:

@self.jit
def sdc_impl(a):
    return numpy_like.argmin(a)

This change is acceptable for other tests.

@AlexanderKalistratov
Copy link
Collaborator

@1e-to conflict

@AlexanderKalistratov AlexanderKalistratov merged commit d529471 into IntelPython:master Feb 19, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants