Skip to content
This repository has been archived by the owner on Feb 2, 2024. It is now read-only.

Fix new-style implementation of Series.append() method #262

Merged

Conversation

kozlov-alexey
Copy link
Contributor

No description provided.

Copy link
Collaborator

@AlexanderKalistratov AlexanderKalistratov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kozlov-alexey could you please write additional tests with nan, including cases when nan is in index?

hpat/datatypes/hpat_pandas_series_functions.py Outdated Show resolved Hide resolved
hpat/datatypes/hpat_pandas_series_functions.py Outdated Show resolved Hide resolved
hpat/datatypes/hpat_pandas_series_functions.py Outdated Show resolved Hide resolved
hpat/hiframes/api.py Outdated Show resolved Hide resolved


@overload(_append)
def _append_overload(A, B):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is a good place for hidden function used nowhere in hiframes.
Why do you need it overloaded?

Copy link
Contributor Author

@kozlov-alexey kozlov-alexey Oct 31, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shssf I agree that there should be a better place for this function. Do you think it should be placed in a new file in datatypes?
The reason why it's needed is that without it I have to duplicate the same code in hpat_pandas_series_append, since both data and index can be numpy arrays/string_array and they are concatenated differently and also concatenation of array with list of arrays is not supported by Numba's np.concatenate, so it has to be implemented ourselves.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved to hpat/datatypes/common_functions.py

Comment on lines 614 to 623
self: :obj:`pandas.Series`
input series
to_append : :obj:`pandas.Series` object or :obj:`list` or :obj:`set`
Series (or list or tuple of Series) to append with self
ignore_index: :obj:`bool`, default False
If True, do not use the index labels.
Supported as literal value only
verify_integrity: :obj:`bool`, default False
If True, raise Exception on creating index with duplicates.
*unsupported*
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's align indentations.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

def hpat_pandas_series_append_single_impl(self, to_append, ignore_index=False, verify_integrity=False):

new_data = hpat.hiframes.api._append(self._data, to_append._data)
new_index = hpat.hiframes.api._append(self.index, to_append.index)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it work with self.index? Shouldn't we use self._index?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@densmirn Yes, self.index returns np.arange index if self._index is None.
Thanks for the comment, I'll update my tests to check that series without index are handled too.

Copy link
Contributor Author

@kozlov-alexey kozlov-alexey Nov 5, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@densmirn Actually, it turned out tests with suffix index_default (e.g. test_series_append_single_index_default) check exactly that self.index is called, while self._index is None (as we unbox default series index into None).

def hpat_pandas_series_append_list_impl(self, to_append, ignore_index=False, verify_integrity=False):

data_arrays_to_append = [series._data for series in to_append]
index_arrays_to_append = [series.index for series in to_append]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it work with series.index? Shouldn't we use series._index?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, see above.

def hpat_pandas_series_append_impl(self, to_append):
return pandas.Series(self._data + to_append._data)
new_data = hpat.hiframes.api._append(self._data, data_arrays_to_append)
new_index = hpat.hiframes.api._append(self.index, index_arrays_to_append)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same, self.index -> self._index?

S2 = pd.Series([-2., 5.0])
# Test single series
np.testing.assert_array_equal(hpat_func(S1, S2), test_impl(S1, S2))
dtype_to_data = {'float': [[-2., 3., 9.1], [-2., 5.0]],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think about using of globally defined input data for tests?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@densmirn Hmm, I would use that if there's a reason of testing wider range of values, but append doesn't do any calculations or doesn't depend on values being used in a series (at least it shouldn't). I prefer it this way, but if you really think there might be a problem with this approach, I can change it of course.

@pep8speaks
Copy link

pep8speaks commented Nov 5, 2019

Hello @kozlov-alexey! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2019-11-08 14:21:07 UTC


if ignore_index_is_false:
def hpat_pandas_series_append_impl(self, to_append, ignore_index=False, verify_integrity=False):
if to_append_is_series == True:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could add # noqa at the end of line to say linter skip this line.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@densmirn Thanks!

hpat/datatypes/common_functions.py Outdated Show resolved Hide resolved
hpat/tests/test_series.py Show resolved Hide resolved
hpat/tests/test_series.py Show resolved Hide resolved
hpat/tests/test_series.py Outdated Show resolved Hide resolved
hpat/tests/test_series.py Show resolved Hide resolved
hpat/tests/test_series.py Show resolved Hide resolved
hpat/tests/test_series.py Outdated Show resolved Hide resolved
hpat/tests/test_series.py Show resolved Hide resolved
hpat/tests/test_series.py Show resolved Hide resolved
return var == value


def hpat_arrays_append(A, B):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this function is needed? As far as I understand you could use other decorator to register function to be used in JIT.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shssf This should provide different overloads for the same append operation for various underlying data types (np.array, StringArray). What are you suggesting to use instead?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shssf please specify the exact way to implement it, or let's close this discussion

return var.literal_value == value


def has_python_value(var, value):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not easy to understand why with function by the name. Please add docstring.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added.

from hpat.str_arr_ext import (string_array_type, num_total_chars, append_string_array_to)


def has_literal_value(var, value):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not easy to understand why with function by the name. Please add docstring.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added.

@@ -1288,7 +1288,7 @@ def hpat_pandas_series_ctor_impl(data=None, index=None, dtype=None, name=None, c

'''' use binop here as otherwise Numba's dead branch pruning doesn't work
TODO: replace with 'if not is_index_none' when resolved '''
if is_index_none == False:
if is_index_none == False: # noqa
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fix it accordion PEP


@numba.njit(no_cpython_wrapper=True)
def append_string_array_to(result, pos, A):
# precondition: result is allocated with the size enough to contain A
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add docstring


if ignore_index_is_false:
def hpat_pandas_series_append_impl(self, to_append, ignore_index=False, verify_integrity=False):
if to_append_is_series == True: # noqa
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fix style according PEP

else:
def hpat_pandas_series_append_ignore_index_impl(self, to_append, ignore_index=False, verify_integrity=False):

if to_append_is_series == True: # noqa
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fix style according PEP

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shssf It's not possible - with "if to_append_is_series" Numba will fail to compile, as dead branch pruning won't eliminate the dead branch (it's similar as in
https://github.com/kozlov-alexey/hpat/blob/45c06272f576ce53dc76038da08d59338770b39a/hpat/hiframes/pd_series_ext.py#L1278).

@shssf shssf merged commit b857c10 into IntelPython:master Nov 10, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants