-
Notifications
You must be signed in to change notification settings - Fork 61
Fix new-style implementation of Series.append() method #262
Fix new-style implementation of Series.append() method #262
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kozlov-alexey could you please write additional tests with nan
, including cases when nan
is in index?
hpat/hiframes/api.py
Outdated
|
||
|
||
@overload(_append) | ||
def _append_overload(A, B): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is a good place for hidden function used nowhere in hiframes
.
Why do you need it overloaded?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@shssf I agree that there should be a better place for this function. Do you think it should be placed in a new file in datatypes?
The reason why it's needed is that without it I have to duplicate the same code in hpat_pandas_series_append, since both data and index can be numpy arrays/string_array and they are concatenated differently and also concatenation of array with list of arrays is not supported by Numba's np.concatenate, so it has to be implemented ourselves.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved to hpat/datatypes/common_functions.py
self: :obj:`pandas.Series` | ||
input series | ||
to_append : :obj:`pandas.Series` object or :obj:`list` or :obj:`set` | ||
Series (or list or tuple of Series) to append with self | ||
ignore_index: :obj:`bool`, default False | ||
If True, do not use the index labels. | ||
Supported as literal value only | ||
verify_integrity: :obj:`bool`, default False | ||
If True, raise Exception on creating index with duplicates. | ||
*unsupported* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's align indentations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
def hpat_pandas_series_append_single_impl(self, to_append, ignore_index=False, verify_integrity=False): | ||
|
||
new_data = hpat.hiframes.api._append(self._data, to_append._data) | ||
new_index = hpat.hiframes.api._append(self.index, to_append.index) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it work with self.index
? Shouldn't we use self._index
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@densmirn Yes, self.index returns np.arange index if self._index is None.
Thanks for the comment, I'll update my tests to check that series without index are handled too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@densmirn Actually, it turned out tests with suffix index_default (e.g. test_series_append_single_index_default) check exactly that self.index is called, while self._index is None (as we unbox default series index into None).
def hpat_pandas_series_append_list_impl(self, to_append, ignore_index=False, verify_integrity=False): | ||
|
||
data_arrays_to_append = [series._data for series in to_append] | ||
index_arrays_to_append = [series.index for series in to_append] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it work with series.index
? Shouldn't we use series._index
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, see above.
def hpat_pandas_series_append_impl(self, to_append): | ||
return pandas.Series(self._data + to_append._data) | ||
new_data = hpat.hiframes.api._append(self._data, data_arrays_to_append) | ||
new_index = hpat.hiframes.api._append(self.index, index_arrays_to_append) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The same, self.index
-> self._index
?
hpat/tests/test_series.py
Outdated
S2 = pd.Series([-2., 5.0]) | ||
# Test single series | ||
np.testing.assert_array_equal(hpat_func(S1, S2), test_impl(S1, S2)) | ||
dtype_to_data = {'float': [[-2., 3., 9.1], [-2., 5.0]], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you think about using of globally defined input data for tests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@densmirn Hmm, I would use that if there's a reason of testing wider range of values, but append doesn't do any calculations or doesn't depend on values being used in a series (at least it shouldn't). I prefer it this way, but if you really think there might be a problem with this approach, I can change it of course.
699f1fc
to
03c9265
Compare
Hello @kozlov-alexey! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found: There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻 Comment last updated at 2019-11-08 14:21:07 UTC |
|
||
if ignore_index_is_false: | ||
def hpat_pandas_series_append_impl(self, to_append, ignore_index=False, verify_integrity=False): | ||
if to_append_is_series == True: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could add # noqa
at the end of line to say linter skip this line.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@densmirn Thanks!
03c9265
to
d1fb912
Compare
return var == value | ||
|
||
|
||
def hpat_arrays_append(A, B): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why this function is needed? As far as I understand you could use other decorator to register function to be used in JIT.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@shssf This should provide different overloads for the same append operation for various underlying data types (np.array, StringArray). What are you suggesting to use instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@shssf please specify the exact way to implement it, or let's close this discussion
return var.literal_value == value | ||
|
||
|
||
def has_python_value(var, value): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not easy to understand why with function by the name. Please add docstring.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added.
from hpat.str_arr_ext import (string_array_type, num_total_chars, append_string_array_to) | ||
|
||
|
||
def has_literal_value(var, value): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not easy to understand why with function by the name. Please add docstring.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added.
d1fb912
to
50a7b52
Compare
50a7b52
to
f53c4db
Compare
f53c4db
to
7579497
Compare
@@ -1288,7 +1288,7 @@ def hpat_pandas_series_ctor_impl(data=None, index=None, dtype=None, name=None, c | |||
|
|||
'''' use binop here as otherwise Numba's dead branch pruning doesn't work | |||
TODO: replace with 'if not is_index_none' when resolved ''' | |||
if is_index_none == False: | |||
if is_index_none == False: # noqa |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix it accordion PEP
|
||
@numba.njit(no_cpython_wrapper=True) | ||
def append_string_array_to(result, pos, A): | ||
# precondition: result is allocated with the size enough to contain A |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add docstring
|
||
if ignore_index_is_false: | ||
def hpat_pandas_series_append_impl(self, to_append, ignore_index=False, verify_integrity=False): | ||
if to_append_is_series == True: # noqa |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fix style according PEP
else: | ||
def hpat_pandas_series_append_ignore_index_impl(self, to_append, ignore_index=False, verify_integrity=False): | ||
|
||
if to_append_is_series == True: # noqa |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fix style according PEP
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@shssf It's not possible - with "if to_append_is_series" Numba will fail to compile, as dead branch pruning won't eliminate the dead branch (it's similar as in
https://github.com/kozlov-alexey/hpat/blob/45c06272f576ce53dc76038da08d59338770b39a/hpat/hiframes/pd_series_ext.py#L1278).
No description provided.