-
Notifications
You must be signed in to change notification settings - Fork 61
Refactor Series.dropna() to a new style w/o inplace support #239
Refactor Series.dropna() to a new style w/o inplace support #239
Conversation
---------- | ||
self: :obj:`pandas.Series` | ||
input series | ||
axis: {0 or `index`}, default 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Type of the argument has to be defined
89c6093
to
994394c
Compare
if isinstance(self.dtype, (types.Number, types.Boolean)): | ||
def hpat_pandas_series_np_arrays_dropna_impl(self, axis=0, inplace=False): | ||
# generate Series index if needed by using SeriesType.index (i.e. not self._index) | ||
print("DEBUG: running new-style impl") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove debug traces
994394c
to
6504270
Compare
6504270
to
266b85c
Compare
hpat_func = hpat.jit(test_impl) | ||
|
||
S1 = pd.Series([1.0, 2.0, np.nan, 1.0]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need to copy in tests that don't test inplace operation
hpat_func = hpat.jit(test_impl) | ||
|
||
S1 = pd.Series([1.0, 2.0, np.nan, 1.0]) | ||
S1 = pd.Series([1.0, 2.0, np.nan, 1.0, np.inf]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you use floats defined globally? e.g. f2435b0#diff-deca39d332649cea819383154a5d2cb3R39-R42
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@densmirn Sure, but in most of the tests indexes of different types have to be specified too. I added using test_global_input_data_float64 for one of the tests that uses default index: test_series_dropna_float_index1. That should be enough probably.
|
||
# TODO: handle inplace argument (currently supported with old-style impl only via assign.target) | ||
# Find a way to assign back to self._data and return None | ||
if isinstance(self.dtype, (types.Number, types.Boolean)): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we able to merge if and else statements? I mean to use one of the approaches for both cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@densmirn Yes, the else branch actually handles both cases. But as in other places legacy implementation tends to provide lightweight overloads where possible. I will add new gen_nan_mask function for arrays in the next commit.
|
||
.. only:: developer | ||
|
||
Tests: python -m hpat.runtests hpat.tests.test_series.TestSeries.test_series_dropna_axis1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I propose to use pattern reflected all tests:
python -m hpat.runtests -k hpat.tests.test_series.TestSeries.test_series_dropna*
-k
lets to run it in command line.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
266b85c
to
e8772dd
Compare
|
||
def hpat_pandas_series_dropna_impl(self, axis=0, inplace=False): | ||
# generate Series index if needed by using SeriesType.index (i.e. not self._index) | ||
na_data_arr = hpat.hiframes.api.get_nan_mask(self._data) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is better to implement or get the function which can return self._data == numpy.nan
and use it in a loop here.
It will solve issue with index generation. Also, this function will be very helpful in other series methods
if isinstance(arr, types.Array): | ||
dtype = arr.dtype | ||
if isinstance(dtype, types.Float): | ||
return lambda arr: np.isnan(arr) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wouls propose to avoid lambda here and make it in the same way as get_nan_mask_via_isna_impl
def test_series_dropna_axis1(self): | ||
'''Verifies Series.dropna() implementation handles 'index' as axis argument''' | ||
def test_impl(S): | ||
return S.dropna(axis='index') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This changed original test. Please implement new one with this parameter.
No description provided.