Skip to content
This repository has been archived by the owner on Feb 2, 2024. It is now read-only.

Refactor Series.dropna() to a new style w/o inplace support #239

Merged

Conversation

kozlov-alexey
Copy link
Contributor

No description provided.

----------
self: :obj:`pandas.Series`
input series
axis: {0 or `index`}, default 0
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Type of the argument has to be defined

if isinstance(self.dtype, (types.Number, types.Boolean)):
def hpat_pandas_series_np_arrays_dropna_impl(self, axis=0, inplace=False):
# generate Series index if needed by using SeriesType.index (i.e. not self._index)
print("DEBUG: running new-style impl")
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove debug traces

hpat_func = hpat.jit(test_impl)

S1 = pd.Series([1.0, 2.0, np.nan, 1.0])
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to copy in tests that don't test inplace operation

hpat_func = hpat.jit(test_impl)

S1 = pd.Series([1.0, 2.0, np.nan, 1.0])
S1 = pd.Series([1.0, 2.0, np.nan, 1.0, np.inf])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you use floats defined globally? e.g. f2435b0#diff-deca39d332649cea819383154a5d2cb3R39-R42

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@densmirn Sure, but in most of the tests indexes of different types have to be specified too. I added using test_global_input_data_float64 for one of the tests that uses default index: test_series_dropna_float_index1. That should be enough probably.


# TODO: handle inplace argument (currently supported with old-style impl only via assign.target)
# Find a way to assign back to self._data and return None
if isinstance(self.dtype, (types.Number, types.Boolean)):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we able to merge if and else statements? I mean to use one of the approaches for both cases.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@densmirn Yes, the else branch actually handles both cases. But as in other places legacy implementation tends to provide lightweight overloads where possible. I will add new gen_nan_mask function for arrays in the next commit.


.. only:: developer

Tests: python -m hpat.runtests hpat.tests.test_series.TestSeries.test_series_dropna_axis1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I propose to use pattern reflected all tests:
python -m hpat.runtests -k hpat.tests.test_series.TestSeries.test_series_dropna*
-k lets to run it in command line.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@densmirn densmirn self-requested a review October 25, 2019 10:57

def hpat_pandas_series_dropna_impl(self, axis=0, inplace=False):
# generate Series index if needed by using SeriesType.index (i.e. not self._index)
na_data_arr = hpat.hiframes.api.get_nan_mask(self._data)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is better to implement or get the function which can return self._data == numpy.nan and use it in a loop here.
It will solve issue with index generation. Also, this function will be very helpful in other series methods

if isinstance(arr, types.Array):
dtype = arr.dtype
if isinstance(dtype, types.Float):
return lambda arr: np.isnan(arr)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouls propose to avoid lambda here and make it in the same way as get_nan_mask_via_isna_impl

def test_series_dropna_axis1(self):
'''Verifies Series.dropna() implementation handles 'index' as axis argument'''
def test_impl(S):
return S.dropna(axis='index')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This changed original test. Please implement new one with this parameter.

@shssf shssf merged commit d697beb into IntelPython:master Oct 26, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants