Dask dataframe isna #3294

Merged
merged 16 commits into from Mar 25, 2018
4 changes: 4 additions & 0 deletions dask/dataframe/__init__.py
@@ -16,3 +16,7 @@
    from .io import read_parquet, to_parquet
except ImportError:
    pass
try:
    from .core import isna
except ImportError:
    pass
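Why the guarded import matters: core.py in this PR defines isna only when pandas provides it, so on older pandas the name is simply missing from the module, and a from-import of a missing name raises ImportError. The sketch below reproduces that behaviour with the standard library only. The module name fake_core and the list-based isna are hypothetical stand-ins, not dask or pandas code.

```python
import sys
import types

# Hypothetical stand-in for dask/dataframe/core.py running on an old pandas:
# the module exists, but it never defined ``isna``.
fake_core = types.ModuleType("fake_core")
sys.modules["fake_core"] = fake_core

try:
    from fake_core import isna
except ImportError:
    # ``from module import name`` raises ImportError when the name is missing,
    # which is the case the try/except in __init__.py absorbs.
    isna = None

# Once the module defines the name, the same import succeeds.
fake_core.isna = lambda values: [v is None for v in values]
from fake_core import isna as isna_available
```

In the PR itself the except branch is just pass, so dd.isna is absent rather than set to None; the None assignment here only makes the outcome easy to inspect.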
6 changes: 6 additions & 0 deletions dask/dataframe/core.py
@@ -4144,6 +4144,12 @@ def to_timedelta(arg, unit='ns', errors='raise'):
meta=meta)


if hasattr(pd, 'isna'):
    @wraps(pd.isna)
    def isna(arg):
        return map_partitions(pd.isna, arg)


def _repr_data_series(s, index):
    """A helper for creating the ``_repr_data`` property"""
    npartitions = len(index) - 1
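A stdlib-only sketch of the feature-detection pattern in the core.py change: the wrapper is defined only when the backend exposes isna, and wraps copies the backend function's name and docstring onto it. Everything here is an assumption for illustration: make_namespace, _isna, and the list-of-chunks loop standing in for map_partitions are hypothetical and are not dask or pandas code.

```python
from functools import wraps
from types import SimpleNamespace


def make_namespace(backend):
    """Build a namespace that exposes ``isna`` only if ``backend`` has it."""
    ns = SimpleNamespace()
    if hasattr(backend, "isna"):
        @wraps(backend.isna)
        def isna(arg):
            # Stand-in for map_partitions: apply the backend function per chunk.
            return [backend.isna(chunk) for chunk in arg]
        ns.isna = isna
    return ns


def _isna(values):
    """Flag None entries as missing."""
    return [v is None for v in values]


# Hypothetical backends: one with ``isna`` (newer pandas), one without.
new_backend = SimpleNamespace(isna=_isna)
old_backend = SimpleNamespace()

new_ns = make_namespace(new_backend)
old_ns = make_namespace(old_backend)
```

With new_backend the namespace gains an isna that carries the backend docstring; with old_backend the attribute is never created, which mirrors how dd.isna would be absent on a pandas without isna.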
10 changes: 10 additions & 0 deletions dask/dataframe/tests/test_dataframe.py
@@ -2813,6 +2813,16 @@ def test_to_timedelta():
dd.to_timedelta(ds, errors='coerce'))


@pytest.mark.skipif(PANDAS_VERSION < '0.22.0',
                    reason="No isna method")
@pytest.mark.parametrize('values', [[np.NaN, 0], [1, 1]])
def test_isna(values):
    s = pd.Series(values)
    ds = dd.from_pandas(s, npartitions=2)

    assert_eq(pd.isna(s), dd.isna(ds))

Review comments on the skipif line:

Contributor Author:
@mrocklin, should this check for isna with hasattr instead? It just occurred to me that isna could be deprecated in future versions of pandas.

Member:
I wouldn't worry about isna being deprecated.
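
The hasattr-based skip raised in the review comment could look roughly like the sketch below. It swaps in the standard library's unittest.skipUnless for pytest.mark.skipif so that it runs without third-party packages, and it uses a hypothetical backend object in place of the pandas module; neither is what the PR does.

```python
import unittest
from types import SimpleNamespace

# Hypothetical stand-in for the pandas module; it happens to provide ``isna``.
backend = SimpleNamespace(isna=lambda values: [v is None for v in values])


class TestIsna(unittest.TestCase):
    # Skip on feature presence rather than on a version comparison.
    @unittest.skipUnless(hasattr(backend, "isna"), "No isna method")
    def test_isna(self):
        self.assertEqual(backend.isna([None, 0]), [True, False])


def run_suite():
    """Run the test case and return the collected result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestIsna)
    result = unittest.TestResult()
    suite.run(result)
    return result
```

A feature check of this kind keeps the test condition aligned with the hasattr guard in core.py, whereas the version comparison in the PR has to be kept in sync with the pandas release history by hand.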


@pytest.mark.parametrize('drop', [0, 9])
def test_slice_on_filtered_boundary(drop):
    # https://github.com/dask/dask/issues/2211
7 changes: 4 additions & 3 deletions docs/source/changelog.rst
@@ -2,7 +2,7 @@ Changelog
=========


0.XX.X / 2018-MM-DD
0.17.3 / 2018-MM-DD
-------------------

Array
@@ -13,7 +13,7 @@ Array
DataFrame
+++++++++

-
- Added top level `isna` method for Dask DataFrames (:pr:`3294`) `Christopher Ren`_

Bag
+++
@@ -40,7 +40,7 @@ Array
DataFrame
+++++++++

- Fixed bug in shuffle due to aggressive truncation (:pr:`3201`) `Matthew Rocklin`_
- Fixed bug in shuffle due to aggressive truncation (:pr:`3201`) `Matthew Rocklin`_
- Support specifying categorical columns on ``read_parquet`` with ``categories=[…]`` for ``engine="pyarrow"`` (:pr:`3177`) `Uwe Korn`_
- Add ``dd.tseries.Resampler.agg`` (:pr:`3202`) `Richard Postelnik`_
- Support operations that mix dataframes and arrays (:pr:`3230`) `Matthew Rocklin`_
@@ -1043,3 +1043,4 @@ Other
.. _`Richard Postelnik`: https://github.com/postelrich
.. _`Daniel Collins`: https://github.com/dancollins34
.. _`Gabriele Lanaro`: https://github.com/gabrielelanaro
.. _`Christopher Ren`: https://github.com/cr458