Dask dataframe isna #3294

cr458 · 2018-03-18T17:30:51Z

Top level dd.isna method to stay consistent with pandas. From #3239.

mrocklin · 2018-03-19T13:10:24Z

dask/dataframe/core.py

+@wraps(pd.isna)
+def isna(arg):
+    meta = pd.Series([True])
+    return map_partitions(pd.isna, arg, meta=meta)


I suspect that if the input series has a name or other metadata (I'm not sure whether other metadata exists), this information will not be captured by the meta= definition above. You can probably trigger a failure in your test by adding name='foo' to the construction of your input series: the pandas version will be named while the dask.dataframe version will not.

I looked at map_partitions to see how it handles metadata and it seems to be using a function dask.dataframe.core._emulate. I'm curious, can we let it handle this and not provide meta= as a keyword argument here? It might do the right thing by default.

Seems like it is working, I will try and make the test fail as well by adding the name='foo'.
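
The naming concern in this thread can be illustrated with pandas alone. This is a minimal sketch, assuming a pandas version that provides pd.isna; the variable names are illustrative and not taken from the PR. It shows that pandas keeps the series name on the result, while a hard-coded meta such as pd.Series([True]) has no name to pass along.

```python
import pandas as pd

# A named input series, as suggested in the review comment.
s = pd.Series([1.0, None, 3.0], name="foo")

# pandas carries the name over to the boolean result.
result = pd.isna(s)

# The hard-coded meta from the diff has no name attached.
hard_coded_meta = pd.Series([True])

print(result.name)           # name of the pandas result
print(hard_coded_meta.name)  # name of the hard-coded meta
```

Dropping the meta= keyword lets map_partitions infer the metadata from the input, which is the behaviour the reply above reports as working.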

jcrist · 2018-03-19T13:15:13Z

dask/dataframe/core.py

@@ -4123,6 +4123,12 @@ def to_timedelta(arg, unit='ns', errors='raise'):
                          meta=meta)


+@wraps(pd.isna)


Since this function is only in newer versions of pandas, you'll have to wrap both the definition and the import in __init__.py in a case statement. I'd do something like:

    if hasattr(pd, 'isna'):
        @wraps(pd.isna)
        def isna(arg):
            return map_partitions(pd.isna, arg)

And in __init__.py:

    try:
        from .core import isna
    except ImportError:
        pass
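
The two-part pattern suggested here (define the function only when pandas has it, then guard the import) can be tried without pandas or dask. The sketch below is only an illustration: old_pd, new_pd, build_core and the sketch_core_* module names are hypothetical stand-ins, not part of either library.

```python
import sys
import types


def build_core(pd_like, module_name):
    """Build a stand-in for core.py that defines isna only if pd_like has it."""
    core = types.ModuleType(module_name)
    if hasattr(pd_like, "isna"):
        def isna(arg):
            return pd_like.isna(arg)
        core.isna = isna
    # Register the module so a normal import statement can find it.
    sys.modules[module_name] = core
    return core


# Hypothetical stand-ins for an older and a newer pandas.
old_pd = types.SimpleNamespace()
new_pd = types.SimpleNamespace(isna=lambda arg: arg is None)

build_core(old_pd, "sketch_core_old")
build_core(new_pd, "sketch_core_new")

# Guarded import: the name is simply absent when the backend lacks isna.
try:
    from sketch_core_old import isna as isna_old
except ImportError:
    isna_old = None

from sketch_core_new import isna as isna_new
```

Importing a name that a module does not define raises ImportError, which is why the try/except in __init__.py is enough to keep the package importable on older pandas.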

jcrist · 2018-03-19T19:14:02Z

dask/dataframe/__init__.py

+try:
+    from .core import isna
+except ImportError:
+    pass


Missing a newline at the end of the file (usually this is something you can configure in your editor to prevent this).

Have now configured my editor to automate this :). Thanks a lot for your guidance!

jcrist · 2018-03-20T05:13:18Z

dask/dataframe/core.py

@@ -4123,6 +4123,12 @@ def to_timedelta(arg, unit='ns', errors='raise'):
                          meta=meta)


+if hasattr(pd, isna):


This should be if hasattr(pd, 'isna'): (as seen in the travis logs).

In general, I recommend trying to run the tests yourself before pushing to github. We use py.test for testing:

    $ conda install pytest
    # or
    $ pip install pytest
    $ py.test dask  # run the whole test suite
    $ py.test dask/dataframe/tests/test_dataframe.py -k test_isna  # test just your added test

Thanks! Should be fixed now.
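
For readers wondering why the unquoted form fails: hasattr takes the attribute name as a string, and a bare isna is evaluated as a variable before hasattr is even called. A minimal sketch, assuming no name isna is already in scope; pd_like is a hypothetical stand-in for the pandas module.

```python
import types

# Hypothetical stand-in for the pandas module.
pd_like = types.SimpleNamespace(isna=lambda obj: obj is None)


def unquoted_check():
    """Mimic hasattr(pd, isna): the bare name is looked up first and fails."""
    try:
        return hasattr(pd_like, isna)  # noqa: F821 (intentionally undefined)
    except NameError as exc:
        return "NameError: " + str(exc)


def quoted_check():
    """hasattr expects the attribute name as a string."""
    return hasattr(pd_like, "isna")


print(unquoted_check())
print(quoted_check())
```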

mrocklin · 2018-03-22T22:16:03Z

This looks great to me. Thank you for implementing this @cr458 . A few other administrative things to clean up, hopefully these are easy.

Can you add a small note to the changelog in docs/source/changelog.rst? Hopefully the pattern there should be clear enough (although this may be the first change in the new version, so you may have to look down a bit for examples to copy). Don't forget to add your name and a website to the bottom of that document.
Should we add this as a method on _Frame as well so that both DataFrame and Series get this method? The version issue might come up. I recommend that we check the version within the method and raise an informative error if appropriate.
After that, can you add isna to the DataFrame API docs in docs/source/dataframe-api.rst?
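
The "check within the method and raise an informative error" idea might look roughly like the sketch below. It is not the code that was merged: FrameSketch, its partitions list, the pd_like stand-ins and the error message are all hypothetical, chosen so the example runs without pandas or dask.

```python
import types


class FrameSketch:
    """Hypothetical stand-in for _Frame that holds a list of partitions."""

    def __init__(self, partitions, pd_like):
        self.partitions = partitions
        self._pd = pd_like

    def isna(self):
        # Check for the feature at call time and explain what is missing.
        if not hasattr(self._pd, "isna"):
            raise NotImplementedError(
                "isna is not available in the installed pandas; "
                "a newer pandas version is required."
            )
        # Apply the backend function to each partition independently.
        return [self._pd.isna(part) for part in self.partitions]


# Hypothetical backends: one without isna, one with a list-based isna.
old_pd = types.SimpleNamespace()
new_pd = types.SimpleNamespace(isna=lambda part: [x is None for x in part])

frame = FrameSketch([[1, None], [3]], new_pd)
print(frame.isna())
```

Raising at call time keeps the method visible on every installation while still telling the user exactly why it cannot run.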

cr458 · 2018-03-23T23:20:32Z

The pleasure was all mine @mrocklin. Thanks for the helpful comments, I think it's fair to say the little work that was done here was actually done by you guys. Have updated the changelog now.
Happy to implement the method in _Frame as well. Based on the other methods for _Frame, I'm guessing it would look something like:

    @derived_from(pd.DataFrame)
    def isna(self):
        if hasattr(pd, 'isna'):
            return self.map_partitions(M.isna)
        else:
            raise ImportError

However, I'm having a hard time seeing where the variable M is set in the class. It seems to be used throughout various methods.

mrocklin · 2018-03-24T02:26:00Z

It's not part of the class. It is imported from utils.py I think
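
As a rough illustration of what a helper like M does, here is a sketch built on operator.methodcaller from the standard library. MethodCacheSketch is a hypothetical name and the real helper in dask's utils may differ in detail; the point is only that M.isna yields a callable that invokes .isna() on whatever object it receives.

```python
from operator import methodcaller


class MethodCacheSketch:
    """Hypothetical helper: attribute access returns a method caller."""

    def __getattr__(self, name):
        # M.upper becomes a callable equivalent to lambda obj: obj.upper()
        return methodcaller(name)


M = MethodCacheSketch()

upper = M.upper
print(upper("abc"))
```

Passing M.isna to map_partitions therefore calls the isna method of each partition, which avoids writing a lambda for every wrapped method.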


cr458 · 2018-03-25T12:03:37Z

dask/dataframe/tests/test_dataframe.py

@@ -2813,6 +2813,16 @@ def test_to_timedelta():
              dd.to_timedelta(ds, errors='coerce'))


+@pytest.mark.skipif(PANDAS_VERSION < '0.22.0',


@mrocklin, should this use the hasattr method of checking for the isna instead? Just occurred to me that isna could be deprecated in future versions of pandas.

I wouldn't worry about isna being deprecated.

cr458 · 2018-03-25T12:25:28Z

@mrocklin shall I add a test to test_embarrassingly_parallel_operations in test_dataframe.py for isna as well?

mrocklin · 2018-03-25T18:39:55Z

It's not necessary, the test you have seems fine to me. An alternative test would have been to add isna to that test instead, but what you've done here seems good to me.

In general I'm pretty satisfied with this. I've merged with master and added isna to API docs. Merging.

Thanks @cr458 for your work on this! Hopefully it is the first of many :)

cr458 added 6 commits March 18, 2018 16:40

added top level isna method to dask.dataframe

a3fc478

test for isna

7f11231

whitespace

4f3df24

changed init file to import isna method

214d5a5

version dependent test

48ea054

skip test if version < 0.22.0

a69e797

mrocklin reviewed Mar 19, 2018

jcrist reviewed Mar 19, 2018

cr458 added 2 commits March 19, 2018 19:00

try import for earlier versions of pandas

0de127a

line

003ff99

jcrist reviewed Mar 19, 2018

newline

fb973c0

jcrist reviewed Mar 20, 2018

fixed hasattr call

58a04b7

mrocklin changed the title from "[WIP] Dask dataframe isna" to "Dask dataframe isna" on Mar 22, 2018

cr458 added 2 commits March 23, 2018 22:55

Merge branch 'master' of github.com:cr458/dask into dask_dataframe_isna

0b366e2

update changelog

f9617ef

cr458 commented Mar 25, 2018

isna method for _Frame

2f71e0a

mrocklin added 3 commits March 25, 2018 12:35

Merge branch 'master' into dask_dataframe_isna

7e34482

flake8

c0fc19f

[skip ci] add isna to dataframe docs

0fcc5f5

mrocklin merged commit 907434b into dask:master Mar 25, 2018
