Dask dataframe isna #3294
dask/dataframe/core.py
Outdated
@wraps(pd.isna)
def isna(arg):
    meta = pd.Series([True])
    return map_partitions(pd.isna, arg, meta=meta)
I suspect that if the input series has a name or other metadata (I'm not sure if other metadata exists) then this information will not be captured with the meta= definition above. I suspect that you can trigger a failure in your test by adding name='foo' to the construction of your input series (the pandas version will be named while the dask.dataframe version will not be named).
I looked at map_partitions to see how it handles metadata and it seems to be using a function dask.dataframe.core._emulate. I'm curious, can we let it handle this and not provide meta= as a keyword argument here? It might do the right thing by default.
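To make the concern concrete, here is a small pandas-only sketch (the series values and variable names are made up for illustration). It only shows that a hard-coded meta carries no name while the real pandas result does; it does not exercise dask itself.

```python
import pandas as pd

# Illustrative input: a named series with one missing value.
s = pd.Series([1.0, None, 3.0], name="foo")

# The hard-coded meta from the diff above has no name attached.
hard_coded_meta = pd.Series([True])

# pandas itself carries the name through to the boolean result.
result = pd.isna(s)

print(result.name)           # name taken from the input series
print(hard_coded_meta.name)  # no name on the hand-written meta
```

If the dask result takes its name from meta, the two would disagree for any named input, which is what adding name='foo' to the test should expose.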
Seems like it is working. I will try to make the test fail as well by adding name='foo'.
dask/dataframe/core.py
Outdated
@@ -4123,6 +4123,12 @@ def to_timedelta(arg, unit='ns', errors='raise'):
    meta=meta)


@wraps(pd.isna)
Since this function is only in newer versions of pandas, you'll have to wrap both the definition and the import in __init__.py in a case statement. I'd do something like:
if hasattr(pd, 'isna'):
    @wraps(pd.isna)
    def isna(arg):
        return map_partitions(pd.isna, arg)
And in __init__.py:
try:
    from .core import isna
except ImportError:
    pass
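A standard-library-only sketch of how the two guards fit together. Everything here is a hypothetical stand-in: build_core plays the role of core.py, try_import_isna plays the role of __init__.py, and the SimpleNamespace objects imitate an old and a new pandas.

```python
import sys
import types


def build_core(pd_like):
    """Stand-in for core.py: define isna only when pd_like provides it."""
    core = types.ModuleType("fake_core")
    if hasattr(pd_like, "isna"):
        def isna(arg):
            return pd_like.isna(arg)
        core.isna = isna
    return core


def try_import_isna(core):
    """Stand-in for __init__.py: a guarded import that tolerates a missing name."""
    sys.modules["fake_core"] = core
    try:
        from fake_core import isna
    except ImportError:
        return None
    finally:
        # Clean up so repeated calls start from a fresh module.
        del sys.modules["fake_core"]
    return isna


old_pd = types.SimpleNamespace()                           # imitates pandas without isna
new_pd = types.SimpleNamespace(isna=lambda x: x is None)   # imitates pandas with isna

print(try_import_isna(build_core(old_pd)))        # name was never defined, import is skipped
print(try_import_isna(build_core(new_pd))(None))  # name exists and is usable
```

The point is that a name which is only conditionally defined surfaces as an ImportError at the from-import, so the try/except in __init__.py is what keeps older pandas versions importable.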
try:
    from .core import isna
except ImportError:
    pass
Missing a newline at the end of the file (usually this is something you can configure in your editor to prevent this).
Have now configured my editor to automate this :). Thanks a lot for your guidance!
dask/dataframe/core.py
Outdated
@@ -4123,6 +4123,12 @@ def to_timedelta(arg, unit='ns', errors='raise'):
    meta=meta)


if hasattr(pd, isna):
This should be if hasattr(pd, 'isna'): (as seen in the travis logs).
In general, I recommend trying to run the tests yourself before pushing to github. We use py.test for testing:
$ conda install pytest
# or
$ pip install pytest
$ py.test dask # run the whole test suite
$ py.test dask/dataframe/tests/test_dataframe.py -k test_isna # test just your added test
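For anyone wondering why the unquoted form fails rather than just returning False: hasattr needs the attribute name as a string, and a bare name is looked up as a variable first. A minimal sketch using the standard math module in place of pandas (the function names are invented for illustration):

```python
import math


def check_quoted():
    # Correct: the attribute name is passed as a string.
    return hasattr(math, "copysign")


def check_unquoted():
    # Bug pattern from the diff: 'copysign' is evaluated as a variable,
    # which does not exist in this scope, so Python raises NameError
    # before hasattr is even called.
    try:
        return hasattr(math, copysign)  # noqa: F821 (deliberately undefined)
    except NameError as exc:
        return "NameError: {}".format(exc)


print(check_quoted())
print(check_unquoted())
```

Because the error is raised at import time of the module, it shows up immediately when the tests are run locally.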
Thanks! Should be fixed now.
This looks great to me. Thank you for implementing this @cr458. A few other administrative things to clean up, hopefully these are easy.
The pleasure was all mine @mrocklin. Thanks for the helpful comments, I think it's fair to say the little work that was done here was actually done by you guys. Have updated the changelog now.
It's not part of the class. It is imported from utils.py I think
…On Fri, Mar 23, 2018 at 7:20 PM, Christopher Ren ***@***.***> wrote:
The pleasure was all mine @mrocklin. Thanks for the
helpful comments, I think it's fair to say the little work that was done
here was actually done by you guys. Have updated the changelog now.
Happy to implement the method in _Frame as well. Which based on the other
methods for _Frame I'm guessing it would look something like:
@derived_from(pd.DataFrame)
def isna(self):
    if hasattr(pd, 'isna'):
        return self.map_partitions(M.isna)
    else:
        raise ImportError
although I'm having a hard time seeing where the variable M is set in the
class? It seems to be used throughout various methods.
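On the M question, a helper like this is typically a thin wrapper around operator.methodcaller, so that M.isna is a callable that invokes .isna() on whatever it is given. The sketch below is a simplified stand-in written for illustration (the class name and details are assumptions); dask's actual helper in utils.py may differ.

```python
from operator import methodcaller


class MethodCache:
    """Simplified stand-in: attribute access returns a method caller."""

    def __getattr__(self, name):
        # M.foo becomes a callable equivalent to: lambda obj: obj.foo()
        return methodcaller(name)


M = MethodCache()

# M.upper is a callable that calls .upper() on its argument.
shout = M.upper
print(shout("abc"))
```

With that in mind, self.map_partitions(M.isna) reads as "call .isna() on each partition".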
@@ -2813,6 +2813,16 @@ def test_to_timedelta():
    dd.to_timedelta(ds, errors='coerce'))


@pytest.mark.skipif(PANDAS_VERSION < '0.22.0',
@mrocklin, should this use the hasattr method of checking for the isna instead? Just occurred to me that isna could be deprecated in future versions of pandas.
I wouldn't worry about isna being deprecated.
@mrocklin shall I add a test to |
It's not necessary, the test you have seems fine to me. An alternative test would have been to add isna to that test instead, but what you've done here seems good to me. In general I'm pretty satisfied with this. I've merged with master and added isna to API docs. Merging. Thanks @cr458 for your work on this! Hopefully it is the first of many :)
Top level dd.isna method to stay consistent with pandas. From #3239.