Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add DataFrame.map to dask.DataFrame API #10246

Merged
merged 8 commits into from May 3, 2023
Merged

Conversation

phofl
Copy link
Collaborator

@phofl phofl commented May 2, 2023

  • Closes #xxxx
  • Tests added / passed
  • Passes pre-commit run --all-files

@phofl phofl added the upstream label May 2, 2023
@phofl phofl closed this May 2, 2023
@phofl phofl reopened this May 2, 2023
Copy link
Member

@jrbourbeau jrbourbeau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for handling this @phofl. Overall looks good -- just left a couple of small suggestions

@@ -5932,6 +5932,10 @@ def apply(
def applymap(self, func, meta="__no_default__"):
return elemwise(M.applymap, self, func, meta=meta)

@derived_from(pd.DataFrame)
def map(self, func, meta="__no_default__"):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's use the no_default variable here instead. If you could update applymap above as well, that would be great (though I realize this pre-dates this PR)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added

Comment on lines 3242 to 3244
with pytest.raises(
NotImplementedError, match="Base package doesn't support 'map'."
):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is coming from when we try to get the pd.DataFrame.map docstring in the @derived_from(pd.DataFrame) decorator. It's nice that we get a NotImplementedError error, but something along the lines of (I've not tested / linted the snippet below)

if not PANDAS_GT_210:
    raise NotImplementedError(
        "DataFrame.map requires pandas>=2.1.0, but pandas=X.Y.Z is installed."
    )

would, I think, be a bit more clear for users and devs

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I had this initially :)

Changed, and removed the derived_from.

Copy link
Member

@jrbourbeau jrbourbeau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @phofl! Just fixed a merge conflict + included a small fixup -- this looks good to go now 👍 Will merge when CI finishes

@jrbourbeau jrbourbeau mentioned this pull request May 3, 2023
@jrbourbeau jrbourbeau merged commit ee65166 into dask:main May 3, 2023
26 of 28 checks passed
@phofl phofl deleted the add_dataframe_map branch May 3, 2023 18:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants