
Let get_dummies use meta computation in map_partitions #8898

Merged
jsignell merged 1 commit into dask:main from jsignell:get_dummies
Apr 7, 2022



@jsignell jsignell commented Apr 7, 2022

Based on the note that I am deleting here, I suspect that this is no longer necessary. Let's see if the tests pass 🤞

# We explicitly create `meta` on `data._meta` (the empty version) to
# work around https://github.com/pandas-dev/pandas/issues/21993
package_name = data._meta.__class__.__module__.split(".")[0]
dummies = sys.modules[package_name].get_dummies
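For readers unfamiliar with the pattern in the deleted lines: it takes the top-level package name from the class of `data._meta` and fetches `get_dummies` from that already-imported package. A minimal standard-library sketch of the same lookup follows. The helper name `resolve_from_package` and the use of the `email` package are illustrative assumptions, not part of Dask or pandas.

```python
import sys
import email.message


def resolve_from_package(obj, func_name):
    """Look up `func_name` on the top-level package that defines type(obj).

    Hypothetical helper that mirrors the lookup in the deleted lines.
    """
    # "email.message" -> "email", just as "pandas.core.frame" -> "pandas"
    package_name = type(obj).__module__.split(".")[0]
    # The package that defined the class is normally already in sys.modules.
    return getattr(sys.modules[package_name], func_name)


msg = email.message.Message()
parse = resolve_from_package(msg, "message_from_string")
parsed = parse("Subject: hello\n\nbody")
print(parsed["Subject"])
```

The lookup depends on the package exposing a function with the expected name at its top level, which is part of why it reads as a hack.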
Member


😮 Just noticed this code, what a hack!

Member Author


YEAH, that is intense. I feel like it's probably not necessary, but I'm not 100% sure what the best practice is. Maybe we should have a dispatch for this method? I don't really know how this is solved for the DataFrame case. Ideally, calling pd.get_dummies on a DataFrame would return a DataFrame of the same type as the input, but I'm not sure whether that is the case.
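As a rough illustration of the dispatch idea raised above, here is a sketch that uses `functools.singledispatch` from the standard library as a stand-in. It is not Dask's own dispatch machinery, and `ListFrame`, `TupleFrame`, and `get_dummies_dispatch` are hypothetical names invented for the example. Each backend registers its own implementation, so the result has the same type as the input.

```python
from functools import singledispatch


class ListFrame:
    """Hypothetical stand-in for one DataFrame backend."""

    def __init__(self, rows):
        self.rows = list(rows)


class TupleFrame:
    """Hypothetical stand-in for a second DataFrame backend."""

    def __init__(self, rows):
        self.rows = tuple(rows)


def _one_hot(values):
    # One indicator column per distinct value, in sorted order.
    categories = sorted(set(values))
    return [{c: int(v == c) for c in categories} for v in values]


@singledispatch
def get_dummies_dispatch(data):
    # Fallback for types with no registered implementation.
    raise NotImplementedError(
        f"no get_dummies registered for {type(data).__name__}"
    )


@get_dummies_dispatch.register(ListFrame)
def _(data):
    return ListFrame(_one_hot(data.rows))


@get_dummies_dispatch.register(TupleFrame)
def _(data):
    return TupleFrame(_one_hot(data.rows))


list_result = get_dummies_dispatch(ListFrame(["a", "b", "a"]))
tuple_result = get_dummies_dispatch(TupleFrame(["x", "y"]))
print(type(list_result).__name__, type(tuple_result).__name__)
```

With this shape the caller never inspects module names; the type of the input selects the implementation.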



Development

Successfully merging this pull request may close these issues.

Categorizer should sort categories
