
[WIP] MultiIndex #8153

Draft: wants to merge 2 commits into main

jsignell (Member)

This first commit is pulled from @TomAugspurger's original branch: TomAugspurger@0e741e1

My plan is to try to keep moving forward with that work and raise `NotImplementedError` all over the place.

github-actions bot added the dataframe, dispatch, and io labels on Sep 15, 2021
jsignell marked this pull request as draft on September 15, 2021 at 20:38
Comment on lines +450 to +455
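```python
import pandas as pd

def _collapse(partition):
    # Turn each row of a multi-column partition into one tuple, so the
    # columns can be treated as a single sortable key. The original index
    # is kept and the column names are recorded as the Series name.
    return pd.Series(
        list(partition.itertuples(index=False, name=None)),
        index=partition.index,
        name=tuple(partition.columns),
    )
```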
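A quick usage check of the snippet above on a toy frame (the data is made up for illustration). It shows that each row becomes a tuple and that, because Python compares tuples lexicographically, ordering the collapsed values orders rows by the first column and then by the second, which is what makes the trick useful for multi-column sorting.

```python
import pandas as pd

def _collapse(partition):
    # Copy of the reviewed helper: one tuple per row, original index kept.
    return pd.Series(
        list(partition.itertuples(index=False, name=None)),
        index=partition.index,
        name=tuple(partition.columns),
    )

df = pd.DataFrame({"a": [2, 1, 1], "b": ["x", "z", "y"]})
collapsed = _collapse(df)
print(collapsed.tolist())  # [(2, 'x'), (1, 'z'), (1, 'y')]
print(collapsed.name)      # ('a', 'b')
# Tuples compare element by element, so this orders by "a", then "b".
print(sorted(collapsed))   # [(1, 'y'), (1, 'z'), (2, 'x')]
```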
rjzamora (Member), Oct 1, 2021

Oooo - It may make sense to precede this PR with a simpler PR to support multi-column sort_values using this trick.

@charlesbluca - Note that this approach is not as performant as direct `DataFrame.quantile`/`DataFrame.searchsorted` support in pandas would be, but it should "unblock" multi-column sorting :)
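To make the trick concrete, here is a minimal, stdlib-only sketch of multi-column range partitioning over row tuples like the ones `_collapse` produces. The helper names `tuple_divisions` and `assign_partitions` are hypothetical (not Dask API), the boundaries are only the interior cut points, and the "quantile" is simply every n-th element of the fully sorted keys, standing in for whatever sampling/quantile machinery Dask actually uses.

```python
from bisect import bisect_right

def tuple_divisions(keys, npartitions):
    """Hypothetical helper: pick npartitions - 1 interior boundary tuples."""
    ordered = sorted(keys)  # lexicographic order over the tuple columns
    n = len(ordered)
    # Evenly spaced positions through the sorted keys (a crude quantile).
    return [ordered[(i * n) // npartitions] for i in range(1, npartitions)]

def assign_partitions(keys, divisions):
    """Hypothetical helper: map each key to the range that contains it."""
    # bisect_right sends a key equal to a boundary into the later partition.
    return [bisect_right(divisions, key) for key in keys]

rows = [(1, "b"), (1, "a"), (1, "d"), (1, "c"), (2, "a"), (2, "b")]
divs = tuple_divisions(rows, 3)
print(divs)                           # [(1, 'c'), (2, 'a')]
print(assign_partitions(rows, divs))  # [0, 0, 1, 1, 2, 2]
```

Because both the boundaries and the lookup use plain tuple comparison, no multi-column `quantile` or `searchsorted` support is needed, at the cost of working with Python objects rather than vectorized columns.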

Member

Yeah this looks nice - thanks for the heads up! I can start up a WIP using this in `sort_values`.

Member Author

Cool yeah! I expect this to take a while to work out. There are still some open questions about how things should behave. So anything that can come out of here and be useful is great!

Member

@charlesbluca - I started exploring this a bit in this branch (couldn't help myself). It is quite slow compared to 0th column partitioning, but does seem to work for cases where multiple columns are required for sufficient repartitioning.
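A small illustration of the case described here, where the first column alone is not enough to repartition: if that column has few distinct values, every boundary drawn from it is the same value and all rows land in one partition, while boundaries drawn from full row tuples spread them out. The helper `split_counts` and the data are hypothetical, "0th column partitioning" is read as "choose boundaries from the first column only", and the sketch says nothing about the speed difference mentioned above.

```python
from bisect import bisect_right
from collections import Counter

def split_counts(keys, npartitions):
    """Hypothetical helper: rows per partition for range boundaries on keys."""
    ordered = sorted(keys)
    n = len(ordered)
    divisions = [ordered[(i * n) // npartitions] for i in range(1, npartitions)]
    counts = Counter(bisect_right(divisions, k) for k in keys)
    return dict(sorted(counts.items()))

# Skewed data: 8 of the 10 rows share the same first-column value.
rows = [(0, i) for i in range(8)] + [(1, 0), (1, 1)]
first_col = [r[0] for r in rows]

print(split_counts(first_col, 4))  # {3: 10}  (everything in one partition)
print(split_counts(rows, 4))       # {0: 2, 1: 3, 2: 2, 3: 3}
```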

github-actions bot added the needs attention label ("It's been a while since this was pushed on. Needs attention from the owner or a maintainer.") on Nov 1, 2021
jsignell removed the needs attention label on Mar 10, 2022
Labels: dataframe, dispatch (Related to `Dispatch` extension objects), io

Successfully merging this pull request may close these issues: Full support for multiindex in dataframes

4 participants