API: Ensure set_index always sorts the index #2290

Merged
mrocklin merged 4 commits into dask:master from TomAugspurger:non-sorted-loc on May 3, 2017

@TomAugspurger
Member

There are edge cases (like the test case here) where we find that the values are sorted between partitions, but we don't check that they are also sorted within each partition.
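To illustrate the edge case, here is a minimal pandas-only sketch. It assumes a dask DataFrame can be modelled as a plain list of pandas partitions. The helper names `sorted_between_partitions` and `sorted_within_partitions` are hypothetical, and this is a simplified stand-in rather than dask's actual partition-statistics code. A check that only compares each partition's largest index value with the next partition's smallest value passes, even though the first partition is out of order internally:

```python
import pandas as pd

# Hypothetical stand-in for a dask DataFrame: a list of pandas partitions.
# Index values are ordered *across* partitions (max 30 < min 40), but the
# first partition is not sorted internally.
partitions = [
    pd.DataFrame({"x": [10, 20, 30]}, index=[30, 10, 20]),
    pd.DataFrame({"x": [40, 50]}, index=[50, 40]),
]


def sorted_between_partitions(parts):
    """Boundary-only check: each partition's max index is below the next one's min."""
    mins = [p.index.min() for p in parts]
    maxes = [p.index.max() for p in parts]
    return all(hi < lo for hi, lo in zip(maxes[:-1], mins[1:]))


def sorted_within_partitions(parts):
    """The missing check: every partition's own index is monotonic increasing."""
    return all(p.index.is_monotonic_increasing for p in parts)


print(sorted_between_partitions(partitions))  # True
print(sorted_within_partitions(partitions))   # False

# In pandas, a label slice on a non-monotonic index whose bounds are not
# present in the index raises KeyError.
try:
    partitions[0].loc[15:25]
except KeyError:
    print("KeyError")
```

The final `try` block shows one way such a partition can surface as a `KeyError` during label slicing, which is the kind of failure described in the linked issue.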

This fixes the problem by simply calling sort_index on each partition. There might be a more elegant or efficient solution that avoids calling sort_index in some cases, but I haven't been able to find one yet.
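The per-partition fix can be sketched with the same hypothetical list-of-partitions model. In dask itself, this step could plausibly be written as `ddf.map_partitions(lambda df: df.sort_index())`. That call is offered as an illustration only and is not necessarily the code merged in this PR. Because the partitions are already ordered relative to each other, sorting each one is enough to make the whole index monotonic:

```python
import pandas as pd

# Same hypothetical partitions: ordered across partitions, not within them.
partitions = [
    pd.DataFrame({"x": [10, 20, 30]}, index=[30, 10, 20]),
    pd.DataFrame({"x": [40, 50]}, index=[50, 40]),
]

# Sort every partition unconditionally, without first checking
# whether it is already in order.
sorted_parts = [p.sort_index() for p in partitions]

# Each partition is now monotonic. The partitions were already ordered
# relative to one another, so their concatenation is monotonic too.
combined = pd.concat(sorted_parts)
print(combined.index.tolist())  # [10, 20, 30, 40, 50]

# A label slice whose bounds are not in the index now succeeds.
print(sorted_parts[0].loc[15:25, "x"].tolist())  # [30]
```

Sorting unconditionally costs one sort per partition, even for partitions that are already in order. One cheaper variant, offered here only as a hypothesis, would check `index.is_monotonic_increasing` first and call `sort_index` only when that check is False.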

Closes #2288

```python
result = dd.concat([ddf, ddf.rename(columns={"B": "C"})], axis=1)
expected = pd.concat([pdf, pdf.rename(columns={"B": "C"})], axis=1)
assert_eq(result, expected)
assert not result.compute().index.is_monotonic # didn't accidentally sort
```
Member Author

This is an unrelated test that I had to fix. It's now out of date, since set_index always sorts.

@mrocklin
Member

mrocklin commented May 3, 2017

+1 from me

@mrocklin mrocklin merged commit 370bd14 into dask:master May 3, 2017
@TomAugspurger TomAugspurger deleted the non-sorted-loc branch May 3, 2017 18:24
@sinhrks sinhrks added this to the 0.14.2 milestone May 11, 2017

Development

Successfully merging this pull request may close these issues.

Keyerror when slicing dating

3 participants