Properly support dd.Index in setitem #5703

TomAugspurger · 2019-12-11T17:16:07Z

We don't want to align these. This has the same semantics as setting a dask array: we validate that the number of blocks matches the number of partitions, and assume that the size of each block matches the size of its partition.

This is a bit light on tests for invalid input at the moment (since we're reusing the dask.array implementation).
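A minimal, dependency-free sketch of the semantics described above, assuming a toy model where each partition is a dict of column lists and the index is given as one list per partition. The name `setitem_blockwise` and the dict-of-lists model are hypothetical and are not dask's implementation. Only the block count is validated; block i is attached to partition i positionally, with no index alignment and no per-partition length check.

```python
from typing import Any, Dict, List

# Hypothetical toy model: a partition maps column name -> list of values.
Partition = Dict[str, List[Any]]


def setitem_blockwise(partitions: List[Partition], key: str,
                      blocks: List[List[Any]]) -> List[Partition]:
    """Attach block i to partition i positionally, without alignment."""
    # The only validation: one block per partition.
    if len(blocks) != len(partitions):
        raise ValueError(
            f"Number of blocks ({len(blocks)}) does not match "
            f"number of partitions ({len(partitions)})"
        )
    out = []
    for part, block in zip(partitions, blocks):
        new_part = dict(part)  # shallow copy so the input is not mutated
        # Block and partition lengths are assumed (not checked) to match.
        new_part[key] = list(block)
        out.append(new_part)
    return out


partitions = [{"a": [1, 2]}, {"a": [3]}]
index_blocks = [[0, 1], [2]]  # e.g. the index values of each partition
result = setitem_blockwise(partitions, "idx", index_blocks)
# result == [{"a": [1, 2], "idx": [0, 1]}, {"a": [3], "idx": [2]}]
```

Passing a different number of blocks (for example `[[0, 1, 2]]` for two partitions) raises `ValueError`, which is the kind of invalid input the tests would need to cover.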

Closes #5702

cc @AlbertDeFusco


TomAugspurger · 2019-12-11T22:10:54Z

I updated assert_eq a bit. We're generating key names that are slightly too long for assert_sane_keynames (101, and the cutoff is 102).

I updated assert_eq so that the caller can control what it considers too long. Alternatively, I could debug why we're making longer key names.
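A rough sketch of a key-name check whose length limit is a parameter, the kind of control described above. The function name `check_keynames`, its `max_length` parameter, and the inclusive `<=` comparison are assumptions for illustration; they are not the actual signatures or thresholds of `assert_sane_keynames` or `assert_eq`.

```python
from typing import Hashable, Iterable


def check_keynames(keys: Iterable[Hashable], max_length: int) -> None:
    """Hypothetical key-name sanity check with a configurable length limit."""
    for key in keys:
        name = key
        # Graph keys are often tuples such as (name, partition_number);
        # unwrap to the string name before checking it.
        while isinstance(name, tuple):
            name = name[0]
        assert isinstance(name, str), f"non-string key name: {name!r}"
        assert len(name) <= max_length, (
            f"key name too long ({len(name)} > {max_length}): {name}"
        )


keys = [("from_pandas-8595dbb3832b7f73fae91c38aae7c616", 0), "x-1"]
check_keynames(keys, max_length=60)  # passes: the longest name is 44 chars
```

Making the limit an argument lets one test loosen it without changing the default for every other caller, at the cost of hiding key names that really are growing too long.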

mrocklin · 2019-12-11T22:33:19Z

> Alternatively, I could debug why we're making longer key names.

This sounds like a good idea to me if you have the time. No expectations though if the cost/benefit ratio isn't high enough.

jrbourbeau

Just a couple of nitpicks

dask/dataframe/core.py

dask/dataframe/io/io.py


TomAugspurger · 2019-12-12T12:33:36Z

> This sounds like a good idea to me if you have the time. No expectations though if the cost/benefit ratio isn't high enough.

The short version is that we have task names of the form {new-key}-{new-token}-index-{old-token}, for example

from_pandas-8595dbb3832b7f73fae91c38aae7c616-index-da2e17dacb52404ea6cc043a671e8659

I haven't figured out why yet, or what the old names were.
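To make the structure of that example concrete, the snippet below splits it into its four components. It relies only on the name quoted above: because the prefix `from_pandas` contains no hyphen, splitting on `-` recovers each part, and the two 32-character tokens account for 64 of the name's 83 characters.

```python
name = "from_pandas-8595dbb3832b7f73fae91c38aae7c616-index-da2e17dacb52404ea6cc043a671e8659"

# {new-key}-{new-token}-index-{old-token}
new_key, new_token, marker, old_token = name.split("-")

print(new_key)                          # from_pandas
print(marker)                           # index
print(len(new_token), len(old_token))   # 32 32
print(len(name))                        # 83
```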

mrocklin · 2019-12-12T15:20:16Z

We should probably just include the old token (or the old dataframe) within the tokenize call when we generate the new token. Having multiple tokens lying around is probably a bad idea.
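A sketch of the suggested fix, assuming a hypothetical `tokenize_sketch` helper built on `hashlib.md5` as a stand-in for `dask.base.tokenize` (the real function normalizes objects far more carefully). The helper names `two_token_name` and `one_token_name` are also hypothetical. The idea is to pass the old name (the thread also suggests the old dataframe itself) into the tokenize call, so the new key carries a single token whose length does not grow with each derived collection.

```python
import hashlib
from typing import Any


def tokenize_sketch(*args: Any) -> str:
    # Hypothetical stand-in for dask.base.tokenize: a deterministic
    # 32-character md5 hex digest of the arguments' reprs.
    return hashlib.md5(repr(args).encode("utf-8")).hexdigest()


def two_token_name(new_key: str, new_token: str, old_token: str) -> str:
    # Pattern described in the thread: both tokens end up in the key.
    return f"{new_key}-{new_token}-index-{old_token}"


def one_token_name(new_key: str, old_name: str, *args: Any) -> str:
    # Suggested pattern: fold the old name into the tokenize call, so the
    # key carries one token and its length is independent of old_name.
    return f"{new_key}-{tokenize_sketch(old_name, *args)}"


old_name = "from_pandas-da2e17dacb52404ea6cc043a671e8659"
old_token = old_name.split("-")[1]

nested = two_token_name("from_pandas", tokenize_sketch("index"), old_token)
folded = one_token_name("from_pandas", old_name, "index")

print(len(nested))  # 83
print(len(folded))  # 44
```

Because the old name still feeds the hash, two different parents still produce different keys; the dependency is preserved in the token rather than spelled out in the key.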

TomAugspurger added 2 commits December 11, 2019 11:13

Properly support dd.Index in setitem

2f80be4

We don't want to align these.

update tests

24a7ca0

jrbourbeau reviewed Dec 11, 2019


dask/dataframe/core.py (outdated)

dask/dataframe/io/io.py (outdated)

TomAugspurger and others added 2 commits December 12, 2019 06:31

Update dask/dataframe/core.py

797b4d5

Co-Authored-By: James Bourbeau <jrbourbeau@users.noreply.github.com>

Update dask/dataframe/io/io.py

d9f55bf

Co-Authored-By: James Bourbeau <jrbourbeau@users.noreply.github.com>

Base automatically changed from master to main March 8, 2021 20:19
