subs() comparing key hashes by madsbk · Pull Request #6559 · dask/dask

madsbk · 2020-08-26T12:25:20Z

This PR makes subs() compare hashes of keys instead of comparing type and equality of tuple items.
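To make the change concrete, here is a minimal sketch contrasting the two comparison strategies. Both function names are hypothetical, and the code is a simplification, not dask's actual subs() implementation. It uses float instead of NumPy integers so that it runs with the standard library only.

```python
def match_by_type_and_equality(arg, key):
    """Simplified version of the old check: same container type and length,
    and every item has the same type and compares equal."""
    if type(arg) is not type(key) or len(arg) != len(key):
        return False
    return all(type(a) is type(b) and a == b for a, b in zip(arg, key))


def match_by_hash(arg, key):
    """Hash-only check, as initially proposed in this PR (simplified)."""
    return hash(arg) == hash(key)


# 42 and 42.0 have different types but equal hashes (and compare equal)
print(match_by_type_and_equality(("mykey", 42), ("mykey", 42.0)))  # False
print(match_by_hash(("mykey", 42), ("mykey", 42.0)))  # True
```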

This follows the graph specification: "A key is any hashable value that is not a task", which implies that keys of different types that hash to the same value are the same key?

This is motivated by my work on culling, where keys containing both Python int and NumPy integer (e.g. numpy.int64) values are substituted.

import numpy as np
assert hash(('mykey', 42)) == hash(('mykey', np.int64(42)))

Finally, I expect this to be slightly faster than the current implementation.

Tests added / passed
Passes black dask / flake8 dask

mrocklin · 2020-08-26T14:08:52Z

Thanks @madsbk

cc'ing @eriknw who might have thoughts on this

eriknw · 2020-08-26T14:56:47Z

I wouldn't rely only on hashes, which seems far too risky. But you're also right that relying on types may be overly restrictive.
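One concrete reason hash-only comparison is risky is that distinct values can share a hash. In CPython, hash(-1) is -2 because -1 is reserved as an error signal in the C API, so two clearly different keys collide. This relies on a CPython implementation detail; other interpreters may behave differently.

```python
# CPython detail: hash(-1) == hash(-2) == -2
key_a = ("x", -1)
key_b = ("x", -2)

print(hash(key_a) == hash(key_b))  # True on CPython: the element hashes collide
print(key_a == key_b)  # False: the keys are different

# A hash-only comparison would wrongly treat key_b as a match for key_a,
# whereas a check that also uses equality does not.
print(key_a in {key_b})  # False
```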

How about len({x, y}) == 1? This checks both hash and equality (as one would expect), and should be pretty fast.
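The len({x, y}) == 1 check works because building a set collapses two elements into one only when they have equal hashes and compare equal (or are the same object). A minimal sketch, with keys_equal as a hypothetical helper name:

```python
def keys_equal(x, y):
    """True if x and y collapse into a single set element, i.e. they have
    equal hashes and compare equal. Both values must be hashable."""
    return len({x, y}) == 1


print(keys_equal(("mykey", 42), ("mykey", 42.0)))  # True: equal hash, equal value
print(keys_equal(("x", -1), ("x", -2)))  # False: hashes collide on CPython, values differ
```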

eriknw · 2020-08-26T16:36:28Z

Aha, this is even better:

x in {y}
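x in {y} performs the same hash-then-equality lookup without computing a length. It still requires both values to be hashable: y is hashed when the set is built and x during the lookup. Since task arguments are not always hashable (lists, NumPy arrays), a substitution routine would likely need to guard against TypeError. The helper below is a hypothetical sketch, not the code merged in this PR.

```python
def matches_key(arg, key):
    """Return True if arg is the same key as key (equal hash and equal value).

    Unhashable arguments (e.g. lists) can never be keys, so they never match.
    """
    try:
        return arg in {key}
    except TypeError:
        # arg (or key) is unhashable, so it cannot be a key
        return False


print(matches_key(("mykey", 42), ("mykey", 42.0)))  # True
print(matches_key(["mykey", 42], ("mykey", 42)))  # False: lists are unhashable
```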

eriknw · 2020-08-26T18:40:25Z

> This follows the graph specification: "A key is any hashable value that is not a task", which implies that keys of different types that hash to the same value are the same key?

To clarify, "hashable" does not imply only using the hash value. Typically, it implies using equality as well, which is why I don't think we should rely only on hash values.
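Python's hashing contract is that objects which compare equal must have equal hashes, not the reverse; hash-based containers use the hash only to narrow the search and then confirm with identity or ==. The hypothetical class below forces every instance to share one hash, to show that set membership still depends on equality.

```python
class CollidingKey:
    """Hypothetical key type whose instances all share the same hash."""

    def __init__(self, name):
        self.name = name

    def __eq__(self, other):
        return isinstance(other, CollidingKey) and self.name == other.name

    def __hash__(self):
        return 0  # every instance collides


a, b = CollidingKey("a"), CollidingKey("b")
print(hash(a) == hash(b))  # True: identical hashes
print(a in {b})  # False: the set lookup falls back to __eq__
print(a in {CollidingKey("a")})  # True: equal hash and equal value
```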

In general, I think it's probably okay to relax the type check. This definitely needs to go in the release notes, because I wouldn't be surprised if this change breaks some code somewhere (unlikely, but possible).
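As a rough illustration of the behavior change worth flagging in the release notes, here is a heavily simplified, hypothetical subs_sketch that substitutes task arguments using the hash-and-equality check. It is not dask's actual subs(): it assumes a task is a tuple whose first element is callable, and it ignores lists of arguments and other cases the real function handles.

```python
from operator import add


def is_task(obj):
    # Hypothetical simplified stand-in: a non-empty tuple with a callable head
    return type(obj) is tuple and bool(obj) and callable(obj[0])


def subs_sketch(task, key, val):
    """Replace every argument of task that matches key (hash + equality) with val."""
    if not is_task(task):
        # A bare key or literal: substitute it directly if it matches
        try:
            return val if task in {key} else task
        except TypeError:
            return task  # unhashable, cannot be a key
    newargs = []
    for arg in task[1:]:
        if is_task(arg):
            arg = subs_sketch(arg, key, val)  # recurse into nested tasks
        else:
            try:
                if arg in {key}:
                    arg = val
            except TypeError:
                pass  # unhashable argument, cannot be a key
        newargs.append(arg)
    return task[:1] + tuple(newargs)


task = (add, ("x", 1.0), 10)
# With the relaxed check, ("x", 1.0) is treated as the key ("x", 1),
# whereas a per-item type check would have left it untouched.
print(subs_sketch(task, ("x", 1), 5)[1:])  # (5, 10)
```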

madsbk · 2020-08-26T19:12:33Z

> To clarify, "hashable" does not imply only using the hash value. Typically, it also implies using equality as well, hence why I don't think we should only use hash values.

Makes sense, and I really like x in {y}!

dask/core.py

eriknw · 2020-08-26T20:40:19Z

LGTM. I like it! Thanks @madsbk

madsbk · 2020-08-28T06:32:00Z

@mrocklin, I think this is ready to be merged.
I don't know what is going on with CI. It complains about black reformatting files I haven't touched. On my machine, both black and flake8 pass.

mrocklin · 2020-08-28T13:39:20Z

Black was recently updated. I think Tom resolved this on the Dask side with #6568. Pushing an empty commit to trigger CI.

subs(): now compare key hashes

14628a5

madsbk marked this pull request as ready for review August 26, 2020 13:09

subs(): now use hash and equality matching

e0c2756

Merge branch 'master' of github.com:dask/dask into sub_use_key_hash

9039fa8

eriknw reviewed Aug 26, 2020


dask/core.py (outdated, resolved)

minor cleanup

27acf33

madsbk mentioned this pull request Aug 27, 2020

Culling high level graphs #6510

Merged

trigger ci

d6df31f

mrocklin merged commit 627b2bc into dask:master Aug 28, 2020

madsbk deleted the sub_use_key_hash branch September 3, 2020 17:45

kumarprabhu1988 pushed a commit to kumarprabhu1988/dask that referenced this pull request Oct 29, 2020

Compare key hashes in subs() (dask#6559)

b7374c9

mrocklin merged 5 commits into dask:master from madsbk:sub_use_key_hash


3 participants
