Avoid NumPy scalar string representation in tokenize by jrbourbeau · Pull Request #5527 · dask/dask

jrbourbeau · 2019-10-23T18:03:56Z

This PR updates how tokenize operates on NumPy scalars to avoid using array string representations.

Because NumPy lets users customize how arrays are printed, the current tokenize behavior, which relies on the string representation, can produce different hashes for the same NumPy scalar in some edge cases. Here we instead use x.item() to convert NumPy scalars to Python scalars, which are then passed to tokenize. This covers the edge cases where str(x) has been customized, and it does so without degrading performance. With the changes in this PR we have:

In [1]: import numpy as np

In [2]: import dask

In [3]: %timeit dask.base.tokenize(np.array(1.23))
13 µs ± 273 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

and on the current master branch:

In [3]: %timeit dask.base.tokenize(np.array(1.23))
16.2 µs ± 372 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
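As a rough illustration of the idea (not the actual code in this PR), the sketch below hashes a NumPy scalar through x.item() and checks that the result does not change when print options are customized. The helper name `token_from_item` and the use of `hashlib.sha256` are assumptions made for the example; dask's tokenize uses its own normalization and hashing machinery.

```python
import hashlib

import numpy as np


def token_from_item(x):
    """Hypothetical helper: hash a NumPy scalar or 0-d array via x.item().

    This only illustrates the approach; it is not dask's implementation.
    """
    # x.item() returns a plain Python scalar, so the result does not
    # depend on how NumPy has been configured to print arrays.
    value = x.item()
    # Include the dtype so equal values of different dtypes hash differently.
    payload = repr((value, x.dtype.str)).encode()
    return hashlib.sha256(payload).hexdigest()


x = np.array(1.23456789)
baseline = token_from_item(x)

# Change NumPy's print settings; a hash derived from x.item() is unaffected.
with np.printoptions(precision=2):
    customized = token_from_item(x)

assert baseline == customized
```

Whether a given print setting actually changes str(x) for a scalar depends on the NumPy version and on which setting is used; the point is only that a token built from x.item() never consults those settings.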

cc @jcrist if you get a moment to take a look at this or have any thoughts on the topic

Tests added / passed
Passes black dask / flake8 dask

jcrist · 2019-10-23T18:21:16Z

This makes sense to me. Didn't know numpy supported redefining printers.

dask/tests/test_base.py

jrbourbeau · 2019-10-23T21:07:33Z

Thanks for reviewing @jcrist @TomAugspurger

jrbourbeau added 2 commits October 23, 2019 12:46

Use item in numpy scalar tokenization

b328157

Add test

078d3e4

TomAugspurger reviewed Oct 23, 2019

dask/tests/test_base.py (outdated, resolved)

Ensure string printing is always reset

45ee81b

TomAugspurger approved these changes Oct 23, 2019

jrbourbeau merged commit 7aca451 into dask:master Oct 23, 2019

jrbourbeau deleted the numpy-scalar-hashing branch October 23, 2019 21:07

jrbourbeau merged 3 commits into dask:master from Quansight-Labs:numpy-scalar-hashing
