Avoid overflow in statistics.mean #7426

Merged
merged 1 commit into from Dec 20, 2022

mrocklin
Member

I don't know why, but for some reason statistics.mean was overflowing in CI. See https://github.com/dask/distributed/actions/runs/3741526593/jobs/6351258185

I'm trying a naive implementation instead. It also happens to be faster and simpler.

```python
In [1]: from statistics import mean

In [2]: x = list(range(1000))

In [3]: %timeit mean(x)
196 µs ± 777 ns per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

In [4]: %timeit sum(x) / len(x)
4.82 µs ± 15.2 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
```
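
For anyone who wants to rerun the comparison above outside IPython, here is a minimal sketch using the standard-library `timeit` module. The helper name `naive_mean` is made up for illustration and is not taken from the PR, and absolute timings will differ by machine and Python version.

```python
from statistics import mean
from timeit import timeit

x = list(range(1000))


def naive_mean(values):
    """Arithmetic mean via built-in sum and len (hypothetical helper name)."""
    return sum(values) / len(values)


# Both approaches agree on this input.
assert mean(x) == naive_mean(x)

# Time 1,000 calls of each; absolute numbers vary by machine and Python version.
stats_seconds = timeit(lambda: mean(x), number=1000)
naive_seconds = timeit(lambda: naive_mean(x), number=1000)

print(f"statistics.mean: {stats_seconds / 1000 * 1e6:.2f} µs per call")
print(f"sum(x) / len(x): {naive_seconds / 1000 * 1e6:.2f} µs per call")
```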

@github-actions
Contributor

Unit Test Results

See test report for an extended history of previous test failures. This is useful for diagnosing flaky tests.

20 files (+8), 20 suites (+8), 9h 0m 52s ⏱️ (+4h 41m 15s)
3 271 tests (±0): 3 184 ✔️ passed (+9), 85 💤 skipped (-10), 2 failed (+1)
33 337 runs (+13 057): 31 907 ✔️ passed (+12 397), 1 426 💤 skipped (+657), 4 failed (+3)

For more details on these failures, see this check.

Results for commit 4e90750. ± Comparison against base commit 3ac8631.

@mrocklin
Member Author

This is trivial enough and coming up enough in current PRs' CI that I plan to merge tomorrow US-time if there are no comments.

@jrbourbeau jrbourbeau changed the title Avoid overflow in statitics.mean Avoid overflow in statitics.mean Dec 20, 2022
@jrbourbeau jrbourbeau merged commit c21e715 into dask:main Dec 20, 2022
@mrocklin mrocklin deleted the dashboard-mean branch December 20, 2022 20:01
@mrocklin
Member Author

Woot. Thanks

@fjetter
Member

fjetter commented Jan 3, 2023

This is still an issue but now in the new code...

```
  File "d:\a\distributed\distributed\distributed\dashboard\components\shared.py", line 569, in update
    self.label_source.data["memory"] = [
  File "d:\a\distributed\distributed\distributed\dashboard\components\shared.py", line 571, in <listcomp>
    f.__name__, dask.utils.format_bytes(f(self.source.data["memory"]))
  File "d:\a\distributed\distributed\distributed\dashboard\components\shared.py", line 562, in mean
    return sum(x) / len(x)
```

Looks like x is sometimes a NumPy array, and I assume we're overflowing int64? (Or we're using a different dtype somewhere; an int64 overflow sounds crazy, even if not impossible.)

image

(The screenshot above does not reproduce the error. It is just a snapshot showing the data, which I believe is RSS memory, and its dtype.)
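
To illustrate why the container type matters here, the sketch below compares the built-in `sum` on a plain list with the same call on an int64 array. The values are made up to force an overflow and are not taken from the dashboard data. A list holds arbitrary-precision Python ints, while iterating an int64 array yields fixed-width `numpy.int64` scalars whose running total can wrap around.

```python
import warnings

import numpy as np

# Illustrative values only, chosen so that the total exceeds the int64 range.
values = [2**62, 2**62, 2**62]
as_array = np.array(values, dtype=np.int64)

print(type(values[0]))    # Python int: arbitrary precision
print(type(as_array[0]))  # numpy.int64: fixed 64-bit width

list_total = sum(values)  # exact result, 3 * 2**62

with warnings.catch_warnings():
    # Silence the overflow RuntimeWarning so the script output stays readable.
    warnings.simplefilter("ignore", RuntimeWarning)
    array_total = sum(as_array)  # accumulates in int64 and wraps around

print(list_total, int(array_total))
```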

@fjetter
Member

fjetter commented Jan 3, 2023

At the very least, this gives exactly the same warning:

```python
In [1]: import numpy as np

In [2]: sum(np.array([2**63-1, 1], dtype=np.int64))
```
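
One possible way to sidestep the wraparound in the reproducer above is to convert the data to Python scalars before summing. This is only a sketch: `mean_without_overflow` is a hypothetical helper name, and nothing in this thread says that this is the fix the project adopted.

```python
import warnings

import numpy as np


def mean_without_overflow(x):
    """Hypothetical helper: average x using Python scalars, not fixed-width ones."""
    values = np.asarray(x).tolist()  # tolist() converts NumPy scalars to Python scalars
    return sum(values) / len(values)


memory = np.array([2**63 - 1, 1], dtype=np.int64)

with warnings.catch_warnings():
    # Silence the overflow RuntimeWarning emitted by the int64 addition.
    warnings.simplefilter("ignore", RuntimeWarning)
    wrapped = sum(memory) / len(memory)  # the int64 total wraps before the division

exact = mean_without_overflow(memory)
print(wrapped, exact)
```

`np.mean(memory)` should also avoid the integer wraparound, since NumPy accumulates integer input in float64 by default, at the cost of floating-point rounding for very large values.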
