P2P cannot execute a shuffle twice #7452

Closed
hendrikmakait opened this issue Jan 4, 2023 · 0 comments · Fixed by #7486
Labels: enhancement, shuffle

Comments

@hendrikmakait
Member

The tombstones prohibit us from executing a shuffle twice even if it succeeded the first time and has since been forgotten. See

    @pytest.mark.xfail(reason="Tombstone prohibits multiple calls to head")
    @gen_cluster(client=True, nthreads=[("127.0.0.1", 4)] * 2)
    async def test_repeat(c, s, a, b):
        df = dask.datasets.timeseries(
            start="2000-01-01",
            end="2000-01-10",
            dtypes={"x": float, "y": float},
            freq="100 s",
        )
        out = dd.shuffle.shuffle(df, "x", shuffle="p2p")
        await c.compute(out.head(compute=False))
        await clean_worker(a, timeout=2)
        await clean_worker(b, timeout=2)
        await clean_scheduler(s, timeout=2)
        await c.compute(out.tail(compute=False))
        await clean_worker(a, timeout=2)
        await clean_worker(b, timeout=2)
        await clean_scheduler(s, timeout=2)
        await c.compute(out.head(compute=False))
        await clean_worker(a, timeout=2)
        await clean_worker(b, timeout=2)
        await clean_scheduler(s, timeout=2)
for a reproducer.

A user should be able to execute a successful shuffle a second time after it has been forgotten. This is particularly important for interactive workloads. A possible solution may be to use signed attempts for tasks (#7272) to differentiate between attempts of a shuffle and limit each tombstone to a specific attempt.
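
To illustrate the idea of limiting tombstones to a specific attempt, here is a minimal sketch. It is not distributed's implementation: `TombstoneRegistry` and its methods are hypothetical names invented for this example, and the attempt counter merely stands in for whatever signed attempts (#7272) would provide. The point is only that keying tombstones by `(shuffle_id, attempt)` instead of by `shuffle_id` alone lets a later attempt of the same shuffle run.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class TombstoneRegistry:
    """Hypothetical registry, for illustration only (not a distributed API)."""

    # Tombstones are keyed by (shuffle_id, attempt), not by shuffle_id alone.
    _closed: set[tuple[str, int]] = field(default_factory=set)
    # Latest attempt number handed out per shuffle.
    _attempts: dict[str, int] = field(default_factory=dict)

    def start(self, shuffle_id: str) -> int:
        """Begin a new attempt of a shuffle and return its attempt number."""
        attempt = self._attempts.get(shuffle_id, 0) + 1
        self._attempts[shuffle_id] = attempt
        return attempt

    def close(self, shuffle_id: str, attempt: int) -> None:
        """Leave a tombstone for one specific attempt."""
        self._closed.add((shuffle_id, attempt))

    def is_closed(self, shuffle_id: str, attempt: int) -> bool:
        """Report whether this particular attempt has been tombstoned."""
        return (shuffle_id, attempt) in self._closed


registry = TombstoneRegistry()
first = registry.start("shuffle-x")
registry.close("shuffle-x", first)    # first run finished and was forgotten
second = registry.start("shuffle-x")  # the same shuffle is executed again
```

With this keying, the tombstone left by the first attempt still rejects stale messages for that attempt, while the second attempt is not blocked by it.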
