Raise in P2P if column dtype is wrong #8167

Merged

hendrikmakait
Member

@hendrikmakait hendrikmakait commented Sep 6, 2023

  • Tests added / passed
  • Passes pre-commit run --all-files

Comment on lines 2182 to 2205
def test_set_index_p2p_with_existing_index():
    df = pd.DataFrame({"a": np.random.randint(0, 3, 20)}, index=np.random.random(20))
    ddf = dd.from_pandas(
        df,
        npartitions=4,
    )
    with Client() as c:
        with pytest.raises(TypeError, match="_partitions.*integer"):
            ddf.set_index("a", shuffle="p2p")


def test_sort_values_p2p_with_existing_divisions():
    "Regression test for #8165"
    df = pd.DataFrame(
        {"a": np.random.randint(0, 3, 20), "b": np.random.randint(0, 3, 20)}
    )
    ddf = dd.from_pandas(
        df,
        npartitions=4,
    )
    with Client() as c:
        with dask.config.set({"dataframe.shuffle.method": "p2p"}):
            with pytest.raises(TypeError, match="_partitions.*integer"):
                ddf = ddf.set_index("a").sort_values("b")
Member Author

These two tests need to be updated once dask/dask#10493 is merged.
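As a rough illustration of what the `match` argument in the tests above checks: `pytest.raises` tests the pattern against the string form of the exception using `re.search`. The sketch below reproduces that check with the standard library only. The message text is a made-up placeholder, not the actual wording raised by distributed, and `message_matches` is a hypothetical helper.

```python
import re

# Pattern taken from the tests above.
PATTERN = "_partitions.*integer"

# Hypothetical message; the real wording in distributed may differ.
hypothetical_message = (
    "Expected column '_partitions' to be of integer dtype, got float64"
)


def message_matches(pattern: str, message: str) -> bool:
    """Mimic the check behind pytest.raises(match=...): re.search on str(exc)."""
    return re.search(pattern, message) is not None


matches = message_matches(PATTERN, hypothetical_message)
no_match = message_matches(PATTERN, "some unrelated error")
```

Because the check is a search rather than a full match, the pattern only needs to appear somewhere in the message, which keeps the tests tolerant of small changes in wording.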

Contributor

github-actions bot commented Sep 6, 2023

Unit Test Results

See test report for an extended history of previous test failures. This is useful for diagnosing flaky tests.

    21 files ±0        21 suites ±0     10h 40m 28s ⏱️ -7m 53s
 3 798 tests +3     3 690 ✔️ +4        107 💤 ±0    1 ❌ -1
36 699 runs  +24   34 894 ✔️ +24    1 803 💤 ±0    2 ❌ ±0

For more details on these failures, see this check.

Results for commit 6d3a76a. ± Comparison against base commit 20def28.

♻️ This comment has been updated with latest results.

from dask.dataframe.core import new_dd_object

meta = df._meta
if not pd.api.types.is_integer_dtype(meta[column]):
    raise TypeError(
Member Author

This used to be tested before dask/dask#10493 has been merged.
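A minimal sketch of the kind of dtype guard shown in the excerpt above, using only pandas. The helper name `check_integer_column`, the error message wording, and the empty example frames are assumptions for illustration; the actual code in `distributed/shuffle/_shuffle.py` is truncated in this excerpt and may differ.

```python
import pandas as pd


def check_integer_column(meta: pd.DataFrame, column: str) -> None:
    """Raise TypeError if `column` in `meta` does not have an integer dtype."""
    if not pd.api.types.is_integer_dtype(meta[column]):
        # Hypothetical message wording, not the one used by distributed.
        raise TypeError(
            f"Expected meta[{column!r}] to be of integer dtype, "
            f"got {meta[column].dtype}"
        )


# Empty frames stand in for dask's `_meta` objects (an assumption here).
int_meta = pd.DataFrame({"_partitions": pd.Series([], dtype="int64")})
float_meta = pd.DataFrame({"_partitions": pd.Series([], dtype="float64")})

check_integer_column(int_meta, "_partitions")  # integer dtype: no error

try:
    check_integer_column(float_meta, "_partitions")
    raised = False
except TypeError:
    raised = True
```

Checking the metadata rather than the data lets the error surface while the graph is being built, before any partitions are shuffled.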

Collaborator

@phofl phofl left a comment

couple comments

distributed/shuffle/_shuffle.py Outdated
with LocalCluster(
    n_workers=2, dashboard_address=":0", loop=loop
) as cluster, Client(cluster) as c:
    ddf.set_index("a", shuffle="p2p")
Collaborator

Why are you not using the result of this op?

Member Author

Whoopsie, changed this test too many times.

Member Author

Fixed

hendrikmakait and others added 2 commits September 6, 2023 15:52
Co-authored-by: Patrick Hoefler <61934744+phofl@users.noreply.github.com>
@hendrikmakait hendrikmakait merged commit 3dd8691 into dask:main Sep 6, 2023
24 of 28 checks passed
@hendrikmakait hendrikmakait deleted the p2p-raise-on-column-dtype-errror branch September 6, 2023 15:57
@hendrikmakait hendrikmakait mentioned this pull request Sep 6, 2023
4 tasks
3 participants