32bit test suite: pandas 1.3.3 "Buffer dtype mismatch" #8169

Closed
bnavigator opened this issue Sep 22, 2021 · 5 comments · Fixed by #8851
Labels
needs attention It's been a while since this was pushed on. Needs attention from the owner or a maintainer.

Comments

@bnavigator
Contributor

What happened:

When running the test suite in a 32-bit environment on the openSUSE Build Service, I get the following error:

[  672s] =================================== FAILURES ===================================
[  672s] ______________________ test_categorical_set_index[tasks] _______________________
[  672s] [gw1] linux -- Python 3.9.7 /usr/bin/python3.9
[  672s] 
[  672s] shuffle = 'tasks'
[  672s] 
[  672s]     @pytest.mark.parametrize("shuffle", ["disk", "tasks"])
[  672s]     def test_categorical_set_index(shuffle):
[  672s]         df = pd.DataFrame({"x": [1, 2, 3, 4], "y": ["a", "b", "b", "c"]})
[  672s]         df["y"] = pd.Categorical(df["y"], categories=["a", "b", "c"], ordered=True)
[  672s]         a = dd.from_pandas(df, npartitions=2)
[  672s]     
[  672s]         with dask.config.set(scheduler="sync", shuffle=shuffle):
[  672s]             b = a.set_index("y", npartitions=a.npartitions)
[  672s]             d1, d2 = b.get_partition(0), b.get_partition(1)
[  672s] >           assert list(d1.index.compute()) == ["a"]
[  672s] 
[  672s] /usr/lib/python3.9/site-packages/dask/dataframe/tests/test_categorical.py:274: 
[  672s] _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
[  672s] /usr/lib/python3.9/site-packages/dask/base.py:288: in compute
[  672s]     (result,) = compute(self, traverse=False, **kwargs)
[  672s] /usr/lib/python3.9/site-packages/dask/base.py:570: in compute
[  672s]     results = schedule(dsk, keys, **kwargs)
[  672s] /usr/lib/python3.9/site-packages/dask/local.py:563: in get_sync
[  672s]     return get_async(
[  672s] /usr/lib/python3.9/site-packages/dask/local.py:506: in get_async
[  672s]     for key, res_info, failed in queue_get(queue).result():
[  672s] /usr/lib/python3.9/concurrent/futures/_base.py:438: in result
[  672s]     return self.__get_result()
[  672s] /usr/lib/python3.9/concurrent/futures/_base.py:390: in __get_result
[  672s]     raise self._exception
[  672s] /usr/lib/python3.9/site-packages/dask/local.py:548: in submit
[  672s]     fut.set_result(fn(*args, **kwargs))
[  672s] /usr/lib/python3.9/site-packages/dask/local.py:237: in batch_execute_tasks
[  672s]     return [execute_task(*a) for a in it]
[  672s] /usr/lib/python3.9/site-packages/dask/local.py:237: in <listcomp>
[  672s]     return [execute_task(*a) for a in it]
[  672s] /usr/lib/python3.9/site-packages/dask/local.py:228: in execute_task
[  672s]     result = pack_exception(e, dumps)
[  672s] /usr/lib/python3.9/site-packages/dask/local.py:223: in execute_task
[  672s]     result = _execute_task(task, data)
[  672s] /usr/lib/python3.9/site-packages/dask/core.py:121: in _execute_task
[  672s]     return func(*(_execute_task(a, cache) for a in args))
[  672s] /usr/lib/python3.9/site-packages/dask/dataframe/shuffle.py:861: in shuffle_group
[  672s]     return group_split_dispatch(df, c, k, ignore_index=ignore_index)
[  672s] /usr/lib/python3.9/site-packages/dask/utils.py:576: in __call__
[  672s]     return meth(arg, *args, **kwargs)
[  672s] /usr/lib/python3.9/site-packages/dask/dataframe/backends.py:358: in group_split_pandas
[  672s]     indexer, locations = pd._libs.algos.groupsort_indexer(
[  672s] _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
[  672s] 
[  672s] >   ???
[  672s] E   ValueError: Buffer dtype mismatch, expected 'const intp_t' but got 'long long'
[  672s] 
[  672s] pandas/_libs/algos.pyx:194: ValueError
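
For context, the message says the Cython routine wants a buffer of pointer-sized integers (`intp_t`) but received 64-bit ones (`long long`). The two have the same width on 64-bit builds and differ on 32-bit builds. A minimal sketch using only the standard library (the helper names are hypothetical, not part of dask or pandas) that reports both sizes for the running interpreter:

```python
import struct


def pointer_size_bytes() -> int:
    """Size of a native pointer; NumPy's intp has this width on common platforms."""
    return struct.calcsize("P")


def long_long_size_bytes() -> int:
    """Size of a C 'long long', the type named in the error message."""
    return struct.calcsize("q")


def int64_matches_intp() -> bool:
    """True when a 64-bit integer and a pointer-sized integer have the same width."""
    return pointer_size_bytes() == long_long_size_bytes()


if __name__ == "__main__":
    print(f"pointer: {pointer_size_bytes()} bytes")
    print(f"long long: {long_long_size_bytes()} bytes")
    print("same width:", int64_matches_intp())
```

On a 32-bit interpreter the last line should report a mismatch, which is the situation the traceback describes.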

If I didn't overlook something, it is the same error for 187 tests:

[  681s] =========================== short test summary info ============================
[  681s] FAILED dataframe/tests/test_categorical.py::test_categorical_set_index[tasks]
[  681s] FAILED dataframe/tests/test_dataframe.py::test_series_map[False-False-1-1] - ...
[  681s] FAILED dataframe/tests/test_dataframe.py::test_series_map[False-False-1-4] - ...
[  681s] FAILED dataframe/tests/test_dataframe.py::test_series_map[False-False-3-1] - ...
[  681s] FAILED dataframe/tests/test_dataframe.py::test_series_map[False-False-3-4] - ...
[  681s] FAILED dataframe/tests/test_dataframe.py::test_series_map[False-True-1-1] - V...
[  681s] FAILED dataframe/tests/test_dataframe.py::test_series_map[False-True-1-4] - V...
[  681s] FAILED dataframe/tests/test_dataframe.py::test_series_map[False-True-3-1] - V...
[  681s] FAILED dataframe/tests/test_dataframe.py::test_series_map[False-True-3-4] - V...
[  681s] FAILED dataframe/tests/test_dataframe.py::test_series_map[True-False-1-1] - V...
[  681s] FAILED dataframe/tests/test_dataframe.py::test_series_map[True-False-1-4] - V...
[  681s] FAILED dataframe/tests/test_dataframe.py::test_series_map[True-False-3-1] - V...
[  681s] FAILED dataframe/tests/test_dataframe.py::test_series_map[True-False-3-4] - V...
[  681s] FAILED dataframe/tests/test_dataframe.py::test_value_counts_with_normalize - ...
[  681s] FAILED dataframe/tests/test_dataframe.py::test_series_map[True-True-1-1] - Va...
[  681s] FAILED dataframe/tests/test_dataframe.py::test_value_counts_with_normalize_and_dropna
[  681s] FAILED dataframe/tests/test_dataframe.py::test_series_map[True-True-1-4] - Va...
[  681s] FAILED dataframe/tests/test_dataframe.py::test_series_map[True-True-3-1] - Va...
[  681s] FAILED dataframe/tests/test_dataframe.py::test_series_map[True-True-3-4] - Va...
[  681s] FAILED dataframe/tests/test_groupby.py::test_groupby_reduction_split[split_out]
[  681s] FAILED dataframe/tests/test_groupby.py::test_groupby_apply_tasks - ValueError...
[  681s] FAILED dataframe/tests/test_groupby.py::test_groupby_split_out_multiindex[column0-2]
[  681s] FAILED dataframe/tests/test_groupby.py::test_groupby_split_out_multiindex[column0-3]
[  681s] FAILED dataframe/tests/test_groupby.py::test_groupby_split_out_multiindex[column1-2]
[  681s] FAILED dataframe/tests/test_groupby.py::test_groupby_split_out_multiindex[column1-3]
[  681s] FAILED dataframe/tests/test_groupby.py::test_groupby_split_out_multiindex[column2-2]
[  681s] FAILED dataframe/tests/test_groupby.py::test_groupby_split_out_multiindex[column2-3]
[  681s] FAILED dataframe/tests/test_groupby.py::test_hash_groupby_aggregate[5-2-1] - ...
[  681s] FAILED dataframe/tests/test_groupby.py::test_hash_groupby_aggregate[5-2-4] - ...
[  681s] FAILED dataframe/tests/test_groupby.py::test_hash_groupby_aggregate[5-2-20]
[  681s] FAILED dataframe/tests/test_groupby.py::test_hash_groupby_aggregate[5-5-1] - ...
[  681s] FAILED dataframe/tests/test_groupby.py::test_hash_groupby_aggregate[5-5-4] - ...
[  681s] FAILED dataframe/tests/test_groupby.py::test_hash_groupby_aggregate[5-5-20]
[  681s] FAILED dataframe/tests/test_groupby.py::test_hash_groupby_aggregate[20-2-1]
[  681s] FAILED dataframe/tests/test_groupby.py::test_hash_groupby_aggregate[20-2-4]
[  681s] FAILED dataframe/tests/test_groupby.py::test_hash_groupby_aggregate[20-2-20]
[  681s] FAILED dataframe/tests/test_groupby.py::test_hash_groupby_aggregate[20-5-1]
[  681s] FAILED dataframe/tests/test_groupby.py::test_hash_groupby_aggregate[20-5-4]
[  681s] FAILED dataframe/tests/test_groupby.py::test_hash_groupby_aggregate[20-5-20]
[  681s] FAILED dataframe/tests/test_groupby.py::test_split_out_multi_column_groupby
[  681s] FAILED dataframe/tests/test_merge_column_and_index.py::test_merge_known_to_unknown[idx-inner]
[  681s] FAILED dataframe/tests/test_merge_column_and_index.py::test_merge_known_to_unknown[idx-left]
[  681s] FAILED dataframe/tests/test_merge_column_and_index.py::test_merge_known_to_unknown[idx-right]
[  681s] FAILED dataframe/tests/test_merge_column_and_index.py::test_merge_known_to_unknown[idx-outer]
[  681s] FAILED dataframe/tests/test_merge_column_and_index.py::test_merge_known_to_unknown[on1-inner]
[  681s] FAILED dataframe/tests/test_merge_column_and_index.py::test_merge_known_to_unknown[on1-left]
[  681s] FAILED dataframe/tests/test_merge_column_and_index.py::test_merge_known_to_unknown[on1-right]
[  681s] FAILED dataframe/tests/test_merge_column_and_index.py::test_merge_known_to_unknown[on1-outer]
[  681s] FAILED dataframe/tests/test_merge_column_and_index.py::test_merge_known_to_unknown[on2-inner]
[  681s] FAILED dataframe/tests/test_merge_column_and_index.py::test_merge_known_to_unknown[on2-left]
[  681s] FAILED dataframe/tests/test_merge_column_and_index.py::test_merge_known_to_unknown[on2-right]
[  681s] FAILED dataframe/tests/test_merge_column_and_index.py::test_merge_known_to_unknown[on2-outer]
[  681s] FAILED dataframe/tests/test_merge_column_and_index.py::test_merge_known_to_unknown[on3-inner]
[  681s] FAILED dataframe/tests/test_merge_column_and_index.py::test_merge_known_to_unknown[on3-left]
[  681s] FAILED dataframe/tests/test_merge_column_and_index.py::test_merge_known_to_unknown[on3-right]
[  681s] FAILED dataframe/tests/test_merge_column_and_index.py::test_merge_known_to_unknown[on3-outer]
[  681s] FAILED dataframe/tests/test_merge_column_and_index.py::test_merge_unknown_to_known[idx-inner]
[  681s] FAILED dataframe/tests/test_merge_column_and_index.py::test_merge_unknown_to_known[idx-left]
[  681s] FAILED dataframe/tests/test_merge_column_and_index.py::test_merge_unknown_to_known[idx-right]
[  681s] FAILED dataframe/tests/test_merge_column_and_index.py::test_merge_unknown_to_known[idx-outer]
[  681s] FAILED dataframe/tests/test_merge_column_and_index.py::test_merge_unknown_to_known[on1-inner]
[  681s] FAILED dataframe/tests/test_merge_column_and_index.py::test_merge_unknown_to_known[on1-left]
[  681s] FAILED dataframe/tests/test_merge_column_and_index.py::test_merge_unknown_to_known[on1-right]
[  681s] FAILED dataframe/tests/test_merge_column_and_index.py::test_merge_unknown_to_unknown[on2-left]
[  681s] FAILED dataframe/tests/test_merge_column_and_index.py::test_merge_unknown_to_known[on1-outer]
[  681s] FAILED dataframe/tests/test_merge_column_and_index.py::test_merge_unknown_to_unknown[on2-right]
[  681s] FAILED dataframe/tests/test_merge_column_and_index.py::test_merge_unknown_to_known[on2-inner]
[  681s] FAILED dataframe/tests/test_merge_column_and_index.py::test_merge_unknown_to_unknown[on2-outer]
[  681s] FAILED dataframe/tests/test_merge_column_and_index.py::test_merge_unknown_to_known[on2-left]
[  681s] FAILED dataframe/tests/test_merge_column_and_index.py::test_merge_unknown_to_unknown[on3-inner]
[  681s] FAILED dataframe/tests/test_merge_column_and_index.py::test_merge_unknown_to_unknown[on3-left]
[  681s] FAILED dataframe/tests/test_merge_column_and_index.py::test_merge_unknown_to_known[on2-right]
[  681s] FAILED dataframe/tests/test_merge_column_and_index.py::test_merge_unknown_to_unknown[on3-right]
[  681s] FAILED dataframe/tests/test_merge_column_and_index.py::test_merge_unknown_to_known[on2-outer]
[  681s] FAILED dataframe/tests/test_merge_column_and_index.py::test_merge_unknown_to_unknown[on3-outer]
[  681s] FAILED dataframe/tests/test_merge_column_and_index.py::test_merge_unknown_to_known[on3-inner]
[  681s] FAILED dataframe/tests/test_merge_column_and_index.py::test_merge_known_to_double_bcast_right[idx-left]
[  681s] FAILED dataframe/tests/test_merge_column_and_index.py::test_merge_unknown_to_known[on3-left]
[  681s] FAILED dataframe/tests/test_merge_column_and_index.py::test_merge_unknown_to_known[on3-right]
[  681s] FAILED dataframe/tests/test_merge_column_and_index.py::test_merge_known_to_double_bcast_right[on1-left]
[  681s] FAILED dataframe/tests/test_merge_column_and_index.py::test_merge_unknown_to_known[on3-outer]
[  681s] FAILED dataframe/tests/test_merge_column_and_index.py::test_merge_unknown_to_unknown[idx-inner]
[  681s] FAILED dataframe/tests/test_merge_column_and_index.py::test_merge_unknown_to_unknown[idx-left]
[  681s] FAILED dataframe/tests/test_merge_column_and_index.py::test_merge_known_to_double_bcast_right[on2-left]
[  681s] FAILED dataframe/tests/test_merge_column_and_index.py::test_merge_unknown_to_unknown[idx-right]
[  681s] FAILED dataframe/tests/test_merge_column_and_index.py::test_merge_known_to_double_bcast_right[on3-left]
[  681s] FAILED dataframe/tests/test_merge_column_and_index.py::test_merge_unknown_to_unknown[idx-outer]
[  681s] FAILED dataframe/tests/test_merge_column_and_index.py::test_merge_known_to_double_bcast_left[idx-True-right]
[  681s] FAILED dataframe/tests/test_merge_column_and_index.py::test_merge_unknown_to_unknown[on1-inner]
[  681s] FAILED dataframe/tests/test_merge_column_and_index.py::test_merge_unknown_to_unknown[on1-left]
[  681s] FAILED dataframe/tests/test_merge_column_and_index.py::test_merge_known_to_double_bcast_left[idx-0.75-right]
[  681s] FAILED dataframe/tests/test_merge_column_and_index.py::test_merge_unknown_to_unknown[on1-right]
[  681s] FAILED dataframe/tests/test_merge_column_and_index.py::test_merge_known_to_double_bcast_left[on1-True-right]
[  681s] FAILED dataframe/tests/test_merge_column_and_index.py::test_merge_unknown_to_unknown[on1-outer]
[  681s] FAILED dataframe/tests/test_merge_column_and_index.py::test_merge_known_to_double_bcast_left[on1-0.75-right]
[  681s] FAILED dataframe/tests/test_merge_column_and_index.py::test_merge_unknown_to_unknown[on2-inner]
[  681s] FAILED dataframe/tests/test_merge_column_and_index.py::test_merge_known_to_double_bcast_left[on2-True-right]
[  681s] FAILED dataframe/tests/test_merge_column_and_index.py::test_merge_known_to_double_bcast_left[on2-0.75-right]
[  681s] FAILED dataframe/tests/test_merge_column_and_index.py::test_merge_known_to_double_bcast_left[on3-True-right]
[  681s] FAILED dataframe/tests/test_merge_column_and_index.py::test_merge_known_to_double_bcast_left[on3-0.75-right]
[  681s] FAILED dataframe/tests/test_multi.py::test_hash_join[tasks-inner] - ValueErro...
[  681s] FAILED dataframe/tests/test_multi.py::test_hash_join[tasks-left] - ValueError...
[  681s] FAILED dataframe/tests/test_multi.py::test_hash_join[tasks-right] - ValueErro...
[  681s] FAILED dataframe/tests/test_multi.py::test_hash_join[tasks-outer] - ValueErro...
[  681s] FAILED dataframe/tests/test_multi.py::test_merge[tasks-inner] - ValueError: B...
[  681s] FAILED dataframe/tests/test_multi.py::test_merge[tasks-outer] - ValueError: B...
[  681s] FAILED dataframe/tests/test_multi.py::test_merge[tasks-left] - ValueError: Bu...
[  681s] FAILED dataframe/tests/test_multi.py::test_merge[tasks-right] - ValueError: B...
[  681s] FAILED dataframe/tests/test_multi.py::test_merge_by_index_patterns[inner-tasks]
[  681s] FAILED dataframe/tests/test_multi.py::test_merge_by_index_patterns[outer-tasks]
[  681s] FAILED tests/test_distributed.py::test_fused_blockwise_dataframe_merge[True]
[  681s] FAILED dataframe/tests/test_multi.py::test_merge_by_index_patterns[left-tasks]
[  681s] FAILED tests/test_distributed.py::test_fused_blockwise_dataframe_merge[False]
[  681s] FAILED dataframe/tests/test_dataframe.py::test_hash_split_unique[5-2-1] - Val...
[  681s] FAILED dataframe/tests/test_dataframe.py::test_hash_split_unique[5-2-4] - Val...
[  681s] FAILED dataframe/tests/test_dataframe.py::test_hash_split_unique[5-2-20] - Va...
[  681s] FAILED dataframe/tests/test_dataframe.py::test_hash_split_unique[5-5-1] - Val...
[  681s] FAILED dataframe/tests/test_dataframe.py::test_hash_split_unique[5-5-4] - Val...
[  681s] FAILED dataframe/tests/test_dataframe.py::test_hash_split_unique[5-5-20] - Va...
[  681s] FAILED dataframe/tests/test_dataframe.py::test_hash_split_unique[20-2-1] - Va...
[  681s] FAILED dataframe/tests/test_dataframe.py::test_hash_split_unique[20-2-4] - Va...
[  681s] FAILED dataframe/tests/test_dataframe.py::test_hash_split_unique[20-2-20] - V...
[  681s] FAILED dataframe/tests/test_dataframe.py::test_hash_split_unique[20-5-1] - Va...
[  681s] FAILED dataframe/tests/test_dataframe.py::test_hash_split_unique[20-5-4] - Va...
[  681s] FAILED dataframe/tests/test_dataframe.py::test_hash_split_unique[20-5-20] - V...
[  681s] FAILED dataframe/tests/test_dataframe.py::test_split_out_drop_duplicates[None]
[  681s] FAILED dataframe/tests/test_dataframe.py::test_split_out_drop_duplicates[2]
[  681s] FAILED dataframe/tests/test_dataframe.py::test_split_out_value_counts[None]
[  681s] FAILED dataframe/tests/test_dataframe.py::test_split_out_value_counts[2] - Va...
[  681s] FAILED dataframe/tests/test_multi.py::test_merge_by_index_patterns[right-tasks]
[  681s] FAILED dataframe/tests/test_multi.py::test_merge_by_multiple_columns[tasks-inner]
[  681s] FAILED dataframe/tests/test_multi.py::test_merge_by_multiple_columns[tasks-outer]
[  681s] FAILED dataframe/tests/test_multi.py::test_merge_by_multiple_columns[tasks-left]
[  681s] FAILED dataframe/tests/test_multi.py::test_merge_by_multiple_columns[tasks-right]
[  681s] FAILED tests/test_distributed.py::test_combo_of_layer_types - ValueError: Buf...
[  681s] FAILED dataframe/tests/test_multi.py::test_merge_index_without_divisions[tasks]
[  681s] FAILED dataframe/tests/test_multi.py::test_half_indexed_dataframe_avoids_shuffle
[  681s] FAILED dataframe/tests/test_multi.py::test_merge_tasks_large_to_small[lg-28-left]
[  681s] FAILED dataframe/tests/test_multi.py::test_merge_tasks_large_to_small[lg-28-right]
[  681s] FAILED dataframe/tests/test_multi.py::test_merge_tasks_large_to_small[lg-32-left]
[  681s] FAILED dataframe/tests/test_multi.py::test_merge_tasks_large_to_small[lg-32-right]
[  681s] FAILED dataframe/tests/test_multi.py::test_merge_tasks_large_to_small[sm-28-left]
[  681s] FAILED dataframe/tests/test_multi.py::test_merge_tasks_large_to_small[sm-28-right]
[  681s] FAILED dataframe/tests/test_multi.py::test_merge_tasks_large_to_small[sm-32-left]
[  681s] FAILED dataframe/tests/test_multi.py::test_merge_tasks_large_to_small[sm-32-right]
[  681s] FAILED dataframe/tests/test_shuffle.py::test_shuffle[tasks] - ValueError: Buf...
[  681s] FAILED dataframe/tests/test_shuffle.py::test_shuffle_npartitions_task - Value...
[  681s] FAILED dataframe/tests/test_shuffle.py::test_shuffle_npartitions_lt_input_partitions_task
[  681s] FAILED dataframe/tests/test_shuffle.py::test_index_with_non_series[tasks] - V...
[  681s] FAILED dataframe/tests/test_shuffle.py::test_index_with_dataframe[tasks] - Va...
[  681s] FAILED tests/test_distributed.py::test_shuffle_priority - ValueError: Buffer ...
[  681s] FAILED dataframe/tests/test_shuffle.py::test_shuffle_from_one_partition_to_one_other[tasks]
[  681s] FAILED dataframe/tests/test_shuffle.py::test_shuffle_empty_partitions[tasks]
[  681s] FAILED dataframe/tests/test_shuffle.py::test_set_index_tasks[4] - ValueError:...
[  681s] FAILED dataframe/tests/test_shuffle.py::test_set_index_tasks[7] - ValueError:...
[  681s] FAILED dataframe/tests/test_shuffle.py::test_set_index_tasks[23] - ValueError...
[  681s] FAILED tests/test_distributed.py::test_map_partitions_df_input - ValueError: ...
[  681s] FAILED dataframe/tests/test_shuffle.py::test_set_index_tasks_2[tasks] - Value...
[  681s] FAILED dataframe/tests/test_shuffle.py::test_set_index_tasks_3[tasks] - Value...
[  681s] FAILED dataframe/tests/test_shuffle.py::test_shuffle_sort[tasks] - ValueError...
[  681s] FAILED dataframe/tests/test_shuffle.py::test_rearrange[threads-tasks] - Value...
[  681s] FAILED dataframe/tests/test_shuffle.py::test_rearrange[processes-tasks] - Val...
[  681s] FAILED dataframe/tests/test_shuffle.py::test_gh_2730 - ValueError: Buffer dty...
[  681s] FAILED dataframe/tests/test_shuffle.py::test_dataframe_shuffle_on_tasks_api[None-id-None]
[  681s] FAILED dataframe/tests/test_shuffle.py::test_dataframe_shuffle_on_tasks_api[None-id-True]
[  681s] FAILED dataframe/tests/test_shuffle.py::test_dataframe_shuffle_on_tasks_api[None-id-False]
[  681s] FAILED dataframe/tests/test_shuffle.py::test_dataframe_shuffle_on_tasks_api[None-name-None]
[  681s] FAILED dataframe/tests/test_shuffle.py::test_dataframe_shuffle_on_tasks_api[None-name-True]
[  681s] FAILED dataframe/tests/test_shuffle.py::test_dataframe_shuffle_on_tasks_api[None-name-False]
[  681s] FAILED dataframe/tests/test_shuffle.py::test_dataframe_shuffle_on_tasks_api[None-on2-None]
[  681s] FAILED dataframe/tests/test_shuffle.py::test_dataframe_shuffle_on_tasks_api[None-on2-True]
[  681s] FAILED dataframe/tests/test_shuffle.py::test_dataframe_shuffle_on_tasks_api[None-on2-False]
[  681s] FAILED dataframe/tests/test_shuffle.py::test_dataframe_shuffle_on_tasks_api[None-on3-None]
[  681s] FAILED dataframe/tests/test_shuffle.py::test_dataframe_shuffle_on_tasks_api[None-on3-True]
[  681s] FAILED dataframe/tests/test_shuffle.py::test_dataframe_shuffle_on_tasks_api[None-on3-False]
[  681s] FAILED dataframe/tests/test_shuffle.py::test_dataframe_shuffle_on_tasks_api[4-id-None]
[  681s] FAILED dataframe/tests/test_shuffle.py::test_dataframe_shuffle_on_tasks_api[4-id-True]
[  681s] FAILED dataframe/tests/test_shuffle.py::test_dataframe_shuffle_on_tasks_api[4-id-False]
[  681s] FAILED dataframe/tests/test_shuffle.py::test_dataframe_shuffle_on_tasks_api[4-name-None]
[  681s] FAILED dataframe/tests/test_shuffle.py::test_dataframe_shuffle_on_tasks_api[4-name-True]
[  681s] FAILED dataframe/tests/test_shuffle.py::test_dataframe_shuffle_on_tasks_api[4-name-False]
[  681s] FAILED dataframe/tests/test_shuffle.py::test_dataframe_shuffle_on_tasks_api[4-on2-None]
[  681s] FAILED dataframe/tests/test_shuffle.py::test_dataframe_shuffle_on_tasks_api[4-on2-True]
[  681s] FAILED dataframe/tests/test_shuffle.py::test_dataframe_shuffle_on_tasks_api[4-on2-False]
[  681s] FAILED dataframe/tests/test_shuffle.py::test_dataframe_shuffle_on_tasks_api[4-on3-None]
[  681s] FAILED dataframe/tests/test_shuffle.py::test_dataframe_shuffle_on_tasks_api[4-on3-True]
[  681s] FAILED dataframe/tests/test_shuffle.py::test_dataframe_shuffle_on_tasks_api[4-on3-False]
...
[  682s] = 187 failed, 7973 passed, 891 skipped, 119 xfailed, 8 xpassed, 4462 warnings, 5 rerun in 636.78s (0:10:36) =

Full buildlog:
dask-test-i586_log.txt

pytest --pyargs dask -rfEs -n auto

Anything else we need to know?:

@group_split_dispatch.register((pd.DataFrame, pd.Series, pd.Index))
def group_split_pandas(df, c, k, ignore_index=False):
    indexer, locations = pd._libs.algos.groupsort_indexer(
        c.astype(np.int64, copy=False), k
    )

Environment:

  • Dask version: 2019.9.1
  • Python version: 3.9.7
  • Operating System: openSUSE Tumbleweed
  • Install method (conda, pip, source): source (rpm package)
@TomAugspurger
Member

Can you check if changing

c.astype(np.int64, copy=False), k
from np.int64 to np.intp fixes the issue?

@bnavigator
Contributor Author

Checking right now. Meanwhile here is the change in pandas for reference: pandas-dev/pandas#40528

@bnavigator
Contributor Author

I can confirm that changing the cast to np.intp fixes the failures with pandas 1.3.3. I'm not sure how to make this backwards compatible with pandas < 1.3, where groupsort_indexer expects an np.int64.
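
One way to approach the backward-compatibility question is to pick the dtype from the installed pandas version. This is only a sketch: the helper names are hypothetical, and the 1.3 cutoff is an assumption taken from the comment above rather than from checking each pandas release.

```python
import re
from typing import Tuple


def parse_major_minor(version: str) -> Tuple[int, int]:
    """Extract (major, minor) from a version string such as '1.3.3'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def groupsort_dtype_name(pandas_version: str) -> str:
    """Name of the NumPy dtype to cast to before calling groupsort_indexer.

    Assumption: pandas >= 1.3 expects intp, older releases expect int64.
    """
    if parse_major_minor(pandas_version) >= (1, 3):
        return "intp"
    return "int64"


if __name__ == "__main__":
    for version in ("1.2.5", "1.3.3"):
        print(version, "->", groupsort_dtype_name(version))
```

In `group_split_pandas`, the cast could then become `c.astype(getattr(np, groupsort_dtype_name(pd.__version__)), copy=False)`. Note that on 64-bit builds both names have the same width, so the gate only changes behaviour on 32-bit platforms.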

@github-actions github-actions bot added the needs attention It's been a while since this was pushed on. Needs attention from the owner or a maintainer. label Oct 25, 2021
@detrout
Contributor

detrout commented Mar 5, 2022

With Dask 2022.01 and 2022.02 this patch gets past the initial test failure test_categorical_set_index[tasks]:

--- a/dask/dataframe/backends.py
+++ b/dask/dataframe/backends.py
@@ -352,7 +352,7 @@
 @group_split_dispatch.register((pd.DataFrame, pd.Series, pd.Index))
 def group_split_pandas(df, c, k, ignore_index=False):
     indexer, locations = pd._libs.algos.groupsort_indexer(
-        c.astype(np.int64, copy=False), k
+        c.astype(np.intp, copy=False), k
     )
     df2 = df.take(indexer)
     locations = locations.cumsum()

However, the new info(verbose=True) option introduced in #8222 gives a new error when running on 32-bit.

(Traceback from my 2022.02 test run)

_____________________________ test_categorize_info _____________________________

    @pytest.mark.skipif(not PANDAS_GT_120, reason="need newer version of Pandas")
    def test_categorize_info():
        # assert that we can call info after categorize
        # workaround for: https://github.com/pydata/pandas/issues/14368
        from io import StringIO
    
        pandas_format._put_lines = put_lines
    
        df = pd.DataFrame(
            {"x": [1, 2, 3, 4], "y": pd.Series(list("aabc")), "z": pd.Series(list("aabc"))},
            index=[0, 1, 2, 3],
        )
        ddf = dd.from_pandas(df, npartitions=4).categorize(["y"])
    
        # Verbose=False
        buf = StringIO()
        ddf.info(buf=buf, verbose=True)
        expected = (
            "<class 'dask.dataframe.core.DataFrame'>\n"
            "Int64Index: 4 entries, 0 to 3\n"
            "Data columns (total 3 columns):\n"
            " #   Column  Non-Null Count  Dtype\n"
            "---  ------  --------------  -----\n"
            " 0   x       4 non-null      int64\n"
            " 1   y       4 non-null      category\n"
            " 2   z       4 non-null      object\n"
            "dtypes: category(1), object(1), int64(1)\n"
            "memory usage: 496.0 bytes\n"
        )
>       assert buf.getvalue() == expected
E       assert "<class 'dask...312.0 bytes\n" == "<class 'dask...496.0 bytes\n"
E           <class 'dask.dataframe.core.DataFrame'>
E           Int64Index: 4 entries, 0 to 3
E           Data columns (total 3 columns):
E            #   Column  Non-Null Count  Dtype
E           ---  ------  --------------  -----
E            0   x       4 non-null      int64
E            1   y       4 non-null      category...
E         
E         ...Full output truncated (7 lines hidden), use '-vv' to show

I'm pretty sure that 32-bit architectures will use less memory than 64-bit architectures and that should be expected.

(pdb) ddf.info(verbose=True)
<class 'dask.dataframe.core.DataFrame'>
Int64Index: 4 entries, 0 to 3
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   x       4 non-null      int64
 1   y       4 non-null      category
 2   z       4 non-null      object
dtypes: category(1), object(1), int64(1)
memory usage: 312.0 bytes

Would it be better to not include the memory usage in the test, or make it something that could be altered depending on the architecture?

I tested the following patch as a solution to https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1006537, but I could also imagine just trimming the expected string and the response to end at "usage:" and deleting the "496.0 bytes\n".

--- a/dask/dataframe/tests/test_dataframe.py
+++ b/dask/dataframe/tests/test_dataframe.py
@@ -3,6 +3,7 @@
 import xml.etree.ElementTree
 from itertools import product
 from operator import add
+import platform
 
 import numpy as np
 import pandas as pd
@@ -3597,6 +3598,12 @@
     # Verbose=False
     buf = StringIO()
     ddf.info(buf=buf, verbose=True)
+
+    if platform.architecture()[0] == "32bit":
+        memory_usage = "312.0"
+    else:
+        memory_usage = "496.0"
+
     expected = (
         "<class 'dask.dataframe.core.DataFrame'>\n"
         "Int64Index: 4 entries, 0 to 3\n"
@@ -3607,7 +3614,7 @@
         " 1   y       4 non-null      category\n"
         " 2   z       4 non-null      object\n"
         "dtypes: category(1), object(1), int64(1)\n"
-        "memory usage: 496.0 bytes\n"
+        "memory usage: {} bytes\n".format(memory_usage)
     )
     assert buf.getvalue() == expected
 

@mgorny
Contributor

mgorny commented Mar 21, 2022

Could you submit the fixes as PRs, please?
