[Python][CI] Nightly dask integration tests failing (test_categorize_info, change in StringArray nbytes) #39028

Closed
jorisvandenbossche opened this issue Dec 1, 2023 · 0 comments · Fixed by #39029


jorisvandenbossche commented Dec 1, 2023

See for example https://github.com/ursacomputing/crossbow/actions/runs/7054010641/job/19202136552

The test_categorize_info test checks the info display, which includes the memory size of the dataframe. When the dataframe uses the pyarrow string dtype, this test fails with the latest pyarrow.

Checking locally, it seems that with pyarrow main, the StringArray that gets constructed under the hood no longer has a validity bitmap allocated (for a case without nulls, so this is a fine optimization), while with pyarrow 14.0, the bitmap is present. This results in a change in .nbytes, making the test fail. This will need to be fixed downstream in dask.

@jorisvandenbossche jorisvandenbossche self-assigned this Dec 1, 2023
jorisvandenbossche added a commit to jorisvandenbossche/arrow that referenced this issue Dec 1, 2023
jorisvandenbossche added a commit that referenced this issue Dec 1, 2023
…ping test_categorize_info (#39029)

The test requires a downstream fix in dask (because of a valid change in Arrow). Until then, this test is temporarily skipped (see the issue for more details).

* Closes: #39028

Authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
Signed-off-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
@jorisvandenbossche jorisvandenbossche added this to the 15.0.0 milestone Dec 1, 2023
dgreiss pushed a commit to dgreiss/arrow that referenced this issue Feb 19, 2024
…y skipping test_categorize_info (apache#39029)
