Skip to content

Fix test_melt with pyarrow strings active#10052

Merged
jrbourbeau merged 8 commits intodask:mainfrom
j-bennet:j-bennet/pyarrow-strings-melt
Mar 14, 2023
Merged

Fix test_melt with pyarrow strings active#10052
jrbourbeau merged 8 commits intodask:mainfrom
j-bennet:j-bennet/pyarrow-strings-melt

Conversation

@j-bennet
Copy link
Copy Markdown
Contributor

Part of #10029.

Follow-up for #10000.

  • Tests added / passed
  • Passes pre-commit run --all-files

@j-bennet j-bennet requested a review from jrbourbeau March 11, 2023 00:06
@j-bennet j-bennet changed the title Fix test_melt. [pyarrow strings] Fix test_melt Mar 11, 2023
Copy link
Copy Markdown
Member

@jrbourbeau jrbourbeau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @j-bennet

It's not totally clear to me whether this is a case where we should convert to string[pyarrow] or leave as object when combining mixed dtype columns (e.g. combining strings and floats). What's your thinking behind the current approach?

@j-bennet j-bennet force-pushed the j-bennet/pyarrow-strings-melt branch from 9461bd9 to 8105b19 Compare March 13, 2023 19:44
@j-bennet j-bennet force-pushed the j-bennet/pyarrow-strings-melt branch from 8105b19 to d2d9805 Compare March 13, 2023 19:45
@j-bennet
Copy link
Copy Markdown
Contributor Author

j-bennet commented Mar 13, 2023

It's not totally clear to me whether this is a case where we should convert to string[pyarrow] or leave as object when combining mixed dtype columns (e.g. combining strings and floats). What's your thinking behind the current approach?

For some reason, I had an impression that pandas would lose string[pyarrow] dtypes when melting them together, but this turns out to not be the case (perhaps I've seen this in the past, but it was fixed? not sure). pandas has reasonable upcasting behavior:

obj + str = obj
obj + int = obj
str + str = str
str + int = obj

so the best fix seems to be to turn off convert_string and delegate to upstream. This PR is now doing just that.

Comment on lines +550 to +553
(
a,
b,
) = _maybe_convert_string(a, b)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've not checked locally -- is pre-commit happy with this formatting? Not a big deal if we need what's current in this PR, but just thought I'd check

Suggested change
(
a,
b,
) = _maybe_convert_string(a, b)
a, b = _maybe_convert_string(a, b)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is what pre-commit did, for whatever reason! I didn't do that. :)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hrm, weird. When I make this change, pre-commit seems to be happy locally

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it's happy now.

@jrbourbeau jrbourbeau changed the title [pyarrow strings] Fix test_melt Fix test_melt with pyarrow strings active Mar 13, 2023
Copy link
Copy Markdown
Member

@jrbourbeau jrbourbeau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @j-bennet -- will merge after CI finishes

@j-bennet j-bennet changed the title Fix test_melt with pyarrow strings active [pyarrow strings] Fix test_melt Mar 13, 2023
@j-bennet
Copy link
Copy Markdown
Contributor Author

Test failure is unrelated (upstream): FAILED dask/dataframe/tests/test_pyarrow_compat.py::test_pickle_roundtrip[uint8].

@jrbourbeau jrbourbeau changed the title [pyarrow strings] Fix test_melt Fix test_melt with pyarrow strings active Mar 14, 2023
@jrbourbeau jrbourbeau merged commit 9364281 into dask:main Mar 14, 2023
@j-bennet j-bennet deleted the j-bennet/pyarrow-strings-melt branch March 14, 2023 01:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants