Improved support for pyarrow strings #10000

Conversation
Force-pushed from 14201f7 to 37b6d81
@jrbourbeau: dask/partd#64 should fix some of the shuffle-related serialization errors. You might try temporarily pointing the CI environments at that PR to see which tests start passing.
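One way to point CI at an open PR is to install the package from the PR's head ref instead of PyPI. A sketch, assuming a pip-based install script (the exact line is hypothetical, not copied from this PR; GitHub exposes each pull request's head as `refs/pull/<N>/head`):

```
# Hypothetical install-script / requirements line: install partd from
# the open PR branch (dask/partd#64) instead of the released package.
git+https://github.com/dask/partd.git@refs/pull/64/head
```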
Force-pushed from e96e2bf to 7e65678
@jrbourbeau Thanks, I added this to the install script. I couldn't add it to the conda env file because of a version conflict: when installed from git, the partd version is 1.0, while we have 1.2.0 in the requirements.
Force-pushed from 889a4fb to 43a6d96
Force-pushed (pyarrow strings) from 2b5eed4 to 6ca705b
This PR is using a couple of different ways to handle expected failures: some tests use a fixture, and others are marked with a pytest marker.
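As a hedged sketch of that pattern (the fixture and marker names below are hypothetical, since the actual names are not shown in this excerpt):

```python
import pytest


# Hypothetical fixture: parametrize each test body over both string dtypes,
# so the same test runs with and without pyarrow-backed strings.
@pytest.fixture(params=["object", "string[pyarrow]"])
def string_dtype(request):
    return request.param


# Hypothetical marker usage: tests that hit known upstream bugs are marked
# xfail, so the suite stays informative while pandas/arrow fixes land.
@pytest.mark.xfail(reason="known upstream pandas/pyarrow issue", strict=False)
def test_concat_groupby_head(string_dtype):
    # Placeholder body; a real test would exercise GroupBy.head() + pd.concat.
    assert isinstance(string_dtype, str)
```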
Force-pushed from 4c98ed1 to 400bb23
I simplified how the failures are handled. We have two different types now: … and continue-on-error.
Force-pushed from 1c1b692 to b295ef9
Thanks for all your work here @j-bennet!
# pyarrow dtypes currently fail, so we allow continuing on error for that specific build
# TODO: Remove the `continue-on-error` line below once tests are all passing
continue-on-error: ${{ matrix.extra == 'pyarrow' }}
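For readers outside the diff context, a minimal sketch of where such a line sits in a GitHub Actions job (the surrounding keys are illustrative, not copied from dask's workflow file):

```yaml
jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        extra: ["", "pyarrow"]
    # Only the pyarrow matrix entry may fail without failing the workflow.
    continue-on-error: ${{ matrix.extra == 'pyarrow' }}
```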
🎉
Part of #9946.
Upstream issues:

- `pd.concat` fails with `GroupBy.head()` and `pd.StringDtype["pyarrow"]` pandas-dev/pandas#51734
- Converting a `string[pyarrow]` column to `category` triggers an error in `to_parquet` pandas-dev/pandas#51752
- `to_parquet` fails with a category field backed by pyarrow string apache/arrow#34449 (closed; a duplicate of the following issue)
- … (`sum` aggregation)

Issues to follow up in Dask:

- `string[pyarrow]` in `MultiIndex` #10025

- `pre-commit run --all-files`