Allow conversion to/from pandas without requiring PyArrow #15845
Comments
Our pandas conversion logic (both ways) goes through PyArrow. In some special cases (object columns) we go through NumPy. I suppose it would be better to go through NumPy when that is sufficient, instead of failing. Or we could write our own conversion logic. There is also
Just as a note of caution, there were some pretty bad bugs in the pandas implementation of the interchange protocol before 2.2.2 for nullable dtypes. If you're just converting from classic NumPy-backed pandas dtypes (as I think you are in the linked PR) then it should be OK. Unfortunately, this is the downside of having bundled the interchange protocol with pandas itself: by the time minimum versions have been bumped sufficiently, it's going to be years until it's fully usable.
@adrinjalali suppose, for the sake of argument, that in Polars' next release, conversion from pandas to Polars for pandas primitive dtypes could happen without PyArrow installed. Would you then be OK with bumping the Polars version for the scikit-learn docs to the most recent one, as opposed to using the interchange protocol?
Since I would be okay with that. Note that we're already probably bumping the min required version to a more recent one for
Awesome, thanks! Fancy waiting one more week (that's the usual release cadence for Polars) so you can bump it all the way just once and avoid
Works for me :)
Encountered this while reviewing this PR on the scikit-learn side, xref: scikit-learn/scikit-learn#28804 (comment)
Basically, if the environment doesn't have `pyarrow`, conversion from `pandas` seems to require `pyarrow` even though the `pandas.DataFrame` isn't using `pyarrow`.
Minimal reproducible:
Note that in the above example the other way around (conversion from polars to pandas) works fine.
The PR on the scikit-learn side introduced this line:
which seems very odd, having to move to numpy and then to polars. Also, if the above line is correct, polars could be doing almost the same internally and not require pyarrow for the conversion.