[ENH] Support polars via Python dataframe interchange protocol #474
Comments
Are you thinking of something like this in Plotly, just to get a zero-copy conversion to pandas, @lorentzenchr? I see you have a Polars rewrite as a branch, @jbogaardt. Is converting to polars what you consider the future of the package? 😄
For some reason I didn't see this issue when @lorentzenchr created it. It is a worthwhile enhancement, since that's the direction other tools seem to be going. @johalnes, I did start the pl_tri branch to experiment with polars over a numpy/sparse backend. The code is much simpler, has fewer dependencies, is generally faster than the main branch, and can possibly be extended to other languages where the polars API has been implemented (R, node.js). Some computations are slower, though. Where arrays make more sense, polars might be slower than the current implementation, but for data manipulation it is much faster. Overall, I've been impressed with the speed and simplification polars brings. If I can get it to a point where there is minimal impact on the end-user API and performance, then it is probably the right move.
Totally agree with you, @jbogaardt! I do think the code gets easier and more elegant with Polars. But it seems like quite a lot of work, even if it looks like you have already done a lot! I added an attempt at the interchange protocol, and luckily the pandas dev team has done most of the work for us 😊 Are there any more fundamental changes you have considered? Since the package still isn't at 1.0, you could make some breaking changes and no one can arrest you for it 😉 Given more performant, future-proof, and readable code, I would think most users would be quite happy! @lorentzenchr, give it a try if you have the time!
To be clear, my proposal is just to support dataframes that support the dataframe interchange protocol in |
I know, @lorentzenchr, sorry for going off topic. I just got excited! I think the merged pull request closes this issue? Or have I missed something?
Yes, we can close. Thanks for the fast PR.
Is your feature request related to a problem? Please describe.
I would like to preprocess my data in other data containers than pandas, e.g. pyarrow and polars, and then apply Triangle to it.
Is your feature request at odds with the scope of the package?
results in
Describe the solution you'd like
Supporting data via the Python dataframe interchange protocol might be an optimal approach.
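As a rough illustration of what that could look like, here is a minimal sketch that normalizes any interchange-capable dataframe to pandas before further processing. The helper name to_pandas is hypothetical and is not part of this package; the sketch assumes pandas 1.5 or newer, where pandas.api.interchange.from_dataframe is available.

```python
import pandas as pd
from pandas.api.interchange import from_dataframe


def to_pandas(data):
    """Return `data` as a pandas DataFrame.

    Hypothetical helper for illustration only; it is not the package's API.
    """
    # pandas input needs no conversion.
    if isinstance(data, pd.DataFrame):
        return data
    # Any object exposing __dataframe__ implements the interchange protocol
    # and can be converted by pandas (assumes pandas >= 1.5).
    if hasattr(data, "__dataframe__"):
        return from_dataframe(data)
    raise TypeError(
        f"Expected a pandas DataFrame or an object implementing "
        f"__dataframe__, got {type(data).__name__}"
    )


# Usage with a pandas frame; column names are made up for the example.
frame = pd.DataFrame({"origin": [2020, 2021], "paid": [100.0, 80.0]})
converted = to_pandas(frame)
```

A dataframe from another library that implements `__dataframe__` could be passed to the same helper, so no library-specific code path would be needed.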
Describe alternatives you've considered
One could also consider a polars-specific code path. But this might result in more maintenance burden: Where to stop? Add pyarrow, too?
Additional context
A similar feature request is scikit-learn/scikit-learn#25896, which is currently being worked on.