Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ENH] Support polars via Python dataframe interchange protocol #474

Closed
lorentzenchr opened this issue Nov 23, 2023 · 6 comments
Closed

[ENH] Support polars via Python dataframe interchange protocol #474

lorentzenchr opened this issue Nov 23, 2023 · 6 comments
Assignees

Comments

@lorentzenchr
Copy link

Is your feature request related to a problem? Please describe.
I would like to preprocess my data in other data containers than pandas, e.g. pyarrow and polars, and then apply Triangle to it.

Is your feature request at odds with the scope of the package?

import chainladder as cl
import pandas as pd
import polars as pl


raa_df = pd.read_csv(
    "https://raw.githubusercontent.com/casact/chainladder-python/master/chainladder/utils/data/raa.csv"
)

# This works just fine
raa = cl.Triangle(
    raa_df,
    origin="origin",
    development="development",
    columns="values",
    cumulative=True,
)

# This gives an error
raa_pl = pl.read_csv(
    "https://raw.githubusercontent.com/casact/chainladder-python/master/chainladder/utils/data/raa.csv"
)

cl.Triangle(
    raa_pl,
    origin="origin",
    development="development",
    columns="values",
    cumulative=True,
)

results in

AttributeError: 'DataFrame' object has no attribute 'iloc'

Describe the solution you'd like
Supporting data via the Python dataframe interchange protocol might be an optimal approach.

Describe alternatives you've considered
One could also consider polars specific code path. But this might result in more maintenance burden: Where to stop? Add pyarrow, too?

Additional context
A similar feature request is scikit-learn/scikit-learn#25896 which is currently worked on.

@johalnes
Copy link
Contributor

Do you think something like this in Plotly, just to get a zero-copy to pandas @lorentzenchr?

I see you have an Polars rewrite as a branch @jbogaardt - is converting to polars what you consider the future of the package? 😄

@jbogaardt
Copy link
Collaborator

For some reason i didnt see this issue when @lorentzenchr created it. It is an worthwhile enhacement since thats the direction other tools seem to be going.

@johalnes, I did start the pl_tri branch to experiment with polars over a numpy/sparse backend. It is much simpler code, fewer dependencies is generally faster than the main branch, and can possibly be extended to other languages where the polars API has been implemented (R, node.js). There are some computations that are slower though. I think where arrays make more sense, polars might be slower that the current implementation, but for data manipulation it is much faster. Overall, I've been impressed with the speed + simplification polars brings. If I can get it to a point where there is minimal impact to the end user API and performance, then it probably is the right move.

@johalnes
Copy link
Contributor

johalnes commented Dec 15, 2023

Totally agree with you @jbogaardt! I do think the code get easier and more elegant with Polars. But quite a lot of work it would seem, even if it looks like you already have done a lot! Added an attempt at the interchange protocol, and luckily the pandas dev team have done most of the work for us 😊

Is there any more fundament changes you have considered? Since the package still isn't at 1.0, you could do some breaking changes and no one can arrest you for doing it😉 Given more performant, future proof and readable code I would think most users would be quite happy!

@lorentzenchr - give it a try if you have the time!

@lorentzenchr
Copy link
Author

To be clear, my proposal is just to support dataframes that support the dataframe interchange protocol in cl.Triangle.
Replacing pandas as the internal computation backend is a totally different story that is better discussed separately.

@johalnes
Copy link
Contributor

I know @lorentzenchr, sorry for going off topic. Just got exited!

I think the pull request merged closes this issue? Or have I missed something?

@lorentzenchr
Copy link
Author

Yes, we can close. Thanks for the fast PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants