Support for load_as_pyarrow_dataset or load_as_pyarrow_table #238

Open
chitralverma opened this issue Dec 21, 2022 · 1 comment · May be fixed by #240

@chitralverma

chitralverma commented Dec 21, 2022

This is a new feature request, or rather a small refactoring of the reader code, to allow users to read datasets directly as pyarrow datasets and tables.

As you can see here, we already create the pyarrow dataset and table, which are then converted to a pandas DataFrame in the to_pandas method.

I would like to refactor this part and expose these steps as separate methods: to_pyarrow_dataset and to_pyarrow_table.

The advantage of this refactoring is that users will be able to get the pyarrow objects directly and efficiently, without an additional full copy/conversion to a pandas DataFrame. This will also allow delta-sharing to be used with other processing systems such as DataFusion and Polars, since they all rely extensively on pyarrow datasets.

Please let me know if this issue makes sense to you. I can raise a quick PR for this in a day or so.

Note: the existing functionalities will remain unaffected by this refactoring.

@jacobmarble

I was googling "delta sharing polars" and found this issue. 👍
