Polars Parquet reader/writer #417

Closed
Tracked by #410
skrawcz opened this issue Oct 2, 2023 · 4 comments

@skrawcz
Collaborator

skrawcz commented Oct 2, 2023

Implement https://pola-rs.github.io/polars/py-polars/html/reference/io.html#parquet as a data saver/loader.

  1. It should go into polars_extensions.py, with one class for the writer and one class for the reader, much like how the polars csv reader and writer are structured.
  2. There should be requisite tests to exercise the functionality. We can do one test that writes and then reads its own output.
  3. Then an example should be added to a new directory called materialization under example/polars. It should mirror the pandas materialization example in structure.
  4. If there's an issue with type hints, let me know and we can chat through it.
@skrawcz skrawcz changed the title Parquet Polars Parquet reader/writer Oct 2, 2023
@swapdewalkar
Contributor

I am taking it up!

@TanyaKansal

I'd like to work on this issue. Let me know if it's still open.

@swapdewalkar
Contributor

@TanyaKansal I am already working on this. Check #410;
some of the others are still open.

@swapdewalkar swapdewalkar mentioned this issue Oct 3, 2023
7 tasks
skrawcz added a commit that referenced this issue Oct 4, 2023
Adds a Polars Parquet Data Saver/Loader.
I.e. exposes https://pola-rs.github.io/polars/py-polars/html/reference/io.html#parquet as a data saver/loader.

This enables someone to then do the following:

```
    to.parquet(
        dependencies=output_columns,
        id="df_to_parquet",
        path="./df.parquet",
        combine=df_builder,
    ),
```
when dealing with polars dataframes.

Note: pre-commit fails here, but it will be fixed once this is merged because that's simpler.

--- squashed commits:

* Initial Parquet Change

* Added Materializations example.

* Add Writer.

* Fix compression in unit test

* Fix n_rows in unit test

* Only change for parquet polars

* Change dict to Dict for python > 3.7

* Remove unused import base

* reformatted

* Update comment and name of file.

* Remove excess code

Removing code that is not required for the example.

---------

Co-authored-by: Swapnil Dewalkar <sdewalkar@fanatics.com>
Co-authored-by: Stefan Krawczyk <stefan@dagworks.io>
@skrawcz
Collaborator Author

skrawcz commented Oct 4, 2023

Merged. Thanks @swapdewalkar! This will be released soon.

@skrawcz skrawcz closed this as completed Oct 4, 2023
@skrawcz skrawcz added the hacktoberfest Hacktoberfest issues label Oct 4, 2023