Feat/pandas reader parquet #429

flaviassantos · 2023-10-03T19:39:44Z

This pull request adds parquet read and write functionality for issue #406.

Changes

The changes, per file:

pandas_extensions.py: Added the classes to read and write parquet files
test_pandas_extensions.py: Added a single test case that exercises writing a parquet file and then reading it back
notebook.ipynb: Added an example of parquet materialization
my_script.py: Added an example of parquet materialization

How I tested this

By successfully running:

the unit test for the PandasParquetWriter and PandasParquetReader classes
the Jupyter notebook
my_script.py
the CircleCI job locally

Notes

Checklist

PR has an informative and human-readable title (this will be pulled into the release notes)
Changes are limited to a single goal (no scope creep)
Code passed the pre-commit check & code is left cleaner/nicer than when first encountered.
Any change in functionality is tested
New functions are documented (with a description, list of inputs, and expected output)
Placeholder code is flagged / future TODOs are captured in comments
Project documentation has been updated if adding/changing functionality.

flaviassantos · 2023-10-03T20:07:03Z

@skrawcz, regarding the last point in the checklist: can you point me to the "Project documentation" mentioned?

skrawcz · 2023-10-03T20:22:45Z

> @skrawcz , last point in the checklist. Can you point me to the "Project documentation" mentioned?

Yep, that should be automatically updated for you. It should show up here (based on the build of this branch): https://hamilton--429.org.readthedocs.build/en/429/reference/io/available-data-adapters/.

skrawcz

looks good, just the minor comment on the test!

skrawcz · 2023-10-03T20:44:29Z

        try:
            result = self.api.parquet.read_table(
                path_or_handle, columns=columns, **kwargs
            ).to_pandas(**to_pandas_kwargs)
E       TypeError: read_table() got an unexpected keyword argument 'dtype_backend'

../venvs/hamilton-venv/lib/python3.7/site-packages/pandas/io/parquet.py:240: TypeError

You will need to gate this parameter for Python 3.7. You'll see that others have:

        if sys.version_info >= (3, 8) and self.dtype_backend is not None:
            kwargs["dtype_backend"] = self.dtype_backend
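To illustrate the gating pattern in isolation, here is a minimal sketch. The class name `ParquetReaderSketch` and the method `get_loading_kwargs` are hypothetical and are not the code from this PR; only the `sys.version_info` check mirrors the snippet above. The assumption is that the pandas version installable on Python 3.7 does not understand `dtype_backend`, so the argument is only forwarded on Python 3.8 and later.

```python
import dataclasses
import sys
from typing import Any, Dict, List, Optional


@dataclasses.dataclass
class ParquetReaderSketch:
    """Hypothetical stand-in for a parquet reader; not the class added in this PR."""

    path: str
    columns: Optional[List[str]] = None
    dtype_backend: Optional[str] = None

    def get_loading_kwargs(self) -> Dict[str, Any]:
        """Collect only the keyword arguments that are safe to forward."""
        kwargs: Dict[str, Any] = {}
        if self.columns is not None:
            kwargs["columns"] = self.columns
        # Gate dtype_backend: skip it on Python 3.7, where (by assumption)
        # the installed pandas would reject the unknown keyword.
        if sys.version_info >= (3, 8) and self.dtype_backend is not None:
            kwargs["dtype_backend"] = self.dtype_backend
        return kwargs


reader = ParquetReaderSketch(path="example.parquet", dtype_backend="pyarrow")
loading_kwargs = reader.get_loading_kwargs()
```

The resulting dictionary could then be unpacked into a call such as `pd.read_parquet(reader.path, **loading_kwargs)`. Leaving the key out entirely, rather than passing `None`, means older pandas versions never see the argument.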

flaviassantos added 4 commits October 3, 2023 19:12

feat(parquet): implemented and tested the writer for pandas parquet

9ca9561

feat(parquet): implemented and tested the reader for pandas parquet

b4457e9

feat(parquet): added to the pandas materializer example showing writing/reading of a dataframe

447327d

feat(parquet): added to the my_script in pandas materializer example showing writing/reading of a dataframe

3fd6f6d

feat(parquet): added an extra_kwargs dictionary to the PandasParquetWriter to handle kwargs not listed in Pandas' docs

7b09999

flaviassantos added 2 commits October 3, 2023 23:21

feat(parquet): added conditional dtype_backend support for PandasParquetReader class

288f3f5

feat(parquet): updates unit test with assert_frame_equal usage

6d1e39f

skrawcz approved these changes Oct 4, 2023

skrawcz merged commit 28c955e into DAGWorks-Inc:main Oct 4, 2023
21 checks passed

skrawcz added the hacktoberfest-accepted label Oct 4, 2023

JoJo10Smith mentioned this pull request Oct 30, 2023

'extra_kwargs' vs writing out each argument #506

Open
