Update pandas parquet reader and writer (#429)
This commit exposes all the options that pandas provides for reading and writing parquet files,
fulfilling issue #406.

Note: extra_kwargs holds any additional keyword arguments to pass through to the parquet reader and writer.
We don't expose those parameters individually; supplying them is left to the user.
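
To illustrate the extra_kwargs note above, here is a minimal sketch of the pass-through pattern. It is not the code from this commit: the class name `ParquetWriterSketch` and the method `saving_kwargs` are hypothetical, and the named fields are only a subset of what `DataFrame.to_parquet` accepts. The idea is that documented options become explicit fields, while anything else rides along in `extra_kwargs`.

```python
import dataclasses
from typing import Any, Dict, Optional


@dataclasses.dataclass
class ParquetWriterSketch:
    """Hypothetical stand-in for a parquet writer; not the class added in this commit."""

    path: str
    engine: str = "auto"
    compression: Optional[str] = "snappy"
    index: Optional[bool] = None
    # Options that are not modelled as fields are carried here and forwarded untouched.
    extra_kwargs: Optional[Dict[str, Any]] = None

    def saving_kwargs(self) -> Dict[str, Any]:
        """Build the keyword arguments that would be handed to DataFrame.to_parquet."""
        kwargs: Dict[str, Any] = {
            "engine": self.engine,
            "compression": self.compression,
        }
        # Only forward optional fields that the caller actually set.
        if self.index is not None:
            kwargs["index"] = self.index
        # User-supplied extras are merged last, so they can override the fields above.
        if self.extra_kwargs is not None:
            kwargs.update(self.extra_kwargs)
        return kwargs


writer = ParquetWriterSketch(
    path="./df.parquet.gzip",
    compression="gzip",
    # Assumed example of an engine-specific option (pyarrow's row_group_size).
    extra_kwargs={"row_group_size": 1000},
)
print(writer.saving_kwargs())
```

With a DataFrame `df` in hand, the result could then be used as `df.to_parquet(writer.path, **writer.saving_kwargs())`. Merging `extra_kwargs` last is a design choice in this sketch, not something the commit message specifies.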

Squashed commits:

* feat(parquet): implemented and tested the writer for pandas parquet

* feat(parquet): implemented and tested the reader for pandas parquet

* feat(parquet): added to the pandas materializer example showing writing/reading of a dataframe

* feat(parquet): added to the my_script in pandas materializer example showing writing/reading of a dataframe

* feat(parquet): added an extra_kwargs dictionary to the PandasParquetWriter to handle kwargs not listed in Pandas' docs

* feat(parquet): added conditional dtype_backend support for PandasParquetReader class

* feat(parquet): updates unit test with assert_frame_equal usage
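
The "conditional dtype_backend support" item above can be sketched roughly as follows. This is an assumption about the approach, not the commit's actual code: the function name `reader_kwargs` is hypothetical. It relies on the fact that `pandas.read_parquet` accepts `dtype_backend` only from pandas 2.0 onward, so the argument is forwarded only when the installed version supports it.

```python
from typing import Any, Dict, Optional


def reader_kwargs(pandas_version: str, dtype_backend: Optional[str] = None) -> Dict[str, Any]:
    """Return loading kwargs, including dtype_backend only where pandas supports it.

    Hypothetical helper: pandas_version would normally be pandas.__version__.
    """
    # The major version is the leading integer of a string such as "2.1.0".
    major = int(pandas_version.split(".")[0])
    kwargs: Dict[str, Any] = {}
    # dtype_backend was introduced in pandas 2.0; older versions would reject it.
    if dtype_backend is not None and major >= 2:
        kwargs["dtype_backend"] = dtype_backend
    return kwargs


print(reader_kwargs("2.1.0", dtype_backend="pyarrow"))
print(reader_kwargs("1.5.3", dtype_backend="pyarrow"))
```

In real use the result would be splatted into the read call, for example `pd.read_parquet(path, **reader_kwargs(pd.__version__, "pyarrow"))`.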
flaviassantos committed Oct 4, 2023
1 parent 100e765 commit 28c955e
Showing 4 changed files with 598 additions and 426 deletions.
8 changes: 8 additions & 0 deletions examples/pandas/materialization/my_script.py
@@ -90,6 +90,12 @@
path="./df.feather",
combine=df_builder,
),
to.parquet(
dependencies=output_columns,
id="df_to_parquet",
path="./df.parquet.gzip",
combine=df_builder,
),
]
# Visualize what is happening
dr.visualize_materialization(
@@ -110,6 +116,7 @@
"df_to_html_build_result",
"df_to_stata_build_result",
"df_to_feather_build_result",
"df_to_parquet_build_result",
], # because combine is used, we can get that result here.
inputs=initial_columns,
)
@@ -121,5 +128,6 @@
print(additional_outputs["df_to_html_build_result"])
print(additional_outputs["df_to_stata_build_result"])
print(additional_outputs["df_to_feather_build_result"])
print(additional_outputs["df_to_parquet_build_result"])

conn.close()