This repository has been archived by the owner on Oct 8, 2024. It is now read-only.

chr1st1ank/dataframe-io


dataframe-io

Note: This is an archived repository of my unmaintained package. There were times when I found such a library useful. It is archived because I moved to using Polars and its rich I/O connectors for the typical use cases.

Have a look at the Polars User Guide in case you need a library to read and write dataframes.


Read and write dataframes from and to any storage.

Features

Supported dataframe types:

  • pandas DataFrame
  • Python dictionary
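To make the "Python dictionary" type concrete, here is a minimal sketch that assumes a column-oriented layout mapping each column name to a list of values (the same layout pandas produces with `DataFrame.to_dict("list")`). Both the layout and the `to_records` helper are illustrative assumptions, not dframeio's documented API.

```python
from typing import Any, Dict, List

# Hypothetical column-oriented frame: column name -> list of values (equal lengths).
ColumnDict = Dict[str, List[Any]]


def to_records(frame: ColumnDict) -> List[Dict[str, Any]]:
    """Hypothetical helper: convert a column-oriented dict into a list of row dicts."""
    lengths = {len(values) for values in frame.values()}
    if len(lengths) > 1:
        raise ValueError(f"columns have different lengths: {sorted(lengths)}")
    n_rows = lengths.pop() if lengths else 0
    return [{name: values[i] for name, values in frame.items()} for i in range(n_rows)]


frame = {"id": [1, 2, 3], "city": ["Oslo", "Lima", "Oslo"]}
print(to_records(frame))
# [{'id': 1, 'city': 'Oslo'}, {'id': 2, 'city': 'Lima'}, {'id': 3, 'city': 'Oslo'}]
```

A pandas DataFrame built with `pandas.DataFrame(frame)` holds the same data in tabular form.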

Supported storage backends:

  • Parquet files
  • PostgreSQL database

More backends will come. Open an issue if you are interested in a particular backend.

Implementation status for reading data:

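| Storage       | Select columns | Filter rows | Max rows | Sampling | Drop duplicates |
|---------------|----------------|-------------|----------|----------|-----------------|
| Parquet files | ✔️             | ✔️          | ✔️       | ✔️       | ✔️ ¹            |
| PostgreSQL    | ✔️             | ✔️          | ✔️       | ✔️       | ✔️              |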

¹ only for pandas DataFrames
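The read options in this table can be illustrated on a column-oriented dictionary. The following is a hypothetical, pure-Python sketch: `read_frame` and its parameters are invented for illustration and are not dframeio's actual signature, the row filter is a plain Python predicate, and the order in which the options are applied (filter, select, deduplicate, sample, limit) is an assumption of this sketch.

```python
import random
from typing import Any, Callable, Dict, List, Optional, Set, Tuple

# Hypothetical column-oriented frame: column name -> list of values.
ColumnDict = Dict[str, List[Any]]
Row = Dict[str, Any]


def read_frame(
    frame: ColumnDict,
    columns: Optional[List[str]] = None,
    row_filter: Optional[Callable[[Row], bool]] = None,
    limit: Optional[int] = None,
    sample: Optional[int] = None,
    drop_duplicates: bool = False,
    seed: int = 0,
) -> ColumnDict:
    """Hypothetical reader applying the five read options to an in-memory frame."""
    names = list(frame) if columns is None else list(columns)
    n_rows = len(next(iter(frame.values()), []))
    rows: List[Row] = [{col: frame[col][i] for col in frame} for i in range(n_rows)]

    # Filter rows: keep rows for which the predicate is true (it sees all columns).
    if row_filter is not None:
        rows = [row for row in rows if row_filter(row)]

    # Select columns: project every row onto the requested column names.
    rows = [{col: row[col] for col in names} for row in rows]

    # Drop duplicates: keep the first occurrence of each distinct row.
    if drop_duplicates:
        seen: Set[Tuple[Any, ...]] = set()
        unique: List[Row] = []
        for row in rows:
            key = tuple(row[col] for col in names)
            if key not in seen:
                seen.add(key)
                unique.append(row)
        rows = unique

    # Sampling: draw up to `sample` rows at random, preserving their original order.
    if sample is not None:
        rng = random.Random(seed)
        picked = sorted(rng.sample(range(len(rows)), min(sample, len(rows))))
        rows = [rows[i] for i in picked]

    # Max rows: truncate to at most `limit` rows.
    if limit is not None:
        rows = rows[:limit]

    return {col: [row[col] for row in rows] for col in names}


frame = {"id": [1, 2, 3, 4], "city": ["Oslo", "Lima", "Oslo", "Rome"]}
print(read_frame(frame, columns=["city"], drop_duplicates=True))
# {'city': ['Oslo', 'Lima', 'Rome']}
print(read_frame(frame, row_filter=lambda r: r["id"] > 1, limit=2))
# {'id': [2, 3], 'city': ['Lima', 'Oslo']}
```

Sampling uses a seeded `random.Random` so repeated reads return the same subset.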

Implementation status for writing data:

| Storage       | Write (append) | Write (replace) |
|---------------|----------------|-----------------|
| Parquet files | ✔️             | ✔️              |
| PostgreSQL    | ✔️             | ✔️              |
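The difference between the two write modes can be sketched with a hypothetical in-memory store. `InMemoryStore` and its `write_replace`/`write_append` methods are illustrative names chosen to mirror the table's column headings; they are not the library's backend classes. Requiring appended data to have the same columns as the stored table is an assumption of this sketch.

```python
from typing import Any, Dict, List

# Hypothetical column-oriented frame: column name -> list of values.
ColumnDict = Dict[str, List[Any]]


class InMemoryStore:
    """Hypothetical stand-in for a storage backend, keyed by table name."""

    def __init__(self) -> None:
        self._tables: Dict[str, ColumnDict] = {}

    def write_replace(self, name: str, frame: ColumnDict) -> None:
        # Replace: discard whatever was stored under `name` and store a copy of `frame`.
        self._tables[name] = {col: list(values) for col, values in frame.items()}

    def write_append(self, name: str, frame: ColumnDict) -> None:
        # Append: create the table if it is missing, otherwise add rows at the end.
        existing = self._tables.get(name)
        if existing is None:
            self.write_replace(name, frame)
            return
        if set(existing) != set(frame):
            raise ValueError("appended columns must match the existing table")
        for col, values in frame.items():
            existing[col].extend(values)

    def read(self, name: str) -> ColumnDict:
        # Return a copy so callers cannot mutate the stored table.
        return {col: list(values) for col, values in self._tables[name].items()}


store = InMemoryStore()
store.write_replace("sales", {"id": [1], "amount": [9.5]})
store.write_append("sales", {"id": [2], "amount": [3.0]})
print(store.read("sales"))  # {'id': [1, 2], 'amount': [9.5, 3.0]}
store.write_replace("sales", {"id": [7], "amount": [1.0]})
print(store.read("sales"))  # {'id': [7], 'amount': [1.0]}
```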

Installation

```shell
pip install dframeio

# Including pyarrow to read/write parquet files:
pip install "dframeio[parquet]"

# Including PostgreSQL support:
pip install "dframeio[postgres]"
```

Show installed backends:

```python
>>> import dframeio
>>> dframeio.backends
[<class 'dframeio.parquet.ParquetBackend'>]
```
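Which backends show up presumably depends on the optional extras that were installed. One plausible way to build such a list, shown here as a sketch rather than dframeio's actual mechanism, is to check whether each backend's dependency can be imported with `importlib.util.find_spec`. The `available_backends` function and the mapping are hypothetical; the mapping only names `pyarrow`, since the README does not state which driver the PostgreSQL extra installs.

```python
import importlib.util
from typing import Dict, List


def available_backends(requirements: Dict[str, str]) -> List[str]:
    """Hypothetical helper: return backend names whose top-level module is importable."""
    return [
        backend
        for backend, module in requirements.items()
        # find_spec returns None for a missing top-level module instead of raising.
        if importlib.util.find_spec(module) is not None
    ]


# Hypothetical mapping from backend name to the module it needs.
REQUIREMENTS = {"ParquetBackend": "pyarrow"}
print(available_backends(REQUIREMENTS))
```

Checking with `find_spec` avoids importing heavy dependencies just to list them; only top-level module names should be used, because a dotted name with a missing parent package raises `ModuleNotFoundError`.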