
Add parameter to specify file compression for csv files #26

Closed
nitin-kakkar opened this issue Sep 19, 2019 · 1 comment
Labels
enhancement New feature or request

Comments

@nitin-kakkar

Add a file compression parameter when writing CSV/Parquet. For Parquet, compression defaults to snappy (in def write_table in parquet.py), but to_csv has no parameter to specify file compression, i.e., in pandas.py:
```python
def to_csv(
    self,
    dataframe,
    path,
    database=None,
    table=None,
    partition_cols=None,
    preserve_index=True,
    mode="append",
    procs_cpu_bound=None,
    procs_io_bound=None,
):
    ...
```

The request is to add a new parameter for specifying the file compression, e.g. gzip.

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_csv.html
compression: str, default ‘infer’
Compression mode among the following possible values: {‘infer’, ‘gzip’, ‘bz2’, ‘zip’, ‘xz’, None}. If ‘infer’ and path_or_buf is path-like, then detect compression from the following extensions: ‘.gz’, ‘.bz2’, ‘.zip’ or ‘.xz’. (otherwise no compression).
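For reference, a minimal sketch of the pandas behavior quoted above: DataFrame.to_csv with an explicit compression="gzip" and with the default "infer" mode, which picks gzip from the ".gz" suffix. The file names and sample data are illustrative assumptions. Note that this writes to local disk, which is the code path the library cannot rely on when targeting S3.

```python
import pandas as pd

# Illustrative sample data (assumption, not from the issue).
df = pd.DataFrame({"id": [1, 2, 3], "name": ["a", "b", "c"]})

# Explicit compression: gzip is used regardless of the file name.
df.to_csv("explicit.csv.gz", index=False, compression="gzip")

# Default compression="infer": gzip is chosen from the ".gz" suffix.
df.to_csv("inferred.csv.gz", index=False)

# A gzip stream always begins with the magic bytes 0x1f 0x8b.
with open("inferred.csv.gz", "rb") as f:
    starts_with_gzip_magic = f.read(2) == b"\x1f\x8b"

# read_csv also infers the compression from the extension by default.
round_trip = pd.read_csv("explicit.csv.gz")
print(starts_with_gzip_magic, round_trip.equals(df))
```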

@igorborgest added the enhancement (New feature or request) label on Sep 19, 2019
@igorborgest self-assigned this on Sep 19, 2019
@igorborgest
Contributor

Hey @nitin-kakkar, thank you for opening this issue.

I've added compression options for Pandas.to_parquet and prepared the structures to do the same for Pandas.to_csv in the future.
For now, however, the latter is blocked by a Pandas limitation.
Since we aim to write compressed files directly to S3, we must wait until Pandas supports writing compressed files to in-memory buffers (instead of only to disk).
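One possible workaround for the in-memory limitation, sketched under assumptions: serialize the CSV to a string, compress it with Python's standard gzip module, and upload the resulting bytes. The helper csv_to_gzip_bytes, the bucket name, and the object key are hypothetical and not part of this library; the S3 upload appears only as a comment using boto3's put_object.

```python
import gzip
import io

import pandas as pd


def csv_to_gzip_bytes(dataframe: pd.DataFrame, **to_csv_kwargs) -> bytes:
    """Hypothetical helper: serialize a DataFrame to CSV and gzip it in memory."""
    # With no path argument, to_csv returns the CSV text as a str.
    csv_text = dataframe.to_csv(**to_csv_kwargs)
    return gzip.compress(csv_text.encode("utf-8"))


df = pd.DataFrame({"id": [1, 2, 3], "name": ["a", "b", "c"]})
payload = csv_to_gzip_bytes(df, index=False)

# The bytes could then be uploaded as-is (bucket and key are hypothetical):
# boto3.client("s3").put_object(Bucket="my-bucket", Key="data/part-0.csv.gz", Body=payload)

# Round-trip check: read the compressed bytes back from an in-memory buffer.
restored = pd.read_csv(io.BytesIO(payload), compression="gzip")
print(restored.equals(df))
```

This keeps the whole compressed object in memory, so it suits moderately sized partitions. Later pandas releases (1.2.0 and up, per the pandas release notes) also accept compression when to_csv writes to a binary file handle such as io.BytesIO, which may remove the need for the manual gzip step.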

jaidisido added a commit that referenced this issue Aug 10, 2023
Signed-off-by: Abdel Jaidi <jaidisido@gmail.com>

2 participants