[C++] Enable CSV Writer to append / overwrite existing file #30429
Comments
Weston Pace / @westonpace: That being said I can see the potential advantages when it comes to CSV. CC @pitrou |
Antoine Pitrou / @pitrou: |
Dragoș Moldovan-Grünfeld / @dragosmg: Any thoughts @nealrichardson, |
Neal Richardson / @nealrichardson:
> Do we have to emulate everything that's provided by another R library?
Absolutely not, and likewise with pandas or any other package. But for every feature they have, there's a reason it exists, and we should evaluate whether it seems like a good reason, or at least decide to wait until someone asks for it. |
We can, it's just that it's less useful :-) Nobody opposed when we started deprecating it, but we can un-deprecate if desired. |
Neal Richardson / @nealrichardson: |
Weston Pace / @westonpace: But the answer is very confusing to users. The parquet format page has confused many people with this line:
Spark further confuses the picture with "SaveMode.Append" which is documented as:
But what is actually happening is that it is either reading in the file and rewriting it, or creating a new file in the same "dataset" (I don't recall off the top of my head which of the two it is).
So it has been useful for me to be able to parrot a simple line: "No. You cannot append to an existing file. The preferred operation is to create a new file in the same dataset. If you are doing many small writes then you can concatenate them in memory or you can periodically merge files after they are written."
So I guess I worry about the slippery slope. "Users might sometimes want to append data, so let's add that to the filesystem" leads to "Users want to be able to append to CSV files", which leads to "We should add an append mode to write_dataset since there is at least one format that supports it", which leads to further confusing users.
I won't stand in the way of adding append to CSV if wanted, but I would be pretty stubborn about adding append to write_dataset.
[1] https://stackoverflow.com/questions/44608076/can-you-append-to-a-feather-format |
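As a rough illustration of the "create a new file in the same dataset, then periodically merge" pattern described above, here is a minimal sketch using only the Python standard library. CSV files stand in for whatever format the dataset uses, and the helper names `write_part` and `compact`, the `part-*.csv` naming scheme, and the directory layout are all assumptions made for this example, not part of Arrow's API.

```python
import csv
import glob
import os
import uuid


def write_part(rows, header, dataset_dir):
    """Write rows as a brand-new file in dataset_dir (hypothetical helper)."""
    os.makedirs(dataset_dir, exist_ok=True)
    # A random name avoids clobbering earlier parts; "x" fails if the file exists.
    path = os.path.join(dataset_dir, f"part-{uuid.uuid4().hex}.csv")
    with open(path, "x", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)
    return path


def compact(dataset_dir, header):
    """Merge all existing parts into one new part, then delete the old ones."""
    parts = sorted(glob.glob(os.path.join(dataset_dir, "part-*.csv")))
    rows = []
    for part in parts:
        with open(part, newline="") as f:
            reader = csv.reader(f)
            next(reader, None)  # skip each part's header row
            rows.extend(reader)
    merged = write_part(rows, header, dataset_dir)
    for part in parts:
        os.remove(part)
    return merged
```

No existing file is ever modified in place: each write produces a new part, and compaction replaces many small parts with one larger one.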
This would be a match for the readr::write_csv() append argument: boolean. If FALSE, will overwrite existing file. If TRUE, will append to existing file. In both cases, if the file doesn't exist, a new file is created.
Reporter: Dragoș Moldovan-Grünfeld / @dragosmg
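The requested semantics can be sketched with the Python standard library, purely to make the three cases concrete (overwrite, append, and create when missing). The function name `write_csv` and its signature are hypothetical and are not the Arrow C++ or readr API. Writing the header only when the target file is missing or empty is a choice made for this sketch; the issue does not specify header handling.

```python
import csv
import os
import tempfile


def write_csv(rows, header, path, append=False):
    """Write rows to path; hypothetical helper mirroring the proposed flag."""
    # Decide on the header before opening, since opening creates the file.
    # Assumption of this sketch: skip the header when appending to existing data.
    need_header = (
        not append
        or not os.path.exists(path)
        or os.path.getsize(path) == 0
    )
    # "a" appends to an existing file and "w" truncates it.
    # Both modes create the file if it does not exist yet.
    mode = "a" if append else "w"
    with open(path, mode, newline="") as f:
        writer = csv.writer(f)
        if need_header:
            writer.writerow(header)
        writer.writerows(rows)


# Usage: the first call creates the file, the second appends a row to it,
# and the third (append=False) replaces the contents entirely.
demo_path = os.path.join(tempfile.mkdtemp(), "out.csv")
write_csv([[1, "a"]], ["id", "val"], demo_path)
write_csv([[2, "b"]], ["id", "val"], demo_path, append=True)
write_csv([[3, "c"]], ["id", "val"], demo_path)
```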
Related issues:
Note: This issue was originally created as ARROW-14904. Please see the migration documentation for further details.