Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ARROW-3838: [Rust] Implement CSV Writer #3111

Closed
wants to merge 5 commits into
base: master
from

Conversation

Projects
None yet
4 participants
@andygrove
Copy link
Contributor

andygrove commented Dec 6, 2018

No description provided.

w: BufWriter<File>,
}

macro_rules! write_primitive_array {

This comment has been minimized.

@sunchao

sunchao Dec 6, 2018

Contributor

Can we avoid this macro definition? maybe we can just make it generic over element type?

This comment has been minimized.

@andygrove

andygrove Dec 7, 2018

Author Contributor

Sure. I wrote this before the recent changes to generics ... will refactor this over the weekend.

for row_index in 0..batch.num_rows() {
for col_index in 0..batch.num_columns() {
if col_index > 0 {
self.w.write(",".as_bytes()).unwrap();

This comment has been minimized.

@sunchao

sunchao Dec 6, 2018

Contributor

seems we are ignoring possible errors in many places. Should we surface these errors instead of unwrap?

let file = File::create("/tmp/uk_cities.csv").unwrap();

let mut writer = Writer::new(file);
writer.write(&batch);

This comment has been minimized.

@sunchao

sunchao Dec 6, 2018

Contributor

Should we check whether result is what we expected and is in the correct format?

@paddyhoran
Copy link
Contributor

paddyhoran left a comment

Github is not letting me comment for some reason:
Nits:

  • line 44 - this can be Self instead of Writer
  • some basic doc-comments

I'm sure you have noticed but this fails on windows due to the temporary directory you are using in the tests. Maybe we could switch to use something like tempfile.

I agree with @sunchao that it would be ideal to remove the macro.

Right now this does not support nulls, but array.value assumes you have checked for nulls. If we want to support nulls we have to decide how we want them represented in csv's so maybe for now explicitly omit support for nulls in the writer. Check the null count and panic and we can follow up with null support for the csv reader and writer.

@nevi-me

This comment has been minimized.

Copy link
Contributor

nevi-me commented Dec 8, 2018

Should we support passing options to the writer, such as:

  • a custom delimiter
  • whether to write headers or not
  • (future) how to serialise datetime

Also, what happens if one tries to write a struct type?

@wesm wesm force-pushed the apache:master branch from 3088183 to 0c6b2d2 Feb 18, 2019

@andygrove andygrove closed this Mar 2, 2019

@andygrove andygrove deleted the andygrove:ARROW-3838 branch Mar 2, 2019

andygrove added a commit that referenced this pull request Mar 4, 2019

ARROW-3838: [Rust] CSV Writer
This supersedes #3111.

Supports:
* writing using the `csv` crate
* customising delimiter, and whether or not to write headers

Limitations:
* writes to `std::fs::File`, I struggled to make it write to any `std::io::Write` interface. @paddyhoran any ideas as you did it for `Reader`?
* values are converted first to `String` before being written. It could be more performant if we could convert directly to byte slice as `csv` crate supports that. I also struggled with this

Potential Further Work:
* writing temporal arrays (after #3726 [ARROW-4386])

Author: Neville Dipale <nevilledips@gmail.com>

Closes #3790 from nevi-me/ARROW-3838 and squashes the following commits:

7839949 <Neville Dipale> try fix tmp file issue
d60d0ce <Neville Dipale> change writer tests to write to target folder
74db488 <Neville Dipale> cargo fmt
1693c9b <Neville Dipale> ARROW-3838:  CSV Writer
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.