ARROW-3490: [R] streaming of arrow objects to streams
This makes `write_record_batch` and `write_table` generic, with dispatch on the stream type.

```r
write_record_batch <- function(x, stream, ...){
  UseMethod("write_record_batch", stream)
}
write_table <- function(x, stream, ...) {
  UseMethod("write_table", stream)
}
```

The `stream` argument can be various things for different use cases:

- an `arrow::ipc::RecordBatchWriter` created with either `record_batch_stream_writer()` or `record_batch_file_writer()`. This is the lowest level: it calls the writer's `$WriteBatch()` or `$WriteTable()` method, depending on what is being streamed
- an `arrow::io::OutputStream`: this first creates an `arrow::ipc::RecordBatchStreamWriter` and streams into it. In particular, this *does not* add the ARROW1 bytes of arrow files.
- an `fs_path` from 📦 `fs`: this opens an `arrow::ipc::RecordBatchFileWriter` and streams to it, so that the file gets the ARROW1 bytes
- a `character`: we just assert that it is of length one and then call the `fs_path` method
- a `raw()`, which is used only for its type: in that case we stream into a byte buffer and return it as a raw vector

Some examples:

``` r
library(arrow)

tbl <- tibble::tibble(
  int = 1:10,
  dbl = as.numeric(1:10),
  lgl = sample(c(TRUE, FALSE, NA), 10, replace = TRUE),
  chr = letters[1:10]
)
batch <- record_batch(tbl)

tf <- tempfile()

# stream the batch to the file
write_record_batch(batch, tf)

# same
write_record_batch(batch, fs::path_abs(tf))

# to an OutputStream
file_stream <- file_output_stream(tf)
write_record_batch(batch, file_stream)
file_stream$Close()

# to a RecordBatchFileWriter
file_stream <- file_output_stream(tf)
file_writer <- record_batch_file_writer(file_stream, batch$schema())
write_record_batch(batch, file_writer)
file_writer$Close()
file_stream$Close()

# get the bytes directly
write_record_batch(batch, raw())
#>   [1] 04 01 00 00 10 00 00 00 00 00 0a 00 0c 00 06 00 05 00 08 00 0a 00 00
#>  [24] 00 00 01 03 00 0c 00 00 00 08 00 08 00 00 00 04 00 08 00 00 00 04 00
#>  [47] 00 00 04 00 00 00 9c 00 00 00 58 00 00 00 2c 00 00 00 04 00 00 00 84
#>  [70] ff ff ff 00 00 01 05 14 00 00 00 0c 00 00 00 04 00 00 00 00 00 00 00
#>  [93] dc ff ff ff 03 00 00 00 63 68 72 00 a8 ff ff ff 00 00 01 06 18 00 00
#> [116] 00 10 00 00 00 04 00 00 00 00 00 00 00 04 00 04 00 04 00 00 00 03 00
#> [139] 00 00 6c 67 6c 00 d0 ff ff ff 00 00 01 03 20 00 00 00 14 00 00 00 04
#> [162] 00 00 00 00 00 00 00 00 00 06 00 08 00 06 00 06 00 00 00 00 00 02 00
#> [185] 03 00 00 00 64 62 6c 00 10 00 14 00 08 00 06 00 07 00 0c 00 00 00 10
#> [208] 00 10 00 00 00 00 00 01 02 24 00 00 00 14 00 00 00 04 00 00 00 00 00
#> [231] 00 00 08 00 0c 00 08 00 07 00 08 00 00 00 00 00 00 01 20 00 00 00 03
#> [254] 00 00 00 69 6e 74 00 00 00 00 00 2c 01 00 00 14 00 00 00 00 00 00 00
#> [277] 0c 00 16 00 06 00 05 00 08 00 0c 00 0c 00 00 00 00 03 03 00 18 00 00
#> [300] 00 c8 00 00 00 00 00 00 00 00 00 0a 00 18 00 0c 00 04 00 08 00 0a 00
#> [323] 00 00 ac 00 00 00 10 00 00 00 0a 00 00 00 00 00 00 00 00 00 00 00 09
#> [346] 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
#> [369] 00 00 00 00 28 00 00 00 00 00 00 00 28 00 00 00 00 00 00 00 00 00 00
#> [392] 00 00 00 00 00 28 00 00 00 00 00 00 00 50 00 00 00 00 00 00 00 78 00
#> [415] 00 00 00 00 00 00 08 00 00 00 00 00 00 00 80 00 00 00 00 00 00 00 08
#> [438] 00 00 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
#> [461] 88 00 00 00 00 00 00 00 30 00 00 00 00 00 00 00 b8 00 00 00 00 00 00
#> [484] 00 10 00 00 00 00 00 00 00 00 00 00 00 04 00 00 00 0a 00 00 00 00 00
#> [507] 00 00 00 00 00 00 00 00 00 00 0a 00 00 00 00 00 00 00 00 00 00 00 00
#> [530] 00 00 00 0a 00 00 00 00 00 00 00 03 00 00 00 00 00 00 00 0a 00 00 00
#> [553] 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 01 00 00 00 02 00 00
#> [576] 00 03 00 00 00 04 00 00 00 05 00 00 00 06 00 00 00 07 00 00 00 08 00
#> [599] 00 00 09 00 00 00 0a 00 00 00 00 00 00 00 00 00 f0 3f 00 00 00 00 00
#> [622] 00 00 40 00 00 00 00 00 00 08 40 00 00 00 00 00 00 10 40 00 00 00 00
#> [645] 00 00 14 40 00 00 00 00 00 00 18 40 00 00 00 00 00 00 1c 40 00 00 00
#> [668] 00 00 00 20 40 00 00 00 00 00 00 22 40 00 00 00 00 00 00 24 40 6b 03
#> [691] 00 00 00 00 00 00 22 01 00 00 00 00 00 00 00 00 00 00 01 00 00 00 02
#> [714] 00 00 00 03 00 00 00 04 00 00 00 05 00 00 00 06 00 00 00 07 00 00 00
#> [737] 08 00 00 00 09 00 00 00 0a 00 00 00 00 00 00 00 61 62 63 64 65 66 67
#> [760] 68 69 6a 00 00 00 00 00 00 00 00 00 00
```

Created on 2018-10-12 by the [reprex package](https://reprex.tidyverse.org) (v0.2.1.9000)

Author: Romain Francois <romain@purrple.cat>

Closes #2749 from romainfrancois/ARROW-3490/stream-2 and squashes the following commits:

ce4ec06 <Romain Francois> type promotion for types that do not exist in R
338f75f <Romain Francois> More flexible read_table
5cb8dbd <Romain Francois> more flexible read_record_batch with various dispatch
072b7f0 <Romain Francois> + BufferOutputStream
f301f1e <Romain Francois> ⏪ to write_record_batch, write_table and write_arrow
a17b375 <Romain Francois> Trying less double dispatch
9b9a6b8 <Romain Francois> roxygen
2ae2ab3 <Romain Francois> - to_file and to_stream - write_arrow + stream.data.frame
8d0e581 <Romain Francois> stream.arrow::Table methods
e1f62cc <Romain Francois> R6 arrrow::io::FixedSizeBufferWriter
80ea2b7 <Romain Francois> R6 arrow::io::MockOutputSream
a93933a <Romain Francois> +close_on_exit, local_tempfile
ac20df3 <Romain Francois> + stream
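
For readers unfamiliar with S3 dispatch on an argument other than the first, the pattern described in this commit message can be sketched in base R without the arrow package. Everything here is hypothetical and for illustration only: `write_demo` and its methods are invented names, and base R's `serialize()` stands in for the Arrow IPC writers, so the bytes produced are not in Arrow format.

```r
# Hypothetical generic: dispatches on `sink`, not on the first argument `x`.
write_demo <- function(x, sink, ...) {
  UseMethod("write_demo", sink)
}

# A character is treated as a file path: assert length one, open a
# binary connection, and delegate to the connection method.
write_demo.character <- function(x, sink, ...) {
  stopifnot(length(sink) == 1L)
  con <- file(sink, open = "wb")
  on.exit(close(con))
  write_demo(x, con, ...)
}

# A connection is the lowest level: the bytes are written straight to it.
# (serialize() is a stand-in here, not the Arrow IPC format.)
write_demo.connection <- function(x, sink, ...) {
  writeBin(serialize(x, connection = NULL), sink)
  invisible(x)
}

# raw() is used only for its type: serialize in memory, return the bytes.
write_demo.raw <- function(x, sink, ...) {
  serialize(x, connection = NULL)
}

df <- data.frame(int = 1:3, chr = letters[1:3])

bytes <- write_demo(df, raw())   # bytes returned directly
tf <- tempfile()
write_demo(df, tf)               # same bytes written to a file
```

Passing an empty `raw()` purely as a type tag keeps a single entry point for every destination, at the cost of an argument whose value is never read.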