Commit
ARROW-8311: [C++] Add push style stream format reader
This change adds the following push style reader classes:

  * ipc::MessageEmitter
  * ipc::RecordBatchStreamEmitter

Push style readers don't read data from a stream directly. Instead, users pass them data that has already been read. This style is useful with event driven IO APIs, where we can't read from the stream directly and only receive data that the API has already read, like:

    void on_read(const uint8_t* data, size_t data_size) {
      process_data(data, data_size);
    }
    register_read_event(on_read);
    run_event_loop();

We can't use the current reader API with an event driven IO API, but we can use these push style readers with one.

The current Message reader is changed to use ipc::MessageEmitter internally, so the reader implementation isn't duplicated. Our benchmark shows no performance regression.

Before:

    Running release/arrow-ipc-read-write-benchmark
    Run on (12 X 4600 MHz CPU s)
    CPU Caches:
      L1 Data 32K (x6)
      L1 Instruction 32K (x6)
      L2 Unified 256K (x6)
      L3 Unified 12288K (x1)
    Load Average: 0.85, 0.84, 0.65
    -----------------------------------------------------------------------------------------
    Benchmark                              Time           CPU  Iterations UserCounters...
    -----------------------------------------------------------------------------------------
    ReadRecordBatch/1/real_time          886 ns        886 ns      774286 bytes_per_second=1102.15G/s
    ReadRecordBatch/4/real_time         1601 ns       1601 ns      436258 bytes_per_second=610.078G/s
    ReadRecordBatch/16/real_time        4819 ns       4820 ns      143568 bytes_per_second=202.663G/s
    ReadRecordBatch/64/real_time       18291 ns      18296 ns       38586 bytes_per_second=53.3893G/s
    ReadRecordBatch/256/real_time      84852 ns      84872 ns        8317 bytes_per_second=11.5091G/s
    ReadRecordBatch/1024/real_time    341091 ns     341168 ns        2049 bytes_per_second=2.86306G/s
    ReadRecordBatch/4096/real_time   1368049 ns    1368361 ns         511 bytes_per_second=730.968M/s
    ReadRecordBatch/8192/real_time   2676778 ns    2677341 ns         265 bytes_per_second=373.584M/s

After:

    Running release/arrow-ipc-read-write-benchmark
    Run on (12 X 4600 MHz CPU s)
    CPU Caches:
      L1 Data 32K (x6)
      L1 Instruction 32K (x6)
      L2 Unified 256K (x6)
      L3 Unified 12288K (x1)
    Load Average: 0.88, 0.85, 0.66
    -----------------------------------------------------------------------------------------
    Benchmark                              Time           CPU  Iterations UserCounters...
    -----------------------------------------------------------------------------------------
    ReadRecordBatch/1/real_time          891 ns        891 ns      769579 bytes_per_second=1095.57G/s
    ReadRecordBatch/4/real_time         1599 ns       1599 ns      435756 bytes_per_second=610.746G/s
    ReadRecordBatch/16/real_time        4834 ns       4835 ns      144374 bytes_per_second=202.027G/s
    ReadRecordBatch/64/real_time       18204 ns      18206 ns       38190 bytes_per_second=53.6465G/s
    ReadRecordBatch/256/real_time      84142 ns      84154 ns        8309 bytes_per_second=11.6061G/s
    ReadRecordBatch/1024/real_time    343105 ns     343148 ns        2035 bytes_per_second=2.84625G/s
    ReadRecordBatch/4096/real_time   1399287 ns    1399484 ns         511 bytes_per_second=714.65M/s
    ReadRecordBatch/8192/real_time   2641529 ns    2641845 ns         263 bytes_per_second=378.569M/s

Closes #6804 from kou/cpp-record-batch-emitter

Authored-by: Sutou Kouhei <kou@clear-code.com>
Signed-off-by: Wes McKinney <wesm+git@apache.org>
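
The push style described in this commit message can be illustrated with a toy decoder. This is only a sketch under stated assumptions, not the ipc::MessageEmitter implementation or its API: the class name PushDecoder, the Consume() method, and the framing (a 4-byte little-endian length followed by the payload) are all hypothetical and chosen for brevity. It shows the core idea: the caller hands over bytes it has already read, in chunks of any size, and the decoder invokes a callback whenever a full message has accumulated.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical push-style decoder for illustration only; NOT the Arrow API.
// Assumed framing: 4-byte little-endian payload length, then the payload.
class PushDecoder {
 public:
  using Callback = std::function<void(const std::string&)>;

  explicit PushDecoder(Callback on_message)
      : on_message_(std::move(on_message)) {}

  // Feed bytes that the caller has already read. Chunk boundaries need not
  // line up with message boundaries.
  void Consume(const uint8_t* data, size_t size) {
    assert(data != nullptr || size == 0);
    buffer_.insert(buffer_.end(), data, data + size);

    size_t offset = 0;
    // Emit every message that is now complete.
    while (buffer_.size() - offset >= kHeaderSize) {
      const uint32_t length = DecodeLength(buffer_.data() + offset);
      if (buffer_.size() - offset - kHeaderSize < length) {
        break;  // Payload is still incomplete; wait for more data.
      }
      const char* payload = reinterpret_cast<const char*>(
          buffer_.data() + offset + kHeaderSize);
      on_message_(std::string(payload, length));
      offset += kHeaderSize + length;
    }
    // Keep only the unconsumed tail for the next call.
    buffer_.erase(buffer_.begin(),
                  buffer_.begin() + static_cast<std::ptrdiff_t>(offset));
  }

 private:
  static constexpr size_t kHeaderSize = 4;

  // Decode a 4-byte little-endian unsigned integer.
  static uint32_t DecodeLength(const uint8_t* p) {
    return static_cast<uint32_t>(p[0]) |
           (static_cast<uint32_t>(p[1]) << 8) |
           (static_cast<uint32_t>(p[2]) << 16) |
           (static_cast<uint32_t>(p[3]) << 24);
  }

  Callback on_message_;
  std::vector<uint8_t> buffer_;
};
```

In an event driven program, an on_read(data, data_size) handler like the one in the commit message would simply forward its arguments to Consume(). Buffering the unconsumed tail keeps the sketch short; a real implementation would likely avoid the extra copy.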