feat: Allow reading from stdin with schema inference #10

Piped input does not support `Seek` out of the box, but `Seek` is required to infer the schema. To work around this, we buffer the input if and only if the input file does not support seeking. Only the lines actually used to infer the schema are buffered, so files larger than memory can still be read.

This works because the arrow crate only seeks twice:

1. At the start, to check whether seeking is supported.
2. After schema inference, to reset to the start of the file.

The seekable buffer wrapper is only used when necessary, so there should be no performance penalty for currently supported use cases.

Use cases:

```sh
cat test.csv | csv2parquet /dev/stdin test.parquet
zstdcat test.csv.zst | csv2parquet /dev/stdin test.parquet
```

Resolves domoritz#3

feat: refactor SeekableReader into arrow-tools lib crate

Also refactor schema matching to make it less verbose by using `map_err` instead of `match`; see json2parquet for before/after.
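
The buffering approach described above can be sketched as a wrapper that records every byte read from the non-seekable input until the first rewind, then replays that prefix before streaming the rest. This is a minimal illustrative sketch, not the actual `SeekableReader` in arrow-tools. The field names are hypothetical, and it assumes the two seeks arrow issues are a position query (`SeekFrom::Current(0)`) followed by a rewind (`SeekFrom::Start(0)`). Any other seek is rejected.

```rust
use std::io::{self, Read, Seek, SeekFrom};

/// Hypothetical sketch: makes a non-seekable reader (e.g. a pipe) support
/// exactly one rewind to the start by buffering what was read before it.
pub struct SeekableReader<R: Read> {
    inner: R,
    buffer: Vec<u8>, // bytes read from `inner` before the rewind
    pos: usize,      // replay position within `buffer` after the rewind
    recording: bool, // true until the single `SeekFrom::Start(0)`
}

impl<R: Read> SeekableReader<R> {
    pub fn new(inner: R) -> Self {
        SeekableReader { inner, buffer: Vec::new(), pos: 0, recording: true }
    }
}

impl<R: Read> Read for SeekableReader<R> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        if self.recording {
            // Schema inference phase: pass data through and keep a copy.
            let n = self.inner.read(buf)?;
            self.buffer.extend_from_slice(&buf[..n]);
            Ok(n)
        } else if self.pos < self.buffer.len() {
            // After the rewind: replay the buffered prefix first.
            let n = buf.len().min(self.buffer.len() - self.pos);
            buf[..n].copy_from_slice(&self.buffer[self.pos..self.pos + n]);
            self.pos += n;
            if self.pos == self.buffer.len() {
                // Prefix fully replayed: release the memory.
                self.buffer = Vec::new();
                self.pos = 0;
            }
            Ok(n)
        } else {
            // Prefix exhausted: stream directly, so memory use stays bounded.
            self.inner.read(buf)
        }
    }
}

impl<R: Read> Seek for SeekableReader<R> {
    fn seek(&mut self, pos: SeekFrom) -> io::Result<u64> {
        match pos {
            // Position query (assumed first seek): report bytes consumed so far.
            SeekFrom::Current(0) if self.recording => Ok(self.buffer.len() as u64),
            // Rewind after inference (assumed second seek): start replaying.
            SeekFrom::Start(0) if self.recording => {
                self.recording = false;
                self.pos = 0;
                Ok(0)
            }
            _ => Err(io::Error::new(
                io::ErrorKind::Unsupported,
                "SeekableReader only supports a single rewind to the start",
            )),
        }
    }
}
```

Only the bytes consumed during inference are held in memory, which matches the claim that files larger than memory still work. To keep the wrapper off the fast path, a caller could first try seeking the real file handle and only wrap it when that fails, as it does for a pipe. That check is an assumption about how the wrapper is selected, not a description of the PR's code.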

Commits on Mar 6, 2023

Commits on Apr 7, 2023