Written Parquet file way bigger than input files #1627

Closed
stevenliebregt opened this issue Apr 28, 2022 · 3 comments
Labels
question Further information is requested

Comments

@stevenliebregt

Which part is this question about
The parquet file writer usage

Describe your question
Hi, I'm looking into whether parquet and arrow could fit a use case of mine, but I've run into a strange issue for which I can find no answer in the documentation. I have two input files in txt format, where each record spans 4 lines. I have a parser that reads them just fine, and I want to convert that format to a Parquet file. The two input files are around 600MB combined, but when I write them to a Parquet file, the resulting file is nearly 5GB, and the process also consumes around 6-7GB of memory while writing. I have turned on compression.

use std::sync::Arc;

use parquet::basic::Compression;
use parquet::file::properties::WriterProperties;
use parquet::schema::parser::parse_message_type;

// Schema: every record is four required UTF-8 string fields.
let message_type = "
    message Schema {
        REQUIRED BINARY id (UTF8);
        REQUIRED BINARY header (UTF8);
        REQUIRED BINARY sequence (UTF8);
        REQUIRED BINARY quality (UTF8);
    }
";

let schema = Arc::new(parse_message_type(message_type).unwrap());
let props = Arc::new(
    WriterProperties::builder()
        .set_compression(Compression::SNAPPY)
        .build(),
);

This is my Rust configuration for the writer.
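For context, a minimal sketch of how such a schema and properties are typically handed to the crate's SerializedFileWriter (the file name is a placeholder and the per-column write loop is only outlined, since the row-group API details vary between crate versions):

use std::fs::File;
use parquet::file::writer::SerializedFileWriter;

// `schema` and `props` are the Arc-wrapped values built above.
let file = File::create("records.parquet").unwrap();
let mut writer = SerializedFileWriter::new(file, schema, props).unwrap();

// For each batch of parsed records: start a row group with
// writer.next_row_group(), write the id / header / sequence / quality
// values through the per-column writers, then close the row group.

writer.close().unwrap();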

@stevenliebregt added the question label on Apr 28, 2022
@tustvold
Contributor

Some ideas to try:

  • Disable dictionary encoding for columns that don't have repeated values
  • Use writer version 2, which has better string encoding
  • Represent the id / sequence as an integral type instead of a variable length string
  • Try without snappy, as compression may not always yield benefits
  • Maybe try writing the data using something like pyarrow to determine if this is something specific to the Rust implementation

Without the data it is hard to say for sure what is going on, but ignoring compression, Parquet has at least a 4-byte length overhead per string value, so with lots of small strings that overhead adds up quickly.
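A rough sketch of what those property tweaks could look like with the parquet crate's WriterProperties builder (the column paths match the schema above; treat this as a sketch to adapt, not a drop-in fix):

use std::sync::Arc;

use parquet::basic::Compression;
use parquet::file::properties::{WriterProperties, WriterVersion};
use parquet::schema::types::ColumnPath;

let props = Arc::new(
    WriterProperties::builder()
        // Writer version 2 enables the newer data page format and encodings.
        .set_writer_version(WriterVersion::PARQUET_2_0)
        // Dictionary encoding only pays off when values repeat, so turn it
        // off for columns that are effectively unique.
        .set_column_dictionary_enabled(ColumnPath::from("id"), false)
        .set_column_dictionary_enabled(ColumnPath::from("sequence"), false)
        // Compare file sizes with and without snappy.
        .set_compression(Compression::UNCOMPRESSED)
        .build(),
);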

@stevenliebregt
Author

Thanks for the answer, I'll give those ideas a try. If I find it's a problem specific to Rust, I'll create an issue.

@Dandandan
Contributor

Also, try zstd, which often gives quite a bit better compression than snappy.
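For example, assuming the parquet crate is built with its zstd feature enabled (newer crate versions take a level argument, e.g. Compression::ZSTD(ZstdLevel::default())):

use parquet::basic::Compression;
use parquet::file::properties::WriterProperties;

let props = WriterProperties::builder()
    .set_compression(Compression::ZSTD)
    .build();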
