RFile writes should utilize multiple threads #4124

@FineAndDandy

Description

Is your feature request related to a problem? Please describe.
Write operations to an RFile are serialized. When writing large RFiles in MapReduce jobs, this can produce very long tails on the jobs. The bottleneck is often compression rather than I/O.

Describe the solution you'd like
Using multiple threads to process multiple blocks in parallel could dramatically improve write performance. A dedicated thread would still be needed to write completed blocks in order, but that should be feasible. The degree of parallelism could be scaled based on the memory available for buffering.
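
To illustrate the idea, here is a minimal sketch rather than a proposed patch. It assumes blocks can be compressed independently, uses `GZIPOutputStream` as a stand-in for whatever codec the RFile is configured with, and lets the calling thread act as the in-order writer. The class name `ParallelBlockWriterSketch`, the method `writeBlocks`, and the `maxInFlight` parameter are all hypothetical and do not exist in Accumulo.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.zip.GZIPOutputStream;

// Hypothetical sketch: not Accumulo's RFile writer and not its on-disk format.
public class ParallelBlockWriterSketch {

  // Compress a single block on its own; gzip is only a stand-in codec here.
  static byte[] compress(byte[] block) throws IOException {
    ByteArrayOutputStream buf = new ByteArrayOutputStream();
    try (GZIPOutputStream gz = new GZIPOutputStream(buf)) {
      gz.write(block);
    }
    return buf.toByteArray();
  }

  // Compress blocks on a thread pool, but write them in submission order.
  // maxInFlight bounds how many blocks are buffered in memory at once.
  static void writeBlocks(Iterable<byte[]> blocks, OutputStream out, int threads,
      int maxInFlight) throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    Queue<Future<byte[]>> inFlight = new ArrayDeque<>();
    try {
      for (byte[] block : blocks) {
        if (inFlight.size() >= maxInFlight) {
          // Wait for the oldest block so output order matches input order.
          out.write(inFlight.remove().get());
        }
        inFlight.add(pool.submit(() -> compress(block)));
      }
      // Drain the remaining blocks, still oldest first.
      while (!inFlight.isEmpty()) {
        out.write(inFlight.remove().get());
      }
    } finally {
      pool.shutdownNow();
    }
  }
}
```

Keeping the futures in a FIFO queue preserves block order without any extra bookkeeping, and `maxInFlight` is the knob that would be tuned to the memory available for buffering.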

Describe alternatives you've considered
Adding pipelining to the existing code could be a smaller change and could still yield a significant performance improvement.
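
One way to read "pipelining" here is a two-stage pipeline in which the existing thread keeps compressing blocks while a second thread writes them out, so compression and I/O overlap. The sketch below is only an illustration under that assumption. `PipelinedWriterSketch`, `writeBlocks`, and `queueCapacity` are hypothetical names, and gzip again stands in for the real codec.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicReference;
import java.util.zip.GZIPOutputStream;

// Hypothetical sketch: a two-stage compress/write pipeline, not Accumulo code.
public class PipelinedWriterSketch {

  // Sentinel that tells the writer thread there are no more blocks.
  private static final byte[] END = new byte[0];

  // Compress a single block on its own; gzip is only a stand-in codec here.
  static byte[] compress(byte[] block) throws IOException {
    ByteArrayOutputStream buf = new ByteArrayOutputStream();
    try (GZIPOutputStream gz = new GZIPOutputStream(buf)) {
      gz.write(block);
    }
    return buf.toByteArray();
  }

  // The caller compresses; a separate thread writes. The bounded queue
  // limits how many compressed blocks wait in memory.
  static void writeBlocks(Iterable<byte[]> blocks, OutputStream out, int queueCapacity)
      throws Exception {
    BlockingQueue<byte[]> compressed = new ArrayBlockingQueue<>(queueCapacity);
    AtomicReference<IOException> failure = new AtomicReference<>();

    Thread writer = new Thread(() -> {
      try {
        while (true) {
          byte[] b = compressed.take();
          if (b == END) {
            return;
          }
          // After a failure, keep draining so the producer never blocks forever.
          if (failure.get() == null) {
            try {
              out.write(b);
            } catch (IOException e) {
              failure.set(e);
            }
          }
        }
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    });
    writer.start();

    try {
      for (byte[] block : blocks) {
        compressed.put(compress(block));
      }
      compressed.put(END);
    } catch (Exception e) {
      writer.interrupt(); // stop the writer if the producer side fails
      throw e;
    }
    writer.join();
    if (failure.get() != null) {
      throw failure.get();
    }
  }
}
```

With a single compressing thread this only hides I/O latency behind compression, so it would help less than parallel compression when compression itself is the bottleneck.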

Labels: enhancement (this issue describes a new feature, improvement, or optimization)
