Open
Labels
enhancement: This issue describes a new feature, improvement, or optimization.
Description
Is your feature request related to a problem? Please describe.
Write operations to an rfile are serialized. When writing large rfiles in MapReduce jobs, this can produce very long tails for the jobs. The bottleneck is often compression rather than I/O.
Describe the solution you'd like
Using multiple threads to process multiple blocks in parallel could dramatically improve write performance. A dedicated thread would still be needed to write completed blocks in order, but that should be feasible. The number of blocks in flight could be scaled based on the memory available for buffering.
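To make the idea concrete, here is a rough sketch of the shape this could take. It is not Accumulo code: it is written in Python for brevity, `zlib` stands in for whatever codec the rfile is configured with, and the names `write_blocks_parallel`, `read_blocks`, `workers`, and `max_in_flight` are all hypothetical. The 4-byte length prefix is only there so the example can be read back. Here the calling thread plays the role of the ordered writer, and `max_in_flight` is the knob that would be tied to available buffer memory.

```python
import io
import zlib
from collections import deque
from concurrent.futures import ThreadPoolExecutor


def _write_block(out, compressed):
    # Hypothetical framing: 4-byte big-endian length, then the compressed bytes.
    out.write(len(compressed).to_bytes(4, "big"))
    out.write(compressed)
    return len(compressed)


def write_blocks_parallel(blocks, out, workers=4, max_in_flight=8):
    """Compress blocks on a thread pool, but write them in their original order."""
    pending = deque()  # futures in submission order
    sizes = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for block in blocks:
            pending.append(pool.submit(zlib.compress, block))
            # Bound buffered memory: once the window is full, wait for the
            # oldest block and write it before submitting more work.
            if len(pending) >= max_in_flight:
                sizes.append(_write_block(out, pending.popleft().result()))
        # Drain whatever is still in flight, still in order.
        while pending:
            sizes.append(_write_block(out, pending.popleft().result()))
    return sizes


def read_blocks(data):
    """Read back the length-prefixed blocks (only used to check the sketch)."""
    pos = 0
    while pos < len(data):
        n = int.from_bytes(data[pos:pos + 4], "big")
        pos += 4
        yield zlib.decompress(data[pos:pos + n])
        pos += n


if __name__ == "__main__":
    blocks = [bytes([i]) * 100_000 for i in range(32)]
    out = io.BytesIO()
    write_blocks_parallel(blocks, out, workers=4, max_in_flight=8)
    assert list(read_blocks(out.getvalue())) == blocks
```

Because the writer always waits on the oldest outstanding block, the output order matches the input order no matter which compression task finishes first, and at most `max_in_flight` blocks are buffered at any time.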
Describe alternatives you've considered
Adding pipelining to the existing code could be a smaller lift and still give a big performance improvement.
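One possible reading of this alternative, sketched under the same caveats (Python for brevity, `zlib` as a stand-in codec, nothing here is Accumulo API): keep a single compression thread, but hand its output to the writer through a small bounded queue so that compressing block N+1 overlaps with writing block N. The names `write_blocks_pipelined` and `queue_size` are hypothetical.

```python
import io
import queue
import threading
import zlib

_DONE = object()  # sentinel marking the end of the stream


def write_blocks_pipelined(blocks, out, queue_size=2):
    """Two-stage pipeline: one thread compresses, the caller writes in order."""
    compressed = queue.Queue(maxsize=queue_size)  # bounded, so memory stays bounded
    errors = []

    def compress_stage():
        try:
            for block in blocks:
                compressed.put(zlib.compress(block))
        except Exception as exc:  # surface compression failures to the caller
            errors.append(exc)
        finally:
            compressed.put(_DONE)

    worker = threading.Thread(target=compress_stage, daemon=True)
    worker.start()

    written = 0
    while True:
        item = compressed.get()
        if item is _DONE:
            break
        # Hypothetical framing: 4-byte big-endian length, then the bytes.
        out.write(len(item).to_bytes(4, "big"))
        out.write(item)
        written += 1

    worker.join()
    if errors:
        raise errors[0]
    return written


if __name__ == "__main__":
    out = io.BytesIO()
    count = write_blocks_pipelined([b"a" * 50_000, b"b" * 50_000], out)
    assert count == 2
```

With a single compression stage, ordering comes for free from the FIFO queue. The trade-off is that this only hides write latency behind compression; it does not speed up compression itself the way multiple worker threads would.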