Provide indication of intermediate compactions to CompactionConfigurer. #3937

@keith-turner

Is your feature request related to a problem? Please describe.

A user compaction of a tablet can be scheduled over multiple compaction jobs. For example, if a tablet has 50 files and a user compaction is initiated for it, then the following could happen.

  1. 50 files are selected for user compaction
  2. A compaction job compacts 30 of the 50 files. On completion, the 30 input files are removed from the selected set and the one output file is added, so there are now 21 files in the selected set.
  3. A compaction job compacts the remaining 21 files, completing the user compaction for the tablet.
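
The bookkeeping in the steps above can be sketched as a small, self-contained simulation. This is not Accumulo code: `SelectedSetSketch`, `pick`, and `compact` are hypothetical names, files are plain strings, and clearing the selected set after the final job is an assumption made for illustration.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical stand-in, not Accumulo code: models only the selected-set bookkeeping.
class SelectedSetSketch {
  private final Set<String> selected = new LinkedHashSet<>();
  private int outputCount = 0;

  SelectedSetSketch(int fileCount) {
    for (int i = 0; i < fileCount; i++) {
      selected.add("F" + i);
    }
  }

  // Returns a copy of the first n files in the selected set, to use as job inputs.
  Set<String> pick(int n) {
    Set<String> inputs = new LinkedHashSet<>();
    for (String file : selected) {
      if (inputs.size() == n) {
        break;
      }
      inputs.add(file);
    }
    return inputs;
  }

  // Runs one compaction job. Returns true if it was the final job of the user compaction.
  boolean compact(Set<String> inputs) {
    if (!selected.containsAll(inputs)) {
      throw new IllegalArgumentException("inputs must come from the selected set");
    }
    // The job is final exactly when its inputs are the whole selected set.
    boolean isFinal = inputs.equals(selected);
    selected.removeAll(inputs);
    if (isFinal) {
      selected.clear(); // assumption: the selected set is dropped once the user compaction completes
    } else {
      selected.add("C" + outputCount++); // the intermediate output file stays selected
    }
    return isFinal;
  }

  int size() {
    return selected.size();
  }

  public static void main(String[] args) {
    SelectedSetSketch tablet = new SelectedSetSketch(50);
    boolean first = tablet.compact(tablet.pick(30));
    System.out.println("final=" + first + " selected=" + tablet.size()); // final=false selected=21
    boolean second = tablet.compact(tablet.pick(tablet.size()));
    System.out.println("final=" + second + " selected=" + tablet.size()); // final=true selected=0
  }
}
```

The first job is intermediate because its 30 inputs are a strict subset of the 50 selected files; the second is final because its 21 inputs equal the selected set.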

If a user sets per-compaction config to use expensive compression, they may not want to apply it to the intermediate compaction in step 2 above, because that compaction produces a short-lived file. Currently, a CompactionConfigurer has no way to know whether a compaction is intermediate or not.

Describe the solution you'd like
Add something to CompactionConfigurer.InputParameters that indicates whether a compaction is intermediate or not. One possible way to do this would be to add a method like the following to InputParameters.

/**
 * If this is a user compaction, returns the selected set of files. For user compactions, when
 * getInputFiles().equals(getSelectedFiles()) is true this is the final compaction of the user
 * compaction; when it is not true this is an intermediate compaction. For system compactions
 * there is no selected set of files, so the empty set is returned.
 */
public Collection<CompactableFile> getSelectedFiles();
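
As a rough sketch of how a configurer might use such a method, the following applies a compression override only to the final compaction of a user compaction. Everything here is a stand-in: `InputParametersSketch` and `CompressionChooser` are hypothetical classes rather than the Accumulo API, files are plain strings instead of CompactableFile, and the property name `table.file.compress.type` with the value `zstd` is assumed purely for illustration. The two collections are compared as sets so the result does not depend on the concrete collection type or ordering.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for InputParameters with the proposed getSelectedFiles() method.
class InputParametersSketch {
  private final Collection<String> inputFiles;
  private final Collection<String> selectedFiles;

  InputParametersSketch(Collection<String> inputFiles, Collection<String> selectedFiles) {
    this.inputFiles = inputFiles;
    this.selectedFiles = selectedFiles;
  }

  Collection<String> getInputFiles() {
    return inputFiles;
  }

  // Empty for system compactions, as the proposed Javadoc describes.
  Collection<String> getSelectedFiles() {
    return selectedFiles;
  }
}

// Hypothetical configurer logic: expensive compression only for the final user compaction.
class CompressionChooser {
  static boolean isFinalUserCompaction(InputParametersSketch params) {
    Collection<String> selected = params.getSelectedFiles();
    if (selected.isEmpty()) {
      return false; // system compaction: there is no selected set
    }
    return new HashSet<>(params.getInputFiles()).equals(new HashSet<>(selected));
  }

  // Returns property overrides for the compaction; empty means "use the table defaults".
  static Map<String, String> overrides(InputParametersSketch params) {
    if (isFinalUserCompaction(params)) {
      // Assumed property name and value, for illustration only.
      return Map.of("table.file.compress.type", "zstd");
    }
    return Map.of();
  }

  public static void main(String[] args) {
    InputParametersSketch intermediate =
        new InputParametersSketch(List.of("A", "B"), List.of("A", "B", "C"));
    InputParametersSketch last =
        new InputParametersSketch(List.of("A", "B", "C"), List.of("A", "B", "C"));
    System.out.println(overrides(intermediate)); // {}
    System.out.println(overrides(last)); // {table.file.compress.type=zstd}
  }
}
```

With this shape, intermediate and system compactions keep the table's default settings, and only the long-lived output of the final compaction pays for the more expensive compression.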

Describe alternatives you've considered

Initially considered adding a method that returns a boolean to indicate whether the compaction is intermediate or not. Making the set of selected files available seemed more generally useful, while still allowing a configurer to determine whether a compaction is intermediate.


Labels

enhancement: This issue describes a new feature, improvement, or optimization.
