Description
Is your feature request related to a problem? Please describe.
A user compaction of a tablet can be spread over multiple compaction jobs. For example, if a tablet has 50 files and a user compaction is initiated for it, then the following could happen.
1. 50 files are selected for user compaction.
2. A compaction job compacts 30 of the 50 files. On completion, the 30 input files are removed from the selected set and the one output file is added, so the selected set now holds 21 files.
3. A compaction job compacts the remaining 21 files, completing the user compaction for the tablet.
If a user sets per-compaction config to use expensive compression, they may not want it applied to the intermediate compaction in step 2 above, because the file it produces is short lived. A CompactionConfigurer currently has no way to know whether a compaction is intermediate or not.
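The bookkeeping in the steps above can be sketched as follows. This is a minimal standalone simulation and not Accumulo code: files are plain strings, and `SelectedSetDemo`, `select`, `firstN`, `complete`, and `isIntermediate` are hypothetical names made up for illustration. It only shows how the selected set shrinks from 50 to 21 files and why comparing a job's inputs with the selected set tells the two jobs apart.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical simulation of the selected-set bookkeeping; not Accumulo code.
final class SelectedSetDemo {

  /** Builds an initial selected set named F1..Fn. */
  static Set<String> select(int n) {
    Set<String> files = new LinkedHashSet<>();
    for (int i = 1; i <= n; i++) {
      files.add("F" + i);
    }
    return files;
  }

  /** Picks the first n files of the selected set as the inputs of one job. */
  static Set<String> firstN(Set<String> selected, int n) {
    Set<String> inputs = new LinkedHashSet<>();
    for (String file : selected) {
      if (inputs.size() == n) {
        break;
      }
      inputs.add(file);
    }
    return inputs;
  }

  /** A job is intermediate when its inputs are not the whole selected set. */
  static boolean isIntermediate(Set<String> inputs, Set<String> selected) {
    return !inputs.equals(selected);
  }

  /** Applies a finished job: inputs leave the selected set, the output joins it. */
  static void complete(Set<String> selected, Set<String> inputs, String output) {
    selected.removeAll(inputs);
    selected.add(output);
  }

  public static void main(String[] args) {
    Set<String> selected = select(50);

    Set<String> job1 = firstN(selected, 30);
    System.out.println("job 1 intermediate: " + isIntermediate(job1, selected));
    complete(selected, job1, "C1");
    System.out.println("selected after job 1: " + selected.size());

    Set<String> job2 = firstN(selected, selected.size());
    System.out.println("job 2 intermediate: " + isIntermediate(job2, selected));
  }
}
```

In this sketch the first job reports itself as intermediate and the second does not, which is the distinction a configurer would need.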
Describe the solution you'd like
Add something to CompactionConfigurer.InputParameters that can indicate if a compaction is intermediate or not. One possible way to do this would be to add a method like the following to InputParameters.
/**
 * If this is a user compaction, then returns the selected set of files. For user compactions, when
 * getInputFiles().equals(getSelectedFiles()) is true, this is the final compaction of the user compaction;
 * when it is not true, this is an intermediate compaction. For system compactions there is no selected
 * set of files, so the empty set is returned.
 */
public Collection<CompactableFile> getSelectedFiles();
Describe alternatives you've considered
Initially considered adding a method that returns a boolean to indicate whether the compaction is intermediate or not. Making the set of selected files available seemed more generally useful, while still letting a configurer determine whether a compaction is intermediate.
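To illustrate how a configurer might use the proposed method, here is a rough sketch under stated assumptions. It does not use the real Accumulo interfaces: `Params` is a hypothetical stand-in for `CompactionConfigurer.InputParameters` with files reduced to strings, and `isFinalUserCompaction`, `overrides`, and `params` are made-up helper names. The property key `table.file.compress.type` and the codec value passed in are assumptions about what a user would want to override. Because `Collection.equals` is not guaranteed to be a set comparison, the sketch copies both collections into sets before comparing.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; Params stands in for the proposed InputParameters additions.
final class CompressionChoiceSketch {

  /** Stand-in for the two methods a configurer would consult. */
  interface Params {
    Collection<String> getInputFiles();

    Collection<String> getSelectedFiles();
  }

  /** Convenience factory for building Params in examples. */
  static Params params(Collection<String> inputs, Collection<String> selected) {
    return new Params() {
      @Override
      public Collection<String> getInputFiles() {
        return inputs;
      }

      @Override
      public Collection<String> getSelectedFiles() {
        return selected;
      }
    };
  }

  /** True only for a user compaction whose inputs are the whole selected set. */
  static boolean isFinalUserCompaction(Params p) {
    Collection<String> selected = p.getSelectedFiles();
    if (selected.isEmpty()) {
      return false; // system compaction: no selected set
    }
    return new HashSet<>(p.getInputFiles()).equals(new HashSet<>(selected));
  }

  /** Returns table property overrides; empty means "use the table defaults". */
  static Map<String, String> overrides(Params p, String expensiveCodec) {
    if (isFinalUserCompaction(p)) {
      // Assumed property key for the file compression type.
      return Map.of("table.file.compress.type", expensiveCodec);
    }
    return Map.of();
  }

  public static void main(String[] args) {
    Params intermediate = params(List.of("F1", "F2"), List.of("F1", "F2", "F3"));
    Params last = params(List.of("F3", "C1"), List.of("C1", "F3"));
    System.out.println(overrides(intermediate, "zstd"));
    System.out.println(overrides(last, "zstd"));
  }
}
```

With a boolean-only method the first branch would look the same, but the selected set also lets a configurer reason about things like how much of the user compaction remains.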