Default Compaction strategy is sub-optimal #1033
Consider a tablet with the following files. If the compaction ratio is 3, then all files meet the criteria for compaction. However, if the max files to compact is 10, then files C4 and F[5-d] will be selected for compaction. This is very suboptimal over time. It would be much better if a subset of files that met the compaction ratio criteria were returned. For example, C[2-4] and F[5-b] could be selected, which is 10 files that meet the ratio criteria. Another possibility is selecting only the F files, which meet the criteria and number fewer than max files.
The problem is that the code finds a set of files that meets the compaction ratio criteria and then takes the 10 smallest files from that set. It would be much better if the code searched for a set of files that meets the ratio criteria and contains at most max files. I think doing this could result in much less work over time.
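To make the difference concrete, here is a rough Python sketch of the two approaches. It is not the actual strategy code. It assumes the ratio criterion is "largest file size times ratio is at most the sum of the sizes in the set", and the function names and file sizes are hypothetical, chosen only for illustration.

```python
def current_selection(sizes, ratio, max_files):
    """Behaviour described above: find a set that meets the ratio,
    then keep only the max_files smallest files from that set."""
    files = sorted(sizes)
    # Drop the largest file until the remaining set meets the ratio.
    while len(files) > 1 and files[-1] * ratio > sum(files):
        files.pop()
    if len(files) < 2:
        return []
    # Truncating here can leave a set that no longer meets the ratio.
    return files[:max_files]


def bounded_selection(sizes, ratio, max_files):
    """Proposed idea: search for a set that meets the ratio AND has
    at most max_files files (one possible policy, not the only one)."""
    files = sorted(sizes)
    # Try each file as the largest member, biggest first. For a fixed
    # largest file, the next-largest neighbours maximize the sum.
    for end in range(len(files) - 1, 0, -1):
        start = max(0, end - max_files + 1)
        window = files[start:end + 1]
        if len(window) >= 2 and files[end] * ratio <= sum(window):
            return window
    return []


# Hypothetical sizes: four large files and nine small ones.
sizes = [100, 100, 100, 100] + [1] * 9
print(current_selection(sizes, ratio=3, max_files=10))
print(bounded_selection(sizes, ratio=3, max_files=10))
```

With these made-up sizes, the first function returns the nine small files plus one large file, a set that does not itself meet the ratio (300 > 109). The second returns four large and six small files, which does (300 <= 406). Which qualifying set is best to pick is a separate question; the sketch just takes the first one it finds.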