Added summary of queued and running compactions to coordinator #5989
Merged
dlmarion merged 7 commits into apache:2.1 on Nov 25, 2025
This commit adds periodic logging of queued and running external compaction information to the coordinator. The logging is emitted by a new class, CoordinatorSummaryLogger, so that users can easily redirect this log to a new file in the logging configuration. At each interval this new class will log the number of compactions running for each table, and will log the number of compactors, queued compactions and running compactions for each compaction queue. The number of queued compactions is an estimate as each tablet server only reports up to 100 different compaction priorities to conserve memory space in the Coordinator (see ExternalCompactionExecutor.summarize). The metrics are a more accurate source of the number of queued external compactions, but that requires aggregating all of the METRICS_MAJC_QUEUED Meters from all of the TabletServers. Related to apache#5965
ddanielr
reviewed
Nov 24, 2025
CompactionCoordinator.RUNNING_CACHE.values().forEach(rc -> {
  TableId tid = KeyExtent.fromThrift(rc.getJob().getExtent()).tableId();
  String tableName = tableMap.getOrDefault(tid, "Unmapped table id: " + tid.canonical());
Contributor
Is an Unmapped table id ever going to be something other than a deleted table?
I ran a test with this and confirmed that a long-running compaction can persist past a table deletion:
2025-11-24T21:18:35,555 104 [manager.EventCoordinator] INFO : Created table ExternalCompaction_1_IT_testGetActiveCompactions0
2025-11-24T21:19:35,222 37 [coordinator.CoordinatorSummaryLogger] INFO : Queue DCQ8: compactors: 1, queued majc (minimum, possibly higher): 1, running majc: 1
2025-11-24T21:19:35,222 37 [coordinator.CoordinatorSummaryLogger] INFO : Running compactions for table ExternalCompaction_1_IT_testGetActiveCompactions0: 1
2025-11-24T21:19:35,329 180 [fate.Fate] INFO : Seeding FATE[1f4521d130f4a032] TABLE_DELETE Delete table ExternalCompaction_1_IT_testGetActiveCompactions0(8)
2025-11-24T21:19:36,224 39 [coordinator.CoordinatorSummaryLogger] INFO : Queue DCQ8: compactors: 1, queued majc (minimum, possibly higher): 0, running majc: 1
2025-11-24T21:19:36,224 39 [coordinator.CoordinatorSummaryLogger] INFO : Running compactions for table Unmapped table id: 8: 1
2025-11-24T21:20:10,235 38 [coordinator.CoordinatorSummaryLogger] INFO : Queue DCQ8: compactors: 1, queued majc (minimum, possibly higher): 0, running majc: 1
2025-11-24T21:20:10,235 38 [coordinator.CoordinatorSummaryLogger] INFO : Running compactions for table Unmapped table id: 8: 1
Contributor
Author
Certainly a deleted table. I'm not sure whether it might also happen in some scenario when a new table is created; I have not looked at the underlying code.
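For readers following along, the fallback behavior discussed above can be sketched in isolation. This is a minimal illustration, not the code from the PR: the class `RunningCompactionSummary`, the method `countByTable`, and the use of plain `String` table ids in place of `TableId` are all assumptions made to keep the example self-contained.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class RunningCompactionSummary {

  // Hypothetical helper: counts running compactions per table name.
  // Table ids missing from idToName (for example, a table deleted while its
  // compaction is still running) are reported under an "Unmapped table id" label.
  static Map<String, Long> countByTable(List<String> runningTableIds,
      Map<String, String> idToName) {
    Map<String, Long> counts = new TreeMap<>();
    for (String tid : runningTableIds) {
      String name = idToName.getOrDefault(tid, "Unmapped table id: " + tid);
      counts.merge(name, 1L, Long::sum);
    }
    return counts;
  }

  public static void main(String[] args) {
    // Table id "8" has no name mapping, as in the deleted-table test above.
    Map<String, String> idToName = Map.of("7", "events");
    countByTable(List.of("7", "8", "8"), idToName).forEach((table, n) -> System.out
        .println("Running compactions for table " + table + ": " + n));
  }
}
```

The point of the fallback is that a running compaction is still counted even after its table name can no longer be resolved, which matches the log output shown in the test.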
ctubbsii
approved these changes
Nov 24, 2025
Member
ctubbsii
left a comment
Seems good. I only see trivial things.
// This map only contains the highest priority for each tserver. So when tservers have
// other priorities that need to compact or have more than one compaction for a
// priority level this count will be lower than the actual number of queued.
CompactionCoordinator.QUEUE_SUMMARIES.QUEUES.getOrDefault(q, EMPTY).values().stream()
Member
If the QUEUES were of type SortedMap instead of TreeMap, you could use Collections.emptySortedMap() instead of creating your own EMPTY one.
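A rough sketch of what this suggestion could look like, under stated assumptions: the map shape (queue name to priority to the set of tservers reporting that priority), the class `QueuedEstimate`, and the method `queuedLowerBound` are hypothetical and chosen only for illustration. The one real API relied on is `Collections.emptySortedMap()` from the JDK.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.SortedMap;
import java.util.TreeMap;

public class QueuedEstimate {

  // Assumed shape: queue name -> (priority -> tservers reporting that priority).
  // Declaring the value type as SortedMap rather than TreeMap lets the default
  // be Collections.emptySortedMap() instead of a hand-made EMPTY constant.
  static long queuedLowerBound(Map<String, SortedMap<Short, Set<String>>> queues,
      String queue) {
    return queues.getOrDefault(queue, Collections.emptySortedMap()).values().stream()
        .mapToLong(Set::size).sum();
  }

  public static void main(String[] args) {
    Map<String, SortedMap<Short, Set<String>>> queues = new HashMap<>();
    SortedMap<Short, Set<String>> dcq = new TreeMap<>();
    dcq.put((short) 1, Set.of("ts1", "ts2"));
    dcq.put((short) 2, Set.of("ts3"));
    queues.put("DCQ8", dcq);

    System.out.println(queuedLowerBound(queues, "DCQ8")); // prints 3
    System.out.println(queuedLowerBound(queues, "missing")); // prints 0
  }
}
```

As the code comment in the diff notes, a sum like this is only a lower bound on the number of queued compactions, since each tserver may have more work queued than it reports.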
…o/coordinator/CompactionCoordinator.java Co-authored-by: Christopher Tubbs <ctubbsii@apache.org>
…o/coordinator/CompactionCoordinator.java Co-authored-by: Christopher Tubbs <ctubbsii@apache.org>
…o/coordinator/CoordinatorSummaryLogger.java Co-authored-by: Christopher Tubbs <ctubbsii@apache.org>