Added summary of queued and running compactions to coordinator #5989

Merged
dlmarion merged 7 commits into apache:2.1 from dlmarion:5965-coordinator-status-logging
Nov 25, 2025

@dlmarion
Contributor

This commit adds periodic logging of queued and running external compaction information to the coordinator. The logging is emitted by a new class, CoordinatorSummaryLogger, so that users can easily redirect this log to a new file in the logging configuration.

At each interval this new class will log the number of compactions running for each table, and will log the number of compactors, queued compactions and running compactions for each compaction queue.
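A minimal sketch of the per-interval summary described above, assuming the coordinator's state is reduced to plain collections. `CoordinatorSummarySketch`, `Running`, and `summarize` are hypothetical names for illustration and are not the classes added in this PR; only the two log line formats are taken from the test output further down in this conversation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

class CoordinatorSummarySketch {

  // Hypothetical stand-in for one running external compaction.
  static final class Running {
    final String table;
    final String queue;

    Running(String table, String queue) {
      this.table = table;
      this.queue = queue;
    }
  }

  // Builds one line per queue and one line per table with running compactions.
  static List<String> summarize(List<Running> running, Map<String,Integer> compactors,
      Map<String,Long> queued) {
    Map<String,Long> perTable = new TreeMap<>();
    Map<String,Long> perQueue = new TreeMap<>();
    for (Running r : running) {
      perTable.merge(r.table, 1L, Long::sum);
      perQueue.merge(r.queue, 1L, Long::sum);
    }
    List<String> lines = new ArrayList<>();
    // Queues are taken from the compactor map (an assumption of this sketch).
    for (String q : new TreeSet<>(compactors.keySet())) {
      lines.add(String.format(
          "Queue %s: compactors: %d, queued majc (minimum, possibly higher): %d, running majc: %d",
          q, compactors.get(q), queued.getOrDefault(q, 0L), perQueue.getOrDefault(q, 0L)));
    }
    perTable.forEach(
        (t, n) -> lines.add(String.format("Running compactions for table %s: %d", t, n)));
    return lines;
  }
}
```

In a sketch like this, the returned lines would be written by a dedicated logger on a fixed schedule, which is what lets the logging configuration route them to a separate file.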

The number of queued compactions is an estimate (a minimum that may be lower than the actual count), because each tablet server reports only up to 100 different compaction priorities to conserve memory space in the Coordinator (see ExternalCompactionExecutor.summarize).
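To illustrate why a capped report can only undercount, here is a rough sketch. It assumes a tablet server holds a priority-to-count map, that larger numbers mean higher priority, and that only the highest 100 priorities are reported; none of this is verified against ExternalCompactionExecutor.summarize, and `QueuedEstimateSketch` and its methods are hypothetical.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

class QueuedEstimateSketch {

  // Cap taken from the PR description; how it is applied here is an assumption.
  static final int MAX_REPORTED_PRIORITIES = 100;

  // Keeps only the `limit` highest priorities of a priority -> queued count map.
  static NavigableMap<Integer,Long> truncate(NavigableMap<Integer,Long> queuedByPriority,
      int limit) {
    NavigableMap<Integer,Long> reported = new TreeMap<>();
    for (Map.Entry<Integer,Long> e : queuedByPriority.descendingMap().entrySet()) {
      if (reported.size() >= limit) {
        break;
      }
      reported.put(e.getKey(), e.getValue());
    }
    return reported;
  }

  // Sums the queued counts across all priorities in the map.
  static long total(Map<Integer,Long> counts) {
    return counts.values().stream().mapToLong(Long::longValue).sum();
  }
}
```

Whatever the real selection rule is, a sum over a truncated report cannot exceed the sum over the full map, which matches the "minimum, possibly higher" wording in the log output.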

The metrics are a more accurate source of the number of queued external compactions, but using them requires aggregating the METRICS_MAJC_QUEUED meters from all of the TabletServers.

Related to #5965

@dlmarion dlmarion added this to the 2.1.5 milestone Nov 24, 2025
@dlmarion dlmarion self-assigned this Nov 24, 2025

CompactionCoordinator.RUNNING_CACHE.values().forEach(rc -> {
TableId tid = KeyExtent.fromThrift(rc.getJob().getExtent()).tableId();
String tableName = tableMap.getOrDefault(tid, "Unmapped table id: " + tid.canonical());
Contributor

Is an Unmapped table id ever going to be something other than a deleted table?

I ran a test with this and confirmed that a long-running compaction can persist past a table deletion.

2025-11-24T21:18:35,555 104 [manager.EventCoordinator] INFO : Created table ExternalCompaction_1_IT_testGetActiveCompactions0
2025-11-24T21:19:35,222 37 [coordinator.CoordinatorSummaryLogger] INFO : Queue DCQ8: compactors: 1, queued majc (minimum, possibly higher): 1, running majc: 1
2025-11-24T21:19:35,222 37 [coordinator.CoordinatorSummaryLogger] INFO : Running compactions for table ExternalCompaction_1_IT_testGetActiveCompactions0: 1
2025-11-24T21:19:35,329 180 [fate.Fate] INFO : Seeding FATE[1f4521d130f4a032] TABLE_DELETE Delete table ExternalCompaction_1_IT_testGetActiveCompactions0(8)
2025-11-24T21:19:36,224 39 [coordinator.CoordinatorSummaryLogger] INFO : Queue DCQ8: compactors: 1, queued majc (minimum, possibly higher): 0, running majc: 1
2025-11-24T21:19:36,224 39 [coordinator.CoordinatorSummaryLogger] INFO : Running compactions for table Unmapped table id: 8: 1
2025-11-24T21:20:10,235 38 [coordinator.CoordinatorSummaryLogger] INFO : Queue DCQ8: compactors: 1, queued majc (minimum, possibly higher): 0, running majc: 1
2025-11-24T21:20:10,235 38 [coordinator.CoordinatorSummaryLogger] INFO : Running compactions for table Unmapped table id: 8: 1

Contributor Author

Certainly a deleted table. I'm not sure whether it might also happen in some scenario when a new table is created; I have not looked at the underlying code.
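For reference, the fallback behavior under discussion can be reproduced in isolation. This is a small sketch using plain strings in place of `TableId`; `TableNameLookup` and `displayName` are hypothetical names, and only the "Unmapped table id: " prefix comes from the code under review.

```java
import java.util.Map;

class TableNameLookup {

  // Returns the mapped table name, or a placeholder when the id has no entry,
  // for example because the table was deleted while a compaction was running.
  static String displayName(Map<String,String> idToName, String tableId) {
    return idToName.getOrDefault(tableId, "Unmapped table id: " + tableId);
  }
}
```

With an empty map and id `8`, this yields `Unmapped table id: 8`, which is the table label seen in the log lines after the TABLE_DELETE operation.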

Member

@ctubbsii ctubbsii left a comment

Seems good. I only see trivial things.

// This map only contains the highest priority for each tserver. So when tservers have
// other priorities that need to compact or have more than one compaction for a
// priority level this count will be lower than the actual number of queued.
CompactionCoordinator.QUEUE_SUMMARIES.QUEUES.getOrDefault(q, EMPTY).values().stream()
Member

If the QUEUES were of type SortedMap instead of TreeMap, you could use Collections.emptySortedMap() instead of creating your own EMPTY one.
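A sketch of what this suggestion could look like, assuming a queue-name to (priority to count) map. The key and value types here are guesses for illustration and may not match the real QUEUES field; `EmptySortedMapSketch` and `queuedEstimate` are hypothetical names.

```java
import java.util.Collections;
import java.util.Map;
import java.util.SortedMap;
import java.util.concurrent.ConcurrentHashMap;

class EmptySortedMapSketch {

  // Hypothetical shape: queue name -> (priority -> queued count).
  // Declaring the value type as SortedMap rather than TreeMap is what allows
  // Collections.emptySortedMap() to serve as the default.
  static final Map<String,SortedMap<Short,Long>> QUEUES = new ConcurrentHashMap<>();

  // Sums the queued counts for a queue; an unknown queue contributes zero.
  static long queuedEstimate(String queue) {
    return QUEUES.getOrDefault(queue, Collections.emptySortedMap()).values().stream()
        .mapToLong(Long::longValue).sum();
  }
}
```

The shared immutable empty map removes the need for a hand-rolled EMPTY constant, at the cost of widening the declared type of the map values.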

dlmarion and others added 4 commits November 25, 2025 07:20
…o/coordinator/CompactionCoordinator.java

Co-authored-by: Christopher Tubbs <ctubbsii@apache.org>
…o/coordinator/CompactionCoordinator.java

Co-authored-by: Christopher Tubbs <ctubbsii@apache.org>
…o/coordinator/CoordinatorSummaryLogger.java

Co-authored-by: Christopher Tubbs <ctubbsii@apache.org>
@dlmarion dlmarion merged commit bd184af into apache:2.1 Nov 25, 2025
8 checks passed
@dlmarion dlmarion deleted the 5965-coordinator-status-logging branch November 25, 2025 12:43
4 participants