fixes bug w/ deletion of compaction thread pool #3750

Merged
keith-turner merged 2 commits into apache:2.1 from keith-turner:accumulo-3749
Sep 14, 2023

@keith-turner
Contributor

Fixes a bug where a compaction config change that deleted a compaction thread pool would leave any tablets compacting on that thread pool in a bad state. The problem was caused by calling shutdownNow on the thread pool, which could cause an interrupted exception while committing the compaction.

This change allows compactions running on the thread pool to complete while canceling any compactions queued on the thread pool.

fixes #3749
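
For illustration, here is a minimal sketch of the difference using only `java.util.concurrent`. It is not the code from this PR; the class name `GracefulPoolStop`, the method `stopGracefully`, and the latch-based stand-in for a compaction are all hypothetical. Draining the queue cancels work that has not started, and `shutdown()` (unlike `shutdownNow()`) does not interrupt running threads, so a task in the middle of its commit step can finish.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class; names are illustrative, not Accumulo APIs.
class GracefulPoolStop {

  // Returns {completed, canceled}: tasks that finished vs. tasks removed from the queue.
  static int[] stopGracefully() {
    ThreadPoolExecutor pool =
        new ThreadPoolExecutor(2, 2, 0L, TimeUnit.SECONDS, new LinkedBlockingQueue<>());
    AtomicInteger completed = new AtomicInteger();
    CountDownLatch started = new CountDownLatch(2);
    CountDownLatch release = new CountDownLatch(1);

    for (int i = 0; i < 5; i++) {
      pool.execute(() -> {
        started.countDown();
        try {
          release.await(); // stands in for a long-running compaction
          completed.incrementAndGet(); // stands in for the commit step
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt(); // the path shutdownNow() would trigger
        }
      });
    }

    try {
      started.await(); // two tasks are running, three are still queued

      List<Runnable> canceled = new ArrayList<>();
      pool.getQueue().drainTo(canceled); // cancel work that has not started
      pool.shutdown(); // no interrupts: running tasks are allowed to finish

      release.countDown();
      pool.awaitTermination(10, TimeUnit.SECONDS);
      return new int[] {completed.get(), canceled.size()};
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    int[] r = stopGracefully();
    System.out.println("running-completed:" + r[0] + " canceled:" + r[1]);
  }
}
```

With two worker threads and five submitted tasks, the two running tasks complete and the three queued tasks are removed without ever starting.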

@keith-turner
Contributor Author

This change could result in oversubscription of resources. For example, if thread pool A with 10 threads is replaced by thread pool B with 15 threads, then this change will let anything running on A complete while making B available for running new compactions. This could result in 25 compactions running for a brief period. The change could be improved to leverage the existing compaction cancellation mechanisms (not a thread interrupt) to request that the 10 compactions running on A stop themselves. I was not sure about making this change, but I am thinking it would probably be for the best. It may waste work, but it would make it easier to reason about resources.
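
As a rough sketch of what cancelling without a thread interrupt can look like, the example below has the task poll a flag between units of work and stop itself. It does not reproduce Accumulo's actual cancellation mechanism; `CooperativeCancel`, `Task`, `requestCancel`, and the `cancelAfter` parameter are hypothetical names used only for this illustration.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical demo class; not Accumulo's cancellation code.
class CooperativeCancel {

  /** Stand-in for a compaction that can be asked to stop itself. */
  static final class Task {
    private final AtomicBoolean cancelRequested = new AtomicBoolean(false);

    void requestCancel() {
      cancelRequested.set(true);
    }

    /**
     * Processes entries until done or canceled. Returns true only if it reached
     * the commit step. A cancelAfter of -1 means cancellation is never requested.
     */
    boolean run(int entries, int cancelAfter) {
      for (int i = 0; i < entries; i++) {
        if (i == cancelAfter) {
          requestCancel(); // simulates another thread removing the pool mid-run
        }
        if (cancelRequested.get()) {
          return false; // stops between entries; the commit step is never entered
        }
      }
      return true; // commit step is reached with no thread interrupt involved
    }
  }

  public static void main(String[] args) {
    System.out.println(new Task().run(100, -1)); // true: ran to completion
    System.out.println(new Task().run(100, 40)); // false: stopped itself at entry 40
  }
}
```

Because the task only stops at a point it chooses, it is never interrupted halfway through committing, at the cost of discarding the work done so far.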

@dlmarion
Contributor

dlmarion commented Sep 12, 2023

Would not cancelling the majc cause Tablet.completeClose to wait until it is done?

Update: I looked, it doesn't.

@keith-turner
Contributor Author

Updated to cancel running compactions in the deleted thread pool in c2fd9de. One of the tserver logs from running the new IT added in this PR looks like the following.

$ egrep "cancel|Compacted|Compacting|Stopped" TabletServer_1804111829.out 
2023-09-12T16:29:07,147 [tablet.files] DEBUG: Compacting 1;r:0020;r:0010 on i.cs1.small for USER from [F0000086.rf] size 261 bytes config [iterators=[name:SlowIterator, priority:100, class:org.apache.accumulo.test.functional.SlowIterator, properties:{sleepTime=100}]]
2023-09-12T16:29:07,147 [tablet.files] DEBUG: Compacting 1;r:0030;r:0020 on i.cs1.small for USER from [F0000084.rf] size 261 bytes config [iterators=[name:SlowIterator, priority:100, class:org.apache.accumulo.test.functional.SlowIterator, properties:{sleepTime=100}]]
2023-09-12T16:29:08,198 [tablet.files] DEBUG: Compacted 1;r:0020;r:0010 for USER created VOLUME/tables/1/t-0000051/A000009i.rf from [F0000086.rf]
2023-09-12T16:29:08,198 [tablet.files] DEBUG: Compacted 1;r:0030;r:0020 for USER created VOLUME/tables/1/t-0000052/A000009j.rf from [F0000084.rf]
2023-09-12T16:29:08,198 [tablet.files] DEBUG: Compacting 1;r:0990;r:0980 on i.cs1.small for USER from [F000009g.rf] size 262 bytes config [iterators=[name:SlowIterator, priority:100, class:org.apache.accumulo.test.functional.SlowIterator, properties:{sleepTime=100}]]
2023-09-12T16:29:08,198 [tablet.files] DEBUG: Compacting 1;r:0070;r:0060 on i.cs1.small for USER from [F0000087.rf] size 260 bytes config [iterators=[name:SlowIterator, priority:100, class:org.apache.accumulo.test.functional.SlowIterator, properties:{sleepTime=100}]]
2023-09-12T16:29:09,215 [tablet.files] DEBUG: Compacted 1;r:0990;r:0980 for USER created VOLUME/tables/1/t-000007q/A000009k.rf from [F000009g.rf]
2023-09-12T16:29:09,215 [tablet.files] DEBUG: Compacted 1;r:0070;r:0060 for USER created VOLUME/tables/1/t-0000056/A000009l.rf from [F0000087.rf]
2023-09-12T16:29:09,215 [tablet.files] DEBUG: Compacting 1;r:0970;r:0960 on i.cs1.small for USER from [F000009h.rf] size 262 bytes config [iterators=[name:SlowIterator, priority:100, class:org.apache.accumulo.test.functional.SlowIterator, properties:{sleepTime=100}]]
2023-09-12T16:29:09,215 [tablet.files] DEBUG: Compacting 1;r:0920;r:0910 on i.cs1.small for USER from [F000009f.rf] size 263 bytes config [iterators=[name:SlowIterator, priority:100, class:org.apache.accumulo.test.functional.SlowIterator, properties:{sleepTime=100}]]
2023-09-12T16:29:10,236 [tablet.files] DEBUG: Compacted 1;r:0920;r:0910 for USER created VOLUME/tables/1/t-000007j/A000009n.rf from [F000009f.rf]
2023-09-12T16:29:10,236 [tablet.files] DEBUG: Compacted 1;r:0970;r:0960 for USER created VOLUME/tables/1/t-000007o/A000009m.rf from [F000009h.rf]
2023-09-12T16:29:10,237 [tablet.files] DEBUG: Compacting 1;r:0870;r:0860 on i.cs1.small for USER from [F000009d.rf] size 262 bytes config [iterators=[name:SlowIterator, priority:100, class:org.apache.accumulo.test.functional.SlowIterator, properties:{sleepTime=100}]]
2023-09-12T16:29:10,237 [tablet.files] DEBUG: Compacting 1;r:0890;r:0880 on i.cs1.small for USER from [F000009e.rf] size 263 bytes config [iterators=[name:SlowIterator, priority:100, class:org.apache.accumulo.test.functional.SlowIterator, properties:{sleepTime=100}]]
2023-09-12T16:29:10,238 [compactions.InternalCompactionExecutor] DEBUG: Stopped compaction executor i.cs1.small running:2 canceled:42
2023-09-12T16:29:10,239 [compactions.InternalCompactionExecutor] DEBUG: Stopped compaction executor i.cs1.medium running:0 canceled:0
2023-09-12T16:29:10,239 [compactions.InternalCompactionExecutor] DEBUG: Stopped compaction executor i.cs1.large running:0 canceled:0
2023-09-12T16:29:10,246 [compaction.FileCompactor] DEBUG: Compaction canceled 1;r:0890;r:0880
2023-09-12T16:29:10,246 [compaction.FileCompactor] DEBUG: Compaction canceled 1;r:0870;r:0860
2023-09-12T16:29:10,247 [tablet.CompactableImpl] DEBUG: Compaction canceled 1;r:0870;r:0860 
2023-09-12T16:29:10,247 [tablet.CompactableImpl] DEBUG: Compaction canceled 1;r:0890;r:0880 
2023-09-12T16:29:10,248 [tablet.files] DEBUG: Compacting 1;r:0890;r:0880 on i.cs1.little for USER from [F000009e.rf] size 263 bytes config [iterators=[name:SlowIterator, priority:100, class:org.apache.accumulo.test.functional.SlowIterator, properties:{sleepTime=100}]]
2023-09-12T16:29:10,248 [tablet.files] DEBUG: Compacting 1;r:0870;r:0860 on i.cs1.little for USER from [F000009d.rf] size 262 bytes config [iterators=[name:SlowIterator, priority:100, class:org.apache.accumulo.test.functional.SlowIterator, properties:{sleepTime=100}]]
2023-09-12T16:29:10,272 [tablet.files] DEBUG: Compacting 1;r:0080;r:0070 on i.cs1.little for USER from [F0000085.rf] size 259 bytes config [iterators=[name:SlowIterator, priority:100, class:org.apache.accumulo.test.functional.SlowIterator, properties:{sleepTime=100}]]
2023-09-12T16:29:10,272 [tablet.files] DEBUG: Compacting 1;r:0090;r:0080 on i.cs1.little for USER from [F0000088.rf] size 259 bytes config [iterators=[name:SlowIterator, priority:100, class:org.apache.accumulo.test.functional.SlowIterator, properties:{sleepTime=100}]]
2023-09-12T16:29:10,272 [tablet.files] DEBUG: Compacting 1;r:0100;r:0090 on i.cs1.little for USER from [F0000089.rf] size 263 bytes config [iterators=[name:SlowIterator, priority:100, class:org.apache.accumulo.test.functional.SlowIterator, properties:{sleepTime=100}]]
2023-09-12T16:29:10,273 [tablet.files] DEBUG: Compacting 1;r:0120;r:0110 on i.cs1.little for USER from [F000008a.rf] size 261 bytes config [iterators=[name:SlowIterator, priority:100, class:org.apache.accumulo.test.functional.SlowIterator, properties:{sleepTime=100}]]
2023-09-12T16:29:10,273 [tablet.files] DEBUG: Compacting 1;r:0130;r:0120 on i.cs1.little for USER from [F000008b.rf] size 262 bytes config [iterators=[name:SlowIterator, priority:100, class:org.apache.accumulo.test.functional.SlowIterator, properties:{sleepTime=100}]]
2023-09-12T16:29:10,273 [tablet.files] DEBUG: Compacting 1;r:0140;r:0130 on i.cs1.little for USER from [F000008d.rf] size 264 bytes config [iterators=[name:SlowIterator, priority:100, class:org.apache.accumulo.test.functional.SlowIterator, properties:{sleepTime=100}]]

@EdColeman
Contributor

Would it be worth providing feedback to the user that removing the compaction queue will result in cancelling running compactions? If I were changing the value, it seems like I'd want to go ahead, since I'm changing it for a reason. But if I know that running compactions are cancelled on the change, I would know to resubmit any that I may have cared about?

@keith-turner
Contributor Author

But, If I know that running compactions are cancelled on the change, I would know to resubmit any that I may have cared about?

If a tablet needs to compact, then it will automatically start running on the new thread pool. There should be no need to resubmit anything.

@keith-turner keith-turner merged commit 98fc87d into apache:2.1 Sep 14, 2023
@EdColeman
Contributor

If a tablet needs to compact, then it will automatically start running on the new thread pool.

So user compactions are not impacted?

Development

Successfully merging this pull request may close these issues.

Accumulo not handling compaction failures well, and results in tablet never compacting