Description
Describe the bug
We are seeing cases where a compaction fails and, as a result, the tservers get into a bad state. When this happens, the tablet gets stuck in an infinite loop. I will lay out the progression here. I have removed anything that was not related to the tablet in question.
While filling in this issue, I noticed that the root cause of the failure may have been updates to the compaction planner options configuration.
1. Compacting on i.default.small for SELECTOR from [File name list] size 1MB
2. Stopped Compaction executor i.default.small
3. Updated compaction server; the change was to increase the number of threads
4. tablet.DatafileManager reports ERROR: Failure updating files after major compaction Optional[New C file path] [list of file paths]
5. java.lang.InterruptedException thrown at TabletServerBatchWriter:506
6. clientImpl.TabletServerBatchWriter DEBUG: Failed to send mutations to . Interrupted while waiting for IO on channel Total timeout is 120000, 115355 millis timeout left
7. tablet.CompactableImpl DEBUG: Selected files not in all files [files in list of file paths from 4] [list of tablet files]
Step 7 has been repeating ever since. I see this as two problems: 1) updating compaction settings should not tank the running compactions, and 2) once it has, it shouldn't prevent compactions from occurring on that tablet.
Versions (OS, Maven, Java, and others, as appropriate):
- 2.1.3-SNAPSHOT
- Java 11
- CentOS 7
To Reproduce
Steps to reproduce the behavior (or a link to an example repository that reproduces the problem):
I think this can be reproduced by changing compaction settings while a compaction is in progress, but I am not entirely sure.
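A minimal sketch of the kind of settings change described above, in case it helps with a reproduction attempt. It only builds the property key and the "before" and "after" values for the planner's `executors` option. The class name, the `32M` size limit, the second `large` executor, and the thread counts are illustrative assumptions and are not taken from the affected cluster.

```java
/** Hypothetical helper: builds the planner option a reproduction attempt might change mid-compaction. */
class CompactionReproSketch {

  /** Property key for the executors option of a compaction service's planner (Accumulo 2.1 naming). */
  static String executorsKey(String service) {
    return "tserver.compaction.major.service." + service + ".planner.opts.executors";
  }

  /**
   * JSON value with two internal executors. Only the thread count of "small" varies;
   * the 32M limit and the "large" executor are assumed values for illustration.
   */
  static String executorsJson(int smallThreads) {
    return "[{\"name\":\"small\",\"type\":\"internal\",\"maxSize\":\"32M\",\"numThreads\":"
        + smallThreads + "},"
        + "{\"name\":\"large\",\"type\":\"internal\",\"numThreads\":2}]";
  }

  public static void main(String[] args) {
    String key = executorsKey("default");
    // Value assumed to be in place when the compaction starts.
    System.out.println(key + "=" + executorsJson(2));
    // Value to set while the compaction is still running (more threads for "small").
    System.out.println(key + "=" + executorsJson(4));
  }
}
```

The second value could be applied with `client.instanceOperations().setProperty(key, value)` while a compaction started through `client.tableOperations().compact(...)` is still running on the `small` executor. I have not confirmed that this sequence triggers the failure.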
Expected behavior
Changing compaction settings is either not ZooKeeper-mutable, or it doesn't break things when settings are changed.