Database compaction stuck in a loop #2941
Comments
Looks like a real bug. Thanks for the info on a workaround!
Thanks for the report @markusd and good analysis, @wohali. It does look like a bug. If this was a compaction file left over from before the upgrade to 3.1.0, this is what might have happened: Previously the @davisp, what do you think, is that about right?
Agreed this is a real bug and the cause is the commit you pointed out (123bf82). The state is then passed as an option, and set_options (rightly) crashes if passed an unexpected option.
Thanks Markus, I hit the same issue and solved it as you suggested. Much appreciated!
Merged the fix to the 3.x branch: #3001
We're seeing a similar issue with random compactions on shards when upgrading from 2.x to 3.1.1, though it might be completely unrelated. For a shard of about 30 GB that constantly hits this error, the compaction metadata file grew to about 500 GB over 24 hours. We're also seeing much higher disk IO on the nodes where this happens than on the nodes where it does not.
Description
3 of 9 nodes in my CouchDB cluster were stuck in a loop during database compaction. These nodes were performing significantly more disk IO than the nodes without the problem, which is how I noticed this in the first place. I do not think `_active_tasks` had the tasks in it, at least not consistently (maybe they appeared and disappeared intermittently).
The compaction files were very old compared to the current time of the database file:
The logs showed a loop of this every few seconds:
I deleted the `.compact` files manually, which restarted the compaction. It ran to completion on all nodes without issues.
Steps to Reproduce
I do not know. I simply replicated a database (6 million docs, 250 GB, q=18, n=3), and the automatic background compaction started and got stuck.
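For anyone checking whether their own nodes are affected, the symptom described above (compaction files much older than the database file they belong to) can be looked for with a short script. This is only a sketch under assumptions that are not from the report: the function name `find_stale_compaction_files`, the 24-hour threshold, and the naming rule that a compaction file is the database file name followed by a `.compact...` suffix are all illustrative and should be checked against the actual data directory.

```python
from pathlib import Path


def find_stale_compaction_files(data_dir, min_age_seconds=24 * 3600):
    """Return (compact_file, db_file, gap_seconds) for every compaction file
    whose mtime is at least min_age_seconds older than its database file."""
    stale = []
    for path in Path(data_dir).rglob("*"):
        name = path.name
        # Assumption: compaction files contain ".compact" in their name and
        # the database file name is everything before that marker.
        if not path.is_file() or ".compact" not in name:
            continue
        db_file = path.with_name(name.split(".compact", 1)[0])
        if not db_file.is_file():
            continue
        # Positive gap means the compaction file is older than the database.
        gap = db_file.stat().st_mtime - path.stat().st_mtime
        if gap >= min_age_seconds:
            stale.append((path, db_file, gap))
    return sorted(stale)
```

The function only lists candidates and deletes nothing. Whether to remove a reported file, as in the manual workaround described above, remains a decision for the operator.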
Expected Behaviour
Compaction to not get stuck and not to consume loads of disk IO without making progress.
Your Environment