Closed
We are seeing steadily worsening performance in CouchDB 3.3.3
Description
Over time, queries to CouchDB take longer and eventually start returning 500s, and performance continues to degrade from there.
We've found a process with a growing mailbox:
process_info(pid(0,289,0)).
[{registered_name,couch_server_10},
{current_function,{erts_internal,await_result,1}},
{initial_call,{proc_lib,init_p,5}},
{status,running},
{message_queue_len,69312},
{links,[<0.18892.3178>,<0.25843.3474>,<0.28109.2975>,
<0.30613.3209>,<0.32351.3494>,<0.31224.3509>,<0.30413.3250>,
<0.27158.3496>,<0.28042.3560>,<0.19662.3364>,<0.22065.3445>,
<0.22667.3591>,<0.20881.3172>,<0.19280.3563>,<0.19642.3408>,
<0.19654.3416>,<0.19041.3365>,<0.9166.3328>,<0.17046.2913>,
<0.17074.3321>,<0.17825.3408>|...]},
{dictionary,[{'$ancestors',[couch_primary_services,
couch_sup,<0.256.0>]},
{'$initial_call',{couch_server,init,1}}]},
{trap_exit,true},
{error_handler,error_handler},
{priority,normal},
{group_leader,<0.255.0>},
{total_heap_size,365113},
{heap_size,46422},
{stack_size,45},
{reductions,99576710041},
{garbage_collection,[{max_heap_size,#{error_logger => true,kill => true,size => 0}},
{min_bin_vheap_size,46422},
{min_heap_size,233},
{fullsweep_after,65535},
{minor_gcs,16048}]},
{suspending,[]}]
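For anyone trying to spot the same symptom, here is a minimal sketch of how the processes with the longest mailboxes can be listed. It assumes a remote shell attached to the affected node and uses only standard `erlang:processes/0` and `erlang:process_info/2`; the variable name `Top` and the cutoff of 5 are arbitrary choices for illustration.

```erlang
%% Collect {Pid, QueueLen} for every live local process.
%% process_info/2 returns 'undefined' for a process that has already exited;
%% the pattern in the generator silently skips those.
Top = lists:sublist(
        lists:reverse(
          lists:keysort(2,
            [{P, Len} || P <- erlang:processes(),
                         {message_queue_len, Len} <-
                             [erlang:process_info(P, message_queue_len)]])),
        5).
```

Running this a few times, some seconds apart, shows whether a given queue is actually growing or just momentarily busy.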
Looking at the linked processes, we see many db updaters that appear to be stuck in do_call:
[{current_function,{gen,do_call,4}},
{initial_call,{proc_lib,init_p,5}},
{status,waiting},
{message_queue_len,2},
{links,[<7672.6048.3484>,<7672.289.0>]},
{dictionary,[{'$ancestors',[<7672.25449.3401>]},
{io_priority,{db_update,<<"shards/00000000-ffffffff/account/0d/a0/175b29c55f3888839e47caf2821e-202502.1738041440">>}},
{last_id_merged,<<"202502-ledgers_monthly_rollover">>},
{'$initial_call',{couch_db_updater,init,1}},
{idle_limit,61000}]},
{trap_exit,false},
{error_handler,error_handler},
{priority,normal},
{group_leader,<7672.255.0>},
{total_heap_size,4185},
{heap_size,4185},
{stack_size,44},
{reductions,21157},
{garbage_collection,[{max_heap_size,#{error_logger => true,kill => true,
size => 0}},
{min_bin_vheap_size,46422},
{min_heap_size,233},
{fullsweep_after,65535},
{minor_gcs,0}]},
{suspending,[]}]
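To get a rough count of how many linked processes are in this state, the following sketch tallies the `current_function` of every local process linked to a given pid. It assumes a remote shell on the affected node; `TallyLinks` is a hypothetical helper name, not part of CouchDB.

```erlang
%% TallyLinks(Pid) -> [{MFA, Count}], most common current_function first.
%% Ports and remote pids in the link list are skipped, because
%% process_info/2 only accepts local pids.
TallyLinks = fun(Pid) ->
    {links, Links} = erlang:process_info(Pid, links),
    Fns = [MFA || P <- Links,
                  is_pid(P),
                  node(P) =:= node(),
                  {current_function, MFA} <-
                      [erlang:process_info(P, current_function)]],
    Counts = lists:foldl(
               fun(MFA, Acc) ->
                   maps:update_with(MFA, fun(N) -> N + 1 end, 1, Acc)
               end,
               #{}, Fns),
    lists:reverse(lists:keysort(2, maps:to_list(Counts)))
end.
%% Usage on the affected node (fails with badarg if the name is not registered):
%% TallyLinks(whereis(couch_server_10)).
```

A result dominated by `{gen,do_call,4}` would match the picture above of many updaters waiting on a call.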
Steps to Reproduce
This develops over time, but it appears to be correlated with a number of tasks we run at the beginning of each month.
Expected Behaviour
Don't lock up.
Your Environment
- CouchDB version used: 3.3.3 / OTP 24
- Browser name and version: N/A
- Operating system and version: CentOS
Additional Context
It's a 3-node cluster, and we see this on all three nodes.