Eraser/Reaper/Repl-Src - Unbounded queues and crashing with large loop states #1807

martinsumner · 2021-12-03T13:12:10Z

An attempt was made on a cluster to run a very large aae_fold erase_keys query. The fold ran ok in count mode, indicating 360M keys were available to be erased. However, when run with a change_mode of local, multiple nodes crashed.

The queue within the loop state of the riak_kv_eraser process is unbounded. It is expected that it might have to grow to a large value, as erase_keys folds that push to the queue may be fast, but the deletion process that consumes from the queue is slow. The references on the queue are small - but in this case 60M references were enough to cause memory allocation problems.

The issue is made worse as there is no format_status/2 function to restrict the logging of loop state when the process crashes - so any attempt to record the process crashing would itself have caused significant memory issues.

The riak_kv_reaper process has a similar issue - both an unbounded queue and a missing format_status/2 function.

The riak_kv_replrtq_src process has a bounded queue - but no format_status/2 function.
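
A minimal sketch of how a format_status/2 callback might keep a large queue out of crash reports, assuming a gen_server whose loop state is a record holding the queue. The module name, the #state{} record and its fields are hypothetical, and are not taken from riak_kv_eraser, riak_kv_reaper or riak_kv_replrtq_src:

```erlang
-module(eraser_status_sketch).
-export([format_status/2]).

%% Hypothetical loop state: the real processes hold more fields than this.
-record(state, {queue = queue:new(), queue_length = 0}).

%% Called by sys:get_status/1 - return a summarised state for display.
format_status(normal, [_PDict, State]) ->
    [{data, [{"State", summarise(State)}]}];
%% Called when the process terminates abnormally - the returned term is
%% what gets logged in place of the full loop state.
format_status(terminate, [_PDict, State]) ->
    summarise(State).

%% Replace the queue contents with a small marker carrying only its length.
summarise(State = #state{queue = Q}) ->
    State#state{queue = {queue_elided, queue:len(Q)}}.
```

With something like this in place, a crash with 60M references on the queue would log a small tuple rather than attempting to format the whole queue.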

martinsumner · 2021-12-03T14:21:55Z

Perhaps for eraser/reaper, rather than simply having a limit and discarding at the limit - disk_log could be used to persist references once the limit has been reached, and then, should the in-memory queue ever become empty, a batch of logged erases could be read back from the disk_log.

This allows for very large jobs to be slowly worked on, without running into memory risks. The disk_log folder should be cleaned at startup (rather than potentially re-reading very old logged erases). The disk_log folder is intended to persist strictly for the purpose of preserving memory, not for surviving process restarts.
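
A rough sketch of the overflow idea described above, assuming a single halt-type disk_log per queue. The module name, record and function names are hypothetical, and this is not the implementation in the linked PRs. Once anything has spilled to disk, new items also go to disk so that FIFO order is kept, and the log file is deleted on creation so nothing survives a restart:

```erlang
-module(overflow_queue_sketch).
-export([new/2, push/2, pop/1, close/1]).

-record(oq, {mem = queue:new(), mem_len = 0, limit,
             log, cont = start, on_disk = 0}).

%% Start clean: any log left over from a previous run is discarded.
new(File, Limit) when is_integer(Limit), Limit > 0 ->
    _ = file:delete(File),
    {ok, Log} = disk_log:open([{name, {?MODULE, File}}, {file, File}]),
    #oq{limit = Limit, log = Log}.

%% Keep items in memory until the limit is hit; after that (or while
%% earlier items are still on disk) append to the disk_log instead.
push(Item, Q = #oq{mem_len = Len, limit = Limit, on_disk = 0})
  when Len < Limit ->
    Q#oq{mem = queue:in(Item, Q#oq.mem), mem_len = Len + 1};
push(Item, Q = #oq{log = Log, on_disk = N}) ->
    ok = disk_log:log(Log, Item),
    Q#oq{on_disk = N + 1}.

pop(Q = #oq{mem_len = 0, on_disk = 0}) ->
    {empty, Q};
pop(Q = #oq{mem_len = 0}) ->
    pop(refill(Q));
pop(Q = #oq{mem = Mem, mem_len = Len}) ->
    {{value, Item}, Mem1} = queue:out(Mem),
    {{value, Item}, Q#oq{mem = Mem1, mem_len = Len - 1}}.

%% Memory is empty: read back up to Limit logged items from disk.
refill(Q = #oq{log = Log, cont = Cont, limit = Limit, on_disk = N}) ->
    ok = disk_log:sync(Log),   % conservative: flush writes before reading
    {Cont1, Items} = disk_log:chunk(Log, Cont, Limit),
    Left = N - length(Items),
    Q1 = Q#oq{mem = queue:from_list(Items),
              mem_len = length(Items), on_disk = Left},
    case Left of
        0 -> ok = disk_log:truncate(Log),   % all read back: reclaim disk
             Q1#oq{cont = start};
        _ -> Q1#oq{cont = Cont1}
    end.

close(#oq{log = Log}) ->
    disk_log:close(Log).
```

The memory held by the queue is then bounded by the limit regardless of how fast the fold pushes, at the cost of disk writes and reads once the limit has been passed.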

martinsumner · 2021-12-03T16:15:09Z

#1808

martinsumner · 2021-12-03T23:22:42Z

Related PR in kv_index_tictactree - martinsumner/kv_index_tictactree#103

martinsumner added Bug 3.0.10 labels Dec 3, 2021

martinsumner self-assigned this Mar 9, 2022

martinsumner closed this as completed May 30, 2022
