Add smoosh queue persistence #3766

Merged
merged 1 commit into from Mar 15, 2022


@noahshaw11 noahshaw11 commented Sep 25, 2021

Overview

When smoosh starts, its queue is empty and it has no knowledge of compaction jobs that were in progress, and potentially resumable, before it stopped. This often causes problems when a node reboots, whether spontaneously or intentionally, while it is under high disk usage.

This PR periodically saves the state for smoosh to a file (every 3 minutes by default) and reloads it on (re)start.
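
For readers who want a feel for the save/reload cycle described above, here is a minimal sketch. It is not the code in this PR: the module name `queue_checkpoint_sketch`, the functions `persist/2` and `restore/1`, and the use of a plain map as the queue are all assumptions made for illustration. It only shows the general idea of writing state to a file and falling back to an empty queue when the file is missing or corrupted.

```erlang
%% Hypothetical sketch, not the smoosh implementation.
-module(queue_checkpoint_sketch).
-export([persist/2, restore/1]).

%% Serialize the queue and write it to Path. The data goes to a temporary
%% file first and is then renamed, so a crash in the middle of the write
%% does not leave a half-written state file at Path.
%% Path is assumed to be a string (list), Queue any Erlang term.
persist(Path, Queue) ->
    Tmp = Path ++ ".tmp",
    case file:write_file(Tmp, term_to_binary(Queue)) of
        ok -> file:rename(Tmp, Path);
        {error, _} = Error -> Error
    end.

%% Read the queue back from Path. A missing or undecodable file yields an
%% empty map instead of crashing the caller.
restore(Path) ->
    case file:read_file(Path) of
        {ok, Bin} ->
            try
                binary_to_term(Bin, [safe])
            catch
                error:badarg -> #{}
            end;
        {error, _} ->
            #{}
    end.
```

In a long-running process, `persist/2` would presumably be called from a timer (for example one armed with `erlang:send_after/3`) at the checkpoint interval, and `restore/1` once at startup.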

Testing recommendations

make eunit apps=smoosh

On an abrupt shutdown or crash, smoosh should resume compaction from where it left off.

Related Issues or Pull Requests

Checklist

  • Code is written and works correctly
  • Changes are covered by tests
  • Any new configurable parameters are documented in rel/overlay/etc/default.ini
  • A PR for documentation changes has been made in https://github.com/apache/couchdb-documentation

@noahshaw11 noahshaw11 changed the title Initial commit Add smoosh queue persistence Sep 25, 2021
@noahshaw11 noahshaw11 marked this pull request as draft September 25, 2021 00:26
@noahshaw11 noahshaw11 force-pushed the add-smoosh-queue-persistence branch 4 times, most recently from 91a1ae6 to 3aa3b7f Compare October 18, 2021 21:21
@noahshaw11 noahshaw11 marked this pull request as ready for review October 18, 2021 21:23
@noahshaw11 noahshaw11 force-pushed the add-smoosh-queue-persistence branch 3 times, most recently from e94e35e to ec15331 Compare October 29, 2021 18:55
@noahshaw11 noahshaw11 force-pushed the add-smoosh-queue-persistence branch 4 times, most recently from 0f2be02 to 33e997d Compare October 30, 2021 00:00
@noahshaw11 noahshaw11 force-pushed the add-smoosh-queue-persistence branch 2 times, most recently from 51ca470 to f850a5b Compare February 15, 2022 23:19

noahshaw11 commented Feb 17, 2022

Logs of should_persist_queue test:

[notice] 2022-02-17T19:26:51.507665Z nonode@nohost <0.1666.0> -------- smoosh_channel Persisting "ratio_dbs" state, Active: [], Starting: [], Waiting: {priority_queue,"ratio_dbs",#{<<"eunit-test-db-29f9313a853943a7e2ada21daa7d3b14">> => {1850636.0843373493,{-576460747195516000,-576460752303423477}}},{1,{{1850636.0843373493,{-576460747195516000,-576460752303423477}},{<<"eunit-test-db-29f9313a853943a7e2ada21daa7d3b14">>,1850636.0843373493},nil,nil}}}
[notice] 2022-02-17T19:26:51.512522Z nonode@nohost <0.1712.0> -------- smoosh_priority_queue Successfully restored state file /Users/ncshaw/src/couchdb/tmp/data/ratio_dbs.waiting
[notice] 2022-02-17T19:26:51.512628Z nonode@nohost <0.1712.0> -------- ~~ Q0: {priority_queue,"ratio_dbs",#{<<"eunit-test-db-29f9313a853943a7e2ada21daa7d3b14">> => {1850636.0843373493,{-576460747195516000,-576460752303423477}}},{1,{1850636.0843373493,{<<"eunit-test-db-29f9313a853943a7e2ada21daa7d3b14">>,{-576460747195516000,-576460752303423477}},nil,nil}}}
[info] 2022-02-17T19:26:51.513289Z nonode@nohost <0.44.0> -------- Application smoosh exited with reason: stopped
[info] 2022-02-17T19:26:51.513355Z nonode@nohost <0.44.0> -------- Application smoosh exited with reason: stopped
[info] 2022-02-17T19:26:51.513385Z nonode@nohost <0.44.0> -------- Application smoosh exited with reason: stopped
[info] 2022-02-17T19:26:51.513408Z nonode@nohost <0.44.0> -------- Application smoosh exited with reason: stopped
[info] 2022-02-17T19:26:51.513444Z nonode@nohost <0.44.0> -------- Application smoosh exited with reason: stopped
[notice] 2022-02-17T19:26:51.513884Z nonode@nohost <0.1712.0> -------- smoosh_priority_queue Successfully restored state file /Users/ncshaw/src/couchdb/tmp/data/ratio_dbs.waiting
[notice] 2022-02-17T19:26:51.513998Z nonode@nohost <0.1712.0> -------- ~~ Q1: {priority_queue,"ratio_dbs",#{<<"eunit-test-db-29f9313a853943a7e2ada21daa7d3b14">> => {1850636.0843373493,{-576460747195516000,-576460752303423477}}},{1,{1850636.0843373493,{<<"eunit-test-db-29f9313a853943a7e2ada21daa7d3b14">>,{-576460747195516000,-576460752303423477}},nil,nil}}}
.
.
.
[notice] 2022-02-17T19:26:51.516256Z nonode@nohost <0.1727.0> -------- ratio_dbs: Starting compaction for eunit-test-db-29f9313a853943a7e2ada21daa7d3b14 (priority {-576460747195516000,-576460752303423477})
[info] 2022-02-17T19:26:51.516284Z nonode@nohost <0.1709.0> -------- Starting compaction for db "eunit-test-db-29f9313a853943a7e2ada21daa7d3b14" at 1
[notice] 2022-02-17T19:26:51.516309Z nonode@nohost <0.1727.0> -------- smoosh_channel Persisting "ratio_dbs" state, Active: [], Starting: [{#Ref<0.981995199.3249012738.71442>,<<"eunit-test-db-29f9313a853943a7e2ada21daa7d3b14">>}], Waiting: {priority_queue,"ratio_dbs",#{},{0,nil}}

The priority queue was successfully restored on smoosh restart and the compaction job was started.


@iilyak iilyak left a comment


+1. Great work!!!

❯ make eunit apps=smoosh
==> smoosh (compile)
==> rel (compile)
==> couchdb (compile)
==> couchdb (setup_eunit)
Writing tmp/etc/default_eunit.ini
Writing tmp/etc/local_eunit.ini
Writing tmp/etc/eunit.ini
Writing tmp/etc/vm.args
==> smoosh (eunit)
Compiled src/smoosh_utils.erl
Compiled src/smoosh.erl
Compiled src/smoosh_priority_queue.erl
Compiled src/smoosh_sup.erl
Compiled src/smoosh_app.erl
/Users/iilyak@ca.ibm.com/dev/couchdb/src/smoosh/test/smoosh_priority_queue_tests.erl:0: Warning: function prop_inverse_test_/0 already exported
Compiled test/smoosh_priority_queue_tests.erl
Compiled src/smoosh_channel.erl
Compiled test/smoosh_tests.erl
/Users/iilyak@ca.ibm.com/dev/couchdb/src/smoosh/src/smoosh_server.erl:261: Warning: erlang:get_stacktrace/0 is deprecated and will be removed in OTP 24; use use the new try/catch syntax for retrieving the stack backtrace
Compiled src/smoosh_server.erl
======================== EUnit ========================
module 'smoosh_priority_queue'
  module 'smoosh_priority_queue_tests'
    smoosh priority queue test
      smoosh_priority_queue_tests: fun.prop_inverse_test_...ok
      smoosh_priority_queue_tests: fun.no_halt_on_corrupted_file_test...ok
      smoosh_priority_queue_tests: fun.no_halt_on_missing_file_test...ok
      [done in 0.009 s]
    smoosh_priority_queue_tests:38: prop_inverse_test_...ok
    smoosh_priority_queue_tests: no_halt_on_corrupted_file_test...ok
    smoosh_priority_queue_tests: no_halt_on_missing_file_test...ok
    [done in 1.920 s]
  [done in 1.920 s]
module 'smoosh_utils'
module 'smoosh_server'
  Test config updates
    smoosh_server:515: t_restart_config_listener...[0.110 s] ok
    [done in 0.114 s]
  smoosh_server:530: t_ratio_view...ok
  smoosh_server:540: t_slack_view...ok
  smoosh_server:550: t_no_data_view...ok
  smoosh_server:560: t_below_min_priority_view...ok
  smoosh_server:570: t_below_min_size_view...ok
  smoosh_server:580: t_timeout_view...ok
  smoosh_server:590: t_missing_view...ok
  smoosh_server:598: t_invalid_view...ok
  [done in 1.630 s]
module 'smoosh_app'
module 'smoosh_sup'
module 'smoosh'
  module 'smoosh_tests'
    Testing smoosh
      Should persist queue state
        smoosh_tests:94: should_persist_queue...[0.045 s] ok
        [done in 0.048 s]
      Various channels tests
        smoosh_tests:82: should_enqueue...[0.091 s] ok
        [done in 0.094 s]
      [done in 0.416 s]
    [done in 0.682 s]
  [done in 0.682 s]
module 'smoosh_channel'
=======================================================
  All 17 tests passed.
==> rel (eunit)
==> couchdb (eunit)

@noahshaw11 noahshaw11 force-pushed the add-smoosh-queue-persistence branch 2 times, most recently from 85b155a to f587f35 Compare March 14, 2022 20:41