Skip to content

Debug for sharded index server#3933

Merged
iilyak merged 2 commits intoapache:3.xfrom
cloudant:debug-for-sharded-index-server
Feb 23, 2022
Merged

Debug for sharded index server#3933
iilyak merged 2 commits intoapache:3.xfrom
cloudant:debug-for-sharded-index-server

Conversation

@iilyak
Copy link
Contributor

@iilyak iilyak commented Feb 18, 2022

Overview

Recent refactor of couch_index_server made it a sharded implementation. This means multiple processes are in play now. This made it somewhat harder to bounce in production in case of problems. Also the couch_debug:print_linked_processes(couch_index_server) had an assumption that couch_index_server is a single process.

This PR does two very related things

  1. fixes couch_debug:print_linked_processes(couch_index_server)
  2. adds couch_index_debug module with auxiliary functions to restart couch_index_server processes safely.

Testing recommendations

  1. compile, start, remsh
  2. couch_debug:print_linked_processes(couch_index_server).
|name                                              | reductions | message_queue_len |    memory    |id
|--------------------------------------------------|------------|-------------------|--------------|--
|index_server_1[<0.320.0>]                         |    1127    |         0         |    17000     |
|  couch_secondary_services[<0.312.0>]             |   93596    |         0         |    68632     |
|  couch_event_listener:do_init/3[<0.323.0>]       |    203     |         0         |     2856     |
|    index_server_1[<0.320.0>]                     |    1127    |         0         |    17000     |
|                                                  |            |                   |              |
|index_server_2[<0.324.0>]                         |    290     |         0         |     6088     |
|  couch_secondary_services[<0.312.0>]             |   93598    |         0         |    68632     |
|  couch_event_listener:do_init/3[<0.326.0>]       |    169     |         0         |     2856     |
|    index_server_2[<0.324.0>]                     |    290     |         0         |     6088     |
....
  1. couch_index_debug:help().
[names,print_linked_processes,busy,restart_busy,restart]
  1. couch_index_debug:help(names)
  2. couch_index_debug:help(print_linked_processes)
  3. couch_index_debug:help(busy)
  4. couch_index_debug:help(restart_busy)
  5. Note pid of whereis(index_server_8). and do couch_index_debug:restart(index_server_8)
  6. Use output of couch_debug:print_linked_processes(couch_index_server). to select a value for a reduction property somewhere in between. Then call couch_index_debug:restart_busy(SelectedValue, 50, reductions).. Finally use couch_debug:print_linked_processes(couch_index_server). and verify pids of some index_server_x processes change.

Related Issues or Pull Requests

Checklist

  • Code is written and works correctly
  • Changes are covered by tests
  • Any new configurable parameters are documented in rel/overlay/etc/default.ini
  • A PR for documentation changes has been made in https://github.com/apache/couchdb-documentation

@rnewson
Copy link
Member

rnewson commented Feb 18, 2022

not a refactor :P

@iilyak iilyak force-pushed the debug-for-sharded-index-server branch from 4e72316 to 4743389 Compare February 18, 2022 12:55
print_linked_processes() ->
couch_debug:print_linked_processes(couch_index_server).

-spec busy(Thershold :: pos_integer()) ->
Copy link
Contributor Author

@iilyak iilyak Feb 18, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not couch_index_server specific. Therefore I am considering to move it into couch_debug.erl for reuse. With a slight change in funuction signature. It would become busy(Threshold, ProcessList) -> [proccess_name()].

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done in 6c7a1ac

names()
).

-spec restart_busy(Threshold :: pos_integer()) ->
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I should move it into couch_debug.erl for re-use.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done in 8da3715

Pid :: pid() | timeout.

restart(Name) ->
Res = test_util:with_process_restart(Name, fun() ->
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: I am calling a function from test_util module. Is it worth to move the with_process_restart (and friends) into couch_debug and create a dispatch in test_util?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main downside IMO to moving them is that with_process_restart uses the same sneaky trick with respect to time units as test_util:wait and friends, so keeping them together might be slightly less bewildering to someone unfamiliar with that code. I'm pretty neutral, though, and just wish all implementations were a bit less sneaky.

Copy link
Contributor

@jaydoane jaydoane left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Followed testing instructions, and seems to work as expected.


If Property is not specified we use message box size

Properties which can be used are listed bellow
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/bellow/below


If Property is not specified we use message box size

Properties which can be used are listed bellow
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/bellow/below

io:format(" ---~n", []),
ok.

-spec busy(ProcessList :: [process_name()], Thershold :: pos_integer()) ->
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/Thershold/Threshold

busy(ProcessList, Threshold, message_queue_len).

-spec busy(
ProcessList :: [process_name()], Thershold :: pos_integer(), Property :: busy_properties()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/Thershold/Threshold

Pid :: pid() | timeout.

restart(Name) ->
Res = test_util:with_process_restart(Name, fun() ->
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main downside IMO to moving them is that with_process_restart uses the same sneaky trick with respect to time units as test_util:wait and friends, so keeping them together might be slightly less bewildering to someone unfamiliar with that code. I'm pretty neutral, though, and just wish all implementations were a bit less sneaky.

end.

-spec restart_busy(ProcessList :: [process_name()], Threshold :: pos_integer()) ->
throw({timeout, Name :: process_name()}).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this also return ok?

(node1@127.0.0.1)28> couch_index_debug:restart_busy(4000, 50, reductions).
ok


If Property is not specified we use message box size

Properties which can be used are listed bellow
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/bellow/below


If Property is not specified we use message box size

Properties which can be used are listed bellow
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/bellow/below

@iilyak iilyak force-pushed the debug-for-sharded-index-server branch from 3879f10 to 3412a96 Compare February 22, 2022 19:20
@iilyak iilyak merged commit e4b8a46 into apache:3.x Feb 23, 2022
@iilyak iilyak deleted the debug-for-sharded-index-server branch February 23, 2022 12:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants