Mas i1801 monitorworkerq #979

martinsumner · 2021-11-04T12:39:44Z

Add monitoring via riak stats to the worker queues. Monitor both queue time and work time.

The original commit to make the common riak_core_node_worker_pool behaviour contained rogue tab characters that skew code formatting on GitHub. These tabs are removed here.

There is currently no way of knowing the utilisation of the different worker pools, or how often queue events occur.

Changed the stats to ones that appear intuitively more useful. Queue time is now logged; as there may be only a small number of data points, only the mean and max are tracked. Threshold events are also logged. On the vnode_worker pool, the threshold is half the limit (as we don't normally expect to get close to the limit). On the node_worker pools, the threshold is the limit.

Stat naming is made consistent, at a cost of backwards compatibility. The renamed stats are of minimal use and are, in most cases, relatively recent additions for rarely used features.

As well as monitoring queue time, monitor work time. As we know our pool sizes, threshold breaches can be estimated by looking at the sum of the work time (as we have a count as well as the average). This is easier to understand than threshold_events.
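As a rough sketch of that estimate (not code from this PR): the sum of the work time is the count multiplied by the mean, and dividing it by the capacity of the pool over the same window gives an approximate utilisation. The module name, function name and argument names below are hypothetical, and the time unit is assumed to be microseconds.

```erlang
-module(pool_utilisation_sketch).
-export([utilisation/4]).

%% Hypothetical helper, not part of riak_core.
%% Count          - number of work items completed in the window
%% MeanMicros     - mean work time per item, in microseconds (assumed unit)
%% PoolSize       - number of workers in the pool
%% IntervalMicros - length of the measurement window, in microseconds
-spec utilisation(non_neg_integer(), number(), pos_integer(), pos_integer())
        -> float().
utilisation(Count, MeanMicros, PoolSize, IntervalMicros)
        when PoolSize > 0, IntervalMicros > 0 ->
    %% Sum of work time, recovered from the count and the mean.
    TotalWork = Count * MeanMicros,
    %% Fraction of the pool's total capacity that was spent working.
    TotalWork / (PoolSize * IntervalMicros).
```

For example, `pool_utilisation_sketch:utilisation(120, 250000, 4, 60000000)` returns `0.125`: 120 items averaging 250ms each kept a pool of 4 workers busy for one eighth of a 60 second window. A value approaching 1.0 would suggest that work was probably being queued.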

Record a queue time of 0 when work is not queued; otherwise the average will be misleading.

martinsumner · 2021-11-08T14:18:55Z

basho/riak_test#1359

ThomasArts · 2021-11-09T08:29:43Z

src/riak_core_worker_pool.erl

@@ -278,3 +291,56 @@ discard_queued_work(Q, Mod) ->
            ok
    end.

+-spec poolboy_checkin(pid(), pid(), state()) -> {ok, state()}.
+poolboy_checkin(Pool, Worker, State) ->


It is not wrong, but changing state as a side effect of calling a small function makes the code harder to follow. I like to keep state updates very declarative and in a small scope.
In this case that would mean passing the pool name in as the last argument, instead of the state, and returning the checkout or an error. Then you do the state update in the function calling this one...

I do see that this may give some code duplication and probably it's fine this way, but it is harder to get an idea of what happens to the state in the main "loop".
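A minimal sketch of that suggestion, assuming a map-based state with a hypothetical `checkouts` key (the actual state record in riak_core_worker_pool is not shown in this hunk, and the function names here are illustrative only). The helper only computes a result; the caller, standing in for the main loop, is the only place where the state changes.

```erlang
-module(checkin_sketch).
-export([handle_checkin/2]).

%% Hypothetical state shape: #{checkouts => #{WorkerPid => CheckoutTime}}.

%% Pure helper: looks up when the worker was checked out and returns the
%% elapsed work time. It neither receives nor modifies the state.
-spec work_time(pid(), map(), integer()) ->
        {ok, integer()} | {error, not_checked_out}.
work_time(Worker, Checkouts, Now) ->
    case maps:find(Worker, Checkouts) of
        {ok, CheckedOutAt} -> {ok, Now - CheckedOutAt};
        error -> {error, not_checked_out}
    end.

%% Caller: the state update is explicit and happens only here.
-spec handle_checkin(pid(), map()) ->
        {ok, integer(), map()} | {error, map()}.
handle_checkin(Worker, State = #{checkouts := Checkouts0}) ->
    Now = erlang:monotonic_time(microsecond),
    case work_time(Worker, Checkouts0, Now) of
        {ok, WorkTime} ->
            Checkouts1 = maps:remove(Worker, Checkouts0),
            {ok, WorkTime, State#{checkouts := Checkouts1}};
        {error, not_checked_out} ->
            {error, State}
    end.
```

The trade-off is the one noted above: the caller carries a little more code, but every change to the state is visible in one place.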

Manipulate state only in the main loop

Use Monitors[0]/Checkouts[0] everywhere

martinsumner added 8 commits November 2, 2021 09:47

Resolve rogue tabs

146b797

The original commit to make the common riak_core_node_worker_pool behaviour contained rogue tab characters that skew code formatting on GitHub. These tabs are removed here.

Fix more rogue tabs

338d843

Add stats for worker pools

94db6a5

There is currently no way of knowing the utilisation of the different worker pools, or how often queue events occur.

Align qevents name between vnode and node worker pools

72e414d

Reduce unessential additional information in stats name

bca8e9f

Make stat naming consistent

b3931f6

At a cost of backwards compatibility. The renamed stats are of minimal use and are, in most cases, relatively recent additions for rarely used features.

Change to monitor work time at checkin/checkout

e45dc81

As well as monitoring queue time, monitor work time. As we know our pool sizes, threshold breaches can be estimated by looking at the sum of the work time (as we have a count as well as the average). This is easier to understand than threshold_events.

martinsumner mentioned this pull request Nov 4, 2021

Mas tabtidy #978

Closed

martinsumner added 3 commits November 4, 2021 15:14

Fix stats

c9e7755

Update riak_core_stat.erl

922a966

Record 0 queue time when not queued

f26f78d

Otherwise average will be misleading

martinsumner requested a review from ThomasArts November 8, 2021 14:18

ThomasArts reviewed Nov 9, 2021

ThomasArts approved these changes Nov 9, 2021

martinsumner added 2 commits November 9, 2021 09:34

Refactor - state management

ff68144

Manipulate state only in the main loop

Consistent naming

e8affae

Use Monitors[0]/Checkouts[0] everywhere

ThomasArts approved these changes Nov 9, 2021

martinsumner merged commit 7e0aa31 into develop-3.0 Nov 9, 2021

martinsumner deleted the mas-i1801-monitorworkerq branch November 9, 2021 14:02
