Add RTS monitor #353

Merged
merged 1 commit into from
Nov 16, 2021
@plajjan plajjan commented Nov 9, 2021

This adds a monitoring function to the run-time system (RTS). It is enabled with --rts-mon /tmp/mon_socket, after which the RTS listens on the UNIX domain socket /tmp/mon_socket. For any incoming request, it dumps the current stats for all worker threads.

There's a new thread to deal with the monitor socket. We only accept a single connection at a time. I want the monitoring thread to be completely standalone, i.e. I do not want to mix in listening on various fds into our normal eventloop. This is simple, which is good.

The protocol is a quick hack of a line protocol. The client sends "WTS" (worker thread stats) and gets a single line reply, which is a JSON document. To encode data we use yyjson, which is now included in our code base.

There is a new utility in utils/actonmon which can connect to the monitoring socket and display thread stats. It does so continuously at 0.5 second intervals. The interval can be set with --interval. It requires the path to the monitoring socket as an argument.

To start examples/count with RTS monitoring enabled:

    examples/count --rts-mon ~/act_mon_socket

Then check the mon utility:

    utils/actonmon ~/act_mon_socket --interval 0.2

There's also a --rich option to actonmon to tell it to use the rich library to render a very pretty looking table that is then updated in place.

@plajjan plajjan mentioned this pull request Nov 15, 2021
@plajjan plajjan marked this pull request as draft November 15, 2021 11:41
@plajjan plajjan force-pushed the add-rts-monitor branch 2 times, most recently from 45dd9fc to 4396dc4 November 16, 2021 08:37

This adds statistics per RTS worker thread and the ability to expose
these statistics over a UNIX domain socket. It is enabled with
--rts-mon PATH to listen on the socket PATH. The protocol used is a
simple ASCII line-based protocol. A client can send "WTS" (currently
the only supported command), to which the RTS responds with a dump of
the worker thread statistics (WTS) as a JSON blob on one line.
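
A minimal client sketch for the line protocol described above, in Python. It assumes that the request is the ASCII string WTS terminated by a newline and that the reply is a single newline-terminated JSON line. Neither the exact framing nor the layout of the JSON document is spelled out here, so both are assumptions, and `query_wts` is a hypothetical name.

```python
import json
import socket


def query_wts(sock_path, timeout=2.0):
    """Send "WTS" to the monitor socket and parse the one-line JSON reply.

    Assumption: requests and replies are newline-terminated.
    """
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        sock.connect(sock_path)
        sock.sendall(b"WTS\n")
        buf = b""
        # Read until a full line has arrived or the peer closes the socket.
        while b"\n" not in buf:
            chunk = sock.recv(4096)
            if not chunk:
                break
            buf += chunk
    line = buf.split(b"\n", 1)[0]
    return json.loads(line.decode("utf-8"))
```

Something like `query_wts(os.path.expanduser("~/act_mon_socket"))` would then return the parsed statistics for a program started with --rts-mon ~/act_mon_socket.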

There's a new thread to deal with the monitor socket. We only accept a
single connection at a time. I want the monitoring thread to be
completely standalone, i.e. I do not want to mix in listening on various
fds into our normal eventloop. This is deliberately kept simple.
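
To illustrate the structure only: the RTS itself is written in C, so the following Python sketch is not the actual implementation. It shows a standalone thread that owns the listening socket and serves one connection at a time, entirely outside any event loop. `start_monitor` and `get_stats` are hypothetical names, and ignoring unknown commands is an assumption.

```python
import json
import os
import socket
import threading


def start_monitor(sock_path, get_stats):
    """Start a daemon thread that serves stats on a UNIX domain socket.

    get_stats is a hypothetical callable returning a JSON-serializable object.
    """
    if os.path.exists(sock_path):
        os.unlink(sock_path)  # remove a stale socket file from an earlier run
    srv = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    srv.bind(sock_path)
    srv.listen(1)

    def loop():
        while True:
            # accept() is only called again once the current client is done,
            # so at most one connection is served at any time.
            conn, _ = srv.accept()
            try:
                with conn, conn.makefile("rwb") as f:
                    for line in f:
                        if line.strip() == b"WTS":
                            f.write(json.dumps(get_stats()).encode() + b"\n")
                            f.flush()
            except OSError:
                pass  # client went away; go back to accept()

    thread = threading.Thread(target=loop, name="monitor", daemon=True)
    thread.start()
    return thread
```

Because the thread blocks in accept() and in reads on the single connection, the normal event loop never needs to know about the monitor fds.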

To encode data in JSON we use the yyjson library, which is a wicked fast
JSON encoding and decoding library. It only consists of a single .h and
.c file, so they are now included in the deps directory. Not sure if
this is how we want to deal with dependencies in general, but it is
there for now.

The statistics are:
- program name (argv[0])
- PID
- current worker state
  - one of no-exist, worker, idle or sleeping
- number of times a thread has gone to sleep
- number of executed continuations
- sum of time spent executing continuations, in nanoseconds
- buckets with execution time & "bookkeeping" of continuations
  - for example, if a continuation takes less than 100ns to run, it is
    counted in the 100ns bucket; if it takes between 100ns and 1us, it
    is counted in the 1us bucket, and so forth
  - bookkeeping includes flushing outgoing queues with the locking etc.
    involved; if the distributed database is used, its interaction is
    included as well
  - the buckets are:
    - < 100ns
    - < 1us
    - < 10us
    - < 100us
    - < 1ms
    - < 10ms
    - < 100ms
    - < 1s
    - < 10s
    - < 100s
    - < +Inf
  - 100ns is already close to the level of measurement overhead, so a
    10ns or 1ns bucket would not yield useful information
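
The bucketing can be sketched as follows, in Python rather than the C of the RTS. The bounds are taken from the list above. Treating each bound as a strict upper limit follows the "<" notation, but how the RTS handles a duration that lands exactly on a boundary is an assumption. All names are hypothetical.

```python
import bisect

# Upper bounds in nanoseconds for every bucket except the last (+Inf).
BUCKET_BOUNDS_NS = [10**2, 10**3, 10**4, 10**5, 10**6,
                    10**7, 10**8, 10**9, 10**10, 10**11]
BUCKET_LABELS = ["100ns", "1us", "10us", "100us", "1ms",
                 "10ms", "100ms", "1s", "10s", "100s", "+Inf"]


def bucket_index(duration_ns):
    """Return the index of the first bucket whose bound exceeds duration_ns."""
    return bisect.bisect_right(BUCKET_BOUNDS_NS, duration_ns)


def record(counts, duration_ns):
    """Count one continuation of the given duration in its bucket."""
    counts[bucket_index(duration_ns)] += 1
```

A per-thread histogram is then just a list of `len(BUCKET_LABELS)` counters, updated with `record(counts, elapsed_ns)` after each continuation.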

It should be possible to graph the ratio between our bookkeeping and the
actual execution of threads to show the sort of run time overhead of the
Acton system. Going to be particularly interesting to see for the
distributed database!
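
A rough sketch of how such a ratio could be derived from two successive samples. It assumes cumulative totals in nanoseconds are available for both execution and bookkeeping. Only the execution-time sum is listed among the statistics above, so the bookkeeping total is an assumption, as is the name `overhead_ratio`.

```python
def overhead_ratio(prev, curr):
    """Bookkeeping time per unit of execution time between two samples.

    prev and curr are (exec_ns, bookkeeping_ns) cumulative totals
    (hypothetical shape). Returns None when no execution time was
    recorded in the interval.
    """
    d_exec = curr[0] - prev[0]
    d_book = curr[1] - prev[1]
    if d_exec <= 0:
        return None
    return d_book / d_exec
```

Working on deltas rather than raw totals gives a value per sampling interval, which is what a graph over time needs.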

There is a new utility in utils/actonmon which can connect to the
monitoring socket and display thread stats. It supports three different
modes:
- simple
- rich
- prometheus

The rich interface (started with --rich) uses the rich library to render
a table with the worker thread statistics. It updates in place and looks
pretty good. The update interval can be set with --interval, in seconds.

The simple mode just prints some statistics to the screen in a long log
at every --interval. It doesn't have any dependencies and so is useful
when you don't have rich installed. It's currently very sparse.

The prometheus mode (--prom) listens on http://localhost:8000 and
answers GET requests with the statistics that it collects from the
RTS.
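
A minimal sketch of what such an exporter could look like using only the Python standard library. It is not the actonmon code: the metric name `acton_rts_conts_total`, the shape of `stats` and all function names are hypothetical. The output follows the Prometheus text exposition format.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


def render_metrics(stats):
    """Render per-thread counters in the Prometheus text format.

    stats maps a worker thread id to its number of executed
    continuations (hypothetical shape).
    """
    lines = ["# TYPE acton_rts_conts_total counter"]
    for thread_id, count in sorted(stats.items()):
        lines.append(f'acton_rts_conts_total{{thread="{thread_id}"}} {count}')
    return "\n".join(lines) + "\n"


def make_handler(get_stats):
    """Build a request handler that answers every GET with fresh metrics."""

    class MetricsHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            body = render_metrics(get_stats()).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "text/plain; version=0.0.4")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, fmt, *args):
            pass  # keep the sketch quiet

    return MetricsHandler


def serve(get_stats, port=8000):
    """Serve metrics on localhost until interrupted."""
    HTTPServer(("localhost", port), make_handler(get_stats)).serve_forever()
```

Here `get_stats` would be whatever fetches the current statistics from the monitoring socket at scrape time.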

To start examples/count with RTS monitoring enabled:

    examples/count --rts-mon ~/act_mon_socket

Then check the mon utility with the rich interface:

    utils/actonmon ~/act_mon_socket --interval 0.2 --rich

Or to enable prometheus export:

    utils/actonmon ~/act_mon_socket --prom
@plajjan plajjan marked this pull request as ready for review November 16, 2021 13:55

plajjan commented Nov 16, 2021

I have now fixed the last couple of things, including properly freeing a temporary variable :P

@nordlander approved so going to merge on CI success. Yay.

@plajjan plajjan merged commit 58b699a into main Nov 16, 2021
@plajjan plajjan deleted the add-rts-monitor branch November 16, 2021 13:59