Zeek Supervisor Client

Zeek Supervisor Command-Line Client

Project Infrastructure

Executable: zeekc
Language: C++
Revision Control: will live in zeek git repo

Usage Scenarios

This assumes a supervised Zeek cluster is already running: a user or service manager has run zeek -j …, where the ellipsis includes some script or option that loads the Zeek script defining the cluster for Zeek to supervise.

# Display standard "usage" info: flags, list of commands with brief explanation,
# influential environment variables, etc.

$ zeekc help
  ...

# Query the current cluster status

$ zeekc status [all | <node_name>]
  ...

# Displays a table of nodes according to these structures:
# https://docs.zeek.org/en/current/scripts/base/frameworks/supervisor/api.zeek.html#type-Supervisor::Status
# Do we need to include any other metrics in the returned status?
# Do we need more categories to filter by (e.g. node type) ?

# If there are downed nodes at this point, what do we expect users to do?
# Check the standard services logs for stderr/stdout info?  Check reporter.log ?
# A `zeekc diag` command could help gather information, like ask Zeek supervisor
# to find core dumps and extract stack trace.  Would it do more than that, like
# show last N lines of downed nodes' stderr, or last N lines of reporter.log?

# Inspect various state stored in script variable named by <ID>
$ zeekc print <ID> [all | <node_name>]
  ...

# User may modify Zeek scripts at this point and ask if they're valid/loadable:
$ zeekc check
  ...

# If it passes, ask to reload the cluster with updated scripts:
$ zeekc restart [all | <node_name>]
  ...

# If we wanted to stop the cluster for some time:
$ zeekc stop [all | <node_name>]
  ...

# To resume the cluster:
$ zeekc start [all | <node_name>]
  ...

# To terminate the cluster, including the supervisor:
$ zeekc terminate
  ...

# Normally wouldn't terminate the supervisor if a service-manager is handling
# the Zeek supervisor process itself and will just restart it, but `terminate`
# would be helpful for anyone running a supervised Zeek cluster "manually".
# The typical way to terminate a cluster, including supervisor, perhaps to
# upgrade the local Zeek installation, would look like:

$ zeekc stop && systemctl stop zeek
  ...

# One could go directly through `systemctl stop`, too, but that won't provide
# any "orderly" shutdown semantics for the cluster which, in the
# future, may span multiple hosts that `zeekc` needs to orchestrate more
# intelligently than simply asking each host to "shut down everything".

Additional Meta-Usage

zeekc version or zeekc --version
- Show a version number and exit (see Open Questions below, but we might just plan to emit the zeek version number to which zeekc is paired)
zeekc -v/--verbose
- Enable verbose debugging output to stderr
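
The command shapes listed above could be parsed along these lines. This is only a sketch under assumptions: the `Invocation` struct and `parse_args` function are hypothetical names, and defaulting the target to "all" when it is omitted is a guess based on the bare `zeekc stop` example, not a settled decision.

```cpp
#include <cstddef>
#include <optional>
#include <set>
#include <string>
#include <vector>

// Hypothetical result of parsing a zeekc command line.
struct Invocation {
    bool verbose = false;        // set by -v / --verbose
    std::string command;         // e.g. "status", "stop", "version"
    std::string id;              // only used by "print <ID>"
    std::string target = "all";  // "all" or a node name (default is an assumption)
};

// Parses the arguments after argv[0]. Returns std::nullopt on a usage error.
std::optional<Invocation> parse_args(const std::vector<std::string>& args) {
    static const std::set<std::string> with_target = {
        "status", "print", "restart", "stop", "start"};
    static const std::set<std::string> without_target = {
        "help", "check", "terminate", "version"};

    Invocation inv;
    std::vector<std::string> positional;

    for (const auto& arg : args) {
        if (arg == "-v" || arg == "--verbose") {
            inv.verbose = true;
        } else if (arg == "--version") {
            // Treated as a synonym for the "version" command.
            inv.command = "version";
            return inv;
        } else if (!arg.empty() && arg[0] == '-') {
            return std::nullopt;  // unknown flag
        } else {
            positional.push_back(arg);
        }
    }

    if (positional.empty())
        return std::nullopt;

    inv.command = positional[0];
    std::size_t next = 1;

    if (inv.command == "print") {
        if (positional.size() < 2)
            return std::nullopt;  // "print" requires an <ID>
        inv.id = positional[next++];
    }

    if (with_target.count(inv.command)) {
        if (next < positional.size())
            inv.target = positional[next++];
    } else if (!without_target.count(inv.command)) {
        return std::nullopt;  // unknown command
    }

    if (next != positional.size())
        return std::nullopt;  // unexpected trailing arguments
    return inv;
}
```

Returning `std::nullopt` for every usage error keeps the caller simple: it can print the `help` text and exit non-zero in one place.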

Open Questions

Do we anticipate a zeekc connecting to a zeek of a different version?
- There are a couple of ways for this to break
  - The underlying Broker/CAF versioning between peers differs
    - Should be easy to detect handshake failure and report nicely
  - The underlying Broker/CAF message format is compatible, but the Supervisor events / data structures changed between zeek versions
    - I'd suggest having only a standardized "hello" or "handshake" exchange
      - zeekc: publish("zeek/supervisor", hello, "zeek/zeekc")
      - zeek: publish("zeek/zeekc", hello, zeek_version())
      - zeekc: wait for a response with a relatively short timeout interval. If the major/minor version matches what we were built for (3.2, 4.0, 4.1, etc.), then proceed, else emit a fatal error.
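
The major/minor comparison in the last step could look roughly like the following. It is a minimal sketch, not a proposal for the real code: `parse_major_minor` and `is_compatible` are hypothetical names, and it assumes the string returned by zeek_version() begins with MAJOR.MINOR. The Broker publish/subscribe exchange itself is not shown.

```cpp
#include <optional>
#include <sstream>
#include <string>
#include <utility>

// Extracts (major, minor) from a version string such as "4.0.1" or
// "3.2.0-dev.12". Anything after the minor number is ignored.
std::optional<std::pair<int, int>> parse_major_minor(const std::string& version) {
    std::istringstream in(version);
    int major = 0;
    int minor = 0;
    char dot = '\0';
    if (!(in >> major >> dot >> minor) || dot != '.' || major < 0 || minor < 0)
        return std::nullopt;
    return std::make_pair(major, minor);
}

// True only if both strings parse and agree on major and minor version.
bool is_compatible(const std::string& built_for, const std::string& reported) {
    auto a = parse_major_minor(built_for);
    auto b = parse_major_minor(reported);
    return a && b && *a == *b;
}
```

A reply that cannot be parsed is treated the same as a mismatch, so zeekc would emit the fatal error in both cases.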

Implementation Notes

Zeek-side changes to better support `zeekc`

New options
- SupervisorControl::enable=T: toggles whether to listen() for external requests by default
- SupervisorControl::listen_port=42042/tcp: the port on which to listen() for external requests
publish() responses to requests using a topic related to request ID
- This allows multiple "client" implementations to coexist without getting each other's responses mixed up. An example of an alternate client is a Python script that requests status updates directly from the Zeek supervisor.
The PID status of a node is currently the "PID of last fork()", even if that forked process has already exited, so this needs to change (and be documented) to report a sentinel value indicating "currently down"
Probably nice to have an API to request continuous status updates
- e.g. any change in the process tree gets published to a topic of choosing
- This helps `zeekc stop` do an orderly shutdown: ask to shut down workers, then proxies, then manager, then logger, and at each step wait for status updates to confirm all nodes of that type are gone
Add Supervisor::stop() and Supervisor::start() to kill() and fork() nodes respectively, but without mutating the node table. This differs from create() and destroy() operations which do change the node table. Child processes associated with a "stopped" node do not automatically get revived until "started".
- create() can add a default parameter of start: bool &default=T.
Rename SupervisorControl::stop_request to SupervisorControl::terminate_request and implement stop_request and start_request, which call stop() and start()
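
To illustrate the intended difference between stop()/start() and create()/destroy(), together with the "currently down" PID sentinel, here is a toy model in C++. It is a sketch under assumptions, not the Supervisor implementation: `NodeTable`, `kNoPid`, `on_child_exit` and the fake PID counter are all hypothetical, and no real fork() or kill() takes place.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Hypothetical sentinel meaning "no live child process".
constexpr int kNoPid = -1;

struct Node {
    bool stopped = false;  // a stopped node is not automatically revived
    int pid = kNoPid;
};

class NodeTable {
public:
    // Adds a node; `start` mirrors the proposed `start: bool &default=T`.
    bool create(const std::string& name, bool start = true) {
        auto [it, inserted] = nodes_.emplace(name, Node{});
        if (!inserted)
            return false;
        it->second.stopped = !start;
        if (start)
            it->second.pid = fake_fork();
        return true;
    }

    // Removes the node from the table entirely.
    bool destroy(const std::string& name) { return nodes_.erase(name) == 1; }

    // Marks the node as down but keeps its table entry.
    bool stop(const std::string& name) {
        auto it = nodes_.find(name);
        if (it == nodes_.end())
            return false;
        it->second.stopped = true;
        it->second.pid = kNoPid;
        return true;
    }

    // Brings a stopped node back without touching the rest of the table.
    bool start(const std::string& name) {
        auto it = nodes_.find(name);
        if (it == nodes_.end())
            return false;
        it->second.stopped = false;
        if (it->second.pid == kNoPid)
            it->second.pid = fake_fork();
        return true;
    }

    // Called when a child exits on its own: revive it unless it was stopped.
    void on_child_exit(const std::string& name) {
        auto it = nodes_.find(name);
        if (it == nodes_.end())
            return;
        it->second.pid = it->second.stopped ? kNoPid : fake_fork();
    }

    int pid(const std::string& name) const {
        auto it = nodes_.find(name);
        return it == nodes_.end() ? kNoPid : it->second.pid;
    }

    std::size_t size() const { return nodes_.size(); }

private:
    int fake_fork() { return next_pid_++; }  // stand-in for a real fork()

    std::map<std::string, Node> nodes_;
    int next_pid_ = 1000;
};
```

The point of the model is that stop() leaves the table size unchanged and reports `kNoPid`, whereas destroy() removes the entry.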

`zeekc`

check
- Try to start a shadow version of the process tree in "parse-only" mode and report whether any process exits non-zero
- If supervisor is not running, or has no children, report that as error
print
- Not currently supported by SupervisorControl API, but can add
stop
- Orderly stop of workers, then proxies, then manager, then logger
start
- Orderly start of logger, then manager, then proxies, then workers
restart
- Likely just stop followed by start
terminate
- Likely just stop followed by terminate_request()
Ability to toggle TLS: command-line flag or environment variable
Ability to change connection port: command-line flag or environment variable
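
The stop and start orderings described above can be sketched as a sort by node role. This is an illustration only: the `Role` enum, `NodeInfo` struct and both functions are hypothetical, and a real client would also wait for status updates between each group rather than just computing an order.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Roles declared in start order: logger, manager, proxies, workers.
enum class Role { Logger, Manager, Proxy, Worker };

struct NodeInfo {
    std::string name;
    Role role;
};

// Start order: logger first, workers last. Stable, so nodes that share
// a role keep their original relative order.
std::vector<NodeInfo> start_order(std::vector<NodeInfo> nodes) {
    std::stable_sort(nodes.begin(), nodes.end(),
                     [](const NodeInfo& a, const NodeInfo& b) {
                         return a.role < b.role;
                     });
    return nodes;
}

// Stop order is the reverse by role: workers first, logger last.
std::vector<NodeInfo> stop_order(std::vector<NodeInfo> nodes) {
    std::stable_sort(nodes.begin(), nodes.end(),
                     [](const NodeInfo& a, const NodeInfo& b) {
                         return a.role > b.role;
                     });
    return nodes;
}
```

With this shape, restart is stop_order followed by start_order over the same node list.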
