Zeek Supervisor Log Handling
ZeekControl used to handle several things:
- Configure Zeek log rotation parameters via the Cluster Framework script options. Zeek itself does the rotation in the typical case, with the archival being a separate postprocessing script executed right after rotation.
- Archive logs (move rotated logs somewhere and optionally compress them).
- Rotate+archive leftover logs that had not been rotated by Zeek due to a crash/kill.
The previous rotation process with ZeekControl (default config):
1. log_mgr emits a "rotate" msg with the desired rotation path: conn-<open_time>
2. The writer thread gets the "rotate" msg, adds a file extension, and renames: conn.log -> conn-<open_time>.log
3. The writer emits a "finished rotation" msg; log_mgr gets it and runs a postprocessing Zeek function.
4. The default postprocessing function for ASCII does an additional rename: conn-<open_time>.log -> conn.<open_time_alt_fmt>.log (it uses a "." instead of a "-" and also a different timestamp format).
5. The default ASCII postprocessing function also shells out to a postprocessing command, archive-log (shipped in ZeekControl), that does a final rename and optional gzip compression: conn.<open_time_alt_fmt>.log -> conn.<from>-<to>.log.gz
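To make the chain of renames concrete, here is a small C++ sketch that only computes the successive names a single log goes through in Steps (1), (2), (4) and (5); Step (3) does not rename anything. The timestamps are opaque placeholder strings because their real formats are not modeled, and the struct and function names are hypothetical.

```cpp
#include <string>

// Hypothetical illustration only: the successive names one log file gets
// under the old ZeekControl default config. Timestamps are opaque strings.
struct OldRotationNames {
    std::string step1_rotation_path;  // chosen by log_mgr
    std::string step2_writer_rename;  // writer thread appends the extension
    std::string step4_postproc_name;  // default ASCII postprocessor rename
    std::string step5_archived_name;  // archive-log rename + gzip
};

OldRotationNames old_rotation_names(const std::string& path, const std::string& ext,
                                    const std::string& open_time,
                                    const std::string& open_time_alt_fmt,
                                    const std::string& from, const std::string& to) {
    OldRotationNames n;
    n.step1_rotation_path = path + "-" + open_time;
    n.step2_writer_rename = n.step1_rotation_path + ext;
    n.step4_postproc_name = path + "." + open_time_alt_fmt + ext;
    n.step5_archived_name = path + "." + from + "-" + to + ext + ".gz";
    return n;
}
```

Each of the three renames is performed by a different component: the writer thread, the Zeek postprocessor function, and the archive-log script.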
For the Supervised model of log rotation:
- Step (4) should be removed/simplified for the following reasons (I mean the default ASCII postprocessor function specifically, not postprocessors generally):
  - It's cumbersome to recover the file extension from the provided filename argument, since the extension can be customized per-filter. The existing function even seems to do this incorrectly.
  - I have concerns about putting a system(/bin/mv) here at all, since we implement that as system("mv src dst &") and it also immediately does a following call to system("archive-log args &"). Since both commands are backgrounded with "&", nothing ensures the mv finishes before archive-log runs, so I don't see how that's not a race.
  - Suggest adding a Log::rotation_format_func(): string option to give users a direct way to customize the rotation formatting (e.g. choosing which timestamp format they want). They implement that function and it gets called at Step (1).
- Step (5) doesn't work well and needs replacement. See Log Archival.
- To help bridge/replace Steps (4) and (5), suggest adding a new option: Log::default_rotation_dir. The Log::rotation_format_func() will use this as part of its default return value. The log_mgr will attempt to create the necessary dirs just-in-time; failing to do so emits an error, but rotation otherwise continues using the working directory instead. A supervised cluster will change this option to ./log-queue/ by default.
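A minimal C++ sketch of how the two proposed options could fit together, assuming a rotation target of the form <dir><path>.<open>-<close><ext>. RotationInfo, rotation_format_func, default_rotation_dir and rotate_log are hypothetical C++ stand-ins for the proposed Zeek-script options, not existing Zeek APIs.

```cpp
#include <filesystem>
#include <functional>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical stand-ins for the proposed Zeek options; not real Zeek APIs.
struct RotationInfo {
    std::string path;        // log stream path, e.g. "conn"
    std::string open_time;   // pre-formatted timestamps
    std::string close_time;
};

// Stand-in for Log::default_rotation_dir ("./log-queue/" in a supervised cluster).
std::string default_rotation_dir = "./log-queue/";

// Default formatter: uses default_rotation_dir as part of its return value.
// The "<path>.<open>-<close>" layout is an assumption for illustration.
std::string default_rotation_format(const RotationInfo& info) {
    return default_rotation_dir + info.path + "." + info.open_time + "-" + info.close_time;
}

// Stand-in for Log::rotation_format_func; users replace it to choose their
// own naming scheme or timestamp format.
std::function<std::string(const RotationInfo&)> rotation_format_func = default_rotation_format;

// Called at Step (1): compute the target, create its directory just-in-time,
// and fall back to the working directory if that fails. The rename completes
// before this function returns, so no later step can race with it.
fs::path rotate_log(const fs::path& current, const std::string& ext,
                    const RotationInfo& info, std::string& error) {
    fs::path target = rotation_format_func(info) + ext;
    if (target.has_parent_path()) {
        std::error_code ec;
        fs::create_directories(target.parent_path(), ec);
        if (ec) {
            error = "cannot create " + target.parent_path().string() + ": " + ec.message();
            target = target.filename();  // continue rotating in the working directory
        }
    }
    fs::rename(current, target);  // throws fs::filesystem_error on failure
    return target;
}
```

Because rotate_log() renames synchronously, there is no backgrounded mv left for a later archival step to race against, which is the concern raised about Step (4).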
Log Archival
ZeekControl provided a shell script, archive-log, and instructed Zeek to use it as a postprocessing command. This has problems:
- Load: all logs try to zip simultaneously
- Non-Atomicity: there's no built-in resiliency to later finish any archival processes that were interrupted (reboot, OOM, power loss)
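One common way to get the missing atomicity is to write the archived copy under a temporary name, publish it with a single rename(), and only then remove the source, so rerunning the step after an interruption is always safe. The C++ sketch below assumes that approach; archive_one() is a hypothetical name, and the plain byte copy only marks where the real work (e.g. gzip compression) would go.

```cpp
#include <algorithm>
#include <filesystem>
#include <fstream>
#include <iterator>
#include <stdexcept>
#include <string>

namespace fs = std::filesystem;

// Hypothetical crash-safe archive step. Assumes src and dest live on the same
// filesystem, so that rename() atomically replaces dest.
void archive_one(const fs::path& src, const fs::path& dest) {
    fs::path tmp = dest;
    tmp += ".tmp";
    if (dest.has_parent_path()) fs::create_directories(dest.parent_path());

    {
        // 1. Write the archived copy under a temporary name. A plain byte
        //    copy stands in for the real work (e.g. gzip compression).
        std::ifstream in(src, std::ios::binary);
        if (!in) throw std::runtime_error("cannot open " + src.string());
        std::ofstream out(tmp, std::ios::binary | std::ios::trunc);
        auto it = std::copy(std::istreambuf_iterator<char>(in),
                            std::istreambuf_iterator<char>(),
                            std::ostreambuf_iterator<char>(out));
        out.close();
        if (it.failed() || !out) throw std::runtime_error("cannot write " + tmp.string());
    }

    // 2. Publish under the final name in one step: dest is either absent or
    //    complete, never a partial archive.
    fs::rename(tmp, dest);

    // 3. Only now drop the source. If we crash before this line, a rerun
    //    redoes steps 1-2 and overwrites dest, so the step is idempotent.
    fs::remove(src);
}
```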
Solution: supervised clusters will now expect a single external process to separately take responsibility for archiving logs. Justin has already made such a script and used it successfully, so we should just "officialize" it:
- Reference: https://github.com/ncsa/bro-atomic-rotate
- Revision Control: re-implement a version of this in the zeek git repo
- Rename executable: zeek-archiver
- Language: probably simple enough to re-implement in C++ (i.e. just so we're not adding a Python dependency to the main zeek repo itself)
- Options:
  - dir-to-monitor
  - destination dir
  - timestamp delimiter used in source file names
  - use gzip or not
  - auto-detection of already-zipped logs via a set of file extension names
- We can potentially make the Zeek Supervisor process configurable to auto-start and keep a zeek-archiver child alive. This requires adding a way to supervise processes other than zeek, which was already planned for other use-cases.
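The options above could map onto something like the following C++ sketch. The source-name layout <stream><delim><from><delim><to><ext>, the default delimiter "__", the default directories, and the set of compressed extensions are all assumptions for illustration; only the final name form conn.<from>-<to>.log.gz comes from Step (5).

```cpp
#include <optional>
#include <set>
#include <string>

// Hypothetical zeek-archiver options mirroring the list above; all defaults
// are assumptions.
struct ArchiverOptions {
    std::string src_dir = "./log-queue";  // dir-to-monitor
    std::string dest_dir = "./logs";      // destination dir
    std::string ts_delim = "__";          // timestamp delimiter in source names
    bool compress = true;                 // use gzip or not
    std::set<std::string> compressed_exts = {".gz", ".bz2", ".zst"};
};

struct ParsedName {
    std::string stream, from, to, ext;
};

// Splits "<stream><delim><from><delim><to><ext>", e.g. with delim "__":
// "conn__<from>__<to>.log". Assumes timestamps contain no '.'.
std::optional<ParsedName> parse_name(const std::string& name, const std::string& delim) {
    const auto p1 = name.find(delim);
    if (p1 == std::string::npos) return std::nullopt;
    const auto from_start = p1 + delim.size();
    const auto p2 = name.find(delim, from_start);
    if (p2 == std::string::npos) return std::nullopt;
    const auto to_start = p2 + delim.size();
    const auto dot = name.find('.', to_start);
    if (dot == std::string::npos) return std::nullopt;
    return ParsedName{name.substr(0, p1),
                      name.substr(from_start, p2 - from_start),
                      name.substr(to_start, dot - to_start),
                      name.substr(dot)};
}

// True if ext ends with one of the configured "already compressed" suffixes.
bool already_compressed(const std::string& ext, const ArchiverOptions& opts) {
    for (const auto& suffix : opts.compressed_exts)
        if (ext.size() >= suffix.size() &&
            ext.compare(ext.size() - suffix.size(), suffix.size(), suffix) == 0)
            return true;
    return false;
}

// Final name in the style of Step (5): "<stream>.<from>-<to><ext>[.gz]".
std::string archive_name(const ParsedName& p, const ArchiverOptions& opts) {
    std::string out = p.stream + "." + p.from + "-" + p.to + p.ext;
    if (opts.compress && !already_compressed(p.ext, opts)) out += ".gz";
    return out;
}
```

Keeping name parsing separate from the file operations means the delimiter and extension-detection options can be tested without touching the filesystem.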
To implement rotation of leftover logs, we can introduce a concept of "shadow"
log files: a .shadow.<logfile> accompanying each log that contains a small
amount of metadata, namely the log file extension (e.g. .log or .log.gz) and
the name of the chosen postprocessor function.
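A sketch of the shadow file itself, in C++. The proposal specifies what the file holds but not its encoding, so the two-line layout (extension, then postprocessor name) is an assumption, as are the ShadowInfo, shadow_path_for, write_shadow and read_shadow names.

```cpp
#include <filesystem>
#include <fstream>
#include <optional>
#include <string>

namespace fs = std::filesystem;

// Hypothetical shadow-file contents; the two-line encoding is an assumption.
struct ShadowInfo {
    std::string extension;      // e.g. ".log" or ".log.gz"
    std::string postprocessor;  // name of the chosen postprocessor function
};

// ".shadow.<logfile>" next to the log, e.g. logs/conn.log -> logs/.shadow.conn.log
fs::path shadow_path_for(const fs::path& log) {
    return log.parent_path() / (".shadow." + log.filename().string());
}

bool write_shadow(const fs::path& log, const ShadowInfo& info) {
    std::ofstream out(shadow_path_for(log), std::ios::trunc);
    out << info.extension << '\n' << info.postprocessor << '\n';
    out.close();
    return static_cast<bool>(out);
}

// Returns nothing if the shadow file is missing or truncated.
std::optional<ShadowInfo> read_shadow(const fs::path& log) {
    std::ifstream in(shadow_path_for(log));
    ShadowInfo info;
    if (!std::getline(in, info.extension) || !std::getline(in, info.postprocessor))
        return std::nullopt;
    return info;
}
```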
The shadow file is written before open() of a new file and deleted after each
rotation's rename(). If both the shadow file and the log file exist when a Zeek
process starts up, then it initiates a rotation: that's what should have
happened on normal Zeek process termination, and if we don't rotate now, the
log is going to get clobbered eventually. The rotation for such a leftover log
file uses the metadata in the shadow file to try to go through the exact
rotation that should have occurred, including running the postprocessor
function.
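The start-up check could look roughly like this C++ sketch, again assuming the two-line shadow layout. recover_leftovers() and the rotate callback are hypothetical; the callback stands in for the normal rotation path, including the postprocessor named in the shadow file. Deleting a shadow file whose log is already gone (a crash between the rename() and the shadow deletion) is an extra assumption, not part of the proposal.

```cpp
#include <filesystem>
#include <fstream>
#include <functional>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical callback standing in for the normal rotation path (rename plus
// the postprocessor named in the shadow file).
using RotateFn = std::function<void(const fs::path& log, const std::string& ext,
                                    const std::string& postprocessor)>;

// Run once at start-up, before any new log is opened. Returns the number of
// leftover logs handed to `rotate`.
int recover_leftovers(const fs::path& dir, const RotateFn& rotate) {
    const std::string prefix = ".shadow.";

    // Collect first: it is unspecified whether a directory_iterator observes
    // files removed or renamed while it is in use.
    std::vector<fs::path> shadows;
    for (const auto& entry : fs::directory_iterator(dir)) {
        const std::string name = entry.path().filename().string();
        if (name.size() > prefix.size() && name.rfind(prefix, 0) == 0)
            shadows.push_back(entry.path());
    }

    int rotated = 0;
    for (const auto& shadow : shadows) {
        const fs::path log = dir / shadow.filename().string().substr(prefix.size());
        if (fs::exists(log)) {
            // Both files exist: the process died before rotating this log.
            std::ifstream in(shadow);
            std::string ext, postprocessor;
            std::getline(in, ext);
            std::getline(in, postprocessor);
            rotate(log, ext, postprocessor);
            ++rotated;
        }
        // Assumed extra case: a shadow without its log means the process died
        // between rename() and deleting the shadow, so it is simply stale.
        fs::remove(shadow);
    }
    return rotated;
}
```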
I think ZeekControl's crash recovery mechanisms were writer-agnostic, but this
new shadow file implementation would be ASCII-specific. Other writers could
still implement equivalent logic entirely on their own. We could expose the
logic through some generic APIs to help others re-use it, but they ultimately
still have to opt in with an InitPostScript() override (where the rotation
needs to happen) and "instrumentation" to create shadow files just before they
do any open() (e.g. we could provide a ShadowedLogOpen() with similar args to
open()).
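A possible shape for the proposed ShadowedLogOpen() helper, written against POSIX open(2), write(2) and fsync(2). The extra ext/postprocessor parameters and the two-line shadow layout are assumptions; this is a sketch, not an existing Zeek function.

```cpp
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

#include <cerrno>
#include <string>

// Hypothetical ShadowedLogOpen(): open(2)'s arguments plus the metadata to
// record. Writes <dir>/.shadow.<name> first, then opens the log itself.
// Returns the log's fd, or -1 with errno set.
int ShadowedLogOpen(const char* path, int flags, mode_t mode,
                    const std::string& ext, const std::string& postprocessor) {
    const std::string p(path);
    const auto slash = p.find_last_of('/');
    const std::string shadow = (slash == std::string::npos)
        ? ".shadow." + p
        : p.substr(0, slash + 1) + ".shadow." + p.substr(slash + 1);

    int sfd = ::open(shadow.c_str(), O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (sfd < 0) return -1;
    const std::string contents = ext + "\n" + postprocessor + "\n";  // assumed layout
    const ssize_t n = ::write(sfd, contents.data(), contents.size());
    // fsync so the shadow's contents are flushed before the log is opened.
    const bool ok = n == static_cast<ssize_t>(contents.size()) && ::fsync(sfd) == 0;
    int saved = errno;
    ::close(sfd);
    if (!ok) {
        ::unlink(shadow.c_str());
        errno = saved;
        return -1;
    }

    int fd = ::open(path, flags, mode);
    if (fd < 0) {
        // No log was created, so the shadow would only be stale.
        saved = errno;
        ::unlink(shadow.c_str());
        errno = saved;
    }
    return fd;
}
```

Writers would call this instead of open() when creating a new log, which is the "instrumentation" described above; the matching shadow deletion still belongs after the rotation's rename().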