
Zeek Supervisor Log Handling

Jon Siwek edited this page Jun 30, 2020 · 1 revision

Zeek Supervised Cluster Log Handling

ZeekControl used to handle several things:

  • Configure Zeek log rotation parameters via the Cluster Framework script options. In the typical case Zeek itself does the rotation, with archival done by a separate postprocessing script executed right after rotation.

  • Archive logs (move rotated logs somewhere and optionally compress).

  • Rotate+archive leftover logs that had not been rotated by Zeek due to crash/kill.

Log Rotation

The previous rotation process with ZeekControl (default config):

  1. log_mgr emits "rotate" msg with desired rotation path: conn-<open_time>
  2. Writer thread gets "rotate" msg, adds a file extension and renames: conn.log -> conn-<open_time>.log
  3. Writer emits "finished rotation" msg, log_mgr gets it and runs a postprocessing Zeek function.
  4. The default postprocessing function for ASCII does an additional rename: conn-<open_time>.log -> conn.<open_time_alt_fmt>.log (uses a . instead of - and also a different timestamp format)
  5. The default ASCII postprocessing function also shells out to a postprocessing command, archive-log (shipped in ZeekControl), that does a final rename and optional gzip compression: conn.<open_time_alt_fmt>.log -> conn.<from>-<to>.log.gz
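To make the chain of renames concrete, here is a small C++ sketch that computes each name a single log passes through in Steps (1) to (5). OldRotationChain() and FormatTime() are hypothetical helpers, not ZeekControl or Zeek APIs, and all three strftime patterns are assumptions picked for illustration; the real formats are configurable and may differ.

```cpp
#include <ctime>
#include <string>
#include <vector>

// Format a UNIX timestamp in UTC with the given strftime pattern.
static std::string FormatTime(std::time_t t, const char* fmt) {
    std::tm tm = *std::gmtime(&t);
    char buf[64];
    std::strftime(buf, sizeof(buf), fmt, &tm);
    return buf;
}

// Hypothetical helper: the successive names one log takes in the old flow.
// The timestamp patterns below are illustrative assumptions.
std::vector<std::string> OldRotationChain(const std::string& path,
                                          std::time_t open_time,
                                          std::time_t close_time) {
    std::vector<std::string> names;
    names.push_back(path + ".log");  // active log, e.g. conn.log
    // Steps 1-2: log_mgr asks for "<path>-<open_time>"; the writer adds ".log".
    names.push_back(path + "-" + FormatTime(open_time, "%y-%m-%d_%H.%M.%S") + ".log");
    // Step 4: ASCII postprocessor renames using "." and another time format.
    names.push_back(path + "." + FormatTime(open_time, "%Y-%m-%d-%H-%M-%S") + ".log");
    // Step 5: archive-log renames to "<from>-<to>" and gzips.
    names.push_back(path + "." + FormatTime(open_time, "%H:%M:%S") + "-" +
                    FormatTime(close_time, "%H:%M:%S") + ".log.gz");
    return names;
}
```

For example, OldRotationChain("conn", 0, 3600) yields conn.log, conn-70-01-01_00.00.00.log, conn.1970-01-01-00-00-00.log and conn.00:00:00-01:00:00.log.gz: three renames of one file, each performed by a different component.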

For the Supervised model of log rotation:

  • Step (4) should be removed/simplified for the following reasons (I mean the default ASCII postprocessor function specifically, not postprocessors generally).

    • It's cumbersome to recover the file extension from the provided filename argument, since the extension can be customized per-filter. The existing function even seems to do this incorrectly.

    • I have concerns about putting a system(/bin/mv) here at all, since we implement that as system("mv src dst &") and immediately follow it with a call to system("archive-log args &"). Both commands run in the background, so nothing guarantees the mv finishes before archive-log starts; I don't see how that's not a race.

    • Suggest adding a Log::rotation_format_func(): string option to give users a direct way to customize rotation formatting (e.g. choosing which timestamp format they want). Users implement that function, and it gets called at Step (1).

  • Step (5) doesn't work well and needs replacement. See Log Archival.

    • To help bridge/replace Steps (4) and (5), suggest adding a new option: Log::default_rotation_dir. The Log::rotation_format_func() will use this as part of its default return value. The log_mgr will attempt to create the necessary directories just in time; if that fails, it emits an error but continues the rotation using the working directory instead. A supervised cluster will change this option to ./log-queue/ by default.
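A minimal C++ sketch of how the proposed pieces could fit together: a default format function that prefixes the rotation directory, and the log_mgr side creating that directory just in time and falling back to the working directory on failure. The real option and function would live at the Zeek script layer; RotationInfo, DefaultRotationFormat() and ResolveRotationPath() are hypothetical stand-ins, and the timestamp format is an assumption.

```cpp
#include <ctime>
#include <filesystem>
#include <iostream>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical stand-in for the info handed to Log::rotation_format_func().
struct RotationInfo {
    std::string path;       // log path, e.g. "conn"
    std::time_t open_time;  // when the current file was opened
};

// Stand-in for Log::default_rotation_dir (supervised-cluster default).
std::string default_rotation_dir = "./log-queue/";

// Default format: <rotation_dir>/<path>-<open_time>, without an extension,
// since the writer appends its own (Step 2).
std::string DefaultRotationFormat(const RotationInfo& info) {
    char buf[32];
    std::tm tm = *std::gmtime(&info.open_time);
    std::strftime(buf, sizeof(buf), "%Y-%m-%d-%H-%M-%S", &tm);
    return (fs::path(default_rotation_dir) / (info.path + "-" + buf)).string();
}

// log_mgr side: create the directory just in time. On failure, report an
// error and rotate into the working directory rather than aborting.
std::string ResolveRotationPath(const std::string& formatted) {
    fs::path p(formatted);
    if (p.has_parent_path()) {
        std::error_code ec;
        fs::create_directories(p.parent_path(), ec);
        if (ec) {
            std::cerr << "cannot create rotation dir: " << ec.message() << "\n";
            return p.filename().string();
        }
    }
    return p.string();
}
```

Returning only the file name on failure keeps rotation going, which matches the proposal's preference for an error message over a lost rotation.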

Log Archival

ZeekControl provided a shell script, archive-log, and instructed Zeek to use it as a postprocessing command. This has problems:

  • Load: all logs try to zip simultaneously
  • Non-Atomicity: there's no built-in resiliency for later finishing archival processes that were interrupted (reboot, OOM, power loss)
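One general way to get the resiliency the second bullet asks for is to make every archival step restartable: write the output under a temporary name, rename() it into place, and remove the source last. The C++ sketch below assumes POSIX rename() semantics within one filesystem and substitutes a plain byte copy where gzip compression would go; ArchiveOne() is a hypothetical name, and this is not presented as how any existing archiver works.

```cpp
#include <filesystem>
#include <fstream>
#include <system_error>

namespace fs = std::filesystem;

// Copy src to dst so that an interruption at any point is recoverable:
// write to "<dst>.tmp", rename into place, and only then remove src.
bool ArchiveOne(const fs::path& src, const fs::path& dst) {
    std::error_code ec;
    if (dst.has_parent_path()) {
        fs::create_directories(dst.parent_path(), ec);
        if (ec) return false;
    }
    fs::path tmp = dst;
    tmp += ".tmp";  // same directory as dst, so the rename stays on one filesystem
    {
        // Stand-in for gzip: a real archiver would write through a compressor.
        std::ifstream in(src, std::ios::binary);
        std::ofstream out(tmp, std::ios::binary | std::ios::trunc);
        if (!in || !out) return false;
        char buf[8192];
        while (in.read(buf, sizeof buf) || in.gcount() > 0)
            out.write(buf, in.gcount());
        out.close();
        if (!out) return false;
    }
    // rename() is atomic: readers see either no archive or a complete one.
    fs::rename(tmp, dst, ec);
    if (ec) return false;
    fs::remove(src, ec);  // the original is dropped only after success
    return !ec;
}
```

Because the source is removed only after the rename succeeds, an archiver restarting after a reboot can simply rescan its queue; a leftover .tmp file is overwritten on the next attempt. A production version would likely also fsync() the output before renaming.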

Solution: supervised clusters will now expect a single external process to separately take responsibility for archiving logs. Justin has already made such a script and used it successfully, so we should just "officialize" it:

  • Reference: https://github.com/ncsa/bro-atomic-rotate
  • Revision Control: re-implement a version of this into the zeek git repo
  • Rename executable: zeek-archiver
  • Language: probably simple enough to re-implement in C++ (i.e. just so we're not adding a Python dependency to the main zeek repo itself)
  • Options:
    • dir-to-monitor
    • destination dir
    • timestamp delimiter used in source file names
    • use gzip or not
    • auto-detection of already-zipped logs via a set of file extensions
  • We could potentially make the Zeek Supervisor process configurable to auto-start a zeek-archiver child and keep it alive. This requires adding a way to supervise processes other than zeek, which was already planned for other use cases.
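A C++ sketch of how zeek-archiver could map the listed options onto its core naming logic. Everything here is hypothetical: the option field names, the queued-file layout "<path><delim><open><delim><close><ext>", the YYYY-MM-DD-HH-MM-SS timestamp format, and the per-day destination directory are assumptions; only the final "<path>.<from>-<to>.log.gz" shape comes from Step (5).

```cpp
#include <optional>
#include <set>
#include <string>

// Hypothetical option set mirroring the bullet list above.
struct ArchiverOptions {
    std::string dir_to_monitor = "./log-queue";
    std::string dest_dir = "./logs";
    std::string delimiter = "__";  // timestamp delimiter in source names
    bool use_gzip = true;
    std::set<std::string> compressed_exts = {".gz", ".bz2", ".zst"};
};

// Pieces of a queued file name (assumed layout, see lead-in).
struct QueuedLog {
    std::string path, open_ts, close_ts, ext;
};

std::optional<QueuedLog> ParseName(const std::string& name, const std::string& delim) {
    auto a = name.find(delim);
    if (a == std::string::npos) return std::nullopt;
    auto b = name.find(delim, a + delim.size());
    if (b == std::string::npos) return std::nullopt;
    auto dot = name.find('.', b + delim.size());  // extension starts after close_ts
    if (dot == std::string::npos) return std::nullopt;
    return QueuedLog{name.substr(0, a),
                     name.substr(a + delim.size(), b - a - delim.size()),
                     name.substr(b + delim.size(), dot - b - delim.size()),
                     name.substr(dot)};
}

// Auto-detect already-compressed logs by their last extension component.
bool AlreadyCompressed(const std::string& ext, const ArchiverOptions& opts) {
    auto dot = ext.rfind('.');
    return dot != std::string::npos && opts.compressed_exts.count(ext.substr(dot)) > 0;
}

// <dest>/<YYYY-MM-DD>/<path>.<HH:MM:SS>-<HH:MM:SS><ext>[.gz]
// Assumes timestamps formatted as YYYY-MM-DD-HH-MM-SS.
std::string DestName(const QueuedLog& q, const ArchiverOptions& opts) {
    auto hms = [](const std::string& ts) {
        return ts.substr(11, 2) + ":" + ts.substr(14, 2) + ":" + ts.substr(17, 2);
    };
    std::string out = opts.dest_dir + "/" + q.open_ts.substr(0, 10) + "/" + q.path +
                      "." + hms(q.open_ts) + "-" + hms(q.close_ts) + q.ext;
    if (opts.use_gzip && !AlreadyCompressed(q.ext, opts)) out += ".gz";
    return out;
}
```

With the defaults, "conn__2020-06-30-12-00-00__2020-06-30-13-00-00.log" maps to ./logs/2020-06-30/conn.12:00:00-13:00:00.log.gz, while a queued ".log.gz" file keeps its extension instead of gaining a second ".gz".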

Leftover Log Rotation

To implement rotation of leftover logs, we can introduce the concept of "shadow" log files: a .shadow.<logfile> file accompanying each log that contains a small amount of metadata, namely the log file extension (e.g. .log or .log.gz) and the name of the chosen postprocessor function.
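A minimal C++ sketch of the shadow file itself. The on-disk layout (extension on the first line, postprocessor name on the second) is an assumption, as are ShadowInfo and the helper names; "Log::my_postprocessor" in the test is a made-up function name.

```cpp
#include <optional>
#include <sstream>
#include <string>

// Metadata kept in ".shadow.<logfile>" (field layout assumed).
struct ShadowInfo {
    std::string extension;      // e.g. ".log" or ".log.gz"
    std::string postprocessor;  // name of the chosen postprocessor function
};

// "logs/conn.log" -> "logs/.shadow.conn.log"
std::string ShadowPath(const std::string& logfile) {
    auto slash = logfile.rfind('/');
    if (slash == std::string::npos) return ".shadow." + logfile;
    return logfile.substr(0, slash + 1) + ".shadow." + logfile.substr(slash + 1);
}

std::string SerializeShadow(const ShadowInfo& s) {
    return s.extension + "\n" + s.postprocessor + "\n";
}

std::optional<ShadowInfo> ParseShadow(const std::string& text) {
    std::istringstream in(text);
    ShadowInfo s;
    if (!std::getline(in, s.extension) || s.extension.empty()) return std::nullopt;
    std::getline(in, s.postprocessor);  // may be empty: no postprocessor
    return s;
}
```

Keeping the file to a couple of plain-text lines means a startup check can read it without depending on any writer state.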

The shadow file is written before open() of a new log file and deleted after each rotation's rename(). If both the shadow file and the log file exist when a Zeek process starts up, it initiates a rotation: that's what should have happened on normal Zeek process termination, and if we don't rotate now, the file will eventually get clobbered. The rotation of such a leftover log file uses the metadata in the shadow file to go through exactly the rotation that should have occurred, including running the postprocessor function.
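The startup check could be sketched in C++ as a directory scan. RotateFn is a hypothetical callback standing in for the writer's normal rotation path (rename plus postprocessor, which then deletes the shadow). Deleting a shadow file whose log is missing is an inference, not part of the proposal: it assumes a crash between rename() and the shadow deletion.

```cpp
#include <filesystem>
#include <functional>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical: performs the normal rotation for `log` using `shadow` metadata.
using RotateFn = std::function<void(const fs::path& log, const fs::path& shadow)>;

// Returns the leftover logs handed to `rotate`.
std::vector<fs::path> RecoverLeftovers(const fs::path& dir, const RotateFn& rotate) {
    const std::string prefix = ".shadow.";
    // Collect first so the directory is not modified while being iterated.
    std::vector<fs::path> shadows;
    for (const auto& entry : fs::directory_iterator(dir)) {
        std::string name = entry.path().filename().string();
        if (name.rfind(prefix, 0) == 0) shadows.push_back(entry.path());
    }
    std::vector<fs::path> rotated;
    for (const auto& shadow : shadows) {
        fs::path log = dir / shadow.filename().string().substr(prefix.size());
        if (fs::exists(log)) {
            rotate(log, shadow);  // finish the rotation that never ran
            rotated.push_back(log);
        } else {
            fs::remove(shadow);  // assumed stale: log already renamed
        }
    }
    return rotated;
}
```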

I think ZeekControl's crash recovery mechanisms were writer-agnostic, but this new shadow file implementation would be ASCII-specific. Other writers could still implement equivalent logic entirely on their own. We could expose the logic through some generic APIs to help others reuse it, but they would ultimately still have to opt in with an InitPostScript() override (where the rotation needs to happen) and "instrumentation" that creates shadow files just before any open() (e.g. we could provide a ShadowedLogOpen() taking similar arguments to open()).
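A POSIX C++ sketch of what the suggested ShadowedLogOpen() could look like, plus a rotation-time counterpart. Only the name ShadowedLogOpen comes from the text; its exact signature, the metadata layout, and ShadowedLogRename() are hypothetical.

```cpp
#include <cstdio>
#include <fcntl.h>
#include <string>
#include <unistd.h>

// ".shadow.<file>" in the same directory as <file>.
static std::string ShadowName(const std::string& path) {
    auto slash = path.rfind('/');
    if (slash == std::string::npos) return ".shadow." + path;
    return path.substr(0, slash + 1) + ".shadow." + path.substr(slash + 1);
}

// Same flags/mode as open(2), plus the metadata to record. The shadow file is
// written before the log is opened, so a crash at any later point leaves
// enough information to finish the rotation on the next startup.
// Returns the log's file descriptor, or -1 on failure.
int ShadowedLogOpen(const std::string& path, int flags, mode_t mode,
                    const std::string& extension,
                    const std::string& postprocessor) {
    const std::string shadow = ShadowName(path);
    int sfd = ::open(shadow.c_str(), O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (sfd < 0) return -1;

    const std::string meta = extension + "\n" + postprocessor + "\n";
    bool ok = ::write(sfd, meta.data(), meta.size()) ==
              static_cast<ssize_t>(meta.size());
    if (::close(sfd) != 0) ok = false;
    if (!ok) {
        ::unlink(shadow.c_str());  // don't leave a truncated shadow behind
        return -1;
    }
    return ::open(path.c_str(), flags, mode);
}

// Rotation-time counterpart: rename the log first, then delete its shadow.
// Returns 0 on success, -1 on failure.
int ShadowedLogRename(const std::string& path, const std::string& new_path) {
    if (std::rename(path.c_str(), new_path.c_str()) != 0) return -1;
    return ::unlink(ShadowName(path).c_str());
}
```

A writer opting in would replace its open() and rename() calls with these two helpers and call a startup scan from its InitPostScript() override.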
