orchestrator-process: store PID metadata in temp directory #15810

Merged
merged 3 commits into from
Dec 4, 2022

Commits on Dec 4, 2022

  1. ore,storaged,computed: introduce abstractions over Unix/TCP sockets

    Add `SocketAddr`, `Listener`, and `Stream` types to the `mz_ore::netio`
    module, which abstract over TCP sockets and Unix domain sockets. Then
    teach storaged and computed to accept their listen addresses using the
    new `SocketAddr` types, which allows them to bind to either TCP or Unix
    domain sockets, as desired.
    
    This is a key step towards fixing the process orchestrator
    (MaterializeInc#15725), as it will allow multiple copies of Materialize to be run
    concurrently without competing for access to the same ports.
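    
    As a rough illustration of the idea, the sketch below models a listen
    address that is either a TCP address or a Unix domain socket path. It is
    not the `mz_ore::netio` API: the `ListenAddr` name and the "leading `/`
    means Unix socket" parsing rule are assumptions made for this example.

    ```rust
    use std::net::SocketAddr as TcpSocketAddr;
    use std::path::PathBuf;
    use std::str::FromStr;

    /// Hypothetical address type covering both socket families.
    #[derive(Debug, Clone, PartialEq, Eq)]
    pub enum ListenAddr {
        /// A TCP address such as `127.0.0.1:2100`.
        Tcp(TcpSocketAddr),
        /// A filesystem path to a Unix domain socket.
        Unix(PathBuf),
    }

    impl FromStr for ListenAddr {
        type Err = std::net::AddrParseError;

        fn from_str(s: &str) -> Result<Self, Self::Err> {
            // Assumption for this sketch: absolute paths denote Unix sockets,
            // anything else must parse as a TCP `ip:port` pair.
            if s.starts_with('/') {
                Ok(ListenAddr::Unix(PathBuf::from(s)))
            } else {
                s.parse::<TcpSocketAddr>().map(ListenAddr::Tcp)
            }
        }
    }
    ```

    With a type like this, a service can take a single listen-address
    argument and decide at bind time which kind of socket to create.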
    benesch committed Dec 4, 2022
    6ab46f8
  2. orchestrator-process: store PID metadata in temp directory

    PID files are not valid after a reboot of a machine. In the best case,
    the referenced PIDs do not exist, and the process orchestrator correctly
    recreates the services; in the worst case, the PIDs have been reused by
    different processes entirely, and the process orchestrator incorrectly
    thinks the services are already running. The worst case is almost
    guaranteed in containers, where only a few processes run and they all
    use low-numbered PIDs.
    
    This commit fixes the problem by moving the PID metadata files into
    $TMPDIR/environment-$ID. $TMPDIR is cleared on restart, so the stale PID
    files will correctly vanish after a restart. Naming the directory after
    the environment ID ensures that environmentd can find its metadata after
    a process restart without a machine restart, but allows multiple
    `environmentd` processes to co-exist, as long as they use different
    environment IDs. Things work correctly with the `--reset` option to
    bin/environmentd, too, as this option generates a new environment ID.
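    
    The directory layout described above can be sketched as follows. The
    function names are hypothetical and not taken from the orchestrator's
    code; the sketch relies on `std::env::temp_dir`, which consults
    `$TMPDIR` on Unix and falls back to `/tmp`.

    ```rust
    use std::env;
    use std::path::{Path, PathBuf};

    /// Hypothetical helper: builds `<tmp_dir>/environment-<environment_id>`.
    fn metadata_dir(tmp_dir: &Path, environment_id: &str) -> PathBuf {
        tmp_dir.join(format!("environment-{}", environment_id))
    }

    /// Hypothetical helper: the same path, rooted at the system temp directory.
    fn default_metadata_dir(environment_id: &str) -> PathBuf {
        metadata_dir(&env::temp_dir(), environment_id)
    }
    ```

    Two environments with different IDs map to different directories, while
    the same ID always maps back to the same directory across process
    restarts.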
    
    Touches MaterializeInc#15725.
    Would close MaterializeInc#15800.
    benesch committed Dec 4, 2022
    32c6022
  3. orchestrator-process: rewrite

    Rewrite the process orchestrator to be more reliable:
    
      * Use Unix domain sockets rather than TCP, to avoid competing with
        other `environmentd` processes (e.g., in tests) for a very limited
        range of TCP ports.
    
      * Restructure process supervision so that existing processes that are
        adopted at startup are monitored for failures and restarted if
        necessary.
    
      * Enable process status updates.
    
      * Enable process metrics collection.
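    
    To illustrate the first point, here is a minimal sketch of talking over
    a Unix domain socket using only the Rust standard library. Because the
    socket is identified by a filesystem path, no TCP port is consumed. The
    `round_trip` function is hypothetical and is not part of the
    orchestrator.

    ```rust
    use std::fs;
    use std::io::{self, Read, Write};
    use std::net::Shutdown;
    use std::os::unix::net::{UnixListener, UnixStream};
    use std::path::Path;

    /// Hypothetical demo: sends `message` through a Unix domain socket bound
    /// at `socket_path` and returns the bytes the server side received.
    fn round_trip(socket_path: &Path, message: &[u8]) -> io::Result<Vec<u8>> {
        // Remove a stale socket file from a previous run, if any.
        let _ = fs::remove_file(socket_path);

        let listener = UnixListener::bind(socket_path)?;
        // The connection is queued by the kernel until `accept` is called.
        let mut client = UnixStream::connect(socket_path)?;
        let (mut server, _addr) = listener.accept()?;

        client.write_all(message)?;
        // Close the write half so the reader sees end-of-file.
        client.shutdown(Shutdown::Write)?;

        let mut received = Vec::new();
        server.read_to_end(&mut received)?;

        // Clean up the socket file.
        fs::remove_file(socket_path)?;
        Ok(received)
    }
    ```

    Each process can place its sockets under its own directory, so
    concurrent instances do not collide the way they would on a shared
    range of TCP ports.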
    
    Fix MaterializeInc#15725.
    Fix MaterializeInc#15336.
    benesch committed Dec 4, 2022
    a233f31