Fix up the process orchestrator #15725

benesch · 2022-10-27T05:59:40Z

There are two known problems with the process orchestrator:

It can lose track of processes, possibly leading to sources that get "stuck" in CI but not in production (see storage: sources can get silently stuck after restart #15155 (comment)).
It can't keep ownership of TCP ports in case of process restarts (code). test: Fix process orchestrator port re-use #15336

We've talked about simplifying the process orchestrator (e.g., removing all support for re-adopting processes after an environmentd restart), but there are at least two good reasons to want to keep the process orchestrator in parity with the Kubernetes orchestrator:

The process orchestrator is incredibly useful for local development and manual testing. It was crucial to the efficient development of storage and compute reconciliation, where you want to be able to restart envd and have it adopt existing storaged and computed processes.
If we more widely advertise the all-in-one materialized Docker image, we'll want a robust process orchestrator that we trust.

I think there are three well-scoped improvements we can make.

Use Unix domain sockets rather than ports. This will eliminate just an absolute ton of complexity from the process orchestrator. TCP ports are a scarce, global resource; Unix domain sockets are not scarce and easily scoped to a per-test/per-envd-invocation directory. This will be a fairly large refactor of our networking/orchestration layer, but a net reduction in complexity. Added benefits: it'll be possible to run bin/sqllogictest and bin/environmentd at the same time. It'll be possible to use cargo-nextest to run Rust tests faster (ci: pull cargo test out of the build CI job #13035).
Figure out how to supervise adopted processes. Right now the process orchestrator will restart processes that it spawns if they crash... but AFAICT it will not restart processes that it adopts. There's just no call to the equivalent of supervise that watches for the external process to crash.
Write down a "service spec" in the PID file, rather than just a PID, and kill the existing process if the service spec on disk does not match the desired service spec. Right now we assume that the existing process was spawned with all the right arguments, which is not a valid assumption in general—e.g., if you wanted to test a version upgrade with the process orchestrator.

Tossing this one on @guswynn and @jkosh44's radar, but the process orchestrator isn't really owned by any particular team.

cc @chaas @uce

The text was updated successfully, but these errors were encountered:

philip-stoev · 2022-10-28T06:32:25Z

This bug actually prevents running any tests with SIZE '4-4' reliably in the CI, let alone the entire CI. The sheer number of processes that need to be started properly is such that sporadic failures are guaranteed to spoil the CI experience for everyone. Both the "storaged was never restarted" and the "cannot assign requested address" sub-issues are seen.

@benesch could this be a potential solution:

put all the metadata/PID files on ephimeral storage
if a container restarts, this information will be lost , but all the children processes are dead too, so we have effectively a clean-slate state from which we should be able to respawn all the required children?

benesch · 2022-10-29T05:24:27Z

put all the metadata/PID files on ephimeral storage

Yup, I think that would do it!

In order to work around MaterializeInc#15725, clean up the Progress Orchestrator information on PIDs and TCP ports on restart. As the tests are running in containers, the processes that may have held those PIDs and ports are now gone anyway when environmentd restarts, as they lived in the same container.

In order to work around MaterializeInc#15725, clean up the Progress Orchestrator information on PIDs and TCP ports on restart. As the tests are running in containers, the processes that may have held those PIDs and ports are now gone anyway when environmentd restarts, as they lived in the same container. Relates to #MaterializeInc#15725, MaterializeInc#15155

guswynn · 2022-10-31T17:44:37Z

@benesch is ephemeral storage the same as anonymous volumes in docker compose or is even even more ephemeral?

benesch · 2022-10-31T23:54:51Z

Even more ephemeral! Anonymous volumes are still volumes that can persist across container restarts. Ephemeral storage is the stuff that vanishes on container exit. Like, writing to /home, unless you've mounted a volume at /home, for example.

Due to MaterializeInc#15725, environmentd may fail to spawn all the required computeds unless the process orchestrator metadata is wiped in advance. We consolidate the wiping procedure in `up()` so that all testing frameworks that happen to restart Mz can benefit.

…-wipe mzcompose: Fortify mzcompose against #15725

Due to MaterializeInc#15725, environmentd may fail to spawn all the required computeds unless the process orchestrator metadata is wiped in advance. We consolidate the wiping procedure in `up()` so that all testing frameworks that happen to restart Mz can benefit.

Add `SocketAddr`, `Listener`, and `Stream` types to the `mz_ore::netio` module, which abstract over TCP sockets and Unix domain sockets. Then teach storaged and computed to accept their listen addresses using the new `SocketAddr` types, which allows them to bind to either TCP or Unix domain sockets, as desired. This is a key part step towards fixing the process orchestrator (#15725), as it will allow multiple copies of Materialize to be run concurrently without competing for access to the same ports.

PID files are not valid after a reboot of a machine. In the best case, the referenced PIDs do not exist, and the process orchestrator correctly recreates the services; in the worst case, the PIDs have been reused by different processes entirely, and the process orchestrator incorrectly thinks the services are already running. The worst case scenario is almost a guarantee with containers, where there are only a few processes using the low-numbered containers. This commit fixes the problem by moving the PID metadata files into $TMPDIR/environment-$ID. $TMPDIR is cleared on restart, so the stale PID files will correctly vanish after a restart. Naming the directory after the environment ID ensures that environmentd can find its metadata after a process restart without a machine restart, but allows multiple `environmentd` processes to co-exist, as long as they use different environment IDs. Things work correctly with the `--reset` option to bin/environment, too, as this option generates a new environmentd ID. Touches #15725. Would close #15800.

benesch added C-refactoring Category: replacing or reorganizing code A-stability Theme: stability labels Oct 27, 2022

benesch mentioned this issue Oct 27, 2022

storage: sources can get silently stuck after restart #15155

Closed

philip-stoev mentioned this issue Oct 31, 2022

zippy,platform-checks: Clean up Process Orchestrator metadata on restart #15800

Merged

benesch mentioned this issue Oct 31, 2022

orchestrator-process: store PID metadata in temp directory #15810

Merged

3 tasks

philip-stoev mentioned this issue Nov 7, 2022

workflow-user-tables in persistence is flaky #15926

Closed

philip-stoev mentioned this issue Nov 8, 2022

mzcompose: Fortify mzcompose against #15725 #15951

Merged

philip-stoev mentioned this issue Nov 8, 2022

Kafka exactly-once tests after-restart.td is flaky #15944

Closed

philip-stoev added a commit that referenced this issue Nov 9, 2022

Merge pull request #15951 from philip-stoev/mzcompose-restart-mz-with…

d415ef3

…-wipe mzcompose: Fortify mzcompose against #15725

benesch closed this as completed in ac1028d Dec 4, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix up the process orchestrator #15725

Fix up the process orchestrator #15725

benesch commented Oct 27, 2022 •

edited by philip-stoev

Loading

philip-stoev commented Oct 28, 2022

benesch commented Oct 29, 2022

guswynn commented Oct 31, 2022

benesch commented Oct 31, 2022

Fix up the process orchestrator #15725

Fix up the process orchestrator #15725

Comments

benesch commented Oct 27, 2022 • edited by philip-stoev Loading

philip-stoev commented Oct 28, 2022

benesch commented Oct 29, 2022

guswynn commented Oct 31, 2022

benesch commented Oct 31, 2022

benesch commented Oct 27, 2022 •

edited by philip-stoev

Loading