Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix up the process orchestrator #15725

Closed
benesch opened this issue Oct 27, 2022 · 4 comments
Closed

Fix up the process orchestrator #15725

benesch opened this issue Oct 27, 2022 · 4 comments
Labels
A-stability Theme: stability C-refactoring Category: replacing or reorganizing code

Comments

@benesch
Copy link
Member

benesch commented Oct 27, 2022

There are two known problems with the process orchestrator:

We've talked about simplifying the process orchestrator (e.g., removing all support for re-adopting processes after an environmentd restart), but there are at least two good reasons to want to keep the process orchestrator in parity with the Kubernetes orchestrator:

  • The process orchestrator is incredibly useful for local development and manual testing. It was crucial to the efficient development of storage and compute reconciliation, where you want to be able to restart envd and have it adopt existing storaged and computed processes.
  • If we more widely advertise the all-in-one materialized Docker image, we'll want a robust process orchestrator that we trust.

I think there are three well-scoped improvements we can make.

  1. Use Unix domain sockets rather than ports. This will eliminate just an absolute ton of complexity from the process orchestrator. TCP ports are a scarce, global resource; Unix domain sockets are not scarce and easily scoped to a per-test/per-envd-invocation directory. This will be a fairly large refactor of our networking/orchestration layer, but a net reduction in complexity. Added benefits: it'll be possible to run bin/sqllogictest and bin/environmentd at the same time. It'll be possible to use cargo-nextest to run Rust tests faster (ci: pull cargo test out of the build CI job #13035).
  2. Figure out how to supervise adopted processes. Right now the process orchestrator will restart processes that it spawns if they crash... but AFAICT it will not restart processes that it adopts. There's just no call to the equivalent of supervise that watches for the external process to crash.
  3. Write down a "service spec" in the PID file, rather than just a PID, and kill the existing process if the service spec on disk does not match the desired service spec. Right now we assume that the existing process was spawned with all the right arguments, which is not a valid assumption in general—e.g., if you wanted to test a version upgrade with the process orchestrator.

Tossing this one on @guswynn and @jkosh44's radar, but the process orchestrator isn't really owned by any particular team.

cc @chaas @uce

@benesch benesch added C-refactoring Category: replacing or reorganizing code A-stability Theme: stability labels Oct 27, 2022
@philip-stoev
Copy link
Contributor

This bug actually prevents running any tests with SIZE '4-4' reliably in the CI, let alone the entire CI. The sheer number of processes that need to be started properly is such that sporadic failures are guaranteed to spoil the CI experience for everyone. Both the "storaged was never restarted" and the "cannot assign requested address" sub-issues are seen.

@benesch could this be a potential solution:

  • put all the metadata/PID files on ephimeral storage
  • if a container restarts, this information will be lost , but all the children processes are dead too, so we have effectively a clean-slate state from which we should be able to respawn all the required children?

@benesch
Copy link
Member Author

benesch commented Oct 29, 2022

put all the metadata/PID files on ephimeral storage

Yup, I think that would do it!

philip-stoev added a commit to philip-stoev/materialize that referenced this issue Oct 31, 2022
In order to work around MaterializeInc#15725, clean up the Progress Orchestrator
information on PIDs and TCP ports on restart. As the tests are running
in containers, the processes that may have held those PIDs and ports
are now gone anyway when environmentd restarts, as they lived in
the same container.
philip-stoev added a commit to philip-stoev/materialize that referenced this issue Oct 31, 2022
In order to work around MaterializeInc#15725, clean up the Progress Orchestrator
information on PIDs and TCP ports on restart. As the tests are running
in containers, the processes that may have held those PIDs and ports
are now gone anyway when environmentd restarts, as they lived in
the same container.

Relates to #MaterializeInc#15725, MaterializeInc#15155
philip-stoev added a commit to philip-stoev/materialize that referenced this issue Oct 31, 2022
In order to work around MaterializeInc#15725, clean up the Progress Orchestrator
information on PIDs and TCP ports on restart. As the tests are running
in containers, the processes that may have held those PIDs and ports
are now gone anyway when environmentd restarts, as they lived in
the same container.

Relates to #MaterializeInc#15725, MaterializeInc#15155
@guswynn
Copy link
Contributor

guswynn commented Oct 31, 2022

@benesch is ephemeral storage the same as anonymous volumes in docker compose or is even even more ephemeral?

@benesch
Copy link
Member Author

benesch commented Oct 31, 2022

Even more ephemeral! Anonymous volumes are still volumes that can persist across container restarts. Ephemeral storage is the stuff that vanishes on container exit. Like, writing to /home, unless you've mounted a volume at /home, for example.

philip-stoev added a commit to philip-stoev/materialize that referenced this issue Nov 8, 2022
Due to MaterializeInc#15725, environmentd may fail to spawn all the required
computeds unless the process orchestrator metadata is wiped
in advance.

We consolidate the wiping procedure in `up()` so that
all testing frameworks that happen to restart Mz can benefit.
philip-stoev added a commit to philip-stoev/materialize that referenced this issue Nov 8, 2022
Due to MaterializeInc#15725, environmentd may fail to spawn all the required
computeds unless the process orchestrator metadata is wiped
in advance.

We consolidate the wiping procedure in `up()` so that
all testing frameworks that happen to restart Mz can benefit.
philip-stoev added a commit that referenced this issue Nov 9, 2022
sjwiesman pushed a commit to sjwiesman/materialize that referenced this issue Nov 9, 2022
Due to MaterializeInc#15725, environmentd may fail to spawn all the required
computeds unless the process orchestrator metadata is wiped
in advance.

We consolidate the wiping procedure in `up()` so that
all testing frameworks that happen to restart Mz can benefit.
benesch added a commit that referenced this issue Dec 4, 2022
Add `SocketAddr`, `Listener`, and `Stream` types to the `mz_ore::netio`
module, which abstract over TCP sockets and Unix domain sockets. Then
teach storaged and computed to accept their listen addresses using the
new `SocketAddr` types, which allows them to bind to either TCP or Unix
domain sockets, as desired.

This is a key part step towards fixing the process orchestrator
(#15725), as it will allow multiple copies of Materialize to be run
concurrently without competing for access to the same ports.
benesch added a commit that referenced this issue Dec 4, 2022
PID files are not valid after a reboot of a machine. In the best case,
the referenced PIDs do not exist, and the process orchestrator correctly
recreates the services; in the worst case, the PIDs have been reused by
different processes entirely, and the process orchestrator incorrectly
thinks the services are already running. The worst case scenario is
almost a guarantee with containers, where there are only a few processes
using the low-numbered containers.

This commit fixes the problem by moving  the PID metadata files into
$TMPDIR/environment-$ID. $TMPDIR is cleared on restart, so the stale PID
files will correctly vanish after a restart. Naming the directory after
the environment ID ensures that environmentd can find its metadata after
a process restart without a machine restart, but allows multiple
`environmentd` processes to co-exist, as long as they use different
environment IDs. Things work correctly with the `--reset` option to
bin/environment, too, as this option generates a new environmentd ID.

Touches #15725.
Would close #15800.
@benesch benesch closed this as completed in ac1028d Dec 4, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-stability Theme: stability C-refactoring Category: replacing or reorganizing code
Projects
None yet
Development

No branches or pull requests

3 participants