test: Fix process orchestrator port re-use #15336

jkosh44 · 2022-10-11T15:16:25Z

What version of Materialize are you using?

main

How did you install Materialize?

Built from source

What is the issue?

In https://github.com/MaterializeInc/materialize/pull/15316/files#diff-818d26f34fdb3ab6b0892e482b0a1231780a2321f7981fc96912e022bc1d6a6d we discovered that the process orchestrator was panicking when it attempted to re-use ports. The fix in that PR just kills the existing process and restarts it on another port. We should figure out why the ports were being re-used in the first place and fix the root cause.

As per the PR comment:

I also hacked in a fix for the process orchestrator that just kills the existing process and finds it new ports when it falls out of sync during port allocation. This worked well enough in my local testing—and also made the basic case of killall -9 environmentd still work as expected; all clusters are readopted. We should revisit this soon too, though.
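
For illustration only, the "find new ports" half of that workaround could look roughly like the Python sketch below. This is not the orchestrator's actual code (Materialize is written in Rust), and `port_is_free` and `allocate_port` are hypothetical names. It assumes the fallback is simply to ask the OS for any free port by binding to port 0.

```python
import socket


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Probe whether `port` can currently be bound on `host`."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            # Typically EADDRINUSE: some other socket still holds the port.
            return False
        return True


def allocate_port(preferred: int, host: str = "127.0.0.1") -> int:
    """Return `preferred` if it is free, otherwise an OS-assigned free port."""
    if port_is_free(preferred, host):
        return preferred
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as fallback:
        # Port 0 asks the kernel to pick any unused port.
        fallback.bind((host, 0))
        return fallback.getsockname()[1]
```

Note that a probe like this is inherently racy: the port can be taken by another process between the check and the moment the child process binds it, which is one way an allocator's bookkeeping can fall out of sync with reality.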

Relevant log output

No response


philip-stoev · 2022-11-01T10:43:31Z

This is a product-side issue, so I am removing the QA team. A workaround has been implemented on the testing side as #15800.

In scenarios where replicas and sources are rapidly killed and restarted, computed and storaged may fail to bind to their assigned HTTP port if that port has only just been freed by some other process. Previously, this would cause the process to panic and be restarted by the process orchestrator. Now that all panics are fatal, set the SO_REUSEADDR socket option so that bind() succeeds instead. Relates to: MaterializeInc#15336
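
As a rough sketch of the technique described above (not the actual change, which lives in Materialize's Rust code), setting SO_REUSEADDR before bind() looks like this in Python. The helper name `bind_with_reuseaddr` is hypothetical.

```python
import socket


def bind_with_reuseaddr(host: str, port: int) -> socket.socket:
    """Return a listening TCP socket bound to (host, port) with SO_REUSEADDR set."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        # The option must be set before bind() for it to affect the bind.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.bind((host, port))
        sock.listen()
    except OSError:
        # Do not leak the file descriptor if the bind or listen fails.
        sock.close()
        raise
    return sock
```

On Linux, SO_REUSEADDR does not let two live listeners share the same address and port. It mainly lets bind() succeed when the only sockets left on that port are closed connections still lingering in the kernel (for example in TIME_WAIT), which matches the rapid kill-and-restart scenario. Semantics differ on other platforms, so treat this as Linux-oriented.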

jkosh44 added the C-bug (Category: something is broken), T-testing (Theme: tests or test infrastructure), and C-triage labels Oct 11, 2022

philip-stoev mentioned this issue Nov 1, 2022

Fix up the process orchestrator #15725

Closed

ggnall removed the C-triage label Nov 22, 2022

benesch closed this as completed in ac1028d Dec 4, 2022
