envd: connect HTTP proxy to UDS addresses directly#35169

Merged
teskje merged 2 commits intoMaterializeInc:mainfrom
teskje:envd-http-uds-direct
Feb 23, 2026

@teskje (Contributor) commented Feb 23, 2026

Instead of requiring another hop through the process orchestrator's TCP proxy, just connect envd's HTTP proxy directly to the Unix domain socket (UDS) addresses of the target replica.
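To make the one-hop idea concrete, here is a minimal, blocking, std-only sketch. It accepts a client TCP connection and pipes bytes straight to a replica's Unix domain socket, with no intermediate TCP proxy. The helper `proxy_to_uds` and the assumption that the replica's socket path is already known are hypothetical. This is a raw byte forwarder, not envd's actual HTTP proxy (which is async and not shown here), and it is Unix-only.

```rust
use std::io;
use std::net::{Shutdown, TcpStream};
use std::os::unix::net::UnixStream;
use std::path::Path;
use std::thread;

/// Hypothetical helper: forward one client connection directly to the
/// replica's UDS, so each byte is copied once instead of twice.
fn proxy_to_uds(client: TcpStream, replica_socket: &Path) -> io::Result<()> {
    // One hop: connect straight to the replica's socket.
    let upstream = UnixStream::connect(replica_socket)?;

    // client -> replica runs on a helper thread.
    let mut client_read = client.try_clone()?;
    let mut upstream_write = upstream.try_clone()?;
    let to_replica = thread::spawn(move || -> io::Result<u64> {
        let n = io::copy(&mut client_read, &mut upstream_write)?;
        // Propagate the client's EOF so the replica sees the end of the request.
        upstream_write.shutdown(Shutdown::Write)?;
        Ok(n)
    });

    // replica -> client runs on the current thread.
    let mut upstream_read = upstream;
    let mut client_write = client;
    io::copy(&mut upstream_read, &mut client_write)?;
    client_write.shutdown(Shutdown::Write)?;

    to_replica.join().expect("client->replica thread panicked")?;
    Ok(())
}
```

The thread-per-direction layout with `io::copy` only keeps the sketch dependency-free. An async proxy would perform the same two copies using tasks instead of threads.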

Apart from avoiding needless copying of data, this also resolves a mysterious issue where the HTTP proxy could become overwhelmed under load and start rejecting new connections. With the new implementation, I'm able to run a simple HTTP benchmark without any connection errors.

Also, this will allow removing the TCP proxy from the process orchestrator. Ok, not quite: the TCP proxy addresses are also used to write a Prometheus service discovery file. We could write that file in terms of the envd HTTP proxy addresses instead, but that requires more code changes.
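The alternative mentioned above, writing the Prometheus service discovery file in terms of envd's HTTP proxy addresses, might look roughly like the sketch below. Prometheus' `file_sd` format is a JSON array of target groups. Each group has `targets` (`host:port` strings) and optional `labels`, and the special `__metrics_path__` label overrides the scrape path. Because one envd address would front many replicas, the sketch assumes each replica is distinguished by a per-replica metrics path. The following are all hypothetical: the path layout, the envd address, `TargetGroup`, `groups_for_replicas`, and `render_file_sd`. The JSON is hand-rendered to stay dependency-free.

```rust
/// One Prometheus `file_sd` target group (hypothetical helper type).
struct TargetGroup {
    /// `host:port` strings Prometheus should scrape.
    targets: Vec<String>,
    /// Extra labels; `__metrics_path__` overrides the scrape path.
    labels: Vec<(String, String)>,
}

/// Render `s` as a JSON string literal, escaping quotes, backslashes and
/// control characters.
fn json_string(s: &str) -> String {
    let mut out = String::with_capacity(s.len() + 2);
    out.push('"');
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

/// Hypothetical: one group per replica, all pointing at envd's HTTP proxy and
/// told apart by an assumed per-replica metrics path.
fn groups_for_replicas(envd_addr: &str, replica_ids: &[&str]) -> Vec<TargetGroup> {
    replica_ids
        .iter()
        .map(|id| TargetGroup {
            targets: vec![envd_addr.to_string()],
            labels: vec![
                ("__metrics_path__".to_string(), format!("/hypothetical/replicas/{id}/metrics")),
                ("replica_id".to_string(), id.to_string()),
            ],
        })
        .collect()
}

/// Render target groups in the JSON layout read by `file_sd_configs`.
fn render_file_sd(groups: &[TargetGroup]) -> String {
    let rendered: Vec<String> = groups
        .iter()
        .map(|g| {
            let targets: Vec<String> = g.targets.iter().map(|t| json_string(t)).collect();
            let labels: Vec<String> = g
                .labels
                .iter()
                .map(|(k, v)| format!("{}: {}", json_string(k), json_string(v)))
                .collect();
            format!(
                "{{\"targets\": [{}], \"labels\": {{{}}}}}",
                targets.join(", "),
                labels.join(", ")
            )
        })
        .collect();
    format!("[{}]", rendered.join(", "))
}
```

A real implementation would more likely serialize with `serde_json` than format JSON by hand.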

@github-actions (Contributor)

Thanks for opening this PR! Here are a few tips to help make the review process smooth for everyone.

PR title guidelines

  • Use imperative mood: "Fix X" not "Fixed X" or "Fixes X"
  • Be specific: "Fix panic in catalog sync when controller restarts" not "Fix bug" or "Update catalog code"
  • Prefix with area if helpful: compute:, storage:, adapter:, sql:

Pre-merge checklist

  • The PR title is descriptive and will make sense in the git log.
  • This PR has adequate test coverage / QA involvement has been duly considered. (trigger-ci for additional test/nightly runs)
  • If this PR includes major user-facing behavior changes, I have pinged the relevant PM to schedule a changelog post.
  • This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
  • If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.
  • If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).

@teskje teskje force-pushed the envd-http-uds-direct branch from 1f21b3b to e6e9147 Compare February 23, 2026 10:00
@antiguru (Member) left a comment

Thanks very much!

@teskje teskje force-pushed the envd-http-uds-direct branch from e6e9147 to c1b9f72 Compare February 23, 2026 10:45
@teskje teskje marked this pull request as ready for review February 23, 2026 10:45
@teskje teskje requested review from a team as code owners February 23, 2026 10:45
@teskje teskje requested a review from ggevay February 23, 2026 10:45
@teskje (Contributor, Author) commented Feb 23, 2026

TFTR!

@teskje teskje merged commit 2395d19 into MaterializeInc:main Feb 23, 2026
135 checks passed
@teskje teskje deleted the envd-http-uds-direct branch February 23, 2026 11:11
def- added a commit to def-/materialize that referenced this pull request Feb 23, 2026
def- added a commit that referenced this pull request Feb 23, 2026
#35174)

Follow-up to #35169

This seems to cause panics in
[Cloudtest](https://buildkite.com/materialize/nightly/builds/15312),
which uses hostnames instead of IP addresses:
```
2026-02-23T14:20:49.944020Z  thread 'coordinator' panicked at /var/lib/buildkite-agent/builds/buildkite-l-builders-aarch64-static-9f59e6a-i-06dcfc2ae8a94894e-1/materialize/test/src/controller/src/clusters.rs:447:54:
valid socket address: AddrParseError { kind: Inet }
   6: core::panicking::panic_fmt
   7: core::result::unwrap_failed
   8: <core::iter::adapters::map::Map<core::slice::iter::Iter<alloc::string::String>, <mz_controller::Controller>::create_replica::{closure#0}> as core::iter::traits::iterator::Iterator>::fold::<(), core::iter::traits::iterator::Iterator::for_each::call<mz_ore::netio::socket::SocketAddr, <alloc::vec::Vec<mz_ore::netio::socket::SocketAddr>>::extend_trusted<core::iter::adapters::map::Map<core::slice::iter::Iter<alloc::string::String>, <mz_controller::Controller>::create_replica::{closure#0}>>::{closure#0}>::{closure#0}>
   9: <alloc::vec::Vec<mz_ore::netio::socket::SocketAddr> as alloc::vec::spec_from_iter::SpecFromIter<mz_ore::netio::socket::SocketAddr, core::iter::adapters::map::Map<core::slice::iter::Iter<alloc::string::String>, <mz_controller::Controller>::create_replica::{closure#0}>>>::from_iter
  10: <mz_controller::Controller>::create_replica
  11: <mz_adapter::coord::Coordinator>::bootstrap::{closure#0}::{closure#0}
  12: <tracing::instrument::Instrumented<<mz_adapter::coord::Coordinator>::bootstrap::{closure#0}::{closure#0}> as core::future::future::Future>::poll
  13: <tracing::instrument::Instrumented<mz_adapter::coord::serve::{closure#0}::{closure#4}::{closure#0}> as core::future::future::Future>::poll
  14: <tokio::runtime::park::CachedParkThread>::block_on::<tracing::instrument::Instrumented<mz_adapter::coord::serve::{closure#0}::{closure#4}::{closure#0}>>
  15: tokio::runtime::context::runtime::enter_runtime::<<tokio::runtime::handle::Handle>::block_on_inner<mz_adapter::coord::serve::{closure#0}::{closure#4}::{closure#0}>::{closure#0}, core::result::Result<(), mz_adapter::error::AdapterError>>
  16: <tokio::runtime::handle::Handle>::block_on::<mz_adapter::coord::serve::{closure#0}::{closure#4}::{closure#0}>
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
```

Test run: https://buildkite.com/materialize/nightly/builds/15317
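The panic message suggests that each replica address string is parsed as an IP socket address, and that parse fails for hostnames. The same failure is easy to reproduce with Rust's `std::net::SocketAddr` parser, which accepts only literal IP addresses. Below is a minimal, std-only sketch of a more tolerant parser that keeps hostnames unresolved. The following are hypothetical and are not the actual fix or `mz_ore`'s API: `ReplicaAddr`, `AddrError`, `parse_replica_addr`, and the "leading `/` means UDS" convention.

```rust
use std::net::SocketAddr;
use std::path::PathBuf;

/// Hypothetical replica address type (not `mz_ore`'s `SocketAddr`).
#[derive(Debug, PartialEq)]
enum ReplicaAddr {
    /// Unix domain socket path.
    Unix(PathBuf),
    /// Literal IP address and port.
    Inet(SocketAddr),
    /// `host:port` kept unresolved, so DNS happens at connect time.
    Host { host: String, port: u16 },
}

#[derive(Debug, PartialEq)]
enum AddrError {
    MissingPort,
    BadPort,
    EmptyHost,
}

fn parse_replica_addr(s: &str) -> Result<ReplicaAddr, AddrError> {
    // Assumption for this sketch: absolute paths denote Unix domain sockets.
    if s.starts_with('/') {
        return Ok(ReplicaAddr::Unix(PathBuf::from(s)));
    }
    // Literal IPs (v4, or bracketed v6) parse directly...
    if let Ok(addr) = s.parse::<SocketAddr>() {
        return Ok(ReplicaAddr::Inet(addr));
    }
    // ...anything else is treated as `host:port` instead of panicking.
    let (host, port) = s.rsplit_once(':').ok_or(AddrError::MissingPort)?;
    if host.is_empty() {
        return Err(AddrError::EmptyHost);
    }
    let port = port.parse::<u16>().map_err(|_| AddrError::BadPort)?;
    Ok(ReplicaAddr::Host { host: host.to_string(), port })
}
```

Deferring DNS resolution to connect time (for example via `std::net::ToSocketAddrs`) also avoids baking a possibly stale IP into the replica's configuration.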
leedqin pushed a commit to leedqin/materialize that referenced this pull request Mar 2, 2026
leedqin pushed a commit to leedqin/materialize that referenced this pull request Mar 2, 2026
MaterializeInc#35174)


2 participants