
Schema update is not propagated to the HTTP server #41

Closed
Geal opened this issue Oct 19, 2021 · 6 comments

@Geal (Contributor) commented Oct 19, 2021

Describe the bug
Maybe I'm missing something, but it looks like UpdateSchema messages change the schema in the state machine: https://github.com/apollographql/router/blob/07dfdddedc3e90cf6a7e0124ceb9f1a23446cae6/crates/apollo-router/src/state_machine.rs#L150-L172

but do not pass it to the HTTP server. It is only changed here: https://github.com/apollographql/router/blob/07dfdddedc3e90cf6a7e0124ceb9f1a23446cae6/crates/apollo-router/src/state_machine.rs#L193-L196

To Reproduce

At this line: https://github.com/apollographql/router/blob/07dfdddedc3e90cf6a7e0124ceb9f1a23446cae6/crates/apollo-router/src/lib.rs#L579

add the following:

SchemaKind::Instance(Schema::from_str("").unwrap())
    .into_stream()
    .boxed(),

This will first set up a server with the configuration and an empty schema, then try to update the schema. Any query to the router will fail because the schema in use is still the empty one.
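To make the shape of the problem concrete, here is a minimal sketch with hypothetical, simplified stand-in types (Schema, HttpServerHandle, a two-state State enum); this is not the router's actual code. The state machine swaps its own copy of the schema, while the server handle built at startup is never rebuilt or notified:

    // Hypothetical, simplified stand-ins for the real types.
    #[allow(dead_code)]
    struct Schema(String);
    struct HttpServerHandle; // the warp/hyper server started at startup

    #[allow(dead_code)]
    enum State {
        Running {
            schema: Schema,           // updated on UpdateSchema...
            server: HttpServerHandle, // ...but still serving the old graph
        },
        Stopped,
    }

    fn on_update_schema(state: &mut State, new_schema: Schema) {
        // Only the state machine's copy changes; `server` is untouched.
        if let State::Running { schema, .. } = state {
            *schema = new_schema;
        }
    }

    fn main() {
        let mut state = State::Running {
            schema: Schema(String::new()), // the empty schema from the reproduction
            server: HttpServerHandle,
        };
        on_update_schema(&mut state, Schema("type Query { me: String }".into()));
        // Queries still go through `server`, which was built with the empty schema.
    }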

@Geal (Contributor, Author) commented Oct 19, 2021

I see three ways to fix that:

  • Put the schema in an Arc<Mutex<>> so that it can be updated by the state machine and read by the HTTP server. I'm worried about introducing a mutex in the hot path though, since every new query goes through reading the schema and query planning (a rough sketch follows after this list).
  • Since the HTTP server has a graceful shutdown mechanism and we can manage the TCP listener separately, we could do the following:
    • The state machine manages the TCP listener. This is necessary because the listener is not cloneable, it cannot be taken back from warp or hyper, and binding a new one with REUSE_PORT would lose some connections. We might need an extra step, like plugging accepted connections into an MPMC queue (which could introduce performance issues).
    • We pass a copy of the receiver at HTTP server creation, along with the FederatedGraph instance.
    • Whenever there's a new schema, we create a new HTTP server with a new FederatedGraph and tell the old one to stop through graceful shutdown. As I have measured, creating a new server takes 20µs on my machine and the server is 320 bytes, so it is not a costly operation.
  • Rely on external solutions like systemd and listenfd to restart the server while keeping the listen socket.
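As a rough illustration of the first option (hypothetical Schema type, not the router's code): the state machine writes through the lock and every incoming query has to read through it before planning, which is exactly the hot-path concern above.

    use std::sync::{Arc, Mutex};

    struct Schema(String); // hypothetical stand-in for the real schema type

    fn main() {
        let shared = Arc::new(Mutex::new(Schema(String::new())));

        // State machine side: swap in the new schema on UpdateSchema.
        let writer = Arc::clone(&shared);
        *writer.lock().unwrap() = Schema("type Query { me: String }".into());

        // HTTP server side: every new query locks the schema before planning.
        let schema = shared.lock().unwrap();
        println!("schema is now {} bytes long", schema.0.len());
    }

An RwLock instead of a Mutex would limit contention to the rare schema swap, but a read would still sit on every request path.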

I'm partial to the second solution, since I have exhaustively explored that particular rabbit hole. Managing the listen socket outside of warp will also be useful when we start putting bounds on the number of sessions and rate-limiting new connections.
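Here is a rough sketch of what the second option could look like. Assumptions beyond what is described above: tokio for the runtime and listener, the async-channel crate as the MPMC queue, and FederatedGraph reduced to a placeholder; actually serving the connection and draining in-flight requests are elided.

    use std::sync::Arc;

    use async_channel::{Receiver, Sender};
    use tokio::net::{TcpListener, TcpStream};
    use tokio::sync::oneshot;

    struct FederatedGraph; // placeholder for the query planning / execution pipeline

    // The state machine owns the listener and feeds accepted sockets into the queue.
    async fn accept_loop(listener: TcpListener, queue: Sender<TcpStream>) {
        while let Ok((socket, _addr)) = listener.accept().await {
            if queue.send(socket).await.is_err() {
                break; // every server has gone away
            }
        }
    }

    // Each HTTP server is built from a clone of the receiver plus its own
    // FederatedGraph, and returns a handle used to request graceful shutdown.
    fn spawn_http_server(
        connections: Receiver<TcpStream>,
        graph: Arc<FederatedGraph>,
    ) -> oneshot::Sender<()> {
        let (shutdown_tx, mut shutdown_rx) = oneshot::channel::<()>();
        tokio::spawn(async move {
            loop {
                tokio::select! {
                    _ = &mut shutdown_rx => break, // shutdown requested
                    socket = connections.recv() => match socket {
                        Ok(socket) => {
                            let graph = graph.clone();
                            tokio::spawn(async move {
                                // serve the connection against `graph` (elided)
                                let _ = (socket, graph);
                            });
                        }
                        Err(_) => break, // the accept loop stopped
                    },
                }
            }
        });
        shutdown_tx
    }

    #[tokio::main]
    async fn main() -> std::io::Result<()> {
        // Ephemeral port for the example.
        let listener = TcpListener::bind("127.0.0.1:0").await?;
        let (tx, rx) = async_channel::unbounded::<TcpStream>();
        tokio::spawn(accept_loop(listener, tx));

        // Initial server with the first schema.
        let old_server = spawn_http_server(rx.clone(), Arc::new(FederatedGraph));

        // On UpdateSchema: start the new server first, then stop the old one.
        let _new_server = spawn_http_server(rx.clone(), Arc::new(FederatedGraph));
        let _ = old_server.send(());
        Ok(())
    }

New connections keep being accepted throughout: they land in whichever server is still reading from the queue, so a schema swap never has to wait for old connections to drain.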

@Geal (Contributor, Author) commented Oct 19, 2021

Alternate version for option 2:

With this method we do not need to introduce another queue, but we will need to reimplement the graceful shutdown.
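For the "reimplement the graceful shutdown" part, one minimal approach (a sketch under the same tokio assumption, not the router's code) is to give every connection task a clone of an mpsc sender and wait for the receiver to report that all clones have been dropped:

    use std::time::Duration;

    use tokio::sync::mpsc;

    #[tokio::main]
    async fn main() {
        let (alive_tx, mut alive_rx) = mpsc::channel::<()>(1);

        // Pretend these are in-flight connections; each one holds a clone
        // of the sender for as long as it is alive.
        for i in 0..3u64 {
            let alive = alive_tx.clone();
            tokio::spawn(async move {
                tokio::time::sleep(Duration::from_millis(10 * i)).await; // serve the request
                drop(alive); // connection finished
            });
        }

        drop(alive_tx); // the accept loop has stopped handing out clones
        let _ = alive_rx.recv().await; // resolves to None once every clone is dropped
        // At this point all in-flight connections have completed.
    }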

@BrynCooke (Contributor) commented

Just bear in mind that it's not restarting the server that is costly, it's draining the requests. That's why we were using shared state before. We would not want to drain requests just to update the schema.

@Geal (Contributor, Author) commented Oct 20, 2021

Draining the requests was costly because some of them were staying open for a long time, right? Or was there another reason? What was the cost? Memory usage?

@BrynCooke (Contributor) commented

Draining requests is costly due to timeouts. During this time we cannot serve new requests, right?

@Geal (Contributor, Author) commented Oct 25, 2021

As said in the call earlier, with #5 we can accept new connections using the new schema right away; we don't need to wait for previous connections to be drained.

Geal self-assigned this Oct 26, 2021
o0Ignition0o transferred this issue from another repository Nov 4, 2021
Geal closed this as completed in e0f7070 Nov 5, 2021