
Schema update is not propagated to the HTTP server #41

Closed
Geal opened this issue Oct 19, 2021 · 6 comments

@Geal (Contributor) commented Oct 19, 2021

Describe the bug
Maybe I'm missing something, but it looks like UpdateSchema messages change the schema in the state machine: https://github.com/apollographql/router/blob/07dfdddedc3e90cf6a7e0124ceb9f1a23446cae6/crates/apollo-router/src/state_machine.rs#L150-L172

but do not pass it to the HTTP server. It is only changed here: https://github.com/apollographql/router/blob/07dfdddedc3e90cf6a7e0124ceb9f1a23446cae6/crates/apollo-router/src/state_machine.rs#L193-L196

To Reproduce

At this line: https://github.com/apollographql/router/blob/07dfdddedc3e90cf6a7e0124ceb9f1a23446cae6/crates/apollo-router/src/lib.rs#L579

add the following:

SchemaKind::Instance(Schema::from_str("").unwrap())
    .into_stream()
    .boxed(),

This will first set up a server with the configuration and an empty schema, then try to update the schema. Any query to the router will fail because the schema in use is still the empty one.
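To make the shape of the problem concrete, here is a minimal sketch with hypothetical, simplified stand-in types (Schema, HttpServerHandle, a two-state State enum); this is not the router's actual code. The state machine swaps its own copy of the schema, while the server handle built at startup is never rebuilt or notified:

    // Hypothetical, simplified stand-ins for the real types.
    #[allow(dead_code)]
    struct Schema(String);
    struct HttpServerHandle; // the warp/hyper server started at startup

    #[allow(dead_code)]
    enum State {
        Running {
            schema: Schema,           // updated on UpdateSchema...
            server: HttpServerHandle, // ...but still serving the old graph
        },
        Stopped,
    }

    fn on_update_schema(state: &mut State, new_schema: Schema) {
        // Only the state machine's copy changes; `server` is untouched.
        if let State::Running { schema, .. } = state {
            *schema = new_schema;
        }
    }

    fn main() {
        let mut state = State::Running {
            schema: Schema(String::new()), // the empty schema from the reproduction
            server: HttpServerHandle,
        };
        on_update_schema(&mut state, Schema("type Query { me: String }".into()));
        // Queries still go through `server`, which was built with the empty schema.
    }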

@Geal (Contributor, Author) commented Oct 19, 2021

I see three ways to fix that:

  • Put the schema in an Arc<Mutex<>> so that it can be updated by the state machine and read by the HTTP server. I'm worried about introducing a mutex in the hot path though, since every new query goes through reading the schema and query planning (a rough sketch follows after this list).
  • Since the HTTP server has a graceful shutdown mechanism and we can manage the TCP listener separately, we could do the following:
    • The state machine manages the TCP listener. This is necessary because the listener is not cloneable, it cannot be taken back from warp or hyper, and binding a new one with REUSE_PORT would lose some connections. We might need an extra step, like plugging accepted connections into an MPMC queue (which could introduce performance issues).
    • We pass a copy of the receiver at HTTP server creation, along with the FederatedGraph instance.
    • Whenever there's a new schema, we create a new HTTP server with a new FederatedGraph and tell the old one to stop through graceful shutdown. As I have measured, creating a new server takes 20µs on my machine and the server is 320 bytes, so it is not a costly operation.
  • Rely on external solutions like systemd and listenfd to restart the server while keeping the listen socket.
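As a rough illustration of the first option (hypothetical Schema type, not the router's code): the state machine writes through the lock and every incoming query has to read through it before planning, which is exactly the hot-path concern above.

    use std::sync::{Arc, Mutex};

    struct Schema(String); // hypothetical stand-in for the real schema type

    fn main() {
        let shared = Arc::new(Mutex::new(Schema(String::new())));

        // State machine side: swap in the new schema on UpdateSchema.
        let writer = Arc::clone(&shared);
        *writer.lock().unwrap() = Schema("type Query { me: String }".into());

        // HTTP server side: every new query locks the schema before planning.
        let schema = shared.lock().unwrap();
        println!("schema is now {} bytes long", schema.0.len());
    }

An RwLock instead of a Mutex would limit contention to the rare schema swap, but a read would still sit on every request path.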

I'm partial to the second solution, since I have exhaustively explored that particular rabbit hole. Managing the listen socket outside of warp will also be useful when we start putting bounds on the number of sessions and rate-limiting new connections.
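Here is a rough sketch of what the second option could look like. Assumptions beyond what is described above: tokio for the runtime and listener, the async-channel crate as the MPMC queue, and FederatedGraph reduced to a placeholder; actually serving the connection and draining in-flight requests are elided.

    use std::sync::Arc;

    use async_channel::{Receiver, Sender};
    use tokio::net::{TcpListener, TcpStream};
    use tokio::sync::oneshot;

    struct FederatedGraph; // placeholder for the query planning / execution pipeline

    // The state machine owns the listener and feeds accepted sockets into the queue.
    async fn accept_loop(listener: TcpListener, queue: Sender<TcpStream>) {
        while let Ok((socket, _addr)) = listener.accept().await {
            if queue.send(socket).await.is_err() {
                break; // every server has gone away
            }
        }
    }

    // Each HTTP server is built from a clone of the receiver plus its own
    // FederatedGraph, and returns a handle used to request graceful shutdown.
    fn spawn_http_server(
        connections: Receiver<TcpStream>,
        graph: Arc<FederatedGraph>,
    ) -> oneshot::Sender<()> {
        let (shutdown_tx, mut shutdown_rx) = oneshot::channel::<()>();
        tokio::spawn(async move {
            loop {
                tokio::select! {
                    _ = &mut shutdown_rx => break, // shutdown requested
                    socket = connections.recv() => match socket {
                        Ok(socket) => {
                            let graph = graph.clone();
                            tokio::spawn(async move {
                                // serve the connection against `graph` (elided)
                                let _ = (socket, graph);
                            });
                        }
                        Err(_) => break, // the accept loop stopped
                    },
                }
            }
        });
        shutdown_tx
    }

    #[tokio::main]
    async fn main() -> std::io::Result<()> {
        // Ephemeral port for the example.
        let listener = TcpListener::bind("127.0.0.1:0").await?;
        let (tx, rx) = async_channel::unbounded::<TcpStream>();
        tokio::spawn(accept_loop(listener, tx));

        // Initial server with the first schema.
        let old_server = spawn_http_server(rx.clone(), Arc::new(FederatedGraph));

        // On UpdateSchema: start the new server first, then stop the old one.
        let _new_server = spawn_http_server(rx.clone(), Arc::new(FederatedGraph));
        let _ = old_server.send(());
        Ok(())
    }

New connections keep being accepted throughout: they land in whichever server is still reading from the queue, so a schema swap never has to wait for old connections to drain.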

@Geal (Contributor, Author) commented Oct 19, 2021

Alternate version for option 2:

With this method we do not need to introduce another queue, but we will need to reimplement the graceful shutdown.
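For the "reimplement the graceful shutdown" part, one minimal approach (a sketch under the same tokio assumption, not the router's code) is to give every connection task a clone of an mpsc sender and wait for the receiver to report that all clones have been dropped:

    use std::time::Duration;

    use tokio::sync::mpsc;

    #[tokio::main]
    async fn main() {
        let (alive_tx, mut alive_rx) = mpsc::channel::<()>(1);

        // Pretend these are in-flight connections; each one holds a clone
        // of the sender for as long as it is alive.
        for i in 0..3u64 {
            let alive = alive_tx.clone();
            tokio::spawn(async move {
                tokio::time::sleep(Duration::from_millis(10 * i)).await; // serve the request
                drop(alive); // connection finished
            });
        }

        drop(alive_tx); // the accept loop has stopped handing out clones
        let _ = alive_rx.recv().await; // resolves to None once every clone is dropped
        // At this point all in-flight connections have completed.
    }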

@BrynCooke (Contributor) commented

Just bear in mind that it's not restarting the server that is costly, it's draining the requests. That's why we were using shared state before. We would not want to drain requests just to update the schema.

@Geal (Contributor, Author) commented Oct 20, 2021

Draining the requests was costly because some of them were staying open for a long time, right? Or was there another reason? What was the cost? Memory usage?

@BrynCooke (Contributor) commented

Draining requests is costly due to timeouts. During this time we cannot serve new requests, right?

@Geal (Contributor, Author) commented Oct 25, 2021

As said in the call earlier, with #5 we can accept new connections using the new schema right away; we don't need to wait for previous connections to be drained.

Geal self-assigned this Oct 26, 2021
o0Ignition0o transferred this issue from another repository Nov 4, 2021
Geal closed this as completed in e0f7070 Nov 5, 2021