The old activator had several problems:

* It was listening on the same port as the container, which made things way more difficult than they needed to be. Even with the "network locking" there were cases where clients would get a connection refused, as at some point the socket needs to be closed and reopened.
* The network lock had another side effect: it triggered TCP retransmits, which delayed some requests by a whole second.

While the new activator is not fully realised in eBPF, it's way more reliable, as we can simply steer traffic without any interruptions using just a few maps. Essentially, activation now works like this:

1. The container is in a checkpointed state.
2. An incoming packet is destined for the container.
3. The eBPF program redirects the packet to a userspace TCP proxy listening on a random free port.
4. The proxy accepts the TCP session and triggers a restore of the container.
5. The proxy connects to the container as soon as it's running.
6. The proxy shuffles data back and forth for this TCP session and all other connections that were established while the container was restoring.
7. A write to an eBPF map indicates that traffic no longer needs to be redirected to the proxy.
8. Traffic flows directly to the container as usual, without going through the proxy, for as long as it's alive.
9. On checkpoint, the redirect is enabled again.

The activator still only needs to proxy requests during restore, while being more reliable and never dropping a packet. The current implementation uses TC, as it allows modifying both ingress and egress packets. A full eBPF solution has been experimented with, but the main issue is that we need to "hold back" packets while the container is being restored without dropping them. As soon as the initial TCP SYN is dropped, the client waits 1 second before retransmitting, which makes everything quite slow. I was unable to find a solution for this as of now, so the userspace proxy is still required.
We don't need net-lock anymore, so we can stop building with nftables and have a single CRIU version for all supported platforms.
To avoid redirect loops, the activator now uses a known port for the connection to the backend, so we can disable redirects for these packets. Additionally, this splits the BPF program into two separate programs for ingress and egress, mainly to make things easier to understand, but it also makes the egress path shorter.
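The loop-avoidance check boils down to a simple predicate, sketched here in Go for illustration. The port value and function name are hypothetical; in the real program this logic lives in the eBPF redirect path.

```go
package main

// activatorPort is the known local port the proxy uses when dialing the
// backend. The value here is made up; the real activator picks its own.
const activatorPort = 41414

// shouldRedirect mimics the check in the BPF program: packets sourced from
// the activator's known port are the proxy's own connections to the backend
// and must never be redirected again, otherwise they would loop straight
// back into the proxy. All other packets follow the map-controlled flag.
func shouldRedirect(srcPort uint16, redirectEnabled bool) bool {
	if srcPort == activatorPort {
		return false // proxy -> backend traffic: never redirect
	}
	return redirectEnabled
}
```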
It does not really make sense to handle the loopback interface any differently than eth; this was a leftover from before the known local port was introduced.
This was merged, but GitHub did not detect it because I did not create a merge commit 🤷