
feat: implement bpf-steered activator #2

Closed
ctrox wants to merge 5 commits into main from bpf-activator

Conversation

Owner

@ctrox ctrox commented Jan 11, 2024

The old activator had several problems:

  • it listened on the same port as the container, which made things
    way more difficult than they needed to be. Even with the "network
    locking" there were cases where clients would get a connection
    refused, because at some point the socket needs to be closed and
    reopened.
  • the network lock had another side effect: it would trigger TCP
    retransmits, which delayed some requests by a whole second.

While the new activator is not fully realised in eBPF, it's way more
reliable since we can steer traffic without any interruption using just
a few maps. Essentially, activation now works like this:

  1. container is in checkpointed state.
  2. an incoming packet is destined for the container.
  3. eBPF program redirects the packet to a userspace TCP proxy
    listening on a random free port.
  4. proxy accepts the TCP session and triggers a restore of the
    container.
  5. proxy connects to the container as soon as it's running.
  6. proxy shuffles data back and forth for this TCP session and all
    other connections that were established while the container was
    restoring.
  7. a write to an eBPF map indicates that it no longer needs to
    redirect to the proxy.
  8. traffic flows to the container directly, without going through the
    proxy, for as long as the container is alive.
  9. on checkpoint, the redirect is enabled again.

It still only needs to proxy requests during a restore, while being a
more reliable activator that never drops a packet. The current
implementation uses TC, as it allows modifying both ingress and egress
packets. A full eBPF solution has been experimented with, but the main
issue is that we need to "hold back" packets while the container is
being restored, without dropping them. As soon as the initial TCP SYN
is dropped, the client waits one second before retransmitting, which
makes everything quite slow. I was unable to find a solution for this
as of now, so the userspace proxy is still required.

ctrox added 5 commits January 3, 2024 19:09
we don't need net-lock anymore, so we can stop building with nftables
and have a single CRIU version for all supported platforms.
To avoid redirect loops, the activator now uses a known port for the
connection to the backend so we can disable redirects for these
packets.

Additionally, this splits the bpf program into two separate programs
for ingress and egress, mainly to make things easier to understand, but
it also makes the egress path shorter.
it does not really make sense to handle the loopback interface any
differently than eth; this was a leftover from before introducing a
known local port.
@ctrox ctrox closed this Jan 13, 2024
@ctrox ctrox deleted the bpf-activator branch January 13, 2024 11:39
Owner Author

ctrox commented Jan 13, 2024

this was merged but github did not detect it because I did not create a merge commit 🤷
