The old activator had several problems:

* It was listening on the same port as the container, which made things way more difficult than they needed to be. Even with the "network locking" there were cases where clients would get a connection refused, as at some point the socket needs to be closed and reopened.
* The network lock had another side effect: it triggered TCP retransmits, which delayed some requests by a whole second.

While the new activator is not fully realised in eBPF, it's way more reliable, as we can simply steer traffic without any interruptions using just a few maps. Essentially, activation now works like this:

1. The container is in a checkpointed state.
2. An incoming packet is destined for the container.
3. The eBPF program redirects the packet to a userspace TCP proxy listening on a random free port.
4. The proxy accepts the TCP session and triggers a restore of the container.
5. The proxy connects to the container as soon as it's running.
6. The proxy shuffles data back and forth for this TCP session and all other connections that were established while the container was restoring.
7. A write to an eBPF map indicates that traffic no longer needs to be redirected to the proxy.
8. Traffic flows directly to the container as usual, without going through the proxy, for as long as it's alive.
9. On checkpoint, the redirect is enabled again.

The activator still only needs to proxy requests during restore, while being more reliable and never dropping a packet. The current implementation uses TC, as it allows modifying both ingress and egress packets. A full eBPF solution has been experimented with, but the main issue is that we need to "hold back" packets while the container is being restored without dropping them. As soon as the initial TCP SYN is dropped, the client waits 1 second before retransmitting, which makes everything quite slow. I was unable to find a solution for this as of now, so the userspace proxy is still required.
We don't need net-lock anymore, so we can stop building with nftables and have a single CRIU version for all supported platforms.
To avoid redirect loops, the activator now uses a known port for the connection to the backend, so we can disable redirects for these packets. Additionally, this splits the BPF program into two separate programs for ingress and egress, mainly to make things easier to understand, but it also makes the egress path shorter.
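The loop-avoidance check boils down to a simple predicate, sketched here in Go for illustration. The port value and function name are hypothetical; in the real program this logic lives in the eBPF redirect path.

```go
package main

// activatorPort is the known local port the proxy uses when dialing the
// backend. The value here is made up; the real activator picks its own.
const activatorPort = 41414

// shouldRedirect mimics the check in the BPF program: packets sourced from
// the activator's known port are the proxy's own connections to the backend
// and must never be redirected again, otherwise they would loop straight
// back into the proxy. All other packets follow the map-controlled flag.
func shouldRedirect(srcPort uint16, redirectEnabled bool) bool {
	if srcPort == activatorPort {
		return false // proxy -> backend traffic: never redirect
	}
	return redirectEnabled
}
```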
It does not really make sense to handle the loopback interface any differently than eth; this was a leftover from before the known local port was introduced.
This was merged, but GitHub did not detect it because I did not create a merge commit 🤷