L2 announcement #25471

Merged
merged 12 commits into from Jun 5, 2023

Conversation

dylandreimerink
Member

This PR adds the L2 announcement feature. This feature makes service VIPs (specifically ExternalIPs and LoadBalancer IPs) available/routable on a local area network by responding to ARP queries. For Cilium setups in home, office, or campus networks, the only option to expose a service externally would otherwise have been node ports. Node ports are limited in the ports they can use and do not provide high availability out of the box, making clients responsible for picking another host if the first one goes down. With L2 announcements, the same IP will fail over to a new node if the old node dies. It also offers more flexibility in the choice of IPs and ports.

Due to the nature of ARP, only one node can respond to ARP queries for a given IP with its MAC address. If multiple nodes are configured to announce the same IP, leader election takes place to decide which node should respond.

This is a block diagram of the implementation:
[Block diagram: l2-aware-lb-eBPF responder-v3]

Orange nodes are newly added, blue is existing and unmodified, green are existing and modified, gray/dotted are planned for future iterations.

To control the feature, a new CiliumL2AnnouncementPolicy CRD is added. Its resources, together with services, the local node, and the available network devices, feed into the "L2 announcer" cell in the agent. This cell contains the logic that distills all k8s resources down into a desired state for the local datapath.

Communication happens via a statedb table which is read by the "L2 responder" cell. Its job is to reconcile the desired state with the actual map state. Once the GARP sender component is available, the cell will also trigger gratuitous ARP messages when an entry is first added to the map.

A "L2 responder map" cell provides a wrapper around the actual BPF map interface. This map is loaded from the BPF map definition and used by the L2 responder code in BPF. The L2 responder code is called from the "from-netdev" program, so when L2 announcements are enabled, netdev programs are loaded and attached.

This feature is currently IPv4 only, but the design allows for the future addition of IPv6 support.

Added L2 announcement feature

@dylandreimerink dylandreimerink added kind/feature This introduces new functionality. release-note/major This PR introduces major new functionality to Cilium. labels May 16, 2023
@dylandreimerink dylandreimerink changed the title L2 announcement feature L2 announcement May 16, 2023
@dylandreimerink
Member Author

/test

@dylandreimerink dylandreimerink marked this pull request as ready for review May 16, 2023 09:28
@dylandreimerink dylandreimerink requested review from a team as code owners May 16, 2023 09:28
Contributor

@PhilipSchmid PhilipSchmid left a comment

Awesome work 🎉! I had a look at the doc, Helm chart, and CRD part and added some minor comments.

@dylandreimerink dylandreimerink force-pushed the feature/l2-aware-lb branch 2 times, most recently from 4a2b2e8 to 791d4c8 Compare May 16, 2023 13:20
@dylandreimerink dylandreimerink removed the dont-merge/needs-rebase This PR needs to be rebased because it has merge conflicts. label Jun 1, 2023
Member

@bimmlerd bimmlerd left a comment

contributing changes lgtm

@dylandreimerink
Member Author

/test

Contributor

@michi-covalent michi-covalent left a comment

lgtm for cli files.

@dylandreimerink dylandreimerink force-pushed the feature/l2-aware-lb branch 2 times, most recently from 299fb50 to b256712 Compare June 2, 2023 16:31
@dylandreimerink
Member Author

/test

This commit adds flags, options and helm values to enable the new
L2 announcement feature and tune some of its parameters. These options
will be used by code added in followup commits in the patch set.

Signed-off-by: Dylan Reimerink <dylan.reimerink@isovalent.com>

The L2 responder code needs to know if the agent stopped maintaining
map values so it can yield when the current node is no longer the
lease holder.

This commit adds a new value to the configmap and a cell that periodically
updates that value with the current monotonic time. This is a generic
mechanism which can also be used by other features in the future, but
for now it is primarily intended for the purpose described above, which
will be implemented in a followup commit in this patch set.

Signed-off-by: Dylan Reimerink <dylan.reimerink@isovalent.com>
This commit adds logic for the ARP layer to the pktgen which will be
utilized by future tests for the L2 responder logic.

Signed-off-by: Dylan Reimerink <dylan.reimerink@isovalent.com>

The bpf test runner has had a dump context feature for a while now
but there was no makefile option to easily access it. This commit adds
that option which is handy if you wish to inspect the program context
at a binary level during test development or debugging.

Signed-off-by: Dylan Reimerink <dylan.reimerink@isovalent.com>

This commit adds the L2 Announcement Policy CRD for the new
L2 announcement feature which will be utilized in followup commits.

Signed-off-by: Dylan Reimerink <dylan.reimerink@isovalent.com>

This commit makes a new package where table definitions will live that
serve as communication layer between datapath and agent components.

The first statedb table to be added is the L2 announce table. Each entry
contains a network device and IP address which the agent wishes to
announce via gARP and for which ARP queries should be answered.

The schema includes fields to support partial reconciliation.

In followup commits, the producer and consumer components will be added.

Signed-off-by: Dylan Reimerink <dylan.reimerink@isovalent.com>

Added a new BPF map which is used to communicate to which IPs on which
network interfaces the datapath should respond to ARP requests.

The agent code to write to this map and the datapath code to read
from this map will be added in a followup commit.

Signed-off-by: Dylan Reimerink <dylan.reimerink@isovalent.com>

This commit adds the logic for the L2 responder. If the feature is
enabled, netdev programs are loaded and attached. When ARP requests
are processed, we check if the combination of network device and IP
address exists in the L2 responder v4 map. If so, send an ARP reply to
the sender with the MAC address of the current network device. Otherwise
pass the ARP request to the stack.

Signed-off-by: Dylan Reimerink <dylan.reimerink@isovalent.com>
This commit adds the pkg/datapath/l2responder. This cell is responsible
for taking the desired state from the L2 announce table and to apply it
to the l2 responder map.

The cell will do partial reconciliation on incremental updates to the
table and periodically perform full reconciliation to catch any drift
from the desired state, should it occur for any reason.

Signed-off-by: Dylan Reimerink <dylan.reimerink@isovalent.com>
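
A full reconciliation pass of the kind described above amounts to a set difference between desired and actual state. The following Go sketch illustrates that idea under simplifying assumptions: the `key` type and `reconcile` function are hypothetical, and plain maps stand in for the statedb table and the BPF map.

```go
package main

import "fmt"

// key identifies one announcement: an IP on a network device. Hypothetical type.
type key struct {
	IP     string
	Device string
}

// reconcile compares the desired state with the actual map contents and
// returns the entries to add and the entries to delete so that actual
// ends up equal to desired.
func reconcile(desired, actual map[key]struct{}) (toAdd, toDelete []key) {
	for k := range desired {
		if _, ok := actual[k]; !ok {
			toAdd = append(toAdd, k)
		}
	}
	for k := range actual {
		if _, ok := desired[k]; !ok {
			toDelete = append(toDelete, k)
		}
	}
	return toAdd, toDelete
}

func main() {
	a := key{IP: "192.0.2.10", Device: "eth0"}
	b := key{IP: "192.0.2.11", Device: "eth0"}
	c := key{IP: "192.0.2.12", Device: "eth0"}

	desired := map[key]struct{}{a: {}, b: {}}
	actual := map[key]struct{}{b: {}, c: {}}

	toAdd, toDelete := reconcile(desired, actual)
	fmt.Println("add:", toAdd)       // a is desired but missing
	fmt.Println("delete:", toDelete) // c is present but no longer desired
}
```

Partial reconciliation would apply only the individual changes reported by the table, while a periodic full pass like this one corrects any drift the incremental path missed.
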
This commit adds the agent side logic for the L2 announcer feature.
This cell takes all policies and filters out those that do not match
the local node labels. It then takes all services and filters out the
services that do not match the policies. For each remaining service
the agent participates in leader election by way of k8s leases.

For each service for which we are the leader, we collect all IP and
network device combinations which should be announced according to the
policies that matched the service in the first place. This list is then
reconciled into the table with the desired state.

Signed-off-by: Dylan Reimerink <dylan.reimerink@isovalent.com>
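
The two filtering steps described above can be sketched as below. This is a simplified Go illustration, not the PR's implementation: it assumes exact-match label selectors only (real Kubernetes label selectors also support match expressions), it leaves out leader election entirely, and all type and function names are hypothetical.

```go
package main

import "fmt"

type labels map[string]string

// matches reports whether every key/value pair in selector is present in set.
// An empty selector matches everything. Exact-match only, as a simplification.
func matches(selector, set labels) bool {
	for k, v := range selector {
		if got, ok := set[k]; !ok || got != v {
			return false
		}
	}
	return true
}

// policy is a hypothetical, reduced view of an announcement policy.
type policy struct {
	NodeSelector    labels
	ServiceSelector labels
}

type service struct {
	Name   string
	Labels labels
}

// selectServices returns the names of services selected by at least one
// policy that also selects the local node. These are the services for which
// the node would go on to take part in leader election.
func selectServices(nodeLabels labels, policies []policy, services []service) []string {
	var selected []string
	for _, svc := range services {
		for _, p := range policies {
			if matches(p.NodeSelector, nodeLabels) && matches(p.ServiceSelector, svc.Labels) {
				selected = append(selected, svc.Name)
				break
			}
		}
	}
	return selected
}

func main() {
	node := labels{"role": "edge"}
	policies := []policy{
		{NodeSelector: labels{"role": "edge"}, ServiceSelector: labels{"expose": "l2"}},
		{NodeSelector: labels{"role": "core"}, ServiceSelector: labels{}},
	}
	services := []service{
		{Name: "web", Labels: labels{"expose": "l2"}},
		{Name: "db", Labels: labels{"expose": "none"}},
	}
	fmt.Println(selectServices(node, policies, services))
}
```
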
This commit adds a new page with documentation for users on how to
use the new L2 announcement feature, its requirements, and limitations.

Signed-off-by: Dylan Reimerink <dylan.reimerink@isovalent.com>

This package needs a codeowner, assigning to @cilium/sig-agent since we
have no specific team for this feature and it's an agent component.

Signed-off-by: Dylan Reimerink <dylan.reimerink@isovalent.com>

@dylandreimerink
Member Author

/test

@dylandreimerink dylandreimerink added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Jun 3, 2023
@borkmann borkmann merged commit 662d86f into cilium:main Jun 5, 2023
61 checks passed
@maxpain

maxpain commented Jun 30, 2023

Great work!
When will IPv6 support be implemented?

Comment on lines +78 to +80
// This timer triggers full reconciliation once in a while, in case partial reconciliation
// got out of sync or the map was changed underneath us.
ticker := time.NewTicker(5 * time.Minute)
Member

@dylandreimerink Did you consider using something more formal for reconciliation? We have facilities like pkg/controller or Hive jobs to support periodic reconcilers like this while also exposing the status in the cilium status output in a standardized way, so that they're easier to debug.

cc @joamaki

Member Author

We have. We are playing with a few ideas to find the best solution: #27884 and #28276

Not just for L2 but a bit broader
