L2 announcement #25471
/test
Awesome work 🎉! I had a look at the doc, Helm chart, and CRD part and added some minor comments.
pkg/k8s/apis/cilium.io/client/crds/v2alpha1/ciliuml2announcementpolicies.yaml
Branch updated from a732c84 to 64b3462.
Branch updated from 4a2b2e8 to 791d4c8.
contributing changes lgtm
/test
Branch updated from 016b853 to 3ab13fd.
/test
lgtm for cli files.
Branch updated from 299fb50 to b256712.
/test
This commit adds flags, options, and Helm values to enable the new L2 announcement feature and tune some of its parameters. These options will be used by code added in follow-up commits in this patch set.
Signed-off-by: Dylan Reimerink <dylan.reimerink@isovalent.com>
The L2 responder code needs to know whether the agent has stopped maintaining map values, so it can yield when the current node is no longer the lease holder. This commit adds a new value to the configmap and a cell that periodically updates that value with the current monotonic time. This is a generic mechanism that other features could also use in the future, but for now it is primarily intended for the purpose described above, which will be implemented in a follow-up commit in this patch set.
Signed-off-by: Dylan Reimerink <dylan.reimerink@isovalent.com>
This commit adds logic for the ARP layer to the pktgen which will be utilized by future tests for the L2 responder logic.
Signed-off-by: Dylan Reimerink <dylan.reimerink@isovalent.com>
The bpf test runner has had a dump context feature for a while now, but there was no Makefile option to easily access it. This commit adds that option, which is handy if you wish to inspect the program context at a binary level during test development or debugging.
Signed-off-by: Dylan Reimerink <dylan.reimerink@isovalent.com>
This commit adds the L2 Announcement Policy CRD for the new L2 announcement feature which will be utilized in followup commits.
Signed-off-by: Dylan Reimerink <dylan.reimerink@isovalent.com>
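A simplified, hypothetical view of what a policy spec of this kind might carry, together with matching network devices by name pattern. The field names (node and service selectors, interface regular expressions, and flags choosing ExternalIPs or LoadBalancer IPs) are assumptions modeled on the feature description, not the exact CRD schema.

```go
package main

import "regexp"

// l2PolicySpec is a simplified, hypothetical view of a
// CiliumL2AnnouncementPolicy spec; field names are assumptions.
type l2PolicySpec struct {
	NodeSelector    map[string]string // equality-only label selector for nodes
	ServiceSelector map[string]string // equality-only label selector for services
	Interfaces      []string          // regular expressions matched against device names
	ExternalIPs     bool              // announce Service ExternalIPs
	LoadBalancerIPs bool              // announce LoadBalancer ingress IPs
}

// selectDevices returns, in input order, the devices whose names match at
// least one of the policy's interface expressions.
func (p l2PolicySpec) selectDevices(devices []string) ([]string, error) {
	res := make([]*regexp.Regexp, 0, len(p.Interfaces))
	for _, expr := range p.Interfaces {
		re, err := regexp.Compile(expr)
		if err != nil {
			return nil, err // reject the policy if a pattern is invalid
		}
		res = append(res, re)
	}
	var out []string
	for _, dev := range devices {
		for _, re := range res {
			if re.MatchString(dev) {
				out = append(out, dev)
				break // one match is enough for this device
			}
		}
	}
	return out, nil
}
```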
This commit creates a new package for table definitions that serve as a communication layer between datapath and agent components. The first statedb table to be added is the L2 announce table. Each entry contains a network device and IP address which the agent wishes to announce via gARP and for which ARP queries should be answered. The schema includes fields to support partial reconciliation. The producer and consumer components will be added in follow-up commits.
Signed-off-by: Dylan Reimerink <dylan.reimerink@isovalent.com>
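To illustrate what "fields to support partial reconciliation" can mean, here is a toy in-memory table in Go. It is not Cilium's statedb: the per-row `Revision` counter and the `Deleted` tombstone flag are assumptions chosen so that a consumer can ask only for rows changed since the revision it last processed.

```go
package main

import "sort"

// announceKey identifies one (network device, IPv4 address) pair to announce.
type announceKey struct {
	Device string
	IP     string
}

// announceEntry is a hypothetical row; Revision and Deleted are assumed
// fields that let consumers reconcile incrementally.
type announceEntry struct {
	Key      announceKey
	Revision uint64
	Deleted  bool
}

// announceTable is a toy single-writer table, not statedb.
type announceTable struct {
	rev  uint64
	rows map[announceKey]announceEntry
}

func newAnnounceTable() *announceTable {
	return &announceTable{rows: make(map[announceKey]announceEntry)}
}

// upsert inserts or refreshes a row and bumps the table revision.
func (t *announceTable) upsert(k announceKey) {
	t.rev++
	t.rows[k] = announceEntry{Key: k, Revision: t.rev}
}

// remove leaves a tombstone so consumers can observe the deletion.
func (t *announceTable) remove(k announceKey) {
	if _, ok := t.rows[k]; !ok {
		return
	}
	t.rev++
	t.rows[k] = announceEntry{Key: k, Revision: t.rev, Deleted: true}
}

// changedSince returns rows modified after rev, ordered by revision.
func (t *announceTable) changedSince(rev uint64) []announceEntry {
	var out []announceEntry
	for _, e := range t.rows {
		if e.Revision > rev {
			out = append(out, e)
		}
	}
	sort.Slice(out, func(i, j int) bool { return out[i].Revision < out[j].Revision })
	return out
}
```

A consumer remembers the highest revision it has applied and passes it to `changedSince` on the next round; in this toy, tombstones are never garbage-collected.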
Added a new BPF map which is used to tell the datapath for which IPs, on which network interfaces, it should respond to ARP requests. The agent code that writes to this map and the datapath code that reads from it will be added in a follow-up commit.
Signed-off-by: Dylan Reimerink <dylan.reimerink@isovalent.com>
This commit adds the logic for the L2 responder. If the feature is enabled, netdev programs are loaded and attached. When an ARP request is processed, we check whether the combination of network device and IP address exists in the L2 responder v4 map. If so, we send an ARP reply to the sender with the MAC address of the current network device. Otherwise, we pass the ARP request to the stack.
Signed-off-by: Dylan Reimerink <dylan.reimerink@isovalent.com>
This commit adds pkg/datapath/l2responder. This cell is responsible for taking the desired state from the L2 announce table and applying it to the L2 responder map. The cell performs partial reconciliation on incremental updates to the table and periodically performs a full reconciliation to catch any drift from the desired state, whatever its cause.
Signed-off-by: Dylan Reimerink <dylan.reimerink@isovalent.com>
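A full reconciliation pass boils down to a set difference between desired and actual state. The sketch below uses plain Go maps in place of the statedb table and the BPF map; `fullReconcile` and `actualMap` are hypothetical names.

```go
package main

import "sort"

// key identifies an announced (device, IPv4) pair.
type key struct{ Device, IP string }

// actualMap is a hypothetical stand-in for the BPF map contents.
type actualMap map[key]struct{}

// fullReconcile makes actual equal to desired: it deletes entries that are
// no longer desired and inserts missing ones. It returns sorted lists of
// what was added and removed.
func fullReconcile(desired map[key]struct{}, actual actualMap) (added, removed []key) {
	for k := range actual {
		if _, ok := desired[k]; !ok {
			delete(actual, k) // deleting during range is safe in Go
			removed = append(removed, k)
		}
	}
	for k := range desired {
		if _, ok := actual[k]; !ok {
			actual[k] = struct{}{}
			added = append(added, k)
		}
	}
	less := func(s []key) func(i, j int) bool {
		return func(i, j int) bool {
			if s[i].Device != s[j].Device {
				return s[i].Device < s[j].Device
			}
			return s[i].IP < s[j].IP
		}
	}
	sort.Slice(added, less(added))
	sort.Slice(removed, less(removed))
	return added, removed
}
```

Partial reconciliation would apply only the table rows changed since the last pass; a periodic full pass like this one corrects any drift that incremental updates missed, for example if the map was modified outside the agent.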
This commit adds the agent-side logic for the L2 announcer feature. This cell takes all policies and filters out those that do not match the local node labels. It then takes all services and filters out those that do not match the remaining policies. For each remaining service, the agent participates in leader election by way of k8s leases. For each service for which we are the leader, we collect all IP and network device combinations that should be announced according to the policies that matched the service in the first place. This list is then reconciled into the table holding the desired state.
Signed-off-by: Dylan Reimerink <dylan.reimerink@isovalent.com>
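The filtering pipeline can be sketched in plain Go. Assumptions: label selectors are equality-only, each policy's interface patterns are already resolved to device names, and `isLeader` stands in for the per-service k8s lease election; none of these names come from the PR.

```go
package main

import "sort"

// policy is a hypothetical, pre-resolved view of an L2 announcement policy.
type policy struct {
	NodeSelector    map[string]string
	ServiceSelector map[string]string
	Devices         []string // interface patterns already resolved to device names
}

// service carries the IPs that could be announced (ExternalIPs / LB IPs).
type service struct {
	Name   string
	Labels map[string]string
	IPs    []string
}

// entry is one desired (device, IP) announcement.
type entry struct{ Device, IP string }

// matchLabels reports whether every selector key is present with the same value.
func matchLabels(sel, labels map[string]string) bool {
	for k, v := range sel {
		if got, ok := labels[k]; !ok || got != v {
			return false
		}
	}
	return true
}

// desiredEntries applies the steps from the commit message: keep policies
// matching this node, keep services matching those policies, keep services
// this node leads, then emit every (device, IP) pair, deduplicated and sorted.
func desiredEntries(nodeLabels map[string]string, policies []policy, services []service, isLeader func(svc string) bool) []entry {
	set := map[entry]struct{}{}
	for _, p := range policies {
		if !matchLabels(p.NodeSelector, nodeLabels) {
			continue
		}
		for _, s := range services {
			if !matchLabels(p.ServiceSelector, s.Labels) || !isLeader(s.Name) {
				continue
			}
			for _, ip := range s.IPs {
				for _, dev := range p.Devices {
					set[entry{dev, ip}] = struct{}{}
				}
			}
		}
	}
	out := make([]entry, 0, len(set))
	for e := range set {
		out = append(out, e)
	}
	sort.Slice(out, func(i, j int) bool {
		if out[i].Device != out[j].Device {
			return out[i].Device < out[j].Device
		}
		return out[i].IP < out[j].IP
	})
	return out
}
```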
This commit adds a new page with documentation for users on how to use the new L2 announcement feature, its requirements, and limitations.
Signed-off-by: Dylan Reimerink <dylan.reimerink@isovalent.com>
This package needs a codeowner; assigning it to @cilium/sig-agent since we have no specific team for this feature and it's an agent component.
Signed-off-by: Dylan Reimerink <dylan.reimerink@isovalent.com>
Branch updated from b256712 to a831ed6.
/test
Great work!
```go
// This timer triggers full reconciliation once in a while, in case partial reconciliation
// got out of sync or the map was changed underneath us.
ticker := time.NewTicker(5 * time.Minute)
```
@dylandreimerink Did you consider using something more formal for reconciliation? We have facilities like pkg/controller or Hive jobs to support periodic reconcilers like this, while also exposing their status in the cilium status output in a standardized way so that they're easier to debug.
cc @joamaki
This PR adds the L2 announcement feature. This feature makes service VIPs (specifically ExternalIPs and LoadBalancer IPs) available and routable on a local area network by responding to ARP queries. Until now, for Cilium setups in home, office, or campus networks, the only option to expose a service externally was via node ports. Node ports are limited in the ports they can use and do not provide high availability out of the box, making clients responsible for picking another host if the first one goes down. With L2 announcements, the same IP fails over to a new node if the old node dies. It also offers more flexibility with regard to the choice of IPs and ports.
Due to the nature of ARP, only one node can respond to ARP queries for a given IP with its MAC address. If multiple nodes are configured to announce the same IP, leader election takes place to decide which node should respond.
[Block diagram of the implementation]
Orange nodes are newly added, blue nodes are existing and unmodified, green nodes are existing and modified, and gray/dotted nodes are planned for future iterations.
To control the feature, a new CiliumL2AnnouncementPolicy CRD is added. Its resources, together with services, the local node, and the available network devices, feed into the "L2 announcer" cell in the agent. This cell contains the logic that distills all k8s resources down into a desired state for the local datapath.
Communication happens via a statedb table, which is read by the "L2 responder" cell. Its job is to reconcile the desired state with the actual map state. Once the GARP sender component is available, gratuitous ARP messages will be sent when a given entry is first added to the map.
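The "send gratuitous ARP only on first insert" rule can be sketched as a reconciler hook. `responder`, `upsert`, and `sendGARP` are hypothetical names; the hook stands in for the planned GARP sender component.

```go
package main

// key identifies an announced (device, IPv4) pair.
type key struct{ Device, IP string }

// responder mirrors desired entries into a map and invokes a gARP hook only
// the first time an entry is inserted.
type responder struct {
	m        map[key]struct{}
	sendGARP func(k key) // stand-in for the future GARP sender
}

// upsert adds k if it is new and announces it; existing entries are left
// untouched so repeated reconciliation does not flood the LAN with gARPs.
func (r *responder) upsert(k key) {
	if _, exists := r.m[k]; exists {
		return
	}
	r.m[k] = struct{}{}
	if r.sendGARP != nil {
		r.sendGARP(k)
	}
}

// remove drops k; adding it again later counts as a first insert.
func (r *responder) remove(k key) { delete(r.m, k) }
```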
An "L2 responder map" cell provides a wrapper around the actual BPF map interface. This map is loaded from the BPF map definition and used by the L2 responder code in BPF. The L2 responder code is called from the "from-netdev" program, so when L2 announcements are enabled, netdev programs are loaded and attached.
This feature is currently IPv4 only, but the design allows for the future addition of IPv6 support.