Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace the MetalLB configmap with k8s custom resources #196

Open
danderson opened this issue Mar 14, 2018 · 7 comments

Comments

@danderson
Copy link
Owner

commented Mar 14, 2018

MetalLB currently uses a ConfigMap for its configuration. This is easy, but not very kubernetes-native.

We should consider migrating the config to custom resource definitions. Looking at our config today, we would have ~3 custom resources:

  • BGP peer, similar to how Calico does it when it's running in "kubernetes datastore" mode
  • Address pool
  • BGP community aliases (not sure about this one - it's kind of an odd one out in the current config, not sure where to put it)

Major advantage of custom resources is that Kubernetes users already know how to manipulate them, and it gives us a great attachment point for event logs (peer connected, address allocated...). This lets us inject most of what's in MetalLB logs today into the standard kubernetes event stream. It also lets us use webhooks in a more natural way, where we can allow/deny changes to individual subresources of the config.

Major disadvantage is that CRDs are a nightmare to work with in the k8s go-client, we basically have to compile our own fork of go-client to get custom type definitions, unless things have changed for the better.

Thoughts welcome on this one, I'm conflicted about whether we should do this.

@miekg

This comment has been minimized.

Copy link
Collaborator

commented Mar 14, 2018

@mleklund

This comment has been minimized.

Copy link

commented Mar 15, 2018

in k8-ipcontroller they are controlled with kubectl

apiVersion: ipcontroller.ext/v1
kind: IpClaimPool
metadata:
    name: test-pool
spec:
    cidr: 192.168.0.248/29
    ranges:
        - - 192.168.0.249
          - 192.168.0.250
        - - 192.168.0.252
          - 192.168.0.253
@danderson

This comment has been minimized.

Copy link
Owner Author

commented Mar 15, 2018

Kubectl has some automagic support for handling CRDs, so it would probably look something like:

kubectl edit metallb.universe.tf/AddressPool my-addr-pool

Which would let you edit something that looks like:

apiVersion: metallb.universe.tf/v1
kind: AddressPool
metadata:
  name: my-addr-pool
spec:
  protocol: layer2
  cidr:
  - 192.168.16.240/30

Exact content of each CRD is to be figured out still, but that's approximately what things would look like.

Agreed that a configmap makes things simpler by putting everything in one place... But that also means we lack handy objects to attach things like event logs. Maybe that's fine and we should just do structured logging in MetalLB instead, I don't know.

@mleklund

This comment has been minimized.

Copy link

commented Mar 16, 2018

Allowing ranges without being cidr would be sweet as well. Sometimes ranges are broken up because of technical debt.

@JoshuaAndrew

This comment has been minimized.

Copy link

commented Mar 13, 2019

@danderson
migrating the config to custom resource definitions, it is good idea, what about this CRDs below.

apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: bgpadvertisements.metallb.universe.tf
spec:
  group: metallb.universe.tf
  names:
    kind: BGPAdvertisement
    listKind: BGPAdvertisementList
    plural: bgpadvertisements
    singular: bgpadvertisement
  scope: Cluster
  subresources:
    status: {}
  validation:
    openAPIV3Schema:
      properties:
        apiVersion:
          type: string
        kind:
          type: string
        metadata:
          type: object
        spec:
          type: object
        status:
          type: object
  version: v1alpha1
  versions:
  - name: v1alpha1
    served: true
    storage: true
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: ippools.metallb.universe.tf
spec:
  group: metallb.universe.tf
  names:
    kind: IPPool
    listKind: IPPoolList
    plural: ippools
    singular: ippool
  scope: Cluster
  subresources:
    status: {}
  validation:
    openAPIV3Schema:
      properties:
        apiVersion:
          type: string
        kind:
          type: string
        metadata:
          type: object
        spec:
          type: object
        status:
          type: object
  version: v1alpha1
  versions:
  - name: v1alpha1
    served: true
    storage: true
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: bgppeers.metallb.universe.tf
spec:
  group: metallb.universe.tf
  names:
    kind: BGPPeer
    listKind: BGPPeerList
    plural: bgppeers
    singular: bgppeer
  scope: Cluster
  subresources:
    status: {}
  validation:
    openAPIV3Schema:
      properties:
        apiVersion:
          type: string
        kind:
          type: string
        metadata:
          type: object
        spec:
          type: object
        status:
          type: object
  version: v1alpha1
  versions:
  - name: v1alpha1
    served: true
    storage: true

Example Custom Resource
BGPAdvertisement

apiVersion: metallb.universe.tf/v1alpha1
kind: BGPAdvertisement
metadata:
  name: example-bgpadvertisement
spec:
  # (optional) How much you want to aggregate up the IP address
  # before advertising. For example, advertising 1.2.3.4 with
  # aggregation-length=24 would end up advertising 1.2.3.0/24.
  # For the majority of setups, you'll want to keep this at the
  # default of 32, which advertises the entire IP address unmodified.
  aggregationLength: 24
  # (optional) The value of the BGP "local preference" attribute
  # for this advertisement. Only used with IBGP peers,
  # i.e. peers where peer-asn is the same as my-asn.
  localPref: 100
  # (optional) BGP communities to attach to this
  # advertisement. Communities are given in the standard
  # two-part form <asn>:<community number>. You can also use
  # alias names (see below).
  communities:
    - 64512:64515
    - no-export

example-ippool-bgp

apiVersion: metallb.universe.tf/v1alpha1
kind: IPPool
metadata:
  name: example-ippool-bgp
spec:
  # Protocol can be used to select how the announcement is done.
  # Supported values are bgp and layer2.
  protocol: bgp
  # A list of IP address ranges over which MetalLB has
  # authority. You can list multiple ranges in a single pool, they
  # will all share the same settings. Each range can be either a
  # CIDR prefix, or an explicit start-end range of IPs.
  addresses:
  - 198.51.100.0/24
  - 192.168.0.150-192.168.0.200
  # (optional) If true, MetalLB will not allocate any address that
  # ends in .0 or .255. Some old, buggy consumer devices
  # mistakenly block traffic to such addresses under the guise of
  # smurf protection. Such devices have become fairly rare, but
  # the option is here if you encounter serving issues.
  avoidBuggyIps: true
  # (optional, default true) If false, MetalLB will not automatically
  # allocate any address in this pool. Addresses can still explicitly
  # be requested via loadBalancerIP or the address-pool annotation.
  autoAssign: false
  # (optional) A list of BGP advertisements to make, when
  # protocol=bgp. Each address that gets assigned out of this pool
  # will turn into this many advertisements. For most simple
  # setups, you'll probably just want one.
  #
  # The default value for this field is a single advertisement with
  # all parameters set to their respective defaults.
  advertisement: example-bgpadvertisement

example-bgppeer

apiVersion: metallb.universe.tf/v1alpha1
kind: BGPPeer
metadata:
  name: example-bgppeer
spec:
  # The target IP address for the BGP session.
  peerAddress: 10.0.0.1
  # The BGP AS number that MetalLB expects to see advertised by the router.
  peerAsn: 64512
  # The BGP AS number that MetalLB should speak as.
  myAsn: 64512
  # (optional) the TCP port to talk to. Defaults to 179, you shouldn't
  # need to set this in production.
  peerPort: 179
  # (optional) The proposed value of the BGP Hold Time timer. Refer to
  # BGP reference material to understand what setting this implies.
  holdTime: 120s
  # (optional) The router ID to use when connecting to this peer. Defaults
  # to the node IP address. Generally only useful when you need to peer with
  # another BGP router running on the same machine as MetalLB.
  routerId: 1.2.3.4
  # (optional) Password for TCPMD5 authenticated BGP sessions offered by some peers.
  password: "yourPassword"
  # (optional) The nodes that should connect to this peer. A node
  # matches if at least one of the node selectors matches. Within
  # one selector, a node matches if all the matchers are
  # satisfied. The semantics of each selector are the same as the
  # label- and set-based selectors in Kubernetes, documented at
  # https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/.
  # By default, all nodes are selected.
  nodeSelectors:
    - # Match by label=value
      match-labels:
        kubernetes.io/hostname: prod-01
      # Match by 'key OP values' expressions
      match-expressions:
        - key: beta.kubernetes.io/arch
          operator: In
          values: [amd64, arm]
@alexlovelltroy

This comment has been minimized.

Copy link
Contributor

commented Mar 14, 2019

I like the IPPool CRD you're suggesting. Some potentially small naming questions:

Is there a reason you used the name IPPool instead of the MetalLB configuration parameter AddressPool?

BGP and layer2 are referenced in the documentation as modes. Is the word "protocol" more descriptive to you?

I'm confused about the other CRDs though. Can you share use cases where you would use the CRD to create/modify a BGPAdvertisement or BGPPeer? That seems like something we can trust metallb's BGP code to do for us.

@danderson

This comment has been minimized.

Copy link
Owner Author

commented Mar 16, 2019

The other CRDs are mirroring parts of MetalLB's configuration, for setting up BGP peers and configuring how BGP advertisements are computed from addresses.

I'm not convinced that it's worth separating BGPAdvertisement into its own CRD. It cleans stuff up a little bit by letting multiple address pools reference the same advertisement configuration... But that's already a feature that has, AFAIK, approximately zero users, and breaking it out adds significant extra complexity because the code now has to deal with keeping these two data structures in sync given k8s's eventually consistent announcements.

Aside from that, yeah, that's roughly the shape I was expecting. +1 to Alex's naming suggestions.

Also be aware that I have no time to review PRs in this direction, so don't expect much activity if you start sending PRs :(

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
5 participants
You can’t perform that action at this time.