Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bgpv1: Add support for setting BGP timer parameters in CiliumBGPNeighbor CRD #25408

Merged
merged 1 commit into from May 24, 2023

Conversation

rastislavs
Copy link
Contributor

@rastislavs rastislavs commented May 12, 2023

Extends the CiliumBGPNeighbor CRD with 3 new configuration options: ConnectRetryTime, HoldTime and KeepAliveTime. These can be used to fine-tune BGP peering, e.g. to achieve faster failover times. If not set, the default values for the affected BGP timers remain the same as before this change.

Example configuration:

apiVersion: "cilium.io/v2alpha1"
kind: CiliumBGPPeeringPolicy
metadata:
  name: tor
spec:
  virtualRouters:
  - localASN: 65001
    exportPodCIDR: true
    neighbors:
    - peerAddress: "172.0.0.1/32"
      peerASN: 65000
      holdTime: 120s
      keepAliveTime: 30s
      connectRetryTime: 60s

Example state dump of peers after configuring:

# cilium bgp peers -o json
[
  {
    "applied-hold-time-seconds": 120,
    "applied-keep-alive-time-seconds": 30,
    "configured-hold-time-seconds": 120,
    "configured-keep-alive-time-seconds": 30,
    "connect-retry-time-seconds": 60,
    "families": [
      {
        "advertised": 1,
        "afi": "ipv4",
        "received": 1,
        "safi": "unicast"
      },
      {
        "afi": "ipv6",
        "safi": "unicast"
      }
    ],
    "local-asn": 65001,
    "peer-address": "172.0.0.1",
    "peer-asn": 65000,
    "session-state": "established",
    "uptime-nanoseconds": 8144057623
  }
]

Implementation notes:

  • The default values for the timer params are the same as GoBGP uses internally, so there is no change in behaviour if they are not explicitly set.
  • The timer params are defaulted in the CiliumBGPPeeringPolicy at the BGP Agent level, so that the same defaults can be consistently applied across all future BGP backends.
  • The BGP Router interface has been extended with a new method UpdateNeighbor, which is called from the BGP Manager's neighborReconciler when a neighbor needs to be updated.
Add support for setting BGP timer parameters in CiliumBGPNeighbor CRD

@rastislavs rastislavs requested review from a team as code owners May 12, 2023 09:08
@github-actions github-actions bot added the kind/community-contribution This was a contribution made by a community member. label May 12, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label May 12, 2023
@rastislavs rastislavs force-pushed the bgp-timers branch 3 times, most recently from eebd777 to 76bce94 Compare May 12, 2023 09:45
@rastislavs rastislavs added area/bgp release-note/minor This PR changes functionality that users may find relevant to operating Cilium. and removed kind/community-contribution This was a contribution made by a community member. labels May 12, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label May 12, 2023
Copy link
Contributor

@harsimran-pabla harsimran-pabla left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor comments, overall looks good

pkg/bgpv1/gobgp/server.go Outdated Show resolved Hide resolved
pkg/bgpv1/gobgp/server.go Show resolved Hide resolved
api/v1/openapi.yaml Outdated Show resolved Hide resolved
api/v1/openapi.yaml Outdated Show resolved Hide resolved
Copy link
Member

@YutaroHayakawa YutaroHayakawa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made some comments. My main concern here is a validation of the timer values and how to feedback the wrong configurations to users.

I don't see any range checking on controller side and we currently let GoBGP to validate the value. That means we effectively leaking the underlying BGP's specification to CRD. Ideally, we should be able to control which value is valid and which is not.

Another problem of validating the value in GoBGP level is we don't have a good way to feedback the wrong configuration to users since the error happens during async reconciliation. Ideally, users should be able to notice the invalid value when they create/update the CRD. In that sense, it would be nice to validate configuration at k8s level (with OpenAPI constraint).

@rastislavs rastislavs force-pushed the bgp-timers branch 3 times, most recently from ffab248 to 6eaf608 Compare May 17, 2023 11:24
pkg/bgpv1/gobgp/server.go Outdated Show resolved Hide resolved
@YutaroHayakawa YutaroHayakawa self-assigned this May 18, 2023
Copy link
Member

@nathanjsweet nathanjsweet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM for K8s and API

@rastislavs rastislavs force-pushed the bgp-timers branch 3 times, most recently from fbae65a to 097a291 Compare May 19, 2023 10:24
Copy link
Member

@YutaroHayakawa YutaroHayakawa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now overall looks great! At last, let me ask you to consolidate the last two commits into the first commit for readability. You made an API response name changes and some logic changes in those commits, but the final form is only meaningful, so it's more reader-friendly to only show the final form. It's fine to do force-push. It's common to do that in this project for the clean commit history.

@rastislavs
Copy link
Contributor Author

rastislavs commented May 23, 2023

@YutaroHayakawa

Now overall looks great! At last, let me ask you to consolidate the last two commits into the first commit for readability.

Consolidated into a single commit, as keeping the second one would require resolving too many conflicts and it was not that significant (just unit tests). Kept the gist of all commit messages in the final one.

@YutaroHayakawa
Copy link
Member

Consolidation looks good! And sorry, one more. I noticed I didn't answer to your question here.

#25408 (comment)

What I mean by "unit" here is like a "Seconds".

@rastislavs
Copy link
Contributor Author

rastislavs commented May 23, 2023

@YutaroHayakawa

#25408 (comment)

What I mean by "unit" here is like a "Seconds".

Ah, so you meant including e.g. "Seconds" suffix for the timer intervals in the CRD API? I don't think it is necessary, as the datatype is metav1.Duaration (time.Duration) which means the units are defined in the value - e.g. 60s / 1m etc: connectRetryTime: 60s / connectRetryTime: 1m.

@YutaroHayakawa
Copy link
Member

Ah, alright. That's a good point, but since we internally round them up, better to mention it in the document.

Copy link
Member

@YutaroHayakawa YutaroHayakawa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now, LGTM! Thanks!

@rastislavs
Copy link
Contributor Author

Ah, alright. That's a good point, but since we internally round them up, better to mention it in the document.

Fair point, added Rounded internally to the nearest whole second. to the docs.

@rastislavs
Copy link
Contributor Author

/test

Extends the CiliumBGPNeighbor CRD with 3 new configuration options:
ConnectRetryTime, HoldTime and KeepAliveTime. These can be used to
fine-tune BGP peering, e.g. to achieve faster failover times.
If not set, the default values for the affected timers remain
the same as before this change.

This also introduces a new UpdateNeighbor API for bgpv1,
to support changes of an existing peering. During the update
we first dump existing peer configuration from GoBGP and then
perform the update on the dumped value. The reason for that is that
many peer config fields are defaulted internally in GoBGP and would
cause peer reset if not provided on update.

Timer values are included in the state API of the BgpPeer. Since the
applied values of HoldTime and KeepAliveTime may be different
from the configured values (they also depend on negotiation during
the session setup), the state API differentiates between
"configured" and "applied" values of these timer intervals.

Signed-off-by: Rastislav Szabo <rastislav.szabo@isovalent.com>
@rastislavs
Copy link
Contributor Author

/test

@rastislavs
Copy link
Contributor Author

rastislavs commented May 24, 2023

/test-runtime

looks like a flake in bgpv1 Test_NeighborAddDel: #25637

(failed in https://jenkins.cilium.io/job/Cilium-PR-Runtime-kernel-net-next/120/)

@rastislavs rastislavs added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label May 24, 2023
@squeed squeed merged commit 3c44393 into cilium:main May 24, 2023
58 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/bgp ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/minor This PR changes functionality that users may find relevant to operating Cilium.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants