
local-port annotation not respected by Cilium BGP #24737

Closed
CorneJB opened this issue Apr 4, 2023 · 8 comments · Fixed by #25809
Labels
area/bgp kind/bug This is a bug in the Cilium logic. kind/community-report This was reported by a user in the Cilium community, eg via Slack. sig/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages.

Comments

@CorneJB

CorneJB commented Apr 4, 2023

Is there an existing issue for this?

  • I have searched the existing issues

What happened?

When using node annotations to set the local-port, the value does not appear to be respected by Cilium. This looks similar to #23155.

Given these annotations:

apiVersion: v1
items:
- apiVersion: v1
  kind: Node
  metadata:
    annotations:
      cilium.io/bgp-virtual-router.65341: local-port=42424,router-id=192.168.178.201
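
For reference, the value of a cilium.io/bgp-virtual-router.&lt;ASN&gt; annotation is a comma-separated list of key=value attributes such as local-port and router-id. A minimal Go sketch of how such a value breaks down (an illustration of the format only, not Cilium's actual parser):

package main

import (
	"fmt"
	"strings"
)

// parseVirtualRouterAnnotation splits an annotation value such as
// "local-port=42424,router-id=192.168.178.201" into its attributes.
// Illustration only; this is not Cilium's implementation.
func parseVirtualRouterAnnotation(value string) map[string]string {
	attrs := map[string]string{}
	for _, field := range strings.Split(value, ",") {
		if kv := strings.SplitN(field, "=", 2); len(kv) == 2 {
			attrs[strings.TrimSpace(kv[0])] = strings.TrimSpace(kv[1])
		}
	}
	return attrs
}

func main() {
	attrs := parseVirtualRouterAnnotation("local-port=42424,router-id=192.168.178.201")
	fmt.Println(attrs["local-port"], attrs["router-id"]) // 42424 192.168.178.201
}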

The following CiliumBGPPeeringPolicy is applied:

kind: CiliumBGPPeeringPolicy
metadata:
  name: mikro
spec:
  nodeSelector:
    matchLabels:
      kubernetes.io/hostname: X
  virtualRouters:
    - localASN: 65341
      neighbors:
        - peerASN: 65340
          peerAddress: 192.168.178.1/32
      serviceSelector:
        matchLabels:
          io.cilium.gateway/owning-gateway: tls-gateway

In the Cilium logs (see the relevant log output below), a different local port is selected by the virtual router, and the router gives the following error when connecting on the port annotated on the node:
(screenshot of the router's error not reproduced here)

Cilium Version

cilium-cli: v0.13.2 compiled with go1.20.2 on linux/amd64
cilium image (default): v1.13.1
cilium image (stable): v1.13.1
cilium image (running): v1.13.1

Kernel Version

Linux steamroller2 5.15.0-69-generic #76-Ubuntu SMP Fri Mar 17 17:19:29 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

Kubernetes Version

Client Version: version.Info{Major:"1", Minor:"25", GitVersion:"v1.25.7+k3s1", GitCommit:"f7c20e237d0ad0eae83c1ce60d490da70dbddc0e", GitTreeState:"clean", BuildDate:"2023-03-10T22:16:07Z", GoVersion:"go1.19.6", Compiler:"gc", Platform:"linux/amd64"}
Kustomize Version: v4.5.7
Server Version: version.Info{Major:"1", Minor:"25", GitVersion:"v1.25.7+k3s1", GitCommit:"f7c20e237d0ad0eae83c1ce60d490da70dbddc0e", GitTreeState:"clean", BuildDate:"2023-03-10T22:16:07Z", GoVersion:"go1.19.6", Compiler:"gc", Platform:"linux/amd64"}

Sysdump

No response

Relevant log output

level=info msg="type:STATE peer:{conf:{local_asn:65341 neighbor_address:\"192.168.178.1\" peer_asn:65340} state:{local_asn:65341 neighbor_address:\"192.168.178.1\" peer_asn:65340 session_state:IDLE router_id:\"192.168.178.1\"} transport:{local_address:\"192.168.178.201\" local_port:53903 remote_port:179}}"
level=info msg="type:STATE peer:{conf:{local_asn:65341 neighbor_address:\"192.168.178.1\" peer_asn:65340} state:{local_asn:65341 neighbor_address:\"192.168.178.1\" peer_asn:65340 session_state:ACTIVE router_id:\"192.168.178.1\"} transport:{local_address:\"192.168.178.201\" local_port:53903 remote_port:179}}"
level=info msg="type:STATE peer:{conf:{local_asn:65341 neighbor_address:\"192.168.178.1\" peer_asn:65340} state:{local_asn:65341 neighbor_address:\"192.168.178.1\" peer_asn:65340 session_state:OPENSENT router_id:\"192.168.178.1\"} transport:{local_address:\"192.168.178.201\" local_port:53405 remote_port:179}}"
level=info msg="type:STATE peer:{conf:{local_asn:65341 neighbor_address:\"192.168.178.1\" peer_asn:65340} state:{local_asn:65341 neighbor_address:\"192.168.178.1\" peer_asn:65340 session_state:OPENCONFIRM router_id:\"192.168.178.1\"} transport:{local_address:\"192.168.178.201\" local_port:53405 remote_port:179}}"
level=info msg="Peer Up" Key=192.168.178.1 State=BGP_FSM_OPENCONFIRM Topic=Peer asn=65341 component=gobgp.BgpServerInstance subsys=bgp-control-plane
level=info msg="type:STATE peer:{conf:{local_asn:65341 neighbor_address:\"192.168.178.1\" peer_asn:65340} state:{local_asn:65341 neighbor_address:\"192.168.178.1\" peer_asn:65340 session_state:ESTABLISHED router_id:\"192.168.178.1\"} transport:{local_address:\"192.168.178.201\" local_port:53405 remote_port:179}}"

Anything else?

No response

Code of Conduct

  • I agree to follow this project's Code of Conduct
@CorneJB CorneJB added kind/bug This is a bug in the Cilium logic. kind/community-report This was reported by a user in the Cilium community, eg via Slack. needs/triage This issue requires triaging to establish severity and next steps. labels Apr 4, 2023
@squeed squeed added area/bgp sig/agent Cilium agent related. and removed needs/triage This issue requires triaging to establish severity and next steps. labels Apr 4, 2023
@christarazi christarazi added sig/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages. and removed sig/agent Cilium agent related. labels May 16, 2023
@danehans
Contributor

I have reproduced this bug with cilium-agent built from commit 7755f39. When local-port=179 is used to annotate nodes, the peers reach an ESTABLISHED state:

level=info msg="type:STATE peer:{conf:{local_asn:65341 neighbor_address:\"172.18.0.2\" peer_asn:65340} state:{local_asn:65341 neighbor_address:\"172.18.0.2\" peer_asn:65340 session_state:ESTABLISHED router_id:\"172.18.0.2\"} transport:{local_address:\"172.18.0.3\" local_port:36577 remote_port:179}}"

When the node annotation is local-port=42424, the peers are stuck in the ACTIVE state:

level=info msg="Cilium BGP Control Plane Controller woken for reconciliation" component=Controller.Run subsys=bgp-control-plane
level=debug msg="Successfully listed CiliumBGPPeeringPolicies" component=Controller.Reconcile count=2 subsys=bgp-control-plane
level=debug msg="Comparing BGP policy node selector with node's labels" component=PolicySelection nodeLabels="beta.kubernetes.io/arch=arm64,beta.kubernetes.io/os=linux,kubernetes.io/arch=arm64,kubernetes.io/hostname=kind-control-plane,kubernetes.io/os=linux,node-role.kubernetes.io/control-plane=,node.kubernetes.io/exclude-from-external-load-balancers=" policyNodeSelector="kubernetes.io/hostname=kind-worker" subsys=bgp-control-plane
level=debug msg="Comparing BGP policy node selector with node's labels" component=PolicySelection nodeLabels="beta.kubernetes.io/arch=arm64,beta.kubernetes.io/os=linux,kubernetes.io/arch=arm64,kubernetes.io/hostname=kind-control-plane,kubernetes.io/os=linux,node-role.kubernetes.io/control-plane=,node.kubernetes.io/exclude-from-external-load-balancers=" policyNodeSelector="kubernetes.io/hostname=kind-control-plane" subsys=bgp-control-plane
level=debug msg="Asking configured BGPRouterManager to configure peering" component=Controller.Reconcile subsys=bgp-control-plane
level=debug msg="Reconciling new CiliumBGPPeeringPolicy" component=manager.ConfigurePeers diff="Registering: [65340] Withdrawing: [] Reconciling: []" subsys=bgp-control-plane
level=info msg="Registering BGP servers for policy with local ASN 65340" component=manager.registerBGPServer subsys=bgp-control-plane
level=debug msg="Preflight for virtual router with ASN 65340 not necessary, first instantiation of this BgpServer." component=manager.preflightReconciler subsys=bgp-control-plane
level=debug msg="Begin reconciling peers for virtual router with local ASN 65340" component=manager.neighborReconciler subsys=bgp-control-plane
level=info msg="Reconciling peers for virtual router with local ASN 65340" component=manager.neighborReconciler subsys=bgp-control-plane
level=info msg="Adding peer 172.18.0.3/32 65341 to local ASN 65340" component=manager.neighborReconciler subsys=bgp-control-plane
level=info msg="Add a peer configuration" Key=172.18.0.3 Topic=Peer asn=65340 component=gobgp.BgpServerInstance subsys=bgp-control-plane
level=info msg="Done reconciling peers for virtual router with local ASN 65340" component=manager.neighborReconciler subsys=bgp-control-plane
level=debug msg="Begin reconciling pod CIDR advertisements for virtual router with local ASN 65340" component=manager.exportPodCIDRReconciler subsys=bgp-control-plane
level=debug msg="pod CIDR advertisements disabled for virtual router with local ASN 65340" component=manager.exportPodCIDRReconciler subsys=bgp-control-plane
level=info msg="Successfully registered GoBGP servers for policy with local ASN 65340" component=manager.registerBGPServer subsys=bgp-control-plane
level=debug msg="Successfully completed reconciliation" component=Controller.Run subsys=bgp-control-plane
level=debug msg="IdleHoldTimer expired" Duration=0 Key=172.18.0.3 Topic=Peer asn=65340 component=gobgp.BgpServerInstance subsys=bgp-control-plane
level=debug msg="state changed" Key=172.18.0.3 Topic=Peer asn=65340 component=gobgp.BgpServerInstance new=BGP_FSM_ACTIVE old=BGP_FSM_IDLE reason=idle-hold-timer-expired subsys=bgp-control-plane
level=info msg="type:STATE peer:{conf:{local_asn:65340 neighbor_address:\"172.18.0.3\" peer_asn:65341} state:{local_asn:65340 neighbor_address:\"172.18.0.3\" peer_asn:65341 session_state:IDLE router_id:\"<nil>\"} transport:{local_address:\"<nil>\"}}"
level=info msg="type:STATE peer:{conf:{local_asn:65340 neighbor_address:\"172.18.0.3\" peer_asn:65341} state:{local_asn:65340 neighbor_address:\"172.18.0.3\" peer_asn:65341 session_state:ACTIVE router_id:\"<nil>\"} transport:{local_address:\"<nil>\"}}"

This is because the speaker keeps dialing the configured peer on port 179, but the peer refuses the connection since it is listening on 42424:

level=debug msg="try to connect" Key=172.18.0.3 Topic=Peer asn=65340 component=gobgp.BgpServerInstance subsys=bgp-control-plane
level=debug msg="failed to connect" Error="dial tcp 0.0.0.0:0->172.18.0.3:179: connect: connection refused" Key=172.18.0.3 Topic=Peer asn=65340 component=gobgp.BgpServerInstance subsys=bgp-control-plane

It appears that CiliumBGPNeighbor requires a field to set the neighbor port when the neighbor listens on a port other than 179. Thoughts @squeed @christarazi?
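
For illustration, a rough sketch of what such a knob could look like on the Go type behind CiliumBGPNeighbor; the PeerPort field name is hypothetical, and the existing fields are abbreviated to the ones used in this issue:

package v2alpha1

// CiliumBGPNeighbor sketch: only the fields relevant to this issue are shown,
// and PeerPort is a hypothetical addition, not an existing API field.
type CiliumBGPNeighbor struct {
	// PeerAddress is the address of the neighbor, e.g. "192.168.178.1/32".
	PeerAddress string `json:"peerAddress"`
	// PeerASN is the AS number of the neighbor.
	PeerASN int `json:"peerASN"`
	// PeerPort would tell the speaker which destination port to dial when the
	// neighbor listens on a port other than 179.
	PeerPort int `json:"peerPort,omitempty"`
}

In the peering policy YAML such a field would sit alongside peerASN and peerAddress, e.g. peerPort: 42424.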

@christarazi
Member

cc @YutaroHayakawa @ldelossa

@YutaroHayakawa
Member

YutaroHayakawa commented May 25, 2023

@danehans, I think your issue is different from the original one. I don't have any objection to adding such a configuration knob. Please feel free to open an issue or submit a PR.

@danehans
Contributor

@YutaroHayakawa thanks for the feedback. I created ^ to track the issue I discovered while triaging this one. I've been able to reproduce the original issue by having one speaker use port 179 and the other use port 42424:

Version:

$ kubectl exec po/cilium-t5qwd -c cilium-agent -n kube-system -- cilium version
Client: 1.13.1 a6be57eb 2023-03-15T19:39:01+01:00 go version go1.19.6 linux/arm64
Daemon: 1.13.1 a6be57eb 2023-03-15T19:39:01+01:00 go version go1.19.6 linux/arm64

Node annotations:

$ kubectl get node/kind-worker -o yaml | grep bgp-virtual-router
    cilium.io/bgp-virtual-router.65341: local-port=42424,router-id=172.18.0.3
$ kubectl get node/kind-control-plane -o yaml | grep bgp-virtual-router
    cilium.io/bgp-virtual-router.65340: local-port=179,router-id=172.18.0.2

Since node kind-control-plane is listening on port 179, the peers become established through a passive connection:

level=debug msg="Accepted a new passive connection" Key=172.18.0.3 Topic=Peer asn=65340 component=gobgp.BgpServerInstance subsys=bgp-control-plane

Logs from the cilium-agent running on node kind-worker indicate local_port:38253 instead of the configured local port of 42424:

level=info msg="type:STATE  peer:{conf:{local_asn:65341  neighbor_address:\"172.18.0.2\"  peer_asn:65340}  state:{local_asn:65341  neighbor_address:\"172.18.0.2\"  peer_asn:65340  session_state:OPENSENT  router_id:\"<nil>\"}  transport:{local_address:\"172.18.0.3\"  local_port:38253  remote_port:179}}"

@danehans
Contributor

I see the same behavior as ^ using cilium-agent built from commit c8598f8.

@log1cb0mb
Contributor

log1cb0mb commented May 26, 2023

Logs from the cilium-agent running on node kind-worker indicate local_port:38253 instead of the configured local port of 42424

I believe that is because of another issue, where a cilium.io/bgp-virtual-router.X annotation added to the CiliumNode (as far as I understand, that is the correct method) gets removed if the cilium-agent pod is restarted.

The behaviour I mentioned above is fixed in 1.13.3 with the introduction of #24914.

However, the issue where the configured port is opened on the node while cilium-agent still uses a random source port is still present with 1.13.3.

@danehans
Contributor

@YutaroHayakawa I have reproduced the issue and now have a more complete understanding of the situation. The local-port defines the listening port of the BGPRouterManager but it does not define the source port used when a BgpServerInstance tries connecting to a peer. The 192.168.178.1 peer tries to connect to 192.168.178.201 on port 42424 and fails due to #25683. 192.168.178.201 then creates a passive connection to 192.168.178.1:179 using an ephemeral source port. I will work to fix this issue as part of #25683, so feel free to assign me.

Logs from my reproducer:

level=debug msg="try to connect" Key=172.18.0.3 Topic=Peer asn=65340 component=gobgp.BgpServerInstance subsys=bgp-control-plane
level=debug msg="failed to connect" Error="dial tcp 0.0.0.0:0->172.18.0.3:179: connect: connection refused" Key=172.18.0.3 Topic=Peer asn=65340 component=gobgp.BgpServerInstance subsys=bgp-control-plane
...
level=debug msg="Accepted a new passive connection" Key=172.18.0.3 Topic=Peer asn=65340 component=gobgp.BgpServerInstance subsys=bgp-control-plane
level=debug msg="stop connect loop" Key=172.18.0.3 Topic=Peer asn=65340 component=gobgp.BgpServerInstance subsys=bgp-control-plane

@danehans
Contributor

/assign
