route missing upon pod restart #370

Open
drobinson123 opened this issue Apr 6, 2018 · 23 comments
Labels: bug

Comments

@drobinson123 (Contributor) commented Apr 6, 2018

When a kube-router pod is restarted it appears to lose at least one, and sometimes several, routes at either shutdown or startup. I have two k8s clusters, both running kube-router v0.1.0, and both have encountered this problem over the past few days. CoreOS auto-updates cause the host, and therefore kube-router, to restart -- that seems to trigger the problem.

The host with the missing routes is ip-172-21-91-59.us-west-2.compute.internal -- it auto-updated to CoreOS 1688.5.3. Before the update it was running CoreOS 1632.2.1, the same version as the other hosts.

Here's the cluster after the node auto-updated (note the last line, it's the host that updated):

$ kubectl get nodes -o wide
NAME                                          STATUS    ROLES     AGE       VERSION   EXTERNAL-IP   OS-IMAGE                                        KERNEL-VERSION   CONTAINER-RUNTIME
ip-172-21-46-101.us-west-2.compute.internal   Ready     node      1d        v1.8.9    <none>        Container Linux by CoreOS 1632.2.1 (Ladybug)    4.14.16-coreos   docker://17.9.1
ip-172-21-55-253.us-west-2.compute.internal   Ready     node      1d        v1.8.9    <none>        Container Linux by CoreOS 1632.2.1 (Ladybug)    4.14.16-coreos   docker://17.9.1
ip-172-21-58-9.us-west-2.compute.internal     Ready     master    24d       v1.8.9    <none>        Container Linux by CoreOS 1632.3.0 (Ladybug)    4.14.19-coreos   docker://17.9.1
ip-172-21-62-29.us-west-2.compute.internal    Ready     node      1d        v1.8.9    <none>        Container Linux by CoreOS 1632.2.1 (Ladybug)    4.14.16-coreos   docker://17.9.1
ip-172-21-71-225.us-west-2.compute.internal   Ready     node      1d        v1.8.9    <none>        Container Linux by CoreOS 1632.2.1 (Ladybug)    4.14.16-coreos   docker://17.9.1
ip-172-21-91-59.us-west-2.compute.internal    Ready     node      1d        v1.8.9    <none>        Container Linux by CoreOS 1688.5.3 (Rhyolite)   4.14.32-coreos   docker://17.12.1-ce

k8s shows the kube-router pod restarted (the first line):

$ kubectl -n kube-system get pods | grep kube-router
kube-router-2gjg5    1/1       Running   1          23h       172.21.91.59    ip-172-21-91-59.us-west-2.compute.internal
kube-router-9qrbn    1/1       Running   0          23h       172.21.62.29    ip-172-21-62-29.us-west-2.compute.internal
kube-router-lqnh5    1/1       Running   0          23h       172.21.71.225   ip-172-21-71-225.us-west-2.compute.internal
kube-router-ntkwq    1/1       Running   0          23h       172.21.46.101   ip-172-21-46-101.us-west-2.compute.internal
kube-router-q4fhq    1/1       Running   0          23h       172.21.58.9     ip-172-21-58-9.us-west-2.compute.internal
kube-router-vfqw2    1/1       Running   0          23h       172.21.55.253   ip-172-21-55-253.us-west-2.compute.internal

After being updated (and kube-router restarting, obviously) the host's routing table looked like this:

Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         172.21.64.1     0.0.0.0         UG    1024   0        0 eth0
100.96.5.0      0.0.0.0         255.255.255.0   U     0      0        0 tun-1722146101
100.96.6.0      172.21.71.225   255.255.255.0   UG    0      0        0 eth0
100.96.7.0      0.0.0.0         255.255.255.0   U     0      0        0 kube-bridge
100.96.8.0      0.0.0.0         255.255.255.0   U     0      0        0 tun-1722155253
172.17.0.0      0.0.0.0         255.255.0.0     U     0      0        0 docker0
172.21.64.0     0.0.0.0         255.255.224.0   U     0      0        0 eth0
172.21.64.1     0.0.0.0         255.255.255.255 UH    1024   0        0 eth0

It's missing routes for 100.96.0.0 and 100.96.9.0.

Deleting the kube-router pod causes a new pod to start, and it fixes the routing table, which now looks like this:

Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         172.21.64.1     0.0.0.0         UG    1024   0        0 eth0
100.96.0.0      0.0.0.0         255.255.255.0   U     0      0        0 tun-17221589
100.96.5.0      0.0.0.0         255.255.255.0   U     0      0        0 tun-1722146101
100.96.6.0      172.21.71.225   255.255.255.0   UG    0      0        0 eth0
100.96.7.0      0.0.0.0         255.255.255.0   U     0      0        0 kube-bridge
100.96.8.0      0.0.0.0         255.255.255.0   U     0      0        0 tun-1722155253
100.96.9.0      0.0.0.0         255.255.255.0   U     0      0        0 tun-172216229
172.17.0.0      0.0.0.0         255.255.0.0     U     0      0        0 docker0
172.21.64.0     0.0.0.0         255.255.224.0   U     0      0        0 eth0
172.21.64.1     0.0.0.0         255.255.255.255 UH    1024   0        0 eth0

I neglected to fetch the BGP peering info when the problem occurred this time; however, when the problem occurred several days ago I did check BGP, and the nodes were peering correctly (there are snippets from the previous occurrence in #kube-router).

Since this problem had already occurred in one of our clusters, I disabled auto-updates there but left them enabled in our second cluster for debugging. I increased kube-router's log verbosity (--v=3), anticipating the problem would recur in the second cluster, which it did. The attached logs are from the pod (kube-router-2gjg5) before the host restarted, from the same pod after it restarted, and from the new pod (the old pod was deleted; the new pod fixed the routes).

@drobinson123 (Contributor, Author) commented Apr 7, 2018

The logs don't show anything out of the ordinary. OS logs also don't show anything abnormal. It looks like RouteReplace silently failed.

I've patched kube-router to confirm or refute my theory:

--- app/controllers/network_routes_controller.go
+++ app/controllers/network_routes_controller.go
@@ -921,7 +921,20 @@ func (nrc *NetworkRoutingController) injectRoute(path *table.Path) error {
                return netlink.RouteDel(route)
        }
        glog.V(2).Infof("Inject route: '%s via %s' from peer to routing table", dst, nexthop)
-       return netlink.RouteReplace(route)
+       err := netlink.RouteReplace(route)
+       if err != nil {
+               return err
+       }
+
+       // Verify route was injected
+       out, err := exec.Command("ip", "route", "list", "table", "0").CombinedOutput()
+       if err != nil {
+               return fmt.Errorf("Failed to verify if route was created: %s", err.Error())
+       }
+       if !strings.Contains(string(out), dst.String()) {
+               panic("Failed to inject route, reason unknown")
+       }
+       return err
 }
 
 // Cleanup performs the cleanup of configurations done
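(For anyone applying this diff: the added check also needs os/exec and strings, plus fmt if it is not already imported, in the file's import block. The ip route list call is only a temporary debugging aid to catch a silently dropped route, not a proposed fix.)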
@jpds (Contributor) commented Apr 9, 2018

My nodes were originally running 1632.3.0; most of them upgraded to 1688.5.3. On the nodes where the upgrade happened, I can see that kube-router restarted once:

kube-system                kube-router-5pb56                                                     1/1       Running            0          12d
kube-system                kube-router-cbw25                                                     1/1       Running            1          13d
kube-system                kube-router-fvpjb                                                     1/1       Running            1          2d
kube-system                kube-router-p9r2g                                                     1/1       Running            1          13d
kube-system                kube-router-q7hbw                                                     1/1       Running            1          2d
kube-system                kube-router-stjc6                                                     1/1       Running            0          12d

Current log for one of those pods:

kubectl logs -n kube-system kube-router-fvpjb
I0408 02:18:30.081462       1 server.go:210] Running /usr/local/bin/kube-router version v0.1.0, built on 2018-03-19, go1.8.7
I0408 02:18:30.124928       1 health_controller.go:127] Starting health controller
I0408 02:18:30.208561       1 network_policy_controller.go:108] Starting network policy controller
E0408 02:18:30.588044       1 network_routes_controller.go:982] Node does not have necessary IAM creds to modify instance attribute. So skipping disabling src-dst check.
I0408 02:18:30.879465       1 network_routes_controller.go:200] Starting network route controller
I0408 02:18:30.892221       1 network_services_controller.go:109] Starting network services controller
I0408 02:18:30.893882       1 network_routes_controller.go:1429] Could not find BGP peer info for the node in the node annotations so skipping configuring peer.
time="2018-04-08T02:18:31Z" level=info msg="Add a peer configuration for:172.20.119.209" Topic=Peer
time="2018-04-08T02:18:31Z" level=info msg="Add a peer configuration for:172.20.127.66" Topic=Peer
time="2018-04-08T02:18:31Z" level=info msg="Add a peer configuration for:172.20.54.132" Topic=Peer
time="2018-04-08T02:18:31Z" level=info msg="Add a peer configuration for:172.20.59.123" Topic=Peer
time="2018-04-08T02:18:31Z" level=info msg="Add a peer configuration for:172.20.85.60" Topic=Peer
time="2018-04-08T02:18:41Z" level=info msg="Peer Up" Key=172.20.85.60 State=BGP_FSM_OPENCONFIRM Topic=Peer
time="2018-04-08T02:18:41Z" level=info msg="Peer Up" Key=172.20.54.132 State=BGP_FSM_OPENCONFIRM Topic=Peer
I0408 02:18:41.701074       1 network_routes_controller.go:846] Found node: 172.20.54.132 to be in different subnet.
I0408 02:18:41.701118       1 network_routes_controller.go:869] Found node: 172.20.54.132 to be in different subnet. Creating tunnel: tun-1722054132
time="2018-04-08T02:18:45Z" level=info msg="Peer Up" Key=172.20.127.66 State=BGP_FSM_OPENCONFIRM Topic=Peer
I0408 02:18:45.706730       1 network_routes_controller.go:846] Found node: 172.20.127.66 to be in different subnet.
I0408 02:18:45.706783       1 network_routes_controller.go:869] Found node: 172.20.127.66 to be in different subnet. Creating tunnel: tun-1722012766
time="2018-04-08T02:18:46Z" level=info msg="Peer Up" Key=172.20.119.209 State=BGP_FSM_OPENCONFIRM Topic=Peer
I0408 02:18:46.700487       1 network_routes_controller.go:846] Found node: 172.20.119.209 to be in different subnet.
I0408 02:18:46.700548       1 network_routes_controller.go:869] Found node: 172.20.119.209 to be in different subnet. Creating tunnel: tun-17220119209
time="2018-04-08T02:18:48Z" level=info msg="Peer Up" Key=172.20.59.123 State=BGP_FSM_OPENCONFIRM Topic=Peer
I0408 02:18:48.701194       1 network_routes_controller.go:846] Found node: 172.20.59.123 to be in different subnet.
I0408 02:18:48.701248       1 network_routes_controller.go:869] Found node: 172.20.59.123 to be in different subnet. Creating tunnel: tun-1722059123
time="2018-04-08T10:10:03Z" level=warning msg="received notification" Code=6 Communicated-Reason= Data="[]" Key=172.20.59.123 Subcode=2 Topic=Peer
time="2018-04-08T10:10:03Z" level=info msg="Peer Down" Key=172.20.59.123 Reason="notification-received code 6(cease) subcode 2(administrative shutdown)" State=BGP_FSM_ESTABLISHED Topic=Peer
I0408 10:10:03.097344       1 network_routes_controller.go:846] Found node: 172.20.59.123 to be in different subnet.
I0408 10:10:03.097436       1 network_routes_controller.go:891] Tunnel interface: tun-1722059123 for the node 172.20.59.123 already exists.
E0408 10:10:03.099824       1 network_routes_controller.go:360] Failed to inject routes due to: no such process
time="2018-04-08T10:11:15Z" level=info msg="Peer Up" Key=172.20.59.123 State=BGP_FSM_OPENCONFIRM Topic=Peer
I0408 10:11:15.994729       1 network_routes_controller.go:846] Found node: 172.20.59.123 to be in different subnet.
I0408 10:11:15.994834       1 network_routes_controller.go:891] Tunnel interface: tun-1722059123 for the node 172.20.59.123 already exists.
time="2018-04-09T00:00:47Z" level=info msg="Peer Down" Key=172.20.54.132 Reason=read-failed State=BGP_FSM_ESTABLISHED Topic=Peer
I0409 00:00:47.192928       1 network_routes_controller.go:846] Found node: 172.20.54.132 to be in different subnet.
I0409 00:00:47.193014       1 network_routes_controller.go:891] Tunnel interface: tun-1722054132 for the node 172.20.54.132 already exists.
E0409 00:00:47.196363       1 network_routes_controller.go:360] Failed to inject routes due to: no such process
time="2018-04-09T00:02:49Z" level=info msg="Peer Up" Key=172.20.54.132 State=BGP_FSM_OPENCONFIRM Topic=Peer
I0409 00:02:49.343984       1 network_routes_controller.go:846] Found node: 172.20.54.132 to be in different subnet.
I0409 00:02:49.344085       1 network_routes_controller.go:891] Tunnel interface: tun-1722054132 for the node 172.20.54.132 already exists.

Previous log for that pod pre-restart:

kubectl logs -n kube-system kube-router-fvpjb --previous
I0406 08:57:24.557313       1 server.go:210] Running /usr/local/bin/kube-router version v0.1.0, built on 2018-03-19, go1.8.7
I0406 08:57:24.581970       1 health_controller.go:127] Starting health controller
I0406 08:57:24.617581       1 network_policy_controller.go:108] Starting network policy controller
E0406 08:57:24.784889       1 network_routes_controller.go:96] Failed to get pod CIDR from CNI conf file: subnet missing from CNI IPAM
E0406 08:57:24.900070       1 network_routes_controller.go:982] Node does not have necessary IAM creds to modify instance attribute. So skipping disabling src-dst check.
I0406 08:57:24.949946       1 network_routes_controller.go:200] Starting network route controller
I0406 08:57:25.163223       1 network_routes_controller.go:1429] Could not find BGP peer info for the node in the node annotations so skipping configuring peer.
I0406 08:57:25.362912       1 network_services_controller.go:109] Starting network services controller
time="2018-04-06T08:57:25Z" level=info msg="Can't find configuration for a new passive connection from:172.20.94.127" Topic=Peer
time="2018-04-06T08:57:26Z" level=info msg="Add a peer configuration for:172.20.119.209" Topic=Peer
time="2018-04-06T08:57:26Z" level=info msg="Add a peer configuration for:172.20.127.66" Topic=Peer
time="2018-04-06T08:57:26Z" level=info msg="Add a peer configuration for:172.20.43.174" Topic=Peer
time="2018-04-06T08:57:26Z" level=info msg="Add a peer configuration for:172.20.54.132" Topic=Peer
time="2018-04-06T08:57:26Z" level=info msg="Add a peer configuration for:172.20.85.60" Topic=Peer
time="2018-04-06T08:57:26Z" level=info msg="Add a peer configuration for:172.20.94.127" Topic=Peer
time="2018-04-06T08:57:28Z" level=info msg="Peer Up" Key=172.20.54.132 State=BGP_FSM_OPENCONFIRM Topic=Peer
I0406 08:57:28.386943       1 network_routes_controller.go:846] Found node: 172.20.54.132 to be in different subnet.
I0406 08:57:28.386994       1 network_routes_controller.go:869] Found node: 172.20.54.132 to be in different subnet. Creating tunnel: tun-1722054132
time="2018-04-06T08:57:28Z" level=info msg="Peer Up" Key=172.20.127.66 State=BGP_FSM_OPENCONFIRM Topic=Peer
I0406 08:57:28.422376       1 network_routes_controller.go:846] Found node: 172.20.127.66 to be in different subnet.
I0406 08:57:28.422415       1 network_routes_controller.go:869] Found node: 172.20.127.66 to be in different subnet. Creating tunnel: tun-1722012766
time="2018-04-06T08:57:30Z" level=info msg="Peer Up" Key=172.20.119.209 State=BGP_FSM_OPENCONFIRM Topic=Peer
I0406 08:57:30.395228       1 network_routes_controller.go:846] Found node: 172.20.119.209 to be in different subnet.
I0406 08:57:30.395283       1 network_routes_controller.go:869] Found node: 172.20.119.209 to be in different subnet. Creating tunnel: tun-17220119209
time="2018-04-06T08:57:31Z" level=info msg="Peer Up" Key=172.20.85.60 State=BGP_FSM_OPENCONFIRM Topic=Peer
time="2018-04-06T08:57:34Z" level=info msg="Peer Up" Key=172.20.43.174 State=BGP_FSM_OPENCONFIRM Topic=Peer
I0406 08:57:34.391626       1 network_routes_controller.go:846] Found node: 172.20.43.174 to be in different subnet.
I0406 08:57:34.391692       1 network_routes_controller.go:869] Found node: 172.20.43.174 to be in different subnet. Creating tunnel: tun-1722043174
time="2018-04-06T08:57:42Z" level=info msg="Peer Up" Key=172.20.94.127 State=BGP_FSM_OPENCONFIRM Topic=Peer
time="2018-04-06T09:14:07Z" level=info msg="Peer Down" Key=172.20.94.127 Reason=read-failed State=BGP_FSM_ESTABLISHED Topic=Peer
I0406 09:14:44.910594       1 network_routes_controller.go:1338] Received node 172.20.94.127 removed update from watch API, so remove node from peer
time="2018-04-06T09:14:44Z" level=warning msg="failed to unset md5: setsockopt: no such file or directory" Key=172.20.94.127 Topic=Peer
time="2018-04-06T09:14:44Z" level=info msg="Delete a peer configuration for:172.20.94.127" Topic=Peer
E0406 09:14:44.950981       1 network_routes_controller.go:982] Node does not have necessary IAM creds to modify instance attribute. So skipping disabling src-dst check.
time="2018-04-06T13:41:20Z" level=info msg="Add a peer configuration for:172.20.59.123" Topic=Peer
E0406 13:41:20.616436       1 network_routes_controller.go:982] Node does not have necessary IAM creds to modify instance attribute. So skipping disabling src-dst check.
time="2018-04-06T13:41:50Z" level=info msg="Peer Up" Key=172.20.59.123 State=BGP_FSM_OPENCONFIRM Topic=Peer
I0406 13:41:50.917243       1 network_routes_controller.go:846] Found node: 172.20.59.123 to be in different subnet.
I0406 13:41:50.917436       1 network_routes_controller.go:869] Found node: 172.20.59.123 to be in different subnet. Creating tunnel: tun-1722059123
time="2018-04-06T13:43:34Z" level=warning msg="hold timer expired" Key=172.20.43.174 State=BGP_FSM_ESTABLISHED Topic=Peer
time="2018-04-06T13:43:34Z" level=info msg="Peer Down" Key=172.20.43.174 Reason=hold-timer-expired State=BGP_FSM_ESTABLISHED Topic=Peer
time="2018-04-06T13:43:34Z" level=warning msg="sent notification" Code=4 Data="[]" Key=172.20.43.174 State=BGP_FSM_ESTABLISHED Subcode=0 Topic=Peer
I0406 13:43:34.395002       1 network_routes_controller.go:846] Found node: 172.20.43.174 to be in different subnet.
I0406 13:43:34.395111       1 network_routes_controller.go:891] Tunnel interface: tun-1722043174 for the node 172.20.43.174 already exists.
time="2018-04-06T13:47:51Z" level=warning msg="sent notification" Data="hold timer expired" Key=172.20.43.174 Topic=Peer
time="2018-04-06T13:52:07Z" level=warning msg="sent notification" Data="hold timer expired" Key=172.20.43.174 Topic=Peer
time="2018-04-06T13:56:28Z" level=warning msg="sent notification" Data="hold timer expired" Key=172.20.43.174 Topic=Peer
time="2018-04-06T14:00:48Z" level=warning msg="sent notification" Data="hold timer expired" Key=172.20.43.174 Topic=Peer
time="2018-04-06T14:05:10Z" level=warning msg="sent notification" Data="hold timer expired" Key=172.20.43.174 Topic=Peer
time="2018-04-06T14:09:34Z" level=warning msg="sent notification" Data="hold timer expired" Key=172.20.43.174 Topic=Peer
time="2018-04-06T14:13:53Z" level=warning msg="sent notification" Data="hold timer expired" Key=172.20.43.174 Topic=Peer
time="2018-04-06T14:18:11Z" level=warning msg="sent notification" Data="hold timer expired" Key=172.20.43.174 Topic=Peer
I0406 14:22:26.933913       1 network_routes_controller.go:1338] Received node 172.20.43.174 removed update from watch API, so remove node from peer
time="2018-04-06T14:22:26Z" level=warning msg="failed to unset md5: setsockopt: no such file or directory" Key=172.20.43.174 Topic=Peer
time="2018-04-06T14:22:26Z" level=info msg="Delete a peer configuration for:172.20.43.174" Topic=Peer
time="2018-04-06T14:22:27Z" level=warning msg="sent notification" Data= Key=172.20.43.174 Topic=Peer
time="2018-04-06T14:22:27Z" level=warning msg="Cant't find the neighbor 172.20.43.174" Topic=Peer
E0406 14:22:28.718335       1 network_routes_controller.go:982] Node does not have necessary IAM creds to modify instance attribute. So skipping disabling src-dst check.
time="2018-04-08T00:37:33Z" level=info msg="Peer Down" Key=172.20.127.66 Reason=read-failed State=BGP_FSM_ESTABLISHED Topic=Peer
I0408 00:37:33.757334       1 network_routes_controller.go:846] Found node: 172.20.127.66 to be in different subnet.
I0408 00:37:33.757773       1 network_routes_controller.go:891] Tunnel interface: tun-1722012766 for the node 172.20.127.66 already exists.
time="2018-04-08T00:39:41Z" level=info msg="Peer Up" Key=172.20.127.66 State=BGP_FSM_OPENCONFIRM Topic=Peer
I0408 00:39:41.091872       1 network_routes_controller.go:846] Found node: 172.20.127.66 to be in different subnet.
I0408 00:39:41.091960       1 network_routes_controller.go:891] Tunnel interface: tun-1722012766 for the node 172.20.127.66 already exists.
I0408 02:17:36.786929       1 server.go:195] Shutting down the controllers
I0408 02:17:36.787013       1 health_controller.go:154] Shutting down health controller
I0408 02:17:36.791024       1 network_services_controller.go:146] Shutting down network services controller
I0408 02:17:36.792630       1 network_routes_controller.go:270] Shutting down network routes controller
I0408 02:17:36.792664       1 network_policy_controller.go:133] Shutting down network policies controller

On the nodes with the update, pods can't seem to perform DNS queries, but they can ping 8.8.8.8.

@jpds (Contributor) commented Apr 9, 2018

Deleting the kube-router pods appears to have helped with the issue in my case.

@murali-reddy added the bug label Apr 9, 2018

@jpds (Contributor) commented Apr 10, 2018

I'm also seeing this on new nodes that spin up on 1688.5.3 and join the cluster: all outbound traffic fails (breaking CI jobs in particular), and restarting kube-router makes it work.

@murali-reddy (Member) commented Apr 10, 2018

I've patched kube-router to confirm or refute my theory:

@drobinson123 any luck?

@murali-reddy (Member) commented Apr 10, 2018

From the logs @jpds shared there is at least one error indicating what might be happening.

I0408 10:10:03.097436       1 network_routes_controller.go:891] Tunnel interface: tun-1722059123 for the node 172.20.59.123 already exists.
E0408 10:10:03.099824       1 network_routes_controller.go:360] Failed to inject routes due to: no such process

The error message "no such process" is not very useful, but apparently this is what it means:

https://serverfault.com/questions/437598/what-can-cause-rtnetlink-answers-no-such-process-when-adding-a-route
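For what it's worth, "no such process" is the text of the ESRCH errno coming back from rtnetlink; per the link above it apparently means the kernel could not resolve the referenced route or nexthop at the time of the request (for example, no route or link to the gateway yet). A minimal sketch of surfacing that more explicitly in Go, assuming the vishvananda/netlink library returns the raw errno; checkReplace and the addresses are illustrative, not kube-router's actual code:

package main

import (
	"fmt"
	"net"
	"syscall"

	"github.com/vishvananda/netlink"
)

// checkReplace installs a route and translates the opaque "no such process"
// error into something more descriptive. Requires CAP_NET_ADMIN to run.
func checkReplace(dst *net.IPNet, gw net.IP) error {
	err := netlink.RouteReplace(&netlink.Route{Dst: dst, Gw: gw})
	if errno, ok := err.(syscall.Errno); ok && errno == syscall.ESRCH {
		// ESRCH ("no such process") from rtnetlink: the kernel could not
		// resolve the nexthop, e.g. no route or link to the gateway yet.
		return fmt.Errorf("route %s via %s rejected, nexthop not resolvable (ESRCH): %v", dst, gw, err)
	}
	return err
}

func main() {
	// Example values taken from this issue; purely illustrative.
	_, dst, _ := net.ParseCIDR("100.96.1.0/24")
	if err := checkReplace(dst, net.ParseIP("172.20.59.123")); err != nil {
		fmt.Println(err)
	}
}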

@rmenn (Contributor) commented Apr 17, 2018

Perhaps this is what is causing the regression: coreos/bugs#2407? CoreOS has opened an issue here: systemd/systemd#7940. We are downgrading to 1576.5.0 to concretely identify the issue, and we have disabled the update operator on the cluster so that we do not get upgraded.

@sourcejedi commented Apr 17, 2018

FWIW, systemd/systemd#7940 is not a bug report, it's a pull request. I haven't seen any systemd bug filed for the CoreOS bug you linked; instead it seems to be being resolved in the affected CoreOS documentation. (And I don't know what the connection would be myself, so I'm not leaping into action following this comment :).

@lucab commented Apr 18, 2018

@rmenn I honestly don't see any correlation between the two tickets. If you have any specific elements that I am missing, please make them explicit.

@sourcejedi the specific behavior regressed, but I think there is some consensus that the new behavior is somewhat safer and people should stick with it. We just updated our docs and posted a notice about that.

@rmenn (Contributor) commented Apr 19, 2018

@sourcejedi I'm sorry, it is indeed a PR. I was just trying to document what I was finding; I'm sure the issue is elsewhere. The only thing I found in common was /lib/modules, nothing else.

@lucab same as I mentioned above: the only thing I found in common was /lib/modules, nothing else.

@drobinson123 (Contributor, Author) commented Apr 27, 2018

RouteReplace is silently failing; kube-router panicked because the route was not added:

time="2018-04-27T19:14:35Z" level=info msg="Peer Up" Key=172.21.71.225 State=BGP_FSM_OPENCONFIRM Topic=Peer
I0427 19:14:35.909090       1 network_routes_controller.go:351] Processing bgp route advertisement from peer
I0427 19:14:35.909117       1 network_routes_controller.go:923] Inject route: '100.96.6.0/24 via 172.21.71.225' from peer to routing table
time="2018-04-27T19:14:35Z" level=info msg="Peer Up" Key=172.21.55.253 State=BGP_FSM_OPENCONFIRM Topic=Peer
I0427 19:14:35.913898       1 network_routes_controller.go:351] Processing bgp route advertisement from peer
I0427 19:14:35.913924       1 network_routes_controller.go:846] Found node: 172.21.55.253 to be in different subnet.
I0427 19:14:35.913966       1 network_routes_controller.go:869] Found node: 172.21.55.253 to be in different subnet. Creating tunnel: tun-1722155253
I0427 19:14:35.922869       1 network_routes_controller.go:923] Inject route: '100.96.8.0/24 via 172.21.55.253' from peer to routing table
panic: Failed to inject route, reason unknown

goroutine 265 [running]:
github.com/cloudnativelabs/kube-router/app/controllers.(*NetworkRoutingController).injectRoute(0xc420a62140, 0xc420d0c280, 0x0, 0x0)
        /home/drobinson/go/src/github.com/cloudnativelabs/kube-router/app/controllers/network_routes_controller.go:935 +0x16a6
github.com/cloudnativelabs/kube-router/app/controllers.(*NetworkRoutingController).watchBgpUpdates(0xc420a62140)
        /home/drobinson/go/src/github.com/cloudnativelabs/kube-router/app/controllers/network_routes_controller.go:359 +0x1ff
created by github.com/cloudnativelabs/kube-router/app/controllers.(*NetworkRoutingController).startBgpServer
        /home/drobinson/go/src/github.com/cloudnativelabs/kube-router/app/controllers/network_routes_controller.go:1434 +0x4fe

When the pod was restarted (by k8s, after the panic), routes were still not set up correctly, but it failed somewhere else (it did not panic again). Here's what the routing table looks like in its broken state -- it's missing a route for 100.96.10.0:

ip-172-21-95-228 ~ # route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         172.21.64.1     0.0.0.0         UG    1024   0        0 eth0
100.96.0.0      0.0.0.0         255.255.255.0   U     0      0        0 tun-17221589
100.96.6.0      172.21.71.225   255.255.255.0   UG    0      0        0 eth0
100.96.7.0      172.21.91.59    255.255.255.0   UG    0      0        0 eth0
100.96.8.0      0.0.0.0         255.255.255.0   U     0      0        0 tun-1722155253
100.96.9.0      0.0.0.0         255.255.255.0   U     0      0        0 tun-172216229
100.96.11.0     0.0.0.0         255.255.255.0   U     0      0        0 kube-bridge
172.17.0.0      0.0.0.0         255.255.0.0     U     0      0        0 docker0
172.21.64.0     0.0.0.0         255.255.224.0   U     0      0        0 eth0
172.21.64.1     0.0.0.0         255.255.255.255 UH    1024   0        0 eth0

But the weird thing is that the kube-router logs suggest the route was created (it did not panic):

time="2018-04-27T19:15:02Z" level=info msg="Peer Up" Key=172.21.39.16 State=BGP_FSM_OPENCONFIRM Topic=Peer
I0427 19:15:02.635221       1 network_routes_controller.go:351] Processing bgp route advertisement from peer
I0427 19:15:02.635249       1 network_routes_controller.go:846] Found node: 172.21.39.16 to be in different subnet.
I0427 19:15:02.635309       1 network_routes_controller.go:869] Found node: 172.21.39.16 to be in different subnet. Creating tunnel: tun-172213916
I0427 19:15:02.639052       1 network_routes_controller.go:923] Inject route: '100.96.10.0/24 via 172.21.39.16' from peer to routing table
I0427 19:15:03.893404       1 health_controller.go:71] Received heartbeat from MC

The tun device for the route was created:

# ip addr | grep tun
8: tunl0@NONE: <NOARP> mtu 1480 qdisc noop state DOWN group default qlen 1000
9: tun-172216229@eth0: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 8961 qdisc noqueue state UNKNOWN group default qlen 1000
10: tun-1722155253@eth0: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 8961 qdisc noqueue state UNKNOWN group default qlen 1000
11: tun-172213916@eth0: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 8961 qdisc noqueue state UNKNOWN group default qlen 1000
12: tun-17221589@eth0: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 8961 qdisc noqueue state UNKNOWN group default qlen 1000

And there are no weird messages in the journal:

2018-04-27T19:15:02.637373+0000 ip-172-21-95-228.us-west-2.compute.internal systemd-networkd[587]: tun-172213916: Gained carrier
2018-04-27T19:15:02.637794+0000 ip-172-21-95-228.us-west-2.compute.internal systemd-timesyncd[644]: Network configuration changed, trying to establish connection.
2018-04-27T19:15:02.637910+0000 ip-172-21-95-228.us-west-2.compute.internal systemd-networkd[587]: tun-172213916: Gained IPv6LL
2018-04-27T19:15:02.645192+0000 ip-172-21-95-228.us-west-2.compute.internal systemd-udevd[3906]: link_config: autonegotiation is unset or enabled, the speed and duplex are not writable.
2018-04-27T19:15:02.651370+0000 ip-172-21-95-228.us-west-2.compute.internal systemd-networkd[587]: docker0: Link is not managed by us
2018-04-27T19:15:02.651386+0000 ip-172-21-95-228.us-west-2.compute.internal systemd-networkd[587]: veth2d3c880a: Link is not managed by us
2018-04-27T19:15:02.712232+0000 ip-172-21-95-228.us-west-2.compute.internal kubelet[967]: I0427 19:15:02.712193     967 aws.go:1051] Could not determine public DNS from AWS metadata.
2018-04-27T19:15:02.725092+0000 ip-172-21-95-228.us-west-2.compute.internal systemd-timesyncd[644]: Synchronized to time server 69.164.213.136:123 (1.coreos.pool.ntp.org).
2018-04-27T19:15:04.639910+0000 ip-172-21-95-228.us-west-2.compute.internal systemd-udevd[3928]: link_config: autonegotiation is unset or enabled, the speed and duplex are not writable.
2018-04-27T19:15:04.642573+0000 ip-172-21-95-228.us-west-2.compute.internal systemd-networkd[587]: tun-17221589: Gained carrier
2018-04-27T19:15:04.643029+0000 ip-172-21-95-228.us-west-2.compute.internal systemd-timesyncd[644]: Network configuration changed, trying to establish connection.
2018-04-27T19:15:04.643068+0000 ip-172-21-95-228.us-west-2.compute.internal systemd-networkd[587]: tun-17221589: Gained IPv6LL
2018-04-27T19:15:04.648281+0000 ip-172-21-95-228.us-west-2.compute.internal systemd-networkd[587]: docker0: Link is not managed by us
2018-04-27T19:15:04.648298+0000 ip-172-21-95-228.us-west-2.compute.internal systemd-networkd[587]: veth2d3c880a: Link is not managed by us
2018-04-27T19:15:04.730570+0000 ip-172-21-95-228.us-west-2.compute.internal systemd-timesyncd[644]: Synchronized to time server 69.164.213.136:123 (1.coreos.pool.ntp.org).

This seems like it may be a kernel bug.

@murali-reddy (Member) commented Apr 29, 2018

The only other thing in between kube-router and the kernel is the netlink library. There is something weird I noticed as well with netlink.RouteReplace when testing a patch that uses it: the library method call does not fail, but the route is not replaced either. I've got to look at the implementation to see what's happening.
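For reference, a minimal sketch of verifying the result through netlink itself rather than shelling out to ip route list as in the diff earlier in this thread. It assumes the vishvananda/netlink library; replaceAndVerify is a hypothetical helper, not kube-router code:

package routecheck

import (
	"fmt"

	"github.com/vishvananda/netlink"
)

// replaceAndVerify installs a route and then reads it back through netlink,
// so a RouteReplace that "succeeds" without the route actually landing in
// the table can be detected.
func replaceAndVerify(route *netlink.Route) error {
	if err := netlink.RouteReplace(route); err != nil {
		return fmt.Errorf("RouteReplace %s failed: %v", route.Dst, err)
	}
	// Ask the kernel for routes matching the destination we just installed.
	filter := &netlink.Route{Dst: route.Dst}
	routes, err := netlink.RouteListFiltered(netlink.FAMILY_V4, filter, netlink.RT_FILTER_DST)
	if err != nil {
		return fmt.Errorf("could not list routes for %s: %v", route.Dst, err)
	}
	if len(routes) == 0 {
		return fmt.Errorf("RouteReplace for %s reported success but the route is not in the table", route.Dst)
	}
	return nil
}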

@murali-reddy (Member) commented May 18, 2018

So @roffe and I have been testing this issue independently. I was using the image below, which is the latest CoreOS stable release image.

  "us-west-2": {
    "hvm": "ami-b41377cc",
    "pv": "ami-f81c7880"
  }

I ran the ip monitor command while the node was booting up, i.e. the first time kube-router is starting.
I see that the route gets added but subsequently deleted.

172.20.89.123 dev tun-1722089123 table 77 scope link
100.96.1.0/24 dev tun-1722089123 proto 17 src 172.20.52.43
Deleted 100.96.1.0/24 dev tun-1722089123 proto 17 src 172.20.52.43
100.96.0.0/24 via 172.20.34.210 dev eth0 proto 17
100.96.2.0/24 via 172.20.33.115 dev eth0 proto 17

Once I restart the kube-router pod on the node, the route gets added but is not deleted.

172.20.33.115 dev eth0 lladdr 06:8d:17:26:d1:f8 REACHABLE
100.96.1.0/24 dev tun-1722089123 proto 17 src 172.20.52.43
172.20.33.115 dev eth0 lladdr 06:8d:17:26:d1:f8 STALE

So clearly there is a netlink call to delete the route. Now the question is: who is removing the entry?

I will investigate further to trace the source.
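In case it helps the investigation, a sketch of watching route changes from Go instead of ip monitor, assuming the vishvananda/netlink library. Netlink events show what changed but not which process requested the change, so this mainly provides timestamps to correlate against the systemd-networkd journal:

package main

import (
	"fmt"
	"syscall"

	"github.com/vishvananda/netlink"
)

// Subscribe to route add/delete events, similar to `ip monitor route`, so
// deletions can be timestamped and correlated with other logs (for example
// systemd-networkd's journal).
func main() {
	updates := make(chan netlink.RouteUpdate)
	done := make(chan struct{})
	defer close(done)

	if err := netlink.RouteSubscribe(updates, done); err != nil {
		panic(err)
	}

	for u := range updates {
		switch u.Type {
		case syscall.RTM_NEWROUTE:
			fmt.Printf("route added:   %s\n", u.Route)
		case syscall.RTM_DELROUTE:
			fmt.Printf("route deleted: %s\n", u.Route)
		}
	}
}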

@benbooth493 commented Jun 7, 2018

I've been hit by the same issue running CoreOS Container Linux.
I'm on kernel 4.14.44.
I tried changing to the Alpha channel, which bumps the kernel to 4.16.13, but I'm still experiencing problems.

From discussions on Slack it has become apparent that many distributions running similar kernel versions are also affected.

Any news on current investigations?

@roffe (Collaborator) commented Jun 7, 2018

In my tests with different OSes and kernels I was more or less able to pin it down to any kernel after 4.14.11; after that the problem starts manifesting. However, I have not been able to find any explanation for why the route gets deleted, as observed in ip monitor.

@roffe (Collaborator) commented Jun 7, 2018

(I tried all of the new beta and alpha CoreOS channels; only older releases work.)

@bush-niel commented Jun 25, 2018

Hey, just a hunch. Do you think this is due to systemd-networkd.service interfering and trying to modify the routes? If you run a networkctl command (to introspect the state of the network links as seen by systemd-networkd) you will understand why I am suspecting this.

Also, as a workaround, do you think kube-router would require a file like this (/usr/lib/systemd/network/50-flannel.network) to leave the tunnel interfaces unmanaged by systemd?

But I am not sure what has changed in the newer kernels that is causing this issue.

@murali-reddy (Member) commented Jun 26, 2018

Do you think this is due to systemd-networkd.service interfering and trying to modify the routes? If you run a networkctl command (to introspect the state of the network links as seen by systemd-networkd) you will understand why I am suspecting this.

@bush-niel Not sure about this, but I can try it out. If there is any such dependency, maybe kubelet should have already been started after networkd?

@tasdikrahman commented Jun 26, 2018

If there is any such dependency, maybe kubelet should have already been started after networkd?

@murali-reddy I didn't get that; sorry if it's something silly. Could you elaborate on it, please, if you don't mind?

@lomkju commented Feb 15, 2019

@murali-reddy we solved this using @bush-niel's solution. We are using CoreOS:

Container Linux by CoreOS 1967.6.0 (Rhyolite)
Kernel: 4.14.96-coreos-r1

We created a file named 50-kube-router.network in the directory /etc/systemd/network/ with the following contents:

[Match]
Name=tun* kube-bridge kube-dummy-if

[Link]
Unmanaged=yes

This stopped all of the race conditions caused by networkd.
You can reproduce the problem by using the above AMI and running kube-router as a DaemonSet.
You can see the errors the race condition causes, because networkd is also trying to manage the interfaces, with journalctl -u systemd-networkd. Also, running sudo systemctl restart systemd-networkd closes all the tunnels; after adding the networkd file above, restarting systemd-networkd no longer closes them. You will need to restart networkd once after adding the file.
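As a verification step (an assumption based on standard systemd-networkd behavior, not something stated in this thread): after restarting networkd, networkctl should list the tun-* interfaces, kube-bridge, and kube-dummy-if with "unmanaged" in the SETUP column; if they still show a managed state, the [Match] section did not take effect.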

@tasdikrahman commented Feb 15, 2019

Amazing! Great job guys @lomkju @bush-niel :)

@murali-reddy (Member) commented Feb 17, 2019

Thanks @bush-niel @tasdikrahman @lomkju for finding this out.

Any idea what changed in the kernel due to which this race condition is occurring?

@lomkju commented Feb 17, 2019

@murali-reddy networkd still throws errors on older distributions/kernels, but for some reason it didn't affect them as much. I haven't looked at the kernel diff, but I don't think it is something related to the kernel; it is a configuration problem. We need to update the docs to ask people to create the above file. Also, CoreOS already includes a flannel file to leave flannel links unmanaged.

Feb 14 14:25:03 ip-10-1-111-191 systemd-networkd[730]: tunl0: Cannot configure proxy NDP for interfac
Feb 14 14:25:03 ip-10-1-111-191 systemd-networkd[730]: tunl0: Cannot configure IPv6 privacy extension
Feb 14 14:25:03 ip-10-1-111-191 systemd-networkd[730]: tunl0: Cannot disable kernel IPv6 accept_ra fo
Feb 14 14:25:03 ip-10-1-111-191 systemd-networkd[730]: tunl0: Cannot enable IPv6 for interface tunl0:
Feb 14 14:25:03 ip-10-1-111-191 systemd-networkd[730]: tunl0: Could not bring up interface: Address f

This doc helped me a lot: https://www.freedesktop.org/software/systemd/man/systemd-networkd.service.html
