
Ubuntu version: instance does not work after vagrant halt/up cycle #73

Closed
grahamwhaley opened this issue Mar 20, 2019 · 2 comments
@grahamwhaley

Hi @ganeshmaharaj - I'm reporting here (but really against your https://github.com/ganeshmaharaj/vagrant-stuff/tree/master/k8s, which I believe is a derivative?) as I thought we'd get more exposure and eyes on it here...

I was using your Ubuntu-based instance fine, but after a host reboot, when I brought it back up my kubectl didn't work any more. I had a dig, and I suspect some things do not come back up after the reboot. I used a vagrant halt/up cycle to make the failure more contained and hopefully repeatable. Here are my logs:


Trying to shut down and restart the stack:

vagrant up it

run the setup script (roughly as sketched below)

Note, I don't add the two slave nodes here... just working with the master node.
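For reference, the bring-up was roughly the following (the in-guest setup script name is a placeholder for whatever the clr-k8s-examples instructions say to run, so treat this as a sketch rather than the exact invocation):

[gwhaley@fido k8s]$ vagrant up --provider=libvirt
[gwhaley@fido k8s]$ vagrant ssh ubuntu-01
vagrant@ubuntu-01:~$ cd ~/cloud-native-setup/clr-k8s-examples
vagrant@ubuntu-01:~/cloud-native-setup/clr-k8s-examples$ ./setup.sh   # placeholder for the repo's setup script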

check we are up:

~/cloud-native-setup/clr-k8s-examples$ kubectl get nodes
NAME        STATUS   ROLES    AGE    VERSION
ubuntu-01   Ready    master   110s   v1.13.4
vagrant@ubuntu-01:~/cloud-native-setup/clr-k8s-examples$ 
vagrant@ubuntu-01:~/cloud-native-setup/clr-k8s-examples$ ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 52:54:00:f4:a0:2b brd ff:ff:ff:ff:ff:ff
    inet 192.168.121.170/24 brd 192.168.121.255 scope global dynamic eth0
       valid_lft 2899sec preferred_lft 2899sec
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 52:54:00:5c:ba:9c brd ff:ff:ff:ff:ff:ff
    inet 192.52.100.11/24 brd 192.52.100.255 scope global eth1
       valid_lft forever preferred_lft forever
4: cni0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 0a:58:0a:58:00:01 brd ff:ff:ff:ff:ff:ff
    inet 10.88.0.1/16 scope global cni0
       valid_lft forever preferred_lft forever
5: vethfd79254a@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master cni0 state UP group default 
    link/ether 16:8c:cb:2a:fd:53 brd ff:ff:ff:ff:ff:ff link-netnsid 0
6: vetha05bf341@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master cni0 state UP group default 
    link/ether aa:6a:ff:b9:cc:5b brd ff:ff:ff:ff:ff:ff link-netnsid 1
7: veth9adc6a7d@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master cni0 state UP group default 
    link/ether be:e1:57:ef:a1:8b brd ff:ff:ff:ff:ff:ff link-netnsid 2
8: vethfdd2b8a2@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master cni0 state UP group default 
    link/ether e6:85:14:26:27:b4 brd ff:ff:ff:ff:ff:ff link-netnsid 3

vagrant halt it

[gwhaley@fido k8s]$ vagrant halt
==> ubuntu-03: Halting domain...
==> ubuntu-02: Halting domain...
==> ubuntu-01: Halting domain...
[gwhaley@fido k8s]$ vagrant status
Current machine states:

ubuntu-01                 shutoff (libvirt)
ubuntu-02                 shutoff (libvirt)
ubuntu-03                 shutoff (libvirt)

vagrant up it

# vagrant up --provider=libvirt

and it fails:

[gwhaley@fido k8s]$ vagrant ssh ubuntu-01
Last login: Wed Mar 20 04:13:25 2019 from 192.168.121.1
vagrant@ubuntu-01:~$ kubectl get nodes
The connection to the server 192.168.121.170:6443 was refused - did you specify the right host or port?
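A first sanity check on the node would be whether anything is listening on 6443 and whether the apiserver container exists at all (generic ss/crictl commands, nothing specific to this setup):

vagrant@ubuntu-01:~$ sudo ss -tlnp | grep 6443                        # is the apiserver bound to 6443?
vagrant@ubuntu-01:~$ sudo crictl ps -a 2>/dev/null | grep apiserver   # does the static pod container exist?

The kubelet unit's log (the paste below is left-truncated, newest lines first) has the real hint: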

Wed 2019-03-20 04:21:47 PDT. --
e: Failed with result 'exit-code'.
e: Main process exited, code=exited, status=255/n/a
:47.776497    2117 server.go:261] failed to run Kubelet: failed to create kubelet: rpc error: code = Unav
:47.776368    2117 kuberuntime_manager.go:184] Get runtime version failed: rpc error: code = Unavailable 
:47.776217    2117 remote_runtime.go:72] Version from runtime service failed: rpc error: code = Unavailab
:47.775764    2117 util_unix.go:77] Using "/var/run/crio/crio.sock" as endpoint is deprecated, please con
:47.775617    2117 util_unix.go:77] Using "/var/run/crio/crio.sock" as endpoint is deprecated, please con
:47.774978    2117 reflector.go:134] k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:47: Failed to list
:47.774897    2117 reflector.go:134] k8s.io/kubernetes/pkg/kubelet/kubelet.go:444: Failed to list *v1.Ser
:47.774332    2117 reflector.go:134] k8s.io/kubernetes/pkg/kubelet/kubelet.go:453: Failed to list *v1.Nod
:47.757270    2117 kubelet.go:306] Watching apiserver
:47.757233    2117 kubelet.go:281] Adding pod path: /etc/kubernetes/manifests
:47.757167    2117 state_mem.go:92] [cpumanager] updated cpuset assignments: "map[]"
:47.757149    2117 state_mem.go:84] [cpumanager] updated default cpuset: ""
:47.757061    2117 state_mem.go:36] [cpumanager] initializing new in-memory state store
:47.757029    2117 container_manager_linux.go:272] Creating device plugin manager: true
:47.756872    2117 container_manager_linux.go:253] Creating Container Manager object based on Node Config
:47.756844    2117 container_manager_linux.go:248] container manager verified user specified cgroup-root 
:47.756590    2117 server.go:666] --cgroups-per-qos enabled, but --cgroup-root was not specified.  defaul
:47.743312    2117 certificate_store.go:130] Loading cert/key pair from "/var/lib/kubelet/pki/kubelet-cli
:47.740768    2117 plugins.go:103] No cloud provider specified.
:47.740588    2117 server.go:407] Version: v1.13.4
lv-conf has been deprecated, This parameter should be set via the config file specified by the Kubelet's 
lv-conf has been deprecated, This parameter should be set via the config file specified by the Kubelet's 
t: The Kubernetes Node Agent.
t: The Kubernetes Node Agent.
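The visible fragments suggest the kubelet cannot reach the CRI-O runtime (rpc error: code = Unavailable against /var/run/crio/crio.sock), and without a running kubelet the static apiserver pod never comes up, hence the refused connection on 6443. One avenue is to poke at the runtime directly (standard systemd/crictl commands, sketched from memory):

vagrant@ubuntu-01:~$ sudo systemctl status crio                # did CRI-O come back after the halt/up?
vagrant@ubuntu-01:~$ sudo journalctl -u crio -b --no-pager | tail -n 20
vagrant@ubuntu-01:~$ sudo crictl --runtime-endpoint unix:///var/run/crio/crio.sock version   # is the socket answering?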

did we maybe change IP address?

vagrant@ubuntu-01:~$ ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 52:54:00:f4:a0:2b brd ff:ff:ff:ff:ff:ff
    inet 192.168.121.170/24 brd 192.168.121.255 scope global dynamic eth0
       valid_lft 3500sec preferred_lft 3500sec
    inet6 fe80::5054:ff:fef4:a02b/64 scope link 
       valid_lft forever preferred_lft forever
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 52:54:00:5c:ba:9c brd ff:ff:ff:ff:ff:ff
    inet 192.52.100.11/24 brd 192.52.100.255 scope global eth1
       valid_lft forever preferred_lft forever
    inet6 fe80::5054:ff:fe5c:ba9c/64 scope link 
       valid_lft forever preferred_lft forever

ah, no CNI ?? This output is significantly different from our first successful run...
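A quick way to pin that down is to check whether the CNI config is still on disk and whether the runtime is around to recreate the bridge and veths (generic commands; /etc/cni/net.d is the usual CNI config location, assumed here):

vagrant@ubuntu-01:~$ ls /etc/cni/net.d/                        # CNI config files should still be present
vagrant@ubuntu-01:~$ ip link show cni0                         # the bridge existed on the good run
vagrant@ubuntu-01:~$ sudo crictl pods                          # if CRI-O is down, there are no pods and no veths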

@grahamwhaley (Author)

OK, I may have found a workaround.
I did a vagrant suspend and resume, and it showed the same issue.
After the restart I checked the crio status, and it showed the same 'multiple key' issue as #64 (interestingly, I did remove those duplicates before the suspend - is something re-injecting them on boot??).
I removed the duplicates again, then used systemctl to restart first crio and then kubelet (roughly the sequence sketched below), and now I can see my nodes again...
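For anyone hitting the same thing, the recovery sequence was roughly this (the duplicate keys to remove are whatever #64 describes; /etc/crio/crio.conf is the usual CRI-O config path, assumed here):

vagrant@ubuntu-01:~$ sudo vi /etc/crio/crio.conf               # remove the duplicated keys (see #64)
vagrant@ubuntu-01:~$ sudo systemctl restart crio
vagrant@ubuntu-01:~$ sudo systemctl restart kubelet
vagrant@ubuntu-01:~$ kubectl get nodes                         # nodes come back after a short wait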

@grahamwhaley (Author)

This is against the Ubuntu version, and it's a little old now - closing...
