
Ubuntu version: instance does not work after vagrant halt/up cycle #73

Closed
grahamwhaley opened this issue Mar 20, 2019 · 2 comments
@grahamwhaley

Hi @ganeshmaharaj - I'm reporting here (but really against your https://github.com/ganeshmaharaj/vagrant-stuff/tree/master/k8s, which I believe is a derivative?) as I thought we'd get more exposure and eyes on it here...

I was using your Ubuntu-based instance fine, but after a host reboot, when I brought it back up my kubectl didn't work any more. I had a dig, and I suspect some things do not come back up after the reboot. I used a vagrant halt/up cycle to make the failure more contained and hopefully repeatable. Here are my logs:


Trying to shut down and restart the stack:

vagrant up it

run the setup script (roughly as sketched below)

Note, I don't add the two slave nodes here... just working with the master node.
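For reference, the bring-up was roughly the following (the in-guest setup script name is a placeholder for whatever the clr-k8s-examples instructions say to run, so treat this as a sketch rather than the exact invocation):

[gwhaley@fido k8s]$ vagrant up --provider=libvirt
[gwhaley@fido k8s]$ vagrant ssh ubuntu-01
vagrant@ubuntu-01:~$ cd ~/cloud-native-setup/clr-k8s-examples
vagrant@ubuntu-01:~/cloud-native-setup/clr-k8s-examples$ ./setup.sh   # placeholder for the repo's setup script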

check we are up:

~/cloud-native-setup/clr-k8s-examples$ kubectl get nodes
NAME        STATUS   ROLES    AGE    VERSION
ubuntu-01   Ready    master   110s   v1.13.4
vagrant@ubuntu-01:~/cloud-native-setup/clr-k8s-examples$ 
vagrant@ubuntu-01:~/cloud-native-setup/clr-k8s-examples$ ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 52:54:00:f4:a0:2b brd ff:ff:ff:ff:ff:ff
    inet 192.168.121.170/24 brd 192.168.121.255 scope global dynamic eth0
       valid_lft 2899sec preferred_lft 2899sec
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 52:54:00:5c:ba:9c brd ff:ff:ff:ff:ff:ff
    inet 192.52.100.11/24 brd 192.52.100.255 scope global eth1
       valid_lft forever preferred_lft forever
4: cni0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 0a:58:0a:58:00:01 brd ff:ff:ff:ff:ff:ff
    inet 10.88.0.1/16 scope global cni0
       valid_lft forever preferred_lft forever
5: vethfd79254a@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master cni0 state UP group default 
    link/ether 16:8c:cb:2a:fd:53 brd ff:ff:ff:ff:ff:ff link-netnsid 0
6: vetha05bf341@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master cni0 state UP group default 
    link/ether aa:6a:ff:b9:cc:5b brd ff:ff:ff:ff:ff:ff link-netnsid 1
7: veth9adc6a7d@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master cni0 state UP group default 
    link/ether be:e1:57:ef:a1:8b brd ff:ff:ff:ff:ff:ff link-netnsid 2
8: vethfdd2b8a2@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master cni0 state UP group default 
    link/ether e6:85:14:26:27:b4 brd ff:ff:ff:ff:ff:ff link-netnsid 3

vagrant halt it

[gwhaley@fido k8s]$ vagrant halt
==> ubuntu-03: Halting domain...
==> ubuntu-02: Halting domain...
==> ubuntu-01: Halting domain...
[gwhaley@fido k8s]$ vagrant status
Current machine states:

ubuntu-01                 shutoff (libvirt)
ubuntu-02                 shutoff (libvirt)
ubuntu-03                 shutoff (libvirt)

vagrant up it

# vagrant up --provider=libvirt

and it fails:

[gwhaley@fido k8s]$ vagrant ssh ubuntu-01
Last login: Wed Mar 20 04:13:25 2019 from 192.168.121.1
vagrant@ubuntu-01:~$ kubectl get nodes
The connection to the server 192.168.121.170:6443 was refused - did you specify the right host or port?
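A first sanity check on the node would be whether anything is listening on 6443 and whether the apiserver container exists at all (generic ss/crictl commands, nothing specific to this setup):

vagrant@ubuntu-01:~$ sudo ss -tlnp | grep 6443                        # is the apiserver bound to 6443?
vagrant@ubuntu-01:~$ sudo crictl ps -a 2>/dev/null | grep apiserver   # does the static pod container exist?

The kubelet unit's log (the paste below is left-truncated, newest lines first) has the real hint: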

Wed 2019-03-20 04:21:47 PDT. --
e: Failed with result 'exit-code'.
e: Main process exited, code=exited, status=255/n/a
:47.776497    2117 server.go:261] failed to run Kubelet: failed to create kubelet: rpc error: code = Unav
:47.776368    2117 kuberuntime_manager.go:184] Get runtime version failed: rpc error: code = Unavailable 
:47.776217    2117 remote_runtime.go:72] Version from runtime service failed: rpc error: code = Unavailab
:47.775764    2117 util_unix.go:77] Using "/var/run/crio/crio.sock" as endpoint is deprecated, please con
:47.775617    2117 util_unix.go:77] Using "/var/run/crio/crio.sock" as endpoint is deprecated, please con
:47.774978    2117 reflector.go:134] k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:47: Failed to list
:47.774897    2117 reflector.go:134] k8s.io/kubernetes/pkg/kubelet/kubelet.go:444: Failed to list *v1.Ser
:47.774332    2117 reflector.go:134] k8s.io/kubernetes/pkg/kubelet/kubelet.go:453: Failed to list *v1.Nod
:47.757270    2117 kubelet.go:306] Watching apiserver
:47.757233    2117 kubelet.go:281] Adding pod path: /etc/kubernetes/manifests
:47.757167    2117 state_mem.go:92] [cpumanager] updated cpuset assignments: "map[]"
:47.757149    2117 state_mem.go:84] [cpumanager] updated default cpuset: ""
:47.757061    2117 state_mem.go:36] [cpumanager] initializing new in-memory state store
:47.757029    2117 container_manager_linux.go:272] Creating device plugin manager: true
:47.756872    2117 container_manager_linux.go:253] Creating Container Manager object based on Node Config
:47.756844    2117 container_manager_linux.go:248] container manager verified user specified cgroup-root 
:47.756590    2117 server.go:666] --cgroups-per-qos enabled, but --cgroup-root was not specified.  defaul
:47.743312    2117 certificate_store.go:130] Loading cert/key pair from "/var/lib/kubelet/pki/kubelet-cli
:47.740768    2117 plugins.go:103] No cloud provider specified.
:47.740588    2117 server.go:407] Version: v1.13.4
lv-conf has been deprecated, This parameter should be set via the config file specified by the Kubelet's 
lv-conf has been deprecated, This parameter should be set via the config file specified by the Kubelet's 
t: The Kubernetes Node Agent.
t: The Kubernetes Node Agent.
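The visible fragments suggest the kubelet cannot reach the CRI-O runtime (rpc error: code = Unavailable against /var/run/crio/crio.sock), and without a running kubelet the static apiserver pod never comes up, hence the refused connection on 6443. One avenue is to poke at the runtime directly (standard systemd/crictl commands, sketched from memory):

vagrant@ubuntu-01:~$ sudo systemctl status crio                # did CRI-O come back after the halt/up?
vagrant@ubuntu-01:~$ sudo journalctl -u crio -b --no-pager | tail -n 20
vagrant@ubuntu-01:~$ sudo crictl --runtime-endpoint unix:///var/run/crio/crio.sock version   # is the socket answering?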

did we maybe change IP address?

vagrant@ubuntu-01:~$ ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 52:54:00:f4:a0:2b brd ff:ff:ff:ff:ff:ff
    inet 192.168.121.170/24 brd 192.168.121.255 scope global dynamic eth0
       valid_lft 3500sec preferred_lft 3500sec
    inet6 fe80::5054:ff:fef4:a02b/64 scope link 
       valid_lft forever preferred_lft forever
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 52:54:00:5c:ba:9c brd ff:ff:ff:ff:ff:ff
    inet 192.52.100.11/24 brd 192.52.100.255 scope global eth1
       valid_lft forever preferred_lft forever
    inet6 fe80::5054:ff:fe5c:ba9c/64 scope link 
       valid_lft forever preferred_lft forever

ah, no CNI ?? This output is significantly different from our first successful run...
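A quick way to pin that down is to check whether the CNI config is still on disk and whether the runtime is around to recreate the bridge and veths (generic commands; /etc/cni/net.d is the usual CNI config location, assumed here):

vagrant@ubuntu-01:~$ ls /etc/cni/net.d/                        # CNI config files should still be present
vagrant@ubuntu-01:~$ ip link show cni0                         # the bridge existed on the good run
vagrant@ubuntu-01:~$ sudo crictl pods                          # if CRI-O is down, there are no pods and no veths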

@grahamwhaley (Author)

OK, I may have found a workaround.
I did a vagrant suspend and resume, and it showed the same issue.
After the restart I checked the crio status, and it showed the same 'multiple key' issue as #64 (interestingly, I did remove those duplicates before the suspend - is something re-injecting them on boot??).
I removed the duplicates again, then used systemctl to restart first crio and then kubelet (roughly the sequence sketched below), and now I can see my nodes again...
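For anyone hitting the same thing, the recovery sequence was roughly this (the duplicate keys to remove are whatever #64 describes; /etc/crio/crio.conf is the usual CRI-O config path, assumed here):

vagrant@ubuntu-01:~$ sudo vi /etc/crio/crio.conf               # remove the duplicated keys (see #64)
vagrant@ubuntu-01:~$ sudo systemctl restart crio
vagrant@ubuntu-01:~$ sudo systemctl restart kubelet
vagrant@ubuntu-01:~$ kubectl get nodes                         # nodes come back after a short wait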

@grahamwhaley (Author)

This is against the Ubuntu version, and it's a little old now - closing...
