
kubernetes-system-spec post-start script sometimes fails #143

Closed
alex-slynko opened this Issue Dec 9, 2017 · 8 comments

alex-slynko (Member) commented Dec 9, 2017

How we reproduced it:

1. Start Kubo on BOSH Lite with two masters
2. Start deploying a spec to Kubernetes
3. Start the upgrade
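
A rough sketch of those steps with the BOSH v2 CLI, for anyone trying to replicate; the environment alias, deployment name, manifest, and ops file are placeholders, not the literal commands from our scripts:

# Hypothetical repro outline; all names and files are illustrative.
bosh -e vbox -d kubo deploy kubo.yml -o two-masters.yml   # 1. two masters on BOSH Lite
kubectl apply -f spec.yml &                               # 2. apply a spec while the cluster is up
bosh -e vbox -d kubo deploy kubo.yml --recreate           # 3. kick off the upgrade mid-apply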

Post-deploy logs:

W1208 22:52:19.150637    2701 factory_object_mapping.go:423] Failed to download OpenAPI (Get http://localhost:8080/swagger-2.0.0.pb-v1: dial tcp 127.0.0.1:8080: getsockopt: connection refused), falling back to swagger
The connection to the server localhost:8080 was refused - did you specify the right host or port?
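
The "localhost:8080" in these errors is kubectl's fallback: with no kubeconfig and no --server flag, it talks to the apiserver's insecure local port, which is closed while the master restarts during the upgrade. A minimal sketch of pointing the same check at the secure port instead (the kubectl flags are real; the CA path and token variable are assumptions, not the release's actual layout):

# Hypothetical invocation; the CA path and ADMIN_TOKEN are placeholders.
kubectl get componentstatuses \
  --server=https://127.0.0.1:8443 \
  --certificate-authority=/var/vcap/jobs/kubeconfig/ca.pem \
  --token="${ADMIN_TOKEN}"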

kubernetes-api logs

I1208 22:52:46.058403    2613 wrap.go:42] GET /api/v1/namespaces/default/services/kubernetes: (1.371968ms) 200 [[kube-apiserver/v1.8.2 (linux/amd64) kubernetes/bdaeafa] 127.0.0.1:32978]
I1208 22:52:46.060345    2613 wrap.go:42] GET /api/v1/namespaces/default/endpoints/kubernetes: (899.411µs) 200 [[kube-apiserver/v1.8.2 (linux/amd64) kubernetes/bdaeafa] 127.0.0.1:32978]
I1208 22:52:46.396386    2613 logs.go:41] http: TLS handshake error from 192.168.50.1:64482: read tcp 10.240.0.2:8443->192.168.50.1:64482: read: connection reset by peer
I1208 22:52:47.198015    2613 wrap.go:42] GET /api/v1/namespaces/kube-system/endpoints/kube-controller-manager: (1.421929ms) 200 [[kube-controller-manager/v1.8.2 (linux/amd64) kubernetes/bdaeafa/leader-election] 127.0.0.1:34170]
I1208 22:52:47.313919    2613 wrap.go:42] GET /version: (284.565µs) 200 [[kubectl/v1.8.4 (darwin/amd64) kubernetes/9befc2b] 192.168.50.1:64496]
I1208 22:52:47.476920    2613 wrap.go:42] GET /api/v1/namespaces/kube-system/endpoints/kube-scheduler: (1.473922ms) 200 [[kube-scheduler/v1.8.2 (linux/amd64) kubernetes/bdaeafa/leader-election] 127.0.0.1:34164]
I1208 22:52:48.454099    2613 logs.go:41] http: TLS handshake error from 192.168.50.1:64506: read tcp 10.240.0.2:8443->192.168.50.1:64506: read: connection reset by peer
I1208 22:52:49.428775    2613 wrap.go:42] GET /version: (292.899µs) 200 [[kubectl/v1.8.4 (darwin/amd64) kubernetes/9befc2b] 192.168.50.1:64522]
I1208 22:52:50.115068    2613 wrap.go:42] GET /api/v1/namespaces/kube-system/endpoints/kube-controller-manager: (2.257356ms) 200 [[kube-controller-manager/v1.8.2 (linux/amd64) kubernetes/bdaeafa/leader-election] 127.0.0.1:34170]
I1208 22:52:50.527864    2613 logs.go:41] http: TLS handshake error from 192.168.50.1:64532: read tcp 10.240.0.2:8443->192.168.50.1:64532: read: connection reset by peer
I1208 22:52:50.605928    2613 wrap.go:42] GET /api/v1/namespaces/kube-system/endpoints/kube-scheduler: (1.355343ms) 200 [[kube-scheduler/v1.8.2 (linux/amd64) kubernetes/bdaeafa/leader-election] 127.0.0.1:34164]
I1208 22:52:51.550838    2613 wrap.go:42] GET /version: (321.879µs) 200 [[kubectl/v1.8.4 (darwin/amd64) kubernetes/9befc2b] 192.168.50.1:64548]
I1208 22:52:52.505899    2613 logs.go:41] http: TLS handshake error from 192.168.50.1:64556: read tcp 10.240.0.2:8443->192.168.50.1:64556: read: connection reset by peer
I1208 22:52:52.884368    2613 wrap.go:42] GET /api/v1/namespaces/kube-system/endpoints/kube-controller-manager: (4.02103ms) 200 [[kube-controller-manager/v1.8.2 (linux/amd64) kubernetes/bdaeafa/leader-election] 127.0.0.1:34170]
I1208 22:52:53.287783    2613 wrap.go:42] GET /api/v1/namespaces/kube-system/endpoints/kube-scheduler: (1.226977ms) 200 [[kube-scheduler/v1.8.2 (linux/amd64) kubernetes/bdaeafa/leader-election] 127.0.0.1:34164]

Scheduler logs

E1208 22:52:09.917764    2658 reflector.go:205] k8s.io/kubernetes/vendor/k8s.io/client-go/informers/factory.go:73: Failed to list *v1beta1.StatefulSet: Get http://localhost:8080/apis/apps/v1beta1/statefulsets?resourceVersion=0: dial tcp 127.0.0.1:8080: getsockopt: connection refused
E1208 22:52:09.919959    2658 reflector.go:205] k8s.io/kubernetes/vendor/k8s.io/client-go/informers/factory.go:73: Failed to list *v1.PersistentVolume: Get http://localhost:8080/api/v1/persistentvolumes?resourceVersion=0: dial tcp 127.0.0.1:8080: getsockopt: connection refused
E1208 22:52:09.932021    2658 reflector.go:205] k8s.io/kubernetes/vendor/k8s.io/client-go/informers/factory.go:73: Failed to list *v1.Service: Get http://localhost:8080/api/v1/services?resourceVersion=0: dial tcp 127.0.0.1:8080: getsockopt: connection refused
E1208 22:52:09.934799    2658 reflector.go:205] k8s.io/kubernetes/vendor/k8s.io/client-go/informers/factory.go:73: Failed to list *v1beta1.ReplicaSet: Get http://localhost:8080/apis/extensions/v1beta1/replicasets?resourceVersion=0: dial tcp 127.0.0.1:8080: getsockopt: connection refused
E1208 22:52:10.914047    2658 reflector.go:205] k8s.io/kubernetes/vendor/k8s.io/client-go/informers/factory.go:73: Failed to list *v1.Node: Get http://localhost:8080/api/v1/nodes?resourceVersion=0: dial tcp 127.0.0.1:8080: getsockopt: connection refused
E1208 22:52:10.938769    2658 reflector.go:205] k8s.io/kubernetes/vendor/k8s.io/client-go/informers/factory.go:73: Failed to list *v1.PersistentVolumeClaim: Get http://localhost:8080/api/v1/persistentvolumeclaims?resourceVersion=0: dial tcp 127.0.0.1:8080: getsockopt: connection refused
E1208 22:52:10.948219    2658 reflector.go:205] k8s.io/kubernetes/plugin/cmd/kube-scheduler/app/server.go:103: Failed to list *v1.Pod: Get http://localhost:8080/api/v1/pods?fieldSelector=status.phase%21%3DFailed%2Cstatus.phase%21%3DSucceeded&resourceVersion=0: dial tcp 127.0.0.1:8080: getsockopt: connection refused
E1208 22:52:10.951289    2658 reflector.go:205] k8s.io/kubernetes/vendor/k8s.io/client-go/informers/factory.go:73: Failed to list *v1.ReplicationController: Get http://localhost:8080/api/v1/replicationcontrollers?resourceVersion=0: dial tcp 127.0.0.1:8080: getsockopt: connection refused
E1208 22:52:10.967948    2658 reflector.go:205] k8s.io/kubernetes/vendor/k8s.io/client-go/informers/factory.go:73: Failed to list *v1beta1.StatefulSet: Get http://localhost:8080/apis/apps/v1beta1/statefulsets?resourceVersion=0: dial tcp 127.0.0.1:8080: getsockopt: connection refused
E1208 22:52:10.976550    2658 reflector.go:205] k8s.io/kubernetes/vendor/k8s.io/client-go/informers/factory.go:73: Failed to list *v1beta1.ReplicaSet: Get http://localhost:8080/apis/extensions/v1beta1/replicasets?resourceVersion=0: dial tcp 127.0.0.1:8080: getsockopt: connection refused
E1208 22:52:10.977987    2658 reflector.go:205] k8s.io/kubernetes/vendor/k8s.io/client-go/informers/factory.go:73: Failed to list *v1.PersistentVolume: Get http://localhost:8080/api/v1/persistentvolumes?resourceVersion=0: dial tcp 127.0.0.1:8080: getsockopt: connection refused
E1208 22:52:10.984232    2658 reflector.go:205] k8s.io/kubernetes/vendor/k8s.io/client-go/informers/factory.go:73: Failed to list *v1.Service: Get http://localhost:8080/api/v1/services?resourceVersion=0: dial tcp 127.0.0.1:8080: getsockopt: connection refused
E1208 22:52:11.935570    2658 reflector.go:205] k8s.io/kubernetes/vendor/k8s.io/client-go/informers/factory.go:73: Failed to list *v1.Node: Get http://localhost:8080/api/v1/nodes?resourceVersion=0: dial tcp 127.0.0.1:8080: getsockopt: connection refused
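
The scheduler is retrying that same insecure port, so it can do nothing but log errors until 8080 comes back. For contrast, a sketch of running it against a kubeconfig, which would not depend on the insecure port at all (--kubeconfig and --leader-elect are real kube-scheduler flags; the path is illustrative):

# Hypothetical invocation; the kubeconfig path is a placeholder.
kube-scheduler \
  --kubeconfig=/var/vcap/jobs/kube-scheduler/config/kubeconfig \
  --leader-elect=true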

kubernetes-controller-manager logs

I1208 22:52:06.789921    2646 leaderelection.go:174] attempting to acquire leader lease...
E1208 22:52:06.844327    2646 leaderelection.go:224] error retrieving resource lock kube-system/kube-controller-manager: Get http://localhost:8080/api/v1/namespaces/kube-system/endpoints/kube-controller-manager: dial tcp 127.0.0.1:8080: getsockopt: connection refused
E1208 22:52:10.311677    2646 leaderelection.go:224] error retrieving resource lock kube-system/kube-controller-manager: Get http://localhost:8080/api/v1/namespaces/kube-system/endpoints/kube-controller-manager: dial tcp 127.0.0.1:8080: getsockopt: connection refused
E1208 22:52:14.583467    2646 leaderelection.go:224] error retrieving resource lock kube-system/kube-controller-manager: Get http://localhost:8080/api/v1/namespaces/kube-system/endpoints/kube-controller-manager: dial tcp 127.0.0.1:8080: getsockopt: connection refused
E1208 22:52:18.204373    2646 leaderelection.go:224] error retrieving resource lock kube-system/kube-controller-manager: Get http://localhost:8080/api/v1/namespaces/kube-system/endpoints/kube-controller-manager: dial tcp 127.0.0.1:8080: getsockopt: connection refused
E1208 22:52:21.289054    2646 leaderelection.go:224] error retrieving resource lock kube-system/kube-controller-manager: Get http://localhost:8080/api/v1/namespaces/kube-system/endpoints/kube-controller-manager: dial tcp 127.0.0.1:8080: getsockopt: connection refused
E1208 22:52:24.327511    2646 leaderelection.go:224] error retrieving resource lock kube-system/kube-controller-manager: Get http://localhost:8080/api/v1/namespaces/kube-system/endpoints/kube-controller-manager: dial tcp 127.0.0.1:8080: getsockopt: connection refused
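
While those leader-election errors repeat, it is easy to confirm on the master that only the insecure port is gone. A quick smoke test, assuming the ports seen in the logs above (8080 insecure, 8443 secure):

# -f makes curl exit non-zero on HTTP errors; -k skips TLS verification,
# which is acceptable for a local smoke test.
curl -sf http://127.0.0.1:8080/healthz || echo "insecure port down"
curl -skf https://127.0.0.1:8443/healthz || echo "secure port down"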
cf-gitbot commented Dec 9, 2017

We have created an issue in Pivotal Tracker to manage this:

https://www.pivotaltracker.com/story/show/153535114

The labels on this GitHub issue will be updated when the story is started.

JaredGordon commented Dec 15, 2017

+1

bsnchan (Contributor) commented Dec 18, 2017

Hey @alex-slynko - this got prioritized during our IPM today.

cf-gitbot added the "in progress" label and removed the "scheduled" label Dec 19, 2017

alex-slynko (Member) commented Dec 19, 2017

Here is the link to our deployment scripts: https://github.com/bstick12/kubecon/tree/feature/haproxy

jfmyers9 (Member) commented Dec 19, 2017

Hi @alex-slynko,

We tried to reproduce this with your deployment scripts above. After running them four times, we were unable to reproduce the failure.

Is there something we are missing, or do you have a more consistent way to reproduce it?

Also, the deployment configuration (2 master nodes and 1 etcd node) is no longer recommended or supported, so we are hesitant to explore this issue much further unless we can see it on a more recent version/deployment configuration.

Best,

@jfmyers9 && @BenChapman

cf-gitbot added the "scheduled" label and removed the "in progress" label Dec 19, 2017

JaredGordon commented Dec 19, 2017

For us, upgrading the PKS tile to the latest (it says v0.7.0) fixed the issue; we're able to spin up clusters again.

srm09 (Contributor) commented Jan 2, 2018

Fixed in this commit
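
For anyone reading along without following the commit: the failure above is a post-start script racing a restarting apiserver, and the usual fix for that class of bug is to wait for the apiserver before applying the kube-system spec. A minimal sketch of the pattern; the timeout, paths, and kubectl call are assumptions, not the actual patch:

#!/bin/bash
# Hypothetical post-start wait loop; not the actual commit.
set -eu

TIMEOUT=120
DEADLINE=$(( $(date +%s) + TIMEOUT ))

# Block until the apiserver answers instead of assuming it is already up.
until kubectl get nodes >/dev/null 2>&1; do
  if (( $(date +%s) >= DEADLINE )); then
    echo "apiserver did not come up within ${TIMEOUT}s" >&2
    exit 1
  fi
  sleep 2
done

kubectl apply -f /var/vcap/jobs/apply-specs/specs/   # illustrative spec directory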

alex-slynko (Member) commented Jan 2, 2018

Thanks.

cf-gitbot added the "accepted" label Jan 8, 2018
