Connection reset by peer when reading slugbuilder logs #225

Closed
gabrtv opened this issue Mar 2, 2016 · 11 comments

Member

gabrtv commented Mar 2, 2016

First git push on a fresh kube-aws cluster gave me this:

$ git push deis master
Counting objects: 76, done.
Delta compression using up to 8 threads.
Compressing objects: 100% (65/65), done.
Writing objects: 100% (76/76), 17.76 KiB | 0 bytes/s, done.
Total 76 (delta 31), reused 16 (delta 6)
remote: ---> 2016/03/02 23:01:33 Error running git receive hook [attempting to stream logs (Get https://ip-10-0-0-166.us-west-2.compute.internal:10250/containerLogs/deis/slugbuild-zircon-addendum-275e7ca6-53c961aa/deis-slugbuilder?follow=true: read tcp 10.0.0.166:10250: connection reset by peer)]
Starting build... but first, coffee!
To ssh://git@deis.gabrtv.io:2222/zircon-addendum.git
 * [new branch]      master -> master

Looks like an internal network issue talking to the Kube API server, but hard to say. Subsequent pushes seem to work fine. Filing an issue for posterity.

@gabrtv gabrtv added the bug label Mar 2, 2016
Member

arschles commented Mar 3, 2016

I'm gonna guess this is the symptom of the same race described in #199. I've proposed a solution in #207 and a discussion has been started there on the topic.

Contributor

smothiki commented Mar 3, 2016

@gabrtv is any more debug info available? Events or the output of kubectl get pods --namespace=deis -w would help.
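
The information requested here could be gathered roughly as follows. This is a minimal sketch: the function name collect_deis_debug is hypothetical, and only the deis namespace comes from the thread. It lists pods once; adding -w, as in the comment above, keeps watching until interrupted.

```shell
# Hypothetical helper: print recent events and current pods for a namespace.
# Defaults to the "deis" namespace used in this thread.
collect_deis_debug() {
  ns="${1:-deis}"
  # Cluster events often show scheduling failures, restarts, and node problems.
  kubectl get events --namespace="$ns"
  # One-shot pod listing; append -w to watch status changes during a push.
  kubectl get pods --namespace="$ns"
}
```

Calling collect_deis_debug with no arguments right after a failed push, and attaching the output to the issue, would cover both items.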

Contributor

krancour commented Mar 3, 2016

@gabrtv, you need to increase the idle timeout on the ELB that k8s created for you. I wish we could configure that in the router service, but it doesn't seem that we can. 1200 seconds should be enough.

Member Author

gabrtv commented Mar 3, 2016

@krancour I have the ELB timeout set to 600 seconds, which usually works fine.

Contributor

krancour commented Mar 3, 2016

@gabrtv ah, OK then... bad guess on my part. The default is 60 seconds, and I know that's bitten a few people before.
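
For anyone bitten by the 60-second default, the idle timeout of a classic ELB can be raised with the AWS CLI. A minimal sketch, assuming the AWS CLI is installed and configured for the right account and region; the function name set_elb_idle_timeout and the load balancer name in the usage note are placeholders, not values from this thread.

```shell
# Hypothetical helper: set the idle timeout (in seconds) on a classic ELB.
# $1 = load balancer name, $2 = timeout in seconds (defaults to 1200).
set_elb_idle_timeout() {
  elb_name="$1"
  timeout_seconds="${2:-1200}"
  aws elb modify-load-balancer-attributes \
    --load-balancer-name "$elb_name" \
    --load-balancer-attributes "{\"ConnectionSettings\":{\"IdleTimeout\":${timeout_seconds}}}"
}
```

For example, set_elb_idle_timeout my-router-elb 1200, where my-router-elb stands in for whatever name the ELB created by Kubernetes actually has.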

Member

slack commented Mar 3, 2016

I'm almost positive this is an apiserver-to-kubelet communication problem:

Get https://ip-10-0-0-166.us-west-2.compute.internal:10250/containerLogs/deis/slugbuild-zircon-addendum-275e7ca6-53c961aa/deis-slugbuilder?follow=true: \
read tcp 10.0.0.166:10250: connection reset by peer

If you can grab the kubelet logs from the node ip-10-0-0-166.us-west-2.compute.internal that may be revelatory.
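
If the node runs the kubelet under systemd, the relevant window of logs and the boot history could be pulled roughly like this. Both systemd and the unit name kubelet are assumptions that may not hold on every node image, and the function names are hypothetical.

```shell
# Hypothetical helpers, meant to be run on the node itself.
# Assumes systemd and a unit named "kubelet"; override with KUBELET_UNIT.
kubelet_logs_between() {
  since="$1"      # e.g. "2016-03-02 22:55:00"
  until_ts="$2"   # e.g. "2016-03-02 23:10:00"
  journalctl -u "${KUBELET_UNIT:-kubelet}" --no-pager --since "$since" --until "$until_ts"
}

# List recorded boots so reboots can be matched against the error time.
node_boot_history() {
  journalctl --list-boots --no-pager
}
```

The error above is stamped 2016/03/02 23:01:33, so a window of a few minutes on either side of that time is a reasonable starting point. journalctl interprets these timestamps in the node's local time zone, which may differ from the zone the builder logged in.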

Member Author

gabrtv commented Mar 3, 2016

@slack full kubelet logs for that node here: https://gist.github.com/gabrtv/70ae044394f3491ea6cb

No smoking gun that I can see. However, there are a few unexplained reboots, with a decent amount of time during which the kubelet was restarting. That could easily explain it.

Unless someone else finds something relevant in the logs, I'm inclined to close the issue and chalk it up to flakiness of the underlying cluster, in which case the error is exactly what I'd expect.

@gabrtv gabrtv removed the bug label Mar 3, 2016
@arschles arschles added the bug label Mar 3, 2016
Member

arschles commented Mar 3, 2016

Adding this to RC1

@arschles arschles added this to the v2.0-rc1 milestone Mar 3, 2016
Member

slack commented Mar 14, 2016

There is a series of node reboots in the logs, which would explain "connection reset by peer" if they overlapped with the log fetches.
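
If the reset is transient, as the reboots suggest, a client fetching logs could simply retry. The sketch below is a generic retry wrapper with exponential backoff. It is not what the Deis builder does, and retry_with_backoff and RETRY_BASE_DELAY are hypothetical names.

```shell
# Hypothetical helper: run a command up to max_attempts times,
# doubling the wait between attempts (exponential backoff).
# Usage: retry_with_backoff MAX_ATTEMPTS COMMAND [ARGS...]
retry_with_backoff() {
  max_attempts="$1"
  shift
  delay="${RETRY_BASE_DELAY:-1}"   # seconds to wait before the first retry
  attempt=1
  while true; do
    if "$@"; then
      return 0
    fi
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after $attempt attempts: $*" >&2
      return 1
    fi
    sleep "$delay"
    delay=$((delay * 2))
    attempt=$((attempt + 1))
  done
}
```

For example, retry_with_backoff 5 kubectl logs --namespace=deis -f POD_NAME, where POD_NAME is a placeholder for the slugbuild pod. With -f, each retry restarts the stream from the beginning, so some lines may appear more than once.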

@bacongobbler bacongobbler removed this from the v2.0-rc1 milestone May 19, 2016
Member

arschles commented Jun 7, 2016

@slack @gabrtv @krancour has anyone seen this pop up in beta4 or RC1? Since I haven't, I'm inclined to close.

Contributor

smothiki commented Jun 7, 2016

I don't think anyone has reported this since beta4.


7 participants