On Windows 10, DevSpace keeps exiting and throwing an error after running fine for a few minutes #341

Closed
KaelBaldwin opened this issue Oct 31, 2018 · 19 comments
Labels: area/sync, kind/bug, needs-information, system/windows

Comments

@KaelBaldwin commented Oct 31, 2018

What happened?

  1. I ran devspace up.
  2. DevSpace deployed and I was able to work in the container.
  3. After a few minutes, I get kicked out of the container and out of devspace.
  • I can run devspace up again to reconnect and keep working, but it keeps happening.
  4. I see this in the logs:
    {"level":"error","msg":"Runtime error occurred: error copying from remote stream to local connection: readfrom tcp4 127.0.0.1:61572-\u003e127.0.0.1:61574: write tcp4 127.0.0.1:61572-\u003e127.0.0.1:61574: wsasend: An established connection was aborted by the software in your host machine.","time":"2018-10-31T11:11:26-05:00"}

What did you expect to happen instead?
I should be able to keep working in the container for however long I need to.

How can we reproduce the bug? (as minimally and precisely as possible)
Follow the same steps I did in the "What happened" section.

Local Environment:

  • Operating System: Windows 10
  • Deployment method: Helm

Kubernetes Cluster:

  • Cloud Provider: Bare metal via Rancher 2.0
  • Kubernetes Version: client v1.10.2, server v1.11.1

Anything else we need to know?
This might be happening during the sync operation; I'm not sure. It seems more stable after everything is synced up. I'll update further once I know more.

/kind bug

@KaelBaldwin (Author)

After running devspace up a few times to re-enter the container, I noticed that this stops happening.

So I believe this is happening during the sync process, when there is a large volume of files to sync.

@LukasGentele added the kind/bug, priority/critical, area/sync, and system/windows labels on Oct 31, 2018
@FabianKramm (Collaborator)

@KaelBaldwin I think we need more information to reproduce this issue, as it currently does not happen on our machines. I'm not completely sure where this error originates in the source code. Is there any error in the sync.log when this happens? Is there a fatal message in the console?

@KaelBaldwin (Author)

@FabianKramm No fatal message or anything in the console, other than that I'm suddenly disconnected from the container. The error that was in error.log is in my original post, but there isn't anything very helpful in sync.log other than a "Sync stopped" message.

I suspect it only occurs when there is a lot of data to sync. If your repo isn't as large, it might not occur. Maybe I could create a public repo with a ton of files and try to reproduce it with that.

It could very well be something specific to my machine, or even a temporary connectivity issue with the server it is syncing to.

Might I suggest making the sync process more robust? Would there be any drawbacks to having it retry and continue syncing if an error causes the sync to quit? There must be some sort of panic going on somewhere that's causing devspace to quit, but it is weird that the console isn't getting any error message. Roughly what I have in mind is sketched below.
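
Purely as an illustration (hypothetical names, nothing from DevSpace's actual code), I'm imagining a generic restart-on-error wrapper around the sync routine, something like:

```go
package sync // illustrative package name

import (
	"context"
	"log"
	"time"
)

// restartOnError keeps a long-running operation alive by restarting it after
// transient failures, waiting an increasing backoff between attempts.
// startSync is a hypothetical placeholder for the sync routine.
func restartOnError(ctx context.Context, startSync func(context.Context) error) error {
	backoff := time.Second
	for {
		err := startSync(ctx)
		if err == nil || ctx.Err() != nil {
			// Clean exit or caller cancelled: stop retrying.
			return err
		}
		log.Printf("sync failed, restarting in %s: %v", backoff, err)

		select {
		case <-time.After(backoff):
		case <-ctx.Done():
			return ctx.Err()
		}
		if backoff < 30*time.Second {
			backoff *= 2
		}
	}
}
```

That way a transient connection failure would just restart the sync instead of killing the whole devspace up session.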

I can try to debug this locally and see exactly where it panics; I'll look into it.

@KaelBaldwin (Author) commented Nov 2, 2018

OK, I think this can be avoided by not running any commands until everything is synced.

I can only reproduce this now by running a command to install my project's dependencies while the project is still being synced up. Sorry for missing that detail.

I think #343 will fix this for me, so we can close this one out.

Never mind, I just had it happen again after simply leaving the console running for a while (~20 minutes).

Going to try debugging in GoLand.

@FabianKramm (Collaborator)

@KaelBaldwin Thanks for all the information! I analyzed the issue a bit, and the error you see in errors.log actually originates from the port-forwarding functionality in the kubectl package; it is not related to the sync functionality. If you don't see a message like FATAL: [Sync] Fatal sync error: on the console when the terminal exits, the sync is not the root cause of the terminal exiting, because such a message would be printed if the sync lost its connection or encountered any other fatal issue.

Since you don't see any error message prior to the terminal exiting, I suspect two possible causes for this behavior:

  1. The terminal connection is somehow aborted on your side, e.g. through a firewall or an internet outage. What you could try is to open a shell through kubectl (for example, kubectl exec -it <pod-name> -- bash) in parallel to devspace up and check whether that terminal is closed too.
  2. The pod is somehow restarted, rescheduled, or loses its connection. Maybe the sync is causing the pod to be restarted or rescheduled, although I'm not really sure why that should be the case.

I guess we need some more information from your side about the circumstances under which this keeps happening, because I'm not really sure why the terminal would exit without any error, and unfortunately we cannot reproduce it.

@FabianKramm added the needs-information label on Nov 5, 2018
@KaelBaldwin (Author)

Sure, I'll keep looking into it. Thanks!

@KaelBaldwin (Author) commented Nov 6, 2018

So I noticed today, while starting it up, that I got kicked out again. The first time I got kicked out, I decided to exec into the container via kubectl, as you suggested, while also running devspace up in another terminal.

I ran devspace up first and it connected fine.

I ran kubectl exec and it failed to connect. I then checked the devspace terminal, and it had indeed disconnected.

So this does seem to be a connectivity issue.

I wonder if the data transfer going on during the sync process is causing a timeout that makes devspace give up its connection somewhere and kick me out.

FYI, this is an on-premises bare-metal cluster, so connectivity to the nodes should be fine. But the cluster can get pretty busy; test servers are built on it regularly, so it could be a combination of traffic and resource usage.

@KaelBaldwin (Author) commented Nov 6, 2018

I have just discovered that a kubectl exec session, if established before devspace disconnects, will persist through the disconnect, though devspace does not.

@KaelBaldwin (Author)

So basically I think this comes down to whether or not you want to try to make devspace more robust against temporarily stalled connections. If that's not feasible, feel free to close this issue and I'll just deal with the disconnects.

@FabianKramm

@FabianKramm (Collaborator)

@KaelBaldwin Thanks for all this detailed information! Regarding your last question, we definitely want to make devspace more robust. Your results are really interesting, because we internally already call the kubectl exec function (the call is in pkg/devspace/kubectl/client.go, and the function used is exec.Stream from the Kubernetes project). So this is really weird behavior, and I guess, as you already assumed, it is somehow caused by a side effect of the port-forwarding/sync services. To verify this assumption, can I ask you to do another test for us? Could you run devspace up bash, devspace enter bash, and kubectl exec ... bash at the same time and tell us which of the three has disconnected, if the disconnect occurs?
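
For context, setting up such an exec stream with client-go's remotecommand package looks roughly like this (a simplified sketch with illustrative names, not the exact code in pkg/devspace/kubectl/client.go):

```go
package kubectl // illustrative; loosely mirrors what the devspace kubectl package does

import (
	"os"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/kubernetes/scheme"
	restclient "k8s.io/client-go/rest"
	"k8s.io/client-go/tools/remotecommand"
)

// execIntoPod opens an interactive shell in the given pod, similar to
// `kubectl exec -it <pod> -- bash`.
func execIntoPod(config *restclient.Config, client kubernetes.Interface, namespace, pod, container string) error {
	req := client.CoreV1().RESTClient().Post().
		Resource("pods").
		Name(pod).
		Namespace(namespace).
		SubResource("exec").
		VersionedParams(&corev1.PodExecOptions{
			Container: container,
			Command:   []string{"bash"},
			Stdin:     true,
			Stdout:    true,
			Stderr:    true,
			TTY:       true,
		}, scheme.ParameterCodec)

	executor, err := remotecommand.NewSPDYExecutor(config, "POST", req.URL())
	if err != nil {
		return err
	}
	// Stream blocks until the remote shell exits or the connection drops.
	return executor.Stream(remotecommand.StreamOptions{
		Stdin:  os.Stdin,
		Stdout: os.Stdout,
		Stderr: os.Stderr,
		Tty:    true,
	})
}
```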

@KaelBaldwin
Copy link
Author

Sure thing, I'm trying it out now.

@KaelBaldwin (Author)

@FabianKramm They all failed this time, including kubectl exec, which had persisted every other time. I did some monitoring after getting that result:

[screenshot: network monitor showing the interface spiking to 100% utilization during the sync]

Looks like the data transfer spikes very high at a certain point, which is impressive! Haha. I'm thinking that what's happening here is my network interface getting 100% used by the sync and dropping connections.

Looking into it further, I have a rather large file in my repo at the moment that I was using for test data. I'm betting that when it gets transferred, it causes the overload.

@KaelBaldwin (Author)

That file might be what's been the problem all along. I'm removing it and seeing if that resolves everything.

@KaelBaldwin (Author) commented Nov 7, 2018

OK, yeah, things were much tamer after removing the file.

The highest it got was around 70% usage, and there were no disconnects.

@FabianKramm

@FabianKramm (Collaborator) commented Nov 7, 2018

@KaelBaldwin Thanks for this information! I'm still wondering why exactly this happens, though, because the sync only opens two kubectl exec shells to sync all files, and I don't really understand why high network usage would result in the OS dropping network connections. That sounds odd, because a typical browser file download/upload is normally also unrestricted in bandwidth usage. I suspect that maybe the remote provider has a bandwidth limit and aborts the connections if they use up too much bandwidth. I'm not sure how to test that, though.

@FabianKramm (Collaborator) commented Nov 7, 2018

But I guess restricting the sync to a certain amount of network bandwidth would probably solve your problem. So maybe we can implement a feature that lets you specify an upper limit for the sync to use.
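
To sketch the idea (illustrative only, not DevSpace's actual sync code): the sync's upload stream could be wrapped in a token-bucket-limited writer, e.g. using golang.org/x/time/rate:

```go
package sync // illustrative package name

import (
	"context"
	"io"

	"golang.org/x/time/rate"
)

// limitedWriter wraps an io.Writer and blocks before each chunk until a token
// bucket allows it, capping throughput at roughly bytesPerSecond.
type limitedWriter struct {
	w       io.Writer
	limiter *rate.Limiter
}

// newLimitedWriter caps writes to w at bytesPerSecond; the burst equals one
// second's budget so a full-rate chunk can pass at once.
func newLimitedWriter(w io.Writer, bytesPerSecond int) io.Writer {
	return &limitedWriter{
		w:       w,
		limiter: rate.NewLimiter(rate.Limit(bytesPerSecond), bytesPerSecond),
	}
}

func (l *limitedWriter) Write(p []byte) (int, error) {
	written := 0
	for len(p) > 0 {
		chunk := p
		if len(chunk) > l.limiter.Burst() {
			chunk = chunk[:l.limiter.Burst()]
		}
		// WaitN blocks until len(chunk) tokens are available in the bucket.
		if err := l.limiter.WaitN(context.Background(), len(chunk)); err != nil {
			return written, err
		}
		n, err := l.w.Write(chunk)
		written += n
		if err != nil {
			return written, err
		}
		p = p[n:]
	}
	return written, nil
}
```

A corresponding reader wrapper would limit the download direction in the same way.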

@KaelBaldwin (Author)

@FabianKramm As far as the remote provider goes, this is a bare-metal Kubernetes cluster, so I'm on the same network.

@KaelBaldwin (Author) commented Nov 7, 2018

I do think your idea of being able to specify an upper limit would resolve it.

As to whether or not the OS is dropping connections, I'm not familiar enough with its handling to speculate, other than that maybe it doesn't drop the connections, but there is enough of a delay for kubectl exec to reach a timeout and disconnect.

@FabianKramm (Collaborator) commented Nov 7, 2018

@KaelBaldwin Okay, I'll open a new issue for that and close this one. That is also the only solution I currently see that we can implement, since we cannot determine exactly who is closing the connections and why.

EDIT: Should you find out why the connections are getting closed and that there is a better solution, feel free to reopen the issue.
