Crash due to "too many file descriptors in select()" by Tornado on Windows #2916

Closed
KodySalak opened this issue Nov 28, 2018 · 11 comments
Labels
bug: Issue describes a bug; done: Done but not yet released; needs testing: Testing from the community is needed; triage: This issue needs triage; unreproduced: No reproduction in a dev setting yet, further analysis blocked by that
Comments

@KodySalak

What were you doing?

  1. Was printing a regular print, and OctoPrint crashed.

What did you expect to happen?

For the print to keep going...?

What happened instead?

The OctoPrint python client crashed.

Did the same happen when running OctoPrint in safe mode?

Does not apply, this was a one-off and I wanted to report it.

Version of OctoPrint

1.3.9

Operating System running OctoPrint

Windows Server 2012 R2/Python 2.7.14

Printer model & used firmware incl. version

Creality Ender 3 / "Marlin V1; Sprinter/grbl mashup for gen6"

Browser and version of browser, operating system running browser

Chrome 70.0.3538.110 (Official) / Windows 10 1803

Link to octoprint.log

https://pastebin.com/rWuzvfAq

Link to contents of terminal tab or serial.log

NA, was unable to capture this information.

Link to contents of Javascript console in the browser

NA, was unable to capture this information.

Screenshot(s)/video(s) showing the problem:

NA, was unable to capture this information.

I have read the FAQ.
Yes.

@GitIssueBot GitIssueBot added the triage This issue needs triage label Nov 28, 2018
@foosel
Member

foosel commented Nov 28, 2018

Thank you for the thorough report!

Looks like that's an issue with Tornado under Windows, see tornadoweb/tornado#1802. How many clients did you have connected when this happened? I have never seen this particular error message before, and as far as I understand its origins, it should only be possible if significantly more connections are held open to the server than is common with OctoPrint.
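For illustration, here is a minimal sketch of the failure mode, not taken from OctoPrint or Tornado. It assumes the error comes from CPython's `select.select()`, which raises `ValueError` when it is handed more descriptors than the build allows (believed to be 512 on Windows builds). The helper name `select_accepts` is hypothetical.

```python
import select
import socket


def select_accepts(n_sockets):
    """Return True if select() accepts n_sockets descriptors in one call."""
    socks = []
    try:
        for _ in range(n_sockets):
            s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            socks.append(s)
            s.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
            s.listen(1)
        # Zero timeout: poll once and return immediately.
        select.select(socks, [], [], 0)
        return True
    except ValueError:
        # Raised by CPython when the descriptor set is too large for select(),
        # e.g. "too many file descriptors in select()" on Windows.
        return False
    finally:
        for s in socks:
            s.close()
```

Calling `select_accepts(8)` should succeed anywhere. Under the assumption above, a count above the Windows limit (say 600) would be expected to return `False` on Windows, which is why many long-lived idle connections can take the server down there.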

I'm not sure if that can be fixed in any way, but it can possibly be mitigated by changing the keepalive values as suggested in the linked ticket. I'll take a look into that.

In general though, I'd honestly recommend not using OctoPrint on Windows machines in production. Yes, it works thanks to being cross-platform, but running a full-fledged Windows machine for it (with potential unexpected reboots and other fun) kind of defeats the whole purpose.

@KodySalak
Author

KodySalak commented Nov 28, 2018 via email

@foosel
Member

foosel commented Nov 28, 2018

Huh, then I have no idea how that could even happen. I develop this software under Windows, so I should have seen this by now if it was a general problem (I have more than one connection open regularly). Odd.

@foosel foosel added this to the 1.3.11 milestone Nov 28, 2018
@foosel foosel changed the title "One-Off" error wanted to report. Crash due to "too many file descriptors in select()" by Tornado on Windows Nov 28, 2018
@KodySalak
Author

KodySalak commented Nov 28, 2018 via email

@KodySalak
Author

@foosel It happened again tonight, logs are here: https://pastebin.com/cY00ijNQ

@KodySalak
Author

Same environment variables as before, I think I'm going to update to the RC and see if it stops.

@foosel
Member

foosel commented Dec 11, 2018

The RC is no longer current; 1.3.10 stable was released yesterday. But I'd be surprised if that solves this.

foosel added a commit that referenced this issue Dec 12, 2018
@foosel
Member

foosel commented Dec 12, 2018

I've pushed a commit that sets Tornado's idle timeout to 10 minutes, but only under Windows, as suggested in the thread linked above. It is available on the maintenance branch, soon on devel, and will be rolled out with 1.3.11. That commit will not solve the problem, as it is inherent to how Tornado works on Windows, but it hopefully makes it less likely to run into this.
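As a rough sketch of what such a platform-specific timeout could look like (this is not the actual commit): the snippet assumes Tornado's `HTTPServer` accepts an `idle_connection_timeout` keyword in seconds. The helper `http_server_kwargs` and the constant name are hypothetical.

```python
import sys

# 10 minutes, expressed in seconds (the value mentioned in this thread).
WINDOWS_IDLE_TIMEOUT = 10 * 60


def http_server_kwargs(platform=sys.platform):
    """Build extra keyword arguments for the HTTP server (hypothetical helper).

    Only Windows gets a shorter idle timeout, so idle connections are
    closed sooner and fewer sockets end up in the select() call.
    """
    kwargs = {}
    if platform == "win32":
        kwargs["idle_connection_timeout"] = WINDOWS_IDLE_TIMEOUT
    return kwargs
```

Under that assumption, the result would be passed along when creating the server, for example `tornado.httpserver.HTTPServer(app, **http_server_kwargs())`. On other platforms the dictionary is empty and the server keeps its default behavior.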

I have still not been able to reproduce it myself under regular use, so something about your environment must be different. Possibly some network tooling keeps pinging the server and using up connections. I remember reading that some Netgear routers were fairly aggressive in that regard. If you keep running into this even after 1.3.11 with this patch goes live, you might want to switch to a Linux-based setup, since this patch is the only thing I'm able to do here. As you can see in the linked thread, the official recommendation from the Tornado people in such situations is also "don't use Windows".

@foosel foosel added unreproduced No reproduction in a dev setting yet, further analysis blocked by that needs testing Testing from the community is needed done Done but not yet released labels Dec 12, 2018
@KodySalak
Author

KodySalak commented Dec 12, 2018 via email

@foosel
Member

foosel commented May 15, 2019

1.3.11 was released yesterday.

@foosel foosel closed this as completed May 15, 2019
@github-actions

This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators May 28, 2020
@foosel foosel added the bug Issue describes a bug label Oct 8, 2020