-
-
Notifications
You must be signed in to change notification settings - Fork 794
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ChannesLiveServerTestcase hanging #962
Comments
|
If the empty test case is hanging, then that's the actual fault here, so I will ignore Selenium for now. Locally, I can get a similar test to pass: And this is, of course, on the newest master of all the packages. Did you say this only happens on channels master, not channels 2.0.2? If so, is it possible you could bisect and work out which commit does it? |
|
Yes, it's not Selenium. I don't know if you had seen that I updated my post to exclude it. The empty test case ( The first commit where the test doesn't pass any more is (as I assumed) 15f8b7d. |
|
Alright, so it's likely something with the
|
|
This is what I found out so far: MacOS: I couldn't see that another python/pytest process was started during the test. Ubuntu (17.10): I get the same error. But I see that another pytest process starts. This is the codebase for my tests: https://github.com/dgilge/channels-tests. I also tried a non-empty test. I did some tests with both, Python 3.5 and 3.6. |
|
I don't know if this helps but on my computer both tests in |
|
It's likely related, but I unfortunately do not have a Mac OS computer to test on, so I'll have to rely on your or someone else to work out exactly what part of it is failing... |
|
I don't know if you had seen that it failed on Ubuntu, too. Travis has the same issue: https://travis-ci.org/dgilge/channels-tests In which test environment did your test work? I think I was able to circumcribe the issue a bit. |
|
I am running it under an Ubuntu 16.04 environment inside of Windows Subsystem for Linux, so it may be that somehow the linux emulation layer fixes the file descriptor issue. The same code works fine for the Daphne test suite on Travis, though. |
|
The last released version without this issue on my macOS installation is |
|
I'll second that I can reproduce this issue on macOS with a very minimal test case: Running in console: I've isolated the Channels code that is hanging by stepping through with a debugger: As mentioned above, just before it hangs This bug implies that that Part 4 of the Channels Tutorial on automated testing no longer works on macOS. I can provide more detailed repro instructions if it would help. I have a repository of the Channels tutorial with the bug visible in it. |
|
It would be nice if you could play around with that line and see if there's anything we can do that can fix it? Daphne live tests have to run a subprocess, so we need some way to wait for it to be ready or we'll have to say you can't run these on OSX for whatever reason is blocking it... |
|
Sure I’ll give it a harder look. I suspect it’ll be a week or so before I find a time window. |
|
I've spent a couple more hours troubleshooting today. Between Channels 2.0.2 and 2.1.x the implementation of ChannelsLiveServerTestCase was rewritten to use multiprocessing, so the implementation is fairly different. The multiprocessing implementation crashes fairly deeply inside Twisted. It gets a "[Errno 9] Bad file descriptor" when trying to open a socket. Failing code path: I'm not very familiar with Twisted, so I don't have a lot of ideas for efficiently proceeding in investigating. I wonder if Twisted is having trouble running inside of a forked context, which sometimes does strange things to file descriptors. |
|
I'm not super familiar with the interior of Twisted either, unfortunately. |
|
Aye. I'll give it another hard look when I can next find a time block. |
|
I've done some more experiments by writing a barebones program that interacts with Twisted in a similar manner as ChannelsLiveServerTestCase. The following sequence causes Twisted to crash with If I fork before setting up the reactor, the problem goes away: Also, If I switch to the default SelectReactor it doesn't matter what order I fork in When running a ChannelsLiveServerTestCase, the sequence of events is:
So that's the problem: Twisted's AsyncioSelectorReactor can't be forked after it is setup. Maybe that's a bug on Twisted's end. Or maybe AsyncioSelectorReactor isn't designed to be used that way and Channels should instead be forking before trying to setup the AsyncioSelectorReactor. |
|
Thanks for the investigation. It's very hard to fork before things get started in Twisted's reactor system, unfortunately, or I'd say that was the easier option. Maybe it can be done with a lot of code moving. |
|
Running that Twisted test program above on Ubuntu 18 with the |
|
Sorry to interrupt the discussion. As a beginner trying to complete the tutorial, what is the suggested way to get around this issue? Thank you. |
|
Assuming you are using macOS, downgrade Channels to 2.0.2. I don't have enough bandwidth to fix this for probably a month or two, so I am planning on putting in a documentation change in the meantime giving similar advice. |
|
If Mac OS allows, I'd suggest tracing file descriptor activity in the first process and all threads/children, in particular I suspect something is closing the once-good file descriptor causing the EBADF from listen(2). Note, this doesn't have to be a close(2), other system calls can cause an FD to be closed too, e.g. O_CLOEXEC and an execve(2). |
…e default 'fork'. Fixes channels bug django/channels#962 Also allow the Server to be passed a logger, which is needed if using multiprocessing.
|
So I thought I had a fix for this, but it fails in our project. Here's some relevant documentation I've found while researching this issue:
I think the best option is to revert back to subprocess unfortunately. |
|
I was finally able to write a fix, so that tests for our project work on Linux and Mac (Yosemite and Mojave). (And selenium works) In summary, I changed ChannelsLiveServerTestCase to use a single (running for the duration of all tests) thread to run Daphne. I added some methods to Daphne to clear all running applications, and to change the application that is used for new connections. The single thread prevents the ReactorNotRestartable errors if there is more than one Live TestCase. Relevant commits are: I don't know if my fix will fail for complex django tests with multiple or dynamic database connections. I am by no means an expert with threading, twisted, etc., so these changes should be looked over thoroughly. I hope these commits can help someone else write a pull request for Channels and Daphne. Some things that should be taken care of:
|
|
I took a look at this issue quite deeply, spending a day debugging and seeking for solutions. Then I thought this is a design issue rather than just fixing-a-bug, so I chose to post my result on Google Groups instead of here. But as @carltongibson replied, I think I should also leave a comment here. In a nutshell, So, how to fix it? This is where a "design issue" comes in. In the big picture, there are three ways:
I am not sure which one is the best design. I've already opened a PR django/daphne#247 using the first solution (which I think is the easiest way to solve) even though I am not 100% satisfied because it looks like a cheating. I believe the third solution was (at least partially) implemented by @shjohnson-pi as he commented above. The second one? Hmm... I have no idea what advantages it has over the third one. What is your opinion? |
|
Hi @kuratowski I personally like your PR ( django/daphne#247 ) that implements the first solution. It's relatively concise. Unfortunately that PR has been open for several months now with no comments on it. I've pinged it to hopefully give it some attention. |
|
Yes. Somewhat complex issue. To which I’ve not yet had the time to look. Sorry about the delay there. (Although anyone is welcome to review...) |
…e default 'fork'. Fixes channels bug django/channels#962 Also allow the Server to be passed a logger, which is needed if using multiprocessing.
…e default 'fork'. Fixes channels bug django/channels#962 Also allow the Server to be passed a logger, which is needed if using multiprocessing.
|
I recently ran into a problem with my solution (singleton thread for Daphne) when run with pytest-xdist (parallel/multiprocess). Because of imports, twisted was using the asyncio event loop from the main thread, even though it was run in a subthread. This caused async_to_sync calls to fail in synchronous code because it thought the main event loop was being used (which it technically was by twisted). After trial and error I figured out how to force twisted to use a new asyncio event loop for it's own thread. And, as a bonus, the thread no longer has to be a singleton, so it properly closes resources at the end of each TestCase. New Summary of my Changes:
Channels:
I feel more confident about this code than I did before. Do you want me to start a pull request with these changes? And, if so, should I squash the commits?
|
|
@shjohnson-pi Yes please. Happy to review separate commits if that makes the changes clearer. |
|
Ref this, django/daphne#247 looks to resolve the test hang/failures on macOS, which is django/daphne#179. I'm minded to bring that in. @shjohnson-pi fancy taking a quick look? |
|
@carltongibson I believe the other pull request will fix the immediate problem, but doesn't allow mocking files during the test (since there are multiple processes). I have been using my branch to run our own project tests with extensive mocking, so for my personal use cases that pull request won't work. |
…e default 'fork'. Fixes channels bug django/channels#962 Also allow the Server to be passed a logger, which is needed if using multiprocessing.
Here's another issue I encountered...
What I expected to happen vs. what actually happened
This test should pass:
But it is hanging. Firefox opens and then nothing happens. In my "real" test it shows no URL and no content. I have to cancel the test.
You can even remove the selenium code showing the same behavior:
It works with Channels 2.0.2 though. Therefore I think the issue is related to Channels somehow.
OS and runtime environment, and browser
MacOS Sierra 10.12.6, Firefox 58.0.2
The versions of Channels, Daphne, Django, Twisted, and your ASGI backend (channels_redis normally)
How I'm running Channels
pytest
Console logs and full tracebacks of any errors
The text was updated successfully, but these errors were encountered: