dotnet test keeps hanging #1537
From @livarcocc on Tuesday, 21 August 2018 23:19:10: @natemcmaster do you know who owns […]
From @livarcocc on Tuesday, 21 August 2018 23:19:10, quoting @natemcmaster on August 21, 2018 21:11: That's my team. You can move this to aspnet/Home.
From @natemcmaster on Wednesday, 22 August 2018 00:21:54: @dustinmoris can you share more logs? What do you see in the console before the tests hang? Also, the VSTest diagnostic logs may provide information about which process is stuck. Unlike […]
From @dustinmoris on Friday, 24 August 2018 22:31:36: Before the tests hang I just see the normal .NET Core output from building the project: […]
Because it hangs I had to cancel the build. After running the tests: testlog.host.18-08-24_23-22-45_88253_4.txt. This seems to be the point at which it starts hanging; after this line it just keeps saying […]
Let me know if there's more information I can provide!
From @natemcmaster on Monday, 27 August 2018 15:39:47 Thanks for the test logs. As far as I can tell, this rules out issues in the test runner itself. There appears to be some kind of deadlock in the test code. Gathering a process dump is the next step for investigating this. When dotnet test runs, there will be several "dotnet.exe" processes. I recommend using Process Explorer to find the right dotnet.exe process for which to collect a full process dump. The dotnet.exe process which is hanging will have startup arguments like this:
Once you have a process dump, we can look at where the threads are stuck. You should be able to open .dmp files in Visual Studio. |
Hi, so I have figured out that the deadlock was caused by […]. Initially I had this problem on my MBP and on my Win 10 machine, and after changing some tests I have fixed the issue there. However, I am still getting a deadlock on AppVeyor, Travis and Azure Pipelines, on both the Ubuntu and Windows images, which is really weird, because on two independent Windows 10 machines and on my MacBook Pro I can't reproduce this anymore. I feel like I am wasting so much time here trying to figure out why the tests deadlock after upgrading the NuGet package.
You should be able to pull the latest version from the develop branch of Giraffe and then run […]. With TestHost 2.0 the build works perfectly fine; after updating only the NuGet version of TestHost from 2.0 to 2.1, the […]. The weird thing is that on my macOS Mojave MBP, my Windows 10 Home laptop and my Windows 10 Professional desktop I am not able to get the tests to hang either. I don't think it is an AppVeyor issue though, because the same also happens in TravisCI and Azure Pipelines, and TestHost 2.0 works perfectly on ALL environments. There is something incredibly weird going on here and I hope you can look into this, as there might be some major flaw in TestHost 2.1. I'm more than happy to assist, but I would hope that Microsoft could take a bit more of the lead on investigating an issue which has only emerged after you made changes to your TestHost library.
FYI I see two changes in TestHost 2.1 that might have an impact on test execution: […]
Edit: hadn't read the history above...
This doesn't repro locally for me on Win10 either. We use AppVeyor and TravisCI ourselves and haven't seen any issues like this. We have known those systems to be flakier if only because they run so much slower. That can make them prone to uncovering race conditions that you wouldn't normally hit during development. It seems like the only way to repro this right now is to send you a PR and try to debug it. |
Thanks @Tratcher for getting back so quickly and checking this yourself! I have narrowed it down to these tests: […]. What they both have in common is that the TestHost is being created with a content root and that they both stream a file. The streamFile handler is implemented here, which essentially calls into StreamCopyOperation.CopyToAsync. When I remove both files from the compilation by commenting them out in the […]
That's a good point, however I think this is not the case here, because the tests have been deadlocking consistently on 3 different CI environments (AppVeyor, Travis and Azure Pipelines) every single time for many weeks now (since TestHost 2.1 was released). They are also passing on all my machines consistently, and some of them are not particularly powerful either. I was thinking that perhaps what the CI servers have in common is that the tests are being executed from a Docker container? I am not sure how these environments are spawned, but that could be a reason, or maybe they all run on a VM/container which has only a single core/CPU allocated. I've got time this evening to look into this in more detail. If you have some concrete ideas on what to try or how best to debug it, I'll do my best to get it done today.
The best thing to do is to narrow it down to a single test that fails. It may take a bit of binary searching where you delete half of the tests and see what happens. |
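The bisection suggested above can be mechanized. Here is a hedged sketch in Python (used as a language-neutral stand-in; the test names and the hangs() predicate are invented for illustration, and it assumes a single test is responsible rather than an interaction between tests, which later turned out not to be the case here):

```python
def find_culprit(tests, hangs):
    """Bisect a list of tests to isolate one that makes the suite hang.

    `hangs(subset)` runs only that subset (e.g. under a timeout) and
    returns True if the run deadlocks. Assumes exactly one culprit.
    """
    candidates = list(tests)
    while len(candidates) > 1:
        half = candidates[: len(candidates) // 2]
        # Keep whichever half still reproduces the hang.
        candidates = half if hangs(half) else candidates[len(half):]
    return candidates[0]

# Toy run: pretend "test_stream_file" is the one that blocks.
tests = ["test_a", "test_b", "test_stream_file", "test_d"]
culprit = find_culprit(tests, lambda subset: "test_stream_file" in subset)
print(culprit)  # test_stream_file
```

Each round halves the candidate set, so even a few hundred tests need only a handful of CI runs.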
I also see that the tests target both net461 and netcoreapp2.1. I've tried dropping net461 in my PR. |
Ok, I can help you with this, because I've already done a lot of these tests where I was removing everything but a few tests. I'll add commits to your PR if that is ok with you? |
Go right ahead. |
@davidfowl here's another threadpool exhaustion variant: giraffe-fsharp/Giraffe#317 (comment). xunit limits the number of threads dynamically based on the number of cores. On CIs like AppVeyor and Travis that could be a very low number (2?). It also runs tests in parallel. If tests start blocking any of those threads then it can quickly starve the threadpool, block any async operations, and cause a full deadlock.
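The starvation mechanism described above can be sketched in miniature (Python as a language-neutral illustration; the one-worker pool stands in for xunit's small per-core thread limit on a 1-2 core CI box, and all names here are invented for the demo):

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError

# A pool with a single worker plays the role of a tiny CI threadpool.
pool = ThreadPoolExecutor(max_workers=1)

def blocking_test():
    # Schedule async-style work on the same pool, then block waiting for
    # its result -- the moral equivalent of calling Task.Result in .NET.
    inner = pool.submit(lambda: 42)
    # The inner task can never start: the pool's only worker is busy
    # right here, blocked waiting for it.
    return inner.result(timeout=2)

outer = pool.submit(blocking_test)
try:
    outer.result(timeout=10)
    deadlocked = False
except TimeoutError:
    deadlocked = True

print(deadlocked)  # True: the pool starved itself
```

With real blocking waits (no timeout) this hangs forever, which is exactly the symptom seen on the CI machines; on a developer box with more cores the pool is large enough that the inner work usually finds a free thread.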
x-thread update. This is mostly resolved by the debugging/discussion at giraffe-fsharp/Giraffe#317. The tests were starving threads via Task.Result and making them properly async addressed the issue. The TestHost 2.1 changes discussed in #1537 (comment) appear to have made TestHost more sensitive to thread starvation, likely by running more callbacks on the threadpool rather than inline. The only question remaining is which of the two checkins caused this and what background operation is stuck. |
Closing as there's no plan to pursue this. |
I just ran across the same issue, and switching everything to async/await fixed it, cheers all |
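The fix the thread converges on, replacing blocking waits with await end to end, looks like this in miniature (Python asyncio as a stand-in for C# async/await; the function names are illustrative, not from the Giraffe code):

```python
import asyncio

async def stream_file():
    # Stand-in for real async I/O such as StreamCopyOperation.CopyToAsync.
    await asyncio.sleep(0.01)
    return "contents"

async def handler():
    # Awaiting yields the thread back to the scheduler instead of parking
    # it the way a blocking .Result call would, so a tiny pool cannot
    # starve no matter how many handlers are in flight.
    return await stream_file()

result = asyncio.run(handler())
print(result)  # contents
```

The design point is that "async all the way down" removes every spot where a worker thread sits idle waiting on work that needs a worker thread to complete.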
From @livarcocc on Tuesday, 21 August 2018 23:19:09
From @dustinmoris on August 21, 2018 20:20
Steps to reproduce
Run .\build.ps1 (or just run: dotnet test ./tests/Giraffe.Tests/Giraffe.Tests.fsproj -f netcoreapp2.1). The build will succeed.
If you go to the Giraffe.Tests.fsproj file, change […] to […] and re-run the tests, then you'll see that it will hang forever, as if something went into a deadlock.
Previously I was using dotnet xunit, which also didn't have problems, but since dotnet xunit has been discontinued I switched over to dotnet test, which is giving me problems now. This issue also doesn't exist if I downgrade the test project to netcoreapp2.0.
Expected behavior
Tests will not deadlock with Microsoft.AspNetCore.TestHost version 2.1.*.
Actual behavior
Tests hang forever.
Environment data
dotnet --info output: […]
Copied from original issue: dotnet/cli#9874
Copied from original issue: dotnet/aspnetcore#3443