Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Flaky test: ProfessionalColorTable_ChangeUserPreferences_GetColor_ReturnsExpected deadlock #3254

Closed
RussKie opened this issue May 11, 2020 · 18 comments
Labels
🪲 bug Product bug (most likely) test-bug Problem in test source code (most likely)

Comments

@RussKie
Copy link
Member

RussKie commented May 11, 2020

Problem description:

After #3226 was merged, ProfessionalColorTable_ChangeUserPreferences_GetColor_ReturnsExpected tests started deadlocking in x86 mode that caused CI builds to fail again.
I managed to reproduce the deadlock locally, though it took few attempts to do so.

Looks like the test deadlocks on itself, there are no other user-code executed:
image

image

Expected behavior:

The tests work as expected.

Minimal repro:

A repro is bit convoluted. Unfortunately build.cmd -test -platform x86 command doesn't appear to work unless Winforms.sln is configured for x86 platform (which breaks other modes).

@RussKie RussKie added the test-bug Problem in test source code (most likely) label May 11, 2020
@RussKie
Copy link
Member Author

RussKie commented May 11, 2020

/cc: @hughbe

@weltkante
Copy link
Contributor

weltkante commented May 11, 2020

Looks like the test deadlocks on itself, there are no other user-code executed:

When it looks like its "deadlocking on itself" it instead is usually sending a window message to a control without message loop.

SendMessageOnUserPreferenceChange seems to send a message to the desktop, which probably tries to notify all forms, so if there is any that has not been properly disposed and still hanging around without a message loop you'll probably see this hang.

You could try running the test under RemoteExecutor like the other one @hughbe recently moved, forgot which it was, something with visual styles I think. Though considering that its sending to Desktop it may actually broadcast across processes, so not entirely sure if RemoteExecutor helps here.

Alternatively figure out which control is not properly disposed, but that probably needs some tracking infrastructure.

[edit] might have misunderstood the comments, "desktop" might mean "Desktop Framework" and not the "Desktop HWND", going to check the implementation of SystemEvents

@weltkante
Copy link
Contributor

weltkante commented May 11, 2020

The SystemEvents class will "take over" the STA thread from which it is invoked (by creating a HWND on it that it never disposes) and expects it to live until the process terminates.

That means SystemEvents are incompatible with multiple UI threads UI threads that terminate early, you'll have to test SystemEvents in a RemoteExecutor.

Alternatively, if you could force SystemEvents to be first accessed on an MTA thread, then it would create and maintain its own STA thread and not cause failures. But I don't see how you can easily enforce that the first access to SystemEvents happens on MTA.

@RussKie
Copy link
Member Author

RussKie commented May 11, 2020

Thank you.

Alternatively figure out which control is not properly disposed,

That's probably not too difficult - throw an exception in ~Control :) Though I think we could drown in those.

You could try running the test under RemoteExecutor

The trouble with the RE - dotnet/arcade#5325
So while it is runnable locally, it doesn't appear to be runnable on the build agents...

@RussKie
Copy link
Member Author

RussKie commented May 11, 2020

The SystemEvents class will "take over" the STA thread from which it is invoked and expects it to live until the process terminates.

That means SystemEvents are incompatible with multiple UI threads, you'll have to test SystemEvents in a RemoteExecutor.

Alternatively, if you could force SystemEvents to be first accessed on an MTA thread, then it would create and maintain its own STA thread and not cause failures. But I don't see how you can easily enforce that the first access to SystemEvents happens on MTA.

Wow, thank you!

@weltkante
Copy link
Contributor

That's probably not too difficult - throw an exception in ~Control :) Though I think we could drown in those.

Yeah and it doesn't help here either way, the HWND which deadlocks doesn't inherit from Control, SystemEvents has its own implementation.

@RussKie
Copy link
Member Author

RussKie commented May 11, 2020

If it deadlocks on a undisposed control, wouldn't we just need to catch those via the finalizer calls?
Or is there something else that you have in mind?

@weltkante
Copy link
Contributor

weltkante commented May 11, 2020

Its a native control (HWND) which is created for message dispatching purposes (not visible) - it doesn't inherit from WinForms Control (SystemEvents doesn't seem to be part of WinForms, had to inspect the assembly in ILSpy to see what it was doing, the source doesn't seem to be indexed on source.dot.net either). The HWND is created the first time someone accesses SystemEvents, it checks the apartment. If its STA it creates the HWND on the current thread, otherwise it creates a separate STA thread.

So if your first invocation of SystemEvents is on a WinFormsFact then SystemEvents will create its broadcast HWND on that thread, but the thread is not going to live until process exit, so any message sent to SystemEvents by anything else than the very first WinFormsFact will hang.

@hughbe
Copy link
Contributor

hughbe commented May 11, 2020

FYI it is implemented here: https://github.com/dotnet/runtime/blob/master/src/libraries/Microsoft.Win32.SystemEvents/src/Microsoft/Win32/SystemEvents.cs

Would a fix be to change the WinFormsFact to be Fact?

@weltkante
Copy link
Contributor

weltkante commented May 11, 2020

FYI it is implemented here

Thanks, the line I'm referring to is here. Uninitialization only happens on AppDomain (on Desktop) or Process shutdown, obviously neither is going to happen when your UI threads finish their test.

Would a fix be to change the WinFormsFact to be Fact?

Probably, but you need to enforce that on every usage of SystemEvents (including indirect usages). I don't know if WinForms itself depends on SystemEvents, but if the first invocation happens indirectly through Application.Run or a Form constructor or something similar you still are flaky since you depend on the "right" test to be executed first.

Running it in RemoteExecutor seems to be the safer thing to do to make this particular test reliable.

@Tanya-Solyanik
Copy link
Member

We had seen this bug in the .NET framework, in Watson crash dumps and from customer service. Here is a small repro in the .Net Framework. Would be good to have it fixed.

using System;
using System.Threading;
using System.Threading.Tasks;
using System.Windows.Forms;

namespace TwoStaThreads
{
    static class Program
    {
        [STAThread]
        static void Main()
        {
            Thread staThread = new Thread(NonAlwaysPumpingThreadProc)
            {
                IsBackground = true
            };
            staThread.SetApartmentState(ApartmentState.STA);
            staThread.Start();

            Application.Run(new Form
            {
                Text = "Main UI Thread"
            });
        }

        static void NonAlwaysPumpingThreadProc()
        {
            Form f1 = new Form
            {
                Text = "non-pumping thread - close me",
                Size = new System.Drawing.Size(600, 100)
            };

            f1.ShowDialog();

            // at this point we are no longer pumping messages, 
            // but the thread and it's message only marshaling window is alive
            while (true)
            {
                Thread.Sleep(1000);
            }
        }

    }
}

@weltkante
Copy link
Contributor

weltkante commented May 13, 2020

Just for the record, your snippet has a bug, a form that is displayed with ShowDialog needs explicit disposal, you only get automatic disposal when calling Show. That is because after ShowDialog the user usually wants to read out dialog results and if the handles are already gone it may not be possible to read all entered data anymore.

I don't think you can do anything if a STA thread decides to no longer pump messages, this is basically a bug in user code the user has to fix. Any COM call on an STA object created on this thread would hang as well, so this problem is not .NET specific. Its a general problem and the users fault, marking a thread STA means it needs to pump messages or be shut down.

The design issue with SystemEvents is that there is no infrastructure to uninitialize itself when a UI thread shuts down (which is different from stopping to pump messages). It is designed assuming there is only one UI thread and it will live to the end of the process/app domain lifetime.

@JeremyKuhne JeremyKuhne added 🪲 bug Product bug (most likely) and removed test-bug Problem in test source code (most likely) labels May 14, 2020
@JeremyKuhne
Copy link
Member

This is something we've gotten user feedback on (CSS & Watson).

@JeremyKuhne JeremyKuhne added this to the 5.0 milestone May 14, 2020
@RussKie RussKie added the test-bug Problem in test source code (most likely) label May 20, 2020
@Tanya-Solyanik
Copy link
Member

This could be fixed in this test, but it would be even better to re-design how SystemEvents are initialized.

@hughbe
Copy link
Contributor

hughbe commented May 20, 2020

would have to open issue in dotnet/runtime. @weltkante you're an infinitely busy guy but if could open an issue, would be awesome since you know everything!

@weltkante
Copy link
Contributor

weltkante commented May 20, 2020

Created an issue, not sure I can do much more, basically just analyzed the stacktrace and did some guesswork. The cleanest solution would be if thread shutdowns were observable, then it could automatically unregister the HWND before the thread is gone.

[edit] apparently they moved my issue back to winforms, waiting for this to be resolved

@weltkante
Copy link
Contributor

weltkante commented May 30, 2021

This has now been fixed in the runtime for .NET 6

@RussKie
Copy link
Member Author

RussKie commented May 30, 2021

Thank you ;)

@RussKie RussKie closed this as completed May 30, 2021
@ghost ghost removed this from the 6.0 milestone May 30, 2021
@ghost ghost locked as resolved and limited conversation to collaborators Feb 1, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
🪲 bug Product bug (most likely) test-bug Problem in test source code (most likely)
Projects
None yet
Development

No branches or pull requests

6 participants