Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HwndWrapper leaks class registrations (ATOMs returned by RegisterClassEx) #9026

Open
bbmmcodegirl opened this issue Apr 13, 2024 · 15 comments
Open
Labels
Investigate Requires further investigation by the WPF team.

Comments

@bbmmcodegirl
Copy link

bbmmcodegirl commented Apr 13, 2024

Description

The implementation of MS.Win32.HwndWrapper leaks Win32 Windows class registrations.

MS.Win32.HwndWrapper in its constructor calls the Win32 API RegisterClassEx function (and then creates a window with it using CreateWindowEx).

It holds unmanaged resource, so it also implements the Dispose pattern, and in its internal Dispose(bool, bool) it will schedule the corresponding UnregisterClass function. But it will only do that if the second parameter to Dispose(bool, bool), called isHwndBeingDestroyed, is true. And Dispose(bool, bool) is called in two locations, the ordinary Dispose() method that implements IDisposable, and in the finalizer. But in both calls, the second parameter is false. isHwndBeingDestroyed is never true. The class will never be unregistered.

As it happens, the ATOM returned by RegisterClassEx is is from the User Atom Table for which there is no useful API, so it is not possible to inspect this atom table or maybe search for old class registrations lying around. It also appears to be the case that these class registrations persist even when the process exits. This means that a WPF application that creates a HwndWrapper anywhere slowly eats up the available Windows class registrations. And only 16,384 are available in total. If an application unwittingly manages to exhaust them, this renders the operation system unusable. No window can be created anymore, not a Task Manager to shut down the process, not a Shutdown dialog with the option to shut down or restart the machine. This is fatal.

The User Atom Table is not really documented (or I haven't found the documentation), so it is not clear to me if it gets cleared when the user logs out, or only when the system restarts. It appears to definitely not get cleared when the process exits, judging from the effect it can have. From the name 'UserAtomTable' it sounds like it should be cleared when logging out.

Reproduction Steps

Warning. If you want to reproduce this, be prepared to restart your operating system afterwards, or at least log off. Have a command window open already to do that. Best try this only in a VM.

HwndWrapper is used in various parts of WPF. Here is a code snippet that demonstrates the effect, I have tried it:

class MyViewModel
{
    private Dispatcher _dispatcher;
    public MyViewModel()
    {
        _dispatcher = Dispatcher.CurrentDispatcher;
        Restart();
    }

    /// .. some properties here to display something while running ....

    private void Restart()
    {
        var thread = new Thread(MakeDispatcher);
        thread.Start();
    }
  
    private void MakeDispatcher()
    {
        try
        {
            var dispatcher = Dispatcher.CurrentDispatcher;

            int threadId = dispatcher.Thread.ManagedThreadId;
            _dispatcher.Invoke(() =>
            {
               /// note this happens on the UI thread. This is not relevant to the example, is just here to show how to make this display someting.
                // update some properties here
            });
            Restart();
        }
        catch (Win32Exception e)
        {
            _dispatcher.Invoke(() =>
            {
                Exception = $"{e.Message} ({e.NativeErrorCode})";
            });
         }
    }
}

When this comes to its Win32 exception, the system's class atom's are exhausted, and you better restart the machine (or log out). At the very least, close some windows.

You may ask what the real-world scenario is here, so why one should deliberately call Dispatcher.CurrentDispatcher on lots of different threads.

  • In fact, our production code did not call Dispatcher.CurrentDispatcher but CommandManager.InvalidateRequerySuggested() which internally calls Dispatcher.CurrentDispatcher.
  • We were on lots of different threads because some code wanted to update an UI button and forgot it was not on the UI thread but a random thread from a background process that it was being called on.

Expected behavior

Expected behavior would be to not to leak any class handles. Since class handles are invisible and the User Atom Table cannot be inspected, there is no obvious symptom at all except that the system will be unresponsive all of a sudden:

  • If one WPF application that accidentally does this runs for a long time, or
  • If the system is not restarted for a long time, and one WPF application that accidentally does this gets used a lot, even if it is closed down in between.

Actual behavior

The actual behavior is described above. If a WPF application manages to exhaust the available class registrations, it will get the error "Not enough memory resources are available to process this command" (Native error 8). But in addition, this makes the operation system completely unusable. No window can be created anymore. Not even the Task Manager, nor the Performance Monitor, nor the Shutdown Dialog can be created unless some other window is closed. If a user doesn't have the wits to control-tab to some other window, close it, and then start an orderly shutdown or logout, all they can do is hard-reset the system, i.e. use the power button to switch the computer off.

Regression?

From my code inspections, this is the case in .Net Framework 4.7.2 as well as the current code from dotnet core, and for sure all other versions in between.

Known Workarounds

If the depletion of windows class handles is very rapid, it will usually be because of an error. Normally it should not be necessary to use Dispatcher.CurrentDispatcher on lots of different threads. It should not even be necessary to create many threads (new Thread()) in the first place. Even if someone makes a mistake and calls Dispatcher.CurrentDispatcher on a background thread, if this thread is from the threadpool (Task.Run()), the number of lost class handles will be limited to at most the number of threadpool threads.

But there is no working around that the handles stay leaked even when the process exits, and so the only workaround is to restart the system occasionally.

Impact

Here is a story of some of the impact this had.

https://stackoverflow.com/questions/78319645/long-running-process-with-wpf-front-end-on-a-windows-ipc-fails-with-not-enough

On an actual machine out there in the real world, this failed a company's production process in progress, made some of their products unusable, and harmed the reputation of the manufacturer of the machine.

Configuration

As stated above, this code seems to be the same from at least .Net Framework 4.7.2 to now.
Operating system will be Windows, and it is a matter of the Win32 API, which underpins all versions of Windows AFAIK. any architecture. "Not enough memory resources are available to process this command" is an error of the Win32 API when RegisterClassEx fails. The problem will be happening until either WPF starts cleaning up the class registrations correctly, or Win 32 will start cleaning them up when a process exits.

Other information

https://referencesource.microsoft.com/#q=HwndWrapper

https://github.com/dotnet/wpf/blob/main/src/Microsoft.DotNet.Wpf/src/Shared/MS/Win32/HwndWrapper.cs

@miloush
Copy link
Contributor

miloush commented Apr 13, 2024

Please provide a repro project. I have WPF apps running for weeks and months without this issue, so it's not as simple as that, probably depends on what it is doing.

isHwndBeingDestroyed is never true. The class will never be unregistered.

if (message == WindowMessage.WM_NCDESTROY)
{
Dispose(/*disposing = */ true,
/*isHwndBeingDestroyed = */ true);

@bbmmcodegirl
Copy link
Author

bbmmcodegirl commented Apr 14, 2024

ClassRegistrationLeak.zip
Hi @miloush , thanks, I corrected myself. It's not happening with "any WPF app", but an app that is written in a certain way. I provided a codes snippet, and it's part of a runnable project I have actually made and that exhibited the problem for me. Attached.
Thanks for the code snippet. I cannot inspect directly what is happening in .Net as it runs. I cannot track the registered windows classes because AFAIK there is no API for it. I only conclude that they can't possibly all be unregistered because I have seen the effect happen.

@miloush
Copy link
Contributor

miloush commented Apr 14, 2024

Thanks @bbmmcodegirl! From my initial look the HwndWrapper is being diposed fine which tries to enqueue the DestroyWindow method on the dispatcher which would in turn unregister the class. However, it seems that posting a message to the window to process the queue is failing.

// Request{Foreground|Background}Processing can encounter failures from an
// underlying OS method. We cannot recover from these failures - they are
// typically due to the application flooding the message queue, or starving
// the dispatcher's message pump until the queue floods, and thus outside
// the control of WPF. WPF ignores the failures, but that can leave the
// dispatcher in a non-responsive state, waiting for a message that will
// never arrive. It's difficult or impossible to determine whether
// this has happened from a crash dump.

It feels that all things are the way they are supposed to, but the dispatcher never gets to dispatch the DestroyWindow operation. I am suspecting this might be because you simply terminate the thread without properly telling the dispatcher. Calling Dispatcher.CurrentDispatcher.InvokeShutdown() at the end of it (i.e. either at the end of Restart() or MakeDispatcher() method seems to resolve the issue, at least I don't see the number of atoms going up anymore.

There is some way to access the table, see e.g. https://github.com/JordiCorbilla/atom-table-monitor

@dipeshmsft dipeshmsft added the Investigate Requires further investigation by the WPF team. label Apr 15, 2024
@bbmmcodegirl
Copy link
Author

bbmmcodegirl commented Apr 15, 2024

Hi @miloush (Jan). Thanks for confirming that something appears not to be working as it should.
"because you simply terminate the thread without properly telling the dispatcher." In this contrived example application, I of course could call Dispatcher.CurrentDispatcher.InvokeShutDown().

But we have run into this in the real world, I only made this example to show how it's happening. In the real world, we are calling CommandManager.InvalidateRequerySuggested(), and we are doing that in a callback. The callback has no clue at all what thread it is being called on, and if it is the last piece of code that needs to execute on that thread. In fact, it is not. In the real-world example, the thread originates in a library that the programmer of the callback does not even have direct access to, and after all callbacks have been called, control returns to that library, and that's where the thread finally ends. The library cannot invoke the shutdown of the Dispatcher, it does not even have a dependency on WPF.

Also in the real world, we have, of course, now implemented a different workaround where we simply don't call CommandManager.InvalidateRequerySuggested() on the thread we are being called on, but on the UI thread of the application.

But this problem has caused us and our customers problems until we found out about this. We ran into this problem unwittingly. And so might someone else.

"There is some way to access the table, see e.g. https://github.com/JordiCorbilla/atom-table-monitor".
Thanks, yes I had seen and even downloaded that repo. But it doesn't contain a program that produces the fancy screenshots posted in the wiki for that repository. It only contains Pascal/Delphi code for an 'AtomScanner' service. All that service does is scan two particular Atom tables that do have a Win32 API, calling two API functions and write their contents into text files. The Win32 API functions it uses are GetClipboardFormatName and GlobalGetAtomName. And contrary to what the screenshots seem to suggest, neither of those contain the register/unregister window methods being called, or the User Atom table. So the Atom Table Monitor is not, at this stage, of any use for this problem.

More to the point, the Task Manager has a column User objects, and those also do not appear to count the class registrations. At least I do not see this count increasing when I know that the application is leaking class registrations. Please, if you are from Microsoft, can you find someone from the Win32 API team who knows how to query and enumerate the current window class registrations?

@miloush
Copy link
Contributor

miloush commented Apr 15, 2024

Well if you are the one who creates a dispatcher then it seems fair you should be the one to shut it down. I am not quite sure what you are hoping WPF would do here else other than offer you a way to unregister the class. Can you create your own thread with dispatcher and then shut it down when you are done? Or keep one dispatcher for yourself that you would use to invoke the code you need.

But it doesn't contain a program that produces the fancy screenshots posted in the wiki for that repository

The releases of the repo has a zip file which has ATOMTableMonitor.exe. It does list the HwndWrapper registrations as RWM.

@czdietrich
Copy link

czdietrich commented Apr 15, 2024

Wow, stumbling upon this issue shocked me for some degree, because I wanted to create the very exact issue last Friday, but because of time shortage I postponed it to today and realized someone created it already.

Last week we encountered the exact same problem. We had some FileSystemWatcher or other event source in our code that in the end triggered CommandManager.InvalidateRequerySuggested() on a background thread.
None of our test scenarios has shown any issues but then we received crash reports containing the error "Not enough memory resources are available to process this command" when customers had run our application for a longer period of time.
Since we've already had a similar issue in the past I knew about the cause of the issue and that it can be kind of debugged with the ATOMTableMonitor.exe.

The first time we stumbled upon that problem was when we tried to offload image loading to background threads to make the UI thread more responsive. If again the application runs for a long enough amount of time the atom table will be exhausted which then also destabilizes the whole system.

Well if you are the one who creates a dispatcher then it seems fair you should be the one to shut it down.

I kind of have to disagree on that. Most WPF APIs do not hint in any way that a Dispatcher is created which especially will leak system resources, just to name a few: CommandManager.InvalidateRequerySuggested(), new BitmapImage(), new RectangleGeometry().

In my opinion the fail in design comes from Dispatcher.CurrentDispatcher to secretly create a new Dispatcher when there is not already one created before for that thread. It would have been better to enforce developers to explicitly create an instance before allowing the usage of these WPF objects, then it would be much clearer to also explicitly shut down the Dispatcher.

Some idea to tackle this problem could be to disallow the implicit creation of Dispatcher objects, but that's probably a not so ideal option, since it's a change in behavior and therefore would need to be placed behind an app-switch.
A maybe better option would be to add a static event handler for when a new Dispatcher is about to be created. So a developer can manually handle these cases and for example log this event or explicitly crash the application if just unintended.

We should also consider to move the creation of the MessageOnlyHwndWrapper to PushFrame or alike to only create it when the message queue is actually processed to not accidentally leak it when not even handled.

@miloush
Copy link
Contributor

miloush commented Apr 15, 2024

In my opinion the fail in design comes from Dispatcher.CurrentDispatcher to secretly create a new Dispatcher when there is not already one created before for that thread.

Dispatcher.FromThread returns null if there isn't any. Dispatcher.CurrentDispatcher checks that and if there isn't any, creates one, that seems the main purpose of it.

It could have been called EnsureDispatcher() and nothing would change. If someone is at fail, it would be the callsites.

CommandManager.InvalidateRequerySuggested checks for the dispatcher being null, which is odd, since it can never be null. Perhaps the intent here indeed was to use Dispatcher.FromThread. Then InvalidateRequery wouldn't do anything if there isn't any dispatcher, you would have to create your own dispatcher. But even creating the dispatcher works fine, the issue - and so far it appears to me to be the reason of the leak - is the unexpected termination of the dispatcher thread.

BitmapImage and RectangleGeometry are DispatcherObjects, documented as "an object that is associated with a Dispatcher". It is hardly a surprise that they require a Dispatcher.

We should also consider to move the creation of the MessageOnlyHwndWrapper to PushFrame or alike to only create it when the message queue is actually processed to not accidentally leak it when not even handled.

Note that this will not solve the reported issue, because the CommandManager posts an operation to the dispatcher and relies on the message to the hwnd wrapper to process the queue. And it will not fix the "leak" anyway if you push frame and then terminate the thread.

I am not saying (yet) the situation cannot be improved, but I don't see any obvious improvement opportunities without breaking existing code.

@bbmmcodegirl
Copy link
Author

bbmmcodegirl commented Apr 15, 2024

@czdietrich thanks for posting your problem with the same issue.

@miloush "the issue - and so far it appears to me to be the reason of the leak - is the unexpected termination of the dispatcher thread." On one hand, you are right, on the other hand, in my case for example, the creator of the thread lies somewhere buried in a legacy library that was used. And this is not the fault of the (current) implementer. At the time of creation the thread is a complete background thread, it's not a dispatcher thread. It only becomes one because an unwitting developer used an innocuous-looking WPF method in it, not even on their own fully aware that (CommandManager.InvalidateRequeryRequested() happened inside a RaiseCanExecuteChanged().

Nobody is "shutting down the thread", the thread happens to accidentally end because at the place where it was created, in the library, the thread function ended. If it had been a Task instead of a Thread, the thread would have survived the end of the function.

I agree that it would probably be hard to fix that without breaking existing code. I go with czdietrich's suggestion, at some point there could be an event raised. Maybe even an exception thrown. Imagine the signature of the method was CommandManager.InvalidateRequerySuggested(bool throwIfDispatcherMissing = false). Or if it was decorated with an attribute, something like RequireDispatcher. Or yes, a static event that gets triggered when a dispatcher is created. In particular it strikes me as wrong that Dispatcher does not implement IDisposable when it has a field that does. This would of course only help when people were directly writing using Dispatcher.CurrentDispatcher , because then the Dispose() would probably happen in the same thread. It would not help when a Dispatcher gets created under the hood like in CommandManager.InvalidateRequeryRequested(). There is just no clue to the developer here that this is happening. There is no documentation to that end.

At the very least there should be a more explicit warning about explicitly created Threads in the documentation than just the very weak statement "In .NET Framework 4, the TPL is the preferred way to write multithreaded and parallel code." in https://learn.microsoft.com/en-us/dotnet/standard/parallel-programming/task-parallel-library-tpl.

@bbmmcodegirl
Copy link
Author

.. a dumb question on the side. Why does the UnregisterClass have to happen on the same thread that the class was registered on?

@czdietrich
Copy link

czdietrich commented Apr 15, 2024

Dispatcher.FromThread returns null if there isn't any. Dispatcher.CurrentDispatcher checks that and if there isn't any, creates one, that seems the main purpose of it.

It could have been called EnsureDispatcher() and nothing would change. If someone is at fail, it would be the callsites.

CommandManager.InvalidateRequerySuggested checks for the dispatcher being null, which is odd, since it can never be null. Perhaps the intent here indeed was to use Dispatcher.FromThread. Then InvalidateRequery wouldn't do anything if there isn't any dispatcher, you would have to create your own dispatcher. But even creating the dispatcher works fine, the issue - and so far it appears to me to be the reason of the leak - is the unexpected termination of the dispatcher thread.

Still I think the decision to make Dispatcher.CurrentDispatcher to silently create a new instance is not ideal and instead should through an exception if Dispatcher.Run(), Dispatcher.EnsureDispatcher(), etc. wasn't called before. But we won't change this anyway.

But even creating the dispatcher works fine, the issue - and so far it appears to me to be the reason of the leak - is the unexpected termination of the dispatcher thread.

I think the following happens for us:
Some starting event -> offload to a thread pool thread using Task.Run -> then e.g creating a new BitmapImage and freeze it (Dispatcher.Run() is never invoked) -> Dispatcher and MessageOnlyHwndWrapper are no longer referenced and will be garbage collected at some point -> the finalizer of MessageOnlyHwndWrapper is invoked and tries to queue the destruction on the Dispatcher -> but since the Dispatcher never processed window events, that event will also be garbage collected at some point and the class registration is leaked.

Note that this will not solve the reported issue, because the CommandManager posts an operation to the dispatcher and relies on the message to the hwnd wrapper to process the queue.

This is ok from my side, calling CommandManager.InvalidateRequeryRequested from a background thread was unintended anyway, that's the reason I would prefer an exception or an event handler to detect this unintended behavior.

And it will not fix the "leak" anyway if you push frame and then terminate the thread.

I'm not sure about this, but maybe we point to different things here.
I just wanted to point out that creating the MessageOnlyHwndWrapper only when PushFrame/ Run is invoked would be the counter operation to the destruction of the MessageOnlyHwndWrapper, when the last dispatcher frame is exited, which is already the case with the current code.
Therefore no invocation is needed because the code already runs on the correct thread at that point and though can just destroy it directly.

@czdietrich
Copy link

.. a dumb question on the side. Why does the UnregisterClass have to happen on the same thread that the class was registered on?

I would guess that UnregisterClass is not thread affine and can be called on any thread.
But you need also to destroy the MessageOnlyHwndWrapper window using DestroyWindow which must be called on the same thread the window was created.

https://learn.microsoft.com/en-us/windows/win32/api/winuser/nf-winuser-destroywindow#remarks

@miloush
Copy link
Contributor

miloush commented Apr 15, 2024

In general I would think it is "not polite" to create a dispatcher on thread that doesn't belong to you. I am sure there are other ways the thread owner can "screw up" the dispatcher than just exiting the thread.

I understand that the issue is that people are generally not aware that something creates a dispatcher. There isn't that many CurrentDispatcher calls, maybe we could review them to see if they are justified.

It sounds like if CommandManager threw if not called from the dispatcher thread would have avoided the problems in both your cases. But then again had the thread you called it on not unexpectedly exited, it wouldn't be an issue to have a dispatcher created. This is one of the issues trying to fix this. Whatever mechanism we come up with to warn people that dispatcher is created when they might be on a background thread, it's not really a problem until the thread exits or is terminated without shutting down the dispatcher. There is no thread events that the dispatcher could use to dispose itself.

@czdietrich
Copy link

BitmapImage and RectangleGeometry are DispatcherObjects, documented as "an object that is associated with a Dispatcher". It is hardly a surprise that they require a Dispatcher.

I agree, but the issue is not really that they require a Dispatcher (this is just fine) but that they silently create/instantiate it. Which wouldn't be a problem either if this would not leak Windows resources by default. Which is fairly easy to be overlooked.

There are even some more things to consider here:
You need to know that you must call Dispatcher.InvokeShutdown() even though you did not instantiated it (directly). But then you cannot just call Dispatcher.InvokeShutdown() after e.g. using BitmapImage, because at that point you don't know whether your code created the Dispatcher or someone else before that still needs it. And even when you are sure that you must call Dispatcher.InvokeShutdown() in the end, it cannot be safely used together with thread pool threads, because the Dispatcher object is cached and another thread pool task may run on the same thread, but then you get a Dispatcher that cannot be run again, since it is already shut down.

@bbmmcodegirl
Copy link
Author

@czdietrich wrote: "offload to a thread pool thread using Task.Run() ... -> the finalizer of MessageOnlyHwndWrapper is invoked and tries to queue the destruction on the Dispatcher -> but since the Dispatcher never processed window events, that event will also be garbage collected at some point and the class registration is leaked."

@miloush says "it's not really a problem until the thread exits or is terminated"

But in the case described by @czdietrich, the call chain was invoked with Task.Run(), and the threadpool thread would still be running. Who is right? Can we even leak resources in the same way when the parent call was made on a threadpool thread?

@miloush
Copy link
Contributor

miloush commented Apr 16, 2024

To clarify, I am attributing the post message failure to the thread being gone, because the other option (of message flooding) seems unlikely at this point. I didn't see the actual error code.

I would also assume that the task scheduler has freedom to manage threads as it sees fit. If it keeps it alive at the time of GC of the dispatcher, it either should work or the cause is different (but not any more helpful). If it decides to span new threads we are back at the original scenario.

Either way, the fix is to not create dispatchers on threads you don't control. I believe we are in agreement here with @czdietrich and that the problem is in dispatchers being created without developers suspecting it. Unfortunately it seems the best we can do is document the cases somewhere.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Investigate Requires further investigation by the WPF team.
Projects
None yet
Development

No branches or pull requests

4 participants