Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Mailsync leaks handles #388

Closed
c-lan opened this issue Dec 3, 2017 · 13 comments
Closed

Mailsync leaks handles #388

c-lan opened this issue Dec 3, 2017 · 13 comments

Comments

@c-lan
Copy link

c-lan commented Dec 3, 2017

Mailsync processes running in the background leak massive amounts of OS handles. All of the leaked handles are handles to events that were never signaled.

Basically all operations on the account (including sync) increase the handle count by amounts varying from 1 to over 100k. Simply opening a message that Mailspring needs to download creates >1k of handles.
Log file is available here

Over time this leads to millions of open handles, which causes excessive memory usage and slight reduction of performance when opening unread messages. OS functionality is degraded - after certain number of handles, the NtQuerySystemInformation(SystemHandleInformation) API for querying all handles open in the OS starts to fail and debugging/security/anti-cheat tools cease to work correctly.

I encountered this issue with two Gmail accounts I tested.

Are there any related issues?

No

What operating system are you using?

Windows 10 x64 version 16299

What version of Mailspring are you using?

1.0.9

--

Bug?

Do you have any third-party plugins installed? If so, which ones?

No

Is the issue related to a specific email provider (Gmail, Exchange, etc.)?

Can't determine, all my accounts are on Gmail

Is the issue reproducible with a particular attachment, message, signature, etc?

Any message, any Gmail account

@bengotow
Copy link
Collaborator

bengotow commented Dec 3, 2017

Hey! Thanks for the detailed report, this is really interesting. Let me see if this is also happening on Mac and Linux, it's definitely odd that it'd be windows-specific.

@bengotow
Copy link
Collaborator

bengotow commented Dec 4, 2017

Quick update: I've been able to reproduce this. Inspecting the mailsync process in windbg shows that all of these leaked event handles are created here:

0x00007ffa47cc59d4: ntdll!NtCreateEvent+0x0000000000000014
0x00000000538146e2: wow64!whNtCreateEvent+0x0000000000000062
0x0000000053816245: wow64!Wow64SystemServiceEx+0x0000000000000155
0x0000000053801c87: wow64cpu!ServiceNoTurbo+0x000000000000000b
0x000000005381bdb2: wow64!RunCpuSimulation+0x0000000000000022
0x000000005381bcc0: wow64!Wow64LdrpInitialize+0x0000000000000120
0x00007ffa47c96e91: ntdll!_LdrpInitialize+0x00000000000000dd
0x00007ffa47c96d5e: ntdll!LdrInitializeThunk+0x000000000000000e
0x0000000076f8707c: ntdll_76f10000!NtCreateEvent+0x000000000000000c
0x00000000767fccd6: KERNELBASE!CreateEventExW+0x0000000000000066
0x00000000767fcc33: KERNELBASE!CreateEventA+0x0000000000000033
0x00000000658a28c1: pthreadVC2!pthread_timechange_handler_np+0x00000000000015c4
--------------------------------------
Handle = 0x00000000000ef940 - OPEN
Thread ID = 0x0000000000000f04, Process ID = 0x00000000000007f0

It looks like this particular event is broadcast to condition_variables (which mailsync uses pretty extensively) so they can evaluate whether they should wake (https://www.sourceware.org/pthreads-win32/manual/pthread_timechange_handler_np.html).

It's unclear why these aren't getting cleaned up, and also unclear why pthread_timechange_handler_np is being called so frequently. It's docs say "To improve tolerance against operator or time service initiated system clock changes... this routine can be called by an application when it receives a WM_TIMECHANGE message from the system.", which doesn't sound like something that should be happening a zillion times.

@bengotow
Copy link
Collaborator

bengotow commented Dec 4, 2017

Interesting - this page notes that virtual machine time sync can do this (https://www.greyware.com/kb/KB2015.401.asp: "Another possible cause of unexpected clock change are time sync features of virtual machines. Be sure that the VMWare Tools/Hyper-V Integration Services Time Sync features are turned off for your VMs.")

Any chance you're running Mailspring on Windows inside a VM? (I'm using VMWare Fusion)

@c-lan
Copy link
Author

c-lan commented Dec 4, 2017

No, but I have Hyper-V enabled. It always runs a hypervisor underneath my host system, but as far as I know no time sync features touch the host machine.

@bengotow
Copy link
Collaborator

bengotow commented Dec 4, 2017

Hey! Thanks for the quick reply. In my VM, I disabled Time Synchronization from the virtual machine settings and it didn't have any effect, but disabling Hyper-V in Windows (by typing bcdedit /set hypervisorlaunchtype off on an elevated command prompt) fixed the issue.

Definitely need to find a workaround for this (since Hyper-V is enabled by default I think...) but this should narrow it down a lot!

@c-lan
Copy link
Author

c-lan commented Dec 4, 2017

I just tried disabling Hyper-V according your suggestion and was still able to reproduce the bug.

@c-lan
Copy link
Author

c-lan commented Dec 4, 2017

Also, your htrace stacktrace is a bit misleading. pthreadVC2!pthread_timechange_handler_np+0x00000000000015c4 is in fact inside pthread_mutex_init exported function.

Edit: after a quick look I think the handles are created by the constructor named mailcore::AutoreleasePool::AutoreleasePool(void). The corresponding destructor function does not directly nor indirectly call pthread_mutex_destroy.

Edit2: not sure if this is the matching source, but could this be the culprit? The destructor does not free the mutex.
https://github.com/MailCore/mailcore2/blob/f708ce74e23b61ec6e5ae958eba0b8bcd8831a1e/src/core/basetypes/MCObject.cpp#L43

@c-lan
Copy link
Author

c-lan commented Dec 4, 2017

Not sure if github sends notifications about comment edits, so let me ping: @bengotow

@bengotow
Copy link
Collaborator

bengotow commented Dec 4, 2017

Hey! Ahh good catch—that MCObject lock is definitely suspicious. Just read it over and it seems like there should be a call to pthread_mutex_destroy(&mLock); in the object destructor. I'm going to insert it and see if things look better.

@bengotow
Copy link
Collaborator

bengotow commented Dec 4, 2017

I recompiled the app with that additional line and it 1) doesn't crash and 2) prevents the handle count from increasing indefinitely. (I also turned Hyper-V back on and confirmed that I see the 20k+ open handles without the change.)

I think that's a wrap! Really glad we got to the bottom of this. We've had some reports of Mailspring causing Windows to "hang for an extended period of time" when the computer wakes from sleep, and I think it was caused by this issue (having a zillion leaked handles around when WM_TIMECHANGE is emitted).

I'm gonna PR this change into Mailcore2 and we'll see if @dinhviethoa has any thoughts.

@dinhvh
Copy link

dinhvh commented Dec 4, 2017

Go ahead and send this PR! Thanks a lot!

@bengotow
Copy link
Collaborator

bengotow commented Dec 8, 2017

Hey! The fix for this has shipped in 1.0.10 - thanks for the help getting this narrowed down and fixed quickly.

@bengotow bengotow closed this as completed Dec 8, 2017
@foundry376-bot
Copy link

This issue has been mentioned on Mailspring Community. There might be relevant details there:

https://community.getmailspring.com/t/event-handle-leak-in-mailsync-exe-on-windows/1022/1

@Foundry376 Foundry376 locked and limited conversation to collaborators Mar 8, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

4 participants