Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BSOD on cleanup #372

Closed
2 of 4 tasks
sdecugis opened this issue Oct 20, 2016 · 12 comments
Closed
2 of 4 tasks

BSOD on cleanup #372

sdecugis opened this issue Oct 20, 2016 · 12 comments

Comments

@sdecugis
Copy link

sdecugis commented Oct 20, 2016

Environment

  • Windows version: Win 10 anniversary, 64 bits
  • Processor architecture: amd64
  • Dokany version: 1.0.0 release (also tested with a newer build, same effect) -- using the Dbg release for the logs below, but the same happens with the normal release.
  • Library type (Dokany/FUSE): Dokany

Check List

  • I checked my issue doesn't exist yet
  • My issue is valid with mirror default sample and not specific to my user-mode driver implementation
  • I can always reproduce the issue with the provided description below.
  • I have updated Dokany to the latest version (https://github.com/dokan-dev/dokany/releases) and have reboot my computer after.

Description

With the newer Win10 and Windows Defender from this version, we are seeing a BSOD that happens randomly but frequently. We narrowed the sequence to the following (just highlighting the relevant data, the log spreads above 200k lines and contains confidential information):

  • process 1 (my.exe)
==> DokanCreate (my.dll)
FCB and CCB are allocated
==> DokanCompleteCreate : FILE_OPENED, status = STATUS_SUCCESS

from this point, we can see my.dll in the list when DokanGetFCB enumerates the entries, with FCB filecount = 1.

  • process 2 (MsMpEng.exe)
==> DokanCreate (my.dll)
 the log shows the kernel is in DokanGetFCB, and starts enumerating the FCB entries looking for my.dll
  • process 1 (my.exe)
==> DokanCleanup
<== DokanCleanup

The DokanGetFCB enumeration related to MsMpEng.exe continues, and the existing FCB for my.dll is found.

at some point, DokanCompleteCleanup is called but I'm not sure of the exact sequence (many instances in the log).

when other tasks are enumerating entries in DokanGetFCB, we can see my.dll has FCB filecount =2

  • process 2 (MsMpEng.exe)
==> DokanCompleteCreate status STATUS_ACCESS_DENIED. (this is expected, our user callback returns this status).
[DokanFS]    IRP_MJ_CREATE failed. Free CCB:FFFFDA06CF2D2530. Status 0xc0000022
[DokanFS]    Free CCB

when other tasks are enumerating entries in DokanGetFCB, we can see my.dll has FCB filecount = 1

  • process 1 (my.exe)
DokanClose (my.dll) ==> this triggers a breakpoint (under WinDbg) followed by a BSOD as follows:

[DokanFS] ==> DokanClose
[DokanFS]   ProcessId 7788
[DokanFS]   FileName: MY.DLL FCB.FileName: MY.DLL
[DokanFS]    UserContext:53CD178
[DokanFS]    Free CCB:FFFFDA06CF52BBC0
[DokanFS]    Free CCB 
[DokanFS]   Free FCB:FFFFDA06C77FED40
Break instruction exception - code 80000003 (first chance)
*** ERROR: Module load completed but symbols could not be loaded for dokan1.sys
dokan1+0xb123:
fffff800`57b3b123 cc              int     3
5: kd> kP
 # Child-SP          RetAddr           Call Site
00 ffff8b80`fca7a4a8 fffff801`1e6579c6 nt!DbgBreakPointWithStatus
01 ffff8b80`fca7a4b0 fffff801`1e6573b5 nt!KeEnterKernelDebugger+0x20a
02 ffff8b80`fca7a510 fffff801`1e5ce3c4 nt!KeInitializeEnumerationContextFromAffinity+0x8b5
03 ffff8b80`fca7ac20 fffff801`1e5d9429 nt!KeBugCheckEx+0x104
04 ffff8b80`fca7ac60 fffff801`1e5d8bbc nt!setjmpex+0x3f39
05 ffff8b80`fca7ada0 fffff801`1e5d46bd nt!setjmpex+0x36cc
06 ffff8b80`fca7ade0 fffff801`1e4dfa01 nt!_chkstk+0x5d
07 ffff8b80`fca7ae10 fffff801`1e4de7b4 nt!RtlCaptureStackBackTrace+0x1529
08 ffff8b80`fca7b510 fffff801`1e5d9502 nt!RtlCaptureStackBackTrace+0x2dc
09 ffff8b80`fca7bbe0 fffff801`1e5d77bd nt!setjmpex+0x4012
0a ffff8b80`fca7bdc0 fffff801`1e4aaf51 nt!setjmpex+0x22cd
0b ffff8b80`fca7bf50 fffff801`1e56a815 nt!SeAccessCheckFromState+0x2e5
0c ffff8b80`fca7bf80 fffff80c`9cc43027 nt!FsRtlUninitializeOplock+0x1f1
0d ffff8b80`fca7bfe0 fffff80c`9cc4226c dokan1+0x3027
0e ffff8b80`fca7c020 fffff80c`9cc47e2c dokan1+0x226c
0f ffff8b80`fca7c0e0 fffff80c`9cc47c10 dokan1+0x7e2c
10 ffff8b80`fca7c120 fffff80c`9b125206 dokan1+0x7c10
11 ffff8b80`fca7c170 fffff80c`9b123146 FLTMGR!FltIsCallbackDataDirty+0x316
12 ffff8b80`fca7c200 fffff801`1e8e6e1d FLTMGR!FltDecodeParameters+0x3d6
13 ffff8b80`fca7c260 fffff801`1e8e2298 nt!NtWriteFile+0xcdd
14 ffff8b80`fca7c2e0 fffff801`1e533681 nt!ObOpenObjectByNameEx+0x21e8
15 ffff8b80`fca7c340 fffff801`1e9005b5 nt!ObfDereferenceObject+0xa1
16 ffff8b80`fca7c380 fffff801`1e5849a6 nt!FsRtlGetFileSize+0x13c9
17 ffff8b80`fca7c3c0 fffff801`1e5f92e5 nt!FsRtlGetVirtualDiskNestingLevel+0x196
18 ffff8b80`fca7c460 fffff801`1e4e6dd9 nt!memset+0x1ee65
19 ffff8b80`fca7c4b0 fffff801`1e8bf8a3 nt!RtlAvlRemoveNode+0xaf9
1a ffff8b80`fca7c5e0 fffff801`1e8bf687 nt!MmMapViewOfSection+0x107b
1b ffff8b80`fca7c650 fffff801`1e905a90 nt!MmMapViewOfSection+0xe5f
1c ffff8b80`fca7c6d0 fffff801`1e96a24a nt!KeUserModeCallback+0x9a0
1d ffff8b80`fca7c810 fffff801`1e4dc632 nt!FsRtlKernelFsControlFile+0x124e
1e ffff8b80`fca7c840 fffff801`1e5d1940 nt!KiCheckForKernelApcDelivery+0x342
1f ffff8b80`fca7c8d0 fffff801`1e5d903a nt!KeSynchronizeExecution+0x2790
20 ffff8b80`fca7ca10 00007ff8`302c8624 nt!setjmpex+0x3b4a
21 00000000`001ce348 00000000`72ae74ce ntdll+0xa8624
22 00000000`001ce350 00000000`72ae6ec5 wow64!Wow64SystemServiceEx+0x77e
23 00000000`001ce3e0 00000000`72a51cf7 wow64!Wow64SystemServiceEx+0x175
24 00000000`001ceca0 00000000`72afbfa1 wow64cpu!TurboDispatchJumpAddressEnd+0xb
25 00000000`001ced50 00000000`72aecbb0 wow64!Wow64KiUserCallbackDispatcher+0x4151
26 00000000`001cedd0 00007ff8`30298ebd wow64!Wow64LdrpInitialize+0x120
27 00000000`001cf080 00007ff8`30298d5e ntdll+0x78ebd
28 00000000`001cf100 00000000`00000000 ntdll+0x78d5e

In user-space library, the log shows that the last callback that was involved was Cleanup.

Note:

  • if we disable MsMpEng.exe
  • or, if we allow it to open the file successfully (instead of responding access denied) in our callback, this BSOD does not happen.

Logs

Minidump generated after the BSOD attached.

102016-74718-01.zip

@Liryna
Copy link
Member

Liryna commented Oct 20, 2016

Hi @sdecugis ,

This seems to be the same as #344
Still the FsRtlUninitializeOplock that seems the crash. The OP of the other issue didn't gave much information about how he fixed it 😢

So it does not offen happen ?
I dream a little but, it would be really great to have a sample to reproduce this or to even retrace this call and see if it oplock object really passed the FsRtlInitializeOplock

@sdecugis
Copy link
Author

Hi @Liryna , indeed the crash happens within FsRtlUninitializeOplock in both case, but from a different call path (in #344 it seems to be at the time of file creation while in my case it is when the file is cleanup). It could be the same root cause still.

Since it is kernel memory being corrupted, I really have no access to that memory so I am not sure I could do anything in my callbacks to improve the situation. It looks a kernel driver issue to me, do you agree ?

The issue happens very often with some of our sequences of using the virtual filesystem (almost never runs to completion), while some other relatively similar sequence never hit it... I was not able to clearly qualify what makes it happen more or less often.

We have one workaround at the moment of stopping MsMpEng.exe but that s far from ideal...

I'll write to you separately concerning your dream :D

@Liryna
Copy link
Member

Liryna commented Oct 20, 2016

@sdecugis I agree it is a kernel issue that has to be fixed ofc.

I am just wondering, can you test with the RC4 to see if it also happen ?

@sdecugis
Copy link
Author

I ll try to get this tested tomorrow as well, not sure when I can get feedback...

@Liryna Liryna added this to the 1.0.1 milestone Oct 23, 2016
@sdecugis
Copy link
Author

Hi,

A couple of additional information:

  • yes it can be reproduced with RC4
  • I can reproduce it sometimes with the attached modified mirror, and executing a "test.exe" in the mirror FS that loads/unloads many libraries (for example a game exiting or similar). -- the BSOD happens when the test.exe is quitting usually.
    mirror.c.txt

@Liryna Liryna modified the milestones: 1.0.2, 1.0.1 Nov 2, 2016
@Liryna
Copy link
Member

Liryna commented Nov 26, 2016

A fixed has been proposed here https://github.com/dokan-dev/dokany/tree/back-out-oplock and waiting test result.

@velislav87
Copy link

Hi guys, seems that the BSOD described here is similar to my reported one(#344). Since we did fix it(unintentionally unfortunately) I can speculate on what caused the BSOD - in my opinion our application was causing it by holding the CreateFile or FindFilesWithPattern method calls too long, and it seems that using multiple drive instances exaggerated the BSOD occurrence.

If you need any further logs I can try to retrieve them, unfortunately it is difficult because the BSOD corrupts the WinDbg program as well as any debug info it collected.

@Liryna
Copy link
Member

Liryna commented Jan 5, 2017

Hi @velislav87 ,

Humm what you describe is different from what we have been to detect during our test. Maybe your BSOD was different in your case ? Only a dump analyse can answer to this 😢

@velislav87
Copy link

Hi @Liryna ,

do you have any idea on how to go about analyzing the issue further? I can collect, analyze and upload the minidump file from the crash, however any debug info gathered by WinDbg is lost after the BSOD. Will the information in the minidump be enough for the investigation?

@Liryna
Copy link
Member

Liryna commented Jan 5, 2017

You can analyse the minidump on the machin after the bsod/reboot and post the report.
https://github.com/dokan-dev/dokany/wiki/How-to-Debug-Dokan#crash-report-bsod

@Liryna
Copy link
Member

Liryna commented Feb 26, 2017

We still need some feedback of people facing the issue here before merging the fix .

@Liryna
Copy link
Member

Liryna commented May 10, 2019

IRP MJ CREATE cancel is now implemented ea6c3fd

@Liryna Liryna closed this as completed May 10, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants