
BSODs after DokanMounter service restart #26

Closed
don-Pardon opened this issue Jun 24, 2015 · 51 comments

@don-Pardon

Hi guys!
I've been investigating a BSOD issue in the dokanx fork, then I found out about the dokany fork and checked whether the issue reproduces here. It does, so I think you guys should know about it too.
Here is the summary:
When you mount your FS on some drive letter, restart the DokanMounter service and then kill that FS application, a BSOD occurs. The full (lengthy) description is here: BenjaminKim/dokanx#47
Hope to get any comments and/or suggestions.
Thanks.

@Liryna
Member

Liryna commented Jun 25, 2015

Hi !
I have already had such a BSOD: the device is not unmounted and nothing controls it, so the system becomes unstable until the OS crashes.

I think the service should force an unmount when it is killed or restarted, and also notify mirrorfs that the device has been removed.

We should probably even add a ping system from the service to mirrorfs in case something goes wrong with one of them.

These are only suggestions, because I do not know whether the driver already has such a safeguard, but it seems not.

(I hope I have understood your ticket properly.)

@don-Pardon
Author

Yep, you got it right.
I've also thought about forcing an unmount when the service is being restarted; it would solve the problem. Moreover, is there any use case for restarting the DokanMounter service while a mounted FS is running?
On the other hand, a service restart doesn't affect the mounted FS: it is still mounted and working properly, so why turn it off?

@Liryna
Member

Liryna commented Jun 25, 2015

I think we should see the service as a guard. If someone asks it to restart or stop, it should clean everything up before leaving.

On the other hand service restart doesn't affect mounted FS, it is still mounted and is working properly, so why turn it off?..

I have never tested such a case 😃, but since nobody will be left to clean up the device, we will get a BSOD. Better to give the mirror a chance to handle it than to break the OS.

To me, if the service restarts, it means that something went wrong or that someone forgot to clean up their running "mirror" properly, and neither of these is a normal use case.

@don-Pardon
Author

Alright, I agree, cleaning is a preferable solution.
The other problem is when the service is killed or simply crashes. The ping system sounds good (and I haven't found anything similar in dokan yet), but we need to take into account that the FS can perform some heavy operations, like downloading a chunk of data for a ReadFile request, so the FS will not answer ping requests for some time.

I have never tested such case

You can try =) but don't forget to run dokan_control.exe /u Z: /f when you're finished =)

@Liryna
Member

Liryna commented Jun 26, 2015

Have you already used DOKAN_OPTION_KEEP_ALIVE? It says that it is for auto unmount.
When enabled, it sends IOCTL_KEEPALIVE to the kernel driver to update an internal timer.
If the timer expires, the device is unmounted.

DokanCheckKeepAlive(

If this really works, killing the service with this option enabled should be handled.
Can you test with this option?
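To make the keep-alive mechanism concrete, here is a minimal user-mode model of the idea described above, not the driver's code: each keep-alive refreshes a timestamp, and a periodic check decides that the device should be unmounted once the timeout has elapsed. `keepalive_t`, `keepalive_ping` and `keepalive_expired` are hypothetical names, and the caller is assumed to supply a monotonic millisecond clock.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical keep-alive tracker (not DokanCheckKeepAlive itself).
   All times are milliseconds from a monotonic clock supplied by the caller. */
typedef struct {
    uint64_t last_ping_ms;  /* time of the most recent keep-alive */
    uint64_t timeout_ms;    /* how much silence is tolerated */
} keepalive_t;

static void keepalive_init(keepalive_t *ka, uint64_t now_ms, uint64_t timeout_ms)
{
    ka->last_ping_ms = now_ms;
    ka->timeout_ms = timeout_ms;
}

/* Called whenever the user-mode side sends a keep-alive. */
static void keepalive_ping(keepalive_t *ka, uint64_t now_ms)
{
    ka->last_ping_ms = now_ms;
}

/* Called periodically by a watchdog; true means "unmount the device". */
static bool keepalive_expired(const keepalive_t *ka, uint64_t now_ms)
{
    return now_ms - ka->last_ping_ms > ka->timeout_ms;
}
```

With a 15 s timeout (an arbitrary value for illustration), a device pinged at t = 10000 ms is still considered alive at t = 25000 ms and stale one millisecond later. Regarding the heavy-operation concern raised earlier: if the keep-alive were sent from the same thread that serves slow requests, a long ReadFile could trip the watchdog, so the ping would need its own thread or a generous timeout.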

@don-Pardon
Author

I've been testing with that option turned on. The thing is: when auto unmount is triggered, DokanService searches for the mount entry, and if the service was restarted, the corresponding record won't be found and DokanControlUnmount won't be called. It would be called if the DOKAN_CONTROL_OPTION_FORCE_UNMOUNT option were specified (as in a dokan_control /u Z: /f call).
So another thought comes up: if KEEP_ALIVE was specified, use the FORCE_UNMOUNT option. But I'm not sure that it wouldn't lead to other issues, and moreover "KEEP_ALIVE" and "FORCE_UNMOUNT" are kind of opposite ideas.
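The lookup problem can be modelled in a few lines. This is a hypothetical sketch rather than dokan's mounter.c: `mount_table_t`, `find_entry` and `handle_unmount` are invented names, `FORCE_UNMOUNT` stands in for DOKAN_CONTROL_OPTION_FORCE_UNMOUNT, and the real unmount is reduced to a result code. It assumes the forced path has a mount point to act on, and shows why a restarted service, whose table is empty, can only get through on the force path.

```c
#include <stddef.h>
#include <string.h>

#define MAX_ENTRIES 8
#define FORCE_UNMOUNT 1u  /* stand-in for DOKAN_CONTROL_OPTION_FORCE_UNMOUNT */

typedef enum { UNMOUNT_FAIL, UNMOUNT_NORMAL, UNMOUNT_FORCED } unmount_result_t;

/* Hypothetical in-memory mount table kept by the service.
   After a service restart it starts out empty. */
typedef struct {
    const char *device_names[MAX_ENTRIES];
    size_t count;
} mount_table_t;

static int find_entry(const mount_table_t *t, const char *device_name)
{
    for (size_t i = 0; i < t->count; i++)
        if (strcmp(t->device_names[i], device_name) == 0)
            return (int)i;
    return -1;
}

static unmount_result_t handle_unmount(const mount_table_t *t,
                                       const char *device_name,
                                       unsigned option)
{
    if (find_entry(t, device_name) >= 0)
        return UNMOUNT_NORMAL;   /* entry known: regular unmount */
    if (option == FORCE_UNMOUNT)
        return UNMOUNT_FORCED;   /* entry lost: only a forced unmount proceeds */
    return UNMOUNT_FAIL;         /* entry lost, no force: device stays mounted */
}
```

Under the "KEEP_ALIVE implies FORCE_UNMOUNT" idea, the auto-unmount path would simply pass `FORCE_UNMOUNT` instead of 0 for devices mounted with DOKAN_OPTION_KEEP_ALIVE.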

@Liryna
Member

Liryna commented Jun 26, 2015

@don-Pardon Oh ok I see ><

@Maxhy
Member

Maxhy commented Jul 10, 2015

Looks like @marinkobabic's fixes resolved this issue. I cannot reproduce your BSOD with these changes, whereas I could with previous versions. Could you try this pre-release and let me know if you still get a BSOD? https://github.com/dokan-dev/dokany/releases/tag/0.7.3-RC2
Thanks.

@Maxhy
Member

Maxhy commented Jul 20, 2015

@don-Pardon could you give https://github.com/dokan-dev/dokany/releases/tag/0.7.3-RC3 a try? Thanks.

@viciousviper

don-Pardon's results notwithstanding, I can report that dokany 0.7.3-RC3 greatly reduces BSODs on my machine.
I still get crashes whenever I leave a mount point open after aborting a DokanNet.Dokan instance in the debugger and try to mount a second time, but that's fair enough. With 0.7.3-RC2, the BSOD appeared right after aborting the debug session.

@marinkobabic
Contributor

@Maxhy
There are a few changes that you should merge as well.

@viciousviper
If you could provide more details using WinDbg and the command !analyze -v when you have opened your crash dump, that would be great.

There is no reason for a BSOD. We must identify the problem and solve it.

@Liryna
Member

Liryna commented Jul 29, 2015

Big thank you @marinkobabic! Your contributions are always welcome!
I made a pre-release with your changes.

@viciousviper could you test this version and make a report using WinDbg as marinkobabic explained?
https://github.com/dokan-dev/dokany/releases/tag/v0.7.3-RC4

@viciousviper

There you go.
This happened last night with v0.7.3-RC4 while debugging a Dokan.Net application in VS2015 RC:

-- removed misleading .dmp from devenv.exe

@Liryna
Member

Liryna commented Jul 30, 2015

Maybe I am missing something, but the crash report is from devenv.exe:

PROCESS_NAME: devenv.exe

Are you debugging dokan with VS when you run it? If so, could you run dokan without VS and make a new crash report?

@viciousviper

Well, yes, just as I wrote above. So far I've only witnessed BSODs after I aborted a VS debugging session on my still very incomplete Dokan.Net application.
I'll see if I can make my machine crash without the help of VS :-)

@Liryna
Member

Liryna commented Jul 30, 2015

Oh sorry! I missed this information 😄 haha

So far, you have never been able to cause a crash without VS?
To me, the crash with VS looks much more related to the stability of the current VS 2015 RC.

@viciousviper

You certainly have a point there. I'll probably get around to upgrading my VS to 2015 final in the next couple of days. However, while I did see my share of exceptions and crashes inside VS 2015 RC, I never had regular BSODs until I started to fiddle with dokan.
No offense intended - I'm just trying to help nail down the BSOD source.

In the meantime, how's this:

Microsoft (R) Windows Debugger Version 6.3.9600.17336 AMD64
Copyright (c) Microsoft Corporation. All rights reserved.

Loading Dump File [C:\Windows\MEMORY.DMP]
Kernel Bitmap Dump File: Only kernel address space is available

************* Symbol Path validation summary **************
Response Time (ms) Location
Deferred SRV*C:\Windows\symbol_cache*http://msdl.microsoft.com/download/symbols
Symbol search path is: SRV*C:\Windows\symbol_cache*http://msdl.microsoft.com/download/symbols
Executable search path is:
Windows 8 Kernel Version 9600 MP (8 procs) Free x64
Product: WinNt, suite: TerminalServer SingleUserTS
Built by: 9600.17736.amd64fre.winblue_r9.150322-1500
Machine Name:
Kernel base = 0xfffff8037b210000 PsLoadedModuleList = 0xfffff8037b4e9850
Debug session time: Thu Jul 30 22:35:28.516 2015 (UTC + 2:00)
System Uptime: 0 days 0:39:30.251
Loading Kernel Symbols
...............................................................
................................................................
................................................................
.............
Loading User Symbols

Loading unloaded module list
............


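*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************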

Use !analyze -v to get detailed debugging information.

BugCheck 7E, {ffffffffc0000005, fffff8037b2c896b, ffffd001687af758, ffffd001687aef60}

*** ERROR: Module load completed but symbols could not be loaded for dokan.sys
Probably caused by : dokan.sys ( dokan+2523 )

Followup: MachineOwner

2: kd> !analyze -v


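*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************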

SYSTEM_THREAD_EXCEPTION_NOT_HANDLED (7e)
This is a very common bugcheck. Usually the exception address pinpoints
the driver/function that caused the problem. Always note this address
as well as the link date of the driver/image that contains this address.
Arguments:
Arg1: ffffffffc0000005, The exception code that was not handled
Arg2: fffff8037b2c896b, The address that the exception occurred at
Arg3: ffffd001687af758, Exception Record Address
Arg4: ffffd001687aef60, Context Record Address

Debugging Details:

EXCEPTION_CODE: (NTSTATUS) 0xc0000005 - The instruction at 0x%08l referenced memory at 0x%08lx. The memory could not be "%s".

FAULTING_IP:
nt!IopfCompleteRequest+c1b
fffff803`7b2c896b 488b4018 mov rax,qword ptr [rax+18h]

EXCEPTION_RECORD: ffffd001687af758 -- (.exr 0xffffd001687af758)
ExceptionAddress: fffff8037b2c896b (nt!IopfCompleteRequest+0x0000000000000c1b)
ExceptionCode: c0000005 (Access violation)
ExceptionFlags: 00000000
NumberParameters: 2
Parameter[0]: 0000000000000000
Parameter[1]: 0000000000000019
Attempt to read from address 0000000000000019

CONTEXT: ffffd001687aef60 -- (.cxr 0xffffd001687aef60;r)
rax=0000000000000001 rbx=ffffe00107d5d910 rcx=0000000000000884
rdx=0000000000000000 rsi=0000000000000000 rdi=0000000000000001
rip=fffff8037b2c896b rsp=ffffd001687af990 rbp=ffffd001687afa90
r8=0000000000000001 r9=ffffe00107cb2650 r10=0000000000000000
r11=ffffd001687afac8 r12=00000000a000000c r13=0000000000000000
r14=ffffe00108a19200 r15=00000000a0000003
iopl=0 nv up ei pl nz na pe nc
cs=0010 ss=0018 ds=002b es=002b fs=0053 gs=002b efl=00010202
nt!IopfCompleteRequest+0xc1b:
fffff8037b2c896b 488b4018 mov rax,qword ptr [rax+18h] ds:002b:0000000000000019=????????????????
Last set context:
rax=0000000000000001 rbx=ffffe00107d5d910 rcx=0000000000000884
rdx=0000000000000000 rsi=0000000000000000 rdi=0000000000000001
rip=fffff8037b2c896b rsp=ffffd001687af990 rbp=ffffd001687afa90
r8=0000000000000001 r9=ffffe00107cb2650 r10=0000000000000000
r11=ffffd001687afac8 r12=00000000a000000c r13=0000000000000000
r14=ffffe00108a19200 r15=00000000a0000003
iopl=0 nv up ei pl nz na pe nc
cs=0010 ss=0018 ds=002b es=002b fs=0053 gs=002b efl=00010202
nt!IopfCompleteRequest+0xc1b:
fffff8037b2c896b 488b4018 mov rax,qword ptr [rax+18h] ds:002b:0000000000000019=????????????????
Resetting default scope

PROCESS_NAME: System

CURRENT_IRQL: 0

ERROR_CODE: (NTSTATUS) 0xc0000005 - The instruction at 0x%08l referenced memory at 0x%08lx. The memory could not be "%s".

EXCEPTION_PARAMETER1: 0000000000000000

EXCEPTION_PARAMETER2: 0000000000000019

READ_ADDRESS: unable to get nt!MmNonPagedPoolStart
unable to get nt!MmSizeOfNonPagedPoolInBytes
0000000000000019

FOLLOWUP_IP:
dokan+2523
fffff800`de02e523 4883c428 add rsp,28h

BUGCHECK_STR: AV

DEFAULT_BUCKET_ID: NULL_CLASS_PTR_DEREFERENCE

ANALYSIS_VERSION: 6.3.9600.17336 (debuggers(dbg).150226-1500) amd64fre

LAST_CONTROL_TRANSFER: from fffff800de02e523 to fffff8037b2c896b

STACK_TEXT:
ffffd001687af990 fffff800de02e523 : ffffe0011026f080 ffffe00108a19200 ffffe00108a19200 ffffd00100000000 : nt!IopfCompleteRequest+0xc1b
ffffd001687afad0 fffff800de031f7c : 0000000000025090 ffffe00108a19260 fffffff600000002 fffff80000000006 : dokan+0x2523
ffffd001687afb00 fffff800de031c3d : ffffe00108a191d0 0000000000000000 ffffe00107177501 00007ff800000000 : dokan+0x5f7c
ffffd001687afb60 fffff8037b31036c : 0000000000000000 ffffe0010b985880 ffffe0010b985880 fffff90140819b10 : dokan+0x5c3d
ffffd001687afc00 fffff8037b3672c6 : ffffd001657e4180 ffffe0010b985880 ffffd001657f03c0 0000000000000000 : nt!PspSystemThreadStartup+0x58
ffffd001687afc60 0000000000000000 : ffffd001687b0000 ffffd001687aa000 0000000000000000 0000000000000000 : nt!KiStartSystemThread+0x16

SYMBOL_STACK_INDEX: 1

SYMBOL_NAME: dokan+2523

FOLLOWUP_NAME: MachineOwner

MODULE_NAME: dokan

IMAGE_NAME: dokan.sys

DEBUG_FLR_IMAGE_TIMESTAMP: 55b8993b

STACK_COMMAND: .cxr 0xffffd001687aef60 ; kb

FAILURE_BUCKET_ID: AV_dokan+2523

BUCKET_ID: AV_dokan+2523

ANALYSIS_SOURCE: KM

FAILURE_ID_HASH_STRING: km:av_dokan+2523

FAILURE_ID_HASH: {9d91f95c-aa94-4f23-2d77-0f802f2a29b4}

Followup: MachineOwner

@marinkobabic
Contributor

The last crash is interesting. You should get the matching symbol file and then run the analyze command again, so that we get a clear stack trace.

@viciousviper

I'd be happy to help if someone (Maxhy?) could provide me with the .pdb for dokan.sys 0.7.3-RC4.

@Liryna
Member

Liryna commented Jul 31, 2015

Sorry @viciousviper, the pdb files have been erased by a new build 😢.
Could you install this version of 0.7.3-RC4 and reproduce the crash?
http://download.islog.com/dokan/

The .sys pdb files for this build are in x64.rar.

For the next releases, I will add the pdb files next to the installer on the download page.

@viciousviper

Ok, updated to your special build. Now I'll have to see if I can crash nicely again #-)

@viciousviper

Another one bites the dust ...

Microsoft (R) Windows Debugger Version 6.3.9600.17336 AMD64
Copyright (c) Microsoft Corporation. All rights reserved.

Loading Dump File [C:\Windows\MEMORY.DMP]
Kernel Bitmap Dump File: Only kernel address space is available

************* Symbol Path validation summary **************
Response Time (ms) Location
OK D:\Temp\Source
Deferred SRV*C:\Windows\symbol_cache*http://msdl.microsoft.com/download/symbols
Symbol search path is: D:\Temp\Source;SRV*C:\Windows\symbol_cache*http://msdl.microsoft.com/download/symbols
Executable search path is:
Windows 8 Kernel Version 9600 MP (8 procs) Free x64
Product: WinNt, suite: TerminalServer SingleUserTS
Built by: 9600.17736.amd64fre.winblue_r9.150322-1500
Machine Name:
Kernel base = 0xfffff80082e19000 PsLoadedModuleList = 0xfffff800830f2850
Debug session time: Wed Aug 5 00:34:07.875 2015 (UTC + 2:00)
System Uptime: 0 days 0:23:19.610
Loading Kernel Symbols
...............................................................
................................................................
................................................................
..............
Loading User Symbols

Loading unloaded module list
.............



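*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************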



Use !analyze -v to get detailed debugging information.

BugCheck 7E, {ffffffffc0000005, fffff80082ed196b, ffffd00024215758, ffffd00024214f60}

Probably caused by : dokan.sys ( dokan!DokanCompleteIrpRequest+2b )

Followup: MachineOwner

2: kd> !analyze -v



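*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************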



SYSTEM_THREAD_EXCEPTION_NOT_HANDLED (7e)
This is a very common bugcheck. Usually the exception address pinpoints
the driver/function that caused the problem. Always note this address
as well as the link date of the driver/image that contains this address.
Arguments:
Arg1: ffffffffc0000005, The exception code that was not handled
Arg2: fffff80082ed196b, The address that the exception occurred at
Arg3: ffffd00024215758, Exception Record Address
Arg4: ffffd00024214f60, Context Record Address

Debugging Details:

EXCEPTION_CODE: (NTSTATUS) 0xc0000005 - The instruction at 0x%08lx referenced memory at 0x%08lx. The memory could not be "%s".

FAULTING_IP:
nt!IopfCompleteRequest+c1b
fffff800`82ed196b 488b4018 mov rax,qword ptr [rax+18h]

EXCEPTION_RECORD: ffffd00024215758 -- (.exr 0xffffd00024215758)
ExceptionAddress: fffff80082ed196b (nt!IopfCompleteRequest+0x0000000000000c1b)
ExceptionCode: c0000005 (Access violation)
ExceptionFlags: 00000000
NumberParameters: 2
Parameter[0]: 0000000000000000
Parameter[1]: 0000000000000019
Attempt to read from address 0000000000000019

CONTEXT: ffffd00024214f60 -- (.cxr 0xffffd00024214f60;r)
rax=0000000000000001 rbx=ffffe00194206ee0 rcx=0000000000000884
rdx=0000000000000000 rsi=0000000000000000 rdi=0000000000000001
rip=fffff80082ed196b rsp=ffffd00024215990 rbp=ffffd00024215a90
r8=0000000000000001 r9=ffffe001943dc820 r10=0000000000000000
r11=ffffd00024215ac8 r12=00000000a000000c r13=0000000000000000
r14=ffffe0019254f700 r15=00000000a0000003
iopl=0 nv up ei pl nz na pe nc
cs=0010 ss=0018 ds=002b es=002b fs=0053 gs=002b efl=00010202
nt!IopfCompleteRequest+0xc1b:
fffff80082ed196b 488b4018 mov rax,qword ptr [rax+18h] ds:002b:0000000000000019=????????????????
Last set context:
rax=0000000000000001 rbx=ffffe00194206ee0 rcx=0000000000000884
rdx=0000000000000000 rsi=0000000000000000 rdi=0000000000000001
rip=fffff80082ed196b rsp=ffffd00024215990 rbp=ffffd00024215a90
r8=0000000000000001 r9=ffffe001943dc820 r10=0000000000000000
r11=ffffd00024215ac8 r12=00000000a000000c r13=0000000000000000
r14=ffffe0019254f700 r15=00000000a0000003
iopl=0 nv up ei pl nz na pe nc
cs=0010 ss=0018 ds=002b es=002b fs=0053 gs=002b efl=00010202
nt!IopfCompleteRequest+0xc1b:
fffff80082ed196b 488b4018 mov rax,qword ptr [rax+18h] ds:002b:0000000000000019=????????????????
Resetting default scope

PROCESS_NAME: System

CURRENT_IRQL: 0

EXCEPTION_CODE: (NTSTATUS) 0xc0000005 - The instruction at 0x%08lx referenced memory at 0x%08lx. The memory could not be "%s".

EXCEPTION_PARAMETER1: 0000000000000000

EXCEPTION_PARAMETER2: 0000000000000019

READ_ADDRESS: unable to get nt!MmNonPagedPoolStart
unable to get nt!MmSizeOfNonPagedPoolInBytes
0000000000000019

FOLLOWUP_IP:
dokan!DokanCompleteIrpRequest+2b [d:\islog\dev\app\tmp\dokany\sys\dokan.c @ 487]
fffff801`47878523 4883c428 add rsp,28h

BUGCHECK_STR: AV

DEFAULT_BUCKET_ID: NULL_CLASS_PTR_DEREFERENCE

ANALYSIS_VERSION: 6.3.9600.17336 (debuggers(dbg).150226-1500) amd64fre

LAST_CONTROL_TRANSFER: from fffff80147878523 to fffff80082ed196b

STACK_TEXT:
ffffd00024215990 fffff80147878523 : 0000000000000000 fffff80082edb800 0000000000000000 0000000000000000 : nt!IopfCompleteRequest+0xc1b
ffffd00024215ad0 fffff8014787bf7c : 0000000000015de7 ffffe0019254f7f0 0000000000000000 fffff8014787bba8 : dokan!DokanCompleteIrpRequest+0x2b [d:\islog\dev\app\tmp\dokany\sys\dokan.c @ 487]
ffffd00024215b00 fffff8014787bc3d : ffffe0019254f760 0000000000000000 ffffe00188196001 00007ff800000000 : dokan!ReleaseTimeoutPendingIrp+0x1b0 [d:\islog\dev\app\tmp\dokany\sys\timeout.c @ 202]
ffffd00024215b60 fffff80082f1936c : 0000000000000000 ffffe00194c93300 ffffe00194c93300 fffff901412216b0 : dokan!DokanTimeoutThread+0x95 [d:\islog\dev\app\tmp\dokany\sys\timeout.c @ 300]
ffffd00024215c00 fffff80082f702c6 : ffffd0003d7ea180 ffffe00194c93300 ffffd0003d7f63c0 0000000000000000 : nt!PspSystemThreadStartup+0x58
ffffd00024215c60 0000000000000000 : ffffd00024216000 ffffd00024210000 0000000000000000 0000000000000000 : nt!KiStartSystemThread+0x16

FAULTING_SOURCE_LINE: d:\islog\dev\app\tmp\dokany\sys\dokan.c

FAULTING_SOURCE_FILE: d:\islog\dev\app\tmp\dokany\sys\dokan.c

FAULTING_SOURCE_LINE_NUMBER: 487

SYMBOL_STACK_INDEX: 1

SYMBOL_NAME: dokan!DokanCompleteIrpRequest+2b

FOLLOWUP_NAME: MachineOwner

MODULE_NAME: dokan

IMAGE_NAME: dokan.sys

DEBUG_FLR_IMAGE_TIMESTAMP: 55bb26f2

STACK_COMMAND: .cxr 0xffffd00024214f60 ; kb

BUCKET_ID_FUNC_OFFSET: 2b

FAILURE_BUCKET_ID: AV_dokan!DokanCompleteIrpRequest

BUCKET_ID: AV_dokan!DokanCompleteIrpRequest

ANALYSIS_SOURCE: KM

FAILURE_ID_HASH_STRING: km:av_dokan!dokancompleteirprequest

FAILURE_ID_HASH: {a0df1a10-978f-2e6d-4f52-b253bcbf29f6}

Followup: MachineOwner

@Liryna
Member

Liryna commented Aug 5, 2015

That's a good report! Do you know exactly how to reproduce it?

It seems that a corrupted IRP in the pending IRP list is causing the crash.
All PendingIrp entries are protected with KeAcquireSpinLock, but in this part of the code a new list of completed pending IRPs is created without any protection.

@marinkobabic I would like your advice on this, since you seem to know more than I do 😋
Should we move the KeReleaseSpinLock call to right after the while loop? (L208)
https://github.com/marinkobabic/dokanx/blob/Windows8DeleteIssue/sys/timeout.c#L199
This could protect the IRP pointer used by IoCompleteRequest.

The documentation for IoCompleteRequest says:
"Never call IoCompleteRequest while holding a spin lock. Attempting to complete an IRP while holding a spin lock can cause deadlocks."
Are they talking about any spin lock, or only the IRP's spin lock?

EDIT: I just found that RemoveTailList is never used :O Does that mean completed IRPs are never actually removed from PendingIrp? If we clean them up, the source of this BSOD will be removed.

@viciousviper

No, I cannot reliably reproduce the crash. What I can say is that the BSOD appears several seconds after I terminate the thread that my .NET application gets called on via Dokan.NET.
Usually the method being called is one of CreateFile(), GetFileInformation() or GetVolumeInformation() in DokanNet.Dokan.
Prior to the BSOD, Windows Explorer starts to lag when I point it to the filesystem root ("This PC"), probably due to a leftover "disconnected network drive" at my mount point. My application then repeatedly runs into the "Can't assign a drive letter or mount point" error (the DOKAN_MOUNT_ERROR case, which throws a DokanException) when trying to unmount and re-mount the Dokan drive, and may or may not succeed eventually.
As you may have concluded from my memory dump, my dev environment runs on Windows 8.1 Pro x64 on a physical box, which I'll probably move to Hyper-V unless the BSODs disappear.
And finally, I'm still on VS 2015 RC/.NET 4.6 RC, which could also be a factor, although I've never had a BSOD with this setup outside of my Dokan experiments.

@marinkobabic
Contributor

@Liryna
Just a few details to clarify things, so that we can investigate the issue together. Inside the method DokanCompleteIrp there is the following line https://github.com/dokan-dev/dokany/blob/master/sys/event.c#L319, so the entry is removed.
As for your previous question: completeList is local to the method, so protecting it with the lock doesn't help.
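The pattern discussed here (unlink expired entries from the shared pending list while holding the lock, then complete them from a function-local list after the lock is released, which is why that local list needs no lock of its own) can be sketched in user mode. This is an analogy, not driver code: a C11 `atomic_flag` spin lock stands in for KeAcquireSpinLock, a minimal intrusive list stands in for LIST_ENTRY, and `complete_request` is a hypothetical stand-in for IoCompleteRequest.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Minimal intrusive doubly linked list, modelled on LIST_ENTRY. */
typedef struct list_entry { struct list_entry *flink, *blink; } list_entry_t;

static void list_init(list_entry_t *head) { head->flink = head->blink = head; }

static void list_remove(list_entry_t *e)
{
    e->blink->flink = e->flink;
    e->flink->blink = e->blink;
}

static void list_insert_tail(list_entry_t *head, list_entry_t *e)
{
    e->flink = head;
    e->blink = head->blink;
    head->blink->flink = e;
    head->blink = e;
}

#define CONTAINER_OF(p, type, field) ((type *)((char *)(p) - offsetof(type, field)))

typedef struct {
    list_entry_t link;
    long deadline;   /* the request times out once now > deadline */
    int completed;
} pending_req_t;

static void spin_lock(atomic_flag *l)
{
    while (atomic_flag_test_and_set_explicit(l, memory_order_acquire))
        ; /* busy-wait, like a kernel spin lock */
}

static void spin_unlock(atomic_flag *l) { atomic_flag_clear_explicit(l, memory_order_release); }

/* Hypothetical stand-in for IoCompleteRequest. */
static void complete_request(pending_req_t *r) { r->completed = 1; }

/* Moves timed-out requests off 'pending' under the lock, then completes
   them with the lock released. Returns the number completed. */
static int release_timed_out(list_entry_t *pending, atomic_flag *lock, long now)
{
    list_entry_t done;   /* local list: no other thread can reach it */
    int n = 0;
    list_init(&done);

    spin_lock(lock);
    for (list_entry_t *e = pending->flink, *next; e != pending; e = next) {
        next = e->flink;   /* save before unlinking */
        if (CONTAINER_OF(e, pending_req_t, link)->deadline < now) {
            list_remove(e);
            list_insert_tail(&done, e);
        }
    }
    spin_unlock(lock);

    /* Never complete while holding the lock. */
    while (done.flink != &done) {
        list_entry_t *e = done.flink;
        list_remove(e);
        complete_request(CONTAINER_OF(e, pending_req_t, link));
        n++;
    }
    return n;
}
```

This keeps the locking rules intact, but on its own it does not stop the device from being torn down while `done` is still being drained.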

@viciousviper
Is there a way to get the memory.dmp from you?

@viciousviper

@marinkobabic
Do you need the full MEMORY.dmp (1.3 GB raw, 170 MB as a .7z) or will the associated minidump be sufficient (330 KB raw, 140 KB as a .7z)? I could easily mail you the minidump, while the full dump would require uploading to a file host.

@marinkobabic
Contributor

Full would be great if possible :-)

@viciousviper

@marinkobabic
Please send your email address to j_h at mail.org so I can provide you with a download location for the dump.

@Liryna
Member

Liryna commented Aug 14, 2015

Did the analysis of the memory dump give more information?

@ghost

ghost commented Aug 17, 2015

I think I also got this crash. Is there any quick change I can make to Dokan so that this causes an error rather than a BSOD?

http://www.voltagex.org/081615-16812-01.dmp (I may need to move this file). By the look of it, no full dump was captured.

I caused this BSOD by running the project in http://voltagex.org/DokanTest.7z a couple of times.

@Liryna
Member

Liryna commented Aug 17, 2015

@voltagex Thank you for the dump. Unfortunately, the crash happened in ntoskrnl.exe.
WinDbg can tell you which software it comes from.
https://github.com/dokan-dev/dokany/wiki/How-to-Debug-Dokan#crash-report-bsod
If you manage to get the crash in dokan, I would be glad to look at the report.

I tried to open DokanTest.7z, but SevenZippedFile.cs is full of '\0'.

@ghost

ghost commented Aug 17, 2015 via email

@ghost

ghost commented Aug 18, 2015

Download DokanTest.7z again; I've fixed that file. I don't think the current version will crash.

Try this: mount a drive as Z:, mount it again (fails), unmount Z: and mount again (crash)

@Liryna
Member

Liryna commented Aug 18, 2015

But @voltagex ... you have implemented nothing 😐.
Please provide a crash report of dokan when you have finished your implementation.

@ghost

ghost commented Aug 18, 2015

Even with only a few things implemented, Dokan shouldn't cause a BSOD, right?

@marinkobabic
Contributor

@Liryna
Are you able to reproduce the issue? I have started DokanTest at least 25 times without a crash.

@voltagex
When you run DokanTest.exe without a debugger attached, are you able to reproduce the crash? When you are debugging, which callback method is being called?

@ghost

ghost commented Aug 18, 2015 via email

@Liryna
Member

Liryna commented Aug 18, 2015

@marinkobabic I ran the same test as you: DokanTest more than 25 times, pressing CTRL + C either very quickly or slowly.
I got no crash and no zombie device. System was still stable after.

@voltagex If you can make a crash, please use WinDbg to see in which software it crashed.

@joepperkins

Was the volume active while it was being dismounted?


@marinkobabic
Contributor

The fact is that a lot of people get the BSOD while debugging in Visual Studio. What is the difference between using the debugger and running the user-mode application without a debugger attached?

When a debugger is used, requests are slowed down and other requests are not processed fast enough, which results in more timed-out IRP requests. If the developer detaches the debugger, the process terminates. The timeout thread realizes that the user-mode application is no longer there and starts to clean everything up. Imagine that the timeout thread has collected the timed-out IRP requests and they are now in a list waiting to be completed. At the same time, the driver removes all the other pending IRP requests and deletes the device and symbolic link. The timeout thread then completes the IRP requests in a loop for a device that no longer exists, and here we have an invalid IRP being completed.
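If this theory holds, one common remedy is to make teardown wait for the timeout thread before deleting the device. Below is a hedged user-mode sketch of that idea, loosely modelled on the kernel's rundown-protection routines (ExAcquireRundownProtection / ExWaitForRundownProtectionRelease). `rundown_t` and its functions are hypothetical, and nothing in this thread says dokan.sys fixes the race this way.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical rundown guard for one device. */
typedef struct {
    atomic_int active_users;  /* threads currently touching the device */
    atomic_bool closing;      /* set once teardown has started */
} rundown_t;

static void rundown_init(rundown_t *r)
{
    atomic_init(&r->active_users, 0);
    atomic_init(&r->closing, false);
}

/* Timeout thread: call before completing IRPs for the device.
   Returns false if teardown has already begun. */
static bool rundown_acquire(rundown_t *r)
{
    atomic_fetch_add(&r->active_users, 1);
    if (atomic_load(&r->closing)) {
        atomic_fetch_sub(&r->active_users, 1);  /* back off */
        return false;
    }
    return true;
}

/* Timeout thread: call after the completion loop has finished. */
static void rundown_release(rundown_t *r)
{
    atomic_fetch_sub(&r->active_users, 1);
}

/* Teardown path: block new users, then wait for current ones to leave
   before deleting the device and its symbolic link. The sequentially
   consistent atomics guarantee that either the acquirer sees 'closing'
   or this loop sees the acquirer's increment. */
static void rundown_wait(rundown_t *r)
{
    atomic_store(&r->closing, true);
    while (atomic_load(&r->active_users) != 0)
        ; /* spin; a real implementation would block instead */
}
```

With this ordering, a completion loop that started before teardown finishes before the device disappears, and one that starts afterwards never touches it.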

To simulate the timeout, we could add random delays to the responses of the user-mode methods so that some of the IRP requests time out, and then exit the application at some point. After several tries, it should crash.
It's just a theory, but if it is correct, we should be able to reproduce the issue this way.
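One way to try this, sketched under assumptions: a test build of the user-mode filesystem asks a helper how long each callback should stall before answering. `delay_rng_t` and `pick_delay_ms` are hypothetical; the xorshift generator is used only so that runs are reproducible from a seed, and the delay range would have to be set above the driver's IRP timeout, whose value is not given in this thread.

```c
#include <stdint.h>

/* Small seeded PRNG (xorshift32) so a failing run can be replayed.
   The state must be nonzero. */
typedef struct { uint32_t state; } delay_rng_t;

static uint32_t xorshift32(delay_rng_t *g)
{
    uint32_t x = g->state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    g->state = x;
    return x;
}

/* Returns 0 for most calls; roughly 'percent' % of calls get a delay in
   [min_ms, max_ms]. Assumes min_ms <= max_ms < UINT32_MAX. */
static uint32_t pick_delay_ms(delay_rng_t *g, uint32_t percent,
                              uint32_t min_ms, uint32_t max_ms)
{
    if (xorshift32(g) % 100 >= percent)
        return 0;
    return min_ms + xorshift32(g) % (max_ms - min_ms + 1);
}
```

In a Windows callback this could be used as `Sleep(pick_delay_ms(&rng, 20, min_ms, max_ms));` before returning, with the application then exited while some of those requests are still pending.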

@ghost

ghost commented Aug 18, 2015

Reproduced without the debugger. (Updated this note to add the unmounting step.)

  • Mount a drive (any drive) and make sure your application is still running.
  • Open another cmd.exe and run cd z:\ (or whatever drive)
  • Restart the DokanMounter service
  • Make your application unmount the drive
  • Attempt to access z:\ again.
  • Wait for the crash.
PROCESS_NAME:  cmd.exe

CURRENT_IRQL:  0

ANALYSIS_VERSION: 6.3.9600.17336 (debuggers(dbg).150226-1500) amd64fre

LAST_CONTROL_TRANSFER:  from fffff801a0b19f5c to fffff801a09c59a0

STACK_TEXT:  
ffffd001`730cd368 fffff801`a0b19f5c : 00000000`000000c2 00000000`00000007 00000000`00001200 00000000`04070081 : nt!KeBugCheckEx
ffffd001`730cd370 fffff801`a0a10fbb : ffffe001`85722990 ffffe001`8406b6f0 00000000`c0000120 ffffd001`00000007 : nt!ExDeferredFreePool+0x6ec
ffffd001`730cd460 fffff801`a0cb508f : 00000000`00000085 ffffd001`730cd7b0 00000000`c0000120 ffffe001`8406b6f0 : nt! ?? ::FNODOBFM::`string'+0x3b10b
ffffd001`730cd490 fffff801`a0c60c39 : ffffc000`4a8146b8 ffffc000`4a8146b8 ffffc000`4bc110f0 ffffe001`8503fc20 : nt!IopParseDevice+0xbbf
ffffd001`730cd6b0 fffff801`a0c5ea63 : 00000000`00000000 ffffd001`730cd8a8 ffffe001`00000040 ffffe001`840aef20 : nt!ObpLookupObjectName+0x6b9
ffffd001`730cd830 fffff801`a0cd77ab : ffffe001`00000001 ffffe001`853810a8 00000000`00000001 00000000`00000020 : nt!ObOpenObjectByName+0x1e3
ffffd001`730cd960 fffff801`a0cd73b8 : 000000a4`1f5ff508 000000a4`00100020 000000a4`1f5ff488 fffff801`00001000 : nt!IopCreateFile+0x36b
ffffd001`730cda00 fffff801`a09d11b3 : fffff6fb`7dbed7f8 fffff6fb`7daffed0 fffff6fb`5ffdaa98 fffff6bf`fb5539f0 : nt!NtOpenFile+0x58
ffffd001`730cda90 00007ffa`bd750f7a : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KiSystemServiceCopyEnd+0x13
000000a4`1f5ff418 00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : 0x00007ffa`bd750f7a

@Liryna
Member

Liryna commented Aug 19, 2015

@voltagex Perfect report! I was able to reproduce the issue (Win 8.1), but not the crash.
I followed your steps, and after "Make your application unmount the drive":

M:\>dir
The parameter is incorrect.

M:\>c:

C:\Users\Liryna\Desktop\dokan>m:
The system cannot find the path specified.

BUT I still see the M: drive in Explorer.
Edit: After the service restarts, the device is still alive and the mirror is still connected to it. The issue happens when the mirror is killed: DokanRemoveMountPoint fails. This could be fixed if the service forced an unmount when it stops.

@marinkobabic I totally agree with you. It is worth trying.
The timeout thread completes the Irp requests in a loop for a device which no longer exists and here we have an invalid Irp which is completed. In which part of the code do you think this happens?

@marinkobabic
Contributor

@Liryna
Member

Liryna commented Aug 19, 2015

@marinkobabic Thank you.
Otherwise, I tried to force an unmount when the service stops by adding:

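// Force-unmount every device still registered with the service
DOKAN_CONTROL   unmount;  // declared here so the excerpt is self-contained
PLIST_ENTRY     listEntry;
PLIST_ENTRY     nextEntry;
PMOUNT_ENTRY    mountEntry = NULL;

EnterCriticalSection(&g_CriticalSection);

for (listEntry = g_MountList.Flink; listEntry != &g_MountList; listEntry = nextEntry) {
    // Read the next link first: DokanControl() may remove (and free)
    // this entry from g_MountList while we are iterating.
    nextEntry = listEntry->Flink;
    mountEntry = CONTAINING_RECORD(listEntry, MOUNT_ENTRY, ListEntry);
    // Pass the mount point as an argument, never as the format string
    DbgPrintW(L"Force unmount: %s\n", mountEntry->MountControl.MountPoint);

    // Build an unmount request addressed by device name
    ZeroMemory(&unmount, sizeof(DOKAN_CONTROL));
    unmount.Type = DOKAN_CONTROL_UNMOUNT;
    wcscpy_s(unmount.DeviceName, sizeof(unmount.DeviceName) / sizeof(WCHAR),
        mountEntry->MountControl.DeviceName);
    DokanControl(&unmount);
}

LeaveCriticalSection(&g_CriticalSection);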

Here:

The device is properly unmounted, but the mirror is not notified of it and keeps running.
Do you have an idea why?

@marinkobabic
Contributor

@Liryna
It's just a matter of time until we remove the service. In my opinion, the service is not needed if the Windows Mount Manager is used.

I would not add too many dependencies on the service. One option you have is to let the driver unmount the drive with the force flag. What you need to do is extend UNMOUNT_CONTEXT with Flags and set the force-unmount flag here: https://github.com/dokan-dev/dokany/blob/master/sys/timeout.c#L64. The mounter service takes the EVENT_CONTEXT and would set the flag right after this line: https://github.com/dokan-dev/dokany/blob/master/dokan_mount/mounter.c#L391
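A hypothetical sketch of that proposal: the driver records a flag in its unmount notification, and the mounter translates it into the control option. The structures below are simplified stand-ins, not the real UNMOUNT_CONTEXT / EVENT_CONTEXT / DOKAN_CONTROL layouts; `UNMOUNT_FLAG_FORCE` is an invented flag and `FORCE_UNMOUNT_OPTION` stands in for DOKAN_CONTROL_OPTION_FORCE_UNMOUNT, whose actual value is not assumed here.

```c
#include <wchar.h>

#define NAME_LEN 64
#define UNMOUNT_FLAG_FORCE   0x1u  /* hypothetical new flag in the context */
#define FORCE_UNMOUNT_OPTION 1ul   /* stand-in for DOKAN_CONTROL_OPTION_FORCE_UNMOUNT */

/* Simplified stand-in for UNMOUNT_CONTEXT extended with Flags. */
typedef struct {
    wchar_t DeviceName[NAME_LEN];
    unsigned long Flags;
} unmount_context_t;

/* Simplified stand-in for the service's control request. */
typedef struct {
    wchar_t DeviceName[NAME_LEN];
    unsigned long Option;
} control_request_t;

/* Driver side (timeout path): request a forced unmount. */
static void driver_fill_unmount(unmount_context_t *ctx, const wchar_t *device)
{
    wcsncpy(ctx->DeviceName, device, NAME_LEN - 1);
    ctx->DeviceName[NAME_LEN - 1] = L'\0';
    ctx->Flags = UNMOUNT_FLAG_FORCE;
}

/* Mounter side: copy the device name and map the flag to the option. */
static void mounter_build_control(const unmount_context_t *ctx, control_request_t *req)
{
    wcsncpy(req->DeviceName, ctx->DeviceName, NAME_LEN - 1);
    req->DeviceName[NAME_LEN - 1] = L'\0';
    req->Option = (ctx->Flags & UNMOUNT_FLAG_FORCE) ? FORCE_UNMOUNT_OPTION : 0;
}
```

Keeping the decision in the driver means the service only relays it, which matches the goal of not adding more dependencies on the service.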

@Liryna
Copy link
Member

Liryna commented Aug 20, 2015

@marinkobabic I tried what you suggested, but the service lost all mount information during the restart, so FindMountEntry cannot retrieve the MountPoint from the DeviceName, and the unmount fails, even with the force flag.
https://github.com/dokan-dev/dokany/blob/master/dokan_mount/mounter.c#L74

I totally agree with you; the service seems useless compared to the issues it creates.
Have you already used the Mount Manager?

@marinkobabic
Contributor

@Liryna
What you have described is not possible. Please check the following lines:

mountEntry = FindMountEntry(Control);
if (mountEntry == NULL) {
if (Control->Option == DOKAN_CONTROL_OPTION_FORCE_UNMOUNT &&
If the entry is not found and the force flag is set, an unmount will be performed.

The Mount Manager requires a Plug & Play implementation. By the way, can you open PDF files on Windows 8.1 using the native PDF viewer instead of Adobe Reader?

@Liryna
Member

Liryna commented Aug 20, 2015

@marinkobabic In DokanControlUnmount(Control->MountPoint), MountPoint is empty 😢; only DeviceName is set by the sys driver.
(Sorry my first description was misleading)

Can we get the MountPoint at this part of the code ?
https://github.com/dokan-dev/dokany/blob/master/sys/timeout.c#L61

I just tested with the native pdf viewer: "There is a problem with the file format."

@marinkobabic
Contributor

@Liryna
You are right 😥, so an unmount is not possible this way. A device can have multiple mount points, and the service has lost all information after the restart. In this case, you can loop over all existing drives and call QueryDosDevice until you find the target device, then set the MountPoint and perform the unmount.
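The drive scan could look roughly like this. The search logic is abstracted behind a function pointer so it can be exercised anywhere; on Windows, the lookup could wrap `QueryDosDeviceA(drive, target, (DWORD)cap)`, which returns the number of characters stored or 0 on failure. `find_mount_point`, `stub_query` and the volume name in the stub are hypothetical, and the device name is assumed to be in the same form as the first target QueryDosDevice reports.

```c
#include <stddef.h>
#include <string.h>

/* Signature of a DOS-device lookup; returns 0 when the name is not defined. */
typedef size_t (*query_dos_device_fn)(const char *drive, char *target, size_t cap);

/* Scans A: to Z: and writes the first drive whose target matches
   device_name into mount_point (e.g. "M:"). Returns 1 on success. */
static int find_mount_point(const char *device_name, query_dos_device_fn query,
                            char mount_point[3])
{
    char drive[3] = "A:";
    char target[512];

    for (char letter = 'A'; letter <= 'Z'; letter++) {
        drive[0] = letter;
        if (query(drive, target, sizeof target) == 0)
            continue;                   /* no DOS device for this letter */
        /* The result is a list of NUL-separated strings; the first one
           is the current mapping. */
        if (strcmp(target, device_name) == 0) {
            memcpy(mount_point, drive, sizeof drive);
            return 1;
        }
    }
    return 0;
}

/* Illustrative stub: pretends only M: maps to a Dokan volume. */
static size_t stub_query(const char *drive, char *target, size_t cap)
{
    const char *mapped = "\\Device\\Volume{dokan-example}";
    size_t len = strlen(mapped);
    if (strcmp(drive, "M:") != 0 || len + 2 > cap)
        return 0;
    memcpy(target, mapped, len + 1);
    target[len + 1] = '\0';             /* terminates the string list */
    return len + 2;
}
```

Because a device can have several mount points, a fuller version would keep scanning and unmount every matching letter rather than stopping at the first.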

The reason you can't open the PDF file is that since Windows 8, a lot of programs rely on OpLocks https://msdn.microsoft.com/en-us/library/windows/hardware/ff551007(v=vs.85).aspx . Implementing them is no longer optional for file system drivers. Keep this in mind when somebody is unable to play or open some formats. To check which requests are sent, you can use Process Monitor or FileSpy https://www.osronline.com/article.cfm?article=370

As you can see, there is a lot to do. The first thing to do is to change DriverEntry as in the fastfat example and to catch all exceptions in one place.

@Liryna
Member

Liryna commented Aug 20, 2015

@marinkobabic
OK, I will add QueryDosDevice to FindMountEntry later.

What do you mean by catching all exceptions in one place?

I agree; we are discovering that dokan needs a lot of changes to achieve its goal.
I created a new issue with a TODO list sorted by priority; I will add every change we find interesting and keep it updated.
Feel free to suggest any ideas.
#45

@marinkobabic
Contributor

@Liryna
I mean the following example, which can be implemented very easily and would help us make the driver more stable: https://github.com/Microsoft/Windows-driver-samples/blob/3528ffbb369fa0611aefd14c97bdd6ac5ee50c41/filesys/cdfs/cddata.c#L223-L240
