
Windows Server 2008 crashes with BSOD when invoking winfsp-x64.sys from 'rclone mount' #392

Closed
lxnglz opened this issue Sep 13, 2021 · 3 comments


@lxnglz

lxnglz commented Sep 13, 2021

I have mounted several cloud storage remotes as local folders using rclone and its 'mount' command, which on Windows uses WinFsp.
These folders are mounted inside a parent folder that is shared across the network, so that other devices can access the cloud storage via CIFS/SMB.
This setup runs on Windows Server 2008 on a relatively weak machine with 4 GB of memory.
When the mounted folders are accessed from other devices on the network, the server occasionally crashes with a blue screen of death (sometimes immediately, sometimes only once a day).
The crash dump names rclone.exe as the culprit process and winfsp-x64.sys as the culprit module.
To test it further, I configured the Windows built-in Driver Verifier ('verifier.exe') to monitor 'winfsp-x64.sys'. With this monitoring turned on, the server crashes a few minutes after reboot; this time the culprit process is 'explorer.exe', while the culprit module is still 'winfsp-x64.sys'. This is strange, because explorer.exe is not supposed to use WinFsp directly. Furthermore, if rclone is not started, the verifier does not even see the 'winfsp-x64.sys' driver. However, this is what I got. Maybe Windows Explorer scans the folder tree of all drives at user login and this causes the crash, which in this case is not explicitly associated with rclone.
I then tried to reproduce this issue on a more powerful Windows 10 computer, but there it works without problems.
I also tried to reproduce it locally on the server, with no luck: I can open the mounted folders locally without a crash.
However, when I open a mounted folder from another computer on the local network, the server crashes more often if that computer has a VPN connection up.
I don't know whether the Windows Server 2008 architecture differs much from Windows 10 with regard to the file system. Probably not.
So I assume this issue is more likely to happen on a slow computer, when the network traffic is more complex than usual and has more unusual delays.
It also appears to happen when the cloud is scanned for the first time (or later, when the local rclone file cache expires), that is, when opening the mounted folder is quite heavy and can easily take up to 10 seconds.
I don't believe this issue depends entirely on WinFsp, because otherwise there would be other reports of similar problems on the internet.
I also don't believe that rclone by itself can crash the system, because it does not run at kernel level.
So the problem is somewhere in between.
Maybe WinFsp makes some callbacks into the calling code that are executed at kernel level? Or are there other cases where the system can crash because of an unusual interaction between WinFsp and other software?
So far I have no idea how to test this further, nor how to properly log the details.
As for rclone, this issue does not seem to depend on which cloud storage I mount. It happens with both Google Drive and Microsoft OneDrive, so the problem must be somewhere at a lower level.
By the way, rclone commands other than 'mount' work with all clouds and transfer hundreds of gigabytes without a problem. Only 'mount' fails.
And as I said, the problem may not happen immediately, but if I leave the server running for a day or two, the first attempt to open the mounted folder remotely causes a BSOD 75% of the time.
This is the thread on rclone github: https://github.com/rclone/rclone/issues/5570

WinFsp version: 1.9.21096

@billziss-gh
Collaborator

WinFsp does not support OSes prior to Windows 7. Windows Server 2008 is based on Vista, which is an unsupported OS. I note that I have never tested or even run WinFsp on a Vista-based OS.

Having said that, let me suggest some troubleshooting steps:

  • What does the command fltmc filters report? (This command lists file system filters on your system; often third party filters can cause instability.)

  • You mention that you are able to reliably crash WinFsp when you enable the verifier on it. This is great, because it may allow us to collect a crash dump from the machine and analyze it. Look for a MEMORY.DMP file in the Windows folder (%SystemRoot%).

  • If you do not have a memory.dmp file then we will have to enable crash dumps on the machine in order to understand this problem further.

@lxnglz
Author

lxnglz commented Oct 11, 2021

Hi, I am sorry for not responding for so long. And thank you for paying attention to this issue even though my OS is not officially supported.

  1. This is the output of the "fltmc filters" command:

    Filter Name     Num Instances    Altitude    Frame
    luafv           1                135000      0

  2. And this is the analysis of the memory.dmp file, performed by WinDbg (I could not install WinDbg on Windows Server 2008 for some reason, so I copied the memory.dmp file to another machine running Windows 10 and ran WinDbg there; not sure if it matters):
3: kd> !analyze -v
*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************

PAGE_FAULT_IN_NONPAGED_AREA (50)
Invalid system memory was referenced.  This cannot be protected by try-except.
Typically the address is just plain bad or it is pointing at freed memory.
Arguments:
Arg1: fffffffffffffff2, memory referenced.
Arg2: 0000000000000000, value 0 = read operation, 1 = write operation.
Arg3: fffff80001a2cd03, If non-zero, the instruction address which referenced the bad memory
	address.
Arg4: 0000000000000000, (reserved)

Debugging Details:
------------------


KEY_VALUES_STRING: 1

    Key  : AV.Type
    Value: Read

    Key  : Analysis.CPU.mSec
    Value: 2530

    Key  : Analysis.DebugAnalysisManager
    Value: Create

    Key  : Analysis.Elapsed.mSec
    Value: 3880

    Key  : Analysis.Init.CPU.mSec
    Value: 1218

    Key  : Analysis.Init.Elapsed.mSec
    Value: 40151

    Key  : Analysis.Memory.CommitPeak.Mb
    Value: 64

    Key  : WER.OS.Branch
    Value: win7sp1_ldr_escrow

    Key  : WER.OS.Timestamp
    Value: 2020-01-02T17:07:00Z

    Key  : WER.OS.Version
    Value: 7.1.7601.24545


BUGCHECK_CODE:  50

BUGCHECK_P1: fffffffffffffff2

BUGCHECK_P2: 0

BUGCHECK_P3: fffff80001a2cd03

BUGCHECK_P4: 0

READ_ADDRESS:  fffffffffffffff2 

MM_INTERNAL_CODE:  0

PROCESS_NAME:  System

TRAP_FRAME:  fffff88005d68830 -- (.trap 0xfffff88005d68830)
NOTE: The trap frame does not contain all registers.
Some register values may be zeroed or incorrect.
rax=0000000000000000 rbx=0000000000000000 rcx=0000000000000002
rdx=0000000049707346 rsi=0000000000000000 rdi=0000000000000000
rip=fffff80001a2cd03 rsp=fffff88005d689c0 rbp=fffff88005d68b18
 r8=0000000000000000  r9=0000000000000001 r10=fffffa80070cb4f8
r11=00000000000000ff r12=0000000000000000 r13=0000000000000000
r14=0000000000000000 r15=0000000000000000
iopl=0         nv up ei pl nz na pe nc
nt!ExFreePoolWithTag+0x43:
fffff800`01a2cd03 418b45f0        mov     eax,dword ptr [r13-10h] ds:ffffffff`fffffff0=????????
Resetting default scope

STACK_TEXT:  
fffff880`05d686d8 fffff800`019c09b2     : 00000000`00000050 ffffffff`fffffff2 00000000`00000000 fffff880`05d68830 : nt!KeBugCheckEx
fffff880`05d686e0 fffff800`018f2fdc     : 00000000`00000000 ffffffff`fffffff2 fffffa80`051bc000 fffff8a0`129ff780 : nt!MmAccessFault+0x2322
fffff880`05d68830 fffff800`01a2cd03     : fffff880`05d68bc0 00000000`00000000 fffffa80`04eb4d30 00000000`00000000 : nt!KiPageFault+0x35c
fffff880`05d689c0 fffff880`05882784     : 00000000`00000068 fffffa80`04eb4d30 fffffa80`07ae3250 fffffa80`06ffdaf0 : nt!ExFreePoolWithTag+0x43
fffff880`05d68a70 fffff880`058795b4     : fffff8a0`054329b0 fffff880`05d68b18 fffffa80`06ffda00 fffffa80`070cb4db : winfsp_x64+0x1a784
fffff880`05d68aa0 fffff880`058781e1     : 00000000`00000000 fffffa80`00000000 fffffa80`06ffdfb0 fffffa80`03a1d100 : winfsp_x64+0x115b4
fffff880`05d68b60 fffff880`058780d8     : fffffa80`06ffdaf0 00000000`00000000 fffffa80`056215d0 fffff880`05801406 : winfsp_x64+0x101e1
fffff880`05d68b90 fffff880`012446af     : fffffa80`03a1d060 fffffa80`06ffdaf0 fffffa80`00000000 00000000`00000000 : winfsp_x64+0x100d8
fffff880`05d68bf0 fffff880`058315a5     : fffffa80`070cb010 00000000`00000000 00000000`0000000d fffffa80`070cb010 : fltmgr!FltpDispatch+0x9f
fffff880`05d68c50 fffff880`058277b0     : fffffa80`070cb010 fffff880`05821110 fffffa80`070cb010 fffffa80`051a3170 : srv2!Smb2ExecuteQueryDirectory+0x3a5
fffff880`05d68c80 fffff880`058276fb     : 00000000`00000001 00000000`0000000c fffffa80`051bb020 fffffa80`070cb020 : srv2!SrvProcessPacket+0xa0
fffff880`05d68cc0 fffff800`01b922e8     : 00000000`00000000 fffff800`01a37180 00000000`00000080 00000000`00000001 : srv2!SrvProcWorkerThread+0x2fb
fffff880`05d68d40 fffff800`018ecec6     : fffff800`01a37180 fffffa80`051bc040 fffffa80`04ce5450 00000000`00000000 : nt!PspSystemThreadStartup+0x194
fffff880`05d68d80 00000000`00000000     : fffff880`05d69000 fffff880`05d63000 fffff880`05d689b0 00000000`00000000 : nt!KiStartSystemThread+0x16


SYMBOL_NAME:  winfsp_x64+1a784

MODULE_NAME: winfsp_x64

IMAGE_NAME:  winfsp-x64.sys

STACK_COMMAND:  .thread ; .cxr ; kb

FAILURE_BUCKET_ID:  0x50_R_(null)_winfsp_x64+1a784

OS_VERSION:  7.1.7601.24545

BUILDLAB_STR:  win7sp1_ldr_escrow

OSPLATFORM_TYPE:  x64

OSNAME:  Windows 7

FAILURE_ID_HASH:  {ed3ed176-609e-e666-d3c9-dd2d9c70ce85}

Followup:     MachineOwner
---------
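For reference, the faulting instruction in the trap frame can be checked by hand: `mov eax,dword ptr [r13-10h]` executes with r13 = 0. On x64 the effective address wraps modulo 2^64, so the read lands at 0xfffffffffffffff0, the same `ds:` address WinDbg prints on that line. One plausible reading, which is an assumption and is not confirmed anywhere in this thread, is that ExFreePoolWithTag was handed a NULL pointer and tried to read data 16 bytes before it. The helper `effective_address` below is hypothetical and only reproduces the arithmetic; it is not part of WinDbg or WinFsp.

```python
# Hypothetical helper: reproduces the address arithmetic of the faulting
# instruction from the trap frame above. Not part of WinDbg or WinFsp.
MASK64 = (1 << 64) - 1  # x64 effective addresses are computed modulo 2**64


def effective_address(base: int, displacement: int) -> int:
    """Return base + displacement wrapped to 64 bits, as the CPU computes it."""
    return (base + displacement) & MASK64


# Values taken from the trap frame: r13 = 0, and the instruction reads [r13-10h].
r13 = 0x0
addr = effective_address(r13, -0x10)
print(hex(addr))  # prints 0xfffffffffffffff0
```

Note that the bugcheck's Arg1 (fffffffffffffff2) is two bytes past this address; the dump itself warns that the trap frame may not contain all registers correctly, so this check only shows that the values are consistent with a read just below address zero.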

@billziss-gh
Collaborator

I apologize for the long delay. Unfortunately I do not have time to troubleshoot WinFsp on unsupported OSes, so I am closing this issue.
