Background FUSE issue with Cygwin 32 bits #161

Closed
benrubson opened this issue Apr 26, 2018 · 40 comments

@benrubson
Contributor

benrubson commented Apr 26, 2018

Hi,

When using 32-bit Cygwin on a 64-bit OS, a FUSE file system (EncFS) does not start; it seems to fail when it is sent to the background.
If started in the foreground (with -f, or with -v, which must imply -f), it works.

Tested on Windows 7 64-bit and 2012 R2 64-bit, with WinFsp 2017.2 and 2018.1 B2.
No issue there using 64-bit Cygwin.

I just ran a test using 32-bit Cygwin on Windows 7 32-bit, and it works correctly.

Pretty strange.

Thank you 👍

Ben

@benrubson
Contributor Author

passthrough-cygfuse has the same behavior / gives the same results.

@billziss-gh
Collaborator

Thanks for the report.

Let me make sure I understand. Is the following matrix correct?

.      Win64  Win32
Cyg64  ✔️     N/A
Cyg32  ❌     ✔️

  • ✔️: can fuse_daemonize
  • ❌: cannot fuse_daemonize

@benrubson
Contributor Author

Yes Bill, your matrix is right.
Of course there is (at least I think) no reason to use Cygwin32 on a 64-bit OS, but we never know what users might do...

@billziss-gh
Collaborator

> Of course there is (at least I think) no reason to use Cygwin32 on a 64-bit OS, but we never know what users might do...

I think some people used to prefer it because it has more packages. I do not know if this is still true today. (I use Cygwin64 myself.)

In any case we should make sure that it works in all cases.

A few questions:

  • When the file system dies, does it disappear without trace or does it leave some information behind (e.g. in the Windows event log)?
  • Does this happen on a single Cyg32 on Win64 system or have you confirmed on more?
  • Does this happen if you link against the WinFsp DLL's directly rather than Cygfuse?

There is nothing obvious in the fuse_daemonize implementation that suggests a problem.

@billziss-gh
Collaborator

billziss-gh commented Apr 26, 2018

This seems to work for me (Cygwin32 on Windows 10 64-bit):

$ make cygfuse
gcc passthrough-fuse.c -o passthrough-cygfuse -g -Wall `pkg-config fuse --cflags --libs`
$ ./passthrough-cygfuse.exe . y:
$ ps
      PID    PPID    PGID     WINPID   TTY         UID    STIME COMMAND
     1048       1    1048       1048  ?         197609 09:41:23 /cygdrive/c/Users/billziss/Projects/winfsp/tst/passthrough-fuse/passthrough-cygfuse
     3344     868    3344       4408  cons0     197609 09:41:33 /usr/bin/ps
      868       1     868        868  cons0     197609 09:40:55 /usr/bin/bash
$ ls -l /cygdrive/y
total 236
-rw-r--r-- 1 billziss None    620 Jan 12 14:36 Makefile
-rw-r--r-- 1 billziss None    341 Jan 12 14:36 README.md
drwxr-xr-x 1 billziss None      0 Apr 23 15:30 build
-rwxr-xr-x 1 billziss None 175113 Apr 26 09:41 passthrough-cygfuse.exe
-rw-r--r-- 1 billziss None   9640 Apr  2 23:13 passthrough-fuse.c
-rw-r--r-- 1 billziss None   1313 Jan 12 14:36 passthrough-fuse.sln
-rw-r--r-- 1 billziss None  10382 Jan 12 14:36 passthrough-fuse.vcxproj
-rw-r--r-- 1 billziss None    708 Jan 12 14:36 passthrough-fuse.vcxproj.filters
-rw-r--r-- 1 billziss None  16420 Apr  2 23:13 winposix.c
-rw-r--r-- 1 billziss None   2415 Apr  2 23:13 winposix.h
$ uname -a
CYGWIN_NT-10.0-WOW windows 2.9.0(0.318/5/3) 2017-09-12 10:41 i686 Cygwin
$ cmd /c ver

Microsoft Windows [Version 10.0.14393]
$ systeminfo | grep x64
System Type:               x64-based PC

@benrubson
Contributor Author

> I think some people used to prefer it because it has more packages.

I see that it also allows building a (32-bit) binary that works on both 32-bit and 64-bit OSes.

> When the file system dies, does it disappear without trace or does it leave some information behind (e.g. in the Windows event log)?

No stack trace, no error message in the Windows event log, and no Cygwin log.

> Does this happen on a single Cyg32 on Win64 system or have you confirmed on more?

I have confirmed on newly built Windows 7 and 2012 R2 64-bit VMs, with the latest version of Cygwin32 installed from scratch.

> Does this happen if you link against the WinFsp DLL's directly rather than Cygfuse?

I just tried, but I can't manage to build winfsp-fuse with Cygwin32 on 64-bit Windows. If I swap the DLL files to make the build succeed, the program won't start.

@benrubson
Contributor Author

benrubson commented Apr 26, 2018

> CYGWIN_NT-10.0-WOW windows 2.9.0(0.318/5/3) 2017-09-12 10:41 i686 Cygwin

Mine:
CYGWIN_NT-6.1-WOW win764 2.10.0(0.325/5/3) 2018-02-02 15:21 i686 Cygwin
Let me give 2.9 a try.

@billziss-gh
Collaborator

> Does this happen if you link against the WinFsp DLL's directly rather than Cygfuse?
>
> I just tried but can't manage to make winfsp-fuse with Cygwin32 on a 64 bits Windows. If I swap DLLs files to make the build to succeed, then the program won't start.

Argh, yes. When linking against WinFsp directly, the fuse.pc file in /cygdrive/c/Program Files (x86)/WinFsp/lib references the x64 architecture. You would have to manually change it to x86, or specify your own fuse.pc on the pkg-config command line.

@benrubson
Contributor Author

> Let me give 2.9 a try.

It works on the same system using Cygwin 2.9 (same version as yours).
Lucky that you had it!

@billziss-gh
Collaborator

Wow, so something important changed in Cygwin 32 2.9 -> 2.10 ?!

@billziss-gh
Collaborator

Or perhaps a different compiler, etc.

@billziss-gh
Collaborator

billziss-gh commented Apr 26, 2018

BTW, this may also be a problem with Cygwin fork. Unfortunately it is not very reliable for a number of reasons (most of them outside Cygwin's control).

Check out these Cygwin FAQ entries:

4.44. What applications have been found to interfere with Cygwin?
4.45. How do I fix fork() failures?

@benrubson
Contributor Author

benrubson commented Apr 26, 2018

> When linking against WinFsp directly the fuse.pc (...) would have to manually change to x86

It helps at build time, but then at runtime:
passthrough-winfsp-fuse.exe: error while loading shared libraries: winfsp-x86.dll: cannot open shared object file: No such file or directory

> so something important changed in Cygwin 32 2.9 -> 2.10 ?!

I think so too, something was certainly broken after the 2.9 release.
Btw, do you manage to reproduce the issue with version 2.10?

@billziss-gh
Collaborator

> It helps at build time, but then at runtime

When linking with WinFsp-FUSE directly, you do not have cygfuse to help find the DLL's via the registry.

So you have to help the Windows loader by copying the DLL to the same directory, adding it to the PATH, etc.

@benrubson
Contributor Author

> Does this happen if you link against the WinFsp DLL's directly rather than Cygfuse?

So, no!
passthrough-winfsp-fuse actually works!

@billziss-gh
Collaborator

Very interesting.

So we only have a failure with Cygwin 2.10 32-bit through Cygfuse on Win64. Same version of Cygwin works when linked against WinFsp-FUSE directly. Correct?

@benrubson
Contributor Author

Yes correct, good summary 👍

@billziss-gh
Collaborator

billziss-gh commented Apr 26, 2018

One final question: have you tried any of the troubleshooting steps in the Cygwin FAQ (https://cygwin.com/faq/faq.html#faq.using.fixing-fork-failures)?

(In particular the rebase trick. I am wondering if the cygfuse DLL has an address conflict with some other component.)

It may take me a couple of days to install 2.10 and try it as I am currently busy with some other work.

@benrubson
Contributor Author

benrubson commented Apr 26, 2018

> have you tried any of the troubleshooting steps

I'm on this. It is certainly not BLODA related, as I used at least 2 different freshly installed Windows versions for the tests.

One related question, when you have time: what's the difference between linking to CYGFUSE and linking directly with the WinFsp DLL? Is there generally a preferred method?
As the build is done with Cygwin in both cases, the Cygwin DLLs will still be required, so I can't really see the difference :) Thx!

@billziss-gh
Collaborator

> what's the difference between linking to CYGFUSE and linking directly with the WinFsp DLL

CYGFUSE is a very simple wrapper around WinFsp-FUSE that forwards FUSE calls. Its reasons for existing, in order of importance:

  • The WinFsp DLL provides a FUSE compatible API on both native Windows and Cygwin. There are significant differences between the 2 environments (e.g. different malloc, different errno's, etc.) and the provided FUSE API must account for those. For this reason, the DLL exports fsp_fuse_* symbols rather than fuse_* symbols. The fsp_fuse_* symbols accept an extra parameter env that "describes" the environment used. The CYGFUSE DLL simplifies this by providing fuse_* symbols appropriate for use from Cygwin.

  • CYGFUSE knows how to find and load WinFsp. The CYGFUSE DLL is installed in /usr/bin and is readily available to FUSE programs.

  • CYGFUSE makes it easy to build FUSE programs by using pkg-config.
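
The first point above (one environment-neutral implementation that takes an extra env parameter, plus a thin wrapper that binds it to the caller's C runtime) can be illustrated with a minimal sketch. All demo_* names here are hypothetical and chosen only for illustration; they are not the real fsp_fuse_* or fuse_* symbols, and the actual WinFsp env structure is not reproduced.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the "env" parameter: it describes the caller's
 * C runtime (allocator and errno values). Not the real WinFsp structure. */
struct demo_env {
    void *(*memalloc)(size_t size);
    void (*memfree)(void *ptr);
    int enoent;   /* value of ENOENT in the caller's runtime */
    int enomem;   /* value of ENOMEM in the caller's runtime */
};

/* Environment-neutral implementation, in the spirit of an fsp_fuse_* symbol:
 * it never calls malloc or uses errno constants directly. */
int demo_impl_getname(const struct demo_env *env, int id, char **out)
{
    static const char name[] = "passthrough";

    if (id != 0)
        return -env->enoent;
    *out = env->memalloc(sizeof name);
    if (*out == NULL)
        return -env->enomem;
    memcpy(*out, name, sizeof name);
    return 0;
}

/* The environment of the runtime this wrapper is compiled against. */
static const struct demo_env demo_this_env = { malloc, free, ENOENT, ENOMEM };

/* Thin wrapper, in the spirit of a fuse_* symbol exported by a wrapper DLL:
 * same call, with the environment filled in for the caller. */
int demo_getname(int id, char **out)
{
    return demo_impl_getname(&demo_this_env, id, out);
}
```

A caller only ever sees demo_getname and can release the result with its own free, because the allocation came from its own runtime's allocator.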

@benrubson
Contributor Author

benrubson commented Apr 26, 2018

Thank you Bill, clearly understood 👍

> have you tried any of the troubleshooting steps

So none of the given solutions worked :-/

I gave several different Cygwin versions a try:
https://cygwin.com/snapshots/

  • 2017-09-12 : works, this is Cygwin 2.9
  • 2017-10-18 : works
  • 2017-11-14 : works
  • 2017-12-01 : does not work
  • 2017-12-10 : does not work
  • 2018-02-02 : does not work, this is current Cygwin 2.10

So I would say something was broken between 2017-11-14 and 2017-12-01.

@billziss-gh
Collaborator

Thanks for the great troubleshooting. I am getting quite curious about this one and may look at it sooner rather than later :)

@billziss-gh
Collaborator

CYGFUSE has this ugly hack which may actually be related to the problem:

https://github.com/billziss-gh/winfsp/blob/master/opt/cygfuse/cygfuse.c#L55-L73

@billziss-gh
Collaborator

I am able to reproduce on Win10 with Cygwin32 2.10.0.

@billziss-gh
Collaborator

I created a modified version of CYGFUSE and added some tracing calls and a sleep after the daemon call. This proved that daemonization actually works and it also allowed me to attach a debugger to the daemonized process.

Soon after attaching the debugger and after the sleep finished, I got this lovely dialog box:

[Image: capture]

The status code 0xC0000028 is STATUS_BAD_STACK. My (limited) understanding of this status code is that it is raised when the stack looks bad during Windows Structured Exception unwinding. Obviously it is a pretty fatal condition.

The problem happens within the StartServiceCtrlDispatcher API, which is called by the WinFsp DLL during FUSE initialization (and after daemonization if any). This call allows a WinFsp-FUSE file system to also act as a service (if launched within the Windows Service context). The call will fail when launched from the command line, which is expected and benign.

My theory is that StartServiceCtrlDispatcher fails (as it is supposed to) and this failure is communicated internally as a Windows Structured Exception. During unwinding, Windows examines the stack and finds it wrong for some reason.

This is as much debugging as I have had the time to do so far.

@billziss-gh
Collaborator

billziss-gh commented Apr 27, 2018

The problem appears to be caused by the -fstack-protector gcc switch. Modifying the passthrough-fuse Makefile to add a -fno-stack-protector gcc switch fixes the problem.

This Cygwin commit may be related: ssp: add APIs for Stack Smashing Protection

I am inclined to think that this is a Cygwin problem in the WOW64 (Windows 32-bit on Windows 64-bit) environment and not related to WinFsp.

It might be possible to create a minimal repro that forks/daemonizes and calls StartServiceCtrlDispatcher.

@benrubson
Contributor Author

benrubson commented Apr 27, 2018

Wow, very interesting, really nice investigation Bill!

Though I'm not sure I understand why it fails when linking to CYGFUSE but not when linking directly with the WinFsp DLL, as I assume StartServiceCtrlDispatcher comes into play in both cases?

> Modifying the passthrough-fuse Makefile to add a -fno-stack-protector gcc switch fixes the problem.

This is a nice workaround, but as you stated, I'm not sure this is the way to go.

Do you want me to open an issue / ask for support through the Cygwin dev list?

@billziss-gh
Collaborator

> Do you want me to open an issue / ask for support through the Cygwin dev list?

Good idea. This is likely related to their recent stack smashing and SSP commits, although we do not have a simple repro for them.

@benrubson
Contributor Author

Here's the request to the Cygwin list: https://cygwin.com/ml/cygwin-developers/2018-04/msg00025.html

@billziss-gh
Collaborator

Ben, thanks. I just subscribed to cygwin-developers to follow that conversation.

@billziss-gh
Collaborator

@benrubson just wondering if you have seen Corinna's response in cygwin-developers and whether you have any plans to follow up on this.

@billziss-gh
Collaborator

Ping @benrubson. Just wondering whether you will have any time to investigate this further?

@benrubson
Contributor Author

benrubson commented Jun 20, 2018

Sorry for my late answer Bill, I was completely swamped these last few days...
I'm now back on these subjects.
And thank you very much for tracking this down 👍

So, Jony on the ML seems to say there was a change in the compilation options around SSP,
which would confirm your findings.
Corinna advises bisecting Cygwin or debugging the program using GDB.
I'm not really sure how to perform the debugging properly, as I have almost never used GDB.
Do we expect to see the relevant info by setting a breakpoint on Cygwin's exception::handle method?
I plan to bisect first, as it may help find the culprit.
Thank you again 👍

@billziss-gh
Collaborator

@benrubson thanks. Did not mean to offload this work to you, just wondered if it was something that you were planning to do :)

Re: bisecting. I would probably start at the suspicious commit and see if a Cygwin that includes it has the problem, while a Cygwin that does not include it does not have the problem.

Another thing to consider is whether the gcc defaults changed between Cygwin 2.9 and 2.10. For example, it might be that gcc in Cygwin 2.9 compiles without -fstack-protector by default, whereas gcc in Cygwin 2.10 compiles with -fstack-protector by default.
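
One way to check the compiler default from inside a program is to look at the macros GCC predefines when a stack-protector option is in effect (__SSP__, __SSP_STRONG__, __SSP_ALL__, __SSP_EXPLICIT__). The following is a minimal sketch; the function name is made up for illustration, and it only reports the mode for the translation unit it is compiled in.

```c
#include <stdio.h>

/* Report which stack-protector mode the compiler enabled for this file.
 * GCC predefines exactly one of these macros depending on the option used;
 * none of them is defined when the protector is off. */
const char *stack_protector_mode(void)
{
#if defined(__SSP_ALL__)
    return "all (-fstack-protector-all)";
#elif defined(__SSP_STRONG__)
    return "strong (-fstack-protector-strong)";
#elif defined(__SSP_EXPLICIT__)
    return "explicit (-fstack-protector-explicit)";
#elif defined(__SSP__)
    return "default (-fstack-protector)";
#else
    return "off";
#endif
}

/* Convenience helper: print the mode to the given stream. */
void report_stack_protector(FILE *out)
{
    fprintf(out, "stack protector: %s\n", stack_protector_mode());
}
```

Building this file with no extra flags under each Cygwin version and comparing the reported mode would show whether the default differs between the two gcc installations.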

@benrubson
Contributor Author

OK Bill :)

> whether the gcc defaults changed between Cygwin 2.9 and 2.10

You surely mean between the culprit commit that I will find by bisecting and the one just before it?

@billziss-gh
Collaborator

> You surely mean between the culprit commit that I will find by bisecting and the one just before it?

Yes. The process is usually:

  • Start with a range of commits A < B, where you know that your test passes at A and fails at B.
  • Pick the middle commit M between A and B. Build and try your test. If the test passes the suspect commit is between M and B. If the test fails the suspect commit is between A and M.
  • Repeat the process.

In our case we already have an initial suspect commit (the SSP commit). So I suggested trying to short-circuit this long process by trying that commit and its predecessor.
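
The bisection steps above can be sketched as a small binary search. This is only an illustration: commits are represented by their index in the history, and the passes callback (a hypothetical name) stands in for "build this Cygwin commit and run the daemonize test". It assumes the test passes at the good end, fails at the bad end, and never goes back to passing in between.

```c
#include <stddef.h>

/* Hypothetical test callback: returns nonzero when the build at `commit` passes. */
typedef int (*passes_fn)(size_t commit, void *ctx);

/*
 * Return the first failing commit in the range (good, bad], given that the
 * test passes at `good` and fails at `bad`.
 */
size_t first_bad_commit(size_t good, size_t bad, passes_fn passes, void *ctx)
{
    while (bad - good > 1) {
        size_t mid = good + (bad - good) / 2;
        if (passes(mid, ctx))
            good = mid;   /* test passes at mid: culprit is between mid and bad */
        else
            bad = mid;    /* test fails at mid: culprit is between good and mid */
    }
    return bad;
}

/* Example predicate for illustration only: every commit below *ctx passes. */
int passes_below(size_t commit, void *ctx)
{
    return commit < *(const size_t *)ctx;
}
```

The short-circuit suggested above corresponds to calling the test directly on the suspect commit and on its predecessor, which skips the loop entirely if the suspicion is right.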

@billziss-gh
Collaborator

billziss-gh commented Jul 26, 2018

The -fno-stack-protector option no longer works for me as of Cygwin 2.10.0(0.325/5/3).

$ uname -a
CYGWIN_NT-10.0-WOW windows 2.10.0(0.325/5/3) 2018-02-02 15:21 i686 Cygwin

I have also tested with an executable built under Cygwin 2.9. It works correctly on Cygwin 2.9, but not on Cygwin 2.10. So this may not be related to the compiler after all.

@billziss-gh
Collaborator

An update on this.

Although I have never found the reason that this problem happens on latest Cygwin, I have successfully worked around it. The workaround was included in the latest WinFsp beta (2018.2 B2), but unfortunately I managed to screw things up at a different place and broke daemonization on ALL Cygwin platforms in that release.

A later commit (bef5ba7) fixes daemonization, which should now work on ALL Cygwin platforms (at least in my testing). It will be included in the next beta release.

billziss-gh added this to the v1.4 milestone Aug 1, 2018

@benrubson
Contributor Author

Hi Bill,

> The -fno-stack-protector option no longer works for me as of Cygwin 2.10.0(0.325/5/3).

Were you able to figure out what changed between your previous tests and this failing one?

> Although I have never found the reason that this problem happens on latest Cygwin...

Well, you answer that here :)

> I have successfully worked around it.

Really good news! Many thanks for your work!
So the question is: how did you manage to find the workaround? Only by testing, or did you find a clue somewhere?

So in the end, the issue was not on the Cygwin side?
Can I post to the mailing list to say the issue is now closed?
Or do you think something may remain there?

I'll be glad to test the next beta and report 👍

Thank you again !

@billziss-gh
Collaborator

I have not found the cause of the problem. I still find it likely that it is a Cygwin problem in the WOW environment.

Feel free to close the issue on the Cygwin side if you want. However the understanding should be that the problem was not resolved, only worked around.

> The -fno-stack-protector option no longer works for me as of Cygwin 2.10.0(0.325/5/3).
> Were you able to figure out what changed between your previous tests and this failing one?

I was not, although this was on a different computer than the original one.

> So the question, how did you manage to find the workaround? Only by testing? Or did you find a clue somewhere?

At some point I decided that it would be more productive for me to look into a workaround rather than trying to resolve this. This is especially because the problem appears to happen under a very specific set of circumstances that I could easily change.

At the same time I was planning to do a reimplementation of FUSE loops to resolve #135. Recall that the problem was happening while executing StartServiceCtrlDispatcher from the thread that did fuse_daemonize. So I ended up moving the StartServiceCtrlDispatcher API call into a new Win32 thread (and hence a new stack without any Cygwin stack frames in it) that is created after daemonization. This fixed both #135 and this issue!

It should be noted here that because we did not really resolve the issue, the problem may reappear later; perhaps when calling another Windows API that throws an exception and during unwinding finds the stack not right.
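
For readers who want to experiment, the idea of the workaround can be sketched roughly as follows. This is not the code from the WinFsp commit; it is a minimal illustration using only documented Win32 calls, and the names demo_service_main, dispatcher_thread and run_dispatcher are invented here. It builds only on Windows or Cygwin, and some toolchains need advapi32 added to the link line. Whether the same-thread variant actually reproduces the crash after daemonizing has not been confirmed in this thread.

```c
#if defined(_WIN32) || defined(__CYGWIN__)
#include <windows.h>

/* Service entry point; not reached when the program is started from a console,
 * because the dispatcher call fails before any service is started. */
static VOID WINAPI demo_service_main(DWORD argc, LPSTR *argv)
{
    (void)argc;
    (void)argv;
}

/* Calls StartServiceCtrlDispatcher and returns the Win32 error code
 * (ERROR_SUCCESS on success). Usable directly or as a thread start routine. */
static DWORD WINAPI dispatcher_thread(LPVOID param)
{
    static char name[] = "";   /* an empty name is allowed for own-process services */
    SERVICE_TABLE_ENTRYA table[] = {
        { name, demo_service_main },
        { NULL, NULL },
    };

    (void)param;
    return StartServiceCtrlDispatcherA(table) ? ERROR_SUCCESS : GetLastError();
}

/*
 * on_new_thread == 0: call the dispatcher on the calling thread
 *                     (the situation in which the problem was observed).
 * on_new_thread != 0: call it on a fresh Win32 thread, so that the stack seen
 *                     by the API contains no frames from the calling thread.
 * Returns the Win32 error code produced by the dispatcher call.
 */
DWORD run_dispatcher(int on_new_thread)
{
    HANDLE thread;
    DWORD result = ERROR_SUCCESS;

    if (!on_new_thread)
        return dispatcher_thread(NULL);

    thread = CreateThread(NULL, 0, dispatcher_thread, NULL, 0, NULL);
    if (thread == NULL)
        return GetLastError();
    WaitForSingleObject(thread, INFINITE);
    if (!GetExitCodeThread(thread, &result))
        result = GetLastError();
    CloseHandle(thread);
    return result;
}
#endif
```

A test program would daemonize first (for example with daemon(0, 0) under Cygwin) and then call run_dispatcher(0) or run_dispatcher(1). When started from a console rather than by the service control manager, the dispatcher call is documented to fail with ERROR_FAILED_SERVICE_CONTROLLER_CONNECT, which is the benign failure described earlier in the thread.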
