Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test_fuse: weird test failure only on freebsd #6871

Closed
ThomasWaldmann opened this issue Jul 17, 2022 · 23 comments
Closed

test_fuse: weird test failure only on freebsd #6871

ThomasWaldmann opened this issue Jul 17, 2022 · 23 comments
Assignees
Labels
bug os: bsd FreeBSD, NetBSD, OpenBSD, ...
Milestone

Comments

@ThomasWaldmann
Copy link
Member

ThomasWaldmann commented Jul 17, 2022

borg 2.0.0a4, python 3.9.13, vagrant, freebsd13, pytest 7.0.1 and 6.2.5:

with open(in_fn, "rb") as in_f, open(out_fn, "rb") as out_f:
    indata = in_f.read()
    outdata = out_f.read()
    print(repr(indata))  # prints b'XXXX...'
    print(repr(outdata))  # prints b'XXXX...'
    assert type(indata) == bytes
    assert type(outdata) == bytes
>   assert indata == outdata
OSError: [Errno 78] Function not implemented
@ThomasWaldmann ThomasWaldmann changed the title weird test failure only on freebsd test_fuse: weird test failure only on freebsd Jul 17, 2022
@ThomasWaldmann
Copy link
Member Author

ThomasWaldmann commented Jul 17, 2022

Hah. It always happens in the last line of that block, no matter what it is. __exit__ of context manager maybe?

Yup, it is out_f.close() that produces this OSError.

@ThomasWaldmann ThomasWaldmann added this to the 2.0.0b1 milestone Jul 17, 2022
@ThomasWaldmann
Copy link
Member Author

I don't think I've seen that for borg 2.0.0a3 recently. So, is something broken in borg, on freebsd, in llfuse?

@ThomasWaldmann
Copy link
Member Author

I also tested on 3 different Linux dists, netbsd, openbsd and macOS - that issue did not show up there.

@ThomasWaldmann
Copy link
Member Author

still broken for 2.0.0b1 and freebsd 13.1-release-p1.

@ThomasWaldmann ThomasWaldmann added the os: bsd FreeBSD, NetBSD, OpenBSD, ... label Sep 17, 2022
@ThomasWaldmann
Copy link
Member Author

And it's still broken. Any freebsd user out there who could debug / fix this?

@ThomasWaldmann
Copy link
Member Author

I added the "bug" label to get this better visibility.

It doesn't look like a bug in borg though, but rather in freebsd's FUSE implementation.

@Jamie-Landeg-Jones
Copy link

Longtime FreeBSD user here, borg user since January, and little to none python skills!

What can I do to help?

@ThomasWaldmann
Copy link
Member Author

@Jamie-Landeg-Jones thanks for helping. The "Function not implemented" is only seen on FreeBSD (not on linux, not on macOS) and even on FreeBSD, this has worked before, so I suspect a bug / regression in FreeBSD's FUSE infrastructure.

Can you maybe check the changes to the fuse stuff done in freebsd, whether something looks suspect?

@Jamie-Landeg-Jones
Copy link

Hi. What I'd like to do first is replicate the error you posted and trace it from there.
I can download borg from git, and the relevant dependencies, but I'm unclear what I need to run to trigger the error.

Cheers

@ThomasWaldmann
Copy link
Member Author

ThomasWaldmann commented Apr 8, 2023

Our docs have instructions for developers, see there for the set up:

https://borgbackup.readthedocs.io/en/1.2.4/development.html (maybe rather check out the 1.2-maint branch than master, the 1.2.x stable releases are made from this branch)

For the fuse issue, you do not need to use fakeroot and you could limit pytest to only run the fuse related tests (which triggers the error):

pip install -r requirements.d/development.txt
python setup.py clean clean2
pip install -e .[llfuse]
pytest -v --show-locals --benchmark-skip -k "not remote and fuse"

@Jamie-Landeg-Jones
Copy link

Thanks for that. I've replicated the issue in C, and with a sshfs mounted fuse filesystem.

Of course, this isn't a fix, but it will make it easier to track down the problem.

Here's what I've noticed:

As you said, it works fine on FreeBSD 12, just not on 13. I'll narrow down where tomorrow.

The error only occurs under our use case if you read from the file. (if you disable the read-check in your tests, the close works ok) [N.B. I didn't test writing]

To highlight this, I ran your test without the read check, then from the command line, run a bit of c code that simply opens the file... then closes it.... then opens it... reads 100 bytes. then closes it. then opens it then closes it.

I also did similar on a sshfs filesystem, and got the same effect, but with a different error message, leading me to believe that there is some unset / previously initialised variable leaking through.

I tested mounting ro and rw - no difference. I also tried using the "cat" unix command on one test file. The "cat" showed no error message, but again, by virtue of it reading the file, any subsequent fclose test failed.

So, basically, any file that has had some bytes read, produces that error on close, and continues to do so for the duration of the mount.

I now have enough to look at the fusefs source, and if I can't work it out, at least I'll have enough evidence to prod someone with!

bb@catwalk:~/borg % file-close-check /tmp/tmp5bsl1_xh/mountpoint/archive/input/file1
/tmp/tmp5bsl1_xh/mountpoint/archive/input/file1 | open: ok, close: ok | open: ok, read: 100 bytes, close: Function not implemented | open: ok, close: Function not implemented
bb@catwalk:~/borg %
bb@catwalk:~/borg % file-close-check /tmp/tmp5bsl1_xh/mountpoint/archive/input/file1
/tmp/tmp5bsl1_xh/mountpoint/archive/input/file1 | open: ok, close: Function not implemented | open: ok, read: 100 bytes, close: Function not implemented | open: ok, close: Function not implemented
bb@catwalk:~/borg % file-close-check /tmp/tmp5bsl1_xh/mountpoint/archive/input/file1
/tmp/tmp5bsl1_xh/mountpoint/archive/input/file1 | open: ok, close: Function not implemented | open: ok, read: 100 bytes, close: Function not implemented | open: ok, close: Function not implemented
bb@catwalk:~/borg %
bb@catwalk:~/borg %
bb@catwalk:~/borg % sshfs jamie@catflap.org:/tmp b1
bb@catwalk:~/borg %
bb@catwalk:~/borg % file-close-check b1/*
b1/COPYRIGHT                   | open: ok, close: ok | open: ok, read: 100 bytes, close: ok | open: ok, close: ok
b1/file-close-check.c          | open: ok, close: ok | open: ok, read: 100 bytes, close: Permission denied | open: ok, close: Permission denied
b1/x                           | open: ok, close: ok | open: ok, read: 100 bytes, close: Permission denied | open: ok, close: Permission denied
bb@catwalk:~/borg %
bb@catwalk:~/borg %
bb@catwalk:~/borg % file-close-check b1/*
b1/COPYRIGHT                   | open: ok, close: ok | open: ok, read: 100 bytes, close: ok | open: ok, close: ok
b1/file-close-check.c          | open: ok, close: Permission denied | open: ok, read: 100 bytes, close: Permission denied | open: ok, close: Permission denied
b1/x                           | open: ok, close: Permission denied | open: ok, read: 100 bytes, close: Permission denied | open: ok, close: Permission denied
bb@catwalk:~/borg %
bb@catwalk:~/borg % l b1
total 59
 4 drwxrwxrwt   1 root   wheel  -        256  9 Apr 04:00 ./
 3 drwxr-xr-x  14 bb     bb     uarch     36  9 Apr 03:58 ../
 8 -r--r-----   1 jamie  wheel  -      6,197  9 Apr 04:09 COPYRIGHT
 4 -rw-r-----   1 root   wheel  -      1,554  9 Apr 03:39 file-close-check.c
16 -rwxr-x---   1 root   wheel  -     13,550  9 Apr 03:51 x
bb@catwalk:~/borg %

@ThomasWaldmann
Copy link
Member Author

ThomasWaldmann commented Apr 9, 2023 via email

@Jamie-Landeg-Jones
Copy link

Been a bit busy today, but will have another look tomorrow.

Yes, Freebsd changed it from fuse.ko to fusefs.ko in 2019, which is roughly FreeBSD 12.1 upwards. As the oldest officially supported version is 12.4, all supported versions now use fusefs.

https://cgit.freebsd.org/src/commit/sys/modules/fusefs?id=123af6ec70016f5556da5972d4d63c7d175c06d3

@ThomasWaldmann
Copy link
Member Author

Ah, so this was just a rename! Good.

I had the suspicion that they might have switched to a completely new implementation (including new bugs).

@Jamie-Landeg-Jones
Copy link

Yeah, just a rename, (I have one box that has the renamed name, but none of the bugs you discovered)

However, a number of changes since then, and other sorts of new bugs!

Anyway, I've tracked down the commit that caused the issue, and have reported it, and also posted a fix (it works, but I don't know enough about the fusefs subsystem to know if the fix is correct)

Basically, it appears to be due to an attempt to update atime on a file with no write-access. The close fails because of this (whilst it checks for write-access before attempting to write data, it doesn't seem to do so for metadata)

If you're interested, I've posted lots more information to the bug report, here: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=270749

And in case there are followups there, here is my email letting people know about the bug:
https://lists.freebsd.org/archives/freebsd-current/2023-April/003435.html

As for borg-tests, even when this issue is fixed, it doesn't help people running FreeBSD 13.0 and 13.1

Can you try changing the test to use '-no-atime" on the fusefs? ("-readonly" makes no difference, but if you can mount with "no-atime" I think it might.)

Of course, that only helps if no-atime is acceptable to you -- I assume it is, as the fuse mounted borgbackup is only ever read-only anyway

I hope this helps!

Cheers, Jamie.

P.s. I've found a couple of potential borg issues recently, but I've not tested them on 2.X yet, and need to narrow them down (and make sure it's not user error!) before posting them.

Cheers, Jamie

@ThomasWaldmann ThomasWaldmann added this to the 2.0.0b6 milestone Apr 11, 2023
@Jamie-Landeg-Jones
Copy link

No update yet ... :-(

@ThomasWaldmann ThomasWaldmann modified the milestones: 2.0.0b6, 2.0.0rc1 May 12, 2023
@ThomasWaldmann ThomasWaldmann modified the milestones: 2.0.0rc1, 2.0.0b7 Jun 10, 2023
@ThomasWaldmann
Copy link
Member Author

Considering that this does not seem to be a borg bug, I'll keep this open as a reminder, but move it to rc1 milestone.

@ThomasWaldmann ThomasWaldmann modified the milestones: 2.0.0b7, 2.0.0rc1 Aug 26, 2023
@Jamie-Landeg-Jones
Copy link

The official fix has now been applied to "MAIN', and will hit all supported stable channels in a week.

This means, the fix should be all release versions AFTER 13.2-RELEASE (I don't know if there is going to be a 13.3-RELEASE, but 14.0-RELEASE is imminent)

Ref: https://cgit.freebsd.org/src/commit/?id=fb619c94c679e939496fe0cf94b8d2cba95e6e63

Cheers! Jamie

@ThomasWaldmann
Copy link
Member Author

ThomasWaldmann commented Sep 22, 2023

@Jamie-Landeg-Jones great news, thanks!

So guess it might be in the next release there (which is what I use for borg release testing and pyinstaller borg binary building):

https://app.vagrantup.com/generic/boxes/freebsd13

@Jamie-Landeg-Jones
Copy link

I'm not familiar with vagrant, but I just had a look at your link, and that version they updated a few days ago is indeed based on FreeBSD-13.2-RELEASE so yes, their next release should have it.

Of course, vagrant cloud may stop at 13.2 seeing that 14.0 will be released soon, I don't know, but if they keep updating their freebsd13 branch, then yes, the next update will presumably be 13.3-release, which will have the fix.

One thing, 13.2 was released on April 11th, so there may be a bit of a lag on vagrant providing the upgrade when/if 13.3 comes out.

Cheers!

@ThomasWaldmann
Copy link
Member Author

ThomasWaldmann commented Dec 23, 2023

borg todo: upgrade the freebsd vagrant machine to a more recent freebsd and check if the issue disappears.

update:

  • did a test on freebsd14 and the error has vanished, all borg tests pass.
  • running same borg code and tests in my freebsd13 vagrant VM still reproduces the issue.

So I will just upgrade all the vagrant VMs to 14.

@Jamie-Landeg-Jones thanks for helping!

    freebsd64:   py39-fuse2: OK (900.77=setup[109.99]+cmd[790.78] seconds)
    freebsd64:   py310-fuse2: OK (993.96=setup[211.16]+cmd[782.79] seconds)
    freebsd64:   py311-fuse2: OK (987.43=setup[212.72]+cmd[774.71] seconds)
    freebsd64:   py312-fuse2: OK (1046.30=setup[224.04]+cmd[822.25] seconds)
    freebsd64:   congratulations :) (3930.93 seconds)

@ThomasWaldmann ThomasWaldmann modified the milestones: 2.0.0rc1, 1.4.0a1 Dec 23, 2023
@ThomasWaldmann ThomasWaldmann self-assigned this Dec 24, 2023
ThomasWaldmann added a commit to ThomasWaldmann/borg that referenced this issue Dec 24, 2023
@Jamie-Landeg-Jones
Copy link

No problem!

For reference, this fix has entered the FreeBSD 13 tree, but there hasn't been a release yet - 13.2 has been around a while. When 13.3 is out, the fix will be in there.

Still, 13.X is still running for those who have problems upgrading, so if you've got 14.X running, it's best to stick with that!

@ThomasWaldmann
Copy link
Member Author

1.4-maint: fixed in #7988

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug os: bsd FreeBSD, NetBSD, OpenBSD, ...
Projects
None yet
Development

No branches or pull requests

2 participants