[f29 aarch64] segfaults in post scripts #1663

Open
timcoote opened this issue Nov 5, 2018 · 19 comments


@timcoote timcoote commented Nov 5, 2018

Host system details
Raspberrypi 3

Provide the output of rpm-ostree status.

State: idle
AutomaticUpdates: disabled
Deployments:
● ostree://fedora-iot:fedora/29/aarch64/iot
                   Version: 29.20181027.0 (2018-10-27 11:33:14)
                    Commit: 75eaac907aea31e1aecd7e41c76f640e8d5866986fed99dc1cf0a48bf7437b91
              GPGSignature: Valid signature by 5A03B4DD8254ECA02FDA1637A20AA56B429476B4

Expected vs actual behavior

 sudo rpm-ostree install kernel-modules-extra
Checking out tree 75eaac9... done
Enabled rpm-md repositories: fedora-modular updates-modular updates fedora
rpm-md repo 'fedora-modular' (cached); generated: 2018-10-28 11:01:36
rpm-md repo 'updates-modular' (cached); generated: 2018-10-31 14:34:38
rpm-md repo 'updates' (cached); generated: 2018-11-05 01:49:46
rpm-md repo 'fedora' (cached); generated: 2018-10-28 11:00:32
Importing metadata [=============] 100%
Resolving dependencies... done
Will download: 1 package (1.2 MB)
  Downloading from fedora: [=============] 100%
Importing (1/1) [=============] 100%
Checking out packages (2/2) [=============] 100%
Running pre scripts... 0 done
Running post scripts... error: Running %post for kernel-modules-extra: Executing bwrap(/bin/sh): Child process killed by signal 7; run `journalctl -t 'rpm-ostree(kernel-modules-extra.post)'` for more information

Expected:

rpm-ostree install kernel-modules-extra
...
Success!

Steps to reproduce it

Provide any additional data that may help debug this - which specific version of
an RPM is in the repo, or any host system configuration.

The exact errors seem intermittent. Is it just a service availability issue on one of the repos?

Would you like to work on the issue?
Sorry, I'm too far behind the curve on this. Happy to provide more info.
journalctl.txt
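
Editorial note: the "signal 7" in the error above is a raw signal number. A minimal sketch for decoding it with the shell's built-in `kill -l`, assuming a POSIX shell on Linux (signal numbering differs on some other platforms). `sig_name` is a hypothetical helper, not part of rpm-ostree:

```shell
# Print the symbolic name for a signal number using the shell's own table.
# Assumption: Linux numbering, as used on aarch64 and x86_64.
sig_name() {
    kill -l "$1"
}

sig_name 7     # the signal reported for the bwrap child in the error above
sig_name 11    # the signal conventionally meant by "segfault"
```

On Linux this should report BUS for 7 and SEGV for 11, which would make the bwrap child's death a bus error (SIGBUS) rather than a segmentation fault.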

Author

@timcoote timcoote commented Nov 5, 2018

When I then try sudo rpm-ostree upgrade, I get a similar failure to above (see journalctl-full.txt)
journactl-full.txt.

Just the relevant journalctl -b -u rpm-ostreed output is in rpm-ostreed.txt. The former starts at the broken invocation, rather than covering everything since reboot.
rpm-ostreed.txt

@cgwalters changed the title from "f29 aarch64" to "[f29 aarch64] segfaults in post scripts" on Nov 5, 2018
Member

@cgwalters cgwalters commented Nov 5, 2018

Nov 05 14:09:55 localhost.localdomain systemd-coredump[2263]: Process 2256 (sort) of user 0 dumped core.
                                                              
                                                              Stack trace of thread 10:
                                                              #0  0x0000ffffa8037880 FIPS_mode_set (libcrypto.so.1.1)
Nov 05 14:09:55 localhost.localdomain systemd-coredump[2264]: Process 2254 (sed) of user 0 dumped core.
                                                              
                                                              Stack trace of thread 8:
                                                              #0  0x0000ffffb9c57d60 re_set_syntax (libc.so.6)

Hm. Does the system seem to function OK outside of rpm-ostree? I mean, here we have sed and sort segfaulting when compiling a regexp and...inside FIPS mode init respectively.

Can you e.g. use podman or run vi or whatever?

Author

@timcoote timcoote commented Nov 5, 2018

Both podman and vi seem ok to me. Although I don't have any containers to work with, so it's not exactly an exhaustive podman thrashing. Buildah won't rpm-ostree install, tho' so I'm not sure that I can construct containers conveniently. sed works from bash, too.

On a second attempt, buildah installed.....

Author

@timcoote timcoote commented Nov 7, 2018

I have no idea whether this is significant or a red herring, and I have a few unique Pi 3s.

I tried Version: 29.20181106.0 (2018-11-06T11:14:51Z). The first program failing seems to be rofiles-fuse. I managed to install gdb to see if there was anything consistent. If I run sudo gdb /usr/bin/rofiles-fuse <coredump>, all the failures seem to have been generated by the same command:
rofiles-fuse --copyup usr /tmp/rpmostree-rofiles-fuse.<tmpname>

eg:

sudo gdb /usr/bin/rofiles-fuse /var/lib/systemd/coredump/core.rofiles-fuse.0.158d42d3fb084c519422889eab67c95a.1960.1541602984000000
GNU gdb (GDB) Fedora 8.2-3.fc29
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "aarch64-redhat-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /usr/bin/rofiles-fuse...(no debugging symbols found)...done.
[New LWP 1961]
[New LWP 1962]
[New LWP 1960]
[New LWP 1971]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `rofiles-fuse --copyup usr /tmp/rpmostree-rofiles-fuse.BFdBIb'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x0000ffffb1ae4008 in do_releasedir () from /lib64/libfuse.so.2
[Current thread is 1 (Thread 0xffffb0124fd0 (LWP 1961))]
Missing separate debuginfos, use: dnf debuginfo-install ostree-2018.9-1.fc29.aarch64
(gdb) quit
Member

@cgwalters cgwalters commented Nov 8, 2018

Can you attach the output of t a a bt in gdb from that core?
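
For readers unfamiliar with the abbreviation: `t a a bt` is gdb shorthand for `thread apply all bt`, which prints a backtrace for every thread in the core. A non-interactive sketch, assuming gdb is installed; `dump_all_backtraces` is a hypothetical wrapper and the core file path in the usage comment is a placeholder:

```shell
# Run gdb in batch mode against an executable and a core file and print
# backtraces for all threads. Batch mode exits when the commands finish,
# so the output can be redirected straight into a file for attaching.
dump_all_backtraces() {
    exe=$1
    core=$2
    if [ -z "$exe" ] || [ -z "$core" ]; then
        echo "usage: dump_all_backtraces EXECUTABLE COREFILE" >&2
        return 2
    fi
    gdb -batch -ex 'thread apply all bt' "$exe" "$core"
}

# Hypothetical usage (substitute a real core file from /var/lib/systemd/coredump):
#   sudo sh -c '. ./bt.sh; dump_all_backtraces /usr/bin/rofiles-fuse /path/to/core' > backtrace.txt
```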

Member

@cgwalters cgwalters commented Nov 8, 2018

Member

@cgwalters cgwalters commented Nov 8, 2018

You should be able to experiment with rofiles-fuse outside of rpm-ostree. I suspect this is something going wrong in FUSE generically, although it could definitely be rofiles-fuse specific.

See man rofiles-fuse for some examples; you don't need to use ostree even, just like:

$ mkdir mnt content
$ touch content/existingfile
$ rofiles-fuse content mnt
$ ls -al mnt/existingfile
  # And at this point just create/remove files in both mnt/ and content/
  # To clean up:
$ fusermount -u mnt
Author

@timcoote timcoote commented Nov 9, 2018

(gdb) t a a bt

Thread 4 (Thread 0xffffaf0defd0 (LWP 1971)):
#0  0x0000ffffb0fcd1c8 in read () from /lib64/libpthread.so.0
#1  0x0000ffffb1ae1a2c in fuse_kern_chan_receive () from /lib64/libfuse.so.2
#2  0x0000ffffb1ae3354 in fuse_ll_receive_buf () from /lib64/libfuse.so.2
#3  0x0000ffffb1ae20a4 in fuse_do_work () from /lib64/libfuse.so.2
#4  0x0000ffffb0fc37f8 in start_thread () from /lib64/libpthread.so.0
#5  0x0000ffffb0f1744c in thread_start () from /lib64/libc.so.6

Thread 3 (Thread 0xffffb0148010 (LWP 1960)):
#0  0x0000ffffb0fcc160 in do_futex_wait.constprop () from /lib64/libpthread.so.0
#1  0x0000ffffb0fcc280 in __new_sem_wait_slow.constprop.0 () from /lib64/libpthread.so.0
#2  0x0000ffffb1ae2344 in fuse_session_loop_mt () from /lib64/libfuse.so.2
#3  0x0000ffffb1ae72f8 in fuse_loop_mt () from /lib64/libfuse.so.2
#4  0x0000ffffb1ae9fe0 in fuse_main_common () from /lib64/libfuse.so.2
#5  0x0000aaaace2c59c0 in ?? ()
#6  0x0000ffffb0e66d24 in __libc_start_main () from /lib64/libc.so.6
#7  0x0000aaaace2c5a70 in ?? ()
Backtrace stopped: not enough registers or memory available to unwind further

Thread 2 (Thread 0xffffaf901fd0 (LWP 1962)):
#0  0x0000ffffb0fcd1cc in read () from /lib64/libpthread.so.0
#1  0x0000ffffb1ae1a2c in fuse_kern_chan_receive () from /lib64/libfuse.so.2
#2  0x0000ffffb1ae3354 in fuse_ll_receive_buf () from /lib64/libfuse.so.2
#3  0x0000ffffb1ae20a4 in fuse_do_work () from /lib64/libfuse.so.2
#4  0x0000ffffb0fc37f8 in start_thread () from /lib64/libpthread.so.0
#5  0x0000ffffb0f1744c in thread_start () from /lib64/libc.so.6

Thread 1 (Thread 0xffffb0124fd0 (LWP 1961)):
#0  0x0000ffffb1ae4008 in do_releasedir () from /lib64/libfuse.so.2
#1  0x0000ffffb1ae507c in fuse_ll_process_buf () from /lib64/libfuse.so.2
#2  0x0000ffffb1ae2190 in fuse_do_work () from /lib64/libfuse.so.2
#3  0x0000ffffb0fc37f8 in start_thread () from /lib64/libpthread.so.0
#4  0x0000ffffb0f1744c in thread_start () from /lib64/libc.so.6
(gdb)
Author

@timcoote timcoote commented Nov 9, 2018

I'm not clear whether rofiles-fuse is supposed to work like this, but I'm sure that you do, and nothing crashes:

[tim@localhost ~]$ sh -x ./go
+ mkdir mnt content
+ touch content/existingfile
+ rofiles-fuse content mnt
+ ls -la mnt/existingfile
-rw-rw-r--. 1 tim tim 0 Nov  9 08:44 mnt/existingfile
+ touch content/newcontent
+ ls -la mnt content
content:
total 8
drwxrwxr-x. 2 tim tim 4096 Nov  9 08:44 .
drwx------. 6 tim tim 4096 Nov  9 08:44 ..
-rw-rw-r--. 1 tim tim    0 Nov  9 08:44 existingfile
-rw-rw-r--. 1 tim tim    0 Nov  9 08:44 newcontent

mnt:
total 8
drwxrwxr-x. 2 tim tim 4096 Nov  9 08:44 .
drwx------. 6 tim tim 4096 Nov  9 08:44 ..
-rw-rw-r--. 1 tim tim    0 Nov  9 08:44 existingfile
-rw-rw-r--. 1 tim tim    0 Nov  9 08:44 newcontent
+ touch mnt/newmnt
+ ls -la mnt content
content:
total 8
drwxrwxr-x. 2 tim tim 4096 Nov  9 08:44 .
drwx------. 6 tim tim 4096 Nov  9 08:44 ..
-rw-rw-r--. 1 tim tim    0 Nov  9 08:44 existingfile
-rw-rw-r--. 1 tim tim    0 Nov  9 08:44 newcontent
-rw-rw-r--. 1 tim tim    0 Nov  9 08:44 newmnt

mnt:
total 8
drwxrwxr-x. 2 tim tim 4096 Nov  9 08:44 .
drwx------. 6 tim tim 4096 Nov  9 08:44 ..
-rw-rw-r--. 1 tim tim    0 Nov  9 08:44 existingfile
-rw-rw-r--. 1 tim tim    0 Nov  9 08:44 newcontent
-rw-rw-r--. 1 tim tim    0 Nov  9 08:44 newmnt
[tim@localhost ~]$ 

I don't know whether this is relevant, but I did create the error message Transport endpoint is not connected, which I've seen in some of the rpm-ostree install failures by rofiles-fuseing the same directory twice.

EDIT by @dustymabe to improve markdown formatting for web viewing.

Author

@timcoote timcoote commented Nov 13, 2018

I don't know whether this helps or not, but persistence is sometimes a workaround:

[tim@localhost open-zwave]$ sudo rpm-ostree install systemd-devel
Checking out tree 974b2ff... done
Enabled rpm-md repositories: updates fedora updates-modular fedora-modular
rpm-md repo 'updates' (cached); generated: 2018-11-13T02:14:44Z
rpm-md repo 'fedora' (cached); generated: 2018-10-28T11:00:32Z
rpm-md repo 'updates-modular' (cached); generated: 2018-11-13T04:23:14Z
rpm-md repo 'fedora-modular' (cached); generated: 2018-10-28T11:01:36Z
Importing metadata [=============] 100%
Resolving dependencies... done
Will download: 1 package (299.8 kB)
  Downloading from updates: [=============] 100%
Importing (1/1) [=============] 100%
Checking out packages (81/81) [=============] 100%
Running pre scripts... 1 done
Running post scripts... error: Executing %transfiletriggerin for info: Executing bwrap(/bin/sh): Child process killed by signal 7; run `journalctl -t 'rpm-ostree(info.transfiletriggerin)'` for more information
[tim@localhost open-zwave]$ sudo rpm-ostree install systemd-devel
Checking out tree 974b2ff... done
Enabled rpm-md repositories: updates fedora updates-modular fedora-modular
rpm-md repo 'updates' (cached); generated: 2018-11-13T02:14:44Z
rpm-md repo 'fedora' (cached); generated: 2018-10-28T11:00:32Z
rpm-md repo 'updates-modular' (cached); generated: 2018-11-13T04:23:14Z
rpm-md repo 'fedora-modular' (cached); generated: 2018-10-28T11:01:36Z
Importing metadata [=============] 100%
Resolving dependencies... done
Checking out packages (81/81) [=============] 100%
Running pre scripts... 1 done
Running post scripts... 9 done
Writing rpmdb... done
Writing OSTree commit... done
Staging deployment... done
Freed: 202.4 MB (pkgcache branches: 82)
Added:
  systemd-devel-239-6.git9f3aed1.fc29.aarch64
Run "systemctl reboot" to start a reboot
[tim@localhost open-zwave]$ 

ie, the install failed, then worked. The only major difference being the download and import of one package.

It may also be worth noting that the whole process seems to spend a lot of time waiting for the SDCard on the pis.
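
Editorial note: the "persistence" workaround described above can be sketched as a small retry loop. This only papers over the failure and does nothing about the underlying crash; `retry` is a hypothetical helper, and the attempt count is an arbitrary assumption:

```shell
# retry MAX_ATTEMPTS COMMAND [ARGS...]
# Re-run COMMAND until it succeeds or MAX_ATTEMPTS is reached.
retry() {
    max=$1
    shift
    attempt=1
    while [ "$attempt" -le "$max" ]; do
        if "$@"; then
            return 0
        fi
        echo "attempt $attempt of $max failed" >&2
        attempt=$((attempt + 1))
    done
    return 1
}

# Hypothetical usage, mirroring the command from the comment above:
#   retry 3 sudo rpm-ostree install systemd-devel
```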

Member

@dustymabe dustymabe commented Nov 14, 2018

Maybe we are sensitive somehow to really slow media? Or maybe it's possible the SD card is becoming un-reliable? Might be worth trying on a different one to see if you can reproduce easily. It would at least be a good datapoint.

Author

@timcoote timcoote commented Nov 14, 2018

I tried that on a new pi 3 from Element 14, which comes with a 16GB class 10 micro SDCard, with Version: 29.20181113.0 (2018-11-13T13:36:26Z). Same symptoms.

I tried rpm-ostree install git, it failed with the child killed by signal 7 message; first ANOM_ABEND in journalctl is rofiles-fuse. I've not gdb'd the corefiles as it takes a while to install gdb.

After reboot, rpm-ostree install git failed. But then it succeeded.

It may be significant that before the fail error message, there was a message that reached the tty: [ 403.068321] fuse init (API version 7.27). After which the successful install happened.

Would it help to have the symbols for rofiles-fuse (or something else), to get a better view of the stacktrace? (if so, how?).


@nullr0ute nullr0ute commented Nov 16, 2018

> Maybe we are sensitive somehow to really slow media? Or maybe it's possible the SD card is becoming un-reliable? Might be worth trying on a different one to see if you can reproduce easily. It would at least be a good datapoint.

I'm also wondering whether there's use of some other HW, like crc HW accel that's causing us issues here but I'm not sure if the fuse things would use kernel crc which might be accelerated. I have seen errors around CRC in some of the output of crashes. We seem to be somehow tweaking the HW (which bit I'm not sure) in some unusual way that we don't see with a non-ostree Fedora install.

Member

@cgwalters cgwalters commented Nov 16, 2018

I bought a RPi a while ago but never got around to using it...will dig through my stuff to see if I can find it.

One thing I notice looking at the OpenSSL source is that FIPS_mode_set fairly quickly calls into thread local storage. I thought re_set_syntax() might do the same, but apparently not. (But do you somehow have FIPS mode enabled?)

> I'm also wondering whether there's use of some other HW, like crc HW accel that's causing us issues here but I'm not sure if the fuse things would use kernel crc which might be accelerated.

Only thing I can think of related to that is that rpm-ostree calls into grub2-mkconfig which calls os-prober which will definitely end up initializing the kernel crypto API since it probes for RAID which uses checksums.

The rofiles-fuse segfaults seem to point to the biggest issue here, particularly if we're reliably crashing in do_releasedir.

Use of FUSE by default is definitely something that distinguishes rpm-ostree from yum. We could theoretically use overlayfs (like podman does) when privileged (as rpm-ostree acting on the host is).


@nullr0ute nullr0ute commented Nov 16, 2018

> One thing I notice looking at the OpenSSL source is that FIPS_mode_set fairly quickly calls into thread local storage. I thought re_set_syntax() might do the same, but apparently not. (But do you somehow have FIPS mode enabled?)

Not that I'm aware of but we do have clevis-luks (and related) installed for TPM2 disk encryption support so I wonder if it tweaks anything in this context.

> > I'm also wondering whether there's use of some other HW, like crc HW accel that's causing us issues here but I'm not sure if the fuse things would use kernel crc which might be accelerated.
>
> Only thing I can think of related to that is that rpm-ostree calls into grub2-mkconfig which calls os-prober which will definitely end up initializing the kernel crypto API since it probes for RAID which uses checksums.

Yes, that makes sense, as I see similar output around the crypto API when doing an "rpm-ostree update" as when booting and the system is probing for rootfs. Does moving to BLS, as is planned for the F-30 defaults (and as can already be used in F-29), have any impact here, and does it change/impact rpm-ostree?

> The rofiles-fuse segfaults seem to point to the biggest issue here, particularly if we're reliably crashing in do_releasedir.
>
> Use of FUSE by default is definitely something that distinguishes rpm-ostree from yum. We could theoretically use overlayfs (like podman does) when privileged (as rpm-ostree acting on the host is).

Yes, we also seem to be hitting this device in some special way that we don't do elsewhere, which is ultimately a bug somewhere, so it would be good to get to the bottom of it as it's quite strange. I have a RPi3B+ with a SanDisk card that doesn't seem to be hitting it, but I have an original RPi3 which seems to destroy SD cards every few months with vanilla Fedora.

Author

@timcoote timcoote commented Nov 20, 2018

dunno whether this helps or not - pls stop me posting noise if that's what it is.

I tried an upgrade today and got a couple of different failures:

[tim@localhost ~]$ sudo rpm-ostree upgrade
988 metadata, 3617 content objects fetched; 111395 KiB transferred in 1026 seconds
Checking out tree ae2de40... done
Updating metadata for 'updates': [=============] 100%
rpm-md repo 'updates'; generated: 2018-11-19T21:29:25Z
Updating metadata for 'fedora': [=============] 100%
rpm-md repo 'fedora'; generated: 2018-10-24T22:18:20Z
Updating metadata for 'updates-modular': [=============] 100%
rpm-md repo 'updates-modular'; generated: 2018-11-17T00:08:04Z
Updating metadata for 'fedora-modular': [=============] 100%
rpm-md repo 'fedora-modular'; generated: 2018-10-24T22:20:20Z
Importing metadata [=============] 100%
Resolving dependencies... done
Will download: 2 packages (1.3 MB)
  Downloading from updates: [=============] 100%
Importing (2/2) [=============] 100%
Relabeling (79/79) [=============] 100%
Checking out packages (81/81) [=============] 100%
Running pre scripts... 1 done
Running post scripts... error: Running %post for binutils: Executing bwrap(/bin/sh): Child process killed by signal 7; run `journalctl -t 'rpm-ostree(binutils.post)'` for more information
[tim@localhost ~]$ sudo rpm-ostree status
State: idle
AutomaticUpdates: disabled
Deployments:
● ostree://fedora-iot:fedora/29/aarch64/iot
                   Version: 29.20181113.0 (2018-11-13T13:36:26Z)
                BaseCommit: 974b2ffbf34b8c83480b44ca24fefd41ade6c57a6655d50925d4e025d226bedb
              GPGSignature: Valid signature by 5A03B4DD8254ECA02FDA1637A20AA56B429476B4
           LayeredPackages: buildah ccache doxygen gcc 'gcc-c++' gdb git make python3-pip systemd-devel

  ostree://fedora-iot:fedora/29/aarch64/iot
                   Version: 29.20181113.0 (2018-11-13T13:36:26Z)
                BaseCommit: 974b2ffbf34b8c83480b44ca24fefd41ade6c57a6655d50925d4e025d226bedb
              GPGSignature: Valid signature by 5A03B4DD8254ECA02FDA1637A20AA56B429476B4
           LayeredPackages: buildah ccache doxygen gcc 'gcc-c++' gdb git make python3-pip
[tim@localhost ~]$ sudo rpm-ostree upgrade
1 metadata, 0 content objects fetched; 569 B transferred in 13 seconds
Checking out tree ae2de40... done
Importing metadata [=============] 100%
Resolving dependencies... done
Checking out packages (81/81) [=============] 100%
Running pre scripts... 1 done
Running post scripts... error: Executing %transfiletriggerin for info: Executing bwrap(/bin/sh): Child process killed by signal 7; run `journalctl -t 'rpm-ostree(info.transfiletriggerin)'` for more information
[tim@localhost ~]$ 

The second of these looked a lot like earlier fails. However, the first one showed a new type of error (to me), which I thought might help identify where things are going awry:

Nov 20 16:20:03 localhost.localdomain audit[5891]: ANOM_ABEND auid=4294967295 uid=0 gid=0 ses=4294967295 subj=system_u:system_r:install_t:s0 pid=5891 comm="rofiles-fuse" exe="/usr/bin/rofiles-fuse" sig=11 res=1
Nov 20 16:20:03 localhost.localdomain systemd[1]: Created slice system-systemd\x2dcoredump.slice.
Nov 20 16:20:03 localhost.localdomain systemd[1]: Started Process Core Dump (PID 5912/UID 0).
Nov 20 16:20:03 localhost.localdomain audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=systemd-coredump@0-5912-0 comm="systemd" exe="/usr/lib/system>
Nov 20 16:20:04 localhost.localdomain rpm-ostree(binutils.post)[5281]: /sbin/ldconfig: Can't link /lib64/libpcap.so.1 to libpcap.so.1.9.0
Nov 20 16:20:04 localhost.localdomain rpm-ostree(binutils.post)[5281]: /sbin/ldconfig: Can't link /lib64/libteamdctl.so.0 to libteamdctl.so.0.1.5
Nov 20 16:20:06 localhost.localdomain rpm-ostree(binutils.post)[5281]: /sbin/ldconfig: Can't link /lib64/libcomps.so.0.1.6 to libcomps.so.0.1.6
Nov 20 16:20:06 localhost.localdomain rpm-ostree(binutils.post)[5281]: /sbin/ldconfig: Can't link /lib64/libgettextsrc-0.19.8.1.so to libgettextsrc-0.19.8.1.so
Nov 20 16:20:06 localhost.localdomain rpm-ostree(binutils.post)[5281]: /sbin/ldconfig: Can't link /lib64/libbrotlicommon.so.1 to libbrotlicommon.so.1.0.5
Nov 20 16:20:06 localhost.localdomain rpm-ostree(binutils.post)[5281]: /sbin/ldconfig: Can't link /lib64/libaio.so.1.0.0 to libaio.so.1.0.0
Nov 20 16:20:06 localhost.localdomain rpm-ostree(binutils.post)[5281]: /sbin/ldconfig: Can't link /lib64/libdnf.so.2 to libdnf.so.2
Nov 20 16:20:06 localhost.localdomain rpm-ostree(binutils.post)[5281]: /sbin/ldconfig: Can't link /lib64/libiptc.so.0 to libiptc.so.0.0.0



Member

@jlebon jlebon commented Nov 22, 2018

So, was reading the libfuse and kernel codebases. This is total conjecture but I have a theory that this might be some kind of race condition between the FUSE kernel module being loaded and rofiles-fuse.

Nov 05 12:15:38 localhost.localdomain rpm-ostree(kernel-modules-extra.post)[1383]: depmod: ERROR: openat(3, extra, O_RDONLY): Transport endpoint is not connected

This Transport endpoint is not connected msg is ENOTCONN. libfuse replies with this if it hasn't received FUSE_INIT from the kernel yet. I'm wondering if the openat is getting called so fast that the fuse module still hasn't had time to fully load and send FUSE_INIT to libfuse? The fact that it works on the second try (when the fuse module is already well up and running) is what makes me think this.

Again, total conjecture, but thought it might be helpful. This is hard to help debug without being able to reproduce locally. If you always modprobe fuse before the operation, does it work on the first try?
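
Editorial note: the experiment suggested above can be sketched as a pre-flight check. This assumes that `fuse` appears in /proc/filesystems once the kernel's FUSE support is registered; `fuse_loaded` is a hypothetical helper, and its optional argument exists only so the check can be pointed at a different file:

```shell
# Succeed if a filesystem type named exactly "fuse" is listed.
# Defaults to /proc/filesystems; -w avoids matching "fuseblk" or "fusectl".
fuse_loaded() {
    grep -qw fuse "${1:-/proc/filesystems}"
}

# Hypothetical usage: load the module first if needed, then retry the
# operation from the original report.
#   fuse_loaded || sudo modprobe fuse
#   sudo rpm-ostree install kernel-modules-extra
```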

Author

@timcoote timcoote commented Dec 18, 2018

Well that worked the first time (one data point). I'll add them as I try more.


@jgillich jgillich commented Jan 20, 2019

I have the same issue, 100% of the time (tried it about 10 times). Fresh install of IoT on an RPi3. Used to run regular Fedora on it and it worked fine, so I don't think this is a hardware issue. Podman also seems to work fine. Running modprobe fuse first makes no difference. Not sure what else I could try, but if one of you wants to debug this, I could give you SSH access.

Edit: A few more attempts and it actually succeeded. So it does not fail 100% of the time, but pretty darn close.
