feh stopped working with XQuartz 2.8.4 (maybe 2.8.3) #314

Closed
twerschlein opened this issue Jan 10, 2023 · 11 comments
Labels
Regression: This issue represents a regression from a previous release of XQuartz
Upstream: This issue is reported upstream (eg: Freedesktop Gitlab) and kept open for tracking

@twerschlein

The X11 image viewer "feh", installed on Ubuntu 20.04 (feh version 3.3) or Ubuntu 22.04 (feh version 3.6.3), stopped working with XQuartz 2.8.4 (possibly 2.8.3 as well; I didn't have that version installed). feh in any version works with XQuartz 2.8.2. The issue shows up on both Intel and ARM Macs, all running macOS 13.1:
Starting "feh img.png" on the Ubuntu host starts XQuartz 2.8.4 on the Mac, but never shows the picture. The feh process is uninterruptible and can only be killed with signal 9.
The same img.png can be opened without problems with e.g. "gm display img.png" (GraphicsMagick display). All my X11-tests (such as xclock, xeyes, etc.) worked as well. It seems to be a feh + XQuartz >= 2.8.[3|4] problem only.

@jeremyhu
Member

2.8.4 is basically the same as 2.8.3 with some security updates.

Can you please test 2.8.3 beta and rcs to determine when this regressed?

https://www.xquartz.org/releases/archive.html

@jeremyhu jeremyhu added the Regression label Jan 11, 2023
@jeremyhu jeremyhu added this to the 2.8.5 milestone Jan 11, 2023
@twerschlein
Author

Indeed, the regression starts with 2.8.3_beta1. Please let me know if you need more information, I have a working/broken setup running in parallel now (both on M1, 13.1).

@jeremyhu
Member

That beta pulled in a lot of upstream changes. Can you back up XQuartz.app from 2.8.2, then install 2.8.4, then replace its XQuartz.app with the older 2.8.2 one?

If that works, then at least we know it is a change in the server binary. If it doesn't, we know it is a change in the libraries.

@twerschlein
Author

That works. Restoring the 2.8.2 XQuartz.app over a 2.8.4 install gives me "XQuartz 2.8.2 (xorg-server 1.20.14)" in the About window. So the server binary is the culprit. I just noticed that the version bump from xorg-server 1.20.14 to 21.1.6 is substantial.

@jeremyhu jeremyhu added the Upstream label Jan 12, 2023
@jeremyhu
Member

It looks like this results from a difference between autoconf and meson builds.

I built e3a530540f2f13739b0233ec51d7a3985a7ec4be with both autoconf and meson. The autoconf build was fine. The meson build had this issue.

@jeremyhu
Member

I've narrowed it down to a difference between the way we build os/io.c with autoconf and meson, and specifically something in dix-config.h

@jeremyhu
Member

Looks like the difference between good and bad is that XTRANS_SEND_FDS is set in the failing case...

@jeremyhu
Member

It's also something to do with ssh forwarding. If I enable tcp connections directly to the server, it works, but not if I use ssh's forwarding, eg:

[jeremy@fedora]~% DISPLAY=192.168.239.1:5 feh Screenshot\ 2022-07-30\ at\ 13.24.02.png
^^ works

[jeremy@fedora]~% feh Screenshot\ 2022-07-30\ at\ 13.24.02.png
^^ fails
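
The two invocations above differ only in the DISPLAY the client connects to. As a rough illustration of how a DISPLAY value maps to a transport, here is a minimal sketch assuming the conventional rule that TCP display N listens on port 6000 + N. `parse_display` and `DisplayTarget` are hypothetical names for illustration, not Xlib or xcb code, and IPv6 and protocol-prefixed forms are ignored.

```python
from typing import NamedTuple, Optional


class DisplayTarget(NamedTuple):
    transport: str           # "unix" or "tcp"
    host: Optional[str]      # None for unix-domain sockets
    display: int             # display number N
    port: Optional[int]      # 6000 + N for tcp, None for unix


def parse_display(value: str) -> DisplayTarget:
    """Classify a DISPLAY string (simplified; hypothetical helper)."""
    host, sep, rest = value.rpartition(":")
    if not sep:
        raise ValueError(f"no display number in {value!r}")
    # Drop an optional ".screen" suffix such as the ".0" in "localhost:10.0".
    number = int(rest.split(".", 1)[0])
    # An empty host, "unix", or a socket path means a unix-domain socket.
    if host in ("", "unix") or host.startswith("/"):
        return DisplayTarget("unix", None, number, None)
    return DisplayTarget("tcp", host, number, 6000 + number)


if __name__ == "__main__":
    print(parse_display("192.168.239.1:5"))   # direct TCP to the X server
    print(parse_display("localhost:10.0"))    # typical value set by sshd on the remote host
    print(parse_display(":0"))                # local unix-domain socket
```

With ssh forwarding, the remote client typically talks to a proxy display offered by sshd, and it is the local ssh process that opens the actual connection to the X server using the local DISPLAY.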

@jeremyhu
Member

It looks like it reproduces if using feh built from MacPorts and using ssh forwarding.

@jeremyhu
Member

feh is hung at:

  Thread 0x59416f    DispatchQueue 8249179992    1000 samples (1-1000)    priority 31 (base 31)
  1000  start + 2544 (dyldMain.cpp:1170 in dyld + 24144) [0x1901dfe50]
    1000  main + 312 (feh + 68388) [0x100404b24]
      1000  init_slideshow_mode + 304 (feh + 92316) [0x10040a89c]
        1000  winwidget_create_from_file + 280 (feh + 116868) [0x100410884]
          1000  winwidget_render_image + 684 (feh + 115944) [0x1004104e8]
            1000  feh_draw_checks + 48 (feh + 119660) [0x10041136c]
              1000  feh_create_checks + 392 (feh + 118540) [0x100410f0c]
                1000  imlib_render_image_on_drawable + 156 (libImlib2.1.dylib + 140940) [0x10066668c]
                  1000  __imlib_RenderImage + 636 (libImlib2.1.dylib + 159084) [0x10066ad6c]
                    1000  __imlib_ProduceXImage + 260 (libImlib2.1.dylib + 203568) [0x100675b30]
                      1000  __imlib_ShmGetXImage + 508 (libImlib2.1.dylib + 202104) [0x100675578]
                        1000  xcb_wait_for_reply + 108 (libxcb.1.1.0.dylib + 12044) [0x100976f0c]
                          1000  wait_for_reply + 236 (libxcb.1.1.0.dylib + 12312) [0x100977018]
                            1000  poll + 8 (libsystem_kernel.dylib + 39288) [0x1904d2978]
                             *1000  fleh_synchronous + 40 (kernel.development.t8112 + 46980) [0xfffffe000851f784]
                               *1000  sleh_synchronous + 984 (sleh.c:2511 in kernel.development.t8112 + 2323500) [0xfffffe000874b42c]
                                 *1000  unix_syscall + 824 (systemcalls.c:181 in kernel.development.t8112 + 8310144) [0xfffffe0008d00d80]
                                   *1000  poll_nocancel + 840 (sys_generic.c:1838 in kernel.development.t8112 + 6846372) [0xfffffe0008b9b7a4]
                                     *1000  kqueue_scan + 2156 (sched_prim.c:3812 in kernel.development.t8112 + 6379800) [0xfffffe0008b29918]
                                       *1000  thread_block_reason + 440 (sched_prim.c:3796 in kernel.development.t8112 + 735924) [0xfffffe00085c7ab4]
                                         *1000  thread_invoke + 1228 (sched_prim.c:3206 in kernel.development.t8112 + 743192) [0xfffffe00085c9718]
                                           *1000  machine_switch_context + 92 (pcb.c:588 in kernel.development.t8112 + 2376144) [0xfffffe00087581d0]

@jeremyhu
Member

I sent an email to xorg-devel asking for guidance:

I traced this to a difference in autoconf vs meson builds. With meson, we're setting XTRANS_SEND_FDS whereas with autoconf, we weren't:

if cc.has_header_symbol('sys/socket.h', 'SCM_RIGHTS')
 conf_data.set('XTRANS_SEND_FDS', '1')
endif

vs

       linux*|solaris*|freebsd*|dragonfly*|openbsd*)
               XTRANS_SEND_FDS=yes
               ;;
       *)
               XTRANS_SEND_FDS=no
               ;;

This change certainly looks fine to me. darwin supports SCM_RIGHTS. It was probably just overlooked in that configure.ac condition, and it never caused enough of a problem for someone to notice.
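
The difference between the two build checks can be restated outside the build systems. The following is only an analogy in Python, not the meson or autoconf logic itself: one function mimics the platform allowlist from configure.ac, the other mimics the feature probe from meson.build. The function names and the mapping to `sys.platform` prefixes are assumptions made for illustration.

```python
import socket
import sys

# Rough equivalents of the configure.ac case patterns, expressed as
# sys.platform prefixes (an assumption; "sunos" stands in for solaris*).
ALLOWLIST = ("linux", "sunos", "freebsd", "dragonfly", "openbsd")


def send_fds_by_allowlist(platform: str = sys.platform) -> bool:
    """Autoconf-style: enabled only on explicitly listed platforms."""
    return platform.startswith(ALLOWLIST)


def send_fds_by_feature() -> bool:
    """Meson-style: enabled wherever the SCM_RIGHTS symbol is available."""
    return hasattr(socket, "SCM_RIGHTS")


if __name__ == "__main__":
    print("allowlist:", send_fds_by_allowlist())
    print("feature probe:", send_fds_by_feature())
```

On darwin the allowlist check returns False while the feature probe returns True, which mirrors how the same source tree ended up with XTRANS_SEND_FDS unset in one build and set in the other.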

So I turned my attention to figuring out why things aren't working with XTRANS_SEND_FDS set...

Soon after launching feh, it enters ProcShmCreateSegment. With local and unix connections, we try using SCM_RIGHTS to send an fd in the reply (_XSERVTransSocketSendFd). With inet, inet6, and tcp, we skip sending the FD (_XSERVTransSocketSendFdInvalid). This all sounds fine and good except that ssh forwarding throws a wrench into things...

When we use ssh forwarding with a local DISPLAY (eg: DISPLAY=:0 ssh -Y some.host), the X11 server ends up seeing this as a local connection (thus using _XSERVTransSocketLocalFuncs) and uses SCM_RIGHTS to send an fd. On the server, we successfully send the message with FDs attached. On the client, we receive a reply, but there is no fd, so xcb continues to wait (specifically, read_fds() fails and we end up stuck in wait_for_reply()).

Now, I haven't dug into what's happening between the server and client, but I suspect OpenSSH just drops the FDs on the floor without logging a warning to the user and passes the rest of the message along.
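
The suspected behaviour can be reproduced in miniature. This is a sketch under stated assumptions, not a trace of what OpenSSH does: `relay_bytes` is a hypothetical stand-in for any tunnel that reads with a plain `recv()` and forwards only the byte stream. On a direct unix-domain socket the descriptor arrives with the reply; through the relay the bytes arrive but the descriptor does not.

```python
import array
import os
import socket


def send_with_fd(sock, payload, fd):
    """Send payload with one file descriptor attached via SCM_RIGHTS."""
    ancillary = [(socket.SOL_SOCKET, socket.SCM_RIGHTS,
                  array.array("i", [fd]).tobytes())]
    sock.sendmsg([payload], ancillary)


def recv_with_fds(sock, size=1024, max_fds=1):
    """Receive bytes plus any descriptors that arrived with them."""
    fds = array.array("i")
    data, ancdata, _flags, _addr = sock.recvmsg(
        size, socket.CMSG_SPACE(max_fds * fds.itemsize))
    for level, ctype, cdata in ancdata:
        if level == socket.SOL_SOCKET and ctype == socket.SCM_RIGHTS:
            # Ignore any trailing partial integer in the control data.
            fds.frombytes(cdata[:len(cdata) - len(cdata) % fds.itemsize])
    return data, list(fds)


def relay_bytes(src, dst, size=1024):
    """Hypothetical tunnel: plain recv() forwards bytes, not ancillary data."""
    dst.sendall(src.recv(size))


def demo():
    read_end, write_end = os.pipe()

    # Case 1: server and client share a unix-domain socket directly.
    server, client = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    send_with_fd(server, b"reply", read_end)
    direct_data, direct_fds = recv_with_fds(client)

    # Case 2: a byte-only relay sits between server and client.
    server2, tunnel_in = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    tunnel_out, client2 = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    send_with_fd(server2, b"reply", read_end)
    relay_bytes(tunnel_in, tunnel_out)
    relayed_data, relayed_fds = recv_with_fds(client2)

    result = (direct_data, len(direct_fds), relayed_data, len(relayed_fds))
    for fd in direct_fds + relayed_fds + [read_end, write_end]:
        os.close(fd)
    for s in (server, client, server2, tunnel_in, tunnel_out, client2):
        s.close()
    return result


if __name__ == "__main__":
    print(demo())
```

In the relayed case the sender sees no error, which matches the observation that the server "successfully" sends the message while the client never gets the fd.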

So there seem to be two issues here:

1 - libxcb should recover from this.
2 - The server should be able to determine that the transport does not support SCM_RIGHTS.
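
For the first point, one possible shape of client-side recovery is sketched below. It is not how libxcb is implemented; `read_reply` and `MissingFdError` are hypothetical names, and the sketch assumes that descriptors always travel with the bytes of the reply that refers to them, so a complete reply without its descriptors can be treated as an error instead of waiting indefinitely.

```python
import array
import os
import select
import socket


class MissingFdError(RuntimeError):
    """The reply bytes arrived but the descriptors they promised did not."""


def read_reply(sock, nbytes, nfds, timeout=1.0):
    """Read an nbytes reply that should carry nfds descriptors (sketch)."""
    data, fds = b"", []
    itemsize = array.array("i").itemsize
    while len(data) < nbytes:
        readable, _, _ = select.select([sock], [], [], timeout)
        if not readable:
            raise TimeoutError("no reply within timeout")
        chunk, ancdata, _flags, _addr = sock.recvmsg(
            nbytes - len(data), socket.CMSG_SPACE(nfds * itemsize))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        data += chunk
        for level, ctype, cdata in ancdata:
            if level == socket.SOL_SOCKET and ctype == socket.SCM_RIGHTS:
                received = array.array("i")
                received.frombytes(cdata[:len(cdata) - len(cdata) % itemsize])
                fds.extend(received)
    if len(fds) < nfds:
        # Fail the request instead of polling forever for descriptors.
        for fd in fds:
            os.close(fd)
        raise MissingFdError(f"expected {nfds} fd(s), got {len(fds)}")
    return data, fds
```

A caller could catch `MissingFdError` and fall back to a path that does not need fd passing. Whether such a fallback is appropriate for MIT-SHM is a separate question that this sketch does not answer.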


How should this work? Why hasn't this been reported as an issue on other platforms? This all seems pretty platform agnostic, so I'd expect this to be an issue on other platforms as well. Is it not? If not, why not?

jeremyhu added a commit to XQuartz/xorg-server that referenced this issue Jan 16, 2023
XTRANS_SEND_FDS was disabled by default on darwin with autoconf builds.

When we moved to meson, this was enabled.  SCM_RIGHTS works well for local
connections, but unfortunately X11 forwarding over ssh is incorrectly
identified as a local connection.  This is being disabled to restore the
previous functionality until a solution can be determined.

Fixes: XQuartz/XQuartz#314
Signed-off-by: Jeremy Huddleston Sequoia <jeremyhu@apple.com>
jeremyhu added a commit to XQuartz/xorg-server that referenced this issue Jan 17, 2023
XTRANS_SEND_FDS was disabled by default on darwin with autoconf builds.

When we moved to meson, this was enabled.  SCM_RIGHTS works well for local
connections, but unfortunately X11 forwarding over ssh is incorrectly
identified as a local connection.  This is being disabled to restore the
previous functionality until a solution can be determined.

Fixes: XQuartz/XQuartz#314
Signed-off-by: Jeremy Huddleston Sequoia <jeremyhu@apple.com>
jeremyhu added a commit to XQuartz/xorg-server that referenced this issue Jan 18, 2023
Without a proper implementation of DetermineClientCmd, clients that
connect via an ssh tunnel are miscategorized as local.  This results
in failures when we try to use SCM_RIGHTS (eg: in MIT-SHM).

Fixes: XQuartz/XQuartz#314
Signed-off-by: Jeremy Huddleston Sequoia <jeremyhu@apple.com>
jeremyhu added a commit that referenced this issue Jan 18, 2023
This change allows the server to now properly detect ssh tunneled
connections as remote rather than local connections.

Fixes: #314
Signed-off-by: Jeremy Huddleston Sequoia <jeremyhu@apple.com>
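
A minimal sketch of the kind of decision this enables, assuming the server can learn the command name of the connecting process (which is what DetermineClientCmd is for) and treats a peer whose executable is ssh as forwarded. `is_local_client` and its exact rule are illustrative assumptions, not the code from the commit.

```python
import os
from typing import Optional


def is_local_client(transport_is_local: bool, cmdname: Optional[str]) -> bool:
    """Decide whether a client should get local-only features (sketch)."""
    if not transport_is_local:
        return False
    if not cmdname:
        # Command name unknown: fall back to what the transport reports.
        return True
    # Keep only the part before any colon, then compare the basename.
    command = cmdname.split(":", 1)[0]
    return os.path.basename(command) != "ssh"


if __name__ == "__main__":
    print(is_local_client(True, "/usr/bin/ssh"))        # tunneled, treat as remote
    print(is_local_client(True, "/opt/X11/bin/xterm"))  # genuinely local
```

When the command name cannot be determined, this sketch keeps the transport's answer, which corresponds to the miscategorization described in the commit messages above.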
jeremyhu added a commit to XQuartz/xorg-server that referenced this issue Jan 21, 2023
Without a proper implementation of DetermineClientCmd, clients that
connect via an ssh tunnel are miscategorized as local.  This results
in failures when we try to use SCM_RIGHTS (eg: in MIT-SHM).

Fixes: XQuartz/XQuartz#314
Signed-off-by: Jeremy Huddleston Sequoia <jeremyhu@apple.com>
jeremyhu added a commit to XQuartz/xorg-server that referenced this issue Jan 26, 2023
Without a proper implementation of DetermineClientCmd, clients that
connect via an ssh tunnel are miscategorized as local.  This results
in failures when we try to use SCM_RIGHTS (eg: in MIT-SHM).

Fixes: XQuartz/XQuartz#314
Signed-off-by: Jeremy Huddleston Sequoia <jeremyhu@apple.com>
jeremyhu added a commit to XQuartz/xorg-server that referenced this issue Jan 26, 2023
Without a proper implementation of DetermineClientCmd, clients that
connect via an ssh tunnel are miscategorized as local.  This results
in failures when we try to use SCM_RIGHTS (eg: in MIT-SHM).

Fixes: XQuartz/XQuartz#314
Signed-off-by: Jeremy Huddleston Sequoia <jeremyhu@apple.com>
(cherry picked from commit 0ea9b59)