segfault when starting opencpn on wayland #1166

Closed
leamas opened this issue Oct 29, 2018 · 27 comments

@leamas
Collaborator

leamas commented Oct 29, 2018

As the heading says: on a pristine Debian 9 installation, logged in with default options, starting opencpn gives a segfault error.

The culprit seems to be that Debian has moved away from using X11 in the default setup. Logging out and logging in again with the option "Debian on Xorg" (off the top of my head) makes opencpn start as expected.

Patching opencpn not to use X11-specific functions might be too big a job to be feasible in the 5.0.0 cycle. However, ocpn should write a sensible error message with a hint about the workaround instead of segfaulting.

@SethDart
Collaborator

The segfault is related to a wxWidgets bug with Wayland:
https://trac.wxwidgets.org/ticket/17702

We have the same problem on Fedora since it switched to Wayland by default.

You can run opencpn on Wayland like this:
GDK_BACKEND=x11 opencpn

@leamas
Collaborator Author

leamas commented Oct 29, 2018

Thanks for great feedback (I'm actually mostly on fedora myself...). We should see that opencpn on linux is launched using a wrapper script to create some flexibility to handle things like this.
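
A minimal sketch of such a wrapper, assuming the real binary is on PATH as `opencpn` and that the session exports `XDG_SESSION_TYPE` or `WAYLAND_DISPLAY`, as most Wayland sessions do. The function names `pick_gdk_backend` and `launch_opencpn` are hypothetical and not part of any OpenCPN package:

```shell
#!/bin/sh
# Hypothetical wrapper sketch: force the X11 GDK backend on Wayland sessions.

# Print the GDK backend to use; print an empty line to let GTK decide.
pick_gdk_backend() {
    if [ -n "${GDK_BACKEND:-}" ]; then
        # The user already made an explicit choice; keep it.
        printf '%s\n' "$GDK_BACKEND"
    elif [ "${XDG_SESSION_TYPE:-}" = "wayland" ] || [ -n "${WAYLAND_DISPLAY:-}" ]; then
        # Wayland session detected: fall back to X11 (via XWayland).
        printf 'x11\n'
    else
        printf '\n'
    fi
}

# Export the chosen backend (if any) and replace the shell with opencpn.
launch_opencpn() {
    backend=$(pick_gdk_backend)
    if [ -n "$backend" ]; then
        GDK_BACKEND=$backend
        export GDK_BACKEND
    fi
    exec opencpn "$@"
}
```

A packaged script would end with `launch_opencpn "$@"`. An explicit `GDK_BACKEND` set by the user is left alone, so the wrapper never overrides a deliberate choice.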

@nohal
Collaborator

nohal commented Oct 29, 2018

Did you try the code changes suggested in https://trac.wxwidgets.org/ticket/17702#comment:12 and following comments? They seem harmless enough if they work...

@leamas
Collaborator Author

leamas commented Oct 29, 2018

No, but I certainly will. Stay tuned.

@leamas
Collaborator Author

leamas commented Oct 29, 2018

Attached patch "works for me" on 4.8.6 and debian-9.

It applies cleanly also to master, but is not tested in that context (I'm currently focused on getting the 4.8.6 package in shape).

EDIT: Remove bad patch

@leamas
Collaborator Author

leamas commented Oct 29, 2018

No, patch is broken... stay tuned, again.

@leamas
Collaborator Author

leamas commented Oct 29, 2018

But this seems ok:

EDIT: Remove bad patch.

@leamas
Collaborator Author

leamas commented Oct 31, 2018

Dammit. The patch works, but each and every line is wrong. Attaching a better attempt

0018-Patch-Initialize-display-to-x11-on-wayland-hosts-116.patch.gz

@bdbcat
Member

bdbcat commented Nov 3, 2018

leamas...

How did you install Debian9 so that it defaulted to a wayland backend for gdk? Real hardware, or vbox?

I installed deb9/Gnome in a virtualbox, which seems only to produce an X11 backend.

Dave

@bdbcat
Member

bdbcat commented Nov 3, 2018

Thinking further...
The only real X11 library code we call seems to be in the method to detect transparent toolbar support.
You could test this by clearing the line CmakeLists.txt:1911
#1168 ADD_DEFINITIONS(-DOCPN_HAVE_X11)

This would prevent the calls to X11 library, probably. And thus avoid the fault...
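
The experiment could also be tried without hand-editing, for instance by commenting the definition out with sed. A sketch, assuming the line is spelled exactly as quoted above; `disable_x11_define` is a hypothetical helper name, and the result goes to stdout so the original file is left untouched:

```shell
#!/bin/sh
# Hypothetical helper: print a copy of a CMake file with the
# ADD_DEFINITIONS(-DOCPN_HAVE_X11) line commented out.
disable_x11_define() {
    # '#' starts a comment in CMake; leading indentation is preserved.
    sed 's/^\([[:space:]]*\)\(ADD_DEFINITIONS(-DOCPN_HAVE_X11)\)/\1# \2/' "$1"
}
```

Something like `disable_x11_define CMakeLists.txt > CMakeLists.txt.new` would produce the edited copy for review; after swapping it in, cmake has to be re-run so the changed definition takes effect.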

A gdb stack trace would also be informative.
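
One way to capture such a trace non-interactively is gdb's batch mode. This is a sketch only: the function name `backtrace_on_crash` is made up here, and readable frames require debug symbols for opencpn and wxWidgets to be installed.

```shell
#!/bin/sh
# Hypothetical helper: run a program under gdb in batch mode and print
# a backtrace when it stops. Only the gdb options themselves are real.
backtrace_on_crash() {
    # -batch  : exit gdb after the commands have run
    # -ex run : start the program immediately
    # -ex bt  : print the stack once the program has stopped (e.g. on SIGSEGV)
    # --args  : everything after it is the program and its arguments
    gdb -batch -ex run -ex bt --args "$@"
}
```

For example, `backtrace_on_crash opencpn 2>&1 | tee opencpn-bt.txt` would keep a copy of the output for attaching to the issue (the file name is arbitrary).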

@leamas
Collaborator Author

leamas commented Nov 3, 2018

How did you install Debian9 so that it defaulted to a wayland backend for gdk? Real hardware, or vbox?

I probably need some kind of care. I'm actually on sid (i.e. upcoming D10), but it prints D9 on the welcome screen. In the end, it's all about GNOME, which changed its default to Wayland long ago.
(I mostly run on VirtualBox, but testing on bare metal makes no difference.)

you could test this by clearing the line CmakeLists.txt:1911
#1168 ADD_DEFINITIONS(-DOCPN_HAVE_X11)

Interesting, will do.

A gdb stack trace would also be informative.

Yes. I'm a lazy human being... stay tuned.

@leamas
Collaborator Author

leamas commented Nov 3, 2018

BTW: This was reported already on fedora 25: https://opencpn.org/flyspray/index.php?do=details&task_id=2198

EDIT: I have not always been that lazy! The flyspray bug contains a stacktrace.

@leamas
Collaborator Author

leamas commented Nov 3, 2018

Now, despite crashes reported by others, I cannot reproduce the crash at all. It might be that Debian has been updated (sid is a sort of rolling release) and/or the fact that I nowadays use gtk2-based builds instead of gtk3.

I have also rebased my current work, so I don't have a clear point in history to walk back to gtk3 :(.

Leaving bug open, hopefully more things should happen with this while doing other work.

@bdbcat
Member

bdbcat commented Nov 4, 2018

Looking at the stack trace, it seems clear that this is truly a wxGLCanvas bug. It has nothing to do with our OCPN_HAVE_X11 flag. A red herring. Sorry.

Anyway, I expect that OCPN will not be the only wxGL app affected.
Also clear that wayland on gtk3/gnome is far from free of problems. It is still a WIP, with many strange user-unfriendly config issues.

We shall see this again, I expect. But the fix will not be a part of OCPN488

Dave

@leamas
Collaborator Author

leamas commented Nov 4, 2018

Looking at stack trace, seems clear that this is truly a wxGLCanvas bug. Has nothing to do with our OCPN_HAVE_X11 flag. A red herring. Sorry.

The good news seems to be that my current sid is OK, it works even when rebuilding the current ubuntu 4.8.6 sources. It uses:

  • gnome 3.30,
  • libwxgtk3.0-dev etc. on 3.0.4,
  • libwxsvg2:1.5.15+dfsg.2-1 (in the NEW queue, available on mentors.debian.net as source).

...and what not

@leamas
Collaborator Author

leamas commented Nov 13, 2018

Still crashes on Fedora 29, despite GNOME 3.30 here too. However, the gtk2 build on f29 seems broken, so I am testing using gtk3.

0x7f43b8631696 in  at ??:0
0x7f43b70e5f70 in  at ??:0 
0x7f43b82a3a9f in _XSend at ??:0
0x7f43b82a3ee4 in _XFlush at ??:0
0x7f43b82a6acd in _XGetRequest at ??:0
0x7f43b829a031 in XQueryExtension at ??:0
0x7f43b64ced82 in  at ??:0
0x7f43b64cada9 in glXQueryVersion at ??:0
0x7f43b9131ee5 in wxGLCanvasX11::GetGLXVersion() at ??:0
0x7f43b9132f15 in wxGLCanvasX11::ConvertWXAttrsToGL(int const*, int*, unsigned long) at ??:0
0x7f43b913364c in wxGLCanvasX11::InitXVisualInfo(int const*, __GLXFBConfigRec***, XVisualInfo**) at ??:0
0x7f43b9133d33 in wxGLCanvas::Create(wxWindow*, int, wxPoint const&, wxSize const&, long, wxString const&, int const*, wxPalette const&) at ??:0
0x7f43b9133ed7 in wxGLCanvas::wxGLCanvas(wxWindow*, int, int const*, wxPoint const&, wxSize const&, long, wxString const&, wxPalette const&) at ??:0
0x56033408e103 in glChartCanvas::glChartCanvas(wxWindow*) at ??:0
0x560333e3847b in ChartCanvas::ChartCanvas(wxFrame*) at ??:0
0x560333ddfb96 in MyApp::OnInit() at ??:0
0x7f43b8573d32 in wxEntry(int&, wchar_t**) at ??:0

@leamas
Collaborator Author

leamas commented Nov 14, 2018

Upstream wxWidgets bug: http://trac.wxwidgets.org/ticket/17702

@leamas
Collaborator Author

leamas commented Nov 25, 2018

Debian packaging of gtk3 is on it: https://lists.debian.org/debian-devel/2018/11/msg00551.html

@nkiesel
Contributor

nkiesel commented May 9, 2019

I had OpenCPN running on my Debian Sid for a long time, but after the last git pull I cannot start it anymore. I built against GTK2 and tried GDK_BACKEND=x11 in various ways (command line, applying the patch) but still no luck. Does anyone have a (however hacky) solution to get out of this?

For the record, it crashes for me in

#0  0x00007ffff7089064 in wxImage::InitAlpha() () at /usr/lib/x86_64-linux-gnu/libwx_gtk2u_core-3.0.so.0
#1  0x00007fffe48d1f1d in wmm_pi::SetPositionFix(PlugIn_Position_Fix&) (this=0x55555700c090, pfix=...) at /home/nkiesel/Projects/O/OpenCPN/plugins/wmm_pi/src/wmm_pi.cpp:595
#2  0x0000555555a178cd in PlugInManager::SendPositionFixToAllPlugIns(GenericPosDatEx*) (this=0x5555560c6d60, ppos=0x7fffffffd530) at /home/nkiesel/Projects/O/OpenCPN/src/pluginmanager.cpp:2011

@leamas
Collaborator Author

leamas commented May 10, 2019

but after the last git pull I cannot start anymore.

Are you sure you haven't updated sid in the same window? If so, it might be possible to do a git bisect to find the culprit.
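
If the system packages really are unchanged, the bisect could be automated along these lines. A sketch only: `bisect_startup_crash` and the check script are hypothetical, and the check script must itself build the tree and decide whether startup works.

```shell
#!/bin/sh
# Hypothetical helper: bisect between a known-good revision and HEAD.
#   $1  last revision known to start correctly
#   $2  script that builds and tests the checked-out tree; it should exit
#       0 if the tree is good, 125 if it cannot be tested (skip), and any
#       other code from 1 to 127 if it is bad
bisect_startup_crash() {
    good_rev=$1
    check_script=$2
    # HEAD is assumed to be the bad (crashing) revision.
    git bisect start HEAD "$good_rev" || return 1
    # git runs the script at each step and reports the first bad commit.
    git bisect run "$check_script"
    status=$?
    # Return the working tree to where it was before bisecting.
    git bisect reset
    return "$status"
}
```

Since the crash happens in a GUI program at startup, the check script is the hard part; marking each step by hand with `git bisect good` and `git bisect bad` is the fallback when startup cannot be tested automatically.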

I built against GTK2 and tried GDK_BACKEND=x11 in various options (cmdline, applying patch)

I have had similar problems on Fedora, which basically is further down the same road as Debian. The only remedy here is to log in with an X11 backend instead of Wayland. On the standard login screen, there is an options button just before you log in.

@leamas leamas changed the title from "debian 9: segfault when starting opencpn" to "segfault when starting opencpn on wayland" Oct 3, 2019
@leamas
Collaborator Author

leamas commented Oct 3, 2019

See also crash report in #1167

@reneherrero

If I'm reading this right, there's a fix on its way for Debian: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=900678#67

Right now, Buster has 3.0.4+dfsg-8

@leamas
Collaborator Author

leamas commented Oct 3, 2019

This is basically the patch in #1166 (comment) which was rejected at the time, but perhaps looks better now?

@leamas
Collaborator Author

leamas commented Oct 3, 2019

The patch does not apply cleanly, but does the job after being applied manually (Fedora 30).

@leamas
Collaborator Author

leamas commented Nov 21, 2019

This patch is applied to the new Debian package at https://packages.debian.org/sid/opencpn

@leamas
Collaborator Author

leamas commented Feb 23, 2020

The Debian package is accepted.

@leamas
Collaborator Author

leamas commented May 2, 2020

Fixed in Debian Sid/bullseye; opencpn without this patch starts just fine here.

@leamas leamas closed this as completed May 2, 2020
leamas added a commit that referenced this issue Apr 26, 2024
Bug: #1166

While changing the deps is enough in isolated builds, explicit
options are required on build hosts with both gtk2 and gtk3 available.

gtk2 is used because of reasons described in  #1666.
leamas added a commit that referenced this issue Apr 26, 2024
  - Carrying patches 0001-* and 0003-* yet to be merged upstream
    from 4.8.8.
  - Move location of added help_web.html, the original
    destination dir is gone.
  - Fix some broken include paths, to be upstreamed.
  - Relicense appstream metadata to overall GPL-2+ license.
  - Move opencpn.appdata.xml to new location metainfo
  - Clean up rules after upstream fixes and patches

Bug: #1166

testing overrides

more testing