Native/bytecode executables segfault on Linux when running on a Wayland compositor #74
Comments
If I understand correctly, this is just happening for hotreloading and not for a "regular" build? |
It also segfaults on native builds.
|
😭 |
Evidently I'm generating coredumps when this happens. This is the backtrace:
{ "signal": 11
, "executable": "/home/zach/src/reprocessing-example/lib/bs/native/index.native"
, "stacktrace":
[ { "crash_thread": true
, "frames":
[ { "address": 0
, "build_id_offset": 0
}
, { "address": 4554802
, "build_id": "6a31abd6bbc250214c3c4942e0e9b110545afd88"
, "build_id_offset": 360498
, "file_name": "/home/zach/src/reprocessing-example/lib/bs/native/index.native"
} ]
}
, { "frames":
[ { "address": 140430986743142
, "build_id": "b097b427ace57ac70bb636f7a41af7f10a69a851"
, "build_id_offset": 961894
, "function_name": "ppoll"
, "file_name": "/lib64/libc.so.6"
}
, { "address": 140430756520241
, "build_id": "d8eac16837bacf679ba9f307bcf71db7bd931b33"
, "build_id_offset": 151857
, "function_name": "pa_mainloop_poll"
, "file_name": "/lib64/libpulse.so.0"
}
, { "address": 140430756521792
, "build_id": "d8eac16837bacf679ba9f307bcf71db7bd931b33"
, "build_id_offset": 153408
, "function_name": "pa_mainloop_iterate"
, "file_name": "/lib64/libpulse.so.0"
}
, { "address": 140430756521936
, "build_id": "d8eac16837bacf679ba9f307bcf71db7bd931b33"
, "build_id_offset": 153552
, "function_name": "pa_mainloop_run"
, "file_name": "/lib64/libpulse.so.0"
}
, { "address": 5712525
, "build_id": "6a31abd6bbc250214c3c4942e0e9b110545afd88"
, "build_id_offset": 1518221
, "file_name": "/home/zach/src/reprocessing-example/lib/bs/native/index.native"
} ]
}
, { "frames":
[ { "address": 140430986743142
, "build_id": "b097b427ace57ac70bb636f7a41af7f10a69a851"
, "build_id_offset": 961894
, "function_name": "ppoll"
, "file_name": "/lib64/libc.so.6"
}
, { "address": 140430756520241
, "build_id": "d8eac16837bacf679ba9f307bcf71db7bd931b33"
, "build_id_offset": 151857
, "function_name": "pa_mainloop_poll"
, "file_name": "/lib64/libpulse.so.0"
}
, { "address": 140430756521792
, "build_id": "d8eac16837bacf679ba9f307bcf71db7bd931b33"
, "build_id_offset": 153408
, "function_name": "pa_mainloop_iterate"
, "file_name": "/lib64/libpulse.so.0"
}
, { "address": 5712287
, "build_id": "6a31abd6bbc250214c3c4942e0e9b110545afd88"
, "build_id_offset": 1517983
, "file_name": "/home/zach/src/reprocessing-example/lib/bs/native/index.native"
} ]
} ]
} |
😕 Looks like something dying in PulseAudio... Wonder if it's a version issue of some kind. Currently at a bit of a loss but I'll give it some thought. One thing that seems reassuring is that it's not the actual build that's failing, it's the executable (i.e. our OCaml compiler binaries are probably fine) |
@bsansouci do you know if there's any way that we should be building this that will give us more symbols in the ocaml code? |
Looking at
The
|
I believe that bsb-native builds with |
Ah you're right. Attaching the debugger I see:
Are |
I find this a little confusing as it looked like your code dump was very pulse-audio-related :/ |
The presence of pulse in the stack traces I'm getting from the core dumps might just be a coincidence (because it runs in its own thread? Not sure.). When I run it in the debugger now I don't see any mention of pulse in the stack trace from the core dump.
{ "signal": 11
, "executable": "/home/zach/src/reprocessing-example/lib/bs/bytecode/indexhot.byte"
, "stacktrace":
[ { "crash_thread": true
, "frames":
[ { "address": 0
, "build_id_offset": 0
}
, { "address": 5368396
, "build_id": "a9363eba5fed588881aae7ee01a5fc3da026e016"
, "build_id_offset": 1174092
, "file_name": "/home/zach/src/reprocessing-example/lib/bs/bytecode/indexhot.byte"
} ]
} ]
}
It's definitely failing as soon as I try to step into this function call to
|
Ah, I see, I was looking at the wrong part of the stacktrace in the core dump. I think you're right that the audio stuff runs in its own thread. I wonder if some GL function is failing quietly and whether we need to think about adding checks to glGetError in more places... There's also a slight chance that glad (our gl loader) isn't kicking in properly and so the call to viewport itself is the problem... Kind of spitballing here. |
@zploskey this means reprocessing is totally dead on linux except for when building to web right? Seems pretty bad. Any idea of when this started happening for you? I'll try to look into it more tonight. |
If anyone knows when (or if) native builds were ever working on Linux please let us know. I don't know when the problem might have been introduced since I only started trying to use this in the last few weeks (when I started filing issues). It may be a bit difficult to bisect due to the build being broken for other reasons, but I can make an attempt. Without a known working version I'll have to pick an arbitrary commit on Reasongl, I guess. Open to suggestions on where that should be. Do you have any intuition about what commits might have been a problem? I have my eye on these commits in particular: https://github.com/bsansouci/reasongl/commits/master/src/native It honestly might be easier to just follow up where things are going wrong when calling in to outside libs. |
Oh if there's a segfault on Gl.viewport that means GL isn't being loaded correctly. The way GL is loaded is through glad which is a cross platform little pile of C that dynamically loads OpenGL and all of the functions you ask it to load. If you get a segfault right at This can be debugged by printing in here which is the entry point to dynamically loading GL. Also maybe making sure SDL loads correctly by doing |
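To illustrate the point about GL not being loaded: a loader like glad exposes GL entry points as function pointers that are filled in at load time, so calling one before loading succeeds is a jump to address 0, which would be consistent with the `"address": 0` frame at the top of the crash thread above. The following is only a sketch with no real GL involved. `loaded_viewport`, `checked_viewport`, and `recording_viewport` are hypothetical names invented for this example, not part of glad, SDL, or reasongl.

```c
#include <stddef.h>
#include <stdio.h>

/* Same shape as a viewport call: x, y, width, height. */
typedef void (*viewport_fn)(int x, int y, int width, int height);

/* Hypothetical stand-in for a loader-filled GL function pointer.
   It stays NULL until a loader assigns it. */
viewport_fn loaded_viewport = NULL;

/* Calls through the pointer only if it was loaded.
   Returns 0 on success, -1 if the pointer is still NULL. */
int checked_viewport(int x, int y, int width, int height) {
    if (loaded_viewport == NULL) {
        fprintf(stderr, "viewport: GL function pointer was never loaded\n");
        return -1;
    }
    loaded_viewport(x, y, width, height);
    return 0;
}

/* Tiny stand-in implementation, used only to demonstrate the loaded case. */
int last_viewport_width = 0;
void recording_viewport(int x, int y, int width, int height) {
    (void)x; (void)y; (void)height;
    last_viewport_width = width;
}
```

With `loaded_viewport` left as NULL, `checked_viewport(0, 0, 640, 480)` reports the problem and returns -1 instead of segfaulting. After assigning `loaded_viewport = recording_viewport`, the same call goes through and returns 0.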
Ah sorry @zploskey I misunderstood and didn't realize that this never worked for you (I had assumed that it was running back when you were fixing the audio stuff). |
This is probably unrelated, but trying to npm install in my tgls clone gives me this:
I've tried with a compiler available through OPAM and without and still get this. May be due to a recent change in bsb-native since I'm pretty sure I could build this previously. |
@zploskey hmm, not sure why you're getting this, I just tried on my mac and couldn't repro. We'll try to see if there's anything obviously wrong there. You don't need an opam compiler installed to install bsb-native. In the meantime, it looks like tgls uses bsb-native master, so as a workaround you can rely on bsansouci/bsb-native#2.1.1 instead for now to get the prebuilt version. |
Looks to be a regression in bsb-native. If I specify 2.1.1 it builds ok. |
On bsb-native 2.1.1, having changed all the deps to use my local clones, I get this build failure on the example project on
Also apparently it needed to be |
These two lines were removed from
typedef void* (APIENTRYP PFNWGLGETPROCADDRESSPROC_PRIVATE)(const char*);
static PFNWGLGETPROCADDRESSPROC_PRIVATE gladGetProcAddressPtr;
but are needed here, at least on Linux. Forward declaring them in the IFDEF for the Linux stuff in SOIL.c fixes the build. Should these lines be in the glad header or not? |
Ah darn. I think what happened here is that @bsansouci regenerated the glad files but we had made some edits that were then lost. I'm fairly sure I added those back when we started using glad and that there's no issue with them being in their previous location. Thanks for tracking that one down |
Any interest in setting up CI to catch things like this and have an automated test for Linux stuff? Preparing a PR for that now. |
100% interest but haven't had a chance to do it :) |
re |
So it's either segfaulting before either of the suggested points where I tried to print things or the printing is somehow not making it to stdout... |
Mmmh did you call |
Good call, it prints things now. Investigating. |
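The exact call suggested above is not preserved in this thread. One common reason debug output goes missing right before a segfault is that stdout is buffered and the process dies before the buffer is written out. A minimal C sketch of a trace helper that flushes after every line follows. `trace_line` is a hypothetical name for illustration, not something from reprocessing or reasongl.

```c
#include <stdio.h>

/* Hypothetical helper: write one trace line and flush it immediately,
   so the text reaches the stream even if the process crashes afterwards.
   Returns 0 on success, -1 on any write or flush failure. */
int trace_line(FILE *out, const char *msg) {
    if (fputs(msg, out) == EOF) return -1;
    if (fputc('\n', out) == EOF) return -1;
    return fflush(out) == 0 ? 0 : -1;
}
```

Calling `trace_line(stdout, "before Gl.viewport")` just ahead of the suspect call should get the message out before a crash. Writing to `stderr` is another option, since it is unbuffered by default.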
In |
Mmmmh this error seems to be because the window object is null, as if there was an issue creating it. |
Great job narrowing it down guys, you give me hope for the world <3 |
|
Ok, so as suspected |
Good news, it works in X11. This problem only crops up when running the Wayland display server. We need to support Wayland, though, since X11 is on its way out. |
Ahh, that explains why it was working on my one linux machine... Good stuff |
Could you try running the built executable with |
I'm assuming you have libwayland-dev installed? This linux README for sdl seems to mention |
Hey alright! On Fedora 27 I was able to get this working by doing this:
# Install build dependencies of SDL2-devel package (currently 2.0.7 on F27)
# List is in the rpm spec: https://src.fedoraproject.org/rpms/SDL2/blob/f27/f/SDL2.spec
sudo dnf builddep SDL2-devel
git clone https://github.com/bsansouci/reprocessing-example.git
cd reprocessing-example
npm install
npm run build:native
SDL_VIDEODRIVER=wayland ./lib/bs/native/index.native
I see at least a couple of outcomes here:
|
Having the requirement of setting an env var to be able to run the program makes me sad 😢 |
My only guess would be to look into what path the configure script is running on a Wayland machine... https://github.com/bsansouci/SDL-mirror/blob/master/configure We might want to try just explicitly enabling Wayland support in the configure script??? |
@bsansouci @zploskey ...can we detect wayland-iness in c/ocaml? We could do the niiiice and hacky solution of setting the env var manually before starting up the sdl code... |
There seem to be a few ways we might be able to detect if it's X11 or Wayland. |
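As a rough sketch of the "set the env var before starting SDL" idea, here is one way the detection could look in C. It relies on two environment variables that Wayland sessions normally export, `WAYLAND_DISPLAY` and `XDG_SESSION_TYPE`, and on the `SDL_VIDEODRIVER` variable used in the workaround above. The helper names `looks_like_wayland` and `prefer_wayland_video_driver` are hypothetical, the check is a heuristic, and this is not necessarily what the eventual fix does.

```c
#define _POSIX_C_SOURCE 200112L  /* for setenv */
#include <stdlib.h>
#include <string.h>

/* Heuristic: returns 1 if the session looks like Wayland, 0 otherwise.
   Checks WAYLAND_DISPLAY first, then XDG_SESSION_TYPE. */
int looks_like_wayland(void) {
    const char *display = getenv("WAYLAND_DISPLAY");
    if (display != NULL && display[0] != '\0') return 1;
    const char *session = getenv("XDG_SESSION_TYPE");
    return session != NULL && strcmp(session, "wayland") == 0;
}

/* Hypothetical helper, meant to run before SDL is initialised.
   Returns 1 if it set SDL_VIDEODRIVER=wayland, 0 if it left the
   environment alone, -1 if setenv failed. */
int prefer_wayland_video_driver(void) {
    if (getenv("SDL_VIDEODRIVER") != NULL) return 0; /* respect an explicit choice */
    if (!looks_like_wayland()) return 0;             /* X11 or unknown: do nothing */
    return setenv("SDL_VIDEODRIVER", "wayland", 0) == 0 ? 1 : -1;
}
```

The helper never overrides a value the user already set, so running with `SDL_VIDEODRIVER=x11` would still work as an escape hatch.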
See Schmavery/reprocessing#74 for more info
This should be fixed now by bsansouci/reasongl#9! |
This happens every time. Here's the log from a clean build of the example repo:
Note this happens almost immediately, no chance to edit anything.