Angband-4.2.5 sometimes doesn't start #5952

Open
vext01 opened this issue Apr 14, 2024 · 9 comments
Labels
C: Port-specific Classification: Port-specific
Milestone

Comments

@vext01

vext01 commented Apr 14, 2024

Hi,

I'm updating the OpenBSD package to 4.2.5. I notice that angband non-deterministically doesn't start.

This appears to occur with all frontends. When you use sdl or sdl2, the GUI window flashes up before immediately disappearing.

When this happens angband exits with zero status.

Does anyone have any ideas why this may be, or what I could try to debug this?

Thanks

@NickMcConnell NickMcConnell added this to the Triage milestone Apr 14, 2024
@NickMcConnell NickMcConnell added the C: Port-specific Classification: Port-specific label Apr 14, 2024
@NickMcConnell
Member

It's an odd one. I'd start by running it either in a debugger like gdb or a memory tool like valgrind, and see if you get any useful messages. If I had to guess I'd say something to do with file paths, but that is really a guess.

Let us know how it goes.

@backwardsEric
Contributor

backwardsEric commented Apr 15, 2024

How frequently does it fail to start up? Is this with a setgid version?

I built and installed 4.2.5 on OpenBSD 7.4 (EDIT: 32-bit kernel, i386; running in a Parallels virtual machine on macOS). Angband was configured with "--with-private-dirs --enable-sdl2 --prefix=$HOME/angband-4.2.5". The SDL2 front end got to the splash screen without trouble when I launched it about 20 times. In another roughly 20 launches with a character who had sound on, it got to the splash screen and loaded the first level of the dungeon without trouble.

@vext01
Author

vext01 commented Apr 16, 2024

How frequently does it fail to start up?

Just now it took 12 attempts to get angband to start.

Is this with a setgid version?

Yes, and this certainly has to do with the setgid bit.

Am I doing this right?

The port Makefile contains:

CONFIGURE_ARGS +=	--with-setgid=games \
			--with-varpath=/var/games/angband \
			--bindir=${PREFIX}/bin

Once the package is installed:

$ ls -al /var/games/ | grep angband
drwxrwxr-x   6 root  games  512 Apr 14 19:18 angband
$ ls -al /usr/local/bin/angband
-rwxr-sr-x  1 root  games  3419784 Apr 14 19:15 /usr/local/bin/angband

If I copy the binary to angband2 it loses the setgid bit. If I run that binary the bug doesn't manifest (but I assume I won't be able to write saves etc.)
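A quick way to confirm whether a given copy of the binary still carries the setgid bit is to inspect its mode with stat(2). This is only a minimal sketch; `has_setgid` is a hypothetical helper name and is not part of Angband.

```c
#include <sys/stat.h>

/*
 * Hypothetical helper (not from the Angband sources): report whether the
 * file at `path` has the setgid bit set.
 * Returns 1 if set, 0 if not set, -1 if stat() fails.
 */
static int has_setgid(const char *path)
{
    struct stat st;

    if (stat(path, &st) != 0) {
        return -1;
    }
    return (st.st_mode & S_ISGID) ? 1 : 0;
}
```

Given the listing above, one would expect this to return 1 for /usr/local/bin/angband and 0 for the angband2 copy.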

Annoyingly I'm unable to ktrace this, as it's setgid.

@backwardsEric
Contributor

--with-varpath is ignored by configure on the system I have access to (OpenBSD 7.4; 32-bit; i386). I would recommend using --localstatedir=/var instead (--localstatedir=/var/games/angband results in Angband using /var/games/angband/games/angband for the save files and high scores).

When I built and installed with

./configure --prefix=/usr/local/angband --with-setgid=games --localstatedir=/var
make
su root
make install

on that system (the setting for prefix there was just to avoid interfering with existing files), the executable got to the splash screen without trouble for six consecutive runs. However, that was with a 32-bit build which is likely different than your environment.

@vext01
Author

vext01 commented Apr 16, 2024

Thanks.

I tried --localstatedir=/var but alas, nothing changes.

I also notice:

$ angband -l    
Savefiles you can use are:
zsh: segmentation fault  angband -l

backtrace:

(gdb) run
Starting program: /usr/local/bin/angband -l
Savefiles you can use are:

Program received signal SIGSEGV, Segmentation fault.
0x00000c651a496985 in list_saves () at main.c:293
293			if (details->desc) {
(gdb) bt
#0  0x00000c651a496985 in list_saves () at main.c:293
#1  0x00000c651a4962a0 in main (argc=2, argv=0x71b6d59eed48) at main.c:365

(I have no save games at the moment)

@backwardsEric
Contributor

When I set up a virtual machine with the amd64 version of OpenBSD 7.4 and built a setgid version of Angband 4.2.5 using the same procedure as I mentioned above for the i386 version of OpenBSD 7.4, the results were much the same as with the 32-bit version: all runs of the installed setgid executable got to the splash screen. You mentioned what you have for the permissions of /var/games/angband, and those match what I see. What are the permissions on the contents of /var/games/angband? What I have from "ls -lR /var/games/angband" as root is

drwxrwx---  2 root  games  512 Apr 16 21:40 panic
drwxrwx---  2 root  games  512 Apr 16 21:40 save
drwxrwx---  2 root  games  512 Apr 16 21:40 scores

/var/games/angband/panic:
total 0

/var/games/angband/save:
total 0

/var/games/angband/scores:
total 0

Is there any trouble getting the artifact spoilers with "/usr/local/bin/angband -mspoil -a spoil.txt"? spoil.txt should be created in ~/.angband/Angband.

Running it under the debugger with a breakpoint on the quit() routine would be my suggestion for trying to debug this, but the usual recommendation for debugging a setgid/setuid executable is to attach the debugger (run as root) after the process has started. That requires some point where the process pauses/stops before doing what one wants to debug, and Angband does not have such a point without modifying main.c (it is tempting to add support for a -p ("pause") option to main.c which would stop the process allowing a debugger to be attached). Another quick hack to the source that could help trace what's happening is to modify splashscreen_note() in ui-display.c so what would be displayed in Angband's front end for the case that's not MSG_BIRTH is also echoed to stderr.
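The pause idea mentioned above could look roughly like the following. It is a sketch under assumptions: the function name `announce_and_wait` is made up for illustration, no such option or function exists in main.c as far as this thread shows, and where it would be called from is left open.

```c
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the Angband sources): print this process's
 * PID to stderr and wait for up to `seconds` seconds so a debugger, run as
 * root for a setgid executable, can attach before anything interesting
 * happens. Returns the PID that was printed.
 */
static pid_t announce_and_wait(unsigned int seconds)
{
    pid_t pid = getpid();
    unsigned int left = seconds;

    fprintf(stderr, "pid %ld: waiting %u seconds for a debugger to attach\n",
            (long)pid, seconds);

    /* sleep() returns the unslept time if a signal interrupts it. */
    while (left > 0) {
        left = sleep(left);
    }
    return pid;
}
```

With a call such as announce_and_wait(60) early in startup, one could attach from another terminal with gdb's -p option and the printed PID, set a breakpoint on quit(), and continue.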

The fact that it is exiting for you with a zero error code suggests that a signal is triggering either the handle_signal_simple() or handle_signal_abort() handlers in ui-signals.c: those call quit(NULL) if a character has not been created yet.

@vext01
Author

vext01 commented Apr 19, 2024

gdb

I printed the PID in quit() and added a sleep. Got this trace:

(gdb) bt
#0  _thread_sys_nanosleep () at /tmp/-:2
#1  0xbaed85794aeda159 in ?? ()
#2  0x00000eff23d6cf32 in _libc_nanosleep_cancel (timeout=0x76d62dfb1c28, remainder=0x76d62dfb1c18)
    at /usr/src/lib/libc/sys/w_nanosleep.c:27
#3  0x00000eff23d56334 in sleep (seconds=91) at /usr/src/lib/libc/gen/sleep.c:42
#4  0x00000efcf4c7ca0f in quit (str=0x0) at z-util.c:923
#5  0x00000efcf4bc4533 in handle_signal_abort (sig=13) at ui-signals.c:205
#6  <signal handler called>
#7  _thread_sys_write () at /tmp/-:2
#8  0xb89c4912aad6e661 in ?? ()
#9  0x00000eff23d84f02 in _libc_write_cancel (fd=10, buf=0xeffd675984c, nbytes=40)
    at /usr/src/lib/libc/sys/w_write.c:27
#10 0x00000effe7770d1f in _aucat_wmsg (hdl=0xeffd6759820, eof=0x76d62dfb22f0) at /usr/src/lib/libsndio/aucat.c:102
#11 0x00000effe7771ae5 in _aucat_open (hdl=0xeffd6759820, str=<optimized out>, mode=<optimized out>)
    at /usr/src/lib/libsndio/aucat.c:510
#12 0x00000effe777442e in _sio_aucat_open (str=0xeffe776c024 "snd/default", mode=1, nbio=0)
    at /usr/src/lib/libsndio/sio_aucat.c:161
#13 0x00000effe776f71b in sio_open (str=0xeffb4a9c87c "default", mode=1, nbio=0) at /usr/src/lib/libsndio/sio.c:65
#14 0x00000effb4bea024 in SNDIO_OpenDevice () from /usr/local/lib/libSDL2.so.0.15
#15 0x00000effb4af6795 in open_audio_device () from /usr/local/lib/libSDL2.so.0.15
#16 0x00000effba073dcb in Mix_OpenAudioDevice () from /usr/local/lib/libSDL2_mixer.so.1.1
#17 0x00000efcf4c9b3e6 in open_audio_sdl () at snd-sdl.c:77
#18 0x00000efcf4b658bf in init_sound (soundstr=0x0, argc=1, argv=0x76d62dfb2778) at sound-core.c:379
#19 0x00000efcf4c7fddf in generic_reinit () at main.c:180
#20 0x00000efcf4c7f79f in main (argc=1, argv=0x76d62dfb2778) at main.c:529

So this is audio-related. When I configure with --disable-sdl2-mixer, all is well.

There's another bug about broken sound on OpenBSD already (#4037). Although this looks like a different (additional?) issue, I'll probably just keep sound disabled in our package for now.

Should it have exited with status zero though? It would be nice if it could at least say "a fatal signal was encountered and angband is now exiting" followed by a non-zero exit code.

Thanks

@backwardsEric
Contributor

That sound issue is different than #4037 which caused the game compiled with SDL 1.2 to hang when a sound was played. Changing how smpeg handled the monaural sound files packaged with Angband appeared to resolve that (proposed patch here: https://marc.info/?l=openbsd-ports&m=162187552125402&w=2 ). Converting the sound files to stereo would also avoid that problem. Here the issue is with Mix_OpenAudio() using libsndio: apparently a SIGPIPE signal when trying to set up the connection to the audio server.
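For anyone unfamiliar with the mechanism: writing to a pipe or socket whose other end has gone away raises SIGPIPE, and if that signal is ignored the write fails with EPIPE instead. The sketch below shows only that general behaviour, using a pipe as a stand-in for the socket to the audio server. `write_to_closed_pipe` is a hypothetical name, and nothing here is meant to suggest that Angband, SDL or libsndio should ignore SIGPIPE.

```c
#include <errno.h>
#include <signal.h>
#include <unistd.h>

/*
 * Hypothetical demonstration (not from Angband or libsndio): write one byte
 * to a pipe whose read end has been closed, with SIGPIPE temporarily
 * ignored. Returns the errno reported by write(), 0 if the write
 * unexpectedly succeeded, or -1 if the pipe could not be created.
 */
static int write_to_closed_pipe(void)
{
    int fds[2];
    int err = 0;
    void (*old_handler)(int);

    if (pipe(fds) != 0) {
        return -1;
    }

    /* Without this, the write below would deliver SIGPIPE to the process. */
    old_handler = signal(SIGPIPE, SIG_IGN);

    close(fds[0]); /* no reader is left */
    if (write(fds[1], "x", 1) < 0) {
        err = errno;
    }
    close(fds[1]);

    /* Restore the previous disposition so callers are unaffected. */
    if (old_handler != SIG_ERR) {
        signal(SIGPIPE, old_handler);
    }
    return err;
}
```

On a POSIX system this should return EPIPE. With a handler installed instead of SIG_IGN, the handler runs at the point of the write, which matches the shape of the backtrace above.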

I agree that the signal handlers in ui-signals.c should do something other than quit(NULL) so there's more information about what caused the game to exit when there isn't a character to save.
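To make the difference concrete, here is a small stand-alone comparison of a handler that quits silently with status zero and one that reports the signal and exits non-zero. It is a sketch, not the change that was committed: all names are hypothetical, and the 128 + signal number exit code is just the usual shell convention, chosen here as an assumption.

```c
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Stand-in for the behaviour discussed above: quit silently, status 0. */
static void quiet_handler(int sig)
{
    (void)sig;
    _exit(EXIT_SUCCESS);
}

/* Hypothetical alternative: say what happened, then exit non-zero. */
static void reporting_handler(int sig)
{
    static const char msg[] = "fatal signal received; exiting\n";

    /* write() is async-signal-safe; stdio functions are not. */
    if (write(STDERR_FILENO, msg, sizeof(msg) - 1) < 0) {
        /* Nothing useful can be done about a failed write here. */
    }
    _exit(128 + sig); /* assumption: follow the shell's 128 + N convention */
}

/*
 * Fork a child, install `handler` for SIGPIPE in it, raise SIGPIPE, and
 * return the child's exit status as seen by the parent (-1 on any error or
 * if the child did not exit normally).
 */
static int exit_status_after_sigpipe(void (*handler)(int))
{
    int status = 0;
    pid_t pid = fork();

    if (pid < 0) {
        return -1;
    }
    if (pid == 0) {
        signal(SIGPIPE, handler);
        raise(SIGPIPE);
        _exit(99); /* only reached if the handler returned */
    }
    if (waitpid(pid, &status, 0) != pid) {
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling exit_status_after_sigpipe(quiet_handler) gives 0, which is indistinguishable from a clean exit, while exit_status_after_sigpipe(reporting_handler) gives 128 + SIGPIPE and leaves a message on stderr.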

backwardsEric added a commit to backwardsEric/angband that referenced this issue Apr 21, 2024
NickMcConnell pushed a commit that referenced this issue Apr 21, 2024
NickMcConnell pushed a commit to NickMcConnell/FAangband that referenced this issue Apr 21, 2024
@NickMcConnell NickMcConnell modified the milestones: Triage, 4.2, Future May 12, 2024
@NickMcConnell
Member

Setting this to future for now
