Memory corruption when playing at 1440p #181

Closed

winny- opened this issue Apr 10, 2020 · 5 comments

Comments


winny- commented Apr 10, 2020

When playing Aleph One at 1440p, exiting the game back to the desktop usually results in a segfault. Running alephone under valgrind shows invalid writes in precalculate_bitmap_row_addresses that corrupt the heap and eventually crash valgrind itself. Outside of valgrind, the corruption does not become apparent until I exit the game, when glibc's free() detects the broken heap invariants and aborts. All the text snippets below are in this gist, along with additional information.

None of these problems have occurred for me playing at 1080p and lower resolutions.

I had noticed significant visual corruption during a previous session; unfortunately I had not captured images of that situation. I'll report back if I manage to create more visual artifacts.

In most cases after playing at 1440p, I get a crash on exit due to glibc's free() catching memory corruption:

winston@snowcrash ~ $ alephone /usr/share/alephone-marathon/
Aleph One Linux 2019-03-31 1.3b3
https://alephone.lhowon.org/

Original code by Bungie Software <http://www.bungie.com/>
Additional work by Loren Petrich, Chris Pruett, Rhys Hill et al.
TCP/IP networking by Woody Zenfell
Expat XML library by James Clark
SDL port by Christian Bauer <Christian.Bauer@uni-mainz.de>

This is free software with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
For details, see the file COPYING.

Built with network play enabled.

Built with Lua scripting enabled.
GL_VENDOR: NVIDIA Corporation
GL_RENDERER: GeForce GTX 760/PCIe/SSE2
GL_VERSION: 4.6.0 NVIDIA 440.64
corrupted size vs. prev_size
Aborted (core dumped)

Crash backtrace (occurs while exiting alephone; full backtrace available here):

#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007f9069082545 in __GI_abort () at abort.c:79
#2  0x00007f90690def18 in __libc_message (action=action@entry=do_abort, fmt=fmt@entry=0x7f90691f4a97 "%s\n") at ../sysdeps/posix/libc_fatal.c:181
#3  0x00007f90690e6bd8 in malloc_printerr (str=str@entry=0x7f90691f2bab "corrupted size vs. prev_size") at malloc.c:5366
#4  0x00007f90690e758e in unlink_chunk (p=p@entry=0x5634b43d5100, av=0x7f9069228c40 <main_arena>) at malloc.c:1468
#5  0x00007f90690e8a7b in _int_free (av=0x7f9069228c40 <main_arena>, p=0x5634b43d10d0, have_lock=<optimized out>) at malloc.c:4354
#6  0x00007f9067d3bccf in pa_xfree () at /usr/lib64/libpulse.so.0
#7  0x00007f9067463492 in pa_mempool_unref () at /usr/lib64/pulseaudio/libpulsecommon-13.0.so
#8  0x00007f9067d10eb1 in  () at /usr/lib64/libpulse.so.0
#9  0x00007f9069832114 in DisconnectFromPulseServer (mainloop=0x5634b43ddb90, context=0x5634b43e25b0) at /usr/src/debug/media-libs/libsdl2-2.0.10/SDL2-2.0.10/src/audio/pulseaudio/SDL_pulseaudio.c:270
#10 0x00007f9069832567 in PULSEAUDIO_CloseDevice (this=0x5634b43ddab0) at /usr/src/debug/media-libs/libsdl2-2.0.10/SDL2-2.0.10/src/audio/pulseaudio/SDL_pulseaudio.c:461
#11 0x00007f906978c9cd in close_audio_device (device=0x5634b43ddab0) at /usr/src/debug/media-libs/libsdl2-2.0.10/SDL2-2.0.10/src/audio/SDL_audio.c:1143
#12 0x00005634b32cd536 in Mixer::Stop() (this=0x5634b436fee0) at Mixer.cpp:70
#13 0x00005634b32d4f34 in SoundManager::SetStatus(bool) (this=this@entry=0x5634b43756e0, active=active@entry=false) at Mixer.h:43
#14 0x00005634b32d6459 in SoundManager::SetStatus(bool) (active=false, this=<optimized out>) at SoundManager.cpp:722
#15 SoundManager::Shutdown() (this=<optimized out>) at SoundManager.cpp:185
#16 0x00007f906909bc40 in __run_exit_handlers (status=0, listp=0x7f9069228718 <__exit_funcs>, run_list_atexit=run_list_atexit@entry=true, run_dtors=run_dtors@entry=true) at exit.c:108
#17 0x00007f906909bd8a in __GI_exit (status=<optimized out>) at exit.c:139
#18 0x00007f9069083eb2 in __libc_start_main (main=0x5634b2f949f0 <main(int, char**)>, argc=2, argv=0x7ffe58196548, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7ffe58196538) at ../csu/libc-start.c:342
#19 0x00005634b2fa460a in _start () at /usr/lib/gcc/x86_64-pc-linux-gnu/9.2.0/include/g++-v9/bits/stl_vector.h:94

Valgrind trace of the offending bad write (more trace details here):

==15272== Invalid write of size 8
==15272==    at 0x46FBD4: precalculate_bitmap_row_addresses(bitmap_definition*) (textures.cpp:125)
==15272==    by 0x4AA413: render_screen(short) (screen.cpp:1376)
==15272==    by 0x371C36: idle_game_state(unsigned int) (interface.cpp:1208)
==15272==    by 0x18293F: main_event_loop (shell.cpp:805)
==15272==    by 0x18293F: main (shell.cpp:346)
==15272==  Address 0xb075dd8 is 0 bytes after a block of size 9,640 alloc'd
==15272==    at 0x483577F: malloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==15272==    by 0x4AC602: alephone::Screen::Initialize(screen_mode_data*) (screen.cpp:186)
==15272==    by 0x18278A: initialize_application (shell.cpp:593)
==15272==    by 0x18278A: main (shell.cpp:335)
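The trace is consistent with precalculate_bitmap_row_addresses filling a per-row pointer table that was sized for a smaller maximum height: the 9,640-byte block is about what 1,200 row pointers of 8 bytes each plus a small header would occupy. A minimal sketch of the hazard (the struct layout, the 1200 limit, and the guard are assumptions based on this thread, not the actual Aleph One code):

```cpp
#include <cstdint>

// Assumed pre-fix limit; 1200 * 8 bytes of pointers roughly matches the
// 9,640-byte allocation reported by valgrind above.
constexpr int MAXIMUM_WORLD_HEIGHT = 1200;

struct bitmap_definition {
    int16_t width;
    int16_t height;
    uint8_t* base_address;
    uint8_t* row_addresses[MAXIMUM_WORLD_HEIGHT];  // fixed-size row table
};

// Returns false instead of writing out of bounds; the real function has no
// such guard, so a 1440-row bitmap would write 240 pointers past the table.
bool precalculate_bitmap_row_addresses(bitmap_definition* bm) {
    if (bm->height > MAXIMUM_WORLD_HEIGHT)
        return false;  // hit at 1440p: 1440 > 1200
    for (int row = 0; row < bm->height; ++row)
        bm->row_addresses[row] = bm->base_address + row * bm->width;
    return true;
}
```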
winny- (Author) commented Apr 11, 2020

Did some additional testing on another machine; this one has an Intel CPU and Intel graphics, running the same OS but using Mesa for OpenGL. The bug manifests in the same way: a bad write in precalculate_bitmap_row_addresses when the game is configured to render at 2560x1440, usually followed by a crash at game exit. As reported earlier, the corruption does not appear to occur at 1920x1080 or lower resolutions.

I did manage to crash the game when interacting with a pattern buffer. On another occasion the Lua script that draws the UI appeared to crash: the UI disappeared, but the game continued ([string "HUD Lua"]:310: bad argument #1 to '__index' (Player expected, got userdata) (lua_script.cpp:1118)).

More backtraces here. They look essentially identical to the ones above, so I didn't include any code blocks here.

LidMop (Collaborator) commented Apr 11, 2020

Great info! Looks like world_pixels_structure is being allocated with a fixed size that doesn't support vertical screen sizes larger than 1200. If no one else is already on this, I can push a fix next week.

winny- (Author) commented Apr 12, 2020

This quick and dirty patch appears to fix the issue by bumping MAXIMUM_WORLD_(HEIGHT|WIDTH) to 8K (7680x4320), but I wonder if this is the best way to resolve the issue… I imagine in a couple of years somebody will open a ticket for the same bug on an even higher-resolution display.
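As a sketch of what the bump amounts to (the 8K values are from the comment above; the macro names are as referenced in this thread, and the previous height limit of 1200 is inferred from the diagnosis, so the real patch in the linked gist may differ):

```cpp
#include <cstddef>
#include <cstdint>

// Bumped compile-time bounds covering 8K UHD (7680x4320).
constexpr int MAXIMUM_WORLD_WIDTH  = 7680;
constexpr int MAXIMUM_WORLD_HEIGHT = 4320;  // previously too small for 1440 rows

// The row-address table grows with the height bound: one pointer per row.
constexpr std::size_t row_table_bytes = MAXIMUM_WORLD_HEIGHT * sizeof(uint8_t*);
```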

TrajansRow (Collaborator) commented

Probably the ‘right’ fix would be to have no hard limit at all, and instead allocate world_pixels_structure based on whatever the actual screen resolution is. The work there would be handling resolution changes and re-initializing world_pixels_structure.

A simpler option might be to just prevent the user from selecting a resolution higher than the max.

I’m also not seeing anything that uses MAXIMUM_WORLD_WIDTH, so at least there is no horizontal limit.
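The no-hard-limit approach could size the structure from the actual render resolution and rebuild it whenever that changes. A rough sketch (world_pixels_structure's real layout is not shown in this thread, so this is a hypothetical stand-in, not the fix that landed in a61e96d):

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical dynamically-sized replacement: the pixel buffer and the
// row-address table both grow with the actual render height, so no
// resolution can overflow them.
struct world_pixels_structure {
    int16_t width = 0, height = 0;
    std::vector<uint8_t> pixels;
    std::vector<uint8_t*> row_addresses;

    // Call at startup and again whenever the user changes resolution.
    void resize(int16_t w, int16_t h) {
        width = w;
        height = h;
        pixels.assign(static_cast<std::size_t>(w) * h, 0);
        row_addresses.resize(h);
        for (int16_t row = 0; row < h; ++row)
            row_addresses[row] = pixels.data() + static_cast<std::size_t>(row) * w;
    }
};
```

Re-running the resize on every mode change keeps the row table and the allocation it indexes in lockstep, which is exactly the invariant the fixed-size version broke at 1440p.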

LidMop closed this as completed in a61e96d Apr 16, 2020
winny- (Author) commented Apr 19, 2020

Looks perfect! I tested the commit and was unable to trigger the bug. Valgrind is also happy. Thanks for the responsive discussion and fix =)
