This repository has been archived by the owner on Jan 23, 2023. It is now read-only.

Cawbird 1.0.3 (current git tip) coredumps on startup #51

Closed
siebenmann opened this issue Oct 21, 2019 · 20 comments
Labels
bug Something isn't working

Comments

@siebenmann

I just built the latest git tip (tagged as v1.0.3), and it core dumps on startup for me but in a weird spot. Gdb says that the crash is:

#0  0x00005555555a35ae in multi_media_widget_set_all_media
    (self=0x5555570266b0, medias=0x56fbe4e0, medias_length1=<optimized out>)
    at cawbird@sta/src/widgets/MultiMediaWidget.c:192
#1  0x000055555558f529 in tweet_list_entry_construct
    (object_type=<optimized out>, tweet=0x555556f63300, main_window=main_window@entry=0x555555bd03d0, account=account@entry=0x555555aeb850, restrict_height=restrict_height@entry=0) at cawbird@sta/src/list/TweetListEntry.c:1384
#2  0x000055555559070c in tweet_list_entry_new
    (tweet=<optimized out>, main_window=main_window@entry=0x555555bd03d0, account=account@entry=0x555555aeb850, restrict_height=restrict_height@entry=0)
    at cawbird@sta/src/list/TweetListEntry.c:1473
#3  0x00005555555a5762 in tweet_list_box_widget_create_func
    (obj=0x555556f63300, self=0x555555c3c7e0)
    at cawbird@sta/src/widgets/TweetListBox.c:227

This is translated Vala code:

(gdb) list
187                                     }
188                                     _tmp1_ = FALSE;
189                                     if (!(i < medias_length1)) {
190                                             break;
191                                     }
192                                     _tmp3_ = medias[i];
193                                     _vala_assert (_tmp3_ != NULL, "medias[i] != null");
194                                     _tmp4_ = medias[i];
195                                     multi_media_widget_set_media (self, i, _tmp4_);
196                             }

This corresponds to set_all_media in src/widgets/MultiMediaWidget.vala, but the crash looks impossible there: the for loop should always constrain i to a valid index. In the translated C code, though, medias itself could be corrupted while the length stays valid, because the two are passed as separate parameters and are generated from separate pieces of information in src/CbTweet.c.
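To illustrate the point about separate parameters, here is a minimal sketch of the pointer-plus-length shape seen in the backtrace. The Media type and count_media function are hypothetical stand-ins, not Cawbird code. The loop bound protects the index i, but nothing in the loop can tell whether the medias pointer itself is still valid.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the real media object type. */
typedef struct { int id; } Media;

/*
 * Same parameter shape as the generated C in the backtrace: the array
 * pointer and its length arrive as two independent arguments.
 * Returns the number of non-NULL entries, or -1 if the array is NULL.
 */
static int count_media(Media **medias, int medias_length1) {
    assert(medias_length1 >= 0);
    if (medias == NULL) {
        return -1;
    }
    int seen = 0;
    for (int i = 0; i < medias_length1; i++) {
        /* i is always in range here, but if medias held a corrupted
         * (non-NULL) address, this read would still fault. */
        if (medias[i] != NULL) {
            seen++;
        }
    }
    return seen;
}
```

A NULL check catches only one kind of bad pointer. A pointer that has lost some of its bits is non-NULL and passes the check, so the first dereference of medias[i] is where a crash would surface.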

@IBBoard
Owner

IBBoard commented Oct 21, 2019

Thanks for putting in a ticket. I found this yesterday when I built 1.0.3 on OBS and upgraded on my openSUSE Tumbleweed box and it failed. I spent about an hour trying to debug it, but couldn't get anywhere last night. Not helped by the fact that my local builds are perfectly fine and not affected!

I mentioned it on Twitter but didn't get round to putting a ticket in here (because I thought I'd be able to track it down and fix it).

@IBBoard
Owner

IBBoard commented Oct 21, 2019

Currently working on some debugging, but as it's not erroring on local builds I'm having to wait for OBS to build it.

Current patch at https://build.opensuse.org/package/view_file/home:IBBoard:desktop/cawbird/debug.diff?expand=1 if anyone can recreate it locally and wants to test it out and feed back.

@IBBoard
Owner

IBBoard commented Oct 21, 2019

So far I know that:
a) It's nothing to do with stripping the executable
b) I really can't reproduce it with a local build, only with the OBS build (despite Meson arguments and output looking identical - not checked Ninja yet)
c) OBS is going really slowly and hasn't made a build for me all evening

I'll try more tomorrow. In the meantime, I deleted the v1.0.3 tag to make it clearer that the release is a dud and not to be used.

@valerierx

valerierx commented Oct 21, 2019

Compiled with the patch, still crashes
GDB:

#0  0x000055555559e9cf in  ()
#1  0x000055555558a243 in  ()
#2  0x00005555555a0c08 in  ()
#3  0x00007ffff7a1cdc1 in  () at /usr/lib/libgtk-3.so.0
#4  0x00007ffff72f2d3a in g_closure_invoke () at /usr/lib/libgobject-2.0.so.0
#5  0x00007ffff72e088e in  () at /usr/lib/libgobject-2.0.so.0
#6  0x00007ffff72e498a in g_signal_emit_valist ()
    at /usr/lib/libgobject-2.0.so.0
#7  0x00007ffff72e57f0 in g_signal_emit () at /usr/lib/libgobject-2.0.so.0
#8  0x00005555555b908a in  ()
#9  0x00005555555d7dde in  ()
#10 0x00005555555be868 in  ()
#11 0x00005555555be9ed in  ()
#12 0x00007ffff73d6c24 in  () at /usr/lib/libgio-2.0.so.0
#13 0x00007ffff73d6c59 in  () at /usr/lib/libgio-2.0.so.0
#14 0x00007ffff7ed62cf in g_main_context_dispatch ()
    at /usr/lib/libglib-2.0.so.0
#15 0x00007ffff7ed8211 in  () at /usr/lib/libglib-2.0.so.0
#16 0x00007ffff7ed8251 in g_main_context_iteration ()
    at /usr/lib/libglib-2.0.so.0
#17 0x00007ffff73b99de in g_application_run () at /usr/lib/libgio-2.0.so.0
#18 0x000055555556c358 in  ()
#19 0x00007ffff6dfd153 in __libc_start_main () at /usr/lib/libc.so.6
--Type <RET> for more, q to quit, c to continue without paging--
#20 0x000055555556c22e in  ()
(gdb) bt full
#0  0x000055555559e9cf in  ()
#1  0x000055555558a243 in  ()
#2  0x00005555555a0c08 in  ()
#3  0x00007ffff7a1cdc1 in  () at /usr/lib/libgtk-3.so.0
#4  0x00007ffff72f2d3a in g_closure_invoke () at /usr/lib/libgobject-2.0.so.0
#5  0x00007ffff72e088e in  () at /usr/lib/libgobject-2.0.so.0
#6  0x00007ffff72e498a in g_signal_emit_valist ()
    at /usr/lib/libgobject-2.0.so.0
#7  0x00007ffff72e57f0 in g_signal_emit () at /usr/lib/libgobject-2.0.so.0
#8  0x00005555555b908a in  ()
#9  0x00005555555d7dde in  ()
#10 0x00005555555be868 in  ()
#11 0x00005555555be9ed in  ()
#12 0x00007ffff73d6c24 in  () at /usr/lib/libgio-2.0.so.0
#13 0x00007ffff73d6c59 in  () at /usr/lib/libgio-2.0.so.0
#14 0x00007ffff7ed62cf in g_main_context_dispatch ()
    at /usr/lib/libglib-2.0.so.0
#15 0x00007ffff7ed8211 in  () at /usr/lib/libglib-2.0.so.0
#16 0x00007ffff7ed8251 in g_main_context_iteration ()
    at /usr/lib/libglib-2.0.so.0
#17 0x00007ffff73b99de in g_application_run () at /usr/lib/libgio-2.0.so.0
#18 0x000055555556c358 in  ()
#19 0x00007ffff6dfd153 in __libc_start_main () at /usr/lib/libc.so.6

@IBBoard changed the title from "Cawbird 1.0.3 (current git tip) coredumps on startup on Fedora 30 x86_64" to "Cawbird 1.0.3 (current git tip) coredumps on startup" on Oct 21, 2019
@IBBoard
Owner

IBBoard commented Oct 21, 2019

That's odd. That stack trace is completely different and shows nothing that I can see as Cawbird code!

Did you do anything to trigger that, or was it just on start-up? And what was the last bit of Cawbird logging before it?

You might need to run G_MESSAGES_DEBUG=cawbird cawbird to see all of the debug logs.

@valerierx

(cawbird:100855): cawbird-DEBUG: 00:05:03.441: MultiMediaWidget.vala:35: XXX Setting 1 medias
(cawbird:100855): cawbird-DEBUG: 00:05:03.442: TweetListEntry.vala:231: XXX Setting 1 quote tweet media for 1186267428956921856 (1186266427969429509)
(cawbird:100855): cawbird-DEBUG: 00:05:03.442: MultiMediaWidget.vala:35: XXX Setting 1 medias
Segmentation fault (core dumped)

@schmittlauch
Contributor

FYI: The NixOS build of 21dd7df does not segfault.

@valerierx

1.0.2 does not segfault

@valerierx

I think some commits need to be reverted

@siebenmann
Author

I can reliably reproduce this crash, but I apparently cannot get a copy of that patch from the OpenSUSE website (even after going through the hassle of registering). Can you put it on Github here in some accessible format, such as an attachment to this issue?

@valerierx

@siebenmann
Author

Here's the last chunk of messages from a crash:

(cawbird:27801): cawbird-DEBUG: 10:51:15.360: XXX Parsing entities for tweet 1186653122787192833
(cawbird:27801): cawbird-DEBUG: 10:51:15.361: XXX media_count = 0
(cawbird:27801): cawbird-DEBUG: 10:51:15.361: XXX media_count with extended = 0
(cawbird:27801): cawbird-DEBUG: 10:51:15.361: XXX media_count with = 1
(cawbird:27801): cawbird-DEBUG: 10:51:15.361: XXX Found no media - checking expanded URLs
(cawbird:27801): cawbird-DEBUG: 10:51:15.361: XXX Found 0 media in total
(cawbird:27801): cawbird-DEBUG: 10:51:15.361: XXX Parsing entities for tweet 1186651367479431168
(cawbird:27801): cawbird-DEBUG: 10:51:15.361: XXX media_count = 1
(cawbird:27801): cawbird-DEBUG: 10:51:15.361: XXX media_count with extended = 2
(cawbird:27801): cawbird-DEBUG: 10:51:15.361: XXX media_count with = 2
(cawbird:27801): cawbird-DEBUG: 10:51:15.361: XXX Has media element
(cawbird:27801): cawbird-DEBUG: 10:51:15.361: XXX Media URL 0 of 1 is https://t.co/S3E6kF7dzo
(cawbird:27801): cawbird-DEBUG: 10:51:15.361: XXX Has media element for media array
(cawbird:27801): cawbird-DEBUG: 10:51:15.361: XXX Has extended_entities element for media array
(cawbird:27801): cawbird-DEBUG: 10:51:15.361: XXX Checking media array 0
(cawbird:27801): cawbird-DEBUG: 10:51:15.361: XXX Media 0 of 1 is photo
(cawbird:27801): cawbird-DEBUG: 10:51:15.361: XXX Found media candidate 0
(cawbird:27801): cawbird-DEBUG: 10:51:15.361: XXX Checking media array 1
(cawbird:27801): cawbird-DEBUG: 10:51:15.361: XXX Media 0 of 1 is photo
(cawbird:27801): cawbird-DEBUG: 10:51:15.361: XXX Found duplicate image at 0
(cawbird:27801): cawbird-DEBUG: 10:51:15.361: XXX Found 1 media in total
(cawbird:27801): cawbird-DEBUG: 10:51:15.363: TweetListEntry.vala:231: XXX Setting 1 quote tweet media for 1186653122787192833 (1186651367479431168)
(cawbird:27801): cawbird-DEBUG: 10:51:15.363: MultiMediaWidget.vala:35: XXX Setting 1 medias
segmentation violation--core dumped

@siebenmann
Author

As an additional piece of information, the crash doesn't happen if I run the same binary (with the same environment variable settings) under valgrind. When run under valgrind, the messages immediately following those about tweet ID 1186653122787192833 are:

(cawbird:10810): cawbird-DEBUG: 10:57:31.362: XXX Parsing entities for tweet 1186652527959465984
(cawbird:10810): cawbird-DEBUG: 10:57:31.363: XXX media_count = 0
(cawbird:10810): cawbird-DEBUG: 10:57:31.364: XXX media_count with extended = 0
(cawbird:10810): cawbird-DEBUG: 10:57:31.365: XXX media_count with = 0
(cawbird:10810): cawbird-DEBUG: 10:57:31.367: XXX Found no media - checking expanded URLs
(cawbird:10810): cawbird-DEBUG: 10:57:31.374: XXX Found 0 media in total

In case it's useful I'll attach the full output for both a non-valgrind crash and a valgrind session.
debug-crash.txt
debug-valgrind.txt

@IBBoard
Owner

IBBoard commented Oct 22, 2019

@siebenmann It's odd you can't get the patch. I specifically tested from a Private Browsing window to make sure the file was accessible.

A colleague mentioned optimisation issues. I don't know whether that fits with it running fine through valgrind.

@schmittlauch Thanks for the info about NixOS. There must be something in the build, not just my environment. Do you know what optimisation settings it uses?

@undevdecatos Unfortunately, the bit causing the crash was the big change for v1.0.3, and there's not really anything to unpick and revert as it's all so integrated. If we revert the bits around showing the quoted tweet media then we won't trigger the bug. If we revert the bits around parsing the quoted tweet media separately then we won't have anything to display!

@IBBoard
Owner

IBBoard commented Oct 22, 2019

Possible red herring, but something that I'd like to check against NixOS, @schmittlauch - what does file on the executable give you?

The OBS builds give me:

/usr/bin/cawbird: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=740875b496ac2b08f7be51d36f0531ef6a6d7270, for GNU/Linux 3.2.0, stripped

My local build (after stripping) gives me:

build/cawbird: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=3624c57a96d37973c67393cb87632ab520f09440, for GNU/Linux 3.2.0, stripped

"pie" is Position Independent Executables. I'm wondering whether something about that process is causing the difference in behaviour. It doesn't explain why it segfaults, but it might be a clue.

EDIT: Bullseye! PIE breaks it. I installed gcc-PIE and now my local build fails. This should speed up debugging.

@siebenmann
Author

My build is also PIE. I'm running it out of the raw mock build hierarchy, and file reports it as:

cawbird: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 3.2.0, BuildID[sha1]=f5cbfe9f2cb0b20f3cbbbb8482e1036e7ecfa0da, with debug_info, not stripped, too many notes (256)

As far as OBS goes, I may have been trying to get the debug.diff file the wrong way. I couldn't select and copy it out of the direct link, so I went to the Cawbird 'overview' page and clicked on the download link. This presented me with an HTTP Basic authentication challenge asking for an 'API login'.

@IBBoard
Owner

IBBoard commented Oct 22, 2019

Huh, odd. I could select and copy the text from the page. I know the download isn't available without a login, but maybe you need to be the owner.

At least we appear to have found the consistent feature that's changing how it behaves.

@IBBoard
Owner

IBBoard commented Oct 22, 2019

Well, that was annoying.

If I had the time, I'd strip all of the C out of Cawbird and move it all to Vala!

@schmittlauch
Contributor

While this is closed (congrats) I can confirm that the NixOS build is not PIE:

file /nix/store/xb57kvcliccglfsv2fbqd66sghlnirr0-cawbird-1.0.3/bin/.cawbird-wrapped                  
/nix/store/xb57kvcliccglfsv2fbqd66sghlnirr0-cawbird-1.0.3/bin/.cawbird-wrapped: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /nix/store/hnp72ygssavd526rp7qk3yjylyc2klcw-glibc-2.27/lib/ld-linux-x86-64.so.2, for GNU/Linux 2.6.32, not stripped

@IBBoard
Owner

IBBoard commented Oct 22, 2019

That explains why it wasn't affected. My local builds weren't PIE at first either, and apparently you can reference a C function that is only in the .c file (not the .h file) without a problem if it is not then built as a PIE.

If it did throw a warning about that at build time then I never saw it.
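One plausible reading of this, not confirmed anywhere in this thread, is the classic implicit-declaration problem: a C function called without a visible prototype is assumed to return int, so a returned 64-bit pointer keeps only its low 32 bits. That would fit the original backtrace, where medias=0x56fbe4e0 is much shorter than the other 0x5555... pointers. The sketch below models that truncation on plain integers. The function names and the example addresses are hypothetical, and the model ignores the sign extension a real int-to-pointer conversion would apply.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Model of a pointer value that has passed through a 32-bit int return
 * value: only the low 32 bits survive. (Assumption: sign extension of a
 * negative int is not modelled.)
 */
static uint64_t through_implicit_int(uint64_t ptr_bits) {
    return ptr_bits & 0xffffffffULL;
}

/* Returns 1 if the address is unchanged by the truncation, else 0. */
static int survives_truncation(uint64_t ptr_bits) {
    return through_implicit_int(ptr_bits) == ptr_bits;
}

/* Self-check with hypothetical addresses (not taken from a real run). */
static void check_truncation_model(void) {
    /* A low address, as a non-PIE build might use, fits in 32 bits. */
    assert(survives_truncation(0x01fbe4e0ULL));
    /* A high address in the 0x5555... range loses its upper bits. */
    assert(!survives_truncation(0x555556fbe4e0ULL));
    assert(through_implicit_int(0x555556fbe4e0ULL) == 0x56fbe4e0ULL);
}
```

Under this reading, a non-PIE build would tend to place its data at addresses low enough to survive the truncation, which would explain why only the PIE builds crashed. Building with -Wimplicit-function-declaration (or -Werror=implicit-function-declaration) in GCC is one way to surface such calls at compile time.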

@IBBoard IBBoard added the bug Something isn't working label Apr 8, 2020