Diesel Segfaults on host, but not development machine #813

Closed
skeleten opened this Issue Mar 19, 2017 · 17 comments

@skeleten

skeleten commented Mar 19, 2017

Hello,
I'm in the process of writing a small Discord bot using diesel.
However, my application segfaults on my VPS host (uname -a: Linux ubuntu-512mb-fra1-01 4.4.0-67-generic #88-Ubuntu SMP Wed Mar 8 16:34:45 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux; it's the smallest size of DigitalOcean droplet).
It does not do so on my development machine (uname -a: Linux charon 4.10.2-1-ARCH #1 SMP PREEMPT Mon Mar 13 17:13:41 CET 2017 x86_64 GNU/Linux).

Within GDB I get the following message for the segfault:

Thread 1 "skellybot" received signal SIGSEGV, Segmentation fault.
0x0000555555a4d5f7 in je_tcache_dalloc_small (binind=<optimized out>, ptr=0x7ffff1ca5880, tcache=0x7ffff1c0c000, tsd=<optimized out>, slow_path=<optimized out>)
at /checkout/src/liballoc_jemalloc/../jemalloc/include/jemalloc/internal/tcache.h:42

bt gives me the following stacktrace:

#0  0x0000555555a4d5f7 in je_tcache_dalloc_small (binind=<optimized out>, ptr=0x7ffff1ca5880, tcache=0x7ffff1c0c000, tsd=<optimized out>, slow_path=<optimized out>)
    at /checkout/src/liballoc_jemalloc/../jemalloc/include/jemalloc/internal/tcache.h:422                                                                           
#1  je_arena_dalloc (tcache=0x7ffff1c0c000, ptr=0x7ffff1ca5880, tsd=<optimized out>, slow_path=<optimized out>)
    at /checkout/src/liballoc_jemalloc/../jemalloc/include/jemalloc/internal/arena.h:1370                      
#2  je_idalloctm (is_metadata=false, tcache=0x7ffff1c0c000, ptr=0x7ffff1ca5880, tsd=<optimized out>, slow_path=<optimized out>) at include/jemalloc/internal/jemalloc_internal.h:1055
#3  je_iqalloc (tcache=0x7ffff1c0c000, ptr=0x7ffff1ca5880, tsd=<optimized out>, slow_path=<optimized out>) at include/jemalloc/internal/jemalloc_internal.h:1079
#4  ifree (slow_path=false, tcache=0x7ffff1c0c000, ptr=0x7ffff1ca5880, tsd=<optimized out>) at /checkout/src/liballoc_jemalloc/../jemalloc/src/jemalloc.c:1819
#5  free (ptr=0x7ffff1ca5880) at /checkout/src/liballoc_jemalloc/../jemalloc/src/jemalloc.c:1922
#6  0x00007ffff7bb09d9 in ?? () from /usr/lib/x86_64-linux-gnu/libpq.so.5
#7  0x0000555555627670 in diesel::pg::connection::raw::{{impl}}::drop (self=0x7fffffff92e0)
    at /home/skeleten/.cargo/registry/src/github.com-1ecc6299db9ec823/diesel-0.11.4/src/pg/connection/raw.rs:95
#8  0x00005555555e53e1 in drop::hc00e4affa891dcbd () at /checkout/src/libcore/iter/iterator.rs:139
#9  0x00005555555e4411 in drop::h82bb99e6329dd1c6 () at /checkout/src/libcore/iter/iterator.rs:139
#10 0x000055555560ee02 in skellybot::user_seen (ctx=0x7fffffffb478, uid=0x7fffffffa1c0) at /home/skeleten/Source/skeleten/skellybot/src/main.rs:139
#11 0x000055555560de02 in skellybot::message_create_event (ctx=0x7fffffffb478, msg=Message = {...}) at /home/skeleten/Source/skeleten/skellybot/src/main.rs:113
#12 0x000055555560c7ee in skellybot::run () at /home/skeleten/Source/skeleten/skellybot/src/main.rs:89
#13 0x000055555560b5e7 in skellybot::main () at /home/skeleten/Source/skeleten/skellybot/src/main.rs:32
#14 0x0000555555a44146 in std::panicking::try::do_call<fn(),()> () at /checkout/src/libstd/panicking.rs:454
#15 0x0000555555a4b4fb in panic_unwind::__rust_maybe_catch_panic () at /checkout/src/libpanic_unwind/lib.rs:98
#16 0x0000555555a44bf7 in std::panicking::try<(),fn()> () at /checkout/src/libstd/panicking.rs:433
#17 std::panic::catch_unwind<fn(),()> () at /checkout/src/libstd/panic.rs:361
#18 std::rt::lang_start () at /checkout/src/libstd/rt.rs:57
#19 0x0000555555615fe3 in main ()

The complete source code I'm running can be found at https://github.com/skeleten/skellybot/tree/ece2a04ec61eaa8a1e62ecc3997aa0b7e4e99c6a

The segfault happens after receiving the first message, which is processed here: https://github.com/skeleten/skellybot/blob/ece2a04ec61eaa8a1e62ecc3997aa0b7e4e99c6a/src/main.rs#L111

@killercup

Member

killercup commented Mar 19, 2017

I'm really bad at debugging stuff like this, so all I can offer is a bunch of links to source code.

In your stacktrace, I see the following line from diesel

#7  0x0000555555627670 in diesel::pg::connection::raw::{{impl}}::drop (self=0x7fffffff92e0)
   at /home/skeleten/.cargo/registry/src/github.com-1ecc6299db9ec823/diesel-0.11.4/src/pg/connection/raw.rs:95

which is this code:

unsafe { PQfinish(self.internal_connection) };

(Of course the segfault is in unsafe code!)

Looking further, it seems that this drop is called at the end of the user_seen function, when conn goes out of scope. It was called here. The connection comes from this function, which basically just calls diesel's PgConnection::establish(&db_url).
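To illustrate the drop timing described above, here is a minimal sketch. It does not use diesel or libpq: `FakeRawConnection` and the log vector are hypothetical stand-ins, and the `Drop` body only records where diesel's `unsafe { PQfinish(...) }` call would run.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Hypothetical stand-in for diesel's raw connection wrapper (not the real type).
struct FakeRawConnection {
    name: &'static str,
    log: Rc<RefCell<Vec<String>>>,
}

impl Drop for FakeRawConnection {
    fn drop(&mut self) {
        // In diesel, this is the point where `unsafe { PQfinish(...) }` is called.
        self.log.borrow_mut().push(format!("finish {}", self.name));
    }
}

/// Mirrors the shape of `user_seen`: open a connection, do some work, return.
fn user_seen(log: &Rc<RefCell<Vec<String>>>) -> Result<(), String> {
    let _conn = FakeRawConnection { name: "conn", log: Rc::clone(log) };
    log.borrow_mut().push("query".to_string());
    Ok(())
    // `_conn` goes out of scope here, so `drop` runs before the caller continues.
}

/// Runs the sketch and returns the recorded order of events.
fn run_demo() -> Vec<String> {
    let log = Rc::new(RefCell::new(Vec::new()));
    user_seen(&log).unwrap();
    log.borrow_mut().push("after user_seen".to_string());
    let events = log.borrow().clone();
    events
}
```

The recorded order puts the "finish" event after the query and before the caller resumes. That matches the backtrace, where the `drop` frame sits directly under `skellybot::user_seen`.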

@skeleten

skeleten commented Mar 19, 2017

The intriguing part, in my opinion, is that user_seen executes earlier (on the ServerCreate event) but only segfaults when called by the message_create_event function. Is there any additional information I can provide?

@sgrif

Member

sgrif commented Mar 19, 2017

Are you compiling on the host itself, or from your development machine? Can you confirm that the versions of libpq are the same?

@skeleten

skeleten commented Mar 19, 2017

It seems my development machine is using

local/postgresql-libs 9.6.1-3
    Libraries for use with PostgreSQL

while on my host machine libpq-dev is 9.5.6-0ubuntu0.16.04

@sgrif

Member

sgrif commented Mar 19, 2017

Are you compiling on the host or target? (I don't think that should cause a segfault, but good to rule out)

@sgrif

Member

sgrif commented Mar 19, 2017

Also do you know whether the connection actually successfully established or not? Can you check if user_seen was going to return an Ok or Err had it not segfaulted? It seems like the most likely cause is that we're not handling some error condition properly.

@skeleten

skeleten commented Mar 19, 2017

I'm compiling on my development machine (as the server does not seem to have the resources needed to compile the crate).

@skeleten

skeleten commented Mar 19, 2017

Stepping through the function with GDB indicates it's establishing the connection successfully (it does not return early).

@sgrif

Member

sgrif commented Mar 19, 2017

Can you try statically linking libpq? (export PQ_LIB_STATIC=1 and then cargo clean should do it)

@skeleten

skeleten commented Mar 19, 2017

It seems it won't let me compile on my Arch machine

error: could not find native static library `pq`, perhaps an -L flag is missing?

error: Could not compile `pq-sys`.

This also happens under an openSUSE WSL.

@skeleten

skeleten commented Mar 19, 2017

I updated the postgres version on my host to 9.6.2 and built it on an Ubuntu install with the same version. It still segfaults, though.

@sgrif

Member

sgrif commented Mar 19, 2017

Thanks. You've given me information to reproduce, so I will try to look into it (to be honest though I don't have any ideas). The only thing I can think to try is to compile on your host machine.

@skeleten

skeleten commented Mar 19, 2017

Trying to compile on my host machine literally runs out of memory :( Maybe I'll upscale it for a bit to try out, though

@skeleten

skeleten commented Mar 19, 2017

I did upsize the droplet so I could actually compile it on the host; it still shows the same behaviour, though.
Edit: On my local VM (Ubuntu 16.04) it seems to work just fine as well.

@sgrif

Member

sgrif commented Dec 16, 2017

Closing as this issue has been stale for a while, and there's still nothing actionable we can do. If this issue is still occurring or you can provide additional information, let me know and I'll reopen.

@sgrif sgrif closed this Dec 16, 2017

@gnmerritt

gnmerritt commented Nov 5, 2018

I'm seeing what appears to be a similar segfault when running cargo test on Travis CI or in the Travis Docker container. The tests pass fine on my Mac, and it seems that the problem started when I added multiple tests that make PG connections.

Happy to try and help run this down. Please let me know what additional information I can provide.

diesel 1.3.3, postgres 9.2.24

$ uname -a
Linux d63a69da3a83 4.9.93-linuxkit-aufs #1 SMP Wed Jun 6 16:55:56 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
$ rustc --version
rustc 1.32.0-nightly (451987d86 2018-11-01)
$ cargo --version
cargo 1.32.0-nightly (1fa308820 2018-10-31)
...
#21 0x00007f9d33a30837 in ?? () from /usr/lib/x86_64-linux-gnu/libpq.so.5
#22 0x00007f9d33a308d6 in PQfinish () from /usr/lib/x86_64-linux-gnu/libpq.so.5
#23 0x000055bae32e2f76 in _$LT$diesel..pg..connection..raw..RawConnection$u20$as$u20$core..ops..drop..Drop$GT$::drop::h9d3078a13338664b (self=0x7f9d23062240)
    at /home/travis/.cargo/registry/src/github.com-1ecc6299db9ec823/diesel-1.3.3/src/pg/connection/raw.rs:101
#24 0x000055bae201746f in core::ptr::drop_in_place::h7521eae5e9bdbb05 () at /rustc/451987d86c89b38ddd8c4c124f1b9b6d4ded6983/src/libcore/ptr.rs:194
...

example failing job on travis: https://travis-ci.org/otterandrye/photothing-api/jobs/449963414

@gnmerritt

gnmerritt commented Nov 5, 2018

There's nothing helpful in the DB logs either; I just see the connections drop when the test binary segfaults:

2018-11-05 03:33:29 UTC LOG: could not receive data from client: Connection reset by peer
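
Since the Nov 5, 2018 report says the problem started once multiple tests opened PG connections, one diagnostic step would be to rule out parallel test execution (cargo test runs tests on several threads by default). This is an assumption for narrowing things down, not a confirmed cause. The simplest check is `cargo test -- --test-threads=1`. Alternatively, here is a minimal sketch of a process-wide lock that each database test could hold; `DB_TEST_LOCK`, `db_test_guard`, and `with_serialized_db` are hypothetical names, not part of diesel.

```rust
use std::sync::{Mutex, MutexGuard};

/// Hypothetical process-wide lock; every DB-touching test takes it first.
static DB_TEST_LOCK: Mutex<()> = Mutex::new(());

/// Acquire the lock, recovering it if an earlier test panicked while holding it.
fn db_test_guard() -> MutexGuard<'static, ()> {
    DB_TEST_LOCK
        .lock()
        .unwrap_or_else(|poisoned| poisoned.into_inner())
}

/// Run `body` while holding the lock, so only one caller at a time
/// opens and drops a connection.
fn with_serialized_db<T>(body: impl FnOnce() -> T) -> T {
    let _guard = db_test_guard();
    body()
    // `_guard` is released here, after `body` (and any connection it created) is done.
}
```

A test would wrap its body as `with_serialized_db(|| { /* establish connection, run assertions */ })`. If the segfault disappears with serialized tests, that points toward concurrent connection setup or teardown; if it persists, parallelism can be ruled out.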
