
Segfault in LuaTable::set #18

Closed
jonas-schievink opened this issue Jul 21, 2017 · 26 comments

Comments


jonas-schievink commented Jul 21, 2017

After a long debugging session, I've noticed that cargo test --release fails with a segfault on the current nightly.

This might also be a compiler regression; I'm not sure yet.

@jonas-schievink jonas-schievink changed the title Undefined behaviour in LuaTable::set Segfault in LuaTable::set Jul 21, 2017

jonas-schievink commented Jul 21, 2017

Valgrind stacktrace (take this with a grain of salt, it's optimized after all):

==11084== Invalid read of size 8
==11084==    at 0x14F0E6: lua_touserdata (in /home/jonas/dev/rlua/target/release/rlua-c35ad98123863f87)
==11084==    by 0x1256DA: rlua::util::error_guard::call_impl (in /home/jonas/dev/rlua/target/release/rlua-c35ad98123863f87)
==11084==    by 0x601C9CF: ???
==11084==    by 0x162641: luaD_callnoyield (in /home/jonas/dev/rlua/target/release/rlua-c35ad98123863f87)
==11084==    by 0x160351: luaD_rawrunprotected (in /home/jonas/dev/rlua/target/release/rlua-c35ad98123863f87)
==11084==    by 0x162DDE: luaD_pcall (in /home/jonas/dev/rlua/target/release/rlua-c35ad98123863f87)
==11084==    by 0x152354: lua_pcallk (in /home/jonas/dev/rlua/target/release/rlua-c35ad98123863f87)
==11084==    by 0x127CA2: rlua::lua::LuaTable::set (in /home/jonas/dev/rlua/target/release/rlua-c35ad98123863f87)
==11084==    by 0x144779: rlua::tests::test_error (in /home/jonas/dev/rlua/target/release/rlua-c35ad98123863f87)

The frame `==11084== by 0x601C9CF: ???` looks pretty suspicious.

Instruction at fault:

```
Dump of assembler code for function lua_touserdata:
   0x000055555559b0e0 <+0>:	sub    rsp,0x8
   0x000055555559b0e4 <+4>:	test   esi,esi
=> 0x000055555559b0e6 <+6>:	mov    r8,QWORD PTR [rdi+0x20]
```

`rdi` is the first argument, i.e. the pointer to the Lua state, and it holds 0x40 (64), which is obviously not a valid pointer.

I suspect that luaD_callnoyield calls a bad function pointer, causing this mess.


kyren commented Jul 21, 2017

Hmmm, that's concerning. It's ONLY on the current nightly compiler? Do you have a reproduction I could look at?


jonas-schievink commented Jul 21, 2017

It also happens on at least yesterday's nightly, but I have neither the bandwidth nor the disk space to bisect. The reproduction is just running cargo test --release (I'm referring to rlua's own tests here). It might not reproduce on other platforms, though.

EDIT: I'm on x64 Linux, for reference

@jonas-schievink

The easiest way to cause this:

```rust
let lua = Lua::new();
lua.globals().set("bla", LuaNil).unwrap();
```

It happens with any value, not just LuaNil.


kyren commented Jul 23, 2017

It doesn't seem to be failing anymore, but it might be because I have "fixed" #19. I say "fixed" because I'm half convinced nothing will ever be fixed again, and that the Lua API is designed to drive me insane.

@jonas-schievink

@kyren I can still reproduce this


kyren commented Jul 23, 2017

It's not failing on Travis; maybe I was incorrectly assuming Travis is x86_64 Linux? Let me actually boot up a VM and try it.


kyren commented Jul 23, 2017

I can't reproduce this at all; all tests pass on my Linux x86_64 VM on the latest nightly.

I realize that's not super helpful.

@jonas-schievink

Odd. Meanwhile, I've also reproduced this on ARM (also Linux). My exact rustc version, for the record:

```
rustc 1.20.0-nightly (ae98ebfcb 2017-07-20)
binary: rustc
commit-hash: ae98ebfcb9ad5a5384fd229a6ee91315b02ca969
commit-date: 2017-07-20
host: x86_64-unknown-linux-gnu
release: 1.20.0-nightly
LLVM version: 4.0
```

Also reproduces on rustc 1.20.0-nightly (8d22af87d 2017-07-22) (the current one).

Maybe it's a difference between C compilers that's exposing UB? I'm using gcc version 7.1.1 20170630, but I can reproduce with clang 4.0.1, too.


kyren commented Jul 23, 2017

I should be more clear, I said:

> It doesn't seem to be failing anymore

But what I should have said is I actually have not seen this fail at all so far.


kyren commented Jul 23, 2017

I have tried gcc 6.3, but I can try 7.1.1. I've also tried whatever the latest Apple clang is, which I guess is Apple LLVM version 8.1.0 (clang-802.0.42).


kyren commented Jul 23, 2017

Actually, can you just tell me what OS you're using? That might be faster than trying to set up a specific compiler version.

@jonas-schievink

Oh, right. Arch Linux.

I'm currently bisecting, by the way, so no need to start that if you manage to reproduce.

@jonas-schievink

| Result | rustup version | rustc -Vv |
| --- | --- | --- |
| good | nightly-2017-07-14 | rustc 1.20.0-nightly (b2c070787 2017-07-13) |
| bad | nightly-2017-07-15 | rustc 1.20.0-nightly (6d9d82d3d 2017-07-14) |

@jonas-schievink


kyren commented Jul 23, 2017

Nice sleuthing!


kyren commented Jul 23, 2017

So, you possibly won't like this answer, but I have now installed Arch Linux x86_64 in a VM and run the rlua tests, and they all pass. I'm not sure where to go from here :(


@jonas-schievink

Wow, thanks for putting all the effort into this bug!

The only difference is that I'm running a slightly different kernel now; I'll check whether that makes a difference, although it really shouldn't affect anything. Just to make sure: you did run with --release, right?

I'll look into more ways to properly debug/reproduce this, then.


kyren commented Jul 23, 2017

> The only difference is that I'm running a slightly different kernel now, I'll check out and see if that makes a difference although it shouldn't affect anything really. Just to make sure: You did run with --release, right?

AHA, thank you, I was missing that part. NOW I can reproduce it.

I had completely forgotten that's not the default mode of test, because I don't usually run cargo directly; I have a Makefile with a test target, and THAT sets --release. Sorry for all that!

@jonas-schievink

Ah, okay, so I'm not going insane yet. Good :)


kyren commented Jul 23, 2017

Forget everything I said before, it's reproducible without a VM, you just have to ACTUALLY run in release.

PEBKAC


kyren commented Jul 23, 2017

Okay, for real this time, I think this is fixed by 44c99ea.

I'm sure error_guard was subtly wrong somehow, maybe with regard to mutable aliasing? I'm not sure, because I abandoned the approach as too difficult to do correctly.


kyren commented Jul 23, 2017

So, really sorry about not seeing the bug at first, you were absolutely correct that there was a problem.

Now that I have a bit of time to breathe, are we possibly back to the "only" problem left being that errors in __gc metamethods are unsound? Are all the rest of these fun soundness issues solved?

I have no idea how to even approach dealing with errors in __gc metamethods, other than simply writing a LOT of protected versions of ffi calls. I don't think that's really a workable solution; it would be awkward to write as well as massively slow at runtime. SURELY there must be a better way than that?

Is there some way possibly of forcing errors in __gc to simply always panic? That's precisely what would happen when called from the top level of Lua, but in any sort of callback there's always going to be a protected Lua call somewhere up the stack that has called setjmp and will handle it (which is precisely what we don't want).

If I even had an idea of how to approach it, I would feel much better; right now I really don't know how to proceed.

@jonas-schievink

Great, that seems to fix it for me as well!

Apart from what you mentioned, there's also #21, but it's unclear if my assumptions about setjmp/longjmp always being UB are correct.

> Is there some way possibly of forcing errors in __gc to simply always panic?

Since there's no way except pcall to find out that this happens, no.

Well, technically... the m annotation in the Lua reference manual means that a function may raise both out-of-memory errors and errors from __gc metamethods. OOM can be handled by providing our own allocator that simply aborts on OOM, like Rust already does by default; in fact, the default allocator might already do this. As for __gc, it might be possible to disable the GC to prevent those metamethods from being called, but we'd need some way to re-enable it at the right moments so we don't cause a leak.
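(Editor's note: a minimal sketch of the abort-on-OOM allocator idea, using Lua's lua_Alloc-style contract: nsize == 0 frees, otherwise (re)allocate, and Lua passes the old block size as osize. The function name and details are hypothetical, not rlua's actual code.)

```rust
use std::alloc::{self, Layout};
use std::os::raw::c_void;
use std::process;
use std::ptr;

// Hypothetical allocator following the lua_Alloc contract:
// aborts the process on allocation failure instead of returning
// NULL, so Lua never gets a chance to raise a memory error.
unsafe extern "C" fn abort_on_oom_alloc(
    _ud: *mut c_void,
    ptr: *mut c_void,
    osize: usize, // old block size (Lua guarantees this when ptr is non-null)
    nsize: usize, // requested new size; 0 means "free"
) -> *mut c_void {
    if nsize == 0 {
        if !ptr.is_null() {
            let layout = unsafe { Layout::from_size_align_unchecked(osize, 1) };
            unsafe { alloc::dealloc(ptr as *mut u8, layout) };
        }
        return ptr::null_mut();
    }
    let new_ptr = if ptr.is_null() {
        let layout = unsafe { Layout::from_size_align_unchecked(nsize, 1) };
        unsafe { alloc::alloc(layout) }
    } else {
        let layout = unsafe { Layout::from_size_align_unchecked(osize, 1) };
        unsafe { alloc::realloc(ptr as *mut u8, layout, nsize) }
    };
    if new_ptr.is_null() {
        // Out of memory: abort, like Rust's default allocator does,
        // instead of letting Lua longjmp through Rust frames.
        process::abort();
    }
    new_ptr as *mut c_void
}
```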


kyren commented Jul 23, 2017

Yeah, that was actually on my list of things to do: use the default allocator as the Lua allocator and do exactly that, panic on failure to allocate. I should also have mentioned the longjmp-from-Rust problem as one of the remaining issues.

I thought about disabling gc during callbacks, but I had another thought while I was in the shower.

It could be possible to wrap Lua function calls much more cheaply by using thread-local storage to pass the arguments and return values of protected versions of calls. If that works, I could maybe use macros to protected-call-wrap things much more cheaply (in both runtime and programmer time). That, combined with stack_guard, would mean just making a very large number of protected calls in most API entry points.

MAYBE that would be okay?
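(Editor's note: a self-contained sketch of the thread-local trick described above, not rlua's actual code. The point is that a protected entry point like lua_pcall can only invoke a bare function pointer with no arguments, so arguments and results have to travel through TLS; `protected_call` here is a hypothetical stand-in for the C-side protected call.)

```rust
use std::cell::RefCell;

thread_local! {
    // One slot per thread carrying (argument, result) across the
    // protected-call boundary.
    static PCALL_SLOT: RefCell<Option<(i64, i64)>> = RefCell::new(None);
}

// Stand-in for a protected call: like lua_pcall, it only accepts a
// plain fn() with no captures. A real version would run the body
// under setjmp/pcall so Lua errors can't unwind through Rust.
fn protected_call(f: fn()) {
    f();
}

// Wrapped operation: stash the argument in TLS, run the body under
// the protected call, then read the result back out of TLS.
fn protected_add_one(x: i64) -> i64 {
    PCALL_SLOT.with(|slot| *slot.borrow_mut() = Some((x, 0)));
    protected_call(|| {
        PCALL_SLOT.with(|slot| {
            let mut s = slot.borrow_mut();
            let (arg, _) = (*s).unwrap();
            *s = Some((arg, arg + 1)); // body reads arg, writes result via TLS
        });
    });
    PCALL_SLOT.with(|slot| slot.borrow_mut().take().unwrap().1)
}
```

A macro could generate this boilerplate per wrapped ffi call, which is roughly the "cheap in programmer time" part of the idea.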

@jonas-schievink

Possibly. I'm not sure how exactly that would look, though. Anyway, closing this issue since it was fixed!
