Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

citybound release crashes with invalid instruction (ud2) some times on mac, nightly-2017-12-12 (and 12-15) #249

Closed
bogdad opened this Issue Dec 23, 2017 · 8 comments

Comments

Projects
None yet
3 participants
@bogdad
Copy link
Contributor

bogdad commented Dec 23, 2017

Hi!

Stumbled into kind of rust compiler believes citybound code is undefined behaviour for some reason. Hope there is something wrong with my setup, otherwise it would be a hard but interesting thing to track down )

so the bug:
creating some roads, then spawning cars on mac soon results in:

transfer lane not connected for obstacles yet
[1]    42607 illegal hardware instruction  target/release/citybound

running under release lldb:

~/citybound master > lldb -- target/release/citybound
(lldb) run
All mapped
2017-12-16 12:42:17.346005+0100 citybound[42500:3746908] Month 13 is out of bounds
2017-12-16 12:42:17.405368+0100 citybound[42500:3746908] MessageTracer: load_domain_whitelist_search_tree:73: Search tree file's format version number (0) is not supported
2017-12-16 12:42:17.405397+0100 citybound[42500:3746908] MessageTracer: Falling back to default whitelist
2017-12-16 12:42:17.774650+0100 citybound[42500:3746908] Month 13 is out of bounds
Process 42500 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_INSTRUCTION (code=EXC_I386_INVOP, subcode=0x0)
    frame #0: 0x00000001001482a5 citybound`kay::actor_system::ActorSystem::add_spawner::_$u7b$$u7b$closure$u7d$$u7d$::h0ad618cd037e08ef + 37
citybound`kay::actor_system::ActorSystem::add_spawner::_$u7b$$u7b$closure$u7d$$u7d$::h0ad618cd037e08ef:
->  0x1001482a5 <+37>: ud2
    0x1001482a7 <+39>: ud2
    0x1001482a9 <+41>: nopl   (%rax)

the previous instruction being

    0x1001474c0 <+32>: callq  0x10019a980               ; citybound::core::simulation::kay_auto::SimulationID::wake_up_in::h30045977a0c501c4

I think rust compiler believes that everything after the call to wake_up_in would result in undefined bahaviour.

compiling with asm and mir output to get the source of the closure:

RUSTFLAGS=" --emit asm --emit=llvm-ir" cargo build --release

then grepping for the closure:

cd target/release
rg 'h0ad618cd037e08ef'
>
deps/citybound-47bb228c87096c93.citybound3-504d2d731c62cd7d7aeab2b09e40774b.rs.rcgu.s
88340:__ZN3kay12actor_system11ActorSystem11add_spawner28_$u7b$$u7b$closure$u7d$$u7d$17h0ad618cd037e08efE:
// reveals the asm location
less +88340 deps/citybound-47bb228c87096c93.citybound3-504d2d731c62cd7d7aeab2b09e40774b.rs.rcgu.s

which then has the reference to the file:

.file   76 "/Users/.../citybound/game/economy/households/family/mod.rs"

move_into method

in the move_into method there is a offers vec

 vec![
                Offer::new(
                    MemberIdx(0),
                    TimeOfDayRange::new(16, 0, 11, 0),
                    Deal::new(Some((Awakeness, 3.0)), Duration::from_hours(1)),
                    1,
                    true
                ),
            ].into(),

which gets converted to CVec.

altering the source to have this vec![] empty makes rust and llvm remove the undefined behaviour (ud2 instruction immedately after the call to wake_up_in) and generating the code after wake_up_in.

Unfortunately, this is all that I had time to dig, hope that helps!
I'll try to continue in the meantime.

@o01eg

This comment has been minimized.

Copy link
Contributor

o01eg commented Dec 24, 2017

I suppose illegal instruction is a Rust issue.

@bogdad

This comment has been minimized.

Copy link
Contributor Author

bogdad commented Dec 24, 2017

sure, the cool thing to do would be to construct a minimal failing example that generates ud2. Unfortunately I was not able to do this yet.

However it might be a good idea to merge the temporary fix pr which removes ud2 by altering citybound code until the rust issue is found and fixed.

@bogdad

This comment has been minimized.

Copy link
Contributor Author

bogdad commented Jan 7, 2018

Seems rust issue was fixed:
Rust does not generate ud2 on the nightly-2018-01-06-x86_64-apple-darwin!

After removing clippy from default features in cargo and compiling with nightly-2018-01-06-x86_64-apple-darwin, citybound just crashes with

WHAT HAPPENED:
called 'Option::unwrap()' on a 'None' value

WHERE IT HAPPENED:
at src/libcore/option.rs, line 335

WHERE EXACTLY:
stack backtrace:
   0:        0x1003be61e - backtrace::backtrace::trace::hcb77d5450f717954
                        at /Users/vladimirshakhov/.cargo/registry/src/github.com-1ecc6299db9ec823/backtrace-0.3.4/src/backtrace/mod.rs:42
   1:        0x1003be7cc - backtrace::capture::Backtrace::new::hf5d05e981305cc2c
                        at /Users/vladimirshakhov/.cargo/registry/src/github.com-1ecc6299db9ec823/backtrace-0.3.4/src/capture.rs:63
   2:        0x1000c8e91 - citybound::core::init::set_error_hook::_$u7b$$u7b$closure$u7d$$u7d$::h854c9fa81f7078b3
                        at game/core/init.rs:57
   3:        0x1003c8890 - std::panicking::rust_panic_with_hook::hae4c93fc49c58357
                        at src/libstd/panicking.rs:578
   4:        0x1003c86ae - std::panicking::begin_panic::h4476a0c2c8dc1921
                        at src/libstd/panicking.rs:538
   5:        0x1003c8603 - std::panicking::begin_panic_fmt::h6d8e3f0a8241814b
                        at src/libstd/panicking.rs:522
   6:        0x1003c8572 - rust_begin_unwind
                        at src/libstd/panicking.rs:498
   7:        0x1004197e3 - core::panicking::panic_fmt::h373df78cbc560455
                        at src/libcore/panicking.rs:71
   8:        0x1004196d6 - core::panicking::panic::h179f5d6549a61c91
                        at src/libcore/panicking.rs:51
   9:        0x10011f5b6 - citybound::economy::households::HouseholdCore::new::h90d0d26201781644
                        at game/economy/market/mod.rs:26
  10:        0x10018821b - kay::actor_system::ActorSystem::add_spawner::_$u7b$$u7b$closure$u7d$$u7d$::hb7a4734717c65dc4
                        at game/economy/households/family/mod.rs:35
  11:        0x1003b7540 - kay::actor_system::ActorSystem::single_message_cycle::h0d3e86a8c3b97cad
                        at engine/kay/src/actor_system.rs:281
  12:        0x1003b0ad7 - std::panicking::try::do_call::hd7c5e5365b368fa6
                        at /Users/travis/build/rust-lang/rust/src/libstd/panicking.rs:480
  13:        0x1003ea9ce - __rust_maybe_catch_panic
                        at src/libpanic_unwind/lib.rs:101
  14:        0x1003b77c0 - kay::actor_system::ActorSystem::process_all_messages::h9d4d0470de3dd372
                        at engine/kay/src/actor_system.rs:306
  15:        0x1000c81c6 - citybound::core::init::ensure_crossplatform_proper_thread::hda627220394b254c
                        at game/main.rs:136
  16:        0x1001d0725 - std::rt::lang_start::_$u7b$$u7b$closure$u7d$$u7d$::hae3e40befb80ca18
                        at /Users/travis/build/rust-lang/rust/src/libstd/rt.rs:74
  17:        0x1003c84d7 - std::panicking::try::do_call::ha29de4e7d455f8f9
                        at src/libstd/panicking.rs:480
  18:        0x1003ea9ce - __rust_maybe_catch_panic
                        at src/libpanic_unwind/lib.rs:101
  19:        0x1003ceffc - std::rt::lang_start_internal::h2d5329fa569bad0f
                        at src/libstd/rt.rs:58
  20:        0x10027256b - main

and there is no ud2 in the asm!
Still think that that pr solves it
#250 don't know why yet.

@Paul-E

This comment has been minimized.

Copy link
Contributor

Paul-E commented Feb 8, 2018

This may not be limited to mac. On Ubuntu 17.10 I get:

* thread #1, name = 'citybound', stop reason = signal SIGILL: illegal instruction operand
frame #0: citybound`kay::actor_system::{{impl}}::add_spawner::{{closure}}  <citybound::economy::households::vegetable_farm::VegetableFarm,citybound::economy::households::vegetable_farm::kay_auto::MSG_VegetableFarm_move_into,closure> [inlined] citybound::economy::households::vegetable_farm::{{impl}}::move_into(id=<unavailable>, simulation=SimulationID @ 0x00007ffffff9ac98, world=<unavailable>) at iterator.rs:0

This occurs when I attempt to spawn cars after creating a small grid.

@bogdad

This comment has been minimized.

Copy link
Contributor Author

bogdad commented Feb 19, 2018

was able to build a smaller playground example

when run in release it fails with illegal instruction
https://play.rust-lang.org/?gist=9a0be5d961fb7c00ae79689b8382a50f&version=nightly&mode=release

when run in debug it produces correct output
https://play.rust-lang.org/?gist=9a0be5d961fb7c00ae79689b8382a50f&version=nightly&mode=debug

if println on line 346 is uncommented, it works in release also.

@o01eg

This comment has been minimized.

Copy link
Contributor

o01eg commented Feb 20, 2018

I suppose it should be reported as rust issue.

@Paul-E

This comment has been minimized.

Copy link
Contributor

Paul-E commented Feb 21, 2018

I was able to create a smaller failing example using the example @bogdad provided:

https://play.rust-lang.org/?gist=8ac595bd97b3cebd3c7df7beb9acfd14&version=nightly&mode=release

It looks like even running test.deref().len(); will cause the illegal instruction error.

Looking at the documentation for std::slice::from_raw_parts, which is used in CompactVec::deref, we see the text:

p must be non-null, even for zero-length slices, because non-zero bits are required to distinguish between a zero-length slice within Some() from None. p can be a bogus non-dereferencable pointer, such as 0x1, for zero-length slices, though.

But we have the (simplified here) code from PointerToMaybeCompact::ptr

pub unsafe fn ptr(&self) -> *const T {
    match self.inner {
        Inner::Free(ptr) => ptr,
        Inner::Uninitialized => ::std::ptr::null(),
    }
}

changing deref to

fn deref(&self) -> &[T] {
    if self.len == 0 {
        unsafe { ::std::slice::from_raw_parts(0x1 as *const T, self.len) }
    } else {
        unsafe { ::std::slice::from_raw_parts(self.ptr.ptr(), self.len) }
    }
}

Gives the same behavior in release and debug.

So I think that is our culprit.

@bogdad

This comment has been minimized.

Copy link
Contributor Author

bogdad commented Feb 21, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.