LLVM unreachable instruction #11

Closed
hudson-ayers opened this issue Sep 25, 2020 · 7 comments

Comments

@hudson-ayers
Contributor

While symbolically executing a function, Haybale threw the following error:

'UnreachableInstruction': Reached an LLVM 'Unreachable' instruction

Should I interpret this to mean the code under analysis is somehow invalid?

@cdisselkoen
Collaborator

This means that LLVM doesn't think the instruction should be reachable, based on LLVM's own semantics, but Haybale was able to reach it. That's unexpected, as Haybale is more or less intended to provide exactly LLVM's semantics. I think it's safe to assume that any LLVM code generated by a production compiler (clang, rustc, etc.) is valid, so the problem is somewhere in Haybale or maybe in whatever code you have sitting on top of Haybale. Off the top of my head, here are a few possible causes:

  • A bug in Haybale
  • There are some known limitations in how Haybale handles LLVM Invoke/Resume that may cause Haybale to explore paths which LLVM thinks aren't possible. If this is the case in your example, then the path with this error should just be ignored.
  • A call to an external function that never returns (e.g., a function related to panic handling in whatever system/language you're analyzing), where the hook in your Config returned something other than ReturnValue::Abort.
  • Maybe LLVM has a function parameter marked nonnull, and therefore thinks some code is unreachable because the parameter would have to be null in order to get there. Haybale doesn't currently pay attention to the nonnull attribute.
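To illustrate the third cause, here is a minimal sketch of the idea only. It does not use Haybale's real types: `HookOutcome`, `hook_for`, and the `NEVER_RETURNS` list are hypothetical stand-ins, and a real hook would return Haybale's own ReturnValue::Abort for such functions.

```rust
/// Stand-in for a hook's result. NOT Haybale's `ReturnValue`; illustration only.
#[derive(Debug, PartialEq)]
enum HookOutcome {
    /// The callee returns and execution continues after the call.
    Returns,
    /// The callee never returns, so the current path ends here.
    Abort,
}

/// Hypothetical hook dispatcher: decide the outcome from the callee's name.
fn hook_for(callee: &str) -> HookOutcome {
    // Assumed list of never-returning symbols for the code under analysis.
    const NEVER_RETURNS: [&str; 2] = ["rust_begin_unwind", "abort"];
    if NEVER_RETURNS.contains(&callee) {
        HookOutcome::Abort
    } else {
        HookOutcome::Returns
    }
}

fn main() {
    for name in ["abort", "memcpy"] {
        println!("{} -> {:?}", name, hook_for(name));
    }
}
```

If a hook reported `Returns` for a function like these, execution would continue past the call and could run into the `unreachable` that the compiler placed after it.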

@hudson-ayers
Contributor Author

Thanks for your response! I will look into this some more and see if I can figure out which of these causes is responsible.

@hudson-ayers
Contributor Author

I dug into one of the examples where I was hitting this issue. Here is the function being executed (disable()):

#[derive(Copy, Clone, Debug)]
pub enum Clock {
    HSB(HSBClock),
    PBA(PBAClock),
    PBB(PBBClock),
    PBC(PBCClock),
    PBD(PBDClock),
}

impl ClockInterface for Clock {
    fn disable(&self) {
        match self {
            &Clock::HSB(v) => mask_clock!(HSB_MASK_OFFSET: hsbmask & !(1 << (v as u32))),
            &Clock::PBA(v) => mask_clock!(PBA_MASK_OFFSET: pbamask & !(1 << (v as u32))),
            &Clock::PBB(v) => mask_clock!(PBB_MASK_OFFSET: pbbmask & !(1 << (v as u32))),
            &Clock::PBC(v) => mask_clock!(PBC_MASK_OFFSET: pbcmask & !(1 << (v as u32))),
            &Clock::PBD(v) => mask_clock!(PBD_MASK_OFFSET: pbdmask & !(1 << (v as u32))),
        }
    }
}

And here is the LLVM IR generated for that function:

; Function Attrs: minsize nofree norecurse nounwind optsize
define internal fastcc void @"_ZN75_$LT$sam4l..pm..Clock$u20$as$u20$kernel..platform..chip..ClockInterface$GT$7disable17h5694fc505bd2f03dE"({ i8, i8 }* noalias nocapture noundef readonly align 1 dereferenceable(2) %0) unnamed_addr #8 !dbg !77370 {
  call void @llvm.dbg.value(metadata { i8, i8 }* %0, metadata !77372, metadata !DIExpression()), !dbg !77393
  %2 = getelementptr inbounds { i8, i8 }, { i8, i8 }* %0, i32 0, i32 0, !dbg !77394
  %3 = load i8, i8* %2, align 1, !dbg !77394, !range !77395
  %4 = zext i8 %3 to i32, !dbg !77394
  switch i32 %4, label %5 [
    i32 0, label %6
    i32 1, label %12
    i32 2, label %18
    i32 3, label %24
    i32 4, label %30
  ], !dbg !77396

5:                                                ; preds = %1
  unreachable, !dbg !77394

6:                                                ; preds = %1
  %7 = getelementptr inbounds { i8, i8 }, { i8, i8 }* %0, i32 0, i32 1, !dbg !77397
  %8 = load i8, i8* %7, align 1, !dbg !77397, !range !77398
...

Haybale is reaching the "unreachable" in basic block 5. As you can see, LLVM seems to be using basic block 5 as the default label for the switch statement, since it should be impossible for the input integer (%4) to be anything other than 0-4 based on the definition of the enum. However, Haybale is apparently unaware of this constraint on the input integer, and thus considers bb %5 a reachable path. Notably, I have not tried executing Haybale on just this function; I am reaching it as part of a larger execution (not sure if that matters). Any thoughts on why this might be happening?
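As a rough model of the situation (not Haybale code, and the function names are made up for illustration), the switch can be treated as a function of an 8-bit tag. An executor that considers every possible 8-bit value finds many that fall through to the default block, while one that knows the tag is in 0-4 finds none:

```rust
/// Models the `switch` in the IR above: tags 0..=4 branch to a variant's
/// block, and any other value falls through to the default block (bb %5).
fn reaches_default(tag: u8) -> bool {
    !matches!(tag, 0..=4)
}

/// Count how many of the candidate tag values end up in the default block.
fn default_hits(tags: impl Iterator<Item = u8>) -> usize {
    tags.filter(|&t| reaches_default(t)).count()
}

fn main() {
    // No constraint on the tag: every 8-bit value is a candidate.
    println!("unconstrained: {}", default_hits(0..=u8::MAX));
    // Tag constrained to the five enum variants.
    println!("constrained:   {}", default_hits(0..5u8));
}
```

With the full 8-bit range, 251 of the 256 values reach the default block; with the tag restricted to 0-4, none do.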

@hudson-ayers
Contributor Author

I tried executing just this method directly and got the same result.

@cdisselkoen
Collaborator

There is nothing in the LLVM IR (other than the unreachable itself) that communicates the restriction that %4 must be in the range 0-4. So I don't see how Haybale could know this. As Haybale is designed to follow LLVM IR semantics, Haybale is correct in reporting that bb %5 is reachable.

This seems to be a compelling example to motivate squashing the UnreachableInstruction errors in your code. We could add a setting to Haybale to have it squash them itself, but that would raise the question of whether we should have a similar setting for all the other error types, or perhaps a user-defined lambda that takes an error and returns a bool indicating whether to squash it. That quickly becomes a slippery slope.

It seems much simpler to me to leave this outside of the scope of Haybale. Haybale iterates over all the paths in the LLVM IR, which includes this one; and it's up to Haybale's caller to decide what to do with each path. Callers are free to do anything they want with paths that end in errors, based on the particular error type or any other information they might know. In your case, I might recommend that your calling code just ignore paths that resulted in UnreachableInstruction because they are impossible (assuming that this example generalizes).
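A minimal sketch of what that caller-side filtering could look like. The types here are local stand-ins so the example is self-contained: `PathError`, `PathResult`, and `ignore_unreachable` are hypothetical names, not Haybale's API, and real calling code would match on Haybale's own error type instead.

```rust
/// Stand-in for the executor's error type. NOT Haybale's definition.
#[derive(Debug, Clone, PartialEq)]
enum PathError {
    /// Mirrors the error discussed in this issue.
    UnreachableInstruction,
    /// Any other failure, carrying a message.
    Other(String),
}

/// Outcome of one explored path: a return value, or the error that ended it.
type PathResult = Result<u64, PathError>;

/// Drop paths that ended at an LLVM `unreachable`; keep everything else,
/// including other errors, so they are still visible to the caller.
fn ignore_unreachable<I>(paths: I) -> Vec<PathResult>
where
    I: IntoIterator<Item = PathResult>,
{
    paths
        .into_iter()
        .filter(|p| !matches!(p, Err(PathError::UnreachableInstruction)))
        .collect()
}

fn main() {
    let paths = vec![
        Ok(0),
        Err(PathError::UnreachableInstruction),
        Err(PathError::Other("solver timeout".to_string())),
        Ok(7),
    ];
    for p in ignore_unreachable(paths) {
        println!("{:?}", p);
    }
}
```

Filtering on the specific error variant, rather than dropping every error, keeps genuine failures in view while discarding only the paths assumed to be impossible.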

@hudson-ayers
Contributor Author

Thanks, this makes a lot of sense. I will try to take a look at a couple more examples to confirm this generalizes, then go forward with ignoring those paths.

@hudson-ayers
Contributor Author

Ignoring these paths has been sufficient for my purposes. Thanks for the guidance.
