Removing assertion while reading/appending dwarf #5553

thaystg · 2023-01-09T20:53:32Z

While loading dwarf from this wasm file attached here in dotnet.zip, I got this error:

thread 'main' panicked at 'assertion failed: range_start < range_end', crates\cranelift\src\debug\transform\expression.rs:689:13

I tried to remove the assert and check if without it the debugging experience on lldb works correctly and as far as I saw in my tests, everything is working fine after removing the assert.

dotnet.zip

If you think this is not the correct fix, please let me know, I can close the PR and open an issue.

jameysharp · 2023-01-09T21:27:38Z

This bug has previously been reported as #3999 as well as several duplicates, but we haven't had any suggestions for how to fix it before. Thanks for this PR!

I don't know whether removing this assert is correct. It might be! But I think the effect is that the for i in (i..j).rev() loop below doesn't run in that case because j will also be less than i. I'm suspicious that removing the assert may just be hiding the bug.

As noted in other issues about this bug:

We currently don't have anyone on the project who understands our DWARF-handling code well and has time to work on it; so while this is definitely a bug, it's not likely to have a fast resolution. We do have an intent to have someone eventually focus on this, as priorities allow, so we should keep this issue open. Just wanted to give some context on the current situation...

thaystg · 2023-01-10T15:42:43Z

Minimal repro:

#include <stdio.h>

void
sample_func (int type, int parm)
{
    printf("sample_func\n");
    switch (type){
    case 1:
        if (type + parm!= 50)
            break;
    case 2:
        if (type + parm == 10 || type + parm == 15)
            return;
        break;
    default:
        break;
    }
}

int main() {
    int i = 2;
    sample_func (i, i);
    return 0;
}

Steps to Reproduce

wasi-sdk-14.0/bin/clang -g -O0 test.c -o test.wasm
lldb -- wasmtime run -g test.wasm

thaystg · 2023-01-10T16:15:01Z

This bug has previously been reported as #3999 as well as several duplicates, but we haven't had any suggestions for how to fix it before. Thanks for this PR!

I don't know whether removing this assert is correct. It might be! But I think the effect is that the for i in (i..j).rev() loop below doesn't run in that case because j will also be less than i. I'm suspicious that removing the assert may just be hiding the bug.

As noted in other issues about this bug:

We currently don't have anyone on the project who understands our DWARF-handling code well and has time to work on it; so while this is definitely a bug, it's not likely to have a fast resolution. We do have an intent to have someone eventually focus on this, as priorities allow, so we should keep this issue open. Just wanted to give some context on the current situation...

Can't we just print a warning, remove the assertion and exit the function?
It would be better then assertion probably, because we are still able to debug if we skip this "wrong" label.

jameysharp · 2023-01-10T21:30:44Z

Can't we just print a warning, remove the assertion and exit the function? It would be better then assertion probably, because we are still able to debug if we skip this "wrong" label.

@alexcrichton and @fitzgen, do either of you have opinions on this suggestion?

If you can spare some more time to investigate this bug, I'd be happier if we better understood its cause.

Looking at process_label, I see that it's FunctionFrameInfo::value_ranges which has an invalid range for some label.
As far as I can tell, the values in that map are only set by get_function_frame_info.
That comes from CompiledFunction::value_labels_ranges in the wasmtime-cranelift crate.
That comes from CompiledCodeBase::value_labels_ranges in the cranelift-codegen crate.
That comes from target-specific code implementing TargetIsa::compile_function.
Every backend implements that by calling VCode::emit and getting the EmitResult::value_labels_ranges field.
That's computed by VCode::compute_value_labels_ranges.

At that point there's a somewhat complex interaction between the inst_offsets array constructed in cranelift-codegen and the debug_locations array constructed by regalloc2.

I think a good first troubleshooting step would be to assert that from_offset < to_offset before pushing a new ValueLocRange in VCode::compute_value_labels_ranges. Hopefully that assert fails instead of the assert in process_label; otherwise we'd need to investigate more to understand where these ranges are coming from.

Then we could check why these ranges are out of order. I think the only possibilities are either that regalloc.debug_locations has pairs that are out of order, or inst_offsets is out of order. If it's the former then we can switch to investigating regalloc2; otherwise we can look at VCode::emit.

Either way we'd know more than we know now and be closer to fixing this particular bug.

fitzgen · 2023-01-10T22:20:33Z

Can't we just print a warning, remove the assertion and exit the function? It would be better then assertion probably, because we are still able to debug if we skip this "wrong" label.

@alexcrichton and @fitzgen, do either of you have opinions on this suggestion?

I am generally very suspicious of sweeping-the-bugs-under-the-rug kind of thing. If it was a bug that was purely within the DWARF transform, then I would be more amenable to it since that code as-is has no future and needs a rewrite, but since it looks like this is bottoming out in Cranelift, I think it would be better to debug the Cranelift issue like you laid out.

alexcrichton · 2023-01-11T14:51:30Z

I personally know so little about the DWARF transformations Wasmtime does that I don't think I'd have much to offer over what @jameysharp has already mentioned.

pavelsavara · 2023-01-11T15:19:03Z

The dwarf parser in chrome doesn't fail on it.

Speculations (I know little about all this yet):
Maybe if we add assert earlier in the wasmtime pipeline, to validate input dwarf, we could learn that the flipped start/end is in the input by LLVM ?
And maybe it's how it should be, for break, which is a jmp to label, but in wasm it's some nested block for some reason ?

jameysharp · 2023-01-11T16:47:26Z

The dwarf parser in chrome doesn't fail on it.

Speculations (I know little about all this yet): Maybe if we add assert earlier in the wasmtime pipeline, to validate input dwarf, we could learn that the flipped start/end is in the input by LLVM ? And maybe it's how it should be, for break, which is a jmp to label, but in wasm it's some nested block for some reason ?

That sounds possible to me but, like you, I know little about all of this.

I think it's unlikely, if I'm reading the Wasmtime/Cranelift source correctly. I think the point of the process_label function is to combine Cranelift's data with the input DWARF from LLVM, but I think this assert is only looking at data that was generated by Cranelift and regalloc2, not from the input DWARF. So I think it's most likely that this is detecting a bug in Cranelift or regalloc2.

brendandburns · 2023-01-12T18:53:12Z

fwiw, when I kicked on this a little bit, llvm-dwarfdump is perfectly happy to read out the dwarf data, so at some level, the data in the wasm file is "legal" as far as llvm is concerned, but I will also mention that when I tried exactly this fix, it just caused a panic in a different place (for my WASM file at least)

squillace · 2023-01-28T17:57:05Z

I've asked @thaystg to have a look at this again if she has time, @jameysharp, per our conversation the other day about debugging with the crew.

william-stacken · 2023-03-23T09:33:16Z

fwiw, when I kicked on this a little bit, llvm-dwarfdump is perfectly happy to read out the dwarf data, so at some level, the data in the wasm file is "legal" as far as llvm is concerned, but I will also mention that when I tried exactly this fix, it just caused a panic in a different place (for my WASM file at least)

Same here, removing the assertion did not fix the problem, but changing the if statement above to continue if range_start is greater than range_end (not just when it is equal) did fix it and I am able to finally debug my Wasm module. Given that this bug is severe and that developers have complained about it for a long time, and that no one knows the cause making this bug difficult to fix for a long time, I think that it should be "swept-under-the-rug" ASAP. Then the root cause for the bug can be fixed later. At least this is better than debugging Wasm modules not being possible.

bjorn3 · 2023-03-23T09:42:31Z

but changing the if statement above to continue if range_start is greater than range_end (not just when it is equal) did fix it and I am able to finally debug my Wasm module.

I'm pretty sure that can make it impossible to single step at several places and will cause incorrect assignments from machine instructions to source lines.

william-stacken · 2023-03-28T07:45:23Z

but changing the if statement above to continue if range_start is greater than range_end (not just when it is equal) did fix it and I am able to finally debug my Wasm module.

I'm pretty sure that can make it impossible to single step at several places and will cause incorrect assignments from machine instructions to source lines.

I was able to both single step and set breakpoints, and I didn't observe any strange instruction to source line assignments. Now I know absolutely nothing about dwarf, and I'm sure you're right that there will be bugs. But from my perspective an unstable debugger is better than no debugger at all. At least for the time being.

alexcrichton · 2023-10-25T20:45:52Z

In the time since this PR was opened I think #6931 addresed the underlying issue. I think this is no longer necessary so I'm going to close this, but please let me know if I'm wrong.

Removing assert

f115ab4

thaystg changed the title ~~Removing assert while reading/appending dwarf~~ Removing assertion while reading/appending dwarf Jan 9, 2023

fitzgen mentioned this pull request Apr 10, 2023

Removes debug assertion that was related to issue 796 #6175

Merged

dbanks12 mentioned this pull request Apr 24, 2023

Assertion in debugger mode (left < right) #3999

Closed

alexcrichton closed this Oct 25, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Removing assertion while reading/appending dwarf #5553

Removing assertion while reading/appending dwarf #5553

thaystg commented Jan 9, 2023 •

edited

jameysharp commented Jan 9, 2023

thaystg commented Jan 10, 2023 •

edited

thaystg commented Jan 10, 2023

jameysharp commented Jan 10, 2023

fitzgen commented Jan 10, 2023

alexcrichton commented Jan 11, 2023

pavelsavara commented Jan 11, 2023

jameysharp commented Jan 11, 2023

brendandburns commented Jan 12, 2023

squillace commented Jan 28, 2023

william-stacken commented Mar 23, 2023 •

edited

bjorn3 commented Mar 23, 2023

william-stacken commented Mar 28, 2023

alexcrichton commented Oct 25, 2023

Removing assertion while reading/appending dwarf #5553

Removing assertion while reading/appending dwarf #5553

Conversation

thaystg commented Jan 9, 2023 • edited

jameysharp commented Jan 9, 2023

thaystg commented Jan 10, 2023 • edited

thaystg commented Jan 10, 2023

jameysharp commented Jan 10, 2023

fitzgen commented Jan 10, 2023

alexcrichton commented Jan 11, 2023

pavelsavara commented Jan 11, 2023

jameysharp commented Jan 11, 2023

brendandburns commented Jan 12, 2023

squillace commented Jan 28, 2023

william-stacken commented Mar 23, 2023 • edited

bjorn3 commented Mar 23, 2023

william-stacken commented Mar 28, 2023

alexcrichton commented Oct 25, 2023

thaystg commented Jan 9, 2023 •

edited

thaystg commented Jan 10, 2023 •

edited

william-stacken commented Mar 23, 2023 •

edited