add cmd/casm-inspect disasm utility #183

Merged
merged 2 commits into from
Feb 23, 2024

quasilyte
Contributor

@quasilyte quasilyte commented Jan 17, 2024

This is a tool I made while studying the casm bytecode format. Since getting the thoth disassembler to work requires some extra steps, I figured it would be handy to have a version that relies on our assembler package and supports the exact versions of the input files that are relevant to this project.

I tried to produce a correct Cairo0 program when disassembling. If done carefully (and with metadata provided by the compiled JSON file), we can use it to test the assembler in an encode-decode style (basically, we can use the disassembler output as the assembler parser input).
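The encode-decode idea can be phrased as a round-trip check: disassemble, reassemble, compare. A minimal sketch, assuming bytecode is modeled as the list of hex strings from a compiled file's `data` array; `Disassembler`, `Assembler`, `checkRoundTrip`, and the toy stubs are hypothetical stand-ins, not the APIs of this project's packages.

```go
package main

import (
	"fmt"
	"slices"
	"strings"
)

// Hypothetical signatures standing in for the real disassembler and assembler.
type Disassembler func(data []string) (string, error)
type Assembler func(source string) ([]string, error)

// checkRoundTrip disassembles data, feeds the text back into the assembler,
// and fails if the re-encoded words differ from the input.
func checkRoundTrip(data []string, disasm Disassembler, asm Assembler) error {
	text, err := disasm(data)
	if err != nil {
		return fmt.Errorf("disassemble: %w", err)
	}
	got, err := asm(text)
	if err != nil {
		return fmt.Errorf("assemble: %w", err)
	}
	if !slices.Equal(data, got) {
		return fmt.Errorf("round-trip mismatch: want %v, got %v", data, got)
	}
	return nil
}

// Toy stand-ins so the sketch runs: "disassembly" is one word per line.
func joinLines(data []string) (string, error) { return strings.Join(data, "\n"), nil }
func splitLines(src string) ([]string, error) { return strings.Split(src, "\n"), nil }

func main() {
	data := []string{"0x1", "0x2"}
	fmt.Println(checkRoundTrip(data, joinLines, splitLines))
}
```

With the real packages plugged in, a mismatch points at either a disassembler bug or an assembler parsing bug.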

Given this cairo0 source file:

```cairo
%builtins output

from starkware.cairo.common.serialize import serialize_word

func div2(x: felt) -> felt {
    return x / 2;
}

func main{output_ptr: felt*}() {
    alloc_locals;
    local x = 42;
    local y = x + 1;
    local z = div2(x);
    if (y == 0) {
        serialize_word(z);
    } else {
        serialize_word(y);
    }
    ret;
}
```

Given that source and the casm bytecode compiled from it (output.json), we can disassemble the bytecode into the following:

```casm
// func entry pc=0
// [fp-3] => word: felt
// [fp-4] => output_ptr: felt* (implicit arg)
func starkware.cairo.common.serialize.serialize_word{output_ptr: felt*}(word: felt) {
    assert [fp-3] = [[fp-4]];
    assert [ap] = [fp-4] + 1, ap++;
    ret;
}
// func entry pc=4
// [fp-3] => x: felt
func div2(x: felt) -> felt {
    assert [ap] = [fp-3] * 1809251394333065606848661391547535052811553607665798349986546028067936010241, ap++; // div 2
    ret;
}
// func entry pc=7
// [fp-3] => output_ptr: felt* (implicit arg)
func main{output_ptr: felt*}() {
    nop; // alloc_locals; ap += 3
    assert [fp] = 42;
    assert [fp+1] = [fp] + 1;
    assert [ap] = [fp], ap++;
    call rel -10; // func div2; ap += 2
    assert [fp+2] = [ap-1];
    jmp rel 8 if [fp+1] != 0; // targets L1
    assert [ap] = [fp-3], ap++;
    assert [ap] = [fp+2], ap++;
    call rel -21; // func starkware.cairo.common.serialize.serialize_word; ap += 2
    jmp rel 6; // targets L3
  L1:
    assert [ap] = [fp-3], ap++;
    assert [ap] = [fp+1], ap++;
    call rel -27; // func starkware.cairo.common.serialize.serialize_word; ap += 2
  L3:
    ret;
}
```

This disassembler annotates some lines with recognized patterns, such as division operations (note the comment to the right of the big-number multiplication instruction). It does not include any hint-related information (yet?).

@quasilyte quasilyte force-pushed the quasilyte_casm_disasm branch 2 times, most recently from a486bc3 to 83beaaa on January 17, 2024 06:52
@quasilyte quasilyte requested a review from cicr99 January 17, 2024 07:00
@quasilyte
Contributor Author

The Cairo1 compiler no longer provides the same amount of debug info (e.g. identifiers) in its casm output.
That makes a human-readable disassembly like the one above impossible.
Without real function names, the best we can do is assign auto-generated names like func1, func2 to every function location.

@quasilyte
Contributor Author

This is how the thoth utility handles casm files produced by the new Cairo compiler:

```
// Function 0
func unknown_function{}()

offset 0:          JNZ                 7                   # JMP 7
offset 2:          ASSERT_EQ           [AP], [FP-6] + 0x100000000000000000000000000000000
offset 2:          ADD                 AP, 1
offset 4:          ASSERT_EQ           [AP-1], [[FP-8]]
offset 5:          JUMP_REL            140                 # JMP 145
offset 7:          ASSERT_EQ           [FP-6], [AP] + 0
offset 7:          ADD                 AP, 1
```

Note the `unknown_function` name, which is used for the entire file.
None of the cairo1 test files provided by thoth have any identifier info either:

```
thoth local token_bridge.casm.json -b | grep CALL
offset 34:         CALL                rel (3180)
offset 57:         CALL                rel (1053)
offset 84:         CALL                rel (1058)
offset 149:        CALL                rel (3180)
```

@rodrigo-pino
Contributor

rodrigo-pino commented Jan 22, 2024

It looks really good, and I think the community can benefit from it as well. The disassembler currently outputs Cairo Zero; what do you think of outputting CASM instead? That would mean a much simpler output where functions don't exist, but their information could still be kept in the form of comments.

Also notice that `assert` does not exist in CASM: `[ap] = [fp + 1]` and `assert [ap] = [fp + 1]` mean the same thing in this context.

This would give us a bidirectional pipeline between the assembler and the disassembler, since our assembler currently handles CASM and nothing more complicated than that.

@quasilyte quasilyte changed the title add cmd/casm-inspect disasm utility WIP: add cmd/casm-inspect disasm utility Jan 24, 2024
@quasilyte quasilyte force-pushed the quasilyte_casm_disasm branch 5 times, most recently from 1e9cff3 to 285c2dd on January 29, 2024 12:26
@quasilyte quasilyte changed the title WIP: add cmd/casm-inspect disasm utility add cmd/casm-inspect disasm utility Jan 30, 2024
Review thread on pkg/disasm/casm.go (outdated, resolved)
Contributor

@cicr99 cicr99 left a comment

Looks really good!
Should we add steps to the Makefile for building the binary that provides these commands? A README section could also be useful for other users.

Review thread on cmd/casm-inspect/disasm.go (resolved)
Review thread on cmd/casm-inspect/inst_fields.go (resolved)
@quasilyte
Contributor Author

PTAL

@cicr99 cicr99 merged commit 945b885 into main Feb 23, 2024
4 checks passed
@cicr99 cicr99 deleted the quasilyte_casm_disasm branch February 23, 2024 10:50