Update Capstone to handle endbr64 #877

Closed
wants to merge 1 commit into from

jacob-baines

In testing, I found that retdec doesn't handle endbr64.

To reproduce this issue, I used the standard gcc on Ubuntu 20.04. Here is the version information:

albinolobster@ubuntu:~/retdec$ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/9/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none:hsa
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu 9.3.0-17ubuntu1~20.04' --with-bugurl=file:///usr/share/doc/gcc-9/README.Bugs --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,gm2 --prefix=/usr --with-gcc-major-version-only --program-suffix=-9 --program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-vtable-verify --enable-plugin --enable-default-pie --with-system-zlib --with-target-system-zlib=auto --enable-objc-gc=auto --enable-multiarch --disable-werror --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-offload-targets=nvptx-none=/build/gcc-9-HskZEa/gcc-9-9.3.0/debian/tmp-nvptx/usr,hsa --without-cuda-driver --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
gcc version 9.3.0 (Ubuntu 9.3.0-17ubuntu1~20.04) 

I then wrote the following program and named it test.c:

#include <stdio.h>
#include <stdlib.h>

int main(int p_argc, char* p_argv[])
{
    printf("🦞 hello world! 🦞\n");
    return EXIT_SUCCESS;
}

And compiled it:

albinolobster@ubuntu:~$ gcc -o test test.c
albinolobster@ubuntu:~$ ./test
🦞 hello world! 🦞
albinolobster@ubuntu:~$

Looking at the test in objdump, we can see the very first instruction in main is endbr64:

albinolobster@ubuntu:~$  objdump test -d

0000000000001149 <main>:
    1149:	f3 0f 1e fa          	endbr64 
    114d:	55                   	push   %rbp
    114e:	48 89 e5             	mov    %rsp,%rbp
    1151:	48 83 ec 10          	sub    $0x10,%rsp
    1155:	89 7d fc             	mov    %edi,-0x4(%rbp)
    1158:	48 89 75 f0          	mov    %rsi,-0x10(%rbp)
    115c:	48 8d 3d a1 0e 00 00 	lea    0xea1(%rip),%rdi        # 2004 <_IO_stdin_used+0x4>
    1163:	e8 e8 fe ff ff       	callq  1050 <puts@plt>
    1168:	b8 00 00 00 00       	mov    $0x0,%eax
    116d:	c9                   	leaveq 
    116e:	c3                   	retq   
    116f:	90                   	nop
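
For reference, the four bytes `f3 0f 1e fa` at 0x1149 are the encoding of endbr64. Below is a minimal sketch of checking whether a function's first bytes match that encoding. The helper name `starts_with_endbr64` is hypothetical and is not part of retdec or Capstone; it is a plain byte comparison, not a disassembler.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Encoding of endbr64, as shown at 0x1149 in the objdump listing. */
static const uint8_t ENDBR64_BYTES[4] = {0xf3, 0x0f, 0x1e, 0xfa};

/* Hypothetical helper (an assumption for illustration only):
 * returns true if the buffer begins with the endbr64 encoding. */
bool starts_with_endbr64(const uint8_t *code, size_t len)
{
    return len >= sizeof ENDBR64_BYTES &&
           memcmp(code, ENDBR64_BYTES, sizeof ENDBR64_BYTES) == 0;
}
```

Called on the first bytes of main from the listing above (`f3 0f 1e fa 55 ...`), this would report a match; called on the bytes starting at 0x114d (`55 48 89 e5 ...`), it would not.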

However, after pushing test through retdec-decompiler (compiled from master on 10/30/2020) like so:

albinolobster@ubuntu:~/pub/retdec/build$ retdec-decompiler /home/albinolobster/test

The resulting test.c, test.ll, and test.dsm all contain erroneous (empty) main functions.

test.c

// Address range: 0x1149 - 0x114a
int main(int argc, char ** argv) {
    // 0x1149
    int64_t result; // 0x1149
    return result;
}

test.ll

define i64 @main(i64 %argc, i8** %argv) local_unnamed_addr {
dec_label_pc_1149:
  %0 = alloca i64
  %1 = load i64, i64* %0
  ret i64 %1
}

test.dsm

; function: main at 0x1149 -- 0x114a
; data inside code section at 0x114a -- 0x114c
0x114a:   0f 1e                                              |..              |
; function: function_114c at 0x114c -- 0x116f
0x114c:   fa                        	cli 
0x114d:   55                        	push rbp
0x114e:   48 89 e5                  	mov rbp, rsp
0x1151:   48 83 ec 10               	sub rsp, 0x10
0x1155:   89 7d fc                  	mov dword ptr [rbp - 4], edi
0x1158:   48 89 75 f0               	mov qword ptr [rbp - 0x10], rsi
0x115c:   48 8d 3d a1 0e 00 00      	lea rdi, [rip + 0xea1]
0x1163:   e8 e8 fe ff ff            	call 0x1050 <function_1050>
0x1168:   b8 00 00 00 00            	mov eax, 0
0x116d:   c9                        	leave 
0x116e:   c3                        	ret 

I assumed this was a Capstone issue, and I found that Capstone fixed endbr64 in 4.0.1 and further enhanced its handling in the latest version, 4.0.2.

I also found that retdec downloads an archive from the Capstone repository that dates back to November 2017 (nearly three years old now). By switching the CMake deps to download the most recent release (4.0.2), retdec now produces proper output. Sample output from an updated retdec/capstone:

test.c

// Address range: 0x1149 - 0x116f
int main(int argc, char ** argv) {
    // 0x1149
    __asm_endbr64();
    function_1050();
    return 0;
}

test.ll

define i64 @main(i64 %argc, i8** %argv) local_unnamed_addr {
dec_label_pc_1149:
  %0 = call i64 @__asm_endbr64(), !insn.addr !27
  %1 = call i64 @function_1050(), !insn.addr !28
  ret i64 0, !insn.addr !29
}

test.dsm

; function: main at 0x1149 -- 0x116f
0x1149:   f3 0f 1e fa               	endbr64 
0x114d:   55                        	push rbp
0x114e:   48 89 e5                  	mov rbp, rsp
0x1151:   48 83 ec 10               	sub rsp, 0x10
0x1155:   89 7d fc                  	mov dword ptr [rbp - 4], edi
0x1158:   48 89 75 f0               	mov qword ptr [rbp - 0x10], rsi
0x115c:   48 8d 3d a1 0e 00 00      	lea rdi, [rip + 0xea1]
0x1163:   e8 e8 fe ff ff            	call 0x1050 <function_1050>
0x1168:   b8 00 00 00 00            	mov eax, 0
0x116d:   c9                        	leave 
0x116e:   c3                        	ret 

In #557 @PeterMatula seems to indicate he isn't sure if an updated Capstone is possible, so my change here may have broken things I'm unaware of. That said, it does fix my issue with default gcc output (on Ubuntu at least).

@PeterMatula
Collaborator

Let's run the TC tests.

@PeterMatula
Collaborator

It looks like our Linux TeamCity build cannot connect to GitHub at the moment. linux-build failed, but not because of the change in this PR. I will try again later, or once the problem is fixed.

@h4sh5

h4sh5 commented Nov 15, 2021

@PeterMatula it's been a while, and I have bumped into the same issue. Can you rerun the TC builds and try to merge this PR?

h4sh5 added a commit to h4sh5/retdec that referenced this pull request Nov 15, 2021
@PeterMatula
Collaborator

In #1124 we updated to Capstone 5.0-rc2. ENDBR32 and ENDBR64 are "handled": they are translated to NOP at the moment.
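
To illustrate what treating these instructions as NOP can mean at the byte level, here is a rough sketch. It assumes only the instruction encodings (`f3 0f 1e fa` for ENDBR64, `f3 0f 1e fb` for ENDBR32). The names `endbr_kind`, `classify_endbr`, and `skip_leading_endbr` are hypothetical, and this is not the code used in retdec or Capstone.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical classification used only by this sketch. */
typedef enum { INSN_OTHER, INSN_ENDBR32, INSN_ENDBR64 } endbr_kind;

/* Both instructions share the prefix f3 0f 1e; the last byte selects
 * the variant: fa for ENDBR64, fb for ENDBR32. */
endbr_kind classify_endbr(const uint8_t *code, size_t len)
{
    if (len < 4 || code[0] != 0xf3 || code[1] != 0x0f || code[2] != 0x1e)
        return INSN_OTHER;
    if (code[3] == 0xfa)
        return INSN_ENDBR64;
    if (code[3] == 0xfb)
        return INSN_ENDBR32;
    return INSN_OTHER;
}

/* Treat either ENDBR variant as a 4-byte no-op: return the offset of the
 * first byte after a leading ENDBR, or 0 if the buffer does not start
 * with one. */
size_t skip_leading_endbr(const uint8_t *code, size_t len)
{
    return classify_endbr(code, len) == INSN_OTHER ? 0 : 4;
}
```

With the bytes of main from this PR, `skip_leading_endbr` would return 4, so analysis would continue at `push rbp` (0x114d) as if the ENDBR64 were a no-op.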

@PeterMatula PeterMatula closed this Dec 5, 2022