bin2llvmir produces incorrect LLVM IR when decompiling a simple function in an object file #480

Closed
sdasgup3 opened this issue Jan 31, 2019 · 5 comments

@sdasgup3

sdasgup3 commented Jan 31, 2019

Hello Team,
I am trying to decompile a simple function:

void reverseArray(int arr[], int start, int end) 
{ 
    while (start < end) 
    { 
        int temp = arr[start];  
        arr[start] = arr[end]; 
        arr[end] = temp; 
        start++; 
        end--; 
    }  
} 

compiled with

gcc -m32 -g -O0 ~/Junk/XXX/test_1.c -c -o ~/Junk/XXX/test_1.o

using

sudo docker run --rm -v ${HOME}/Junk/XXX/:${HOME}/Junk/XXX/destination  retdec retdec-decompiler.py ${HOME}/Junk/XXX/destination/test_1.o

The output I got is:

source_filename = "test"
target datalayout = "e-p:32:32-f64:32:64-f80:32-n8:16:32-S128"

define i32 @reverseArray() local_unnamed_addr {
dec_label_pc_0:
  %v0_6 = call i32 @__x86.get_pc_thunk.ax()
  ret i32 %v0_6
}

define i32 @__x86.get_pc_thunk.ax() local_unnamed_addr {
dec_label_pc_6f:
  %tmp = call i32 @__decompiler_undefined_function_0()
  ret i32 %tmp
}

declare i32 @__decompiler_undefined_function_0() local_unnamed_addr

I am sure I must be missing something. Can somebody help?
My goal is to play around with the variable recovery feature (for example, stack variables getting lifted to allocas).

s3rvac changed the title from "Need help/guidance in running the tool" to "bin2llvmir produces incorrect LLVM IR when decompiling a simple function in an object file" on Jan 31, 2019
@s3rvac
Member

s3rvac commented Jan 31, 2019

I was able to reproduce the issue on my system and it seems to be a bug in bin2llvmir. @PeterMatula, can you please verify?

@sdasgup3
Author

sdasgup3 commented Mar 17, 2019

Any update on this would be very useful. @s3rvac @PeterMatula

@PeterMatula
Collaborator

I can confirm it is a problem: we do not handle object files very well at the moment.

I have tasked one of our students with looking into object file decompilation and trying to improve it. He will start with #201, but should get around to this as well.

@PeterMatula
Collaborator

The problem actually looks very similar to #201. This is our DSM output:

; section: .text
; function: reverseArray at 0x0 -- 0xb
0x0:     55               	push ebp
0x1:     89 e5            	mov ebp, esp
0x3:     83 ec 10         	sub esp, 0x10
0x6:     e8 64 00 00 00   	call 0x6f <__x86.get_pc_thunk.ax>
; data inside code section at 0xb -- 0xc
0xb:     05                                                 |.               |
; dynamically linked function: _GLOBAL_OFFSET_TABLE_ at 0xc -- 0x6f
; section: .text.__x86.get_pc_thunk.ax
; function: __x86.get_pc_thunk.ax at 0x6f -- 0x73
0x6f:    8b 04 24         	mov eax, dword ptr [esp]
0x72:    c3               	ret 

The function's code did not even get disassembled because _GLOBAL_OFFSET_TABLE_ at 0xc was misdetected as a dynamically linked function; the same thing happens in #201. The problem is that in these cases we handle relocations incorrectly: we make false assumptions about the positions of external functions.
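
To make the byte-level picture concrete, here is a minimal C sketch that decodes the two instructions around the misdetected range. The names `text_from_0x6`, `read_le32`, `call_rel32_target` and `add_eax_imm32_field` are hypothetical and exist only for illustration. The bytes are taken from the DSM listing above, except the four immediate bytes at 0xc..0xf, which the listing does not show and which are zero placeholders here. The sketch assumes the standard IA-32 encodings, where `e8` is `call rel32` and `05` is `add eax, imm32`. That the immediate at 0xc carries a relocation against `_GLOBAL_OFFSET_TABLE_` (for GCC position-independent code, typically `R_386_GOTPC`) is an inference from the listing, not something the thread shows directly.

```c
#include <stdint.h>

/* Bytes 0x6..0xf of .text. 0x6..0xb are copied from the DSM listing;
 * 0xc..0xf are NOT shown there and are zero placeholders (assumption). */
const uint8_t text_from_0x6[] = {
    0xe8, 0x64, 0x00, 0x00, 0x00, /* 0x6: call rel32 */
    0x05,                         /* 0xb: opcode of add eax, imm32 */
    0x00, 0x00, 0x00, 0x00        /* 0xc: imm32 field (placeholder) */
};

/* Read a 32-bit little-endian value. */
uint32_t read_le32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Target of a 5-byte `call rel32` located at insn_addr.
 * Assumes insn[0] == 0xe8. The target is the address of the next
 * instruction plus rel32; unsigned wrap-around gives the same result
 * as a signed addition modulo 2^32. */
uint32_t call_rel32_target(uint32_t insn_addr, const uint8_t *insn) {
    return insn_addr + 5u + read_le32(insn + 1);
}

/* If insn is `add eax, imm32` (opcode 0x05), return the address of its
 * 4-byte immediate field; otherwise return 0. */
uint32_t add_eax_imm32_field(uint32_t insn_addr, const uint8_t *insn) {
    return insn[0] == 0x05 ? insn_addr + 1u : 0u;
}
```

With the listing's bytes, `call_rel32_target(0x6, text_from_0x6)` gives 0x6f, which matches the `__x86.get_pc_thunk.ax` target in the DSM output, and `add_eax_imm32_field(0xb, text_from_0x6 + 5)` gives 0xc, exactly where the listing starts the bogus `_GLOBAL_OFFSET_TABLE_` function. Running `readelf -r` on the object file should show the actual relocation entries.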

PeterMatula added a commit to avast/retdec-regression-tests that referenced this issue Jun 5, 2019
@PeterMatula
Collaborator

Fixed by #577. The current output is:

// Address range: 0x0 - 0x6f
int32_t reverseArray(int32_t a1, int32_t result, uint32_t a3) {
    // 0x0
    int32_t v1; // ebp
    __x86_get_pc_thunk_ax(v1);
    if (result >= a3) {
        // 0x6c
        return result;
    }
    int32_t * v2 = (int32_t *)(4 * result + a1); // bp+21
    int32_t * v3 = (int32_t *)(4 * a3 + a1); // bp+44
    *v2 = *v3;
    *v3 = *v2;
    int32_t result2 = result + 1;
    while (result2 < a3 - 1) {
        // 0x12
        a3--;
        result = result2;
        v2 = (int32_t *)(4 * result + a1);
        v3 = (int32_t *)(4 * a3 + a1);
        *v2 = *v3;
        *v3 = *v2;
        result2 = result + 1;
    }
    // 0x6c
    return result2;
}

The handling of arrays is not very pretty, but that could be addressed in a dedicated issue. The primary problem with relocations in ELF object files is solved.
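
One detail that may be worth checking in such a follow-up: read literally as C, the two stores `*v2 = *v3; *v3 = *v2;` copy rather than swap, because the temporary from the original source is gone. The sketch below puts the original function next to a transcription of the decompiled one. The transcription is an adaptation, not RetDec output: the name `reverseArray_decompiled` is hypothetical, `a1` is typed as `int32_t *` and indexed directly instead of being rebuilt from a 32-bit integer address (so the code also runs on 64-bit hosts), and the `__x86_get_pc_thunk_ax` call is omitted because its result is unused.

```c
#include <stdint.h>

/* Original function from the issue. */
void reverseArray(int arr[], int start, int end) {
    while (start < end) {
        int temp = arr[start];
        arr[start] = arr[end];
        arr[end] = temp;
        start++;
        end--;
    }
}

/* Transcription of the decompiled output (adapted, see lead-in).
 * The casts spell out the unsigned comparison that C performs anyway
 * when int32_t is compared with uint32_t on a 32-bit-int platform. */
int32_t reverseArray_decompiled(int32_t *a1, int32_t result, uint32_t a3) {
    if ((uint32_t)result >= a3) {
        return result;
    }
    int32_t *v2 = &a1[result];
    int32_t *v3 = &a1[a3];
    *v2 = *v3;
    *v3 = *v2; /* stores the value just written to *v2, not the old one */
    int32_t result2 = result + 1;
    while ((uint32_t)result2 < a3 - 1) {
        a3--;
        result = result2;
        v2 = &a1[result];
        v3 = &a1[a3];
        *v2 = *v3;
        *v3 = *v2;
        result2 = result + 1;
    }
    return result2;
}
```

Called on `{1, 2, 3, 4, 5}` with `start = 0` and `end = 4`, the original yields `{5, 4, 3, 2, 1}`, while the transcription yields `{5, 4, 3, 4, 5}`. Whether this reflects a real semantic loss or only a presentation artifact of the decompiler's output is not settled in this thread.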

Test added in avast/retdec-regression-tests@e132495.
