Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

repz lifting question #15

Closed
enkomio opened this issue May 13, 2019 · 6 comments
Closed

repz lifting question #15

enkomio opened this issue May 13, 2019 · 6 comments

Comments

@enkomio
Copy link
Contributor

enkomio commented May 13, 2019

Hi,

I noticed that if we consider the following instruction:

0x40102E:    F3 A4                          repz movsb

it is lifted to the following statements:

-------------ISMark (40102E, 2)-------------
if(ECX = 0x0:I32) then Jmp (Exit, 0) else Jmp (Continue, 1)
-------------LMark (Continue)-------------
[EDI] := [ESI]:I8
ESI := (ite (DF) ((ESI - 0x1:I32)) ((ESI + 0x1:I32)))
EDI := (ite (DF) ((EDI - 0x1:I32)) ((EDI + 0x1:I32)))
ECX := (ECX - 0x1:I32)
EIP := 0x40102E:I32
-------------LMark (Exit)-------------
EIP := 0x401030:I32
-------------IEMark (401030)-------------

This is conceptually correct but it is a bit weird, it is like if we are executing something as

jmp $

until some condition happens (in this case ECX = 0). Would it be better to have something like this one?

-------------ISMark (40102E, 2)-------------
-------------LMark (If)-------------
if(ECX = 0x0:I32) then Jmp (Exit, 0) else Jmp (Continue, 1)
-------------LMark (Continue)-------------
[EDI] := [ESI]:I8
ESI := (ite (DF) ((ESI - 0x1:I32)) ((ESI + 0x1:I32)))
EDI := (ite (DF) ((EDI - 0x1:I32)) ((EDI + 0x1:I32)))
ECX := (ECX - 0x1:I32)
Jmp (If, _)
-------------LMark (Exit)-------------
-------------IEMark (401030)-------------
@sangkilc
Copy link
Member

Yes, I understand your concern.

It may seem weird, but here, we would like to consider each iteration of a REPZ instruction as a "separate" instruction. In fact, such an assumption is not uncommon. For example, when you are taking an instruction-level execution trace from Pin, it will instrument every iteration of a REPZ instruction too. When you do single-stepping with GDB, the same assumption holds. So we decided to explicitly separate each iteration with EIP := .... This way we can easily align B2R2's execution engine with execution traces obtained from Pin or similar tools.

@enkomio
Copy link
Contributor Author

enkomio commented May 13, 2019

Ok gotcha. Thanks for the clarification :)

@enkomio enkomio closed this as completed May 13, 2019
@enkomio
Copy link
Contributor Author

enkomio commented May 28, 2019

Hi @sangkilc,

sorry to bother you again on this but I struggle to understand how the statements should be interpreted, in particular by considering the following instructions:

0x19146D:    E2 FB                          loop -0x3 ; 0x19146A

which is lifted in the following statements:

-------------ISMark (19146D, 2)-------------
T_710:I32 := ECX
-------------LMark (Loop)-------------
T_710:I32 := (T_710:I32 - 0x1:I32)
if(T_710:I32 != 0x0:I32) then Jmp (Continue, 127) else Jmp (End, 128)
-------------LMark (Continue)-------------
EIP := (EIP + 0x19146A:I32)
Jmp (Loop, 126)
-------------LMark (End)-------------
-------------IEMark (19146F)-------------

and

0x40102E:    F3 A4                          repz movsb

which is lifted in the following statements:

-------------ISMark (40102E, 2)-------------
if(ECX = 0x0:I32) then Jmp (Exit, 114) else Jmp (Continue, 115)
-------------LMark (Continue)-------------
[EDI] := [ESI]:I8
ESI := (ite (DF) ((ESI - 0x1:I32)) ((ESI + 0x1:I32)))
EDI := (ite (DF) ((EDI - 0x1:I32)) ((EDI + 0x1:I32)))
ECX := (ECX - 0x1:I32)
EIP := 0x40102E:I32
-------------LMark (Exit)-------------
EIP := 0x401030:I32
-------------IEMark (401030)-------------

Initially I supposed that we have to emulate the statements in a sequential order and in the first sample it makes sense. But for the second example this strategy doesn't work, since EIP would assume always the same value.

Also, I have to admit that is not very clear to me why the loop instruction is lifted in that way :)

@sangkilc
Copy link
Member

Okay it seems that you found a bug! Especially the loop instruction looks wrong. I will follow up on this soon. Sorry about the brevity.

@sangkilc sangkilc reopened this May 28, 2019
@sangkilc
Copy link
Member

sangkilc commented May 29, 2019

Okay here is what's happening.

EIP := ... may or may not correspond to an InterJmp statement. If you look at https://github.com/B2R2-org/B2R2/blob/master/src/BinIR/LowUIR.fs#L412, there's really no difference between InterJmp and Put statements after pretty-printing, which is bad in my opinion. When there is an InterJmp statement, you should not follow the next statement. Instead, you should change your PC (EIP) value and go to the corresponding next instruction.

Now, if you look at the loop -0x3 case, EIP := (EIP + 0x19146A:I32) this is not an InterJmp, but it is merely a Put statement, which should be handled differently. In this case, you update the EIP, but keep execute the next IR statement. In other words, we stop executing IR statements only when we encounter either InterJmp or InterCJmp. Inter-jumps in B2R2 basically means that we should stop executing/interpreting the current IR statements, and should immediately jump to the next instruction. If you see https://github.com/B2R2-org/B2R2/blob/master/src/BinIR/LowUIR.Eval.fs#L219, those statements are passed to endBlock function for that reason.

Nonetheless, the loop instruction looks wrong, because it should not contain internal (within an instruction) loop as you pointed out. I will create two follow-up issues to handle this problem. First, I don't like the current way of pretty-printing InterJmp statements because it is so confusing and we cannot easily distinguish it with Put statements. Second, the loop instruction semantic is buggy, and we should fix it.

This was referenced May 29, 2019
@enkomio
Copy link
Contributor Author

enkomio commented May 29, 2019

OK understood, I updated my code in order to follow your indication.

Thanks for the explanation :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants