If it's possible to make `PC+X` more readable when disasm arm? #780

Open
qc1iu opened this Issue Sep 28, 2016 · 13 comments

4 participants
@qc1iu
Contributor

qc1iu commented Sep 28, 2016

Recently I have been using Capstone to disassemble ARM code. Unlike x86 code, ARM code often uses pc as the memory base register (mem.base), since ARM instructions have a fixed length.

Unlike arm-linux-gnueabi-objdump, Capstone cannot distinguish data (the constant pool) from text; all input is treated as text. As a result, the code disassembled by Capstone cannot be compiled directly by gcc or clang.

arm-linux-gnueabi-objdump recognizes the constant pool and prints it as .word entries:

9a78: e24bd010  sub sp, fp, #16
9a7c: e8bd8c70  pop {r4, r5, r6, sl, fp, pc}
9a80: 00000148  .word 0x00000148
9a84: 0000003c  .word 0x0000003c
9a88: 0000f41d  .word 0x0000f41d
9a8c: 0000f40d  .word 0x0000f40d
9a90: 00000080  .word 0x00000080
9a94: 00000178  .word 0x00000178
9a98: 0000f263  .word 0x0000f263
...

Capstone treats the constant pool as instructions (addresses shown in decimal):

39544 sub sp, fp, #0x10
39548 pop {r4, r5, r6, sl, fp, pc}
39552 andeq r0, r0, r8, asr #2
39556 andeq r0, r0, ip, lsr r0
39560 andeq pc, r0, sp, lsl r4
39564 andeq pc, r0, sp, lsl #8
39568 andeq r0, r0, r0, lsl #1
39572 andeq r0, r0, r8, ror r1
39576 andeq pc, r0, r3, ror #4
39580 andeq r1, r1, r0, lsl r8
39584 andeq pc, r0, r7, lsl #5
39588 andeq r1, r1, r4, lsr r8
39592 andeq pc, r0, fp, lsr #5
39596 andeq r1, r1, r8, asr r8
39600 andeq pc, r0, pc, asr #5
39604 andeq r1, r1, ip, ror r8

Is it possible for the ARM disassembler to convert [pc, #4] to an absolute address? Or to recognize the constant pool like arm-linux-gnueabi-objdump does?

@aquynh

Owner

aquynh commented Sep 28, 2016

a quick question: how do you know where the constant pool starts in the case above?

@qc1iu

Contributor Author

qc1iu commented Sep 28, 2016

@aquynh This problem troubles me too.
Recently I have been doing some research on VM-based protection for the Android platform. The key idea is to translate the original ARM instructions that should be protected into VM instructions. Since the VM instructions do not sit at the original location, all PC-relative instructions are hard to translate.

@aquynh

Owner

aquynh commented Sep 28, 2016

if this is not easy to decide, then clearly it does not belong in Capstone. as a primitive framework, we try to keep everything simple. it looks like what you need must be built inside your tool, not in Capstone.

@qc1iu

Contributor Author

qc1iu commented Sep 28, 2016

thanks.

@aquynh

Owner

aquynh commented Sep 28, 2016

btw, you see the difference between Capstone & objdump because you are comparing a framework (Capstone) with a tool (objdump). internally, objdump does a lot of analysis to give you the output you saw. if you want to replicate this, do that in your tool, on top of Capstone.

thanks.

@aquynh aquynh closed this Sep 28, 2016

@radare

Contributor

radare commented Sep 28, 2016

objdump doesn't do any analysis at all; it just grabs the information from the binary headers and mixes it with the disassembly, which doesn't require any analysis step.

The pc+x -> absaddr conversion can be done directly by Capstone; in fact, that's what r2 does when enabling e asm.relsub=true. This option replaces PC+X with something absolute, which is then substituted by the symbol name or a higher-level representation that may be more readable for the user.

The reason why objdump shows those .word entries is that this info is encoded in the ELF header; it is not related to any analysis result.

Also, I see that the title of the issue is completely different from the body of the issue; this looks like two different issues to me.

@aquynh aquynh reopened this Sep 28, 2016

@aquynh

Owner

aquynh commented Sep 28, 2016

please provide a sample input for PC+X case.

@radare

Contributor

radare commented Sep 28, 2016

0x00008150 24c09fe5 ldr ip, [pc, 0x24]

which turns into ldr ip, [0x817c]
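The arithmetic behind this example can be sketched as follows. This is a minimal illustration, assuming ARM (A32) mode, where reading pc yields the address of the instruction plus 8; the function name `ldr_literal_target` is hypothetical and not part of Capstone or r2.

```python
import struct

ARM_PC_BIAS = 8  # in ARM (A32) state, reading pc yields instruction address + 8


def ldr_literal_target(address, code):
    """Return the absolute address read by `ldr Rt, [pc, #imm]`, else None.

    Hypothetical helper for illustration: `code` holds the 4 little-endian
    bytes of one ARM-mode instruction located at `address`.
    """
    (word,) = struct.unpack("<I", code[:4])
    if word >> 28 == 0xF:
        return None  # unconditional encoding space, not a plain LDR
    # Word load, immediate offset, no writeback: bits 27:24 = 0101,
    # bit 22 (byte) = 0, bit 21 (writeback) = 0, bit 20 (load) = 1.
    if (word & 0x0F700000) != 0x05100000:
        return None
    if (word >> 16) & 0xF != 0xF:
        return None  # base register is not pc
    offset = word & 0xFFF
    if not word & (1 << 23):  # U bit clear: the offset is subtracted
        offset = -offset
    return address + ARM_PC_BIAS + offset


# The sample from this comment: bytes 24c09fe5 located at 0x8150.
print(hex(ldr_literal_target(0x8150, bytes.fromhex("24c09fe5"))))
```

For the sample above this gives 0x8150 + 8 + 0x24 = 0x817c, matching the rewritten operand.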

@aquynh

Owner

aquynh commented Sep 28, 2016

this is not hard to do, but we need to have a new syntax option, and i am not sure this is worth doing.

@radare

Contributor

radare commented Sep 28, 2016

R2 does that by parsing the plaintext representation of the instruction. I'm not aware of any regression on that, but having this in core Capstone would be nice imho.

The resulting info should be generated from the internal representation right inside capstone. That would allow me to skip this step in case the instruction is disassembled this way.

Also, this representation removes the information about whether an instruction uses absolute or relative addressing, so it's not always desirable to have them this way. But it's for sure much more readable than the relative one.
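A text-level substitution of the kind described here might look like the sketch below. It is not r2's actual code: the function name `resolve_pc_relative`, the regular expression, and the `pc_bias` parameter are all assumptions for illustration, and it only covers ARM (A32) mode, where pc reads as the instruction address plus 8.

```python
import re

# Matches "[pc, #0x24]", "[pc, #-8]" and the "[pc, 0x24]" form without '#'.
_PC_MEM = re.compile(r"\[pc, #?(-?)(0x[0-9a-fA-F]+|\d+)\]")


def resolve_pc_relative(address, op_str, pc_bias=8):
    """Rewrite a pc-relative memory operand in `op_str` as an absolute one.

    Hypothetical helper: `address` is the instruction's address and
    `pc_bias` is how far ahead pc reads (8 in ARM mode).
    """
    def substitute(match):
        digits = match.group(2)
        offset = int(digits, 16) if digits.startswith("0x") else int(digits)
        if match.group(1):  # a leading '-' means the offset is subtracted
            offset = -offset
        return "[0x%x]" % (address + pc_bias + offset)

    return _PC_MEM.sub(substitute, op_str)


print(resolve_pc_relative(0x8150, "ip, [pc, #0x24]"))
```

Operand strings without a pc-relative memory operand pass through unchanged. As noted above, the rewritten text no longer shows that the access was pc-relative, so a caller would probably want to keep the original string as well.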

@qc1iu qc1iu changed the title If it's possible for arm disasm `[pc, #4]` to absolute address? If it's possible to make `PC+X` more readable when disasm arm? Sep 29, 2016

@qc1iu

Contributor Author

qc1iu commented Sep 29, 2016

thanks all.
I think there are two ways. One is to make Capstone distinguish the constant pool from instructions, so that all the PC-related instructions make sense. The other is to replace PC+X with something absolute.

As radare said, we can learn the constant-pool info from the ELF header. Since Capstone was originally designed to disassemble a code segment rather than an entire ELF file, I think the latter may be the better choice, supported through an option.
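For the first approach, when no ELF information is available, a tool built on top of a disassembler could fall back on a heuristic: treat every word that a pc-relative load reads as data. The sketch below illustrates that idea only; it is not how objdump or Capstone work, the function names are hypothetical, the input is synthetic, and a single forward scan like this can miss pools or misjudge words that merely look like loads.

```python
import struct


def literal_target(address, word):
    """Absolute address read by an ARM-mode `ldr Rt, [pc, #imm]`, else None."""
    if word >> 28 == 0xF or (word & 0x0F7F0000) != 0x051F0000:
        return None
    offset = word & 0xFFF
    if not word & 0x00800000:  # U bit clear: the offset is subtracted
        offset = -offset
    return address + 8 + offset  # pc reads as instruction address + 8


def find_literal_words(base, code):
    """Collect addresses inside `code` that pc-relative loads read from."""
    count = len(code) // 4
    words = struct.unpack("<%dI" % count, code[:count * 4])
    data = set()
    for index, word in enumerate(words):
        address = base + 4 * index
        if address in data:
            continue  # already known to be a literal, do not decode it
        target = literal_target(address, word)
        if (target is not None and target % 4 == 0
                and base <= target < base + 4 * count):
            data.add(target)
    return data


def annotate(base, code):
    """Label each word as `.word` data or as a (not decoded) instruction."""
    data = find_literal_words(base, code)
    lines = []
    for index in range(len(code) // 4):
        address = base + 4 * index
        (word,) = struct.unpack_from("<I", code, 4 * index)
        text = ".word 0x%08x" % word if address in data else "(instruction)"
        lines.append("%x: %08x  %s" % (address, word, text))
    return lines


# Synthetic input: one pc-relative load, ten `mov r0, r0`, one literal word.
sample = (bytes.fromhex("24c09fe5") + bytes.fromhex("0000a0e1") * 10
          + struct.pack("<I", 0x148))
print("\n".join(annotate(0x8150, sample)))
```

In a real tool, the `(instruction)` placeholder would be replaced by the disassembler's output for that word, and only the words marked as data would be emitted as `.word`.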

@perks

perks commented Mar 3, 2017

I'd be interested in working on this if other people are still up for it

@aquynh

Owner

aquynh commented Mar 3, 2017

yes you can go ahead. i dont think anybody is working on this.
