Sifting instruction encodings on ARM64, many capstone unsupported encodings discovered #2150

Closed
watbulb opened this issue Aug 27, 2023 · 6 comments
@watbulb
Contributor

watbulb commented Aug 27, 2023

Hello,

I am working on a project to locate undefined instructions on various ARM64 processors, and attempt to attribute them to hardware.

In my code, I do a naïve masked increment to search the encoding space from 00 00 00 00 to ff ff ff ff. Before I run each candidate as an instruction, I first pass it to capstone to check whether the encoding is known to some disassembler. I then attempt to execute the instruction and check various pieces of processor state to see whether it was decoded and executed.
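A minimal sketch of such a loop, written here only as an illustration and not taken from the project itself. It assumes the "masked increment" steps only the bits selected by a mask while keeping the remaining bits fixed. The names `masked_increment`, `sift`, and the `is_known` predicate are hypothetical; in practice `is_known` would wrap a call into a disassembler such as capstone.

```python
from typing import Callable, Iterator

WORD_MASK = 0xFFFFFFFF  # AArch64 instructions are 32 bits wide


def masked_increment(word: int, mask: int) -> int:
    """Step only the bits selected by mask, leaving all other bits untouched."""
    fixed = word & ~mask & WORD_MASK
    # Setting the unselected bits to 1 lets the carry ripple across them.
    stepped = ((word | ~mask) + 1) & mask
    return fixed | stepped


def sift(start: int, mask: int, is_known: Callable[[bytes], bool]) -> Iterator[int]:
    """Yield every candidate word that the (hypothetical) predicate rejects."""
    word = start & WORD_MASK
    while True:
        # Hand the candidate over as bytes in little-endian memory order.
        if not is_known(word.to_bytes(4, "little")):
            yield word
        word = masked_increment(word, mask)
        if word & mask == 0:  # the selected bits wrapped around, so stop
            break
```

With `mask=0xFFFFFFFF` this walks the whole 32-bit space starting from `start`; a narrower mask restricts the search to one field of an encoding.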

Running this increment, disassemble, check loop has produced a corpus of instructions that decode properly with LLVM 16.0.6 objdump but that capstone does not recognize. Some of these are due to missing extension support in capstone, which is fine; I can filter and work around that. The instructions I am concerned about are those in the base ISA for AArch64 that LLVM handles but capstone does not.

I wanted to start a discussion here about how I should work with the capstone contributors and what the best way to report these decoding inconsistencies would be. I can upload a corpus of instructions that are not part of an AArch64 extension set and that capstone does not decode but LLVM does. Would this be the best way forward? Unfortunately, I'm not terribly familiar with the capstone codebase, but I'm quite familiar with TableGen, and I'd be happy to try to diagnose this if it's indeed an issue and I'm not crazy or doing something stupid 😆. I apologize if this is just a bunch of noise that will be fixed in #2026. I can also try @Rot127's auto-sync-aarch64 branch now and report whether these have been fixed, if that is at all helpful.

Thank you!

Below I'll include a couple examples of these instructions:

LDRSB
LLVM objdump 16.0.6

1809d38: 38de27de      ldrsb   w30, [x30], #-0x1e

cstool 5.0.1:

./cstool -d arm64 '38de27de'
ERROR: invalid assembly code
./cstool -d arm64 'de27de38'
ERROR: invalid assembly code
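The two cstool invocations above differ only in byte order: objdump printed this encoding as a single 32-bit word (`38de27de`), whereas the same instruction laid out as bytes in memory is little-endian (`de 27 de 38`). A small sketch of the conversion, using only the standard library; the helper names are made up for illustration.

```python
import struct


def word_to_memory_hex(word: int) -> str:
    """32-bit instruction word -> hex string of its bytes in little-endian memory order."""
    return struct.pack("<I", word).hex()


def memory_hex_to_word(hex_bytes: str) -> int:
    """Hex string of four bytes in memory order -> 32-bit instruction word."""
    (word,) = struct.unpack("<I", bytes.fromhex(hex_bytes))
    return word


print(word_to_memory_hex(0x38DE27DE))            # de27de38
print(hex(memory_hex_to_word("0d 02 40 08")))    # 0x840020d
```

Trying both orders, as done above, is a cheap way to rule out an endianness mix-up before concluding that an encoding is unsupported.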

LDXRB
LLVM objdump 16.0.6

2324: 0d 02 40 08   ldxrb   w13, [x16]

cstool 5.0.1:

./cstool -d arm64 '0d024008'
ERROR: invalid assembly code
./cstool -d arm64 '0840020d'
ERROR: invalid assembly code

LDTR
LLVM objdump 16.0.6

60121e4: 42 f8 5e f8   ldtr    x2, [x2, #-17]

cstool 5.0.1:

./cstool arm64 '42f85ef8'
ERROR: invalid assembly code
./cstool arm64 'f85ef842'
@Rot127
Collaborator

Rot127 commented Aug 27, 2023

Using my branch is currently the best option you have, because it will take a while until everything is merged into next and v6 is released (see #2015 for the tasks left, plus the current problem that the maintainers don't seem to have much time).

I'm still working on it, so some things might be missing (though there shouldn't be many), and I will keep pushing changes to it. But for a simple check of whether an instruction decodes, it is enough. Last time I checked, the whole encoding space (0x0 - 0xffffffff) decoded without segfaults, especially if you do not decode the details.

Regarding your overall research: Are you aware of this PR? It adds detailed encoding of each instruction to detail (as detailed as LLVM is, which is sometimes great and sometimes meh).

@watbulb
Contributor Author

watbulb commented Aug 27, 2023

@Rot127 Thanks for the quick response!

I'll start integrating your branch into my project right away, and I'll let you know sometime tomorrow what the results are, along with anything still missing or any issues I encounter.

Yes I am aware of that PR, and I started to incorporate it into my work last week. Appreciate you pointing it out though!

Thanks for all the hard work.

Cheers

@Rot127
Collaborator

Rot127 commented Aug 27, 2023

I'll start integrating your branch into my project right away, and I'll let you know sometime tomorrow what the results are, along with anything still missing or any issues I encounter.

Great! I am happy about any feedback! There haven't been many eyes on it yet, so suggestions for improvements and reports of issues are very welcome!

@watbulb
Contributor Author

watbulb commented Aug 28, 2023

Hi @Rot127 👋

I made a PR against your repo for some changes that were required to build the whole project on the latest ARM64 macOS, and maybe some cleanups. I'm a noob in this codebase though, so I apologize if I implemented things incorrectly. Happy to make any changes needed.

So far the branch is working well 🎉

 0  de 27 de 38  ldrsb   w30, [x30], #-0x1e
        ID: 583 (ldrsb)
        op_count: 3
                operands[0].type: REG = w30
                operands[0].access: WRITE
                        Vector Arrangement Specifier: 0x0
                        Vector Index: 0
                operands[1].type: MEM
                        operands[1].mem.base: REG = x30
                operands[1].access: READ | WRITE
                        Vector Arrangement Specifier: 0x0
                        Vector Index: 0
                operands[2].type: IMM = 0xffffffffffffffe2
                operands[2].access: READ
                        Vector Arrangement Specifier: 0x0
                        Vector Index: 0
        Write-back: True
        Registers read: x30
        Registers modified: x30 w30
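As a cross-check of the output above, the operand fields can be pulled straight out of the word 0x38de27de. The sketch below assumes the field layout of the A64 "load/store register (immediate post-indexed)" class (Rt in bits 4:0, Rn in bits 9:5, imm9 in bits 20:12); the function names are hypothetical. It also shows that the printed IMM 0xffffffffffffffe2 is the 64-bit two's-complement form of -0x1e.

```python
def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    sign_bit = 1 << (bits - 1)
    value &= (1 << bits) - 1
    return (value ^ sign_bit) - sign_bit


def decode_post_index_fields(word: int) -> dict:
    """Extract Rt, Rn and the signed imm9 offset (assumed post-indexed layout)."""
    return {
        "rt": word & 0x1F,                             # bits 4:0
        "rn": (word >> 5) & 0x1F,                      # bits 9:5
        "imm": sign_extend((word >> 12) & 0x1FF, 9),   # bits 20:12
    }


fields = decode_post_index_fields(0x38DE27DE)
print(fields)                                  # {'rt': 30, 'rn': 30, 'imm': -30}
print(sign_extend(0xFFFFFFFFFFFFFFE2, 64))     # -30, i.e. -0x1e
```

The result agrees with the disassembly `ldrsb w30, [x30], #-0x1e`: both register fields are 30 and the offset is -30.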

I'm going to keep this open a little longer, until I've run my tool through a couple more times.

Thanks

@Rot127 Rot127 added this to the v6 milestone Apr 26, 2024
@Rot127 Rot127 added the AArch64 Arch label Apr 26, 2024
@Rot127
Collaborator

Rot127 commented Apr 26, 2024

Is there anything else you need? Otherwise we can close this. For AArch64, an update to LLVM 18 is coming soon: #2298

@Rot127
Collaborator

Rot127 commented May 16, 2024

@watbulb Closing this for now. Please let me know if you find more missing instructions that were added in LLVM 18 or earlier.

@Rot127 Rot127 closed this as completed May 16, 2024