Add support for Cavium Octeon MIPS instructions, and general MIPS fixes #5615

cryptwhoa · 2024-06-16T04:41:06Z

Bear with me if I'm doing anything weird, this is literally the first time I've ever made a pull request on github. I also do most of my work on a non-internet-connected machine (whose git user name is noone), so hopefully nothing weird that came from that either.

This pull request adds support for Cavium MIPS instructions; these processors are also MIPS64 processors, so I added missing MIPS64 instructions that make Linux disassemble more nicely. While I was doing work here, I figured I might as well fix some lifting bugs as I found them.

Explanations for some potentially head-scratching decisions:

The way to turn on Cavium MIPS decoding is a settings option; it would be nice to have separate architectures, but the way that ELFs notify the presence of Cavium instructions is through the "options" field, which doesn't influence how the architecture is chosen, which means that without using settings, either no ELFs use Cavium MIPS or all ELFs use Cavium, which could break people's existing MIPS bndbs.
Registers were added for all of the coprocessor 2 registers, which really blows up the register space. As far as I know, this is the only way to put strings into intrinsics (unless you also want to blow up the intrinsic space). It would be nice to add support for something like an enum, that just maps numbers to strings, which can be passed to intrinsics, to help cut down on the number of registers that have to be added to do things like this.
Unaligned memory accesses (LWL/LWR/SWL/SWR/LDL/LDR/SDL/SDR) were turned into intrinsics, at least when they weren't part of a pair of adjacent instructions. This was a compromise between what would be really ugly lifted code and the previously-existing incorrect implementation.

Un-done work:

The cavium instructions ULW/ULD/USW/USD aren't decoded; their opcodes collide with LWL/LWR/SWL/SWR/LDL/LDR/SDL/SDR, but variably so, depending on the value of a bit in a cavium-specific register (which also makes some of those LWL/etc. into NOPs); this shouldn't be a huge deal, since they do the same thing (unaligned reads and writes), just in one instruction instead of 2.
VMM0, VMULU, ZCB, and ZCBT aren't lifted due to not having any code samples that use them. They should still disassemble fine though.
"unknown unknowns"; the cavium datasheet I found is from a pretty old generation, so if there are any instructions added in later versions, they're not handled.

An unsolicited suggestion would be that we don't need all of the MIPS version decode tables for 1 through 6; only a few are used anyways (the one for MIPS32, MIPS64, and cavium64), and as far as I can tell there aren't any forwards-incompatible instructions; the one example I saw was MIPS_LHI that only exists in version 1, which isn't an instruction in the documentation and only seems to be in SPIM documentation.

This makes it easier to add opcodes, and also is a little easier on the eyes. This also makes it more explicit that there won't be any out-of-bound read weirdness if accessing a row/column of an array that hasn't been specified.

These were some instructions that were added in MIPS III, but also exist in V, VI, but weren't in the decode tables.

This is to prepare for adding an option to support Cavium MIPS instructions.

These were in version 4 for some reason, but not version 6...

This was using the wrong shift direction, and also an unintuitive order in the AND masking.

This wasn't performing the shift at all, and was masking the wrong value for the register whose value to keep. The order of the ANDs was also unintuitive, starting with the constant.

There are a bunch of these in the linux kernel I'm working with, and this helps cut down on the number of "unimplemented LLIL" to sift through.

This also sets the second immediate to 0 to clarify that cop2 only has one immediate field.

Note that we don't have any code samples that use these instructions yet, but they're basically the same as their 32-bit counterparts in terms of disassembly.

Bitwise instructions (AND, ANDI, NOR, OR, ORI, XOR, XORI...there may be others, they're not conveniently listed anywhere) don't sign-extend the result, so for 64-bit architectures the whole register gets used.

Previously, this didn't account for a carry bit from the add to LO going to the add of HI, and it's a bit clearer with register splitting anyways.

These just sign extend to the width of the register.

With the exception of the unaligned store/loads, this should be all of the cavium instructions. We don't have any code samples for VMM0 or VMULU, so we leave them unlifted for now.

This seemed to have assumed that if the code reaches the point where it's lifting a lone LWL/LWR/SWL/SWR, then it didn't get caught by the code that checked for pairs of LWL/LWR/SWL/SWR, and so it must be a 2-byte read...however, this assumption doesn't hold for code like this: ``` swl v0, (v1) <random other instructions> swr v0, 3(v1) ``` In this case, the intent is still an unaligned store, just split up for whatever reason the compiler did it, and this isn't a 2-byte store, which is what the LLIL was. The actual LLIL is going to be a lot trickier, but in the meantime it's probably better to admit we can't lift this code, rather than lifting it incorrectly.

If BEQZ gets to exist, it only makes sense to make BNEZ too.

CLAassistant · 2024-06-16T04:41:12Z

Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.

noone seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account.
_{You have signed the CLA already but the status is still pending? Let us recheck it.}

Note that LWL/LWR hasn't been tested as much, since the only places these are used in the binary sample I have are right next to each other, so they get treated as a pair and lifted together. There didn't seem to be a clean way to lift these instructions that wouldn't involve masking pointers, which isn't friendly to users or automated analysis, so I settled on intrinsics. One fidelity trade-off is that the LLIL indicates that the entire 32-bit/64-bit value is read/written to for each instruction. The decision to be optimistic about how many bytes get accessed was chosen to help make sure structure xrefs get noticed. For example, suppose there's a 4-byte value at offset 7 of a struct; when this gets set by SWL rX, 7(rY)/SWR rX, 0xa(rY), we want to make sure it gets caught by each instruction, since each one writes that offset. An unfortunate side effect is that in this example, the LLIL indicates that offset 0xb also gets written to by the SWR instruction, which is not true. However, it's generally more annoying when xrefs aren't originally found than having to filter through xrefs that do exist, so this trade-off seemed worth it to me.

plafosse · 2024-06-16T19:53:12Z

So I do believe we can support this as a separate architecture using some of the newer platform callbacks. I can look them up when I get to a computer, but we do this now for a couple of architectures.

This uses a way less hacky way to recognize whether or not to use the octeon MIPS architecture versus the vanilla MIPS architecture than a settings menu.

emesare · 2024-07-08T17:08:50Z

arch/mips/il.cpp

-							il.Const(registerSize, (1<<op4.immediate)-1),
-							il.ShiftLeft(registerSize, ReadILOperand(il, instr, 2, registerSize),
-								il.Const(1, op3.immediate)))));
+							il.LogicalShiftRight(registerSize,


Nice catch!

emesare · 2024-07-08T19:47:02Z

As it stands right now this does not cause any regressions and the changes which do impact normal MIPS binaries are positive, I am inclined to merge this as-is. The Cavium specific instructions are not easily reviewable, however given the other changes to the MIPS architecture were correct and high-quality, the lifting of those instructions are not a cause for concern.

emesare · 2024-07-08T22:43:42Z

Added in 4.1.5671, thank you again for your patience!

rssor · 2024-07-17T19:05:44Z

There's one minor change I'm tempted to make to this after the next stable is out, that would be to look into swr/swl and lwr/lwl and have one be close to a nop and the other just the memory access (which is which depending on the endianness), potentially with the lifter looking ahead to make sure regs are the same (falling back to current behavior if they are not).

For the samples I've seen, I think would be generally viable and clean up intrinsics somewhat, but my sample size hasn't been enormous. You have any thoughts about this? This is strictly speaking less precise but possibly more useful.

noone added 30 commits June 11, 2024 04:44

Reformat decode tables

f51a02f

This makes it easier to add opcodes, and also is a little easier on the eyes. This also makes it more explicit that there won't be any out-of-bound read weirdness if accessing a row/column of an array that hasn't been specified.

Add instructions from MIPS III to V, VI

aecce09

These were some instructions that were added in MIPS III, but also exist in V, VI, but weren't in the decode tables.

Reformat another table row

0f7d007

Add special2, special3 tables for mips64

733aa22

Add DINSx instruction decode, disassembly

51e7b78

Add DEXTx instruction decode, disassembly

592886a

Add DCLZ, DCLO instruction decode, disassembly

26eb1a7

Replace non-existent instruction DSLV with DSRLV

11cf2b1

Add DSBH, DSHD instruction decode, disassembly

6222d04

Fix use of upper 32 bits for JAL in MIPS64

3b3fea9

Fold m_enablePseudoOps into generic flags field

dabfe8c

This is to prepare for adding an option to support Cavium MIPS instructions.

Give MIPS64 LDL, LDR instructions

4e30759

These were in version 4 for some reason, but not version 6...

Add cavium instruction decode, disassembly

3284a11

Add LLIL for CNMIPS_BBITx instructions

27e1a5f

Add LLIL for DSUBx instructions

e7f52ca

Add non-intrinsic LLIL for cavium instructions

be9f0d3

Add LLIL for LLD instruction

11ba7cc

Add LLIL for 64-bit shifts

56e6ba7

Use only lower 32 bits of variable shift in LLIL

bd9b712

Add LLIL for LWU instruction

ec715fe

Fix LLIL for LHU instruction

ffe6259

Fix LLIL for EXT instruction

10c7829

This was using the wrong shift direction, and also an unintuitive order in the AND masking.

Add LLIL for DEXTx instructions

efe7018

Fix LLIL for INS instruction

08dd72b

This wasn't performing the shift at all, and was masking the wrong value for the register whose value to keep. The order of the ANDs was also unintuitive, starting with the constant.

Add LLIL for DINSx instructions

8795e0a

Add LLIL for 64-bit multiply instructions

f6c31c7

Add decode, disassembly for cavium-specific syncs

ab78cc2

Add LLIL for DDIVx instructions

f2169e3

Add LLIL for sync instructions

dbd6932

There are a bunch of these in the linux kernel I'm working with, and this helps cut down on the number of "unimplemented LLIL" to sift through.

Add LLIL for some miscellaneous instructions

05f7dac

noone added 16 commits June 14, 2024 06:24

Add cavium cop0 registers

4b14e22

Add LLIL for cavium multiplication instructions

ece7ee5

Add intrinsic LLIL for PREF, CACHE instructions

d650965

Add DMTC2/DMFC2 LLIL stub

114c018

This also sets the second immediate to 0 to clarify that cop2 only has one immediate field.

Add cavium cop2 register names

32faf6d

Use consistent brace-on-next-line format

daa69ca

Add decode, disassembly for DROTx instructions

ea8c195

Note that we don't have any code samples that use these instructions yet, but they're basically the same as their 32-bit counterparts in terms of disassembly.

Implement SLL's sign-extension special case

8448955

Stop sign-extending result of some instructions

796c4df

Bitwise instructions (AND, ANDI, NOR, OR, ORI, XOR, XORI...there may be others, they're not conveniently listed anywhere) don't sign-extend the result, so for 64-bit architectures the whole register gets used.

Rework MADD, MADDU LLIL

f175dc4

Previously, this didn't account for a carry bit from the add to LO going to the add of HI, and it's a bit clearer with register splitting anyways.

Avoid double sign-extend in SEB, SEH

0f72a6a

These just sign extend to the width of the register.

Add decode, disassembly for other cavium mults

c3d09cd

With the exception of the unaligned store/loads, this should be all of the cavium instructions. We don't have any code samples for VMM0 or VMULU, so we leave them unlifted for now.

Implement LLIL for pairs of LDL/LDR and SDL/SDR

887090a

Fix shift to get op of branch instructions

58fbc9c

Add BNEZ pseudo-opcode

4826a08

If BEQZ gets to exist, it only makes sense to make BNEZ too.

galenbwill self-assigned this Jun 16, 2024

galenbwill added the Arch: MIPS Issues with the MIPS architecture plugin label Jun 16, 2024

noone added 2 commits June 19, 2024 08:27

Add linux-mips64 platform

67bc05f

Move cnmips64 to its own architecture

fc0a0e3

This uses a way less hacky way to recognize whether or not to use the octeon MIPS architecture versus the vanilla MIPS architecture than a settings menu.

emesare self-requested a review July 8, 2024 15:57

emesare reviewed Jul 8, 2024

View reviewed changes

emesare closed this Jul 8, 2024

XVilka mentioned this pull request Jul 18, 2024

MIPS: updating capstone and RzIL rizinorg/rizin#4007

Open

7 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add support for Cavium Octeon MIPS instructions, and general MIPS fixes #5615

Add support for Cavium Octeon MIPS instructions, and general MIPS fixes #5615

cryptwhoa commented Jun 16, 2024

CLAassistant commented Jun 16, 2024

plafosse commented Jun 16, 2024

emesare Jul 8, 2024

emesare commented Jul 8, 2024

emesare commented Jul 8, 2024

rssor commented Jul 17, 2024

Add support for Cavium Octeon MIPS instructions, and general MIPS fixes #5615

Add support for Cavium Octeon MIPS instructions, and general MIPS fixes #5615

Conversation

cryptwhoa commented Jun 16, 2024

CLAassistant commented Jun 16, 2024

plafosse commented Jun 16, 2024

emesare Jul 8, 2024

Choose a reason for hiding this comment

emesare commented Jul 8, 2024

emesare commented Jul 8, 2024

rssor commented Jul 17, 2024