Getting the address of a varnode (aka instruction operand) #4606

Open
kkaempf opened this issue Sep 19, 2022 · 21 comments · May be fixed by #4681 or #4812

@kkaempf
Contributor

kkaempf commented Sep 19, 2022

(rephrased to better match sleigh terminology)

I'm working on a processor description for VAX and would need to get the address of an instruction operand.

VAX has one-byte opcodes followed by operands with variable (1 to 5 bytes) length.

Examples (not exact mnemonics)

  1. one-byte opcode, two one-byte operands

00000000: 90 01 50 - MOVE.B S^1, R0

  2. one-byte opcode, one two-byte operand, one four-byte operand

00000000: 90 CF 34 12 E0 78 56 34 12 - MOVE.B (PC+0x1234), (R0 + 0x12345678)

Example 2 is the problem. The first operand ("CF 34 12") is PC-relative: it computes PC+0x1234, where PC points right after the final "12" byte (address 0x04). In the example above, that results in 0x1238.
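
To make the arithmetic concrete, here is a minimal Python sketch (not part of the original post) that walks the bytes of example 2 by hand. It assumes the usual VAX operand-specifier layout (high nibble is the addressing mode, low nibble is the register, displacements are little-endian), places the instruction at address 0, and covers only the displacement modes that appear in this example. The helper name `decode_displacement_operand` is made up for illustration.

```python
# Hypothetical helper, for illustration only: decodes one VAX
# displacement-mode operand and reports where the operand ends.
DISP_SIZE = {0xA: 1, 0xC: 2, 0xE: 4}  # byte, word, longword displacement


def decode_displacement_operand(code: bytes, offset: int):
    """Return (register, displacement, offset_after_operand)."""
    spec = code[offset]
    mode, reg = spec >> 4, spec & 0xF
    size = DISP_SIZE[mode]  # KeyError for modes this sketch does not cover
    raw = code[offset + 1 : offset + 1 + size]
    disp = int.from_bytes(raw, "little", signed=True)
    return reg, disp, offset + 1 + size


code = bytes.fromhex("90CF3412E078563412")

# First operand starts right after the one-byte opcode.
reg1, disp1, end1 = decode_displacement_operand(code, 1)
# For a PC-relative operand, the base is the address right after the operand.
ea1 = end1 + disp1

# Second operand starts where the first one ended.
reg2, disp2, end2 = decode_displacement_operand(code, end1)

print(hex(ea1), hex(disp2), end2)
```

The point of the sketch is that `end1` (0x04 here) depends only on the first operand's own size, which is exactly the value that neither inst_start nor inst_next provides directly.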

Problem

To compute PC-relative offsets correctly, I need to know the operand's memory address. However, neither inst_start nor inst_next is usable here:

  • I can't use inst_start because the operand might be second and I don't know the size of the first operand.

  • I can't use inst_next because the operand might be first and I don't know the size of the second operand.

Are there any other options ?

@kkaempf kkaempf changed the title Getting the address of a varnode (aka instruction parameter) Getting the address of a varnode (aka instruction operand) Sep 20, 2022
kkaempf added a commit to kkaempf/ghidra-vax that referenced this issue Sep 20, 2022
Example for NationalSecurityAgency/ghidra#4606

It should compute the EA after the W^ offset has been read (PC at 0x04)

Signed-off-by: Klaus Kämpf <kkaempf@gmail.com>
@gtackett
Contributor

Wow! Implementing the VAX instruction set in Ghidra sounds like a very large task, certainly well beyond my level. But as a longtime VMS user and hobbyist, I'd sure love to see the end product.

(Likewise for the PDP-11, if anyone is interested in taking up that architecture.)

@kkaempf
Contributor Author

kkaempf commented Sep 23, 2022

Wow! Implementing the VAX instruction set in Ghidra sounds like a very large task, certainly well beyond my level. But as a longtime VMS user and hobbyist, I'd sure love to see the end product.

😊

(Likewise for the PDP-11, if anyone is interested in taking up that architecture.)

It's on my list (now that I have some basic understanding of sleigh 😆 )

@GhidorahRex
Collaborator

Since this is static within the instruction, I think context might work well here. Use something like op_addr = (0,3) noflow

For the opcode, use [ op_addr = 1; ]

Then for each operand, you know how big it is, so you can increment op_addr each time: [ rel_addr = inst_start+op_addr; op_addr = op_addr + bytes; ], with bytes being however many bytes are consumed by the current operand.

@kkaempf
Contributor Author

kkaempf commented Sep 23, 2022

Thanks, something similar was my initial attempt.

During parsing, op_addr was computed correctly, but it seems as if the final computation is done after the instruction is completely matched. So every operand came out with the same op_addr (the one set by the last operand).

However, I didn't specify noflow - need to check this out.

Thanks again ! Will report here 😉

kkaempf added a commit to kkaempf/ghidra-vintage that referenced this issue Sep 23, 2022
Signed-off-by: Klaus Kämpf <kkaempf@gmail.com>
@GhidorahRex
Collaborator

You may need to add a globalset as well? There are possibly some other shenanigans that need to be employed, but I'm pretty confident this can be done with just context.

@GhidorahRex
Collaborator

Were you able to get this to work?

@kkaempf
Contributor Author

kkaempf commented Oct 3, 2022

Sorry, I was out last week.

For the opcode, use [ op_addr = 1; ]

Not clear where to use this, as the opcode is a field (and I can't add disassembly actions to it, can I ? 🤔)

I tried setting the offset (aka op_addr) for each instruction in a branch. Resetting the op_addr for each instruction works that way, but makes disassembly like 10 times slower :-(

@kkaempf
Contributor Author

kkaempf commented Oct 3, 2022

For the opcode, use [ op_addr = 1; ]

Not clear where to use this, as the opcode is a field (and I can't add disassembly actions to it, can I ? 🤔)

Solved this with a non-visible operand

op_code: epsilon is epsilon [ op_addr = 1; ] { export epsilon; }

Works nicely, as the op_addr value gets reset when I add ..; op_code; .. to the bit pattern section.

However, when computing operands, every operand gets the final op_addr value (after all operands are parsed) instead of the value at the respective operand position.

@kkaempf kkaempf linked a pull request Oct 21, 2022 that will close this issue
@kkaempf
Contributor Author

kkaempf commented Oct 21, 2022

I now created a minified VAX processor description to visualize the problem better.

I use lifting-bits disassembler with this binary:

81 af 00 af 00 af 00

It should disassemble to

ADDB3 B^0x3, B^0x5, B^0x7

but doesn't.

Each operand disassembles to the same value. 😞
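
As a cross-check of the expected output, the following Python sketch (an illustration, not taken from the linked repository) computes the three targets by hand. It assumes the instruction sits at address 0 and that each `af` specifier means byte displacement off PC, so every operand's target is the address just past that operand plus its displacement. The function name `pc_relative_byte_operands` is hypothetical.

```python
def pc_relative_byte_operands(code: bytes, inst_start: int = 0):
    """Return the target of each B^ (byte displacement off PC) operand."""
    offset = 1  # skip the one-byte opcode
    targets = []
    while offset < len(code):
        spec = code[offset]
        if spec != 0xAF:
            raise ValueError("sketch only handles mode 0xA with register PC")
        disp = int.from_bytes(code[offset + 1 : offset + 2], "little", signed=True)
        offset += 2  # specifier byte plus displacement byte
        # PC-relative: base is the address right after this operand
        targets.append(inst_start + offset + disp)
    return targets


targets = pc_relative_byte_operands(bytes.fromhex("81af00af00af00"))
print([hex(t) for t in targets])
```

Each target differs because the base address advances by two bytes per operand, which is the per-operand position the processor description needs access to.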

@kkaempf
Contributor Author

kkaempf commented Oct 31, 2022

I've now tried all kinds of combinations of context, noflow, globalset etc. All give the same result: when exporting the result, I get the final value (after all operands have been processed) and not the intermediate ones.

This doesn't come as a surprise to me since ghidra has to process all operands twice. Once for computing inst_next and then again for computing the disassembled values (which might include inst_next).
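
The symptom can be modeled with a small Python sketch. This is only a toy model of the behaviour described in this thread, not Ghidra's actual implementation: it assumes a single shared counter that is fully advanced during a first pass and then read for every operand in a second pass. The names `late_bound` and `per_operand` are invented for the illustration, and the operand sizes match the three two-byte operands of the ADDB3 test case.

```python
# Three B^ operands: one specifier byte plus one displacement byte each.
OPERAND_SIZES = [2, 2, 2]


def late_bound(sizes):
    """Shared counter is advanced over all operands, then read for each."""
    op_addr = 1  # the opcode byte has been consumed
    for size in sizes:  # first pass: match operands, accumulate length
        op_addr += size
    # second pass: every operand reads the same, final counter value
    return [op_addr for _ in sizes]


def per_operand(sizes):
    """Each operand captures the counter at its own position."""
    op_addr = 1
    seen = []
    for size in sizes:
        op_addr += size
        seen.append(op_addr)  # value right after this operand
    return seen


print(late_bound(OPERAND_SIZES))
print(per_operand(OPERAND_SIZES))
```

In this model the late-bound variant hands every operand the same value, while the per-operand variant yields a distinct position for each one, which is what a PC-relative operand needs.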

@kkaempf
Contributor Author

kkaempf commented Oct 31, 2022

I've solved it now by introducing an operand_offset variable.

(Adding printfs to Ghidra pointed me to the right places, esp. showing that ParserWalker's value retrieval functions were called twice: once reading a 4-byte value to match against the disassembler spec, and once reading correctly sized values to compute the correct disassembly values.)

See f9a8788 for the C++ part and ecc24c7 for the Java part.

operand_offset is modeled like inst_start but with a different getValue() implementation:

inst_start has

Address addr = walker.getAddr();
return addr.getAddressableWordOffset();

operand_offset has

return walker.getOffset(-1);

This works nicely and fixes the issue at hand.

@kkaempf kkaempf linked a pull request Dec 11, 2022 that will close this issue
@kkaempf
Contributor Author

kkaempf commented Mar 21, 2023

Will #4812 be considered now ? 🥺

@ryanmkurtz ryanmkurtz reopened this Mar 21, 2023
@ryanmkurtz ryanmkurtz removed the Type: Question Further information is requested label Mar 21, 2023
@ryanmkurtz
Collaborator

Sorry, I didn't realize this was tied to those PRs.

@jbglaw

jbglaw commented Jun 27, 2023

I found this issue because I searched for "VAX". I'm interested in this! However, I don't yet have any clue about Ghidra or Java. Is there any way to help with VAX support?

Just to add a comment: The VAX ISA does have one-byte and two-byte opcodes. So relying on them as one-byte long would be wrong. And then there's a ton of addressing modes. Plus the oddity of the CASE* instructions. Alas... I really would love to help here. It would be great to have something that helps to dissect machine ROMs or system binaries.

@kkaempf
Contributor Author

kkaempf commented Jun 28, 2023

Hey @jbglaw , Ghidra VAX support is (mostly) done - except for #4812 😞 .

If you want to build from source, check out the vintage branch at https://github.com/kkaempf/ghidra-vintage

I'm also maintaining RPM packages for openSUSE Tumbleweed.

@kkaempf
Contributor Author

kkaempf commented Jun 28, 2023

I found this issue because I searched for "VAX". I'm interested in this! However, I don't yet have any clue about Ghidra or Java. Is there any way to help with VAX support?

Please check out and contribute to https://github.com/kkaempf/ghidra-vax 😉

Just to add a comment: The VAX ISA does have one-byte and two-byte opcodes. So relying on them as one-byte long would be wrong. And then there's a ton of addressing modes. Plus the oddity of the CASE* instructions.

This all should be working in ghidra.vax

Alas... I really would love to help here. It would be great to have something that helps to dissect machine ROMs

I'm already working on ROMs and I'd be happy to collaborate on http://ghidra-server.org/

@jbglaw

jbglaw commented Jun 28, 2023

Well, I just requested an account on ghidra-server.org. Let's see.

OTOH, as I'm a 100% newbie to Ghidra, my first step should be to get it running. Source builds seem to be not too trivial with Debian as it's missing build tools (at least in the requested version.) And then there's that one outstanding patch. Are there chances those will be merged? At least it doesn't look as if it would break anything else.

@kkaempf
Contributor Author

kkaempf commented Jun 28, 2023

Well, I just requested an account on ghidra-server.org. Let's see.

🤞🏻

OTOH, as I'm a 100% newbie to Ghidra, my first step should be to get it running. Source builds seem to be not too trivial with Debian as it's missing build tools (at least in the requested version.)

If you're not afraid of downloading binaries (like gradle) on your machine, building should be as simple as

gradle \
  -Dfile.encoding=UTF-8 \
  --project-prop finalRelease=true \
  buildNatives_linux64

This will give you a .tar file which you can extract locally and start ghidra from there.

And then there's that one outstanding patch. Are there chances those will be merged?

Ghidra (the project) is generally slow in merging outside contributions :-/

At least it doesn't look as if it would break anything else.

Certainly not. It's just exposing a value that is already tracked internally.

@GhidorahRex GhidorahRex assigned caheckman and unassigned GhidorahRex Jun 28, 2023
@jbglaw

jbglaw commented Jun 28, 2023

gradle is the issue here. But I think I'll give it a try in a Docker container. Maybe wrap a script around it to have a nice recipe for getting the final tarball.

@jbglaw

jbglaw commented Jun 28, 2023

So let's hope that this other PR is merged, and thereafter maybe the VAX CPU description. I'll try to get it working locally. :)

@jbglaw

jbglaw commented Jun 29, 2023

Successfully built Ghidra (plain upstream sources, though with buildGhidra instead of buildNatives_linux64). The resulting ZIP file contains a working Ghidra. Next step is to pull in your patch and the VAX CPU description.
