
Update RISCV-64 sleigh files to support vector, bit manipulation, and crypto extensions #5778



Open
wants to merge 85 commits into
base: master
from


thixotropist
Contributor

Add several RISCV instruction set extensions to Ghidra, following discussion #5744. This pull request tracks the tip of the binutils testsuite for the vector, bit manipulation, and crypto instructions. You can verify the content with the sample binaries in https://github.com/thixotropist/ghidra_import_tests: take the RISCV-64 gas test suite, assemble it to binary, import the result into Ghidra, then iterate on the Ghidra sleigh files until Ghidra and objdump give essentially the same disassembled output.
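Part of that comparison loop can be automated. As a minimal sketch, assuming one instruction (mnemonic plus operands) has already been extracted per line from both the objdump listing and a Ghidra listing export, the hypothetical helper `same_insn()` below treats two lines as equal when they differ only in case and whitespace. It deliberately does not try to reconcile real differences such as pseudo-instruction aliases or address formatting; those are exactly the mismatches worth inspecting by hand.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: normalize one disassembled instruction so cosmetic
 * differences disappear. Case is folded, runs of whitespace collapse to one
 * space, leading/trailing whitespace is dropped, and whitespace next to a
 * comma is removed. Output is truncated to fit outsz (including the NUL). */
static void normalize_insn(const char *in, char *out, size_t outsz)
{
    size_t n = 0;
    bool pending_space = false;

    if (outsz == 0)
        return;
    for (; *in != '\0' && n + 1 < outsz; in++) {
        unsigned char c = (unsigned char)*in;
        if (isspace(c)) {
            /* Remember the gap, unless it directly follows a comma. */
            if (n > 0 && out[n - 1] != ',')
                pending_space = true;
            continue;
        }
        if (pending_space && c != ',') {      /* drop spaces before commas */
            if (n + 2 >= outsz)
                break;                        /* no room for space + char */
            out[n++] = ' ';
        }
        pending_space = false;
        out[n++] = (char)tolower(c);
    }
    out[n] = '\0';
}

/* Hypothetical helper: true if two listing lines match after normalization. */
static bool same_insn(const char *a, const char *b)
{
    char na[128], nb[128];
    normalize_insn(a, na, sizeof na);
    normalize_insn(b, nb, sizeof nb);
    return strcmp(na, nb) == 0;
}
```

For example, an objdump-style `vsetvli\ta0,a0,e32,m1,tu,mu` compares equal to `VSETVLI a0, a0, e32, m1, tu, mu`, while `add a0,a1,a2` and `add a0,a1,a3` do not.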

The sleigh files do not yet include pcode semantics. Recent updates to GCC-14 and libssl that use the RISCV vector and crypto extensions may give us sample binaries to work with, showing which pcode semantics actually add value for complex instructions like these.

@jobermayr
Contributor

To fix build errors:

diff --git a/Ghidra/Processors/RISCV/certification.manifest b/Ghidra/Processors/RISCV/certification.manifest
index 569138783..b498068db 100644
--- a/Ghidra/Processors/RISCV/certification.manifest
+++ b/Ghidra/Processors/RISCV/certification.manifest
@@ -40,6 +40,9 @@ data/languages/riscv.rvc.sinc||GHIDRA||||END|
 data/languages/riscv.rvv.sinc||GHIDRA||||END|
 data/languages/riscv.table.sinc||GHIDRA||||END|
 data/languages/riscv.zi.sinc||GHIDRA||||END|
+data/languages/riscv.zvbb.sinc||GHIDRA||||END|
+data/languages/riscv.zvkng.sinc||GHIDRA||||END|
+data/languages/riscv.zvksg.sinc||GHIDRA||||END|
 data/languages/riscv32-fp.cspec||GHIDRA||||END|
 data/languages/riscv32.cspec||GHIDRA||||END|
 data/languages/riscv32.dwarf||GHIDRA||||END|

@thixotropist
Contributor Author

I expect to fill in some gaps in this PR shortly. Scalar crypto extensions were skipped even though vector crypto extensions were added; openssl can use the RISCV scalar crypto AES extension instructions but not (yet?) the vector crypto extensions. I also hope to add minimalist pcode semantics to allow decompilation of the simplest GCC-14 RISCV vector intrinsic function examples, as used in rvv_memcpy, rvv_strncpy, rvv_matmul, and rvv_reduce.
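Intrinsic examples like rvv_memcpy are strip-mined loops built around vsetvli, so "minimalist pcode" mostly means modeling how vl is chosen and how pointers advance by vl. The plain-C model below is a sketch of that control flow, not of any particular compiler's output. It assumes VLMAX = LMUL × VLEN / SEW with integer LMUL only, and assumes the implementation picks vl = min(AVL, VLMAX); the RVV 1.0 spec also permits other choices when AVL lies between VLMAX and 2 × VLMAX. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* VLMAX = LMUL * VLEN / SEW; this sketch handles integer LMUL only. */
static size_t vlmax(size_t vlen_bits, size_t sew_bits, size_t lmul)
{
    return lmul * vlen_bits / sew_bits;
}

/* Assumed vsetvli policy: vl = min(AVL, VLMAX). */
static size_t vsetvl(size_t avl, size_t max_elems)
{
    return avl < max_elems ? avl : max_elems;
}

/* Byte copy with the control flow of a strip-mined RVV memcpy:
 * vsetvli (e8, m1), vle8.v, vse8.v, then advance both pointers by vl. */
static void *rvv_style_memcpy(void *dst, const void *src, size_t n,
                              size_t vlen_bits)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    const size_t max = vlmax(vlen_bits, 8, 1);

    while (n > 0) {
        size_t vl = vsetvl(n, max);          /* vsetvli with e8, m1 */
        for (size_t i = 0; i < vl; i++)      /* stands in for vle8.v + vse8.v */
            d[i] = s[i];
        d += vl;
        s += vl;
        n -= vl;
    }
    return dst;
}
```

With VLEN = 128, a 21-byte copy runs as one 16-element strip followed by a 5-element strip.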

Ghidra developers will have some serious design questions to thrash out when GCC-14 autovectorization support lands some time next year.


# Thead semi's extensions currently recognized by binutils objdump
# and documented in https://github.com/T-head-Semi/thead-extension-spec/releases/download/2.0.0/xthead-2022-09-05-2.0.0.pdf
@include "riscv.xthead.sinc"
Contributor


This should be guarded by an @ifdef, with the define in a new xthead slaspec file.

Contributor Author

@thixotropist thixotropist Oct 27, 2023


That's reasonable. Does the new xthead slaspec file get named in riscv.ldefs so the user can invoke it, or do you suggest we generalize riscv.opinion to look for the Tag_RISCV_arch ELF attribute, recognize the current composition of extensions, and set finer-grain inclusion tags?

Tag_RISCV_arch: "rv64i2p1_m2p0_a2p1_f2p2_d2p2_v1p0_zicsr2p0_zifencei2p0_zmmul1p0_zve32f1p0_zve32x1p0_zve64d1p0_zve64f1p0_zve64x1p0_zvl128b1p0_zvl32b1p0_zvl64b1p0_xtheadfmemidx1p0"

Apparently binutils 2.41 concatenates all of the -march extensions passed to gcc, so the tail end of this attribute, xtheadfmemidx1p0, reads something like:

"This binary requires an ISA supporting:"

  • x - a vendor-specific Instruction Set Architecture module not currently part of a proposed standard profile
  • thead - the lower case vendor name publishing the extension set
  • fmemidx - the extension set module name
  • 1p0 - the version of this extension set, likely 1.0
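The decomposition above is mechanical enough to sketch. The C below is a minimal, hypothetical parser, assuming Tag_RISCV_arch is an underscore-separated list in which every token ends in a `<major>p<minor>` version, as in the example string above. The ISA naming conventions also allow omitted or major-only versions, and this sketch rejects those. It only flags vendor extensions by their leading `x`; splitting `thead` from `fmemidx` is not attempted, since there is no delimiter and it would need a table of known vendor names.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* One extension entry from Tag_RISCV_arch, e.g. "xtheadfmemidx1p0". */
struct riscv_ext {
    char name[64];   /* e.g. "xtheadfmemidx" */
    int major;       /* e.g. 1 */
    int minor;       /* e.g. 0 */
};

/* Convert n ASCII digits starting at s to an int. */
static int digits_to_int(const char *s, size_t n)
{
    int v = 0;
    for (size_t i = 0; i < n; i++)
        v = v * 10 + (s[i] - '0');
    return v;
}

/* Split one token of length len into its name and <major>p<minor> suffix.
 * Returns 0 on success, -1 if the suffix is missing or the name too long. */
static int parse_ext_token(const char *tok, size_t len, struct riscv_ext *out)
{
    size_t end = len;

    while (end > 0 && isdigit((unsigned char)tok[end - 1]))   /* minor digits */
        end--;
    size_t minor_start = end;
    if (minor_start == len || end == 0 || tok[end - 1] != 'p')
        return -1;
    size_t p_pos = end - 1;
    end = p_pos;
    while (end > 0 && isdigit((unsigned char)tok[end - 1]))   /* major digits */
        end--;
    size_t major_start = end;
    if (major_start == p_pos || major_start == 0 ||
        major_start >= sizeof out->name)
        return -1;

    memcpy(out->name, tok, major_start);
    out->name[major_start] = '\0';
    out->major = digits_to_int(tok + major_start, p_pos - major_start);
    out->minor = digits_to_int(tok + minor_start, len - minor_start);
    return 0;
}

/* Parse a whole Tag_RISCV_arch value into at most max entries.
 * Returns the number of entries, or -1 on a malformed token or overflow. */
static int parse_riscv_arch(const char *arch, struct riscv_ext *exts, int max)
{
    int count = 0;

    while (*arch != '\0') {
        size_t len = strcspn(arch, "_");
        if (count == max || parse_ext_token(arch, len, &exts[count]) != 0)
            return -1;
        count++;
        arch += len;
        if (*arch == '_')
            arch++;
    }
    return count;
}

/* Names starting with 'x' are vendor (non-standard) extensions. */
static int is_vendor_ext(const struct riscv_ext *e)
{
    return e->name[0] == 'x';
}
```

Applied to the attribute quoted above, this yields 18 entries, from `rv64i` version 2.1 through `xtheadfmemidx` version 1.0, with only the last flagged as a vendor extension.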

Composing standard and vendor extension profiles raises many Ghidra import design questions, apparently ones the binutils team has already addressed. Do their answers work for Ghidra?

Update: These design decisions may lead to significant refactoring or code bloat.

  1. Does every known RISCV-based CPU get its own slaspec file and 5 MB sla file in the Ghidra distribution?
    • even this doesn't always work with chips containing heterogeneous cores
  2. Should Ghidra scan user-specific directories for additional compiled sla and slaspec files for any RISCV CPU extension combinations individuals find useful?
  3. Should the build-time generation of slaspec files move to run-time actions that generate temporary sla files after parsing the import's ELF attributes?
  4. Should a new set of run-time ifdef statements be allowed in build-time sla files to enable specific extensions at run time?

Personal opinion:

  • ratified extensions with non-conflicting opcode codepoints should continue to be included in the baseline 32 bit and 64 bit slaspec and sla files
  • Ghidra does not want to be responsible for recognizing proprietary extensions, but these will surely exist. Searching user directories for slaspec and sla 'plugins' should be enabled, just as Java and Python user directories are.
  • The baseline ELF importer may be extended to expose ELF attributes on imports, such as the Tag_RISCV_arch file attribute containing aggregate extensions used in each compilation unit.

@thixotropist
Contributor Author

THead extensions are now collected into separate slaspec files, which are referenced in riscv.ldefs. I've only tested the 64 bit version. Thanks to @mumbel for the suggestion.

  • each of the 10 THead extensions is guarded by its own versioned ifdef flag derived from the name embedded in Tag_RISCV_arch. That should give a bit more granular control.
  • riscv.opinion remains unchanged. The presence of ISA extensions is identified in ELF by appending to Tag_RISCV_arch, rather than setting a bit in a fixed ELF header bitfield e_flags.
  • users will only see the THead language option on import or change-language if they show all RISCV languages, not just the recommended language based on e_machine and e_flags

@madushan1000

@thixotropist I stubbed out some RISC-V packed SIMD instructions and implemented some THead instructions here to fix some issues I had with my work; feel free to cherry-pick the commits if you want.

@thixotropist
Contributor Author

@madushan1000: Those look good; I'll be happy to cherry-pick them into the branch. Do you have any suggestions for RISCV integration tests to add to https://github.com/thixotropist/ghidra_import_tests? It's currently very weak on 32 bit and microcontroller exemplars, as I've been leaning towards Linux-capable 64 bit examples.

@madushan1000

madushan1000 commented Nov 4, 2023

This SDK I'm working with has a bunch of rv32 examples: https://github.com/bouffalolab/bouffalo_sdk/tree/master/examples.
The BL602 and BL70x have SiFive E24 cores, and the BL61x and BL808 have various T-Head cores. Each example has a small README with build instructions, and they all generate .elf files.

@gamelaster

I just tried the latest version of thixotropist:isa_ext on the BL808 BootROM, which runs on an E907 (rv32 T-Head) core, and so far everything looks okay. I still need to play with it more, but so far it has been enough for everything I needed. Thanks everyone for this effort; I hope it will be possible to merge it at some point.

@GhidorahRex
Collaborator

This is a very large PR that has had some very active periods of development. I've been waiting to make sure that it was complete and stable enough before reviewing it. Looking through it now though, it looks like it should be ready for a review.

@thixotropist
Contributor Author

Thanks for the comments - I'd like to see this merged too. The PR may still be in triage because it implicitly makes a lot of design decisions regarding pcode op typing and emulation, as well as ISA extension handling. The developers may need more discussion - public and internal - before they are willing to go down that path.

The current state of the PR is stable. There are some newer RISCV ISA extensions for fractional floating point ops and saturating math - I don't plan on adding these to the existing PR, so it can be reviewed as is.

As a discussion example, what does the Ghidra community want to see in the decompiler window when working with functions like:

// compiled for RISCV with -march=rv64gcv, -O3, and -ffast-math
void test_1_ref(unsigned long long *in, unsigned long long *out, unsigned int size)
{
    int i;
    int upper_index = size - 1;
    for (i=0; i < size; i++) {
        out[i] = in[upper_index - i];
    }
}

SIMD or vector extensions can turn simple loops over structures into something not so simple. Type inference, compilable C extraction, and emulation in general are all abandoned with the design approach used in this PR.
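To make the question concrete without claiming what GCC-14 actually emits for test_1_ref: any vector-length-agnostic version has to strip-mine the loop and build an index vector for the reversed reads. The plain-C emulation below shows one plausible shape (vsetvli, vid.v, vrsub.vx, a shift from element index to byte offset since RVV indexed loads take byte offsets, an indexed load, then a unit-stride store), next to a scalar reference. SKETCH_VLMAX and both function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical VLMAX for 64-bit elements (e.g. VLEN=128 with LMUL=1). */
#define SKETCH_VLMAX 2

/* Scalar reference with the same effect as test_1_ref. */
static void reverse_ref(const unsigned long long *in, unsigned long long *out,
                        unsigned int size)
{
    for (unsigned int i = 0; i < size; i++)
        out[i] = in[size - 1 - i];
}

/* Plain-C emulation of one plausible strip-mined vector shape of the loop. */
static void reverse_stripmined(const unsigned long long *in,
                               unsigned long long *out, unsigned int size)
{
    const unsigned char *base = (const unsigned char *)in;
    size_t i = 0;

    while (i < size) {
        size_t remaining = size - i;
        size_t vl = remaining < SKETCH_VLMAX ? remaining : SKETCH_VLMAX; /* vsetvli */
        size_t offs[SKETCH_VLMAX];
        unsigned long long v[SKETCH_VLMAX];

        for (size_t lane = 0; lane < vl; lane++)   /* vid.v: 0, 1, 2, ... */
            offs[lane] = lane;
        for (size_t lane = 0; lane < vl; lane++)   /* vrsub.vx: (size-1-i) - lane */
            offs[lane] = (size - 1 - i) - offs[lane];
        for (size_t lane = 0; lane < vl; lane++)   /* index -> byte offset (vsll.vi by 3 on RV64) */
            offs[lane] *= sizeof *in;
        for (size_t lane = 0; lane < vl; lane++)   /* indexed (gather) load */
            v[lane] = *(const unsigned long long *)(base + offs[lane]);
        for (size_t lane = 0; lane < vl; lane++)   /* unit-stride store */
            out[i + lane] = v[lane];
        i += vl;
    }
}
```

Even this simplified form has five inner "instructions" per strip where the source had one assignment, which is the gap the decompiler output would need to bridge.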

@thixotropist
Contributor Author

Is there anything we can do to help @GhidorahRex with this review? For instance

  • Documentation
  • Test cases
  • Design notes and tradeoff analyses

GhidorahRex added the Status: Prioritize label and removed the Status: Triage label on Feb 7, 2025
The associated semantics are only suggestive of the action taken by these instructions
@jobermayr
Contributor

frs1D and frs2D are only defined under an FPSIZE guard, so the instructions that use them need the same guard. To keep #6390 building:

diff --git a/Ghidra/Processors/RISCV/data/languages/riscv.zfa.sinc b/Ghidra/Processors/RISCV/data/languages/riscv.zfa.sinc
index 6f5921e565..490a96806c 100644
--- a/Ghidra/Processors/RISCV/data/languages/riscv.zfa.sinc
+++ b/Ghidra/Processors/RISCV/data/languages/riscv.zfa.sinc
@@ -786,6 +786,7 @@ define pcodeop fmax_m;
     frd=fmax_m(frs1S,frs2S);
 }
 
+@if ((FPSIZE == "64") || (FPSIZE == "128"))
 :fminm.d frd,frs1D,frs2D is frs1D & frd & frs2D & op0001=0x3 & op0204=0x4 & op0506=0x2 & funct3=0x2 & funct7=0x15
 {
     frd=fmin_m(frs1D,frs2D);
@@ -795,6 +796,7 @@ define pcodeop fmax_m;
 {
     frd=fmax_m(frs1D,frs2D);
 }
+@endif
 
 :fminm.h frd,frs1S,frs2S is frs1S & frd & frs2S & op0001=0x3 & op0204=0x4 & op0506=0x2 & funct3=0x2 & funct7=0x16
 {
@@ -806,6 +808,7 @@ define pcodeop fmax_m;
     frd=fmax_m(frs1S,frs2S);
 }
 
+@if ((FPSIZE == "64") || (FPSIZE == "128"))
 :fminm.q frd,frs1D,frs2D is frs1D & frd & frs2D & op0001=0x3 & op0204=0x4 & op0506=0x2 & funct3=0x2 & funct7=0x17
 {
     frd=fmin_m(frs1D,frs2D);
@@ -876,6 +879,7 @@ define pcodeop fmax_m;
     local tmp:$(XLEN) = trunc(frs1D);
     rd = sext(tmp);
 }
+@endif
 
 # like the FLE.* and FLT.* instructions, except that quiet NaN inputs do not cause the invalid operation exception flag to be set
 :fleq.s rd,frs1S,frs2S is frs2S & frs1S & rd & op0001=0x3 & op0204=0x4 & op0506=0x2 & funct3=0x4 & funct7=0x50
@@ -888,10 +892,12 @@ define pcodeop fmax_m;
         rd = zext(frs1S f== frs2S);
 }
 
+@if ((FPSIZE == "64") || (FPSIZE == "128"))
 :fleq.d rd,frs1D,frs2D is frs2D & frs1D & rd & op0001=0x3 & op0204=0x4 & op0506=0x2 & funct3=0x4 & funct7=0x51
 {
         rd = zext(frs1D f<= frs2D);
 }
+@endif
 
 :fltq.d rd,frs1S,frs2S is frs2S & frs1S & rd & op0001=0x3 & op0204=0x4 & op0506=0x2 & funct3=0x5 & funct7=0x51
 {

@thixotropist
Contributor Author

@jobermayr is correct. I've added guards as suggested, with additional guards sensitive to quad floating point support. I'll commit these changes shortly. In general, I've used single-precision registers for half-precision FP ops and double-precision registers for quad-precision FP ops as interim semantics.
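The fminm/fmaxm instructions in the diff above delegate to the opaque pcodeops fmin_m and fmax_m. As a reference for what those ops are meant to compute, here is a C sketch of the double-precision behavior as the Zfa specification describes it: fminm/fmaxm act like fmin/fmax (with -0.0 ordered below +0.0) except that any NaN input yields the canonical NaN, and fleq is a quiet <=. The function names are hypothetical, C's NAN macro stands in for the RISC-V canonical NaN, and C's islessequal() is used because, unlike a plain <=, it does not raise the invalid exception for quiet NaNs.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>

/* fminm.d: like fmin.d, except any NaN input yields the canonical NaN. */
static double fminm_d(double a, double b)
{
    if (isnan(a) || isnan(b))
        return NAN;                  /* stands in for 0x7ff8000000000000 */
    if (a == 0.0 && b == 0.0)        /* -0.0 orders below +0.0 */
        return signbit(a) ? a : b;
    return a < b ? a : b;
}

/* fmaxm.d: like fmax.d, except any NaN input yields the canonical NaN. */
static double fmaxm_d(double a, double b)
{
    if (isnan(a) || isnan(b))
        return NAN;
    if (a == 0.0 && b == 0.0)        /* +0.0 orders above -0.0 */
        return signbit(a) ? b : a;
    return a > b ? a : b;
}

/* fleq.d: quiet a <= b; false for any NaN input. */
static bool fleq_d(double a, double b)
{
    return islessequal(a, b);
}
```

The difference from plain fmin.d shows up only for NaN inputs: fmin.d returns the non-NaN operand, while fminm.d returns NaN.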

This raises more general questions:

  • What set of RISCV language definitions and test object data should we be using in testing? I'm concentrating on RISCV-64 processors with vector 1.0 and RVA23 support. I'm happy to test against other platforms if we have a consensus.
    • test cases in the style of ghidra/Ghidra/Features/Decompiler/src/decompile/datatests/*.xml would be nice
  • What's a better approach to half-precision FP semantics? There are RISCV extensions defined for both IEEE and bfloat types of half-precision numbers, and no current support for either within the Ghidra decompiler or sleigh builtins.
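One reason a single "half-precision" type is not enough: the two 16-bit formats lay out their bits differently, so the same bit pattern means different values. IEEE binary16 has 1 sign bit, 5 exponent bits (bias 15), and 10 fraction bits, while bfloat16 is simply the upper half of an IEEE binary32. The minimal C sketch below widens each to binary32; the helper names are hypothetical, and no rounding is needed because both conversions are exact.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* bfloat16 is the upper 16 bits of a binary32, so widening is a shift. */
static float bf16_to_float(uint16_t h)
{
    uint32_t bits = (uint32_t)h << 16;
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* IEEE binary16 -> binary32: rebias the exponent and widen the fraction. */
static float fp16_to_float(uint16_t h)
{
    uint32_t sign = (uint32_t)(h >> 15) << 31;
    uint32_t exp = (h >> 10) & 0x1f;
    uint32_t frac = h & 0x3ff;
    uint32_t bits;
    float f;

    if (exp == 0) {
        /* Zero or subnormal: value is frac * 2^-24, exact in binary32. */
        f = (float)frac * 0x1p-24f;
        return sign ? -f : f;
    }
    if (exp == 31)
        bits = sign | 0x7f800000u | (frac << 13);           /* infinity or NaN */
    else
        bits = sign | ((exp + 112) << 23) | (frac << 13);   /* bias 15 -> 127 */
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

For example, 0x3C00 decodes to 1.0 as binary16 but to 0.0078125 as bfloat16.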

Labels: Feature: Processor/RISC-V, Status: Prioritize

8 participants