
Standardizing memcpy/memset/etc. as pcode operations? #5769

Open
nneonneo opened this issue Sep 11, 2023 · 4 comments
Assignees
Labels
Feature: PCODE Status: Internal This is being tracked internally by the Ghidra team Type: Enhancement New feature or request

Comments

@nneonneo
Contributor

Many processors have bulk memory instructions: x86 has "rep movs/stos" and "repe cmps", AArch64 has "CPY*/SET*", WebAssembly has "memory.copy/memory.fill", etc. Some also have string operations, for example "repne scas" (strlen) in x86.

There are two implementation strategies for such operations. One is to create a small loop, which will be translated into a loop in the decompiler; x86 takes this approach, for example. The downside is that the decompiled output becomes harder to read. The second is to use a custom pcodeop, which is much more readable but then requires custom emulation and is opaque to the decompiler. In particular, when memcpying to the stack, not all effects will be visible to the decompiler.
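To make the trade-off concrete, here is a minimal Python model (purely illustrative, not actual pcode) of the two strategies for something like `rep movsb`: a per-byte loop whose every load and store the decompiler can see, versus an opaque helper standing in for a custom pcodeop whose memory effects are invisible to it.

```python
# Toy model of the two lowering strategies for x86 "rep movsb".
# All names are hypothetical; memory is modeled as a flat bytearray.

def rep_movsb_as_loop(mem, dst, src, count):
    # Strategy 1: expand into a per-byte loop. Every access is an
    # ordinary LOAD/STORE, so all effects are visible to analysis,
    # but the decompiled output is a loop.
    while count > 0:
        mem[dst] = mem[src]
        dst, src, count = dst + 1, src + 1, count - 1

def rep_movsb_as_pcodeop(mem, dst, src, count):
    # Strategy 2: a single opaque operation. Readable in output,
    # but the decompiler cannot model which bytes were touched.
    mem[dst:dst + count] = mem[src:src + count]

mem = bytearray(b"hello...")
rep_movsb_as_loop(mem, 5, 0, 3)
print(bytes(mem))  # b'hellohel'
```

Both produce the same final memory state; the difference is entirely in what the decompiler can reason about along the way.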

Describe the solution you'd like

I propose to have a dedicated set of core pcodeops to implement bulk memory operations. These would be understood by SLEIGH and by the decompiler, allowing us to have much nicer decompilation for such constructs.

As an additional benefit, a dedicated pcodeop could be the target for a memory operation fusing pass in the decompiler, which recognizes common memory operation idioms (e.g. copying in a loop, or unrolled assignment to consecutive memory addresses) and replaces them with a single bulk memory op. This would enable further simplifications and optimizations. The decompiler could then properly model the effects of bulk memory operations.
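Such a fusing pass could be sketched (illustratively only, over a toy store list rather than the decompiler's actual IR) as a scan that merges runs of constant stores to consecutive addresses into one hypothetical fill op:

```python
# Illustrative idiom-fusing pass over a toy store list (not Ghidra's IR).
# Merges unrolled stores of the same constant byte to consecutive
# addresses into a single hypothetical "fill" operation.

def fuse_fills(stores):
    fused, i = [], 0
    while i < len(stores):
        addr, value = stores[i]
        count = 1
        # Extend the run while the next store continues the pattern.
        while (i + count < len(stores)
               and stores[i + count] == (addr + count, value)):
            count += 1
        if count > 1:
            fused.append(("fill", addr, value, count))
        else:
            fused.append(("store", addr, value))
        i += count
    return fused

# Four unrolled zero-stores at 0x100..0x103, then one lone store.
ops = [(0x100, 0), (0x101, 0), (0x102, 0), (0x103, 0), (0x200, 7)]
print(fuse_fills(ops))
# [('fill', 256, 0, 4), ('store', 512, 7)]
```

A real pass would additionally have to prove the stores are not aliased or reordered by intervening operations, which is exactly the kind of reasoning the decompiler's dataflow already supports.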

I would suggest, at minimum, to have four bulk operations:

  • copy(dst, src, unit_size, count): copy from src to dst, unit_size bytes at a time, a total of count times
  • fill(dst, value, count): fill dst with count copies of value; the unit size is taken from the size of the value varnode
  • read_effect(src, size): a no-op that indicates that the memory region [src, src+size) is read from. Used to support decompilation of custom pcodeops.
  • write_effect(dst, size): a no-op that indicates to the decompiler that the memory region [dst, dst+size) is written to. Used to support decompilation of custom pcodeops.
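As a rough statement of the intended semantics, the four ops above might behave as in this Python sketch (nothing here is Ghidra API; memory is a flat bytearray, and the two `*_effect` ops are no-ops that merely record the range they declare):

```python
# Hypothetical reference semantics for the proposed bulk ops.

def pcode_copy(mem, dst, src, unit_size, count):
    # Copy unit_size bytes at a time, count times, from src to dst.
    for i in range(count):
        off = i * unit_size
        mem[dst + off : dst + off + unit_size] = \
            mem[src + off : src + off + unit_size]

def pcode_fill(mem, dst, value, count):
    # Unit size is taken from the size of the value varnode
    # (modeled here as the length of the value bytes).
    mem[dst : dst + count * len(value)] = value * count

effects = []

def read_effect(src, size):
    effects.append(("read", src, size))    # no-op: declares a read

def write_effect(dst, size):
    effects.append(("write", dst, size))   # no-op: declares a write

mem = bytearray(16)
pcode_fill(mem, 0, b"\xab", 4)      # bytes 0..3  <- 0xab
pcode_copy(mem, 8, 0, 2, 2)         # bytes 8..11 <- bytes 0..3
print(mem.hex())  # abababab00000000abababab00000000
```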

The latter two are added so that the decompiler can precisely model the memory effects of custom pcodeops.

Other operations are possible, e.g. memcmp, strlen, strcpy, but they are more situational and may be harder to implement, and thus are not part of this proposal.

Describe alternatives you've considered

As noted above, alternative implementation approaches include an inline loop or a custom pcodeop, but both have downsides.

Additional context
Add any other context or screenshots about the feature request here.

@slippycheeze

FWIW, I've been trying to work out what I'd like to do about some of the ... challenges, shall we say, that the MSVC compiler's "intrinsic" pragma poses in understanding decompiled code. The code in question is compiled with /Oi, so uses the "intrinsic" forms of memcpy et al. Assorted notes related to this request and/or general problem space follow.

These are NOT "bug report" or "feature request" quality observations yet, but I think they /may/ add some value to your request. I fully expect incremental progress toward all that lot being "solved" too. :)

  1. the (MSVC, at least) optimizer can see the internals of the unrolled function, and use them
    • in several cases it:
      1. copied N-16 bytes of a structure using XMM0, and
      2. copied one qword in the middle using XMM2, and
      3. went ahead to use one dword float in XMM2 after the intrinsic memcpy was done.
  2. the decompiler will sometimes recognise and translate memcpy from the loop, which is great.
  3. the Split combined * decompiler analysis options break this pretty much unconditionally.
  4. assigning a type to any part of the target often, but not always, breaks that recognition.
    • this isn't just Split combined ... getting in the way, and
    • it isn't related to Alias Blocking, but
    • I still haven't narrowed it down to a reproducible case, even with places where I can assign a type to a single local (eg: undefined4 → float) and see it break memcpy recognition.
  5. I suspect a SLEIGH extension might be able to do this for me, but I can't figure out how.
  6. the limitation of one DataType per stack slot really seems to cripple this recognition, probably because of whatever makes assigning a type to a variable or structure break it.

My expectations and/or desires around this:

  1. intrinsic memcpy etc. should take priority over Split combined stores; the fact it is a combined write is a feature, not a bug.
  2. in-place comments for "weird side-effects" would be great, like the content leak above
  3. seeing memcpy in place of ten lines of XMM register load/store is great, but
  4. the intrinsic versions should be identifiable; I suggest __intrinsic_memcpy or __builtin_memcpy — the latter matching the GCC name, but anything distinguishing is fine.

@nneonneo
Contributor Author

Re "the decompiler will sometimes recognise and translate memcpy from the loop, which is great.": I don't think I've ever seen this happen! IDA will often do this, but I wasn't aware that Ghidra had the same functionality. Would you happen to have a sample that demonstrates this? I'd be really curious to see how that works :)

@astrelsky
Contributor

> Re "the decompiler will sometimes recognise and translate memcpy from the loop, which is great.": I don't think I've ever seen this happen! IDA will often do this, but I wasn't aware that Ghidra had the same functionality. Would you happen to have a sample that demonstrates this? I'd be really curious to see how that works :)

It does not.

The limitation of one Data type per stack slot cripples a lot of things. It would be nice to have variables that aren't at function scope (let's just declare 8000 variables at the start of the function before writing our code like in the 80s)...

@ryanmkurtz ryanmkurtz added Type: Enhancement New feature or request Feature: PCODE Status: Triage Information is being gathered labels Sep 15, 2023
@ryanmkurtz ryanmkurtz assigned caheckman and unassigned emteere Sep 18, 2023
@ryanmkurtz ryanmkurtz added Status: Internal This is being tracked internally by the Ghidra team and removed Status: Triage Information is being gathered labels Sep 18, 2023
@ryanmkurtz
Collaborator

Relates to #4461


6 participants