
relocations: Implement removal & events #5010

Closed


boricj commented Feb 19, 2023

As part of #4922, I need the ability to add/modify/remove relocations from the relocation table. Just like in #4938, I don't know what's eligible or not for upstreaming, so I went with the smallest set of changes I can get away with (namely, removing one relocation in the table). I've also wired up relocation events so that the relocation table panel display is refreshed and added the ability to remove relocations from the clear plugin.
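To make the proposal concrete, here is a minimal, self-contained Java sketch of a relocation table that supports removal and notifies listeners, which a display panel could use to refresh itself. It does not use Ghidra's actual relocation table or event APIs. `MutableRelocationTableSketch`, `Relocation` and `RelocationEvent` are hypothetical names, and locking, undo and persistence are deliberately left out.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.function.Consumer;

// Hypothetical model for illustration only; NOT Ghidra's relocation table API.
public class MutableRelocationTableSketch {

    /** One relocation: where it applies, its type, and the bytes it overwrote. */
    public record Relocation(long address, int type, byte[] originalBytes) {}

    /** Event kinds a UI panel could listen for to refresh its display. */
    public enum EventKind { ADDED, REMOVED }

    public record RelocationEvent(EventKind kind, Relocation relocation) {}

    // Several relocations may share an address, so each address maps to a list.
    private final NavigableMap<Long, List<Relocation>> byAddress = new TreeMap<>();
    private final List<Consumer<RelocationEvent>> listeners = new ArrayList<>();

    public void addListener(Consumer<RelocationEvent> listener) {
        listeners.add(listener);
    }

    public void add(Relocation r) {
        byAddress.computeIfAbsent(r.address(), a -> new ArrayList<>()).add(r);
        fire(new RelocationEvent(EventKind.ADDED, r));
    }

    /**
     * Removes one specific relocation and fires a REMOVED event.
     * Records compare byte[] components by reference, so this matches the same instance.
     * Returns false if the relocation was not present.
     */
    public boolean remove(Relocation r) {
        List<Relocation> atAddress = byAddress.get(r.address());
        if (atAddress == null || !atAddress.remove(r)) {
            return false;
        }
        if (atAddress.isEmpty()) {
            byAddress.remove(r.address());
        }
        fire(new RelocationEvent(EventKind.REMOVED, r));
        return true;
    }

    /** Read-only view of the relocations at one address. */
    public List<Relocation> at(long address) {
        return Collections.unmodifiableList(byAddress.getOrDefault(address, List.of()));
    }

    private void fire(RelocationEvent e) {
        for (Consumer<RelocationEvent> l : listeners) {
            l.accept(e);
        }
    }
}
```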

While this works for my own purposes, I haven't really worked on the GUI/event side of Ghidra before, so I probably got things wrong in there. Furthermore, given that the only source of relocations currently available within Ghidra is the initial import, maybe the Edit > Clear With Options... dialog shouldn't default to clearing out relocations?

It's fairly obvious the current core relocation table code in Ghidra was only designed to display the initial set of relocations from the imported files; it's not really geared for on-the-fly modifications or integrated with the rest of the program model. Maybe my wacky idea of resynthesizing relocations to unlink programs back into relocatable object files (https://github.com/boricj/ghidra/tree/feature/elfrelocatebleobjectexporter) warrants an overhaul of the relocation table design to better fit my own use-case, but I'm not going down that rabbit hole just to make my prototype work.

ryanmkurtz added the "Status: Triage" label (Information is being gathered) Feb 19, 2023

ghidra1 (Collaborator) commented Feb 21, 2023

Relocations had always been intended to occur during initial import and to be immutable, which made such relocation events unnecessary. In addition, there is currently no merge and conflict handling, which can be problematic for multi-user environments. I have already added support for the "add relocation" event in a branch that is in review, although this was done primarily to facilitate testing of new relocation support in script form. It is intended that all relocation processing be incorporated into a Loader at this time. In addition, we do not currently intend to support complex exports (e.g., ELF, PE) other than via the original file bytes export for simple patching, which relies on restoring the original FileBytes where relocations have been applied. Likewise, our header classes are intended to facilitate parsing only, not the build-up or modification of such headers (e.g., ElfHeader). Our simplistic header classes and processor-specific extensions are not well suited for the general case of header creation/modification such as a compiler/linker would produce. For similar reasons, PR #4938 is also unlikely to be accepted.

@ryanmkurtz ryanmkurtz added Reason: Won't support This will not be supported by the Ghidra team. and removed Status: Triage Information is being gathered labels Feb 22, 2023
@ryanmkurtz ryanmkurtz closed this Feb 22, 2023
@boricj
Copy link
Author

boricj commented Aug 6, 2023

Relocations had always been intended to occur during initial import and to be immutable, which made such relocation events unnecessary. In addition, there is currently no merge and conflict handling, which can be problematic for multi-user environments. I have already added support for the "add relocation" event in a branch that is in review, although this was done primarily to facilitate testing of new relocation support in script form. It is intended that all relocation processing be incorporated into a Loader at this time.

@ghidra1 @ryanmkurtz I'd like to appeal that decision if possible.

It took me a long time, but I've finally publicly documented a reverse-engineering workflow that allows one to unlink a program inside Ghidra and spit out relocatable object files: parts 7 to 9 of my series of articles about reverse-engineering on my blog. I've made a point to demonstrate it with vanilla Ghidra 10.2 to allow easy reproduction.

Simply put, this technique requires unapplying relocations to restore the original object file section bytes while exporting the object file: part 8 demonstrates this after abusing Ghidra into linking a complete program in a manner that preserves the required relocation table data; part 9 shows how to reconstruct that data when dealing with a "normal" executable generated with a standard linker. In real life, I've industrialized this process using custom analyzers and exporters on top of a modified Ghidra (to the point where spitting out working ELF object files from a Ghidra database requires just a couple of clicks and no special knowledge or skills), but the same basic principles still apply.
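A minimal sketch of unapplying a single relocation, assuming little-endian MIPS and REL-style relocations (the addend lives in the relocated word itself, as in O32 ELF object files). For `R_MIPS_32` the linker stores `S + A`, so subtracting the resolved symbol address `S` recovers the addend `A` that sat in the object file's section bytes. The class and method names are hypothetical, and how `S` is obtained (from the relocation table or from analysis) is out of scope here.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper for illustration; assumes little-endian MIPS O32, REL relocations.
public class UnapplyMips32Sketch {

    /**
     * Reverts one R_MIPS_32 relocation in place.
     *
     * @param image         section bytes as they appear in the linked executable
     * @param offset        offset of the relocated 32-bit word within image
     * @param symbolAddress S, the address the linker resolved the symbol to
     * @return the recovered addend A that the object file stored in that word
     */
    public static int unapplyMips32(byte[] image, int offset, int symbolAddress) {
        ByteBuffer buf = ByteBuffer.wrap(image).order(ByteOrder.LITTLE_ENDIAN);
        int linked = buf.getInt(offset);          // S + A, as written by the linker
        int addend = linked - symbolAddress;      // wraps mod 2^32, like the linker's sum
        buf.putInt(offset, addend);               // restore the object file's in-place addend
        return addend;
    }
}
```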

The problem is that the Ghidra relocation table is append-only: it is not possible to modify or delete entries. This makes this approach on a vanilla Ghidra borderline impossible in practice, because any mistake or out-of-sync information in that table simply can't be corrected. If I maintain a separate, private data structure for this, then there would be essentially two relocation table implementations inside Ghidra for mostly the same information. That's not composable with the rest of Ghidra's framework and that would be a lot of duplicated code (for example, that data still needs to be displayed to the user for auditing purposes, which would basically be a copy-pasting of the existing relocation table UI), which is why I believe that the existing relocation table should be improved to handle this use-case.

I've merely demonstrated basic patching by swapping out a function in these articles, but the real potential of this technique is nearly limitless. For example, using my Ghidra fork I have successfully unlinked the archive code of the PlayStation game Tenchu: Stealth Assassins released in 1998, which is basically an executable in a proprietary a.out format, into an ELF relocatable object, then wrote a utility in C that links with it and glibc to make a program that extracts files from the game's proprietary archive data format, without fully reverse-engineering the game's archive code or file format itself. The utility does have to be compiled and linked as a Linux MIPS little-endian ELF executable for this to work, but I can just use QEMU's user-mode emulation to run it on my computer. In other words, as long as you match compatible instruction sets, can express the relocations in the output's file format and don't slice across a symbol (and thunk mismatched calling conventions if necessary), you can sew together a working chimera from whatever parts you want.
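The "don't slice across a symbol" constraint can be checked mechanically before exporting a slice. This is a minimal sketch assuming each symbol's extent `[start, start + size)` is already known from analysis; `SliceCheckSketch` and `SymbolExtent` are hypothetical names, not part of Ghidra.

```java
import java.util.List;

// Hypothetical check for illustration: does a slice boundary split a symbol?
public class SliceCheckSketch {

    /** A symbol's extent in memory: [start, start + size). */
    public record SymbolExtent(String name, long start, long size) {
        long end() {
            return start + size;
        }
    }

    /** Returns the symbols a slice [sliceStart, sliceEnd) would cut through; empty means the slice is clean. */
    public static List<SymbolExtent> symbolsCutBy(long sliceStart, long sliceEnd, List<SymbolExtent> symbols) {
        return symbols.stream()
            .filter(s -> cuts(sliceStart, s) || cuts(sliceEnd, s))
            .toList();
    }

    /** True if the boundary falls strictly inside the symbol, i.e. the symbol would be split. */
    private static boolean cuts(long boundary, SymbolExtent s) {
        return s.start() < boundary && boundary < s.end();
    }
}
```

A boundary that coincides exactly with a symbol's start or end is fine; only boundaries strictly inside an extent are reported.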

I'm aware this raises some hard questions about a data structure that was never meant for this kind of (ab)use. I'm not asking for Ghidra to include this unlinking functionality out-of-the-box (although that would be very dope), but I do want at the very least to make this possible with external plugins on top of a vanilla Ghidra instance without duplicating the functionality of the relocation table, hence my appeal to revisit that topic. Hopefully with this wall of text and my homework done this time, I've presented a case convincing enough to merit further discussion.

I'll let you decide if this warrants opening an issue or a discussion for extra visibility. For what it's worth, I have anecdotal evidence that this highly uncommon technique does exist and is used in the wild by some reverse-engineers besides me (namely, a comment on Hacker News and some private emails). While I didn't invent it, I think the unlinking technique is both practical and very powerful; it should be a tool that any reverse-engineer has access to rather than a confidential skill mastered by a few, and Ghidra can have a role in democratizing it.

Note: this is not about improving Ghidra's ELF support code to make it suitable for producing artifacts. I'd rather leverage that code in my exporter if I can, but I can write all of that code myself and spit my ELF object file exporter out as an external plugin if I have to.

ghidra1 (Collaborator) commented Aug 7, 2023

Recent improvements were made to the relocation table to better track what was actually applied. However, the current implementation assumes relocations are only manipulated during import: it uses no exclusive lock, and nothing would reconcile the fallout (e.g., instruction/data/reference repair) if we were to support removal/reversal of applied relocations.

The Original Binary Exporter will apply changes made to the program while also reverting applied relocations (i.e., unlinking). This exporter is intended for simple patching cases only and would not handle patching of instructions where relocations were applied or the addition of memory blocks.

ryanmkurtz (Collaborator) commented

I'm confused about why you need to alter the relocation table. Aren't you ultimately exporting object files to disk? Why do you need to modify the original program database to do that? Our Original File exporter uses the relocation table to get the original bytes, and then writes the appropriate values to disk during the export. Why is there a need to do anything more than something similar to that?
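The export-time approach described above can be sketched roughly as follows, assuming each applied relocation records the file offset it touched and the bytes that were there before it was applied. This is not Ghidra's exporter code; `AppliedRelocation` and `revert` are hypothetical, and mapping memory addresses back to file offsets is assumed to have been done already.

```java
import java.util.List;

// Hypothetical sketch of "revert applied relocations while exporting".
public class RevertAppliedRelocationsSketch {

    /** File offset of an applied relocation and the bytes present before it was applied. */
    public record AppliedRelocation(int fileOffset, byte[] originalBytes) {}

    /**
     * Produces an export image: the current (possibly patched) bytes, with every
     * applied relocation's original bytes written back over the relocated values.
     */
    public static byte[] revert(byte[] currentBytes, List<AppliedRelocation> relocations) {
        byte[] out = currentBytes.clone(); // work on a copy; the program's own bytes stay untouched
        for (AppliedRelocation r : relocations) {
            System.arraycopy(r.originalBytes(), 0, out, r.fileOffset(), r.originalBytes().length);
        }
        return out;
    }
}
```

Note that the program database is never modified here; everything happens on a copy at export time.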

boricj (Author) commented Aug 7, 2023

I'm confused about why you need to alter the relocation table. Aren't you ultimately exporting object files to disk? Why do you need to modify the original program database to do that? Our Original File exporter uses the relocation table to get the original bytes, and then writes the appropriate values to disk during the export. Why is there a need to do anything more than something similar to that?

Because I'm not actually exporting the original bytes (as in, whatever Ghidra saw when importing the executable that I'm analyzing), but rather what were the original bytes (as in, the section bytes of the object files that the linker used to generate the executable that I'm analyzing). This is the cornerstone of the unlinking technique: by reconstructing the relocations that were present in the original object files, we can restore the original section bytes of the object files (and synthesize working object files by adding the symbol table and relocation table). I'm using Ghidra's relocation table to store the reconstructed relocations because it seems like a good fit for it.
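As an example of reconstructing the original section bytes, here is a sketch for a MIPS `R_MIPS_HI16`/`R_MIPS_LO16` pair (e.g. `lui` followed by `addiu`), assuming REL-style O32 semantics, exactly one HI16 paired with one LO16, and a symbol address `S` already identified by analysis. The linker splits `S + AHL` across both immediates, with the high half compensating for the sign-extended low half; unapplying recomputes `AHL` and splits it the same way. The names are hypothetical.

```java
// Hypothetical sketch: revert a linked HI16/LO16 pair to the object file's addend.
public class UnapplyMipsHiLoSketch {

    /** The two instruction words with their immediates reverted. */
    public record HiLoPair(int hiInsn, int loInsn) {}

    public static HiLoPair unapplyHiLo(int hiInsn, int loInsn, int symbolAddress) {
        int hi = hiInsn & 0xFFFF;                  // immediate of the LUI
        short lo = (short) (loInsn & 0xFFFF);      // LO16 immediates are sign-extended by the CPU
        int linkedValue = (hi << 16) + lo;         // S + AHL, as computed by the linker
        int addend = linkedValue - symbolAddress;  // AHL, the object file's combined addend
        int origHi = ((addend + 0x8000) >>> 16) & 0xFFFF; // +0x8000 compensates for the sign-extended low half
        int origLo = addend & 0xFFFF;
        return new HiLoPair((hiInsn & 0xFFFF0000) | origHi, (loInsn & 0xFFFF0000) | origLo);
    }
}
```

For instance, with `S = 0x80018000`, `lui $v0, 0x8002` / `addiu $v0, $v0, 0x8010` reverts to `lui $v0, 0x0` / `addiu $v0, $v0, 0x10`, i.e. an addend of `0x10`.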

One way of visualizing this is what I did in part 7: I trick Ghidra into linking an executable, but in a manner that populates the relocation table with the relocations of the object files. What's inside the Ghidra database is a real executable (which I demonstrate by writing some ELF file structure metadata to encapsulate the raw memory blocks from Ghidra and then execute it), but the contents of the relocation table allow me to restore the section bytes of the object files when I synthesize an object file by hand in part 8. Of course, in real life, standard linkers discard the relocations from the object files once they are applied during the creation of an executable, but they can be reconstructed through analysis as shown in part 9.

Note that I'm not recreating the original object files. Rather, I'm synthesizing object files, so the way I slice up the executable may or may not match what the original object files were. In fact, we can't know for sure what the initial pieces were if we just have the final executable to analyze, but that doesn't actually matter as long as we generate valid, working object files at the end.

But what I'm really doing is even trickier, because for my reverse-engineering project I'm synthesizing ELF object files from what is essentially an a.out executable. The original ECOFF (?) toolchain from Sony's PlayStation 1 SDK used to create this executable never could've generated the ELF object files that I'm exporting. However, as long as I can express the original relocations within the ELF formalism, it does work out (by luck the ABIs also happen to match, but thunks could have dealt with mismatched ABIs if needed).
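"Expressing the original relocations within the ELF formalism" ultimately means emitting ELF relocation entries. A minimal sketch, assuming a hypothetical format-neutral `Kind` for each reconstructed relocation and symbol indices already assigned elsewhere: it maps each kind to its MIPS ELF type number and serializes little-endian `Elf32_Rel` entries (`r_offset`, `r_info`), where `r_info` is built as `ELF32_R_INFO(sym, type) = (sym << 8) + (type & 0xff)`.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.List;

// Hypothetical encoder for illustration; only the ELF constants and layout are standard.
public class ElfRelEncoderSketch {
    // Relocation type numbers from the MIPS ELF psABI.
    static final int R_MIPS_32 = 2, R_MIPS_26 = 4, R_MIPS_HI16 = 5, R_MIPS_LO16 = 6;

    /** Hypothetical, format-neutral description of a reconstructed relocation. */
    public enum Kind { WORD32, JUMP26, HI16, LO16 }

    public record Reloc(int sectionOffset, int symbolIndex, Kind kind) {}

    static int elfType(Kind k) {
        return switch (k) {
            case WORD32 -> R_MIPS_32;
            case JUMP26 -> R_MIPS_26;
            case HI16 -> R_MIPS_HI16;
            case LO16 -> R_MIPS_LO16;
        };
    }

    /** Serializes relocations as little-endian Elf32_Rel entries, 8 bytes each. */
    public static byte[] encode(List<Reloc> relocs) {
        ByteBuffer buf = ByteBuffer.allocate(8 * relocs.size()).order(ByteOrder.LITTLE_ENDIAN);
        for (Reloc r : relocs) {
            buf.putInt(r.sectionOffset());                                   // r_offset
            buf.putInt((r.symbolIndex() << 8) | (elfType(r.kind()) & 0xFF)); // r_info = ELF32_R_INFO(sym, type)
        }
        return buf.array();
    }
}
```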

Now, the reason I'm doing all of that is that I want to decompile Tenchu: Stealth Assassins. Rather than dealing with multiple ~500 KiB executables in one go, I wanted to divide-and-conquer the problem, which led me to this Ship of Theseus-style approach. I do have results in private (the bit about the game's archive code in my last comment), but this unlinking stuff has been quite the side-quest.

I understand that this whole thing is both very tricky to understand and very unusual; I'm probably not that great of an explainer either. Please feel free to ask questions if there's stuff that's not clear about any of this.

ryanmkurtz (Collaborator) commented

Thank you for the explanation. I think you've already answered this, but I'll ask again to be sure. Do you want the relocation table to be modifiable simply to avoid having to make another map of Java objects that represent your new relocations? You talk about not wanting to duplicate what the relocation table is already doing, but what it is doing is somewhat trivial...just storing some addresses and metadata.
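For comparison, the "another map of Java objects" alternative could look roughly like this: a sketch of an extension-private store, assuming at most one synthesized relocation per address. `SideRelocationStoreSketch` and its record fields are hypothetical. It covers in-memory storage and range queries only; persistence across sessions and display for auditing would still need separate work.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical extension-private store; assumes one relocation per address.
public class SideRelocationStoreSketch {

    /** Metadata an extension might keep per synthesized relocation. */
    public record SynthesizedRelocation(long address, String type, String symbol, long addend) {}

    private final NavigableMap<Long, SynthesizedRelocation> byAddress = new TreeMap<>();

    /** Adds a relocation, replacing any existing one at the same address. */
    public void put(SynthesizedRelocation r) {
        byAddress.put(r.address(), r);
    }

    public void remove(long address) {
        byAddress.remove(address);
    }

    /** Relocations whose address lies in [start, end), e.g. one object file's slice. */
    public List<SynthesizedRelocation> inRange(long start, long end) {
        return new ArrayList<>(byAddress.subMap(start, true, end, false).values());
    }
}
```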

boricj (Author) commented Aug 7, 2023

Recent improvements were made to the relocation table to better track what was actually applied. However, the current implementation assumes relocations are only manipulated during import: it uses no exclusive lock, and nothing would reconcile the fallout (e.g., instruction/data/reference repair) if we were to support removal/reversal of applied relocations.

If I deal with the locking issues and the rest, is a mutable relocation table something that the Ghidra team would be interested in merging? I'm trying to gauge if there's a way to make everyone happy here or if I should just keep using my fork of Ghidra on my own. I didn't insist before because I figured I'd need to do my homework first by publicly documenting what I was doing (took me much longer to do that than I anticipated).

(I wasn't actually requesting to merge this PR as-is, but rather aiming to restart this conversation about a mutable relocation table.)

The Original Binary Exporter will apply changes made to the program while also reverting applied relocations (i.e., unlinking). This exporter is intended for simple patching cases only and would not handle patching of instructions where relocations were applied or the addition of memory blocks.

Hmm... If we want the Original Binary Exporter to work correctly after modifying the relocation table, then we need to keep the original relocations no matter what. We'll need a new relocation status to denote a relocation that came after the initial program loading and modify the Original Binary Exporter to ignore those. My own tooling could work with this with a bunch of flags to ignore/override load-time relocations as needed.

Maybe we'll also need a new column to store the relocation's file type (ELF, Mach-O...). I'm actually reconstructing ELF-style relocations on what is essentially an a.out executable, but it's statically-linked so I don't have any load-time relocations. But I can see two ways of ending up with mixed relocation file types in the relocation table:

  1. Loading a relocatable file (which would generate load-time relocations) and then synthesizing relocations for another file format like I do;
  2. Synthesizing relocations for different file formats at the same time, if I want for example to export both Mach-O and ELF object files for some reason.
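The two proposed additions (a status marking relocations added after initial loading, and a column for the relocation's file format) can be sketched together. This is a hypothetical model, not Ghidra's relocation status API: `Origin`, `Format` and `Entry` are illustrative names. An original-binary-style export would consult only load-time entries, while an object-file exporter filters by format and can opt in to load-time entries via a flag.

```java
import java.util.List;

// Hypothetical model of the proposed status and file-format columns.
public class RelocationOriginSketch {

    /** Distinguishes relocations applied by the loader from ones added afterwards. */
    public enum Origin { LOAD_TIME, SYNTHESIZED }

    /** The file format whose relocation semantics the entry follows. */
    public enum Format { ELF, MACH_O, PE, A_OUT }

    public record Entry(long address, Origin origin, Format format, byte[] originalBytes) {}

    /** An original-binary-style export only reverts relocations applied at load time. */
    public static List<Entry> forOriginalFileExport(List<Entry> table) {
        return table.stream().filter(e -> e.origin() == Origin.LOAD_TIME).toList();
    }

    /** An object-file exporter picks entries of its format, optionally including load-time ones. */
    public static List<Entry> forObjectExport(List<Entry> table, Format format, boolean includeLoadTime) {
        return table.stream()
            .filter(e -> e.format() == format)
            .filter(e -> includeLoadTime || e.origin() == Origin.SYNTHESIZED)
            .toList();
    }
}
```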

This is probably not something that we have to worry about right now though. Also, opening a new discussion would probably be a better place to debate all of this than reusing a six month old closed PR.

This exporter is intended for simple patching cases only and would not handle ... the addition of memory blocks.

Note that I've added memory blocks in part 8 for demonstration purposes only. In my real workflow, my ELF object file exporter doesn't actually modify the program, I synthesize everything I need on-the-fly based on the contents of the program's database.

boricj (Author) commented Aug 7, 2023

Thank you for the explanation. I think you've already answered this, but I'll ask again to be sure. Do you want the relocation table to be modifiable simply to avoid having to make another map of Java objects that represent your new relocations? You talk about not wanting to duplicate what the relocation table is already doing, but what it is doing is somewhat trivial...just storing some addresses and metadata.

There are several reasons why I think using the relocation table to store that data is the best option:

  • If I import an ELF object file, then I can actually start exporting ELF object files directly without any intermediate steps. The relocation table already contains all the data I need, so this tells me it's the right place to store it.
  • I'm reconstructing relocations and their original bytes, which is what the relocation table already stores. It's just that these relocations are not directly for the program itself but rather for the object files one might export at a later date.
  • I want to be able to audit the relocations I recreate, so I need a way to display them. Adding a Window > Relocation Table for Object Files menu item next to the Window > Relocation Table for a table with the same columns seems silly.
  • I'd like these relocations to persist when I close Ghidra and I didn't want to duplicate the whole relocation table code stack just to store and manipulate them like relocations.
  • If it's not stored in the relocation table, then it's not composable with any plugins or code snippets that use the relocation table. If it's stored somewhere private, then it's not composable with anything besides my own stuff.

I might be wrong because I've been eyeballing the whole thing since the beginning (I didn't exactly plan all of this ahead of time) and I'd like to be proven wrong, but I haven't yet found a compelling reason not to put that data in the relocation table (besides the fact that it's currently not built for mutability after initial loading, but that can be solved with improvements to it).

ryanmkurtz (Collaborator) commented Aug 7, 2023

I'm reconstructing relocations and their original bytes, which is what the relocation table already stores. It's just that these relocations are not directly for the program itself but rather for the object files one might export at a later date.

To me, it seems wrong to store relocations that aren't for the current program in the current program's relocation table. I feel like a separate table should be created and managed by the 3rd party extension that is providing the export capability.

boricj (Author) commented Aug 8, 2023

I'm reconstructing relocations and their original bytes, which is what the relocation table already stores. It's just that these relocations are not directly for the program itself but rather for the object files one might export at a later date.

To me, it seems wrong to store relocations that aren't for the current program in the current program's relocation table. I feel like a separate table should be created and managed by the 3rd party extension that is providing the export capability.

After giving this some thought, now I fear that properly fleshing out that whole unlinking stuff to its logical conclusion, so that we can get to the bottom of this argument, would involve falling down a very, very deep rabbit hole.

I've already spent an absurd amount of time making this hack work with statically linked PS-EXE (and ELF) executables to ELF object files, just for the 32-bit little-endian MIPS architecture and O32 ABI; generalizing this hack into something industrial-grade (an unlinking theory and implementation suitable for arbitrary file formats, instruction sets and ABIs) to get the model right inside Ghidra's framework is starting to sound more and more like a full-blown thesis.

Instead, I'll take the path of least resistance and stuff my bespoke data somewhere private, because I no longer have any idea what the relocation table (or the whole model for that matter) should look like in that generalized context. Thank you for your insights, I'll make sure to create a show-and-tell discussion if there are new developments on that front.
