
Feature: UNIX A.out Loader #5004

Merged
ryanmkurtz merged 1 commit into NationalSecurityAgency:master from colinbourassa:unix-aout-loader
Mar 20, 2025

@colinbourassa
Contributor

This is a feature addition -- it adds a Loader for the old-style UNIX A.out format, which was used by BSD, SunOS, VxWorks, and other UNIX derivatives. This Loader was tested with a number of different A.out files across several architectures. However, there are many possible combinations, so future work may be required to address specific combinations of processor, executable type, and OS that require a different file offset (or load address) for .text/.data.

This would close issue #4943.

@ryanmkurtz
Collaborator

Thanks! Would it be possible for you to attach some sample binaries to aid in our testing?

ryanmkurtz self-assigned this Feb 18, 2023
ryanmkurtz added the Feature: Loader and Status: Triage labels Feb 18, 2023
@colinbourassa
Contributor Author

@ryanmkurtz - Certainly. The attached archive contains three different A.out examples:

  • NetBSD executable for i386
  • NetBSD executable for SPARC
  • VxWorks object file for MC68020
    • This one is dynamically loaded by executable code in a VxWorks monolith, and therefore depends on a number of external symbols that will not be found when it is loaded in Ghidra. The Loader generates log entries notifying the user of this.

aout-test-files.zip

@ryanmkurtz
Collaborator

Thanks!

@nightlark

nightlark commented Mar 29, 2023

The C3413 (Green CPU) firmware from https://www.sage-rtu.com/downloads.html is another example of a file in the a.out format (little-endian 32-bit for x86/VxWorks) -- the loader in this PR does not currently recognize it as an a.out file.

@colinbourassa
Contributor Author

@nightlark - Thanks very much for the suggestions. That all looks very reasonable, so I've incorporated it in a new commit for this PR. During testing of the C3413 Green CPU firmware, I also found that my loader was not generating a .bss block when the A.out indicates a nonzero .bss size. This has also been resolved.

@jobermayr
Contributor

diff --git a/Ghidra/Features/Base/src/main/java/ghidra/app/util/opinion/UnixAoutLoader.java b/Ghidra/Features/Base/src/main/java/ghidra/app/util/opinion/UnixAoutLoader.java
index 41ff1cc7a..c165ebe2e 100644
--- a/Ghidra/Features/Base/src/main/java/ghidra/app/util/opinion/UnixAoutLoader.java
+++ b/Ghidra/Features/Base/src/main/java/ghidra/app/util/opinion/UnixAoutLoader.java
@@ -25,6 +25,7 @@ import ghidra.app.util.bin.ByteProvider;
 import ghidra.app.util.bin.format.aout.UnixAoutHeader.ExecutableType;
 import ghidra.app.util.importer.MessageLog;
 import ghidra.app.util.opinion.AbstractProgramWrapperLoader;
+import ghidra.app.util.opinion.Loader;
 import ghidra.app.util.opinion.LoadSpec;
 import ghidra.framework.store.LockException;
 import ghidra.program.flatapi.FlatProgramAPI;

Fixes:

> Task :createJavadocs
/tmp/ghidra/Ghidra/Features/Base/src/main/java/ghidra/app/util/opinion/UnixAoutLoader.java:50: warning: Tag @link: reference not found: Loader
 * A {@link Loader} for processing UNIX-style A.out executables
     ^
/tmp/ghidra/Ghidra/Features/Base/src/main/java/ghidra/app/util/opinion/UnixAoutLoader.java:50: warning: Tag @link: reference not found: Loader
 * A {@link Loader} for processing UNIX-style A.out executables
     ^
/tmp/ghidra/Ghidra/Features/Base/src/main/java/ghidra/app/util/opinion/UnixAoutLoader.java:50: warning: Tag @link: reference not found: Loader
 * A {@link Loader} for processing UNIX-style A.out executables
     ^
/tmp/ghidra/Ghidra/Features/Base/src/main/java/ghidra/app/util/opinion/UnixAoutLoader.java:50: warning: Tag @link: reference not found: Loader
 * A {@link Loader} for processing UNIX-style A.out executables
     ^
4 warnings

@nightlark

nightlark commented Apr 18, 2023

The .bss fix seems to be causing an issue with Ghidra identifying some symbols that are located in that section. Specifically, it seems that two overlapping .bss sections are now being created (one read-only and the other marked read+write). I guess this is confusing Ghidra, so it no longer shows the symbol names in the decompiler/disassembly views, whereas it did before the .bss change.

@colinbourassa
Contributor Author

I think I see the problem. Previously, I was creating the .bss section simply based on the total size of all the symbols that were marked N_UNDF in the symbol table, as this is apparently an indication that they should be given a location in .bss at the next available address. I neglected to change that logic when I added the fixed-size .bss allocation, so that explains why multiple overlapping sections are being created.

It seems that we probably need a hybrid of these approaches: .bss may need to be created after summing the sizes of the assigned symbols in that section, with that size being added to the .bss size given in the header. The dynamically allocated symbols (N_UNDF) would then be placed toward the end of the section (after the fixed size given in the header) while the .bss symbols that are given explicit addresses can be labeled at those addresses.

@nightlark

Does the .bss segment size defined in the file include enough "empty" space that the N_UNDF symbols might fit in it? I also noticed that space in .bss is only reserved if n_value is nonzero, but I didn't see any checks for the size when creating the label for entries in possibleBssSymbols.

Given how N_UNDF seems to be used for linking to external symbols, it would be interesting if eventually other libraries could be loaded and those symbols could be linked correctly.

@colinbourassa
Contributor Author

colinbourassa commented Apr 20, 2023

With this latest commit, the intended concept for handling .bss is as follows:

  • Read the fixed .bss size from the file's header, but don't create .bss yet.
  • Walk through the symbol table, using hash tables to keep track of symbols that have either of these flags:
    • N_BSS: a symbol that will be in the .bss section
    • N_UNDF: a symbol that may be in the .bss section (only if it does not appear in the relocation table as a result of being supplied by an external library)
  • Compute a total size for all the N_UNDF symbols and add this to the .bss size given in the header. This is the maximum size that .bss would need to be to accommodate everything.
  • Create .bss as an uninitialized block with this newly computed size.
  • Walk through the list of N_BSS symbols that we created earlier, and place the label for each one at the given address in .bss.
  • Create a "next available address" variable that will keep track of the first unallocated location in .bss beyond the initial fixed-size block. (This variable will be initialized with the .bss size from the header.)
  • While walking through the relocation tables, use .bss space for any N_UNDF symbol that was not found in any global or local symbol table (i.e. any symbol that was not provided by another binary that was previously loaded.) Increment the "next available address" as we go.

I believe this approach will allow any combination of .bss symbols that are given an explicit location, and .bss symbols that need to be dynamically allocated to the first available space in .bss.
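The .bss plan in the list above could be sketched roughly as follows. This is a minimal, hypothetical Java illustration, not the PR's actual code: `AoutSymbol`, `BssPlanner`, and the `resolvedElsewhere` set (standing in for symbols supplied by previously loaded binaries) are invented names, and dynamically placed symbols are assumed to be addressed relative to a known .bss start address. The sketch also skips stab entries (any of the 0xe0 bits set in n_type), since their n_value is not a size.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical, minimal view of one a.out symbol-table entry.
record AoutSymbol(String name, int type, long value) {}

// Hypothetical sketch of the .bss plan; not the PR's actual classes.
final class BssPlanner {
    static final int N_UNDF = 0x00;
    static final int N_BSS = 0x08;
    static final int N_TYPE = 0x1e; // type bits, ignoring N_EXT
    static final int N_STAB = 0xe0; // any of these bits => debugger-only entry

    private final long bssStart;
    private final long headerBssSize;
    private final Map<String, Long> fixedLabels = new LinkedHashMap<>();    // N_BSS: name -> address
    private final Map<String, Long> candidateSizes = new LinkedHashMap<>(); // N_UNDF: name -> size
    private final Map<String, Long> allocated = new LinkedHashMap<>();
    private long nextFreeOffset; // first unallocated offset past the header's fixed-size region

    BssPlanner(long bssStart, long headerBssSize, List<AoutSymbol> symbols) {
        this.bssStart = bssStart;
        this.headerBssSize = headerBssSize;
        this.nextFreeOffset = headerBssSize; // starts at the .bss size from the header
        for (AoutSymbol s : symbols) {
            if ((s.type() & N_STAB) != 0) {
                continue; // stab entry: its n_value is not a size
            }
            int type = s.type() & N_TYPE;
            if (type == N_BSS) {
                fixedLabels.put(s.name(), s.value());
            } else if (type == N_UNDF && s.value() != 0) {
                candidateSizes.put(s.name(), s.value()); // nonzero n_value = requested size
            }
        }
    }

    // Upper bound for the .bss block: header size plus every candidate's size.
    long maxBssSize() {
        long total = headerBssSize;
        for (long size : candidateSizes.values()) {
            total += size;
        }
        return total;
    }

    // N_BSS symbols keep the explicit addresses given in the symbol table.
    Map<String, Long> fixedLabels() {
        return fixedLabels;
    }

    // Called while walking relocations. Returns the address given to an unresolved
    // candidate (allocating it on first use), or -1 if it needs no .bss space.
    long allocate(String name, Set<String> resolvedElsewhere) {
        Long existing = allocated.get(name);
        if (existing != null) {
            return existing;
        }
        Long size = candidateSizes.get(name);
        if (size == null || resolvedElsewhere.contains(name)) {
            return -1;
        }
        long address = bssStart + nextFreeOffset;
        nextFreeOffset += size;
        allocated.put(name, address);
        return address;
    }
}
```

Sizing the block with `maxBssSize()` before any relocation is processed means the dynamically placed symbols cannot overflow it; the trade-off is that .bss may end up larger than strictly needed when some candidates are later resolved elsewhere.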

The auto-formatter in Eclipse made a mess of some long lines in these modules. I need to find a Ghidra style guide and/or the right beautifier settings to fix this.

@nneonneo
Contributor

Another a.out file, this time for Linux-i386. It's the Netscape 0.96 Beta, as found in PlaidCTF 2020 (https://github.com/bluepichu/ctf-challenges/blob/master/plaidctf-2020/back_to_the_future/problem/worker/files/netscape).

netscape.zip

@nightlark

I built a copy based on the latest changes -- it looked like it may still be struggling to create a .bss section correctly; from the C3413 test file, there is an a.out file named vxWorks in the OperatingSystem subfolder. The a.out loader printed a message saying it failed to create the .bss section (I believe in one of the earlier iterations several symbols were created in the .bss section), and I think the application firmware a.out file .bss section symbols might not be getting created now.

@nneonneo
Contributor

For what it's worth: my netscape binary is not loaded correctly. According to the Linux a.out loader, the ZMAGIC format will not include the first page (0x400 bytes) in the loaded image, i.e. this.txtOffset should be 0x400, not 0. The incorrect offset breaks string references and such (they're off by 0x400).

@colinbourassa
Contributor Author

UnixAoutHeader::determineTextOffset() was returning 0 for the file offset to the .text section content -- it was incorrectly assuming that all ZMAGIC files started their .text content at file offset 0. I think this error accounts for the 0x400 byte displacement you were seeing.

I don't know of an "official" way to differentiate between the UNIX/Solaris ZMAGIC executables that start their .text at file offset 0, and the Linux ZMAGIC executables that start their .text at file offset 0x400. This netscape binary is the first example I've seen of the Linux style.

The netscape binary lists a .text size of 0x199000 and a .data size of 0x89000, which agree with the file size as long as we allow the header to be padded out to the 0x400 boundary (which certainly appears to be the case), plus a single 32-bit word containing the value 4 at the end of the file. This last word is the size of the symbol string table including the word itself, so the string data proper is zero bytes long.

I'm working on a solution that involves checking the file size against the total size of all the sections given by the header. If it's determined to be the Linux style, I will start reading the .text content from the file at 0x400 rather than 0. I should have time within the next few days to push an update.
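A minimal sketch of that file-size check, assuming the total described by the header is the sum of a_text, a_data, a_trsize, a_drsize, a_syms, and the string-table length word. The class name and parameter list are hypothetical (only `determineTextOffset()` is named in this thread), and exact equality is an assumption: a padded file or one with trailing data would fall back to offset 0.

```java
// Hypothetical helper sketching the ZMAGIC file-size check; not the PR's code.
final class ZmagicTextOffset {
    // Linux-style ZMAGIC pads the header out to 0x400 before .text begins.
    static final long LINUX_ZMAGIC_TEXT_OFFSET = 0x400;

    /**
     * @param fileSize        total length of the file in bytes
     * @param textSize        a_text
     * @param dataSize        a_data
     * @param textRelocSize   a_trsize
     * @param dataRelocSize   a_drsize
     * @param symTabSize      a_syms
     * @param stringTableSize first word of the string table (counts itself)
     * @return file offset of .text: 0x400 for the Linux layout, else 0
     */
    static long determineTextOffset(long fileSize, long textSize, long dataSize,
            long textRelocSize, long dataRelocSize, long symTabSize, long stringTableSize) {
        // Everything the header accounts for, excluding any padding before .text.
        long described = textSize + dataSize + textRelocSize + dataRelocSize
                + symTabSize + stringTableSize;
        // Linux style: the sizes only add up once the 0x400-byte lead-in is counted.
        if (fileSize == LINUX_ZMAGIC_TEXT_OFFSET + described) {
            return LINUX_ZMAGIC_TEXT_OFFSET;
        }
        // Classic UNIX/SunOS style: .text content starts at file offset 0.
        return 0;
    }
}
```

With the netscape figures above (.text 0x199000, .data 0x89000, no relocations or symbols, a 4-byte string table), the described total is 0x222004, so a file of 0x222404 bytes selects the 0x400 offset.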

@colinbourassa
Contributor Author

colinbourassa commented Apr 30, 2023

> I built a copy based on the latest changes -- it looked like it may still be struggling to create a .bss section correctly; from the C3413 test file, there is an a.out file named vxWorks in the OperatingSystem subfolder. The a.out loader printed a message saying it failed to create the .bss section (I believe in one of the earlier iterations several symbols were created in the .bss section), and I think the application firmware a.out file .bss section symbols might not be getting created now.

I tried loading the file you mentioned and I see the same problem. It's caused by the symbol table being read improperly; specifically, the Loader was defaulting unknown symbol types to N_UNDF, which are treated as candidates for inclusion in .bss when their n_value is nonzero. Defaulting to N_UNDF is a bug, because it causes debugger-specific symbols to be incorrectly counted among these .bss candidates. The n_value fields for the debugger symbols are typically addresses, but they were being accumulated into the required size of .bss (which ended up being larger than Ghidra could allocate, causing the failure you described).

I will make an update to skip the symbol table entries for which (n_type & 0xe0) != 0, as this indicates a stab symbol (debugger only).
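The symbol-type decoding described above might look like the following hypothetical Java helper. The mask values (N_EXT = 0x01, N_TYPE = 0x1e, N_STAB = 0xe0) and the N_UNDF/N_ABS/N_TEXT/N_DATA/N_BSS codes are the classic a.out.h values; the class name and `Kind` enum are invented for illustration. Note the parentheses around the mask, and that unknown types map to `OTHER` instead of being defaulted to N_UNDF.

```java
// Hypothetical decoder for the a.out n_type byte; not the PR's actual code.
final class AoutSymbolType {
    static final int N_EXT = 0x01;  // external (global) bit
    static final int N_TYPE = 0x1e; // mask for the type bits
    static final int N_STAB = 0xe0; // any of these bits set => stab (debugger) entry

    static final int N_UNDF = 0x00;
    static final int N_ABS = 0x02;
    static final int N_TEXT = 0x04;
    static final int N_DATA = 0x06;
    static final int N_BSS = 0x08;

    enum Kind { DEBUG, UNDEFINED, BSS_CANDIDATE, ABSOLUTE, TEXT, DATA, BSS, OTHER }

    static Kind classify(int nType, long nValue) {
        int type = nType & 0xff; // n_type is an unsigned byte; undo sign extension
        // Parentheses matter: in both C and Java, != binds tighter than &.
        if ((type & N_STAB) != 0) {
            return Kind.DEBUG; // n_value is typically an address, never a .bss size
        }
        return switch (type & N_TYPE) {
            case N_UNDF -> nValue != 0 ? Kind.BSS_CANDIDATE : Kind.UNDEFINED;
            case N_ABS -> Kind.ABSOLUTE;
            case N_TEXT -> Kind.TEXT;
            case N_DATA -> Kind.DATA;
            case N_BSS -> Kind.BSS;
            default -> Kind.OTHER; // unknown types are not folded into N_UNDF
        };
    }

    static boolean isExternal(int nType) {
        return (nType & N_EXT) != 0;
    }
}
```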

Edit: furthermore, this file highlights the possibility of the object code being loaded at an address other than the conventional 0, PAGE_SIZE, or 0x400. With a VxWorks installation set up to use a flat memory model, the content of the a.out will certainly need to be placed higher in memory. This particular file appears to be loaded at 0x00108000. I will probably be adding an option to the Loader to change the base address manually, as that information is not included in the a.out file.

@nneonneo
Contributor

nneonneo commented May 3, 2023

Thinking about the design a bit: perhaps we could add options for specifying the load address, and a drop-down for a “variant” when ambiguous? For example, if ZMAGIC is detected, you could provide a drop-down to pick Linux ZMAGIC or UNIX ZMAGIC, etc., defaulting to whichever one makes the file size work out. I feel like, given the number of different a.out implementations in existence, a variant selector will probably ultimately be necessary.

I would personally prefer making this configurable when trying to apply heuristics that may not be precise, e.g. because the file is padded or contains extra data etc.

@colinbourassa
Contributor Author

I'm not sure we need an option for Linux vs UNIX ZMAGIC. The difference is not explicit in the header -- they're both just called ZMAGIC (0x10B) -- but the file size will conclusively determine which of the two ZMAGIC variants is being used. Also, we do not have drop-down lists available for Loader options. This was proposed by issue #1157 but has not yet been implemented.

If, in the future, we come across some A.out variants that introduce ambiguity which can't be resolved by analyzing the file, then I absolutely agree that a user-settable option field should be added at that point.

It definitely makes sense to add an option for the base load address, though. Since the base address can't be determined by analyzing the file header/contents, it makes sense to give the user the option to provide it. In most cases, it can be left at the default of 0 (i.e. when analyzing binaries intended for OSes with protected mode addressing) but it will be useful on systems that use a global address space. I tested it out on this vxworks binary, and the symbol table entries match up nicely to functions in .text when loading to a base of 0x00108000.
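One way a base-address option could feed into section placement is sketched below. This is hypothetical: .text is placed at the user-supplied base, and the rule that OMAGIC data directly follows .text while NMAGIC/ZMAGIC data starts at the next `segmentAlign` boundary is an assumed BSD-style convention. As noted at the start of this PR, real combinations of processor, executable type, and OS can differ.

```java
// Hypothetical layout helper; the alignment rule is an assumption, not the PR's logic.
final class AoutLayout {
    static final int OMAGIC = 0x107; // 0407 octal
    static final int NMAGIC = 0x108; // 0410 octal
    static final int ZMAGIC = 0x10B; // 0413 octal

    final long textAddr;
    final long dataAddr;
    final long bssAddr;

    AoutLayout(int magic, long baseAddress, long textSize, long dataSize, long segmentAlign) {
        textAddr = baseAddress; // user-supplied base (default 0)
        long textEnd = baseAddress + textSize;
        // Assumed: OMAGIC data directly follows text; other magics align data up.
        dataAddr = (magic == OMAGIC) ? textEnd : alignUp(textEnd, segmentAlign);
        bssAddr = dataAddr + dataSize; // .bss follows .data
    }

    static long alignUp(long value, long align) {
        return (value + align - 1) / align * align;
    }
}
```

For example, with the vxWorks base of 0x00108000 and made-up sizes (.text 0x1234, .data 0x200, 0x1000 alignment), a ZMAGIC file would get .data at 0x10A000 and .bss at 0x10A200.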

@colinbourassa
Contributor Author

Latest commit automatically differentiates between Linux and UNIX style ZMAGIC, and also adds a "base address" option for loading.

@nneonneo
Contributor

nneonneo commented May 4, 2023

@colinbourassa works for the netscape binary!

@boricj

boricj commented Sep 7, 2023

Another set of artifacts this can be tested against would be Debian Buzz (1.1)'s a.out toolchain files. These can be found at https://archive.debian.org/debian/dists/buzz/main/binary-i386/devel/ and Debian packages can be safely extracted with dpkg --extract file.deb dir/ (I wouldn't recommend trying to install these fossils from the 1990s onto a recent Debian installation).

Note that trying to import /usr/i486-linuxaout/lib/libc.a from libc4-dev-4.6.27-15.deb for example won't work without a bunch of fixes I've submitted in this PR: colinbourassa#1

@boricj

boricj commented Oct 5, 2023

I've made further improvements to the a.out loader in this PR: colinbourassa#2

I've used that version to successfully load an a.out object file, specifically an amalgamation of Slackware 2.3's libc.a which I've then used as a source for Ghidra's Version Tracking tool while reverse-engineering old a.out executables. Details (and the files) can be found in this forum thread: https://forums.atariage.com/topic/354341-porting-the-original-atari-jaguar-sdk-to-elf/

@calmsacibis995

This improved a.out loader works on a custom build that I made.

@colinbourassa
Contributor Author

I've been successfully using this A.out loader for a while now. @ryanmkurtz - could this be considered for merging back to master?

ryanmkurtz added the Status: Prioritize label and removed the Status: Triage label Feb 25, 2025
@ryanmkurtz
Collaborator

I'll discuss it with the team this week.

ryanmkurtz added the Status: Internal label and removed the Status: Prioritize label Feb 27, 2025
@ryanmkurtz
Collaborator

@colinbourassa Can you rebase and force push this? I'm ready to start taking a look at integrating it. Thanks!

@ryanmkurtz
Collaborator

Actually a squash and a rebase would be ideal, so it removes the ticket references in the commit messages. I can also do this stuff too if you'd prefer, and you don't mind me rewriting your commit (you'd still be author).

@colinbourassa
Contributor Author

Still need to sync my fork and rebase. I should be able to get this done later today.

With fixes/improvements from Jean-Baptiste Boric:
* fix package declarations
* don't special-case defined symbols with zero value
  a.out object files can define symbols at the very start of a section.
* mark undefined symbols with non-zero value as bss candidates
* use FSRL to get filename
  This is required when invoking loaders on subsets of files, such as
  bulk-importing object files from static archives.
* don't use filename in memory block names
* reformat Unix Aout loader
* rename UnixAoutRelocation class
* rename UnixAoutSymbol class
* rework Unix Aout loader

@colinbourassa
Contributor Author

@ryanmkurtz - sync'd, squashed, and rebased

@ryanmkurtz
Collaborator

Thanks, taking a look today. I think the main thing I may want to add is using the .opinion files to help in the selection of the processor/compiler spec, if possible.

public UnixAoutHeader(ByteProvider provider, boolean isLittleEndian) throws IOException {
    this.reader = new BinaryReader(provider, isLittleEndian);

    this.a_magic = reader.readNextUnsignedInt();
Collaborator

Does it ever make sense to read the magic as little endian? The spec says it's always big endian.

Some systems ostensibly use little endian ordering for the magic bytes: #5004 (comment)
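If the magic can appear in either byte order, a loader might probe both. The hypothetical sketch below reads the first 32-bit word big-endian first (matching the spec cited above) and falls back to little-endian, treating the low 16 bits as the magic (OMAGIC 0x107, NMAGIC 0x108, ZMAGIC 0x10B, i.e. 0407, 0410, and 0413 octal). The class name and the restriction to these three magics are assumptions; QMAGIC and other variants are not handled.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Optional;

// Hypothetical magic probe; not the PR's UnixAoutHeader logic.
final class AoutMagicProbe {
    static final int OMAGIC = 0x107;
    static final int NMAGIC = 0x108;
    static final int ZMAGIC = 0x10B;

    static boolean isKnownMagic(int magic) {
        return magic == OMAGIC || magic == NMAGIC || magic == ZMAGIC;
    }

    // Returns the byte order in which the low 16 bits of the first word form a known
    // magic, trying big-endian first; empty if neither order matches.
    static Optional<ByteOrder> probe(byte[] header) {
        if (header.length < 4) {
            return Optional.empty();
        }
        for (ByteOrder order : new ByteOrder[] { ByteOrder.BIG_ENDIAN, ByteOrder.LITTLE_ENDIAN }) {
            int word = ByteBuffer.wrap(header, 0, 4).order(order).getInt();
            if (isKnownMagic(word & 0xffff)) {
                return Optional.of(order);
            }
        }
        return Optional.empty();
    }
}
```

Trying big-endian first keeps the spec's ordering as the default, so the little-endian path only kicks in for files like the C3413 firmware mentioned earlier in this thread.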

Address nextFreeAddress = defaultAddressSpace.getAddress(0);

if (header.getTextOffset() != 0 || header.getTextSize() < 32) {
    addInitializedMemorySection(null, 0, 32, otherAddress, "_aoutHeader", false, false, false, null, false,
Collaborator

Did you find that you actually needed the MemorySectionResolver, or were you just modeling things off of the ElfLoader? If you just called MemoryBlockUtils.createInitializedBlock() directly here, were you running into conflicts? If there was no real issue with conflicts, I'll probably remove the resolver for simplicity. That was really only introduced because ELF is so complicated.

Contributor Author

I don't remember there being any conflicts that drove this choice. It's been a while, but I recall modeling this off of another loader, so yes, removing the resolver for simplicity should be OK.

ryanmkurtz added this to the 11.4 milestone Mar 19, 2025
ryanmkurtz linked an issue Mar 19, 2025 that may be closed by this pull request
ryanmkurtz merged commit c9ab679 into NationalSecurityAgency:master Mar 20, 2025
ryanmkurtz added a commit that referenced this pull request Mar 20, 2025
'origin/GP-3182_ryanmkurtz_PR-5004_colinbourassa_unix-aout-loader'
(Closes #4943, Closes #5004)

Labels

Feature: Loader, Status: Internal


Development

Successfully merging this pull request may close these issues.

A.out format support

7 participants