Feature: UNIX A.out Loader #5004
|
Thanks! Would it be possible for you to attach some sample binaries to aid in our testing? |
|
@ryanmkurtz - Certainly. The attached archive contains three different A.out examples:
|
|
Thanks! |
The C3413 (Green CPU) firmware from https://www.sage-rtu.com/downloads.html is another example of a file in the a.out format (little-endian 32-bit for x86/VxWorks) -- the loader in this PR does not currently recognize it as an a.out file. |
Ghidra/Features/Base/src/main/java/ghidra/app/util/bin/format/unixaout/UnixAoutMachineType.java
Ghidra/Features/Base/src/main/java/ghidra/app/util/bin/format/unixaout/UnixAoutHeader.java
0d2ead5 to 25e7e11
|
@nightlark - Thanks very much for the suggestions. That all looks very reasonable so I've incorporated it in a new commit for this PR. During testing of the C3413 Green CPU firmware, I also found that my loader was not generating a .bss block when the A.out indicates a nonzero .bss size. This has also been resolved. |
Fixes: |
25e7e11 to b6200f5
|
The .bss fix seems to be causing an issue with Ghidra identifying some symbols that are located in that section. Specifically, two overlapping .bss sections are now being created (one read-only and the other marked read+write), which I guess is confusing Ghidra: it no longer shows the symbol names in the decompiler/disassembly views, whereas it did before the .bss change.
I think I see the problem. Previously, I was creating the .bss section simply based on the total size of all the symbols that were marked N_UNDF in the symbol table, as this is apparently an indication that they should be given a location in .bss at the next available address. I neglected to change that logic when I added the fixed-size .bss allocation, so that explains why multiple overlapping sections are being created. It seems that we probably need a hybrid of these approaches: .bss may need to be created after summing the sizes of the assigned symbols in that section, with that size being added to the .bss size given in the header. The dynamically allocated symbols (N_UNDF) would then be placed toward the end of the section (after the fixed size given in the header) while the .bss symbols that are given explicit addresses can be labeled at those addresses. |
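The hybrid approach described above could be sketched roughly as follows. This is only an illustration, not the code in this PR: the `BssPlanner` class, its method names, and the alignment parameter are all hypothetical, and it assumes that each N_UNDF symbol to be allocated carries its required size.

```java
// Hypothetical sketch of the hybrid .bss layout; not the PR's actual code.
final class BssPlanner {
    /** Rounds value up to the next multiple of align (align must be a power of two). */
    static long alignUp(long value, long align) {
        return (value + align - 1) & ~(align - 1);
    }

    /**
     * Places dynamically allocated (N_UNDF) symbols after the fixed-size .bss
     * region declared in the header.
     *
     * @param headerBssSize    fixed .bss size taken from the a.out header
     * @param undefSymbolSizes sizes of the symbols that still need a location
     * @param align            assumed alignment for each allocated symbol
     * @return offsets from the start of .bss, one per symbol in input order;
     *         the final extra element is the total size of the combined section
     */
    static long[] plan(long headerBssSize, long[] undefSymbolSizes, long align) {
        long[] result = new long[undefSymbolSizes.length + 1];
        long next = alignUp(headerBssSize, align);
        for (int i = 0; i < undefSymbolSizes.length; i++) {
            result[i] = next;
            next = alignUp(next + undefSymbolSizes[i], align);
        }
        result[undefSymbolSizes.length] = next;
        return result;
    }
}
```

With this layout a single .bss block of the returned total size is created, symbols with explicit addresses are labeled where they already are, and the allocated symbols never overlap the header-declared region.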
|
Does the .bss segment size defined in the file include enough "empty" space that the N_UNDF symbols might fit in it? I also noticed that reserving the space in bss only happens if Given how N_UNDF seems to be used for linking to external symbols, it would be interesting if eventually other libraries could be loaded and those symbols could be linked correctly.
66d3dd4 to 7011702
|
With this latest commit, the intended concept for handling .bss is as follows:
I believe this approach will allow any combination of .bss symbols that are given an explicit location, and .bss symbols that need to be dynamically allocated to the first available space in .bss. The auto-formatter in Eclipse made a mess of some long lines in these modules. I need to find a Ghidra style guide and/or the right beautifier settings to fix this. |
d7869a0 to e218340
|
Another |
|
I built a copy based on the latest changes -- it looked like it may still be struggling to create a |
|
For what it's worth: my |
I don't know of an "official" way to differentiate between the UNIX/Solaris ZMAGIC executables that start their .text at file offset 0, and the Linux ZMAGIC executables that start their .text at file offset 0x400. I'm working on a solution that involves checking the file size against the total size of all the sections given by the header. If the file is determined to be the Linux style, I will start reading the .text content from the file at 0x400 rather than 0. I should have time within the next few days to push an update.
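One way the file-size check described above might look is sketched here. It is an assumption-laden illustration rather than the loader's real logic: the `ZmagicLayout` class and its method names are hypothetical, and the comparison ignores the string table, whose size is not recorded in the header.

```java
// Hypothetical sketch of a file-size heuristic for ZMAGIC; not the PR's actual code.
final class ZmagicLayout {
    /** File offset of .text in Linux-style ZMAGIC files, per the discussion above. */
    static final long LINUX_TEXT_OFFSET = 0x400;

    /** Sum of the section sizes declared in the a.out header. */
    static long declaredSize(long text, long data, long syms, long trsize, long drsize) {
        return text + data + syms + trsize + drsize;
    }

    /**
     * Returns the file offset at which .text is assumed to start.
     * If the file is large enough to hold 0x400 bytes of header padding plus
     * every declared section, the Linux layout is assumed; otherwise the
     * UNIX layout (offset 0) is assumed.
     */
    static long textFileOffset(long fileSize, long declaredSize) {
        return fileSize >= LINUX_TEXT_OFFSET + declaredSize ? LINUX_TEXT_OFFSET : 0;
    }
}
```

Because a long string table or trailing padding can also make a UNIX-style file larger than its declared sections, a real implementation would likely need extra checks before trusting this result.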
I tried loading the file you mentioned and I see the same problem. It's caused by the symbol table being read improperly; specifically, the Loader was defaulting unknown symbol types to I will make an update to skip the symbol table entries for which Edit: furthermore, this file highlights the possibility of the object code being loaded at an address other than the conventional 0, PAGE_SIZE, or 0x400. With a VxWorks installation set up to use a flat memory model, the content of the a.out will certainly need to be placed higher in memory. This particular file appears to be loaded at 0x00108000. I will probably be adding an option to the Loader to change the base address manually, as that information is not included in the a.out file. |
|
Thinking about the design a bit: perhaps we could add options for specifying the load address, and a drop-down for a “variant” when ambiguous? For example, if ZMAGIC is detected, you could provide a drop-down to pick Linux ZMAGIC or UNIX ZMAGIC, etc., defaulting to whichever one makes the file size work out. I feel like, given the number of different a.out implementations in existence, a variant selector will probably ultimately be necessary. I would personally prefer making this configurable when trying to apply heuristics that may not be precise, e.g. because the file is padded or contains extra data etc. |
|
I'm not sure we need an option for Linux vs UNIX ZMAGIC. The difference is not explicit in the header -- they're both just called ZMAGIC (0x10B) -- but the filesize will conclusively determine which of the two ZMAGIC variants is being used. Also, we do not have drop-down lists available for Loader options. This was proposed by issue #1157 but has not yet been implemented. If, in the future, we come across some A.out variants that introduce ambiguity which can't be resolved by analyzing the file, then I absolutely agree that a user-settable option field should be added at that point. It definitely makes sense to add an option for the base load address, though. Since the base address can't be determined by analyzing the file header/contents, it makes sense to give the user the option to provide it. In most cases, it can be left at the default of 0 (i.e. when analyzing binaries intended for OSes with protected mode addressing) but it will be useful on systems that use a global address space. I tested it out on this |
|
Latest commit automatically differentiates between Linux and UNIX style ZMAGIC, and also adds a "base address" option for loading. |
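For illustration, a "base address" option of this kind could be handled along the following lines. The `BaseAddressOption` class and its methods are hypothetical names used only for this sketch; it assumes the option arrives as a hex string and that every section address is simply shifted by the chosen base.

```java
// Hypothetical sketch of a user-supplied base address; not the PR's actual code.
final class BaseAddressOption {
    /** Parses a hex string such as "0x00108000"; blank input means the default of 0. */
    static long parse(String text) {
        String s = text == null ? "" : text.trim();
        if (s.isEmpty()) {
            return 0L;
        }
        if (s.startsWith("0x") || s.startsWith("0X")) {
            s = s.substring(2);
        }
        return Long.parseUnsignedLong(s, 16);
    }

    /** Shifts a section's default (base-0) address by the chosen base address. */
    static long rebase(long baseAddress, long defaultAddress) {
        return baseAddress + defaultAddress;
    }
}
```

Leaving the option blank keeps the default of 0, which matches the common case of binaries for OSes with protected mode addressing.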
|
@colinbourassa works for the netscape binary! |
2fc2436 to 2e88353
2e88353 to 479d8fe
|
Another set of artifacts this can be tested against is Debian Buzz (1.1)'s a.out toolchain files. These can be found at https://archive.debian.org/debian/dists/buzz/main/binary-i386/devel/ and Debian packages can be safely extracted with Note that trying to import |
92e6284 to 8182f99
|
I've made further improvements to the a.out loader in this PR: colinbourassa#2. I've used that version to successfully load an a.out object file, specifically an amalgamation of Slackware 2.3's |
|
This improved a.out loader works on a custom build that I made. |
|
I've been successfully using this A.out loader for a while now. @ryanmkurtz - could this be considered for merging back to master? |
|
I'll discuss it with the team this week. |
|
@colinbourassa Can you rebase and force push this? I'm ready to start taking a look at integrating it. Thanks! |
|
Actually a squash and a rebase would be ideal, so it removes the ticket references in the commit messages. I can also do this stuff too if you'd prefer, and you don't mind me rewriting your commit (you'd still be author). |
d183a71 to 26bdbf6
|
Still need to sync my fork and rebase. I should be able to get this done later today. |
With fixes/improvements from Jean-Baptiste Boric:
* fix package declarations
* don't special-case defined symbols with zero value
  a.out object files can define symbols at the very start of a section.
* mark undefined symbols with non-zero value as bss candidates
* use FSRL to get filename
  This is required when invoking loaders on subsets of files, such as bulk-importing object files from static archives.
* don't use filename in memory block names
* reformat Unix Aout loader
* rename UnixAoutRelocation class
* rename UnixAoutSymbol class
* rework Unix Aout loader
26bdbf6 to c9ab679
|
@ryanmkurtz - sync'd, squashed, and rebased |
|
Thanks, taking a look today. I think the main thing I may want to add is using the |
    public UnixAoutHeader(ByteProvider provider, boolean isLittleEndian) throws IOException {
        this.reader = new BinaryReader(provider, isLittleEndian);

        this.a_magic = reader.readNextUnsignedInt();
Does it ever make sense to read the magic as little endian? The spec says it's always big endian.
Some systems ostensibly use little endian ordering for the magic bytes: #5004 (comment)
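Since the magic may apparently appear in either byte order, one option is to try both and see which yields a recognized value. The sketch below assumes the magic occupies the low 16 bits of the first 32-bit header word; the `AoutMagicProbe` class and its members are hypothetical and not part of this PR.

```java
// Hypothetical sketch: guess the byte order of an a.out header from its magic.
final class AoutMagicProbe {
    enum ByteOrderGuess { LITTLE, BIG, UNKNOWN }

    // Classic a.out magic values (low 16 bits of the first header word).
    static final int OMAGIC = 0x107; // octal 0407
    static final int NMAGIC = 0x108; // octal 0410
    static final int ZMAGIC = 0x10B; // octal 0413
    static final int QMAGIC = 0x0CC; // octal 0314

    static boolean isKnownMagic(int low16) {
        return low16 == OMAGIC || low16 == NMAGIC || low16 == ZMAGIC || low16 == QMAGIC;
    }

    /** Reads the first four bytes as an unsigned 32-bit value in the given byte order. */
    static long readU32(byte[] b, boolean littleEndian) {
        long v = 0;
        for (int i = 0; i < 4; i++) {
            int idx = littleEndian ? 3 - i : i; // most significant byte first
            v = (v << 8) | (b[idx] & 0xFF);
        }
        return v;
    }

    /** Returns the byte order under which exactly one interpretation is a known magic. */
    static ByteOrderGuess guess(byte[] header) {
        if (header.length < 4) {
            return ByteOrderGuess.UNKNOWN;
        }
        boolean le = isKnownMagic((int) (readU32(header, true) & 0xFFFF));
        boolean be = isKnownMagic((int) (readU32(header, false) & 0xFFFF));
        if (le == be) {
            return ByteOrderGuess.UNKNOWN; // neither matched, or both did (ambiguous)
        }
        return le ? ByteOrderGuess.LITTLE : ByteOrderGuess.BIG;
    }
}
```

Returning UNKNOWN when both interpretations match keeps the probe from silently picking one order for an ambiguous header.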
    Address nextFreeAddress = defaultAddressSpace.getAddress(0);

    if (header.getTextOffset() != 0 || header.getTextSize() < 32) {
        addInitializedMemorySection(null, 0, 32, otherAddress, "_aoutHeader", false, false, false, null, false,
Did you find that you actually needed the MemorySectionResolver, or were you just modeling things off of the ElfLoader? If you just called MemoryBlockUtils.createInitializedBlock() directly here, were you running into conflicts? If there was no real issue with conflicts, I'll probably remove the resolver for simplicity. That was really only introduced because ELF is so complicated.
I don't remember there being any conflicts that drove this choice. It's been a while, but I recall modeling this off of another loader, so yes, removing the resolver for simplicity should be OK.
This is a feature addition -- it adds a Loader for the old-style UNIX A.out format, which was used by BSD, SunOS, VxWorks, and other UNIX derivatives. This Loader was tested with a number of different A.out files across several architectures, although there are many combinations so future work may be required to address specific combinations of processor, executable type, and OS that require a different file offset (or load address) for .text/.data.
This would close issue #4943.