CYT 341 Add basic Mach-O file info extractor #184

Merged
merged 54 commits into from
Oct 7, 2024


Czatar
Collaborator

@Czatar Czatar commented Apr 21, 2024

Summary

If merged, this pull request will add an info extractor for Mach-O files.

Proposed changes

For now, all "MACHO" file types are handled the same way; I haven't yet looked into splitting them into separate cases (line 17).
Of the libraries I've looked at, LIEF looks more promising than macholib, macholibre, machofile, and kaitaistruct because it makes it easy to iterate through the individual binaries in a fat (universal) file.
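To illustrate what "iterating through the binaries in a fat file" involves at the byte level, here is a minimal standard-library sketch. It is not the PR's code (which relies on LIEF), and the function name `list_fat_slices` is hypothetical. It assumes the 32-bit fat header layout: a big-endian magic and slice count, followed by one five-field entry per slice.

```python
import struct

FAT_MAGIC = 0xCAFEBABE  # 32-bit fat header magic, stored big-endian on disk
# Note: Java class files start with the same four bytes, so a real tool
# needs additional sanity checks beyond the magic number.


def list_fat_slices(data: bytes):
    """Return (cputype, cpusubtype, offset, size) for each slice in a fat file.

    Returns an empty list when the data does not start with the fat magic.
    """
    if len(data) < 8:
        return []
    magic, nfat_arch = struct.unpack_from(">II", data, 0)
    if magic != FAT_MAGIC:
        return []
    slices = []
    for i in range(nfat_arch):
        # Each fat_arch entry holds five big-endian uint32 values:
        # cputype, cpusubtype, offset, size, align.
        cputype, cpusubtype, offset, size, _align = struct.unpack_from(
            ">5I", data, 8 + 20 * i
        )
        slices.append((cputype, cpusubtype, offset, size))
    return slices


# Synthetic two-slice header (x86_64 and arm64), built only for illustration.
example = (
    struct.pack(">II", FAT_MAGIC, 2)
    + struct.pack(">5I", 0x01000007, 3, 4096, 100, 12)
    + struct.pack(">5I", 0x0100000C, 0, 8192, 200, 14)
)
for cputype, cpusubtype, offset, size in list_fat_slices(example):
    print(hex(cputype), cpusubtype, offset, size)
```

Each returned offset and size locate one embedded Mach-O binary, which is the unit a library such as LIEF exposes as a separate binary object.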

@Czatar
Collaborator Author

Czatar commented Apr 25, 2024

Sample output so far:

{
  "OS": "MacOS",
  "numBinaries": 1,
  "binaries": [
    {
      "format": "MACHO",
      "cpuType": "x86_64",
      "cpuSubtype": 2147483651,
      "fileType": "EXECUTE",
      "flags": [
        "TWOLEVEL",
        "NOUNDEFS",
        "DYLDLINK",
        "PIE"
      ],
      "numCommands": 16
    }
  ]
}

I haven't tested on files that contain multiple binaries yet, but the extractor is set up to output a list of binaries under the "binaries" key.
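For readers wondering how the values in the sample output relate to the raw header, the following standard-library sketch decodes a little-endian 64-bit Mach-O header. It is an illustration only, not the extractor in this PR: the function name, the dictionary keys, and the synthetic header bytes are assumptions, and only the four flags that appear in the sample are mapped. It shows that the large cpuSubtype value 2147483651 is 0x80000003, that is, a capability bit in the top byte plus subtype 3.

```python
import struct

MH_MAGIC_64 = 0xFEEDFACF       # 64-bit Mach-O magic
CPU_ARCH_ABI64 = 0x01000000    # marks a 64-bit ABI in cputype
CPU_TYPE_X86_64 = 0x00000007 | CPU_ARCH_ABI64
CPU_SUBTYPE_MASK = 0xFF000000  # top byte holds capability bits
MH_EXECUTE = 0x2

# Only the flags seen in the sample output; the full list is longer.
FLAG_NAMES = {
    0x1: "NOUNDEFS",
    0x4: "DYLDLINK",
    0x80: "TWOLEVEL",
    0x200000: "PIE",
}


def decode_header_64(data: bytes) -> dict:
    """Decode the eight uint32 fields of a little-endian mach_header_64."""
    (magic, cputype, cpusubtype, filetype,
     ncmds, _sizeofcmds, flags, _reserved) = struct.unpack_from("<8I", data, 0)
    if magic != MH_MAGIC_64:
        raise ValueError("not a little-endian 64-bit Mach-O header")
    return {
        "cpuType": cputype,
        "cpuSubtypeRaw": cpusubtype,
        "cpuSubtypeCapabilities": cpusubtype & CPU_SUBTYPE_MASK,
        "cpuSubtype": cpusubtype & ~CPU_SUBTYPE_MASK & 0xFFFFFFFF,
        "fileType": filetype,
        "flags": [name for bit, name in sorted(FLAG_NAMES.items()) if flags & bit],
        "numCommands": ncmds,
    }


# Synthetic header mirroring the sample output above (illustration only).
header = struct.pack(
    "<8I", MH_MAGIC_64, CPU_TYPE_X86_64, 2147483651, MH_EXECUTE, 16, 0, 0x200085, 0
)
info = decode_header_64(header)
print(info)
```

Whether the extractor should report the raw subtype or the masked one is a design choice; reporting both keeps the output lossless while still being readable.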

@nightlark nightlark added the enhancement New feature or request label May 6, 2024
@Czatar
Collaborator Author

Czatar commented May 9, 2024

I've been using https://github.com/JonathanSalwan/binary-samples to test. Let me know if there are any other good Mach-O files to run this on.

@Czatar
Collaborator Author

Czatar commented May 29, 2024

out.txt
This is a sample output file after testing it on the 7 Mach-O files in here and main_sha256 from here.

@nightlark
Collaborator

Can you also generate an SBOM by running it on the files from this HELICS archive? https://github.com/GMLC-TDC/HELICS/releases/download/v3.5.2/Helics-3.5.2-macOS-universal2.zip

@Czatar
Collaborator Author

Czatar commented May 30, 2024

> Can you also generate an SBOM from running it in the files from this HELICS archive? https://github.com/GMLC-TDC/HELICS/releases/download/v3.5.2/Helics-3.5.2-macOS-universal2.zip

output.zip
The process took about 3 minutes, and the output file is almost 23 MB. The longest part was waiting for LIEF to load the files into Binary objects.

I haven't tested much with large files or entire projects. Is this roughly how long it should take? Judging from the output, the bindings and exports are by far the largest sections. Should including them be a user-configurable setting as well, or should they always be kept in?

@nightlark
Collaborator

nightlark commented May 30, 2024

Looking at the output, I think the bindings and exports should definitely be a user-configurable setting that is off by default -- my take from looking at them is that the addresses alone aren't very useful without knowing what symbol they resolve to.
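A minimal sketch of what "off by default" could look like, assuming the per-binary details are a plain dictionary with "bindings" and "exports" keys. The function name, parameter names, and key names are all hypothetical and are not taken from Surfactant's actual settings mechanism.

```python
def filter_macho_details(
    details: dict,
    include_bindings: bool = False,
    include_exports: bool = False,
) -> dict:
    """Return a copy of one binary's details, dropping large sections unless requested."""
    skipped = set()
    if not include_bindings:
        skipped.add("bindings")
    if not include_exports:
        skipped.add("exports")
    # Build a new dict so the caller's data is left untouched.
    return {key: value for key, value in details.items() if key not in skipped}


# Illustrative input only; real entries would come from the extractor.
sample = {
    "format": "MACHO",
    "numCommands": 16,
    "bindings": [{"address": 4096}],
    "exports": [{"address": 8192}],
}
print(filter_macho_details(sample))
print(filter_macho_details(sample, include_exports=True))
```

Skipping the sections before they are collected, rather than filtering afterwards, would also save the time spent gathering them; the sketch only shows the default-off behavior.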

For the Linux and Windows HELICS releases, Surfactant takes less than a second to run -- macOS universal binaries are basically two binaries in one so I'd expect it to take longer, but probably still less than 2 seconds.


I did notice issues in the lief repository about performance regressions. As a test, I ran pip install lief==0.13.2 so I wouldn't get the latest 0.14 release -- with bindings and exports it took 10 seconds, without bindings and exports it took 1.6 seconds. With the 0.14.1 release of lief, the same bindings + exports SBOM took 2 minutes, 50 seconds... close to 20x slower.

I think the quick option is to pin lief to version 0.13.2 and replace uses of the __name__ attribute with name. It should probably also be made an optional dependency, because lief doesn't provide a source distribution and its binary distributions are tied to specific Python versions (and architectures/OSes). I get that the author is trying to minimize issues about lief not working on unsupported systems, but in doing so created some (IMO) major issues:

  • new versions of Python (e.g. 3.13) won't work until they upload new binary distributions built for that new version (and old lief versions will probably never get updated wheels)
  • users are forced to upgrade their Python version to use newer lief versions when lief drops support for an older Python version (e.g. 3.8)
  • if the lief maintainer(s) ever become inactive for an extended length of time, the lief Python bindings will gradually stop working as users upgrade to new Python versions, but lief doesn't release new binary distributions (there was a ~1 year gap in development around 2020, and it looks like there may only be a single maintainer that can publish updates)
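One way to treat lief as an optional dependency and tolerate the `name` versus `__name__` difference mentioned above is sketched below. This is an assumption-laden illustration, not the change made in this PR: `lief_available` and `enum_label` are hypothetical helper names, and the helper simply tries both attributes rather than checking the installed lief version.

```python
try:
    import lief  # optional: wheels may not exist for every Python version or platform
except ImportError:
    lief = None


def lief_available() -> bool:
    """Report whether the optional lief dependency could be imported."""
    return lief is not None


def enum_label(value) -> str:
    """Return a readable name for an enum-like value.

    Tries the `name` attribute first, then `__name__`, then falls back to str().
    """
    for attr in ("name", "__name__"):
        label = getattr(value, attr, None)
        if isinstance(label, str):
            return label
    return str(value)


if not lief_available():
    print("lief is not installed; Mach-O extraction would be skipped")
```

With a guard like this, the extractor can return no Mach-O details when lief is missing instead of failing at import time.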

@nightlark nightlark self-requested a review June 24, 2024 20:14
@nightlark nightlark marked this pull request as ready for review June 24, 2024 20:14
@nightlark nightlark requested a review from shaynakapadia July 15, 2024 20:18
nightlark and others added 19 commits July 22, 2024 13:00
Co-authored-by: Shayna Kapadia <shaynahkapadia@gmail.com>
…ch-O support, and add links to the settings documentation page.
@nightlark nightlark merged commit 43009a2 into main Oct 7, 2024
13 checks passed
@nightlark nightlark deleted the CYT-341-mach-o-support branch October 7, 2024 15:52