Dylib Derivation & Scanning #77

fosterbrereton · 2024-05-01T19:03:29Z

This PR adds a new mode to ORC, called "dylib scanning" or "dylib derivation" mode. This mode allows ORC to examine the dependencies of a post-link artifact, scanning for ODRVs between the host executable and its dylibs as best it can.

Dylib scanning mode should be seen as a complement to, not a replacement for, ORC's "classic" mode. The linker makes any final artifact "ODR clean" because it assumes ODR was upheld during the creation process. Thus, this mode cannot discover ODRVs within a single artifact. However, ORC's "classic" mode does not scan across final artifacts (e.g., an executable and the dylibs it depends upon), which is a common way to distribute applications. This mode is therefore the second half of a one-two punch intended to detect ODR violations in an application more thoroughly.
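Here is a minimal sketch of the cross-artifact idea. It is not ORC's implementation. It assumes each post-link artifact (the executable or one of its dylibs) has already been reduced to a map from symbol name to a fingerprint of that symbol's definition. The `artifact` struct, the fingerprint strings, and `find_cross_artifact_odrvs` are all hypothetical. A symbol is flagged when two artifacts define it differently.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical summary of one post-link artifact (the executable or a dylib):
// symbol name -> fingerprint of its definition. Not ORC's actual data model.
struct artifact {
    std::string name;
    std::map<std::string, std::string> definitions;
};

// Returns the symbols defined in two or more artifacts with differing fingerprints.
// Each artifact contributes at most one definition per symbol, mirroring the fact
// that the linker already collapsed duplicates inside it.
std::vector<std::string> find_cross_artifact_odrvs(const std::vector<artifact>& artifacts) {
    // symbol -> every distinct fingerprint seen for it across all artifacts
    std::map<std::string, std::set<std::string>> seen;
    for (const auto& a : artifacts) {
        for (const auto& [symbol, fingerprint] : a.definitions) {
            seen[symbol].insert(fingerprint);
        }
    }

    std::vector<std::string> violations;
    for (const auto& [symbol, fingerprints] : seen) {
        if (fingerprints.size() > 1) violations.push_back(symbol); // definitions disagree
    }
    return violations; // lexicographic order, because std::map is ordered
}
```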


fosterbrereton · 2024-05-03T19:50:30Z

include/orc/async.hpp

+// Enqueue a task for (possibly asynchronous) execution. If the `parallel_processing` setting in the
+// ORC config file is true, the task will be enqueued for processing on a background thread pool.
+// Otherwise, the task will be executed immediately in the current thread.
+void do_work(std::function<void()>);


We were passing `do_work` all over the code base as the scheduler for any ORC task we might want to run asynchronously. (With the `parallel_processing` setting set to false, tasks run immediately; that's how we're able to implement that switch.)

Given that `do_work` is the only scheduler we were using, I have removed it as a callback (more on the death of callbacks in a separate comment) and put it into its own header that the sources can include directly. This has made our asynchronous story much simpler, in terms of how we, you know, do work.
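Below is one way to meet the contract in that header comment. It is a hypothetical illustration, not the contents of `src/async.cpp`. The config setting is modeled as a plain global `parallel_processing`, and the pool size comes from `std::thread::hardware_concurrency()`. `wait_for_all_work()` is an invented helper for draining the queue.

```cpp
#include <algorithm>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for the `parallel_processing` setting in the ORC config file.
inline bool parallel_processing = true;

namespace {

// Minimal fixed-size thread pool. Its destructor finishes queued work and joins the workers.
class task_pool {
public:
    task_pool() {
        const unsigned n = std::max(1u, std::thread::hardware_concurrency());
        for (unsigned i = 0; i < n; ++i) _workers.emplace_back([this] { run(); });
    }

    ~task_pool() {
        {
            std::lock_guard lock(_mutex);
            _stopping = true;
        }
        _work_cv.notify_all();
        for (auto& worker : _workers) worker.join();
    }

    void enqueue(std::function<void()> task) {
        {
            std::lock_guard lock(_mutex);
            ++_pending;
            _queue.push(std::move(task));
        }
        _work_cv.notify_one();
    }

    // Blocks until every enqueued task has finished running.
    void wait_idle() {
        std::unique_lock lock(_mutex);
        _idle_cv.wait(lock, [this] { return _pending == 0; });
    }

private:
    void run() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock lock(_mutex);
                _work_cv.wait(lock, [this] { return _stopping || !_queue.empty(); });
                if (_queue.empty()) return; // stopping, and nothing left to run
                task = std::move(_queue.front());
                _queue.pop();
            }
            task(); // run outside the lock so other workers can proceed
            std::lock_guard lock(_mutex);
            if (--_pending == 0) _idle_cv.notify_all();
        }
    }

    std::mutex _mutex;
    std::condition_variable _work_cv; // wakes workers when work arrives or on shutdown
    std::condition_variable _idle_cv; // wakes wait_idle() when the pending count hits zero
    std::queue<std::function<void()>> _queue;
    std::size_t _pending = 0; // enqueued but not yet finished
    bool _stopping = false;
    std::vector<std::thread> _workers;
};

task_pool& pool() {
    static task_pool instance; // created on first parallel use
    return instance;
}

} // namespace

// Enqueue a task for (possibly asynchronous) execution, mirroring the header's contract.
void do_work(std::function<void()> task) {
    assert(task && "do_work needs a callable");
    if (!parallel_processing) {
        task(); // run immediately on the current thread
        return;
    }
    pool().enqueue(std::move(task));
}

// Hypothetical helper (not in the quoted header): wait for all enqueued work to finish.
void wait_for_all_work() {
    if (parallel_processing) pool().wait_idle();
}
```

The serial path never creates the pool. That single `if` is what allows one setting to switch the whole program between threaded and immediate execution.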

fosterbrereton · 2024-05-03T19:51:30Z

include/orc/orc.hpp

-std::vector<odrv_report> orc_process(const std::vector<std::filesystem::path>&);
+std::vector<odrv_report> orc_process(std::vector<std::filesystem::path>&&);
+
+namespace orc {


I've started putting ORC calls into an orc namespace. We really need to address this more completely as part of issue #20.

fosterbrereton · 2024-05-03T19:55:27Z

include/orc/parse_file.hpp

+    odrv_reporting,
+};
+
+struct macho_params {


macho_params surrenders to the fact that all this file scanning, in the end, is Mach-O based. At one point I was writing ORC more from the perspective that it shouldn't matter what the ABI is and the ORC file scanners should be more generic. Perhaps I was thinking we'd be able to incorporate Windows more easily at some point in the future? Who knows.

At any rate, the added genericity resulted in passing the callbacks block around everywhere, as a kind of pseudo-delegate pattern that wasn't very well thought through. I have since removed it and replaced it with `macho_params`. This is a parameter block that is created at the top-level `orc_process` call and sent into the file scanning system to tell the Mach-O parser exactly what it should be looking for. Hopefully this setup is clearer and more extensible than the callbacks we had before.
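Below is a rough, hypothetical sketch of the parameter-block idea. Only the names `macho_params` and `odrv_reporting` come from the diff. The `macho_reader_mode` enum, its other enumerators, the `_executable_path` field, and `describe` are assumptions made for illustration.

```cpp
#include <cassert>
#include <filesystem>
#include <string>

// Hypothetical set of jobs the Mach-O reader can be asked to do. Only
// `odrv_reporting` appears in the quoted diff; the others are illustrative.
enum class macho_reader_mode {
    invalid,
    derive_dylibs,
    odrv_reporting,
};

// A plain parameter block, built once at the top level and passed down,
// in place of a bundle of callbacks threaded through every parser.
struct macho_params {
    macho_reader_mode _mode{macho_reader_mode::invalid};
    std::filesystem::path _executable_path; // hypothetical: for resolving @executable_path
};

// A reader branches on plain data instead of invoking opaque callbacks.
std::string describe(const macho_params& params) {
    switch (params._mode) {
        case macho_reader_mode::derive_dylibs: return "collect dylib dependencies";
        case macho_reader_mode::odrv_reporting: return "register DIEs for ODRV reporting";
        case macho_reader_mode::invalid: break;
    }
    return "invalid";
}
```

The design point is that a reader can inspect what it has been asked to do, instead of calling through callbacks whose behavior is defined somewhere else.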


fosterbrereton · 2024-05-03T19:56:34Z

justfile

@@ -0,0 +1,31 @@
+# For documentation on `just`, see https://github.com/casey/just


I just (ha!) discovered the `just` tool. It lets us write little scripts in this justfile and run them from the command line (e.g., `just gen`), which spins off the proper CMake incantations so I don't have to remember them anymore. I'm an overnight fan.

fosterbrereton · 2024-05-03T19:56:58Z

src/async.cpp

@@ -0,0 +1,150 @@
+// Copyright 2024 Adobe


All of this is taken from `orc.cpp`.

fosterbrereton · 2024-05-03T20:06:23Z

src/orc.cpp

@@ -148,61 +148,6 @@ auto& global_die_map() {

 /**************************************************************************************************/

-void register_dies(dies die_vector) {


I moved this further down in this file, out of the anonymous namespace, but it is otherwise unmodified. That let me wrap it in the `orc` namespace without the compiler complaining about `orc` being nested inside an anonymous one.
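The following reduced example shows the layout described here. `dies` is a stand-in type, and `g_registered_count` and `registered_count` are hypothetical. File-local state stays in the anonymous namespace. The externally visible function is defined in `orc` at file scope, so it is the same entity a header would declare as `orc::register_dies`. If it were defined inside `namespace { namespace orc { ... } }` instead, that would create a second, distinct `orc`. Unqualified references to `orc` in the file would then likely become ambiguous.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for ORC's DIE collection type (hypothetical).
using dies = std::vector<int>;

// What a header would declare.
namespace orc {
void register_dies(dies die_vector);
std::size_t registered_count();
} // namespace orc

namespace {
// File-local state and helpers: invisible outside this translation unit.
std::size_t g_registered_count = 0;
} // namespace

// Defined at file scope, outside the anonymous namespace, so these are the
// same functions the declarations above refer to.
namespace orc {

void register_dies(dies die_vector) {
    g_registered_count += die_vector.size();
}

std::size_t registered_count() { return g_registered_count; }

} // namespace orc
```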

fosterbrereton · 2024-05-03T20:07:31Z

src/orc.cpp

@@ -211,114 +156,14 @@ struct cmdline_results {

 /**************************************************************************************************/

-struct work_counter {


All of this moved to `async.cpp`, otherwise unmodified.
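The diff doesn't show `work_counter` itself, so the following is only a guess at the general shape of such a type. It keeps a count of in-flight tasks and provides a way to block until that count drops to zero. The class layout, the RAII `token`, and the method names are all hypothetical.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>

// Hypothetical counter of in-flight tasks; wait() blocks until the count returns to zero.
class work_counter {
public:
    // RAII handle: increments the counter when created, decrements it when destroyed.
    class token {
    public:
        explicit token(work_counter& counter) : _counter(&counter) { _counter->increment(); }
        token(token&& other) noexcept : _counter(other._counter) { other._counter = nullptr; }
        token(const token&) = delete;
        token& operator=(const token&) = delete;
        token& operator=(token&&) = delete;
        ~token() {
            if (_counter) _counter->decrement();
        }

    private:
        work_counter* _counter;
    };

    // Take one token per task before scheduling it; keep it alive until the task ends.
    token working() { return token(*this); }

    std::size_t count() {
        std::lock_guard lock(_mutex);
        return _count;
    }

    void wait() {
        std::unique_lock lock(_mutex);
        _cv.wait(lock, [this] { return _count == 0; });
    }

private:
    void increment() {
        std::lock_guard lock(_mutex);
        ++_count;
    }

    void decrement() {
        std::lock_guard lock(_mutex);
        assert(_count > 0);
        if (--_count == 0) _cv.notify_all();
    }

    std::mutex _mutex;
    std::condition_variable _cv;
    std::size_t _count = 0;
};
```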

fosterbrereton · 2024-05-03T20:09:11Z

src/orc.cpp

-    // First stage: process all the DIEs
+std::vector<odrv_report> orc_process(std::vector<std::filesystem::path>&& file_list) {
+    // First stage: (optional) dependency/dylib preprocessing
+    if (settings::instance()._dylib_scan_mode) {


This is the main engine for ORC, and this new block adds dylib scan mode as the first step (when requested).
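Below is a hypothetical sketch of a recursive first stage, loosely following the "recursive dependency derivation is working" commit in this PR. `derive_inputs` and `dependency_fn` are invented for this example. Reading the `&&` signature as letting the function take ownership of the caller's list and extend it in place is an assumption on our part.

```cpp
#include <cassert>
#include <cstddef>
#include <filesystem>
#include <functional>
#include <set>
#include <utility>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical dependency lookup. A real one would parse the binary's Mach-O load
// commands and resolve @rpath / @executable_path / @loader_path entries.
using dependency_fn = std::function<std::vector<fs::path>(const fs::path&)>;

// Hypothetical first stage: starting from the caller's files, keep appending newly
// discovered dependencies until none are new (recursive derivation via a worklist).
std::vector<fs::path> derive_inputs(std::vector<fs::path>&& file_list, const dependency_fn& deps_of) {
    std::set<fs::path> seen(file_list.begin(), file_list.end());
    // Index-based loop on purpose: file_list grows while we iterate over it.
    for (std::size_t i = 0; i < file_list.size(); ++i) {
        for (auto& dep : deps_of(file_list[i])) {
            if (seen.insert(dep).second) file_list.push_back(std::move(dep));
        }
    }
    return std::move(file_list); // we own the list, so hand it back without copying
}
```

The `seen` set ensures that shared dependencies (such as a dylib linked by both the executable and another dylib) are scanned once, and that cycles terminate.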

leethomason

It's a big PR - I left some comments, but I think testing will have to be the primary check.

leethomason · 2024-05-14T16:27:00Z

include/orc/parse_file.hpp

+    odrv_reporting,
+};
+
+struct macho_params {


I like the idea of Windows support, but at this point, I think that would be more about abstracting the ORC core out from the parser. ORC would begin to look more like a multi-platform app, where you have the "core" parts and the "platform/compiler" parts. At that point, we would need to abstract the macho params.

But it's all very theoretical, especially without an MSVC implementation, so I think the approach of streamlining what we actually have is the correct one.


leethomason · 2024-05-14T16:37:00Z

src/macho.cpp

+    const macho_params _params;
+    std::vector<std::string> _unresolved_dylibs;
+    std::vector<std::string> _rpaths;
+    struct dwarf _dwarf; // must be last


Why? That's a little scary. Is there a C header/struct thing going on?

fosterbrereton

Within the `macho_reader` constructor, the constructor for `_dwarf` takes copies of some other fields within the `macho_reader`:

    macho_reader(std::uint32_t ofd_index, freader&& s, file_details&& details, macho_params&& params)
        : _ofd_index(ofd_index),
          _s(std::move(s)),
          _details(std::move(details)),
          _params(std::move(params)),
          _dwarf(ofd_index, copy(_s), copy(_details)) { ...

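Since `_dwarf` takes copies of other member fields in the same struct, I added the comment to make sure it is constructed after those fields. It doesn't actually have to be last. It only has to be declared after `_s` and `_details`, because members are initialized in declaration order, not in the order of the initializer list. So the comment is a bit misleading.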
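The underlying rule is that C++ initializes non-static data members in the order they are declared in the class, regardless of the order written in the constructor's mem-initializer list. In this reduced, hypothetical example, `reader`, `source`, and `parser` stand in for `macho_reader`, `_s`, and `_dwarf`:

```cpp
#include <cassert>
#include <string>
#include <utility>

struct source {
    std::string bytes;
};

// Stand-in for `dwarf`: keeps its own copy of the source it parses.
struct parser {
    explicit parser(source s) : _s(std::move(s)) {}
    source _s;
};

struct reader {
    explicit reader(source&& s) : _s(std::move(s)), _parser(copy(_s)) {}

    static source copy(const source& s) { return s; }

    source _s;      // initialized first because it is declared first
    parser _parser; // must be declared after `_s`, since its initializer reads `_s`
};
```

If `_parser` were declared before `_s`, `copy(_s)` would read a member that has not been constructed yet. GCC and Clang warn about an initializer list written out of declaration order with `-Wreorder`.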

leethomason

Looks good - thanks for all the hard work on this one.

fosterbrereton added 9 commits May 1, 2024 12:03

wip

6f89bea

first pass at dependency scanning. For locally build objects, traverses `LC_SYMTAB` to find the related object files, aka the debug map.

b4b03b5

adding justfile

f6d9f5e

justfile tweaks

d60d48b

refactoring to keep the right work in the right sources

3de9c00

checked in a typo

cf3dd79

cleanup

6d0fbad

cleanup

0e8dee7

cleanup

5ccc0a2

fosterbrereton commented May 2, 2024

src/fat.cpp

fosterbrereton added 3 commits May 2, 2024 11:13

cleanup

c516b87

reverting a change

4b6a52f

cleanup

9d3aa33

fosterbrereton requested a review from leethomason May 2, 2024 18:33

fosterbrereton changed the title ~~Dylib derivation & Scanning~~ Dylib Derivation & Scanning May 2, 2024

fosterbrereton added 7 commits May 2, 2024 16:43

more improvements and cleaning up the code so I can do recursive dylib scanning

f365a0c

more improvements and cleaning up the code so I can do recursive dylib scanning

7ca396b

more improvements and cleaning up the code so I can do recursive dylib scanning

d050908

fix up orc_test

2978d1c

replacing callbacks with a macho_params block

c6a1114

got the right executable_path wired up during dylib resolution

5907afb

recursive dependency derivation is working

e6ae50f


fosterbrereton marked this pull request as ready for review May 3, 2024 20:09

fosterbrereton requested a review from bheath-adobe May 3, 2024 20:09

fosterbrereton added 2 commits May 7, 2024 14:13

tweak

e60946b

Merge branch 'main' into fbrereto/dylib-scanning-mode

041b105

leethomason reviewed May 14, 2024


fosterbrereton added 5 commits May 14, 2024 11:46

performance improvements

e4dd201

review change from @leethomason

5841535

review change from @leethomason

a4987e7

Yet another build break

2c1d210

typo

654d789

leethomason approved these changes May 15, 2024


fosterbrereton merged commit ecf7de4 into main May 15, 2024
3 checks passed

fosterbrereton deleted the fbrereto/dylib-scanning-mode branch May 15, 2024 19:50
