Implement MacOS private loader #1285

Open
derekbruening opened this issue Nov 28, 2014 · 5 comments

@derekbruening
Contributor

From derek.br...@gmail.com on October 06, 2013 11:58:28

Splitting from issue #58 as this is not necessary for core DR and will be implemented later.

Xref issue #1284 which is refactoring the core module code, including some of the loader code, but is not filling in MacOS implementations for the loader.

Original issue: http://code.google.com/p/dynamorio/issues/detail?id=1285

@derekbruening
Contributor Author

Xref DynamoRIO/drmemory#1673 where we hit problems due to the lack of a private loader. We should at least ask for up-front binding in clients for now.

@vit9696
Contributor

vit9696 commented Feb 5, 2022

@derekbruening after you mentioned private loader in #5325 (comment), I started to wonder why that is needed on macOS. Unlike Linux, which uses a flat namespace, macOS dyld uses a two-level namespace, i.e. each import is identified by a library and a symbol name. Having the same symbol in two different libraries does not lead to a conflict on macOS, as the application always links against a particular library. Am I missing something important here?

@derekbruening
Contributor Author

I assume you mean, could you take all the 3rd-party libraries a tool uses, make a copy of each, give each a different name, and use the existing application loader to load them? This would help, but I would call it the start of a private loading scheme, so it's a step toward a private loader. It does not solve everything:

  • There are global resource conflicts: e.g., the brk used for the UNIX heap. These are not solved by only copying and renaming libraries, since both copies will compete for the same resource and in some cases each will assume it has a monopoly on it.
  • Renamed libraries still import from the originals, so the imports need to be changed. If we add changing all the imports to the task of making copies, that is moving more and more toward a complete private loader.
  • We would like DR to take over at the very first usermode instruction, unlike today where it has to wait for the app's loader to load it and so misses a bunch of init code. To run a tool very early in a new process, we can't rely on the app loader, which is not initialized.
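
The first bullet can be illustrated with a toy model. This is only a sketch, not DR or libc code: `ProgramBreak` and `BumpHeap` are hypothetical names that simulate a single process-wide break and two copies of the same allocator, each assuming it is the sole owner of that break.

```cpp
#include <cassert>
#include <cstddef>

// Toy stand-in for the single, process-wide program break (hypothetical).
struct ProgramBreak {
    std::size_t value = 0x1000;
    std::size_t get() const { return value; }
    void set(std::size_t v) { value = v; }
};

// Toy bump allocator modeling one copy of a library heap. It reads the break
// once at construction and then assumes nobody else ever moves it.
class BumpHeap {
public:
    explicit BumpHeap(ProgramBreak &brk) : brk_(brk), cached_end_(brk.get()) {}

    // Returns the start address of a fresh block of `size` bytes.
    std::size_t alloc(std::size_t size) {
        std::size_t start = cached_end_;  // trusts the cached value
        cached_end_ += size;
        brk_.set(cached_end_);            // overwrites whatever the other copy set
        return start;
    }

private:
    ProgramBreak &brk_;
    std::size_t cached_end_;
};

// True when [a, a + a_size) and [b, b + b_size) share at least one byte.
bool ranges_overlap(std::size_t a, std::size_t a_size,
                    std::size_t b, std::size_t b_size) {
    return a < b + b_size && b < a + a_size;
}

// Creates an "original" heap and a "renamed copy" over the same break and
// reports whether their first allocations collide.
bool copies_collide() {
    ProgramBreak brk;
    BumpHeap app_heap(brk);   // original library, used by the application
    BumpHeap tool_heap(brk);  // renamed private copy, used by the tool
    std::size_t a = app_heap.alloc(64);
    std::size_t b = tool_heap.alloc(64);
    return ranges_overlap(a, 64, b, 64);
}
```

In this model both copies hand out the same address range, which is the kind of conflict that renaming alone does not remove.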

See also https://dynamorio.org/transparency.html

@vit9696
Contributor

vit9696 commented Feb 7, 2022

@derekbruening, I am afraid it could be even more complicated than that, and my opinion is that trying to resolve it the "usual" way will not work. One would have to go the Mac way.

  1. Unlike Linux, macOS does have forward API and ABI compatibility. There are deprecations and removals, but until an API has been deprecated, it is supposed to keep working in future versions of macOS. System calls on macOS are not an API, even though you can find them. If you want to write code in userspace, you must link against libSystem.dylib, which proxies libsystem_kernel.dylib, which in turn exports C wrapper functions for the system calls.

  2. Shared linking is the only possible way of linking on macOS, even for console applications. In theory you can try to write your own crt based on the macOS source code. In theory you can also find the syscall numbers and even statically link some custom C library. However, this is not guaranteed to work even across point releases of macOS, let alone major updates. Since Apple releases the OSS components about half a year after each macOS release, nobody bothers to even try.

  3. Shared linking on macOS will always use the dyld_shared_cache (DSC). The DSC is basically a container with almost all system libraries linked against each other to save space and initialisation time. It is loaded early by the dynamic linker and mapped into every executable with kernel support. Over the last two years almost all of the original system libraries were removed from macOS: in macOS 11 and newer the original library files simply do not exist, only the DSC does. The DSC is mapped by the dynamic linker (dyld) as a whole, and you cannot replicate that behaviour, as it is kernel-supported and each process has just one DSC.

I will need to recheck macOS to be 100% sure of the rest, but I do not think there would be a change with the rest of the wording:

  • sbrk does not exist on macOS. I think the main way to allocate memory is libmalloc (https://github.com/apple-oss-distributions/libmalloc), which uses mach_vm_map (https://github.com/apple-oss-distributions/libmalloc/blob/main/src/vm.c). This is a system call.
  • Altering DSC read-only memory on macOS is no longer possible, for whatever reason; memory protection changes will simply fail. In the past there was copy-on-write support with private copies, but eventually it disappeared.
  • Codesigning is a must, at least on ARM macOS, and the dynamic loader is part of that chain. One will not be able to access most GUI APIs without using the macOS built-in dynamic loader.

To my eyes, there are some hard limitations on macOS, which DR will not be able to overcome at all. My opinion is that the private loader simply cannot be written with adequate functionality on macOS. However, I do believe that DR can work around this if it tries and adapts to macOS specifics, likely with some limitations.

  1. DR can work as early as constructors (e.g. Objective-C runtime) start (https://github.com/apple-oss-distributions/dyld/blob/4de7eaf4cce244fbfb9f3562d63200dbf8a6948d/src/dyld2.cpp#L1738-L1765).
  2. Target application imports can be rebound with e.g. fishhook (https://github.com/facebook/fishhook).
  3. The DR runtime can be limited to using a small subset of the system libraries. One could rebind the application's imports so that the application does not use this subset directly, by proxying it.
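
The rebinding idea in items 2 and 3 can be sketched with a toy import table. This is an illustrative model only: it does not use fishhook's API or real Mach-O structures, and `ImportTable`, `system_impl`, and `proxy_impl` are hypothetical names. The point is that the application calls through a pointer slot, so overwriting the slot redirects the call to a proxy.

```cpp
#include <cassert>
#include <map>
#include <string>

using ImportFn = int (*)(int);

// Hypothetical "system" implementation that the application imports.
int system_impl(int x) { return x + 1; }

// Hypothetical proxy installed by the tool. It forwards to the original but
// gives the tool a place to observe or redirect the call.
int proxy_calls = 0;
int proxy_impl(int x) {
    ++proxy_calls;
    return system_impl(x);
}

// Toy model of an image's import pointer table: symbol name -> bound target.
struct ImportTable {
    std::map<std::string, ImportFn> slots;

    // Overwrites the slot for `name` and returns the previous target, or
    // nullptr if the image does not import that symbol.
    ImportFn rebind(const std::string &name, ImportFn replacement) {
        auto it = slots.find(name);
        if (it == slots.end()) return nullptr;
        ImportFn old = it->second;
        it->second = replacement;
        return old;
    }

    // Models the application calling through its import slot.
    int call(const std::string &name, int arg) const {
        return slots.at(name)(arg);
    }
};
```

After a `rebind`, every call the application makes through that slot reaches the proxy first, while the tool itself can keep calling the original target directly.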

That is probably it. I would like to know your opinion based on this information, which might not have been clear earlier. Perhaps a completely different route should be taken than what is described in this issue. To be able to provide a clearer description I would also need a better understanding of what DynamoRIO wants to do, rather than how it tries to achieve it on Linux or any other OS.

@derekbruening
Contributor Author

The two primary goals are transparency and ease of writing tools.

Transparency

Transparency is discussed in the docs, slides, and academic papers in depth: e.g., https://dynamorio.org/transparency.html. The main issue is isolation. The more isolated both DR and its client are, the better. In general, DR or a client using the same user library used by the application is asking for trouble. Many functions are not re-entrant and are thus unsafe to use. A simple wrapper around a system call that is known to have zero state might be safe; but who's to say it won't change in the next version and then be non-re-entrant? The maintenance pain of using raw system calls would have to be carefully balanced against the stability risks of using app libraries. It might be that using low-level, mostly-stateless, known-to-be-re-entrant user library functions is the best choice on Mac.
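
As a small illustration of the re-entrancy problem, the sketch below uses the standard `strtok`, which keeps a hidden cursor between calls. The "application" and "tool" roles are simulated inside one hypothetical function, `app_second_token`; this is not DR code, only a model of what happens when both sides share one copy of a stateful library routine.

```cpp
#include <cstring>
#include <string>

// Returns the second token the "application" sees when splitting "a,b,c".
// If `tool_interferes` is true, a "tool" calls the same library function
// between the application's two calls.
std::string app_second_token(bool tool_interferes) {
    char app_buf[] = "a,b,c";
    char tool_buf[] = "x y";

    std::strtok(app_buf, ",");       // app: first token "a"; strtok saves a hidden cursor
    if (tool_interferes) {
        std::strtok(tool_buf, " ");  // tool: moves that same hidden cursor into tool_buf
    }
    // app: resumes from wherever the shared cursor points now
    char *second = std::strtok(nullptr, ",");
    return second ? second : "";
}
```

Without interference the application gets "b"; with the tool's call in between it gets "y" from the tool's buffer. A private copy of the library, with its own hidden state, would avoid this.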

On Windows, the situation is similar to Mac: the system call interface is not documented. But the vast majority of user library functions cannot be shared with the application. Even the simple system call wrappers that you would think would be safe for DR to use turn out to have problems due to hooks that software places on them, forcing DR to use raw system calls in many cases. Sometimes DR does use simple functions in low-level Windows user libraries shared with the app when there is no better alternative.

Ease of writing tools: library support!

I'm not too worried about getting core DR to operate transparently: it already is pretty self-contained and just needs a way to call mmap and other operations. It already easily avoids libc and other libraries. The hard part is supporting a good tool-writing platform. Nobody wants to write a tool that can't leverage existing libraries. These days, most people do not want to write a tool in pure C and have to re-invent data structures: they want C++ and STL. How can that be supported without a way to auto-magically load the C++ shared library and isolate it from the application? Not easily. Statically linking the C++ and C libraries is possible, but not easy: we've looked at it in the past and always rejected it.

We've worked in scenarios where libraries were not available: when DR and its client are statically linked into the application, and thus there is no nice shared library boundary where isolation can be applied. It is painful to do anything. You have to use custom allocators or use DR's allocator with placement new and be extremely careful about everything you do: make sure nothing is going to allocate memory or touch other resources under the hood. It is very easy to mess up and have mysterious bugs that are very difficult to track down. It does not make for a good tool platform.
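
For readers unfamiliar with the pattern, here is a minimal sketch of placement new over a private allocator. `Arena`, `arena_new`, `arena_delete`, and `Counter` are hypothetical names invented for this example; in a real client, DR's own allocation routines would take the place of `Arena`.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Hypothetical fixed-size arena standing in for a tool-private allocator.
class Arena {
public:
    // Returns `size` bytes aligned to `align` (a power of two no larger than
    // alignof(std::max_align_t)), or nullptr when the arena is full.
    void *alloc(std::size_t size, std::size_t align) {
        std::size_t start = (used_ + align - 1) & ~(align - 1);
        if (start > sizeof(buf_) || size > sizeof(buf_) - start) return nullptr;
        used_ = start + size;
        return buf_ + start;
    }
    std::size_t used() const { return used_; }

private:
    alignas(std::max_align_t) unsigned char buf_[4096];
    std::size_t used_ = 0;
};

struct Counter {
    int hits;
    explicit Counter(int initial) : hits(initial) {}
};

// Constructs a T inside the arena with placement new, so the global heap
// shared with the application is never touched.
template <typename T, typename... Args>
T *arena_new(Arena &arena, Args &&...args) {
    void *mem = arena.alloc(sizeof(T), alignof(T));
    if (mem == nullptr) return nullptr;
    return new (mem) T(static_cast<Args &&>(args)...);
}

// Runs the destructor explicitly; this arena reclaims memory only as a whole.
template <typename T>
void arena_delete(T *obj) {
    if (obj != nullptr) obj->~T();
}
```

Even with helpers like these, any member of `T` that allocates internally (for example a standard container with the default allocator) would still reach the shared heap, which is why this style demands so much care.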
