Make DynELF more robust to different base addresses #933
The meaning of several ELF header fields depends on the ELF type. Notable examples are the entry point, p_vaddr in the program headers (PHDRs), and d_ptr in Elf{32,64}_Dyn, and there are probably more. 'EXEC' type ELFs (non-PIE executables) store actual virtual addresses in these fields, while 'DYN' type ELFs (shared libraries and PIE executables) store offsets from the ELF base address. However, the run-time linker may then fix up some of these offsets in memory so that they become real addresses, which means a value leaked at run time can be either kind. Currently DynELF handles this with a fixed threshold: values smaller than 0x400000 are treated as offsets, while larger values are assumed to be actual virtual addresses.
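A minimal sketch of the current threshold heuristic as described above, assuming it reduces to a single comparison. The name `old_fixup` is hypothetical and this is not the actual pwntools code:

```python
# Hypothetical sketch of the existing DynELF heuristic (not the real pwntools code).
# Values below a fixed threshold are assumed to be offsets from the ELF base;
# anything at or above it is assumed to already be a virtual address.
THRESHOLD = 0x400000  # default image base used by GNU ld for 64-bit non-PIE binaries


def old_fixup(value: int, base: int) -> int:
    """Return an absolute address for a pointer value read from an ELF field."""
    if value < THRESHOLD:
        return base + value  # treated as an offset from the load base
    return value  # treated as an already-absolute address


# PIE binary loaded at 0x555555554000: an unrelocated offset gets rebased.
print(hex(old_fixup(0x1234, 0x555555554000)))
# Non-PIE binary linked by GNU ld: the absolute address is left alone.
print(hex(old_fixup(0x401000, 0x400000)))
```

The threshold only works as long as every non-PIE image is mapped at or above 0x400000.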
This works for executables produced with gcc, because the default linker script shipped with the GNU linker, ld, sets the image base address to 0x400000 (or 0x8048000 for 32-bit). However, other linkers may choose different values. For example, when linking with the LLVM linker, lld, using its default linker script, the base address is 0x10000.
In the lld case, the pointer values are already absolute but fall below 0x400000, so DynELF adds the base address to them a second time and ends up with addresses in the 0x20000-0x30000 range. My exploit then goes into a quasi-infinite loop trying to leak from addresses that are not mapped.
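The double rebasing can be reproduced with plain arithmetic. This sketch assumes the threshold behaviour described above and uses 0x11000 as an illustrative absolute address inside an lld-linked non-PIE binary; `old_fixup` is a hypothetical helper, not pwntools code:

```python
# Hypothetical reproduction of the double rebasing in an lld-linked EXEC binary.
THRESHOLD = 0x400000  # hard-coded cutoff assumed by the old heuristic


def old_fixup(value: int, base: int) -> int:
    """Old heuristic: anything below the threshold is treated as an offset."""
    return base + value if value < THRESHOLD else value


LLD_BASE = 0x10000  # image base with lld's default linker script (per the text above)

# In an EXEC binary the field already holds an absolute address (illustrative value).
absolute = 0x11000
leaked_from = old_fixup(absolute, LLD_BASE)
print(hex(leaked_from))  # 0x21000: the base was added a second time
```

Any absolute address between 0x10000 and 0x20000 lands in the unmapped 0x20000-0x30000 range after this step.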
The proposed change instead decides whether to add the base address to a value read from these fields by looking at the Type field in the ELF header. Only if the type indicates that the value may be an offset does it make a guess similar to the old one, but instead of comparing against the hard-coded value 0x400000 it compares against the library base address.
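A minimal sketch of the proposed logic, assuming the e_type values and header layout from the ELF specification (e_type is a 16-bit field directly after the 16-byte e_ident, with byte order given by e_ident[EI_DATA]). The helpers `elf_type` and `new_fixup` are hypothetical and are not the code in this pull request:

```python
import struct

# e_type values from the ELF specification.
ET_EXEC = 2  # executable: fields hold absolute virtual addresses
ET_DYN = 3   # shared object or PIE: fields may hold offsets from the base

EI_DATA = 5         # index of the data-encoding byte in e_ident
ELFDATA2LSB = 1     # little-endian
ELFDATA2MSB = 2     # big-endian
E_TYPE_OFFSET = 16  # e_type immediately follows the 16-byte e_ident


def elf_type(header: bytes) -> int:
    """Read e_type from the first bytes of an ELF header."""
    if header[:4] != b"\x7fELF":
        raise ValueError("not an ELF header")
    endian = {ELFDATA2LSB: "<", ELFDATA2MSB: ">"}[header[EI_DATA]]
    (e_type,) = struct.unpack_from(endian + "H", header, E_TYPE_OFFSET)
    return e_type


def new_fixup(value: int, base: int, e_type: int) -> int:
    """Rebase only when the ELF type allows the field to be an offset."""
    if e_type != ET_DYN:
        return value  # EXEC: never rebase, whatever the image base is
    # DYN: assume a pointer into this image cannot lie below its load base,
    # so a smaller value is an offset the loader did not fix up.
    if value < base:
        return base + value
    return value


hdr = b"\x7fELF" + bytes([2, 1, 1]) + bytes(9) + struct.pack("<H", ET_EXEC)
print(hex(new_fixup(0x11000, 0x10000, elf_type(hdr))))  # 0x11000
```

With the type check in place, the lld-linked EXEC binary keeps its absolute addresses, and the threshold for DYN images moves with wherever the library was actually loaded.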
My current understanding is that the fields mentioned above may contain offsets if and only if the ELF type is 'DYN'. This is based on some superficial reading and a few experiments, so I may well be wrong.
Testing this change does not seem straightforward to me. I have put together a very simple example, and with it the new version seems to work for all combinations of PIE, RELRO, and linker. My simple test case can be seen here.
To me this feels like a bugfix, so the pull request targets the stable branch, but I would understand if someone were to argue that the nature of this fix makes it an enhancement instead.