c18n: Rework implementation to be interrupt-safe #2090
Why is the second commit in here?
For the first commit, 1384 insertions(+), 1259 deletions(-) is a big pile of diff to review in one chunk. Is there really nothing you can do to split it out into multiple commits that clearly explain what they're actually doing? I don't really know how anyone can be expected to review it by just staring at literally hundreds of lines of assembly diffs mixed in with other diffs all over the place.
It is not related to the first commit but bundled here because it is a very minor change.
(Generally speaking, commits should either be large and boringly repetitive, or small and complex, not both)
Well, don't? As far as I can tell it can be easily rebased onto dev, and if it's a good idea it could have gone in weeks ago with little review effort needed.
I'm afraid that this change is more of a rewrite of entire parts of the implementation than a collection of incremental improvements. The change in the stack look-up mechanism propagates all over the place, including
The unchanged parts are I have added extensive comments to the assembly code and signal handling code, which are a bit tricky. Please let me know if there's anything unclear.
There is actually some non-essential dependency between this commit and the previous one (
I think the superpages commit can certainly wait for the refactoring to land. The only bit I think could easily be extracted as a separate commit is the utrace refactoring.
714b3d0 to 5f8abe8
Couple of nits.
/*
 * Add 1 to offset due to capmode.
 */
PATCH_ADR(cookie_off, size + 1);
This deserves a better comment to explain why the + 1 is there even for capmode (and please call it C64; capmode is the RISC-V name). If I understand correctly, it's because you use LR == cookie to detect a tail call, and since the benchmark ABI only clears the LSB of LR when it returns you need the cookie to match that, even though that's not the actual address it would use for a return.
There's probably a better name for this than cookie, too, that conveys what it's for. Otherwise it sounds like it's just an arbitrary bit of identifying data.
Renamed it to landing to denote the address in the trampoline that the called function should 'land' back to.
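The check discussed in this thread can be sketched as follows. This is a minimal illustration on plain integers, assuming the trampoline compares the link register against its landing address and that return addresses in C64 code carry a set LSB. The names landing_value and is_tail_call are hypothetical and do not come from rtld_c18n, which works on capabilities in assembly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: the value a trampoline would record as its landing
 * address. The + 1 sets the LSB so that the value compares equal to what
 * LR holds in C64 code. Assumes landing_off is even (instruction-aligned).
 */
static uintptr_t
landing_value(uintptr_t tramp_base, uintptr_t landing_off)
{
	assert((landing_off & 1) == 0);
	return (tramp_base + landing_off + 1);
}

/*
 * Hypothetical check: if LR already equals the landing address, the callee
 * reached the trampoline through a tail call, so no new trusted frame is
 * needed.
 */
static bool
is_tail_call(uintptr_t lr, uintptr_t landing)
{
	return (lr == landing);
}
```

Without the + 1, the comparison would fail for every C64 return address, because the LSB would differ even when the rest of the address matches.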
libexec/rtld-elf/rtld_c18n.c
Outdated
.sig = (struct func_sig) {
.valid = true,
.reg_args = 3, .mem_args = false, .ret_args = NONE
for (int n = npagesizes - 1; n >= 0; --n)
My suggestion included curly braces for a reason... if the inner block uses curly braces so should the outer
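A minimal sketch of the brace style being asked for, assuming a loop that walks the page sizes from largest to smallest. Only the for header comes from the diff above; the helper name, the example table and the fitting condition are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical table of supported page sizes, smallest first. */
static const size_t example_pagesizes[] = { 4096, 2097152 };

/*
 * Hypothetical helper: returns the largest page size that fits in len,
 * or 0 if none does. Both the outer loop and the inner block use braces.
 */
static size_t
largest_fitting_pagesize(const size_t *pagesizes, int npagesizes, size_t len)
{
	assert(pagesizes != NULL);
	for (int n = npagesizes - 1; n >= 0; --n) {
		if (pagesizes[n] <= len) {
			return (pagesizes[n]);
		}
	}
	return (0);
}
```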
libexec/rtld-elf/rtld_c18n.h
Outdated
_Pragma("clang diagnostic pop"); \
_tf; \
})
struct stk_table_sizes_data {
This name still doesn't make sense to me given what it is
libexec/rtld-elf/rtld_c18n.h
Outdated
size_t capacity;
struct tcb_wrapper *wrap;
struct stk_table_stk_info trusted_stk;
struct stk_table_stk_info data[];
Surely you can do better than "data" here?
libexec/rtld-elf/rtld_c18n.h
Outdated
struct stk_table_data {
void *stack;
void *reserved;
} data[];
Ditto
libexec/rtld-elf/rtld_c18n.h
Outdated
struct stk_table_stk_info trusted;
struct stk_table_stk_info compart[];
Please, just name them what they are and put something like stack in the name of the field. stk_table_metadata->compart is not accurate, that sounds like it's getting the compartment itself, not its stack.
Though I have to say I really do not understand why we have stacks here but also a stk_table_entry per compartment in stk_table itself. Which says to me that this needs some non-empty subset of better naming, comments and a different design.
I've changed the names and added some comments in the struct definitions.
One nit and one comment to think about for the future. I can't say I've extensively reviewed the whole design, but whilst it remains off-by-default I'm happy enough for it to land once the nit has been addressed.
@@ -832,8 +878,7 @@ _rtld_unw_getcontext_unsealed(uintptr_t ret, void **buf)
struct jmp_args { uintptr_t ret1; uintptr_t ret2; };
FYI this is specific to Morello's calling convention, so doesn't really belong in an MI file, but that's a pre-existing problem
Yes. I'll sort this out while porting to RISC-V.
The trampoline and other parts of RTLD are refactored to be interrupt-safe. The trusted frame is redesigned to allow trampolines to perform tail-calls that do not push a trusted frame. The new design also no longer relies on a region of metadata at the bottom of each compartment's stack.
Note: This is a cleaned-up version of #2079 minus the bits about c18n statistics.
This PR completely refactors the trampoline and how stack switching works. The purecap and benchmark ABI implementations now both use a dedicated register to store the trusted stack (ddc and rddc respectively). This makes the trampolines look identical (modulo register names) on both ABIs. No metadata recording the current top of the stack is stored at the bottom of each compartment's stack. Instead, the stack lookup table now stores that information.
The signal handling mechanism has been rewritten to handle (rare) cases where c18n code, in particular trampolines, is interrupted. All c18n code paths that could be interrupted have been audited and it is believed that they can all be handled correctly, although testing for that is hard.
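The idea of keeping each compartment's current stack top in the lookup table, instead of at the bottom of the stack itself, can be sketched as below. This is only an illustration under stated assumptions: the struct layout, the field names and both functions are hypothetical and do not reproduce the actual rtld definitions, and a real trampoline does this in assembly on capabilities.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical per-stack record: the current top of one stack. */
struct stack_entry {
	void *top;
};

/* Hypothetical lookup table, indexed by compartment id. */
struct stack_table {
	size_t capacity;
	struct stack_entry trusted_stack;
	struct stack_entry compart_stacks[];
};

/* Allocate a zeroed table with room for capacity compartments. */
static struct stack_table *
stack_table_new(size_t capacity)
{
	struct stack_table *t;

	t = calloc(1, sizeof(*t) + capacity * sizeof(t->compart_stacks[0]));
	if (t != NULL)
		t->capacity = capacity;
	return (t);
}

/*
 * Record the caller's current stack top in the table and return the
 * callee's saved stack top. Returns NULL for an out-of-range id.
 */
static void *
stack_switch(struct stack_table *t, size_t caller, size_t callee,
    void *caller_sp)
{
	assert(t != NULL);
	if (caller >= t->capacity || callee >= t->capacity)
		return (NULL);
	t->compart_stacks[caller].top = caller_sp;
	return (t->compart_stacks[callee].top);
}
```

Because the saved tops live in the table, nothing has to be read from or written to the compartment's own stack in order to find where that stack currently ends.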