-
Notifications
You must be signed in to change notification settings - Fork 651
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[RFC] Yield-WaitFor syscall #3577
base: master
Are you sure you want to change the base?
Conversation
I think we want something like this. This yield-for makes mixing async and sync code in userspace easier. Currently, it is fairly easy to write all async or all sync code in userspace, but when mixing them async callbacks can lead to unexpected call chains when using The decision to make is whether to go with this lightweight version, or something more integrated (or something else). Instead (or in addition to), we could add yield-for-blocking, which would remove the upcall entirely, and instead pass the upcall arguments to userspace via the return arguments of yield-for-blocking. Advantages of yield-for-blocking:
Disadvantages:
Having articulated this, I see the appeal of yield-for. I think Pat pointed out that the complexity in userspace code is fairly easy to hide in a library layer, so that driver code would look like the yield-for-blocking case. Perhaps the prudent move is to address the complexity of writing sync and async code in userspace with yield-for, and not try to introduce blocking semantics before we know the aysnc version is insufficient. |
A quick note,
There are probably other games that could be played to ease this, but it's all more complex than the lightweight change I think we should start with as-proposed here. |
But the arg1 arg2 arg3 are going back up (kernel to userspace) and driver num and subscribe num are going down (userspace to kernel). |
Oh, you were thinking some more library magic here? I guess that works so long as the lower layer does the remapping and the pointer writing. I don't think there is any way to pass all the pointers for arg1-arg3 in the syscall and have the kernel do the write, but that's not necessary.... I think the below would work? // libtock.c
int yield_for_blocking(struct upcall_id, &arg1, &arg2, &arg3) {
register uint32_t r0 __asm__ ("r0") = YIELD_FOR_BLOCKING_ID;
register uint32_t r1 __asm__ ("r1") = upcall_id.drv_num;
register uint32_t r2 __asm__ ("r2") = upcall_id.sub_num;
register int retval __asm__ ("r0");
register int rv1 __asm__ ("r1");
register int rv2 __asm__ ("r2");
register int rv3 __asm__ ("r3");
__asm__ volatile (
"svc 0"
: "=r" (retval), "=r" (rv1), "=r" (rv2), "=r" (rv3)
: "r" (r0), "r" (r1), "r" (r2)
: "memory");
if (retval & YIELD_FOR_ACTUALLY_YIELDED_FLAG_MASK) {
*arg1 = rv1;
*arg2 = rv2;
*arg3 = rv3;
}
return retval;
} |
Yes that is what I was imagining. |
The C wrapper could just take five arguments though instead of a struct:
|
True, though, since we're size-optimizing, if that function didn't get inlined everywhere, adding the fifth argument would require spilling to the stack [assuming |
This function seems like something that should get inlined everywhere, provided that it is being built with reasonable optimization settings, which should be the case for any project particularly concerned about size. So I would advocate for the five argument function definition, though I don't feel super strongly about it. If we used your approach, wouldn't we need to pass a pointer to upcall_id in order to not spill to the stack? That said, I think that the disadvantages Brad described are valid:
These are both disadvantages that do not exist with
|
Yes, that's what I meant by "[assuming I'm generally wary about relying on compiler optimizations for performance correctness, but I think we can shelve this side-hypothetical for now.
Why is that limitation in place? That's a pretty crippling limitation for a lot of applications — can't have something like a periodic timer and a event-based sensor subscription live at the same time? That's not really an acceptable or realistic limitation for the long term. If the issue is concurrent execution concerns of callbacks, I'm not opposed to something of the spirit of |
One thing I had been ruminating on was whether there would be value in something closer to a In practice, I think the ergonomics around first |
I'm not sure it is? In my little dabble yesterday it seems like subscribe in libtock-rs is essentially the
It's really hard to discuss blocking command without something like this PR. What exactly is blocking command? Critically, how does a capsule give the blocking command its return data? Also, can someone write its It seems like the major disadvantage is going to be: when writing a capsule driver, after my interrupt fires, do I call Imagine we implement It seems like it might not be too big of a step to implement command-yield-for-blocking, which is a command immediately followed by a yield-for-blocking in a single system call (how exactly to do that tbd). I say this because that to me seems more plausible than re-writing all of our capsules to have both an async and a sync version (if in fact that is what is required for blocking_command). |
There is more context on the decision to not support independent tasks at tock/libtock-rs#334. The short version is that supporting that requires making a lot of important design decisions that I did not have the data to make at the time. I also want to re-evaluate whether futures are acceptable in |
That state of libtock-rs is much more sane. w.r.t. to an eventual more async world, there's some more complex stuff in libtock-c apps, but really nothing that would demand more capability than what is sounds like libtock-rs already supports. I do think one of the good outcomes from TockWorld6 is a push for the core team to get more parity in libtock-c / libtock-rs examples, so if I'm wrong, I suspect we'll see soon :). |
doc/reference/trd104-syscalls.md
Outdated
_Note:_ In the case that the specified upcall is set to the Null Upcall, | ||
no upcall will execute, but Yield-WaitFor will still return. In this | ||
case, the contents of r0-r3 upon return are UNDEFINED and should not be | ||
relied on in any way. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, this is great except why not have r0
-r3
be the values that would have been passed a non-null upcall? This would allow many cases to avoid callbacks entirely and just process the results directly, and doesn't cost anything (I don't think).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is yield-for-blocking
. Or a variant at least. I see what you are asking. But, given
The Tock kernel MUST NOT invoke the Null Upcall.
what would those be?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, so, the reason I didn't do this is because I couldn't decide the right behavior for r3
.
TRD104 starts by describing upcalls abstractly as something that have (up to) 4 return arguments passed in r0-r3. We have one type of upcall the kernel actually emits right now, namely the upcall produced in response to subscribe [n.b., nothing in this TRD forbids other upcall signatures in the future*]. For clarity, I will use SubscribableUpcall
to refer to an upcall invoked on a pointer passed via Subscribe.
Much to my surprise, as far as I can tell, the signature of SubscribableUpcall
is not expressly defined anywhere except in §5.2 of this TRD, where the C prototype for a subscribe_upcall
is given as an example.
Indeed, we define the
FunctionCall
object passed to UKB with this note:Struct that defines a upcall that can be passed to a process. The upcall takes four arguments that are Driver and upcall specific, so they are represented generically here.
In practice, the type signature for an upcall is established by the kernel in kernel::upcall::schedule() and an implicit assumption in the definition/creation of an Upcall
object that upcalls come only from subscribes. But that's just an implementation artifact at the moment, not part of our ABI*.
Now, if YieldFor
were to say something like 'sets r0-r3 to the Upcall Arguments if the current Upcall is the NullUpcall', then we would expect all Upcall Arguments to be set, right? Now, for the case of a SubscribableUpcall
r0-r2 are clear, but what do we do with r3, which would be the userdata
pointer? Does YieldFor
need to do different things based on which type of Upcall it's passing?
We could further amend TRD104 to formally specify the function signature for all upcalls, and then define YieldFor
to skip r3. Or define SubscribableUpcall
and specify YieldFor
behavior only for that upcall type.
I was shooting for minimum delta with first RFC, and I don't think we can do this "skip the callback" without also making more changes around Upcall definition. I lean to formalizing SubscribableUpcall
as a no-overhead option for today (since there is only one Upcall type still) while leaving the door open to other Upcall signatures in the future.
I do think the "skip the callback" would be nice. I can update this RFC to capture a version of that.
*Subscribe does start like this:
The Subscribe system call class is how a userspace process registers upcalls with the kernel.
Which if we really read into the "a" there maybe suggests the 1:1 mapping, but I wouldn't blink an eye at a future section that said something like "The GetASignal
system call is another way userspace processes register upcalls with the kernel". If there really is OneTrueUpcallSignature, this TRD needs to say that much more directly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
*Subscribe does start like this:
The Subscribe system call class is how a userspace process registers upcalls with the kernel.
Which if we really read into the "a" there maybe suggests the 1:1 mapping, but I wouldn't blink an eye at a future section that said something like "The
GetASignal
system call is another way userspace processes register upcalls with the kernel". If there really is OneTrueUpcallSignature, this TRD needs to say that much more directly.
But wouldn't we need to add that everywhere? A command syscall is THE ONLY way to issue a command. Yield is THE ONLY way to yield, etc. It seems clear to me that subscribe is the only way to subscribe, since that is all it does.
But yes we need to document the upcall signature.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree it's clear that "subscribe is the only way to subscribe".
What I do not think is clear is that "subscribe is the only way to get upcalls". Upcalls are defined completely independently of subscribe in the previous section. As written, subscribe is just one thing (and currently happens to be the only thing) than can generate an upcall.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we need to say anything about subscribe being the only way to set upcalls, any more than we need to say command is the only way to instruct the kernel to start an operation. Commands are documented elsewhere. But, we do need to define what an upcall is, in my opinion.
I'm strongly supportive of something very close to this. High bit
Some more details separated so they can be quoted separatelyMore on 1: it's worth looking through our various libtock-c examples and drivers and seeing where we would actually replace calls to the current library More on 2: with some specialization in userspace as well as some sort of combined system-call system call ("please do a (There are no!) Disadvantages@bradjc points out some disadvantages. These are very well worth considering. I think an evolution of this proposal avoids these pitfalls.
Yes. As suggested later in the thread, either renaming or aliasing
Agree completely. There is a remaining question of what to do with |
Naming versions of the Yield variantWe've now described several variations of the proposed yield variant and I believe I've added to the confusion by conflating names for them above. Let's refer to them as follows:
|
What is the usecase for calling subscribe and using |
One simple example may be logging statistics on how often something happens. If the callback is able to return different values to the syscall site in r0-r3, it might be useful for filtering or customizing the result... Just general modularity. Indeed, I suspect this will rarely be used. |
I've updated this to match the There are some design choices not fully elucidated in the RFC yet, as I'd like us to decide what we want to do about #3577 (comment) before fleshing that text out completely. PoC code is updated (again, compile-tested only) to realize the proposed interface. |
Is |
|
||
The Yield system call class has no return value. This is because |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would no longer be true.
With So libtock tries to write: int my_command_sync() {
driver_command();
asm("yield wait-for-callbackifpresent");
// did an upcall execute?
// or is r0-r3 valid?
return ??
} But the then the user at some point called Is this possible or am I missing something? |
As currently written, this is correct, yes. It assumes that userspace tracks whether it has subscribed a callback or not correctly, and if userspace cares that a callback ran, the callback can set a flag. We could use |
I propose we pass all yield return arguments via a pointer specified by Otherwise, I don't know how we would use OR, I prefer |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be great if I was missing something, but it seems with Yield-WaitFor-CallbackIfPresent
a userspace implementation of yield_for is unable to tell if the registers after the yield_for syscall returns are valid or not, without some other tracking state or program knowledge. Marking as blocked to resolve this, because that seems like a problematic interface we do not want.
Okay, I've tried to update this to capture the current discussions here and to a current-best proposal. Start with the summary of design discussion, here: https://github.com/tock/tock/blob/yeild-for/doc/reference/trd104-syscalls.md#design-rationale-and-alternatives
The biggest noteworthy changes:
I think this is getting somewhere pretty reasonable. Not written up in the RFC yet, because it occurred to me as I was writing this comment, but I think that @kupiakos's 'choose which callback when calling yield' can be implemented in userspace with the |
I thought it was the case that if in userspace you have:
How are you addressing this? relevant: #3577 (comment) #3577 (comment) |
Code view:
is closer to truth. |
Do we need a new error code? int my_sync_syscall(int* len) {
uint8_t yield_result
command()
yield_for(upcall_id, &yield_result)
if (yield_result == 0) {
*len = r2
return SUCCESS
} else {
return ???
}
} What happens if sync api is expecting yield to return r0-r3 but it doesn't? |
Oh shoot, I meant to call attention to this too. At the end of §4.1.0 I added another open question:
I went ahead with the first half of that (yield-result written no matter what), but the only two options currently are:
I think a more general approach would have the kernel setting For a generic userspace layer, however, there is still the ugly question of how one function can return "Error-from-Yield" or "Error-from-Command" in a sane way. Not that "Error-from-Yield" should ever happen.... |
**OPEN QUESTION:** ^^ This documents what we currently do, but it is the | ||
case right now that there is no way for userspace to identify an | ||
errorneous call to yield (e.g. invalid value in r0)---should instead | ||
`yeild-result` be written no matter what and an error case defined? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure what this means. What is an erroneous call to yield? Also, yield-result
is written no matter what, assuming it is a valid process pointer. What would the error case capture? When are you going to enable spell check in your text editor :)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only one I can think of is an illegal yield type, i.e. the invalid value in r0. FWIW, this is what the current TRD says:
which basically kind of just doesn't say anything about what else does or doesn't happen if you make an impossible syscall.
Frankly, I don't really think it matters too much what we do, as so much else has gone wrong likely if you're asking for an invalid Yield, but I do feel like we should define what does or doesn't happen—especially if we have a "r1 is the yield-result" paradigm.
When are you going to enable spell check in your text editor :)?
😬 , never... I learned a long time ago that spellcheck is too distracting for my brain to write/synthesize anything non-trivial. I do try to make a point of spellchecking non-draft PRs before pushing though :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Frankly, I don't really think it matters too much what we do,
I think we are already clear on what happens:
If userspace tries to invoke a system call that the kernel does not support, the system call will return a Failure result with an error code of NODEVICE or NOSUPPORT (Section 4).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that a system call with class number 0 and r0== is an unknown syscall and not an unknown yield.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that a system call with class number 0 and r0== is an unknown syscall and not an unknown yield.
The problem here is "how does the kernel tell userspace it was an invalid syscall?"
I think we the challenge with the current text / design is that these two parts are at odds:
the system call will return a Failure result with an error code...
The Yield system call class has no return value.
How can a system call class that has no return value return a Failure result?
We're in a pretty deep never-should-happen corner case here, but what I'm struggling with is 'what is the right thing for userspace to do'? Once userspace invokes a syscall with svc 0
, i.e. something in the Yield class, it cannot first 'read the return value to see if the syscall failed' as the Yield class defines different return semantics than every other syscall.
I am opposed to While I articulated the drawback of Instead, I propose we implement Advantages
Disadvantages
|
I remain in favor of Are there any use cases that want to use yield_waitfor_X with upcalls? Amit has mentioned hypothetical ones, but are there any we would implement immediately if we could? Is anyone asking for this? The discord of doing a blocking syscall with an asynchronous callback makes me adverse to any of the upcall yield-waitfor options. Trying to explain that to someone would be difficult. I propose we implement the thing which 1. supports our use case and only use cases we know of and 2. is low impact to implement in the kernel, which is |
@bradjc to clarify your most recent comment, are you now in favor of just If the former, I think I agree. The main point of confusion with |
My current opinion is to add only Yield-WaitFor-NoCallback`. |
I think it's fine if there is an upcall handler subscribed that doesn't get used. Having r0, r1, r2, r3 = yield_wait_for_no_callback();
if (r0 == SUBSCRIPTION_ERROR) {
subscribe(NULL);
r0, r1, r2, r3 = yield_wait_for_no_callback();
} And if you want to be even more correct: old_subscribe = NULL;
r0, r1, r2, r3 = yield_wait_for_no_callback();
if (r0 == SUBSCRIPTION_ERROR) {
old_subscribe = subscribe(NULL);
r0, r1, r2, r3 = yield_wait_for_no_callback();
subscribe(old_subscribe);
} Because having a library secretly unregister your subscription would also be a subtle bug. |
I also support I think it's worth starting an implementation for |
To follow up on the discussion in the call today, I'd suggesting looking at how 4 different system call drivers (userspace and kernel) would change if re-implemented with
Can we quantify "simplify" as "shorter code text", "fewer branches", or "smaller binary"? Are there other considerations we care about? Are there specific userspace complexity examples that we want to use as data points? |
Main text is updated for Yield-WaitFor-NoCallback semantics.
The original / primary motivation discussed at TockWorld was not any quantitative metric. It was correctness-oriented—i.e., no reentrancy concerns for userspace developers. I think the current text captures that as-intended. The subsequent paragraphs already discussed reduction in syscall overhead as a secondary motivation. I also added a paragraph on code size. I think those are all the driving motivations here. Will work on the proposed impl's next... |
The userspace updates are actually pretty substantial if the intent is to switch the library fully to a synchronous interface, so I just did console to start: tock/libtock-c#345 With the enormous caveat that all of these are currently compile-tested only, for the kernel: size before:
size after:
So for rough order-of-magnitude sense of impact, this adds ~1500 bytes to the kernel while removing ~500 bytes from the userspace console implementation — i.e., a code size improvement for systems >3 apps. For simplest case ( For the primary goal of code complexity reduction, the |
That makes sense, but how do you evaluate that? The flipside is that you re-order callbacks, which has its own issues. E.g., if a do a yield-wait-for on packet reception, I can delay a timer upcall indefinitely. This seems a little tricky -- how do you write a userspace library when any other library (particularly one written by a developer who doesn't want to handle re-entrancy) can arbitrarily delay an upcall? Part of the challenge is that this is about the implementation of a library, i.e., which yield call it makes. This makes me skittish because it's an extremely large architectural change (i.e., how upcalls are handled) without a substantive evaluation or tradeoff analysis plan. I buy the arguments -- it seems like it could be a good idea and so is worth exploring. But you have to build it. Is there a way to evaluate the cost of reentrancy concerns? I'm not arguing that this isn't the best solution, just that it's going to create different correctness problems. It seems very likely that those correctness problems are much less bad than the current ones. But this isn't a silver bullet. Perhaps this would call for labeling particularly library functions "non-reentrant", and if you call a non-reentrant function, you are non-reentrant? That way, userspace functions that need re-entrance can know not to call these functions. I.e., if you need to handle a timer, you don't call "nr_receive" or something like that. It's important that the final TRD not include long discussions of design rationales; this isn't an academic paper. Following the command 0 example we could include a link? |
#[derive(Copy, Clone)] | ||
pub enum Task { | ||
/// A task that should not actually be executed. This is used to resume a | ||
/// suspended process without invoking any callbacks in userspace (e.g., | ||
/// when YieldFor resolves to a Null Upcall). | ||
NullSubscribableUpcall(NullSubscribableUpcall), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This addition seems not strictly necessary. If we make fn_ptr
in FunctionCall
optional, we can re-use that enum option.
Perhaps it could be FunctionalCallOrReturn
?
_ => false, | ||
}, | ||
Task::NullSubscribableUpcall(nu) => nu.upcall_id == upcall_id, | ||
Task::IPC(_) => todo!(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When benchmarking, keep in mind todo! adds ~100 bytes vs. false
.
RE: @ppannuto's initial evaluation:
and
I buy this generally, but note I don't think this is an apples to apples comparison for putnstr since the current main-brach implementation queues prints to avoid blocking while the modified version doesn't. Most of the complexity in the user-space console driver is from this queuing and can be removed. As an exercise, I did this a while ago and got substantially more concise code and substantial savings, without changing the kernel (I don't think I got 500 bytes, and it was also untested, so I'm not saying there are no savings here, just that this particular comparison probably shouldn't be taken completely at face value). Also, I didn't even share my findings in a place I can now find them, so @ppannuto's exercise is about 1000x more valuable and useful than mine and my comment here should not be interpreted as a complaint. |
Brief summary of our state
I suggest we forge a plan for exploring this completely, which would include a prototype implementation in the kernel (which I believe we already have) and porting several drivers & apps in both C and Rust to use this new interface. |
The relevant text describing
Important properties:
|
From discussion: the arguments originally listed above are incorrect. There should only be three arguments. EDIT: specifically, it's |
This would also make it much easier to mix sync and async code in userspace, wouldn't it? I'm currently having my app crash from calling |
Pull Request Overview
Following the discussion at TockWorld6, this describes the proposed Yield-WaitFor and provides a (untested) rough implementation of how the kernel could easily implement it.
For ease of viewing, this draft PR edits TRD 104 directly so it can be seen as a diff. A final PR would follow the proper, full TRD process.
The primary motivation to move this functionality from a userspace
yield_for
into a specialized system call is to simplify correctness for userspace applications. Userspace upcall handlers do not have to worry about reentrancy if the kernel guarantees that exactly one and only one specific one of userspace's choosing will be called. It becomes an opt-in synchronous API for userspace without reducing the fundamental asynchronous design of Tock.Testing Strategy
Compiling (and no more!).
TODO or Help Wanted
This is currently designed and architected as a minimal-impact change. In particular, if you want the actual status or return value from the upcall, you still need to have supplied a callback function. If (e.g., often in the case of prints) you don't care about the return success/failure, then you can just leave the default Null Upcall in place and this will do what you want.
The implementation is not intended as final. Rather, it's just trying to demonstrate how this can be a very lightweight change in the kernel (indeed, one should not write careful code while also trying to participate in a meeting 😅 ).
Documentation Updated
/docs
, or no updates are required.Formatting
make prepush
.Rendered