Description
This is related to #560 and #559. The following is based on the results of the formal specification work done by @stellamplau. I will link to the longer document in a bit.
If the switcher is restricted to accessing only the registers, the trusted stack, and the rtos-stack, then threads can run concurrently (either in a multicore setting or with interrupts enabled), even while a thread is in the switcher code handling exceptions. This requires refactoring the switcher code as hinted in #559. This issue gives the full set of changes needed to achieve this.
- MTDC is swapped with CSP as soon as an exception or interrupt occurs (https://github.com/CHERIoT-Platform/cheriot-rtos/blob/main/sdk/core/switcher/entry.S#L957). This should be changed to use MSCRATCH instead, to avoid clobbering MTDC. We need to maintain the invariant that MTDC always points to the trusted stack of the current thread.
- Another protocol must be used to check whether an exception occurs while handling another exception (https://github.com/CHERIoT-Platform/cheriot-rtos/blob/main/sdk/core/switcher/entry.S#L957-L970). One option is to keep MSCRATCH at 0 outside the exception handler. However, MSCRATCH is also non-zero while handling an interrupt, so an easier protocol is to reserve a byte in the trusted stack, or a special CSR, that records whether the thread is already in the exception-handling state.
- The synchronous exception handling code and the asynchronous interrupt handling code share the same entry point and the first few instructions (https://github.com/CHERIoT-Platform/cheriot-rtos/blob/main/sdk/core/switcher/entry.S#L933-L1026). Only after this do we check whether it is a synchronous exception (https://github.com/CHERIoT-Platform/cheriot-rtos/blob/main/sdk/core/switcher/entry.S#L933-L1026) and, if so, handle it. This should be changed so that we check immediately whether the trap is a synchronous exception or an asynchronous interrupt. This would require spilling a few registers to test the trap type and branch accordingly (I think 2, including CSP, would suffice; they should go in the trusted stack, separate from the existing spill area there, as described in the next bullet). Alternatively, we could use vectored interrupts or, even better, separate SCRs pointing to the asynchronous interrupt handling code and the synchronous exception handling code, with the hardware calling the right code depending on the type of trap. (Side effect: when a trap occurs, the interrupt status can still be enabled in this design.)
- The trusted stack currently contains a register spill area for all the registers (https://github.com/CHERIoT-Platform/cheriot-rtos/blob/main/sdk/core/switcher/entry.S#L172-L183, https://github.com/CHERIoT-Platform/cheriot-rtos/blob/main/sdk/core/switcher/entry.S#L976), which is used for both interrupts and exceptions. First, this spill area should also include MSCRATCH, as it becomes an architecturally visible register because of the previous bullet. Second, this spill area should be used only for asynchronous interrupts. On synchronous exceptions, we need to spill only the required number of registers (I think 2, but perhaps more), again on the trusted stack. This second area need not support re-entrant code and hence can be allocated in the trusted stack exactly once per thread.
- The register context passed to user-specified error handlers, and the register context installed from user-specified error handlers, should both live in the stack or a thread-private area so that they are not clobbered by another thread (which can access the shared globals) running concurrently. One cannot rely on CSP being a valid region to store register contexts, either for spilling or for installing, which is what happens currently (https://github.com/CHERIoT-Platform/cheriot-rtos/blob/main/sdk/core/switcher/entry.S#L1512-L1551). Note that the comment says `t0` points to the untrusted stack (https://github.com/CHERIoT-Platform/cheriot-rtos/blob/main/sdk/core/switcher/entry.S#L1522-L1524), but in reality it can just as well point to global memory (according to https://github.com/CHERIoT-Platform/cheriot-rtos/blob/main/sdk/core/switcher/entry.S#L1279), similar to the issue in "Side effect in switcher's compartment-call exception path" (#560). That is what we want to change. The complication arises because a compartment could have pivoted CSP to point to a global before encountering an exception. To allow such compartments, we should require them to allocate another spill area in the stack per compartment entry (if the same compartment is entered multiple times, each entry creates a new allocation). This area will be used if CSP does not point to the stack. (If the area does not exist, or does not point to an area of the stack, the compartment will be unwound.) The same spill area can be used by the error handler returner to install the context, ensuring that the new context is in the stack.
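
To make the bullets above concrete, here is a rough host-side model in C of two of the decisions they describe: classifying a trap immediately on entry using a per-thread status byte, and choosing where a register context for an error handler may live. This is only a sketch under assumptions, not switcher code. Every name in it (`TrustedStackModel`, `classify_trap`, `choose_context_area`, and so on) is hypothetical, the slot counts are placeholders, and plain integers stand in for capabilities. The only architectural fact it relies on is that bit 31 of `mcause` is set for interrupts on RV32.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder sizes: the issue only estimates "2, but perhaps more". */
enum { SYNC_SPILL_SLOTS = 2, INTERRUPT_SPILL_SLOTS = 16 };

/* Hypothetical per-thread state; NOT the real trusted stack layout. */
typedef struct {
    uint64_t interrupt_spill[INTERRUPT_SPILL_SLOTS]; /* full context, incl. MSCRATCH */
    uint64_t sync_spill[SYNC_SPILL_SLOTS];           /* used only on sync exceptions */
    uint8_t  in_exception_handler;                   /* the reserved status byte */
} TrustedStackModel;

typedef enum { TRAP_INTERRUPT, TRAP_EXCEPTION, TRAP_NESTED_EXCEPTION } TrapKind;

/* On RV32, the top bit of mcause distinguishes interrupts from exceptions. */
static bool is_interrupt(uint32_t mcause)
{
    return (mcause >> 31) != 0;
}

/* Decide the trap kind first, before touching the large spill area. */
static TrapKind classify_trap(TrustedStackModel *ts, uint32_t mcause)
{
    if (is_interrupt(mcause)) {
        return TRAP_INTERRUPT; /* would spill into interrupt_spill */
    }
    if (ts->in_exception_handler) {
        return TRAP_NESTED_EXCEPTION; /* exception while handling an exception */
    }
    ts->in_exception_handler = 1; /* would spill into sync_spill */
    return TRAP_EXCEPTION;
}

static void leave_exception_handler(TrustedStackModel *ts)
{
    ts->in_exception_handler = 0;
}

/* Address range standing in for the bounds of a capability. */
typedef struct { uintptr_t base; size_t length; } Region;

/* True if inner lies entirely within outer (written to avoid overflow). */
static bool region_contains(Region outer, Region inner)
{
    return inner.base >= outer.base && inner.length <= outer.length &&
           inner.base - outer.base <= outer.length - inner.length;
}

typedef enum { CONTEXT_ON_CSP, CONTEXT_ON_ENTRY_AREA, CONTEXT_UNWIND } ContextChoice;

/*
 * Pick where the error-handler register context goes:
 * CSP if it points into the stack, else the per-entry spill area if it
 * exists and is in the stack, else unwind the compartment.
 */
static ContextChoice choose_context_area(Region stack, Region csp,
                                         const Region *entry_area)
{
    if (region_contains(stack, csp)) {
        return CONTEXT_ON_CSP;
    }
    if (entry_area != NULL && region_contains(stack, *entry_area)) {
        return CONTEXT_ON_ENTRY_AREA;
    }
    return CONTEXT_UNWIND;
}
```

In this model the status byte is touched only on the synchronous path, so an interrupt arriving at any point leaves it unchanged. `choose_context_area` never returns a region outside the stack, which is the property that keeps another thread with access to the shared globals from clobbering the context.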
Sorry for the wall of text; I wanted to write down all the points that we have discussed so far before we forget them.
We should split this into separate issues and tackle each of them separately. Once these issues are fixed, we will have support for multicore implementations (except for the scheduler, which has to adjust the priority queue atomically) and for enabling interrupts at all points (except while handling asynchronous interrupts).