-
Notifications
You must be signed in to change notification settings - Fork 72
Stack fault recovery #356
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Stack fault recovery #356
Conversation
| * overwritten by the porting engineer for the current platform. */ | ||
| #ifndef UVISOR_PAGE_MAX_COUNT | ||
| #define UVISOR_PAGE_MAX_COUNT (16UL) | ||
| #define UVISOR_PAGE_MAX_COUNT (128UL) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe @meriac wanted this to stay low. This results in (128 + 8 + 31) / 32) = 5 * uint32_t * #boxes (something like 5 * (5 + 1) * 4 = 30 * 4 = 120B).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Patater Yes, 128 is way too high. Why ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't want to reduce process stack sizes just for the tiny pages tests, so we need enough pages in the page heap for allocating all the process stacks plus the test secure allocator. The test secure allocator needs to have at least as many pages available as the number of subregions available for the page heap, which is probably around 48 (6 regions * 8 subregions). In practice, this means around 95 pages: I rounded that up to the nearest power of two to get 128.
If we add a page invalidation API, instead of trying to allocate so many pages that old ones get pushed out of the MPU configuration, then we can keep this count as low as we want.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Patater It's not acceptable to increase the pagecount for everybody.
- either we make that value configurable
- or create alternative uVisor-binaries for testing
- or add the page eviction as an uVisor API
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree. This doesn't increase the page count for everybody, only the maximum they could configure it to be. (UVISOR_SET_PAGE_HEAP configures the actual pages and their sizes at the platform configuration level).
Is it unacceptable to increase this maximum just because of the 160 bytes? I'm not understanding why it'd be unacceptable to increase this if the only cost were 160 bytes of RAM. What cost are you seeing that I'm not?
If nobody sets the number pages in the page heap to 128, they, as far as I'm aware, only pay the cost of this bitfield (and not any MPU regions or subregions). If this limit is just in place to prevent "stupid" parameters to UVISOR_SET_PAGE_HEAP, the benefit of better tests outweighs preventing the stupid; as platform configuring people are the ones intended to use UVISOR_SET_PAGE_HEAP (not application writers), this seems fine.
For making the test less fragile, I'm already leaning towards making the page eviction API and have been working on that approach today. The test can also test more interesting cases now, with finer control over the pages loaded into the MPU.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rewrote the test so we don't need to enable many or small pages yet. We can enable those when we need them.
| extern const void * g_page_heap_start; | ||
| /* Contains the page usage mapped by owner. */ | ||
| extern uint32_t g_page_owner_map[UVISOR_MAX_BOXES][(UVISOR_PAGE_MAX_COUNT + 31) / 32]; | ||
| extern uint32_t g_page_owner_map[UVISOR_MAX_BOXES][UVISOR_PAGE_MAP_COUNT]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch!
33da77d to
9f0a786
Compare
api/inc/box_config.h
Outdated
| \ | ||
| UVISOR_EXTERN const __attribute__((section(".keep.uvisor.cfgtbl_ptr"), aligned(4))) void * const box_name ## _cfg_ptr = &box_name ## _cfg; | ||
|
|
||
| #define UVISOR_BOX_CFG_PTR_DECL(box_name) \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about we call this UVISOR_BOX_EXTERN? We know that internally it declares the box cfg ptr, but from the POV of the user it's just extern-declaring the box.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK
| */ | ||
| #define UVISOR_FLASH_LENGTH_MAX 0x9C00 | ||
| #if NDEBUG | ||
| /* Release builds can be up to this big. */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does this work when we preprocess the linker script? The options for the preprocessing step are the following:
LINKER_CONFIG:=\
-D$(CONFIGURATION) \
-I$(PLATFORM_DIR)/$(PLATFORM)/inc \
-include $(CORE_DIR)/uvisor-config.hBut NDEBUG is not set there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch. I had a look into the release *.linker files and they are using the large debug limit. I'll have to set NDEBUG in the linker script generation step.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
api/inc/box_config.h
Outdated
| extern const __attribute__((section(".keep.uvisor.cfgtbl_ptr"), aligned(4))) void * const box_name ## _cfg_ptr = &box_name ## _cfg; | ||
| UVISOR_EXTERN const __attribute__((section(".keep.uvisor.cfgtbl_ptr"), aligned(4))) void * const box_name ## _cfg_ptr = &box_name ## _cfg; | ||
|
|
||
| #define UVISOR_BOX_CFG_PTR_DECL(box_name) \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about we call this UVISOR_BOX_EXTERN? We know that internally it declares the box cfg ptr, but from the POV of the user it's just extern-declaring the box.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK
e52c258 to
2c24181
Compare
|
retest |
|
retest uvisor |
|
build uvisor |
2c24181 to
6ececa9
Compare
|
Rebased on latest master |
3fb9267 to
27e16ba
Compare
2057f69 to
ee67526
Compare
core/debug/src/debug_box.c
Outdated
| g_debug_box.initialized = 1; | ||
| } | ||
|
|
||
| /* XXX This is a bit platform specific. Consider moving to an platform specific |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this is meant to stay, please turn into a FIXME.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, will move to a FIXME. We'll refactor when we have another architecture to add support for.
core/vmpu/src/armv7m/vmpu_armv7m.c
Outdated
|
|
||
| /* No recovery is possible if the MPU syndrome register is not valid. */ | ||
| if (fault_status != 0x82) { | ||
| if (!((fault_status == 0x82) || (fault_status & 0x18))) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add a comment specifying what the magic numbers mean? I actually always wanted to use the CMSIS symbols for these checks, but it would make them much longer. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a commit to switch magic numbers to CMSIS symbols for these fault status bits.
core/vmpu/src/armv7m/vmpu_armv7m.c
Outdated
| DEBUG_FAULT(FAULT_MEMMANAGE, lr, lr & 0x4 ? psp : msp); | ||
| VMPU_SCB_MMFSR = fault_status; | ||
| HALT_ERROR(PERMISSION_DENIED, "Access to restricted resource denied"); | ||
| debug_box_enter_from_priv(ipsr, lr); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this line imply that this path is only reachable by a non-recoverable from-privileged MemManage fault? If yes, please specify it as a comment in line 164.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, I don't understand the debug_box_enter_from_priv very well. The debug box is a "hidden feature", meaning that it doesn't have visibility from the uVisor core. Instead, halt or print functions internally revert to the debug box if present. What is this function going to do here? I would expect that this function is called from some debug-box-specific function before returning here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, debug_box_enter_from_priv (if that's the line you mean) is reachable from a non-privileged MemManage fault as well. debug_box_enter_from_priv does nothing when called from privileged mode, letting the ordinary EXC_RETURN (instead of a forged one) enter the debug box.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
debug_box_enter_from_priv never returns if from called with an lr that indicates an exception from privileged mode. It makes a fake EXC_RETURN, to use the psp that debug_deprivilege_and_return made, and executes the EXC_RETURN. When called with an lr that indicates an exception from non-privileged mode, the real EXC_RETURN is used just as before.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't like this approach very much, because (i.) it leaves a lr pushed onto the original exception stack frame, and because (ii.) it creates a return shortcut which is not easy to track down (for example if you look at the code from https://github.com/ARMmbed/uvisor/blob/master/core/system/src/system.c#L96).
So after this PR it appears that is not granted anymore that we blindly return to the caller using EXC_RETURN as-is. We can change the isr_default_sys_handler wrapper to do something like:
mov r0, lr \n
mrs r1, MSP \n
bx vmpu_sys_mux_handler \n"(note: No push of lr, bx instead of blx)
The wrapped function (vmpu_sys_mux_handler) would then do the actual return to lr, either as-is or changed.
@meriac What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't like it either. I justified it to myself my thinking, "We are halting anyway. We don't care if we leave anything on the stack."
Your suggestion is an improvement in the sense that the stack is cleaned up. I can see the importance of leaving the uVisor stack clean when we support halting individual boxes.
core/vmpu/src/armv7m/vmpu_armv7m.c
Outdated
| case HardFault_IRQn: | ||
| DEBUG_FAULT(FAULT_HARD, lr, lr & 0x4 ? psp : msp); | ||
| HALT_ERROR(FAULT_HARD, "Cannot recover from a hard fault."); | ||
| debug_box_enter_from_priv(ipsr, lr); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See comments above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
k
core/vmpu/src/kinetis/vmpu_kinetis.c
Outdated
| DEBUG_FAULT(FAULT_BUS, lr, lr & 0x4 ? psp : msp); | ||
| VMPU_SCB_BFSR = fault_status; | ||
| HALT_ERROR(PERMISSION_DENIED, "Access to restricted resource denied"); | ||
| debug_box_enter_from_priv(ipsr, lr); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See comments above.
core/system/src/unvic.c
Outdated
| * MemManage, BusFault, and UsageFault, so that we can recovery from | ||
| * stacking MemManage faults more simply. Set DebugMonitor, PendSV, SYSTICK | ||
| * to the same priority as SVC. */ | ||
| NVIC_SetPriority(MemoryManagement_IRQn, 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if there's an elegant way to do this, but it seems to me that these hard-coded priority values are too decoupled from the __UVISOR_NVIC_MIN_PRIORITY symbol. For example, at least we should check that the highest priority number used here (1) is lower than the __UVISOR_NVIC_MIN_PRIORITY value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
We do validate the entire exception stack frame (context_validate_exc_sf) before reading the xpsr, but additionally use vmpu_unpriv_uint32_read to read the xpsr, just in case.
Fix comments and brace placement.
This brings parity with the K64F vmpu_sys_mux_handler.
ee67526 to
00ab224
Compare
|
Responded to Alessandro's comments |
|
Tests pass on EFM32 |
|
Tests pass on K64F |
adf24a0 to
f2304bd
Compare
f2304bd to
664cf8f
Compare
66e5643 to
3fb59b0
Compare
|
Responded to Alessandro's comments and re-ran tests. Tests are passing on EFM32 and K64F. |
|
|
||
| /* Copy the xPSR from the source exception stack frame. */ | ||
| uint32_t xpsr = ((uint32_t *) src_sp)[7]; | ||
| uint32_t xpsr = vmpu_unpriv_uint32_read((uint32_t) &((uint32_t *) src_sp)[7]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this an overkill? The context_validate_exc_sf function does exactly the same check (reads with the t instruction).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the commit description spells it out as not really necessary, as a non-recoverable fault would happen in context_validate_exc_sf, preventing further progress. As currently written, the unprivileged access instruction is not technically necessary.
However, from the defensive coding perspective, it makes sense to use the unprivileged access instruction. Maybe the context_validate_exc_sf goes away on accident or has a bug introduced on accident. It's also nice to consistently have all places where uVisor accesses unprivileged memory using unprivileged access instructions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
core/system/src/unvic.c
Outdated
| /* Set the priority of each exception. SVC is lower priority than | ||
| * MemManage, BusFault, and UsageFault, so that we can recovery from | ||
| * stacking MemManage faults more simply. Set DebugMonitor, PendSV, SYSTICK | ||
| * to the same priority as SVC. */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment Set DebugMonitor, PendSV, SYSTICK to the same priority as SVC. is outdated now.
core/debug/src/debug_box.c
Outdated
| from_svc = shcsr & SCB_SHCSR_SVCALLACT_Msk; | ||
|
|
||
| /* Make sure SVC is active. */ | ||
| if (!from_svc) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this happen? If not, maybe it's better to turn it into a debug-only runtime assertion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. Debug makes sense because it'd be a stupid uVisor programmer's fault if this happened. If it does happen with a release build, a usage fault (which uVisor chooses not to recover from) will happen.
core/debug/src/debug_box.c
Outdated
|
|
||
| /* Return to Thread mode and use the Process Stack for return. The PSP will | ||
| * have been changed already in debug_deprivilege_and_return via | ||
| * context_switch_in. */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to make sure that this comment does not become outdated when we change the code, you could turn it into something more general, as "It's the responsibility of the caller to se the PSP to a valid value.".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated
core/vmpu/src/armv7m/vmpu_armv7m.c
Outdated
| /* If we are having an unstacking fault, we can't read the pc | ||
| * at fault. */ | ||
| if (fault_status & (SCB_CFSR_MSTKERR_Msk | SCB_CFSR_MUNSTKERR_Msk)) { | ||
| /* fake pc */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/fake/Fake
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
e994db1 to
7e9a227
Compare
| DEBUG_FAULT(FAULT_MEMMANAGE, lr, lr & 0x4 ? psp : msp); | ||
| VMPU_SCB_MMFSR = fault_status; | ||
| HALT_ERROR(PERMISSION_DENIED, "Access to restricted resource denied"); | ||
| lr = debug_box_enter_from_priv(lr); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this tested in release mode as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup. It works even in release mode. Tests pass on both K64F and EFM32GG in release and debug modes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
7e9a227 to
1460319
Compare
1460319 to
dfbc8ca
Compare
Setting the SVC as one lower in priority allows us to combine MPU recovery in one place, and not have to recover from a hard fault. We also use this commit as an opportunity to make the priorities of all system interrupts explicit.
AlessandroA
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Magic numbers are a bit hard to read. Use the standard CMSIS definitions for these fault status bits in an attempt to replace the magic number with something slightly more readable.
We hard fault if we try to read from the stack if the mem manage fault is a stacking or unstacking fault. Check the fault status before reading the pc from the stack to avoid the hard fault.
dfbc8ca to
5d0f159
Compare
Recover from stacking and unstacking faults. Reprioritize faults relative to SVC, so we can recover from more faults and also, as a bonus, get better debug messages. (Other miscellaneous fixes are also included.)
Tested on EFM32GG and K64F with a test that fails before this PR's commits are applied and passes after this PR's commits are applied.
@AlessandroA @meriac @niklas-arm