Stack fault recovery #356

Patater · 2016-10-11T16:47:37Z

Recover from stacking and unstacking faults. Reprioritize faults relative to SVC, so we can recover from more faults and also, as a bonus, get better debug messages. (Other miscellaneous fixes are also included.)

Tested on EFM32GG and K64F with a test that fails before this PR's commits are applied and passes after this PR's commits are applied.

@AlessandroA @meriac @niklas-arm

niklarm · 2016-10-12T09:25:40Z

core/system/inc/page_allocator_config.h

+ * overwritten by the porting engineer for the current platform. */
 #ifndef UVISOR_PAGE_MAX_COUNT
-#define UVISOR_PAGE_MAX_COUNT (16UL)
+#define UVISOR_PAGE_MAX_COUNT (128UL)


I believe @meriac wanted this to stay low. This results in (128 + 8 + 31) / 32) = 5 * uint32_t * #boxes (something like 5 * (5 + 1) * 4 = 30 * 4 = 120B).

@Patater Yes, 128 is way too high. Why ?

I didn't want to reduce process stack sizes just for the tiny pages tests, so we need enough pages in the page heap for allocating all the process stacks plus the test secure allocator. The test secure allocator needs to have at least as many pages available as the number of subregions available for the page heap, which is probably around 48 (6 regions * 8 subregions). In practice, this means around 95 pages: I rounded that up to the nearest power of two to get 128.

If we add a page invalidation API, instead of trying to allocate so many pages that old ones get pushed out of the MPU configuration, then we can keep this count as low as we want.

@Patater It's not acceptable to increase the pagecount for everybody.

either we make that value configurable

or create alternative uVisor-binaries for testing

or add the page eviction as an uVisor API

I agree. This doesn't increase the page count for everybody, only the maximum they could configure it to be. (UVISOR_SET_PAGE_HEAP configures the actual pages and their sizes at the platform configuration level).

Is it unacceptable to increase this maximum just because of the 160 bytes? I'm not understanding why it'd be unacceptable to increase this if the only cost were 160 bytes of RAM. What cost are you seeing that I'm not?

If nobody sets the number pages in the page heap to 128, they, as far as I'm aware, only pay the cost of this bitfield (and not any MPU regions or subregions). If this limit is just in place to prevent "stupid" parameters to UVISOR_SET_PAGE_HEAP, the benefit of better tests outweighs preventing the stupid; as platform configuring people are the ones intended to use UVISOR_SET_PAGE_HEAP (not application writers), this seems fine.

For making the test less fragile, I'm already leaning towards making the page eviction API and have been working on that approach today. The test can also test more interesting cases now, with finer control over the pages loaded into the MPU.

Rewrote the test so we don't need to enable many or small pages yet. We can enable those when we need them.

niklarm · 2016-10-12T09:25:46Z

core/system/inc/page_allocator.h

 extern const void * g_page_heap_start;
 /* Contains the page usage mapped by owner. */
-extern uint32_t g_page_owner_map[UVISOR_MAX_BOXES][(UVISOR_PAGE_MAX_COUNT + 31) / 32];
+extern uint32_t g_page_owner_map[UVISOR_MAX_BOXES][UVISOR_PAGE_MAP_COUNT];


Good catch!

AlessandroA · 2016-10-25T15:45:57Z

api/inc/box_config.h

    \
    UVISOR_EXTERN const __attribute__((section(".keep.uvisor.cfgtbl_ptr"), aligned(4))) void * const box_name ## _cfg_ptr = &box_name ## _cfg;

+#define UVISOR_BOX_CFG_PTR_DECL(box_name) \


How about we call this UVISOR_BOX_EXTERN? We know that internally it declares the box cfg ptr, but from the POV of the user it's just extern-declaring the box.

AlessandroA · 2016-10-25T15:51:00Z

core/uvisor-config.h

 */
-#define UVISOR_FLASH_LENGTH_MAX 0x9C00
+#if NDEBUG
+/* Release builds can be up to this big. */


How does this work when we preprocess the linker script? The options for the preprocessing step are the following:

LINKER_CONFIG:=\ -D$(CONFIGURATION) \ -I$(PLATFORM_DIR)/$(PLATFORM)/inc \ -include $(CORE_DIR)/uvisor-config.h

But NDEBUG is not set there.

Good catch. I had a look into the release *.linker files and they are using the large debug limit. I'll have to set NDEBUG in the linker script generation step.

AlessandroA · 2016-10-25T15:53:29Z

api/inc/box_config.h

-    extern const __attribute__((section(".keep.uvisor.cfgtbl_ptr"), aligned(4))) void * const box_name ## _cfg_ptr = &box_name ## _cfg;
+    UVISOR_EXTERN const __attribute__((section(".keep.uvisor.cfgtbl_ptr"), aligned(4))) void * const box_name ## _cfg_ptr = &box_name ## _cfg;
+
+#define UVISOR_BOX_CFG_PTR_DECL(box_name) \


How about we call this UVISOR_BOX_EXTERN? We know that internally it declares the box cfg ptr, but from the POV of the user it's just extern-declaring the box.

mazimkhan · 2016-10-26T09:17:25Z

retest

mazimkhan · 2016-10-26T10:14:42Z

retest uvisor

mazimkhan · 2016-10-26T10:21:23Z

build uvisor

Patater · 2016-11-07T17:06:52Z

Rebased on latest master

AlessandroA · 2016-11-10T14:27:50Z

core/debug/src/debug_box.c

    g_debug_box.initialized = 1;
 }
+
+/* XXX This is a bit platform specific. Consider moving to an platform specific


If this is meant to stay, please turn into a FIXME.

Yes, will move to a FIXME. We'll refactor when we have another architecture to add support for.

AlessandroA · 2016-11-10T14:30:40Z

core/vmpu/src/armv7m/vmpu_armv7m.c


    /* No recovery is possible if the MPU syndrome register is not valid. */
-    if (fault_status != 0x82) {
+    if (!((fault_status == 0x82) || (fault_status & 0x18))) {


Could you add a comment specifying what the magic numbers mean? I actually always wanted to use the CMSIS symbols for these checks, but it would make them much longer. What do you think?

Added a commit to switch magic numbers to CMSIS symbols for these fault status bits.

AlessandroA · 2016-11-10T14:33:54Z

core/vmpu/src/armv7m/vmpu_armv7m.c

+            DEBUG_FAULT(FAULT_MEMMANAGE, lr, lr & 0x4 ? psp : msp);
+            VMPU_SCB_MMFSR = fault_status;
+            HALT_ERROR(PERMISSION_DENIED, "Access to restricted resource denied");
+            debug_box_enter_from_priv(ipsr, lr);


Does this line imply that this path is only reachable by a non-recoverable from-privileged MemManage fault? If yes, please specify it as a comment in line 164.

Also, I don't understand the debug_box_enter_from_priv very well. The debug box is a "hidden feature", meaning that it doesn't have visibility from the uVisor core. Instead, halt or print functions internally revert to the debug box if present. What is this function going to do here? I would expect that this function is called from some debug-box-specific function before returning here.

No, debug_box_enter_from_priv (if that's the line you mean) is reachable from a non-privileged MemManage fault as well. debug_box_enter_from_priv does nothing when called from privileged mode, letting the ordinary EXC_RETURN (instead of a forged one) enter the debug box.

debug_box_enter_from_priv never returns if from called with an lr that indicates an exception from privileged mode. It makes a fake EXC_RETURN, to use the psp that debug_deprivilege_and_return made, and executes the EXC_RETURN. When called with an lr that indicates an exception from non-privileged mode, the real EXC_RETURN is used just as before.

I don't like this approach very much, because (i.) it leaves a lr pushed onto the original exception stack frame, and because (ii.) it creates a return shortcut which is not easy to track down (for example if you look at the code from https://github.com/ARMmbed/uvisor/blob/master/core/system/src/system.c#L96).

So after this PR it appears that is not granted anymore that we blindly return to the caller using EXC_RETURN as-is. We can change the isr_default_sys_handler wrapper to do something like:

mov r0, lr \n mrs r1, MSP \n bx vmpu_sys_mux_handler \n"

(note: No push of lr, bx instead of blx)

The wrapped function (vmpu_sys_mux_handler) would then do the actual return to lr, either as-is or changed.

@meriac What do you think?

I didn't like it either. I justified it to myself my thinking, "We are halting anyway. We don't care if we leave anything on the stack."

Your suggestion is an improvement in the sense that the stack is cleaned up. I can see the importance of leaving the uVisor stack clean when we support halting individual boxes.

AlessandroA · 2016-11-10T14:36:22Z

core/vmpu/src/armv7m/vmpu_armv7m.c

        case HardFault_IRQn:
            DEBUG_FAULT(FAULT_HARD, lr, lr & 0x4 ? psp : msp);
            HALT_ERROR(FAULT_HARD, "Cannot recover from a hard fault.");
+            debug_box_enter_from_priv(ipsr, lr);


See comments above.

AlessandroA · 2016-11-10T14:38:11Z

core/vmpu/src/kinetis/vmpu_kinetis.c

+            DEBUG_FAULT(FAULT_BUS, lr, lr & 0x4 ? psp : msp);
+            VMPU_SCB_BFSR = fault_status;
+            HALT_ERROR(PERMISSION_DENIED, "Access to restricted resource denied");
+            debug_box_enter_from_priv(ipsr, lr);


See comments above.

AlessandroA · 2016-11-10T14:41:35Z

core/system/src/unvic.c

+     * MemManage, BusFault, and UsageFault, so that we can recovery from
+     * stacking MemManage faults more simply. Set DebugMonitor, PendSV, SYSTICK
+     * to the same priority as SVC. */
+    NVIC_SetPriority(MemoryManagement_IRQn, 0);


Not sure if there's an elegant way to do this, but it seems to me that these hard-coded priority values are too decoupled from the __UVISOR_NVIC_MIN_PRIORITY symbol. For example, at least we should check that the highest priority number used here (1) is lower than the __UVISOR_NVIC_MIN_PRIORITY value.

We do validate the entire exception stack frame (context_validate_exc_sf) before reading the xpsr, but additionally use vmpu_unpriv_uint32_read to read the xpsr, just in case.

Fix comments and brace placement.

This brings parity with the K64F vmpu_sys_mux_handler.

Patater · 2016-11-10T15:24:06Z

Responded to Alessandro's comments

Patater · 2016-11-10T15:25:31Z

Tests pass on EFM32

Patater · 2016-11-10T15:37:31Z

Tests pass on K64F

Patater · 2016-11-14T11:12:55Z

Responded to Alessandro's comments and re-ran tests. Tests are passing on EFM32 and K64F.

AlessandroA · 2016-11-14T11:22:25Z

core/debug/src/debug_box.c


    /* Copy the xPSR from the source exception stack frame. */
-    uint32_t xpsr = ((uint32_t *) src_sp)[7];
+    uint32_t xpsr = vmpu_unpriv_uint32_read((uint32_t) &((uint32_t *) src_sp)[7]);


Isn't this an overkill? The context_validate_exc_sf function does exactly the same check (reads with the t instruction).

Yes, the commit description spells it out as not really necessary, as a non-recoverable fault would happen in context_validate_exc_sf, preventing further progress. As currently written, the unprivileged access instruction is not technically necessary.

However, from the defensive coding perspective, it makes sense to use the unprivileged access instruction. Maybe the context_validate_exc_sf goes away on accident or has a bug introduced on accident. It's also nice to consistently have all places where uVisor accesses unprivileged memory using unprivileged access instructions.

AlessandroA · 2016-11-14T11:23:37Z

core/system/src/unvic.c

+    /* Set the priority of each exception. SVC is lower priority than
+     * MemManage, BusFault, and UsageFault, so that we can recovery from
+     * stacking MemManage faults more simply. Set DebugMonitor, PendSV, SYSTICK
+     * to the same priority as SVC. */


The comment Set DebugMonitor, PendSV, SYSTICK to the same priority as SVC. is outdated now.

AlessandroA · 2016-11-14T11:25:36Z

core/debug/src/debug_box.c

+    from_svc = shcsr & SCB_SHCSR_SVCALLACT_Msk;
+
+    /* Make sure SVC is active. */
+    if (!from_svc) {


Can this happen? If not, maybe it's better to turn it into a debug-only runtime assertion.

OK. Debug makes sense because it'd be a stupid uVisor programmer's fault if this happened. If it does happen with a release build, a usage fault (which uVisor chooses not to recover from) will happen.

AlessandroA · 2016-11-14T11:26:59Z

core/debug/src/debug_box.c

+
+    /* Return to Thread mode and use the Process Stack for return. The PSP will
+     * have been changed already in debug_deprivilege_and_return via
+     * context_switch_in. */


Just to make sure that this comment does not become outdated when we change the code, you could turn it into something more general, as "It's the responsibility of the caller to se the PSP to a valid value.".

AlessandroA · 2016-11-14T11:27:53Z

core/vmpu/src/armv7m/vmpu_armv7m.c

+            /* If we are having an unstacking fault, we can't read the pc
+             * at fault. */
+            if (fault_status & (SCB_CFSR_MSTKERR_Msk | SCB_CFSR_MUNSTKERR_Msk)) {
+                /* fake pc */


s/fake/Fake

AlessandroA · 2016-11-14T13:05:58Z

core/vmpu/src/armv7m/vmpu_armv7m.c

+            DEBUG_FAULT(FAULT_MEMMANAGE, lr, lr & 0x4 ? psp : msp);
+            VMPU_SCB_MMFSR = fault_status;
+            HALT_ERROR(PERMISSION_DENIED, "Access to restricted resource denied");
+            lr = debug_box_enter_from_priv(lr);


Is this tested in release mode as well?

Yup. It works even in release mode. Tests pass on both K64F and EFM32GG in release and debug modes.

Setting the SVC as one lower in priority allows us to combine MPU recovery in one place, and not have to recover from a hard fault. We also use this commit as an opportunity to make the priorities of all system interrupts explicit.

AlessandroA

LGTM

Magic numbers are a bit hard to read. Use the standard CMSIS definitions for these fault status bits in an attempt to replace the magic number with something slightly more readable.

We hard fault if we try to read from the stack if the mem manage fault is a stacking or unstacking fault. Check the fault status before reading the pc from the stack to avoid the hard fault.

niklarm reviewed Oct 12, 2016

View reviewed changes

Patater mentioned this pull request Oct 12, 2016

Fix page heap alignment issues and ISR stack pointer issues #335

Merged

Patater force-pushed the stack-fault-recovery branch 2 times, most recently from 33da77d to 9f0a786 Compare October 25, 2016 15:40

Patater changed the title ~~WIP XXX Stack fault recovery~~ Stack fault recovery Oct 25, 2016

AlessandroA suggested changes Oct 25, 2016

View reviewed changes

Patater force-pushed the stack-fault-recovery branch 2 times, most recently from e52c258 to 2c24181 Compare October 26, 2016 09:11

Patater mentioned this pull request Oct 31, 2016

Dual flash space caps, and other small fixes #368

Merged

Patater force-pushed the stack-fault-recovery branch from 2c24181 to 6ececa9 Compare November 7, 2016 17:00

Patater force-pushed the stack-fault-recovery branch 2 times, most recently from 3fb9267 to 27e16ba Compare November 7, 2016 17:36

Patater changed the title ~~Stack fault recovery~~ WIP Stack fault recovery Nov 8, 2016

Patater force-pushed the stack-fault-recovery branch 3 times, most recently from 2057f69 to ee67526 Compare November 10, 2016 13:10

AlessandroA reviewed Nov 10, 2016

View reviewed changes

Patater added 3 commits November 10, 2016 15:11

debug_box: Read xpsr with vmpu_unpriv_uint32_read

faf1ae5

We do validate the entire exception stack frame (context_validate_exc_sf) before reading the xpsr, but additionally use vmpu_unpriv_uint32_read to read the xpsr, just in case.

vmpu: Fix style in vmpu_{arch}.c files

8861082

Fix comments and brace placement.

Print the IRQ number of the non-system interrupt

16dd8a2

This brings parity with the K64F vmpu_sys_mux_handler.

Update comment formatting in unvic_init

ad9eb48

Patater force-pushed the stack-fault-recovery branch from ee67526 to 00ab224 Compare November 10, 2016 15:23

Patater force-pushed the stack-fault-recovery branch 2 times, most recently from adf24a0 to f2304bd Compare November 10, 2016 15:45

Patater changed the title ~~WIP Stack fault recovery~~ Stack fault recovery Nov 10, 2016

Patater force-pushed the stack-fault-recovery branch from f2304bd to 664cf8f Compare November 10, 2016 17:23

Patater changed the title ~~Stack fault recovery~~ WIP Stack fault recovery Nov 11, 2016

Patater force-pushed the stack-fault-recovery branch 3 times, most recently from 66e5643 to 3fb59b0 Compare November 14, 2016 11:05

Patater changed the title ~~WIP Stack fault recovery~~ Stack fault recovery Nov 14, 2016

AlessandroA reviewed Nov 14, 2016

View reviewed changes

Patater force-pushed the stack-fault-recovery branch 2 times, most recently from e994db1 to 7e9a227 Compare November 14, 2016 11:59

AlessandroA reviewed Nov 14, 2016

View reviewed changes

Patater changed the title ~~Stack fault recovery~~ WIP Stack fault recovery Nov 14, 2016

Patater force-pushed the stack-fault-recovery branch from 7e9a227 to 1460319 Compare November 14, 2016 14:10

Patater changed the title ~~WIP Stack fault recovery~~ Stack fault recovery Nov 14, 2016

Patater force-pushed the stack-fault-recovery branch from 1460319 to dfbc8ca Compare November 14, 2016 15:57

Set SVC one lower in priority

f32d5ed

Setting the SVC as one lower in priority allows us to combine MPU recovery in one place, and not have to recover from a hard fault. We also use this commit as an opportunity to make the priorities of all system interrupts explicit.

AlessandroA approved these changes Nov 14, 2016

View reviewed changes

Patater added 2 commits November 14, 2016 15:59

vmpu: Replace 0x04 and 0x82 with CMSIS definitions

13f269f

Magic numbers are a bit hard to read. Use the standard CMSIS definitions for these fault status bits in an attempt to replace the magic number with something slightly more readable.

Recover from stacking errors

5d0f159

We hard fault if we try to read from the stack if the mem manage fault is a stacking or unstacking fault. Check the fault status before reading the pc from the stack to avoid the hard fault.

Patater force-pushed the stack-fault-recovery branch from dfbc8ca to 5d0f159 Compare November 14, 2016 15:59

AlessandroA merged commit 2f23bf0 into ARMmbed:master Nov 14, 2016

AlessandroA mentioned this pull request Nov 22, 2016

Create a page eviction API #361

Closed

Stack fault recovery #356

Stack fault recovery #356

Uh oh!

Conversation

Patater commented Oct 11, 2016 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

niklarm Oct 12, 2016 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

meriac Oct 12, 2016 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

mazimkhan commented Oct 26, 2016

Uh oh!

mazimkhan commented Oct 26, 2016

Uh oh!

mazimkhan commented Oct 26, 2016

Uh oh!

Patater commented Nov 7, 2016

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Patater Nov 10, 2016 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Patater commented Nov 10, 2016

Patater commented Oct 11, 2016 •

edited

Loading

niklarm Oct 12, 2016 •

edited

Loading

meriac Oct 12, 2016 •

edited

Loading

Patater Nov 10, 2016 •

edited

Loading

Patater Nov 14, 2016 •

edited

Loading