Skip to content

Conversation

@Patater
Copy link
Contributor

@Patater Patater commented Oct 11, 2016

Recover from stacking and unstacking faults. Reprioritize faults relative to SVC, so we can recover from more faults and also, as a bonus, get better debug messages. (Other miscellaneous fixes are also included.)

Tested on EFM32GG and K64F with a test that fails before this PR's commits are applied and passes after this PR's commits are applied.

@AlessandroA @meriac @niklas-arm

* overwritten by the porting engineer for the current platform. */
#ifndef UVISOR_PAGE_MAX_COUNT
#define UVISOR_PAGE_MAX_COUNT (16UL)
#define UVISOR_PAGE_MAX_COUNT (128UL)
Copy link
Contributor

@niklarm niklarm Oct 12, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe @meriac wanted this to stay low. This results in (128 + 8 + 31) / 32) = 5 * uint32_t * #boxes (something like 5 * (5 + 1) * 4 = 30 * 4 = 120B).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Patater Yes, 128 is way too high. Why ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't want to reduce process stack sizes just for the tiny pages tests, so we need enough pages in the page heap for allocating all the process stacks plus the test secure allocator. The test secure allocator needs to have at least as many pages available as the number of subregions available for the page heap, which is probably around 48 (6 regions * 8 subregions). In practice, this means around 95 pages: I rounded that up to the nearest power of two to get 128.

If we add a page invalidation API, instead of trying to allocate so many pages that old ones get pushed out of the MPU configuration, then we can keep this count as low as we want.

Copy link
Contributor

@meriac meriac Oct 12, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Patater It's not acceptable to increase the pagecount for everybody.

  • either we make that value configurable
  • or create alternative uVisor-binaries for testing
  • or add the page eviction as an uVisor API

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. This doesn't increase the page count for everybody, only the maximum they could configure it to be. (UVISOR_SET_PAGE_HEAP configures the actual pages and their sizes at the platform configuration level).

Is it unacceptable to increase this maximum just because of the 160 bytes? I'm not understanding why it'd be unacceptable to increase this if the only cost were 160 bytes of RAM. What cost are you seeing that I'm not?

If nobody sets the number pages in the page heap to 128, they, as far as I'm aware, only pay the cost of this bitfield (and not any MPU regions or subregions). If this limit is just in place to prevent "stupid" parameters to UVISOR_SET_PAGE_HEAP, the benefit of better tests outweighs preventing the stupid; as platform configuring people are the ones intended to use UVISOR_SET_PAGE_HEAP (not application writers), this seems fine.

For making the test less fragile, I'm already leaning towards making the page eviction API and have been working on that approach today. The test can also test more interesting cases now, with finer control over the pages loaded into the MPU.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rewrote the test so we don't need to enable many or small pages yet. We can enable those when we need them.

extern const void * g_page_heap_start;
/* Contains the page usage mapped by owner. */
extern uint32_t g_page_owner_map[UVISOR_MAX_BOXES][(UVISOR_PAGE_MAX_COUNT + 31) / 32];
extern uint32_t g_page_owner_map[UVISOR_MAX_BOXES][UVISOR_PAGE_MAP_COUNT];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch!

@Patater Patater force-pushed the stack-fault-recovery branch 2 times, most recently from 33da77d to 9f0a786 Compare October 25, 2016 15:40
@Patater Patater changed the title WIP XXX Stack fault recovery Stack fault recovery Oct 25, 2016
\
UVISOR_EXTERN const __attribute__((section(".keep.uvisor.cfgtbl_ptr"), aligned(4))) void * const box_name ## _cfg_ptr = &box_name ## _cfg;

#define UVISOR_BOX_CFG_PTR_DECL(box_name) \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about we call this UVISOR_BOX_EXTERN? We know that internally it declares the box cfg ptr, but from the POV of the user it's just extern-declaring the box.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK

*/
#define UVISOR_FLASH_LENGTH_MAX 0x9C00
#if NDEBUG
/* Release builds can be up to this big. */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does this work when we preprocess the linker script? The options for the preprocessing step are the following:

LINKER_CONFIG:=\
    -D$(CONFIGURATION) \
    -I$(PLATFORM_DIR)/$(PLATFORM)/inc \
    -include $(CORE_DIR)/uvisor-config.h

But NDEBUG is not set there.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. I had a look into the release *.linker files and they are using the large debug limit. I'll have to set NDEBUG in the linker script generation step.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

extern const __attribute__((section(".keep.uvisor.cfgtbl_ptr"), aligned(4))) void * const box_name ## _cfg_ptr = &box_name ## _cfg;
UVISOR_EXTERN const __attribute__((section(".keep.uvisor.cfgtbl_ptr"), aligned(4))) void * const box_name ## _cfg_ptr = &box_name ## _cfg;

#define UVISOR_BOX_CFG_PTR_DECL(box_name) \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about we call this UVISOR_BOX_EXTERN? We know that internally it declares the box cfg ptr, but from the POV of the user it's just extern-declaring the box.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK

@Patater Patater force-pushed the stack-fault-recovery branch 2 times, most recently from e52c258 to 2c24181 Compare October 26, 2016 09:11
@mazimkhan
Copy link

retest

@mazimkhan
Copy link

retest uvisor

@mazimkhan
Copy link

build uvisor

@Patater
Copy link
Contributor Author

Patater commented Nov 7, 2016

Rebased on latest master

@Patater Patater force-pushed the stack-fault-recovery branch 2 times, most recently from 3fb9267 to 27e16ba Compare November 7, 2016 17:36
@Patater Patater changed the title Stack fault recovery WIP Stack fault recovery Nov 8, 2016
@Patater Patater force-pushed the stack-fault-recovery branch 3 times, most recently from 2057f69 to ee67526 Compare November 10, 2016 13:10
g_debug_box.initialized = 1;
}

/* XXX This is a bit platform specific. Consider moving to an platform specific
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is meant to stay, please turn into a FIXME.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, will move to a FIXME. We'll refactor when we have another architecture to add support for.


/* No recovery is possible if the MPU syndrome register is not valid. */
if (fault_status != 0x82) {
if (!((fault_status == 0x82) || (fault_status & 0x18))) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a comment specifying what the magic numbers mean? I actually always wanted to use the CMSIS symbols for these checks, but it would make them much longer. What do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a commit to switch magic numbers to CMSIS symbols for these fault status bits.

DEBUG_FAULT(FAULT_MEMMANAGE, lr, lr & 0x4 ? psp : msp);
VMPU_SCB_MMFSR = fault_status;
HALT_ERROR(PERMISSION_DENIED, "Access to restricted resource denied");
debug_box_enter_from_priv(ipsr, lr);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this line imply that this path is only reachable by a non-recoverable from-privileged MemManage fault? If yes, please specify it as a comment in line 164.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I don't understand the debug_box_enter_from_priv very well. The debug box is a "hidden feature", meaning that it doesn't have visibility from the uVisor core. Instead, halt or print functions internally revert to the debug box if present. What is this function going to do here? I would expect that this function is called from some debug-box-specific function before returning here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, debug_box_enter_from_priv (if that's the line you mean) is reachable from a non-privileged MemManage fault as well. debug_box_enter_from_priv does nothing when called from privileged mode, letting the ordinary EXC_RETURN (instead of a forged one) enter the debug box.

Copy link
Contributor Author

@Patater Patater Nov 10, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

debug_box_enter_from_priv never returns if from called with an lr that indicates an exception from privileged mode. It makes a fake EXC_RETURN, to use the psp that debug_deprivilege_and_return made, and executes the EXC_RETURN. When called with an lr that indicates an exception from non-privileged mode, the real EXC_RETURN is used just as before.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like this approach very much, because (i.) it leaves a lr pushed onto the original exception stack frame, and because (ii.) it creates a return shortcut which is not easy to track down (for example if you look at the code from https://github.com/ARMmbed/uvisor/blob/master/core/system/src/system.c#L96).

So after this PR it appears that is not granted anymore that we blindly return to the caller using EXC_RETURN as-is. We can change the isr_default_sys_handler wrapper to do something like:

mov r0, lr \n
mrs r1, MSP \n
bx vmpu_sys_mux_handler \n"

(note: No push of lr, bx instead of blx)

The wrapped function (vmpu_sys_mux_handler) would then do the actual return to lr, either as-is or changed.

@meriac What do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't like it either. I justified it to myself my thinking, "We are halting anyway. We don't care if we leave anything on the stack."

Your suggestion is an improvement in the sense that the stack is cleaned up. I can see the importance of leaving the uVisor stack clean when we support halting individual boxes.

case HardFault_IRQn:
DEBUG_FAULT(FAULT_HARD, lr, lr & 0x4 ? psp : msp);
HALT_ERROR(FAULT_HARD, "Cannot recover from a hard fault.");
debug_box_enter_from_priv(ipsr, lr);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comments above.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

k

DEBUG_FAULT(FAULT_BUS, lr, lr & 0x4 ? psp : msp);
VMPU_SCB_BFSR = fault_status;
HALT_ERROR(PERMISSION_DENIED, "Access to restricted resource denied");
debug_box_enter_from_priv(ipsr, lr);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comments above.

* MemManage, BusFault, and UsageFault, so that we can recovery from
* stacking MemManage faults more simply. Set DebugMonitor, PendSV, SYSTICK
* to the same priority as SVC. */
NVIC_SetPriority(MemoryManagement_IRQn, 0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if there's an elegant way to do this, but it seems to me that these hard-coded priority values are too decoupled from the __UVISOR_NVIC_MIN_PRIORITY symbol. For example, at least we should check that the highest priority number used here (1) is lower than the __UVISOR_NVIC_MIN_PRIORITY value.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

We do validate the entire exception stack frame
(context_validate_exc_sf) before reading the xpsr, but additionally use
vmpu_unpriv_uint32_read to read the xpsr, just in case.
Fix comments and brace placement.
This brings parity with the K64F vmpu_sys_mux_handler.
@Patater Patater force-pushed the stack-fault-recovery branch from ee67526 to 00ab224 Compare November 10, 2016 15:23
@Patater
Copy link
Contributor Author

Patater commented Nov 10, 2016

Responded to Alessandro's comments

@Patater
Copy link
Contributor Author

Patater commented Nov 10, 2016

Tests pass on EFM32

@Patater
Copy link
Contributor Author

Patater commented Nov 10, 2016

Tests pass on K64F

@Patater Patater force-pushed the stack-fault-recovery branch 2 times, most recently from adf24a0 to f2304bd Compare November 10, 2016 15:45
@Patater Patater changed the title WIP Stack fault recovery Stack fault recovery Nov 10, 2016
@Patater Patater force-pushed the stack-fault-recovery branch from f2304bd to 664cf8f Compare November 10, 2016 17:23
@Patater Patater changed the title Stack fault recovery WIP Stack fault recovery Nov 11, 2016
@Patater Patater force-pushed the stack-fault-recovery branch 3 times, most recently from 66e5643 to 3fb59b0 Compare November 14, 2016 11:05
@Patater
Copy link
Contributor Author

Patater commented Nov 14, 2016

Responded to Alessandro's comments and re-ran tests. Tests are passing on EFM32 and K64F.

@Patater Patater changed the title WIP Stack fault recovery Stack fault recovery Nov 14, 2016

/* Copy the xPSR from the source exception stack frame. */
uint32_t xpsr = ((uint32_t *) src_sp)[7];
uint32_t xpsr = vmpu_unpriv_uint32_read((uint32_t) &((uint32_t *) src_sp)[7]);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this an overkill? The context_validate_exc_sf function does exactly the same check (reads with the t instruction).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the commit description spells it out as not really necessary, as a non-recoverable fault would happen in context_validate_exc_sf, preventing further progress. As currently written, the unprivileged access instruction is not technically necessary.

However, from the defensive coding perspective, it makes sense to use the unprivileged access instruction. Maybe the context_validate_exc_sf goes away on accident or has a bug introduced on accident. It's also nice to consistently have all places where uVisor accesses unprivileged memory using unprivileged access instructions.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

/* Set the priority of each exception. SVC is lower priority than
* MemManage, BusFault, and UsageFault, so that we can recovery from
* stacking MemManage faults more simply. Set DebugMonitor, PendSV, SYSTICK
* to the same priority as SVC. */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment Set DebugMonitor, PendSV, SYSTICK to the same priority as SVC. is outdated now.

from_svc = shcsr & SCB_SHCSR_SVCALLACT_Msk;

/* Make sure SVC is active. */
if (!from_svc) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this happen? If not, maybe it's better to turn it into a debug-only runtime assertion.

Copy link
Contributor Author

@Patater Patater Nov 14, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. Debug makes sense because it'd be a stupid uVisor programmer's fault if this happened. If it does happen with a release build, a usage fault (which uVisor chooses not to recover from) will happen.


/* Return to Thread mode and use the Process Stack for return. The PSP will
* have been changed already in debug_deprivilege_and_return via
* context_switch_in. */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to make sure that this comment does not become outdated when we change the code, you could turn it into something more general, as "It's the responsibility of the caller to se the PSP to a valid value.".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated

/* If we are having an unstacking fault, we can't read the pc
* at fault. */
if (fault_status & (SCB_CFSR_MSTKERR_Msk | SCB_CFSR_MUNSTKERR_Msk)) {
/* fake pc */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/fake/Fake

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

@Patater Patater force-pushed the stack-fault-recovery branch 2 times, most recently from e994db1 to 7e9a227 Compare November 14, 2016 11:59
DEBUG_FAULT(FAULT_MEMMANAGE, lr, lr & 0x4 ? psp : msp);
VMPU_SCB_MMFSR = fault_status;
HALT_ERROR(PERMISSION_DENIED, "Access to restricted resource denied");
lr = debug_box_enter_from_priv(lr);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this tested in release mode as well?

Copy link
Contributor Author

@Patater Patater Nov 14, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup. It works even in release mode. Tests pass on both K64F and EFM32GG in release and debug modes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@Patater Patater changed the title Stack fault recovery WIP Stack fault recovery Nov 14, 2016
@Patater Patater force-pushed the stack-fault-recovery branch from 7e9a227 to 1460319 Compare November 14, 2016 14:10
@Patater Patater changed the title WIP Stack fault recovery Stack fault recovery Nov 14, 2016
@Patater Patater force-pushed the stack-fault-recovery branch from 1460319 to dfbc8ca Compare November 14, 2016 15:57
Setting the SVC as one lower in priority allows us to combine MPU
recovery in one place, and not have to recover from a hard fault. We
also use this commit as an opportunity to make the priorities of all
system interrupts explicit.
Copy link
Contributor

@AlessandroA AlessandroA left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Magic numbers are a bit hard to read. Use the standard CMSIS definitions
for these fault status bits in an attempt to replace the magic number
with something slightly more readable.
We hard fault if we try to read from the stack if the mem manage fault is a
stacking or unstacking fault. Check the fault status before reading the pc
from the stack to avoid the hard fault.
@Patater Patater force-pushed the stack-fault-recovery branch from dfbc8ca to 5d0f159 Compare November 14, 2016 15:59
@AlessandroA AlessandroA merged commit 2f23bf0 into ARMmbed:master Nov 14, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants