MbedOS Error Status: 0x80FF013D Code: 317 Module: 255 with wait(), always occurs after programming or reset, never occurs after power cycle #10339

Closed
Hoel opened this issue Apr 8, 2019 · 25 comments

Comments

@Hoel

Hoel commented Apr 8, 2019

I encounter a fault exception / Mbed error caused by the wait() function, and for some reason this error only occurs when the project is compiled and run from SW4STM32; the same project compiled from the CLI does not show the error. Also, it never occurs again after the target has been power cycled.

Here is the full error:

++ MbedOS Fault Handler ++

FaultType: HardFault

Context:
R0 : 00001000
R1 : 00000001
R2 : E000ED00
R3 : 00000000
R4 : 00000000
R5 : 00000000
R6 : 00000000
R7 : 00000000
R8 : 00000000
R9 : 00000000
R10 : 00000000
R11 : 00000000
R12 : 00000000
SP : 20001708
LR : 08006F2F
PC : 08006F30
xPSR : 61000000
PSP : 200016E8
MSP : 20007FC0
CPUID: 412FC230
HFSR : 40000000
MMFSR: 00000000
BFSR : 00000000
UFSR : 00000008
DFSR : 00000008
AFSR : 00000000
Mode : Thread
Priv : Privileged
Stack: PSP

-- MbedOS Fault Handler --

++ MbedOS Error Info ++
Error Status: 0x80FF013D Code: 317 Module: 255
Error Message: Fault exception
Location: 0x8000651
Error Value: 0x8006F30
Current Thread: rtx_idle Id: 0x200011F0 Entry: 0x8003669 StackSize: 0x200 StackMem: 0x20001538 SP: 0x20007F48
Next:
rtx_idle State: 0x2 Entry: 0x08003669 Stack Size: 0x00000200 Mem: 0x20001538 SP: 0x200016F8
Ready:
Wait:
rtx_timer State: 0x83 Entry: 0x080053B1 Stack Size: 0x00000300 Mem: 0x20001238 SP: 0x200014D0
Delay:
main State: 0x13 Entry: 0x080035AD Stack Size: 0x00001000 Mem: 0x20001790 SP: 0x200026F0
rtx_idle State: 0x2 Entry: 0x08003669 Stack Size: 0x00000200 Mem: 0x20001538 SP: 0x200016F8
For more info, visit: https://mbed.com/s/error?error=0x80FF013D&tgt=L80AB_L151CC
-- MbedOS Error Info --
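
For readers decoding the dump above: the status word packs the code and module numbers printed next to it, and the fault status registers are bit flags. The following is only a host-side sketch, assuming the field layout of Mbed OS's `mbed_error.h` (code in bits 0-15, module in bits 16-23, type in bits 29-30), the ARMv7-M bit names for HFSR and UFSR, and that the dump prints UFSR as the 16-bit sub-register value. All type and function names are made up for illustration and are not part of Mbed OS.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper type; not part of Mbed OS.
struct ErrorFields {
    uint32_t code;    // bits 0..15
    uint32_t module;  // bits 16..23
    uint32_t type;    // bits 29..30
};

// Assumed layout, mirroring the masks in mbed_error.h.
constexpr ErrorFields decode_error_status(uint32_t status)
{
    return ErrorFields{status & 0xFFFFu, (status >> 16) & 0xFFu, (status >> 29) & 0x3u};
}

struct FlagName {
    uint32_t mask;
    const char *name;
};

// Joins the names of all flags set in `value`, or returns "none".
template <std::size_t N>
std::string describe_flags(uint32_t value, const FlagName (&flags)[N])
{
    std::string out;
    for (std::size_t i = 0; i < N; ++i) {
        if (value & flags[i].mask) {
            if (!out.empty()) {
                out += ' ';
            }
            out += flags[i].name;
        }
    }
    return out.empty() ? "none" : out;
}

// HardFault Status Register bits (ARMv7-M).
std::string describe_hfsr(uint32_t hfsr)
{
    static const FlagName flags[] = {
        {1u << 1, "VECTTBL"}, {1u << 30, "FORCED"}, {1u << 31, "DEBUGEVT"}};
    return describe_flags(hfsr, flags);
}

// UsageFault Status Register bits (ARMv7-M), as a 16-bit value.
std::string describe_ufsr(uint32_t ufsr)
{
    static const FlagName flags[] = {
        {1u << 0, "UNDEFINSTR"}, {1u << 1, "INVSTATE"}, {1u << 2, "INVPC"},
        {1u << 3, "NOCP"}, {1u << 8, "UNALIGNED"}, {1u << 9, "DIVBYZERO"}};
    return describe_flags(ufsr, flags);
}
```

With the values from this dump, `decode_error_status(0x80FF013D)` gives code 317 and module 255, matching the printed line, while `describe_hfsr(0x40000000)` and `describe_ufsr(0x00000008)` name the FORCED and NOCP flags. This only names the flags; it does not by itself explain the root cause.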

Mbed is at the very latest revision; I created the project directly from the CLI yesterday.
The main.cpp only blinks an LED and prints a message on the UART, nothing else.
When the project is compiled from the CLI and uploaded with st-flash there is no error at all.
When the project is exported to SW4STM32 (CLI export with the -z option) and built / run from there, the hard fault always occurs after programming, even after a manual reset (reset tact switch). However, if I power cycle the target there is no subsequent error at all (even after multiple hardware resets). This behavior has been repeated and is 100% consistent.

To make sure it was not a problem with GCC, I set the SW4STM32 PATH to point to the Mbed CLI compiler, so the exact same compiler (from theotherjimmy mbed-cli-osx-installer https://github.com/ARMmbed/mbed-cli-osx-installer/releases/tag/v0.0.10 ) is used in both cases.

I also know that the fault comes from wait(), since if I remove this statement there is no error when built / run from SW4STM32.

Once again, when the exact same project is built from the CLI (and run with st-flash) there is no fault at all.

The current target is an XDOT_L151CC which has been modified with an 8MHz crystal. The set_sysclock function has been generated directly by STM32CubeMX; other than that, no changes have been made to the original target files.

The issue is 100% reproducible, and I believe it can be reproduced on other targets as well.
Here is how to reproduce it:
- create a new project from the CLI
- add a blinking LED with a wait() statement in the main loop
- export the project for SW4STM32 with "CLI export" and the "-z" option
- open the resulting project in SW4STM32

then

- build the project from the CLI and upload it to the target with st-flash => no fault
- build the project from SW4STM32 and upload it to the target (run button) => fault exception
* hardware reset the target => still fault exception
* power cycle the target => no fault
* subsequent hardware reset => no fault

Of course I can also provide the two projects if needed.

Issue request type

[ ] Question
[ ] Enhancement
[x] Bug
@0xc0170
Contributor

0xc0170 commented Apr 8, 2019

How do the project settings compare (CLI vs. exporter)? Are they 100% the same?

cc @ARMmbed/team-st-mcd

@Hoel
Author

Hoel commented Apr 8, 2019

Yes, they are exactly the same: the SW4STM32 project has been exported directly from the CLI project and no changes have been made to the SW4STM32 project settings.

@Hoel
Author

Hoel commented Apr 8, 2019

Side note, if I use:
wait_ms(3) => no fault
wait_ms(300) => fault
wait(0.03) => fault
wait(0.003) => fault

@Hoel
Author

Hoel commented Apr 8, 2019

OK, I found the problem: the issue was caused by the SysTick setup that was still in the set_sysclock function taken from STM32CubeMX. It should have been removed; I overlooked that.

Culprit:
HAL_SYSTICK_Config(HAL_RCC_GetHCLKFreq()/1000);
HAL_SYSTICK_CLKSourceConfig(SYSTICK_CLKSOURCE_HCLK);
HAL_NVIC_SetPriority(SysTick_IRQn, 0, 0);
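
For context, the first culprit line requests one SysTick interrupt per millisecond: its argument is the number of HCLK cycles per tick. Below is a small host-side sketch of that arithmetic. It assumes a 32 MHz HCLK (an assumption; the thread does not state the clock) and the 24-bit reload limit that CMSIS `SysTick_Config()` checks. The helper names are made up for illustration.

```cpp
#include <cstdint>

// SysTick has a 24-bit reload register (SysTick_LOAD_RELOAD_Msk in CMSIS).
constexpr uint32_t kSysTickMaxReload = 0x00FFFFFFu;

// Cycles per 1 ms tick, i.e. what HAL_RCC_GetHCLKFreq() / 1000 evaluates to.
constexpr uint32_t ticks_per_ms(uint32_t hclk_hz)
{
    return hclk_hz / 1000u;
}

// Mirrors the range check in CMSIS SysTick_Config(): (ticks - 1) must fit
// in 24 bits. A tick count of 0 wraps around and is rejected as well.
constexpr bool systick_ticks_valid(uint32_t ticks)
{
    return (ticks - 1u) <= kSysTickMaxReload;
}

// Assumed example clock: 32 MHz (not stated in the thread).
static_assert(ticks_per_ms(32000000u) == 32000u, "32 MHz gives 32000 cycles per ms");
static_assert(systick_ticks_valid(ticks_per_ms(32000000u)), "fits the 24-bit reload");
```

Mbed OS sets up its own system tick, which is presumably why leaving these calls in the CubeMX-generated function caused trouble.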

@Hoel Hoel closed this as completed Apr 8, 2019
@Hoel Hoel reopened this Apr 8, 2019
@Hoel
Author

Hoel commented Apr 8, 2019

Unfortunately, the issue is only partially solved: if I set wait(1.0) it works fine, but if I set wait(0.3) the fault is back, so something else is wrong.

@Hoel
Author

Hoel commented Apr 8, 2019

After hundreds of debug steps, the error seems to originate after osMutexAcquire().
image

image

image

@ciarmcom
Member

ciarmcom commented Apr 8, 2019

Internal Jira reference: https://jira.arm.com/browse/MBOCUSTRIA-1129

@Hoel
Author

Hoel commented Apr 8, 2019

For verification I tested with wait(0.3) on the CLI project: no error. So the problem definitely occurs only on SW4STM32 and only if the wait() value is under 1000ms.

@jeromecoutant
Collaborator

@Hoel - Please could you check if #10367 impacts your result?
Thx

@jeromecoutant
Collaborator

@deepikabhavnani

@Hoel
Author

Hoel commented Apr 11, 2019

@jeromecoutant OK, I will check that.

@Hoel
Author

Hoel commented Apr 11, 2019

@jeromecoutant tickless is not enabled in my case, does it matter? I did further debugging over the last few days and I feel the problem is somehow related to the stdio retarget and possibly delay(). I also noticed very weird behaviors, such as a hard fault occurring when only adding a second delay() statement in the main loop (which only blinks an LED), or stdio::printf mysteriously stopping working (so there is no more output when the hard fault occurs, except the error LED blink) while serial::printf continues to work normally.

@jeromecoutant
Collaborator

No, #10367 increases the idle thread stack size when no compilation optimization option is set.

See #9106 (comment)

@Hoel
Author

Hoel commented Apr 11, 2019

OK, I tried it and it didn't work: hard fault right after programming. BTW you can see how it didn't print the printf statement (HEL version), but correctly printed the Mbed error, which uses stdio too; that is very weird. I set it to 512.
image

image
image

@Hoel
Author

Hoel commented Apr 11, 2019

@jeromecoutant Oh, I disabled all optimisations and it no longer hard faults after programming or after a power cycle. The binary size is considerably increased, so it's probably not a long-term option, but at least for now it seems to work. I will run more tests to see if it's consistent.

@Hoel
Author

Hoel commented Apr 11, 2019

@jeromecoutant
It is working consistently with various delay() / LED blink tests (which failed previously); I did not get any further hard fault after a soft reset or power cycle. That said, major problems still persist: the stdio retarget is not working consistently (doesn't work) and, more importantly, the SX1280 radio (SPI) cannot initialize properly, while it works fine when the same project is compiled from the CLI.

@Hoel
Author

Hoel commented Apr 11, 2019

@jeromecoutant
Here I reproduced the radio init which fails. First there is an Mbed error on the last __disable_irq( ) statement; I cannot immediately see a good reason for this.

image
image

Then, if I remove the last __disable_irq( ) statement, it hangs forever on Wait4Busy(). I did not check the SPI transaction with an analyzer yet, but I checked the state of the BSY GPIO and it is high, which means the radio is not initialized properly, so most likely something is wrong with the SPI communication. Again, the exact same code built with the CLI works fine; that said, I also have to remove the last __disable_irq( ) there, otherwise I get the Mbed error.

EDIT:
I went ahead and extracted the functions to read a register and get the firmware revision from the library, commented out the __disable_irq() statements, and that way it works: the radio is initialized properly and returns the correct revision value. There is definitely something very wrong here, since if I comment out the __disable_irq() statements in the library itself it still fails and hangs forever on Wait4Busy().

image

image

@Hoel
Author

Hoel commented Apr 12, 2019

I made a minimal test code to reproduce the mutex error directly; it occurs in osMutexAcquire when the SPI is locked after __disable_irq().
By the way, the error is not displayed in the console; I only get the LED error pattern and nothing is sent to the UART... The error message should be sent, since when I step through in the debugger I reach mbed_error_puts.

image

image

image

EDIT

If I add a printf statement at the beginning of the test, it is not sent to the UART, but afterwards the Mbed error is printed correctly. All that is not very reassuring.

image

image

@jeromecoutant
Collaborator

Hi
To be honest, copy-pasted text is better than pictures...

@Hoel
Author

Hoel commented Apr 12, 2019

@jeromecoutant
Well, right, I will paste it, but then the formatting gets messed up:

```cpp
int main() {
    uart3.printf("[MBED] init ok\r");
    printf("test");

    RadioSpi = new SPI(PA_7, PA_6, PA_5);

    // Minimal repro: take the SPI mutex while interrupts are disabled
    __disable_irq();
    RadioSpi->lock();
    RadioSpi->unlock();
    __enable_irq();

    uart3.printf("[MBED] test finished\r");

    while (true) {
        led1 = 0;
        wait(0.005);
        led1 = 1;
        wait(2);
    }
}
```

```
[MBED] init ok

++ MbedOS Error Info ++
Error Status: 0x80010133 Code: 307 Module: 1
Error Message: Mutex: 0x20002C2C, Not allowed in ISR context
Location: 0x8004B13
File: mbed_rtx_handlers.c+132
Error Value: 0x20002C2C
Current Thread: main Id: 0x200012D8 Entry: 0x8004783 StackSize: 0x1000 StackMem: 0x20001C28 SP: 0x20002B0C
Next:
main State: 0x2 Entry: 0x08004783 Stack Size: 0x00001000 Mem: 0x20001C28 SP: 0x20002BE8
Ready:
rtx_idle State: 0x1 Entry: 0x080048F1 Stack Size: 0x00000400 Mem: 0x20001448 SP: 0x20001808
Wait:
rtx_timer State: 0x83 Entry: 0x08007ABD Stack Size: 0x00000300 Mem: 0x20001848 SP: 0x20001AB8
Delay:
For more info, visit: https://mbed.com/s/error?error=0x80010133&tgt=L80AB_L151CC
-- MbedOS Error Info --
```

@jeromecoutant
Collaborator

You should use the core_util_critical_section_enter() and core_util_critical_section_exit() functions instead of __disable_irq and __enable_irq.

@kjbracey-arm

@kjbracey
Contributor

kjbracey commented May 28, 2019

Like most HAL APIs other than really low-level ones like DigitalIn/Out, SPI is made thread-safe by a mutex. So you can't use it with interrupts disabled.

I don't know why you're disabling interrupts here - you have no interrupt handlers.

If you really need to you can inherit from SPI or other similar classes to override the virtual lock and unlock methods to stop it using the mutex, but I doubt that's the answer here.

If you do have some other code not shown here which does have an interrupt handler, and that needs its interrupts disabled, then rather than globally disabling all interrupts, temporarily remove your specific interrupt handler during the reset, by attaching NULL to your InterruptIn. (Or use InterruptIn::disable_irq()).
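
The two points above (a mutex-protected driver cannot be used with interrupts disabled, and the virtual lock/unlock methods can be overridden) can be illustrated with a host-side model. This is only a sketch that runs on a PC: `ModelMutex`, `ModelSpi`, `UnlockedModelSpi` and `g_irq_masked` are hypothetical stand-ins, not Mbed OS classes. Unlike the real `SPI::lock()`, which returns void and ends in a fatal Mbed error as in the log above, the model's `lock()` returns false so the failure can be observed.

```cpp
#include <cassert>

// Stand-in for the CPU interrupt mask (what __disable_irq() sets). Hypothetical.
static bool g_irq_masked = false;

// Stand-in for an RTOS mutex that refuses to be taken while IRQs are masked.
class ModelMutex {
public:
    bool acquire()
    {
        if (g_irq_masked) {
            return false;  // the real RTOS reports "Not allowed in ISR context"
        }
        ++depth_;
        return true;
    }
    void release()
    {
        assert(depth_ > 0);
        --depth_;
    }

private:
    unsigned depth_ = 0;
};

// Stand-in for a driver made thread-safe by a mutex.
class ModelSpi {
public:
    virtual ~ModelSpi() {}

    // Returns false where the real driver would hit a fatal error.
    bool write(int /*value*/)
    {
        if (!lock()) {
            return false;
        }
        // ... the bus transfer would happen here ...
        unlock();
        return true;
    }

    virtual bool lock() { return mutex_.acquire(); }
    virtual void unlock() { mutex_.release(); }

private:
    ModelMutex mutex_;
};

// Derived class that opts out of the mutex by overriding lock/unlock.
// It is no longer thread-safe; the caller must guarantee exclusive access.
class UnlockedModelSpi : public ModelSpi {
public:
    bool lock() override { return true; }
    void unlock() override {}
};
```

In this model, `ModelSpi::write()` succeeds with interrupts enabled and fails once `g_irq_masked` is set, while `UnlockedModelSpi::write()` succeeds in both cases at the cost of thread safety. As noted above, overriding the lock is unlikely to be the right fix for this issue.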

@kjbracey
Contributor

And yes, the enter/exit_critical section is preferred because it handles the case where there's another OS layer underneath Mbed OS, or something that needs super-fast IRQ response - it's an abstraction that can leave some interrupts enabled, rather than disabling all core IRQs. On most devices it is the same thing though.
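
One further difference, not mentioned in the comments above and stated here only as an understanding of the Mbed OS implementation: the critical-section calls keep a nesting count and restore the previous interrupt state on the outermost exit, whereas a bare enable call unmasks interrupts unconditionally. The host-side model below sketches that behavior; `IrqModel` and its members are hypothetical names, not Mbed OS APIs.

```cpp
#include <cassert>

// Host-side model of an interrupt mask with two ways of controlling it.
class IrqModel {
public:
    bool masked() const { return masked_; }

    // Models __disable_irq() / __enable_irq(): no nesting, no saved state.
    void raw_disable() { masked_ = true; }
    void raw_enable() { masked_ = false; }

    // Models a nestable critical section: remember the state on first entry,
    // restore it only when the outermost section exits.
    void critical_enter()
    {
        if (depth_ == 0) {
            was_masked_ = masked_;
        }
        masked_ = true;
        ++depth_;
    }

    void critical_exit()
    {
        assert(depth_ > 0);
        if (--depth_ == 0) {
            masked_ = was_masked_;
        }
    }

private:
    bool masked_ = false;
    bool was_masked_ = false;
    unsigned depth_ = 0;
};
```

With nested raw calls, the inner `raw_enable()` unmasks interrupts while the outer region still expects them masked; with nested `critical_enter()` / `critical_exit()` pairs, the mask stays set until the outermost exit. Either way, mutex-protected drivers such as SPI still cannot be used while the mask is set.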

@jeromecoutant
Collaborator

Could we close this issue ?

@0xc0170
Contributor

0xc0170 commented Feb 20, 2020

Could we close this issue ?

We will close this issue, as there has not been any update for more than a half year. You can reopen with an update if this issue still needs fixing.
