cpu/cortexm: add __NOP(); after __WFI(); for stm32l152 to avoid hardfault #8518

kYc0o · 2018-02-05T16:28:18Z (description edited by aabadie)

Contribution description

Currently, the stm32l1x hardfaults due to the irq state being stored in r0, which for some reason is lost after wake-up.
~~This PR fixes that by ensuring that the state is being stored in RAM.~~
The actual, less intrusive fix is to add a __NOP(); right after wake-up (i.e. just after __WFI();), which also solves the problem.

It additionally allows choosing a different low-speed clock source, since LSE (external) is currently hardcoded as the default. Although this might belong in another PR, users of an old revision of the Nucleo board (so far one of the two supported stm32l1x-based boards) cannot test otherwise.

Issues/PRs references

Refs #8024

aabadie

Minor typo found

aabadie · 2018-02-05T16:29:40Z

boards/nucleo-l152/include/periph_conf.h

+ * 1: external crystal available (always 32.768kHz)
+ *
+ * LSE might not be available by default in early (C-01) Nucleo boards.
+ * If you're sure it is present, define CLOCL_LSE=1 in your project


CLOCK_LSE

Oops, addressed.
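A minimal sketch of how such a board-level switch could look, following the comment in the diff (0: no external crystal, 1: external 32.768 kHz crystal available). The LSI fallback as default, the `lfclk_src_t` type and the `lfclk_source()` helper are hypothetical names for illustration, not the code from this PR; overriding from the build line (e.g. `CFLAGS += -DCLOCK_LSE=1`) is an assumption.

```c
#include <assert.h>

/* Board-level switch (sketch):
 * 0: no external crystal, fall back to the internal LSI (assumed default)
 * 1: external crystal available (always 32.768 kHz)
 * Override from the build, e.g. CFLAGS += -DCLOCK_LSE=1 (assumption). */
#ifndef CLOCK_LSE
#define CLOCK_LSE           (0)
#endif

/* Catch typos such as CLOCK_LSE=2 at compile time (C11 static_assert). */
static_assert(CLOCK_LSE == 0 || CLOCK_LSE == 1, "CLOCK_LSE must be 0 or 1");

/* Hypothetical names, for illustration only */
typedef enum {
    LFCLK_LSI = 0,      /* internal low-speed RC oscillator */
    LFCLK_LSE = 1,      /* external 32.768 kHz crystal */
} lfclk_src_t;

static lfclk_src_t lfclk_source(void)
{
    return CLOCK_LSE ? LFCLK_LSE : LFCLK_LSI;
}
```

Defaulting to the internal oscillator means boards without a crystal (such as the early C-01 revision) still boot, while boards with one can opt in.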

aabadie · 2018-02-05T16:36:33Z

I just tested the shell with examples/default on hardware with and without LSE (I have rev C-03, which has an LSE):

With this PR: works when not using LSE
Without this PR: crash when not using LSE
When using LSE: no shell available, with or without this PR. Maybe the clock init is broken when using LSE?

kYc0o · 2018-02-05T16:41:32Z

Which boards are you testing?

aabadie · 2018-02-05T16:42:28Z

Which boards are you testing?

Sorry, that was not clear: I have a nucleo-l152 rev C-03

kYc0o · 2018-02-05T16:49:14Z

Can you try to debug and see whether it gets stuck here:

0x08001530 in stmclk_enable_lfclk () at /Users/facosta/git/RIOT-OS/RIOT/cpu/stm32_common/stmclk_common.c:79
79	        while (!(RCC->REG_LSE & BIT_LSERDY)) {}
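The loop above busy-waits on LSERDY with no exit, so a board whose crystal never starts hangs there silently. A hypothetical bounded variant, sketched against a fake register so it runs on a host, turns that hang into a visible error. The register variable, the bit position and the spin budget are assumptions for illustration, not RIOT's stmclk code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for RCC->REG_LSE; on target this is a memory-mapped register. */
static volatile uint32_t fake_reg_lse;

/* Ready flag; bit 9 is assumed here for illustration (BIT_LSERDY on target). */
#define FAKE_BIT_LSERDY     (1u << 9)

/* Poll the ready flag at most max_spins times. Returns true once the
 * oscillator reports ready, false if it never does (e.g. no crystal). */
static bool lse_wait_ready(uint32_t max_spins)
{
    assert(max_spins > 0);
    while (max_spins--) {
        if (fake_reg_lse & FAKE_BIT_LSERDY) {
            return true;
        }
    }
    return false;
}
```

On target, a caller could then report the failure or fall back to another clock source instead of hanging; a sensible spin budget would depend on the LSE start-up time given in the datasheet.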

kaspar030

Please split the PRs. They're dealing with unrelated issues...

kaspar030 · 2018-02-05T20:53:33Z

cpu/cortexm_common/include/cpu.h

@@ -104,7 +104,8 @@ static inline void cortexm_sleep(int deep)
    }

    /* ensure that all memory accesses have completed and trigger sleeping */
-    unsigned state = irq_disable();
+    /* avoid state to be stored in r0 (causes fault in some platforms) */
+    volatile unsigned state = irq_disable();


maybe make this optional?

I see no problem with that; however, I also cannot see why this would be harmful, and the size difference is only 4 bytes.

well, it is also an extra memory access (vs. register) on every sleep. Don't know if that matters.

Any other opinions?

It just forces the variable onto the stack, so it costs less than a function call.
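For reference, a host-side sketch of the volatile approach under discussion: marking `state` as volatile makes every access go through memory (in practice a stack slot), so the value is reloaded after wake-up rather than kept in r0. `mock_irq_disable()`, `mock_irq_restore()`, `mock_dsb()` and `mock_wfi()` are hypothetical stand-ins for RIOT's irq functions and the CMSIS intrinsics so the sequence runs off-target; the 1 = masked convention mimics PRIMASK.

```c
#include <assert.h>

/* Simulated PRIMASK: 1 = interrupts masked, 0 = enabled */
static unsigned fake_primask;

static unsigned mock_irq_disable(void)
{
    unsigned old = fake_primask;
    fake_primask = 1;
    return old;
}

static void mock_irq_restore(unsigned state)
{
    fake_primask = state;
}

static void mock_dsb(void) { /* __DSB() on target: complete memory accesses */ }

static void mock_wfi(void)
{
    /* __WFI() on target; in this scheme interrupts are masked when sleeping */
    assert(fake_primask == 1);
}

static void sleep_volatile_state(void)
{
    /* volatile: 'state' is stored to memory and read back from memory,
     * instead of being kept in a register across the sleep */
    volatile unsigned state = mock_irq_disable();
    mock_dsb();
    mock_wfi();
    mock_irq_restore(state);
}
```

The extra memory access mentioned above is visible here: one store and one load of `state` per sleep, versus none when it stays in a register.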

kYc0o · 2018-02-06T09:56:18Z

Please split the PRs. They're dealing with unrelated issues...

Done.

jnohlgard · 2018-02-06T10:18:24Z

I would prefer if we could find out why r0 is lost on resume from sleep. Is there a hardware issue with these CPUs or is there a bug in the implementation of one of the ISRs in the periph drivers?

kaspar030 · 2018-02-06T10:25:04Z

I would prefer if we could find out why r0 is lost on resume from sleep.

Definitely... But seeing the time that has been put into debugging this already, IMO it is fine to go with a workaround for now.

Still, it should be documented as such (and not as a necessary solution to a problem) and only be enabled for affected platforms.

kYc0o · 2018-02-06T10:57:15Z

I would prefer if we could find out why r0 is lost on resume from sleep. Is there a hardware issue with these CPUs or is there a bug in the implementation of one of the ISRs in the periph drivers?

As far as I can tell, any interrupt causes the hardfault (e.g. the hello-world example "works", but as soon as you type something in the terminal it hardfaults), so I'd rule out a faulty periph implementation.

I'd also like to get #8402 in since it also helps in this situation. Actually, I thought it would solve the original problem, but unfortunately it doesn't. The same goes for #8403. I can debug more to find the "real" source, if there is one, but for now I'd like to have the platform working again before doing any "major" rework of clock and pm.

kYc0o · 2018-02-06T11:01:30Z

I would prefer if we could find out why r0 is lost on resume from sleep.

Definitely... But seeing the time that has been put into debugging this already, IMO it is fine to go with a workaround for now.

I suspect it has something to do with the power modes, which may need to be configured before going to sleep. In this situation it "wildly" goes to sleep, so I'd expect undefined behaviour of registers and peripherals; maybe we are just observing part of it here. Hence the importance of #8403 and #8402.

kYc0o · 2018-02-06T11:03:28Z

I also insist on investigating #8024 (comment)

kaspar030 · 2018-02-06T11:06:04Z

I'd also like to get #8402 in since it also helps in this situation

It also solves the hard fault?

kYc0o · 2018-02-06T11:06:37Z

It also solves the hard fault?

No, that's why I came up with this fix.

kaspar030 · 2018-02-06T11:06:56Z

In this situation it "wildly" goes to sleep

What exactly does this mean? Which situation, why "wildly"?

kYc0o · 2018-02-06T11:17:58Z

What exactly does this mean? Which situation, why "wildly"?

What I mean is that we currently put all Cortex-M CPUs to sleep regardless of whether pm_set_lowest is implemented. Whenever the idle thread is scheduled, __WFI is called, and on platforms without a PM implementation the CPU just goes to sleep without any power/clock mode being configured. I haven't dug deep enough into the manual yet (I'm reading as much as I can), but it seems to me that the stm32l1x behaves quite differently from other platforms and might require some configuration before going to sleep. I ran extensive tests on other non-"L" platforms, namely "F", and the hardfault is not present there, even though the compiler still keeps the state in r0.

I'll come up with a more formal explanation of why r0 is lost in this case (IMHO due to the lack of configuration before sleep) later, in another issue or in this thread if it hasn't been merged by then. For that I need to get clock/pm configured as needed and experiment with it.

kYc0o · 2018-02-06T15:12:45Z

Some findings (thanks @kaspar030) suggest that this MCU in particular behaves "wrong" after idling or sleeping. A simple __NOP(); also works on this platform, instead of making the variable volatile.

I'll change this PR to reflect the new solution which seems much more intrusive than the current one.

kaspar030 · 2018-02-06T15:17:54Z

I'll change this PR to reflect the new solution which seems much more intrusive than the current one.

;)

jnohlgard · 2018-02-06T15:24:18Z

@kYc0o Where exactly do you add the nop?

kYc0o · 2018-02-06T15:41:57Z

@kYc0o Where exactly do you add the nop?

Just after __WFI();

cladmi · 2018-02-06T16:10:24Z

I would still like to know whether it can still break, with the state stored on the stack, for some specific number of nop instructions.
If there is a solution that reliably works, I would find it better than just a magic number of nops, because we know the compiler does not add one.

Could it be something like one of the instructions having to be aligned on a 4-byte address, and not being aligned on this platform because the compiler messes it up?

Is there the same alignment requirement for instructions as for memory accesses?
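One way to test the alignment hypothesis is to look up the address of the `wfi` instruction (and the one after it) in the disassembly, e.g. from `arm-none-eabi-objdump -d`, and check it. Note that Thumb code itself only requires 2-byte (halfword) alignment, so a `wfi` that is not word-aligned is legal per se. The helper and the sample addresses in the check are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if addr is a multiple of align; align must be a power of two. */
static bool is_aligned(uintptr_t addr, uintptr_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr & (align - 1)) == 0;
}
```

Comparing builds that crash with builds that don't (with and without the extra instruction) would show whether the alignment of `wfi` correlates with the fault at all.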

kYc0o · 2018-02-06T16:12:55Z

I did some tests and it didn't crash with several NOPs. So I'm not seeing the same behaviour as in the issue on the ST webpage.

jnohlgard · 2018-02-06T16:19:55Z

Is the r0 corruption completely random or is it always the same?
Could it be that the memory for the stacked r0 is corrupted in some way during the ISR execution?

kYc0o · 2018-02-08T17:39:13Z

@kYc0o, can you also add a reference to the PR that deactivates LSE in the nucleo-l152 board peripheral configuration? I can't find it. To have the nucleo-l152 working again (with both revisions, AFAIU), I think it will be required as soon as this PR gets merged.

There was no other PR, unless you mean my first attempt, which I don't consider good.

I'll open a second PR after this gets merged. However that wouldn't really be the solution for your problem since it should work with the external crystal anyways.

PS. I changed the description.

aabadie · 2018-02-08T17:40:29Z

However that wouldn't really be the solution for your problem since it should work with the external crystal anyways.

Indeed, but at least this would allow people to use this board, which is not possible at the moment.

aabadie · 2018-02-08T17:42:20Z

PS. I changed the description.

Strikethrough text is not enough, I think; please change "fixes" to "refs". This is just to avoid closing the initial issue when this PR gets merged.

kYc0o · 2018-02-09T13:52:52Z

OK @aabadie, it seems you changed it to refs, so you ACK?

aabadie

so you ACK?

ACK :)

kYc0o · 2018-02-09T14:02:47Z

OK, so only @kaspar030's ACK is left.

aabadie · 2018-02-12T13:16:11Z

@kaspar030 do you ACK this one ?

kaspar030 · 2018-02-12T13:18:03Z

cpu/cortexm_common/include/cpu.h

@@ -107,6 +107,14 @@ static inline void cortexm_sleep(int deep)
    unsigned state = irq_disable();
    __DSB();
    __WFI();
+    /*


Let's move the comment into the ifdef, and make it a little shorter. E.g.:

/* STM32L152RE crashes without this __NOP(). See #8518. */
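Applied to cortexm_sleep(), the requested layout looks roughly like this sketch. To keep it runnable on a host, the instructions are recorded in a trace string by hypothetical helpers instead of issuing the CMSIS __DSB()/__WFI()/__NOP() intrinsics, and `CPU_MODEL_STM32L152RE` is an assumed name for the model macro, defined here only for the demo.

```c
#include <assert.h>
#include <string.h>

/* Assumed model macro; on a real build it would come from the CPU config. */
#define CPU_MODEL_STM32L152RE   1

static char trace[64];

/* Append one "instruction" to the trace (stands in for executing it). */
static void emit(const char *insn)
{
    assert(strlen(trace) + strlen(insn) + 2 <= sizeof(trace));
    strcat(trace, insn);
    strcat(trace, ";");
}

static unsigned mock_irq_disable(void) { emit("irq_disable"); return 0; }
static void mock_irq_restore(unsigned state) { (void)state; emit("irq_restore"); }

static void cortexm_sleep_sketch(void)
{
    unsigned state = mock_irq_disable();
    emit("dsb");    /* __DSB() */
    emit("wfi");    /* __WFI() */
#if defined(CPU_MODEL_STM32L152RE)
    /* STM32L152RE crashes without this __NOP(). See #8518. */
    emit("nop");    /* __NOP() */
#endif
    mock_irq_restore(state);
}
```

On other models the `#if` drops the extra instruction, so the workaround only affects the MCU that needs it, in line with the earlier request to enable it only for affected platforms.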

kaspar030

ACK.

aabadie · 2018-02-12T14:09:03Z

@kYc0o, please squash

kYc0o · 2018-02-12T14:10:46Z

Squashed.

- The __NOP() that was added in RIOT-OS#8518 is now removed.
- When DBG_STANDBY, DBG_STOP or DBG_SLEEP are set in DBG_CR, a hardfault occurs on wake-up from sleep. This was first diagnosed in RIOT-OS#8518. With these bits enabled, the hardfault occurred when returning from a branch to irq_restore(), so we avoid the call by inlining it. See RIOT-OS#11830 for more details.
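The follow-up change described above can be sketched in the same host-side style: the __NOP() is gone and, on the affected model only, the interrupt state is restored by writing PRIMASK directly instead of calling irq_restore(). In CMSIS, __set_PRIMASK() is an inline `MSR primask` instruction, so no branch follows __WFI(). The mock helpers and the `CPU_MODEL_STM32L152RE` macro name are assumptions for illustration; the actual change is in #11919.

```c
#include <assert.h>
#include <string.h>

#define CPU_MODEL_STM32L152RE   1   /* assumed model macro, demo only */

static char trace[64];

/* Append one "instruction" to the trace (stands in for executing it). */
static void emit(const char *insn)
{
    assert(strlen(trace) + strlen(insn) + 2 <= sizeof(trace));
    strcat(trace, insn);
    strcat(trace, ";");
}

static unsigned mock_irq_disable(void) { emit("irq_disable"); return 0; }

/* Out-of-line function on target: calling it is a branch after wake-up */
static void mock_irq_restore(unsigned state) { (void)state; emit("irq_restore"); }

/* Inline MSR on target (CMSIS __set_PRIMASK): no branch */
static inline void mock_set_primask(unsigned state) { (void)state; emit("set_primask"); }

static void cortexm_sleep_sketch(void)
{
    unsigned state = mock_irq_disable();
    emit("dsb");    /* __DSB() */
    emit("wfi");    /* __WFI() */
#if defined(CPU_MODEL_STM32L152RE)
    /* avoid branching to irq_restore() right after wake-up (RIOT-OS#11830) */
    mock_set_primask(state);
#else
    mock_irq_restore(state);
#endif
}
```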

kYc0o added this to the Release 2018.04 milestone Feb 5, 2018

aabadie requested changes Feb 5, 2018

kaspar030 requested changes Feb 5, 2018

kYc0o force-pushed the stm32l1_temp_fix branch from 0f14567 to f8fa21e February 6, 2018 09:55

kaspar030 changed the title from ~~cpu/stm32l1: fix hardfault after wake-up~~ to cpu/cortexm: force irq state variable to RAM in cortexm_sleep() Feb 6, 2018

kYc0o mentioned this pull request Feb 9, 2018

stm32l1/stmclk: refactor clck init to use stmclk.h, add msi option #8402

Closed

aabadie approved these changes Feb 9, 2018

This was referenced Feb 9, 2018

boards/nucleo-l152: configure LSI by default #8545

Merged

cpu: stm32l1: add flashpage writing support #7712

Closed

kaspar030 requested changes Feb 12, 2018

kYc0o force-pushed the stm32l1_temp_fix branch from ff54170 to 933f281 February 12, 2018 13:42

kaspar030 approved these changes Feb 12, 2018

kYc0o force-pushed the stm32l1_temp_fix branch from 933f281 to ac93283 February 12, 2018 14:10

cpu/cortexm_common: add NOP after WFI to avoid hardfault on stm32l152

ac93283

aabadie merged commit f5da8c2 into RIOT-OS:master Feb 12, 2018

fjmolinas mentioned this pull request May 16, 2019

stm32l1/vendor: update vendor files to v2.3.0 #11489

Merged

fjmolinas mentioned this pull request Jul 9, 2019

stm32l152re: hard-fault unless power-cycled after flash, or depending on optimization #11820

Closed

fjmolinas mentioned this pull request Jul 12, 2019

stm32l152re: fix hardfault when DBGMCU_CR_DBG* bits are set and branch after __WFI() #11830

Closed

fjmolinas mentioned this pull request Jul 25, 2019

cpu/cortexm_common: replace irq_restore by __set_PRIMASK for stm32l152re #11919

Merged

kYc0o deleted the stm32l1_temp_fix branch May 4, 2020 11:19

fjmolinas mentioned this pull request May 4, 2020

stm32152re: hardfault when DBGMCU_CR_DBG* bits are set and branch after __WFI() #14015

Closed

Carton32 mentioned this pull request Aug 22, 2022

[puf_sram] Hardfault #18468

Closed
