Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cpu/cortexm: add __NOP(); after __WFI(); for stm32l152 to avoid hardfault #8518

Merged
merged 1 commit into from
Feb 12, 2018

Conversation

kYc0o
Copy link
Contributor

@kYc0o kYc0o commented Feb 5, 2018

Contribution description

Currently, the stm32l1x hardfaults due to the irq state being stored in r0, which for some reason is lost after wake-up.
This PR fixes that by ensuring that the state is being stored in RAM.
The actual, less intrusive, fix is to add a __NOP(); just after wake up, which also solves the problem.

It additionally allows to choose a different Low Speed clock source, since by default LSE (external) is hardcoded. Although this might be in another PR, users of an old revision of nucleo boards cannot test (so far one of the two stm32l1x based supported boards).

Issues/PRs references

Refs #8024

@kYc0o kYc0o added Type: bug The issue reports a bug / The PR fixes a bug (including spelling errors) Platform: ARM Platform: This PR/issue effects ARM-based platforms CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR Area: pm Area: (Low) power management labels Feb 5, 2018
@kYc0o kYc0o added this to the Release 2018.04 milestone Feb 5, 2018
Copy link
Contributor

@aabadie aabadie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor typo found

* 1: external crystal available (always 32.768kHz)
*
* LSE might not be available by default in early (C-01) Nucleo boards.
* If you're sure it is present, define CLOCL_LSE=1 in your project
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CLOCK_LSE

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, addressed.

@aabadie
Copy link
Contributor

aabadie commented Feb 5, 2018

I just tested the shell with examples/default on hardware with and without LSE (I have rev C-03 with a LSE):

  • With this PR: works when not using LSE
  • Without this PR: crash when not using LSE
  • With and without this PR: no shell available in both cases. Maybe the clock init is broken when using LSE ?

@kYc0o
Copy link
Contributor Author

kYc0o commented Feb 5, 2018

Which boards are you testing?

@aabadie
Copy link
Contributor

aabadie commented Feb 5, 2018

Which boards are you testing?

Sorry, it was not clear: I have nucleo-l152 rev c-03

@kYc0o
Copy link
Contributor Author

kYc0o commented Feb 5, 2018

Can you try to debug and see if it doesn't get stuck here:

0x08001530 in stmclk_enable_lfclk () at /Users/facosta/git/RIOT-OS/RIOT/cpu/stm32_common/stmclk_common.c:79
79	        while (!(RCC->REG_LSE & BIT_LSERDY)) {}

Copy link
Contributor

@kaspar030 kaspar030 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please split the PR's. They're dealing with unrelated issues...

@@ -104,7 +104,8 @@ static inline void cortexm_sleep(int deep)
}

/* ensure that all memory accesses have completed and trigger sleeping */
unsigned state = irq_disable();
/* avoid state to be stored in r0 (causes fault in some platforms) */
volatile unsigned state = irq_disable();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe make this optional?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see no problem on that, however I cannot see either why this is harmful, and the difference on size is only 4 bytes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well, it is also an extra memory access (vs. register) on every sleep. Don't know if that matters.

Any other opinions?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It just makes the variable stored in the stack, so its less than a function call.

@kYc0o
Copy link
Contributor Author

kYc0o commented Feb 6, 2018

Please split the PR's. They're dealing with unrelated issues...

Done.

@kaspar030 kaspar030 changed the title cpu/stm32l1: fix hardfault after wake-up cpu/cortexm: force irq state variable to RAM in cortexm_sleep() Feb 6, 2018
@jnohlgard
Copy link
Member

I would prefer if we could find out why r0 is lost on resume from sleep. Is there a hardware issue with these CPUs or is there a bug in the implementation of one of the ISRs in the periph drivers?

@kaspar030
Copy link
Contributor

I would prefer if we could find out why r0 is lost on resume from sleep.

Definitely... But seeing the time that has been put into debugging this already, IMO it is fine to go with a workaround for now.

Still, it should be documented as such (and not as necessary solution to a problem) and only be enabled for affected platforms.

@kYc0o
Copy link
Contributor Author

kYc0o commented Feb 6, 2018

I would prefer if we could find out why r0 is lost on resume from sleep. Is there a hardware issue with these CPUs or is there a bug in the implementation of one of the ISRs in the periph drivers?

As fas as I can tell any interrupt would cause the hardfault (e.g. hello-world example "works", but if you type something in the terminal it hardfaults), thus I'd discard the possibility of a faulty periph implementation.

I'd also like to get #8402 in since it also helps in this situation, actually I thought it would solve the original problem but unfortunately not. The same with #8403. I can debug more to see the "real" source, if there's one, but for now I'd like to have the platform working again before doing "major" reworking on clock and pm.

@kYc0o
Copy link
Contributor Author

kYc0o commented Feb 6, 2018

I would prefer if we could find out why r0 is lost on resume from sleep.

Definitely... But seeing the time that has been put into debugging this already, IMO it is fine to go with a workaround for now.

I suspect it's something to do with the power modes which maybe need to be configured before going to sleep. In this situation it "wildly" goes to sleep so I'd expect an undefined behaviour of registers and peripherals, here maybe we are just observing a part of it. Thus the importance of #8403 and #8402 .

@kYc0o
Copy link
Contributor Author

kYc0o commented Feb 6, 2018

I also insist on investigating #8024 (comment)

@kaspar030
Copy link
Contributor

I'd also like to get #8402 in since it also helps in this situation

It also solves the hard fault?

@kYc0o
Copy link
Contributor Author

kYc0o commented Feb 6, 2018

It also solves the hard fault?

No, that's why I came to this fix.

@kaspar030
Copy link
Contributor

In this situation it "wildly" goes to sleep

What exactly does this mean? Which situation, why "wildly"?

@kYc0o
Copy link
Contributor Author

kYc0o commented Feb 6, 2018

What exactly does this mean? Which situation, why "wildly"?

What I mean is that currently we put all cortexes to sleep regardless of the implementation of pm_set_lowest. Whenerver the idle thread comes to be scheduled __WFI is called, and in platforms where there's no PM implementation it just sleeps without configuring any power/clock mode. I didn't dig deep enough (yet) into the manual (I'm reading as much as I can), but it seems to me that stm32l1x behaves quite differently than other platforms and might require some configuration before just going to sleep. I made extensive tests on other non "L" platforms, namely "F" and the hardfault is not present, even though the compiler still saves state in r0.

I'll come with a more formal explanation why r0 is lost in this case (IMHO due to the lack of configuration before sleep) later in other issue or this thread if it's not being merged by then. For that I need to succeed to configure clock/pm as needed and experiment with it.

@kYc0o
Copy link
Contributor Author

kYc0o commented Feb 6, 2018

Some findings (thanks @kaspar030) suggest that this MCU is particularly behaving "wrong" after idling or sleeping. A simple __NOP(); works also for this platform, instead of making the variable volatile.

I'll change this PR to reflect the new solution which seems much more intrusive than the current one.

@kaspar030
Copy link
Contributor

I'll change this PR to reflect the new solution which seems much more intrusive than the current one.

;)

@jnohlgard
Copy link
Member

@kYc0o Where exactly do you add the nop?

@kYc0o
Copy link
Contributor Author

kYc0o commented Feb 6, 2018

@kYc0o Where exactly do you add the nop?

Just after __WFI();

@cladmi
Copy link
Contributor

cladmi commented Feb 6, 2018

I would still like to know if it can be broken with the state stored on stack with a specific number of nop instructions.
If there is a solution that reliably works, I would find it better than just a magic number of nop because we know that the compiler does not add one.

Could it be something like one of the instruction should be aligned on 4bytes address and is not for this platform because the compiler does crap ?

Is there the same alignment requirement for instructions than for memory access ?

@kYc0o
Copy link
Contributor Author

kYc0o commented Feb 6, 2018

I was doing some tests and didn't crash with several NOPs. Thus, I don't experience the same as the issue on the ST webpage.

@jnohlgard
Copy link
Member

Is the r0 corruption completely random or is it always the same?
Could it be that the memory for the stacked r0 is corrupted in some way during the ISR execution?

@kYc0o
Copy link
Contributor Author

kYc0o commented Feb 8, 2018

@kYc0o, can you also add a reference to the PR that deactivate LSE in the nucleo-l152 board peripheral configuration ? I can't find it. Because to have the nucleo-l152 working again (with both revisions, AFAIU), I think it will be required as soon as this PR gets merged.

There was no other PR, unless you mean my first attempt which I don't consider as good.

I'll open a second PR after this gets merged. However that wouldn't really be the solution for your problem since it should work with the external crystal anyways.

PS. I changed the description.

@aabadie
Copy link
Contributor

aabadie commented Feb 8, 2018

However that wouldn't really be the solution for your problem since it should work with the external crystal anyways.

Indeed, but at least this would allow people to use this board, which is not possible at the moment.

@aabadie
Copy link
Contributor

aabadie commented Feb 8, 2018

PS. I changed the description.

strikethrough text is not enough I think, please change fixes to refs. This is just to avoid the initial issue to be closed when this PR will be merged.

@kYc0o
Copy link
Contributor Author

kYc0o commented Feb 9, 2018

Ok @aabadie it seems you changed to refs, so you ACK?

Copy link
Contributor

@aabadie aabadie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so you ACK?

ACK :)

@kYc0o
Copy link
Contributor Author

kYc0o commented Feb 9, 2018

Ok so there's only @kaspar030 ACK left.

@aabadie
Copy link
Contributor

aabadie commented Feb 12, 2018

@kaspar030 do you ACK this one ?

@@ -107,6 +107,14 @@ static inline void cortexm_sleep(int deep)
unsigned state = irq_disable();
__DSB();
__WFI();
/*
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's move the comment into the ifdef, and make it a little shorter. E.g.:

/* STM32L152RE crashes without this __NOP(). See #8518. */

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed.

Copy link
Contributor

@kaspar030 kaspar030 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK.

@aabadie
Copy link
Contributor

aabadie commented Feb 12, 2018

@kYc0o, please squash

@kYc0o
Copy link
Contributor Author

kYc0o commented Feb 12, 2018

Squashed.

@aabadie aabadie merged commit f5da8c2 into RIOT-OS:master Feb 12, 2018
fjmolinas added a commit to fjmolinas/RIOT that referenced this pull request Jul 12, 2019
- The __NOP() that was added in RIOT-OS#8518 is now remooved.
- When DBG_STANDBY, DBG_STOP or DBG_SLEEP are set in DBG_CR a hardfault
  occurs on wakeup from sleep. This was first diagnosed in RIOT-OS#8518. When
  enabled, a hardfault occured when returning from a branch to irq_restore()
  we avoid the call by inlining the function call. See #xxxxx for more
  details.
fjmolinas added a commit to fjmolinas/RIOT that referenced this pull request Jul 12, 2019
- The __NOP() that was added in RIOT-OS#8518 is now remooved.
- When DBG_STANDBY, DBG_STOP or DBG_SLEEP are set in DBG_CR a hardfault
  occurs on wakeup from sleep. This was first diagnosed in RIOT-OS#8518. When
  enabled, a hardfault occured when returning from a branch to irq_restore()
  we avoid the call by inlining the function call. See RIOT-OS#11830 for more
  details.
fjmolinas added a commit to fjmolinas/RIOT that referenced this pull request Jul 12, 2019
- The __NOP() that was added in RIOT-OS#8518 is now remooved.
- When DBG_STANDBY, DBG_STOP or DBG_SLEEP are set in DBG_CR a hardfault
  occurs on wakeup from sleep. This was first diagnosed in RIOT-OS#8518. When
  enabled, a hardfault occured when returning from a branch to irq_restore()
  we avoid the call by inlining the function call. See RIOT-OS#11830 for more
  details.
fjmolinas added a commit to fjmolinas/RIOT that referenced this pull request Jul 12, 2019
- The __NOP() that was added in RIOT-OS#8518 is now remooved.
- When DBG_STANDBY, DBG_STOP or DBG_SLEEP are set in DBG_CR a hardfault
  occurs on wakeup from sleep. This was first diagnosed in RIOT-OS#8518. When
  enabled, a hardfault occured when returning from a branch to irq_restore()
  we avoid the call by inlining the function call. See RIOT-OS#11830 for more
  details.
fjmolinas added a commit to fjmolinas/RIOT that referenced this pull request Aug 5, 2019
- The __NOP() that was added in RIOT-OS#8518 is now remooved.
- When DBG_STANDBY, DBG_STOP or DBG_SLEEP are set in DBG_CR a hardfault
  occurs on wakeup from sleep. This was first diagnosed in RIOT-OS#8518. When
  enabled, a hardfault occured when returning from a branch to irq_restore()
  we avoid the call by inlining the function call. See RIOT-OS#11830 for more
  details.
@kYc0o kYc0o deleted the stm32l1_temp_fix branch May 4, 2020 11:19
@Carton32 Carton32 mentioned this pull request Aug 22, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area: pm Area: (Low) power management CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR Platform: ARM Platform: This PR/issue effects ARM-based platforms Type: bug The issue reports a bug / The PR fixes a bug (including spelling errors)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants