arch/stm32f7: Fixes bug in tickless driver where the compare register… #1380

antmerlino · 2020-07-07T20:26:45Z

… is set to a value less than the current time.

Summary

When using a very high resolution timer, and very small time intervals, it is possible for the Compare register of the STM32 timer peripheral to be set to a value that has just passed. This means the timer expiration won't happen until a full overflow cycle of the timer.

The proposed solution is to disable interrupts, set the capture register, and then after, but before turning back on interrupts, check to see if the time has already passed. If it has, we force the expiration immediately.

Impact

All tickless drivers should be evaluated for this flaw. The LPC family devices have more protection for this and guided my understanding of this issue.

Testing

Run on STM32F765 processor with 1us clock and interval timers of varying times - some very short (6us)

patacongo · 2020-07-07T21:44:53Z

arch/arm/src/stm32f7/stm32_tickless.c

+  g_tickless.pending = true;
+
+  uint64_t counter = ((uint64_t)g_tickless.overflow << 32) |
+                     STM32_TIM_GETCOUNTER(g_tickless.tch);


If an overflow from the LS to MS occurs, the fetched value will be wrong. I usually do this (which does not require disabling interrupts):

uint64_t counter; uint32_t ms1; uint32_t ms2; uint32_t ls; do { ms1 = get_ms(); ls = get_ls(); ms2 = get_ms(); } while (ms1 != ms2); counter = (ms1 << 32) | ls;

Note also that declarations with code is not c89, but that is not so important in M CU-specific code.

@gregory-nutt Noted, I'll make this change, thanks.

If an overflow from the LS to MS occurs, the fetched value will be wrong. I usually do this (which does not require disabling interrupts):

uint64_t counter; uint32_t ms1; uint32_t ms2; uint32_t ls; do { ms1 = get_ms(); ls = get_ls(); ms2 = get_ms(); } while (ms1 != ms2); counter = (ms1 << 32) | ls;

Well, I didn't notice that the MS part is just a software overflow counter and not a hardware register. This that case, what I propose here is nonsense. If interrupts are disabled, not special precautions needs to be taken.

antmerlino · 2020-07-07T22:54:06Z

arch/arm/src/stm32f7/stm32_tickless.c

+
+  if (counter > tm)
+    {
+      stm32_interval_handler();


@gregory-nutt Any recommendations on how to call the stm32_interval_handler in the interrupt context instead of the callers context so that the expiration works as it would if called on timer expiration?

Any recommendations on how to call the stm32_interval_handler in the interrupt context instead of the callers context so that the expiration works as it would if called on timer expiration?

You can't really get into the interrupt context without being in an interrupt. Entering a critical section is about as close as you can get.

The only side effect is per Issue #1138 If that is a concern you can also disable pre-emption.

@gregory-nutt

The problem is, this function eventually calls nxsig_timeout, like it should, but because the task is still currently running, the signal is not actually set, which means the task eventually waits and since the timeout already happened, things are locked.

I'm assuming it would not be safe to change line 128 of sig_timedwait.c to allow the signal in the TSTATE_TASK_RUNNING state.

So you are saying that nxsig_timeout() is being called before up_block_task()? This is the only place that nxsig_timeout is referenced:

358 /* Start the watchdog */ 359 360 wd_start(rtcb->waitdog, waitticks, 361 nxsig_timeout, 1, wdparm.pvarg); 362 363 /* Now wait for either the signal or the watchdog, but 364 * first, make sure this is not the idle task, 365 * descheduling that isn't going to end well. 366 */ 367 368 DEBUGASSERT(NULL != rtcb->flink); 369 up_block_task(rtcb, TSTATE_WAIT_SIG);

The watchdog should not occur because we are in a critical section. We will stay in the critical section until rtcb is blocked in TSTATE_WAIT_SIG state. I guess I don't understand you problem description.

@gregory-nutt
The issue is that the wd_start happening line 360 is being called with a very small waitticks (6 = 6us).

By the time we get into wd_start, call up_alarm_start() we wind up trying to schedule a time that has already passed. Before my changes, this just resulted in the work queue being locked. I suppose it would wakeup after a full overflow cycle.

So my change is trying to basically expire the timer before it even starts.

The issue is that nxsig_timeout can occur before we get into up_block_task. I'm not sure how to protect for this yet.

As I argued above, try using up_udelay(6). There is no loss in performance in this case because the CPU is never relinquished in time by the thread anyway. This corrnercase is certainly not worth a redesign.

The issue is that nxsig_timeout can occur before we get into up_block_task. I'm not sure how to protect for this yet.
@patacongo @davids5 @xiaoxiang781216 Any ideas?

nxsig_timedwait is protected by the critical section, so before up_block_task get called, the interrupt service doesn't have the chance to run.

As I argued above, try using up_udelay(6). There is no loss in performance in this case because the CPU is never relinquished in time by the thread anyway. This corrnercase is certainly not worth a redesign.

No, if @antmerlino's report is correct, we need fix this bug:
1.Either fix the race condition or
2.Enlarge timeout to a safe value at wd level
How do you check every place to ensure all timeout isn't too small to hit the race condition?

@patacongo Can you explain something to me.

In sig_timedwait.c line 258 we enter a critical section, disabling interrupts.

We do not call leave_critical_section(), re-enabling interrupts before we start the watchdog and call up_block_task.

How does this work? Does IRQ context switch somewhere when the task switches to re-enable interrupts?

@patacongo Can you explain something to me.

In sig_timedwait.c line 258 we enter a critical section, disabling interrupts.

We do not call leave_critical_section(), re-enabling interrupts before we start the watchdog and call up_block_task.

How does this work? Does IRQ context switch somewhere when the task switches to re-enable interrupts?

The critical section (and interrupt state) is a per thread property. Disabling interrupts only has effect while the task is running. When it suspends and another task runs, then that task's interrupt state is placed into effect. So (usually) interrupts are re-enabled while the task sleeps.

I works in the context switch because the context switch saves and restores the current state of the ARM interrupt control registers.

Okay, my report was wrong. There was never a race condition at the sig_timedwait level - Greg's answer cleared that up for me. I just didn't protect for the race condition completely at the STM32 level, there was still a very specific timing sequence that would cause it to lock up.

I've now been running stable with the latest branch.

I can't get the nxstyle line 800 error to go away. Otherwise I fixed all other formatting issues.

This is ready for final review and then merging.

… is set to a value less than the current time.

Ouss4 · 2020-07-18T16:32:41Z

I can't get the nxstyle line 800 error to go away. Otherwise I fixed all other formatting issues.
This is ready for final review and then merging.

@antmerlino I fixed that nxstyle error.
@davids5 @xiaoxiang781216 @patacongo If you don't have any other concerns, please merge.

xiaoxiang781216 · 2020-07-18T17:23:01Z

LGTM.

patacongo reviewed Jul 7, 2020

View reviewed changes

antmerlino commented Jul 7, 2020

View reviewed changes

antmerlino force-pushed the stm32f7-tickless branch 3 times, most recently from b8146a8 to c83be2c Compare July 8, 2020 16:12

arch/stm32f7: Fixes bug in tickless driver where the compare register…

cccc1c4

… is set to a value less than the current time.

antmerlino force-pushed the stm32f7-tickless branch from c83be2c to 0018e07 Compare July 10, 2020 14:14

antmerlino changed the title ~~[DRAFT] arch/stm32f7: Fixes bug in tickless driver where the compare register…~~ arch/stm32f7: Fixes bug in tickless driver where the compare register… Jul 10, 2020

Ouss4 force-pushed the stm32f7-tickless branch from 0018e07 to 79e98ea Compare July 18, 2020 16:30

xiaoxiang781216 merged commit f91372c into apache:master Jul 18, 2020

stm32_tickless.c: Fix formatting issues.

79e98ea

btashton added this to To-Add in Release Notes - 10.0.0 Oct 14, 2020

btashton moved this from To-Add to Added in Release Notes - 10.0.0 Oct 19, 2020

antmerlino deleted the stm32f7-tickless branch March 26, 2022 03:27

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

arch/stm32f7: Fixes bug in tickless driver where the compare register… #1380

arch/stm32f7: Fixes bug in tickless driver where the compare register… #1380

antmerlino commented Jul 7, 2020 •

edited

patacongo Jul 7, 2020 •

edited

antmerlino Jul 7, 2020

patacongo Jul 8, 2020

antmerlino Jul 7, 2020

patacongo Jul 7, 2020

antmerlino Jul 7, 2020

patacongo Jul 7, 2020

antmerlino Jul 7, 2020

patacongo Jul 9, 2020 •

edited

xiaoxiang781216 Jul 9, 2020 •

edited

antmerlino Jul 9, 2020

patacongo Jul 9, 2020

antmerlino Jul 10, 2020

Ouss4 commented Jul 18, 2020

xiaoxiang781216 commented Jul 18, 2020

arch/stm32f7: Fixes bug in tickless driver where the compare register… #1380

arch/stm32f7: Fixes bug in tickless driver where the compare register… #1380

Conversation

antmerlino commented Jul 7, 2020 • edited

Summary

Impact

Testing

patacongo Jul 7, 2020 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

patacongo Jul 9, 2020 • edited

Choose a reason for hiding this comment

xiaoxiang781216 Jul 9, 2020 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Ouss4 commented Jul 18, 2020

xiaoxiang781216 commented Jul 18, 2020

antmerlino commented Jul 7, 2020 •

edited

patacongo Jul 7, 2020 •

edited

patacongo Jul 9, 2020 •

edited

xiaoxiang781216 Jul 9, 2020 •

edited