Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update idle loop to reduce calls to suspend #6534

Merged
merged 4 commits into from
Apr 25, 2018
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
73 changes: 60 additions & 13 deletions rtos/TARGET_CORTEX/SysTimer.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -24,6 +24,7 @@
#if DEVICE_LOWPOWERTIMER

#include "hal/lp_ticker_api.h"
#include "mbed_critical.h"
#include "rtx_core_cm.h"
extern "C" {
#include "rtx_lib.h"
Expand All @@ -45,6 +46,8 @@ SysTimer::SysTimer() :
TimerEvent(get_lp_ticker_data()), _start_time(0), _tick(0)
{
_start_time = ticker_read_us(_ticker_data);
_suspend_time_passed = true;
_suspended = false;
}

void SysTimer::setup_irq()
Expand All @@ -61,23 +64,30 @@ void SysTimer::setup_irq()
#endif
}

void SysTimer::schedule_tick(uint32_t delta)
void SysTimer::suspend(uint32_t ticks)
{
insert_absolute(_start_time + (_tick + delta) * 1000000ULL / OS_TICK_FREQ);
}
core_util_critical_section_enter();

void SysTimer::cancel_tick()
{
remove();
schedule_tick(ticks);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@c1728p9 Why is there no remove() call here? I'm seeing issues on my system because of this.

The SysTimer is based on a TimerEvent class, which has a static member for the timer's event data struct. Additionally, the call to suspend may happen before the previously set timer expires (for example when all threads go idle and the idle handler gets activated). At that point, the timer event data is still present in the timer queue. Calling schedule_tick here will insert the same event structure again in the queue. And since the queue is by reference, that'll lead to a recursive timer that breaks the system.

Copy link
Contributor Author

@c1728p9 c1728p9 May 26, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @stevew817 I gave this a try but was unable to reproduce the failure. Do you have example code which reproduces the issue?

From both code analysis and running the code can I cant find a sequence which causes this. The basic flow of the idle loop should unconditionally call remove before each insert as indicated below:

while (true) {
    osKernelSuspend();      // removes event
    os_timer->suspend();    // inserts event
    sleep();
    os_timer->resume();     // removes event
    osKernelResume();       // inserts event
}

With a debugger I can confirm the following sequence is being met by setting a breakpoint in TimerEvent::remove, TimerEvent::insert and TimerEvent::insert_absolute. Below is the callstack from each:

osKernelSuspend()
    svcRtxKernelSuspend()
        KernelBlock()
            OS_Tick_Disable()
                os_timer->cancel_tick()
                    remove()

os_timer->suspend()
    schedule_tick()
        insert_absolute()

os_timer->resume()
    remove()

osKernelResume()
    svcRtxKernelResume()
        KernelUnblock()
            OS_Tick_Enable()
                schedule_tick()
                    insert_absolute();

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ran across this when running cloud-client-example using Thread as a sleepy node in tickless mode. The code would hang, and attaching a debugger confirmed it was stuck in the tick handler IRQ because somehow the tick event had managed to add itself recursively (the next pointer of the tick event data was pointing to the same event data, causing the remove call to not do anything since the head just always ands up being the tick event data).

I fixed the issue by adding a cancel_tick() call before the schedule_ticks call in SysTimer::suspend. I didn't investigate further after solving my issue, so I'm afraid I don't have a reproducer ready.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@stevew817 I'm still unable to reproduce this problem, but I created #7027 just to be safe. Could you provide further details on how to replicate this failure (such as git shas used in cloud-client-example and mbed-os along with config, any local mods you made and the hardware you were using) or an example application?

_suspend_time_passed = false;
_suspended = true;

core_util_critical_section_exit();
}

uint32_t SysTimer::get_tick()
bool SysTimer::suspend_time_passed()
{
return _tick & 0xFFFFFFFF;
return _suspend_time_passed;
}

uint32_t SysTimer::update_tick()
uint32_t SysTimer::resume()
{
core_util_critical_section_enter();

_suspended = false;
_suspend_time_passed = true;
remove();

uint64_t new_tick = (ticker_read_us(_ticker_data) - _start_time) * OS_TICK_FREQ / 1000000;
if (new_tick > _tick) {
// Don't update to the current tick. Instead, update to the
Expand All @@ -88,9 +98,34 @@ uint32_t SysTimer::update_tick()
}
uint32_t elapsed_ticks = new_tick - _tick;
_tick = new_tick;

core_util_critical_section_exit();
return elapsed_ticks;
}

void SysTimer::schedule_tick(uint32_t delta)
{
core_util_critical_section_enter();

insert_absolute(_start_time + (_tick + delta) * 1000000ULL / OS_TICK_FREQ);

core_util_critical_section_exit();
}

void SysTimer::cancel_tick()
{
core_util_critical_section_enter();

remove();

core_util_critical_section_exit();
}

uint32_t SysTimer::get_tick()
{
return _tick & 0xFFFFFFFF;
}

us_timestamp_t SysTimer::get_time()
{
return ticker_read_us(_ticker_data);
Expand All @@ -100,24 +135,36 @@ SysTimer::~SysTimer()
{
}

void SysTimer::set_irq_pending()
void SysTimer::_set_irq_pending()
{
// Protected function synchronized externally

#if (defined(NO_SYSTICK))
NVIC_SetPendingIRQ(mbed_get_m0_tick_irqn());
#else
SCB->ICSR = SCB_ICSR_PENDSTSET_Msk;
#endif
}

void SysTimer::increment_tick()
void SysTimer::_increment_tick()
{
// Protected function synchronized externally

_tick++;
}

void SysTimer::handler()
{
set_irq_pending();
increment_tick();
core_util_critical_section_enter();

if (_suspended) {
_suspend_time_passed = true;
} else {
_set_irq_pending();
_increment_tick();
}

core_util_critical_section_exit();
}

}
Expand Down
45 changes: 32 additions & 13 deletions rtos/TARGET_CORTEX/SysTimer.h
Original file line number Diff line number Diff line change
Expand Up @@ -56,6 +56,34 @@ class SysTimer: private mbed::TimerEvent, private mbed::NonCopyable<SysTimer> {
*/
static void setup_irq();

/**
* Set wakeup time and schedule a wakeup event after delta ticks
*
* After suspend has been called the function suspend_time_passed
* can be used to determine if the suspend time has passed.
*
* @param delta Ticks to remain suspended
*/
void suspend(uint32_t delta);

/**
* Check if the suspend time has passed
*
* @return true if the specified number of ticks has passed otherwise false
*/
bool suspend_time_passed();

/**
* Exit suspend mode and return elapsed ticks
*
* Due to a scheduling issue, the number of ticks returned is decremented
* by 1 so that a handler can be called and update to the current value.
* This allows scheduling restart successfully after the OS is resumed.
*
* @return the number of elapsed ticks minus 1
*/
uint32_t resume();

/**
* Schedule an os tick to fire
*
Expand All @@ -77,17 +105,6 @@ class SysTimer: private mbed::TimerEvent, private mbed::NonCopyable<SysTimer> {
*/
uint32_t get_tick();

/**
* Update the internal tick count
*
* @return The number of ticks incremented
*
* @note Due to a scheduling issue, the number of ticks returned is decremented
* by 1 so that a handler can be called and update to the current value.
* This allows scheduling restart successfully after the OS is resumed.
*/
uint32_t update_tick();

/**
* Get the time
*
Expand All @@ -97,10 +114,12 @@ class SysTimer: private mbed::TimerEvent, private mbed::NonCopyable<SysTimer> {

protected:
virtual void handler();
void increment_tick();
static void set_irq_pending();
void _increment_tick();
static void _set_irq_pending();
us_timestamp_t _start_time;
uint64_t _tick;
bool _suspend_time_passed;
bool _suspended;
};

/**
Expand Down
31 changes: 17 additions & 14 deletions rtos/TARGET_CORTEX/mbed_rtx_idle.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -95,21 +95,24 @@ uint32_t OS_Tick_GetInterval (void) {

static void default_idle_hook(void)
{
uint32_t elapsed_ticks = 0;

core_util_critical_section_enter();
uint32_t ticks_to_sleep = svcRtxKernelSuspend();
if (ticks_to_sleep) {
os_timer->schedule_tick(ticks_to_sleep);

sleep();

os_timer->cancel_tick();
// calculate how long we slept
elapsed_ticks = os_timer->update_tick();
uint32_t ticks_to_sleep = osKernelSuspend();
os_timer->suspend(ticks_to_sleep);

bool event_pending = false;
while (!os_timer->suspend_time_passed() && !event_pending) {

core_util_critical_section_enter();
if (osRtxInfo.kernel.pendSV) {
Copy link
Member

@JonatanAntoni JonatanAntoni Apr 5, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

osRtxInfo.kernel.pendSV is not public API. Perhaps @RobertRostohar can justify if this solution is solid and a small API change would be valuable.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might help with the review, details are here #6273 (comment)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree that there should be better way - pendSV peeking was the best I could think of with current APIs.

The "simple" API change I would suggest is to permit osKernelSuspend/Resume to be callable from critical section (which is what happens via direct svcKernelSuspend calls at the moment, and this replaces), or to have them or equivalents actually return from the suspend call such that the thread is now in a critical section.

I guess in general one also needs to think about priviliged/unprivileged operation, in case a system is using User mode - system suspension, critical sections and the WFI instruction are privileged operations anyway, so this whole "normal thread code does suspend" only works if the idle thread is privileged (or the suspend call makes you privileged?), or you have a WFE-only version (which I believe is less flexible and can reduce power saving opportunities).

Copy link
Contributor Author

@c1728p9 c1728p9 Apr 5, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JonatanAntoni when I was initially working on tickless I proposed an extension to the RTOS to allow this, which you might be able to use:
https://github.com/ARMmbed/mbed-os/pull/5031/files#diff-415fd494349b5414064e0cc0d9f0d596R189

As for this PR, how do you want to proceed? Can we move forward using this internal API until RTX has a public API which supports this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, for the time being you can access this internal bit. But as it is not public API it might change in the future without being warned. So we should align the low power operation, soon. We need to address this privileged/unprivileged issue nonetheless talking about MPU protection.

event_pending = true;
} else {
sleep();
}
core_util_critical_section_exit();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There might be call for an ISB here - you're wanting pending interrupts to trigger between this line and the next one, and an ISB should make sure that happens.

In theory I think you could redisable interrupts before any have a chance to trigger - formally the sequence "enable IRQs / disable IRQs" needs an ISB somewhere in the middle to guarantee they have time to trigger.

Relevant ARM ARM quote:

If execution of a CPS instruction:

  • Increases the execution priority, the CPS execution serializes that change to the instruction stream.
  • Decreases the execution priority, the architecture guarantees only that the new priority is visible to instructions executed after either executing an ISB, or performing an exception entry or exception return.

(In practice they're likely to trigger before you hit the next disable, which means you'd see pendSV set by the next time you hit line 104.)


// Ensure interrupts get a chance to fire
__ISB();
}
svcRtxKernelResume(elapsed_ticks);
core_util_critical_section_exit();
osKernelResume(os_timer->resume());
}

#elif defined(FEATURE_UVISOR)
Expand Down