New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixes for tickless and LPTICKER_DELAY_TICKS #7524

Merged
merged 7 commits into from Aug 21, 2018

Conversation

Projects
None yet
9 participants
@c1728p9
Contributor

c1728p9 commented Jul 16, 2018

Description

This PR fixes problems that occur when both tickless and LPTICKER_DELAY_TICKS are turned on. The two primary problems are:

  1. Tests which use the microsecond ticker directly interfere with the lp ticker wrapper
  2. The Timeout object used by the lp ticker wrapper locks deep sleep at various times causing tests which assert that deep sleep is allowed to fail

The tests susceptible to problem 1 are:

  • mbed_hal/common_tickers
  • mbed_hal/sleep

The tests susceptible to problem 2 are:

  • mbed_drivers/sleep_lock
  • mbed_drivers/timeout
  • mbed_hal/lp_ticker
  • mbed_hal/rtc
  • mbed_hal/sleep
  • mbed_hal/sleep_manager
  • mbed_hal/sleep_manager_racecondition

This PR fixes the problems mentioned above. Unfortunately, there are still additional problems which need to be addressed. Further fixes require:

  • Common ticker handling should be suspended before calling us/lp ticker directly. This is important for the below test, but is not currently causing failures.
    • mbed_hal/common_tickers
    • mbed_hal/common_tickers_freq
    • mbed_hal/lp_ticker
    • mbed_hal/sleep
  • The Serial driver must unlock deep sleep only after all bytes have been sent to prevent bytes from being dropped by entering deep sleep. This requires a HAL extension. I am seeing test failures due to this on the NUCLEO_F401RE when tickless is enabled.

Pull request type

[x] Fix
[ ] Refactor
[ ] New target
[ ] Feature
[ ] Breaking change

@0xc0170 0xc0170 requested a review from ARMmbed/mbed-os-test Jul 17, 2018

@0xc0170 0xc0170 requested review from mprse and ARMmbed/mbed-os-hal and removed request for ARMmbed/mbed-os-test Jul 17, 2018

@jeromecoutant

This comment has been minimized.

Contributor

jeromecoutant commented Jul 17, 2018

Hi Russ

I have started a full non regression for each STM32 family.
I will provide this status later.

In parallel, I quickly checked TICKLESS mode with F401. I agree status is much better.

For the moment, I only added some DeepSleepLock in tests-mbed_drivers-lp_ticker.

@mprse

Potential inconsistency between result returned by sleep_manager_can_deep_sleep() and action performed by sleep_manager_sleep_auto().

@@ -196,7 +203,11 @@ void sleep_manager_sleep_auto(void)
#ifdef MBED_DEBUG
hal_sleep();
#else
if (sleep_manager_can_deep_sleep()) {
if (sleep_manager_can_deep_sleep()
#if DEVICE_LPTICKER && (LPTICKER_DELAY_TICKS > 0)

This comment has been minimized.

@mprse

mprse Jul 17, 2018

Member

If lp ticker's Timeout object is in use (i.e. lp_ticker_get_timeout_pending() returns true) and deep-sleep is not locked by other features (i.e. lock_count==1), then:

  • sleep_manager_can_deep_sleep() returns true - indicates that we can put board into deep-sleep mode.
  • sleep_manager_sleep_auto() puts the board into the sleep mode (not deep-sleep).

It looks like a design issue since we can not relay on sleep_manager_can_deep_sleep() function.
From the user perspective it can be very misleading. From the testing perspective sleep_manager_can_deep_sleep() function is used in tests to check if current environment state allows to go into deep-sleep mode which is to be tested. So in some cases the test may test sleep mode instead of required deep-sleep mode.

This comment has been minimized.

@0xc0170

0xc0170 Jul 17, 2018

Member

It looks like a design issue since we can not relay on sleep_manager_can_deep_sleep() function.

where this comes from?

This comment has been minimized.

@fkjagodzinski

fkjagodzinski Jul 17, 2018

Member

I agree with @mprse, sleep_manager_can_deep_sleep() shouldn't be updated this way. I do understand the purpose, but think this change should be test specific, not general.

This comment has been minimized.

@mprse

mprse Jul 17, 2018

Member

Please take a look at these changes c85856b.

This comment has been minimized.

@c1728p9

c1728p9 Jul 17, 2018

Contributor

Some possible alternatives I can think of are:

  • Modify tests to suspend the OS before running the tests - this may be intrusive as you can no longer use most OS primitives
  • Assert in tests using a different function - something like sleep_manager_can_deep_sleep_test()
  • Assert something different in the tests, such as time actually spent in deep sleep

Do you have any thoughts or preferences on this @mprse and @fkjagodzinski?

This comment has been minimized.

@mprse

mprse Jul 18, 2018

Member

In my opinion first bullet is ok, but only for low level tests (HAL) where target specific drivers are tested, where the OS can be omitted. For driver layer tests I think that OS should be working normally and tests should take into consideration possible influence of the OS like systick handling, task scheduling, ticker interrupts, etc. So I suggest to use second bullet for upper layer tests.

This comment has been minimized.

@fkjagodzinski

fkjagodzinski Jul 18, 2018

Member

Since one of the problems is:

The Timeout object used by the lp ticker wrapper locks deep sleep at various times causing tests which assert that deep sleep is allowed to fail

we could modify the tests to wait for the deep sleep to be unlocked before calling sleep(). But even this would require using critical section, so would also introduce more complexity to tests.

Adding sleep_manager_can_deep_sleep_test() seems straight forward, but such change would mean we accept that tests are occasionally run in an environment different from what they were supposed to. Simply put, the test result can be meaningless sometimes.

In my opinion the most honest solution would be adding

#if DEVICE_LPTICKER && (LPTICKER_DELAY_TICKS > 0)
#error [NOT_SUPPORTED] Test not supported for this target
#endif

or executing affected test cases conditionally based on LPTICKER_DELAY_TICKS > 0.

return deep_sleep_lock == 0 ? true : false;
uint32_t lock_count = deep_sleep_lock;
#if DEVICE_LPTICKER && (LPTICKER_DELAY_TICKS > 0)
if (lp_ticker_get_timeout_pending() && (lock_count > 0)) {

This comment has been minimized.

@mprse

mprse Jul 17, 2018

Member

(lock_count > 0) - is this condition needed?
In case when lp ticker's Timeout object is in use deep-sleep is locked (i.e. lock_count > 0).

This comment has been minimized.

@c1728p9

c1728p9 Jul 17, 2018

Contributor

The check is to prevent underflow. If this function is wrapped in a critical section then the check wouldn't be needed, as lock_count would always be at least 1 if lp_ticker_get_timeout_pending() returned true.

@jeromecoutant

This comment has been minimized.

Contributor

jeromecoutant commented Jul 17, 2018

More detailed status with NUCLEO_F401RE:

I applied few patches:
c4f0eaf

And I still got issues with:

  • tests-mbed_drivers-lp_timeout
    [1531832056.30][CONN][RXD] >>> Running case #10: 'Zero delay (attach_us)'...
    [1531832056.37][CONN][RXD] :213::FAIL: Expected 1 Was 0

  • tests-mbedmicro-rtos-mbed-systimer

[1531833600.45][CONN][RXD] >>> Running case #5: 'Schedule zero ticks'...
[1531833600.53][CONN][RXD] :229::FAIL: Expected 1 Was 0

@mprse

This comment has been minimized.

Member

mprse commented Jul 18, 2018

Regarding lp ticker wrapper I have doubts if this is a correct approach. According to the description it is designed for cases when:

  • Interrupt may not fire if set earlier than LPTICKER_DELAY_TICKS low power clock cycles.
  • Setting the interrupt back-to-back will block.

So the first bullet is the case when we have some hardware limitations like on NRF51_DK board:

If the COUNTER is N, writing N or N+1 to a CC register may not trigger a COMPARE event.

In this case the nearest interrupt which can be set is current tick + 2. This can be handled using lp ticker wrapper and LPTICKER_DELAY_TICKS = 2, but currently is handled by ensuring that value written to CC is greater than or equal to current tick + 2 in the Nordic NRF5x lp ticker driver:

const uint32_t now = nrf_rtc_counter_get(COMMON_RTC_INSTANCE);
if (now == ticks_count ||
RTC_WRAP(now + 1) == ticks_count) {
ticks_count = RTC_WRAP(ticks_count + 2);
}
nrf_rtc_cc_set(COMMON_RTC_INSTANCE, cc_channel, ticks_count);

Personally I prefer current option since usage of lp ticker wrapper which utilizes Timeout object in the HAL layer as we can see has too many side effects. Additionally since this is hardware specific limitation it should be handled in the target layer.

Regarding second bullet I'm not sure if I correctly understand the problem. Can you provide some use case for this?

@c1728p9

This comment has been minimized.

Contributor

c1728p9 commented Jul 18, 2018

Hi @mprse, the primary problem is the second point - some low power ticker hardware cannot be written to twice within a certain number of low power clock cycles (LPTICKER_DELAY_TICKS ticks).

  1. The OS scheduler enters tickless mode and sets the low power ticker to wake 500ms in the future
  2. An interrupt fires right after and wakes the OS early
  3. The OS uses the low power ticker to schedule the next tick (1ms in the future)

If the next tick time is written to the low power ticker immediately the new match time will be dropped, as the hardware is still trying to set the first match value. Options to work around this are:

  1. Block in lp_ticker_set_interrupt until the new value is written
  2. Don't allow low power tickers which require a delay
  3. Use a Timeout object to write the second match value after LPTICKER_DELAY_TICKS ticks has passed

Option 1. is what was done prior to the ticker HAL specification. The problem with this is interrupt latency is impacted. An example in practice is the NUCLEO_F401RE. It has a low power ticker clock speed of 8KHz and requires 3 clock cycles between writes. This means that back-to-back writes will block for 366us in a critical section.

Option 2. was considered but it was found that a large number of targets have this low power ticker hardware behavior. Vendors with effected targets include ST, Nuvoton and OnSemi to name a few. Because this would rule out tickless for many targets this option was less than ideal as well. Note - the large number of vendors/targets impacted was also one of the main reason this was done in a common layer.

Option 3. both preserves interrupt latency and allows targets with this low power ticker behavior still implement the low power ticker API. Compared to Option 1, power savings is better since the CPU can go into sleep mode rather than busy waiting. Unfortunately it does add complexity and interferes with deep sleep tests.

@mprse do you know of any alternatives which could simplify this? These were the only options that I could think of, but if there is a better way to do it would be great.

@mprse

This comment has been minimized.

Member

mprse commented Jul 19, 2018

@c1728p9 Thanks Russ for detailed explanation of the problem. It looks like option number 3 despite its complexity is the best solution for now.

We have disused with @fkjagodzinski possible solutions for the deep-sleep/sleep issue. Please consider the following option:

  • sleep_manager_can_deep_sleep(): returns true in case when only Timeout used by the lp ticker wrapper stops us from going into deep-sleep mode (as it is done now).
  • sleep_manager_sleep_auto(): when we can go into deep-sleep (sleep_manager_can_deep_sleep() returns true), but Timeout object of the lp ticker wrapper is in use (lp_ticker_get_timeout_pending() returns true), then we are waiting in busy loop outside the critical section until timeout is not pending (lp_ticker_get_timeout_pending() returns false) and then go into deep-sleep.

Since sleep_manager_sleep_auto() function shell automatically decide which mode should be selected its a good place to handle this specific case. Additionally it would solve the problem with the sleep tests.
I'm not sure about possible side effects of this solutions, but still if we were just about to enter into deep-sleep mode we can waste ~366 us (there are currently no other things to do).
We will lost time which we could spend in sleep mode, but for the rest of planed time we will be in deep-sleep which saves more power.

mprse referenced this pull request in fkjagodzinski/mbed-os Jul 23, 2018

Tests: SleepManager: extend test suite
New test cases:
* "Deep sleep lock/unlock"
* "Deep sleep unbalanced unlock"
* "Deep sleep locked USHRT_MAX times"
* "Deep sleep locked more than USHRT_MAX times"
* "sleep_auto calls sleep/deep sleep based on lock"

@c1728p9 c1728p9 force-pushed the c1728p9:tickless_fix branch from 0f80394 to 26b73fd Jul 26, 2018

@c1728p9

This comment has been minimized.

Contributor

c1728p9 commented Jul 26, 2018

@mprse @fkjagodzinski I updated this PR to address the feedback you gave me. In specific I added and updated the test to use the function sleep_manager_can_deep_sleep_test_check() which checks to see if deep sleep is allowed within 2ms. This way if deep sleep is blocked temporarily for any reason - low power ticker wrapper, buffered uart bytes, or some other unforeseen reason then tests will not start failing again. Let me know what you think.

@mprse

mprse approved these changes Jul 26, 2018

@c1728p9 Looks good to me.

What bothers me is that test for sleep_manager_can_deep_sleep_test_check() is missing (even though this function is intended for use in testing). I think it should be added or task for this should be created.
I suggest also to add information in the description of sleep_manager_can_deep_sleep_test_check() that timeout is set to 2 ms.

@fkjagodzinski

This comment has been minimized.

Member

fkjagodzinski commented Jul 30, 2018

@mprse @c1728p9 I've got an open PR with sleep manager tests update (#7582), I can add a few test cases for that new function.

@cmonr

This comment has been minimized.

Contributor

cmonr commented Jul 31, 2018

/morph build

/**
* Wrapper around lp_ticker_set_interrupt to prevent blocking
*
* Problems this function is solving:

This comment has been minimized.

@cmonr

cmonr Jul 31, 2018

Contributor

👍

@cmonr

cmonr approved these changes Jul 31, 2018

@mbed-ci

This comment has been minimized.

mbed-ci commented Jul 31, 2018

Build : SUCCESS

Build number : 2710
Build artifacts/logs : http://mbed-os.s3-website-eu-west-1.amazonaws.com/?prefix=builds/7524/

Triggering tests

/morph test
/morph uvisor-test
/morph export-build
/morph mbed2-build

@mbed-ci

This comment has been minimized.

@mbed-ci

This comment has been minimized.

Update deep sleep lock check in tests
When the define LPTICKER_DELAY_TICKS is set deep sleep can be randomly
disallowed when using the low power ticker. This is because a Timer
object, which locks deep sleep, is used to protect from back-to-back
writes to lp tickers which can't support that. This causes tests which
assert that deep sleep is allowed to intermittently fail.

To fix this intermittent failure this patch adds the function
sleep_manager_can_deep_sleep_test_check() which checks if deep sleep
is allowed over a duration. It updates all the tests to use
sleep_manager_can_deep_sleep_test_check() rather
than sleep_manager_can_deep_sleep() so the tests work even if deep
sleep is spuriously blocked.

@c1728p9 c1728p9 force-pushed the c1728p9:tickless_fix branch from aeebd80 to 959e687 Aug 17, 2018

@cmonr

This comment has been minimized.

Contributor

cmonr commented Aug 17, 2018

/morph build

@cmonr cmonr added needs: CI and removed needs: work labels Aug 17, 2018

@mbed-ci

This comment has been minimized.

mbed-ci commented Aug 17, 2018

@cmonr cmonr added needs: work and removed needs: CI labels Aug 17, 2018

c1728p9 added some commits Jul 12, 2018

Fix tests to work with LPTICKER_DELAY_TICKS
Fix the HAL common_tickers and sleep tests so they work correctly when
the define LPTICKER_DELAY_TICKS is set.
Fix rollover handling in ticker frequency test
To handle timer rollovers the test tests-mbed_hal-common_tickers_freq
calls intf->set_interrupt(0). For this to work correctly the ticker
implementation must fire an interrupt on every rollover event though
intf->set_interrupt(0) was called only once. Whether an interrupt will
fire only once or multiple times is undefined behavior which
cannot be relied upon.

To avoid this undefined behavior this patch continually schedules an
interrupt and performs overflow detection on every read. This also
removes the possibility of race conditions due to overflowCounter
incrementing at the wrong time.
Speed optimization for LowPowerTickerWrapper
Only reschedule the Timeout object in the low power ticker wrapper
if it is not already pending.

@c1728p9 c1728p9 force-pushed the c1728p9:tickless_fix branch from 959e687 to dc2e2c0 Aug 17, 2018

@cmonr

This comment has been minimized.

Contributor

cmonr commented Aug 17, 2018

/morph build

@mbed-ci

This comment has been minimized.

mbed-ci commented Aug 18, 2018

Build : SUCCESS

Build number : 2835
Build artifacts/logs : http://mbed-os.s3-website-eu-west-1.amazonaws.com/?prefix=builds/7524/

Triggering tests

/morph test
/morph uvisor-test
/morph export-build
/morph mbed2-build

@mbed-ci

This comment has been minimized.

@mbed-ci

This comment has been minimized.

@cmonr cmonr added needs: CI and removed needs: work labels Aug 20, 2018

@cmonr

This comment has been minimized.

Contributor

cmonr commented Aug 20, 2018

/morph test

@NirSonnenschein

This comment has been minimized.

Contributor

NirSonnenschein commented Aug 21, 2018

/morph uvisor-test

@mbed-ci

This comment has been minimized.

@cmonr cmonr merged commit e02466a into ARMmbed:master Aug 21, 2018

15 checks passed

AWS-CI uVisor Build & Test Success
Details
ci-morph-build build completed
Details
ci-morph-exporter build completed
Details
ci-morph-mbed2-build build completed
Details
ci-morph-test test completed , RTOS ROM(+0.1%) RAM(+0.0%)
Details
continuous-integration/jenkins/pr-head This commit looks good
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
jenkins-ci/cloud_client_smoke_test Test job was successful
Details
travis-ci/astyle Passed, 584 files
Details
travis-ci/docs Local docs testing has passed
Details
travis-ci/events Passed, runtime is 10009 cycles (+45 cycles)
Details
travis-ci/gitattributestest Local gitattributestest testing has passed
Details
travis-ci/licence_check Local licence_check testing has passed
Details
travis-ci/littlefs Passed, code size is 8372B (+0.00%)
Details
travis-ci/tools-py2.7 Local tools-py2.7 testing has passed
Details

pan- pushed a commit to pan-/mbed that referenced this pull request Aug 22, 2018

Merge pull request ARMmbed#7524 from c1728p9/tickless_fix
Fixes for tickless and LPTICKER_DELAY_TICKS
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment