Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DISCO-L475VG-IOT01A lp_timeout test fails CI on IAR #11545

Closed
Tharazi97 opened this issue Sep 23, 2019 · 27 comments

Comments

@Tharazi97
Copy link
Contributor

@Tharazi97 Tharazi97 commented Sep 23, 2019

Description

Nightly CI test fails on DISCO-L475VG-IOT01A on lp_timeout test only when compiled on IAR.

It started to fail after this commit: 9858b16

[1569184789.85][CONN][RXD] >>> Running case #1: 'Callback called once (attach)'...
[1569184789.85][CONN][INF] found KV pair in stream: {{__testcase_name;1 s delay during deepsleep (attach_us)}}, queued...
mbedgt: :85::FAIL: Expected TRUE Was FALSE
[1569184789.95][CONN][RXD] :85::FAIL: Expected TRUE Was FALSE
[1569184789.95][CONN][INF] found KV pair in stream: {{__testcase_start;Callback called once (attach)}}, queued...
[1569184790.06][CONN][INF] found KV pair in stream: {{__testcase_finish;Callback called once (attach);0;1}}, queued...
[1569184790.16][CONN][RXD] >>> 'Callback called once (attach)': 0 passed, 1 failed with reason 'Assertion Failed'

@maciejbocianski @jamesbeyond @LMESTM

Issue request type

[ ] Question
[ ] Enhancement
[x] Bug
@0xc0170

This comment has been minimized.

Copy link
Member

@0xc0170 0xc0170 commented Sep 23, 2019

@LMESTM

This comment has been minimized.

Copy link
Contributor

@LMESTM LMESTM commented Sep 23, 2019

@Tharazi97 since the commit 9858b16 the target will automatically enter Deep sleep when there is nothing to do. I don't think that the lp_timeout test cases takes this into account.

When deep sleep is entered, it will induce latencies of several ms which are not considered in the test cases. By adding some more margin to the tests cases, I can run the tests okay. An alternative way forward could be that deep sleep needs to be disabled.

Snapshot of the quick diffs I made to have the test pass OK:

$ git diff TESTS/mbed_drivers/timeout/timeout_tests.h
diff --git a/TESTS/mbed_drivers/timeout/timeout_tests.h b/TESTS/mbed_drivers/timeout/timeout_tests.h
index 534901c1fb..e676b33fb4 100644
--- a/TESTS/mbed_drivers/timeout/timeout_tests.h
+++ b/TESTS/mbed_drivers/timeout/timeout_tests.h
@@ -81,10 +81,10 @@ void test_single_call(void)
     bool acquired = sem.try_acquire();
     TEST_ASSERT_FALSE(acquired);

-    acquired = sem.try_acquire_for(TEST_DELAY_MS + 2);
+    acquired = sem.try_acquire_for(TEST_DELAY_MS + 5);
     TEST_ASSERT_TRUE(acquired);

-    acquired = sem.try_acquire_for(TEST_DELAY_MS + 2);
+    acquired = sem.try_acquire_for(TEST_DELAY_MS + 5);
     TEST_ASSERT_FALSE(acquired);

     timeout.detach();
@@ -147,7 +147,7 @@ void test_override(void)
     TEST_ASSERT_FALSE(acquired);
     timeout.attach_callback(mbed::callback(sem_callback, &sem2), 2.0f * TEST_DELAY_US);

-    acquired = sem2.try_acquire_for(2 * TEST_DELAY_MS + 2);
+    acquired = sem2.try_acquire_for(2 * TEST_DELAY_MS + 6);
     TEST_ASSERT_TRUE(acquired);
     acquired = sem1.try_acquire();
     TEST_ASSERT_FALSE(acquired);

Team in charge of tests could propose the best way forward ...

@Tharazi97

This comment has been minimized.

Copy link
Contributor Author

@Tharazi97 Tharazi97 commented Sep 23, 2019

3 ms delay is quite a big change in case of 10 ms delay. I think disabling deep sleep is a good way to go.

@LMESTM

This comment has been minimized.

Copy link
Contributor

@LMESTM LMESTM commented Sep 23, 2019

@Tharazi97
The deep sleep latency is currently estimated to 3ms.
The System Timer takes this into account and program wake-up time in advance to ensure the precision of the system.

Now I don't think that the Low Power ticker or timeout drivers have implemented a similar safety mechanism, which mean that they will not cope with the deep sleep latency and risk of being unprecise ... but maybe I'm worng here, maybe @kjbracey-arm can correct me if I'm wrong ....

@ciarmcom

This comment has been minimized.

Copy link
Member

@ciarmcom ciarmcom commented Sep 23, 2019

@ciarmcom ciarmcom added the mirrored label Sep 23, 2019
@0xc0170

This comment has been minimized.

Copy link
Member

@0xc0170 0xc0170 commented Sep 23, 2019

@LMESTM what's the deep sleep wake up time for this target (I recall some targets have it defined as 1-3ms, we set this to 10ms to be sufficient for a target to wake up and app should not expect anything lower than that).

We are testing lp ticker in this test case, should it test both - deep sleep and also shallow sleep? If deep sleep involved, timeout (default is 10 + 2 = 12ms - might not be enough for some). I can't find the time reference in the power management header file at the moment (https://os.mbed.com/docs/mbed-os/v5.13/apis/power-management-sleep.html - does not provide this detail).

@LMESTM

This comment has been minimized.

Copy link
Contributor

@LMESTM LMESTM commented Sep 23, 2019

@0xc0170 I think that the exit time is about 3ms as defined in "deep-sleep-latency", that was the value last time I made measurements. Neverthless I need to increase the test delay to TEST_DELAY_MS + 5 be able to PASS the test. TEST_DELAY_MS + 4 wouldn't work.

I don't know how the semaphore timeouts are managed ...

@kjbracey-arm

This comment has been minimized.

Copy link
Contributor

@kjbracey-arm kjbracey-arm commented Sep 24, 2019

The HAL specification for sleep permits wakeup time from deep sleep to be up to 10ms.

So if testing the lpticker API, if the system is being allowed to enter deep sleep, then extra 10ms delays on the timer wakeup are to be expected.

If you're only intending to test the lpticker itself, and its IRQ generation accuracy, and you do not want the potential added delay of deep sleep wake interfering with that, then the test could lock deep sleep. That seems the thing to do here.

A broader test might try checking whether the delay exceeds the specified target.deep_sleep_latency. Although that would be also revealed by a test of rtos::ThisThread::sleep_for or similar - that should enter deep sleep but compensate for the latency so that you get woken at the correct time. (If the target has specified a latency - default is 0, meaning no compensation).

@kjbracey-arm

This comment has been minimized.

Copy link
Contributor

@kjbracey-arm kjbracey-arm commented Sep 24, 2019

I don't know how the semaphore timeouts are managed ...

I can see in that test_single_call there is going to be some juggling of the HAL timer programming. The manual LowPowerTimeout is going to be setting a timer for start + 10ms, and the semaphore wait is going to be wanting start + 12ms - latency.

If latency > 2, that means a couple of back-to-back programmings happening on the way into the sleep for the semaphore wait.

Still, when the timers go off, the semaphore acquire should not return until either the sem_callback has been run (meaning the HAL timer indicates time > start + 10ms) or the RTOS thinks time is up (meaning the HAL timer indicates time > start + 12ms).

It's a struggle to see how you could get !acquired on the semaphore read. If the OS thinks the 12ms is up, surely the sem_callback must have run?

I've walked through the code, and think I can see a failure mode.

Consider the sequence:

  • We wake up at start + 13ms, so sem_callback and RTX timeout seem simultaneous
  • sem_callback runs, releases semaphore
  • Semaphore count incremented (isrRtxSemaphoreRelease)
  • Post-processing entry added to ISR queue
  • Kernel resumes from sleep (both because time elapsed and entry on ISR queue)
  • Kernel updates ticks after sleep (svcKernelResume)
  • Kernel decides semaphore acquire has timed out (osRtxThreadDelayTick)
  • Kernel goes to return timeout error to thread
  • Post-processing for ISR happens that would wake thread with "OK" - too late

So there's an effective race if due to wake latency issues, we see the 10ms IRQ and the 12ms wake simultaneously; the wake makes the semaphore return timeout, despite there being a semaphore available - the semaphore "release" is not fully processed until the next time we return to a thread, and at that point we're already returning "timeout".

I'm thinking that could potentially be called an RTX bug, although I'm not sure exactly how it might be resolved.

However, why does that situation arise here? What's the declared latency? How long might we be getting stuck inside timer reprogramming?

@kjbracey-arm

This comment has been minimized.

Copy link
Contributor

@kjbracey-arm kjbracey-arm commented Sep 24, 2019

I guess that failure I describe above happens reliably if declared latency is 0, so no RTOS compensation, but the hardware actually takes >2ms to wake, so we program wake for "start + 10" (for the LowPowerTicker), and we only actually wake at "start + 13".

@LMESTM

This comment has been minimized.

Copy link
Contributor

@LMESTM LMESTM commented Sep 24, 2019

So if testing the lpticker API, if the system is being allowed to enter deep sleep, then extra 10ms delays on the timer wakeup are to be expected.

That might be argued don't you think ? lpticker driver could be aware of deep-sleep latency and compensate for it, it could even lock deep-sleep in case the next ticker in the pipe is too close ... ?
or maybe not ... I'm not sure whether this makes sense or not - this is really an open question.

@kjbracey-arm

This comment has been minimized.

Copy link
Contributor

@kjbracey-arm kjbracey-arm commented Sep 24, 2019

Yes, you could certainly put a pile of intelligence into the HAL driver, but I think it would ultimately make the middle layer's job more confusing. Even before considering variance in HAL implementations.

I would be open to maybe moving latency compensation down one layer into the mbed_ticker_api.c layer, which would make LowPowerTicker get it.

In this test though, there's apparently something weird happening with the semaphores and RTX, together with an apparent failure of the RTOS latency compensation - if that was working we should be out of deep sleep before the 12ms is up, so that the race I think I've deduced shouldn't occur.

@LMESTM

This comment has been minimized.

Copy link
Contributor

@LMESTM LMESTM commented Sep 24, 2019

Yes, you could certainly put a pile of intelligence into the HAL driver, but I think it would ultimately make the middle layer's job more confusing. Even before considering variance in HAL implementations.

I thought the variance would be managed through deep-sleep-latency

I would be open to maybe moving latency compensation down one layer into the mbed_ticker_api.c layer, which would make LowPowerTicker get it.

That's I guess the idea here ... If we're not going that way, the impact of deep sleep latency needs to be described in the LP Timer/Ticker/timeout APIs as I think it's not enough to document the power management / sleep part. As of now, users may program a LP timeout to fire after 150µs and will end up with a 3ms delay or so ...

In this test though, there's apparently something weird happening with the semaphores and RTX, together with an apparent failure of the RTOS latency compensation - if that was working we should be out of deep sleep before the 12ms is up, so that the race I think I've deduced shouldn't occur.

I agree there seems to be something to figure out. I fully trust your analysis as I'm not familiar with RTX :-)

@kjbracey-arm

This comment has been minimized.

Copy link
Contributor

@kjbracey-arm kjbracey-arm commented Sep 24, 2019

the impact of deep sleep latency needs to be described in the LP Timer/Ticker/timeout APIs as I think it's not enough to document the power management / sleep part. As of now, users may program a LP timeout to fire after 150µs and will end up with a 3ms delay or so ...

I agree - the 3ms delay actually impacts many IRQs you could register with attach() (the ones that don't implicitly lock deep sleep) but the discrepancy is kind of more obvious with LowPowerTimeout.

I agree there seems to be something to figure out. I fully trust your analysis as I'm not familiar with RTX

What I'd be interested in from your end is why is the latency compensation not sufficient here? If it was, I think we should handle those semaphores fine. Is the JSON underestimating? Or is there more to it?

Aside from that, you could experiment - try making the RTX modification I'm proposing in the linked issue. Just add a call to osRtxPendSV_Handler() to the top of svcRtxKernelResume, just before incrementing the tick count.

@LMESTM

This comment has been minimized.

Copy link
Contributor

@LMESTM LMESTM commented Sep 25, 2019

@kjbracey-arm ok I take the point, I need to put in place a setup with debug / measurements ... but can't do it this week unfortunately.

@adbridge adbridge added the devices: st label Oct 8, 2019
@TuomoHautamaki

This comment has been minimized.

Copy link

@TuomoHautamaki TuomoHautamaki commented Oct 23, 2019

@LMESTM any progress on this?

@LMESTM

This comment has been minimized.

Copy link
Contributor

@LMESTM LMESTM commented Oct 23, 2019

@TuomoHautamaki @kjbracey-arm I'm affraid I could not make any measurement yet (and can't in coming days). I agree this can be helpful but I don't think it would not bring the solution to this issue.
I understand that @kjbracey-arm pointed to possible issues around RTX or semaphores and proposed a few steps to test in #11545 (comment), could someone from mbed-os team look at that ?

@LMESTM

This comment has been minimized.

Copy link
Contributor

@LMESTM LMESTM commented Oct 28, 2019

@TuomoHautamaki I just tried reproducing the issue again on my current working tree, master branch from Oct 23, and the test doesn't fail - are you still seeing the issue ?

@TuomoHautamaki

This comment has been minimized.

Copy link

@TuomoHautamaki TuomoHautamaki commented Oct 29, 2019

@LMESTM I can see it's still failing but only on IAR

@LMESTM

This comment has been minimized.

Copy link
Contributor

@LMESTM LMESTM commented Oct 29, 2019

@TuomoHautamaki my mistake I ran timeout tests instead of lp_timeout tests ... I can still reproduce as well

@LMESTM

This comment has been minimized.

Copy link
Contributor

@LMESTM LMESTM commented Oct 29, 2019

@kjbracey-arm @TuomoHautamaki I found some time to make measurements and fount out that the total wake-up latency is actually around 3,7ms, this time is used on this board for restoring the complete clock tree: LSE, MSI clocks and PLLs. This may be possible to optimize, but for sake of simplicity and maintenance we go through all the clock settings scheme again.

I can increase the deep-sleep latency for this board to 4ms, which actually makes the tests PASS as far as I've seen. That doesn't explain everything, the test makes use of 2ms DELTA so it should be okay with the 1ms extra latency, but this is not the case.

Nevertheless the low power ticker fires 2ms later than expected and the below point is still valid :

the impact of deep sleep latency needs to be described in the LP Timer/Ticker/timeout APIs as I think it's not enough to document the power management / sleep part. As of now, users may program a LP timeout to fire after 150µs and will end up with a 3ms delay or so ...

I agree - the 3ms delay actually impacts many IRQs you could register with attach() (the ones that don't implicitly lock deep sleep) but the discrepancy is kind of more obvious with LowPowerTimeout.

@kjbracey-arm

This comment has been minimized.

Copy link
Contributor

@kjbracey-arm kjbracey-arm commented Oct 29, 2019

If that 3.7ms is a hardware measurement, maybe it doesn't include extra overhead for reading the timer, etc. (Embedded in SysTimer::update_and_get_tick()).

It might be worth (temporarily) adding a test assert in do_timed_sleep_relative that sleep_finish - sleep_start <= wake_delay. I think that if that ever triggered, it would be because the latency compensation is insufficient.

@LMESTM

This comment has been minimized.

Copy link
Contributor

@LMESTM LMESTM commented Oct 29, 2019

If that 3.7ms is a hardware measurement, maybe it doesn't include extra overhead for reading the timer, etc. (Embedded in SysTimer::update_and_get_tick()).

The 3,7ms is the time it takes to mostly restore clocks, so this is the SW sequence of the deep sleep exit, from the HW interrupt (end of WFI) to the point where the interrupt handler is called. I'm not sure about your next statement.

Anyway, I still think that something is wrong with the LP timeout and the test.

The test is supposed to verify that the lp_timeout handler when programmed to fire after 10ms will be called within 10 to 12ms max. But in case of deep sleep, the handler will only be called after 13ms to 14ms as the timeout driver does not consider the deep sleep latency. And the test fails to detect it.

The only reason why the test is passed is that the system is woken-up early by the semaphore itself (12ms max - deep sleep latency) before the 10ms has expired. But the semaphore is supposed to be used as a test environment, not as the reason for waking-up the system.

@0xc0170

This comment has been minimized.

Copy link
Member

@0xc0170 0xc0170 commented Oct 30, 2019

Closed via #11767

@0xc0170 0xc0170 closed this Oct 30, 2019
@kjbracey-arm

This comment has been minimized.

Copy link
Contributor

@kjbracey-arm kjbracey-arm commented Oct 30, 2019

You're right, the test is flawed, because it doesn't at all consider latency on the low power timer IRQ.

My description above was that when considering application timing correctness for things scheduled by the RTOS (eg ThisThread::sleep_until(XXX)), we need to also incorporate the time taken to get back to the application, not just the time taken to get out to interrupt handler after wake.

So the extra work versus your measurement is:

  • execute timer interrupt handler (SysTimer::handler) and return
  • note that it's time to wake (SysTimer::wake_time_passed, which looks at bool set by interrupt handler),
  • read actual current time from HW
  • osKernelResume(elapsed time)
  • switch to task

That latency number is currently used for RTOS scheduling, so notionally should incorporate those steps.

I note that the slowest step there is "read actual current time from HW". Conceptually, we could skip it if we see that wake_time_passed is true - assume that we did wake in time, so the time must be exactly the scheduled wake time. (As opposed to an early wake from another interrupt). Being "optimistic" about wake delays and assuming we're never late might make part of the problem go away?

@LMESTM

This comment has been minimized.

Copy link
Contributor

@LMESTM LMESTM commented Oct 30, 2019

@kjbracey-arm

Most of the wake-up tme is spent wiaiting for HW clocks (in particular PLLs) to be ready and stable
I've measured a few timings on NUCLEO_L073RZ which a pretty slow target, here they are:

  • wake-up interrupts T = 0
  • exit of HAL_DeepSleep, HW is ready T1 = T0 + 3,7ms
  • enter user callback / interrupt handler (in this test the semaphore callback): T2 = T1 + 26µs
  • Back to main thread (incl. all steps you've listed + semaphore acquire handling) T3 = T2 + 274µs

So that's up to extra 0,3ms on a slow target for the full handling back to the main thread.
But the callback that was attached to fire after a given time seems only to be delayed by an extra 0.226µs in this case compared to the "HW latency".

How shall we proceed with the test flaw and the fact that lp_ticker and related drivers do not take into account the deep-sleep-latency ?

@LMESTM

This comment has been minimized.

Copy link
Contributor

@LMESTM LMESTM commented Oct 30, 2019

FYI - what is looks like on NUCLEO_L073RZ when there is no semaphore to wake-up the target. (To do so, I extended the semaphore timeout to 20ms in the test ...)

image

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
7 participants
You can’t perform that action at this time.