
ztimer is incompatible with real-time requirements #18883

Open
benpicco opened this issue Nov 11, 2022 · 12 comments
Labels
Type: bug The issue reports a bug / The PR fixes a bug (including spelling errors)

Comments

@benpicco
Contributor

benpicco commented Nov 11, 2022

Description

ztimer_set() and ztimer_remove() disable all interrupts for the run-time of the function.
That run-time is O(n), where n is the number of timers currently set on the clock.
Edit: the actual hardware timer operations (which are the expensive part here) are still only performed once per call.
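
In pseudo-C, the pattern looks roughly like this (a simplified sketch for illustration, not the actual ztimer source):

```c
#include "irq.h"
#include "ztimer.h"

static void sketch_set(ztimer_clock_t *clock, ztimer_t *timer, uint32_t offset)
{
    unsigned state = irq_disable();     /* ALL interrupts masked from here ... */

    /* O(n) walk over the clock's sorted list of relative offsets
     * (n = number of timers currently set on this clock) */
    ztimer_base_t *pos = &clock->list;
    while (pos->next && pos->next->offset <= offset) {
        offset -= pos->next->offset;
        pos = pos->next;
    }
    timer->base.offset = offset;
    timer->base.next = pos->next;
    if (timer->base.next) {
        timer->base.next->offset -= offset;
    }
    pos->next = &timer->base;

    /* ... plus, at most once per call, the potentially slow reprogramming of
     * the underlying periph_timer / periph_rtt compare register ... */

    irq_restore(state);                 /* ... until here */
}
```

The list walk itself is cheap; the point is that everything between irq_disable() and irq_restore() runs with all interrupts masked.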

This brings unpredictable latency to any interrupt handling whenever some thread sets or removes timers (e.g. by using ztimer_periodic for periodic events, or by starting timeout timers based on network events).

The same is true for evtimer. (see #18795)

Expected results

ztimer would use more fine-grained locking, e.g. disabling only the timer interrupt, or disabling all interrupts only for very brief periods.

Versions

@jue89
Contributor

jue89 commented Nov 14, 2022

I'd say every ISR with complexity that's not O(1) introduces the same problem, right?

@kaspar030
Contributor

(I seem to not have sent my initial comment here :/)

Yeah, ztimer is O(n). So is mutex_lock() (number of waiters in mutex list), msg_send() (number of waiters).
mutex and msg are bounded by max_threads (32).
ztimer is not bounded; bounding it would make the API fallible. So would finer-grained locking using a mutex, since that won't work (in the contended case) in ISRs.
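
For illustration, here is roughly what mutex-based locking would look like and where it breaks (hypothetical sketch, not a proposal; the names are made up):

```c
#include "irq.h"
#include "mutex.h"

static mutex_t clock_lock = MUTEX_INIT;

static void sketch_set_with_mutex(void)
{
    if (irq_is_in()) {
        /* No thread context here: a contended mutex_lock() cannot suspend
         * the caller. mutex_trylock() can fail, which would make
         * ztimer_set() fallible for ISR callers. */
    }

    mutex_lock(&clock_lock);
    /* ... modify the clock's timer list, reprogram the hardware ... */
    mutex_unlock(&clock_lock);
}
```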

Maybe there's some smart lock-free scheme for managing the list ...

BUT:

The same is true for evtimer. (see #18795)

That shows 10k ticks spent in ztimer. I don't think that's the list traversal; for any reasonable number of active timers, that number should be a lot lower. What board / timer hardware was this tested on? Could it be hitting, e.g., some synchronization between clock domains (like on samr21) when calling the backend functions?

@benpicco
Contributor Author

Yes, it's not the list traversal that's slow, it's the actual periph_timer/periph_rtt operations. When you have multiple of those, their cost adding up really becomes noticeable.

I'm not talking about lock-free schemes or anything fancy here. What I'm asking is: could we replace the global irq_disable() with a NVIC_DisableIRQ() on the timer interrupt? (Behind a new periph_timer API function, of course.)
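
Something along these lines (the function names are made up; NVIC_DisableIRQ()/NVIC_EnableIRQ() are the CMSIS calls on Cortex-M, and whether the IRQ number can be looked up like this depends on the platform's timer_config):

```c
#include "cpu.h"               /* pulls in CMSIS NVIC_* on Cortex-M platforms */
#include "periph/timer.h"
#include "periph_conf.h"

/* hypothetical new periph_timer API: mask/unmask only this timer's IRQ line */
void timer_irq_block(tim_t dev)
{
    NVIC_DisableIRQ(timer_config[dev].irqn);   /* assumes the config exposes the IRQ number */
}

void timer_irq_unblock(tim_t dev)
{
    NVIC_EnableIRQ(timer_config[dev].irqn);
}
```

ztimer_set() could then bracket its list manipulation with timer_irq_block()/timer_irq_unblock() instead of irq_disable()/irq_restore().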

@kaspar030
Contributor

Hm, I think it's not just the timer interrupt that those calls need to be protected from. Maybe!

But maybe we can also work around those synchronization delays at the periph_rtt level, perhaps through some offloading scheme? IIRC, the read can actually be cached (but may be outdated in that case), and the synchronization can be interrupt-driven.
There must be a way, or are people using those peripherals only for human-scale timers?
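
Roughly what I mean (a sketch with made-up names, assuming a sam0-style RTT where reads need synchronization; this is not driver code):

```c
#include <stdint.h>

static volatile uint32_t cached_count;    /* last synchronized counter value */

uint32_t rtt_get_counter_cached(void)
{
    /* returns immediately, but may lag the hardware counter by up to one
     * synchronization period */
    return cached_count;
}

/* called from the "synchronization done" (or overflow/tick) interrupt,
 * so the busy-wait never happens in the caller's context */
void rtt_sync_done_isr(uint32_t hw_count)
{
    cached_count = hw_count;
}
```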

@benpicco
Contributor Author

benpicco commented Nov 15, 2022

Reading through the code, it isn't actually that bad. If I understood it correctly, the actual timer operations are O(1), so they should not compound if there are multiple timers queued.

Iterating the list is negligible.

I think the main issue is that we allow ztimer_set() from ISR, otherwise we could just add a mutex to ztimer_clock_t and call it a day.

@jue89
Contributor

jue89 commented Nov 15, 2022

I think the main issue is that we allow ztimer_set() from ISR, otherwise we could just add a mutex to ztimer_clock_t and call it a day.

But which thread shall be in charge of setting the next timer?

Aside from that: Currently we rely on this happening within a certain time that never crosses an upper bound. Otherwise, we miss that point in time and the timer won't fire, because we set it to a value that has already passed.

@kaspar030
Contributor

Aside from that: Currently we rely on this happening within a certain time that never crosses an upper bound. Otherwise, we miss that point in time and the timer won't fire, because we set it to a value that has already passed.

yes, the backends have minimum values for that reason. Quite high for some, like the samr21 rtt at 31 µs?
Unfortunately, that's also the reason why setting timers has to happen in a critical section. :(
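
In periph_rtt terms, the constraint looks roughly like this (simplified sketch, assuming the usual RTT_MIN_OFFSET / RTT_MAX_VALUE macros are provided by the backend):

```c
#include "irq.h"
#include "periph/rtt.h"

static void sketch_arm_alarm(uint32_t offset, rtt_cb_t cb, void *arg)
{
    if (offset < RTT_MIN_OFFSET) {
        offset = RTT_MIN_OFFSET;    /* don't aim closer than the hardware can hit */
    }

    unsigned state = irq_disable();
    uint32_t now = rtt_get_counter();
    /* If anything delayed us between reading `now` and the write below for
     * longer than `offset` ticks, the compare value would already be in the
     * past and the alarm would never fire, hence the critical section. */
    rtt_set_alarm((now + offset) & RTT_MAX_VALUE, cb, arg);
    irq_restore(state);
}
```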

@kaspar030
Contributor

I think the main issue is that we allow ztimer_set() from ISR, otherwise we could just add a mutex to ztimer_clock_t and call it a day.

The main issue is that some hardware takes ages to perform its timer operations. If we disallowed ztimer_set() from ISR, a thread would block on the timer operation instead.

On samr21, during sync, in some cases the whole peripheral bus is blocked for up to six slow clock ticks (one tick is 30.5 µs at 32768 Hz), so even unprotected, the MCU is practically blocked.

@jue89
Contributor

jue89 commented Nov 16, 2022

yes, the backends have minimum values for that reason. Quite high for some, like the samr21 rtt at 31 µs?
Unfortunately, that's also the reason why setting timers has to happen in a critical section. :(

I think we shouldn't add complexity to ztimer because of quirky hardware. Maybe we can fix the rtt driver so that it doesn't wait for the sync itself, but instead makes sure that the next rtt_set_alarm() waits for the preceding sync? That would help a lot, I'd say!
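
Roughly the idea (illustrative sketch only, with a placeholder for the family-specific sync check):

```c
#include <stdbool.h>
#include "periph/rtt.h"

extern bool rtt_hw_sync_busy(void);     /* placeholder for the family-specific SYNCBUSY check */

static rtt_cb_t alarm_cb;
static void *alarm_arg;

static void wait_for_pending_sync(void)
{
    while (rtt_hw_sync_busy()) {
        /* spin until the *previous* register write has synchronized */
    }
}

void rtt_set_alarm(uint32_t alarm, rtt_cb_t cb, void *arg)
{
    wait_for_pending_sync();            /* pay for the previous call's sync here ... */
    alarm_cb  = cb;
    alarm_arg = arg;
    /* ... write `alarm` to the compare register (family-specific) ... */
    (void)alarm;
    /* ... and return without waiting for this write to synchronize */
}
```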

@jue89
Contributor

jue89 commented Nov 16, 2022

@benpicco see the linked PR above ... does this improve the situation in your setup?

@dylad
Member

dylad commented Nov 16, 2022

Could #17403 be of any help here?

@jue89
Contributor

jue89 commented Nov 16, 2022

At least it would speed up the ztimer_now() calls, I guess. But I don't have an MCU at hand that supports the continuous sync.

@maribu added the "Type: bug" label on May 22, 2023