feat: timers #3542

Merged
merged 151 commits into from
Dec 20, 2022

@ggreif ggreif commented Nov 2, 2022

This PR adds the necessary primitives for the base library to rely on.

setTimer : (delayNanos : Nat64, recurring : Bool, job : () -> async ()) -> (id : Nat)
cancelTimer : (id : Nat) -> ()

It also adds a system func timer mechanism, so that the user can choose to provide a custom implementation.
A new compiler flag -no-timer is also provided to disable all timer-related functionality (e.g. for critical applications).


How it works

By default the timer mechanism is enabled, and the setTimer/cancelTimer utilities can be used. If system func timer is present, the user is responsible for the timer implementation, i.e. she can supply a very simple or a very elaborate mechanism. When moc is invoked with -no-timer, the whole mechanism is disabled. This is useful for minimising the code footprint.

Canister-global timer discipline

The IC's underlying timer mechanism is one timer per canister. The canister infrastructure is responsible for providing the abstraction of several (and recurring) timers. Timer expiration is best-effort, i.e. the timer's action will never be executed before the desired expiration time, but there is no guarantee of execution at expiration time. It will usually be later.

The default mechanism

When the user doesn't choose to implement system func timer (and doesn't suppress it by -no-timer), a default mechanism gets generated. The default mechanism is basically a call to the builtin @timer_helper, which will be provided by the compiler.

In the default mechanism the user calls setTimer (from motoko-base). This registers an expiration relative to the current time in the form of a timer node that @timer_helper will visit. Correspondingly, a cancelTimer call will expunge a timer node. Canister-global timer expirations also expunge nodes, and additionally prune the node tree (a priority queue).
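A minimal usage sketch of the default mechanism, assuming the motoko-base Timer module with the signatures listed in the motoko-base commit message referenced further down in this thread (setTimer, recurringTimer, cancelTimer, and a Duration variant accepting #seconds). The actor, its ticks counter and the stop method are made up for illustration, and newer releases of the base library may expect slightly different call syntax.

```motoko
import { setTimer; recurringTimer; cancelTimer } = "mo:base/Timer";
import { print } = "mo:base/Debug";

actor {
  var ticks = 0;

  // One-off timer: runs once, no earlier than 10 seconds after registration.
  ignore setTimer(#seconds 10, func () : async () {
    print("one-off timer fired");
  });

  // Recurring timer: the returned Id can be used to cancel it later.
  let tickId = recurringTimer(#seconds 5, func () : async () {
    ticks += 1;
    print("tick " # debug_show ticks);
  });

  // Hypothetical method for switching the recurring timer off again.
  public func stop() : async () {
    cancelTimer(tickId);
  };
}
```

Because the actor does not declare system func timer, the compiler-provided default is what drives both timers here.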

The Node structure

Nodes will track the expiration time, the timer Id, and optionally a duration for recurring timers. For administrative reasons they also keep pointers to the earlier and later trees. Each timer node further has a thunk (a function of type () -> async ()) which (when called) returns a future (concurrent action).

The most important invariant for nodes is that the expiration time divides the earlier and later trees temporally:

  • every expiration on a Node in the earlier-tree is <= the expiration of the parent node, and analogously
  • every expiration on a Node in the later-tree is >= the expiration of the parent node.

There is a twist to the expiration of the Nodes though, as they are mutable, and are set to 0 when the timer expires or gets cancelled. So the above invariant is strictly only valid for nodes (with their child trees) where no node therein is expunged.

This allows for some neat optimisations. E.g. expiration is caused by time marching from early to later, so if a node's expiration is 0 then we can assume that the earlier-tree is completely expired. Pruning (the cleaning of a timer tree from expired nodes) becomes a later-tree-only operation in this case.

The property "node expired implies earlier-tree expired" is a corollary of the fundamental invariant with expungable nodes.

Insertion

Timer Ids are handed out starting from 1 and are never reused (but numbering begins at 1 again after an upgrade).
The initial insertion of a timer (setTimer) respects the invariant that higher Ids execute after lower Ids when they have the same expiration time. For recurrent expirations this is also the case if the delays are identical.
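The node layout, the temporal invariant, the pruning shortcut and the tie-breaking rule on insertion can be sketched as follows. This is an illustration under assumptions, not the compiler's actual code: the names Node, insert and prune are hypothetical, and the sketch assumes that the tree is visited in order (earlier-tree, node, later-tree).

```motoko
// Hypothetical names throughout; the compiler-internal definitions may differ.
type Node = {
  expire : [var Nat64]; // one-element mutable array; 0 marks an expunged node
  id : Nat;
  delay : ?Nat64;       // set only for recurring timers
  job : () -> async ();
  pre : ?Node;          // earlier-tree: expirations <= this node's
  post : ?Node;         // later-tree: expirations >= this node's
};

// Insert `n`, sending ties to the later-tree so that a newer (higher) Id
// with the same expiration is visited after the older one.
func insert(tree : ?Node, n : Node) : ?Node {
  switch tree {
    case null { ?n };
    case (?t) {
      if (n.expire[0] < t.expire[0]) {
        ?{ expire = t.expire; id = t.id; delay = t.delay; job = t.job;
           pre = insert(t.pre, n); post = t.post }
      } else {
        ?{ expire = t.expire; id = t.id; delay = t.delay; job = t.job;
           pre = t.pre; post = insert(t.post, n) }
      }
    }
  }
};

// Drop expunged nodes. By the corollary, an expunged node implies that its
// whole earlier-tree is expired, so only the later-tree needs to be kept.
func prune(tree : ?Node) : ?Node {
  switch tree {
    case null { null };
    case (?t) {
      if (t.expire[0] == 0) {
        prune(t.post)
      } else {
        ?{ expire = t.expire; id = t.id; delay = t.delay; job = t.job;
           pre = prune(t.pre); post = prune(t.post) }
      }
    }
  }
};
```

Note that rebuilt nodes share the original expire array, so expunging a node through one copy is visible through every other copy of it.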

Cancellation

Each timer node carries the Id, and this allows expunging the node when cancelTimer is called. To maintain the above corollary, each cancelTimer must be followed by a compacting operation (basically pruning, but considering the earlier-trees too). Alternatively a grafting operation can be used, appending the later-tree onto the latest slot of the earlier-tree.

It is important to note that while code is running that expunges nodes as a result of canister global timer expiration, there must not be calls to cancelTimer (or similar) since that would likely interfere with the corollary. This is only potentially a problem for self-implemented system func timer, as the default mechanism will use @timer_helper, which is not doing such things.

Recurrence

When a timer that was set up as recurring expires, its node needs to be re-added to the timer tree with an expiration shifted by the timer's delay. This happens in the loop that expunges already-expired timers, so the loop may detect the same timer and expunge it again. For a delay of 0 this could cause an infinite loop, so recurrent timers with 0 delay are treated as non-recurring. 5b9b62f implements this by avoiding re-adding.
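The re-insertion rule can be sketched as a small pure function. The name nextExpiration is hypothetical, and the sketch assumes that the planned expiration time (rather than the current time) is used as the base, as suggested under the open questions below.

```motoko
// Returns the next expiration for a timer that has just expired,
// or null when the timer must not be re-added to the tree.
func nextExpiration(expired : Nat64, delay : ?Nat64) : ?Nat64 {
  switch delay {
    case null { null };   // one-off timer: never re-added
    case (?d) {
      if (d == 0) {
        null              // zero delay: treated as non-recurring
      } else {
        ?(expired + d)    // shift the planned expiration by the delay
      }
    }
  }
};
```

Returning null for a zero delay is what keeps the expunging loop from encountering the same timer over and over.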

Tree walking and reconstruction

Several operations walk (traverse) the timer tree. In particular, setTimer prunes the expired nodes while adding the new node, and so does the reinsertion of recurrent timers.
Since expiration times are kept in mutable array elements, node reinsertion due to recurrence will not cause duplicated expirations when tree nodes get copied. This is due to mutable array aliasing in the tree walk code. If we decide to get rid of this feature, we have to come up with a different scheme to suppress duplicated firing of timers.

Setting the canister-global timer

There are three triggers for calling the system API global_timer_set:

  • setTimer is called (using the expiration time of the earliest active timer)
  • the timer expiration callback is invoked, and either
    • there is another active timer, or
    • there is no active timer (global_timer_set(0))
  • cancelTimer is called, and there is no active timer remaining.
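In all three cases the value handed to global_timer_set follows the same rule: the earliest active expiration, or 0 when nothing is active. A simplified sketch, assuming (for illustration only) that the expirations are available as a flat array in which 0 marks an expunged timer; the name globalTimerArg is hypothetical:

```motoko
// Computes the argument for the global timer from a list of expirations.
func globalTimerArg(expirations : [Nat64]) : Nat64 {
  var earliest : Nat64 = 0; // 0 means "no active timer", which disables the global timer
  for (e in expirations.vals()) {
    // skip expunged entries, keep the smallest remaining expiration
    if (e != 0 and (earliest == 0 or e < earliest)) {
      earliest := e;
    };
  };
  earliest
};
```

This also matches the invariant listed in the TODO section: no more expirations if and only if the global timer is 0.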

Trap avoidance

A timed action trapping must not impede the execution of other actions, and thus we use the type () -> async () for registering such actions with setTimer. This ensures that

  • each action has its own future (even for each recurrence)
  • each future gets run in its own async context and as such traps and throws are isolated.
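As an illustration of the isolation described above, here is a sketch with two one-off timers, one of which traps. It assumes the motoko-base Timer module with the setTimer signature listed in the motoko-base commit message further down; the actor itself is hypothetical.

```motoko
import { setTimer } = "mo:base/Timer";
import { print; trap } = "mo:base/Debug";

actor {
  // This job traps; the trap is confined to the job's own async context.
  ignore setTimer(#seconds 1, func () : async () {
    trap("boom");
  });

  // This job has its own future and is expected to run regardless.
  ignore setTimer(#seconds 1, func () : async () {
    print("still running");
  });
}
```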

Reclaiming the continuation slots

Note: originally the futures were awaited, but this seems redundant and has been removed in 204d87a. Also, an upgrade test is now run to check that executed timer actions don't cause stuck canisters.

Upon timer expiration the callback is responsible for installing the workers into the continuation table (this is the async part), but it is also important to await those futures, so they don't fill up the continuation table. I expect that this can be done by a trailing await of an async forall <futures> try await <it> catch {}, since the canister_global_timer endpoint is called in an async context.

The @timer_helper builtin

When no user-written system func timer is present, moc will insert a call to an internal function @timer_helper. This works by creating a worklist of expired timers relative to the current time, then expunging the corresponding nodes and re-inserting recurring ones with respective delays added.

The worklist is then transformed into an array of futures, each in its trap-proof compartment. Finally all the pending futures are awaited.

The upgrade story

Easy. The global timer gets jettisoned on upgrade, and the timers need to be set up in the post-upgrade hook. Stable variables can be used to remember the timers (not the Ids, but the setup parameters) if they don't have a rigid structure.
Note: The timers themselves cannot be stored in stable variables as the job's type involves async and also we have no mechanism for locating the same function after an upgrade automatically.

Opting out

To opt-out of the default mechanism the user just declares system func timer. By using the passed in function argument to set the global timer, full control is given to either implement a very basic wakeup scheme or an even more elaborate mechanism than the one presented above.

The initial expiration

After the canister starts running, system func timer gets invoked automatically. This also happens after an upgrade.

An example of a periodic wakeup

The simplest way to achieve a periodic wakeup of a canister with fixed intervals can be coded as

import Time "mo:base/Time";
import { fromIntWrap } = "mo:base/Nat64";

actor {

  system func timer(set : Nat64 -> ()) : async () {
    set(fromIntWrap(Time.now()) + 60_000_000_000); // 60 seconds from now
    doSomething();
  };

  func doSomething() {
    // whatever
  }

}

An example of an exponential approximation

Imagine you are at a hackathon, and there is a deadline to submit the hash of your solution. In the last hour you want to be reminded 60, 30, 15, 7.5, etc. minutes before the deadline. The following implementation could serve the purpose:

import { now } = "mo:base/Time";
import { fromIntWrap } = "mo:base/Nat64";
import { print } = "mo:base/Debug";

actor {
  let hour = 60 * 60_000_000_000;
  let deadline = now() + hour; // nanos

  system func timer(set : Nat64 -> ()) : async () {
    let time = now();
    let toGo = deadline - time;
    if (toGo < 0) { set 0; return };

    debug { print("Still to go: " # debug_show toGo) };

    set(fromIntWrap(time + toGo / 2));
  }
}

Open question(s)

  • Should the recurring timer be re-inserted with expiration relative to old expiration time or current time (of callback)? — to avoid time skew, using the planned expiration time as the base
  • Can we use async* to avoid the context switch when calling into @timer_helper? — probably yes, but in another PR
  • Stopped canisters won't receive global timer expirations, but the global timer remains in the same state. So upon canister_start all expired timers will fire. Recurring timers may have expired several times in the canister's stopped state so will begin to catch up. This may lead to unnecessary repeated work. See feat: when re-adding recurrent timers, skip over past expirations #3871.

TODO:

  • add -no-timer option to test/perf sources — dbfaf54
  • release notes
  • check that setTimer documented as taking nanoseconds
  • debug deactivated tests (compacting GC in ic-ref are a possible indication of GC bug! Or maybe starvation.)
  • stabilisation (needed? — CWV: no)
  • tests
    • after upgrade, still on? ("the timers get canceled on upgrade")
    • when defining system func timer, after upgrade it also gets called — 8df4d6f
    • can upgrade after a timer has run
    • test recurrent delay 0 — See 5b9b62f and c440b3e.
    • does actor class behave correctly?
  • docs: doc/md/heartbeats.md
    • what more is needed?
  • run with ic-ref too (currently no support, but see Introduce canister global timer interface-spec#111)
  • var aliasing is being used to good effect (can we remove it?) — replaced in resort to one-element mutable arrays as a workaround for aliased var field #3617
  • check the invariant: no more expiration <=> global timer is 0
  • add test/fail for -no-timer
  • cancelling of a timer must be followed by compaction (to not break the corollary)
  • duration 0 with recurring timers should be excluded
  • traps in jobs, catch
  • add the low-level API setGlobalTimer (but see also: make setGlobalTimer a scoped capability #3614)
  • try it on the Playground — works!


github-actions bot commented Nov 2, 2022

Comparing from 0458e2a to 0137786:
In terms of gas, 4 tests regressed and the mean change is +0.0%.
In terms of size, 4 tests regressed and the mean change is +0.0%.

  };
  system func timer(set : Nat64 -> ()) : async () {
    Prim.debugPrint ("timer endpoint in 4");
    set 0

Contributor Author

needed for (outdated?) drun only:

diff --git a/test/run-drun/ok/upgrades.drun.ok b/test/run-drun/ok/upgrades.drun.ok
index e0de5c306..a20a83383 100644
--- a/test/run-drun/ok/upgrades.drun.ok
+++ b/test/run-drun/ok/upgrades.drun.ok
@@ -32,5 +32,25 @@ debug.print: init'ed 4
 debug.print: timer endpoint in 4
 ingress Completed: Reply: 0x4449444c0000
 Ok: Reply: 0x4449444c0000
+debug.print: timer endpoint in 4
+debug.print: timer endpoint in 4
+debug.print: timer endpoint in 4
+debug.print: timer endpoint in 4
+debug.print: timer endpoint in 4
+debug.print: timer endpoint in 4
+debug.print: timer endpoint in 4
+debug.print: timer endpoint in 4
+debug.print: timer endpoint in 4
+debug.print: timer endpoint in 4
+debug.print: timer endpoint in 4
 ingress Completed: Reply: 0x4449444c0000
 Ok: Reply: 0x4449444c0000
+debug.print: timer endpoint in 4
+debug.print: timer endpoint in 4
+debug.print: timer endpoint in 4
+debug.print: timer endpoint in 4
+debug.print: timer endpoint in 4
+debug.print: timer endpoint in 4
+debug.print: timer endpoint in 4
+debug.print: timer endpoint in 4
+debug.print: timer endpoint in 4

| `inspect` | `{ caller : Principal; msg : <Variant>; arg : Blob } -> Bool` | message predicate |
| `preupgrade` | `() -> ()` | pre upgrade action |
| `postupgrade` | `() -> ()` | post upgrade action |

- `heartbeat`, when declared, is called on every Internet Computer subnet **heartbeat**, scheduling an asynchronous call to the `heartbeat` function. Due to its `async` return type, a heartbeat function may send messages and await results. The result of a heartbeat call, including any trap or thrown error, is ignored. The implicit context switch means that the time the heartbeat body is executed may be later than the time the heartbeat was issued by the subnet.

- `timer`, when declared, is called as a response of the canister global timer's expiration. The canister's global timer can be manipulated with the passed-in function argument of type `Nat64 -> ()` upon which libraries can build their own abstractions. When not declared (and in absence of the `-no-timer` flag), this system action is provided with default implementation by the compiler (additionally `setTimer` and `cancelTimer` are provided as primitives).

Contributor

So is the argument here in nanoseconds (like the system API) or seconds, like your prims?

Contributor Author

This is in nanos. (Not a convenience interface.)


crusso commented Dec 15, 2022

I guess one remaining comment is how does this API compare to the Rust one, e.g. in terms of timer resolution (delay in nanoseconds or seconds, like here). I guess we should aim to keep them as similar as possible for ease of writing dual language examples.

Cf. https://github.com/dfinity/examples/pull/419/files for the Rust example.

Right, they are constructing the Duration from seconds, but I think you can also construct from nanoseconds - plus it would be best if the unit was consistent with whatever the capability is expecting, I think. I assume Time.now is in nanoseconds?


ggreif commented Dec 15, 2022

> I assume Time.now is in nanoseconds?

Yes. I am a bit torn whether to make the setTimer primitive nanoseconds-based. I am pretty sure we will never see (significantly) sub-second expiration granularity on the IC, so seconds looks like the correct unit of measure. OTOH the system call is defined in nanos, so we could easily support that. But I have seen lots of confusion where people were passing 10 for a nanosecond argument and wondering why it expired immediately. In base we should have a setTimer(duration : { #seconds : Nat64; #nanoseconds : Nat64 })


crusso commented Dec 15, 2022

> > I assume Time.now is in nanoseconds?
>
> Yes. I am a bit torn whether to make the setTimer primitive nanoseconds-based. I am pretty sure we will never see (significantly) sub-second expiration granularity on the IC, so seconds looks like the correct unit of measure. OTOH the system call is defined in nanos, so we could easily support that. But I have seen lots of confusion where people were passing 10 for a nanosecond argument and wondering why it expired immediately. In base we should have a setTimer(duration : { #seconds : Nat64; #nanoseconds : Nat64 })

I would keep the low-level API as close to the System API as possible (perhaps even returning the Nat64 in case it ever gets used for something useful), but I like the idea of using the variant in Timer.setTimer. That does require nanoseconds for the prims though, right, unless you want to round.

@dfinity-berestovskyy

IMO the ic0.global_timer_set() argument type should be exactly the same as what ic0.time() returns, i.e. an expression like ic0.global_timer_set(ic0.time() + 1) should just work (in Motoko primitives).

We can't set an absolute timer without knowing the current time. So it would be counter-intuitive if we get the current time, and then we need to convert it to feed into the timer...

ggreif and others added 2 commits December 16, 2022 16:34
Co-authored-by: Claudio Russo <claudio@dfinity.org>

ggreif commented Dec 16, 2022

> low-level API as close to the System API as possible

Done that. PTAL.

@@ -1,4 +1,5 @@
RUNFLAGS = -p
export EXTRA_MOC_ARGS = -no-timer

Contributor Author

this avoids the 3% size increase

@crusso crusso left a comment

LGTM

@ggreif ggreif added automerge-squash When ready, merge (using squash) and removed DO-NOT-MERGE labels Dec 20, 2022
@mergify mergify bot merged commit 75255ed into master Dec 20, 2022
@mergify mergify bot removed the automerge-squash When ready, merge (using squash) label Dec 20, 2022
@mergify mergify bot deleted the gabor/timers branch December 20, 2022 11:34
ggreif added a commit to dfinity/motoko-base that referenced this pull request Dec 29, 2022
Adds the public API to support the default timer mechanism
- `setTimer : (Duration, () -> async ()) -> TimerId` to insert a one-off timer
- `recurringTimer : (Duration, () -> async ()) -> TimerId ` to insert a repeating timer
- `cancelTimer : TimerId -> ()` to remove timer

where `Duration` is a variant that can specify a timespan in seconds or nanoseconds.

Please refer to the PR dfinity/motoko#3542 for the gory details.

Some caveats:
- importing `Timer.mo` will fail when invoked as `moc -no-timer`
- these functions have no effect (other than filling the heap) when the user chooses to implement `system func timer` herself (i.e. opting out of the Motoko default timer mechanism).
- `Duration` should be well above the round time to make this mechanism more efficient than `system func heartbeat`.

6 participants