Push vs Pull for caching

Problem

There are many places in Servo where we would like to cache results of functions, so they can be reused if the arguments haven't changed. For example, the layout code keeps track of dirty bits in the DOM to avoid reflowing, and live HTMLCollections can cache their state to avoid requerying the DOM each time they are accessed.

A simplified version of HTMLCollections (ignoring JS annotations and garbage collecdion issues):

struct HTMLCollection {
    root: &Node,
    filter: &Filter,
    cached_length: Cell<Option<u32>>,
    ...
}

impl HTMLCollection {
    fn length(&self) -> u32 {
        self.cached_length.get().unwrap_or_else(|| {
            let length = self.calculate_length();
            self.cached_length.set(Some(length));
            length
        }
    }
    fn calculate_length(&self) -> u32 { ... }
    ...
}

The problem with caching is that the caches must be invalidated when the underlying state has changed, in the case of HTMLCollection, this is when the root DOM node or any of its descendants are mutated. Adding a method to invalidate the cache is easy:

impl HTMLCollection { ... fn invalidate_cache(&self) { self.cached_length.set(None); ... } ... }

the problem is knowing when to call this function. There are two basic strategies: pull invalidation and push invalidation.

Pull

In this strategy, the underlying state just keeps track of whether its state has changed, and it is the responsibility of the cache to check for state changes. The simplest way to do this is to include a version number in the state, and to ensure the version number is increased when the state changes. For example:

struct HTMLCollection {
    ...
    cached_version: Cell<u64>,
    ...
}

impl HTMLCollection {
    fn length(&self) {
        if self.cached_version.get() != self.root.version() { self.invalidate_cache(); }
        ...
    }
    fn invalidate_cache(&self) {
       self.cached_version.set(root.version());
       self.cached_length.set(None);
       ...
   }
}

In the case that there's only one cache, we can replace the version number by a dirty flag: this is the approach that's used by the layout engine.

Push

For push notification of state changes, we can use the observer pattern:

impl HTMLCollection {
    fn new(root: &Node, filter: &Filter) -> HTMLCollection {
        let result = HTMLCollection{ ... }
        root.add_mutation_observer(|| { result.invalidate_cache() });
        result
    }
    ...
}

Each time the root is updated, it notifies all of its observers, and in the case of HTMLCollections, they invalidate their cache.

Trade-offs

Space

The pull strategy uses one extra word per state object (for the version) plus one extra word per cache (the cached version). The push strategy uses a vector per state object containing one entry per cache (to keep track of the observers).

Time

In both cases, state change triggers cache invalidation, the only difference is when cache invalidation happens. In the pull strategy, cache invalidation is lazy: it is only performed when the cache is used. In the push strategy, it is strict, and is performed each time mutation happens.

Garbage collection

In the case of caching of JS objects, we have to worry about how they will be garbage collected.

The pull strategy is only storing version numbers, so has no impact on gc.

The push strategy is keeping an observer list in the state object. If those references are strong (for example, as required by the MutationObserver API) the caches will be kept alive by the state object.

Losing state changes

One major difference between the two approaches is how they handle multiple state changes. In the pull approach, cache invalidation only happens when the cache is used, and is not aware of how many state changes happened, just whether or not any did. In the push approach, caches are notified of every state change. Since cache invalidation is idempotent, we can use either strategy, but there may be non-idempotent cases where push is more appropriate.

DOM API issues

In the case where the state object is a DOM node, we can use something like the MutationObserver API, but there are some issues that make it difficult to use directly. Firstly, the API requires references from nodes to observers to be strong references, so we have gc issues as discussed above. Secondly, the API schedules notifications as microtasks, so it would be possible for a previously scheduled microtask to be run after the DOM mutation, but before DOM mutation observers have been notified. Because of these issues, we would probably need a variant of MutationObservers.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly