Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement InMemoryCache garbage collection and eviction. #5310

Merged
merged 11 commits into from
Sep 13, 2019

Conversation

benjamn
Copy link
Member

@benjamn benjamn commented Sep 12, 2019

This PR adds a handful of powerful new methods to the official InMemoryCache class:

  • The cache.gc() method removes any unreachable objects from the normalized cache, using a tracing garbage collection strategy (as opposed to reference counting). Reachability is determined starting from a set of root entity IDs, which includes any IDs that have been directly written to the cache—including the synthetic ROOT_QUERY ID, which is the default ID used when no specific ID is provided. From this root set of reachable IDs, garbage collection traces all child references (computed lazily and cached aggressively) until the entire reachable reference graph has been visited. Any remaining unvisited IDs are then removed from the cache, and the cache.gc() function returns a list of those removed ID strings.

As of this writing, the cache.gc() method is never called automatically, but must be invoked by the application developer. Although garbage collection becomes more and more efficient over time, the first garbage collection takes time proportional to the number of entities in your cache, so you might want to run it during an idle period after page load.

  • The cache.retain(id) method can be called to treat specific entities as immediately reachable "root" entities, which protects them (and all their transitive child references) from garbage collection. Note that this method is called automatically for the IDs written using writeQuery or writeFragment. Our hope is that this method will not be needed very often, though it can be essential when you don't want certain data to be removed.

  • The cache.release(id) method undoes the effect of cache.retain(id), allowing the given ID to be garbage collected once it becomes otherwise unreachable. Note that cache.release(id) must be called at least as many times as cache.retain(id) was called for a given id. Both methods return the new retainment count. This retainment count is not a "reference count" in the usual sense, since most entities will never need to be retained or released, and we are not using a reference counting garbage collection strategy. In particular, cache.release(id) does not automatically trigger garbage collection when the retainment count falls back to zero.

  • The cache.evict(id) method forcibly removes the given rootId from the cache, but does not immediately trigger garbage collection, since you might want to evict multiple IDs before triggering a single garbage collection. With that said, removing an entity from the cache can have significant consequences for the reachability of other entities, so eviction should usually be followed by garbage collection.

Closes #3965, #4681, #4917, #4916

Part of apollographql/apollo-feature-requests#4

@benjamn benjamn added this to the Release 3.0 milestone Sep 12, 2019
@benjamn benjamn self-assigned this Sep 12, 2019
benjamn added a commit that referenced this pull request Sep 12, 2019
This was referenced Sep 12, 2019

// Although the EntityCache class is abstract, it contains concrete
// implementations of the various NormalizedCache interface methods that
// are inherited by the Root and Layer subclasses.

public toObject(): NormalizedCacheObject {
return this.data;
return { ...this.data };
Copy link
Member Author

@benjamn benjamn Sep 12, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because the StoreWriter class updates entities non-destructively, this shallow copy of this.data is sufficient to provide a complete, immutable snapshot of the cache.


// If you're ever tempted to do this, you probably want to use cache.clear()
// instead, but evicting the ROOT_QUERY should work at least.
expect(cache.evict({
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also called "Welcome to the fragment zone"

if (idsToRemove.length) {
let root: EntityCache = this;
while (root instanceof Layer) root = root.parent;
idsToRemove.forEach(root.delete, root);
Copy link
Member Author

@benjamn benjamn Sep 12, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By the way, this idiom is equivalent to

idsToRemove.forEach(id => root.delete(id))

except that it does not allocate a new function object.

This implementation currently requires calling cache.gc() manually, since
the timing of garbage collection is subject to developer taste.
Copy link
Contributor

@jbaxleyiii jbaxleyiii left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@benjamn as always, this is wonderful work filled with helpful comments and the occasional chuckle when reading the tests. I really appreciate your styling in the commits and actual implementation here.

I've left one comment that I'd love answered before moving this to an approve. Namely, do we still need the query parameter to the eviction option?

src/cache/inmemory/entityCache.ts Show resolved Hide resolved
public gc() {
const ids = this.getRootIdSet();
const snapshot = this.toObject();
ids.forEach(id => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method for creating a stack is just so cool

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree!!

const idsToRemove = Object.keys(snapshot);
if (idsToRemove.length) {
let root: EntityCache = this;
while (root instanceof Layer) root = root.parent;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, so this works up the layers to delete ids

Copy link
Member Author

@benjamn benjamn Sep 12, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The way I think about it:

  • Optimistic layers are temporary, so we don't really need to worry about garbage collecting them, since they should be totally removed sometime soon.
  • The goal of garbage collection, then, is to remove unreachable IDs from the Root layer of the cache.
  • The gc() method might have been called against a Layer object, which is important because optimistic Layers can retain otherwise unreachable entities.
    • If the layers didn't matter for garbage collection, we could just skip to the root at the beginning of the gc() method.
    • Instead, we skip to the root at the end of the gc() method to perform the final deletions of unreachable entities. That's what this code does.

src/cache/inmemory/entityCache.ts Outdated Show resolved Hide resolved
@@ -125,27 +211,27 @@ export namespace EntityCache {
// of the EntityCache.Root class.
class Layer extends EntityCache {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is probably out of scope for this PR, but I think it would be quite helpful to write a set of comments on the architecture of the cache and its layers.

src/cache/inmemory/entityCache.ts Show resolved Hide resolved
src/cache/inmemory/helpers.ts Outdated Show resolved Hide resolved
src/cache/inmemory/inMemoryCache.ts Outdated Show resolved Hide resolved
In practice (by which I mean while writing tests), I found it way too
difficult to keep track of owner objects effectively. If an application
developer wishes to implement a more sophisticated root ID tracking
system, they have all the tools they need. In theory, the developer should
balance retain and release calls, though one can always call release
repeatedly until it returns 0 to release the ID completely.
Eviction always succeeds if the given options.rootId is contained by the
cache, but it does not automatically trigger garbage collection, since the
developer might want to perform serveral evictions before triggering a
single garbage collection.

Resolves apollographql/apollo-feature-requests#4.
Supersedes #4681.
benjamn added a commit that referenced this pull request Sep 12, 2019
Per this discussion with @jbaxleyiii:
#5310 (comment)

tl;dr: This could be a breaking change for other Apollo Cache
implementations, but InMemoryCache and the Hermes cache are the only two
cache implementations we know about, and neither of them have implemented
the evict method so far.
@benjamn benjamn force-pushed the cache-garbage-collection-and-eviction branch from a1fd24c to 7c5b4e5 Compare September 12, 2019 21:03
Per this discussion with @jbaxleyiii:
#5310 (comment)

tl;dr: This could be a breaking change for other Apollo Cache
implementations, but InMemoryCache and the Hermes cache are the only two
cache implementations we know about, and neither of them have implemented
the evict method so far.
@benjamn benjamn force-pushed the cache-garbage-collection-and-eviction branch from 7c5b4e5 to 3b659f8 Compare September 12, 2019 21:04
Copy link
Member

@hwillson hwillson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is AMAZING @benjamn! Thanks very much for working on this, and integrating these changes in such a well thought out and easy to grok manner. I have a few comments below (mostly thoughts or questions), and an additional one here - it looks like we have a circular dependency:

Circular dependency: src/cache/inmemory/inMemoryCache.ts -> src/cache/inmemory/writeToStore.ts -> src/cache/inmemory/inMemoryCache.ts

It isn't caused by the changes in this PR, but just a heads up in-case you want to address it here.

Thanks again!

src/cache/inmemory/entityCache.ts Show resolved Hide resolved
src/cache/inmemory/inMemoryCache.ts Show resolved Hide resolved
src/cache/inmemory/entityCache.ts Show resolved Hide resolved
@benjamn benjamn merged commit 219ddaa into release-3.0 Sep 13, 2019
StephenBarlow pushed a commit that referenced this pull request Oct 1, 2019
StephenBarlow pushed a commit that referenced this pull request Oct 1, 2019
Per this discussion with @jbaxleyiii:
#5310 (comment)

tl;dr: This could be a breaking change for other Apollo Cache
implementations, but InMemoryCache and the Hermes cache are the only two
cache implementations we know about, and neither of them have implemented
the evict method so far.
StephenBarlow pushed a commit that referenced this pull request Oct 1, 2019
@dminkovsky
Copy link
Contributor

Thank you for this.

As of this writing, the cache.gc() method is never called automatically

Is this still the case?

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Feb 16, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants