
Optimize repeated apollo-cache-inmemory reads by caching partial query results. #3394

Merged
benjamn merged 97 commits into master from benjamn/cache-result-objects-with-optimism on Sep 28, 2018

Conversation

@benjamn
Contributor

commented May 3, 2018

Reading repeatedly from apollo-cache-inmemory using either readQueryFromStore or diffQueryAgainstStore currently returns a newly-computed object each time, even if no data IDs in the cache have changed.

Passing the previousResult option can improve application performance by ensuring that equivalent results are === to each other, but the presence of previousResult only makes the cache reading computation more expensive, because new objects are still created and then thrown away if they are structurally equivalent to the previousResult.

This PR is a work-in-progress with the goal of returning previous results (including nested result objects, not just the top-level result) immediately, without any unnecessary recomputation, as long as the underlying data IDs involved in the original computation have not been modified in the meantime.
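To make the `previousResult` semantics concrete, here is a hypothetical sketch (not the actual apollo-cache-inmemory code) of the behavior described above: the result is always recomputed, and `previousResult` only wins when the two are structurally equal.

```typescript
// Hypothetical sketch of the previousResult optimization: recompute the
// result, then return the previous object when the two are structurally
// equal, so consumers can rely on === identity checks. Note the cost:
// the new object is built either way, then thrown away.
function deepEqual(a: unknown, b: unknown): boolean {
  if (a === b) return true;
  if (typeof a !== "object" || typeof b !== "object" || !a || !b) return false;
  const aKeys = Object.keys(a);
  const bKeys = Object.keys(b);
  return (
    aKeys.length === bKeys.length &&
    aKeys.every(k => deepEqual((a as any)[k], (b as any)[k]))
  );
}

function readWithPreviousResult<T>(compute: () => T, previousResult?: T): T {
  const result = compute(); // the recomputation cost is paid regardless
  return previousResult !== undefined && deepEqual(result, previousResult)
    ? previousResult // preserve === identity for equivalent results
    : result;
}

const prev = { user: { id: 1, name: "Ada" } };
const next = readWithPreviousResult(
  () => ({ user: { id: 1, name: "Ada" } }),
  prev,
);
console.log(next === prev); // true: identity preserved despite recomputation
```

The goal of this PR is to skip the `compute()` call entirely when no underlying data IDs have changed, rather than recomputing and discarding.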

This functionality is based on an npm package called optimism that I wrote to salvage rebuild performance for Meteor 1.4.2, by avoiding unnecessarily rereading files from the file system. It is not an overstatement to say that Meteor would no longer exist as a project without this powerful caching technique.

The optimism library allows caching the results of functions based on (a function of) their arguments, while also keeping track of any other cached functions that were called in the process of evaluating the result, so that the result can be invalidated (or "dirtied") when any of the results of those other functions are dirtied. Dirtying is a very cheap, idempotent operation, since it does not force immediate recomputation, but simply marks the dirtied result as needing to be recomputed the next time the cached function is called with equivalent arguments.
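As an illustration of the dirtying model, here is a simplified sketch (this is not optimism's real API, and it omits the dependency tracking between cached functions that makes dirtying propagate):

```typescript
// Simplified sketch of memoization with cheap dirtying, in the spirit of
// the optimism package (not its real API; real optimism also tracks which
// cached functions called which, so dirtying one invalidates its callers).
function wrap<R>(fn: (key: string) => R) {
  const cache = new Map<string, { dirty: boolean; value: R }>();

  const wrapped = (key: string): R => {
    let entry = cache.get(key);
    if (!entry || entry.dirty) {
      // Recompute only on a miss, or when the entry was dirtied.
      entry = { dirty: false, value: fn(key) };
      cache.set(key, entry);
    }
    return entry.value;
  };

  // Dirtying is cheap and idempotent: it marks the entry stale without
  // forcing immediate recomputation.
  const dirty = (key: string): void => {
    const entry = cache.get(key);
    if (entry) entry.dirty = true;
  };

  return Object.assign(wrapped, { dirty });
}

let computations = 0;
const read = wrap((dataId: string) => {
  computations++;
  return `result for ${dataId}`;
});

read("ROOT_QUERY");        // computes
read("ROOT_QUERY");        // served from cache, no recomputation
read.dirty("ROOT_QUERY");  // cheap: only marks the entry stale
read("ROOT_QUERY");        // recomputes on the next call
console.log(computations); // 2
```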

If this approach is successful, it should effectively close the performance gap between apollo-cache-inmemory and https://github.com/convoyinc/apollo-cache-hermes, at least as far as cache reads are concerned, without sacrificing exactness.

Cache write performance should also benefit dramatically, since much of the cost of writing to the cache comes from broadcasting new results for existing queries, which requires first rereading those results from the updated cache.

Along the way, I have taken many opportunities to refactor and simplify the apollo-cache-inmemory code. For example, the first few commits in this PR eliminate the use of graphql-anywhere to read from the local store, which unlocks a number of optimization opportunities by removing a relatively opaque layer of abstraction.

I will try to add comments to the commits below to highlight areas of special interest.

@meteor-bot

commented May 3, 2018

Warnings
⚠️

❗️ Big PR

Generated by 🚫 dangerJS

const args = argumentsObjectFromField(field, variables);

const info: ExecInfo = {
resultKey: resultKeyNameFromField(field),

@stubailo

stubailo May 3, 2018

Member

I'm curious if this would be faster if we cached resultKey as a property on field.
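One hypothetical way to get that caching without mutating the AST node is a `WeakMap` keyed by the field (a sketch; the real `FieldNode` type comes from graphql-js and is simplified here):

```typescript
// Sketch of the suggestion above: compute the result key once per
// FieldNode and memoize it. A WeakMap gives the same O(1) lookup as a
// property on the node, without modifying the parsed document.
interface FieldNode {
  name: { value: string };
  alias?: { value: string };
}

const resultKeys = new WeakMap<FieldNode, string>();

function resultKeyNameFromField(field: FieldNode): string {
  let key = resultKeys.get(field);
  if (key === undefined) {
    key = field.alias ? field.alias.value : field.name.value;
    resultKeys.set(field, key);
  }
  return key;
}

const field: FieldNode = { name: { value: "user" }, alias: { value: "me" } };
console.log(resultKeyNameFromField(field)); // "me"
```

This relies on the parsed document (and its field nodes) being reused across reads, which the document-caching work later in this PR helps guarantee.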

@clayne11

Contributor

commented May 4, 2018

This is a fantastic idea. I brought up this issue as it pertains to rendering in #2895 but this will also solve other performance problems. Great work!

return diffQueryAgainstStore({
store: this.config.storeFactory(this.extract(query.optimistic)),
store: store,

@bbrzoska

bbrzoska May 10, 2018

`: store` can be omitted (object property shorthand).

options?: OptimisticWrapOptions,
): OptimisticWrapperFunction<T>;
defaultMakeCacheKey(...args: any[]): any;
} = require('optimism'); // tslint:disable-line

@bbrzoska

bbrzoska May 10, 2018

Why require instead of import + ambient type declaration file for optimism?

// we should only merge if it's an object of the same type
// otherwise, we should delete the generated object
if (typenameChanged) {
store.delete(generatedKey);
store.delete(escapedId.id);

@bbrzoska

bbrzoska May 10, 2018

Should probably port this fix too:

// remove the old generated value in case the old value was
// inlined and the new value is not, which is indicated by
// the old id being generated and the new id being real
if (!generated) {
store.delete(generatedKey);
}

@benjamn benjamn force-pushed the benjamn/cache-result-objects-with-optimism branch 3 times, most recently from 35f549f to 4d5a851 May 12, 2018

public hasDepTrackingCache() {
return this.data instanceof DepTrackingCache;
}

protected broadcastWatches() {
// Skip this when silenced (like inside a transaction)
if (this.silenceBroadcast) return;

// right now, we invalidate all queries whenever anything changes

@clayne11

clayne11 May 18, 2018

Contributor

Probably want to remove this comment since it's no longer true.

@jamesreggio

Contributor

commented May 21, 2018

I tried to give this branch a try tonight, but ran into some issues.

First, I encountered the problem I described over here: #3300 (comment)

After patching my way around this issue, I ran into an infinite recursion bug in the merge helper function. It wasn't clear what the problem was, and reverting the changes to merge from #3300 didn't fix the issue.

I'll try to pull together a Gist with a query and payload so you can reproduce it.

@jamesreggio

Contributor

commented May 21, 2018

Alright, here's a repro: https://gist.github.com/jamesreggio/eedd17511a3d64d1ba1613cbc08d78c5

It includes the original GraphQL document + variables, the parsed GraphQL document, the resulting data from the server, and the error.

I have a hunch that the changes to merge in #3300 led to this issue, but it's quite onerous to cut builds of individual packages within the apollo-client repo for use in another project, so I haven't tried reverting those changes wholesale to see if that resolves the problem. Perhaps you can give that a try?

@hwillson hwillson changed the title Optimize repeated apollo-cache-inmemory reads by caching partial query results. [WIP] Optimize repeated apollo-cache-inmemory reads by caching partial query results. May 29, 2018

@clayne11

Contributor

commented Jun 4, 2018

Any movement on this? This is an incredibly exciting improvement.

@benjamn benjamn force-pushed the benjamn/cache-result-objects-with-optimism branch from 4d5a851 to 0cc85c1 Jun 5, 2018

benjamn added a commit that referenced this pull request Jun 5, 2018
Completely remove unnecessary ID_KEY Symbol.
Restoring non-enumerability of the ID_KEY Symbol in #3544 made ID_KEY
slightly more hidden from application code, at the cost of slightly worse
performance (because of Object.defineProperty), but tests were still
broken because Jest now includes Symbol keys when checking object equality
(even the non-enumerable ones).

Fortunately, given all the previousResult refactoring that has happened in
PR #3394, we no longer need to store ID_KEY properties at all, which
completely side-steps the question of whether ID_KEY should be enumerable
or not, and avoids any problems due to Jest including Symbol keys when
checking deep equality.

If we decide to bring this ID metadata back in the future, we could use a
WeakMap to associate result objects with their IDs, so that we can avoid
modifying the result objects.
@benjamn benjamn referenced this pull request Jun 6, 2018
benjamn added a commit that referenced this pull request Jun 6, 2018
Cache query documents transformed by InMemoryCache.
After #3444 removed `Map`-based caching for `addTypenameToDocument` (in
order to fix memory leaks), the `InMemoryCache#transformDocument` method
now creates a completely new `DocumentNode` every time it's called
(assuming this.addTypename is true, which it is by default).

This commit uses a `WeakMap` to cache calls to `addTypenameToDocument` in
`InMemoryCache#transformDocument`, so that repeated cache reads will no
longer create an unbounded number of new `DocumentNode` objects. The
benefit of the `WeakMap` is that it does not prevent its keys (the
original `DocumentNode` objects) from being garbage collected, which is
another way of preventing memory leaks.  Note that `WeakMap` may have to
be polyfilled in older browsers, but there are many options for that.

This optimization will be important for #3394, since the query document is
involved in cache keys used to store cache partial query results.

cc @hwillson @jbaxleyiii @brunorzn
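A minimal sketch of the `WeakMap`-based memoization this commit message describes (shapes simplified; the real transform rewrites selection sets to add `__typename`):

```typescript
// Sketch of caching transformDocument results in a WeakMap. Keys are held
// weakly, so original DocumentNode objects can still be garbage collected,
// which avoids reintroducing the memory leaks fixed in #3444.
type DocumentNode = { kind: "Document"; definitions: unknown[] };

const transformCache = new WeakMap<DocumentNode, DocumentNode>();

function addTypenameToDocument(doc: DocumentNode): DocumentNode {
  // Stand-in for the real transform, which adds __typename to selections.
  return { ...doc };
}

function transformDocument(doc: DocumentNode): DocumentNode {
  let transformed = transformCache.get(doc);
  if (!transformed) {
    transformed = addTypenameToDocument(doc);
    transformCache.set(doc, transformed);
  }
  return transformed;
}

const doc: DocumentNode = { kind: "Document", definitions: [] };
console.log(transformDocument(doc) === transformDocument(doc)); // true
```

Repeated reads now reuse one transformed document, which matters here because the transformed document participates in the cache keys for partial query results.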
benjamn added a commit that referenced this pull request Jun 6, 2018
Cache query documents transformed by InMemoryCache. (#3553)

@benjamn benjamn force-pushed the benjamn/cache-result-objects-with-optimism branch from 0cc85c1 to 8b2ab9b Jun 6, 2018

@benjamn benjamn merged commit 128f329 into master Sep 28, 2018

7 of 8 checks passed

bundlesize: ./packages/apollo-cache-inmemory/lib/bundle.min.js is 6.72KB, over the 5.25KB max (gzip)
CLA: Author has signed the Meteor CLA.
ci/circleci: Apollo Client Monorepo: tests passed on CircleCI
ci/circleci: Danger: tests passed on CircleCI
ci/circleci: Docs: tests passed on CircleCI
ci/circleci: Filesize: tests passed on CircleCI
codecov/project: 89.19% (target 80%)
deploy/netlify: deploy preview ready
benjamn added a commit that referenced this pull request Sep 28, 2018
Bump bundle size limit for apollo-cache-inmemory.
The performance gains in #3394 required changes like eliminating the
graphql-anywhere dependency in favor of a local implementation, which
naturally increased the bundle size of apollo-cache-inmemory. We are
confident that the benefits outweigh the costs.
benjamn added a commit that referenced this pull request Sep 28, 2018
@stubailo

Member

commented Sep 28, 2018

Holy crap this is in!!!

@mxstbr

Contributor

commented Sep 28, 2018

🎉 🎉 🎉 🎉 🎉

@audiolion


commented Sep 28, 2018

Hey, I'm excited about this PR, but I have some concerns about the write (8 observers, partial update) benchmark, which is so much slower than apollo-cache-inmemory@1.2.x.

Did we just trade faster read times for slower write times?

@benjamn

Contributor Author

commented Sep 28, 2018

@audiolion Good question! I've also noticed that benchmark is surprisingly slow, and honestly I haven't had a chance to dig into it. From some initial profiling, though, I can say that most of the time is spent reading from the cache (apparently without much benefit from caching) rather than writing to it. So the fix is likely to be making better use of the new caching system, rather than improving write performance, though I'm sure there's room for improvement on that front as well!

All the yellow/tan/brown stuff is reading, and just the narrow violet columns are updating the cache:

[screenshot: profiler timeline, 2018-09-28]

The light blue / green stuff at the beginning appears to be from _.merge, which is out of our control, since it's part of the benchmark itself.

@benjamn

Contributor Author

commented Oct 3, 2018

Since the release, we've seen a handful of bugs that led us to remove the latest npm tag from version 1.3.0 and point it back at 1.2.10. I'm working on fixing those bugs in #3964, which you can help test by running npm install apollo-cache-inmemory@next.

benjamn added a commit that referenced this pull request Oct 3, 2018
Work around limitations of React Native Map/Set polyfills.
As explained here, with astonishing honesty:
https://github.com/facebook/react-native/blob/98a6f19d7c41e1d4fd3c04d07bb2c4e627f21724/Libraries/vendor/core/Map.js#L44-L50

As reported most recently here:
apollographql/react-apollo#2442 (comment)

For the record, I have previously complained that these polyfills are
simply buggy and should be replaced:
#3394 (comment)

While that is still true, this workaround should moot the debate. 🐮
emmenko pushed a commit to commercetools/merchant-center-application-kit that referenced this pull request Oct 8, 2018
chore(package.json): revert `apollo-cache-inmemory`, lock version to …
…1.2.10 (#5557)

#### Summary

- We discovered a bug on staging where `readFragment` returns the same object for two distinct queries (based on ID).
- When downgrading `apollo-cache-inmemory`, we could see that the bug is not reproducible. Tracking down the changelog led us to this PR: apollographql/apollo-client#3394
- Reverting the changes from #5548.

This is definitely a bug on their end, and we ought to follow up with a bug report.
@benjamn

Contributor Author

commented Oct 8, 2018

Ok, now that #3964 has been merged, apollo-cache-inmemory@1.3.5 is now the latest version on npm again. If you previously reverted to 1.2.x, please try updating to 1.3.5 when you have the chance!

@mxstbr mxstbr deleted the benjamn/cache-result-objects-with-optimism branch Oct 9, 2018

@laukaichung


commented Oct 16, 2018

Has anyone run into this problem in Linux Chromium after upgrading to 1.3.*? See #3992: if the stored object is an array, the component fails to trigger a re-render when a new item is pushed onto the array.

@ctretyak


commented Oct 20, 2018

@benjamn Hi! If I change something in an existing array item and write it with client.writeQuery, my Query component doesn't re-render. What should I do in that situation?

benjamn added a commit that referenced this pull request Oct 20, 2018
Avoid fallible previousResult === newData.result check.
Should help fix #3992.

As #3992 (and specifically the reproduction that @OurMajesty provided at
https://codesandbox.io/s/q7rkm6y1ow) demonstrates, since the InMemoryCache
may return the same object for multiple reads (when the data have not
changed), the previousResult is often exactly (===) the same object as
newData.result, which was causing the code removed by this commit to
return early instead of broadcasting the watch.

This early return was unsafe, because the contents of the object may have
changed since they were previously broadcast, so we actually do need to
report those modifications, even though the object reference is the same.

In other words, `previousResult === newData.result` does not imply
"nothing has changed," as I mistakenly assumed in this discussion:
#3394 (comment)

In the longer term, I would like to eliminate the previousResult logic
entirely, since it seems redundant now that we can precisely track changes
to the store, but for now it's important for backwards compatibility.
Specifically, previousResult still provides a way to prefer the previous
object if it is structurally identical to the new object, though that
preference is moot when `previousResult === newData.result`.

The previousResult object should never be used as a basis for skipping
broadcasts. Only the application developer can decide whether === object
identity means it's safe to skip rendering, likely in the context of a
larger design which treats cache results as immutable data. The job of the
InMemoryCache is to make no unsound assumptions, and broadcast results
whenever they may have changed.
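A contrived sketch of the pitfall this commit message describes (illustrative only; it compresses the real read/broadcast flow into one shared object):

```typescript
// Contrived sketch: a cache that hands out one shared result object can
// return the identical reference across reads even after its contents
// change, so an identity check alone cannot prove that nothing needs to
// be re-broadcast.
const sharedResult = { todos: ["buy milk"] };

const previousResult = sharedResult;  // captured at the last broadcast
sharedResult.todos.push("walk dog");  // store contents change in place
const newResult = sharedResult;       // the next read reuses the same object

console.log(previousResult === newResult); // true: identity unchanged
console.log(newResult.todos.length);       // 2: contents did change
```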
benjamn added a commit that referenced this pull request Mar 22, 2019
Centralize WeakMap feature detection in apollo-utilities.
Not all environments where WeakMap must be polyfilled do so reliably:
#3394 (comment)