Eliminate "generated" cache IDs to avoid normalizing objects without meaningful IDs. #5146

benjamn · 2019-08-07T21:23:01Z

As part of its normalization strategy, the InMemoryCache has historically generated fake IDs for unidentified objects, based on the object's path within the query, such as ROOT_QUERY.books.0.author. These fake IDs would appear alongside actual entity IDs (such as those returned by dataIdFromObject) in the normalization map. Whenever an unidentified object was about to be replaced by an identified entity, the existing data would (sometimes) be merged with the actual entity data, and the generated object would be deleted from the cache.

Although this implementation strategy made some things easier—because we could assume any object had an ID, even if it wasn't really a proper entity—it also increased the size of the normalized cache and worsened its performance, thanks to the indirection of the fake IDs.

A more natural alternative would be simply to store unidentified data (arrays, objects without IDs, whatever) within its parent object, similar to scalar field values, without any normalization. In this strategy, the cache uses ID references only to refer to normalized entity objects, rather than abusing IDs to store generic data.
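To make the difference concrete, here is a minimal sketch of the two layouts for the same query result. The `IdValue` shape is a simplified assumption about the old representation, the `idValue` helper and the sample data (`viewer`, `books`, `User:1`) are hypothetical, and the real internal types in apollo-cache-inmemory may differ.

```typescript
// Simplified, assumed shape of the old ID value objects.
interface IdValue { type: "id"; id: string; generated: boolean }
// The reference shape this PR ends up with (see 01e0cc5).
interface Reference { __ref: string }

type Fields = Record<string, unknown>;

// Hypothetical helper, only used to keep the example short.
const idValue = (id: string, generated: boolean): IdValue =>
  ({ type: "id", id, generated });

// Before: unidentified objects get path-based fake IDs and their own entries.
const before: Record<string, Fields> = {
  ROOT_QUERY: {
    viewer: idValue("User:1", false),
    books: [idValue("ROOT_QUERY.books.0", true)],
  },
  "User:1": { __typename: "User", id: 1, name: "Ada" },
  "ROOT_QUERY.books.0": {
    title: "Example Title",
    author: idValue("ROOT_QUERY.books.0.author", true),
  },
  "ROOT_QUERY.books.0.author": { name: "Example Author" },
};

// After: only real entities are normalized; everything else is stored inline.
const viewerRef: Reference = { __ref: "User:1" };
const after: Record<string, Fields> = {
  ROOT_QUERY: {
    viewer: viewerRef,
    books: [{ title: "Example Title", author: { name: "Example Author" } }],
  },
  "User:1": { __typename: "User", id: 1, name: "Ada" },
};

console.log(Object.keys(before).length, Object.keys(after).length); // 4 2
```

The top-level map shrinks from four entries to two, and reading `books[0].author` no longer requires two extra lookups.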

This was a significant refactoring, and I took extreme care to split the work up into meaningful commits, with apollo-cache-inmemory tests passing at every step of the way (with some tweaks, of course). I was not always confident that such a path existed, yet here it is. The end result is a much more compact and understandable internal representation of normalized data in the cache, with all the same test coverage we had before. If I'd rewritten this code from scratch, I would also have needed to write a ton of new tests, with little intuition about backwards compatibility.

On top of all that, this PR reduces the size of the apollo-cache-inmemory package by a few hundred bytes, and paves the way for more configurable normalization logic, thanks to commits like feb9f36 and c87f447.

This will be a backwards-incompatible change if your code depends on the precise internal representation of normalized data in the cache. The use of Reference instances instead of IdValue objects (0af6233) may also complicate JSON serialization, though I have some ideas for how to serialize custom data types without imposing new requirements on persistence APIs like IndexedDB or localStorage. See 01e0cc5 where the Reference class was replaced with a { __ref: string } interface type.
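The serialization concern can be sketched as follows. The names `makeReference` and `isReference` come from this PR, but the bodies below are my own guess at a minimal implementation, and `ClassReference` is a hypothetical stand-in for the earlier class-based approach.

```typescript
// Plain-object reference, as in the { __ref: string } interface type.
interface Reference { readonly __ref: string }

function makeReference(id: string): Reference {
  return { __ref: id };
}

function isReference(value: unknown): value is Reference {
  return (
    typeof value === "object" &&
    value !== null &&
    typeof (value as { __ref?: unknown }).__ref === "string"
  );
}

// Hypothetical class-based reference, for comparison only.
class ClassReference {
  id: string;
  constructor(id: string) {
    this.id = id;
  }
}

// A plain { __ref } object survives a JSON round trip unchanged...
const plainRoundTrip = JSON.parse(JSON.stringify(makeReference("Book:1")));
console.log(isReference(plainRoundTrip)); // true

// ...while a class instance loses its prototype, so instanceof checks fail.
const classRoundTrip = JSON.parse(JSON.stringify(new ClassReference("Book:1")));
console.log(classRoundTrip instanceof ClassReference); // false
```

With the plain-object form, persistence layers that rely on JSON or structured cloning need no custom revival step.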


According to the removed comment in writeToStore.ts, the only reason for escaping arbitrary opaque data as JSON was to avoid potential confusion with IdValue objects, but now that we use makeReference and isReference everywhere, there is no risk of that confusion, so we can just store the data directly.


The tests changed in this commit were mistaken, and have been mistaken for a long time, because they used generated IDs that did not start with a $ character. Ultimately, we want to eliminate the concept of generated IDs, but it's worth fixing existing tests in the meantime.


jbaxleyiii

@benjamn it is incredibly exciting to see that there is a concrete path forward towards our new store design within the existing test suite and codebase. The work here is excellent, and I suspect it will clean up an entire class of (1) store bugs and (2) performance bottlenecks, especially for teams with lots of data that doesn't need normalization.


jbaxleyiii · 2019-08-07T23:54:15Z

packages/apollo-cache-inmemory/src/readFromStore.ts

+    let typename: string;
+    if (isReference(objectOrReference)) {
+      object = execContext.contextValue.store.get(objectOrReference.id);
+      typename =


I don't know if we currently handle this case, but the root query type doesn't have to have a typename called Query. So object.__typename could be something like RootQuery here. I guess it doesn't matter given we force it to be called Query at the root, but it does mean that further updates to the root type probably won't be tracked:

    schema {
      query: RootQuery
    }

    type RootQuery {
      siteName: String
    }

    type Mutation {
      updateSiteName(siteName: String): RootQuery
    }

    query GetSiteName {
      siteName
    }

    # then later
    mutation updateSiteName {
      updateSiteName(siteName: "Ben's awesome site") {
        __typename
        siteName
      }
    }

The mutation would return a payload here of { data: { updateSiteName: { __typename: "RootQuery", siteName: "Ben's awesome site" } } } which should ideally overwrite the original store data of { data: { __typename: "RootQuery", siteName: "Ben's site" } } but I don't think this behaviour is supported?

This may be out of scope for this PR (most likely is!), but if we are adjusting our normalization strategy to eliminate non-entity normalization (YAY), the root query still feels like an interesting case. In some regards it is an entity: it is always the entry point, so its fields are stable relative to that object. In others it is not: it can't be referenced by a set of primary keys, only by the fact that there can be just one root "Query" type.
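The concern above can be illustrated with a toy store. This is a hedged sketch, not the cache's actual write path: `dataId`, `writeField`, and the `ROOT_MUTATION` key are assumptions made for the example, and the only rule modeled is "objects without an ID are stored inline under their parent".

```typescript
type Obj = Record<string, unknown>;

// Toy normalized store, seeded with the result of GetSiteName.
const store: Record<string, Obj> = {
  ROOT_QUERY: { siteName: "Ben's site" },
};

// Hypothetical ID function: an object is an entity only if it has
// both a __typename and an id.
function dataId(obj: Obj): string | undefined {
  return typeof obj.__typename === "string" && obj.id !== undefined
    ? `${obj.__typename}:${String(obj.id)}`
    : undefined;
}

// Hypothetical write step: entities are normalized, everything else is inlined.
function writeField(parentId: string, field: string, value: Obj): void {
  const parent = store[parentId] || (store[parentId] = {});
  const id = dataId(value);
  if (id !== undefined) {
    store[id] = { ...store[id], ...value };
    parent[field] = { __ref: id };
  } else {
    parent[field] = value;
  }
}

// The mutation payload has __typename "RootQuery" but no id, so it is
// stored inline under the mutation root and never reaches ROOT_QUERY.
writeField("ROOT_MUTATION", "updateSiteName", {
  __typename: "RootQuery",
  siteName: "Ben's awesome site",
});

console.log(store.ROOT_QUERY.siteName); // "Ben's site"
```

In this model the root query data goes stale unless the write step somehow learns that `RootQuery` names the root query type.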

I agree that we should get this right. If someone names the root query RootQuery and then wants to define per-field policies for that type, they should be able to call it RootQuery and not Query in their configuration.

I'll see if I can remove the Query assumption, though I suspect we will need to start adding the __typename field to the root query fields, like we do for nested selection sets.

@jbaxleyiii After playing with this a bit, I think I'd like to split it out into a separate PR, building on this one.


hwillson

@benjamn This all looks incredible! I've been testing things out in a few sample apps, throwing a few curve balls here and there, and everything is working as expected. And wow, does this ever make a difference when inspecting cache contents! A big 👍 from me - thanks for tackling this!


As @jbaxleyiii pointed out in this comment, the root query and mutation types do not necessarily have to be called "Query" or "Mutation", and the only way to find their real names is to ask for the __typename property: #5146 (comment)


benjamn added 20 commits August 7, 2019 13:31

Use Reference instances instead of duck-typing IdValue objects.

0af6233

We will need to figure out how to serialize and deserialize these Reference instances, but that can come later.

Batch-update StoreObject fields in writeSelectionSetToStore.

e6d6b88

Pass storeObject into writeFieldToStore.

dad47a3

Move bulk of writeSelectionSetToStore into processSelectionSet.

654e6c7

Pass storeObject into processSelectionSet.

80c1cdb

Stop calling writeSelectionSetToStore from processSelectionSet.

9dcb034

Stop using dataId in processSelectionSet except to call writeFieldToS…

d2f1fc5

…tore.

Move generated ID logic from writeFieldToStore to writeSelectionSetTo…

10f5012

…Store.

Move bulk of writeFieldToStore into processFieldValue method.

b6ac62c

Inline private writeFieldToStore method.

6a1aeae

Stop prepending $ to generated cache IDs.

b64d253

Allow reading non-IdValue objects from the cache.

62ba47d

Unpack first element of arrays when asserting selectionSet.

26d9d75

Use defaultDataIdFromObject when calling writeResultToStore directly.

085b43a

Remove the concept of generated IDs from the cache.

315d510

Disallow creating generated Reference instances.

78aaa59

Make mergeDeep support custom reconciliation functions.

feb9f36

Reimplement mergeStoreObjects using DeepMerger reconciliation function.

c87f447

This paves the way for much more sophisticated cache reconciliation logic, configurable on a per-type, per-field basis.
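As a rough illustration of what a merge with custom reconciliation might look like, here is a minimal sketch. It is not the `DeepMerger` or `mergeDeep` code from these commits: the `mergeDeepWith` function, the `FieldReconciler` type, and the per-field `Map` are all hypothetical names chosen for the example.

```typescript
type Obj = Record<string, unknown>;
// Hypothetical reconciler: decides the merged value for one conflicting field.
type FieldReconciler = (existing: unknown, incoming: unknown) => unknown;

const isPlainObject = (value: unknown): value is Obj =>
  typeof value === "object" && value !== null && !Array.isArray(value);

const hasOwn = (obj: Obj, key: string): boolean =>
  Object.prototype.hasOwnProperty.call(obj, key);

function mergeDeepWith(
  existing: Obj,
  incoming: Obj,
  reconcilers: Map<string, FieldReconciler> = new Map(),
): Obj {
  // Shallow-copy the first argument so neither input is mutated.
  const result: Obj = { ...existing };
  for (const key of Object.keys(incoming)) {
    const existingValue = existing[key];
    const incomingValue = incoming[key];
    const custom = reconcilers.get(key);
    if (custom && hasOwn(existing, key)) {
      // A custom reconciler wins whenever both sides define the field.
      result[key] = custom(existingValue, incomingValue);
    } else if (isPlainObject(existingValue) && isPlainObject(incomingValue)) {
      result[key] = mergeDeepWith(existingValue, incomingValue, reconcilers);
    } else {
      // Default policy: incoming data replaces existing data.
      result[key] = incomingValue;
    }
  }
  return result;
}

// Usage: concatenate paginated items instead of replacing them.
const existingData: Obj = { feed: { items: [1, 2], cursor: "a" } };
const incomingData: Obj = { feed: { items: [3], cursor: "b" } };
const reconcilers = new Map<string, FieldReconciler>([
  ["items", (e, i) => [...(e as unknown[]), ...(i as unknown[])]],
]);
const merged = mergeDeepWith(existingData, incomingData, reconcilers);
console.log(JSON.stringify(merged)); // {"feed":{"items":[1,2,3],"cursor":"b"}}
```

Keying reconcilers by field name alone is a simplification; a per-type, per-field configuration would also need the parent object's `__typename`.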

benjamn added 🐎 performance 👩‍🏭 refactor 📦 bundle size 🧞‍♂️ enhancement labels Aug 7, 2019

benjamn added this to the Release 3.0 milestone Aug 7, 2019

benjamn requested review from hwillson and jbaxleyiii August 7, 2019 21:23

benjamn self-assigned this Aug 7, 2019

benjamn added a commit that referenced this pull request Aug 7, 2019

Update Jest snapshots in apollo-client tests after PR #5146.

b7554e0

These changes really show off the improvement that comes from inlining non-entity data, rather than generating fake IDs for such data.

benjamn mentioned this pull request Aug 7, 2019

Release 3.0 #5116

Merged

31 tasks

benjamn added 5 commits August 7, 2019 19:03

Export makeReference and isReference from apollo-cache-inmemory.

58f297b

Eliminate redundant StoreWriter#writeResultToStore method.

8c7e1a4

Simplify isDataProcessed logic using Set<FieldNode>.

753a606

Fix apollo-client tests that depend on internal cache representation.

906c4bf

Update Jest snapshots in apollo-client tests after PR #5146.

3adb970

These changes really show off the improvement that comes from inlining non-entity data, rather than generating fake IDs for such data.

benjamn force-pushed the eliminate-generated-cache-ids branch from b7554e0 to 3adb970 Compare August 7, 2019 23:16

benjamn commented Aug 7, 2019

packages/apollo-client/src/__tests__/ApolloClient.ts

benjamn added 2 commits August 7, 2019 19:39

Tighten bundle size limits following recent improvements.

75526a3

Stop calling shallowCopyForMerge in DeepMerger test.

29699ce

After originally writing this test, I moved the shallow copying logic inside this.merge, which now always copies its first argument (unless that argument is a previous copy, in which case it remains untouched).

jbaxleyiii reviewed Aug 8, 2019


benjamn added 8 commits August 8, 2019 12:49

Remove dataIdFromObject test that deliberately sidesteps TypeScript.

5ce3339

#5146 (comment)

Simplify Array.isArray case in processFieldValue.

ada838c

#5146 (comment)

Make better guarantees in assertSelectionSetForIdValue.

a263eef

#5146 (comment)

More (non-throwing) tests of assertSelectionSetForIdValue.

3c6b155

Make entity references trivially (de)serializable.

01e0cc5

Although the Reference class was convenient within a single runtime, it posed some unnecessary challenges for serialization and deserialization.

Fix apollo-client tests after Reference => { __ref } refactor.

a5ca898

Derive __typename from result objects more reliably.

1376882

Increase bundle size limit for apollo-utilities from 4.1 to 4.25KB.

8a2e678

hwillson approved these changes Aug 9, 2019


benjamn added 2 commits August 9, 2019 10:46

Rename references.ts to helpers.ts.

db9537a

Remove unused assertIdValue function.

054eb7b

benjamn merged commit 18c266d into release-3.0 Aug 9, 2019

benjamn mentioned this pull request Aug 9, 2019

Assume addTypename:true, but hide implicit __typename result fields. #5154

Closed

StephenBarlow pushed a commit that referenced this pull request Oct 1, 2019

Update Jest snapshots in apollo-client tests after PR #5146.

229fa92

These changes really show off the improvement that comes from inlining non-entity data, rather than generating fake IDs for such data.

StephenBarlow pushed a commit that referenced this pull request Oct 1, 2019

Remove dataIdFromObject test that deliberately sidesteps TypeScript.

f8df7a0

#5146 (comment)

StephenBarlow pushed a commit that referenced this pull request Oct 1, 2019

Simplify Array.isArray case in processFieldValue.

0de0f77

#5146 (comment)

StephenBarlow pushed a commit that referenced this pull request Oct 1, 2019

Make better guarantees in assertSelectionSetForIdValue.

3d5e43d

#5146 (comment)

michael-land mentioned this pull request Apr 10, 2020

3.0.0-beta.43 Missing cache result fields warning, cache not updated after mutation #6136

Closed

github-actions bot locked as resolved and limited conversation to collaborators Feb 17, 2023
