Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Eliminate "generated" cache IDs to avoid normalizing objects without meaningful IDs. #5146

Merged
merged 38 commits into from
Aug 9, 2019

Conversation

benjamn
Copy link
Member

@benjamn benjamn commented Aug 7, 2019

As part of its normalization strategy, the InMemoryCache has historically generated fake IDs for unidentified objects, based on the object's path within the query, such as ROOT_QUERY.books.0.author. These fake IDs would appear alongside actual entity IDs (such as those returned by dataIdFromObject) in the normalization map. Whenever an unidentified object was about to be replaced by an identified entity, the existing data would (sometimes) be merged with the actual entity data, and the generated object would be deleted from the cache.

Although this implementation strategy made some things easier—because we could assume any object had an ID, even if it wasn't really a proper entity—it also increased the size of the normalized cache and worsened its performance, thanks to the indirection of the fake IDs.

A more natural alternative would be simply to store unidentified data (arrays, objects without IDs, whatever) within its parent object, similar to scalar field values, without any normalization. In this strategy, the cache uses ID references only to refer to normalized entity objects, rather than abusing IDs to store generic data.

This was a significant refactoring, and I took extreme care to split the work up into meaningful commits, with apollo-cache-inmemory tests passing at every step of the way (with some tweaks, of course). I was not always confident that such a path existed, yet here it is. The end result is a much more compact and understandable internal representation of normalized data in the cache, with all the same test coverage we had before. If I'd rewritten this code from scratch, I would also have needed to write a ton of new tests, with little intuition about backwards compatibility.

On top of all that, this PR reduces the size of the apollo-cache-inmemory package by a few hundred bytes, and paves the way for more configurable normalization logic, thanks to commits like feb9f36 and c87f447.

This will be a backwards-incompatible change if your code depends on the precise internal representation of normalized data in the cache. The use of Reference instances instead of IdValue objects (0af6233) may also complicate JSON serialization, though I have some ideas for how to serialize custom data types without imposing new requirements on persistence APIs like IndexedDB or localStorage. See 01e0cc5 where the Reference class was replaced with a { __ref: string } interface type.

We will need to figure out how to serialize and deserialize these
Reference instances, but that can come later.
According to the removed comment in writeToStore.ts, the only reason for
escaping arbitrary opaque data as JSON was to avoid potential confusion
with IdValue objects, but now that we use makeReference and isReference
everywhere, there is no risk of that confusion, so we can just store the
data directly.
The tests changed in this commit were mistaken, and have been mistaken for
a long time, because they used generated IDs that did not start with a $
character. Ultimately, we want to eliminate the concept of generated IDs,
but it's worth fixing existing tests in the meantime.
This paves the way for much more sophisticated cache reconciliation logic,
configurable on a per-type, per-field basis.
@benjamn benjamn added this to the Release 3.0 milestone Aug 7, 2019
@benjamn benjamn self-assigned this Aug 7, 2019
benjamn added a commit that referenced this pull request Aug 7, 2019
These changes really show off the improvement that comes from inlining
non-entity data, rather than generating fake IDs for such data.
@benjamn benjamn mentioned this pull request Aug 7, 2019
31 tasks
@benjamn benjamn force-pushed the eliminate-generated-cache-ids branch from b7554e0 to 3adb970 Compare August 7, 2019 23:16
After originally writing this test, I moved the shallow copying logic
inside this.merge, which now always copies its first argument (unless that
argument is a previous copy, in which case it remains untouched).
Copy link
Contributor

@jbaxleyiii jbaxleyiii left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@benjamn it is incredibly exciting to see that there is a concrete path forward towards our new store design within the existing test suite and codebase. The work here is excellent and I suspect will clean up an entire class of 1) store bugs and 2) performance bottlenecks especially for teams with lots of data that doesn't need normalization

let typename: string;
if (isReference(objectOrReference)) {
object = execContext.contextValue.store.get(objectOrReference.id);
typename =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if we currently handle this case, but the root query type doesn't have to have a typename called Query. So object.__typename could be something like RootQuery here. I guess it doesn't matter given we force it to be called Query at the root, but it does mean that further updates to the root type probably won't be tracked:

schema {
  query: RootQuery 
}

type RootQuery {
  siteName: String
}

type Mutation {
  updateSiteName(siteName: String): RootQuery 
}

query GetSiteName {
  siteName
}

# then later
mutation updateSiteName {
  updateSiteName(name: "Ben's awesome site") {
    __typename
    siteName
  }
}

The mutation would return a payload here of { data: { updateSiteName: { __typename: "RootQuery", siteName: "Ben's awesome site" } } } which should ideally overwrite the original store data of { data: { __typename: "RootQuery", siteName: "Ben's site" } } but I don't think this behaviour is supported?

This may be out of the scope of this PR (most likely is!) but if we are adjusting our normalization strategy to try and eliminate non entity normalization (YAY), the root query still feels like an interesting point since it is an entity by some regards (always the entry point so fields are stable to that object) but not in others, not referenceable by a set of primary keys, only the fact that there can only be one root "Query" type.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that we should get this right. If someone names the root query RootQuery and then wants to define per-field policies for that type, they should be able to call it RootQuery and not Query in their configuration.

I'll see if I can remove the Query assumption, though I suspect we will need to start adding the __typename field to the root query fields, like we do for nested selection sets.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jbaxleyiii After playing with this a bit, I think I'd like to split it out into a separate PR, building on this one.

packages/apollo-cache-inmemory/src/readFromStore.ts Outdated Show resolved Hide resolved
packages/apollo-cache-inmemory/src/references.ts Outdated Show resolved Hide resolved
packages/apollo-cache-inmemory/src/references.ts Outdated Show resolved Hide resolved
packages/apollo-cache-inmemory/src/writeToStore.ts Outdated Show resolved Hide resolved
packages/apollo-cache-inmemory/src/writeToStore.ts Outdated Show resolved Hide resolved
Copy link
Member

@hwillson hwillson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@benjamn This all looks incredible! I've been testing things out in a few sample apps, throwing a few curve balls here and there, and everything is working as expected. And wow, does this ever make a difference when inspecting cache contents! A big 👍 from me - thanks for tackling this!

@benjamn benjamn merged commit 18c266d into release-3.0 Aug 9, 2019
benjamn added a commit that referenced this pull request Aug 9, 2019
As @jbaxleyiii pointed out in this comment, the root query and mutation
types do not necessarily have to be called "Query" or "Mutation", and the
only way to find their real names is to ask for the __typename property:
#5146 (comment)
StephenBarlow pushed a commit that referenced this pull request Oct 1, 2019
These changes really show off the improvement that comes from inlining
non-entity data, rather than generating fake IDs for such data.
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Feb 17, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants