Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor QueryManager to make better use of observables and enforce fetchPolicy more reliably #6221

Merged
merged 86 commits into from
May 7, 2020

Conversation

benjamn
Copy link
Member

@benjamn benjamn commented Apr 30, 2020

Fixes #5991, #5790, and multiple other related issues (full list TBD).

This is a large pull request, representing about a month and a half of work, but the central idea is relatively simple: the most important job of the QueryManager is to tirelessly enforce the requested FetchPolicy (cache-first, cache-and-network, network-only, etc.) for each query that it manages. Previously, the QueryManager sometimes failed to fulfill this mission, because the logic for interpreting FetchPolicy values was scattered across the codebase in an ad hoc manner, rather than centralized in one place.

For example, the default cache-first fetch policy should reliably trigger a network request if the relevant cache data disappears or becomes incomplete (#5991), but the code that detected that incompleteness (queryListenerForObserver) was not responsible for triggering network requests, and giving it the ability to call fetchQuery would have required yet another one-off, decentralized interpretation of options.fetchPolicy.

The scattering of FetchPolicy logic made it very difficult for anyone (including me) to understand how each fetch policy works at a glance, adding risk to any changes that involved fetch policies.

Thanks to the changes in this PR, the logic for interpreting fetch policies has been centralized into a single switch statement, which I think reads almost as clearly as pseudocode:

switch (fetchPolicy) {
  default: case "cache-first": {
    const diff = readCache();

    if (diff.complete) {
      return [
        resultsFromCache(diff, queryInfo.markReady()),
      ];
    }

    if (returnPartialData) {
      return [
        resultsFromCache(diff),
        resultsFromLink(true),
      ];
    }

    return [
      resultsFromLink(true),
    ];
  }

  case "cache-and-network": {
    const diff = readCache();

    if (diff.complete || returnPartialData) {
      return [
        resultsFromCache(diff),
        resultsFromLink(true),
      ];
    }

    return [
      resultsFromLink(true),
    ];
  }

  case "cache-only":
    return [
      resultsFromCache(readCache(), queryInfo.markReady()),
    ];

  case "network-only":
    return [resultsFromLink(true)];

  case "no-cache":
    return [resultsFromLink(false)];

  case "standby":
    return [];
}

The arrays returned by this code contain zero, one, or two observables, derived from the cache and/or the network link. The observables are concatenated together to form a single observable (specifically, a Concast observable) whose results are then consumed by the ObservableQuery (with any adjacent duplicate results skipped, as usual). At any point in time, most often when cache data has changed, or the application specifically requested a refetch, the ObservableQuery can reobserve the cache and/or the network by invoking this same code again. In other words, this switch statement is now the beating heart of the client codebase, and no data flows through the client without honoring the requested FetchPolicy.

This new system (fetchQueryObservable) is much more Observable-oriented than the previous fetchQuery method was, and I see this shift as a huge improvement. One of the reasons the QueryManager was so complicated was that the old fetchQuery method could return only a single Promise<ApolloQueryResult<T>>, even though there was often more than one result to deliver, necessitating a side-delivery mechanism (broadcastQueries). By allowing fetchQueryObservable to return multiple results using an Observable, we no longer have to worry about delivering different results using different mechanisms, since all results flow through a straightforward Observable pipeline.

Since this is such a deep refactoring, I found it impossible to keep existing tests passing during the initial exploration. Once I was happy with the shape of the new system, I resumed running and debugging the tests, and rebased any necessary code changes back into my previous commits. There was never any hope of leaving all existing tests unchanged, but I think/hope the test changes I made (see commits towards the end of this PR) are justifiable. The whole point of this refactoring was to make Apollo Client do a better job of fetching data, and "better" sometimes means "different." In any case, if you see something I changed in the tests that worries you, please leave comments/questions in-line.

The most substantial commit is Replace fetch{Query,Request} with new fetchQueryObservable method (1ac5d8d). I regret that so much ended up in that commit, but it's also a commit whose parts don't work until you put them all together, so I'm not going to waste time trying to split it up at this point. For what it's worth, everything that comes before that commit is "safe" in the sense of keeping all existing tests passing.

I will leave some comments below to draw attention to any interesting/controversial changes.

@benjamn

This comment has been minimized.

@benjamn benjamn force-pushed the fetchQueryObservable-refactor branch 2 times, most recently from e79fc31 to 28653fb Compare April 30, 2020 16:05
@benjamn

This comment has been minimized.

@3nvi
Copy link

3nvi commented Apr 30, 2020

Tried npm-installing it through the PR branch and building it locally, but had some issues. Is there a cache we can get a preview version on NPM to test it (whenever it's ready/stable)?

@benjamn
Copy link
Member Author

benjamn commented Apr 30, 2020

@3nvi I've had some luck with running npm pack in the apollo-client/dist directory after a successful npm run build, and then doing

npm install ../path/to/apollo-client/dist/apollo-client-3.0.0-beta.*.tgz

to install that packaged .tgz file locally in other projects, but I will try to get a preview release published to npm today/soon!

@benjamn
Copy link
Member Author

benjamn commented May 1, 2020

Ok, I can reproduce some of the tests failures locally if I use Node 10, which is what our CircleCI job uses. Locally I'd been using Node 12.16.1. No theory yet on what the operative difference between the Node versions might be. Fixed below!

benjamn added a commit that referenced this pull request May 1, 2020
The use of setTimeout in tests is often an invitation to race conditions
and timing-sensitive outcomes.

This raciness became particularly problematic thanks to changes (between
Node 10 and Node 12) in the timing of Promise callbacks with respect to
setTimeout callbacks.

These changes also leave all tests passing when cherry-picked onto master,
so I'm confident I am *not* changing the tests just to fix my PR #6221, at
the expense of backwards compatibility.
src/react/data/QueryData.ts Outdated Show resolved Hide resolved
@benjamn benjamn marked this pull request as ready for review May 1, 2020 18:24
@darkbasic
Copy link

@benjamn I suggest using multiple node versions on CI, one per major release.

benjamn added a commit that referenced this pull request May 4, 2020
The use of setTimeout in tests is often an invitation to race conditions
and timing-sensitive outcomes.

This raciness became particularly problematic thanks to changes (between
Node 10 and Node 12) in the timing of Promise callbacks with respect to
setTimeout callbacks.

These changes also leave all tests passing when cherry-picked onto master,
so I'm confident I am *not* changing the tests just to fix my PR #6221, at
the expense of backwards compatibility.
@benjamn benjamn force-pushed the fetchQueryObservable-refactor branch from a6bc761 to 913e4d9 Compare May 4, 2020 16:37
benjamn added a commit that referenced this pull request May 4, 2020
The use of setTimeout in tests is often an invitation to race conditions
and timing-sensitive outcomes.

This raciness became particularly problematic thanks to changes (between
Node 10 and Node 12) in the timing of Promise callbacks with respect to
setTimeout callbacks.

These changes also leave all tests passing when cherry-picked onto master,
so I'm confident I am *not* changing the tests just to fix my PR #6221, at
the expense of backwards compatibility.
@benjamn benjamn force-pushed the fetchQueryObservable-refactor branch from 913e4d9 to dd86a71 Compare May 4, 2020 16:49
benjamn added a commit that referenced this pull request May 4, 2020
The use of setTimeout in tests is often an invitation to race conditions
and timing-sensitive outcomes.

This raciness became particularly problematic thanks to changes (between
Node 10 and Node 12) in the timing of Promise callbacks with respect to
setTimeout callbacks.

These changes also leave all tests passing when cherry-picked onto master,
so I'm confident I am *not* changing the tests just to fix my PR #6221, at
the expense of backwards compatibility.
@benjamn benjamn force-pushed the fetchQueryObservable-refactor branch from fefbe5d to 114566a Compare May 4, 2020 18:25
benjamn added a commit that referenced this pull request May 4, 2020
Now that the ObservableQuery consumes results exclusively through
QueryManager#fetchQueryObservable, we can rely on the addExportedVariables
call that happens there, and avoid potentially re-triggering a network
request while consuming network results.

@hwillson I was a bit surprised this change didn't cause any tests to
fail, but several tests did fail when I cherry-picked it onto master, so
I'm optimistic this addExportedVariables logic is genuinely redundant
after my refactoring PR #6221.
benjamn added a commit that referenced this pull request May 5, 2020
This version does *not* include my big PR #6221, which should be coming in
the _next_ beta release. Code changes in beta.45 (since beta.44) include

  * #6194 by @durchanek
  * #6107 by @hwillson

as well as several documentation PRs.
benjamn added a commit that referenced this pull request May 5, 2020
The use of setTimeout in tests is often an invitation to race conditions
and timing-sensitive outcomes.

This raciness became particularly problematic thanks to changes (between
Node 10 and Node 12) in the timing of Promise callbacks with respect to
setTimeout callbacks.

These changes also leave all tests passing when cherry-picked onto master,
so I'm confident I am *not* changing the tests just to fix my PR #6221, at
the expense of backwards compatibility.
benjamn added a commit that referenced this pull request May 7, 2020
Includes major fetchPolicy refactoring PR: #6221

Please help test this version if you can, because it changes/improves some
fundamental client behaviors.
benjamn added a commit that referenced this pull request May 16, 2020
The `cache.modify` API was first introduced in #5909 as a quick way to
transform the values of specific existing fields in the cache.

At the time, `cache.modify` seemed promising as an alternative to the
`readQuery`-transform-`writeQuery` pattern, but feedback has been mixed,
most importantly from our developer experience team, who helped me
understand why `cache.modify` would be difficult to learn and to teach.

While my refactoring in #6221 addressed concerns about broadcasting
`cache.modify` updates automatically, the bigger problem with
`cache.modify` is simply that it requires knowledge of the internal
workings of the cache, making it nearly impossible to explain to
developers who are not already Apollo Client 3 experts.

As much as I wanted `cache.modify` to be a selling point for Apollo Client
3, it simply wasn't safe to use without a firm understanding of concepts
like cache normalization, references, field identity, read/merge functions
and their options API, and the `options.toReference(object, true)` helper
function (#5970).

By contrast, the `readQuery`-transform-`writeQuery` pattern may be a bit
more verbose, but it has none of these problems, because these older
methods work in terms of GraphQL result objects, rather than exposing the
internal data format of the `EntityStore`.

Since `cache.modify` was motivated to some extent by vague power-user
performance concerns, it's worth noting that we recently made `writeQuery`
and `writeFragment` even more efficient when rewriting unchanged results
(#6274), so whatever performance gap there might have been between
`cache.modify` and `readQuery`/`writeQuery` should now be even less
noticeable.

One final reason that `cache.modify` seemed like a good idea was that
custom `merge` functions can interfere with certain `writeQuery` patterns,
such as deleting an item from a paginated list. If you run into trouble
with `merge` functions running when you don't want them to, we recommend
calling `cache.evict` before `cache.writeQuery` to ensure your `merge`
function won't be confused by existing field data when you write your
modified data back into the cache.
benjamn added a commit that referenced this pull request May 18, 2020
The `cache.modify` API was first introduced in #5909 as a quick way to
transform the values of specific existing fields in the cache.

At the time, `cache.modify` seemed promising as an alternative to the
`readQuery`-transform-`writeQuery` pattern, but feedback has been mixed,
most importantly from our developer experience team, who helped me
understand why `cache.modify` would be difficult to learn and to teach.

While my refactoring in #6221 addressed concerns about broadcasting
`cache.modify` updates automatically, the bigger problem with
`cache.modify` is simply that it requires knowledge of the internal
workings of the cache, making it nearly impossible to explain to
developers who are not already Apollo Client 3 experts.

As much as I wanted `cache.modify` to be a selling point for Apollo Client
3, it simply wasn't safe to use without a firm understanding of concepts
like cache normalization, references, field identity, read/merge functions
and their options API, and the `options.toReference(object, true)` helper
function (#5970).

By contrast, the `readQuery`-transform-`writeQuery` pattern may be a bit
more verbose, but it has none of these problems, because these older
methods work in terms of GraphQL result objects, rather than exposing the
internal data format of the `EntityStore`.

Since `cache.modify` was motivated to some extent by vague power-user
performance concerns, it's worth noting that we recently made `writeQuery`
and `writeFragment` even more efficient when rewriting unchanged results
(#6274), so whatever performance gap there might have been between
`cache.modify` and `readQuery`/`writeQuery` should now be even less
noticeable.

One final reason that `cache.modify` seemed like a good idea was that
custom `merge` functions can interfere with certain `writeQuery` patterns,
such as deleting an item from a paginated list. If you run into trouble
with `merge` functions running when you don't want them to, we recommend
calling `cache.evict` before `cache.writeQuery` to ensure your `merge`
function won't be confused by existing field data when you write your
modified data back into the cache.
benjamn added a commit that referenced this pull request May 18, 2020
The `cache.modify` API was first introduced in #5909 as a quick way to
transform the values of specific existing fields in the cache.

At the time, `cache.modify` seemed promising as an alternative to the
`readQuery`-transform-`writeQuery` pattern, but feedback has been mixed,
most importantly from our developer experience team, who helped me
understand why `cache.modify` would be difficult to learn and to teach.

While my refactoring in #6221 addressed concerns about broadcasting
`cache.modify` updates automatically, the bigger problem with
`cache.modify` is simply that it requires knowledge of the internal
workings of the cache, making it nearly impossible to explain to
developers who are not already Apollo Client 3 experts.

As much as I wanted `cache.modify` to be a selling point for Apollo Client
3, it simply wasn't safe to use without a firm understanding of concepts
like cache normalization, references, field identity, read/merge functions
and their options API, and the `options.toReference(object, true)` helper
function (#5970).

By contrast, the `readQuery`-transform-`writeQuery` pattern may be a bit
more verbose, but it has none of these problems, because these older
methods work in terms of GraphQL result objects, rather than exposing the
internal data format of the `EntityStore`.

Since cache.modify was motivated to some extent by vague power-user
performance concerns, it's worth noting that we recently made writeQuery
and writeFragment even more efficient when rewriting unchanged results (#6274),
so whatever performance gap there might have been between cache.modify
and readQuery/writeQuery should now be even less noticeable.

One final reason that `cache.modify` seemed like a good idea was that
custom `merge` functions can interfere with certain `writeQuery` patterns,
such as deleting an item from a paginated list. If you run into trouble
with `merge` functions running when you don't want them to, we recommend
calling `cache.evict` before `cache.writeQuery` to ensure your `merge`
function won't be confused by existing field data when you write your
modified data back into the cache.
benjamn added a commit that referenced this pull request Jun 10, 2020
This fixes an issue found by @darkbasic while working with optimistic
updates: #6183 (comment)

The cache-first FetchPolicy is important not just because it's the default
policy, but also because both cache-and-network and network-only turn into
cache-first after the first network request (#6353).

Once the cache-first policy is in effect for a query, any changes to the
cache that cause the query to begin reading incomplete data will generally
trigger a network request, thanks to (#6221).

However, if the source of the changes is an optimistic update for a
mutation, it seems reasonable to avoid the network request during the
mutation, since there's a good chance the incompleteness of the optimistic
data is only temporary, and the client might read a complete result after
the optimistic update is rolled back, removing the need to do a network
request.

Of course, if the non-optimistic read following the rollback is
incomplete, a network request will be triggered, so skipping the network
request during optimistic updates does not mean ignoring legitimate
incompleteness forever.

Note: we already avoid sending network requests for queries that are
currently in flight, but in this case it's the mutation that's in flight,
so this commit introduces a way to prevent other affected queries (which
are not currently in flight, themselves) from hitting the network.
benjamn added a commit that referenced this pull request Jun 16, 2020
Ever since the big refactoring in #6221 made the client more aggressive
about triggering network requests in response to incomplete cache data
(when using a cache-first FetchPolicy), folks testing the betas/RCs have
reported that feuding queries can end up triggering an endless loop of
unhelpful network requests.

This change does not solve the more general problem of queries competing
with each other to update the same data in incompatible ways (see #6372
for mitigation strategies), but I believe this commit will effectively put
a stop to most cases of endless network requests.

See my lengthy implementation comment for more details, since this is a
non-obvious solution to a very tricky problem.

Fixes #6307.
Fixes #6370.
Fixes #6434.
Fixes #6444.
benjamn added a commit that referenced this pull request Jun 16, 2020
…6448)

Ever since the big refactoring in #6221 made the client more aggressive
about triggering network requests in response to incomplete cache data
(when using a cache-first FetchPolicy), folks testing the betas/RCs have
reported that feuding queries can end up triggering an endless loop of
unhelpful network requests.

This change does not solve the more general problem of queries competing
with each other to update the same data in incompatible ways (see #6372
for mitigation strategies), but I believe this commit will effectively put
a stop to most cases of endless network requests.

See my lengthy implementation comment for more details, since this is a
non-obvious solution to a very tricky problem.

Fixes #6307.
Fixes #6370.
Fixes #6434.
Fixes #6444.
benjamn added a commit that referenced this pull request Jul 13, 2020
Should fix #6354, #6542, #6534, and #6459.

Reliable delivery of loading results is one of the core benefits that
Apollo Client strives to provide, expecially since handling loading states
tends to be such an annoying, error-prone task in hand-written state
management code, and Apollo Client is all about keeping hand-written state
management code to a minimum.

Ever since I refactored the FetchPolicy system in #6221, the fetchMore
method of ObservableQuery has not been delivering a loading state before
sending its request. Instead, the observed query was updated only once,
after the completion of the fetchMore network request. I've been aware of
this problem for a while now, but I procrastinated solving it because I
knew it would be, well, annoying. With the final release of AC3 right
around the corner (Tuesday!), the time has come to get this right.

This loading result doesn't fit neatly into the fetchQueryObservable
system introduced by #6221, since the consumer of the loading result is
the original ObservableQuery, but the network request gets made with a
fresh query ID, using different variables (and possibly even a different
query) passed to fetchMore. This separation is important so that we don't
have to change the query/variables/options of the original ObservableQuery
for the duration of the fetchMore request. Instead, the fetchMore request
is a one-off, independent request that effectively bypasses the usual
FetchPolicy system (technically, it always uses the no-cache FetchPolicy).

In Apollo Client 2.x (and before #6221 was released in beta.46), this
logic was guided by an extra fetchMoreForQueryId parameter passed to
QueryManager#fetchQuery, which ended up appearing in a number (16) of
different places, across three different methods of the QueryManager. I
think it's an improvement that the logic is now confined to one block of
code in ObservableQuery#fetchMore, which seems naturally responsible for
any fetchMore-related logic.

Still, I don't love the precedent that this simulated loading state sets,
so I hope we can avoid similar hacks in the future.
benjamn added a commit that referenced this pull request Jul 13, 2020
Should fix #6354, #6542, #6534, and #6459.

Reliable delivery of loading results is one of the core benefits that
Apollo Client strives to provide, especially since handling loading states
tends to be such an annoying, error-prone task in hand-written state
management code, and Apollo Client is all about keeping hand-written state
management code to a minimum.

Ever since I refactored the FetchPolicy system in #6221, the fetchMore
method of ObservableQuery has not been delivering a loading state before
sending its request. Instead, the observed query was updated only once,
after the completion of the fetchMore network request. I've been aware of
this problem for a while now, but I procrastinated solving it because I
knew it would be, well, annoying. With the final release of AC3 right
around the corner (Tuesday!), the time has come to get this right.

This loading result doesn't fit neatly into the fetchQueryObservable
system introduced by #6221, since the consumer of the loading result is
the original ObservableQuery, but the network request gets made with a
fresh query ID, using different variables (and possibly even a different
query) passed to fetchMore. This separation is important so that we don't
have to change the query/variables/options of the original ObservableQuery
for the duration of the fetchMore request. Instead, the fetchMore request
is a one-off, independent request that effectively bypasses the usual
FetchPolicy system (technically, it always uses the no-cache FetchPolicy).

In Apollo Client 2.x (and before #6221 was released in beta.46), this
logic was guided by an extra fetchMoreForQueryId parameter passed to
QueryManager#fetchQuery, which ended up appearing in a number (16) of
different places, across three different methods of the QueryManager. I
think it's an improvement that the logic is now confined to one block of
code in ObservableQuery#fetchMore, which seems naturally responsible for
any fetchMore-related logic.

Still, I don't love the precedent that this simulated loading state sets,
so I hope we can avoid similar hacks in the future.
benjamn added a commit that referenced this pull request Jul 13, 2020
Should fix #6354, #6542, #6534, and #6459.

Reliable delivery of loading results is one of the core benefits that
Apollo Client strives to provide, especially since handling loading states
tends to be such an annoying, error-prone task in hand-written state
management code, and Apollo Client is all about keeping hand-written state
management code to a minimum.

Ever since I refactored the FetchPolicy system in #6221, the fetchMore
method of ObservableQuery has not been delivering a loading state before
sending its request. Instead, the observed query was updated only once,
after the completion of the fetchMore network request. I've been aware of
this problem for a while now, but I procrastinated solving it because I
knew it would be, well, annoying. With the final release of AC3 right
around the corner (Tuesday!), the time has come to get this right.

This loading result doesn't fit neatly into the fetchQueryObservable
system introduced by #6221, since the consumer of the loading result is
the original ObservableQuery, but the network request gets made with a
fresh query ID, using different variables (and possibly even a different
query) passed to fetchMore. This separation is important so that we don't
have to change the query/variables/options of the original ObservableQuery
for the duration of the fetchMore request. Instead, the fetchMore request
is a one-off, independent request that effectively bypasses the usual
FetchPolicy system (technically, it always uses the no-cache FetchPolicy).

In Apollo Client 2.x (and before #6221 was released in beta.46), this
logic was guided by an extra fetchMoreForQueryId parameter passed to
QueryManager#fetchQuery, which ended up appearing in a number (16) of
different places, across three different methods of the QueryManager. I
think it's an improvement that the logic is now confined to one block of
code in ObservableQuery#fetchMore, which seems naturally responsible for
any fetchMore-related logic.
benjamn added a commit that referenced this pull request Aug 17, 2020
PR #6221 began enforcing fetch policies more consistently/literally in
response to cache updates (so we do not accidentally skip network requests
that are required by policy).

That consistency has been valuable, but there are some exceptions where we
want to deliver a single result from the cache, but we do not want to use
the fetchQueryObservable system to deliver it, because we do not want the
delivery to have the side effect of triggering a network request under any
circumstances, regardless of the fetch policy.

For example, we want to deliver a { loading: true, networkStatus:
NetworkStatus.fetchMore } result for the original query at the beginning
of a fetchMore request (see #6583), but that one-off result should not
cause the original query to fire a network request if the cache data
happens to be incomplete, because that would likely interfere with the
fetchMore network request.

At the time I implemented #6583, it was the only exception that I knew of,
so I kept things simple and fetchMore-specific. I've since found another
example (not fetchMore-related this time), so I think it's time to make
this pattern more official. Hence ObservableQuery#observe, whose name is
meant to draw a contrast with the ObservableQuery#reobserve method. The
observe method simply delivers the latest result known to the
ObservableQuery, reading from the cache if necessary, whereas reobserve
reapplies the chosen fetch policy, possibly making network requests.
benjamn added a commit that referenced this pull request Aug 17, 2020
This essentially undoes commit b239124,
which was introduced as part of PR #6221. I remember thinking, as I made
that commit, "This change shouldn't be necessary, but it seems harmless
enough." As it turns out, changing these tests was not necessary, but was
instead a symptom of overreacting to the delivery of optimistic mutation
results. I'm happy to be putting things back the way they were.
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Feb 16, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

AC3: Missing Field Silently Prevents Entire Query From Updating
6 participants