Refactor QueryManager to make better use of observables and enforce fetchPolicy more reliably #6221

benjamn · 2020-04-30T15:53:48Z

Fixes #5991, #5790, and multiple other related issues (full list TBD).

This is a large pull request, representing about a month and a half of work, but the central idea is relatively simple: the most important job of the QueryManager is to tirelessly enforce the requested FetchPolicy (cache-first, cache-and-network, network-only, etc.) for each query that it manages. Previously, the QueryManager sometimes failed to fulfill this mission, because the logic for interpreting FetchPolicy values was scattered across the codebase in an ad hoc manner, rather than centralized in one place.

For example, the default cache-first fetch policy should reliably trigger a network request if the relevant cache data disappears or becomes incomplete (#5991), but the code that detected that incompleteness (queryListenerForObserver) was not responsible for triggering network requests, and giving it the ability to call fetchQuery would have required yet another one-off, decentralized interpretation of options.fetchPolicy.

The scattering of FetchPolicy logic made it very difficult for anyone (including me) to understand how each fetch policy works at a glance, adding risk to any changes that involved fetch policies.

Thanks to the changes in this PR, the logic for interpreting fetch policies has been centralized into a single switch statement, which I think reads almost as clearly as pseudocode:

switch (fetchPolicy) {
  default: case "cache-first": {
    const diff = readCache();

    if (diff.complete) {
      return [
        resultsFromCache(diff, queryInfo.markReady()),
      ];
    }

    if (returnPartialData) {
      return [
        resultsFromCache(diff),
        resultsFromLink(true),
      ];
    }

    return [
      resultsFromLink(true),
    ];
  }

  case "cache-and-network": {
    const diff = readCache();

    if (diff.complete || returnPartialData) {
      return [
        resultsFromCache(diff),
        resultsFromLink(true),
      ];
    }

    return [
      resultsFromLink(true),
    ];
  }

  case "cache-only":
    return [
      resultsFromCache(readCache(), queryInfo.markReady()),
    ];

  case "network-only":
    return [resultsFromLink(true)];

  case "no-cache":
    return [resultsFromLink(false)];

  case "standby":
    return [];
}

The arrays returned by this code contain zero, one, or two observables, derived from the cache and/or the network link. The observables are concatenated together to form a single observable (specifically, a Concast observable) whose results are then consumed by the ObservableQuery (with any adjacent duplicate results skipped, as usual). At any point in time, most often when cache data has changed, or the application specifically requested a refetch, the ObservableQuery can reobserve the cache and/or the network by invoking this same code again. In other words, this switch statement is now the beating heart of the client codebase, and no data flows through the client without honoring the requested FetchPolicy.

This new system (fetchQueryObservable) is much more Observable-oriented than the previous fetchQuery method was, and I see this shift as a huge improvement. One of the reasons the QueryManager was so complicated was that the old fetchQuery method could return only a single Promise<ApolloQueryResult<T>>, even though there was often more than one result to deliver, necessitating a side-delivery mechanism (broadcastQueries). By allowing fetchQueryObservable to return multiple results using an Observable, we no longer have to worry about delivering different results using different mechanisms, since all results flow through a straightforward Observable pipeline.

Since this is such a deep refactoring, I found it impossible to keep existing tests passing during the initial exploration. Once I was happy with the shape of the new system, I resumed running and debugging the tests, and rebased any necessary code changes back into my previous commits. There was never any hope of leaving all existing tests unchanged, but I think/hope the test changes I made (see commits towards the end of this PR) are justifiable. The whole point of this refactoring was to make Apollo Client do a better job of fetching data, and "better" sometimes means "different." In any case, if you see something I changed in the tests that worries you, please leave comments/questions in-line.

The most substantial commit is Replace fetch{Query,Request} with new fetchQueryObservable method (1ac5d8d). I regret that so much ended up in that commit, but it's also a commit whose parts don't work until you put them all together, so I'm not going to waste time trying to split it up at this point. For what it's worth, everything that comes before that commit is "safe" in the sense of keeping all existing tests passing.

I will leave some comments below to draw attention to any interesting/controversial changes.

3nvi · 2020-04-30T17:46:22Z

Tried npm-installing it through the PR branch and building it locally, but had some issues. Is there a cache we can get a preview version on NPM to test it (whenever it's ready/stable)?

benjamn · 2020-04-30T19:20:52Z

@3nvi I've had some luck with running npm pack in the apollo-client/dist directory after a successful npm run build, and then doing

npm install ../path/to/apollo-client/dist/apollo-client-3.0.0-beta.*.tgz

to install that packaged .tgz file locally in other projects, but I will try to get a preview release published to npm today/soon!

benjamn · 2020-05-01T15:24:10Z

Ok, I can reproduce some of the tests failures locally if I use Node 10, which is what our CircleCI job uses. Locally I'd been using Node 12.16.1. ~~No theory yet on what the operative difference between the Node versions might be.~~ Fixed below!

The use of setTimeout in tests is often an invitation to race conditions and timing-sensitive outcomes. This raciness became particularly problematic thanks to changes (between Node 10 and Node 12) in the timing of Promise callbacks with respect to setTimeout callbacks. These changes also leave all tests passing when cherry-picked onto master, so I'm confident I am *not* changing the tests just to fix my PR #6221, at the expense of backwards compatibility.

src/react/data/QueryData.ts

darkbasic · 2020-05-01T19:26:05Z

@benjamn I suggest using multiple node versions on CI, one per major release.

The use of setTimeout in tests is often an invitation to race conditions and timing-sensitive outcomes. This raciness became particularly problematic thanks to changes (between Node 10 and Node 12) in the timing of Promise callbacks with respect to setTimeout callbacks. These changes also leave all tests passing when cherry-picked onto master, so I'm confident I am *not* changing the tests just to fix my PR #6221, at the expense of backwards compatibility.

@hwillson

Now that the ObservableQuery consumes results exclusively through QueryManager#fetchQueryObservable, we can rely on the addExportedVariables call that happens there, and avoid potentially re-triggering a network request while consuming network results. @hwillson I was a bit surprised this change didn't cause any tests to fail, but several tests did fail when I cherry-picked it onto master, so I'm optimistic this addExportedVariables logic is genuinely redundant after my refactoring PR #6221.

@durchanek

This version does *not* include my big PR #6221, which should be coming in the _next_ beta release. Code changes in beta.45 (since beta.44) include * #6194 by @durchanek * #6107 by @hwillson as well as several documentation PRs.

The use of setTimeout in tests is often an invitation to race conditions and timing-sensitive outcomes. This raciness became particularly problematic thanks to changes (between Node 10 and Node 12) in the timing of Promise callbacks with respect to setTimeout callbacks. These changes also leave all tests passing when cherry-picked onto master, so I'm confident I am *not* changing the tests just to fix my PR #6221, at the expense of backwards compatibility.

Includes major fetchPolicy refactoring PR: #6221 Please help test this version if you can, because it changes/improves some fundamental client behaviors.

The `cache.modify` API was first introduced in #5909 as a quick way to transform the values of specific existing fields in the cache. At the time, `cache.modify` seemed promising as an alternative to the `readQuery`-transform-`writeQuery` pattern, but feedback has been mixed, most importantly from our developer experience team, who helped me understand why `cache.modify` would be difficult to learn and to teach. While my refactoring in #6221 addressed concerns about broadcasting `cache.modify` updates automatically, the bigger problem with `cache.modify` is simply that it requires knowledge of the internal workings of the cache, making it nearly impossible to explain to developers who are not already Apollo Client 3 experts. As much as I wanted `cache.modify` to be a selling point for Apollo Client 3, it simply wasn't safe to use without a firm understanding of concepts like cache normalization, references, field identity, read/merge functions and their options API, and the `options.toReference(object, true)` helper function (#5970). By contrast, the `readQuery`-transform-`writeQuery` pattern may be a bit more verbose, but it has none of these problems, because these older methods work in terms of GraphQL result objects, rather than exposing the internal data format of the `EntityStore`. Since `cache.modify` was motivated to some extent by vague power-user performance concerns, it's worth noting that we recently made `writeQuery` and `writeFragment` even more efficient when rewriting unchanged results (#6274), so whatever performance gap there might have been between `cache.modify` and `readQuery`/`writeQuery` should now be even less noticeable. One final reason that `cache.modify` seemed like a good idea was that custom `merge` functions can interfere with certain `writeQuery` patterns, such as deleting an item from a paginated list. If you run into trouble with `merge` functions running when you don't want them to, we recommend calling `cache.evict` before `cache.writeQuery` to ensure your `merge` function won't be confused by existing field data when you write your modified data back into the cache.

The `cache.modify` API was first introduced in #5909 as a quick way to transform the values of specific existing fields in the cache. At the time, `cache.modify` seemed promising as an alternative to the `readQuery`-transform-`writeQuery` pattern, but feedback has been mixed, most importantly from our developer experience team, who helped me understand why `cache.modify` would be difficult to learn and to teach. While my refactoring in #6221 addressed concerns about broadcasting `cache.modify` updates automatically, the bigger problem with `cache.modify` is simply that it requires knowledge of the internal workings of the cache, making it nearly impossible to explain to developers who are not already Apollo Client 3 experts. As much as I wanted `cache.modify` to be a selling point for Apollo Client 3, it simply wasn't safe to use without a firm understanding of concepts like cache normalization, references, field identity, read/merge functions and their options API, and the `options.toReference(object, true)` helper function (#5970). By contrast, the `readQuery`-transform-`writeQuery` pattern may be a bit more verbose, but it has none of these problems, because these older methods work in terms of GraphQL result objects, rather than exposing the internal data format of the `EntityStore`. Since cache.modify was motivated to some extent by vague power-user performance concerns, it's worth noting that we recently made writeQuery and writeFragment even more efficient when rewriting unchanged results (#6274), so whatever performance gap there might have been between cache.modify and readQuery/writeQuery should now be even less noticeable. One final reason that `cache.modify` seemed like a good idea was that custom `merge` functions can interfere with certain `writeQuery` patterns, such as deleting an item from a paginated list. If you run into trouble with `merge` functions running when you don't want them to, we recommend calling `cache.evict` before `cache.writeQuery` to ensure your `merge` function won't be confused by existing field data when you write your modified data back into the cache.

@darkbasic

This fixes an issue found by @darkbasic while working with optimistic updates: #6183 (comment) The cache-first FetchPolicy is important not just because it's the default policy, but also because both cache-and-network and network-only turn into cache-first after the first network request (#6353). Once the cache-first policy is in effect for a query, any changes to the cache that cause the query to begin reading incomplete data will generally trigger a network request, thanks to (#6221). However, if the source of the changes is an optimistic update for a mutation, it seems reasonable to avoid the network request during the mutation, since there's a good chance the incompleteness of the optimistic data is only temporary, and the client might read a complete result after the optimistic update is rolled back, removing the need to do a network request. Of course, if the non-optimistic read following the rollback is incomplete, a network request will be triggered, so skipping the network request during optimistic updates does not mean ignoring legitimate incompleteness forever. Note: we already avoid sending network requests for queries that are currently in flight, but in this case it's the mutation that's in flight, so this commit introduces a way to prevent other affected queries (which are not currently in flight, themselves) from hitting the network.

Ever since the big refactoring in #6221 made the client more aggressive about triggering network requests in response to incomplete cache data (when using a cache-first FetchPolicy), folks testing the betas/RCs have reported that feuding queries can end up triggering an endless loop of unhelpful network requests. This change does not solve the more general problem of queries competing with each other to update the same data in incompatible ways (see #6372 for mitigation strategies), but I believe this commit will effectively put a stop to most cases of endless network requests. See my lengthy implementation comment for more details, since this is a non-obvious solution to a very tricky problem. Fixes #6307. Fixes #6370. Fixes #6434. Fixes #6444.

…6448) Ever since the big refactoring in #6221 made the client more aggressive about triggering network requests in response to incomplete cache data (when using a cache-first FetchPolicy), folks testing the betas/RCs have reported that feuding queries can end up triggering an endless loop of unhelpful network requests. This change does not solve the more general problem of queries competing with each other to update the same data in incompatible ways (see #6372 for mitigation strategies), but I believe this commit will effectively put a stop to most cases of endless network requests. See my lengthy implementation comment for more details, since this is a non-obvious solution to a very tricky problem. Fixes #6307. Fixes #6370. Fixes #6434. Fixes #6444.

Should fix #6354, #6542, #6534, and #6459. Reliable delivery of loading results is one of the core benefits that Apollo Client strives to provide, expecially since handling loading states tends to be such an annoying, error-prone task in hand-written state management code, and Apollo Client is all about keeping hand-written state management code to a minimum. Ever since I refactored the FetchPolicy system in #6221, the fetchMore method of ObservableQuery has not been delivering a loading state before sending its request. Instead, the observed query was updated only once, after the completion of the fetchMore network request. I've been aware of this problem for a while now, but I procrastinated solving it because I knew it would be, well, annoying. With the final release of AC3 right around the corner (Tuesday!), the time has come to get this right. This loading result doesn't fit neatly into the fetchQueryObservable system introduced by #6221, since the consumer of the loading result is the original ObservableQuery, but the network request gets made with a fresh query ID, using different variables (and possibly even a different query) passed to fetchMore. This separation is important so that we don't have to change the query/variables/options of the original ObservableQuery for the duration of the fetchMore request. Instead, the fetchMore request is a one-off, independent request that effectively bypasses the usual FetchPolicy system (technically, it always uses the no-cache FetchPolicy). In Apollo Client 2.x (and before #6221 was released in beta.46), this logic was guided by an extra fetchMoreForQueryId parameter passed to QueryManager#fetchQuery, which ended up appearing in a number (16) of different places, across three different methods of the QueryManager. I think it's an improvement that the logic is now confined to one block of code in ObservableQuery#fetchMore, which seems naturally responsible for any fetchMore-related logic. Still, I don't love the precedent that this simulated loading state sets, so I hope we can avoid similar hacks in the future.

Should fix #6354, #6542, #6534, and #6459. Reliable delivery of loading results is one of the core benefits that Apollo Client strives to provide, especially since handling loading states tends to be such an annoying, error-prone task in hand-written state management code, and Apollo Client is all about keeping hand-written state management code to a minimum. Ever since I refactored the FetchPolicy system in #6221, the fetchMore method of ObservableQuery has not been delivering a loading state before sending its request. Instead, the observed query was updated only once, after the completion of the fetchMore network request. I've been aware of this problem for a while now, but I procrastinated solving it because I knew it would be, well, annoying. With the final release of AC3 right around the corner (Tuesday!), the time has come to get this right. This loading result doesn't fit neatly into the fetchQueryObservable system introduced by #6221, since the consumer of the loading result is the original ObservableQuery, but the network request gets made with a fresh query ID, using different variables (and possibly even a different query) passed to fetchMore. This separation is important so that we don't have to change the query/variables/options of the original ObservableQuery for the duration of the fetchMore request. Instead, the fetchMore request is a one-off, independent request that effectively bypasses the usual FetchPolicy system (technically, it always uses the no-cache FetchPolicy). In Apollo Client 2.x (and before #6221 was released in beta.46), this logic was guided by an extra fetchMoreForQueryId parameter passed to QueryManager#fetchQuery, which ended up appearing in a number (16) of different places, across three different methods of the QueryManager. I think it's an improvement that the logic is now confined to one block of code in ObservableQuery#fetchMore, which seems naturally responsible for any fetchMore-related logic. Still, I don't love the precedent that this simulated loading state sets, so I hope we can avoid similar hacks in the future.

Should fix #6354, #6542, #6534, and #6459. Reliable delivery of loading results is one of the core benefits that Apollo Client strives to provide, especially since handling loading states tends to be such an annoying, error-prone task in hand-written state management code, and Apollo Client is all about keeping hand-written state management code to a minimum. Ever since I refactored the FetchPolicy system in #6221, the fetchMore method of ObservableQuery has not been delivering a loading state before sending its request. Instead, the observed query was updated only once, after the completion of the fetchMore network request. I've been aware of this problem for a while now, but I procrastinated solving it because I knew it would be, well, annoying. With the final release of AC3 right around the corner (Tuesday!), the time has come to get this right. This loading result doesn't fit neatly into the fetchQueryObservable system introduced by #6221, since the consumer of the loading result is the original ObservableQuery, but the network request gets made with a fresh query ID, using different variables (and possibly even a different query) passed to fetchMore. This separation is important so that we don't have to change the query/variables/options of the original ObservableQuery for the duration of the fetchMore request. Instead, the fetchMore request is a one-off, independent request that effectively bypasses the usual FetchPolicy system (technically, it always uses the no-cache FetchPolicy). In Apollo Client 2.x (and before #6221 was released in beta.46), this logic was guided by an extra fetchMoreForQueryId parameter passed to QueryManager#fetchQuery, which ended up appearing in a number (16) of different places, across three different methods of the QueryManager. I think it's an improvement that the logic is now confined to one block of code in ObservableQuery#fetchMore, which seems naturally responsible for any fetchMore-related logic.

PR #6221 began enforcing fetch policies more consistently/literally in response to cache updates (so we do not accidentally skip network requests that are required by policy). That consistency has been valuable, but there are some exceptions where we want to deliver a single result from the cache, but we do not want to use the fetchQueryObservable system to deliver it, because we do not want the delivery to have the side effect of triggering a network request under any circumstances, regardless of the fetch policy. For example, we want to deliver a { loading: true, networkStatus: NetworkStatus.fetchMore } result for the original query at the beginning of a fetchMore request (see #6583), but that one-off result should not cause the original query to fire a network request if the cache data happens to be incomplete, because that would likely interfere with the fetchMore network request. At the time I implemented #6583, it was the only exception that I knew of, so I kept things simple and fetchMore-specific. I've since found another example (not fetchMore-related this time), so I think it's time to make this pattern more official. Hence ObservableQuery#observe, whose name is meant to draw a contrast with the ObservableQuery#reobserve method. The observe method simply delivers the latest result known to the ObservableQuery, reading from the cache if necessary, whereas reobserve reapplies the chosen fetch policy, possibly making network requests.

This essentially undoes commit b239124, which was introduced as part of PR #6221. I remember thinking, as I made that commit, "This change shouldn't be necessary, but it seems harmless enough." As it turns out, changing these tests was not necessary, but was instead a symptom of overreacting to the delivery of optimistic mutation results. I'm happy to be putting things back the way they were.

benjamn added 💡 idea 👩‍🏭 refactor 🧞‍♂️ enhancement labels Apr 30, 2020

benjamn added this to the Release 3.0 milestone Apr 30, 2020

benjamn self-assigned this Apr 30, 2020

This comment has been minimized.

Sign in to view

benjamn force-pushed the fetchQueryObservable-refactor branch 2 times, most recently from e79fc31 to 28653fb Compare April 30, 2020 16:05

This comment has been minimized.

Sign in to view

This was referenced Apr 30, 2020

[3.0] useQuery does not update when an optimistic response is rolled back #5790

Closed

AC3: Missing Field Silently Prevents Entire Query From Updating #5991

Closed

Implement InMemoryCache#modify for surgically transforming fields. #5909

Merged

benjamn commented May 1, 2020

View reviewed changes

src/react/data/QueryData.ts Outdated Show resolved Hide resolved

benjamn marked this pull request as ready for review May 1, 2020 18:24

benjamn requested review from jbaxleyiii and hwillson May 1, 2020 18:24

benjamn force-pushed the fetchQueryObservable-refactor branch from a6bc761 to 913e4d9 Compare May 4, 2020 16:37

benjamn force-pushed the fetchQueryObservable-refactor branch from 913e4d9 to dd86a71 Compare May 4, 2020 16:49

benjamn force-pushed the fetchQueryObservable-refactor branch from fefbe5d to 114566a Compare May 4, 2020 18:25

benjamn added a commit that referenced this pull request May 7, 2020

Bump @apollo/client npm version to 3.0.0-beta.46.

884b1e5

Includes major fetchPolicy refactoring PR: #6221 Please help test this version if you can, because it changes/improves some fundamental client behaviors.

This was referenced May 7, 2020

useQuery not updating loading state #6253

Closed

Polling starts when variables change for a skipped query #6240

Closed

qburnsFB mentioned this pull request May 8, 2020

SSR not waiting for queries to resolve apollographql/react-apollo#3678

Open

benjamn mentioned this pull request May 8, 2020

Components never get updated after writeData or writeQuery #3909

Closed

benjamn mentioned this pull request May 16, 2020

Remove experimental cache.modify method. #6289

Merged

darkbasic mentioned this pull request May 18, 2020

[Regression] apollo-client 3.0.0-beta.44 does not rollback optimistic updates #6183

Closed

jebonfig mentioned this pull request May 18, 2020

Cache-and-network should not make a network call when watched cached items change via subscription/mutation #6305

Closed

benjamn mentioned this pull request Jun 9, 2020

Avoid network after incomplete optimistic cache results. #6419

Merged

benjamn mentioned this pull request Jun 16, 2020

Stop feuds by skipping cache writes for unchanged network results. #6448

Merged

zetlen mentioned this pull request Jun 17, 2020

Venia Localization w/ i18n magento/pwa-studio#2314

Closed

glentakahashi mentioned this pull request Jul 1, 2020

Retain previous result when partialRefetching #6012

Closed

2 tasks

benjamn mentioned this pull request Jul 13, 2020

Simulate loading result for original query before fetchMore. #6583

Merged

sirugh mentioned this pull request Jul 22, 2020

skip option fails when assigned a variable that flips from false to true #6670

Closed

benjamn mentioned this pull request Aug 3, 2020

Apollo Client 3 reloading query after mutation update #6760

Closed

stoically mentioned this pull request Oct 23, 2020

Don't read from cache with network-only and no-cache policy? #7219

Closed

mogelbrod mentioned this pull request Mar 15, 2021

fix: Allow fetchMore() to be used with 'no-cache' fetchPolicy #6023

Closed

2 tasks

benjamn mentioned this pull request Oct 13, 2021

Fix #6557 - Fix slow down on multiple fetchMore #8920

Closed

github-actions bot locked as resolved and limited conversation to collaborators Feb 16, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Refactor QueryManager to make better use of observables and enforce fetchPolicy more reliably #6221

Refactor QueryManager to make better use of observables and enforce fetchPolicy more reliably #6221

benjamn commented Apr 30, 2020 •

edited

Loading

This comment has been minimized.

This comment has been minimized.

3nvi commented Apr 30, 2020

benjamn commented Apr 30, 2020 •

edited

Loading

benjamn commented May 1, 2020 •

edited

Loading

darkbasic commented May 1, 2020

Refactor QueryManager to make better use of observables and enforce fetchPolicy more reliably #6221

Refactor QueryManager to make better use of observables and enforce fetchPolicy more reliably #6221

Conversation

benjamn commented Apr 30, 2020 • edited Loading

This comment has been minimized.

This comment has been minimized.

3nvi commented Apr 30, 2020

benjamn commented Apr 30, 2020 • edited Loading

benjamn commented May 1, 2020 • edited Loading

darkbasic commented May 1, 2020

benjamn commented Apr 30, 2020 •

edited

Loading

benjamn commented Apr 30, 2020 •

edited

Loading

benjamn commented May 1, 2020 •

edited

Loading