Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix duplicate fetches in query plans [js] #671

Merged
merged 4 commits into from Apr 29, 2021

Conversation

martijnwalraven
Copy link
Contributor

Fixes #399 and #294.

(This replaces #579, which fixed the same issue in the Rust query planner.)

The issue fixed by this PR requires a bit of unpacking. The visible result is that in some cases, the query plan will contain duplicate fetches to the same underlying service. For this query, for example:

query {
  topProducts {
    name
    price
    reviews {
      body
      author {
        name
      }
    }
  }
}

We end up fetching the names of authors as many times as there are different types of products. (It isn’t just that we fetch the authors of the reviews of each product type separately, but we literally execute the same query with the same representations, so all authors for all products, multiple times.)

In the case above, this is triggered as a consequence of type explosion, but it is a separate issue and also occurs in other cases where we manually include duplicate nested fields under multiple type conditions.

This query, for example, will also contain as many fetches for authors as there are type conditions (so two fetches here):

query {
  topProducts {
    name
    price
    ... on Book {
      reviews {
        body
        author {
          name
        }
      }
    }
    ... on Video {
      reviews {
        body
        author {
          name
        }
      }
    }
  }
}

The underlying cause is that we don't merge dependent fetch groups from subfields. In this case, we’ll create separate fetch groups for { name } nested under the different type conditions, and even though these operate on the same path and fetch data from the same service, they are treated as separate fetches (that all have the same path however, which is why they take the same representations as input).

This PR attempts to solve this by manually merging nested fetch groups whenever they are compatible. That fixes the issue in most cases, but there are more complicated situations it doesn't address and that will require more substantial changes to the query planner and execution.

In particular, when possible types are spread over multiple services, it will not be possible to merge nested fetch groups from subfields because these should actually result in separate fetches. What you don't want however, is for those fetches to operate on the same objects just fetched by the other, but since representations taken as input are collected by path, there is no way to differentiate between these.

I believe solving this latter problem requires additional capabilities in the query plan (which is something we’ve talked about before). The most straightforward way to fix this seems to be a TypeCondition node in the query plan, which filters input representations based on their type.

deepEqual(group.mergeAt, dependentGroup.mergeAt),
);
if (existingDependentGroup) {
existingDependentGroup.fields.push(...dependentGroup.fields);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we merge other things than fields from dependentGroup? Mainly thinking of potential dependent groups of dependentGroup; I'm not sure we can guarantee those won't exists.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, good catch! I'll see if I can come up with a failing test, but I think you're right.

// In order to avoid duplicate fetches, we try to find existing dependent
// groups with the same service and merge path first.
for (const dependentGroup of subGroup.dependentGroups) {
const existingDependentGroup = parentGroup.otherDependentGroups.find(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm, should this be parentGroup.dependentGroups (which refers to both dependentGroupsByService and otherDependentGroups)?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@martijnwalraven Random drive-by thought. When I recently reviewed the query planner, I found the aspect of FetchGroup where there are two separate sets of subgroups to be pretty confusing. When I dived in farther, it seemed like dependentGroupsByService was only really written from inside splitSubfields, which is called only from completeField with the FetchGroup subGroup, and subGroup isn't added directly to the tree itself but instead its dependent groups get extracted and added to the otherDependentGroups of an existing tree member.

So it seemed to me there there might be a simplifying refactor where FetchGroup just has a set/list of dependent groups (like otherDependentGroups now), and the logic currently done in dependentGroupForService could be specific to splitSubfields or completeField. ie, dependentGroupsByService could perhaps be a local variable used in building up the FetchGroup rather than hardcoded into the FetchGroup structure.

I tried doing this refactor for a few minutes and it wasn't 100% trivial (eg, the baseGroup.dependentGroupForService call is a bit different), but it might be feasible and increase clarity?

(Alternatively, maybe the opposite refactor where dependentGroupsByService is the only structure that needs exist could happen, and the single write to otherDependentGroups in completeSubfield could instead somehow merge subGroups.dependentGroups into parentGroup.dependentGroupsByService?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, agreed this is confusing, and I think refactoring it would help. I wanted to keep this fix as small as possible, especially since we plan on making larger changes to the query planner soon.

@martijnwalraven martijnwalraven force-pushed the martijnwalraven/duplicate-fetches-fix-js branch from 4a528a0 to 2d6f270 Compare April 23, 2021 14:37
@abernix abernix requested a review from pcmanus April 27, 2021 07:42
Copy link
Contributor

@pcmanus pcmanus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Outside of minor nits, I still have one concern about the recursive nature of the merge of dependent groups, though I'm not 100% sure if it's a valid concern.

`;

const { queryPlan, errors } = await execute({ query }, [
{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there differences between the schema of those few tests? If not, maybe pull the definitions into variables?

//
// Solving this requires us to filter on the types of response objects as
// opposed to just collecting all objects in the path.
it("when including the same nested fields under different type conditions that are split between services", async () => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The way I'd deal with such test is to make it assert "what we would want" and mark the test as "ignored"/"skipped" (I assume jest has this notion). I don't strongly mind keeping it the way it is (having it assert the current but inherently wrong behavior), I'm just less used to doing it so I figured I'd point it out. And one minor downside imo is that this make it less obvious what we actually want (but that's minor and the comment is decently clear about it).

@@ -752,7 +752,8 @@ function completeField(
splitSubfields(context, fieldPath, subfields, subGroup);
debug.groupEnd();

parentGroup.otherDependentGroups.push(...subGroup.dependentGroups);
// We need to merge in dependent groups of the subgroup to the parent group.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: no blaming because I'm certainly guilty of that kind of comment from time to time, but I'll remark that the comment as is provides no value (it paraphrases the next line). But it would very useful to quickly explain "why" we need to merge (maybe something like "we may have created multiple dependent groups to the same service for the same path; merging such groups to avoid duplicate fetches").

this.fields.push(...otherGroup.fields);
this.requiredFields.push(...otherGroup.requiredFields);
this.providedFields.push(...otherGroup.providedFields);
this.mergeDependentGroups(otherGroup.dependentGroups);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know you noticed this recusing merging solves concrete inefficiencies, but I'm having trouble convincing myself that it's always correct.

Mainly, unless I'm not reading this correctly, this appear to "flatten" the dependent groups, because we merge the dependents of the dependents of the subGroup into the dependents of the parentGroup. And that doesn't sound safe in general.

Say we merge into group A a dependent group B <- C (group B with a dependent group C). I believe we end up with A <- B and A <- C, meaning C ends up a direct dependent of A. But cannot it be that C has required fields from B? In which case, we'll end up fetching from C with only the data from A (and in parallel of B) and won't get our proper inputs. Is there a hole in that reasoning/a miscomprehension of the code?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pcmanus If we remove the recursion (and ticket it for follow-up work; not block), do we still solve the use-case / reproduction we set out to resolve? Does it not make things worse for other cases?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@martijnwalraven will know better than I do, but I believe we do fix the main cases we set out to resolve without the recursion, but @martijnwalraven apparently noticed some cases where new duplicate fetches were created (which I don't understand so we need to look at those cases more closely).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, it's a tad embarrassing but my comment above is actually incorrect and the recursive call does not "flatten" anything, it does properly recurse in both this and otherGroup at the same time.

As I mentioned to @martijnwalraven, I think my brain tricked me in reading the line

this.mergeDependentGroups(otherGroup.dependentGroups);

more as

this.merge(otherGroup.dependentGroups);

and was like "wait, why are we descending into the dependents of otherGroup but not of this?", but ... bad brain ... bad.

So sorry about that, this actually looks good.

Maybe a minor suggestion would be to make mergeDependentGroups take a FetchGroup as argument (so it means "merge the dependent groups of that group to the dependent groups of this group" (minor aside: when dealing with method that act mostly symmetrically on both this and a single other arg, I've been a big fan of calling the other argument that, both in comments and code; I find it reads very well)). This would make the recursive call be

this.mergeDependentGroups(otherGroup);

which is more symmetric.

But just a suggestion and not insisting if you prefer it the way it is: it's not because I had a brain fart that others will.

Copy link
Contributor

@pcmanus pcmanus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As detailed above, my main concern was actually unfounded. Remains only a few minor remarks and while I'm happy if you find them useful, the PR lgtm either way.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Query planner duplicate calls to services on interface types
4 participants