Resolve lists of types when type merging to reduce request overhead #1710
Architecturally, I am of the opinion that batching and caching should happen within the executor layer, just as with schema stitching generally. Not sure if anybody has yet gotten this working to satisfaction, so I still think it appropriate to keep this open.
Care to elaborate on your justification for that opinion? If the goal is to both reduce network overhead and data access overhead, and the idea is to push the problem to the executor, then my guess is that you're thinking this:
How does the executor know which / how many requests to batch together? We can't combine queries across different users, so does it have the context it needs to combine queries related to the same "origin" query? Or does it need a timeout? Also, is it possible to use dataloader across multiple operations within the same request? Does it async lazy-load across all operations, or only one operation at a time? It seems to me that, when possible, allowing developers to implement batching via a query on the delegated schema is a simple, easy-to-reason-about solution that puts them in control. Other executor- or dataloader-based forms of batching and caching could even be done in addition to this, if that kind of optimization is still necessary. But you often sacrifice ease of debugging with these more complicated solutions, and in many cases that tradeoff does not make sense.
I think I am thinking that having a complete batching solution in the executor is just the simplest possible solution, and that essentially everything can be batched, a la the Prisma and Gatsby implementations. The former used HttpLinkDataLoader; the latter has not yet been extracted from Gatsby, per @vladar, because of some Gatsby-specific optimizations. You can in any case introduce your own custom merge resolver, which I believe should allow you to implement batching as you suggest with a more resolver-centered dataloader approach. For those reasons, I am not personally planning on working on this. However -- PRs are welcome!
Specifically, related to some of your points, the executor has access to the original gateway graphql context and therefore should have the ability to create a new executor/link that batches everything related to the original request, which is essentially what Prisma binding did and what I am recommending.
Thanks for elaborating. I'll have to acquaint myself with the Gatsby/Prisma approaches and see if what I find is useful for me. You mentioned that it's already possible to do what I'm suggesting using a custom merge resolver, but it's not immediately clear to me how that's possible. To be clear, you're talking about ...
Just mean you have access to context there and can create/use a dataloader at that point to batch requests and send them to the list endpoint. You will need a separate dataloader for each field combination, keyed on the printed query, I guess. The question in my mind is: do we know that this is actually better than just batching multiple single-root-field requests or multiple operations, if the subschema/service is itself using dataloader, and if network requests are compressed by default? To me, the generic batching seems most useful, and I have not delved deeply into how much optimizing for any particular use case gets you added benefit. Would love to see some numbers if you have the time!
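For illustration, here is a minimal sketch of the dataloader-per-field-combination idea just described; `fetchUsersByIds` and the `context.loaders` slot are assumptions for the example, not graphql-tools APIs:

```ts
import DataLoader from 'dataloader';
import { print, SelectionSetNode } from 'graphql';

// Hypothetical fetcher for a list endpoint such as usersByIds; not part of graphql-tools.
declare function fetchUsersByIds(
  ids: ReadonlyArray<string>,
  selection: string
): Promise<Array<any>>;

// One dataloader per field combination, keyed on the printed selection set
// and stored on the request-scoped context so it is shared within one request.
function getUserLoader(
  context: { loaders?: Map<string, DataLoader<string, any>> },
  selectionSet: SelectionSetNode
): DataLoader<string, any> {
  const key = print(selectionSet);
  const loaders = (context.loaders ??= new Map());
  let loader = loaders.get(key);
  if (loader == null) {
    // All ids collected in one tick go to the list endpoint as a single request.
    loader = new DataLoader<string, any>((ids) => fetchUsersByIds(ids, key));
    loaders.set(key, loader);
  }
  return loader;
}

// Inside a custom merge resolver, one might then write:
//   const user = await getUserLoader(context, selectionSet).load(originalResult.id);
```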
Although you have hit on a significant difference between type merging here and Federation, which has the _entities root field that I believe allows batching by default. Perhaps it is worth doing a generic comparison between type merging with schema stitching and Federation in terms of speed and bandwidth, and seeing whether this is necessary from that angle.
I'm sorry, can you rephrase this?
I am basically saying that in Federation, I think you get this behavior for free. Is Federation faster for it end to end? Does it help versus the batch execution I am talking about here? These are interesting questions that I would love to know the answer to.
You've seen this (since you fixed a bug I discovered via it this weekend), but I created a repo where I converted a Federation demo to use schema stitching via graphql-tools: And here's the original: Do you think these would be enough to test for speed?
Sounds great, let us know what you find out. Make sure to use an executor with batching, such as Prisma's HttpLinkDataLoader, on the type merging side!
FWIW, here is the "batching" feature of Gatsby explained: And here is the code: I've found this style of batching effective, and if you have named queries you can write a custom executor to batch only specific queries. It is also more efficient in terms of fragment re-use and error handling. P.S. As for the Gatsby-specific part which doesn't allow it to be used in general: it's the requirement in Gatsby that all fragment names be globally unique. In fact, here is a quote from the comments:
> A general solution for query merging would additionally require fragment renaming for fragments with different contents. But that should be relatively easy to do.
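As a before/after illustration of this query-merging approach (the `user` field and the fragment names are hypothetical):

```graphql
# Two delegated operations, each arriving with its own fragment:
query A { user(id: "1") { ...UserFields } }
fragment UserFields on User { id name }

query B { user(id: "2") { ...UserFields } }
fragment UserFields on User { id email }
```

Merged into a single request, root fields get aliased, and the second fragment is renamed because its contents differ:

```graphql
query Merged {
  a_user: user(id: "1") { ...UserFields }
  b_user: user(id: "2") { ...UserFields_1 }
}
fragment UserFields on User { id name }
fragment UserFields_1 on User { id email }
```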
@vladar Thanks for chiming in! @flux627 I think the first step might be a PR that adds memoization to the following points:
And then we can change buildDelegationPlan to give selection sets instead of selections, so that a static selectionSet option is passed to merge here:
And then, instead of using dataloader via the printed query, a custom merge resolver can implement batching based on the context, info, subSchema, and selectionSet, replacing the default merge resolver here:
And then once we get a custom merge resolver going, we could add an option to "build it in".
The latest changes of #1728 are available as alpha in npm: You can quickly update your package.json by running:
You should then be able to work on a dataloader version of a merging resolver that sends a query in bulk. At least, I think so!
You would need a new loader for every combination of context, info, subschema, and selectionSet, so you could use a memoized getLoader function, similar to the use of memoize4 in #1728.
Oops: one cannot memoize based on info to assume you are in the middle of a list; not sure where I got that idea. You can, however, compare the bits of the execution context from within info...
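A minimal sketch of such a memoized getLoader, assuming a chain of WeakMaps for the object keys (context, subschema, selectionSet) and a plain Map for the fieldName string; all names here are illustrative:

```ts
import DataLoader from 'dataloader';
import { GraphQLSchema, SelectionSetNode } from 'graphql';

// User-supplied batch load function, e.g. one that delegates to a list root field.
type BatchFn = (keys: ReadonlyArray<string>) => Promise<Array<any>>;

// WeakMaps let loaders be garbage-collected along with the request context.
const cache = new WeakMap<
  object, // context
  WeakMap<GraphQLSchema, WeakMap<SelectionSetNode, Map<string, DataLoader<string, any>>>>
>();

function getLoader(
  context: object,
  subschema: GraphQLSchema,
  selectionSet: SelectionSetNode,
  fieldName: string,
  batchFn: BatchFn
): DataLoader<string, any> {
  let bySchema = cache.get(context);
  if (!bySchema) cache.set(context, (bySchema = new WeakMap()));
  let bySelection = bySchema.get(subschema);
  if (!bySelection) bySchema.set(subschema, (bySelection = new WeakMap()));
  let byField = bySelection.get(selectionSet);
  if (!byField) bySelection.set(selectionSet, (byField = new Map()));
  let loader = byField.get(fieldName);
  if (!loader) byField.set(fieldName, (loader = new DataLoader(batchFn)));
  return loader;
}
```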
* fix memoization: memoize info using info.fieldNodes, which is memoized upstream by graphql-js; refactor the StitchingInfo type, handleObject, and supporting methods to use consistent keys to enable memoization.
* add batchDelegate package: adds a key option within the type merging config to enable batch loading for lists. This minimal version of batching uses cached dataloaders to create a separate batch for each list, rather than for every similar query within the resolution tree. This is because a new dataloader is created for every new info.fieldNodes object, which is memoized upstream by graphql-js within the resolution of a given list, allowing the loader to be used for items of that list. A future version could provide an option to batch by similar target fieldName/selectionSet, but this version may hit the sweet spot in terms of code complexity and batching behavior. See: #1710
* reorganize delegate files: remove the unused isProxiedResult function; move unwrapResult and dehoistResult into createMergedResolver. The WrapFields and HoistField transforms now use their own unwrapping and dehoisting logic, so these functions should be located only in the file that uses them.
The latest changes of #1735 are available as alpha in npm: Quickly update your package.json by running:
This requires documentation, of course, but the API I have come up with differs a bit from the one you proposed above and would be as follows:

```js
stitchSchemas({
  subschemas: [
    {
      schema: userSchema,
      merge: {
        User: {
          fieldName: "usersByIds",
          // Turn this feature on for this type/schema combination
          // by specifying a key to convert each original result into a member of an array.
          key: ({ id }) => id,
          resolveMany: true,
          selectionSet: "{ id }",
          // This is now passed an array of all the keys instead of the originalResult,
          // which can then be converted into the args.
          args: (ids) => ({ ids }),
          // Maximum number of grouped objects per request; may cause multiple requests.
          // maxGroupSize: 10
          // Not implemented as of yet -- you can provide this behavior, however, by providing
          // your own resolve function within the MergedTypeConfig that uses the new
          // batchDelegate package, which lets you pass through whatever DataLoader options you want.
        }
      }
    },
    ...
  ]
})
```
See graphql-tools/packages/stitch/src/stitchingInfo.ts, lines 107 to 128, at d319d27.
Should be pretty easy to use createBatchDelegateFn to create your own resolve function to customize batching behavior -- there is a third, unused argument in that example that allows you to set DataLoader options. createBatchDelegateFn is likely also useful when stitching/adding fields without type merging!
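A hedged sketch based on the description above: the first argument converts the collected keys into args, and the third argument sets DataLoader options. The exact signature may differ between the alpha versions discussed here, and the middle argument is left at its default:

```ts
import { createBatchDelegateFn } from '@graphql-tools/batch-delegate';

const usersByIdsResolver = createBatchDelegateFn(
  (ids: ReadonlyArray<string>) => ({ ids }), // keys -> args for a usersByIds-style field
  undefined,                                 // middle argument left at its default here
  { maxBatchSize: 10 }                       // DataLoader options, e.g. cap the batch size
);

// The returned function can then serve as a custom resolve within a
// MergedTypeConfig to customize batching behavior.
```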
Wow, this looks great. I've been pretty busy, but will test it out sometime this week or weekend.
Haven't done any performance tests yet, but it seems to be working as expected! In my example project, I have log statements whenever a service is accessed for data. Here are two different queries with their respective service access logs, first using normal delegation, and then using batched:

Normal delegation branch

Query 1:

```graphql
query {
  topProducts {
    weight
    shippingEstimate
    reviews {
      body
      author {
        name
        reviews {
          body
        }
      }
    }
  }
}
```

Without batching:

With batching:
Query 2:

```graphql
query {
  me {
    id
    name
    reviews {
      body
      author {
        id
        name
      }
      product {
        upc
        name
        inStock
        price
        weight
        shippingEstimate
      }
    }
  }
}
```

Without batching:

With batching:
Of course, with more returned data, these differences in the number of requests would be much larger. Some thoughts:
batchDelegate currently ensures there is the same selectionSet by limiting batching to objects within the same list, creating a separate dataloader for each unique info.fieldNodes, which is the same across objects of the same type within a list because of memoization. It could be changed/improved by adding an option that gives you a unique dataloader for every subschema/operation/fieldName/selectionSet combination, which could be done by using Maps for the primitives (operation/fieldName) and WeakMaps on the objects themselves (subschema, selectionSet -- now that type merging is using memoization for the selectionSet), or by generating a primitive key, i.e. the printed query. That would give you primitive caching ability to avoid requerying.

But I guess I reiterate my concern from above -- maybe this should really be the job of the executor. In the example above, the duplicative queries within batched query 2 can never be batched -- only cached, because they will always be in different ticks of the request cycle, and it is only a happy coincidence that they have the exact same selectionSet. This will require batching/caching by printed query rather than by the maps above. And what if the selectionSet causes only a partial cache hit? The next feature you might ask for would be to supply the cached data from the cache and send the rest onward. That would be a good idea! But a full caching solution seems outside the scope of this project -- it seems like a better idea would be to customize the executor and use the caching within Apollo Client or urql, which seem to support partial cache hits.

Basically, batching might be a good idea to do internally, but in terms of caching, I am not convinced. PRs are always welcome if you think this would be more trivial to implement than I do. In terms of the API suggestion, in retrospect, you are probably right that it would be better to have a more obvious option. A PR there is welcome as well.
I think client caching is definitely worth considering as part of a full strategy, but I don't think the conclusion is that server-side deduplication of field resolution isn't worth doing. If you serve a public API, for example, you can't control what your clients do. You're then providing more value to the consumer (or your wallet) by being able to provide lower-cost usage by default. It's also not quite the same thing. What I'm suggesting would be isolated to the scope of a single query (and its delegated subqueries), since this avoids having to invoke all the opinionated decisions of how to deal with caching between queries. All those decisions can be made on a case-by-case basis on top of per-request key-based object caching. Let me know if this makes sense: if we put this caching in the executor, then we could create a map per client request that:
Does the executor have enough information to provide this kind of caching? How does the executor know how to key the objects? You mentioned having a cache per subschema: what advantages does this have over a cross-subschema merged-object cache?
I was not suggesting client-side caching; I was suggesting having the caching done within the executor. Because the executor has access to the context, you can easily create a per-client-request executor with its own cache, individuated to that subschema. The main reason I think it makes sense to push caching to the executor is that caching is a graphql-wide problem that has some good solutions already at the client level; we would be using them on the server acting as a client to the subschema. I see what you are saying about batching being something that makes sense to do internally, partially because of the comments from @vladar above, but in terms of caching, because selection sets may differ, a graphql-esque solution would seem to involve partial cache hits and some well-worn client-side solution being used on the server. Still, whatever solution you come up with, I think we are very open to PRs here by others; I am just sharing what I am willing to work on personally. For some prior art on request-specific caching, see:
To be very explicit, I do not think it makes sense to come up with a schema-stitching-specific version of caching -- I think it makes sense to use Apollo Client or urql, which bring their own systems to the table.
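As a rough sketch of what request-scoped caching at the executor layer could look like -- the params shape follows graphql-tools executors, and the `_executorCache` slot on the per-request context is an assumption for the example:

```ts
import { print, DocumentNode } from 'graphql';

type ExecutionParams = {
  document: DocumentNode;
  variables?: Record<string, any>;
  context?: any;
};
type Executor = (params: ExecutionParams) => Promise<any>;

// Wraps an executor with a cache that lives on the request-scoped context,
// so identical delegated queries within one gateway request are sent once.
function withRequestCache(executor: Executor): Executor {
  return (params) => {
    const ctx = params.context ?? {};
    const cache: Map<string, Promise<any>> =
      ctx._executorCache ?? (ctx._executorCache = new Map());
    // Key on the printed query plus variables: exact selection-set match only,
    // no partial cache hits (which is exactly the limitation discussed above).
    const key = print(params.document) + JSON.stringify(params.variables ?? {});
    let result = cache.get(key);
    if (!result) {
      result = executor(params);
      cache.set(key, result);
    }
    return result;
  };
}
```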
Aha, yes, I misunderstood your comment; thank you for elaborating. It had not clicked that the client-side solutions would be useful from within the executor. I will investigate to see what these solutions provide, and if I come up with a good design I'll share it here for feedback. Since it is likely out of scope for the original feature proposed by this issue, I will make a new issue about it. I will also look into making a PR for passing the ...
Thanks again for the impetus for the batchDelegate functionality! I have grown to love it. :) Please do open up issues and PRs as you see fit. Mine is not the only opinion, and if you come to the table with an alternate approach, I am sure @ardatan and the Guild more broadly will be very receptive. Thanks again!
WIP: When `batch` is set to true for a given subschemaConfig, batches all delegated root fields into a combined request passed to the executor. Moreover, batches all requests to a given subschema into the minimum number of requests, collecting queries and mutations separately, preserving operation order. Distributes properly pathed errors to the originating requests. Adapted from the Gatsby query batcher by @vladar.

Caveats:
* Uses a Dataloader under the hood, which is created anew upon each request -- relies on a unique context argument per request to make this happen!
* Does not pass the `info` argument to the batched executor call, disabling any executors that used extensions annotated onto info.

TODO:
- Add testing!
- Extensions support should be added by a custom option?

Related: gatsbyjs/gatsby#22347 (comment) #1710 (comment) #1959 (comment) #1954

* enable batch execution

When `batch` is set to true for a given subschemaConfig, batches all delegated root fields into a combined request passed to the executor. Moreover, batches all requests to a given subschema into the minimum number of requests, collecting queries and mutations separately, preserving operation order. Distributes properly pathed errors to the originating requests. Adapted from the Gatsby query batcher by @vladar.

Caveats:
* Uses a Dataloader under the hood, which is created anew upon each request -- relies on a unique context argument per request to make this happen!
* Passes the `info` argument from the first executor call to the batched executor call, making the info argument unreliable.

Related: gatsbyjs/gatsby#22347 (comment) #1710 (comment) #1959 (comment) #1954

* fix(delegate): pass extensions to executor/subscriber
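A minimal sketch of enabling the `batch` flag described in these commits; `userSchema` and `remoteExecutor` are assumed to exist elsewhere:

```ts
import { stitchSchemas } from '@graphql-tools/stitch';
import { GraphQLSchema } from 'graphql';

// Assumed to be defined elsewhere: a subschema and an executor for it.
declare const userSchema: GraphQLSchema;
declare const remoteExecutor: (params: any) => Promise<any>;

const gatewaySchema = stitchSchemas({
  subschemas: [
    {
      schema: userSchema,
      executor: remoteExecutor,
      // Per the commits above: batch all delegated root fields to this
      // subschema into a combined request passed to the executor.
      batch: true,
    },
  ],
});
```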
Feature Request

Let's say you are using stitchSchemas() to stitch two schemas like this:

And a query like this:

With current type merging, the system will do something like this:

1. Query the userSchema
2. With me.id, query the messageSchema
3. With me.messages[].author.id, query the userSchema for each author individually

Instead, it would be great if step 3 could be replaced by:

3. Collect every me.messages[].author.id and query the userSchema with the full list of ids at once

In fact, if that were possible, then we could even replace userById with usersByIds entirely, using the plural even in singular contexts, for simplicity.

Proposed API:
Is this something maintainers of this project would be open to including?
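For concreteness, the singular and plural root fields discussed above might be shaped as follows (a sketch; the `User` fields beyond `id` are assumptions):

```graphql
type Query {
  userById(id: ID!): User
  usersByIds(ids: [ID!]!): [User]
}

type User {
  id: ID!
  name: String
}
```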