
batching implementations should (optionally?) run queries in parallel #176

Closed
glasser opened this issue Oct 19, 2016 · 23 comments

@glasser
Member

glasser commented Oct 19, 2016

Currently they use `await` to wait for each query before running the next one, so the queries in a batch execute sequentially rather than in parallel.
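A minimal sketch of the difference, using a stubbed `execute` in place of real graphql-js execution (all three function names are illustrative, not apollo-server's actual API):

```javascript
// Stub standing in for GraphQL execution; resolves after a delay so
// the timing difference between the two strategies is visible.
const execute = (query) =>
  new Promise((resolve) => setTimeout(() => resolve(`result of ${query}`), 50));

// Current behavior: each query awaits the previous one (serial).
async function runBatchSerially(queries) {
  const results = [];
  for (const q of queries) {
    results.push(await execute(q)); // next query waits for this one
  }
  return results;
}

// Proposed behavior: start all queries at once and wait together.
async function runBatchInParallel(queries) {
  return Promise.all(queries.map((q) => execute(q)));
}
```

With three 50ms queries, the serial version takes roughly 150ms while the parallel version takes roughly 50ms; the results are identical either way.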

@glasser glasser changed the title batching implementations should run queries in parallel batching implementations should (optionally?) run queries in parallel Oct 19, 2016
@helfer
Contributor

helfer commented Oct 19, 2016

When we fix this we should also move batch execution into the core, since it's a functionality shared by all integrations.

@turadg

turadg commented Dec 15, 2016

Looks like this is blocked on #186.

In the meantime, hack it in graphql-server-hapi with PhiloInc@4408f55 (Thanks @NeoPhi!) or graphql-server-express with remind101@55f4a08

@mdebbar

mdebbar commented Jan 19, 2017

@turadg I'm having trouble installing from your fork (remind101/graphql-server@55f4a08). Any idea how I can install graphql-server-express from your fork? npm doesn't seem to support installing from a subdirectory/sub-package.

@DxCx
Contributor

DxCx commented Jan 19, 2017

Well, hopefully #180 will be merged soon; then that's the next thing I was planning to take care of.

@mdebbar

mdebbar commented Jan 19, 2017

Thanks @DxCx! In that case, I'll wait for the official release since it's not a bug fix, but an optimization.

@turadg

turadg commented Jan 19, 2017

@mdebbar our fork is published under a private org, so you'd have to publish your own package somewhere.

Looking forward to the official release!

@DxCx
Contributor

DxCx commented Jan 19, 2017

So #180 got merged :)
Now, regarding running in parallel, I would like to hear your thoughts about it:

  1. Should it just run in parallel, or is an extra flag needed? If we add a flag, we still need to define the default behavior.
  2. If the whole batch is an array of queries, that's no problem, since data won't change. But if there is a mutation inside the batch, data may get affected and we need to run in order. How do you think we should handle that case?

@helfer

@glasser
Member Author

glasser commented Jan 19, 2017 via email

@DxCx
Contributor

DxCx commented Jan 19, 2017

@glasser hmm, interesting thought.
Well, I'm not 100% familiar with batching on the client side, but as far as I know the user doesn't have much control over it: they can either enable it or not, and then operations get aggregated and batched (sometimes by ticks, sometimes by a timer that flushes a queue). Therefore, I'm not sure how much we can expect the user to cooperate with this.

However, we could, for example, handle all the mutations before the queries in the same batch, so mutations would effectively get higher priority.

@NeoPhi
Contributor

NeoPhi commented Jan 19, 2017

I don't feel like the server should try to figure out the intentions of the client with regard to serial or parallel execution if mutations are involved. If the queries are independent of the mutations, the overall request time will take longer since the queries must wait until the mutation finishes. As batching is an optimization (versus the client issuing parallel requests), any sequencing involving mutations should be handled by the client (this is the approach Relay takes). That way the server only has to focus on resolving requests as fast as possible.

@mdebbar

mdebbar commented Jan 19, 2017

The way I think about it is that, from a developer's perspective, batched queries should behave the same as individual queries and provide the same guarantees.

When batching is disabled and the client does a query and a mutation simultaneously (from different parts of the code), there are no guarantees which one will be executed first. It all depends on network conditions and how the server handles simultaneous requests.

So the developer already doesn't have any expectations on the order of execution. And that gives the graphql server the freedom to do what it thinks is best for batched queries.

IMO, the best way to handle a mix of queries and mutations is to run the mutations first (sequentially or in parallel?*), then run all queries in parallel. This guarantees that the queries won't get stale or out-of-date data.

  * We could also run mutations in parallel in this case. If the user wants to run some mutations in sequence, those mutations should be part of the same GraphQL operation.
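A hedged sketch of this proposal, under the assumption that each batched operation is represented as an object with hypothetical `type` and `run` fields (not apollo-server's actual batch format). A real implementation would also need to return results in the original batch order:

```javascript
// Sketch: run all mutations first, then all queries in parallel,
// so no query reads state that a batched mutation is about to change.
// `op.type` and `op.run` are illustrative names, not a real API.
async function runBatch(operations) {
  const mutations = operations.filter((op) => op.type === 'mutation');
  const queries = operations.filter((op) => op.type === 'query');

  // Mutations complete before any query starts.
  const mutationResults = await Promise.all(mutations.map((op) => op.run()));
  const queryResults = await Promise.all(queries.map((op) => op.run()));
  return { mutationResults, queryResults };
}
```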

@helfer
Contributor

helfer commented Jan 21, 2017

@DxCx I agree with what @glasser and others said: Batching doesn't guarantee anything about execution order, so all the queries should be executed in parallel by default. We do not need to have any special ordering inside the batch either, because if the queries or mutations were submitted independently, there would be no guarantee about the order of execution either.

@DxCx
Contributor

DxCx commented Jan 23, 2017

Well, actually, inside the GraphQL engine you can see that queries do run in parallel, but mutations run serially, because they can affect each other.
However, if you both think we should ignore that, then no problem; simply running in parallel it is.

@NeoPhi
Contributor

NeoPhi commented Jan 23, 2017

@DxCx I see the GraphQL specification is clear about multiple mutations running serially when defined within a single document. Since batching involves multiple documents, I view it as akin to multiple requests, which have no semantics for the order in which they will be resolved.

@DxCx DxCx closed this as completed in #273 Jan 24, 2017
DxCx added a commit that referenced this issue Jan 24, 2017
Run batching queries in parallel (#176)
@jamesreggio

jamesreggio commented Apr 2, 2017

I'm a little skeptical that parallel execution is a one-size-fits-all solution for batched mutations.

We have a simple app that has a like and unlike mutation. We're using Apollo Client and Server with batching, and the client implementation of each mutation has an optimistic response that assumes the mutation will succeed. Relatively simple.

A problem emerges if a user rapidly taps the like/unlike button. If this happens too quickly, a batch will be sent to the server with operations that look like this:

[
  `mutation like(id: "foo")`,
  `mutation unlike(id: "foo")`,
  `mutation like(id: "foo")`,
  `mutation unlike(id: "foo")`,
  ...
]

The client preserves the correct order, and if these were executed serially by the server, there would be no issue. However, as it is, all mutations are now issued in parallel, and there's a race to create and clear the 'liked' state. The result is that some mutations error, and the final state is indeterminate.

I respect the proposition that there should be no ordering guarantee between separate network requests containing separate documents. However, if I want to enforce a strict ordering of these mutations, I'm at a loss for how to accomplish this. Should I force the client to complete a full server round-trip for each mutation in serial? Isn't that the whole point of batching?

I suspect that the correct server behavior here looks a little more sophisticated: perhaps execution should fall back to serial if there is a mutation amongst the requests?
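The fallback suggested above could look something like this sketch, again with hypothetical `type`/`run` fields standing in for the real batch representation:

```javascript
// Sketch: run a batch in parallel unless it contains a mutation, in
// which case fall back to strict serial execution so the client's
// ordering (like, unlike, like, ...) is preserved.
async function runBatchWithFallback(operations) {
  const hasMutation = operations.some((op) => op.type === 'mutation');
  if (!hasMutation) {
    return Promise.all(operations.map((op) => op.run()));
  }
  // Serial path: each operation waits for the previous one.
  const results = [];
  for (const op of operations) {
    results.push(await op.run());
  }
  return results;
}
```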

cc: @stevekrenzel

@mdebbar

mdebbar commented Apr 2, 2017

@jamesreggio IMHO, the batching network interface should provide no extra guarantees over the simple network interface, and should behave the same. The problem of concurrent mutations is inherent to client-server applications and not specific to GraphQL or Apollo.

I agree that it would be nice if Apollo solved this problem, but I don't believe that's the goal of Apollo. I would love for Apollo to stay as un-opinionated as possible.

That being said, this is not a difficult problem to solve outside of Apollo. Actually, it has been solved since the days of jQuery and HTML forms: simply disable the like button while the request is in flight. You can have an optimistic update that turns "Like" into "Unlike" but keep the button disabled until the mutation succeeds. Now the question is: does Apollo provide an easy way for the component to get the status of the mutation? cc @helfer @stubailo
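A framework-free sketch of that pattern, with a hypothetical `sendLikeMutation` transport call; UI frameworks would express the same idea with component state:

```javascript
// Sketch: track an inFlight flag around the mutation so rapid clicks
// can't queue conflicting requests. `sendLikeMutation` is illustrative.
function makeLikeButton(sendLikeMutation) {
  let inFlight = false;
  let liked = false;
  return {
    get liked() { return liked; },
    get disabled() { return inFlight; },
    async click() {
      if (inFlight) return false; // ignore clicks while a request is pending
      inFlight = true;
      liked = !liked;             // optimistic update
      try {
        await sendLikeMutation(liked);
      } catch (e) {
        liked = !liked;           // roll back on failure
      } finally {
        inFlight = false;
      }
      return true;
    },
  };
}
```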

@jchavarri

jchavarri commented Aug 15, 2017

IMO, Apollo should at least optionally allow executing mutations serially. The problem falls in the GraphQL realm; from what I understand from the GraphQL.org docs:

While query fields are executed in parallel, mutation fields run in series, one after the other.

It seems #273 introduced parallelism for queries as well as for mutations, which can potentially lead to the race conditions described above by @jamesreggio. We are already using Apollo extensively for querying, and are considering using it for mutations, but unfortunately not being able to guarantee mutation ordering will make that harder.

@helfer @stubailo Is this something that we could solve (at least on a per case basis, not necessarily as part of the library) with a custom Apollo Link in the upcoming version 2.0?

@stubailo
Contributor

not being able to guarantee mutation ordering will make that harder.

It's not really possible to ensure mutation ordering over HTTP in the general case. You might want to use a websocket if you really need those to happen in order.

The batching isn't really relevant for this particular case, because it's only in a 10ms interval - it's unlikely that someone will be able to click a button multiple times in one frame. Queueing up mutations over a longer time could be interesting though.

Today it's already possible to send all of your operations including queries and mutations over the websocket transport, which will give you the behavior you're looking for!

Would you mind opening a new issue about this if it still feels like a problem?

@jchavarri

Thanks for the quick answer @stubailo! Using a websocket transport layer seems like it would solve the race condition issues. I will explore that option further and open a new issue if needed once I have more data 👍

@jonaskello

jonaskello commented Jan 15, 2018

I'm working on a link that queues up mutations over a longer time (until the user clicks a save button) as mentioned by @stubailo above. Doing this I have run into the problem that when I send all the queued mutations, they get correctly batched using apollo-link-http (since they are all run within a short time, although I know this link does not guarantee all being in one batch). But when they reach the server they get executed in parallel, so the final result is unpredictable. If I send 3 mutations I sometimes end up with the 2nd being the last one to execute.

My link is similar to apollo-link-queue by @helfer so I guess that link will have similar problems.

Other than moving to the websocket transport, are there any ideas on how to guarantee the ordering of batched mutations?
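One client-side option, short of switching transports, is to serialize mutations by chaining each one onto the previous, so only one is in flight at a time; this trades batching away for the ordering guarantee. A sketch, with a hypothetical `sendMutation` transport call:

```javascript
// Sketch: a serial queue built from promise chaining. Each enqueued
// mutation starts only after the previous one has settled.
function makeSerialQueue(sendMutation) {
  let tail = Promise.resolve();
  return function enqueue(mutation) {
    const next = tail.then(() => sendMutation(mutation));
    // Keep the chain alive even if one mutation rejects.
    tail = next.catch(() => {});
    return next;
  };
}
```

Without this, a slow first mutation can finish after a fast second one; with the queue, completion order always matches submission order.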

@jonaskello

I thought about having a mutation that accepts an array of mutations, but it seems GraphQL is not able to express the return results...

type Mutation {
    executeBatch(mutations: [Mutation!]!): XXX!
}

@jonas-app
Contributor

@jonaskello I have the exact same issue atm.
In our case we can dedup the mutations but a guaranteed execution order would be really good.
Do you have any additional helpful information now?
helfer/apollo-link-queue#11

@jonaskello

@jonas-arkulpa I ended up building my own queue mechanism outside Apollo. Basically, for each operation the user makes before saving, I queue a definition of the corresponding mutation fields to execute. When new fields are added, the queue is deduplicated. When the user hits save, I build a single mutation by combining all the queued fields in order. This works because GraphQL specifies that mutation fields within a single operation must be executed in order. Since an operation cannot have duplicate field names, I generate aliases for all the fields in this mutation.
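A minimal sketch of that combining step, with illustrative field strings; the GraphQL spec's serial execution of top-level mutation fields is what makes the ordering guarantee hold:

```javascript
// Sketch: combine queued mutation fields into one document, aliasing
// each field (m0, m1, ...) so duplicate field names don't collide.
function combineMutations(fields) {
  const body = fields
    .map((field, i) => `  m${i}: ${field}`)
    .join('\n');
  return `mutation {\n${body}\n}`;
}
```

For example, `combineMutations(['like(id: "foo")', 'unlike(id: "foo")'])` produces a single mutation operation whose two aliased fields the server must execute in order.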
