Skip to content

Commit

Permalink
[Spike] Query Executor in Rust (#135)
Browse files Browse the repository at this point in the history
Co-authored-by: Ran Magen <ran@apollographql.com>

![spike](https://media.giphy.com/media/XuSS2uqOPf83K/giphy.gif)

## What this is
This is a spike of a porting the JavaScript [query executor](https://github.com/apollographql/apollo-server/blob/60d974240769aac3e1c75ff968657b2d3e60e9b6/packages/apollo-gateway/src/executeQueryPlan.ts) which takes the query plan generated by the new Rust query planner and does a simple http fetch to the federated service and merges the data back together. This is **not something we should intend to merge**, rather it was a spike to see where things will be tricky as we work to do the port genuinely and where we could slice things for smaller deliverables.

## What do I do with this?
What I would to get out of this exercise is a shared understanding of some of the footguns which may lie ahead, what a common data structure needs to look like for the executor, and some of the challenges (and opportunities) in moving from a single threaded environment to a multi-threaded one. So

1. Don't review this for code quality / idiomatic Rust code. Its not that I'm not looking to improve my Rust, but rather this is a sketch of an idea not a real thing.
2.  Do read through the [main executor](http/src/main.rs). Below is a more detailed guide of what is interesting
3. Do ask questions about the data structures, general shape of the problem.
4. Don't focus on the http parts (tide, parsing of the response, etc). This was just for local testing.

## Detailed Review Notes

- The JS query executor relies heavily on two things to be successful. First, the single threaded nature of JavaScript and second, the ease of interior mutability within that thread. In particular, the `FlattenNode` and the `FetchNode` both mutate the result sink in a way that works if there is only one thread and a shared reference to the sink. For example, [flattenResultsAtPath](https://github.com/apollographql/apollo-server/blob/60d974240769aac3e1c75ff968657b2d3e60e9b6/packages/apollo-gateway/src/executeQueryPlan.ts#L165) mutates the results while creating a zip for subsequent nodes to work on. This is really hard / not what we should do in a world where you have multiple threads and interior mutability isn't really the way to go.
- `RwLock` is incredible. In the executor we need to be able to read from a shared reference at multiple times without block access to the data but block on writes. This is important because prior to sending a fetch we need to read the results to generate variables for downstream requests and we don't want to block on this for every request that is being formed. However, we do need to block on writes so that reads get the updated data correctly. A `Mutex` blocks on both but `RwLock` gets us just what we wanted. You can see where we [create the lock](main...jbaxleyiii/gateway-spike#diff-1d121b833f22f1fd43615c8db9fa578bR78) and an example where we [write back to it](main...jbaxleyiii/gateway-spike#diff-1d121b833f22f1fd43615c8db9fa578bR293)
- deep merging of json isn't easy. I found a library but am not 100% sure it actually does what we want. It may or may not be why  parallel flatten nodes [don't work](main...jbaxleyiii/gateway-spike#diff-1d121b833f22f1fd43615c8db9fa578bR170-R175)
- Instead of manually forking threads, I am relying on [futures to manage promises](main...jbaxleyiii/gateway-spike#diff-1d121b833f22f1fd43615c8db9fa578bR109-R117) for me. I'm not sure if this is actually doing what I want it to do but it behaves like `Promise.allSettled` which is similar to the existing one. Maybe we should go more all in on threads here, not sure.
- We are going to need to do [some cloning](main...jbaxleyiii/gateway-spike#diff-1d121b833f22f1fd43615c8db9fa578bR215-R227) when it comes time to send requests. Mainly of the variables because these could be used across multiple requests to the executeNode can't take ownership of them. I don't think this is a problem but wanted to flag it here for discussion.
- [Some idioms in JS are just not possible in Rust.](https://github.com/apollographql/apollo-server/blob/main/packages/apollo-gateway/src/executeQueryPlan.ts#L206). This is on the whole a good thing and it has already helped to improve this implementation but don't count on a direct port being the same shape in each part. 
- We will need to figure out lifetimes / ownership of the schema within threads so it can be created outside of a request. I [didn't worry about that right now](main...jbaxleyiii/gateway-spike#diff-1d121b833f22f1fd43615c8db9fa578bR24-R50) but that is definitely something we will need to do

## Suggested next steps
I think after an initial read through, a pairing session could be a helpful way to talk through this. Overall I'm quite thrilled at how quickly this came together and it makes me even more excited about when we tackle this for real. Let me know if you have any questions! 👍 

## Additional changes
#186 was merged into this one. Its description:

This PR was developed on top of #135

First and foremost, any changes to the existing _executor.rs_ and related code written in #135 are mostly there as **suggestions**. @jbaxleyiii lmk if you want to pair on these or feel free to take / reject parts. I didn't dig too deep into the implementation of the executor yet, I'm going to start doing that this week.

Second: This PR separates out the Stargate library from the binary, for a few reasons:
1. Separating out the business logic from the wrapping server makes for easier testing.
2. The library doesn't choose an async runtime, nor does it assume any specific implementation of an HTTP server.
3. Use actix-web instead of tide; we've discussed these options and feel more comfortable going with a more mature server implementation, as well as one that uses tokio.

@jbaxleyiii If these changes look good, we can merge them into #135 and we can continue from there. If you want to do it differently, lmk!

- [ ] Address `todo!` / `unimplemented!` / `FIXME` in the code
- [ ] Tests for the executor (ideally these should be unit tests on the Stargate library, for coverage)
- [ ] Integration tests. I was thinking we can set up acephei services locally and then the gateway would have `localhost` references to those on different ports, integration tests would be entirely localhost.
- [ ] Add tracing/metrics middleware -- I believe there are some existing crates.
- [ ] Write more log lines so that they're integrated with the logger of the http server (use `env_logger` ?)
  • Loading branch information
James Baxley committed Oct 1, 2020
1 parent ea71ef5 commit 065a672
Show file tree
Hide file tree
Showing 409 changed files with 4,251 additions and 1,314 deletions.
2 changes: 2 additions & 0 deletions .idea/modules.xml

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

8 changes: 4 additions & 4 deletions .idea/modules/graphql-parser.iml

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

10 changes: 5 additions & 5 deletions .idea/modules/query-planner-wasm.iml

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

14 changes: 7 additions & 7 deletions .idea/modules/query-planner.iml

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

11 changes: 11 additions & 0 deletions .idea/modules/stargate-lib.iml

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

12 changes: 12 additions & 0 deletions .idea/modules/stargate.iml

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

Loading

0 comments on commit 065a672

Please sign in to comment.