Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stargate suggestions to #135 + actix-web support #186

Merged
merged 14 commits into from
Sep 30, 2020

Conversation

Enrico2
Copy link
Contributor

@Enrico2 Enrico2 commented Sep 25, 2020

This PR was developed on top of #135

First and foremost, any changes to the existing executor.rs and related code written in #135 are mostly there as suggestions. @jbaxleyiii lmk if you want to pair on these or feel free to take / reject parts. I didn't dig too deep into the implementation of the executor yet, I'm going to start doing that this week.

Second: This PR separates out the Stargate library from the binary, for a few reasons:

  1. Separating out the business logic from the wrapping server makes for easier testing.
  2. The library doesn't choose an async runtime, nor does it assume any specific implementation of an HTTP server.
  3. Use actix-web instead of tide; we've discussed these options and feel more comfortable going with a more mature server implementation, as well as one that uses tokio.

@jbaxleyiii If these changes look good, we can merge them into #135 and we can continue from there. If you want to do it differently, lmk!

  • Address todo! / unimplemented! / FIXME in the code
  • Tests for the executor (ideally these should be unit tests on the Stargate library, for coverage)
  • Integration tests. I was thinking we can set up acephei services locally and then the gateway would have localhost references to those on different ports, integration tests would be entirely localhost.
  • Add tracing/metrics middleware -- I believe there are some existing crates.
  • Write more log lines so that they're integrated with the logger of the http server (use env_logger ?)

@Enrico2 Enrico2 changed the base branch from main to jbaxleyiii/gateway-spike September 25, 2020 20:38
@Enrico2 Enrico2 force-pushed the ran/jbaxleyiii/gateway-spike branch 2 times, most recently from e2bc774 to f7fc123 Compare September 25, 2020 20:53
@Enrico2 Enrico2 marked this pull request as ready for review September 28, 2020 19:55
@Enrico2 Enrico2 changed the title Stargate Stargate suggestions to #135 + actix-web support Sep 28, 2020
Copy link
Member

@trevor-scheer trevor-scheer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it's problematic or blocking, but it seems strange to have graphql-parser "belong" to stargate. I'd expect we'll be using this crate in all kinds of tooling and it seems pretty standalone.

@@ -1,7 +1,6 @@
[package]
name = "query-planner-wasm" # this ends up being the name in the published npm package.
version = "0.0.0-DO.NOT.CHANGE" # The version is managed in the package.json, by Lerna.
private = true
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this intentional? I suspect @abernix could speak to its existence. This seems likely to be a relic of package.json's private key - maybe it's reflected in a generated package.json?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

afaict this had no affect, and in fact caused a warning that said there's an unknown key in the file.
Also, for future reference, the only keys that get copied to the package.json that gets generated by wasm-pack are here

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, all good to remove this; this doesn't appear to do what I had thought it would. I was trying to prevent this package from getting accidentally published to Crates.io. Though I think that what I had meant to do was probably to use the publish = false option.

#[structopt(long, parse(from_os_str))]
pub manifest: PathBuf,

/// Where to write the output: to `stdout` or `file`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copypasta?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes thanks!

Comment on lines +23 to +29
let planner = QueryPlanner::new(schema);
let service_list = get_service_list(&planner.schema);
Stargate {
planner,
service_list,
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can get_service_list be a fn that lives on a QueryPlanner instance? Then Stargate would only need a planner passed in to its new fn, which could then get the service_list itself.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It can, I tried that at some point. I think I landed on this because it makes sense for the service list to be a property of Stargate. It felt a little awkward to get that from the planner in terms of ownership.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I remember, in order to do that, query-planner (which is a different crate) would need to either be aware of ServiceDefinition which comes from stargate-lib, and so that definition would have to move, and that has cascading effects.

@Enrico2 Enrico2 merged this pull request into jbaxleyiii/gateway-spike Sep 30, 2020
@Enrico2 Enrico2 deleted the ran/jbaxleyiii/gateway-spike branch September 30, 2020 21:08
@Enrico2 Enrico2 mentioned this pull request Sep 30, 2020
5 tasks
@Enrico2 Enrico2 mentioned this pull request Sep 30, 2020
5 tasks
Enrico2 pushed a commit that referenced this pull request Sep 30, 2020
This PR was developed on top of #135

First and foremost, any changes to the existing _executor.rs_ and related code written in #135 are mostly there as **suggestions**. @jbaxleyiii lmk if you want to pair on these or feel free to take / reject parts. I didn't dig too deep into the implementation of the executor yet, I'm going to start doing that this week.

Second: This PR separates out the Stargate library from the binary, for a few reasons:
1. Separating out the business logic from the wrapping server makes for easier testing.
2. The library doesn't choose an async runtime, nor does it assume any specific implementation of an HTTP server.
3. Use actix-web instead of tide; we've discussed these options and feel more comfortable going with a more mature server implementation, as well as one that uses tokio.

@jbaxleyiii If these changes look good, we can merge them into #135 and we can continue from there. If you want to do it differently, lmk!

- [ ] Address `todo!` / `unimplemented!` / `FIXME` in the code
- [ ] Tests for the executor (ideally these should be unit tests on the Stargate library, for coverage)
- [ ] Integration tests. I was thinking we can set up acephei services locally and then the gateway would have `localhost` references to those on different ports, integration tests would be entirely localhost.
- [ ] Add tracing/metrics middleware -- I believe there are some existing crates.
- [ ] Write more log lines so that they're integrated with the logger of the http server (use `env_logger` ?)
Enrico2 pushed a commit that referenced this pull request Oct 1, 2020
Co-authored-by: Ran Magen <ran@apollographql.com>

![spike](https://media.giphy.com/media/XuSS2uqOPf83K/giphy.gif)

## What this is
This is a spike of a porting the JavaScript [query executor](https://github.com/apollographql/apollo-server/blob/60d974240769aac3e1c75ff968657b2d3e60e9b6/packages/apollo-gateway/src/executeQueryPlan.ts) which takes the query plan generated by the new Rust query planner and does a simple http fetch to the federated service and merges the data back together. This is **not something we should intend to merge**, rather it was a spike to see where things will be tricky as we work to do the port genuinely and where we could slice things for smaller deliverables.

## What do I do with this?
What I would to get out of this exercise is a shared understanding of some of the footguns which may lie ahead, what a common data structure needs to look like for the executor, and some of the challenges (and opportunities) in moving from a single threaded environment to a multi-threaded one. So

1. Don't review this for code quality / idiomatic Rust code. Its not that I'm not looking to improve my Rust, but rather this is a sketch of an idea not a real thing.
2.  Do read through the [main executor](http/src/main.rs). Below is a more detailed guide of what is interesting
3. Do ask questions about the data structures, general shape of the problem.
4. Don't focus on the http parts (tide, parsing of the response, etc). This was just for local testing.

## Detailed Review Notes

- The JS query executor relies heavily on two things to be successful. First, the single threaded nature of JavaScript and second, the ease of interior mutability within that thread. In particular, the `FlattenNode` and the `FetchNode` both mutate the result sink in a way that works if there is only one thread and a shared reference to the sink. For example, [flattenResultsAtPath](https://github.com/apollographql/apollo-server/blob/60d974240769aac3e1c75ff968657b2d3e60e9b6/packages/apollo-gateway/src/executeQueryPlan.ts#L165) mutates the results while creating a zip for subsequent nodes to work on. This is really hard / not what we should do in a world where you have multiple threads and interior mutability isn't really the way to go.
- `RwLock` is incredible. In the executor we need to be able to read from a shared reference at multiple times without block access to the data but block on writes. This is important because prior to sending a fetch we need to read the results to generate variables for downstream requests and we don't want to block on this for every request that is being formed. However, we do need to block on writes so that reads get the updated data correctly. A `Mutex` blocks on both but `RwLock` gets us just what we wanted. You can see where we [create the lock](main...jbaxleyiii/gateway-spike#diff-1d121b833f22f1fd43615c8db9fa578bR78) and an example where we [write back to it](main...jbaxleyiii/gateway-spike#diff-1d121b833f22f1fd43615c8db9fa578bR293)
- deep merging of json isn't easy. I found a library but am not 100% sure it actually does what we want. It may or may not be why  parallel flatten nodes [don't work](main...jbaxleyiii/gateway-spike#diff-1d121b833f22f1fd43615c8db9fa578bR170-R175)
- Instead of manually forking threads, I am relying on [futures to manage promises](main...jbaxleyiii/gateway-spike#diff-1d121b833f22f1fd43615c8db9fa578bR109-R117) for me. I'm not sure if this is actually doing what I want it to do but it behaves like `Promise.allSettled` which is similar to the existing one. Maybe we should go more all in on threads here, not sure.
- We are going to need to do [some cloning](main...jbaxleyiii/gateway-spike#diff-1d121b833f22f1fd43615c8db9fa578bR215-R227) when it comes time to send requests. Mainly of the variables because these could be used across multiple requests to the executeNode can't take ownership of them. I don't think this is a problem but wanted to flag it here for discussion.
- [Some idioms in JS are just not possible in Rust.](https://github.com/apollographql/apollo-server/blob/main/packages/apollo-gateway/src/executeQueryPlan.ts#L206). This is on the whole a good thing and it has already helped to improve this implementation but don't count on a direct port being the same shape in each part. 
- We will need to figure out lifetimes / ownership of the schema within threads so it can be created outside of a request. I [didn't worry about that right now](main...jbaxleyiii/gateway-spike#diff-1d121b833f22f1fd43615c8db9fa578bR24-R50) but that is definitely something we will need to do

## Suggested next steps
I think after an initial read through, a pairing session could be a helpful way to talk through this. Overall I'm quite thrilled at how quickly this came together and it makes me even more excited about when we tackle this for real. Let me know if you have any questions! 👍 

## Additional changes
#186 was merged into this one. Its description:

This PR was developed on top of #135

First and foremost, any changes to the existing _executor.rs_ and related code written in #135 are mostly there as **suggestions**. @jbaxleyiii lmk if you want to pair on these or feel free to take / reject parts. I didn't dig too deep into the implementation of the executor yet, I'm going to start doing that this week.

Second: This PR separates out the Stargate library from the binary, for a few reasons:
1. Separating out the business logic from the wrapping server makes for easier testing.
2. The library doesn't choose an async runtime, nor does it assume any specific implementation of an HTTP server.
3. Use actix-web instead of tide; we've discussed these options and feel more comfortable going with a more mature server implementation, as well as one that uses tokio.

@jbaxleyiii If these changes look good, we can merge them into #135 and we can continue from there. If you want to do it differently, lmk!

- [ ] Address `todo!` / `unimplemented!` / `FIXME` in the code
- [ ] Tests for the executor (ideally these should be unit tests on the Stargate library, for coverage)
- [ ] Integration tests. I was thinking we can set up acephei services locally and then the gateway would have `localhost` references to those on different ports, integration tests would be entirely localhost.
- [ ] Add tracing/metrics middleware -- I believe there are some existing crates.
- [ ] Write more log lines so that they're integrated with the logger of the http server (use `env_logger` ?)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants