
Significantly higher memory usage in gateway 2.0.5 and 2.1.0-alpha #2085

Closed
lennyburdette opened this issue Aug 22, 2022 · 2 comments

@lennyburdette
Contributor

Reported by a customer using a large Fed1 supergraph (shared in Slack).

Node.js heap total at gateway server start:
0.52.1 --> 126 MB
2.0.5 --> 1489 MB
2.1.0-alpha.4 --> 1575 MB

This is just on startup before taking any requests, so it's not caused by large query plans.

@iuliiasobolevska

iuliiasobolevska commented Aug 23, 2022

Some more numbers from the same setup/investigation:

  • node version: v14.20.0
  • starting an express app with ApolloServer and ApolloGateway pulled from different @apollo/gateway package versions
  • measuring consumed memory only on startup (no traffic being served, query planner not involved)
  • fed1-variant and fed2-variant have the same schema; the only difference is that the first is composed using Federation 1 and the second using Federation 2
| @apollo/gateway version | @apollo/gateway major version | supergraphSdl variant | Federation version | Heap total (MB) | Heap used (MB) | RSS (MB) |
|---|---|---|---|---|---|---|
| 0.51.0 | 0.5 | fed1-variant | 1 | 114 | 81 | 160 |
| 0.52.0 | 0.5 | fed1-variant | 1 | 113 | 81 | 160 |
| 0.52.1 | 0.5 | fed1-variant | 1 | 113 | 81 | 160 |
| 2.0.5 | 2 | fed1-variant | 1 | 1391 | 1367 | 1480 |
| 2.0.5 | 2 | fed2-variant | 2 | 211 | 189 | 279 |
| 2.0.5 (2nd run) | 2 | fed2-variant | 2 | 211 | 189 | 275 |
| 2.1.0-alpha.1 | 2 | fed1-variant | 1 | 1525 | 1496 | 1622 |
| 2.1.0-alpha.1 | 2 | fed2-variant | 2 | 218 | 185 | 290 |
| 2.1.0-alpha.2 | 2 | fed1-variant | 1 | 1481 | 1457 | 1578 |
| 2.1.0-alpha.2 | 2 | fed2-variant | 2 | 219 | 186 | 291 |
| 2.1.0-alpha.3 | 2 | fed1-variant | 1 | 1490 | 1461 | 1588 |
| 2.1.0-alpha.3 | 2 | fed2-variant | 2 | 222 | 189 | 288 |
| 2.1.0-alpha.4 | 2 | fed1-variant | 1 | 1511 | 1482 | 1612 |
| 2.1.0-alpha.4 | 2 | fed2-variant | 2 | 219 | 185 | 284 |
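The thread doesn't say how these columns were collected. A plausible assumption is Node.js `process.memoryUsage()`, whose `heapTotal`, `heapUsed` and `rss` fields are reported in bytes. The sketch below converts such a snapshot into the MB values used in the table; `MemorySnapshot` and `toMegabytes` are hypothetical names, and treating "MB" as 1024 × 1024 bytes is also an assumption.

```typescript
// Shape of the three columns in the table. Node.js's process.memoryUsage()
// returns an object with these fields (plus a few others), all in bytes.
interface MemorySnapshot {
  heapTotal: number;
  heapUsed: number;
  rss: number;
}

// Assumption: "MB" in the table means 1024 * 1024 bytes.
const BYTES_PER_MB = 1024 * 1024;

// Converts a byte-valued snapshot into rounded megabytes, one value per column.
function toMegabytes(usage: MemorySnapshot): MemorySnapshot {
  return {
    heapTotal: Math.round(usage.heapTotal / BYTES_PER_MB),
    heapUsed: Math.round(usage.heapUsed / BYTES_PER_MB),
    rss: Math.round(usage.rss / BYTES_PER_MB),
  };
}
```

Inside the gateway process, the snapshot would be taken as `toMegabytes(process.memoryUsage())` once the server has started and before any request is served. For reference, the 2.0.5 heap-used figures give 1367 / 189 ≈ 7.2 between the fed1 and fed2 variants of the same schema.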

@pcmanus pcmanus self-assigned this Aug 24, 2022
pcmanus pushed a commit to pcmanus/federation that referenced this issue Aug 24, 2022
Fed1 supergraphs lack information on which subgraphs define each
value type. We used to brute-force add all value types to all
extracted subgraphs, on the idea that if some extracted subgraphs have a
few unused types, it's useless but has no functional impact.
Unfortunately, in some special cases (lots of subgraphs and value
types), those useless types can lead to a significant increase in
memory consumption.

This patch instead looks at type reachability within subgraphs to avoid
including those useless value types, and thus lowers memory
consumption.

Note that fed2 supergraphs are not affected by this problem as they
have all the information needed to extract types only into the proper
subgraphs.

Fixes apollographql#2085
@pcmanus
Contributor

pcmanus commented Aug 24, 2022

There are probably two things worth noting about those numbers:

  1. something weird is happening with fed1 supergraphs that makes them use roughly an order of magnitude more memory.
  2. fed2 in general uses more memory than fed1.

The first point is obviously the most problematic given the numbers involved. It happens because of:

  1. the example here having a large number of subgraphs, with each subgraph defining a fair number of value types.
  2. fed1 supergraphs not preserving, for value types, which subgraph originally defined them (unlike entity types, where that information is available, and unlike fed2 supergraphs, where it is available for all types).
  3. the fact that, under the hood, the query planner rebuilds a version of the subgraphs from the supergraph.

In doing point 3, and because of point 2, the code ended up adding all value types to all (reconstructed) subgraphs. That is fine from a correctness POV (in many cases the resulting subgraphs just have some unreachable types), but combined with point 1 it makes every subgraph quite big, and with lots of subgraphs this ends up consuming lots of memory.

Anyway, I pushed a fix in #2089 that does some reachability checks on types to avoid this problem, and it brings the memory consumption for fed1 supergraphs roughly in line with that of the same schema recomposed into a fed2 supergraph.
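The reachability check can be pictured with the following sketch. It is a hypothetical, simplified model, not the code from #2089: each type name maps to the names of the types its fields reference, and `TypeGraph`, `reachableTypes` and `valueTypesToKeep` are illustrative names. The traversal roots are assumed to be types whose subgraph ownership the supergraph does record (for example, the subgraph's root types and entity types).

```typescript
// Hypothetical, simplified schema model (not the actual federation internals):
// each type name maps to the names of the types referenced by its fields.
type TypeGraph = Map<string, string[]>;

// Depth-first traversal collecting every type reachable from `roots`
// by following field references. Names absent from the graph (e.g. scalars) are ignored.
function reachableTypes(graph: TypeGraph, roots: readonly string[]): Set<string> {
  const seen = new Set<string>();
  const stack: string[] = [];
  for (const root of roots) {
    if (graph.has(root) && !seen.has(root)) {
      seen.add(root);
      stack.push(root);
    }
  }
  while (stack.length > 0) {
    const current = stack.pop()!;
    for (const referenced of graph.get(current) ?? []) {
      if (graph.has(referenced) && !seen.has(referenced)) {
        seen.add(referenced); // marking before visiting also handles cycles
        stack.push(referenced);
      }
    }
  }
  return seen;
}

// Instead of copying every value type into every extracted subgraph (the old
// brute-force behavior), keep only the value types this subgraph can reach.
function valueTypesToKeep(
  graph: TypeGraph,
  roots: readonly string[],
  valueTypes: readonly string[],
): string[] {
  const reachable = reachableTypes(graph, roots);
  return valueTypes.filter((name) => reachable.has(name));
}

// Example: Address is never referenced from this subgraph's roots, so it is dropped.
const inventory: TypeGraph = new Map<string, string[]>([
  ["Query", ["Product"]],
  ["Product", ["Money", "String"]],
  ["Money", ["Currency"]],
  ["Currency", []],
  ["Address", ["String"]],
]);

console.log(valueTypesToKeep(inventory, ["Query"], ["Money", "Currency", "Address"]));
// → [ 'Money', 'Currency' ]
```

Marking a type as seen when it is pushed, rather than when it is popped, keeps each type on the stack at most once, so the traversal stays linear in the number of types plus field references.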

The 2nd point mentioned above is the fact that fed2 in general uses more memory than fed1. That part is really more due to the fact that the fed1 and fed2 query planners are very different. In particular, as said above, fed2 re-creates the original subgraphs to some extent, and those end up consuming a fair amount of memory when there are lots of subgraphs.

Nonetheless, to be honest, I hadn't looked at memory consumption at all with the fed2 query planner, and looking a bit closer for this issue, I did notice a few simple opportunities to reduce consumption, so I included those in the PR (#2089).

Fwiw, my own measurements on a test relatively similar to the one above give me:

| Variant | Total (MB) | Used (MB) | RSS (MB) |
|---|---|---|---|
| Fed1 supergraph, unpatched | 1574 | 1273 | 1645 |
| Fed1 supergraph, patched | 126 | 61 | 193 |
| Fed2 supergraph, unpatched | 161 | 107 | 233 |
| Fed2 supergraph, patched | 116 | 66 | 180 |

which shows both that the crazy fed1-supergraph case is fixed and that the patch brings some gains compared to current main. I haven't compared against the fed1 code (gateway 0.51) because that wasn't easy with the test I was using for this, but I'm sure fed2 still uses more memory than fed1 even with this patch; the difference is just hopefully a bit smaller. I'm sure we can decrease consumption further too, but it will require more work.

pcmanus pushed a commit to pcmanus/federation that referenced this issue Aug 26, 2022