Significantly higher memory usage in gateway 2.0.5 and 2.1.0-alpha #2085

lennyburdette · 2022-08-22T19:51:54Z

Reported by a customer using a large Fed1 supergraph (shared in Slack).

Node.js heap total at gateway server start:
0.52.1 --> 126 MB
2.0.5 --> 1489 MB
2.1.0-alpha.4 --> 1575 MB

This is just on startup before taking any requests, so it's not caused by large query plans.


iuliiasobolevska · 2022-08-23T19:33:10Z

Some more numbers from the same setup/investigation:

Node version: v14.20.0
Starting an Express app with ApolloServer and ApolloGateway, pulled from different @apollo/gateway package versions
Measuring consumed memory only at startup (no traffic being served, query planner not involved)
fed1-variant and fed2-variant have the same schema; the only difference is that the first is composed with Federation 1 and the second with Federation 2
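The thread does not show the exact script behind these measurements. The sketch below shows how startup numbers like these can be collected with Node.js's built-in `process.memoryUsage()`, which reports `heapTotal`, `heapUsed` and `rss` in bytes. It assumes "MB" means 2^20 bytes, and `toMB`/`snapshotMemory` are hypothetical helper names.

```typescript
// Hypothetical helper: convert a byte count to whole megabytes.
// Assumption: 1 MB = 1024 * 1024 bytes (the thread does not state the unit convention).
function toMB(bytes: number): number {
  return Math.round(bytes / (1024 * 1024));
}

interface StartupMemory {
  heapTotalMB: number;
  heapUsedMB: number;
  rssMB: number;
}

// Reads the same three figures as the table columns (heap total, heap used, RSS)
// from Node's built-in process.memoryUsage().
function snapshotMemory(): StartupMemory {
  const { heapTotal, heapUsed, rss } = process.memoryUsage();
  return {
    heapTotalMB: toMB(heapTotal),
    heapUsedMB: toMB(heapUsed),
    rssMB: toMB(rss),
  };
}

// Intended to be called once the gateway has finished loading its supergraph and
// before any request is served, so only startup memory is captured.
console.log(snapshotMemory());
```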

@apollo/gateway version	@apollo/gateway major version	supergraphSdl variant	Federation Version	Heap total (MB)	Heap used (MB)	RSS (MB)
0.51.0	0.5	fed1-variant	1	114	81	160
0.52.0	0.5	fed1-variant	1	113	81	160
0.52.1	0.5	fed1-variant	1	113	81	160
2.0.5	2	fed1-variant	1	1391	1367	1480
2.0.5	2	fed2-variant	2	211	189	279
2.0.5 (2nd run)	2	fed2-variant	2	211	189	275
2.1.0-alpha.1	2	fed1-variant	1	1525	1496	1622
2.1.0-alpha.1	2	fed2-variant	2	218	185	290
2.1.0-alpha.2	2	fed1-variant	1	1481	1457	1578
2.1.0-alpha.2	2	fed2-variant	2	219	186	291
2.1.0-alpha.3	2	fed1-variant	1	1490	1461	1588
2.1.0-alpha.3	2	fed2-variant	2	222	189	288
2.1.0-alpha.4	2	fed1-variant	1	1511	1482	1612
2.1.0-alpha.4	2	fed2-variant	2	219	185	284


pcmanus · 2022-08-24T17:18:52Z

There are probably two things worth noting about those numbers:

1. Something odd is happening with fed1 supergraphs that makes them use roughly an order of magnitude more memory.
2. Fed2 in general uses more memory than fed1.

The first point is obviously the most problematic given the numbers involved. It happens because of:

1. The example here has a large number of subgraphs, and each subgraph defines a fair number of value types.
2. Fed1 supergraphs do not preserve, for value types, which subgraph originally defined them (unlike entity types, where that information is available, and unlike fed2 supergraphs, where it is available for all types).
3. Under the hood, the query planner rebuilds a version of the subgraphs from the supergraph.

In doing point 3, and because of point 2, the code ended up adding all value types to all (reconstructed) subgraphs. That is fine from a correctness point of view (in many cases the resulting subgraphs just have some unreachable types). Combined with point 1, however, it makes every subgraph quite large, and with many subgraphs this ends up consuming a lot of memory.
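Points 1 to 3 multiply with each other. The following is a simplified, hypothetical model of the pre-fix behaviour, not the actual gateway code. Because a fed1 supergraph does not record which subgraph defined a value type, every value type gets copied into every extracted subgraph. The number of type definitions therefore grows roughly with (number of subgraphs) × (number of value types). The names `SubgraphTypes`, `bruteForceExtract` and `totalTypeDefinitions` are illustrative only.

```typescript
// Simplified, hypothetical model: a subgraph is a name plus the types it really uses.
interface SubgraphTypes {
  name: string;
  ownTypes: string[];
}

// Pre-fix behaviour as described above: with no record of which subgraph defined
// each value type, every value type is added to every extracted subgraph.
function bruteForceExtract(
  subgraphs: SubgraphTypes[],
  valueTypes: string[],
): Map<string, Set<string>> {
  const extracted = new Map<string, Set<string>>();
  for (const sg of subgraphs) {
    extracted.set(sg.name, new Set([...sg.ownTypes, ...valueTypes]));
  }
  return extracted;
}

// Total number of type definitions held across all extracted subgraphs.
function totalTypeDefinitions(extracted: Map<string, Set<string>>): number {
  let total = 0;
  extracted.forEach((types) => {
    total += types.size;
  });
  return total;
}

// Illustrative numbers only: 3 subgraphs, each using 1 type of its own, plus 4 value types.
const subgraphs: SubgraphTypes[] = ["a", "b", "c"].map((n) => ({ name: n, ownTypes: [`${n}Type`] }));
const valueTypes = ["Money", "Address", "Image", "Dimensions"];
console.log(totalTypeDefinitions(bruteForceExtract(subgraphs, valueTypes))); // 15 = 3 × (1 + 4)
```

With realistic counts (dozens of subgraphs, hundreds of value types), the same product explains how the extracted subgraphs alone can grow large enough to dominate startup memory.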

Anyway, I pushed a fix in #2089 that does some reachability checks on types to avoid this problem. It brings the memory consumption for fed1 supergraphs roughly in line with that of the same schema recomposed into a fed2 supergraph.
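The reachability idea can be sketched as a plain graph traversal. This is an illustration under stated assumptions, not the code from #2089. Types are modelled as a hypothetical map from a type name to the type names its fields reference. The roots of a subgraph are assumed to be its root operation types plus the entity types it is known to define. Only value types reachable from those roots are kept.

```typescript
// Hypothetical, simplified schema model: type name -> names of types its fields reference.
type TypeGraph = Map<string, string[]>;

// Breadth-first search: returns every type reachable from the given roots.
function reachableTypes(graph: TypeGraph, roots: string[]): Set<string> {
  const seen = new Set<string>();
  const queue: string[] = [];
  const visit = (name: string): void => {
    if (graph.has(name) && !seen.has(name)) {
      seen.add(name);
      queue.push(name);
    }
  };
  roots.forEach(visit);
  // Walk the queue by index rather than shift() so each step is O(1).
  for (let i = 0; i < queue.length; i++) {
    (graph.get(queue[i]) ?? []).forEach(visit);
  }
  return seen;
}

// Illustrative supergraph fragment with made-up type names.
const graph: TypeGraph = new Map<string, string[]>([
  ["Query", ["Product"]],
  ["Product", ["Money"]],
  ["Money", []],
  ["Review", ["Author"]],
  ["Author", []],
]);

// A subgraph rooted at Query keeps Query, Product and Money, but not Review or Author.
console.log(reachableTypes(graph, ["Query"]));
```

The `seen` set also makes the traversal safe on cyclic type references, which are common in GraphQL schemas.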

The second point mentioned above is that fed2 in general uses more memory than fed1. That is mostly because the fed1 and fed2 query planners are very different. In particular, as said above, fed2 re-creates the original subgraphs to some extent, and those end up consuming a fair amount of memory when there are many subgraphs.

Nonetheless, to be honest, I hadn't looked at memory consumption at all with the fed2 query planner. Looking a bit closer for this issue, I did notice a few simple opportunities to reduce consumption, so I included those in the PR (#2089).

FWIW, my own measurements on a test relatively similar to the one above give me:

variant	Total (MB)	Used (MB)	RSS (MB)
Fed1 supergraph unpatched	1574	1273	1645
Fed1 supergraph patched	126	61	193
Fed2 supergraph unpatched	161	107	233
Fed2 supergraph patched	116	66	180

This shows both that the extreme fed1-supergraph case is fixed and that the patch brings some gains compared to current main. I haven't compared against the fed1 code (gateway 0.51) because that wasn't easy with the test I was using. I'm sure fed2 still uses more memory than fed1 even with this patch; hopefully the difference is just a bit smaller. I'm sure we can decrease consumption further, but it will require more work.
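As a quick arithmetic check on the table above, the relative improvement can be computed from the Total (MB) column. `reduction` is an illustrative helper; the inputs are the numbers reported in this comment.

```typescript
// Illustrative helper: how many times smaller `after` is than `before`, and the percent saved.
function reduction(before: number, after: number): { factor: number; percent: number } {
  return { factor: before / after, percent: (1 - after / before) * 100 };
}

// Total (MB), unpatched vs patched, from the measurements above.
const fed1 = reduction(1574, 126);
const fed2 = reduction(161, 116);
console.log(`fed1 supergraph: ${fed1.factor.toFixed(1)}x smaller (${fed1.percent.toFixed(0)}% less)`);
console.log(`fed2 supergraph: ${fed2.factor.toFixed(2)}x smaller (${fed2.percent.toFixed(0)}% less)`);
```

For the fed1 supergraph this works out to about a 12.5x (roughly 92%) drop. For the fed2 supergraph it is about 1.39x (roughly 28%), consistent with "fixed" for the first case and "some gains" for the second.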

Fed1 supergraphs lack information about which subgraphs define each value type. We used to brute-force add all value types to all extracted subgraphs, on the idea that a few unused types in some extracted subgraphs are useless but have no functional impact. Unfortunately, in some special cases (lots of subgraphs and value types), those useless types can lead to a significant increase in memory consumption. This patch instead looks at type reachability within subgraphs to avoid including those useless value types, and thus lowers memory consumption. Note that fed2 supergraphs are not affected by this problem, as they have all the information needed to extract types only into the proper subgraphs. Fixes apollographql#2085

pcmanus self-assigned this Aug 24, 2022

pcmanus mentioned this issue Aug 24, 2022

Fix high memory usage when extracting subgraphs for some fed1 supergraphs #2089

Merged

pcmanus closed this as completed in a593ac8 Aug 26, 2022
