-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Defer and cache calculation of Engine reporting signature. #2670
Conversation
This moves the Apollo Engine Reporting signature calculation (which the Apollo Cloud uses to aggregate similar operations) to perform asynchronously, after the response has been returned to the client, and also caches the signature for use by additional operations which match the same `queryHash` (which is an exact SHA-256 hex digest of the operation body received from the client). While the signature calculation is relatively quick on small operations, with sustained load and more complex operations, this recurring calculation can be more costly. As a bit of validation to the success of this change, on a very basic performance benchmark, using a schema with 1000 `String` fields (i.e. a `type Query` with `Field_1: String` through `Field_1000: String`) and an incoming operation which selects from all 1000 of those fields (i.e `Field_1`), this showed quite an improvement: ``` ┌───────────────┬───────────────┬───────────────┐ │ │ Before │ After │ ├───────────────┼───────────────┼───────────────┤ │ Percentile, % │ Resp. Time, s │ Resp. Time, s │ ├───────────────┼───────────────┼───────────────┤ │ 0.0 │ 0.076 │ 0.024 │ │ 50.0 │ 0.342 │ 0.14 │ │ 90.0 │ 0.388 │ 0.161 │ │ 95.0 │ 0.41 │ 0.164 │ │ 99.0 │ 0.444 │ 0.17 │ │ 99.9 │ 0.486 │ 0.177 │ │ 100.0 │ 0.487 │ 0.196 │ └───────────────┴───────────────┴───────────────┘ ``` Of course, this is a relatively simple example and still includes actual GraphQL execution, but the win factor is certainly there. Other tests with more dynamic fields and a higher cardinality of unique operations also showed improvement, though extremely high cardinality of complex operations (which have more expensive execution characteristics) certainly made the _win_ factor less pronounced. It's not surprising that this calculation is a bit expensive since it requires a number of normalization steps. Luckily, we can let this all happen after the request is sent to the client and it's highly cacheable. By default, we'll use a relatively small cache which should store a large number of operations, and to avoid introducing configuration surface area right now without much evidence as to the need, we'll simply log a message periodically if we're ejecting operations.
b135a17
to
eb3bc7b
Compare
- apollo-cache-control@0.7.0-alpha.0 - apollo-engine-reporting@1.2.0-alpha.0 - apollo-server-azure-functions@2.6.0-alpha.0 - apollo-server-cloud-functions@2.6.0-alpha.0 - apollo-server-cloudflare@2.6.0-alpha.0 - apollo-server-core@2.6.0-alpha.0 - apollo-server-express@2.6.0-alpha.0 - apollo-server-fastify@2.6.0-alpha.0 - apollo-server-hapi@2.6.0-alpha.0 - apollo-server-integration-testsuite@2.6.0-alpha.0 - apollo-server-koa@2.6.0-alpha.0 - apollo-server-lambda@2.6.0-alpha.0 - apollo-server-micro@2.6.0-alpha.0 - apollo-server-plugin-base@0.5.0-alpha.0 - apollo-server-plugin-response-cache@0.2.0-alpha.0 - apollo-server-testing@2.6.0-alpha.0 - apollo-server@2.6.0-alpha.0 - apollo-tracing@0.7.0-alpha.0 - graphql-extensions@0.7.0-alpha.0
@@ -226,7 +301,7 @@ export class EngineReportingAgent<TContext = any> { | |||
this.reportSize >= | |||
(this.options.maxUncompressedReportSize || 4 * 1024 * 1024) | |||
) { | |||
this.sendReportAndReportErrors(); | |||
await this.sendReportAndReportErrors(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The addition of this await
shouldn't change execution dynamics (and I haven't changed what sendReportAndReportErrors
does) since addTrace
itself is never awaited, but the nested un-await
ed Promise
seemed a bit too wily for my liking.
This has been published into an
|
}: { | ||
queryHash: string; | ||
operationName: string; | ||
documentAST?: DocumentNode; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this is always either a document AST or a string, should this be a single DocumentNode | string
argument?
// calculating it asynchronously. The `addTrace` method will | ||
// `await` the `signature` if it's a Promise, prior to putting it | ||
// on the stack of traces to deliver to the cloud. | ||
const cachedSignature = await this.signatureCache.get(queryHash); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the signature calculated per document or per operation? If the latter, we should probably also use operationName
as part of the key.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great cache-tch!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed via 5e7084a. Thank you!
lastSignatureCacheWarn = new Date(); | ||
console.warn( | ||
[ | ||
'This server is processing a high number of unique operations. ', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, this is a nice touch!
Per astute feedback from @martijnwalraven!
5e7084a
to
b55da84
Compare
Also, add a comment indicating the intent. The `logger` in this case will be a request-specific logger when available or the general "Engine" logger otherwise. Related: #2670
Also, add a comment indicating the intent. The `logger` in this case will be a request-specific logger when available or the general "Engine" logger otherwise. Related: #2670
Also, add a comment indicating the intent. The `logger` in this case will be a request-specific logger when available or the general "Engine" logger otherwise. Also adjust recently introduced test to accurately return a `Promise` in its mock. Related: #2670
This moves the Apollo Engine Reporting signature calculation (which the
Apollo Cloud uses to aggregate similar operations) to perform
asynchronously, after the response has been returned to the client, and also
caches the signature for use by additional operations which match the same
queryHash
(which is an exact SHA-256 hex digest of the operation bodyreceived from the client).
While the signature calculation is relatively quick on small operations, with
sustained load and more complex operations, this recurring calculation can
be more costly.
As a bit of validation to the success of this change, on a very basic
performance benchmark, using a schema with 1000
String
fields (i.e. atype Query
withField_1: String
throughField_1000: String
) and anincoming operation which selects from all 1000 of those fields (i.e
Field_1
), this showed quite an improvement:Of course, this is a relatively simple example and still includes actual
GraphQL execution, but the win factor is certainly there. Other tests with
more dynamic fields and a higher cardinality of unique operations also
showed improvement, though extremely high cardinality of complex operations
(which have more expensive execution characteristics) certainly made the
win factor less pronounced.
It's not surprising that this calculation is a bit expensive since it
requires a number of normalization steps. Luckily, we can let this all
happen after the request is sent to the client and it's highly cacheable.
By default, we'll use a relatively small cache which should store a large
number of operations, and to avoid introducing configuration surface area
right now without much evidence as to the need, we'll simply log a message
periodically if we're ejecting operations.