New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Provide Micrometer Metrics that capture aspects of GraphQL query latency. #52
Conversation
c95dce7
to
8787ac4
Compare
c4a93f7
to
af26e4f
Compare
graphql
endpoint latency, via Micrometer0a45cbc
to
95bb8a7
Compare
+1 for this support. Look forward to it :) |
|
||
private fun recordDataFetcherTime(timerSampler: Timer.Sample, tags: Tags, registry: MeterRegistry) { | ||
timerSampler.stop(registry, | ||
Timer.builder("gql.resolver.time").tags(tags).publishPercentileHistogram()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggest striking publishPercentileHistogram
as it it can be turned on later by an application developer if they want histograms for this timer in application.properties with management.metrics.distribution.percentiles-histogram.gql=true
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
had no idea, thanks for the pointer.
exception: Throwable?): Iterable<Tag> { | ||
|
||
return if (result.errors.isNotEmpty()) { | ||
InstrumentationTags.sanitizeErrorPathsForTags(result).map { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any particular reason to add the tags here vs in the instrumentation class as part of instrumentExecutionResult?
|
||
override fun instrumentExecutionResult(executionResult: ExecutionResult, parameters: InstrumentationExecutionParameters): CompletableFuture<ExecutionResult> { | ||
sanitizeErrorPathsForTags(executionResult).forEach { _ -> | ||
registry.counter("gql.error", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need this counter? Timers always ship a count as well as other distribution statistics like max, sum, percentiles, etc. So if I understand this correctly, this data already is retrievable by filtering gql.query{outcome='ERROR'}
and looking at the count statistic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please review the latest code. In the latest implementation a qgl.error is emitted per error in the Grql Request. Per the spec a request can have multiple errors.
c16d862
to
2894c3f
Compare
There is plenty of documentation to write for this feature. In the meantime the MicrometerServletSmokeTest.kt offers a hint on the metrics, and the tags, according to different scenarios. |
f484baf
to
5143ca3
Compare
Mining for decent. Most likely will merge tomorrow and have a soft release of this feature since I'm planning on adapting our internal implementation to it. |
h2. Context *Work in Progress* h3. Provide metrics that capture the `graphql` endpoint latency. This PR introduces the `graphql-dgs-spring-boot-micrometer` module. It has an _auto-configuration_ that enables the capture of GraphQL query metrics, via [micrometer]. The `GraphQLMetricsInstrumentation` will provide the following metrics: | Name | Kind | Description | | ------------ | ------ | ----------- | | gql.query | Timer | Captures the time it takes to execute the query. | | gql.error | Counter| Captures number of GraphQL errors, note that one GraphQL request can have multiple.| | gql.resolver | Timer | Tagged with the resolver's GraphQL field via `gql.field`, it captures the resolver's execution time.| | gql.dataLoader | Timer | Captures the elapse time for a data loader invocation for batch of queries. This is useful if you want to find data loaders that might be responsible for poor query performance. | All metrics are tagged with the tags provided by the `DefaultGraphQLTagsProvider`. Developers can override the `DefaultGraphQLTagsProvider` by defining their own implementation of the `GraphQLTagsProvider` interface. The `DefaultGraphQLTagsProvider` defines the following tags. | Tag | Description | | ------------ | ----------- | | outcome | `"SUCCESS"` if the [ExecutionResult] has data, else an `"ERROR"`. | | gql.path | (optional) [ValidationError]'s `queryPath` or the [GraphQLError] `path`.[^1]| | gql.errorCode | (optional) The [ErrorType]'s name if the error is a [ValidationError], else it will attempt to extract its value from the Error's extensions via the `errorType` key.[^1]| | gql.errorDetail | (optional) it will attempt to extract its value from the GgraphQLError's extensions via the `errorDetail` key. For details on the cardinality of the value please see refer to the [typederror interface](https://netflix.github.io/dgs/error-handling/#the-typederror-interface). | | gql.loaderName | (specific to gql.Loader) Name of the data loader. | | gql.loaderBatchSize | (specific to gql.Loader) The number of queries executed in the batch. | [^1]: Remember that one request can have multiple errors. h3. Examples * Included as part of this PR we have the [MicrometerServletSmokeTest.kt] test that depicts the metrics emitted on particular scenarios. ---- [micrometer]: https://micrometer.io/ [ErrorType]: https://github.com/graphql-java/graphql-java/blob/master/src/main/java/graphql/ErrorType.java [ExecutionResult]: https://github.com/graphql-java/graphql-java/blob/master/src/main/java/graphql/ExecutionResult.java [GraphQLError]:https://github.com/graphql-java/graphql-java/blob/master/src/main/java/graphql/GraphQLError.java [ValidationError]: https://github.com/graphql-java/graphql-java/blob/master/src/main/java/graphql/validation/ValidationError.java [MicrometerServletSmokeTest.kt]: https://github.com/Netflix/dgs-framework/blob/2894c3fc96388f0e21e4954740a558dda6626f3a/graphql-dgs-spring-boot-micrometer/src/test/kotlin/com/netflix/graphql/dgs/metrics/micrometer/MicrometerServletSmokeTest.kt ---- h2. DGS Resolver Metrics Initially we had the a `dgs.resolver.timer` and a `dgs.resolver.counter`. These were removed in favor of a `dgs.resolver` timer. As mentioned in Micrometer's [Counters](https://micrometer.io/docs/concepts#_counters) documentation... > Never count something you can time with a Timer or summarize with a DistributionSummary! > Both Timer and DistributionSummary always publish a count of events in addition to other measurements. Flags to enable/disable features, Smoke Test metrics for DataLoaders
Provide metrics that capture the
graphql
endpoint latency.This PR introduces the
graphql-dgs-spring-boot-micrometer
module. It has an auto-configuration that enables the capture of GraphQL query metrics, via micrometer. TheGraphQLMetricsInstrumentation
will provide the following metrics:gql.field
, it captures the resolver's execution time.All metrics are tagged with the tags provided by the
DefaultGraphQLTagsProvider
. Developers can override theDefaultGraphQLTagsProvider
by defining their own implementation of theGraphQLTagsProvider
interface.The
DefaultGraphQLTagsProvider
defines the following tags."SUCCESS"
if the ExecutionResult has data, else an"ERROR"
.queryPath
or the GraphQLErrorpath
.1errorType
key.1errorDetail
key. For details on the cardinality of the value please see refer to the typederror interface.Examples
Footnotes
Remember that one request can have multiple errors. ↩ ↩2