Provide Micrometer Metrics that capture aspects of GraphQL query latency. #52

berngp · 2021-02-09T20:03:50Z

Provide metrics that capture the `graphql` endpoint latency.

This PR introduces the graphql-dgs-spring-boot-micrometer module. It has an auto-configuration that enables the capture of GraphQL query metrics, via micrometer. The GraphQLMetricsInstrumentation will provide the following metrics:

Name	Kind	Description
gql.query	Timer	Captures the time it takes to execute the query.
gql.error	Counter	Captures number of GraphQL errors, note that one GraphQL request can have multiple.
gql.resolver	Timer	Tagged with the resolver's GraphQL field via `gql.field`, it captures the resolver's execution time.
gql.dataLoader	Timer	Captures the elapse time for a data loader invocation for batch of queries. This is useful if you want to find data loaders that might be responsible for poor query performance.

All metrics are tagged with the tags provided by the DefaultGraphQLTagsProvider. Developers can override the DefaultGraphQLTagsProvider by defining their own implementation of the GraphQLTagsProvider interface.

The DefaultGraphQLTagsProvider defines the following tags.

Tag	Description
outcome	`"SUCCESS"` if the ExecutionResult has data, else an `"ERROR"`.
gql.path	(optional) ValidationError's `queryPath` or the GraphQLError `path`.¹
gql.errorCode	(optional) The ErrorType's name if the error is a ValidationError, else it will attempt to extract its value from the Error's extensions via the `errorType` key.¹
gql.errorDetail	(optional) it will attempt to extract its value from the GgraphQLError's extensions via the `errorDetail` key. For details on the cardinality of the value please see refer to the typederror interface.
gql.loaderName	(specific to gql.Loader) Name of the data loader.
gql.loaderBatchSize	(specific to gql.Loader) The number of queries executed in the batch.

Examples

Included as part of this PR we have the MicrometerServletSmokeTest.kt test that depicts the metrics emitted on particular scenarios.

Remember that one request can have multiple errors. ↩ ↩²

jkschneider · 2021-02-25T01:14:05Z

+1 for this support. Look forward to it :)

jkschneider · 2021-02-25T15:32:36Z

.../src/main/kotlin/com/netflix/graphql/dgs/metrics/micrometer/GraphQLMetricsInstrumentation.kt

+
+    private fun recordDataFetcherTime(timerSampler: Timer.Sample, tags: Tags, registry: MeterRegistry) {
+        timerSampler.stop(registry,
+                Timer.builder("gql.resolver.time").tags(tags).publishPercentileHistogram())


Suggest striking publishPercentileHistogram as it it can be turned on later by an application developer if they want histograms for this timer in application.properties with management.metrics.distribution.percentiles-histogram.gql=true

had no idea, thanks for the pointer.

srinivasankavitha · 2021-02-11T00:48:29Z

...ter/src/main/kotlin/com/netflix/graphql/dgs/metrics/micrometer/DefaultGraphQLTagsProvider.kt

+                         exception: Throwable?): Iterable<Tag> {
+
+        return if (result.errors.isNotEmpty()) {
+            InstrumentationTags.sanitizeErrorPathsForTags(result).map {


Any particular reason to add the tags here vs in the instrumentation class as part of instrumentExecutionResult?

jkschneider · 2021-03-02T21:23:15Z

.../src/main/kotlin/com/netflix/graphql/dgs/metrics/micrometer/GraphQLMetricsInstrumentation.kt

+
+    override fun instrumentExecutionResult(executionResult: ExecutionResult, parameters: InstrumentationExecutionParameters): CompletableFuture<ExecutionResult> {
+        sanitizeErrorPathsForTags(executionResult).forEach { _ ->
+            registry.counter("gql.error",


Do we need this counter? Timers always ship a count as well as other distribution statistics like max, sum, percentiles, etc. So if I understand this correctly, this data already is retrievable by filtering gql.query{outcome='ERROR'} and looking at the count statistic.

Could you please review the latest code. In the latest implementation a qgl.error is emitted per error in the Grql Request. Per the spec a request can have multiple errors.

berngp · 2021-03-03T02:45:25Z

There is plenty of documentation to write for this feature. In the meantime the MicrometerServletSmokeTest.kt offers a hint on the metrics, and the tags, according to different scenarios.

berngp · 2021-03-04T02:56:59Z

Mining for decent. Most likely will merge tomorrow and have a soft release of this feature since I'm planning on adapting our internal implementation to it.

h2. Context *Work in Progress* h3. Provide metrics that capture the `graphql` endpoint latency. This PR introduces the `graphql-dgs-spring-boot-micrometer` module. It has an _auto-configuration_ that enables the capture of GraphQL query metrics, via [micrometer]. The `GraphQLMetricsInstrumentation` will provide the following metrics: | Name | Kind | Description | | ------------ | ------ | ----------- | | gql.query | Timer | Captures the time it takes to execute the query. | | gql.error | Counter| Captures number of GraphQL errors, note that one GraphQL request can have multiple.| | gql.resolver | Timer | Tagged with the resolver's GraphQL field via `gql.field`, it captures the resolver's execution time.| | gql.dataLoader | Timer | Captures the elapse time for a data loader invocation for batch of queries. This is useful if you want to find data loaders that might be responsible for poor query performance. | All metrics are tagged with the tags provided by the `DefaultGraphQLTagsProvider`. Developers can override the `DefaultGraphQLTagsProvider` by defining their own implementation of the `GraphQLTagsProvider` interface. The `DefaultGraphQLTagsProvider` defines the following tags. | Tag | Description | | ------------ | ----------- | | outcome | `"SUCCESS"` if the [ExecutionResult] has data, else an `"ERROR"`. | | gql.path | (optional) [ValidationError]'s `queryPath` or the [GraphQLError] `path`.[^1]| | gql.errorCode | (optional) The [ErrorType]'s name if the error is a [ValidationError], else it will attempt to extract its value from the Error's extensions via the `errorType` key.[^1]| | gql.errorDetail | (optional) it will attempt to extract its value from the GgraphQLError's extensions via the `errorDetail` key. For details on the cardinality of the value please see refer to the [typederror interface](https://netflix.github.io/dgs/error-handling/#the-typederror-interface). | | gql.loaderName | (specific to gql.Loader) Name of the data loader. | | gql.loaderBatchSize | (specific to gql.Loader) The number of queries executed in the batch. | [^1]: Remember that one request can have multiple errors. h3. Examples * Included as part of this PR we have the [MicrometerServletSmokeTest.kt] test that depicts the metrics emitted on particular scenarios. ---- [micrometer]: https://micrometer.io/ [ErrorType]: https://github.com/graphql-java/graphql-java/blob/master/src/main/java/graphql/ErrorType.java [ExecutionResult]: https://github.com/graphql-java/graphql-java/blob/master/src/main/java/graphql/ExecutionResult.java [GraphQLError]:https://github.com/graphql-java/graphql-java/blob/master/src/main/java/graphql/GraphQLError.java [ValidationError]: https://github.com/graphql-java/graphql-java/blob/master/src/main/java/graphql/validation/ValidationError.java [MicrometerServletSmokeTest.kt]: https://github.com/Netflix/dgs-framework/blob/2894c3fc96388f0e21e4954740a558dda6626f3a/graphql-dgs-spring-boot-micrometer/src/test/kotlin/com/netflix/graphql/dgs/metrics/micrometer/MicrometerServletSmokeTest.kt ---- h2. DGS Resolver Metrics Initially we had the a `dgs.resolver.timer` and a `dgs.resolver.counter`. These were removed in favor of a `dgs.resolver` timer. As mentioned in Micrometer's [Counters](https://micrometer.io/docs/concepts#_counters) documentation... > Never count something you can time with a Timer or summarize with a DistributionSummary! > Both Timer and DistributionSummary always publish a count of events in addition to other measurements. Flags to enable/disable features, Smoke Test metrics for DataLoaders

berngp force-pushed the feature/metrics branch 2 times, most recently from c95dce7 to 8787ac4 Compare February 9, 2021 20:21

berngp added the enhancement New feature or request label Feb 9, 2021

berngp self-assigned this Feb 9, 2021

berngp force-pushed the feature/metrics branch 3 times, most recently from c4a93f7 to af26e4f Compare February 10, 2021 19:52

berngp changed the title ~~Provide metrics that capture the graphql endpoint latency, via Micrometer~~ Provide Micrometer Metrics that capture aspects of GraphQL query latency. Feb 10, 2021

srinivasankavitha requested review from srinivasankavitha, paulbakker and rpalcolea February 10, 2021 21:10

berngp force-pushed the feature/metrics branch 2 times, most recently from 0a45cbc to 95bb8a7 Compare February 11, 2021 05:33

jkschneider reviewed Feb 25, 2021

View reviewed changes

srinivasankavitha approved these changes Feb 26, 2021

View reviewed changes

jkschneider reviewed Mar 2, 2021

View reviewed changes

berngp force-pushed the feature/metrics branch 2 times, most recently from c16d862 to 2894c3f Compare March 3, 2021 02:41

berngp force-pushed the feature/metrics branch from 2894c3f to b5f4ff4 Compare March 3, 2021 22:10

berngp requested a review from srinivasankavitha March 4, 2021 02:31

berngp force-pushed the feature/metrics branch 2 times, most recently from f484baf to 5143ca3 Compare March 4, 2021 02:55

berngp force-pushed the feature/metrics branch from 5143ca3 to a8e5d6c Compare March 4, 2021 02:58

berngp removed the request for review from rpalcolea March 4, 2021 02:59

berngp merged commit 7aae070 into master Mar 4, 2021

berngp deleted the feature/metrics branch April 9, 2021 00:34

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Provide Micrometer Metrics that capture aspects of GraphQL query latency. #52

Provide Micrometer Metrics that capture aspects of GraphQL query latency. #52

berngp commented Feb 9, 2021 •

edited

jkschneider commented Feb 25, 2021

jkschneider Feb 25, 2021

berngp Feb 26, 2021

srinivasankavitha Feb 11, 2021

jkschneider Mar 2, 2021

berngp Mar 3, 2021

berngp commented Mar 3, 2021

berngp commented Mar 4, 2021

Provide Micrometer Metrics that capture aspects of GraphQL query latency. #52

Provide Micrometer Metrics that capture aspects of GraphQL query latency. #52

Conversation

berngp commented Feb 9, 2021 • edited

Provide metrics that capture the graphql endpoint latency.

Examples

Footnotes

jkschneider commented Feb 25, 2021

jkschneider Feb 25, 2021

Choose a reason for hiding this comment

berngp Feb 26, 2021

Choose a reason for hiding this comment

srinivasankavitha Feb 11, 2021

Choose a reason for hiding this comment

jkschneider Mar 2, 2021

Choose a reason for hiding this comment

berngp Mar 3, 2021

Choose a reason for hiding this comment

berngp commented Mar 3, 2021

berngp commented Mar 4, 2021

berngp commented Feb 9, 2021 •

edited

Provide metrics that capture the `graphql` endpoint latency.