-
Notifications
You must be signed in to change notification settings - Fork 250
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Gateway OpenTelemetry improvements #2700
Gateway OpenTelemetry improvements #2700
Conversation
@unflxw: Thank you for submitting a pull request! Before we can merge it, you'll need to sign the Apollo Contributor License Agreement here: https://contribute.apollographql.com/ |
👷 Deploy request for apollo-federation-docs pending review.Visit the deploys page to approve it
|
🦋 Changeset detectedLatest commit: 263bb6d The changes in this PR will be included in the next version bump. This PR includes changesets to release 7 packages
Not sure what this means? Click here to learn what changesets are. Click here if you're a maintainer who wants to add another changeset to this PR |
This pull request is automatically built and testable in CodeSandbox. To see build info of the built libraries, click here or the icon next to each commit SHA. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a changeset by running npx changeset
? I'd consider this a minor. Please be descriptive in the changeset you write, it becomes the changelog entry. It should tell users what change they'll observe, why it might be useful, etc.
Looks pretty good, thanks for the addition! Just a couple comments.
@unflxw FYI I've changed the base branch to |
Thank you for reviewing this @trevor-scheer 🙌 will address the feedback (and rebase on I also tried adding the operation type ( Any thoughts on how it could be obtained? This would be nice to have, as the combined operation type and name make for a good human-readable identifier of the query, with low cardinality, for the purpose of grouping traces. |
When an error is received or thrown, in addition to setting the span status to error, report the error as an OpenTelemetry exception event. If more than one error is received in a GraphQL error array, report only the first one.
On the request OpenTelemetry span, add the GraphQL document as an OpenTelemetry span attribute, using the name for this property recommended by the OpenTelemetry Semantic Conventions.
When an array of errors is returned and the current span's status is set to error, record all errors in the array as exception events in the span, not just the first one.
Implement an intermediate `SpanAttributes` type that documents the specific attributes that are expected in an OpenTelemetry span. Mark the existing `operationName` attribute as deprecated in favor of `graphql.operation.name`, which follows OpenTelemetry's GraphQL Semantic Conventions. Report the attribute with both names.
Obtain the GraphQL operation type from the request context and set it as a span attribute.
f92f200
to
4d2a157
Compare
Done as described @trevor-scheer, please re-review 👍 I also figured out how to add the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@unflxw changes look good!
@dariuszkuc raised a concern to me that capturing the documents could really blow up the size of these spans for some. I think it's best if we make the new behavior configurable and opt-in. I figure we might as well do the same for the new exceptions capturing while we're in there.
Thoughts? Would you mind implementing some configurability into this? I'm open to the naming here, wouldn't mind @dariuszkuc's input as well.
new ApolloGateway({
// ...
telemetry: {
includeDocument: true, // default false
recordExceptions: true, // default false
},
});
@trevor-scheer @dariuszkuc I'm not opposed to adding a configuration option for including the document, I will look into adding it. That said, I think both should default to true. Maybe That said, I think it's worth pointing out that the |
@unflxw In principle I don't disagree with what you're saying, and that might be the ideal future state of this configuration. My primary concern is making that change in a minor version. If this change in default behavior drove up my operating costs in a non-negligible way I would be pretty frustrated and expect that to be part of a major bump that's explicitly called out. I have no idea what to expect as far as span size and cost goes, so I'd rather make no assumptions there. With what I've proposed, it's visible in the changelog and a new feature that users can opt in to. |
Yeah, that makes sense to me. I'll implement it in that way. |
Add a `telemetry` section to the Gateway configuration that allows the user to determine whether the GraphQL document should be sent in the request span (`includeDocument`) and whether exception events should be recorded (`recordException`, which also accepts a maximum number of exceptions to record)
@trevor-scheer @dariuszkuc Take a look! I went with the naming you suggested, feel free to change it if you want. I think that a default value of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for incorporating those changes @unflxw, this looks good to me. I'm looking for @dariuszkuc to take a final pass on this next week just to make sure this aligns with what he and I spoke about.
Sounds good! Ping me if there's anything I can help with. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
While opt-in flag is currently the best we can do, ideally open telemetry spec should define a mechanism that would allow you to dynamically opt-in to this additional information on a per trace basis (similar to zipkin B3 debug flag). As Trevor mentioned, those documents could potentially be rather large (as in hundreds of KB big). Sadly TraceFlags
currently only define single sampled
flag so there is no built-in mechanism yet :(.
Thanks for the contribution @unflxw! |
This PR was opened by the [Changesets release](https://github.com/changesets/action) GitHub action. When you're ready to do a release, you can merge this and the packages will be published to npm automatically. If you're not ready to do a release yet, that's fine, whenever you add more changesets to next, this PR will be updated. # Releases ## @apollo/composition@2.6.0 ### Minor Changes - Update `license` field in `package.json` to use `Elastic-2.0` SPDX identifier ([apollographql#2741](apollographql#2741)) - Introduce the new `@policy` scope for composition ([apollographql#2818](apollographql#2818)) > Note that this directive will only be _fully_ supported by the Apollo Router as a GraphOS Enterprise feature at runtime. Also note that _composition_ of valid `@policy` directive applications will succeed, but the resulting supergraph will not be _executable_ by the Gateway or an Apollo Router which doesn't have the GraphOS Enterprise entitlement. Users may now compose `@policy` applications from their subgraphs into a supergraph. The directive is defined as follows: ```graphql scalar federation__Policy directive @Policy(policies: [[federation__Policy!]!]!) on | FIELD_DEFINITION | OBJECT | INTERFACE | SCALAR | ENUM ``` The `Policy` scalar is effectively a `String`, similar to the `FieldSet` type. In order to compose your `@policy` usages, you must update your subgraph's federation spec version to v2.6 and add the `@policy` import to your existing imports like so: ```graphql @link(url: "https://specs.apollo.dev/federation/v2.6", import: [..., "@Policy"]) ``` ### Patch Changes - Updated dependencies \[[`b18841be`](apollographql@b18841b), [`e325b499`](apollographql@e325b49)]: - @apollo/query-graphs@2.6.0 - @apollo/federation-internals@2.6.0 ## @apollo/gateway@2.6.0 ### Minor Changes - Add more information to OpenTelemetry spans. ([apollographql#2700](apollographql#2700)) Rename `operationName` to `graphql.operation.name` and add a `graphql.operation.type` attribute, in conformance with the OpenTelemetry Semantic Conventions for GraphQL. The `operationName` attribute is now deprecated, but it is still emitted alongside `graphql.operation.name`. Add a `graphql.document` span attribute to the `gateway.request` span, containing the entire GraphQL source sent in the request. This feature is disable by default. When one or more GraphQL or internal errors occur, report them in the OpenTelemetry span in which they took place, as an exception event. This feature is disabled by default. To enable the `graphql.document` span attribute and the exception event reporting, add the following entries to your `ApolloGateway` instance configuration: ```ts const gateway = new ApolloGateway({ // ... telemetry: { // Set to `true` to include the `graphql.document` attribute includeDocument: true, // Set to `true` to report all exception events, or set to a number // to report at most that number of exception events per span reportExceptions: true, // or: reportExceptions: 1 }, }); ``` - Update `license` field in `package.json` to use `Elastic-2.0` SPDX identifier ([apollographql#2741](apollographql#2741)) - Introduce the new `@policy` scope for composition ([apollographql#2818](apollographql#2818)) > Note that this directive will only be _fully_ supported by the Apollo Router as a GraphOS Enterprise feature at runtime. Also note that _composition_ of valid `@policy` directive applications will succeed, but the resulting supergraph will not be _executable_ by the Gateway or an Apollo Router which doesn't have the GraphOS Enterprise entitlement. Users may now compose `@policy` applications from their subgraphs into a supergraph. The directive is defined as follows: ```graphql scalar federation__Policy directive @Policy(policies: [[federation__Policy!]!]!) on | FIELD_DEFINITION | OBJECT | INTERFACE | SCALAR | ENUM ``` The `Policy` scalar is effectively a `String`, similar to the `FieldSet` type. In order to compose your `@policy` usages, you must update your subgraph's federation spec version to v2.6 and add the `@policy` import to your existing imports like so: ```graphql @link(url: "https://specs.apollo.dev/federation/v2.6", import: [..., "@Policy"]) ``` - Add graphql.operation.name attribute on gateway.plan span ([apollographql#2807](apollographql#2807)) ### Patch Changes - Updated dependencies \[[`b18841be`](apollographql@b18841b), [`e325b499`](apollographql@e325b49)]: - @apollo/query-planner@2.6.0 - @apollo/composition@2.6.0 - @apollo/federation-internals@2.6.0 ## @apollo/federation-internals@2.6.0 ### Minor Changes - Update `license` field in `package.json` to use `Elastic-2.0` SPDX identifier ([apollographql#2741](apollographql#2741)) - Introduce the new `@policy` scope for composition ([apollographql#2818](apollographql#2818)) > Note that this directive will only be _fully_ supported by the Apollo Router as a GraphOS Enterprise feature at runtime. Also note that _composition_ of valid `@policy` directive applications will succeed, but the resulting supergraph will not be _executable_ by the Gateway or an Apollo Router which doesn't have the GraphOS Enterprise entitlement. Users may now compose `@policy` applications from their subgraphs into a supergraph. The directive is defined as follows: ```graphql scalar federation__Policy directive @Policy(policies: [[federation__Policy!]!]!) on | FIELD_DEFINITION | OBJECT | INTERFACE | SCALAR | ENUM ``` The `Policy` scalar is effectively a `String`, similar to the `FieldSet` type. In order to compose your `@policy` usages, you must update your subgraph's federation spec version to v2.6 and add the `@policy` import to your existing imports like so: ```graphql @link(url: "https://specs.apollo.dev/federation/v2.6", import: [..., "@Policy"]) ``` ## @apollo/query-graphs@2.6.0 ### Minor Changes - Update `license` field in `package.json` to use `Elastic-2.0` SPDX identifier ([apollographql#2741](apollographql#2741)) - Introduce the new `@policy` scope for composition ([apollographql#2818](apollographql#2818)) > Note that this directive will only be _fully_ supported by the Apollo Router as a GraphOS Enterprise feature at runtime. Also note that _composition_ of valid `@policy` directive applications will succeed, but the resulting supergraph will not be _executable_ by the Gateway or an Apollo Router which doesn't have the GraphOS Enterprise entitlement. Users may now compose `@policy` applications from their subgraphs into a supergraph. The directive is defined as follows: ```graphql scalar federation__Policy directive @Policy(policies: [[federation__Policy!]!]!) on | FIELD_DEFINITION | OBJECT | INTERFACE | SCALAR | ENUM ``` The `Policy` scalar is effectively a `String`, similar to the `FieldSet` type. In order to compose your `@policy` usages, you must update your subgraph's federation spec version to v2.6 and add the `@policy` import to your existing imports like so: ```graphql @link(url: "https://specs.apollo.dev/federation/v2.6", import: [..., "@Policy"]) ``` ### Patch Changes - Updated dependencies \[[`b18841be`](apollographql@b18841b), [`e325b499`](apollographql@e325b49)]: - @apollo/federation-internals@2.6.0 ## @apollo/query-planner@2.6.0 ### Minor Changes - Update `license` field in `package.json` to use `Elastic-2.0` SPDX identifier ([apollographql#2741](apollographql#2741)) - Introduce the new `@policy` scope for composition ([apollographql#2818](apollographql#2818)) > Note that this directive will only be _fully_ supported by the Apollo Router as a GraphOS Enterprise feature at runtime. Also note that _composition_ of valid `@policy` directive applications will succeed, but the resulting supergraph will not be _executable_ by the Gateway or an Apollo Router which doesn't have the GraphOS Enterprise entitlement. Users may now compose `@policy` applications from their subgraphs into a supergraph. The directive is defined as follows: ```graphql scalar federation__Policy directive @Policy(policies: [[federation__Policy!]!]!) on | FIELD_DEFINITION | OBJECT | INTERFACE | SCALAR | ENUM ``` The `Policy` scalar is effectively a `String`, similar to the `FieldSet` type. In order to compose your `@policy` usages, you must update your subgraph's federation spec version to v2.6 and add the `@policy` import to your existing imports like so: ```graphql @link(url: "https://specs.apollo.dev/federation/v2.6", import: [..., "@Policy"]) ``` ### Patch Changes - Updated dependencies \[[`b18841be`](apollographql@b18841b), [`e325b499`](apollographql@e325b49)]: - @apollo/query-graphs@2.6.0 - @apollo/federation-internals@2.6.0 ## @apollo/subgraph@2.6.0 ### Patch Changes - Updated dependencies \[[`b18841be`](apollographql@b18841b), [`e325b499`](apollographql@e325b49)]: - @apollo/federation-internals@2.6.0 ## apollo-federation-integration-testsuite@2.6.0 ### Minor Changes - Update `license` field in `package.json` to use `Elastic-2.0` SPDX identifier ([apollographql#2741](apollographql#2741)) - Introduce the new `@policy` scope for composition ([apollographql#2818](apollographql#2818)) > Note that this directive will only be _fully_ supported by the Apollo Router as a GraphOS Enterprise feature at runtime. Also note that _composition_ of valid `@policy` directive applications will succeed, but the resulting supergraph will not be _executable_ by the Gateway or an Apollo Router which doesn't have the GraphOS Enterprise entitlement. Users may now compose `@policy` applications from their subgraphs into a supergraph. The directive is defined as follows: ```graphql scalar federation__Policy directive @Policy(policies: [[federation__Policy!]!]!) on | FIELD_DEFINITION | OBJECT | INTERFACE | SCALAR | ENUM ``` The `Policy` scalar is effectively a `String`, similar to the `FieldSet` type. In order to compose your `@policy` usages, you must update your subgraph's federation spec version to v2.6 and add the `@policy` import to your existing imports like so: ```graphql @link(url: "https://specs.apollo.dev/federation/v2.6", import: [..., "@Policy"]) ``` Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
This PR adds the GraphQL document as a span attribute on the gateway's request span, and makes it so that errors are recorded as exception events in the spans in which they occur.
Commits
Record errors as OTel exception events
When an error is received or thrown, in addition to setting the
span status to error, report the error as an OpenTelemetry
exception event.
If more than one error is received in a GraphQL error array, report
only the first one.
Report the GraphQL document as a span attribute
On the request OpenTelemetry span, add the GraphQL document as an
OpenTelemetry span attribute, using the name for this
property recommended by the OpenTelemetry Semantic Conventions.
Notes
operationName
attribute to the recommendedgraphql.operation.name
in the OpenTelemetry Semantic Conventions. Let me know if you want me to do that.