Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

apollo-engine-reporting: fix reporting errors from backends #3056

Merged
merged 1 commit into from
Jul 26, 2019

Conversation

zionts
Copy link
Contributor

@zionts zionts commented Jul 18, 2019

The extension stack executionDidEnd method gets called before didEncounterErrors
for GraphQL errors returned from execution (although confusingly the plugin
executionDidEnd method gets called after), which caused the assertion that
nothing gets added to the EngineReportingTreeBuilder after stopTiming to
fail. Fix by moving stopTiming to the last possible moment: format().

Actually test error reporting, including both kinds of rewriting.

Add a comment noting that backend parse and validation errors don't get
reported.

Fixes #3052.

@zionts zionts changed the title Specify whether error serialization timing should be captured AER: Specify whether error serialization timing should be captured Jul 18, 2019
@zionts zionts force-pushed the adam/19/7/error-timing-abstraction branch from 6a09868 to 3b6af6a Compare July 18, 2019 19:34
@zionts zionts changed the title AER: Specify whether error serialization timing should be captured AER: Federated Extension fix-ups Jul 18, 2019
@zionts zionts force-pushed the adam/19/7/error-timing-abstraction branch from 5e6f2f4 to c9945b7 Compare July 18, 2019 22:52
@glasser
Copy link
Member

glasser commented Jul 18, 2019

I sent more details to Adam on slack as I'm in transit but:

  • a much simpler fix is to move the stopTiming call to format(). The whole point of the state machine error checking is to ensure that data isn't lost because (eg) errors show up after the trace is rendered, so that's the most precise way of achieving that goal anyway

  • these throws represent "shouldn't happen" programming errors in our code, not an erroneous use by our users or bad data, so if anything they should be more catastrophic, not less. Eg maybe the code up above that catches them and doesn't crash the program should detect some specific error type that means "Apollo server programming error" and treat them more drastically. Console.error would have meant this bug would have been found later!

@glasser
Copy link
Member

glasser commented Jul 19, 2019

Can you get a test for this PR? Look at the reports a total duration that is longer than the duration of its resolvers for an example of a test of federated tracing (outside the context of the gateway, but still testing integration with apollo-server)

Copy link
Member

@glasser glasser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and add a test please

@@ -34,6 +34,14 @@ export class EngineFederatedTracingExtension<TContext = any>
if (this.enabled) {
this.treeBuilder.startTiming();
}

return () => {
// It's possible that execution never started!
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does something here need to check this.enabled?

@@ -100,13 +109,6 @@ export class EngineReportingTreeBuilder {
path: ReadonlyArray<string | number> | undefined,
error: Trace.Error,
) {
if (!this.startHrTime) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we still removing this?

@@ -15,6 +15,7 @@ export class EngineReportingTreeBuilder {
[rootResponsePath, this.rootNode],
]);
private rewriteError?: (err: GraphQLError) => GraphQLError | null;
private consolePrefix = '[apollo-engine-reporting]';
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how is this helpful in errors? isn't the stack trace good enough?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a fan of this as it's a clearer indication that something is seriously fucked and it's not part of the implementing application.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, but in that case maybe the message should literally say it's an internal bug in the library?

@glasser
Copy link
Member

glasser commented Jul 19, 2019

and pr desc needs an update

@glasser glasser force-pushed the adam/19/7/error-timing-abstraction branch from 080d65e to ee4ec98 Compare July 26, 2019 17:33
@glasser glasser changed the title AER: Federated Extension fix-ups apollo-engine-reporting: fix reporting errors from backends Jul 26, 2019
@glasser glasser force-pushed the adam/19/7/error-timing-abstraction branch from ee4ec98 to 2a31072 Compare July 26, 2019 17:35
@glasser
Copy link
Member

glasser commented Jul 26, 2019

I rewrote this.

@glasser glasser force-pushed the adam/19/7/error-timing-abstraction branch from 2a31072 to 3d3898b Compare July 26, 2019 18:41
Copy link
Contributor Author

@zionts zionts left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this looks good @glasser ! I think we should capture the weirdness of validation/parsing errors not being caught and maybe noodle on how we can capture them here or in Slack.

//
// Note: format() is only called after executing an operation, and
// specifically isn't called for parse or validation errors. Parse and validation
// errors in a federated backend will get reported to the end user as a downstream
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hm... can we track this as an issue for followup at least? Seems like it would be useful to know if the query planner is sending sub-operations that fail validation :)

@@ -6,6 +6,10 @@ import {
} from 'graphql';
import { Trace, google } from 'apollo-engine-reporting-protobuf';

function internalError(message: string) {
return new Error(`[internal apollo-server error] ${message}`);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I do think apollo-engine-reporting is probably more appropriate, or internal apollo metrics reporting if we want to avoid the term engine.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean that's in the stack trace?

The extension stack executionDidEnd method gets called before didEncounterErrors
for GraphQL errors returned from execution (although confusingly the plugin
executionDidEnd method gets called after), which caused the assertion that
nothing gets added to the EngineReportingTreeBuilder after stopTiming to
fail. Fix by moving stopTiming to the last possible moment: format().

Actually test error reporting, including both kinds of rewriting.

Add a comment noting that backend parse and validation errors don't get
reported.

Fixes #3052.
@glasser glasser force-pushed the adam/19/7/error-timing-abstraction branch from 3d3898b to 1fd58cc Compare July 26, 2019 22:49
@glasser glasser merged commit ecbbc6a into master Jul 26, 2019
@glasser glasser deleted the adam/19/7/error-timing-abstraction branch July 26, 2019 22:57
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Apr 22, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Apollo engine reporting blocks data and error propagation in '@apollo/gateway' managed federation deployment
3 participants