apollo-engine-reporting: fix reporting errors from backends #3056

zionts · 2019-07-18T19:27:16Z · edited by glasser

The extension stack executionDidEnd method gets called before didEncounterErrors
for GraphQL errors returned from execution (although confusingly the plugin
executionDidEnd method gets called after), which caused the assertion that
nothing gets added to the EngineReportingTreeBuilder after stopTiming to
fail. Fix by moving stopTiming to the last possible moment: format().
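A minimal sketch of the ordering problem described above. `SketchTreeBuilder` and `runRequest` are hypothetical, simplified stand-ins, not the real `EngineReportingTreeBuilder` API. The builder rejects errors once timing has stopped. If `stopTiming` runs in `executionDidEnd`, the later `didEncounterErrors` trips that check; deferring it to `format()` lets the late error be recorded.

```typescript
// Hypothetical, simplified stand-in for EngineReportingTreeBuilder.
type Phase = 'idle' | 'timing' | 'stopped';

class SketchTreeBuilder {
  private phase: Phase = 'idle';
  public readonly errors: string[] = [];

  startTiming(): void {
    if (this.phase !== 'idle') {
      throw new Error('startTiming called twice');
    }
    this.phase = 'timing';
  }

  stopTiming(): void {
    if (this.phase !== 'timing') {
      throw new Error('stopTiming called before startTiming or twice');
    }
    this.phase = 'stopped';
  }

  addError(message: string): void {
    // State-machine check: errors are only accepted between start and stop,
    // because anything added after the trace is finalized would be lost.
    if (this.phase !== 'timing') {
      throw new Error(`addError called while ${this.phase}`);
    }
    this.errors.push(message);
  }
}

// Simulates the extension-stack callback order from the description:
// executionDidStart, executionDidEnd, didEncounterErrors, then format().
function runRequest(stopInFormat: boolean): SketchTreeBuilder {
  const builder = new SketchTreeBuilder();
  builder.startTiming(); // executionDidStart
  if (!stopInFormat) builder.stopTiming(); // old placement: executionDidEnd
  builder.addError('boom'); // didEncounterErrors
  if (stopInFormat) builder.stopTiming(); // new placement: format()
  return builder;
}
```

With `stopInFormat` set to `false`, the simulated request throws `addError called while stopped`. With `true`, the error is recorded and timing still stops exactly once.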

Actually test error reporting, including both kinds of rewriting.
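A sketch of what testing "both kinds of rewriting" could cover. It assumes the two kinds are a `rewriteError` hook (typed like the `rewriteError` field in the treeBuilder.ts diff) that either returns a modified error or returns `null` to drop it. `SketchError` and `reportErrors` are hypothetical; `SketchError` stands in for `GraphQLError` so the snippet runs without the `graphql` package.

```typescript
// Hypothetical stand-in for GraphQLError (only the field used here).
interface SketchError {
  message: string;
}

type RewriteError = (err: SketchError) => SketchError | null;

// Applies the hook before recording each error and returns the messages
// that would end up in the trace.
function reportErrors(errors: SketchError[], rewriteError?: RewriteError): string[] {
  const reported: string[] = [];
  for (const err of errors) {
    const rewritten = rewriteError ? rewriteError(err) : err;
    if (rewritten === null) {
      continue; // null means "don't report this error at all"
    }
    reported.push(rewritten.message);
  }
  return reported;
}

// Kind 1: rewrite the error, e.g. to mask sensitive details.
const masking: RewriteError = (err) => ({
  message: err.message.replace(/secret/g, '<masked>'),
});

// Kind 2: drop certain errors entirely.
const filtering: RewriteError = (err) =>
  err.message.startsWith('ignore') ? null : err;
```

Passing `masking` turns "secret leaked" into "<masked> leaked". Passing `filtering` drops any error whose message starts with "ignore".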

Add a comment noting that backend parse and validation errors don't get
reported.

Fixes #3052.

glasser · 2019-07-18T23:27:19Z

I sent more details to Adam on Slack as I'm in transit, but:

A much simpler fix is to move the stopTiming call to format(). The whole point of the state-machine error checking is to ensure that data isn't lost because (e.g.) errors show up after the trace is rendered, so that's the most precise way of achieving that goal anyway.

These throws represent "shouldn't happen" programming errors in our code, not erroneous use by our users or bad data, so if anything they should be more catastrophic, not less. E.g., maybe the code up above that catches them and doesn't crash the program should detect some specific error type that means "Apollo Server programming error" and treat it more drastically. Using console.error instead would have meant this bug was found later!
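One hedged reading of the suggestion above: give internal invariant violations their own error class, so the code that otherwise swallows reporting failures can recognize them and rethrow instead of logging. `ApolloInternalError` and `runReportingHook` are hypothetical names, not part of apollo-server; the message prefix is borrowed from the `internalError` helper in the treeBuilder.ts diff in this thread.

```typescript
// Hypothetical error type meaning "this is a bug in the library itself".
class ApolloInternalError extends Error {
  constructor(message: string) {
    super(`[internal apollo-server error] ${message}`);
    this.name = 'ApolloInternalError';
    // Keep instanceof working when compiling to ES5.
    Object.setPrototypeOf(this, ApolloInternalError.prototype);
  }
}

// Wraps a reporting callback. Ordinary failures are logged and swallowed so
// metrics problems don't break user requests, but internal errors propagate.
function runReportingHook(hook: () => void, log: (msg: string) => void): void {
  try {
    hook();
  } catch (err) {
    if (err instanceof ApolloInternalError) {
      throw err; // programming error in our code: fail loudly
    }
    log(`reporting hook failed: ${String(err)}`);
  }
}
```

A plain `Error('bad data')` is logged as "reporting hook failed: Error: bad data", while an `ApolloInternalError` escapes the wrapper untouched.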

glasser · 2019-07-19T18:05:04Z

Can you get a test for this PR? Look at the "reports a total duration that is longer than the duration of its resolvers" test for an example of a test of federated tracing (outside the context of the gateway, but still testing integration with apollo-server).

glasser

and add a test please

glasser · 2019-07-18T22:55:34Z

packages/apollo-engine-reporting/src/federatedExtension.ts

@@ -34,6 +34,14 @@ export class EngineFederatedTracingExtension<TContext = any>
    if (this.enabled) {
      this.treeBuilder.startTiming();
    }
+
+    return () => {
+      // It's possible that execution never started!


Does something here need to check this.enabled?

glasser · 2019-07-19T18:34:26Z

packages/apollo-engine-reporting/src/treeBuilder.ts

@@ -100,13 +109,6 @@ export class EngineReportingTreeBuilder {
    path: ReadonlyArray<string | number> | undefined,
    error: Trace.Error,
  ) {
-    if (!this.startHrTime) {


Why are we still removing this?

glasser · 2019-07-19T18:34:46Z

packages/apollo-engine-reporting/src/treeBuilder.ts

@@ -15,6 +15,7 @@ export class EngineReportingTreeBuilder {
    [rootResponsePath, this.rootNode],
  ]);
  private rewriteError?: (err: GraphQLError) => GraphQLError | null;
+  private consolePrefix = '[apollo-engine-reporting]';


How is this helpful in errors? Isn't the stack trace good enough?

jacob-ebey

I'm a fan of this, as it's a clearer indication that something is seriously fucked and that it's not part of the implementing application.

glasser

OK, but in that case maybe the message should literally say it's an internal bug in the library?

glasser · 2019-07-19T18:35:27Z

And the PR description needs an update.

glasser · 2019-07-26T18:15:24Z

I rewrote this.

zionts

I think this looks good, @glasser! I think we should capture the weirdness of validation/parsing errors not being caught, and maybe noodle on how we can capture them, here or in Slack.

zionts · 2019-07-26T20:46:53Z

packages/apollo-engine-reporting/src/federatedExtension.ts

+  //
+  // Note: format() is only called after executing an operation, and
+  // specifically isn't called for parse or validation errors. Parse and validation
+  // errors in a federated backend will get reported to the end user as a downstream


Hm... can we at least track this as an issue for follow-up? Seems like it would be useful to know if the query planner is sending sub-operations that fail validation :)

zionts · 2019-07-26T20:47:31Z

packages/apollo-engine-reporting/src/treeBuilder.ts

@@ -6,6 +6,10 @@ import {
 } from 'graphql';
 import { Trace, google } from 'apollo-engine-reporting-protobuf';

+function internalError(message: string) {
+  return new Error(`[internal apollo-server error] ${message}`);


Nit: I do think "apollo-engine-reporting" is probably more appropriate, or "internal Apollo metrics reporting" if we want to avoid the term "engine".

glasser

I mean, that's in the stack trace?


zionts changed the title ~~Specify whether error serialization timing should be captured~~ AER: Specify whether error serialization timing should be captured Jul 18, 2019

zionts force-pushed the adam/19/7/error-timing-abstraction branch from 6a09868 to 3b6af6a Compare July 18, 2019 19:34

zionts changed the title ~~AER: Specify whether error serialization timing should be captured~~ AER: Federated Extension fix-ups Jul 18, 2019

zionts requested review from abernix and trevor-scheer July 18, 2019 19:58

zionts force-pushed the adam/19/7/error-timing-abstraction branch from 5e6f2f4 to c9945b7 Compare July 18, 2019 22:52

glasser reviewed Jul 19, 2019

glasser force-pushed the adam/19/7/error-timing-abstraction branch from 080d65e to ee4ec98 Compare July 26, 2019 17:33

glasser changed the title ~~AER: Federated Extension fix-ups~~ apollo-engine-reporting: fix reporting errors from backends Jul 26, 2019

glasser force-pushed the adam/19/7/error-timing-abstraction branch from ee4ec98 to 2a31072 Compare July 26, 2019 17:35

glasser force-pushed the adam/19/7/error-timing-abstraction branch from 2a31072 to 3d3898b Compare July 26, 2019 18:41

zionts commented Jul 26, 2019

glasser force-pushed the adam/19/7/error-timing-abstraction branch from 3d3898b to 1fd58cc Compare July 26, 2019 22:49

glasser merged commit ecbbc6a into master Jul 26, 2019

glasser deleted the adam/19/7/error-timing-abstraction branch July 26, 2019 22:57

glasser added a commit that referenced this pull request Jul 26, 2019

Revert accidental change to package-lock.json in #3056

c2db83a

github-actions bot locked as resolved and limited conversation to collaborators Apr 22, 2023