Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add a metric to track request timeouts #2419

Merged
merged 15 commits into from
Jan 25, 2023
Merged

add a metric to track request timeouts #2419

merged 15 commits into from
Jan 25, 2023

Conversation

Geal
Copy link
Contributor

@Geal Geal commented Jan 17, 2023

This adds a counter for timeouts. It will generate separate metrics for client request timeouts and subgraph requests timeouts.

Checklist

Complete the checklist (and note appropriate exceptions) before a final PR is raised.

  • Changes are compatible[^1]
  • Documentation[^2] completed
  • Performance impact assessed and acceptable
  • Tests added and passing[^3]
    • Unit Tests
    • Integration Tests
    • Manual Tests

@github-actions

This comment has been minimized.

apparently there's an issue with the downcast from error here. I think
we should move towards a timeout layer that returns a Ok(response)
instead of Err(Elapsed)
this is a bit complicated because the timeout future only knows about
very limited generic parameters, while we actually want to:
- generate different responses if we're in supergraph or subgraph
- return a response containing the request context
@Geal
Copy link
Contributor Author

Geal commented Jan 18, 2023

🤔 if timeout returns a HTTP response, then that breaks the retry layer, which will only retry on Err. And the 504 status code does not make much sense at the subgraph request level, it's not acting as a proxy there.
It makes more sense to use 504 for the supergraph level timeout though.
So I'll revert those changes to reuse BoxError instead, update the status code in the axum response, and add a test

@Geal Geal marked this pull request as ready for review January 18, 2023 15:04
@Geal Geal requested review from a team, SimonSapin and abernix and removed request for a team January 18, 2023 15:04
Comment on lines +2170 to +2171
// we do the entire supergraph rebuilding instead of using `from_supergraph_mock_callback_and_configuration`
// because we need the plugins to apply on the supergraph
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you say more about what this means? The amount of duplicated boilerplate code feels unfortunate

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

from_supergraph_mock_callback_and_configuration creates a mock supergraph without applying plugins on it, because it is meant as a foundation to test the router service. Here I want to test, at the router service response, the result of executing a plugin between the router service and supergraph service. That boiler plate could be extracted in another function if we see another case where we'd need it

@Geal Geal enabled auto-merge (squash) January 25, 2023 08:30
@Geal Geal merged commit 9e9fae8 into dev Jan 25, 2023
@Geal Geal deleted the geal/timeout-metrics branch January 25, 2023 09:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Return a HTTP 504 Gateway Timeout when the router timeout triggers Metrics for upstream request timeouts
3 participants