Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(telemetry): add metrics for query plan warmup and schema loading #3807

Merged
merged 10 commits into from Sep 15, 2023

Conversation

bnjjj
Copy link
Contributor

@bnjjj bnjjj commented Sep 12, 2023

It adds histogram metrics for apollo_router_query_planning_warmup_duration and apollo_router_schema_load_duration.

Example in Prometheus:

# HELP apollo_router_query_planning_warmup_duration apollo_router_query_planning_warmup_duration
# TYPE apollo_router_query_planning_warmup_duration histogram
apollo_router_query_planning_warmup_duration_bucket{service_name="apollo-router",otel_scope_name="apollo/router",otel_scope_version="",le="0.05"} 1
apollo_router_query_planning_warmup_duration_bucket{service_name="apollo-router",otel_scope_name="apollo/router",otel_scope_version="",le="0.1"} 1
apollo_router_query_planning_warmup_duration_bucket{service_name="apollo-router",otel_scope_name="apollo/router",otel_scope_version="",le="0.25"} 1
apollo_router_query_planning_warmup_duration_bucket{service_name="apollo-router",otel_scope_name="apollo/router",otel_scope_version="",le="0.5"} 1
apollo_router_query_planning_warmup_duration_bucket{service_name="apollo-router",otel_scope_name="apollo/router",otel_scope_version="",le="1"} 1
apollo_router_query_planning_warmup_duration_bucket{service_name="apollo-router",otel_scope_name="apollo/router",otel_scope_version="",le="2.5"} 1
apollo_router_query_planning_warmup_duration_bucket{service_name="apollo-router",otel_scope_name="apollo/router",otel_scope_version="",le="5"} 1
apollo_router_query_planning_warmup_duration_bucket{service_name="apollo-router",otel_scope_name="apollo/router",otel_scope_version="",le="10"} 1
apollo_router_query_planning_warmup_duration_bucket{service_name="apollo-router",otel_scope_name="apollo/router",otel_scope_version="",le="20"} 1
apollo_router_query_planning_warmup_duration_bucket{service_name="apollo-router",otel_scope_name="apollo/router",otel_scope_version="",le="100"} 1
apollo_router_query_planning_warmup_duration_bucket{service_name="apollo-router",otel_scope_name="apollo/router",otel_scope_version="",le="1000"} 1
apollo_router_query_planning_warmup_duration_bucket{service_name="apollo-router",otel_scope_name="apollo/router",otel_scope_version="",le="+Inf"} 1
apollo_router_query_planning_warmup_duration_sum{service_name="apollo-router",otel_scope_name="apollo/router",otel_scope_version=""} 0.022390619
apollo_router_query_planning_warmup_duration_count{service_name="apollo-router",otel_scope_name="apollo/router",otel_scope_version=""} 1
# HELP apollo_router_schema_load_duration apollo_router_schema_load_duration
# TYPE apollo_router_schema_load_duration histogram
apollo_router_schema_load_duration_bucket{service_name="apollo-router",otel_scope_name="apollo/router",otel_scope_version="",le="0.05"} 8
apollo_router_schema_load_duration_bucket{service_name="apollo-router",otel_scope_name="apollo/router",otel_scope_version="",le="0.1"} 8
apollo_router_schema_load_duration_bucket{service_name="apollo-router",otel_scope_name="apollo/router",otel_scope_version="",le="0.25"} 8
apollo_router_schema_load_duration_bucket{service_name="apollo-router",otel_scope_name="apollo/router",otel_scope_version="",le="0.5"} 8
apollo_router_schema_load_duration_bucket{service_name="apollo-router",otel_scope_name="apollo/router",otel_scope_version="",le="1"} 8
apollo_router_schema_load_duration_bucket{service_name="apollo-router",otel_scope_name="apollo/router",otel_scope_version="",le="2.5"} 8
apollo_router_schema_load_duration_bucket{service_name="apollo-router",otel_scope_name="apollo/router",otel_scope_version="",le="5"} 8
apollo_router_schema_load_duration_bucket{service_name="apollo-router",otel_scope_name="apollo/router",otel_scope_version="",le="10"} 8
apollo_router_schema_load_duration_bucket{service_name="apollo-router",otel_scope_name="apollo/router",otel_scope_version="",le="20"} 8
apollo_router_schema_load_duration_bucket{service_name="apollo-router",otel_scope_name="apollo/router",otel_scope_version="",le="100"} 8
apollo_router_schema_load_duration_bucket{service_name="apollo-router",otel_scope_name="apollo/router",otel_scope_version="",le="1000"} 8
apollo_router_schema_load_duration_bucket{service_name="apollo-router",otel_scope_name="apollo/router",otel_scope_version="",le="+Inf"} 8
apollo_router_schema_load_duration_sum{service_name="apollo-router",otel_scope_name="apollo/router",otel_scope_version=""} 0.023486205999999996
apollo_router_schema_load_duration_count{service_name="apollo-router",otel_scope_name="apollo/router",otel_scope_version=""} 8

Fixes #3767

Checklist

Complete the checklist (and note appropriate exceptions) before a final PR is raised.

  • Changes are compatible[^1]
  • Documentation[^2] completed
  • Performance impact assessed and acceptable
  • Tests added and passing[^3]
    • Unit Tests
    • Integration Tests
    • Manual Tests

Exceptions

Note any exceptions here

Notes

[^1]. It may be appropriate to bring upcoming changes to the attention of other (impacted) groups. Please endeavour to do this before seeking PR approval. The mechanism for doing this will vary considerably, so use your judgement as to how and when to do this.
[^2]. Configuration is an important part of many changes. Where applicable please try to document configuration examples.
[^3]. Tick whichever testing boxes are applicable. If you are adding Manual Tests:
- please document the manual testing (extensively) in the Exceptions.
- please raise a separate issue to automate the test and label it (or ask for it to be labeled) as manual test

Signed-off-by: Benjamin Coenen <5719034+bnjjj@users.noreply.github.com>
@github-actions

This comment has been minimized.

Signed-off-by: Benjamin Coenen <5719034+bnjjj@users.noreply.github.com>
@bnjjj bnjjj requested a review from a team as a code owner September 12, 2023 10:32
Signed-off-by: Benjamin Coenen <5719034+bnjjj@users.noreply.github.com>
@BrynCooke BrynCooke dismissed their stale review September 13, 2023 10:33

These are existing metrics

@BrynCooke BrynCooke self-requested a review September 13, 2023 10:43
Copy link
Contributor

@BrynCooke BrynCooke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Metrics need to be in otel style and not prom style.
These are new metrics and must be checked for consistency with other new metrics.

Signed-off-by: Benjamin Coenen <5719034+bnjjj@users.noreply.github.com>
Signed-off-by: Benjamin Coenen <5719034+bnjjj@users.noreply.github.com>
apollo-router/src/query_planner/caching_query_planner.rs Outdated Show resolved Hide resolved
apollo-router/src/spec/schema.rs Outdated Show resolved Hide resolved
apollo-router/src/query_planner/caching_query_planner.rs Outdated Show resolved Hide resolved
docs/source/configuration/metrics.mdx Outdated Show resolved Hide resolved
Copy link
Contributor

@BrynCooke BrynCooke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new metrics namespaces will also need to be added to the apollo metrics filter.

Signed-off-by: Benjamin Coenen <5719034+bnjjj@users.noreply.github.com>
Signed-off-by: Benjamin Coenen <5719034+bnjjj@users.noreply.github.com>
Signed-off-by: Benjamin Coenen <5719034+bnjjj@users.noreply.github.com>
@bnjjj bnjjj enabled auto-merge (squash) September 15, 2023 13:46
@bnjjj bnjjj merged commit f831f68 into dev Sep 15, 2023
11 of 12 checks passed
@bnjjj bnjjj deleted the bnjjj/fix_3767 branch September 15, 2023 13:59
@lrlna lrlna mentioned this pull request Sep 27, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add metrics tracking schema loading and query planner warm up time
3 participants