prep release: v1.49.0 #5473

lrlna · 2024-06-18T12:07:34Z

Note

When approved, this PR will merge into the 1.49.0 branch which will — upon being approved itself — merge into main.

Things to review in this PR:

Changelog correctness (There is a preview below, but it is not necessarily up to date. See the Files Changed tab for the authoritative version.)

Version bumps

That it targets the right release branch (1.49.0 in this case!).

🚀 Features

Override tracing span names using custom span selectors (Issue #5261)

Adds the ability to override span names by setting the otel.name attribute on any custom telemetry selectors.

This example changes the span name to router:

telemetry:
  instrumentation:
    spans:
      router:
        otel.name:
          static: router # Override the span name to router

By @bnjjj in #5365

Add description and units to standard instruments (PR #5407)

This PR adds descriptions and units to the standard instruments available in the router. These descriptions and units are copied directly from the OpenTelemetry semantic conventions and are needed for better integration with APMs.

By @bnjjj in #5407

Add Extensions with_lock() to try and avoid timing issues (PR #5360)

It's easy to trip over issues when interacting with Extensions because we inadvertently hold locks for too long. This can be a source of bugs in the router and causes a lot of tests to be flaky.

with_lock() avoids this kind of problem by explicitly restricting the lifetime of the Extensions lock.
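The idea of scoping a lock to a closure can be sketched as follows. This is a minimal illustration, not the router's implementation: `SharedExtensions` and its `HashMap<String, String>` payload are hypothetical stand-ins for the real `Extensions` type, and only the `with_lock` name comes from this changelog entry.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// Hypothetical stand-in for the router's `Extensions` container.
#[derive(Default)]
pub struct SharedExtensions {
    inner: Mutex<HashMap<String, String>>,
}

impl SharedExtensions {
    /// Runs `f` while holding the lock. The guard is dropped when this
    /// function returns, so callers cannot keep it alive by accident.
    pub fn with_lock<T>(&self, f: impl FnOnce(&mut HashMap<String, String>) -> T) -> T {
        // A poisoned lock panics here; a real implementation may choose differently.
        let mut guard = self.inner.lock().unwrap();
        f(&mut *guard)
    }
}
```

Because the guard never escapes the closure, two consecutive `with_lock` calls on the same value cannot deadlock against each other, which is the class of timing issue the entry describes.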

By @garypen in #5360

Add support for `unix_ms_now` in Rhai customizations (Issue #5182)

Rhai customizations can now use the unix_ms_now() function to obtain the current Unix timestamp in milliseconds since the Unix epoch.

For example:

fn supergraph_service(service) {
    let now = unix_ms_now();
}

By @shaikatzz in #5181

🐛 Fixes

Improve error message produced when subgraph responses don't include an expected `content-type` header value (Issue #5359)

To enhance debuggability when a subgraph response lacks an expected content-type header value, the error message now includes additional details.

Examples:

HTTP fetch failed from 'test': subgraph response contains invalid 'content-type' header value \"application/json,application/json\"; expected content-type: application/json or content-type: application/graphql-response+json

HTTP fetch failed from 'test': subgraph response does not contain 'content-type' header; expected content-type: application/json or content-type: application/graphql-response+json
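A check that yields messages of this shape might look like the sketch below. The function name `check_content_type` and the matching rule (strip parameters after `;`, compare case-insensitively) are assumptions made for illustration; the router's actual parsing may differ. Only the message texts are taken from the examples above.

```rust
/// Media types accepted in this sketch, as listed in the error messages.
const EXPECTED: [&str; 2] = ["application/json", "application/graphql-response+json"];

/// Hypothetical validator: `header` is the raw `content-type` value, if any.
pub fn check_content_type(service: &str, header: Option<&str>) -> Result<(), String> {
    let expected = "expected content-type: application/json or content-type: application/graphql-response+json";
    match header {
        None => Err(format!(
            "HTTP fetch failed from '{service}': subgraph response does not contain 'content-type' header; {expected}"
        )),
        Some(raw) => {
            // Assumption: ignore parameters such as `; charset=utf-8`.
            let media_type = raw.split(';').next().unwrap_or("").trim();
            if EXPECTED.iter().any(|e| media_type.eq_ignore_ascii_case(e)) {
                Ok(())
            } else {
                // `{raw:?}` prints the offending value in quotes.
                Err(format!(
                    "HTTP fetch failed from '{service}': subgraph response contains invalid 'content-type' header value {raw:?}; {expected}"
                ))
            }
        }
    }
}
```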

By @IvanGoncharov in #5223

Prevent formatting in hot path (PR #5405)

Removes unneeded formatting in the hot path for demand control to improve performance.

By @BrynCooke in #5405

Skip hashing the entire schema on every query plan cache lookup (PR #5374)

This fixes performance issues when looking up query plans for large schemas.

⚠️ Because this feature changes the query plan cache key, distributed caches will need to be repopulated.
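As a rough illustration of the technique (not the router's code), the schema hash can be computed once when the schema is loaded and then reused for every cache key. The `Schema` struct, the `cache_key` function, the key layout, and the use of the standard library's `DefaultHasher` are all assumptions of this sketch.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical schema wrapper that stores its hash alongside the SDL.
pub struct Schema {
    sdl: String,
    hash: u64, // computed once, at construction
}

impl Schema {
    pub fn new(sdl: String) -> Self {
        let mut hasher = DefaultHasher::new();
        sdl.hash(&mut hasher);
        let hash = hasher.finish();
        Schema { sdl, hash }
    }

    pub fn sdl(&self) -> &str {
        &self.sdl
    }
}

/// Builds a cache key from the precomputed schema hash and the query,
/// so a lookup only hashes the (small) query string.
pub fn cache_key(schema: &Schema, query: &str) -> String {
    let mut hasher = DefaultHasher::new();
    query.hash(&mut hasher);
    format!("plan:{:016x}:{:016x}", schema.hash, hasher.finish())
}
```

Any change to how the key is derived produces different keys for the same query, which is why existing distributed cache entries no longer match.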

By @Geal in #5374

Optimize GraphQL instruments (PR #5375)

When processing selectors for GraphQL instruments, heap allocations should be avoided for optimal performance. This change removes Vec allocations that were previously performed per field, yielding significant performance improvements.

By @BrynCooke in #5375

Log metrics overflow as a warning rather than an error (Issue #5173)

If a metric has too high a cardinality, the following is displayed as a warning instead of an error:

OpenTelemetry metric error occurred: Metrics error: Warning: Maximum data points for metric stream exceeded/ Entry added to overflow

By @bnjjj in #5287

Add support for the `response_context` selector in error conditions (PR #5288)

Provides the ability to configure custom instruments. For example:

http.server.request.timeout:
  type: counter
  value: unit
  description: "request in timeout"
  unit: request
  attributes:
    graphql.operation.name:
      response_context: operation_name
  condition:
    eq:
    - "request timed out"
    - error: reason

By @bnjjj in #5288

Inaccurate `apollo_router_opened_subscriptions` counter (PR #5363)

Fixes the apollo_router_opened_subscriptions counter which previously only incremented. The counter now also decrements.
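One common way to keep such a counter accurate is to tie the decrement to the lifetime of the subscription. The sketch below assumes a hypothetical `SubscriptionGuard` type and a plain atomic counter; it shows the pattern only and is not taken from the router's source.

```rust
use std::sync::atomic::{AtomicI64, Ordering};
use std::sync::Arc;

/// Hypothetical guard: increments on open, decrements when dropped.
pub struct SubscriptionGuard {
    counter: Arc<AtomicI64>,
}

impl SubscriptionGuard {
    pub fn open(counter: Arc<AtomicI64>) -> Self {
        counter.fetch_add(1, Ordering::Relaxed);
        SubscriptionGuard { counter }
    }
}

impl Drop for SubscriptionGuard {
    fn drop(&mut self) {
        // Runs whenever the subscription ends, including on error paths.
        self.counter.fetch_sub(1, Ordering::Relaxed);
    }
}
```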

By @bnjjj in #5363

📃 Configuration

Rename trace telemetry selector (PR #5337)

v1.48.0 introduced the Apollo trace_id selector. The name trace_id is a misnomer, since the selector actually represents a GraphOS Studio operation ID. To access this selector, use studio_operation_id:

telemetry:
  instrumentation:
    spans:
      router:
        "studio.operation.id":
          studio_operation_id: true

By @bnjjj in #5337

Set Apollo metrics generation mode to `new` by default (PR #5265)

Changes the default value of experimental_apollo_metrics_generation_mode to new. All metrics are showing that identical signatures are being generated in this mode.

By @bonnici in #5265

🛠 Maintenance

Skip GraphOS tests when Apollo key not present (PR #5362)

Some tests require APOLLO_KEY and APOLLO_GRAPH_REF to execute successfully. These tests are now skipped if either environment variable is not set.
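A guard of this kind could be written as below. The helper names `has_credentials` and `graph_os_enabled` are hypothetical, and the sketch treats a variable as present whenever it is set, which may not match the router's exact rule.

```rust
use std::env;

/// Pure check, kept separate from the environment so it is easy to test.
pub fn has_credentials(key: Option<String>, graph_ref: Option<String>) -> bool {
    key.is_some() && graph_ref.is_some()
}

/// Reads the two variables named in this entry from the process environment.
pub fn graph_os_enabled() -> bool {
    has_credentials(env::var("APOLLO_KEY").ok(), env::var("APOLLO_GRAPH_REF").ok())
}
```

A test that needs GraphOS would then return early when `graph_os_enabled()` is false.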

By @BrynCooke in #5362

📚 Documentation

Standard instrument configuration documentation for subgraphs (PR #5422)

Added documentation about standard instruments available at the subgraph service level:

http.client.request.body.size - A histogram of request body sizes for requests handled by subgraphs.
http.client.request.duration - A histogram of request durations for requests handled by subgraphs.
http.client.response.body.size - A histogram of response body sizes for requests handled by subgraphs.

These instruments are configurable in router.yaml:

telemetry:
  instrumentation:
    instruments:
      subgraph:
        http.client.request.body.size: true # (default false)
        http.client.request.duration: true # (default false)
        http.client.response.body.size: true # (default false)

By @bnjjj in #5422

Update docs frontmatter for consistency and discoverability (PR #5164)

Makes title case consistent for page titles, and adds or updates subtitles and meta-descriptions for better discoverability.

By @Meschreiber in #5164

🧪 Experimental

Warm query plan cache using persisted queries on startup (Issue #5334)

Adds support for the router to use persisted queries to warm the query plan cache upon startup using a new experimental_prewarm_query_plan_cache configuration option under persisted_queries.

To enable:

persisted_queries:
  enabled: true
  experimental_prewarm_query_plan_cache: true

By @lleadbet in #5340

Apollo reporting signature enhancements (PR #5062)

Adds a new experimental configuration option to turn on some enhancements for the Apollo reporting stats report key:

Signatures will include the full normalized form of input objects
Signatures will include aliases
Some small normalization improvements

This new configuration (telemetry.apollo.experimental_apollo_signature_normalization_algorithm) only takes effect when experimental_apollo_metrics_generation_mode is set to new. We don't yet recommend enabling it while we continue to verify that the new functionality works as expected.

By @bonnici in #5062

Add experimental support for sending traces to Studio via OTLP (PR #4982)

As the ecosystem around OpenTelemetry (OTel) has been expanding rapidly, we are evaluating a migration of Apollo's internal tracing system to use an OTel-based protocol.

In the short-term, benefits include:

A comprehensive way to visualize the router execution path in GraphOS Studio.
Additional spans that were previously not included in Studio traces, such as query parsing, planning, execution, and more.
Additional metadata such as subgraph fetch details, router idle / busy timing, and more.

Long-term, we see this as a strategic enhancement to consolidate these two disparate tracing systems.
This will pave the way for future enhancements to more easily plug into the Studio trace visualizer.

Configuration

This change adds a new configuration option experimental_otlp_tracing_sampler. This can be used to send
a percentage of traces via OTLP instead of the native Apollo Usage Reporting protocol. Supported values:

always_off (default): send all traces via Apollo Usage Reporting protocol.
always_on: send all traces via OTLP.
0.0 - 1.0: the ratio of traces to send via OTLP (0.5 = 50 / 50).

Note that this sampler is only applied after the common tracing sampler, for example:

Sample 1% of traces, send all traces via OTLP:

telemetry:
  apollo:
    # Send all traces via OTLP
    experimental_otlp_tracing_sampler: always_on

  exporters:
    tracing:
      common:
        # Sample traces at 1% of all traffic
        sampler: 0.01
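The way the two samplers compose can be worked through with a small calculation. This sketch assumes the OTLP sampler splits only the traces that the common sampler already kept, as the note above states; the `OtlpSampler` enum and the `split` function are hypothetical names used for the arithmetic, not router APIs.

```rust
/// Mirrors the three supported values of `experimental_otlp_tracing_sampler`.
pub enum OtlpSampler {
    AlwaysOff,
    AlwaysOn,
    Ratio(f64),
}

/// Returns (fraction of all traffic sent via OTLP,
///          fraction of all traffic sent via Apollo Usage Reporting).
pub fn split(common_sampler: f64, otlp: &OtlpSampler) -> (f64, f64) {
    let ratio = match otlp {
        OtlpSampler::AlwaysOff => 0.0,
        OtlpSampler::AlwaysOn => 1.0,
        OtlpSampler::Ratio(r) => (*r).clamp(0.0, 1.0),
    };
    // The OTLP sampler only sees traces that passed the common sampler.
    (common_sampler * ratio, common_sampler * (1.0 - ratio))
}
```

With the configuration above (`sampler: 0.01` and `always_on`), 1% of all traffic is traced and all of those traces go via OTLP. With a ratio of `0.5` instead, 0.5% of traffic would go to each protocol.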

By @timbotnik in #4982

router-perf · 2024-06-18T12:08:37Z

CHANGELOG.md

abernix

I mistakenly pressed Enter, but meant to push Request changes with my last review.

Co-authored-by: Jesse Rosenberger <git@jro.cc>

CHANGELOG.md

Co-authored-by: Bryn Cooke <BrynCooke@gmail.com>

abernix

LGTM.

lrlna added 2 commits June 18, 2024 13:58

prep release: v1.49.0

4b34080

formatting changelog changes

e7f1623

lrlna requested review from a team, dariuszkuc, sachindshinde, goto-bus-stop, SimonSapin, TylerBloom and duckki as code owners June 18, 2024 12:07

apollo-bot2 assigned lrlna Jun 18, 2024

move experimetnal features to 'Experimental' section of the changelog

a139c36

abernix reviewed Jun 18, 2024

abernix requested changes Jun 18, 2024

lrlna and others added 4 commits June 18, 2024 15:00

move metrics generation config to experimental

9def452

Update CHANGELOG.md

dcbabb4

Co-authored-by: Jesse Rosenberger <git@jro.cc>

Update CHANGELOG.md

fd6be09

Co-authored-by: Jesse Rosenberger <git@jro.cc>

Update CHANGELOG.md

d2e91b3

Co-authored-by: Jesse Rosenberger <git@jro.cc>

BrynCooke reviewed Jun 18, 2024

BrynCooke reviewed Jun 18, 2024

abernix reviewed Jun 18, 2024

abernix and others added 3 commits June 18, 2024 17:58

Update CHANGELOG.md

c992543

Co-authored-by: Bryn Cooke <BrynCooke@gmail.com>

Apply suggestions from code review

05aca59

Co-authored-by: Bryn Cooke <BrynCooke@gmail.com>

Update CHANGELOG.md

a1d33e9

abernix approved these changes Jun 18, 2024

abernix requested a review from BrynCooke June 18, 2024 15:25

garypen approved these changes Jun 18, 2024

bnjjj approved these changes Jun 18, 2024

lrlna merged commit a8a977e into 1.49.0 Jun 18, 2024
12 checks passed

lrlna deleted the prep-1.49.0 branch June 18, 2024 16:15
