prep release: v1.49.0 #5473

Merged: 10 commits, Jun 18, 2024

@lrlna (Member) commented Jun 18, 2024

Note

When approved, this PR will merge into the 1.49.0 branch which will — upon being approved itself — merge into main.

Things to review in this PR:

  • Changelog correctness (There is a preview below, but it is not necessarily up to date. See Files Changed for the authoritative version.)
  • Version bumps
  • That it targets the right release branch (1.49.0 in this case!).

🚀 Features

Override tracing span names using custom span selectors (Issue #5261)

Adds the ability to override span names by setting the otel.name attribute on any custom telemetry selector.

This example changes the span name to router:

telemetry:
  instrumentation:
    spans:
      router:
        otel.name:
          static: router # Override the span name to router

By @bnjjj in #5365

Add description and units to standard instruments (PR #5407)

This PR adds descriptions and units to the standard instruments available in the router. These descriptions and units are copied directly from the OpenTelemetry semantic conventions and are needed for better integration with APMs.

By @bnjjj in #5407

Add Extensions with_lock() to try and avoid timing issues (PR #5360)

It's easy to run into issues when interacting with Extensions by inadvertently holding locks for too long. This can be a source of bugs in the router and makes many tests flaky.

with_lock() avoids this kind of problem by explicitly restricting the lifetime of the Extensions lock.
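
The idea of restricting a lock's lifetime to a closure can be sketched as follows. This is a minimal illustration under stated assumptions, not the router's implementation: the `Extensions` struct below is a hypothetical stand-in backed by a `HashMap`, and the real `with_lock()` signature may differ.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for the router's `Extensions` type.
/// The real type and its `with_lock()` signature may differ.
#[derive(Clone, Default)]
pub struct Extensions {
    inner: Arc<Mutex<HashMap<String, String>>>,
}

impl Extensions {
    /// Runs `f` while holding the lock and releases the lock before returning.
    /// The guard never leaves this function, so callers cannot keep it alive
    /// by accident across later code.
    pub fn with_lock<T>(&self, f: impl FnOnce(&mut HashMap<String, String>) -> T) -> T {
        let mut guard = self.inner.lock().expect("extensions lock poisoned");
        f(&mut *guard)
    }
}

/// Example use: one short critical section to write, another to read.
pub fn record_and_read(extensions: &Extensions) -> Option<String> {
    extensions.with_lock(|map| {
        map.insert("operation_name".to_string(), "MyQuery".to_string());
    });
    // The lock has already been released here, so taking it again cannot deadlock.
    extensions.with_lock(|map| map.get("operation_name").cloned())
}
```

Because the closure can only return owned values, nothing borrowed from the locked data can outlive the critical section.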

By @garypen in #5360

Add support for unix_ms_now in Rhai customizations (Issue #5182)

Rhai customizations can now use the unix_ms_now() function to obtain the current Unix timestamp in milliseconds since the Unix epoch.

For example:

fn supergraph_service(service) {
    let now = unix_ms_now();
}

By @shaikatzz in #5181

🐛 Fixes

Improve error message produced when subgraph responses don't include an expected content-type header value (Issue #5359)

To enhance debuggability when a subgraph response lacks an expected content-type header value, the error message now includes additional details.

Examples:

HTTP fetch failed from 'test': subgraph response contains invalid 'content-type' header value \"application/json,application/json\"; expected content-type: application/json or content-type: application/graphql-response+json
HTTP fetch failed from 'test': subgraph response does not contain 'content-type' header; expected content-type: application/json or content-type: application/graphql-response+json
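
A check that produces messages of this shape could look roughly like the sketch below. The function name `check_content_type` and the parsing rules (ignoring parameters after `;`, case-insensitive comparison) are assumptions for illustration; the router's actual validation logic is not shown in this changelog.

```rust
const EXPECTED: &str =
    "expected content-type: application/json or content-type: application/graphql-response+json";

/// Hypothetical helper: validates a subgraph's `content-type` header value and
/// returns an error message in the style of the examples above.
pub fn check_content_type(service: &str, header: Option<&str>) -> Result<(), String> {
    match header {
        // The header is missing entirely.
        None => Err(format!(
            "HTTP fetch failed from '{}': subgraph response does not contain 'content-type' header; {}",
            service, EXPECTED
        )),
        Some(value) => {
            // Assumption: compare only the media type, ignoring parameters such as charset.
            let media_type = value.split(';').next().unwrap_or("").trim();
            let accepted = media_type.eq_ignore_ascii_case("application/json")
                || media_type.eq_ignore_ascii_case("application/graphql-response+json");
            if accepted {
                Ok(())
            } else {
                // The offending value is echoed back to make debugging easier.
                Err(format!(
                    "HTTP fetch failed from '{}': subgraph response contains invalid 'content-type' header value \"{}\"; {}",
                    service, value, EXPECTED
                ))
            }
        }
    }
}
```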

By @IvanGoncharov in #5223

Prevent formatting in hot path (PR #5405)

Removes unneeded formatting in the hot path for demand control to improve performance.

By @BrynCooke in #5405

Skip hashing the entire schema on every query plan cache lookup (PR #5374)

This fixes performance issues when looking up query plans for large schemas.

⚠️ Because this feature changes the query plan cache key, distributed caches will need to be repopulated.
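
One general way to avoid rehashing a large schema on every lookup is to hash it once and reuse that digest in each cache key. The sketch below illustrates that technique only; `PlanCacheKeys`, its fields, and the use of `DefaultHasher` are assumptions, and the router's real cache key format is not described here.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical key builder: the schema is hashed once at construction time.
pub struct PlanCacheKeys {
    schema_id: u64,
}

impl PlanCacheKeys {
    /// Pays the cost of hashing the full schema text exactly once.
    pub fn new(schema_sdl: &str) -> Self {
        let mut hasher = DefaultHasher::new();
        schema_sdl.hash(&mut hasher);
        Self { schema_id: hasher.finish() }
    }

    /// Per-lookup work hashes only the small schema digest plus the query,
    /// so the cost no longer grows with the schema size.
    pub fn key_for(&self, query: &str, operation_name: Option<&str>) -> u64 {
        let mut hasher = DefaultHasher::new();
        self.schema_id.hash(&mut hasher);
        query.hash(&mut hasher);
        operation_name.hash(&mut hasher);
        hasher.finish()
    }
}
```

Any change to how a key like this is derived produces different keys for the same query, which is why previously stored entries in a distributed cache stop matching.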

By @Geal in #5374

Optimize GraphQL instruments (PR #5375)

When processing selectors for GraphQL instruments, heap allocations should be avoided for optimal performance. This change removes Vec allocations that were previously performed per field, yielding significant performance improvements.

By @BrynCooke in #5375

Log metrics overflow as a warning rather than an error (Issue #5173)

If a metric has too high a cardinality, the following is displayed as a warning instead of an error:

OpenTelemetry metric error occurred: Metrics error: Warning: Maximum data points for metric stream exceeded/ Entry added to overflow

By @bnjjj in #5287

Add support for the response_context selector in error conditions (PR #5288)

Provides the ability to configure custom instruments that use the response_context selector when an error occurs. For example:

http.server.request.timeout:
  type: counter
  value: unit
  description: "request in timeout"
  unit: request
  attributes:
    graphql.operation.name:
      response_context: operation_name
  condition:
    eq:
    - "request timed out"
    - error: reason

By @bnjjj in #5288

Inaccurate apollo_router_opened_subscriptions counter (PR #5363)

Fixes the apollo_router_opened_subscriptions counter which previously only incremented. The counter now also decrements.
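
A common way to keep such a count accurate is to tie the decrement to the end of the subscription's lifetime. The following is a sketch of that pattern only, not the router's code: `OpenedSubscriptions` and `SubscriptionGuard` are hypothetical names, and a plain atomic integer stands in for the real metric.

```rust
use std::sync::atomic::{AtomicI64, Ordering};
use std::sync::Arc;

/// Hypothetical stand-in for the metric: a shared signed count.
#[derive(Clone, Default)]
pub struct OpenedSubscriptions(Arc<AtomicI64>);

/// Held for as long as one subscription is open.
pub struct SubscriptionGuard(Arc<AtomicI64>);

impl OpenedSubscriptions {
    /// Increments the count and returns a guard that undoes it when dropped.
    pub fn open(&self) -> SubscriptionGuard {
        self.0.fetch_add(1, Ordering::Relaxed);
        SubscriptionGuard(Arc::clone(&self.0))
    }

    /// Number of subscriptions currently open.
    pub fn current(&self) -> i64 {
        self.0.load(Ordering::Relaxed)
    }
}

impl Drop for SubscriptionGuard {
    /// Decrements the count when the subscription ends, however it ends.
    fn drop(&mut self) {
        self.0.fetch_sub(1, Ordering::Relaxed);
    }
}
```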

By @bnjjj in #5363

📃 Configuration

Rename trace telemetry selector (PR #5337)

v1.48.0 introduced the apollo trace_id selector. trace_id is a misnomer for this metric, since the selector actually represents a GraphOS Studio operation ID. To access this selector, use studio_operation_id:

telemetry:
  instrumentation:
    spans:
      router:
        "studio.operation.id":
          studio_operation_id: true

By @bnjjj in #5337

Set Apollo metrics generation mode to new by default (PR #5265)

Changes the default value of experimental_apollo_metrics_generation_mode to new. All metrics collected so far show that this mode generates identical signatures.

By @bonnici in #5265

🛠 Maintenance

Skip GraphOS tests when Apollo key not present (PR #5362)

Some tests require APOLLO_KEY and APOLLO_GRAPH_REF to execute successfully.
These tests are now skipped if those environment variables are not set.

By @BrynCooke in #5362

📚 Documentation

Standard instrument configuration documentation for subgraphs (PR #5422)

Added documentation about standard instruments available at the subgraph service level:

  • http.client.request.body.size - A histogram of request body sizes for requests handled by subgraphs.
  • http.client.request.duration - A histogram of request durations for requests handled by subgraphs.
  • http.client.response.body.size - A histogram of response body sizes for requests handled by subgraphs.

These instruments are configurable in router.yaml:

telemetry:
  instrumentation:
    instruments:
      subgraph:
        http.client.request.body.size: true # (default false)
        http.client.request.duration: true # (default false)
        http.client.response.body.size: true # (default false)

By @bnjjj in #5422

Update docs frontmatter for consistency and discoverability (PR #5164)

Makes title case consistent for page titles, and adds or updates subtitles and meta-descriptions for better discoverability.

By @Meschreiber in #5164

🧪 Experimental

Warm query plan cache using persisted queries on startup (Issue #5334)

Adds support for the router to use persisted queries to warm the query plan cache upon startup using a new experimental_prewarm_query_plan_cache configuration option under persisted_queries.

To enable:

persisted_queries:
  enabled: true
  experimental_prewarm_query_plan_cache: true

By @lleadbet in #5340

Apollo reporting signature enhancements (PR #5062)

Adds a new experimental configuration option to turn on some enhancements for the Apollo reporting stats report key:

  • Signatures will include the full normalized form of input objects
  • Signatures will include aliases
  • Some small normalization improvements

This new configuration option (telemetry.apollo.experimental_apollo_signature_normalization_algorithm) only takes effect when experimental_apollo_metrics_generation_mode is set to new. We don't yet recommend enabling it while we continue to verify that the new functionality works as expected.

By @bonnici in #5062

Add experimental support for sending traces to Studio via OTLP (PR #4982)

As the ecosystem around OpenTelemetry (OTel) has been expanding rapidly, we are evaluating a migration of Apollo's internal
tracing system to use an OTel-based protocol.

In the short-term, benefits include:

  • A comprehensive way to visualize the router execution path in GraphOS Studio.
  • Additional spans that were previously not included in Studio traces, such as query parsing, planning, execution, and more.
  • Additional metadata such as subgraph fetch details, router idle / busy timing, and more.

Long-term, we see this as a strategic enhancement to consolidate these two disparate tracing systems.
This will pave the way for future enhancements to more easily plug into the Studio trace visualizer.

Configuration

This change adds a new configuration option experimental_otlp_tracing_sampler. This can be used to send
a percentage of traces via OTLP instead of the native Apollo Usage Reporting protocol. Supported values:

  • always_off (default): send all traces via Apollo Usage Reporting protocol.
  • always_on: send all traces via OTLP.
  • 0.0 - 1.0: the ratio of traces to send via OTLP (0.5 = 50 / 50).

Note that this sampler is applied only after the common tracing sampler. For example, to sample 1% of traces and send all of the sampled traces via OTLP:

telemetry:
  apollo:
    # Send all traces via OTLP
    experimental_otlp_tracing_sampler: always_on

  exporters:
    tracing:
      common:
        # Sample traces at 1% of all traffic
        sampler: 0.01
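
The way the two samplers compose can be sketched as a two-stage decision. This is an illustrative model under stated assumptions: `OtlpSampler`, `Route`, and `route_trace` are hypothetical names, and the caller-supplied random draws stand in for whatever sampling decision the router actually makes internally.

```rust
/// Mirrors the three supported values of experimental_otlp_tracing_sampler.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum OtlpSampler {
    AlwaysOff,
    AlwaysOn,
    Ratio(f64),
}

/// Where a given trace ends up.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Route {
    NotSampled,
    ApolloUsageReporting,
    Otlp,
}

/// Decides the route for one trace. `common_draw` and `otlp_draw` are assumed
/// to be independent uniform random numbers in [0, 1) supplied by the caller.
pub fn route_trace(common_ratio: f64, otlp: OtlpSampler, common_draw: f64, otlp_draw: f64) -> Route {
    // Stage 1: the common tracing sampler decides whether the trace is kept at all.
    if common_draw >= common_ratio {
        return Route::NotSampled;
    }
    // Stage 2: only kept traces are split between OTLP and Apollo Usage Reporting.
    let via_otlp = match otlp {
        OtlpSampler::AlwaysOff => false,
        OtlpSampler::AlwaysOn => true,
        OtlpSampler::Ratio(ratio) => otlp_draw < ratio,
    };
    if via_otlp {
        Route::Otlp
    } else {
        Route::ApolloUsageReporting
    }
}

/// Expected fraction of all requests whose trace is sent via OTLP.
pub fn otlp_share_of_all_requests(common_ratio: f64, otlp: OtlpSampler) -> f64 {
    let otlp_ratio = match otlp {
        OtlpSampler::AlwaysOff => 0.0,
        OtlpSampler::AlwaysOn => 1.0,
        OtlpSampler::Ratio(ratio) => ratio,
    };
    common_ratio * otlp_ratio
}
```

With the configuration above (common sampler 0.01, OTLP sampler always_on), this model sends 1% of all requests' traces via OTLP; with an OTLP ratio of 0.5 it would send 0.5%.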

By @timbotnik in #4982

router-perf (bot) commented Jun 18, 2024

CI performance tests

  • reload - Reload test over a long period of time at a constant rate of users
  • no-tracing - Basic stress test, no tracing
  • events_callback - Stress test for events with a lot of users and deduplication ENABLED in callback mode
  • xlarge-request - Stress test with 10 MB request payload
  • events_big_cap_high_rate_callback - Stress test for events with a lot of users, deduplication enabled and high rate event with a big queue capacity using callback mode
  • events_without_dedup - Stress test for events with a lot of users and deduplication DISABLED
  • events_without_dedup_callback - Stress test for events with a lot of users and deduplication DISABLED using callback mode
  • step - Basic stress test that steps up the number of users over time
  • step-jemalloc-tuning - Clone of the basic stress test for jemalloc tuning
  • events_big_cap_high_rate - Stress test for events with a lot of users, deduplication enabled and high rate event with a big queue capacity
  • step-with-prometheus - A copy of the step test with the Prometheus metrics exporter enabled
  • demand-control-instrumented - A copy of the step test, but with demand control monitoring and metrics enabled
  • events - Stress test for events with a lot of users and deduplication ENABLED
  • xxlarge-request - Stress test with 100 MB request payload
  • large-request - Stress test with a 1 MB request payload
  • demand-control-uninstrumented - A copy of the step test, but with demand control monitoring enabled
  • const - Basic stress test that runs with a constant number of users

@abernix (Member) left a comment

I mistakenly pressed Enter, but meant to push Request changes with my last review.

lrlna and others added 4 commits June 18, 2024 15:00
Co-authored-by: Jesse Rosenberger <git@jro.cc>
Co-authored-by: Jesse Rosenberger <git@jro.cc>
Co-authored-by: Jesse Rosenberger <git@jro.cc>
abernix and others added 3 commits June 18, 2024 17:58
Co-authored-by: Bryn Cooke <BrynCooke@gmail.com>
Co-authored-by: Bryn Cooke <BrynCooke@gmail.com>
@abernix (Member) left a comment

LGTM.

@abernix requested a review from BrynCooke June 18, 2024 15:25
@lrlna merged commit a8a977e into 1.49.0 Jun 18, 2024
12 checks passed
@lrlna deleted the prep-1.49.0 branch June 18, 2024 16:15