prep release: v1.13.0 #2841
Merged
abernix commented Mar 23, 2023
abernix requested review from garypen, SimonSapin, BrynCooke, chandrikas and Geal March 23, 2023 13:21
Geal reviewed Mar 23, 2023
garypen approved these changes Mar 23, 2023
BrynCooke approved these changes Mar 23, 2023
chandrikas reviewed Mar 23, 2023
Co-authored-by: Chandrika Srinivasan <chandrikas@users.noreply.github.com> Co-authored-by: Geoffroy Couprie <apollo@geoffroycouprie.com>
chandrikas approved these changes Mar 23, 2023
abernix added a commit that referenced this pull request Apr 13, 2023
The rename of a newly introduced metric in Apollo Router 1.13.0 was logged in the CHANGELOG using the _wrong_ metric name. The metric was renamed from `apollo_router_uplink_duration_seconds_bucket` to `apollo_router_uplink_fetch_duration_seconds_bucket` in #2826, but we failed to catch this discrepancy in the changelog for the [v1.13.0 release]. Ref: #2826 [v1.13.0 release]: #2841
abernix added a commit that referenced this pull request Apr 14, 2023
The rename of a newly introduced metric in Apollo Router 1.13.0 was logged in the CHANGELOG using the _wrong_ metric name. The metric was renamed from `apollo_router_uplink_duration_seconds_bucket` to `apollo_router_uplink_fetch_duration_seconds_bucket` in #2826, but we failed to catch this discrepancy in the changelog for the v1.13.0 [release]. Ref: #2826 [release]: #2841
🚀 Features
Uplink metrics and improved logging (Issue #2769, Issue #2815, Issue #2816)
For monitoring, observability, and debugging of Uplink-related behaviors (those which occur as part of Managed Federation), the router now emits better log messages and new metrics. The new metrics are:
- `apollo_router_uplink_fetch_duration_seconds_bucket`: A histogram of durations with the following attributes:
  - `url`: The URL that was polled
  - `query`: `SupergraphSdl` or `Entitlement`
  - `type`: `new`, `unchanged`, `http_error`, `uplink_error`, or `ignored`
  - `code`: The error code, depending on `type`
  - `error`: The error message
- `apollo_router_uplink_fetch_count_total`: A gauge that counts the overall successes (`status="success"`) or failures (`status="failure"`) that occur when communicating with Uplink, without taking fallback into account.
Here's an example of what these new metrics look like from the Prometheus scraping endpoint:
By @BrynCooke in #2779, #2817, #2819, #2826
🐛 Fixes
Only process Uplink messages that are deemed to be newer (Issue #2794)
Uplink is backed by multiple cloud providers to ensure high availability. However, this means that there will be periods of time where Uplink endpoints do not agree on what the latest data is. They are eventually consistent.
This has not been a problem for most users, as the default mode of operation for the router is to fall back to the secondary Uplink endpoint if the first fails.
The other mode of operation is round-robin, which is triggered only when setting the `APOLLO_UPLINK_ENDPOINTS` environment variable. In this mode there is a much higher chance that the router will end up flapping due to disagreement between the Apollo Uplink servers or any user-provided proxies set in this variable.
This change introduces two fixes:
We will be improving the robustness of the solution over the coming weeks, including via other fixes in this release, so this can be seen as an incremental improvement.
By @BrynCooke in #2803, #2826
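The idea of only processing newer messages can be sketched as follows. This is a minimal Rust sketch under stated assumptions, not the router's implementation: `UplinkMessage`, its `ordering` field, and `FreshnessGuard` are hypothetical names, and the sketch assumes every Uplink response carries some comparable marker of how recent its data is.

```rust
/// Hypothetical shape of an Uplink poll result; the real router types differ.
#[derive(Debug, Clone, PartialEq)]
pub struct UplinkMessage {
    /// Assumed marker that grows as the data gets newer (for example a timestamp).
    pub ordering: u64,
    /// The data that was fetched, such as a supergraph schema.
    pub payload: String,
}

/// Remembers the newest marker seen so far and rejects anything that is not newer.
#[derive(Debug, Default)]
pub struct FreshnessGuard {
    latest: Option<u64>,
}

impl FreshnessGuard {
    /// Returns `Some(message)` when the message is strictly newer than everything
    /// accepted before, and `None` when it is stale or a repeat and should be ignored.
    pub fn accept(&mut self, message: UplinkMessage) -> Option<UplinkMessage> {
        match self.latest {
            Some(seen) if message.ordering <= seen => None,
            _ => {
                self.latest = Some(message.ordering);
                Some(message)
            }
        }
    }
}
```

With a guard like this, an endpoint that lags behind its peers can no longer roll the router back to older data, which is the flapping described above.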
Distributed caching: Don't send Redis' `CLIENT SETNAME`
We no longer send the `CLIENT SETNAME` command to connected Redis servers. This resolves an incompatibility with some Redis-compatible servers, since not all "Redis-compatible" offerings (like Google Memorystore) support every Redis command. The router never required this feature; it was an option that could be enabled on our Redis client. No Router functionality is impacted.
By @Geal in #2825
Support bare top-level `__typename` when aliased (Issue #2792)
PR #1762 implemented support for the query `{ __typename }`, but it did not work properly if the top-level standalone `__typename` field was aliased. This now works properly.
By @glasser in #2791
Maintain errors set on `_entities` (Issue #2731)
In their responses, some subgraph implementations do not return errors per entity but instead on the entire path. We now transmit those errors regardless.
By @Geal in #2756
📃 Configuration
Custom OpenTelemetry Datadog exporter mapping (Issue #2228)
This PR fixes the issue with the Datadog exporter not providing meaningful contextual data in the Datadog traces.
There is a known issue where OpenTelemetry is not fully compatible with Datadog.
To fix this, the `opentelemetry-datadog` crate added custom mapping functions.
Now, when `enable_span_mapping` is set to `true`, the Apollo Router will perform the following mapping:
For example:
Suppose we send a query `MyQuery` to the Apollo Router, and the Router, following the operation's query plan, sends a query to `my-subgraph-name`, producing the following trace:
As you can see, there is no clear information about the name of the query, the name of the subgraph, or the name of the query sent to the subgraph.
Instead, with `enable_span_mapping` set to `true`, the following trace will be created:
All this logic is gated behind the `enable_span_mapping` configuration option which, if set to `true`, will take the values from the span attributes.
By @samuelAndalon in #2790
🛠 Maintenance
Migrate `xtask` CLI parsing from `StructOpt` to `Clap` (Issue #2807)
As an internal improvement to our tooling, we've migrated our `xtask` toolset from `StructOpt` to `Clap`, since `StructOpt` is in maintenance mode.
By @BrynCooke in #2808
Subgraph configuration override (Issue #2426)
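We've introduced a new generic wrapper type for subgraph-level configuration, with the following behaviour:
- If there is a configuration under `all`, it applies to all subgraphs. If it is not there, the default values apply.
- If there is a configuration under `subgraphs` for a specific named subgraph, the fields it specifies override those in `all`; fields it does not specify use the values from `all`, or default values, if applicable.
By @Geal in #2453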
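The override behaviour can be sketched in Rust as follows. This is an illustration under stated assumptions rather than the router's actual code: the `Settings` struct and its fields are hypothetical, as are the `SubgraphConfiguration` name and the `resolve` helper. Only the `all` and `subgraphs` keys come from the description above. A `None` field stands for "not specified at this level".

```rust
use std::collections::HashMap;

/// Hypothetical per-subgraph settings; `None` means "not specified here".
#[derive(Debug, Clone, Default, PartialEq)]
pub struct Settings {
    pub timeout_secs: Option<u64>,
    pub compression: Option<String>,
}

/// Sketch of a generic wrapper: `all` applies to every subgraph,
/// while `subgraphs` holds overrides keyed by subgraph name.
#[derive(Debug, Clone, Default)]
pub struct SubgraphConfiguration<T> {
    pub all: T,
    pub subgraphs: HashMap<String, T>,
}

impl SubgraphConfiguration<Settings> {
    /// Resolves the effective settings for one subgraph, field by field:
    /// the named entry wins, then `all`, and a field left as `None`
    /// means the caller should apply its default value.
    pub fn resolve(&self, name: &str) -> Settings {
        let specific = self.subgraphs.get(name);
        Settings {
            timeout_secs: specific
                .and_then(|s| s.timeout_secs)
                .or(self.all.timeout_secs),
            compression: specific
                .and_then(|s| s.compression.clone())
                .or_else(|| self.all.compression.clone()),
        }
    }
}
```

Resolving field by field, instead of picking one whole entry, is what lets a named subgraph override a single value while still inheriting the rest from `all`.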
Add integration tests for Uplink URLs (Issue #2827)
We've added integration tests to ensure that all Uplink URLs can be contacted and data can be retrieved in an expected format.
We've also changed our URLs to align exactly with Gateway, to simplify our own documentation. Existing Router users do not need to take any action as we support both on our infrastructure.
By @BrynCooke in #2830, #2834
Improve integration test harness (Issue #2809)
Our internal integration test harness has been simplified.
By @BrynCooke in #2810
Use `kubeconform` to validate the Router's Helm manifest (Issue #1914)
We've had a couple of cases where errors have been inadvertently introduced into our Helm charts. These have required follow-up fixes. So far, we've been relying on manual testing and inspection, but we've reached the point where automation is desired. This change uses `kubeconform` to ensure that the YAML generated by our Helm manifest is indeed valid. Errors may still be possible, but this should at least prevent basic errors from occurring. This information will be surfaced in our CI checks.
By @garypen in #2835
📚 Documentation
Re-point links going via redirect to their true sources
Some of our documentation links pointed to pages that were renamed during routine documentation updates. While the links were not broken (the former links redirected to the new URLs), we've updated them to avoid the extra hop.
By @o0Ignition0o in #2780
Fix coprocessor docs about subgraph URI mutability
The subgraph `uri` is (and always has been) mutable when responding to the `SubgraphRequest` stage in a coprocessor.
By @lennyburdette in #2801