Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Progressive override support #4521

Merged
merged 54 commits into from
Feb 1, 2024
Merged

Progressive override support #4521

merged 54 commits into from
Feb 1, 2024

Conversation

trevor-scheer
Copy link
Member

@trevor-scheer trevor-scheer commented Jan 18, 2024

This change brings support for progressive @override, which allows dynamically overriding root fields and entity fields in the schema. This feature is enterprise only and requires a license key to be used.

Support for this is implemented via plugin and support for a new context key whose value is provided as options to query planning.

We expect users to interact with and resolve labels themselves via coprocessors, so we're providing an interface via a known context key in order to communicate to and from. The coprocessors need to know which labels are available in the schema and they need to tell the router which labels to set.

We also expect users to use a "special-case" label format percent(x) which the plugin will resolve out of the box.

All of this is to say, the plugin has a few responsibilities:

  1. At the beginning of a request (router_service), the plugin will add all (non-percent) labels from the schema to the known context key apollo_override::unresolved_labels.
  2. Coprocessors (during supergraph_request) may read from this context entry and append labels to be overridden to the apollo_override::labels_to_override context entry.
  3. The plugin (during its supergraph_request) will resolve percentage-based labels and append to the apollo_override::labels_to_override context entry.
  4. Additionally, the plugin will traverse the operation to obtain only the relevant labels for the given operation. It will then take the intersection of those labels and all labels in the apollo_override::labels_to_override context entry to ensure the set of labels is minimized, sorted, etc.

It's important to note that the set of overridden labels is an input to query planning, and thus becomes part of the cache key. For every relevant label in an operation, the number of potential QP cache entries doubles. Step 4 is important to ensure we don't create any more cache entries per operation than is necessary.

TODO:

  • Add launchdarkly coprocessor example in examples folder
  • More tests

Checklist

Complete the checklist (and note appropriate exceptions) before the PR is marked ready-for-review.

  • Changes are compatible1
  • Documentation2 completed
  • Performance impact assessed and acceptable
  • Tests added and passing3
    • Unit Tests
    • Integration Tests
    • Manual Tests

Exceptions

Note any exceptions here

Notes

Footnotes

  1. It may be appropriate to bring upcoming changes to the attention of other (impacted) groups. Please endeavour to do this before seeking PR approval. The mechanism for doing this will vary considerably, so use your judgement as to how and when to do this.

  2. Configuration is an important part of many changes. Where applicable please try to document configuration examples.

  3. Tick whichever testing boxes are applicable. If you are adding Manual Tests, please document the manual testing (extensively) in the Exceptions.

@router-perf
Copy link

router-perf bot commented Jan 18, 2024

CI performance tests

  • events_big_cap_high_rate - Stress test for events with a lot of users, deduplication enabled and high rate event with a big queue capacity
  • events_without_dedup - Stress test for events with a lot of users and deduplication DISABLED
  • events - Stress test for events with a lot of users and deduplication ENABLED
  • large-request - Stress test with a 1 MB request payload
  • step - Basic stress test that steps up the number of users over time
  • xlarge-request - Stress test with 10 MB request payload
  • reload - Reload test over a long period of time at a constant rate of users
  • no-graphos - Basic stress test, no GraphOS.
  • events_callback - Stress test for events with a lot of users and deduplication ENABLED in callback mode
  • xxlarge-request - Stress test with 100 MB request payload
  • step-jemalloc-tuning - Clone of the basic stress test for jemalloc tuning
  • events_without_dedup_callback - Stress test for events with a lot of users and deduplication DISABLED using callback mode
  • const - Basic stress test that runs with a constant number of users
  • events_big_cap_high_rate_callback - Stress test for events with a lot of users, deduplication enabled and high rate event with a big queue capacity using callback mode

@trevor-scheer trevor-scheer force-pushed the trevor/progressive-override-plugin branch from ed08b96 to 9be2786 Compare January 20, 2024 20:58

This comment has been minimized.

@trevor-scheer trevor-scheer force-pushed the trevor/progressive-override-plugin branch from 07ae1fe to 0fe90a7 Compare January 24, 2024 22:26
@trevor-scheer trevor-scheer changed the title WIP: Progressive override support Progressive override support Jan 25, 2024
@trevor-scheer trevor-scheer marked this pull request as ready for review January 25, 2024 02:55
})
}

pub(crate) fn directive_name(
schema: &apollo_compiler::schema::Schema,
url: &str,
base_url: &str,
min_version: &str,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why only min_version? If we change a directive to add a field and expect a meaningful behaviour change from it, a router that does not know how to interpret that directive should reject it

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me know what you think here
9803b4b

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this implementation meant to align with the Versioning section of the Core specification? (I started an internal conversation about link vs core, but for the purposes of this PR, I am mostly just questioning whether — for review purposes — the intent is for this to match that implementation as defined in the core spec, irregardless of the overall spec-status itself.)

apollo-router/src/uplink/license_enforcement.rs Outdated Show resolved Hide resolved
apollo-router/src/query_planner/caching_query_planner.rs Outdated Show resolved Hide resolved
apollo-router/src/query_planner/caching_query_planner.rs Outdated Show resolved Hide resolved
apollo-router/src/query_planner/bridge_query_planner.rs Outdated Show resolved Hide resolved
apollo-router/src/plugins/progressive_override/mod.rs Outdated Show resolved Hide resolved
apollo-router/src/plugins/progressive_override/mod.rs Outdated Show resolved Hide resolved
@trevor-scheer trevor-scheer force-pushed the trevor/progressive-override-plugin branch from 0d66273 to af91538 Compare January 29, 2024 23:37
apollo-router/src/plugin/mod.rs Outdated Show resolved Hide resolved
apollo-router/src/plugin/mod.rs Outdated Show resolved Hide resolved
apollo-router/src/plugins/progressive_override/mod.rs Outdated Show resolved Hide resolved
examples/coprocessor-override-launchdarkly/src/index.ts Outdated Show resolved Hide resolved
apollo-router/src/uplink/license_enforcement.rs Outdated Show resolved Hide resolved
.build(),
SchemaRestriction::Directive {
name: "@authenticated".to_string(),
url: "https://specs.apollo.dev/authenticated/v0.1".to_string(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a topic for a follow up PR: should the schema restrictions also work on a version range? Or a URL prefix like https://specs.apollo.dev/authenticated/?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing this out. I resolved a couple related bugs and implemented the version range in a separate PR.
#4570

* Require schema spec enforcements to specify a version req
* Require directive argument enforcements to specify a version req
* Handle `as` renames gracefully for the directive argument enforcement
* Search the schema for actual usages instead of the definition for
  directive argument enforcements
The initial implementation of directive argument schema enforcement is
lacking:
* It doesn't handle renames of the `join` spec via `as`, meaning the
restriction could be circumvented via a `join` rename.
* It only looked for the directive argument on the definition rather
than actual usages. If the arg is defined but unused, we shouldn't
enforce anything. It might exist in the definition simply as a result of
using the latest versions of subgraph libs/composition.

Additionally, this PR introduces the requirement of specifying a
`semver::VersionReq` for enforcement.
@Geal Geal merged commit 083d234 into dev Feb 1, 2024
14 checks passed
@Geal Geal deleted the trevor/progressive-override-plugin branch February 1, 2024 08:04
This was referenced Feb 1, 2024
Geal added a commit that referenced this pull request Feb 5, 2024
## 🚀 Features

### Specify Trace ID Formatting ([PR
#4530](#4530))

This adds the ability to specify the format of the trace ID in the
response headers of the supergraph service.

An example configuration making use of this feature is shown below:
```yaml
telemetry:
  apollo:
    client_name_header: name_header
    client_version_header: version_header
  exporters:
    tracing:
      experimental_response_trace_id:
        enabled: true
        header_name: trace_id
        format: decimal # Optional, defaults to hexadecimal
```

If the format is not specified, then the trace ID will continue to be in
hexadecimal format.

By [@nicholascioli](https://github.com/nicholascioli) in
#4530

### Introduce support for progressive `@override` ([PR
#4521](#4521))

The change brings support for progressive `@override`, which allows
dynamically overriding root fields and entity fields in the schema. This
feature is enterprise only and requires a license key to be used.

A new `label` argument is added to the `@override` directive in order to
indicate the field is dynamically overridden. Labels can come in two
forms:
1) String matching the form `percent(x)`: The router resolves these
labels based on the `x` value. For example, `percent(50)` will route 50%
of requests to the overridden field and 50% of requests to the original
field.
2) Arbitrary string matching the regex `^[a-zA-Z][a-zA-Z0-9_-:./]*$`:
These labels are expected to be resolved externally via coprocessor. A
coprocessor a supergraph request hook can inspect and modify the context
of a request in order to inform the router which labels to use during
query planning.

Please consult the docs for more information on how to use this feature
and how to implement a coprocessor for label resolution.

By [@trevorscheer](https://github.com/TrevorScheer) in
#4521

### Add a new selector to get all baggage key values in span attributes
([Issue #4425](#4425))

If you have several baggage items and would like to add all of them
directly as span attributes, for example `baggage: my_item=test,
my_second_item=bar` will add 2 span attributes `my_item=test` and
`my_second_item=bar`.

Example of configuration:

```yaml
telemetry:
  instrumentation:
    spans:
      router:
        attributes:
          baggage: true
```

By [@bnjjj](https://github.com/bnjjj) in
#4537

### Create a trace during router creation and plugin initialization
([Issue #4472](#4472))

When the router starts or reload, it will now generate a trace with
spans for query planner creation, schema parsing, plugin initialization
and request pipeline creation. This will help in debugging any issue
during startup, especially in plugins creation

By [@Geal](https://github.com/Geal) in
#4480

### Add static attribute on specific span in telemetry settings ([Issue
#4561](#4561))

Example of configuration:

```yaml
telemetry:
  instrumentation:
    spans:
      router:
        attributes:
          "my_attribute": "constant_value"
      supergraph:
        attributes:
          "my_attribute": "constant_value"
      subgraph:
        attributes:
          "my_attribute": "constant_value"
```

By [@bnjjj](https://github.com/bnjjj) in
#4566

## 🐛 Fixes

### Order HPA target so that kubernetes does not rewrite ([Issue
#4435](#4435))

This update addresses an OutOfSync issue in ArgoCD applications when
Horizontal Pod Autoscaler (HPA) is configured with both memory and CPU
limits.
Previously, the live and desired manifests within Kubernetes were not
consistently sorted, leading to persistent OutOfSync states in ArgoCD.
This change implements a sorting mechanism for HPA targets within the
Helm chart, ensuring alignment with Kubernetes' expected order.
This fix proactively resolves the sync discrepancies while using HPA,
circumventing the need to wait for Kubernetes' issue resolution
(kubernetes/kubernetes#74099).

By [@cyberhck](https://github.com/cyberhck) in
#4436

### Reactivate log events in traces ([PR
#4486](#4486))

This fixes a regression introduced in #2999, where events were not sent
with traces anymore due to too aggressive sampling

By [@Geal](https://github.com/Geal) in
#4486

### Fix inconsistency in environment variable parsing for telemetry
([Issue
#3203](https://github.com/apollographql/router/issues/ISSUE_NUMBER))

Previously, the router would complain when using the rover
recommendation of `APOLLO_TELEMETRY_DISABLED=1` environment
variable. Now any non-falsey value can be used, such as 1, yes, on,
etc..

By [@nicholascioli](https://github.com/nicholascioli) in
#4549

### Store static pages in `Bytes` structure to avoid expensive
allocation per request ([PR
#4528](#4528))

The `CheckpointService` created by the `StaticPageLayer` caused a
non-insignificant amount of memory to be allocated on every request. The
service stack gets cloned on every request, and so does the rendered
template.

The template is now stored in a `Bytes` struct instead which is cheap to
clone.

By [@xuorig](https://github.com/xuorig) in
#4528

### Fix header propagation issues ([Issue
#4312](#4312)), ([Issue
#4398](#4398))

This fixes two header propagation issues:
* if a client request header has already been added to a subgraph
request due to another header propagation rule, then it is only added
once
* `Accept`, `Accept-Encoding` and `Content-Encoding` were not in the
list of reserved headers that cannot be propagated. They are now in that
list because those headers are set explicitely by the Router in its
subgraph requests

There is a potential regression: if a router deployment was accidentally
relying on header propagation to compress subgraph requests, then it
will not work anymore because `Content-Encoding` is not propagated
anymore. Instead it should be set up from the `traffic_shaping` section
of the Router configuration:

```yaml
traffic_shaping:
  all:
    compression: gzip
  subgraphs: # Rules applied to requests from the router to individual subgraphs
    products:
      compression: identity
```

By [@Geal](https://github.com/Geal) in
#4535

## 🧪 Experimental

### Move cacheability metrics to the entity cache plugin ([Issue
#4253](#4253))

The metric was generated in the telemetry plugin before, but it was not
very convenient to keep it there. This adds more configuration:
- enable or disable the metrics
- set the metrics storage TTL (default is 60s)
- the metric's typename attribute is disabled by default. Activating it
can greatly increase the cardinality

This also includes some cleanup and performance improvements

By [@Geal](https://github.com/Geal) in
#4469

---------

Co-authored-by: Edward Huang <edward.huang@apollographql.com>
Co-authored-by: Jeremy Lempereur <jeremy.lempereur@iomentum.com>
Co-authored-by: Jesse Rosenberger <git@jro.cc>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants