Dynamic Custom Resource Deployments #376

timothysmith0609 · 2018-11-12T20:02:49Z

Motivation and Goals

required for #229
see #128

This PR is the kubernetes-deploy side of our custom resource status implementation. It aims to provide a backwards-compatible way to meaningfully monitor the rollouts of custom resources, with the following goals:

Hardcoded custom resources (e.g. redis.rb, cloudsql.rb) are deployed using the custom logic that exists for them.
Custom resources with neither hardcoded logic nor an annotation telling kubernetes-deploy to treat them as generic custom resources under our Pass/Fail convention are deployed as before: we warn the user that we don't know how to monitor the deployment of such a resource and assume it has passed.
New case: A custom resource with no hardcoded logic but with an annotation declaring it implements our Pass/Fail status convention will be deployed using the generic CR watcher. This watcher observes the following states:
- deploy_succeeded? == true if the Ready condition on the CR status is true
- deploy_failed? == true if the Failed condition on the CR status is true
- The deploy is progressing if both Ready and Failed are false
- Custom timeouts for CRs can be placed on the owning CRD or, for more granularity, on specific instances of a CR spec
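To make the three states concrete, here is a minimal sketch of the checks described above. It is not the class added by this PR: `GenericCustomResourceStatus` is a hypothetical name, and it assumes the CR is available as a parsed Hash whose `status.conditions` entries carry `type` and `status` fields.

```ruby
# Hypothetical sketch of the Pass/Fail convention; not the class shipped in this PR.
class GenericCustomResourceStatus
  # `resource` is assumed to be the CR parsed into a Hash (e.g. from `kubectl get -o json`).
  def initialize(resource)
    @conditions = resource.dig("status", "conditions") || []
  end

  # Succeeded once the Ready condition is true.
  def deploy_succeeded?
    condition_true?("Ready")
  end

  # Failed once the Failed condition is true.
  def deploy_failed?
    condition_true?("Failed")
  end

  # Neither Ready nor Failed is true yet, so the rollout is still progressing.
  def deploy_in_progress?
    !deploy_succeeded? && !deploy_failed?
  end

  private

  def condition_true?(type)
    @conditions.any? { |c| c["type"] == type && c["status"].to_s.casecmp?("true") }
  end
end

ready = GenericCustomResourceStatus.new(
  { "status" => { "conditions" => [{ "type" => "Ready", "status" => "True" }] } }
)
pending = GenericCustomResourceStatus.new({ "status" => {} })
```

A CR with no status at all is treated as still progressing, so it would eventually hit the timeout rather than pass silently.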

Implementation details

(OUTDATED) Changes to `PREDEPLOY_SEQUENCE`

Since we cannot know a priori which types of custom resources exist in a cluster, we must find them dynamically during the discovery phase. In addition, I argue that, in the common case, custom resources must be deployed before other Kubernetes objects. For example, we need a working cloudsql before we can think of running a db-migrate pod. As it stands, we hardcode this priority inside the PREDEPLOY_SEQUENCE constant in deploy_task.rb. To maintain the rough ordering of PREDEPLOY_SEQUENCE while also handling dynamically discovered custom resources, I have split the creation of PREDEPLOY_SEQUENCE into two phases.

In the first phase, we hardcode the core Kubernetes resources that we know before the deploy and place them in BASE_PREDEPLOY_SEQUENCE.
During discovery, we find all the CRDs and union them with BASE_PREDEPLOY_SEQUENCE. Using the result of this union, we then set the PREDEPLOY_SEQUENCE constant. See here
@stefanmb has proposed using a dependency graph to model deployment priority, but unless an explicit case can be made against the proposed implementation I think we can defer that issue for now.

(OUTDATED) Caching discovered CRDs

As an implementation detail, I have opted to cache the value of ResourceDiscovery.crds in an instance variable. Linking together CRs with their parent CRDs requires passing around the list of CRDs in a number of places, and it doesn't seem risky to avoid the extra work of rediscovering them for every call.
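A minimal sketch of this memoization pattern, assuming a hypothetical `fetcher` callable in place of the real `sync_mediator.get_all` call (the class name and the `lookups` counter are illustrative only):

```ruby
# Hypothetical stand-in for ResourceDiscovery, showing only the caching behaviour.
class ResourceDiscoverySketch
  attr_reader :lookups

  def initialize(fetcher)
    @fetcher = fetcher # assumed callable standing in for sync_mediator.get_all
    @lookups = 0
  end

  def crds
    # ||= evaluates the right-hand side only while @crds is nil (or false).
    # An empty Array is truthy in Ruby, so a "no CRDs found" result is cached too.
    @crds ||= begin
      @lookups += 1
      @fetcher.call
    end
  end
end

discovery = ResourceDiscoverySketch.new(-> { ["cloudsqls.stable.shopify.io"] })
first = discovery.crds
second = discovery.crds
```

One consequence of this pattern is that the cached list is fixed at the first call, so a CRD created later in the same run would not appear in it.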

TODO

Tests
Annotation scheme for CRs -> e.g. annotation for declaring monitorability? Do we want additional annotations to declare which conditions map to ready/failed or should we enforce Ready and Failed?

cc @Shopify/cloudx

lib/kubernetes-deploy/deploy_task.rb

karanthukral

The method seems valid to me. I like having the generic CR class instead of attempting to dynamically define classes

timothysmith0609 · 2018-11-13T01:55:23Z

Example rollout.

Bucket exposes the monitor-rollout annotation and is processed as a generic CR (i.e. we wait for the Ready status)
Redis is hardcoded, we wait for the deployment
Memcached neither exposes the monitor-rollout annotation nor has a hardcoded implementation (it is removed for this example). It defaults to the "assuming successful deploy" behaviour
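The three cases in this example can be sketched as a simple dispatch. This is an illustration only: `ROLLOUT_ANNOTATION` uses a placeholder key, `HARDCODED_KINDS` and `resource_strategy` are hypothetical names, and the real code instantiates classes rather than returning symbols.

```ruby
# Placeholder annotation key; the real key is defined by kubernetes-deploy.
ROLLOUT_ANNOTATION = "example.com/monitor-rollout"

# Kinds assumed to have hardcoded classes (Memcached removed, as in the example above).
HARDCODED_KINDS = %w(Redis Cloudsql).freeze

def resource_strategy(kind:, annotations:)
  if HARDCODED_KINDS.include?(kind)
    :hardcoded                # use the existing custom logic
  elsif annotations.key?(ROLLOUT_ANNOTATION)
    :generic_custom_resource  # watch Ready/Failed via the generic CR watcher
  else
    :assume_success           # warn and assume the deploy passed, as before
  end
end

bucket = resource_strategy(kind: "Bucket", annotations: { ROLLOUT_ANNOTATION => "true" })
redis = resource_strategy(kind: "Redis", annotations: {})
memcached = resource_strategy(kind: "Memcached", annotations: {})
```

In this sketch, hardcoded logic takes precedence over the annotation, which is what keeps existing behaviour backwards compatible.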

dturn · 2018-11-13T18:09:17Z

lib/kubernetes-deploy/resource_discovery.rb

@@ -10,7 +10,7 @@ def initialize(namespace:, context:, logger:, namespace_tags:)
    end

    def crds(sync_mediator)
-      sync_mediator.get_all(CustomResourceDefinition.kind).map do |r_def|
+      @crds ||= sync_mediator.get_all(CustomResourceDefinition.kind).map do |r_def|


What happens if you deploy a CRD and a CR at same time?

That's one shortcoming of the current approach, unfortunately. There are a few solutions available to us here:

Add discovered CRD specs to the top of the priority list. This seems risk-free since they should have no external dependencies.

In the future, think about using a dependency graph to produce the priority list (e.g. a CloudSQL has something that says iDependOn: cloudsqls.stable.shopify.io). I'd say this is out of scope for this PR.

It's unclear to me whether this issue is handled right now, anyway. Not saying we shouldn't fix it here, just a note
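For reference, the dependency-graph idea could be sketched with Ruby's standard `tsort` library. This is not part of the PR: the `DEPENDENCIES` map and `deploy_order` helper are hypothetical, and how dependencies would actually be declared (for example via an annotation) is left open.

```ruby
require "tsort"

# Hypothetical dependency map: each kind lists the kinds it depends on.
DEPENDENCIES = {
  "CustomResourceDefinition" => [],
  "Cloudsql" => ["CustomResourceDefinition"],
  "Pod" => ["Cloudsql"],
}.freeze

def deploy_order(dependencies)
  each_node = ->(&block) { dependencies.each_key(&block) }
  each_child = ->(kind, &block) { dependencies.fetch(kind, []).each(&block) }
  # TSort.tsort emits a node's children (its dependencies) before the node itself.
  TSort.tsort(each_node, each_child)
end

order = deploy_order(DEPENDENCIES)
```

`TSort.tsort` raises `TSort::Cyclic` on circular dependencies, which would give a natural place to reject an invalid configuration.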

That's one shortcoming of the current approach, unfortunately.

What is the exact behaviour though? It still falls back on a KubernetesResource with the "dunno what to do here" message, right?

I'm on the fence about introducing an annotation-driven dependency graph, but FWIW we did have someone request customized sequencing for another reason earlier this week. Regardless, I agree it doesn't need to be handled in this PR.

lib/kubernetes-deploy/kubernetes_resource/custom_resource.rb

dturn · 2018-11-13T18:16:47Z

lib/kubernetes-deploy/kubernetes_resource/custom_resource_definition.rb

@@ -2,6 +2,8 @@
 module KubernetesDeploy
  class CustomResourceDefinition < KubernetesResource
    TIMEOUT = 2.minutes
+    CHILD_CR_TIMEOUT_ANNOTATION = "kubernetes-deploy.shopify.io/cr-timeout-override"


You'll need to update the README.md

Why CHILD_ ?

Open to suggestions. I'm mainly trying to avoid the perennial issue of confusing CRDs with CRs by being more explicit

Why isn't this part of the other child rollout annotation?

Removing this annotation for now as I'm more concerned with the status aspect of this PR. If necessary, the base timeout-override annotation can be used in the meantime.

timothysmith0609 · 2018-11-20T23:16:19Z

Configurable success/failure conditions

I've decided to add configurable success/failure statuses as part of this PR. Users can supply a JSON string to the kubernetes.shopify.io/cr-rollout-params annotation that takes:

A JSON array of success conditions (JsonPath/expected value pairs)
A JSON array of failure conditions
... we can add whatever other configurable fields we desire

For convenience, default values (which conform to our buddies Status implementation) are used if such fields are missing.
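A rough sketch of how missing fields could fall back to defaults. The key names `success_queries` and `failure_queries` follow the example posted later in this thread; the default paths and the `"True"` values are assumptions based on the Ready/Failed convention from the PR description, and `rollout_params` here is a hypothetical helper, not the method in the diff.

```ruby
require "json"

# Assumed defaults mirroring the Ready/Failed convention; not copied from the PR.
DEFAULT_ROLLOUT_PARAMS = {
  "success_queries" => [
    { "path" => '$.status.conditions[?(@.type == "Ready")].status', "value" => "True" },
  ],
  "failure_queries" => [
    { "path" => '$.status.conditions[?(@.type == "Failed")].status', "value" => "True" },
  ],
}.freeze

# `annotation_value` is the raw annotation string, or nil when the annotation has no value.
def rollout_params(annotation_value)
  supplied = annotation_value.to_s.strip.empty? ? {} : JSON.parse(annotation_value)
  # Hash#merge keeps a default for every top-level key the user did not supply.
  DEFAULT_ROLLOUT_PARAMS.merge(supplied)
end

custom = rollout_params('{"success_queries":[{"path":"$.status.phase","value":"Done"}]}')
```

Here a user who overrides only the success conditions still gets the default failure conditions.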

Limitations

Currently, resources such as CloudSQL reference other Kubernetes objects to determine their readiness (in CloudSQL's case, its deployment and service). In the new implementation, a resource being deployed can only observe itself.

KnVerey · 2018-12-11T23:25:09Z

lib/kubernetes-deploy/kubernetes_resource/custom_resource.rb

+      @definition["kind"]
+    end
+
+    def rollout_params


This concept of rollout_params and its structure isn't super clear to me in my first read of this class, and it seems like we always use it one piece at a time. Is there a better abstraction we can come up with here? Maybe CRD has a private ChildConfiguration object that it exposes, and that has methods like error_message_path and failure_status_path and such? (I dunno--don't take that specific suggestion too seriously)

lib/kubernetes-deploy/kubernetes_resource/custom_resource_definition.rb

KnVerey · 2018-12-11T23:43:13Z

lib/kubernetes-deploy/kubernetes_resource/custom_resource_definition.rb

@@ -2,6 +2,8 @@
 module KubernetesDeploy
  class CustomResourceDefinition < KubernetesResource
    TIMEOUT = 2.minutes
+    CHILD_CR_TIMEOUT_ANNOTATION = "kubernetes-deploy.shopify.io/cr-timeout-override"


Why isn't this part of the other child rollout annotation?

KnVerey · 2018-12-12T00:00:29Z

lib/kubernetes-deploy/resource_discovery.rb

@@ -10,7 +10,7 @@ def initialize(namespace:, context:, logger:, namespace_tags:)
    end

    def crds(sync_mediator)
-      sync_mediator.get_all(CustomResourceDefinition.kind).map do |r_def|
+      @crds ||= sync_mediator.get_all(CustomResourceDefinition.kind).map do |r_def|


That's one shortcoming of the current approach, unfortunately.

What is the exact behaviour though? It still falls back on a KubernetesResource with the "dunno what to do here" message, right?

I'm on the fence about introducing an annotation-driven dependency graph, but FWIW we did have someone request customized sequencing for another reason earlier this week. Regardless, I agree it doesn't need to be handled in this PR.

KnVerey

I think there's room for improvement in the way we model the rollout configuration data, but the overall class CustomResource approach seems good to me. When you start the tests, please make sure to include one that proves the correct classes get instantiated (KubernetesResource vs CustomResource vs hardcoded CR class).

lib/kubernetes-deploy/kubernetes_resource/custom_resource_definition.rb

KnVerey · 2018-12-12T00:03:19Z

lib/kubernetes-deploy/kubernetes_resource/custom_resource_definition.rb

+      params if validate_params(params)
+
+    rescue JSON::ParserError
+      raise FatalDeploymentError, "custom rollout params are not valid JSON: '#{rollout_params_string}'"


Should we actually fail the whole deploy on this, or is there a more graceful fallback behaviour we could adopt?

This is a value-judgement we'll need to make. My opinion is that, if users are opting-in to use this feature, we should consider it critical and fail fast if something goes wrong.

On the other hand, aborting a deploy because of some bad JSON might cause too much friction.
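The two options can be put side by side in a small sketch. The `strict:` flag and the `warnings` collector are hypothetical, and `FatalDeploymentError` is redefined here only as a stand-in so the snippet runs on its own.

```ruby
require "json"

# Stand-in for the gem's error class so this sketch is self-contained.
class FatalDeploymentError < StandardError; end

# `raw` is assumed to be a String. strict: true fails fast; strict: false degrades gracefully.
def parse_rollout_params(raw, strict: true, warnings: [])
  JSON.parse(raw)
rescue JSON::ParserError
  raise FatalDeploymentError, "custom rollout params are not valid JSON: '#{raw}'" if strict
  # Graceful option: record a warning and let the caller fall back to the
  # "assume success" behaviour used for unmonitored resources.
  warnings << "ignoring invalid rollout params: '#{raw}'"
  nil
end

warnings = []
fallback = parse_rollout_params("{oops", strict: false, warnings: warnings)
```

The trade-off is the one described above: strict mode surfaces the mistake immediately, while the lenient mode keeps the deploy going at the cost of silently losing monitoring.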

timothysmith0609 · 2019-01-04T15:47:00Z

Example of custom query parameters:

'{
  "success_queries": [
    {
      "path": "$.status.conditions[?(@.type == \"Ready\")].status",
      "value": "success_value"
    },
    {
      "path": "$.spec.test_field",
      "value": "success_value"
    }
  ],
  "failure_queries": [
    {
      "path": "$.status.condition",
      "value": "failure_value",
      "custom_error_msg": "test custom error message"
    },
    {
      "path": "$.spec.test_field",
      "value": "failure_value",
      "error_msg_path": "$.spec.error_msg"
    }
  ]
}'
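To show how such parameters might be evaluated, here is a simplified sketch. It makes two assumptions that the example itself does not settle: any matching failure query fails the rollout, and all success queries must match for it to succeed. The `lookup` helper is a hypothetical stand-in that resolves only plain dotted paths; filter expressions such as `[?(@.type == "Ready")]` would need a real JsonPath library.

```ruby
# Resolves only simple dotted paths such as "$.spec.test_field" (no filters or indexes).
def lookup(resource, path)
  keys = path.sub(/\A\$\./, "").split(".")
  resource.dig(*keys)
end

# Returns [state, error_message]. Assumes at least one success query is configured.
def rollout_state(resource, params)
  failure = params.fetch("failure_queries", []).find do |query|
    lookup(resource, query["path"]) == query["value"]
  end
  if failure
    message = failure["custom_error_msg"]
    message ||= lookup(resource, failure["error_msg_path"]) if failure["error_msg_path"]
    return [:failed, message]
  end

  succeeded = params.fetch("success_queries", []).all? do |query|
    lookup(resource, query["path"]) == query["value"]
  end
  [succeeded ? :succeeded : :in_progress, nil]
end

params = {
  "success_queries" => [{ "path" => "$.spec.test_field", "value" => "success_value" }],
  "failure_queries" => [
    { "path" => "$.spec.test_field", "value" => "failure_value", "error_msg_path" => "$.spec.error_msg" },
  ],
}
ok = rollout_state({ "spec" => { "test_field" => "success_value" } }, params)
bad = rollout_state({ "spec" => { "test_field" => "failure_value", "error_msg" => "boom" } }, params)
waiting = rollout_state({ "spec" => {} }, params)
```

Checking failure queries first means a resource that matches both a success and a failure condition is reported as failed, which seems the safer default under these assumptions.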

KnVerey · 2019-01-04T17:42:37Z

Two questions about the example:

How do multiple queries combine? Is it the same for success queries and failure queries?
Is the inclusion of custom_error_msg based on a real use case? This is not always true of course, but in cases where the CRD is the company's own, the error messages are already their own too.

KnVerey

Excellent work. Thanks for sticking with this long-running but very impactful feature!

test/integration-serial/serial_deploy_test.rb

lib/kubernetes-deploy/rollout_conditions.rb

timothysmith0609 self-assigned this Nov 12, 2018

timothysmith0609 requested review from dturn and KnVerey November 12, 2018 20:05

mkobetic reviewed Nov 12, 2018: lib/kubernetes-deploy/deploy_task.rb (outdated)

karanthukral reviewed Nov 12, 2018

timothysmith0609 force-pushed the dynamic_cr_capturing branch from bc6bf72 to b8d249a November 13, 2018 02:11

timothysmith0609 commented Nov 13, 2018: lib/kubernetes-deploy/deploy_task.rb (outdated)

mkobetic reviewed Nov 13, 2018: lib/kubernetes-deploy/kubernetes_resource.rb (outdated)

mkobetic reviewed Nov 13, 2018: lib/kubernetes-deploy/deploy_task.rb (outdated)

dturn suggested changes Nov 13, 2018

timothysmith0609 changed the title ~~-WIP- Dynamic Custom Resource Deployments~~ Dynamic Custom Resource Deployments Nov 16, 2018

timothysmith0609 requested a review from stefanmb November 16, 2018 20:59

dturn suggested changes Nov 16, 2018: lib/kubernetes-deploy/deploy_task.rb (outdated)

dturn reviewed Nov 16, 2018: lib/kubernetes-deploy/kubernetes_resource.rb (outdated)

timothysmith0609 commented Nov 19, 2018: lib/kubernetes-deploy/deploy_task.rb (outdated)

timothysmith0609 force-pushed the dynamic_cr_capturing branch 2 times, most recently from f47b1aa to 3454521 November 20, 2018 23:09

timothysmith0609 force-pushed the dynamic_cr_capturing branch from ffd511e to ad1d6ec November 21, 2018 15:15

mkobetic reviewed Nov 21, 2018: lib/kubernetes-deploy/kubernetes_resource/custom_resource_definition.rb (outdated)

mkobetic reviewed Nov 21, 2018: lib/kubernetes-deploy/deploy_task.rb (outdated)

dturn reviewed Nov 26, 2018: lib/kubernetes-deploy/deploy_task.rb (outdated, two threads)

KnVerey reviewed Dec 12, 2018

KnVerey suggested changes Dec 12, 2018

timothysmith0609 force-pushed the dynamic_cr_capturing branch from 3c9cae8 to d044ef7 January 4, 2019 15:45

dturn reviewed Jan 4, 2019: lib/kubernetes-deploy/deploy_task.rb (outdated), lib/kubernetes-deploy/kubernetes_resource/custom_resource.rb (outdated)

timothysmith0609 force-pushed the dynamic_cr_capturing branch from 84e997c to 2355fa3 January 4, 2019 18:17

timothysmith0609 and others added 18 commits January 18, 2019 13:15

README, tests, renaming + small edits

87de50b

add cr- prefix to rollout-conditions annotation README edit rename config ->conditions logs for serial deploy test police

typo

26c873c

README explain timeout syntax

1f1b29a

[ci skip] wip

4982c76

tweaks for test, rollout_conditions, cr, crd

85d70cc

serial integration test tweak" test failure_conditions optional policial

integration test when no failure_conditions present

85e8ab6

out-of-band invalid CRD fails CR deploy. Annotation change (remove cr…

ce6b919

… prefix)

rubocop

4628ffa

Apply suggestions from code review

4033ae1

Co-Authored-By: timothysmith0609 <31742287+timothysmith0609@users.noreply.github.com>

merge PR suggestions

6eea88e

README updates

e57de3b

small fixes

de7248b

fix test from moving CRD to predeploy sequence

6c86654

No global resources (CRD) in predeploy sequence

5919593

pr review

9e2f1d2

partial pr review

b401f71

more pr review

2e7e5bf

pr review

44f890a

timothysmith0609 force-pushed the dynamic_cr_capturing branch from e4dbe90 to 44f890a Compare January 18, 2019 18:37

timothysmith0609 added 6 commits January 18, 2019 14:10

Better timeout message, replace @statsd_tags with [] in relevant tests

49d7d8e

Final status handling

a1e99b2

typo

00b45cf

remove old code

afde0fd

errors per CR test fix

68a8dea

RuntimeError -> StandardError

3a43c06

KnVerey approved these changes Jan 18, 2019: test/integration-serial/serial_deploy_test.rb (three resolved threads), lib/kubernetes-deploy/rollout_conditions.rb (one resolved thread)

timothysmith0609 added 3 commits January 18, 2019 15:49

pr review

f1b8fed

police

81fab4f

update changelog

d217b15

timothysmith0609 merged commit a591e52 into master Jan 21, 2019
