Skip to content

Commit

Permalink
Dynamic Custom Resource Deployments (#376)
Browse files Browse the repository at this point in the history
  • Loading branch information
timothysmith0609 committed Jan 21, 2019
1 parent a3d2f05 commit a591e52
Show file tree
Hide file tree
Showing 19 changed files with 890 additions and 48 deletions.
4 changes: 4 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,9 @@
## next

*Features*
- Add support for specifying pass/fail conditions of Custom Resources ([#376](https://github.com/Shopify/kubernetes-deploy/pull/376)).
- Add support for custom timeouts for Custom Resources([#376](https://github.com/Shopify/kubernetes-deploy/pull/376))

*Enhancements*
- Officially support Kubernetes 1.13 ([#409](https://github.com/Shopify/kubernetes-deploy/pull/409))

Expand Down
89 changes: 89 additions & 0 deletions README.md
Expand Up @@ -42,6 +42,7 @@ This repo also includes related tools for [running tasks](#kubernetes-run) and [
* [Customizing behaviour with annotations](#customizing-behaviour-with-annotations)
* [Running tasks at the beginning of a deploy](#running-tasks-at-the-beginning-of-a-deploy)
* [Deploying Kubernetes secrets (from EJSON)](#deploying-kubernetes-secrets-from-ejson)
* [Deploying custom resources](#deploying-custom-resources)

**KUBERNETES-RESTART**
* [Usage](#usage-1)
Expand Down Expand Up @@ -312,7 +313,95 @@ Since their data is only base64 encoded, Kubernetes secrets should not be commit
}
```

### Deploying custom resources

By default, kubernetes-deploy does not check the status of custom resources; it simply assumes that they deployed successfully. In order to meaningfully monitor the rollout of custom resources, kubernetes-deploy supports configuring pass/fail conditions using annotations on CustomResourceDefinitions (CRDs).

>Note:
This feature is only available on clusters running Kubernetes 1.11+ since it relies on the `metadata.generation` field being updated when custom resource specs are changed.

*Requirements:*

* The custom resource must expose a `status` subresource with an `observedGeneration` field.
* The `kubernetes-deploy.shopify.io/instance-rollout-conditions` annotation must be present on the CRD that defines the custom resource.
* (optional) The `kubernetes-deploy.shopify.io/instance-timeout` annotation can be added to the CRD that defines the custom resource to override the global default timeout for all instances of that resource. This annotation can use ISO8601 format or unprefixed ISO8601 time components (e.g. '1H', '60S').

#### Specifying pass/fail conditions

The presence of a valid `kubernetes-deploy.shopify.io/instance-rollout-conditions` annotation on a CRD will cause kubernetes-deploy to monitor the rollout of all instances of that custom resource. Its value can either be `"true"` (giving you the defaults described in the next section) or a valid JSON string with the following format:
```
'{
"success_conditions": [
{ "path": <JsonPath expression>, "value": <target value> }
... more success conditions
],
"failure_conditions": [
{ "path": <JsonPath expression>, "value": <target value> }
... more failure conditions
]
}'
```

For all conditions, `path` must be a valid JsonPath expression that points to a field in the custom resource's status. `value` is the value that must be present at `path` in order to fulfill a condition. For a deployment to be successful, _all_ `success_conditions` must be fulfilled. Conversely, the deploy will be marked as failed if _any one of_ `failure_conditions` is fulfilled. `success_conditions` are mandatory, but `failure_conditions` can be omitted (the resource will simply time out if it never reaches a successful state).

In addition to `path` and `value`, a failure condition can also contain `error_msg_path` or `custom_error_msg`. `error_msg_path` is a JsonPath expression that points to a field you want to surface when a failure condition is fulfilled. For example, a status condition may expose a `message` field that contains a description of the problem it encountered. `custom_error_msg` is a string that can be used if your custom resource doesn't contain sufficient information to warrant using `error_msg_path`. Note that `custom_error_msg` has higher precedence than `error_msg_path` so it will be used in favor of `error_msg_path` when both fields are present.

**Warning:**

You **must** ensure that your custom resource controller sets `.status.observedGeneration` to match the observed `.metadata.generation` of the monitored resource once its sync is complete. If this does not happen, kubernetes-deploy will not check success or failure conditions and the deploy will time out.

#### Example

As an example, the following is the default configuration that will be used if you set `kubernetes-deploy.shopify.io/instance-rollout-conditions: "true"` on the CRD that defines the custom resources you wish to monitor:

```
'{
"success_conditions": [
{
"path": "$.status.conditions[?(@.type == \"Ready\")].status",
"value": "True",
},
],
"failure_conditions": [
{
"path": '$.status.conditions[?(@.type == \"Failed\")].status',
"value": "True",
"error_msg_path": '$.status.conditions[?(@.type == \"Failed\")].message',
},
],
}'
```

The paths defined here are based on the [typical status properties](https://github.com/kubernetes/community/blob/master/contributors/devel/api-conventions.md#typical-status-properties) as defined by the Kubernetes community. It expects the `status` subresource to contain a `conditions` array whose entries minimally specify `type`, `status`, and `message` fields.

You can see how these conditions relate to the following resource:

```
apiVersion: stable.shopify.io/v1
kind: Example
metadata:
generation: 2
name: example
namespace: namespace
spec:
...
status:
observedGeneration: 2
conditions:
- type: "Ready"
status: "False"
reason: "exampleNotReady"
message: "resource is not ready"
- type: "Failed"
status: "True"
reason: "exampleFailed"
message: "resource is failed"
```

- `observedGeneration == metadata.generation`, so kubernetes-deploy will check this resource's success and failure conditions.
- Since `$.status.conditions[?(@.type == "Ready")].status == "False"`, the resource is not considered successful yet.
- `$.status.conditions[?(@.type == "Failed")].status == "True"` means that a failure condition has been fulfilled and the resource is considered failed.
- Since `error_msg_path` is specified, kubernetes-deploy will log the contents of `$.status.conditions[?(@.type == "Failed")].message`, which in this case is: `resource is failed`.

# kubernetes-restart

Expand Down
1 change: 1 addition & 0 deletions kubernetes-deploy.gemspec
Expand Up @@ -31,6 +31,7 @@ Gem::Specification.new do |spec|
spec.add_dependency("statsd-instrument", '~> 2.3', '>= 2.3.2')
spec.add_dependency("oj", "~> 3.7")
spec.add_dependency("concurrent-ruby", "~> 1.1")
spec.add_dependency("jsonpath", "~> 0.9.6")

spec.add_development_dependency("bundler")
spec.add_development_dependency("rake", "~> 10.0")
Expand Down
41 changes: 23 additions & 18 deletions lib/kubernetes-deploy/deploy_task.rb
Expand Up @@ -6,6 +6,7 @@
require 'fileutils'
require 'kubernetes-deploy/kubernetes_resource'
%w(
custom_resource
cloudsql
config_map
deployment
Expand All @@ -24,7 +25,6 @@
elasticsearch
statefulservice
topic
bucket
stateful_set
cron_job
job
Expand All @@ -46,19 +46,6 @@ class DeployTask
include KubeclientBuilder
extend KubernetesDeploy::StatsD::MeasureMethods

PREDEPLOY_SEQUENCE = %w(
ResourceQuota
Cloudsql
Redis
Memcached
ConfigMap
PersistentVolumeClaim
ServiceAccount
Role
RoleBinding
Pod
)

PROTECTED_NAMESPACES = %w(
default
kube-system
Expand All @@ -73,6 +60,22 @@ class DeployTask
# extensions/v1beta1/ReplicaSet -- managed by deployments
# core/v1/Secret -- should not committed / managed by shipit

def predeploy_sequence
before_crs = %w(
ResourceQuota
)
after_crs = %w(
ConfigMap
PersistentVolumeClaim
ServiceAccount
Role
RoleBinding
Pod
)

before_crs + cluster_resource_discoverer.crds.map(&:kind) + after_crs
end

def prune_whitelist
wl = %w(
core/v1/ConfigMap
Expand Down Expand Up @@ -194,11 +197,11 @@ def cluster_resource_discoverer
end

def deploy_has_priority_resources?(resources)
resources.any? { |r| PREDEPLOY_SEQUENCE.include?(r.type) }
resources.any? { |r| predeploy_sequence.include?(r.type) }
end

def predeploy_priority_resources(resource_list)
PREDEPLOY_SEQUENCE.each do |resource_type|
predeploy_sequence.each do |resource_type|
matching_resources = resource_list.select { |r| r.type == resource_type }
next if matching_resources.empty?
deploy_resources(matching_resources, verify: true, record_summary: false)
Expand Down Expand Up @@ -254,12 +257,14 @@ def create_ejson_secrets(prune)

def discover_resources
resources = []
crds = cluster_resource_discoverer.crds.group_by(&:kind)
@logger.info("Discovering templates:")

TemplateDiscovery.new(@template_dir).templates.each do |filename|
split_templates(filename) do |r_def|
r = KubernetesResource.build(namespace: @namespace, context: @context, logger: @logger,
definition: r_def, statsd_tags: @namespace_tags)
crd = crds[r_def["kind"]]&.first
r = KubernetesResource.build(namespace: @namespace, context: @context, logger: @logger, definition: r_def,
statsd_tags: @namespace_tags, crd: crd)
resources << r
@logger.info(" - #{r.id}")
end
Expand Down
18 changes: 10 additions & 8 deletions lib/kubernetes-deploy/kubernetes_resource.rb
Expand Up @@ -31,24 +31,26 @@ class KubernetesResource
TIMEOUT_OVERRIDE_ANNOTATION = "kubernetes-deploy.shopify.io/timeout-override"

class << self
def build(namespace:, context:, definition:, logger:, statsd_tags:)
def build(namespace:, context:, definition:, logger:, statsd_tags:, crd: nil)
opts = { namespace: namespace, context: context, definition: definition, logger: logger,
statsd_tags: statsd_tags }
if definition["kind"].blank?
raise InvalidTemplateError.new("Template missing 'Kind'", content: definition.to_yaml)
end

begin
if KubernetesDeploy.const_defined?(definition["kind"])
klass = KubernetesDeploy.const_get(definition["kind"])
return klass.new(**opts)
end
rescue NameError
end

inst = new(**opts)
inst.type = definition["kind"]
inst
if crd
CustomResource.new(crd: crd, **opts)
else
inst = new(**opts)
inst.type = definition["kind"]
inst
end
end

def timeout
Expand Down Expand Up @@ -167,13 +169,13 @@ def exists?

def current_generation
return -1 unless exists? # must be different default than observed_generation
@instance_data["metadata"]["generation"]
@instance_data.dig("metadata", "generation")
end

def observed_generation
return -2 unless exists?
# populating this is a best practice, but not all controllers actually do it
@instance_data["status"]["observedGeneration"]
@instance_data.dig('status', 'observedGeneration')
end

def status
Expand Down
22 changes: 0 additions & 22 deletions lib/kubernetes-deploy/kubernetes_resource/bucket.rb

This file was deleted.

87 changes: 87 additions & 0 deletions lib/kubernetes-deploy/kubernetes_resource/custom_resource.rb
@@ -0,0 +1,87 @@
# frozen_string_literal: true
require 'jsonpath'

module KubernetesDeploy
class CustomResource < KubernetesResource
TIMEOUT_MESSAGE_DIFFERENT_GENERATIONS = <<~MSG
This resource's status could not be used to determine rollout success because it is not up-to-date
(.metadata.generation != .status.observedGeneration).
MSG

def initialize(namespace:, context:, definition:, logger:, statsd_tags: [], crd:)
super(namespace: namespace, context: context, definition: definition,
logger: logger, statsd_tags: statsd_tags)
@crd = crd
end

def timeout
timeout_override || @crd.timeout_for_instance || TIMEOUT
end

def deploy_succeeded?
return super unless rollout_conditions
return false unless observed_generation == current_generation

rollout_conditions.rollout_successful?(@instance_data)
end

def deploy_failed?
return super unless rollout_conditions
return false unless observed_generation == current_generation

rollout_conditions.rollout_failed?(@instance_data)
end

def failure_message
return super unless rollout_conditions
messages = rollout_conditions.failure_messages(@instance_data)
messages.join("\n") if messages.present?
end

def timeout_message
if rollout_conditions && current_generation != observed_generation
TIMEOUT_MESSAGE_DIFFERENT_GENERATIONS
else
super
end
end

def status
if !exists? || rollout_conditions.nil?
super
elsif deploy_succeeded?
"Healthy"
elsif deploy_failed?
"Unhealthy"
else
"Unknown"
end
end

def type
kind
end

def validate_definition(kubectl)
super

@crd.validate_rollout_conditions
rescue RolloutConditionsError => e
@validation_errors << "The CRD that specifies this resource is using invalid rollout conditions. " \
"Kubernetes-deploy will not be able to continue until those rollout conditions are fixed.\n" \
"Rollout conditions can be found on the CRD that defines this resource (#{@crd.name}), " \
"under the annotation #{CustomResourceDefinition::ROLLOUT_CONDITIONS_ANNOTATION}.\n" \
"Validation failed with: #{e}"
end

private

def kind
@definition["kind"]
end

def rollout_conditions
@crd.rollout_conditions
end
end
end

0 comments on commit a591e52

Please sign in to comment.