Return success before fully deployed #208

Merged
merged 7 commits into from
Jan 11, 2018
11 changes: 10 additions & 1 deletion README.md
@@ -122,10 +122,19 @@ You can add additional variables using the `--bindings=BINDINGS` option. For exa


### Customizing behaviour with annotations

- `kubernetes-deploy.shopify.io/timeout-override`: Override the tool's hard timeout for one specific resource. Both full ISO8601 durations and the time portion of ISO8601 durations are valid. Value must be between 1 second and 24 hours.
  - _Example values_: 45s / 3m / 1h / PT0.25H
  - _Compatibility_: all resource types (Note: `Deployment` timeouts are based on `spec.progressDeadlineSeconds` if present, and that field has a default value as of the `apps/v1beta1` group version. Using this annotation will have no effect on `Deployment`s that time out with "Timeout reason: ProgressDeadlineExceeded".)
- `kubernetes-deploy.shopify.io/required-rollout`: Sets how much of the rollout must finish before the deployment is considered successful.
  - _Compatibility_: Deployment
Contributor

We should be consistent about how we reference objects and whether they're in backticks. The new text uses both "ReplicaSet" and "replicaSet", and has "deployment" and "Deployment" and "deploy". The existing text above uses "Deployment".

Contributor Author

Good eye. I'm going to standardize on deployment and replicaSet.

  - `full` (default): The deployment is successful when all pods in the new `replicaSet` are ready.
  - `none`: The deployment is successful as soon as the new `replicaSet` is created for the deployment.
  - `maxUnavailable`: The deployment is successful when minimum availability is reached in the new `replicaSet`.
    In other words, the number of new pods that must be ready is equal to `spec.replicas` - `spec.strategy.rollingUpdate.maxUnavailable`
    (when `maxUnavailable` is a percentage, the resulting minimum is rounded up). This option is only valid for deployments
    that use the `RollingUpdate` strategy.
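
As a rough worked example of the `maxUnavailable` arithmetic described above, the Ruby sketch below computes how many new pods must be ready for a few replica counts. The helper name `minimum_ready_pods` is hypothetical and exists only for illustration; it mirrors the formula in this section rather than quoting the tool's code.

```ruby
# Hypothetical helper for illustration only; not part of the tool's API.
def minimum_ready_pods(replicas, max_unavailable)
  if max_unavailable.to_s.end_with?("%")
    # Percentage form: keep (100 - N)% of the replicas, rounding the result up.
    (replicas * (100 - max_unavailable.to_i) / 100.0).ceil
  else
    # Absolute form: subtract the number of pods allowed to be unavailable.
    replicas - max_unavailable.to_i
  end
end

puts minimum_ready_pods(3, 1)       # 2
puts minimum_ready_pods(2, "50%")   # 1
puts minimum_ready_pods(3, "50%")   # 2
puts minimum_ready_pods(10, "25%")  # 8
```

With 3 replicas and `maxUnavailable: 50%`, the raw minimum is 1.5 pods, which rounds up to 2.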


### Running tasks at the beginning of a deploy

74 changes: 65 additions & 9 deletions lib/kubernetes-deploy/kubernetes_resource/deployment.rb
@@ -2,6 +2,9 @@
module KubernetesDeploy
class Deployment < KubernetesResource
TIMEOUT = 7.minutes
REQUIRED_ROLLOUT_ANNOTATION = 'kubernetes-deploy.shopify.io/required-rollout'
REQUIRED_ROLLOUT_TYPES = %w(maxUnavailable full none).freeze
DEFAULT_REQUIRED_ROLLOUT = 'full'

def sync
raw_json, _err, st = kubectl.run("get", type, @name, "--output=json")
@@ -19,13 +22,15 @@ def sync
conditions = deployment_data.fetch("status", {}).fetch("conditions", [])
@progress_condition = conditions.find { |condition| condition['type'] == 'Progressing' }
@progress_deadline = deployment_data['spec']['progressDeadlineSeconds']
@max_unavailable = deployment_data.dig('spec', 'strategy', 'rollingUpdate', 'maxUnavailable')
else # reset
@latest_rs = nil
@rollout_data = { "replicas" => 0 }
@status = nil
@progress_condition = nil
@progress_deadline = @definition['spec']['progressDeadlineSeconds']
@desired_replicas = -1
@max_unavailable = @definition.dig('spec', 'strategy', 'rollingUpdate', 'maxUnavailable')
end
end

@@ -43,10 +48,23 @@ def fetch_logs
def deploy_succeeded?
Contributor

I'd really like to start unit testing this now that it's pretty complex. It could reduce the number of possibly flaky integration tests we'd need (i.e. separate %/# tests would be unnecessary).

Contributor Author

Added unit tests

return false unless @latest_rs.present?

@latest_rs.deploy_succeeded? &&
@latest_rs.desired_replicas == @desired_replicas && # latest RS fully scaled up
@rollout_data["updatedReplicas"].to_i == @desired_replicas &&
@rollout_data["updatedReplicas"].to_i == @rollout_data["availableReplicas"].to_i
case required_rollout
when 'full'
@latest_rs.deploy_succeeded? &&
@latest_rs.desired_replicas == @desired_replicas && # latest RS fully scaled up
@rollout_data["updatedReplicas"].to_i == @desired_replicas &&
@rollout_data["updatedReplicas"].to_i == @rollout_data["availableReplicas"].to_i
when 'none'
true
Contributor

This case is untested. I think a test using a bad readiness probe + the annotation could work nicely.

Contributor Author

I added a unit test for this

when 'maxUnavailable'
minimum_needed = min_available_replicas

@latest_rs.desired_replicas >= minimum_needed &&
@latest_rs.ready_replicas >= minimum_needed &&
@latest_rs.available_replicas >= minimum_needed
else
raise FatalDeploymentError, rollout_annotation_err_msg
end
end

def deploy_failed?
@@ -81,8 +99,29 @@ def exists?
@found
end

def validate_definition
super

unless REQUIRED_ROLLOUT_TYPES.include?(required_rollout)
@validation_errors << rollout_annotation_err_msg
end

strategy = @definition.dig('spec', 'strategy', 'type').to_s
if required_rollout.downcase == 'maxunavailable' && strategy.downcase != 'rollingupdate'
@validation_errors << "'#{REQUIRED_ROLLOUT_ANNOTATION}: #{required_rollout}' is incompatible "\
"with strategy '#{strategy}'"
end

@validation_errors.empty?
end

private

def rollout_annotation_err_msg
"'#{REQUIRED_ROLLOUT_ANNOTATION}: #{required_rollout}' is invalid. "\
"Acceptable values: #{REQUIRED_ROLLOUT_TYPES.join(', ')}"
end

def deploy_failing_to_progress?
return false unless @progress_condition.present?

@@ -98,18 +137,22 @@ def deploy_failing_to_progress?
Time.parse(@progress_condition["lastUpdateTime"]).to_i >= (@deploy_started_at - 5.seconds).to_i
end

def find_latest_rs(deployment_data)
label_string = deployment_data["spec"]["selector"]["matchLabels"].map { |k, v| "#{k}=#{v}" }.join(",")
def all_rs_data(match_labels)
label_string = match_labels.map { |k, v| "#{k}=#{v}" }.join(",")
raw_json, _err, st = kubectl.run("get", "replicasets", "--output=json", "--selector=#{label_string}")
return unless st.success?
return {} unless st.success?

JSON.parse(raw_json)["items"]
end

all_rs_data = JSON.parse(raw_json)["items"]
def find_latest_rs(deployment_data)
current_revision = deployment_data["metadata"]["annotations"]["deployment.kubernetes.io/revision"]

latest_rs_data = all_rs_data.find do |rs|
latest_rs_data = all_rs_data(deployment_data["spec"]["selector"]["matchLabels"]).find do |rs|
rs["metadata"]["ownerReferences"].any? { |ref| ref["uid"] == deployment_data["metadata"]["uid"] } &&
rs["metadata"]["annotations"]["deployment.kubernetes.io/revision"] == current_revision
end

return unless latest_rs_data.present?

rs = ReplicaSet.new(
@@ -123,5 +166,18 @@ def find_latest_rs(deployment_data)
rs.sync(latest_rs_data)
rs
end

def min_available_replicas
if @max_unavailable =~ /%/
(@desired_replicas * (100 - @max_unavailable.to_i) / 100.0).ceil
else
@desired_replicas - @max_unavailable.to_i
end
end

def required_rollout
@definition.dig('metadata', 'annotations', REQUIRED_ROLLOUT_ANNOTATION).presence ||
DEFAULT_REQUIRED_ROLLOUT
end
end
end
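
To make the branching in `deploy_succeeded?` easier to follow, here is a minimal standalone sketch of the same decision, assuming plain replica counts instead of the real `ReplicaSet` object. `ReplicaCounts` and `rollout_succeeded?` are hypothetical names, and the `full` branch is deliberately simplified: the version in the diff also checks the replicaSet's own success state and the deployment's `updatedReplicas`.

```ruby
# Hypothetical stand-in for the replica counts read from the latest replicaSet.
ReplicaCounts = Struct.new(:desired, :ready, :available)

# Simplified sketch of the per-annotation success check (assumption: counts
# are already synced and `minimum_needed` was computed from maxUnavailable).
def rollout_succeeded?(required_rollout, target_replicas, counts, minimum_needed)
  case required_rollout
  when "full"
    # Everything requested must be scaled up and available.
    counts.desired == target_replicas && counts.available == target_replicas
  when "none"
    # Creating the new replicaSet is enough.
    true
  when "maxUnavailable"
    # Every counter must reach the minimum availability threshold.
    [counts.desired, counts.ready, counts.available].all? { |n| n >= minimum_needed }
  else
    raise ArgumentError, "unknown required-rollout value: #{required_rollout.inspect}"
  end
end

halfway = ReplicaCounts.new(3, 2, 2)
puts rollout_succeeded?("full", 3, halfway, 2)            # false
puts rollout_succeeded?("maxUnavailable", 3, halfway, 2)  # true
puts rollout_succeeded?("none", 3, halfway, 2)            # true
```

The same partially rolled-out state therefore fails under `full` but passes under `maxUnavailable` and `none`, which is the behaviour the PR title describes.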
8 changes: 7 additions & 1 deletion lib/kubernetes-deploy/kubernetes_resource/replica_set.rb
@@ -3,13 +3,15 @@
module KubernetesDeploy
class ReplicaSet < PodSetBase
TIMEOUT = 5.minutes
attr_reader :desired_replicas, :pods
attr_reader :desired_replicas, :ready_replicas, :available_replicas, :pods

def initialize(namespace:, context:, definition:, logger:, parent: nil, deploy_started_at: nil)
@parent = parent
@deploy_started_at = deploy_started_at
@rollout_data = { "replicas" => 0 }
@desired_replicas = -1
@ready_replicas = -1
@available_replicas = -1
Contributor

The same thing should technically be done in the reset section at L35 too.

Contributor Author

done

@pods = []
super(namespace: namespace, context: context, definition: definition, logger: logger)
end
@@ -26,6 +28,8 @@ def sync(rs_data = nil)
@rollout_data = { "replicas" => 0 }.merge(
rs_data["status"].slice("replicas", "availableReplicas", "readyReplicas")
)
@ready_replicas = @rollout_data['readyReplicas'].to_i
@available_replicas = @rollout_data["availableReplicas"].to_i
@status = @rollout_data.map { |state_replicas, num| "#{num} #{state_replicas.chop.pluralize(num)}" }.join(", ")
@pods = find_pods(rs_data)
else # reset
@@ -34,6 +38,8 @@ def sync(rs_data = nil)
@status = nil
@pods = []
@desired_replicas = -1
@ready_replicas = -1
@available_replicas = -1
end
end
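
The two branches of `sync` leave the new counters in different states, which the small sketch below tries to illustrate. It assumes a status payload in which `availableReplicas` is absent (the sample hash is made up for illustration) and shows that the synced path then yields 0, while the reset path in the diff uses -1.

```ruby
# Hypothetical status payload; `availableReplicas` is missing on purpose.
status = { "replicas" => 3, "readyReplicas" => 2, "observedGeneration" => 7 }

# Same shape as the merge in the diff: keep only the three replica counters.
rollout_data = { "replicas" => 0 }.merge(
  status.slice("replicas", "availableReplicas", "readyReplicas")
)

ready_replicas = rollout_data["readyReplicas"].to_i
available_replicas = rollout_data["availableReplicas"].to_i # nil.to_i is 0

puts rollout_data.inspect
puts ready_replicas      # 2
puts available_replicas  # 0
```

Under that reading, -1 means "never synced" and 0 means "synced, but nothing reported yet". Either value keeps a `>= minimum_needed` comparison false whenever the minimum is positive.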

62 changes: 62 additions & 0 deletions test/fixtures/for_unit_tests/deployment_test.yml
@@ -0,0 +1,62 @@
---
apiVersion: apps/v1beta1
kind: Deployment
metadata:
  name: web
  uid: foobar
  annotations:
    "deployment.kubernetes.io/revision": "1"
spec:
  replicas: 3
  progressDeadlineSeconds: 10
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 1
  selector:
    matchLabels:
      name: web
      app: hello-cloud
  template:
    metadata:
      labels:
        name: web
        app: hello-cloud
    spec:
      containers:
      - name: app
        image: busybox
status:
  replicas: 3
  conditions:
  - type: Progressing
    status: True
    lastUpdateTime: "2018-01-09 22:56:45 UTC"

---
apiVersion: apps/v1beta1
kind: ReplicaSet
metadata:
  name: web-1
  annotations:
    "deployment.kubernetes.io/revision": "1"
  ownerReferences:
  - uid: foobar
spec:
  replicas: 3
  selector:
    matchLabels:
      name: web
      app: hello-cloud
  template:
    metadata:
      labels:
        name: web
        app: hello-cloud
    spec:
      containers:
      - name: app
        image: busybox
status:
  replicas: 3
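
How the unit tests load this fixture is not shown in this part of the diff. As one possible approach, the sketch below parses a trimmed copy of the two-document file with `YAML.load_stream` and then applies the same ownership and revision match that `find_latest_rs` uses. The heredoc is an abbreviated stand-in for the fixture, not the file itself.

```ruby
require "yaml"

# Abbreviated stand-in for the fixture; only the fields used below are kept.
fixture = <<~DOC
  ---
  kind: Deployment
  metadata:
    name: web
    uid: foobar
    annotations:
      "deployment.kubernetes.io/revision": "1"
  spec:
    replicas: 3
  ---
  kind: ReplicaSet
  metadata:
    name: web-1
    annotations:
      "deployment.kubernetes.io/revision": "1"
    ownerReferences:
    - uid: foobar
  spec:
    replicas: 3
DOC

# load_stream returns one Ruby object per YAML document in the stream.
documents = YAML.load_stream(fixture)
deployment = documents.find { |doc| doc["kind"] == "Deployment" }
replica_sets = documents.select { |doc| doc["kind"] == "ReplicaSet" }

# Pick the replicaSet owned by the deployment at its current revision.
revision = deployment["metadata"]["annotations"]["deployment.kubernetes.io/revision"]
latest_rs = replica_sets.find do |rs|
  rs["metadata"]["ownerReferences"].any? { |ref| ref["uid"] == deployment["metadata"]["uid"] } &&
    rs["metadata"]["annotations"]["deployment.kubernetes.io/revision"] == revision
end

puts latest_rs["metadata"]["name"] # web-1
```

The shared `uid: foobar` and revision `"1"` are what tie the two documents together, so a test can stub the `kubectl` responses with these hashes and exercise the matching logic without a cluster.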
34 changes: 34 additions & 0 deletions test/fixtures/slow-cloud/web.yml.erb
@@ -0,0 +1,34 @@
apiVersion: apps/v1beta1
kind: Deployment
metadata:
  name: web
  annotations:
    shipit.shopify.io/restart: "true"
    kubernetes-deploy.shopify.io/required-rollout: maxUnavailable
spec:
  replicas: 2
  selector:
    matchLabels:
      name: web
      app: slow-cloud
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 0
      maxUnavailable: 1
  template:
    metadata:
      labels:
        name: web
        app: slow-cloud
        sha: "<%= current_sha %>"
    spec:
      terminationGracePeriodSeconds: 0
      containers:
      - name: app
        image: busybox
        imagePullPolicy: IfNotPresent
        command: ["tail", "-f", "/dev/null"]
        ports:
        - containerPort: 80
          name: http
24 changes: 24 additions & 0 deletions test/integration/kubernetes_deploy_test.rb
@@ -688,6 +688,30 @@ def test_can_deploy_deployment_with_zero_replicas
])
end

def test_deploy_successful_with_partial_availability
result = deploy_fixtures("slow-cloud", sha: "deploy1")
assert_deploy_success(result)

result = deploy_fixtures("slow-cloud", sha: "deploy2") do |fixtures|
dep = fixtures["web.yml.erb"]["Deployment"].first
container = dep["spec"]["template"]["spec"]["containers"].first
container["readinessProbe"] = {
"exec" => { "command" => %w(sleep 5) },
"timeoutSeconds" => 6
}
end
assert_deploy_success(result)

new_pods = kubeclient.get_pods(namespace: @namespace, label_selector: 'name=web,app=slow-cloud,sha=deploy2')
assert new_pods.length >= 1, "Expected at least one new pod, saw #{new_pods.length}"

new_ready_pods = new_pods.select do |pod|
pod.status.phase == "Running" &&
pod.status.conditions.any? { |condition| condition["type"] == "Ready" && condition["status"] == "True" }
end
assert_equal 1, new_ready_pods.length, "Expected exactly one new pod to be ready, saw #{new_ready_pods.length}"
end

def test_deploy_aborts_immediately_if_metadata_name_missing
result = deploy_fixtures("hello-cloud", subset: ["configmap-data.yml"]) do |fixtures|
definition = fixtures["configmap-data.yml"]["ConfigMap"].first
30 changes: 30 additions & 0 deletions test/integration/restart_task_test.rb
@@ -206,6 +206,36 @@ def test_restart_failure
in_order: true)
end

def test_restart_successful_with_partial_availability
result = deploy_fixtures("slow-cloud") do |fixtures|
web = fixtures["web.yml.erb"]["Deployment"].first
web["spec"]["strategy"]['rollingUpdate']['maxUnavailable'] = '50%'
container = web["spec"]["template"]["spec"]["containers"].first
container["readinessProbe"] = {
"exec" => { "command" => %w(sleep 5) },
"timeoutSeconds" => 6
}
end
assert_deploy_success(result)

restart = build_restart_task
assert_restart_success(restart.perform(["web"]))

pods = kubeclient.get_pods(namespace: @namespace, label_selector: 'name=web,app=slow-cloud')
new_pods = pods.select do |pod|
pod.spec.containers.any? { |c| c["name"] == "app" && c.env&.find { |n| n.name == "RESTARTED_AT" } }
end
assert new_pods.length >= 1, "Expected at least one new pod, saw #{new_pods.length}"

new_ready_pods = new_pods.select do |pod|
pod.status.phase == "Running" &&
pod.status.conditions.any? { |condition| condition["type"] == "Ready" && condition["status"] == "True" }
end
assert_equal 1, new_ready_pods.length, "Expected exactly one new pod to be ready, saw #{new_ready_pods.length}"

assert fetch_restarted_at("web"), "RESTARTED_AT is present after the restart"
end

private

def build_restart_task