Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Batch dry run but like for real this time #943

Closed
Closed
Show file tree
Hide file tree
Changes from 18 commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
9 changes: 7 additions & 2 deletions .github/workflows/ci.yml
Original file line number Diff line number Diff line change
Expand Up @@ -6,7 +6,7 @@ jobs:
ruby-tests:
runs-on: ubuntu-latest

name: "Tests - Ruby ${{ matrix.ruby }} with Kubernetes ${{ matrix.kubernetes_version }}"
name: "Tests (${{matrix.test_suite}}) - Ruby ${{ matrix.ruby }} with Kubernetes ${{ matrix.kubernetes_version }}"
strategy:
fail-fast: false
matrix:
Expand All @@ -20,6 +20,11 @@ jobs:
- "1.26.4"
- "1.24.13"
- "1.23.17"
test_suite:
- "unit_test"
- "cli_test"
- "serial_integration_test"
- "integration_test"
Comment on lines +23 to +27
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

integration tests are flaky and take a long time to run. Adding this extra dimension so we don't have to sit through unit/cli/serial tests each time integration fails

include:
- kubernetes_version: "1.27.3"
kind_image: "kindest/node:v1.27.3@sha256:9dd3392d79af1b084671b05bcf65b21de476256ad1dcc853d9f3b10b4ac52dde"
Expand Down Expand Up @@ -52,4 +57,4 @@ jobs:

- name: Run tests
run: |
bin/test
bin/test ${{matrix.test_suite}}
23 changes: 15 additions & 8 deletions bin/test
Original file line number Diff line number Diff line change
Expand Up @@ -33,16 +33,23 @@ if [[ "${CI:-0}" != "0" ]]; then
PARALLELISM=2
fi

print_header "Run CLI Tests"
bundle exec rake cli_test
test_type=$1

print_header "Run Unit Tests"
bundle exec rake unit_test

print_header "Run Non-Parallel Integration Tests"
bundle exec rake serial_integration_test
case $test_type in
cli_test | unit_test | serial_integration_test)
print_header $test_type
bundle exec rake $test_type
;;

print_header "Run Parallel Integration Tests (MT_CPU=$PARALLELISM)"
PARALLELIZE_ME=1 MT_CPU=$PARALLELISM bundle exec rake integration_test
integration_test)
print_header "Run Parallel Integration Tests (MT_CPU=$PARALLELISM)"
PARALLELIZE_ME=1 MT_CPU=$PARALLELISM bundle exec rake integration_test
;;

*)
echo "Argument must be one of: unit_test, cli_test, serial_integration_test, integration_test"
;;
esac

test $err -eq 0
2 changes: 1 addition & 1 deletion lib/krane/bindings_parser.rb
Original file line number Diff line number Diff line change
Expand Up @@ -41,7 +41,7 @@ def parse_file(string)
when '.json'
bindings = parse_json(File.read(file_path))
when '.yaml', '.yml'
bindings = YAML.safe_load(File.read(file_path), permitted_classes: [], permitted_symbols: [],
bindings = YAML.safe_load(File.read(file_path), permitted_classes: [], permitted_symbols: [],
aliases: true, filename: file_path)
else
raise ArgumentError, "Supplied file does not appear to be JSON or YAML"
Expand Down
32 changes: 18 additions & 14 deletions lib/krane/deploy_task.rb
Original file line number Diff line number Diff line change
Expand Up @@ -142,7 +142,6 @@ def run(**args)
def run!(verify_result: true, prune: true)
start = Time.now.utc
@logger.reset

@logger.phase_heading("Initializing deploy")
validate_configuration(prune: prune)
resources = discover_resources
Expand Down Expand Up @@ -285,26 +284,29 @@ def validate_configuration(prune:)
measure_method(:validate_configuration)

def validate_resources(resources)
validate_globals(resources)
batch_dry_run_success = validate_dry_run(resources)
resources.select! { |r| r.selected?(@selector) } if @selector_as_filter
validate_globals(resources)
applyables, _ = resources.partition { |r| r.deploy_method == :apply }
batch_dry_run_success = validate_dry_run(applyables)
if batch_dry_run_success
applyables.map { |r| r.server_dry_run_validated = true }
end
Krane::Concurrency.split_across_threads(resources) do |r|
# No need to pass in kubectl (and do per-resource dry run apply) if batch dry run succeeded
if batch_dry_run_success
r.validate_definition(kubectl: nil, selector: @selector, dry_run: false)
else
r.validate_definition(kubectl: kubectl, selector: @selector, dry_run: true)
end
# No need to pass in kubectl as we batch dry run server-side apply above
r.validate_definition(kubectl: nil, selector: @selector, dry_run: false)
Comment on lines 294 to +296
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We still validate ALL resources, because there are additional krane-specific checks. However, we never pass in kubectl so we never end up running a dry-run apply (neither client-side nor server-side)

end

failed_resources = resources.select(&:validation_failed?)
if failed_resources.present?
if failed_resources.present? || !batch_dry_run_success
failed_resources.each do |r|
content = File.read(r.file_path) if File.file?(r.file_path) && !r.sensitive_template_content?
record_invalid_template(logger: @logger, err: r.validation_error_msg,
filename: File.basename(r.file_path), content: content)
end
raise FatalDeploymentError, "Template validation failed"
raise FatalDeploymentError
end
rescue FatalDeploymentError => err
raise FatalDeploymentError, "Template validation failed"
end
measure_method(:validate_resources)

Expand All @@ -315,13 +317,15 @@ def validate_globals(resources)
end
global_names = FormattedLogger.indent_four(global_names.join("\n"))

message = "This command is namespaced and cannot be used to deploy global resources. "\
"Use GlobalDeployTask instead."
@logger.summary.add_paragraph(ColorizedString.new(message).yellow)
@logger.summary.add_paragraph(ColorizedString.new("Global resources:\n#{global_names}").yellow)
raise FatalDeploymentError, "This command is namespaced and cannot be used to deploy global resources. "\
"Use GlobalDeployTask instead."
raise FatalDeploymentError
Comment on lines +320 to +324
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moves around log lines to make the FatalDeploymentError handler simpler while preserving log output

end

def validate_dry_run(resources)
resource_deployer.dry_run(resources)
resource_deployer.dry_run!(resources)
end

def namespace_definition
Expand Down
2 changes: 1 addition & 1 deletion lib/krane/kubectl.rb
Original file line number Diff line number Diff line change
Expand Up @@ -141,7 +141,7 @@ def retriable_err?(err, retry_whitelist)
end

def platform_agnostic_version(version_string)
if match = version_string.match(/v(?<version>\d+\.\d+\.\d+)/)
if (match = version_string.match(/v(?<version>\d+\.\d+\.\d+)/))
Gem::Version.new(match[:version])
end
end
Expand Down
2 changes: 1 addition & 1 deletion lib/krane/kubernetes_resource.rb
Original file line number Diff line number Diff line change
Expand Up @@ -13,7 +13,7 @@ class KubernetesResource
using PsychK8sCompatibility

attr_reader :name, :namespace, :context
attr_writer :type, :deploy_started_at, :global
attr_writer :type, :deploy_started_at, :global, :server_dry_run_validated

GLOBAL = false
TIMEOUT = 5.minutes
Expand Down
2 changes: 1 addition & 1 deletion lib/krane/kubernetes_resource/daemon_set.rb
Original file line number Diff line number Diff line change
Expand Up @@ -82,7 +82,7 @@ def find_nodes(cache)
all_nodes.each_with_object([]) do |node_data, relevant_nodes|
next if node_data.dig('spec', 'unschedulable').to_s.downcase == 'true'
cond = node_data.dig('status', 'conditions').find { |c| c['type'].downcase == 'ready' }
next if (!cond.nil? && cond['status'].downcase != 'true')
next if !cond.nil? && cond['status'].downcase != 'true'
relevant_nodes << Node.new(definition: node_data)
end
end
Expand Down
55 changes: 51 additions & 4 deletions lib/krane/resource_deployer.rb
Original file line number Diff line number Diff line change
Expand Up @@ -20,9 +20,13 @@ def initialize(task_config:, prune_allowlist:, global_timeout:, current_sha: nil
@statsd_tags = statsd_tags
end

def dry_run(resources)
def dry_run!(resources)
apply_all(resources, true, dry_run: true)
true
end

def dry_run(resources)
dry_run!(resources)
rescue FatalDeploymentError
false
end
Expand Down Expand Up @@ -163,13 +167,15 @@ def apply_all(resources, prune, dry_run: false)
global_mode = resources.all?(&:global?)
out, err, st = kubectl.run(*command, log_failure: false, output_is_sensitive: output_is_sensitive,
attempts: 2, use_namespace: !global_mode)

tags = statsd_tags + (dry_run ? ['dry_run:true'] : ['dry_run:false'])
Krane::StatsD.client.distribution('apply_all.duration', Krane::StatsD.duration(start), tags: tags)
if st.success?
log_pruning(out) if prune
elsif dry_run
record_dry_run_apply_failure(err, resources: resources)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The logging is just different enough to require separate methods for the dry-run and actual-run failures

raise FatalDeploymentError, "Command failed: #{Shellwords.join(command)}"
else
record_apply_failure(err, resources: resources) unless dry_run
record_apply_failure(err, resources: resources)
raise FatalDeploymentError, "Command failed: #{Shellwords.join(command)}"
end
end
Expand All @@ -183,11 +189,52 @@ def log_pruning(kubectl_output)
logger.summary.add_action("pruned #{pruned.length} #{'resource'.pluralize(pruned.length)}")
end

def record_dry_run_apply_failure(err, resources: [])
if err == "error: no objects passed to apply"
heading = ColorizedString.new(err).red
msg = FormattedLogger.indent_four("No resources could be applied: please ensure your label selectors select at least one resource")
logger.summary.add_paragraph("#{heading}\n#{msg}")
return
end
unidentified_errors = []
filenames_with_sensitive_content = resources
.select(&:sensitive_template_content?)
.map { |r| File.basename(r.file_path) }

err.each_line do |line|
bad_files = find_bad_files_from_kubectl_output(line)
unless bad_files.present?
unidentified_errors << line
next
end

bad_files.each do |f|
err_msg = f[:err]
if filenames_with_sensitive_content.include?(f[:filename])
# Hide the error and template contents in case it has sensitive information
record_invalid_template(logger: logger, err: "SUPPRESSED FOR SECURITY", filename: f[:filename],
content: nil)
else
record_invalid_template(logger: logger, err: err_msg, filename: f[:filename], content: f[:content])
end
end
end
return unless unidentified_errors.any?

if resources.none?(&:sensitive_template_content?)
template_contents = resources.map(&:file_path).map { |path| File.read(path) }
record_invalid_template(logger: logger, err: unidentified_errors.join("\n"), filename: "", content: template_contents.join("\n"))
else
warn_msg = "WARNING: There was an error applying some or all resources. The raw output may be sensitive and " \
"so cannot be displayed."
logger.summary.add_paragraph(ColorizedString.new(warn_msg).yellow)
end
end

def record_apply_failure(err, resources: [])
warn_msg = "WARNING: Any resources not mentioned in the error(s) below were likely created/updated. " \
"You may wish to roll back this deploy."
logger.summary.add_paragraph(ColorizedString.new(warn_msg).yellow)

unidentified_errors = []
filenames_with_sensitive_content = resources
.select(&:sensitive_template_content?)
Expand Down
13 changes: 1 addition & 12 deletions test/integration-serial/serial_deploy_test.rb
Original file line number Diff line number Diff line change
Expand Up @@ -75,10 +75,9 @@ def test_apply_failure_with_sensitive_resources_hides_template_content
refute_logs_match(%r{Kubectl err:.*something/invalid})

assert_logs_match_all([
"Command failed: apply -f",
"Template validation failed",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously these errors would occur during the deploy step (since we threw away batch dry-run apply and fell back to client-side validations). Now that we trust batch dry-run, the failure occurs earlier

/Invalid template: Deployment-web.*\.yml/,
])

refute_logs_match("kind: Deployment") # content of the sensitive template
end

Expand Down Expand Up @@ -526,16 +525,6 @@ def test_resource_discovery_stops_deploys_when_fetch_crds_kubectl_errs
], in_order: true)
end

def test_batch_dry_run_apply_failure_falls_back_to_individual_resource_dry_run_validation
Krane::KubernetesResource.any_instance.expects(:validate_definition).with do |kwargs|
kwargs[:kubectl].is_a?(Krane::Kubectl) && kwargs[:dry_run]
end
deploy_fixtures("hello-cloud", subset: %w(secret.yml)) do |fixtures|
secret = fixtures["secret.yml"]["Secret"].first
secret["bad_field"] = "bad_key"
end
end

def test_batch_dry_run_apply_success_precludes_individual_resource_dry_run_validation
Krane::KubernetesResource.any_instance.expects(:validate_definition).with { |params| params[:dry_run] == false }
result = deploy_fixtures("hello-cloud", subset: %w(secret.yml))
Expand Down
Loading
Loading