@t0mmylam

New metrics:

  • skyhook_rollout_matched_nodes - Number of nodes matched by compartment selector
  • skyhook_rollout_ceiling - Maximum nodes allowed in progress simultaneously
  • skyhook_rollout_in_progress - Number of nodes currently in progress
  • skyhook_rollout_completed - Nodes completed in compartment
  • skyhook_rollout_progress_percent - Completion percentage (0-100)
  • skyhook_rollout_current_batch - Current batch number
  • skyhook_rollout_consecutive_failures - Consecutive batch failures
  • skyhook_rollout_should_stop - Binary flag indicating whether the rollout should stop

Labels: skyhook_name, policy_name, compartment_name, strategy
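
For illustration, the eight series and four labels listed above could look roughly like this when rendered in Prometheus text exposition format. This is a minimal standalone sketch, not the operator's code: `RolloutLabels`, `rollout_samples`, the `fixed` strategy value, and the formula used for the percentage (completed / matched) are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RolloutLabels:
    """The four labels carried by every rollout metric in this PR."""
    skyhook_name: str
    policy_name: str
    compartment_name: str
    strategy: str

    def render(self) -> str:
        # No escaping of label values here; a real exporter must escape
        # backslashes, double quotes, and newlines.
        pairs = [
            ("skyhook_name", self.skyhook_name),
            ("policy_name", self.policy_name),
            ("compartment_name", self.compartment_name),
            ("strategy", self.strategy),
        ]
        return ",".join(f'{key}="{value}"' for key, value in pairs)


def rollout_samples(labels, matched, ceiling, in_progress, completed,
                    current_batch, consecutive_failures, should_stop):
    """Return one exposition-format line per rollout metric."""
    # Assumption: percent is completed / matched; the PR only states 0-100.
    percent = 100.0 * completed / matched if matched else 0.0
    values = {
        "skyhook_rollout_matched_nodes": matched,
        "skyhook_rollout_ceiling": ceiling,
        "skyhook_rollout_in_progress": in_progress,
        "skyhook_rollout_completed": completed,
        "skyhook_rollout_progress_percent": percent,
        "skyhook_rollout_current_batch": current_batch,
        "skyhook_rollout_consecutive_failures": consecutive_failures,
        "skyhook_rollout_should_stop": int(should_stop),
    }
    return [f"{name}{{{labels.render()}}} {value}"
            for name, value in values.items()]


if __name__ == "__main__":
    legacy = RolloutLabels("demo", "legacy", "__default__", "fixed")
    for line in rollout_samples(legacy, matched=10, ceiling=3, in_progress=2,
                                completed=4, current_batch=2,
                                consecutive_failures=0, should_stop=False):
        print(line)
```

The example uses the legacy-mode label values (policy_name=legacy, compartment_name=__default__) described under Features.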

Features:

  • Metrics integrated into existing reset-then-set reconciliation flow
  • Automatic cleanup on Skyhook deletion
  • Works in legacy mode (policy_name=legacy, compartment_name=__default__)
  • Label naming follows existing conventions (_name suffix for resource identifiers)
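
The reset-then-set flow and the cleanup on deletion can be sketched with a small in-memory stand-in for a labeled gauge. This is a hedged illustration of the pattern only: `GaugeVec`, `reconcile`, and `on_delete` are hypothetical names, the compartment dictionaries are an assumed input shape, and the operator's actual reconciliation code is not shown in this PR.

```python
LABEL_NAMES = ("skyhook_name", "policy_name", "compartment_name", "strategy")


class GaugeVec:
    """Tiny in-memory stand-in for a labeled gauge (hypothetical)."""

    def __init__(self, name, label_names):
        self.name = name
        self.label_names = tuple(label_names)
        self.series = {}  # label-value tuple -> current value

    def set(self, labels, value):
        key = tuple(labels[name] for name in self.label_names)
        self.series[key] = value

    def delete_matching(self, label_name, label_value):
        """Drop every series whose label equals the given value."""
        index = self.label_names.index(label_name)
        stale = [key for key in self.series if key[index] == label_value]
        for key in stale:
            del self.series[key]
        return len(stale)


def reconcile(gauge, skyhook_name, compartments):
    """Reset all series for this Skyhook, then set the current ones."""
    # Reset first, so compartments that no longer exist do not linger.
    gauge.delete_matching("skyhook_name", skyhook_name)
    for compartment in compartments:
        gauge.set(
            {
                "skyhook_name": skyhook_name,
                # Legacy mode falls back to the values named in the PR.
                "policy_name": compartment.get("policy_name", "legacy"),
                "compartment_name": compartment.get(
                    "compartment_name", "__default__"),
                "strategy": compartment["strategy"],
            },
            compartment["in_progress"],
        )


def on_delete(gauge, skyhook_name):
    """Cleanup on Skyhook deletion: remove all of its series."""
    return gauge.delete_matching("skyhook_name", skyhook_name)
```

Resetting before setting means a compartment that was renamed or removed between reconciles stops being exported, rather than reporting its last value indefinitely.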

Testing:

  • Added rollout metrics checks to simple-skyhook e2e test
  • Added cleanup verification to delete-skyhook e2e test
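
As a rough idea of what such checks verify, the sketch below scans scraped metrics text for the rollout series belonging to one Skyhook. It is an assumption-laden illustration, not the e2e tests themselves: `rollout_metrics_present` is a hypothetical helper, and how the tests actually scrape the endpoint is not described in this PR.

```python
ROLLOUT_METRICS = frozenset({
    "skyhook_rollout_matched_nodes",
    "skyhook_rollout_ceiling",
    "skyhook_rollout_in_progress",
    "skyhook_rollout_completed",
    "skyhook_rollout_progress_percent",
    "skyhook_rollout_current_batch",
    "skyhook_rollout_consecutive_failures",
    "skyhook_rollout_should_stop",
})


def rollout_metrics_present(scrape_text, skyhook_name):
    """Return the rollout metric names exported for the given Skyhook."""
    needle = f'skyhook_name="{skyhook_name}"'
    found = set()
    for line in scrape_text.splitlines():
        # Skip HELP/TYPE comments and series for other Skyhooks.
        if line.startswith("#") or needle not in line:
            continue
        name = line.split("{", 1)[0]
        if name in ROLLOUT_METRICS:
            found.add(name)
    return found
```

A creation check would expect the returned set to equal `ROLLOUT_METRICS`; a deletion check would expect it to be empty once the Skyhook is gone.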

@t0mmylam t0mmylam merged commit 84caf87 into main Oct 23, 2025
10 of 11 checks passed
@t0mmylam t0mmylam deleted the metrics branch October 23, 2025 17:42