K8s ‐ Verify cluster operators available
This rule verifies that every OpenShift cluster operator is Available, not Degraded, not stuck Progressing, and Upgradeable. Cluster operators are essential to a functioning cluster: they manage core platform capabilities such as authentication, networking, storage, and the API server.
The rule retrieves all ClusterOperator resources via oc get clusteroperators -o json and checks each operator's conditions:
- The Available condition must have status: "True" (fails if not)
- The Degraded condition must not have status: "True" (fails if degraded)
- The Progressing condition should not have status: "True" (warns if progressing)
- The Upgradeable condition should not have status: "False" (warns if not upgradeable)
If any operator is unavailable or degraded, the rule reports it as failed. If operators are progressing or not upgradeable (but otherwise available and not degraded), the rule reports a warning.
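The pass, warn, and fail logic described above can be approximated by hand. The following is a minimal sketch, not the rule's actual implementation: the function names `fetch_operator_conditions` and `classify_operators` are hypothetical, and it assumes `oc` prints `<none>` for a condition an operator does not report (treated here as "no warning" for Upgradeable and as a failure for Available).

```shell
# Print one line per operator: NAME AVAILABLE DEGRADED PROGRESSING UPGRADEABLE.
# Requires an authenticated oc session; defined here but not called.
fetch_operator_conditions() {
  oc get clusteroperators --no-headers -o custom-columns='NAME:.metadata.name,AVAILABLE:.status.conditions[?(@.type=="Available")].status,DEGRADED:.status.conditions[?(@.type=="Degraded")].status,PROGRESSING:.status.conditions[?(@.type=="Progressing")].status,UPGRADEABLE:.status.conditions[?(@.type=="Upgradeable")].status'
}

# Read those lines on stdin and print "NAME FAIL", "NAME WARN", or "NAME OK".
classify_operators() {
  awk '
    NF == 0                       { next }                    # skip blank lines
    $2 != "True" || $3 == "True"  { print $1, "FAIL"; next }  # unavailable or degraded
    $4 == "True" || $5 == "False" { print $1, "WARN"; next }  # progressing or not upgradeable
                                  { print $1, "OK" }
  '
}
```

Against a live cluster this would be run as `fetch_operator_conditions | classify_operators`. Separating the fetch from the classification keeps the decision logic checkable against saved output.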
- Access to the OpenShift cluster with permissions to list ClusterOperator resources
- The oc command-line tool, configured and authenticated
If cluster operators are not in Available state:
- Cluster functionality loss: Core platform features (authentication, networking, ingress, monitoring) may be partially or fully unavailable
- Workload disruption: Applications relying on cluster services (e.g., image registry, DNS, storage) may fail
- Upgrade blocking: Unavailable, degraded, or not-upgradeable operators can block cluster upgrades; Upgradeable=False typically blocks minor-version updates rather than patch updates
- Cascading failures: One unavailable operator can cause dependent operators to degrade
Common scenarios that may lead to unavailable or degraded cluster operators:
- A failed cluster upgrade that left operators in a transitional state
- Node failures or reboots that disrupted operator pods
- Resource exhaustion (CPU, memory, disk) on control plane nodes
- etcd cluster health issues affecting operator coordination
- Certificate expiration preventing operator communication
- Network connectivity issues between control plane components
- Storage backend failures affecting operators that rely on persistent storage
oc get clusteroperators
Look for operators with Available=False, Degraded=True, Progressing=True, or Upgradeable=False.
oc describe clusteroperator <operator-name>
Look for:
- Conditions section showing Available, Degraded, Progressing, and Upgradeable states
- Recent events indicating failures or transitions
- Version information for upgrade-related issues
oc get pods -n openshift-<operator-name> -o wide
Verify operator pods are running and ready on the expected nodes.
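The openshift-<operator-name> namespace pattern used in these commands is a convention, not a guarantee, and some operators run in namespaces with different names. A ClusterOperator lists the objects it manages under status.relatedObjects, so the namespaces can be looked up instead of guessed. The sketch below assumes the operator reports its namespaces there; the function names `operator_namespaces` and `operator_pods` are hypothetical.

```shell
# Print the namespaces a cluster operator lists in status.relatedObjects.
operator_namespaces() {
  oc get clusteroperator "$1" \
    -o jsonpath='{range .status.relatedObjects[?(@.resource=="namespaces")]}{.name}{"\n"}{end}'
}

# Show the pods in each of those namespaces.
operator_pods() {
  operator_namespaces "$1" | while read -r ns; do
    if [ -n "$ns" ]; then
      oc get pods -n "$ns" -o wide
    fi
  done
}
```

For example, `operator_pods authentication` would list pods in whichever namespaces the authentication operator declares.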
oc logs -n openshift-<operator-name> deployment/<operator-name> --tail=100
oc get events --all-namespaces --sort-by='.lastTimestamp' | grep -i operator
- Identify the specific operator(s) that are unavailable, degraded, progressing, or not upgradeable from the rule output
- Check the operator's conditions and events for the root cause
- Address the underlying issue (see specific scenarios below)
# Check if any operators are still progressing
oc get clusteroperators -o json | jq -r '.items[] | select(any(.status.conditions[]; .type=="Progressing" and .status=="True")) | .metadata.name'
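# Force operator reconciliation by deleting the operator pod (the label selector varies by operator; confirm it with: oc get pods -n openshift-<operator-name> --show-labels)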
oc delete pod -n openshift-<operator-name> -l name=<operator-name>
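After the operator pod is deleted, its deployment recreates it, and the operator may take a few minutes to report healthy again. Rather than polling by hand, `oc wait` can block until the conditions settle. This is a sketch only: the function name `wait_for_operator` and the 300s default timeout are assumptions, not values from the rule.

```shell
# Block until the operator reports Available=True and then Degraded=False,
# or until the timeout expires. Returns non-zero on timeout.
wait_for_operator() {
  name="$1"
  timeout="${2:-300s}"   # assumed default; adjust to the operator's usual recovery time
  oc wait "clusteroperator/$name" --for=condition=Available=True --timeout="$timeout" &&
    oc wait "clusteroperator/$name" --for=condition=Degraded=False --timeout="$timeout"
}
```

For example, `wait_for_operator <operator-name> 600s` waits up to ten minutes per condition.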
# Check node status
oc get nodes
# Check if operator pods are scheduled on healthy nodes
oc get pods -n openshift-<operator-name> -o wide
# List secrets in the operator namespace to locate certificate secrets (names only; expiry dates are not shown)
oc get secret -n openshift-<operator-name> -o jsonpath='{.items[*].metadata.name}'
# Approve pending CSRs if any
oc get csr | grep Pending
oc adm certificate approve <csr-name>
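Listing the secrets only shows their names. To read the expiry date of a certificate, the certificate has to be decoded. The sketch below assumes the secret stores its certificate under the tls.crt key (as kubernetes.io/tls secrets do) and that `openssl` is installed locally; the function name `cert_end_date` is hypothetical.

```shell
# Print the notAfter date of the certificate stored in a TLS secret.
# Usage: cert_end_date <secret-name> <namespace>
cert_end_date() {
  oc get secret "$1" -n "$2" -o jsonpath='{.data.tls\.crt}' |
    base64 -d |
    openssl x509 -noout -enddate
}
```

If the secret holds a certificate chain, `openssl x509` reads only the first certificate, so the date shown is for the leaf certificate.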
# Check what is blocking the upgrade
oc get clusteroperator <operator-name> -o json | jq '.status.conditions[] | select(.type=="Upgradeable")'
# Review operator logs for upgrade blockers
oc logs -n openshift-<operator-name> deployment/<operator-name> --tail=200 | grep -i upgrade
# Confirm all operators are Available and not Degraded
oc get clusteroperators
# Verify no operators are Degraded or unavailable (no output means all are healthy)
oc get clusteroperators -o json | jq -r '.items[] | select(any(.status.conditions[]; (.type=="Available" and .status!="True") or (.type=="Degraded" and .status=="True"))) | .metadata.name'
# Verify no operators are Progressing or not Upgradeable
oc get clusteroperators -o json | jq -r '.items[] | select(any(.status.conditions[]; (.type=="Progressing" and .status=="True") or (.type=="Upgradeable" and .status=="False"))) | .metadata.name'