K8s ‐ Verify deployments availability
yogeshahiray edited this page Apr 16, 2026 · 1 revision
This rule validates that every deployment in every namespace is available and ready. A deployment is flagged as unavailable if it has no Available condition, or if the status of its Available condition is anything other than True.
The rule queries all deployment objects in the cluster and examines their status conditions to identify any deployments that are not in a healthy, available state.
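The check described above can be approximated without jq. The following is a minimal sketch, not the rule's actual implementation: the function name `list_unavailable_deployments` is hypothetical, and it assumes `oc` prints an empty field when the Available condition is missing.

```shell
# Hypothetical helper: print "namespace/name" for every deployment whose
# Available condition is missing or has a status other than "True".
list_unavailable_deployments() {
  oc get deployments --all-namespaces \
    -o jsonpath='{range .items[*]}{.metadata.namespace}{"\t"}{.metadata.name}{"\t"}{.status.conditions[?(@.type=="Available")].status}{"\n"}{end}' |
    # Field 3 is the Available status; it is empty when the condition is absent.
    awk -F'\t' '$3 != "True" { print $1 "/" $2 }'
}
```

Calling `list_unavailable_deployments` with no arguments prints one line per flagged deployment and nothing when all deployments are available.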
- Access to the OpenShift cluster with permissions to list deployments across all namespaces
- The oc command-line tool, configured and authenticated
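Both prerequisites can be checked before running the rule. This is a sketch under the assumption that `oc auth can-i` prints `yes` when the permission is granted; the function name `check_prerequisites` is hypothetical.

```shell
# Hypothetical helper: confirm that oc is installed, that the session is
# authenticated, and that deployments can be listed in all namespaces.
check_prerequisites() {
  if ! command -v oc >/dev/null 2>&1; then
    echo "oc not found in PATH" >&2
    return 1
  fi
  if ! oc whoami >/dev/null 2>&1; then
    echo "not logged in to the cluster (try: oc login)" >&2
    return 1
  fi
  if [ "$(oc auth can-i list deployments --all-namespaces 2>/dev/null)" != "yes" ]; then
    echo "current user cannot list deployments across all namespaces" >&2
    return 1
  fi
  echo "prerequisites OK"
}
```

A non-zero return code indicates which prerequisite is missing through the message on standard error.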
Unavailable deployments can lead to:
- Application downtime: Services may become inaccessible to users
- Degraded performance: Reduced capacity if some replicas are unavailable
- Loss of redundancy: High availability guarantees may be compromised
Common scenarios that may cause deployments to become unavailable include:
- Resource constraints
  - Insufficient CPU or memory resources on nodes
  - Resource quota limits exceeded in the namespace
  - Pod eviction due to resource pressure
- Image-related issues
  - Image pull failures (invalid image name, authentication issues, registry unavailable)
  - Missing or deleted container images
  - Incompatible image architecture
- Configuration errors
  - Invalid environment variables or configuration maps
  - Missing secrets required by the deployment
  - Incorrect volume mount configurations
  - Invalid container command or arguments
- Pod failures
  - Application crashes or startup failures
  - Failed readiness or liveness probes
  - Init container failures
  - Persistent volume claim (PVC) binding issues
- Node issues
  - Node failures or evictions
  - Taints/tolerations preventing pod scheduling
  - Affinity/anti-affinity rules blocking placement
- Network problems
  - DNS resolution failures
  - Network policy blocking traffic
  - Service connectivity issues
- Rollout issues
  - Failed deployment updates
  - Rollout stuck in progress
  - Insufficient replicas during rolling update
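Several of the scenarios above show up as a container waiting reason on the affected pods. The sketch below maps a few well-known reasons to the categories in the list. The function name `classify_waiting_pods` and the category labels are illustrative assumptions, and the mapping is not exhaustive.

```shell
# Hypothetical helper: for each pod in a namespace, print the pod name, a
# rough cause category, and the container waiting reason(s) reported by oc.
# Usage: classify_waiting_pods <namespace>
classify_waiting_pods() {
  oc get pods -n "$1" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.containerStatuses[*].state.waiting.reason}{"\n"}{end}' |
    awk -F'\t' '
      $2 == "" { next }  # no waiting containers: nothing to report
      $2 ~ /ImagePullBackOff|ErrImagePull|InvalidImageName/ { print $1 ": image-related (" $2 ")"; next }
      $2 ~ /CrashLoopBackOff/ { print $1 ": pod failure (" $2 ")"; next }
      $2 ~ /CreateContainerConfigError/ { print $1 ": configuration error (" $2 ")"; next }
      { print $1 ": other (" $2 ")" }
    '
}
```

Pods stuck in Pending because of scheduling or quota problems usually have no container statuses yet, so they do not appear in this output; the pod events are the better source for those cases.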
# List all deployments with their availability status
oc get deployments --all-namespaces
# List deployments whose Available condition is missing or not True (requires jq)
oc get deployments --all-namespaces -o json | jq '.items[] | select(any(.status.conditions[]?; .type=="Available" and .status=="True") | not) | {namespace: .metadata.namespace, name: .metadata.name, conditions: .status.conditions}'

# Replace <namespace> and <deployment-name> with actual values
oc describe deployment <deployment-name> -n <namespace>

Look for:
- Conditions section (especially Available, Progressing, ReplicaFailure)
- Events showing errors or warnings
- Replica status (desired vs ready vs available)
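The replica comparison can also be read directly from the deployment object rather than from the describe output. This is a minimal sketch; the function name `replica_summary` is hypothetical. Kubernetes omits `readyReplicas` and `availableReplicas` from the status when they are zero, so empty fields are printed as 0.

```shell
# Hypothetical helper: print desired, ready, and available replica counts.
# Usage: replica_summary <namespace> <deployment-name>
replica_summary() {
  oc get deployment "$2" -n "$1" \
    -o jsonpath='{.spec.replicas}{"|"}{.status.readyReplicas}{"|"}{.status.availableReplicas}{"\n"}' |
    # awk converts an empty field to 0 when formatting it with %d.
    awk -F'|' '{ printf "desired=%d ready=%d available=%d\n", $1, $2, $3 }'
}
```

A deployment is healthy when all three numbers match; a lower ready or available count points at the pods to inspect next.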
# List pods for the deployment
oc get pods -n <namespace> -l app=<deployment-label>
# Describe problematic pods
oc describe pod <pod-name> -n <namespace>
# Check current pod logs
oc logs <pod-name> -n <namespace>
# Check previous pod logs (if pod crashed)
oc logs <pod-name> -n <namespace> --previous
# View recent events in the namespace
oc get events -n <namespace> --sort-by='.lastTimestamp'
# Check node resources
oc adm top nodes
# Check pod resource requests/limits
oc describe deployment <deployment-name> -n <namespace> | grep -A 5 "Limits\|Requests"
- For image pull errors:
    # Verify the image exists and is accessible
    oc describe deployment <deployment-name> -n <namespace> | grep Image
    # Check image pull secrets
    oc get secrets -n <namespace>
    # Update deployment with correct image or credentials
    oc set image deployment/<deployment-name> <container-name>=<new-image>:<tag> -n <namespace>
- For resource constraints:
    # Check resource quotas
    oc describe resourcequota -n <namespace>
    # Check limit ranges
    oc describe limitrange -n <namespace>
    # Adjust resource requests/limits if needed
    oc set resources deployment/<deployment-name> --limits=cpu=500m,memory=512Mi --requests=cpu=250m,memory=256Mi -n <namespace>
- For configuration errors:
    # Verify ConfigMaps exist
    oc get configmaps -n <namespace>
    # Verify Secrets exist
    oc get secrets -n <namespace>
    # Edit deployment to fix configuration
    oc edit deployment <deployment-name> -n <namespace>
- For failed readiness/liveness probes:
    # Check probe configuration
    oc get deployment <deployment-name> -n <namespace> -o yaml | grep -A 10 "livenessProbe\|readinessProbe"
    # Adjust probe timing or endpoints as needed
    oc edit deployment <deployment-name> -n <namespace>
- For rollout issues:
    # Check rollout status
    oc rollout status deployment/<deployment-name> -n <namespace>
    # Rollback to previous version if needed
    oc rollout undo deployment/<deployment-name> -n <namespace>
    # Pause rollout to investigate
    oc rollout pause deployment/<deployment-name> -n <namespace>
- For node scheduling issues:
    # Check node status
    oc get nodes
    # Check pod scheduling events
    oc describe pod <pod-name> -n <namespace> | grep -A 10 Events
    # Remove taints if blocking (use with caution); the trailing "-" removes the taint
    oc adm taint nodes <node-name> <taint-key>-
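For the rollout case, the status check and the rollback can be combined so that a stuck rollout does not block indefinitely. This is one possible sketch, not a recommended policy: the function name `rollout_or_undo` and the 120s default are assumptions, and rolling back automatically may not be appropriate for every workload.

```shell
# Hypothetical helper: wait for a rollout to finish and roll back if it does
# not complete within the timeout.
# Usage: rollout_or_undo <namespace> <deployment-name> [timeout, default 120s]
rollout_or_undo() {
  ns="$1"
  name="$2"
  timeout="${3:-120s}"
  if oc rollout status "deployment/$name" -n "$ns" --timeout="$timeout"; then
    echo "rollout of $ns/$name complete"
  else
    echo "rollout of $ns/$name did not complete within $timeout; rolling back" >&2
    oc rollout undo "deployment/$name" -n "$ns"
  fi
}
```

For example, `rollout_or_undo my-namespace my-app 300s` waits up to five minutes before reverting to the previous revision.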
# Check deployment status
oc get deployment <deployment-name> -n <namespace>
# Verify all replicas are ready
oc get pods -n <namespace> -l app=<deployment-label>
# Check deployment conditions
oc get deployment <deployment-name> -n <namespace> -o jsonpath='{.status.conditions[?(@.type=="Available")]}'
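After a fix, the Available condition may take a short while to turn True, so a blocking check can be more convenient than repeated polling. The sketch below relies on `oc wait`; the function name `wait_until_available` and the 60s default timeout are assumptions.

```shell
# Hypothetical helper: block until the deployment reports Available=True,
# or fail once the timeout expires.
# Usage: wait_until_available <namespace> <deployment-name> [timeout, default 60s]
wait_until_available() {
  ns="$1"
  name="$2"
  timeout="${3:-60s}"
  if oc wait "deployment/$name" -n "$ns" --for=condition=Available --timeout="$timeout" >/dev/null 2>&1; then
    echo "$ns/$name is Available"
  else
    echo "$ns/$name is NOT Available after $timeout" >&2
    return 1
  fi
}
```

The return code makes the function usable in scripts, for example `wait_until_available my-namespace my-app 120s || exit 1`.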