K8s ‐ Verify deployments availability

K8s - Verify all deployments are available

Description

This rule checks that every deployment in every namespace is available. A deployment is flagged as unavailable if it has no Available condition, or if that condition's status is anything other than True.

The rule queries all deployment objects in the cluster and examines their status conditions to identify any deployments that are not in a healthy, available state.
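The check described above can be approximated from the command line. The sketch below is an illustration, not the rule's actual implementation: the function names list_available_status and filter_unavailable are hypothetical, and it assumes oc prints an empty status field when a deployment has no Available condition.

```shell
# Print one line per deployment: "<namespace> <name> <Available status>".
# Assumption: the third field is empty when no Available condition exists.
list_available_status() {
  oc get deployments --all-namespaces \
    -o jsonpath='{range .items[*]}{.metadata.namespace}{" "}{.metadata.name}{" "}{.status.conditions[?(@.type=="Available")].status}{"\n"}{end}'
}

# Read those lines on stdin and print "<namespace>/<name>" for every
# deployment whose Available status is missing or anything other than True.
filter_unavailable() {
  awk 'NF >= 2 && $3 != "True" { print $1 "/" $2 }'
}
```

Running list_available_status | filter_unavailable against a cluster should print nothing when every deployment is available.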

Prerequisites

Access to the OpenShift cluster with permissions to list deployments across all namespaces
The oc command-line tool configured and authenticated

Impact

Unavailable deployments can lead to:

Application downtime: Services may become inaccessible to users
Degraded performance: Reduced capacity if some replicas are unavailable
Loss of redundancy: High availability guarantees may be compromised

Root Cause

Common scenarios that may cause deployments to become unavailable include:

Resource constraints
- Insufficient CPU or memory resources on nodes
- Resource quota limits exceeded in the namespace
- Pod eviction due to resource pressure
Image-related issues
- Image pull failures (invalid image name, authentication issues, registry unavailable)
- Missing or deleted container images
- Incompatible image architecture
Configuration errors
- Invalid environment variables or configuration maps
- Missing secrets required by the deployment
- Incorrect volume mount configurations
- Invalid container command or arguments
Pod failures
- Application crashes or startup failures
- Failed readiness or liveness probes
- Init container failures
- Persistent volume claim (PVC) binding issues
Node issues
- Node failures or evictions
- Taints/tolerations preventing pod scheduling
- Affinity/anti-affinity rules blocking placement
Network problems
- DNS resolution failures
- Network policy blocking traffic
- Service connectivity issues
Rollout issues
- Failed deployment updates
- Rollout stuck in progress
- Insufficient replicas during rolling update
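
When triaging, the reason shown in a pod's STATUS column or in an event often points to one of the groups above. The following is a rough heuristic only: classify_reason is a hypothetical helper, the mapping is not exhaustive, and a single reason (for example CrashLoopBackOff) can have more than one underlying cause.

```shell
# Map a pod status or event reason to one of the root-cause groups above.
# This is a heuristic starting point, not an official or complete table.
classify_reason() {
  case "$1" in
    ErrImagePull|ImagePullBackOff|InvalidImageName)
      echo "Image-related issues" ;;
    CreateContainerConfigError)
      echo "Configuration errors" ;;
    CrashLoopBackOff|Unhealthy)
      echo "Pod failures" ;;
    OOMKilled|Evicted)
      echo "Resource constraints" ;;
    FailedScheduling|Unschedulable)
      echo "Node issues or resource constraints" ;;
    ProgressDeadlineExceeded)
      echo "Rollout issues" ;;
    *)
      echo "Unclassified" ;;
  esac
}
```

For example, classify_reason ImagePullBackOff prints "Image-related issues", which suggests starting with the image pull steps in the Solution section.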

Diagnostics

1. Identify unavailable deployments

# List all deployments with their availability status
oc get deployments --all-namespaces

# Get detailed status in JSON format for deployments whose Available condition is missing or not True
oc get deployments --all-namespaces -o json | jq '.items[] | select(any(.status.conditions[]?; .type=="Available" and .status=="True") | not) | {namespace: .metadata.namespace, name: .metadata.name, conditions: .status.conditions}'

2. Describe the problematic deployment

# Replace <namespace> and <deployment-name> with actual values
oc describe deployment <deployment-name> -n <namespace>

Look for:

Conditions section (especially Available, Progressing, ReplicaFailure)
Events showing errors or warnings
Replica status (desired vs ready vs available)
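
The replica counts can also be compared without reading the full describe output. A minimal sketch, assuming a hypothetical helper named replica_gap that reads "desired,ready,available" on stdin; empty fields are treated as 0 because the API can omit counters that are zero.

```shell
# Read "desired,ready,available" on stdin and report whether the
# deployment is short of replicas. Empty fields count as 0.
replica_gap() {
  IFS=, read -r desired ready available
  desired=${desired:-0}
  ready=${ready:-0}
  available=${available:-0}
  if [ "$ready" -ge "$desired" ] && [ "$available" -ge "$desired" ]; then
    state=ok
  else
    state=short
  fi
  echo "$state: $ready/$desired ready, $available/$desired available"
}

# Usage (same placeholders as the commands above):
# oc get deployment <deployment-name> -n <namespace> \
#   -o jsonpath='{.spec.replicas},{.status.readyReplicas},{.status.availableReplicas}{"\n"}' | replica_gap
```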

3. Check pod status

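# List pods for the deployment (assumes the pods carry an app=<deployment-label> label; check the deployment's .spec.selector.matchLabels if unsure)
oc get pods -n <namespace> -l app=<deployment-label>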

# Describe problematic pods
oc describe pod <pod-name> -n <namespace>
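
In a namespace with many pods, it can help to filter the pod listing down to the ones worth describing. The sketch below assumes the default oc get pods table layout (NAME, READY, STATUS as the first three columns); not_ready_pods is a hypothetical helper name.

```shell
# Read the default "oc get pods" table on stdin and print the name and
# status of pods that are not fully ready. Completed pods are skipped
# because they are expected to show 0/N ready containers.
not_ready_pods() {
  awk 'NR > 1 {
    split($2, r, "/")
    if ($3 != "Completed" && (r[1] != r[2] || $3 != "Running")) print $1, $3
  }'
}

# Usage:
# oc get pods -n <namespace> | not_ready_pods
```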

4. Review pod logs

# Check current pod logs
oc logs <pod-name> -n <namespace>

# Check previous pod logs (if pod crashed)
oc logs <pod-name> -n <namespace> --previous
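
The two log commands can be combined so that the previous container's logs are shown when they exist and the current logs otherwise. A small sketch; pod_logs is a hypothetical helper, and it assumes oc logs --previous exits non-zero when there is no previous container instance.

```shell
# Print logs from the previous container instance if one exists,
# otherwise fall back to the logs of the current instance.
pod_logs() {
  pod=$1
  namespace=$2
  oc logs "$pod" -n "$namespace" --previous 2>/dev/null ||
    oc logs "$pod" -n "$namespace"
}

# Usage:
# pod_logs <pod-name> <namespace>
```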

5. Check events

# View recent events in the namespace
oc get events -n <namespace> --sort-by='.lastTimestamp'
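
Busy namespaces produce many Normal events, so narrowing the list to warnings is often quicker. A minimal sketch using the events field selector; warning_events is a hypothetical wrapper name.

```shell
# List only Warning events in a namespace, sorted by last timestamp.
warning_events() {
  namespace=$1
  oc get events -n "$namespace" --field-selector type=Warning \
    --sort-by='.lastTimestamp'
}

# Usage:
# warning_events <namespace>
```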

6. Check resource availability

# Check node resources
oc adm top nodes

# Check pod resource requests/limits
oc describe deployment <deployment-name> -n <namespace> | grep -A 5 "Limits\|Requests"
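
To spot nodes under pressure quickly, the node usage table can be filtered by a utilization threshold. The sketch below assumes the default oc adm top nodes column layout (NAME, CPU(cores), CPU%, MEMORY(bytes), MEMORY%); hot_nodes and the default threshold of 80 are illustrative choices, not values from this rule.

```shell
# Read the "oc adm top nodes" table on stdin and print nodes whose CPU%
# or MEMORY% is at or above the given threshold (default: 80).
# Non-numeric values such as <unknown> are treated as 0.
hot_nodes() {
  limit=${1:-80}
  awk -v limit="$limit" 'NR > 1 {
    cpu = $3; mem = $5
    sub(/%/, "", cpu); sub(/%/, "", mem)
    if (cpu + 0 >= limit + 0 || mem + 0 >= limit + 0) print $1, $3, $5
  }'
}

# Usage:
# oc adm top nodes | hot_nodes 80
```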

Solution

General troubleshooting steps:

For image pull errors:

# Show the image(s) the deployment references, then verify each exists and is accessible in its registry
oc describe deployment <deployment-name> -n <namespace> | grep Image

# Check image pull secrets
oc get secrets -n <namespace>

# Update the deployment with the correct image (fix pull-secret credentials separately if they are the cause)
oc set image deployment/<deployment-name> <container-name>=<new-image>:<tag> -n <namespace>

For resource constraints:

# Check resource quotas
oc describe resourcequota -n <namespace>

# Check limit ranges
oc describe limitrange -n <namespace>

# Adjust resource requests/limits if needed (the values shown are examples only)
oc set resources deployment/<deployment-name> --limits=cpu=500m,memory=512Mi --requests=cpu=250m,memory=256Mi -n <namespace>

For configuration errors:

# Verify ConfigMaps exist
oc get configmaps -n <namespace>

# Verify Secrets exist
oc get secrets -n <namespace>

# Edit deployment to fix configuration
oc edit deployment <deployment-name> -n <namespace>

For failed readiness/liveness probes:

# Check probe configuration
oc get deployment <deployment-name> -n <namespace> -o yaml | grep -A 10 "livenessProbe\|readinessProbe"

# Adjust probe timing or endpoints as needed
oc edit deployment <deployment-name> -n <namespace>

For rollout issues:

# Check rollout status
oc rollout status deployment/<deployment-name> -n <namespace>

# Rollback to previous version if needed
oc rollout undo deployment/<deployment-name> -n <namespace>

# Pause rollout to investigate (resume afterwards with: oc rollout resume deployment/<deployment-name> -n <namespace>)
oc rollout pause deployment/<deployment-name> -n <namespace>

For node scheduling issues:

# Check node status
oc get nodes

# Check pod scheduling events
oc describe pod <pod-name> -n <namespace> | grep -A 10 Events

# Remove taints if blocking (use with caution)
oc adm taint nodes <node-name> <taint-key>-

Verify the fix:

# Check deployment status
oc get deployment <deployment-name> -n <namespace>

# Verify all replicas are ready
oc get pods -n <namespace> -l app=<deployment-label>

# Check deployment conditions
oc get deployment <deployment-name> -n <namespace> -o jsonpath='{.status.conditions[?(@.type=="Available")]}'
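
Instead of re-running the checks by hand, oc wait can block until the Available condition becomes True or a timeout expires. A minimal sketch; wait_available is a hypothetical wrapper and the 120s default timeout is an arbitrary example.

```shell
# Block until the deployment reports condition Available=True,
# or exit non-zero once the timeout (default: 120s) is reached.
wait_available() {
  name=$1
  namespace=$2
  timeout=${3:-120s}
  oc wait --for=condition=Available "deployment/$name" \
    -n "$namespace" --timeout="$timeout"
}

# Usage:
# wait_available <deployment-name> <namespace> 300s
```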

Resources

OpenShift Deployments Documentation

