
K8s ‐ Verify internal image registry

yogeshahiray edited this page Apr 16, 2026 · 1 revision

K8s - Verify internal image registry is configured and available

Description

This rule validates that the OpenShift internal image registry is properly configured and available for use. The internal image registry is a built-in container image registry that runs within the OpenShift cluster, allowing users to push and pull container images without requiring an external registry.

The rule performs a two-step validation:

  1. Prerequisite check: Verifies that the image registry management state is set to "Managed" (not "Removed" or "Unmanaged")
  2. Availability check: If managed, ensures that all registry pods in the openshift-image-registry namespace are running and ready
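The two-step validation above can be sketched in POSIX shell. This is a minimal illustration under stated assumptions, not the rule's actual implementation. It assumes an authenticated oc on PATH, and that registry pods carry the docker-registry=default label used in the Diagnostics section. The function names are hypothetical.

```shell
#!/bin/sh
# Minimal sketch of the rule's two-step check; function names are hypothetical.
# Assumes an authenticated `oc` on PATH with read access to the registry
# config and the openshift-image-registry namespace.

REGISTRY_NS=openshift-image-registry

# Step 1 (prerequisite): succeed only if managementState is "Managed".
registry_is_managed() {
  state=$(oc get config.imageregistry.operator.openshift.io cluster \
    -o jsonpath='{.spec.managementState}') || return 2
  echo "managementState: ${state:-<unset>}"
  [ "$state" = "Managed" ]
}

# Step 2 (availability): succeed only if at least one registry pod exists
# and every pod is Running with all of its containers ready.
registry_pods_ready() {
  pods=$(oc get pods -n "$REGISTRY_NS" -l docker-registry=default \
    -o jsonpath='{range .items[*]}{.metadata.name} {.status.phase} {.status.containerStatuses[*].ready}{"\n"}{end}') || return 2
  if [ -z "$pods" ]; then
    echo "no image-registry pods found in $REGISTRY_NS"
    return 1
  fi
  rc=0
  while read -r name phase ready; do
    [ -n "$name" ] || continue
    case "$phase:$ready" in
      Running:*false*|Running:) echo "not ready: $name"; rc=1 ;;
      Running:*) ;;
      *) echo "not running: $name ($phase)"; rc=1 ;;
    esac
  done <<EOF
$pods
EOF
  return "$rc"
}

# Run step 2 only when step 1 passes, mirroring the rule's ordering.
check_internal_registry() {
  if ! registry_is_managed; then
    echo "FAIL: image registry is not in Managed state"
    return 1
  fi
  if ! registry_pods_ready; then
    echo "FAIL: image registry pods are not all running and ready"
    return 1
  fi
  echo "PASS: internal image registry is managed and available"
}
```

Source the file and run check_internal_registry. A non-zero exit status means one of the two steps failed, and the printed FAIL line says which one.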

The internal registry is a critical component for CI/CD pipelines, image builds, and development workflows within OpenShift.

Prerequisites

  • The oc command-line tool configured and authenticated
  • Access to view image registry configuration and pods
  • Cluster administrator privileges for registry configuration changes
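You can check whether the current identity meets these prerequisites with oc auth can-i before running anything else. The sketch below assumes configs.imageregistry.operator.openshift.io as the plural resource name behind the config.imageregistry.operator.openshift.io cluster object. registry_preflight is a hypothetical helper name.

```shell
#!/bin/sh
# Preflight sketch (function name hypothetical): ask the API server whether
# the current identity may perform the reads this rule needs and the patch
# used in the Solution section. `oc auth can-i` prints yes/no and exits
# non-zero on "no".

registry_preflight() {
  rc=0
  for check in \
    "get configs.imageregistry.operator.openshift.io" \
    "get pods -n openshift-image-registry" \
    "patch configs.imageregistry.operator.openshift.io"
  do
    # $check is left unquoted on purpose so it splits into separate arguments.
    if answer=$(oc auth can-i $check 2>/dev/null) && [ "$answer" = "yes" ]; then
      echo "ok:      $check"
    else
      echo "missing: $check"
      rc=1
    fi
  done
  return "$rc"
}
```

A missing patch permission only matters for the Solution steps. The two read checks are what the validation itself needs.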

Impact

An unavailable or misconfigured internal image registry can lead to:

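  • Deployment failures: Applications configured to pull images from the internal registry cannot deploy
  • Build failures: Builds that push their output images to the internal registry cannot complete
  • Template deployment failures: Templates referencing internal registry images cannot instantiate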

Root Cause

Common scenarios that may cause the internal registry to be unavailable include:

  1. Management state configuration

    • Registry intentionally set to "Removed" state
    • Registry set to "Unmanaged" state (for external registry use)
    • Fresh cluster installation without registry configuration
    • Registry disabled during cluster upgrade or maintenance
  2. Storage configuration issues

    • No storage backend configured for registry
    • PersistentVolumeClaim (PVC) binding failures
    • StorageClass unavailable or misconfigured
    • Insufficient storage capacity
    • Storage backend (S3, Azure Blob, GCS, Swift, etc.) unavailable
    • Incorrect storage credentials or permissions
  3. Pod failures

    • Registry pods crashing or failing to start
    • Failed readiness or liveness probes
    • Container image pull failures
    • Insufficient node resources to schedule registry pods
    • Configuration errors in registry deployment
  4. Resource constraints

    • Insufficient CPU or memory on nodes
    • Registry pods evicted due to resource pressure
    • Node capacity exhausted
    • Resource quotas blocking pod creation

Diagnostics

1. Check image registry management state

# Get image registry configuration
oc get config.imageregistry.operator.openshift.io cluster -o yaml

# Check management state specifically
oc get config.imageregistry.operator.openshift.io cluster -o jsonpath='{.spec.managementState}'

# Should return: Managed
# Other values: Removed, Unmanaged

2. Check registry pod status

# List all pods in the registry namespace
oc get pods -n openshift-image-registry

# Look for image-registry pods specifically
oc get pods -n openshift-image-registry -l docker-registry=default

# Check pod details
oc describe pod -n openshift-image-registry -l docker-registry=default
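oc describe prints every pod in full. You can filter a jsonpath query in the shell to list only the problem pods, along with the reason each one is not ready. Typical reasons are a waiting state such as CrashLoopBackOff or ImagePullBackOff, or a scheduler message such as insufficient memory. This is a sketch; registry_pod_problems is a hypothetical name.

```shell
#!/bin/sh
# Sketch (function name hypothetical): print one line per registry pod that is
# not Running or has a container waiting, with the waiting reason, the last
# termination reason, and the scheduler message if the pod is unscheduled.

registry_pod_problems() {
  oc get pods -n openshift-image-registry -l docker-registry=default \
    -o jsonpath='{range .items[*]}{.metadata.name}{"|"}{.status.phase}{"|"}{.status.containerStatuses[*].state.waiting.reason}{"|"}{.status.containerStatuses[*].lastState.terminated.reason}{"|"}{.status.conditions[?(@.type=="PodScheduled")].message}{"\n"}{end}' |
  while IFS='|' read -r name phase waiting terminated sched; do
    [ -n "$name" ] || continue
    # Healthy pods (Running, no container waiting) are skipped.
    if [ "$phase" = "Running" ] && [ -z "$waiting" ]; then
      continue
    fi
    echo "$name phase=$phase waiting=${waiting:-none} lastTerminated=${terminated:-none} scheduling=${sched:-ok}"
  done
}
```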

3. Check registry pod logs

# View logs from a registry pod (for a deployment, oc reads only one of its pods)
oc logs -n openshift-image-registry deployment/image-registry

# Check for errors
oc logs -n openshift-image-registry deployment/image-registry | grep -iE "error|fail|fatal"

# Follow logs in real-time
oc logs -n openshift-image-registry deployment/image-registry -f
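Because oc logs deployment/image-registry reads a single pod, errors on other replicas can be missed. The sketch below counts matching lines in every registry pod over a time window. The function name and the one-hour default window are illustrative assumptions.

```shell
#!/bin/sh
# Sketch (function name hypothetical): count log lines matching
# error|fail|fatal in every registry pod, not just one pod of the deployment.

REGISTRY_NS=openshift-image-registry

registry_log_error_counts() {
  since=${1:-1h}   # passed to oc logs --since; 1h is an arbitrary default
  for pod in $(oc get pods -n "$REGISTRY_NS" -l docker-registry=default -o name); do
    # grep -c prints 0 (exiting 1) when nothing matches; the count is kept either way.
    count=$(oc logs -n "$REGISTRY_NS" --since="$since" "$pod" 2>/dev/null | grep -ciE 'error|fail|fatal')
    echo "$pod $count"
  done
}
```

For example, registry_log_error_counts 24h prints one "pod/<name> <count>" line per pod. A pod with a markedly higher count is the one to follow with oc logs.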

4. Verify registry configuration

# Get complete registry configuration
oc get config.imageregistry.operator.openshift.io cluster -o yaml

# Check replica count
oc get config.imageregistry.operator.openshift.io cluster -o jsonpath='{.spec.replicas}'

# Check rollout strategy
oc get config.imageregistry.operator.openshift.io cluster -o jsonpath='{.spec.rolloutStrategy}'
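Replica count and rollout strategy should be read together with the storage access mode. For ReadWriteOnce registry storage, OpenShift's guidance is a single replica with the Recreate rollout strategy. The sketch below cross-checks these fields. The function name is hypothetical. When .spec.storage.pvc.claim is empty, it assumes the claim name is image-registry-storage, the name the operator uses when it creates the claim.

```shell
#!/bin/sh
# Sketch (function name hypothetical): compare spec.replicas with the ready
# replicas of deployment/image-registry and, for PVC storage, flag a volume
# that is not ReadWriteMany while replicas > 1 or rolloutStrategy != Recreate.
# image-registry-storage is assumed as the claim name when spec leaves it empty.

REGISTRY_NS=openshift-image-registry
CFG=config.imageregistry.operator.openshift.io/cluster

registry_replica_check() {
  rc=0
  want=$(oc get "$CFG" -o jsonpath='{.spec.replicas}')
  strategy=$(oc get "$CFG" -o jsonpath='{.spec.rolloutStrategy}')
  ready=$(oc get deployment/image-registry -n "$REGISTRY_NS" \
    -o jsonpath='{.status.readyReplicas}' 2>/dev/null)
  echo "replicas=${want:-unset} ready=${ready:-0} rolloutStrategy=${strategy:-unset}"
  if [ "${ready:-0}" != "$want" ]; then
    echo "WARN: ready replicas do not match spec.replicas"
    rc=1
  fi
  # Only check access modes when the registry is configured for PVC storage.
  if [ -n "$(oc get "$CFG" -o jsonpath='{.spec.storage.pvc}')" ]; then
    claim=$(oc get "$CFG" -o jsonpath='{.spec.storage.pvc.claim}')
    claim=${claim:-image-registry-storage}
    modes=$(oc get pvc "$claim" -n "$REGISTRY_NS" -o jsonpath='{.spec.accessModes[*]}')
    case " $modes " in
      *" ReadWriteMany "*) ;;
      *)
        if [ "${want:-1}" -gt 1 ] || [ "$strategy" != "Recreate" ]; then
          echo "WARN: PVC $claim is not ReadWriteMany; use replicas=1 and rolloutStrategy=Recreate"
          rc=1
        fi ;;
    esac
  fi
  return "$rc"
}
```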

Solution

General troubleshooting steps:

  1. For registry in Removed or Unmanaged state:

    # Check current management state
    oc get config.imageregistry.operator.openshift.io cluster -o jsonpath='{.spec.managementState}'
    
    # Set management state to Managed
    oc patch config.imageregistry.operator.openshift.io cluster --type merge -p '{"spec":{"managementState":"Managed"}}'
    
    # Verify the change
    oc get config.imageregistry.operator.openshift.io cluster -o jsonpath='{.spec.managementState}'
  2. For missing storage configuration:

    # For empty storage configuration, set storage backend
    
    # Option 1: EmptyDir (for testing only, not for production; images are lost whenever the registry pod restarts)
    oc patch config.imageregistry.operator.openshift.io cluster --type merge -p '{"spec":{"storage":{"emptyDir":{}}}}'
    
    # Option 2: PVC (recommended for on-premises; an empty claim lets the operator create the image-registry-storage PVC)
    oc patch config.imageregistry.operator.openshift.io cluster --type merge -p '{"spec":{"storage":{"pvc":{"claim":""}}}}'
    
    # Option 3: S3 (for AWS)
    oc patch config.imageregistry.operator.openshift.io cluster --type merge -p '{"spec":{"storage":{"s3":{"bucket":"my-registry-bucket","region":"us-east-1"}}}}'
    
    # Option 4: Azure Blob
    oc patch config.imageregistry.operator.openshift.io cluster --type merge -p '{"spec":{"storage":{"azure":{"accountName":"myaccount","container":"registry"}}}}'
    
    # Option 5: GCS (for Google Cloud)
    oc patch config.imageregistry.operator.openshift.io cluster --type merge -p '{"spec":{"storage":{"gcs":{"bucket":"my-registry-bucket"}}}}'
  3. For PVC binding issues:

    # Check PVC status
    oc get pvc -n openshift-image-registry
    
    # If PVC is pending, check StorageClass
    oc get storageclass
    
    # Set default StorageClass if needed
    oc patch storageclass <storage-class-name> -p '{"metadata":{"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'
    
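    # Delete and recreate PVC if corrupted (unless the PersistentVolume uses the Retain reclaim policy, this also discards all stored images)
    oc delete pvc -n openshift-image-registry image-registry-storage
    
    # Manually create PVC with specific StorageClass (ReadWriteMany is required when the registry runs more than one replica)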
    cat <<EOF | oc apply -f -
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: image-registry-storage
      namespace: openshift-image-registry
    spec:
      accessModes:
      - ReadWriteMany
      resources:
        requests:
          storage: 100Gi
      storageClassName: <storage-class-name>
    EOF
  4. For pod failures:

    # Check pod logs for errors
    oc logs -n openshift-image-registry deployment/image-registry
    
    # Describe pods to see events
    oc describe pod -n openshift-image-registry -l docker-registry=default
    
    # Delete problematic pods to force recreation
    oc delete pod -n openshift-image-registry -l docker-registry=default
    
    # Wait for new pods to start
    oc get pods -n openshift-image-registry -w
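For items 2 and 3, it helps to see which backend the configuration currently names and, for PVC storage, whether the claim is Bound. This is a sketch. The function name is hypothetical, and the backend list covers the keys used in the patches above plus swift. An empty claim is assumed to mean the operator-created image-registry-storage PVC.

```shell
#!/bin/sh
# Sketch (function name hypothetical): name the storage backend(s) set in
# .spec.storage and, for PVC storage, report the claim's phase.

REGISTRY_NS=openshift-image-registry
CFG=config.imageregistry.operator.openshift.io/cluster

registry_storage_summary() {
  found=""
  for backend in emptyDir pvc s3 azure gcs swift; do
    # jsonpath prints nothing for a key that is absent from .spec.storage.
    if [ -n "$(oc get "$CFG" -o jsonpath="{.spec.storage.$backend}")" ]; then
      found="$found $backend"
    fi
  done
  if [ -z "$found" ]; then
    echo "storage: none configured"
    return 1
  fi
  echo "storage:$found"
  case " $found " in
    *" pvc "*)
      claim=$(oc get "$CFG" -o jsonpath='{.spec.storage.pvc.claim}')
      claim=${claim:-image-registry-storage}
      phase=$(oc get pvc "$claim" -n "$REGISTRY_NS" -o jsonpath='{.status.phase}' 2>/dev/null)
      echo "pvc $claim: ${phase:-not found}"
      # A Pending or missing claim means the registry pods cannot mount storage.
      [ "$phase" = "Bound" ] || return 1 ;;
  esac
}
```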

Complete registry reconfiguration:

# For a fresh start, remove and recreate registry configuration

# 1. Set to Removed
oc patch config.imageregistry.operator.openshift.io cluster --type merge -p '{"spec":{"managementState":"Removed"}}'

# 2. Wait for the image-registry pods to be deleted (-w keeps watching; press Ctrl+C once they are gone)
oc get pods -n openshift-image-registry -w

# 3. Configure storage and set to Managed
oc patch config.imageregistry.operator.openshift.io cluster --type merge -p '{"spec":{"managementState":"Managed","storage":{"pvc":{"claim":""}}}}'

# 4. Verify the registry comes up (wait until a new image-registry pod is Running and ready, then press Ctrl+C)
oc get pods -n openshift-image-registry -w
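The -w watches above run until interrupted. For scripted use, bounded waits can replace them. This is a sketch. The function names, the 300-second default, and the 5-second poll interval are arbitrary assumptions.

```shell
#!/bin/sh
# Sketch (function names hypothetical): bounded waits instead of `-w` watches.
# Timeouts are in seconds; pass a positive value.

REGISTRY_NS=openshift-image-registry

# After setting Removed: return once no pods with the registry label remain.
wait_registry_pods_gone() {
  timeout=${1:-300}
  waited=0
  while :; do
    pods=$(oc get pods -n "$REGISTRY_NS" -l docker-registry=default -o name) || return 1
    [ -z "$pods" ] && return 0
    if [ "$waited" -ge "$timeout" ]; then
      echo "timed out waiting for registry pods to be deleted"
      return 1
    fi
    sleep 5
    waited=$((waited + 5))
  done
}

# After setting Managed: wait for the operator to recreate
# deployment/image-registry, then let `oc rollout status` wait for its pods.
wait_registry_rollout() {
  timeout=${1:-300}
  waited=0
  until oc get deployment/image-registry -n "$REGISTRY_NS" >/dev/null 2>&1; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "timed out waiting for deployment/image-registry"
      return 1
    fi
    sleep 5
    waited=$((waited + 5))
  done
  oc rollout status deployment/image-registry -n "$REGISTRY_NS" --timeout="${timeout}s"
}
```

For example, call wait_registry_pods_gone after step 1 and wait_registry_rollout 600 after step 3.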

Verify the fix:

# Check management state
oc get config.imageregistry.operator.openshift.io cluster -o jsonpath='{.spec.managementState}'
# Should return: Managed

# Check operator status
oc get clusteroperator image-registry
# Expect AVAILABLE=True, PROGRESSING=False, DEGRADED=False

# Check registry pods are running
oc get pods -n openshift-image-registry
# Should show image-registry pods in Running state with READY 1/1

# Check storage is configured
oc get config.imageregistry.operator.openshift.io cluster -o jsonpath='{.spec.storage}'
# Should show storage backend configuration

# Test registry functionality
oc get imagestreams -n openshift
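# Should list imagestreams without errors (this confirms the API responds; pushing and pulling an image is a fuller test of the registry itself)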
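The ClusterOperator expectation can be turned into a pass/fail check using jsonpath filters on .status.conditions. This is a sketch with a hypothetical function name. It complements the management-state and pod checks rather than replacing them.

```shell
#!/bin/sh
# Sketch (function names hypothetical): check the image-registry
# ClusterOperator for Available=True, Progressing=False, Degraded=False.

# Print the status ("True"/"False"/"Unknown") of one condition type.
co_condition() {
  oc get clusteroperator image-registry \
    -o jsonpath="{.status.conditions[?(@.type==\"$1\")].status}"
}

verify_registry_operator() {
  rc=0
  for pair in Available=True Progressing=False Degraded=False; do
    cond=${pair%%=*}   # text before "="
    want=${pair#*=}    # text after "="
    got=$(co_condition "$cond")
    if [ "$got" = "$want" ]; then
      echo "ok:   $cond=$got"
    else
      echo "fail: $cond=${got:-unknown} (expected $want)"
      rc=1
    fi
  done
  return "$rc"
}
```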
