# Name: Luis Guillermo Cerda Sepúlveda

## Use case:
### In my case, as an SRE I have to manage multiple Kubernetes clusters that serve multiple environments. Occasionally, because of problems with the registry the images are pulled from, a worker going down, or a node that was not drained correctly, some pods can be left in a failed state, which causes problems for the services hosted in the cluster.

The following script cleans up these pods, either in bulk within a namespace or one by one, so that only the pods of interest are deleted.

To do this, the script imports the Python library <font color='green'>subprocess</font>, which runs console commands and captures their output so that the script can work with that information.
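
As a minimal illustration of that capture-and-process pattern, the sketch below runs a harmless child process and reads back its output. The child command (the current Python interpreter printing two made-up lines) is an assumption chosen so the example runs without a cluster; it stands in for a real console command such as kubectl.

```python
import subprocess
import sys

# Run a child process and capture its standard output as text.
# Assumption: the child is the current Python interpreter printing two
# made-up lines, used here only as a stand-in for a real console command.
command = [sys.executable, "-c", "print('pod-a Running'); print('pod-b Error')"]
output = subprocess.check_output(command, text=True)

# Work with the captured output line by line.
lines = output.splitlines()
for line in lines:
    name, status = line.split()
    print(f"{name} -> {status}")
```

Passing the command as a list of arguments, rather than as a single string with `shell=True`, keeps any user-supplied value from being interpreted by a shell.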

* The <font color='green'>kubectl</font> program must be installed and configured for the script to work.
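
One possible way to verify this prerequisite before running anything is sketched below. The helper name `kubectl_available` is hypothetical and not part of the original script. It only checks that the executable is on `PATH`; it does not verify that kubectl is configured to reach a cluster.

```python
import shutil

def kubectl_available(binary="kubectl"):
    """Return True if the given executable can be found on PATH.

    Hypothetical helper: it confirms the binary exists, not that it is
    configured with a working cluster context.
    """
    return shutil.which(binary) is not None

if not kubectl_available():
    print("kubectl was not found on PATH; install and configure it first.")
```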



In [20]:
import subprocess

# The user must type the namespace to check for dead pods
namespace = input("Input namespace to check for dead pods: ").strip()

# Function to get the dead pods (every pod whose status is not "Running")
def list_non_running_pods(namespace):

    # Build the kubectl command as an argument list so the namespace is never parsed by a shell
    list_pods_command = ["kubectl", "get", "pods", "-n", namespace, "--no-headers"]

    try:
        # Execute the kubectl command and capture its standard output
        pod_list_output = subprocess.check_output(list_pods_command, text=True)
    # Raised when the kubectl binary is not installed or not on PATH
    except FileNotFoundError as e:
        print(f"kubectl binary not found: {e}")
        return []
    # Raised when kubectl exits with a non-zero status
    except subprocess.CalledProcessError as e:
        print(f"Error executing kubectl command: {e}")
        return []

    # Initialize a list to store pod names to delete
    pods_to_delete = []

    # Iterate over the output lines; the columns are NAME, READY, STATUS, RESTARTS, AGE
    for line in pod_list_output.splitlines():
        columns = line.split()
        # Skip blank or incomplete lines (for example, when the namespace has no pods)
        if len(columns) < 3:
            continue
        pod_name = columns[0]
        pod_status = columns[2]

        # Any status other than "Running" is listed, including "Pending" and "Completed"
        if pod_status != "Running":
            pods_to_delete.append(pod_name)

            # Display pod information
            print(f"Pod Name: {pod_name}, Status: {pod_status}")

    return pods_to_delete

# Function to delete the dead pods, either all at once or one by one
def delete_pods(namespace, pods_to_delete):
    if not pods_to_delete:
        print("No pods to delete.")
        return

    # Ask for confirmation to delete all pods at once
    confirmation = input("Do you want to delete all non-running pods at once (y/n)? ").strip().lower()

    if confirmation == "y":
        try:
            # Pass the command as an argument list (no shell); check=True raises
            # CalledProcessError if kubectl exits with a non-zero status
            delete_command = ["kubectl", "delete", "pods", "-n", namespace, *pods_to_delete]
            subprocess.run(delete_command, check=True)

            print(f"Deleted {len(pods_to_delete)} pods in namespace {namespace}")
        except (FileNotFoundError, subprocess.CalledProcessError) as e:
            print(f"Error executing kubectl delete command: {e}")
    else:
        # Ask for confirmation to delete each pod individually
        for pod_name in pods_to_delete:
            confirmation = input(f"Do you want to delete pod {pod_name} (y/n)? ").strip().lower()
            if confirmation == "y":
                try:
                    # Delete the non-running pod using kubectl delete
                    delete_command = ["kubectl", "delete", "pod", pod_name, "-n", namespace]
                    subprocess.run(delete_command, check=True)

                    print(f"Deleted pod {pod_name} in namespace {namespace}")
                except (FileNotFoundError, subprocess.CalledProcessError) as e:
                    print(f"Error executing kubectl delete command: {e}")
            else:
                print(f"Skipped deletion of pod {pod_name}")

# pods_to_delete = list_non_running_pods(namespace)    # Left commented out here; the example cells call it so each run refreshes the failed pod list
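
The parsing step of `list_non_running_pods` can be exercised without a cluster by separating it from the kubectl call. The sketch below is one way to do that, under stated assumptions: the function name `parse_non_running` is hypothetical, and the sample text uses made-up pod names laid out like the `kubectl get pods --no-headers` columns (NAME, READY, STATUS, RESTARTS, AGE).

```python
def parse_non_running(pod_list_output):
    """Return (name, status) pairs for pods whose STATUS column is not 'Running'.

    Hypothetical helper: it only parses text and never calls kubectl.
    """
    non_running = []
    for line in pod_list_output.splitlines():
        columns = line.split()
        # Skip blank or incomplete lines instead of failing on a missing column.
        if len(columns) < 3:
            continue
        name, status = columns[0], columns[2]
        if status != "Running":
            non_running.append((name, status))
    return non_running

# Made-up sample shaped like `kubectl get pods --no-headers` output.
sample = """\
web-1   1/1   Running            0   5m
web-2   0/1   ImagePullBackOff   0   5m
web-3   0/1   ErrImagePull       0   5m
"""
result = parse_non_running(sample)
print(result)
```

Because the helper skips lines with fewer than three columns, an empty namespace (empty output) yields an empty list rather than an `IndexError`.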



Running the script to delete all pods in a failed state at once

In [21]:
pods_to_delete = list_non_running_pods(namespace)
delete_pods(namespace, pods_to_delete)

Pod Name: nginx-deployment-7c6cbf4cd4-62zrx, Status: ImagePullBackOff
Pod Name: nginx-deployment-7c6cbf4cd4-d6tf4, Status: ErrImagePull
Pod Name: nginx-deployment-7c6cbf4cd4-qcfv9, Status: ImagePullBackOff
pod "nginx-deployment-7c6cbf4cd4-62zrx" deleted
pod "nginx-deployment-7c6cbf4cd4-d6tf4" deleted
pod "nginx-deployment-7c6cbf4cd4-qcfv9" deleted
Deleted 3 pods in namespace default


Running the script to delete pods in a failed state selectively

In [22]:
pods_to_delete = list_non_running_pods(namespace)
delete_pods(namespace, pods_to_delete)

Pod Name: nginx-deployment-7c6cbf4cd4-4b8qs, Status: ErrImagePull
Pod Name: nginx-deployment-7c6cbf4cd4-s8s9d, Status: ErrImagePull
Pod Name: nginx-deployment-7c6cbf4cd4-w55zq, Status: ErrImagePull
pod "nginx-deployment-7c6cbf4cd4-4b8qs" deleted
Deleted pod nginx-deployment-7c6cbf4cd4-4b8qs in namespace default
pod "nginx-deployment-7c6cbf4cd4-s8s9d" deleted
Deleted pod nginx-deployment-7c6cbf4cd4-s8s9d in namespace default
Skipped deletion of pod nginx-deployment-7c6cbf4cd4-w55zq
