<img src="./images/logo.png" alt="Drawing" style="width: 500px;"/>

# Cleanup

Let’s put everything back the way it was!

## **1. Remove Presto Connectors**

1. Navigate back to the AI Essentials dashboard.
1. In the sidebar navigation menu, select `Data Engineering` > `Data Sources`.
1. Under `Structured Data`, click the three-dot menu next to the connector you want to remove.
1. Click `Remove`.

### 1. Remove the SQL connector
<img src="./images/exercise5/rm_sql.png" alt="Drawing" style="width: 35%;"/>

### 2. Remove the Delta Table connector
<img src="./images/exercise5/rm_delta.png" alt="Drawing" style="width: 35%;"/>

## **2. Clean Up the Remaining Resources**

<div class="alert alert-block alert-danger">
    <b>Important:</b> Set your <b>Username</b>, your <b>Domain</b> and your <b>Delta Path</b> here!
</div>

In [None]:
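USERNAME = ""
DOMAIN = ""
# The Delta tables for this exercise live under a per-user directory
DELTA_PATH = f"/mnt/shared/{USERNAME}-retail/delta-tables/"

# An empty username would produce resource names such as "-retail-postgres", which kubectl parses as a flag
assert USERNAME, "Set USERNAME before running the cleanup cells"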

Set the global variables required to run the exercise smoothly.

In [None]:
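# Global configuration
import os
import subprocess
from time import sleep


def get_namespace_from_service_account():
    """Read the pod's namespace from the service-account mount; fall back to 'default'."""
    try:
        with open('/var/run/secrets/kubernetes.io/serviceaccount/namespace', 'r') as f:
            return f.read().strip()
    except OSError:
        return 'default'


# The helper is defined above so this cell also works when the notebook runs top to bottom
NAMESPACE = get_namespace_from_service_account()
POSTGRES_PASSWORD = "postgres"
PG_SERVICE_NAME = f"{USERNAME}-retail-postgres"
PG_DATABASE_NAME = f"{USERNAME}-retail"
PVC_PREFIX = "postgres-pvc"  # Must match the prefix used when PostgreSQL was deployed

# Print the result
print("NAMESPACE:", NAMESPACE)
print("POSTGRES_PASSWORD:", POSTGRES_PASSWORD)
print("PG_SERVICE_NAME:", PG_SERVICE_NAME)
print("PG_DATABASE_NAME:", PG_DATABASE_NAME)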

## Cleanup Functions

Function that reads the Kubernetes namespace from the service account mount point.

In [None]:
def get_namespace_from_service_account():
    """
    Reads the Kubernetes namespace from the service account mount point.
    Returns 'default' if not running in a Kubernetes pod or if the file doesn't exist.
    """
    namespace_file = '/var/run/secrets/kubernetes.io/serviceaccount/namespace'
    try:
        with open(namespace_file, 'r') as f:
            return f.read().strip()
    except IOError:
        return 'default'

Function to clean up a PostgreSQL deployment on Kubernetes by deleting the deployment and service, waiting for termination, and verifying the deletion, while handling potential errors.

In [None]:
def clean_postgresql_deployment():
    """Clean up the PostgreSQL deployment and service from Kubernetes"""
    try:
        print("Starting PostgreSQL cleanup...")
        
        # 1. Delete the deployment (--ignore-not-found makes this a no-op if it is already gone)
        print(f"Deleting deployment {PG_SERVICE_NAME}...")
        subprocess.run(
            ["kubectl", "delete", "deployment", PG_SERVICE_NAME, "-n", NAMESPACE, "--ignore-not-found"],
            check=True, stderr=subprocess.PIPE, text=True
        )
        
        # 2. Delete the service
        print(f"Deleting service {PG_SERVICE_NAME}...")
        subprocess.run(
            ["kubectl", "delete", "service", PG_SERVICE_NAME, "-n", NAMESPACE, "--ignore-not-found"],
            check=True, stderr=subprocess.PIPE, text=True
        )
        
        # 3. Wait for resources to be deleted
        print("Waiting for resources to terminate...")
        sleep(10)  # Give Kubernetes time to clean up
        
        # 4. Verify deletion
        print("Verifying deletion...")
        verify_deletion()
        
        print("\nPostgreSQL deployment cleaned up successfully!")
        
    except subprocess.CalledProcessError as e:
        # stderr is captured as text, so it is already a str (no .decode() needed)
        print(f"Error during cleanup: {e.stderr}")
        raise
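
The fixed `sleep(10)` above may be too short on a busy cluster and needlessly long on an idle one. Below is a minimal sketch of polling instead. It uses a hypothetical helper, `wait_until`, which is not part of the exercise code: the helper re-evaluates a condition until it holds or a timeout expires. kubectl also has a built-in alternative, `kubectl wait --for=delete <kind>/<name> --timeout=60s`.

```python
import time
from typing import Callable


def wait_until(condition: Callable[[], bool], timeout: float = 60.0, interval: float = 2.0) -> bool:
    """Poll `condition` until it returns True or `timeout` seconds elapse.

    Returns True if the condition was met, False if the timeout was reached first.
    """
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Example (assumes the globals from the configuration cell):
# gone = wait_until(lambda: subprocess.run(
#     ["kubectl", "get", "deployment", PG_SERVICE_NAME, "-n", NAMESPACE,
#      "--ignore-not-found", "-o", "name"],
#     capture_output=True, text=True).stdout.strip() == "")
```

Using `time.monotonic()` keeps the timeout correct even if the system clock is adjusted while waiting.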

Function to verify the deletion of a PostgreSQL deployment by checking for the existence of its deployment, service, and pods in Kubernetes, and issuing warnings if any resources are still found.

In [None]:
def verify_deletion():
    """Verify that all resources have been deleted"""
    try:
        # Check deployment
        result = subprocess.run(
            f"kubectl get deployment {PG_SERVICE_NAME} -n {NAMESPACE}",
            shell=True, capture_output=True, text=True
        )
        if "NotFound" not in result.stderr:
            print("Warning: Deployment might still exist")
        
        # Check service
        result = subprocess.run(
            f"kubectl get service {PG_SERVICE_NAME} -n {NAMESPACE}",
            shell=True, capture_output=True, text=True
        )
        if "NotFound" not in result.stderr:
            print("Warning: Service might still exist")
        
        # Check pods
        result = subprocess.run(
            f"kubectl get pods -n {NAMESPACE} -l app={PG_SERVICE_NAME}",
            shell=True, capture_output=True, text=True
        )
        # Recent kubectl versions print "No resources found" to stderr, so check stdout for pod rows instead
        if result.stdout.strip():
            print("Warning: Pods might still exist")
    
    except Exception as e:
        print(f"Verification error: {e}")
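
The checks above depend on the wording of kubectl's messages and on which stream kubectl writes them to. A sturdier check can rely on two flags instead. With `--ignore-not-found`, `kubectl get` exits 0 and prints nothing when no object matches. With `-o name`, it prints one `<kind>/<name>` line per object. An empty result therefore means the resource is gone.

The sketch below uses a hypothetical helper, `resource_names`. Its `run` parameter is an assumption added so the helper can be exercised without a cluster.

```python
import subprocess
from typing import Callable, List


def resource_names(args: List[str], namespace: str,
                   run: Callable[..., subprocess.CompletedProcess] = subprocess.run) -> List[str]:
    """Return the names of the objects matched by `kubectl get <args>` in `namespace`.

    --ignore-not-found makes a missing object exit 0 with empty output, and
    -o name prints one "<kind>/<name>" line per object.
    """
    result = run(
        ["kubectl", "get", *args, "-n", namespace, "--ignore-not-found", "-o", "name"],
        capture_output=True, text=True, check=True,
    )
    # Strip the "<kind>/" prefix and skip blank lines
    return [line.split("/", 1)[-1] for line in result.stdout.splitlines() if line.strip()]


# Example use inside verify_deletion():
# if resource_names(["pods", "-l", f"app={PG_SERVICE_NAME}"], NAMESPACE):
#     print("Warning: Pods might still exist")
```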

Function to clean up persistent data in Kubernetes by finding and deleting Persistent Volume Claims (PVCs) and their associated Persistent Volumes (PVs), ensuring proper resource deletion and handling errors.

In [None]:
def delete_persistent_data():
    """Delete persistent volumes and claims"""
    try:
        print("\nStarting persistent data cleanup...")
        
        # 1. Find and delete all PVCs with the matching prefix
        print(f"Looking for PVCs with prefix '{PVC_PREFIX}'...")
        result = subprocess.run(
            f"kubectl get pvc -n {NAMESPACE} --no-headers -o custom-columns=':metadata.name' | grep '^{PVC_PREFIX}'",
            shell=True, capture_output=True, text=True
        )
        
        pvcs = result.stdout.splitlines()
        if pvcs:
            print(f"Found {len(pvcs)} PVC(s) to delete:")
            for pvc in pvcs:
                print(f"- {pvc}")
            print("\nDeleting PVCs...")
            subprocess.run(
                f"kubectl delete pvc {' '.join(pvcs)} -n {NAMESPACE}",
                shell=True, check=True, stderr=subprocess.PIPE, text=True
            )
        else:
            print("No PVCs found matching the prefix")
        
        # 2. Wait for PVCs to be deleted
        print("Waiting for PVCs to be deleted...")
        sleep(10)
        
        # 3. Find and delete associated PVs (if not automatically deleted)
        print("Checking for associated PVs...")
        result = subprocess.run(
            f"kubectl get pv --no-headers -o custom-columns=':metadata.name,:spec.claimRef.name' | grep {PVC_PREFIX} | awk '{{print $1}}'",
            shell=True, capture_output=True, text=True
        )
        
        pvs = result.stdout.splitlines()
        if pvs:
            print(f"Found {len(pvs)} PV(s) to delete:")
            for pv in pvs:
                print(f"- {pv}")
            print("\nDeleting PVs...")
            subprocess.run(
                f"kubectl delete pv {' '.join(pvs)}",
                shell=True, check=True, stderr=subprocess.PIPE, text=True
            )
        else:
            print("No associated PVs found")
        
        print("\nPersistent data cleanup completed")
        
    except subprocess.CalledProcessError as e:
        # stderr is captured as text above, so it is already a str (no .decode() needed)
        stderr = e.stderr or ""
        if "NotFound" not in stderr and "no matches found" not in stderr:
            print(f"Error during persistent data cleanup: {stderr}")
        else:
            print("No persistent resources found to delete")
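
The `grep {PVC_PREFIX}` above matches the prefix anywhere in a PV line. Because PVs are cluster-scoped, it can also match volumes claimed from other namespaces.

Below is a sketch of a stricter filter. It assumes the JSON output of `kubectl get pv -o json`, whose items carry `spec.claimRef.name` and `spec.claimRef.namespace`. The helper name `pvs_bound_to_prefix` is hypothetical.

```python
import json
from typing import List


def pvs_bound_to_prefix(pv_list_json: str, namespace: str, prefix: str) -> List[str]:
    """Return names of PVs whose claimRef points at a PVC in `namespace` starting with `prefix`.

    `pv_list_json` is assumed to be the output of `kubectl get pv -o json`.
    """
    names = []
    for item in json.loads(pv_list_json).get("items", []):
        # Unbound PVs have no claimRef; treat it as an empty mapping
        claim = item.get("spec", {}).get("claimRef") or {}
        if claim.get("namespace") == namespace and claim.get("name", "").startswith(prefix):
            names.append(item["metadata"]["name"])
    return names


# Example (assumes the globals from the configuration cell):
# raw = subprocess.run(["kubectl", "get", "pv", "-o", "json"],
#                      capture_output=True, text=True, check=True).stdout
# pvs = pvs_bound_to_prefix(raw, NAMESPACE, PVC_PREFIX)
```

Listing PVs usually requires cluster-level read permission. In a restricted workspace, the `kubectl get pv` call may be refused.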

Function to clean up Delta tables by removing the specified Delta directory from the filesystem, handling any errors that may occur.

In [None]:
def clean_delta_tables():
    """Clean up Delta tables"""
    import os
    import shutil
    try:
        print(f"\nCleaning Delta tables at: {DELTA_PATH}")
        
        if os.path.exists(DELTA_PATH):
            print("Removing Delta directory...")
            shutil.rmtree(DELTA_PATH)
            print("Delta tables removed successfully!")
        else:
            print("Delta directory not found - nothing to clean")
            
    except Exception as e:
        print(f"Error cleaning Delta tables: {str(e)}")
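
`shutil.rmtree` deletes whatever `DELTA_PATH` points at, so a mistyped path could remove the wrong directory. Below is a defensive sketch. It assumes the Delta tables always live under `/mnt/shared`, following the `DELTA_PATH` pattern set earlier. `safe_rmtree` is a hypothetical helper, not part of the exercise code.

```python
import os
import shutil


def safe_rmtree(path: str, allowed_root: str = "/mnt/shared") -> bool:
    """Delete `path` only if it resolves to a location strictly inside `allowed_root`.

    Returns True if a directory was deleted, False if nothing was there.
    Raises ValueError if the path falls outside (or equals) the allowed root.
    """
    # Resolve symlinks and ".." so the containment check cannot be bypassed
    real_path = os.path.realpath(path)
    real_root = os.path.realpath(allowed_root)
    if real_path == real_root or os.path.commonpath([real_path, real_root]) != real_root:
        raise ValueError(f"Refusing to delete {real_path}: not inside {real_root}")
    if not os.path.isdir(real_path):
        return False
    shutil.rmtree(real_path)
    return True
```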

Function to perform a complete cleanup by removing PostgreSQL deployment, deleting persistent data, and cleaning Delta tables, ensuring all resources are properly cleaned up.

In [None]:
def full_cleanup():
    """Perform complete cleanup including persistent data and analytics resources"""
    print("=== Starting Full Cleanup ===")
    clean_postgresql_deployment()
    delete_persistent_data()
    clean_delta_tables()
    print("=== Full Cleanup Completed ===")
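
`clean_postgresql_deployment()` re-raises `CalledProcessError`. If deleting the deployment fails, `full_cleanup()` therefore stops before the persistent data and Delta tables are touched.

If you would rather attempt every step and review the failures at the end, below is a minimal sketch. The helper `run_cleanup_steps` is hypothetical.

```python
from typing import Callable, Dict, List, Tuple


def run_cleanup_steps(steps: List[Tuple[str, Callable[[], None]]]) -> Dict[str, str]:
    """Run every cleanup step even if an earlier one fails.

    Returns a mapping of step name -> error message for each step that raised.
    """
    failures = {}
    for name, step in steps:
        try:
            step()
        except Exception as exc:  # keep going so later resources still get cleaned
            failures[name] = str(exc)
            print(f"Step '{name}' failed: {exc}")
    return failures


# Example:
# failures = run_cleanup_steps([
#     ("PostgreSQL deployment", clean_postgresql_deployment),
#     ("persistent data", delete_persistent_data),
#     ("Delta tables", clean_delta_tables),
# ])
```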

Finally, prompt the user to choose between a basic cleanup (deployment and service only) and a full cleanup (including persistent data and Delta tables), then run the selected option.

In [None]:
# Main menu: choose which cleanup to run
if __name__ == "__main__":
    print("Retail Analytics Kubernetes Cleanup Tool")
    print(f"Namespace: {NAMESPACE}")
    print(f"PostgreSQL Resource: {PG_SERVICE_NAME}")
    print(f"PVC Prefix: {PVC_PREFIX}*")
    print(f"Delta Path: {DELTA_PATH}")
    
    # strip() tolerates stray spaces around the typed choice
    choice = input("\nChoose cleanup option:\n"
                   "1. Basic cleanup (deployment/service)\n"
                   "2. Full cleanup (including data)\n"
                   "Enter choice (1/2): ").strip()
    
    if choice == "1":
        clean_postgresql_deployment()
    elif choice == "2":
        full_cleanup()
    else:
        print("Invalid choice. Exiting.")

# **Conclusion**

Thank you for exploring our **Smart Retail Data Analyst** Demo! We’ve just seen how cutting-edge technologies like **Apache Spark**, **Delta Lake**, **Presto**, and **NVIDIA Inference Microservices** come together to unlock the true power of retail data.

Throughout the exercises, we:
- Analyzed customer purchasing patterns
- Predicted sales trends with real-time data
- Optimized inventory with fast, interactive SQL queries
- Used natural language to extract AI-powered insights effortlessly

All exercises are now complete and the environment has been cleaned up. This concludes our demo.