# 🔄 Cluster Configuration Revert Tool

## Purpose

This notebook provides a production-ready tool to revert cluster configurations to their state **before a selected batch update**. It uses the selected batch as a **reference point in time** and reverts ALL clusters in the environment to their configuration before that reference date.

---

## Key Features

* ✅ **Time-based Revert**: Select any batch as a reference point - all clusters revert to their state BEFORE that date
* ✅ **Complete Configuration Restore**: Restores instance types, worker counts (autoscale or fixed), policies, security modes, instance pools, AWS attributes, elastic disk, and auto-termination
* ✅ **Cross-Workspace Support**: Works across different workspaces using Service Principal authentication
* ✅ **Environment Selector**: Supports dev, qa, uat, and prod environments
* ✅ **Batch History**: View all batch updates with labels and statistics
* ✅ **Comprehensive Validation**: Pre-execution checks for authentication, compute, and environment readiness
* ✅ **Detailed Reporting**: Execution summary with success/failure tracking

---

## How It Works

### Logic Flow:

1. **Select Environment** (dev/qa/uat/prod) → Determines which catalog and workspace to use
2. **View Batch History** → Shows all batch updates in the selected environment
3. **Select Reference Batch** → Choose which batch date to use as the revert reference point
4. **Analyze Impact** → Shows ALL clusters in the environment and what will be reverted
5. **Execute Revert** → Restores all clusters to their state BEFORE the reference date

### Important Concepts:

**Reference Point Logic:**
* The selected batch is a **reference point in time**, not a filter
* ALL clusters that have been updated in the environment will be reverted
* Each cluster reverts to its last configuration BEFORE the reference date
* Each cluster's previous config may be from different dates

**Example:**
* You select batch "2025-12-10_20-11" as reference
* Cluster A was last configured on Dec 9 → reverts to Dec 9 state
* Cluster B was last configured on Nov 15 → reverts to Nov 15 state
* Cluster C was configured on Dec 11 → reverts to its state before Dec 10 20:11
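
The reference-point rule can be illustrated with a small, self-contained sketch in plain Python. It is not the notebook's implementation (which uses a `ROW_NUMBER()` window over `system.compute.clusters`); the function name `pick_revert_targets`, the instance types, and Cluster C's earlier Dec 1 snapshot are hypothetical values chosen only for illustration.

```python
from datetime import datetime


def pick_revert_targets(history, reference_time):
    """Return {cluster_id: latest snapshot strictly before reference_time}."""
    targets = {}
    for snap in history:
        if snap["change_time"] >= reference_time:
            continue  # at or after the reference point: never a revert target
        best = targets.get(snap["cluster_id"])
        if best is None or snap["change_time"] > best["change_time"]:
            targets[snap["cluster_id"]] = snap
    return targets


# Hypothetical configuration history mirroring the example above.
reference = datetime(2025, 12, 10, 20, 11)
history = [
    {"cluster_id": "A", "change_time": datetime(2025, 12, 9), "driver": "m5.xlarge"},
    {"cluster_id": "B", "change_time": datetime(2025, 11, 15), "driver": "r5.large"},
    {"cluster_id": "C", "change_time": datetime(2025, 12, 1), "driver": "m5.large"},
    {"cluster_id": "C", "change_time": datetime(2025, 12, 11), "driver": "m5.4xlarge"},
]

targets = pick_revert_targets(history, reference)
for cluster_id, snap in sorted(targets.items()):
    print(cluster_id, snap["change_time"].date(), snap["driver"])
```

Cluster C's Dec 11 snapshot is skipped because it falls after the reference point, so its target is the earlier snapshot. A cluster with no snapshot before the reference point would simply be absent from the result.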

---

## Prerequisites

### Required Access:
* ✅ Read access to `{catalog}.billing_forecast.cluster_update_log` table
* ✅ Read access to `system.compute.clusters` table
* ✅ Service Principal credentials in `sp-oauth` secret scope (for cross-workspace)
* ✅ Classic compute cluster (required for cross-workspace API calls)

### Required Secrets:
* `sp-oauth` scope with keys:
  * `client` - Service Principal Client ID
  * `secret` - Service Principal Client Secret

---

## Usage Instructions

### Step-by-Step:

1. **Attach to Classic Compute** (e.g., "Abhijit Joshi's multinode Cluster")
2. **Run Cell 2**: Select environment (dev/qa/uat/prod)
3. **Run Cell 3**: View all batch updates in the environment
4. **Run Cell 4**: Select which batch to use as reference point
5. **Run Cell 5**: Analyze impact - see what will be reverted
6. **Run Cell 6**: Execute the complete revert
7. **Verify**: Check target workspace to confirm configurations

---

## Safety Features

* 🛡️ **Pre-execution validation** of authentication and compute
* 🛡️ **Detailed preview** of all changes before execution
* 🛡️ **Per-cluster error handling** - one failure doesn't stop others
* 🛡️ **Comprehensive logging** - tracks success/failure for each cluster
* 🛡️ **Audit trail** - all operations logged with timestamps and users
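
The per-cluster error handling can be sketched as a loop that catches each failure, records it, and moves on. This is a simplified illustration, not the notebook's code: `revert_all`, `_demo_revert`, and the cluster IDs are hypothetical, and the real execution cell records additional fields such as the cluster name and change count.

```python
def revert_all(cluster_ids, revert_one):
    """Apply revert_one to every cluster, recording the outcome of each."""
    results = []
    for cluster_id in cluster_ids:
        try:
            revert_one(cluster_id)
            results.append(
                {"cluster_id": cluster_id, "status": "SUCCESS", "message": ""}
            )
        except Exception as exc:
            # A failure is recorded and the loop continues with the next cluster.
            results.append(
                {"cluster_id": cluster_id, "status": "FAILED", "message": str(exc)[:200]}
            )
    return results


def _demo_revert(cluster_id):
    # Hypothetical revert action: the second cluster fails on purpose.
    if cluster_id == "c-2":
        raise RuntimeError("policy not found")


results = revert_all(["c-1", "c-2", "c-3"], _demo_revert)
failed = [r["cluster_id"] for r in results if r["status"] == "FAILED"]
print(f"{len(results) - len(failed)} succeeded, {len(failed)} failed: {failed}")
```

Collecting results in a list rather than raising immediately is what allows a summary of successes and failures to be reported at the end of the run.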

---

## Important Notes

⚠️ **This tool reverts ALL clusters in the environment, not just those in the selected batch**

⚠️ **Each cluster reverts to its state BEFORE the reference date**

⚠️ **Always verify results in the target workspace after execution**

⚠️ **Cluster policies must exist and be accessible for successful revert**

---

**Ready to begin? Run the cells in sequence starting with Cell 2.**

## 🚀 Quick Start Guide

### Prerequisites Check

Before running this notebook, ensure:

1. ✅ **Compute**: Attached to classic compute cluster (e.g., "Abhijit Joshi's multinode Cluster")
   * ❌ Serverless compute will NOT work for cross-workspace operations
   
2. ✅ **Secrets**: Service Principal credentials available in `sp-oauth` scope
   * `client` - Service Principal Client ID
   * `secret` - Service Principal Client Secret
   
3. ✅ **Permissions**: 
   * Read access to `{catalog}.billing_forecast.cluster_update_log`
   * Read access to `system.compute.clusters`
   * Service Principal has `CAN_MANAGE` on target clusters

---

### Execution Steps

| Step | Cell | Action | Description |
|------|------|--------|-------------|
| 1 | Cell 2 | **Select Environment** | Choose dev/qa/uat/prod |
| 2 | Cell 3 | **View Batch History** | See all batch updates |
| 3 | Cell 4 | **Select Reference Batch** | Choose time reference point |
| 4 | Cell 5 | **Analyze Impact** | Preview what will be reverted |
| 5 | Cell 6 | **Execute Revert** | Perform the revert operation |
| 6 | Cell 7 | **Verify Results** | Check execution summary |

---

### ⚠️ Important Warnings

* **This reverts ALL clusters in the environment**, not just those in the selected batch
* **Each cluster reverts to its state BEFORE the reference date**
* **Cluster policies must exist** in the target workspace
* **Always verify results** in the target workspace after execution
* **Cannot be undone** - record the current cluster configurations before executing

---

**Ready? Start with Cell 2 below.**

In [0]:
# ============================================================================
# CONFIGURATION: ENVIRONMENT SELECTION
# ============================================================================

import requests
from pyspark.sql.functions import col, lit, count, sum as spark_sum, max as spark_max, min as spark_min

print("="*80)
print("CLUSTER CONFIGURATION REVERT TOOL - CONFIGURATION")
print("="*80)

# Create environment selection widget
dbutils.widgets.dropdown(
    name="environment",
    defaultValue="dev_sandbox",
    choices=["dev_sandbox", "qa_sandbox", "uat_sandbox", "prod_sandbox"],
    label="Select Environment"
)

# Get selected environment
selected_environment = dbutils.widgets.get("environment")

# Environment configuration mapping
env_config = {
    "dev_sandbox": {
        "catalog": "dev_sandbox",
        "workspace_mapping": {
            "Integrated-Dev": "oportun-integrated-dev.cloud.databricks.com"
        }
    },
    "qa_sandbox": {
        "catalog": "qa_sandbox",
        "workspace_mapping": {
            "QA": "oportun-qa.cloud.databricks.com"
        }
    },
    "uat_sandbox": {
        "catalog": "uat_sandbox",
        "workspace_mapping": {
            "UAT": "oportun-uat.cloud.databricks.com"
        }
    },
    "prod_sandbox": {
        "catalog": "prod_sandbox",
        "workspace_mapping": {
            "Prod": "oportun-prod.cloud.databricks.com"
        }
    }
}

# Get configuration for selected environment
selected_catalog = env_config[selected_environment]["catalog"]
workspace_mapping = env_config[selected_environment]["workspace_mapping"]

print(f"\n✅ Environment Configuration:")
print(f"   Selected: {selected_environment}")
print(f"   Catalog: {selected_catalog}")
print(f"   Target Workspaces: {', '.join(workspace_mapping.keys())}")

print("\n" + "="*80)
print("✅ CONFIGURATION COMPLETE")
print("="*80)
print("\nNext: Run Cell 3 to view batch history")

In [0]:
# ============================================================================
# GET ALL BATCH UPDATES FOR SELECTED ENVIRONMENT
# ============================================================================

print("="*80)
print("BATCH UPDATES HISTORY")
print("="*80)
print(f"Environment: {selected_environment}")
print(f"Catalog: {selected_catalog}")
print("\n" + "="*80)

# Get all batch updates from the cluster_update_log table
all_batches = spark.sql(f"""
    SELECT 
        batch_id,
        batch_start_time,
        batch_end_time,
        execution_label,
        executed_by_user,
        COUNT(*) as total_updates,
        SUM(CASE WHEN update_status = 'SUCCESS' THEN 1 ELSE 0 END) as successful_updates,
        SUM(CASE WHEN update_status = 'FAILED' THEN 1 ELSE 0 END) as failed_updates,
        SUM(CASE WHEN dry_run = false THEN 1 ELSE 0 END) as actual_updates,
        SUM(CASE WHEN dry_run = true THEN 1 ELSE 0 END) as dry_run_updates,
        COUNT(DISTINCT workspace_name) as workspace_count,
        COLLECT_SET(workspace_name) as workspaces
    FROM {selected_catalog}.billing_forecast.cluster_update_log
    GROUP BY batch_id, batch_start_time, batch_end_time, execution_label, executed_by_user
    ORDER BY batch_end_time DESC
    LIMIT 50
""")

batch_count = all_batches.count()

if batch_count == 0:
    print(f"\n⚠️  No batch updates found in {selected_catalog}.billing_forecast.cluster_update_log")
    print("\nPlease verify:")
    print("  1. The table exists and has data")
    print("  2. You have permissions to read the table")
    print("  3. The selected environment is correct")
else:
    print(f"\n✅ Found {batch_count} batch updates in {selected_environment}\n")
    
    # Display all batches
    print("📋 Available Batch Updates:")
    display(all_batches.select(
        "batch_id",
        "execution_label",
        "batch_end_time",
        "executed_by_user",
        "total_updates",
        "successful_updates",
        "failed_updates",
        "actual_updates",
        "workspace_count",
        "workspaces"
    ))
    
    print(f"\n" + "="*80)
    print(f"Total batches available: {batch_count}")
    print("="*80)
    print("\n✅ Batch information loaded and ready for selection")
    print("\nNext: Run Cell 4 to select which batch to use as reference point")

In [0]:
# ============================================================================
# SELECT BATCH REFERENCE POINT
# ============================================================================

print("Creating batch selection dropdown...\n")

# Get batch labels for dropdown (only actual updates, not dry runs)
actual_batches = all_batches.filter("actual_updates > 0").collect()

if len(actual_batches) == 0:
    print("⚠️  No actual batch updates found (all were dry runs)")
    print("\nPlease run a live batch update first before using the revert tool.")
else:
    # Create list of batch labels
    batch_choices = []
    for row in actual_batches:
        label = row['execution_label'] if row['execution_label'] else f"Batch_{row['batch_end_time']}"
        batch_choices.append(label)
    
    # Create dropdown widget
    dbutils.widgets.dropdown(
        name="batch_to_revert",
        defaultValue=batch_choices[0],  # Default to most recent batch
        choices=batch_choices,
        label="Select Batch Reference Point"
    )
    
    # Get selected batch
    selected_batch_label = dbutils.widgets.get("batch_to_revert")
    
    # Find the batch info for selected label
    selected_batch_info = None
    for row in actual_batches:
        label = row['execution_label'] if row['execution_label'] else f"Batch_{row['batch_end_time']}"
        if label == selected_batch_label:
            selected_batch_info = row
            break
    
    if selected_batch_info:
        print("="*80)
        print("SELECTED BATCH REFERENCE POINT")
        print("="*80)
        print(f"Label: {selected_batch_label}")
        print(f"Batch ID: {selected_batch_info['batch_id']}")
        print(f"Executed By: {selected_batch_info['executed_by_user']}")
        print(f"Reference Time: {selected_batch_info['batch_start_time']}")
        print(f"Batch End Time: {selected_batch_info['batch_end_time']}")
        print(f"Total Updates in Batch: {selected_batch_info['total_updates']}")
        print(f"Successful: {selected_batch_info['successful_updates']}")
        print(f"Failed: {selected_batch_info['failed_updates']}")
        print(f"Workspaces: {', '.join(selected_batch_info['workspaces'])}")
        print("="*80)
        
        # Store selected batch details for use in subsequent cells
        selected_batch_id = selected_batch_info['batch_id']
        selected_batch_start_time = selected_batch_info['batch_start_time']
        selected_batch_end_time = selected_batch_info['batch_end_time']
        
        print("\n✅ Batch reference point selected successfully")
        print("\n💡 Important: ALL clusters in the environment will be reverted")
        print(f"   to their state BEFORE {selected_batch_start_time}")
        print("\nNext: Run Cell 5 to analyze impact and see what will be reverted")
    else:
        print("❌ Could not find selected batch information")

In [0]:
# ============================================================================
# ANALYZE IMPACT - PREVIEW CHANGES
# ============================================================================

print("\n🔍 Analyzing impact of revert operation...\n")
print("="*80)
print(f"Reference Batch: {selected_batch_label}")
print(f"Reference Date: {selected_batch_start_time}")
print("="*80)
print("\n💡 Logic: Revert ALL clusters in environment to their state BEFORE reference date")
print("="*80)

# Step 1: Get ALL clusters that have been updated in the selected environment
print("\n🔍 Step 1: Identifying all clusters in environment...\n")

all_updated_clusters = spark.sql(f"""
    SELECT DISTINCT
        cluster_id,
        cluster_name,
        workspace_name,
        workspace_id,
        deployment_url
    FROM {selected_catalog}.billing_forecast.cluster_update_log
    WHERE update_status = 'SUCCESS'
        AND dry_run = false
    ORDER BY workspace_name, cluster_name
""")

all_cluster_count = all_updated_clusters.count()

if all_cluster_count == 0:
    print("⚠️  No clusters found in cluster_update_log for this environment")
    print("\nThis environment has no cluster update history.")
else:
    print(f"✅ Found {all_cluster_count} clusters in {selected_environment}\n")
    
    # Show cluster summary
    print("📄 All clusters that will be reverted:")
    display(all_updated_clusters)
    
    # Get cluster IDs for system table queries
    all_cluster_ids = [row['cluster_id'] for row in all_updated_clusters.collect()]
    all_cluster_ids_str = "', '".join(all_cluster_ids)
    
    # Step 2: For each cluster, get the last configuration BEFORE the reference date
    print(f"\n🔍 Step 2: Retrieving configurations BEFORE {selected_batch_start_time}...")
    print("   (Each cluster's previous config may be from different dates)\n")
    
    previous_configs_batch = spark.sql(f"""
        WITH configs_before_reference AS (
            SELECT 
                cluster_id,
                cluster_name,
                driver_node_type,
                worker_node_type,
                min_autoscale_workers,
                max_autoscale_workers,
                worker_count,
                policy_id,
                data_security_mode,
                dbr_version,
                auto_termination_minutes,
                enable_elastic_disk,
                driver_instance_pool_id,
                worker_instance_pool_id,
                aws_attributes,
                change_time,
                owned_by,
                ROW_NUMBER() OVER (PARTITION BY cluster_id ORDER BY change_time DESC) as rn
            FROM system.compute.clusters
            WHERE cluster_id IN ('{all_cluster_ids_str}')
                AND change_time < timestamp'{selected_batch_start_time}'
                AND delete_time IS NULL
        )
        SELECT 
            cluster_id,
            cluster_name,
            driver_node_type as prev_driver,
            worker_node_type as prev_worker,
            min_autoscale_workers as prev_min_workers,
            max_autoscale_workers as prev_max_workers,
            worker_count as prev_worker_count,
            policy_id as prev_policy_id,
            data_security_mode as prev_data_security_mode,
            dbr_version as prev_dbr_version,
            auto_termination_minutes as prev_auto_termination,
            enable_elastic_disk as prev_enable_elastic_disk,
            driver_instance_pool_id as prev_driver_pool,
            worker_instance_pool_id as prev_worker_pool,
            aws_attributes as prev_aws_attributes,
            change_time as prev_config_time,
            owned_by as prev_owned_by
        FROM configs_before_reference
        WHERE rn = 1
        ORDER BY cluster_name
    """)
    
    prev_count_batch = previous_configs_batch.count()
    
    if prev_count_batch == 0:
        print(f"❌ No configurations found before reference date {selected_batch_start_time}")
        print("   This could mean:")
        print("   - Clusters were created after the reference date")
        print("   - System table data doesn't go back that far")
    else:
        print(f"✅ Found previous configurations for {prev_count_batch} clusters")
        
        print("\n📄 Previous Configurations (Target State):")
        display(previous_configs_batch.select(
            "cluster_id", "cluster_name",
            "prev_driver", "prev_worker",
            "prev_policy_id", "prev_data_security_mode",
            "prev_enable_elastic_disk", "prev_auto_termination",
            "prev_config_time", "prev_owned_by"
        ))
        
        # Get current configurations
        print("\n🔍 Step 3: Retrieving current configurations...")
        
        current_configs_batch = spark.sql(f"""
            WITH ranked AS (
                SELECT 
                    cluster_id,
                    cluster_name,
                    driver_node_type,
                    worker_node_type,
                    min_autoscale_workers,
                    max_autoscale_workers,
                    worker_count,
                    policy_id,
                    data_security_mode,
                    auto_termination_minutes,
                    enable_elastic_disk,
                    change_time,
                    ROW_NUMBER() OVER (PARTITION BY cluster_id ORDER BY change_time DESC) as rn
                FROM system.compute.clusters
                WHERE cluster_id IN ('{all_cluster_ids_str}')
                    AND delete_time IS NULL
            )
            SELECT 
                cluster_id,
                cluster_name,
                driver_node_type as curr_driver,
                worker_node_type as curr_worker,
                min_autoscale_workers as curr_min_workers,
                max_autoscale_workers as curr_max_workers,
                worker_count as curr_worker_count,
                policy_id as curr_policy_id,
                data_security_mode as curr_data_security_mode,
                auto_termination_minutes as curr_auto_termination,
                enable_elastic_disk as curr_enable_elastic_disk,
                change_time as curr_config_time
            FROM ranked
            WHERE rn = 1
            ORDER BY cluster_name
        """)
        
        print(f"✅ Found current configurations for {current_configs_batch.count()} clusters")
        
        # Create revert plan
        print("\n🔍 Step 4: Creating comprehensive revert plan...")
        
        revert_plan_batch = previous_configs_batch.alias("prev").join(
            current_configs_batch.alias("curr"),
            col("prev.cluster_id") == col("curr.cluster_id"),
            "inner"
        ).select(
            col("prev.cluster_id"),
            col("prev.cluster_name"),
            # Target (previous) configuration
            col("prev.prev_driver").alias("target_driver"),
            col("prev.prev_worker").alias("target_worker"),
            col("prev.prev_min_workers").alias("target_min_workers"),
            col("prev.prev_max_workers").alias("target_max_workers"),
            col("prev.prev_worker_count").alias("target_worker_count"),
            col("prev.prev_policy_id").alias("target_policy_id"),
            col("prev.prev_data_security_mode").alias("target_data_security_mode"),
            col("prev.prev_auto_termination").alias("target_auto_termination"),
            col("prev.prev_enable_elastic_disk").alias("target_enable_elastic_disk"),
            col("prev.prev_driver_pool").alias("target_driver_pool"),
            col("prev.prev_worker_pool").alias("target_worker_pool"),
            col("prev.prev_aws_attributes").alias("target_aws_attributes"),
            col("prev.prev_config_time").alias("target_config_time"),
            col("prev.prev_owned_by").alias("target_owned_by"),
            # Current configuration
            col("curr.curr_driver"),
            col("curr.curr_worker"),
            col("curr.curr_policy_id"),
            col("curr.curr_data_security_mode"),
            col("curr.curr_enable_elastic_disk"),
            col("curr.curr_auto_termination"),
            col("curr.curr_config_time")
        )
        
        print(f"✅ Created revert plan for {revert_plan_batch.count()} clusters")
        
        print("\n📄 Revert Plan (Current → Target):")
        display(revert_plan_batch.select(
            "cluster_id", "cluster_name",
            "curr_driver", "target_driver",
            "curr_worker", "target_worker",
            "curr_policy_id", "target_policy_id",
            "curr_data_security_mode", "target_data_security_mode",
            "curr_enable_elastic_disk", "target_enable_elastic_disk",
            "curr_auto_termination", "target_auto_termination",
            "curr_config_time", "target_config_time", "target_owned_by"
        ))
        
        # Analyze changes
        policy_changes_count = revert_plan_batch.filter(
            (col("curr_policy_id").isNull() & col("target_policy_id").isNotNull()) |
            (col("curr_policy_id").isNotNull() & col("target_policy_id").isNull()) |
            (col("curr_policy_id") != col("target_policy_id"))
        ).count()
        
        security_changes_count = revert_plan_batch.filter(
            (col("curr_data_security_mode").isNull() & col("target_data_security_mode").isNotNull()) |
            (col("curr_data_security_mode").isNotNull() & col("target_data_security_mode").isNull()) |
            (col("curr_data_security_mode") != col("target_data_security_mode"))
        ).count()
        
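        # Null-safe comparisons: a plain != yields NULL (and drops the row)
        # when either side is NULL, e.g. for pool-backed clusters.
        instance_changes_count = revert_plan_batch.filter(
            ~col("curr_driver").eqNullSafe(col("target_driver")) |
            ~col("curr_worker").eqNullSafe(col("target_worker"))
        ).count()
        
        elastic_disk_changes = revert_plan_batch.filter(
            ~col("curr_enable_elastic_disk").eqNullSafe(col("target_enable_elastic_disk"))
        ).count()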
        
        auto_term_changes = revert_plan_batch.filter(
            (col("curr_auto_termination").isNull() & col("target_auto_termination").isNotNull()) |
            (col("curr_auto_termination").isNotNull() & col("target_auto_termination").isNull()) |
            (col("curr_auto_termination") != col("target_auto_termination"))
        ).count()
        
        print("\n" + "="*80)
        print("⚠️  IMPACT ANALYSIS - CHANGES TO BE REVERTED")
        print("="*80)
        print(f"   • Cluster Policy Changes: {policy_changes_count} clusters")
        print(f"   • Data Security Mode Changes: {security_changes_count} clusters")
        print(f"   • Instance Type Changes: {instance_changes_count} clusters")
        print(f"   • Elastic Disk Changes: {elastic_disk_changes} clusters")
        print(f"   • Auto Termination Changes: {auto_term_changes} clusters")
        print("="*80)
        
        if policy_changes_count > 0 or security_changes_count > 0:
            print("\n❌ CRITICAL: Security/governance settings will be restored!")
            print("   This is important for compliance and access controls.")
        
        print(f"\n💡 Summary:")
        print(f"   • Total clusters to revert: {all_cluster_count}")
        print(f"   • Clusters with previous configs: {prev_count_batch}")
        print(f"   • Reference point: {selected_batch_start_time}")
        print(f"   • Each cluster reverts to its last state BEFORE reference date")

print("\n" + "="*80)
if all_cluster_count > 0 and prev_count_batch > 0:
    print(f"✅ ANALYSIS COMPLETE: Ready to revert {prev_count_batch} clusters")
    print("\nNext: Run Cell 6 to execute the revert operation")
else:
    print("⚠️  ANALYSIS INCOMPLETE: Cannot proceed with revert")
print("="*80)

In [0]:
# ============================================================================
# EXECUTE COMPLETE REVERT OPERATION
# ============================================================================

import time
from datetime import datetime

print("\n🚀 EXECUTING CLUSTER CONFIGURATION REVERT...\n")
print("="*80)
print(f"Selected Batch: {selected_batch_label}")
print(f"Reference Date: {selected_batch_start_time}")
print(f"Environment: {selected_environment}")
print(f"Execution Time: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")
print("="*80)

if all_cluster_count == 0 or prev_count_batch == 0:
    print("\n⚠️  Cannot proceed - no clusters or previous configurations available")
    print("\nPlease run Cell 5 first to analyze the impact.")
else:
    try:
        # Setup authentication
        print("\n🔐 Step 1: Setting up authentication...")
        
        # Get target workspace from all_updated_clusters
        target_ws_info = all_updated_clusters.select("workspace_name", "deployment_url").first()
        target_workspace_url_exec = target_ws_info['deployment_url'].replace('https://', '')
        target_workspace_name_exec = target_ws_info['workspace_name']
        
        current_workspace_url_exec = spark.conf.get("spark.databricks.workspaceUrl")
        
        # Check if cross-workspace
        if current_workspace_url_exec.lower() != target_workspace_url_exec.lower():
            # Cross-workspace - use Service Principal
            print(f"   Cross-workspace operation detected")
            print(f"   Current: {current_workspace_url_exec}")
            print(f"   Target: {target_workspace_url_exec}")
            
            sp_client_id = dbutils.secrets.get(scope="sp-oauth", key="client")
            sp_client_secret = dbutils.secrets.get(scope="sp-oauth", key="secret")
            token_url = f"https://{target_workspace_url_exec}/oidc/v1/token"
            token_data = {"grant_type": "client_credentials", "scope": "all-apis"}
            token_response = requests.post(token_url, auth=(sp_client_id, sp_client_secret), data=token_data, timeout=30)
            
            if token_response.status_code != 200:
                raise Exception(f"OAuth authentication failed: {token_response.text}")
            
            access_token = token_response.json()['access_token']
            api_base_url = f"https://{target_workspace_url_exec}"
            print(f"\n✅ Cross-workspace OAuth authentication successful")
        else:
            # Same workspace
            access_token = dbutils.notebook.entry_point.getDbutils().notebook().getContext().apiToken().get()
            api_base_url = f"https://{current_workspace_url_exec}"
            print(f"\n✅ Same-workspace authentication successful")
        
        headers = {"Authorization": f"Bearer {access_token}", "Content-Type": "application/json"}
        
        # Execute revert for each cluster
        print(f"\n🔄 Step 2: Reverting {prev_count_batch} cluster configurations...\n")
        print("="*80)
        
        revert_results_final = []
        revert_configs = revert_plan_batch.collect()
        
        for idx, config in enumerate(revert_configs, 1):
            cluster_id = config['cluster_id']
            cluster_name = config['cluster_name']
            
            print(f"\n[{idx}/{len(revert_configs)}] {cluster_name}")
            print(f"   Cluster ID: {cluster_id}")
            
            # Identify changes
            changes = []
            if config['curr_driver'] != config['target_driver']:
                changes.append(f"Driver: {config['curr_driver']} → {config['target_driver']}")
            if config['curr_worker'] != config['target_worker']:
                changes.append(f"Worker: {config['curr_worker']} → {config['target_worker']}")
            if config['curr_policy_id'] != config['target_policy_id']:
                changes.append(f"Policy: {config['curr_policy_id']} → {config['target_policy_id']}")
            if config['curr_data_security_mode'] != config['target_data_security_mode']:
                changes.append(f"Security: {config['curr_data_security_mode']} → {config['target_data_security_mode']}")
            if config['curr_enable_elastic_disk'] != config['target_enable_elastic_disk']:
                changes.append(f"Elastic Disk: {config['curr_enable_elastic_disk']} → {config['target_enable_elastic_disk']}")
            if config['curr_auto_termination'] != config['target_auto_termination']:
                changes.append(f"Auto Term: {config['curr_auto_termination']} → {config['target_auto_termination']}")
            
            if changes:
                print(f"   Changes ({len(changes)}):")
                for change in changes:
                    print(f"      • {change}")
            else:
                print("   ℹ️  No changes needed (already in target state)")
            
            try:
                # Get current cluster config from API
                get_url = f"{api_base_url}/api/2.0/clusters/get?cluster_id={cluster_id}"
                get_response = requests.get(get_url, headers=headers, timeout=30)
                
                if get_response.status_code != 200:
                    error_msg = f"Failed to get cluster: {get_response.text[:200]}"
                    print(f"   ❌ {error_msg}")
                    revert_results_final.append({
                        'cluster_id': cluster_id, 
                        'cluster_name': cluster_name, 
                        'status': 'FAILED', 
                        'message': error_msg,
                        'changes_count': len(changes)
                    })
                    continue
                
                cluster_config = get_response.json()
                
                # Build edit request with COMPLETE previous configuration
                edit_request = {
                    'cluster_id': cluster_id,
                    'spark_version': cluster_config['spark_version'],
                    'node_type_id': config['target_worker'],  # node_type_id is the worker node type
                    'driver_node_type_id': config['target_driver']
                }
                
                # Restore autoscale or fixed workers
                if config['target_min_workers'] is not None and config['target_max_workers'] is not None:
                    edit_request['autoscale'] = {
                        'min_workers': int(config['target_min_workers']), 
                        'max_workers': int(config['target_max_workers'])
                    }
                elif config['target_worker_count'] is not None:
                    edit_request['num_workers'] = int(config['target_worker_count'])
                
                # Restore policy
                if config['target_policy_id'] is not None:
                    edit_request['policy_id'] = config['target_policy_id']
                
                # Restore data security mode
                if config['target_data_security_mode'] is not None:
                    edit_request['data_security_mode'] = config['target_data_security_mode']
                
                # Restore elastic disk
                if config['target_enable_elastic_disk'] is not None:
                    edit_request['enable_elastic_disk'] = bool(config['target_enable_elastic_disk'])
                
                # Restore auto termination
                if config['target_auto_termination'] is not None:
                    edit_request['autotermination_minutes'] = int(config['target_auto_termination'])
                
                # Restore instance pools
                if config['target_driver_pool'] is not None:
                    edit_request['driver_instance_pool_id'] = config['target_driver_pool']
                if config['target_worker_pool'] is not None:
                    edit_request['instance_pool_id'] = config['target_worker_pool']
                
                # Restore AWS attributes from previous state (CRITICAL for policy compliance)
                if config['target_aws_attributes'] is not None:
                    aws_attrs = config['target_aws_attributes']
                    aws_attrs_dict = {}
                    
                    if aws_attrs.first_on_demand is not None:
                        aws_attrs_dict['first_on_demand'] = int(aws_attrs.first_on_demand)
                    if aws_attrs.availability:
                        aws_attrs_dict['availability'] = aws_attrs.availability
                    if aws_attrs.zone_id:
                        aws_attrs_dict['zone_id'] = aws_attrs.zone_id
                    if aws_attrs.instance_profile_arn:
                        aws_attrs_dict['instance_profile_arn'] = aws_attrs.instance_profile_arn
                    if aws_attrs.spot_bid_price_percent is not None:
                        aws_attrs_dict['spot_bid_price_percent'] = int(aws_attrs.spot_bid_price_percent)
                    if aws_attrs.ebs_volume_type:
                        aws_attrs_dict['ebs_volume_type'] = aws_attrs.ebs_volume_type
                    if aws_attrs.ebs_volume_count is not None:
                        aws_attrs_dict['ebs_volume_count'] = int(aws_attrs.ebs_volume_count)
                    if aws_attrs.ebs_volume_size is not None:
                        aws_attrs_dict['ebs_volume_size'] = int(aws_attrs.ebs_volume_size)
                    
                    if aws_attrs_dict:
                        edit_request['aws_attributes'] = aws_attrs_dict
                
                # Preserve user-specific fields from current config
                for field in ['cluster_name', 'spark_conf', 'custom_tags', 'single_user_name', 'runtime_engine']:
                    if field in cluster_config:
                        edit_request[field] = cluster_config[field]
                
                # Execute cluster edit
                edit_url = f"{api_base_url}/api/2.0/clusters/edit"
                edit_response = requests.post(edit_url, headers=headers, json=edit_request, timeout=30)
                
                if edit_response.status_code == 200:
                    print(f"   ✅ Successfully reverted ({len(changes)} changes)")
                    revert_results_final.append({
                        'cluster_id': cluster_id, 
                        'cluster_name': cluster_name, 
                        'status': 'SUCCESS', 
                        'message': f'Reverted {len(changes)} settings',
                        'changes_count': len(changes)
                    })
                else:
                    error_msg = f"API error: {edit_response.text[:200]}"
                    print(f"   ❌ {error_msg}")
                    revert_results_final.append({
                        'cluster_id': cluster_id, 
                        'cluster_name': cluster_name, 
                        'status': 'FAILED', 
                        'message': error_msg,
                        'changes_count': len(changes)
                    })
            
            except Exception as e:
                error_msg = f"Exception: {str(e)[:200]}"
                print(f"   ❌ {error_msg}")
                revert_results_final.append({
                    'cluster_id': cluster_id, 
                    'cluster_name': cluster_name, 
                    'status': 'FAILED', 
                    'message': error_msg,
                    'changes_count': len(changes) if 'changes' in locals() else 0
                })
            
            time.sleep(0.5)  # Rate limiting
        
        # Display results
        print("\n" + "="*80)
        print("📈 REVERT EXECUTION SUMMARY")
        print("="*80)
        
        results_df_final = spark.createDataFrame(revert_results_final)
        display(results_df_final)
        
        success_count = len([r for r in revert_results_final if r['status'] == 'SUCCESS'])
        failed_count = len([r for r in revert_results_final if r['status'] == 'FAILED'])
        total_changes = sum([r['changes_count'] for r in revert_results_final])
        
        print(f"\n📊 STATISTICS:")
        print(f"   Total Clusters: {len(revert_results_final)}")
        print(f"   ✅ Successful: {success_count}")
        print(f"   ❌ Failed: {failed_count}")
        print(f"   Total Changes Applied: {total_changes}")
        print(f"   Success Rate: {(success_count/len(revert_results_final)*100):.1f}%")
        
        if success_count == len(revert_results_final):
            print(f"\n🎉 ALL CLUSTERS SUCCESSFULLY REVERTED!")
            print(f"\n✅ Restored Configuration Elements:")
            print(f"   • Instance types (driver & worker)")
            print(f"   • Autoscale/worker count settings")
            print(f"   • Cluster policies (governance)")
            print(f"   • Data security modes (compliance)")
            print(f"   • AWS attributes (policy-compliant)")
            print(f"   • Auto-termination settings")
            print(f"   • Elastic disk settings")
            print(f"\n🔍 NEXT STEPS:")
            print(f"   1. Verify configurations in workspace: {target_workspace_url_exec}")
            print(f"   2. Confirm cluster policies are properly applied")
            print(f"   3. Test cluster functionality if needed")
            print(f"   4. Document the revert operation for audit purposes")
        elif success_count > 0:
            print(f"\n⚠️  PARTIAL SUCCESS: {success_count}/{len(revert_results_final)} clusters reverted")
            print(f"\nFailed clusters:")
            for result in revert_results_final:
                if result['status'] == 'FAILED':
                    print(f"   • {result['cluster_name']}")
                    print(f"     Error: {result['message'][:150]}")
            print(f"\n🔍 Review the errors above and check:")
            print(f"   1. Cluster policies exist and are accessible")
            print(f"   2. Service Principal has edit permissions")
            print(f"   3. Clusters still exist and are in a RUNNING or TERMINATED state")
        else:
            print(f"\n❌ NO CLUSTERS WERE REVERTED")
            print(f"\nPlease review the errors above and verify:")
            print(f"   1. Authentication is working correctly")
            print(f"   2. Target workspace is accessible")
            print(f"   3. Cluster policies and configurations are valid")
        
    except Exception as e:
        print(f"\n❌ CRITICAL ERROR during revert execution:")
        print(f"   {str(e)}")
        print(f"\nPlease check:")
        print(f"   1. Service Principal credentials are valid")
        print(f"   2. Network connectivity to target workspace")
        print(f"   3. Cell 5 was run successfully before this cell")

print("\n" + "="*80)
print("🏁 REVERT OPERATION COMPLETE")
print(f"Completion Time: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")
print("="*80)

In [0]:
# ============================================================================
# UPDATE REVERT TRACKING IN CLUSTER_CONFIG_BACKUP TABLE
# ============================================================================

from datetime import datetime
from pyspark.sql.functions import col, lit, current_timestamp

print("\n📝 UPDATING REVERT TRACKING IN BACKUP TABLE...\n")
print("="*80)

if 'revert_results_final' not in locals() or len(revert_results_final) == 0:
    print("⚠️  No revert results found. Please run Cell 7 first.")
else:
    try:
        # Get current user
        current_user = spark.sql("SELECT current_user() as user").first()['user']
        
        # Generate new revert batch ID
        import uuid
        revert_batch_id = str(uuid.uuid4())
        revert_timestamp = datetime.now()
        
        # Filter successful reverts
        successful_reverts = [r for r in revert_results_final if r['status'] == 'SUCCESS']
        
        if len(successful_reverts) == 0:
            print("⚠️  No successful reverts to track.")
        else:
            # Get cluster IDs that were successfully reverted
            reverted_cluster_ids = [r['cluster_id'] for r in successful_reverts]
            
            print(f"Updating tracking for {len(reverted_cluster_ids)} successfully reverted clusters...")
            print(f"Revert Batch ID: {revert_batch_id}")
            print(f"Revert Timestamp: {revert_timestamp}")
            print(f"Reverted By: {current_user}")
            print(f"Original Batch: {selected_batch_label}\n")
            
            # Get the batch_id from selected_batch_label
            batch_info = spark.table(f"{selected_environment}.billing_forecast.cluster_config_backup") \
                .filter(col("execution_label") == selected_batch_label) \
                .select("batch_id") \
                .distinct() \
                .first()
            
            if batch_info is None:
                print(f"❌ Could not find batch_id for execution_label: {selected_batch_label}")
            else:
                original_batch_id = batch_info['batch_id']
                
                # Update the backup table for successfully reverted clusters
                from delta.tables import DeltaTable
                
                backup_table = DeltaTable.forName(spark, f"{selected_environment}.billing_forecast.cluster_config_backup")
                
                # Update rows matching the original batch and reverted cluster IDs
                backup_table.update(
                    condition = (col("batch_id") == original_batch_id) & 
                                (col("cluster_id").isin(reverted_cluster_ids)),
                    set = {
                        "is_reverted": lit(True),
                        "revert_timestamp": lit(revert_timestamp),
                        "revert_batch_id": lit(revert_batch_id),
                        "reverted_by_user": lit(current_user)
                    }
                )
                
                print("✅ Successfully updated revert tracking columns\n")
                
                # Verify the updates
                updated_records = spark.table(f"{selected_environment}.billing_forecast.cluster_config_backup") \
                    .filter(
                        (col("batch_id") == original_batch_id) & 
                        (col("cluster_id").isin(reverted_cluster_ids)) &
                        (col("is_reverted") == True)
                    ) \
                    .select("cluster_name", "cluster_id", "is_reverted", "revert_timestamp", "revert_batch_id", "reverted_by_user")
                
                print("📊 VERIFICATION - Updated Records:")
                display(updated_records)
                
                updated_count = updated_records.count()
                print(f"\n✅ Verified: {updated_count} records updated in backup table")
                print(f"Expected: {len(reverted_cluster_ids)} records")
                
                if updated_count == len(reverted_cluster_ids):
                    print("\n🎉 ALL REVERT TRACKING UPDATES SUCCESSFUL!")
                else:
                    print(f"\n⚠️  Mismatch: Expected {len(reverted_cluster_ids)} but updated {updated_count}")
                
                displayHTML(f"""
                <div style="padding: 15px; background-color: #e8f5e9; border-left: 5px solid #4caf50; margin: 10px 0;">
                    <h3 style="margin: 0; color: #2e7d32;">✓ Revert Tracking Updated</h3>
                    <p style="margin: 5px 0; color: #1b5e20;"><strong>Table:</strong> <code style="background-color: #c8e6c9; padding: 2px 6px; border-radius: 3px;">{selected_environment}.billing_forecast.cluster_config_backup</code></p>
                    <p style="margin: 5px 0; color: #1b5e20;"><strong>Records Updated:</strong> {updated_count}</p>
                    <p style="margin: 5px 0; color: #1b5e20;"><strong>Revert Batch ID:</strong> {revert_batch_id}</p>
                    <p style="margin: 5px 0; color: #1b5e20;"><strong>Reverted By:</strong> {current_user}</p>
                    <p style="margin: 5px 0; color: #1b5e20;"><strong>Original Batch:</strong> {selected_batch_label}</p>
                </div>
                """)
                
    except Exception as e:
        print(f"\n❌ ERROR updating revert tracking:")
        print(f"   {str(e)}")
        import traceback
        traceback.print_exc()

print("\n" + "="*80)
print("🏁 REVERT TRACKING UPDATE COMPLETE")
print("="*80)

## ✅ Post-Execution Verification

### Step 1: Review Execution Summary

Check the results table above:
* ☐ Verify success count matches expected
* ☐ Review any failed clusters and error messages
* ☐ Note the total configuration changes applied
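
The checks above can also be tallied in code. This is a minimal sketch, assuming `revert_results_final` is the list of result dicts built by the revert cell (keys `status`, `cluster_name`, `message`, `changes_count`). The helper name `summarize_results` and the sample data are hypothetical and not part of the notebook.

```python
def summarize_results(results):
    """Tally revert results; an empty list yields a 0.0 success rate (hypothetical helper)."""
    total = len(results)
    succeeded = [r for r in results if r["status"] == "SUCCESS"]
    failed = [r for r in results if r["status"] == "FAILED"]
    return {
        "total": total,
        "succeeded": len(succeeded),
        "failed": len(failed),
        # Count only changes on clusters whose edit call succeeded.
        "changes_applied": sum(r["changes_count"] for r in succeeded),
        "success_rate": (len(succeeded) / total * 100) if total else 0.0,
        "failed_clusters": [(r["cluster_name"], r["message"]) for r in failed],
    }


# Illustrative data only; in the notebook, pass revert_results_final instead.
sample_results = [
    {"cluster_id": "c-1", "cluster_name": "etl-nightly", "status": "SUCCESS",
     "message": "Reverted 3 settings", "changes_count": 3},
    {"cluster_id": "c-2", "cluster_name": "adhoc-analytics", "status": "FAILED",
     "message": "API error: example", "changes_count": 2},
]
summary = summarize_results(sample_results)
print(summary)
```

Unlike the summary printed by the revert cell, this sketch counts changes only for clusters that were actually reverted, which may be the more useful figure for an audit record.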

---

### Step 2: Verify in Target Workspace

Navigate to the target workspace and verify:

1. **Go to Compute → Clusters**
2. **For each reverted cluster, check:**
   * ☐ Driver instance type matches target
   * ☐ Worker instance type matches target
   * ☐ Autoscale settings are correct
   * ☐ **Cluster policy is applied** (if applicable)
   * ☐ **Data security mode is set** (if applicable)
   * ☐ Elastic disk setting is correct
   * ☐ Auto-termination is configured
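
The same comparison can be scripted against the JSON returned by `GET /api/2.0/clusters/get`. The sketch below is a hedged illustration: `diff_cluster_config` is a hypothetical helper, the `expected` dict stands in for the edit request the revert cell sent, and all values are made up. Nested dicts are compared as subsets so that extra fields in the API response do not count as mismatches.

```python
def _matches(want, got):
    """True if `got` agrees with `want`; nested dicts are compared as subsets."""
    if isinstance(want, dict):
        return isinstance(got, dict) and all(
            _matches(value, got.get(key)) for key, value in want.items()
        )
    return want == got


def diff_cluster_config(expected, actual):
    """Return {field: (expected, actual)} for each expected field that differs."""
    return {
        field: (want, actual.get(field))
        for field, want in expected.items()
        if not _matches(want, actual.get(field))
    }


# Made-up values; `actual` is shaped like a clusters/get response.
expected = {
    "driver_node_type_id": "m5.xlarge",
    "autoscale": {"min_workers": 2, "max_workers": 8},
    "autotermination_minutes": 60,
}
actual = {
    "cluster_id": "0000-000000-example",
    "driver_node_type_id": "m5.xlarge",
    "autoscale": {"min_workers": 2, "max_workers": 4},
    "autotermination_minutes": 60,
}
mismatches = diff_cluster_config(expected, actual)
print(mismatches)
```

An empty result means every field that was sent in the edit request is reflected in the cluster's current configuration.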

---

### Step 3: Test Cluster Functionality (Optional)

* ☐ Start one or more reverted clusters
* ☐ Verify they start successfully
* ☐ Check that policies are enforced
* ☐ Confirm data access works as expected

---

## 🔧 Troubleshooting Failed Reverts

### Common Issues and Solutions:

#### 1. Policy Not Found Error
**Symptom**: `Policy ID XXX not found`

**Causes:**
* Cluster policy doesn't exist in target workspace
* Policy was deleted or renamed
* Service Principal doesn't have access to policy

**Solutions:**
* Verify policy exists: Go to Compute → Policies in target workspace
* Check Service Principal has `CAN_USE` permission on policy
* If policy is missing, recreate it or remove policy requirement

---

#### 2. AWS Attributes Validation Error
**Symptom**: `Invalid AWS attributes` or `Policy validation failed`

**Causes:**
* AWS attributes from previous state incompatible with current policy
* Policy requirements changed since previous configuration
* Instance profile or availability zone restrictions

**Solutions:**
* Check policy definition for AWS attribute requirements
* Verify instance profile ARN is valid
* Ensure availability zone is allowed by policy
* Check `first_on_demand`, `spot_bid_price_percent` settings
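
To see which restored AWS attributes a policy would reject, the backed-up values can be compared with the policy definition JSON. This sketch assumes the definition uses attribute paths such as `aws_attributes.availability` with a `type` of `fixed`, `allowlist`, or `forbidden`. Other rule types (`range`, `regex`, `blocklist`, `unlimited`) are deliberately ignored here, and `find_policy_violations` is a hypothetical helper with made-up sample values.

```python
import json


def find_policy_violations(aws_attributes, policy_definition_json):
    """List restored AWS attributes that conflict with fixed/allowlist/forbidden rules."""
    definition = json.loads(policy_definition_json)
    violations = []
    for key, value in aws_attributes.items():
        rule = definition.get(f"aws_attributes.{key}")
        if not rule:
            continue  # the policy does not constrain this attribute
        rule_type = rule.get("type")
        if rule_type == "fixed" and value != rule.get("value"):
            violations.append(f"{key}: policy fixes {rule.get('value')!r}, backup has {value!r}")
        elif rule_type == "allowlist" and value not in rule.get("values", []):
            violations.append(f"{key}: {value!r} is not in the policy allowlist")
        elif rule_type == "forbidden":
            violations.append(f"{key}: attribute is forbidden by the policy")
    return violations


# Made-up policy definition and restored attributes for illustration.
example_definition = json.dumps({
    "aws_attributes.availability": {"type": "fixed", "value": "SPOT_WITH_FALLBACK"},
    "aws_attributes.zone_id": {"type": "allowlist", "values": ["us-east-1a", "us-east-1b"]},
})
restored = {"availability": "ON_DEMAND", "zone_id": "us-east-1a", "first_on_demand": 1}
violations = find_policy_violations(restored, example_definition)
print(violations)
```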

---

#### 3. Permission Denied Error
**Symptom**: `User does not have permission to edit cluster`

**Causes:**
* Service Principal lacks `CAN_MANAGE` permission on cluster
* Workspace-level permissions insufficient
* Secret scope access issues

**Solutions:**
* Grant Service Principal `CAN_MANAGE` on clusters
* Check workspace admin permissions
* Verify `sp-oauth` secret scope is accessible
* Test authentication in Cell 3
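
Direct grants on a cluster can be inspected through the Permissions API (`GET /api/2.0/permissions/clusters/{cluster_id}`). The sketch below parses such a response. The response shape shown (an `access_control_list` whose entries carry `all_permissions`) is an assumption to verify against your workspace, `principal_can_manage` is a hypothetical helper, and the IDs are made up. It does not resolve group membership, so a Service Principal that inherits `CAN_MANAGE` through a group will still report `False` for its own entry.

```python
def principal_can_manage(permissions, principal):
    """True if `principal` has a CAN_MANAGE entry in the permissions response."""
    for entry in permissions.get("access_control_list", []):
        names = {
            entry.get("service_principal_name"),
            entry.get("user_name"),
            entry.get("group_name"),
        }
        if principal not in names:
            continue
        levels = {p.get("permission_level") for p in entry.get("all_permissions", [])}
        if "CAN_MANAGE" in levels:
            return True
    return False


# Made-up response body for illustration.
example_permissions = {
    "object_id": "/clusters/0000-000000-example",
    "access_control_list": [
        {"service_principal_name": "11111111-2222-3333-4444-555555555555",
         "all_permissions": [{"permission_level": "CAN_RESTART", "inherited": False}]},
        {"group_name": "admins",
         "all_permissions": [{"permission_level": "CAN_MANAGE", "inherited": True}]},
    ],
}
sp_ok = principal_can_manage(example_permissions, "11111111-2222-3333-4444-555555555555")
admins_ok = principal_can_manage(example_permissions, "admins")
print(sp_ok, admins_ok)
```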

---

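#### 4. Cluster Not Found or Not Editable
**Symptom**: `Cluster XXX not found` or an `INVALID_STATE` error

**Causes:**
* Cluster was deleted after batch update
* Cluster is mid-transition (for example PENDING, RESTARTING, or TERMINATING); the edit API accepts only clusters in the RUNNING or TERMINATED state
* Cluster ID changed

**Solutions:**
* Check cluster still exists in target workspace
* Verify cluster is not permanently deleted
* Wait until the cluster reaches RUNNING or TERMINATED and retry; editing a RUNNING cluster restarts it
* Skip deleted clusters - they don't need revert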
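
The Clusters API documentation states that `clusters/edit` accepts a cluster only in the `RUNNING` or `TERMINATED` state, rejects other states with `INVALID_STATE`, and cannot edit clusters created by the Jobs service. A pre-check based on the `clusters/get` response could look like the sketch below. `plan_revert_action` and its return labels are hypothetical and not part of the notebook.

```python
EDITABLE_STATES = {"RUNNING", "TERMINATED"}


def plan_revert_action(get_status_code, cluster_info):
    """Decide what to do with a cluster based on its clusters/get result."""
    if get_status_code != 200:
        return "skip"         # lookup failed; the cluster was probably deleted
    if cluster_info.get("cluster_source") == "JOB":
        return "skip"         # job clusters cannot be edited
    if cluster_info.get("state") in EDITABLE_STATES:
        return "edit"
    return "retry_later"      # mid-transition, e.g. PENDING or RESTARTING


# Made-up responses for illustration.
actions = [
    plan_revert_action(200, {"state": "TERMINATED", "cluster_source": "UI"}),
    plan_revert_action(200, {"state": "PENDING", "cluster_source": "API"}),
    plan_revert_action(200, {"state": "RUNNING", "cluster_source": "JOB"}),
    plan_revert_action(400, {}),
]
print(actions)
```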

---

#### 5. Cross-Workspace Authentication Failed
**Symptom**: `OAuth authentication failed` or `401 Unauthorized`

**Causes:**
* Service Principal credentials invalid or expired
* Secret scope not accessible
* Network connectivity issues

**Solutions:**
* Verify Service Principal credentials in `sp-oauth` scope
* Test OAuth token generation manually
* Ensure classic compute is being used (not serverless)
* Check network connectivity to target workspace
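
For a manual token test, the request can be assembled by hand. This sketch assumes the Service Principal uses Databricks OAuth machine-to-machine authentication (client-credentials grant against `<workspace-url>/oidc/v1/token` with scope `all-apis`). Whether Cell 3 authenticates this way is not shown here, so treat that as an assumption. `build_token_request` is a hypothetical helper and the credentials are placeholders; in the notebook the real values would come from the `sp-oauth` secret scope.

```python
import base64


def build_token_request(workspace_url, client_id, client_secret):
    """Assemble (but do not send) an OAuth client-credentials token request."""
    credentials = f"{client_id}:{client_secret}".encode("utf-8")
    return {
        "url": f"{workspace_url.rstrip('/')}/oidc/v1/token",
        "headers": {
            "Authorization": "Basic " + base64.b64encode(credentials).decode("ascii"),
            "Content-Type": "application/x-www-form-urlencoded",
        },
        "data": {"grant_type": "client_credentials", "scope": "all-apis"},
    }


# Placeholder values only; never print or log real secrets.
request = build_token_request(
    "https://example.cloud.databricks.com/", "my-client-id", "my-secret"
)
print(request["url"])

# To send it from classic compute (not executed here):
#   response = requests.post(request["url"], headers=request["headers"],
#                            data=request["data"], timeout=30)
#   token = response.json()["access_token"]
```

A 200 response containing `access_token` suggests the credentials and network path are fine, which narrows the problem to permissions or to the notebook's own authentication cell.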

---

## 📝 Audit Log Template

Use this template to document the revert operation:

```
CLUSTER CONFIGURATION REVERT - AUDIT LOG
=========================================

Execution Date: [YYYY-MM-DD HH:MM:SS]
Executed By: [Your Name]
Environment: [dev_sandbox/qa_sandbox/uat_sandbox/prod_sandbox]
Catalog: [Catalog Name]

Reference Batch Information:
- Batch Label: [Execution Label]
- Batch ID: [UUID]
- Reference Date: [YYYY-MM-DD HH:MM:SS]
- Original Executed By: [User]

Revert Details:
- Total Clusters Processed: [X]
- Successfully Reverted: [Y]
- Failed: [Z]
- Success Rate: [%]
- Target Workspace: [Workspace URL]

Configuration Changes Restored:
- Instance Types: [Count]
- Cluster Policies: [Count]
- Security Modes: [Count]
- Elastic Disk Settings: [Count]
- Auto-Termination Settings: [Count]

Verification:
- Verified in Target Workspace: [Yes/No]
- All Policies Restored: [Yes/No]
- Clusters Functional: [Yes/No]
- Tests Passed: [Yes/No]

Reason for Revert:
[Explain why the revert was necessary]

Issues Encountered:
[Document any failures or problems]

Resolution:
[How issues were resolved]

Approved By: _______________
Date: _______________
```
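
The "Revert Details" portion of the template can be pre-filled from the execution results. This is a minimal sketch, assuming the result dicts carry a `status` of `SUCCESS` or `FAILED` as in the revert cell. `render_revert_details`, the sample data, and the workspace URL are hypothetical.

```python
from datetime import datetime


def render_revert_details(results, workspace_url, executed_at=None):
    """Render the execution date and 'Revert Details' lines of the audit template."""
    executed_at = executed_at or datetime.now()
    total = len(results)
    succeeded = sum(1 for r in results if r["status"] == "SUCCESS")
    rate = f"{succeeded / total * 100:.1f}%" if total else "n/a"
    return "\n".join([
        f"Execution Date: {executed_at.strftime('%Y-%m-%d %H:%M:%S')}",
        "",
        "Revert Details:",
        f"- Total Clusters Processed: {total}",
        f"- Successfully Reverted: {succeeded}",
        f"- Failed: {total - succeeded}",
        f"- Success Rate: {rate}",
        f"- Target Workspace: {workspace_url}",
    ])


# Illustrative data; in the notebook, pass revert_results_final instead.
audit_text = render_revert_details(
    [{"status": "SUCCESS"}, {"status": "FAILED"}],
    "https://example.cloud.databricks.com",
    executed_at=datetime(2025, 12, 11, 9, 30, 0),
)
print(audit_text)
```

The remaining sections (reason for revert, issues, approval) still need to be completed by hand.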

---

## 🔄 Re-running This Tool

To revert using a different reference batch:

1. **Change Environment**: Use the dropdown widget to select a different environment
2. **Re-run Cells 3-6**: Execute the workflow cells in sequence
3. **Select Different Batch**: Use the batch dropdown to choose a different reference point

The tool uses the selected batch as a time reference: all clusters revert to their state BEFORE that date.

---

**🎉 Revert operation complete! Always verify results and maintain audit documentation.**