You are an expert Python developer specializing in the Databricks environment. Your task is to create a complete Python script to be executed within a Databricks notebook. The script must perform the following operations:
1.	Data Retrieval from SpaceX API:
o	Interact with the SpaceX v3 REST API (https://api.spacexdata.com/v3).
o	Retrieve data from one specific endpoint likely containing categorical data where missing values might occur: 
	All Cores: https://api.spacexdata.com/v3/cores (Fields like status, block could be candidates)
	Alternative: All Launches: https://api.spacexdata.com/v3/launches (Fields like launch_site.site_name, rocket.rocket_name)
o	Handle potential errors during the API calls (e.g., timeouts, non-200 status codes).
2.	Missing Value Imputation (Mode):
o	Perform mode imputation on the retrieved data (list of dictionaries).
o	Imputation Logic: 
	Identify Categorical Fields: First, automatically identify the keys/fields within the dictionaries that predominantly contain categorical data (e.g., string values of type str). You can do this by inspecting the first few records, by sampling, or by iterating over all records and checking value types.
	Calculate Mode per Field: For each identified categorical field, determine the mode (the most frequent value) using only the existing, non-missing (not None) values across all records in the dataset. The collections.Counter class is suitable for this.
	Handle Ties: If multiple values share the highest frequency (a tie for the mode), select any one of them as the mode (e.g., the one that appears first alphabetically or the first one encountered during counting).
	Impute Missing Values: Iterate through the dataset again. For each categorical field, replace any missing values (represented as None) with the pre-calculated mode for that specific field.
	Handle Edge Cases: If a categorical field contains only missing values (or no non-missing values to calculate a mode), log a warning and leave the missing values as None.
o	The final result should be the original list of dictionaries, but with missing categorical values replaced by the calculated mode for their respective fields.
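
The imputation logic described above can be sketched as follows. This is a minimal illustration under stated assumptions, not a definitive implementation: the helper names (find_categorical_fields, pick_mode, impute_modes), the 50% threshold for "predominantly" string fields, and the sample records are all hypothetical. Ties are broken alphabetically, which is one of the two options the requirement allows.

```python
from collections import Counter
from typing import Any, Dict, Iterable, List, Optional


def find_categorical_fields(records: List[Dict[str, Any]], threshold: float = 0.5) -> List[str]:
    """Return keys whose non-missing values are mostly strings (assumed threshold)."""
    seen: Counter = Counter()     # non-missing values per key
    strings: Counter = Counter()  # string values per key
    for record in records:
        for key, value in record.items():
            if value is None:
                continue
            seen[key] += 1
            if isinstance(value, str):
                strings[key] += 1
    return [key for key in seen if strings[key] / seen[key] > threshold]


def pick_mode(values: Iterable[Any]) -> Optional[str]:
    """Most frequent string value; ties go to the alphabetically first one."""
    counts = Counter(v for v in values if isinstance(v, str))
    if not counts:
        return None  # nothing to compute a mode from
    top = max(counts.values())
    return min(v for v, c in counts.items() if c == top)


def impute_modes(records: List[Dict[str, Any]], fields: List[str]) -> List[Dict[str, Any]]:
    """Replace None with the per-field mode, in place; skip fields without a mode."""
    for field in fields:
        mode = pick_mode(r.get(field) for r in records)
        if mode is None:
            continue
        for r in records:
            if r.get(field) is None:
                r[field] = mode
    return records


# Made-up sample records shaped loosely like /v3/cores entries.
sample = [
    {"core_serial": "B1", "status": "lost", "block": 1},
    {"core_serial": "B2", "status": None, "block": None},
    {"core_serial": "B3", "status": "active", "block": 5},
    {"core_serial": "B4", "status": "active", "block": 5},
]
fields = find_categorical_fields(sample)
result = impute_modes(sample, fields)
```

In this sketch the numeric block field is not treated as categorical, so its missing value stays None, while the missing status is filled with the most frequent status.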
3.	Control Parameters and Debugging:
o	Include a variable at the beginning of the script to define the API endpoint URL, making it easily modifiable: 
	API_ENDPOINT_URL = "https://api.spacexdata.com/v3/cores" #(or /launches)
o	Use Python's standard logging module to provide informative output during execution. Configure logging to display messages at the INFO level.
o	Log key messages such as: starting data retrieval, number of records retrieved, starting mode imputation process, identified categorical fields potentially needing imputation (e.g., ['status', 'block', ...]), calculated mode for field X, number of missing values imputed for field X, any warnings for fields with no calculable mode, mode imputation complete, starting upload to httpbin, upload outcome.
4.	Execution Time Measurement:
o	Code Execution Time: Measure the time taken to perform the main operations (data retrieval + mode imputation). Print this time after the imputation operation is complete.
o	Pipeline Execution Time: Measure the total execution time of the entire script (from the beginning until after the upload to httpbin). Print this total time at the end of the script. Use Python's time module.
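
The two timing requirements above can be sketched as nested timers. This is a hedged illustration only: fetch_and_impute and upload are hypothetical stand-ins that sleep instead of calling any API, and time.perf_counter is used here as an assumption because it is a monotonic clock from the time module suited to measuring intervals.

```python
import time


def fetch_and_impute():
    """Hypothetical stand-in for data retrieval + mode imputation."""
    time.sleep(0.01)
    return [{"status": "active"}]


def upload(records):
    """Hypothetical stand-in for the upload step."""
    time.sleep(0.01)
    return len(records)


pipeline_start = time.perf_counter()   # total pipeline timer
operation_start = time.perf_counter()  # retrieval + imputation timer

records = fetch_and_impute()
code_time = time.perf_counter() - operation_start
print(f"Code execution time: {code_time:.4f} s")

uploaded = upload(records)
pipeline_time = time.perf_counter() - pipeline_start
print(f"Pipeline execution time: {pipeline_time:.4f} s")
```

Because the pipeline timer starts first and stops last, the pipeline time always includes the code execution time.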
5.	Upload Result:
o	Take the resulting imputed list of dictionaries from the mode imputation operation.
o	Serialize it into JSON format.
o	Make an HTTP POST request to the https://httpbin.org/post endpoint, sending the resulting imputed JSON data in the request body.
o	Verify the response from httpbin.org (e.g., check the status code) and log the outcome of the upload operation.


In [0]:
# Databricks Notebook: SpaceX API Data Imputation Pipeline

# 1. Imports
import requests
import logging
import time
import json
from collections import Counter
from typing import List, Dict, Any

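# 2. Setup logging
# force=True (Python 3.8+) replaces any handlers already attached to the root logger.
# Notebook runtimes often pre-configure logging, in which case a plain basicConfig call
# does nothing and INFO messages may not be shown.
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s', force=True)
logger = logging.getLogger()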

# 3. Control Parameters
API_ENDPOINT_URL = "https://api.spacexdata.com/v3/cores"  # Can switch to "https://api.spacexdata.com/v3/launches"
UPLOAD_ENDPOINT_URL = "https://httpbin.org/post"

# 4. Helper Functions

def retrieve_data(api_url: str) -> List[Dict[str, Any]]:
    """Retrieve data from the given API URL with error handling."""
    try:
        logger.info(f"Starting data retrieval from {api_url}")
        response = requests.get(api_url, timeout=10)
        response.raise_for_status()
        data = response.json()
        logger.info(f"Data retrieval complete. Number of records retrieved: {len(data)}")
        return data
    except requests.exceptions.RequestException as e:
        logger.error(f"Error fetching data from API: {e}")
        raise

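def identify_categorical_fields(data: List[Dict[str, Any]]) -> List[str]:
    """Identify fields whose non-missing values are predominantly strings."""
    if not data:
        return []

    # Scan every record rather than only the first one, so a single odd record cannot
    # misclassify a field. Counters keep keys in first-seen order.
    total, strings = Counter(), Counter()
    for record in data:
        for key, value in record.items():
            total[key] += value is not None
            strings[key] += isinstance(value, str)

    # Keep fields that are mostly strings, plus all-missing fields so that
    # calculate_modes can log a warning for them.
    categorical_fields = [k for k in total if total[k] == 0 or 2 * strings[k] > total[k]]
    logger.info(f"Identified categorical fields: {categorical_fields}")
    return categorical_fields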

def calculate_modes(data: List[Dict[str, Any]], categorical_fields: List[str]) -> Dict[str, Any]:
    """Calculate the mode for each categorical field."""
    modes = {}
    for field in categorical_fields:
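        counter = Counter()
        for record in data:
            value = record.get(field)
            # Count only string values: this skips None and any stray
            # non-string value (a list or dict would not even be hashable).
            if isinstance(value, str):
                counter[value] += 1

        if counter:
            # Ties: most_common() orders equal counts by first occurrence,
            # so the first value encountered wins.
            most_common = counter.most_common()
            mode_value = most_common[0][0]
            modes[field] = mode_value
            logger.info(f"Calculated mode for field '{field}': {mode_value}")
        else:
            logger.warning(f"No non-missing string values found for field '{field}'. Imputation skipped.")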
    return modes

def impute_missing_values(data: List[Dict[str, Any]], modes: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Impute missing categorical values with the pre-calculated mode."""
    imputed_count = {field: 0 for field in modes.keys()}
    
    for record in data:
        for field, mode_value in modes.items():
            if record.get(field) is None:
                record[field] = mode_value
                imputed_count[field] += 1

    for field, count in imputed_count.items():
        logger.info(f"Imputed {count} missing values for field '{field}'")
    return data

def upload_result(upload_url: str, data: List[Dict[str, Any]]) -> None:
    """Upload the imputed data to the given upload URL."""
    try:
        logger.info(f"Starting upload to {upload_url}")
        headers = {'Content-Type': 'application/json'}
        response = requests.post(upload_url, headers=headers, data=json.dumps(data), timeout=10)
        response.raise_for_status()
        logger.info(f"Upload successful. Status code: {response.status_code}")
    except requests.exceptions.RequestException as e:
        logger.error(f"Error during upload: {e}")
        raise

# 5. Main Execution

if __name__ == "__main__":
    # Start total pipeline timer
    total_start_time = time.time()

    # Data retrieval and imputation timer
    operation_start_time = time.time()
    
    # Step 1: Retrieve data
    data = retrieve_data(API_ENDPOINT_URL)

    # Step 2: Identify categorical fields
    logger.info("Starting mode imputation process")
    categorical_fields = identify_categorical_fields(data)

    # Step 3: Calculate modes
    modes = calculate_modes(data, categorical_fields)

    # Step 4: Impute missing values
    imputed_data = impute_missing_values(data, modes)
    logger.info("Mode imputation complete")

    # End of data retrieval + imputation timer
    operation_end_time = time.time()
    operation_time = operation_end_time - operation_start_time
    logger.info(f"Data retrieval and mode imputation completed in {operation_time:.2f} seconds.")

    # Step 5: Upload result
    upload_result(UPLOAD_ENDPOINT_URL, imputed_data)

    # End total pipeline timer
    total_end_time = time.time()
    total_pipeline_time = total_end_time - total_start_time
    logger.info(f"Total pipeline execution time: {total_pipeline_time:.2f} seconds.")
