You are an expert Python developer specializing in the Databricks environment. Your task is to create a complete Python script to be executed within a Databricks notebook. The script must perform the following operations:
1. Data Retrieval from SpaceX API:
   - Interact with the SpaceX v3 REST API (https://api.spacexdata.com/v3).
   - Retrieve data from one specific endpoint that is likely to contain numerical data with missing values:
      - Alternative: All launches: https://api.spacexdata.com/v3/launches
      - (Self-correction: /launches is the common choice, but /cores may be a more illustrative example because numerical fields such as reuse_count and rtls_landings could have missing values. Use /cores as the primary endpoint and keep /launches as the alternative.)
      - Primary: All cores: https://api.spacexdata.com/v3/cores
   - Handle potential errors during the API calls (e.g., timeouts, non-200 status codes).
2. Missing Value Imputation (Mean):
   - Perform mean imputation on the retrieved data (a list of dictionaries).
   - Imputation logic:
      - Identify numerical fields: First, automatically identify the keys within the dictionaries that predominantly contain numerical values (int or float). Either inspect a sample of records or, more reliably, iterate through all records checking types.
      - Calculate the mean per field: For each identified numerical field, calculate the mean using only the existing, non-missing (not None) numerical values across all records in the dataset.
      - Impute missing values: Iterate through the dataset again. For each numerical field, replace any missing value (represented as None) with the pre-calculated mean for that field.
      - Handle edge cases: If a numerical field contains only missing values (so no mean can be calculated), log a warning and leave the missing values as None. Imputing 0 would be an alternative, but the default is to leave them as None and log the warning.
   - The final result should be the original list of dictionaries, with missing numerical values replaced by the calculated mean of their respective fields.
3. Control Parameters and Debugging:
   - Include a variable at the beginning of the script that defines the API endpoint URL, making it easy to modify:
      API_ENDPOINT_URL = "https://api.spacexdata.com/v3/cores"  # or /launches
   - Use Python's standard logging module to provide informative output during execution, configured to display messages at the INFO level.
   - Log key messages such as: starting data retrieval, number of records retrieved, starting the imputation process, numerical fields identified as potentially needing imputation (e.g., ['reuse_count', 'rtls_attempts', ...]), the calculated mean for each field, the number of missing values imputed for each field, warnings for fields with no calculable mean, imputation complete, starting the upload to httpbin, and the upload outcome.
4. Execution Time Measurement:
   - Code execution time: Measure the time taken by the main operations (data retrieval + imputation) and print it once the imputation is complete.
   - Pipeline execution time: Measure the total execution time of the entire script, from the start until after the upload to httpbin, and print it at the end of the script. Use Python's time module.
5. Upload Result:
   - Take the imputed list of dictionaries produced by the imputation step.
   - Serialize it to JSON.
   - Send an HTTP POST request to https://httpbin.org/post with the imputed JSON data in the request body.
   - Verify the response from httpbin.org (e.g., check the status code) and log the outcome of the upload.
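Item 1 only asks for timeouts and non-200 responses to be handled. If transient failures should be retried instead of failing the pipeline immediately, one option is a small wrapper like the one below. This is a minimal sketch under assumptions: the helper name `call_with_retries`, the attempt count, and the backoff delays are hypothetical choices, not requirements from the task.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retries(
    fn: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    retry_on: Tuple[Type[BaseException], ...] = (TimeoutError, ConnectionError),
) -> T:
    """Call fn() and retry on the listed exceptions, with exponential backoff.

    Hypothetical helper: the defaults are illustrative assumptions, not
    requirements from the task.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == attempts:
                raise  # Out of attempts: re-raise the last error for the caller
            # Sleep base_delay, then 2x, 4x, ... before the next attempt
            time.sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")  # The loop always returns or raises
```

The default `retry_on` uses the built-in `TimeoutError` and `ConnectionError`, which the `requests` exceptions do not subclass. With `requests`, the call would look something like `call_with_retries(lambda: requests.get(API_ENDPOINT_URL, timeout=REQUEST_TIMEOUT), retry_on=(requests.exceptions.Timeout, requests.exceptions.ConnectionError))`. `HTTPError` is deliberately left out, so a 4XX response is not retried.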
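The three imputation steps in item 2, plus the all-missing edge case, can be illustrated on a tiny hand-made sample. The records below are invented for illustration (they are not real API data), and `impute_means` is a hypothetical helper. Two details matter: None values are ignored when deciding whether a field is numerical, and booleans are not counted as numbers even though `bool` subclasses `int` in Python.

```python
import logging
from statistics import mean
from typing import Any, Dict, List, Tuple


def is_number(value: Any) -> bool:
    """True for int/float values; bool is excluded because it subclasses int."""
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def impute_means(
    records: List[Dict[str, Any]], threshold: float = 0.5
) -> Tuple[List[Dict[str, Any]], Dict[str, float], List[str]]:
    """Mean-impute None values; returns (imputed records, means, all-None fields)."""
    keys = sorted({key for record in records for key in record})
    means: Dict[str, float] = {}
    all_none: List[str] = []
    for key in keys:
        # Step 1: decide the field's type from its non-None values only
        present = [r[key] for r in records if r.get(key) is not None]
        if not present:
            all_none.append(key)
            logging.warning("Field %r has only missing values; leaving them as None", key)
            continue
        numbers = [v for v in present if is_number(v)]
        if len(numbers) / len(present) >= threshold:
            # Step 2: mean over the existing numerical values only
            means[key] = mean(numbers)
    # Step 3: replace None with the field mean in a new dict per record
    imputed = [
        {k: means[k] if v is None and k in means else v for k, v in r.items()}
        for r in records
    ]
    return imputed, means, all_none


# Invented records for illustration only (not real API data)
sample = [
    {"core_serial": "A", "reuse_count": 2, "rtls_landings": None, "block": None, "water_landing": False},
    {"core_serial": "B", "reuse_count": None, "rtls_landings": 1, "block": None, "water_landing": True},
    {"core_serial": "C", "reuse_count": 4, "rtls_landings": 3, "block": None, "water_landing": False},
]
imputed, means, skipped = impute_means(sample)
```

On this sample, the missing `reuse_count` becomes 3 (the mean of 2 and 4), the missing `rtls_landings` becomes 2 (the mean of 1 and 3), and `block` is reported with a warning and left as None. `water_landing` holds only booleans, so it is not treated as numerical.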


# SpaceX API Data Processing with Mean Imputation
# For execution in Databricks notebooks

import requests
import json
import time
import logging
import statistics
from typing import Any, Dict, List, Optional

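# Configure logging. force=True (Python 3.8+) removes handlers already attached to
# the root logger; without it, basicConfig does nothing when such handlers exist,
# which can happen in notebook environments such as Databricks.
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(levelname)s - %(message)s',
    force=True
)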
logger = logging.getLogger(__name__)

# API Configuration
API_ENDPOINT_URL = "https://api.spacexdata.com/v3/cores"  # Can be modified to /launches if needed
HTTPBIN_URL = "https://httpbin.org/post"
REQUEST_TIMEOUT = 30  # seconds

def retrieve_spacex_data() -> List[Dict[str, Any]]:
    """
    Retrieves data from the SpaceX API.
    
    Returns:
        List of dictionaries containing SpaceX data.
    
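    Raises:
        requests.exceptions.RequestException: If the request times out, cannot be
            completed, or returns a 4XX/5XX status code.
        ValueError: If the response body is not valid JSON or is not a JSON array.
    """
    logger.info(f"Starting data retrieval from {API_ENDPOINT_URL}")
    try:
        response = requests.get(API_ENDPOINT_URL, timeout=REQUEST_TIMEOUT)
        response.raise_for_status()  # Raises HTTPError for 4XX/5XX responses
        data = response.json()
        # The v3 list endpoints (/cores, /launches) return a JSON array of records
        if not isinstance(data, list):
            raise ValueError(f"Expected a JSON array of records, got {type(data).__name__}")
        logger.info(f"Successfully retrieved {len(data)} records from the SpaceX API")
        return data
    except requests.exceptions.Timeout:
        logger.error(f"Request to {API_ENDPOINT_URL} timed out after {REQUEST_TIMEOUT} seconds")
        raise
    except requests.exceptions.HTTPError as e:
        logger.error(f"HTTP error occurred: {e}")
        raise
    except json.JSONDecodeError:
        # Must come before RequestException: since requests 2.27, response.json()
        # raises requests.exceptions.JSONDecodeError, which (with the standard-library
        # json backend) subclasses both json.JSONDecodeError and RequestException
        logger.error("Failed to decode JSON response from SpaceX API")
        raise
    except requests.exceptions.RequestException as e:
        logger.error(f"Error during request to SpaceX API: {e}")
        raise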

def identify_numerical_fields(data: List[Dict[str, Any]]) -> List[str]:
    """
    Identifies keys in the data that predominantly contain numerical values.
    
    Args:
        data: List of dictionaries containing SpaceX data.
        
    Returns:
        List of keys that contain numerical values.
    """
    logger.info("Identifying numerical fields in the dataset")
    
    if not data:
        logger.warning("Empty dataset provided, cannot identify numerical fields")
        return []
    
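    # For each field, count its non-None values and how many of those are numerical
    field_counts: Dict[str, Dict[str, int]] = {}
    
    # Examine each record in the dataset
    for record in data:
        for key, value in record.items():
            counts = field_counts.setdefault(key, {"numerical": 0, "non_null": 0})
            
            # None marks a missing value and says nothing about the field's type;
            # counting it would hide exactly the sparse fields that need imputation
            if value is None:
                continue
            counts["non_null"] += 1
            
            # Check if the value is a number (int or float) but not a boolean
            # (bool is a subclass of int in Python)
            if isinstance(value, (int, float)) and not isinstance(value, bool):
                counts["numerical"] += 1
    
    # Consider a field numerical if at least 50% of its non-None values are numbers
    numerical_fields = []
    for key, counts in field_counts.items():
        if counts["non_null"] == 0:
            # Edge case: the field has no values at all, so no mean can be calculated
            logger.warning(f"Field '{key}' contains only missing values; leaving them as None")
        elif counts["numerical"] / counts["non_null"] >= 0.5:
            numerical_fields.append(key)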
    
    logger.info(f"Identified numerical fields: {numerical_fields}")
    return numerical_fields

def calculate_means(data: List[Dict[str, Any]], numerical_fields: List[str]) -> Dict[str, Optional[float]]:
    """
    Calculates the mean for each numerical field in the data.
    
    Args:
        data: List of dictionaries containing SpaceX data.
        numerical_fields: List of keys that contain numerical values.
        
    Returns:
        Dictionary mapping field names to their mean values.
    """
    logger.info("Calculating means for numerical fields")
    means = {}
    
    for field in numerical_fields:
        # Collect all non-None numerical values for this field
        values = [
            record[field] for record in data 
            if field in record and 
            record[field] is not None and 
            isinstance(record[field], (int, float)) and
            not isinstance(record[field], bool)
        ]
        
        # Calculate mean if there are values
        if values:
            field_mean = statistics.mean(values)
            means[field] = field_mean
            logger.info(f"Calculated mean for field '{field}': {field_mean}")
        else:
            means[field] = None
            logger.warning(f"Field '{field}' has no valid numerical values to calculate mean")
    
    return means

def perform_mean_imputation(data: List[Dict[str, Any]], numerical_fields: List[str], means: Dict[str, Optional[float]]) -> List[Dict[str, Any]]:
    """
    Imputes missing values in numerical fields with their respective means.
    
    Args:
        data: List of dictionaries containing SpaceX data.
        numerical_fields: List of keys that contain numerical values.
        means: Dictionary mapping field names to their mean values.
        
    Returns:
        List of dictionaries with imputed values.
    """
    logger.info("Starting mean imputation process")
    imputed_data = []
    imputation_counts = {field: 0 for field in numerical_fields}
    
    for record in data:
        imputed_record = record.copy()
        
        for field in numerical_fields:
            # Skip fields not present in this record
            if field not in imputed_record:
                continue
            
            # Skip fields where we couldn't calculate a mean
            if means[field] is None:
                continue
                
            # Impute missing values
            if imputed_record[field] is None:
                imputed_record[field] = means[field]
                imputation_counts[field] += 1
        
        imputed_data.append(imputed_record)
    
    # Log imputation counts
    for field, count in imputation_counts.items():
        if count > 0:
            logger.info(f"Imputed {count} missing values for field '{field}'")
        else:
            logger.info(f"No imputation needed for field '{field}'")
    
    logger.info("Imputation process complete")
    return imputed_data

def upload_to_httpbin(data: List[Dict[str, Any]]) -> Dict[str, Any]:
    """
    Uploads the imputed data to httpbin.org.
    
    Args:
        data: List of dictionaries with imputed values.
        
    Returns:
        Response from httpbin.org.
        
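    Raises:
        requests.exceptions.RequestException: If the upload times out, cannot be
            completed, or returns a 4XX/5XX status code.
        ValueError: If the response body from httpbin is not valid JSON.
    """
    logger.info(f"Starting upload to {HTTPBIN_URL}")
    try:
        # Serialize the data to JSON
        json_data = json.dumps(data)
        
        # Upload to httpbin.org
        response = requests.post(
            HTTPBIN_URL, 
            data=json_data,
            headers={"Content-Type": "application/json"},
            timeout=REQUEST_TIMEOUT
        )
        response.raise_for_status()  # Raises HTTPError for 4XX/5XX responses
        
        # Parse the response (httpbin echoes the request details back as JSON)
        result = response.json()
        logger.info(f"Successfully uploaded {len(data)} records to {HTTPBIN_URL} (HTTP {response.status_code})")
        return result
    except requests.exceptions.Timeout:
        logger.error(f"Request to {HTTPBIN_URL} timed out after {REQUEST_TIMEOUT} seconds")
        raise
    except requests.exceptions.HTTPError as e:
        logger.error(f"HTTP error occurred during upload: {e}")
        raise
    except json.JSONDecodeError:
        # Must come before RequestException: since requests 2.27, response.json()
        # raises requests.exceptions.JSONDecodeError, which (with the standard-library
        # json backend) subclasses both json.JSONDecodeError and RequestException
        logger.error("Failed to decode JSON response from httpbin")
        raise
    except requests.exceptions.RequestException as e:
        logger.error(f"Error during upload to httpbin: {e}")
        raise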

def main():
    """
    Main function that executes the entire data processing pipeline.
    """
    # Start timing the whole pipeline
    pipeline_start_time = time.time()
    
    try:
        # Start timing the data retrieval and imputation
        processing_start_time = time.time()
        
        # Step 1: Retrieve data from SpaceX API
        data = retrieve_spacex_data()
        
        # Step 2: Identify numerical fields
        numerical_fields = identify_numerical_fields(data)
        
        # Step 3: Calculate means for numerical fields
        means = calculate_means(data, numerical_fields)
        
        # Step 4: Perform mean imputation
        imputed_data = perform_mean_imputation(data, numerical_fields, means)
        
        # Calculate and log processing time
        processing_time = time.time() - processing_start_time
        logger.info(f"Data retrieval and imputation completed in {processing_time:.2f} seconds")
        
        # Step 5: Upload the imputed data to httpbin.org
        upload_result = upload_to_httpbin(imputed_data)
        logger.info("Upload result received from httpbin")
        
        # Calculate and log total pipeline time
        pipeline_time = time.time() - pipeline_start_time
        logger.info(f"Total pipeline execution completed in {pipeline_time:.2f} seconds")
        
        # Return the imputed data for further use in the notebook if needed
        return imputed_data
        
    except Exception as e:
        logger.error(f"Pipeline execution failed: {e}")
        # Calculate time even in case of failure
        pipeline_time = time.time() - pipeline_start_time
        logger.info(f"Pipeline execution failed after {pipeline_time:.2f} seconds")
        raise

# Execute the main function when the notebook cell is run
if __name__ == "__main__":
    imputed_data = main()
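The cell measures durations with `time.time()`, which follows the wall clock and can jump if the system clock is adjusted during a run. `time.perf_counter()`, also in the `time` module, is monotonic and intended for measuring intervals. Below is a minimal sketch of a reusable timer built on it; the `timed` name and the log format are hypothetical choices, not part of the original cell.

```python
import logging
import time
from contextlib import contextmanager
from typing import Dict, Iterator

logger = logging.getLogger(__name__)


@contextmanager
def timed(label: str) -> Iterator[Dict[str, float]]:
    """Log how long the with-block took and store the seconds under "elapsed"."""
    result: Dict[str, float] = {}
    start = time.perf_counter()
    try:
        yield result
    finally:
        # Runs on success and on failure, so failed runs are timed too
        result["elapsed"] = time.perf_counter() - start
        logger.info("%s took %.2f seconds", label, result["elapsed"])


logging.basicConfig(level=logging.INFO)

# Usage: time a stand-in workload
with timed("Data retrieval and imputation") as timing:
    time.sleep(0.05)  # Stand-in for retrieval + imputation
```

Because the measurement sits in a `finally` clause, a step that raises still gets its duration logged, which mirrors what `main()` does by hand in its `except` branch.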
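The upload step checks the HTTP status code. Because httpbin's /post endpoint echoes the request back, placing a JSON request body (parsed) under the "json" key of its response, a stricter check is possible: compare the echo with what was sent. The sketch below assumes that response shape; `verify_httpbin_echo` is a hypothetical helper that takes an already-parsed response dict, so it can be exercised without network access.

```python
import json
from typing import Any, Dict, List


def verify_httpbin_echo(sent: List[Dict[str, Any]], response_body: Dict[str, Any]) -> bool:
    """Return True if the parsed httpbin response echoes exactly the records sent."""
    echoed = response_body.get("json")  # httpbin puts the parsed JSON request body here
    if not isinstance(echoed, list):
        return False
    # Values that came from JSON (str, int, finite float, bool, None, lists and
    # str-keyed dicts) survive a JSON round trip unchanged, so equality is enough
    return echoed == sent


# Offline usage: simulate httpbin's echo by round-tripping invented records through JSON
records = [{"core_serial": "X1", "reuse_count": 2.5, "block": None}]
simulated_body = {"json": json.loads(json.dumps(records)), "url": "https://httpbin.org/post"}
print(verify_httpbin_echo(records, simulated_body))  # True
```

In the notebook, this could be called as `verify_httpbin_echo(imputed_data, upload_result)` inside `main()`, logging a warning when it returns False; that wiring is an assumption, not part of the original cell.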
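To sanity-check the result after running the cell, one option is to count the None values remaining in each field. After imputation, numerical fields with a computable mean should no longer appear, and only non-numerical or all-missing fields should still contain None. A minimal sketch; `count_missing` is a hypothetical helper, and the records below are invented (the "after" list is a hand-written stand-in for imputed output).

```python
from collections import Counter
from typing import Any, Dict, List


def count_missing(records: List[Dict[str, Any]]) -> Counter:
    """Count None values per field across all records (fields without None are omitted)."""
    missing: Counter = Counter()
    for record in records:
        for key, value in record.items():
            if value is None:
                missing[key] += 1
    return missing


before = [{"reuse_count": 2, "details": None}, {"reuse_count": None, "details": None}]
after = [{"reuse_count": 2, "details": None}, {"reuse_count": 2.0, "details": None}]
print(count_missing(before))  # Counter({'details': 2, 'reuse_count': 1})
print(count_missing(after))   # Counter({'details': 2})
```

In the notebook, comparing `count_missing(imputed_data)` with the counts for the raw API response would show which fields were filled and which were left as None.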