You are an expert Python developer specializing in the Databricks environment. Your task is to create a complete Python script to be executed within a Databricks notebook. The script must perform the following operations:
1.	Data Retrieval from SpaceX API:
o	Interact with the SpaceX v3 REST API (https://api.spacexdata.com/v3).
o	Retrieve data from one specific endpoint likely containing categorical data where missing values might occur: 
	All Cores: https://api.spacexdata.com/v3/cores (Fields like status, block could be candidates)
	Alternative: All Launches: https://api.spacexdata.com/v3/launches (Fields like launch_site.site_name, rocket.rocket_name)
o	Handle potential errors during the API calls (e.g., timeouts, non-200 status codes).
2.	Missing Value Imputation (Mode):
o	Perform mode imputation on the retrieved data (list of dictionaries).
o	Imputation Logic: 
	Identify Categorical Fields: First, automatically identify the keys/fields within the dictionaries that predominantly contain categorical data (e.g., strings - str). You might need to inspect the first few records or a sample, or iterate through checking types.
	Calculate Mode per Field: For each identified categorical field, determine the mode (the most frequent value) using only the existing, non-missing (not None) values across all records in the dataset. The collections.Counter class is suitable for this.
	Handle Ties: If multiple values share the highest frequency (a tie for the mode), select any one of them as the mode (e.g., the one that appears first alphabetically or the first one encountered during counting).
	Impute Missing Values: Iterate through the dataset again. For each categorical field, replace any missing values (represented as None) with the pre-calculated mode for that specific field.
	Handle Edge Cases: If a categorical field contains only missing values (or no non-missing values to calculate a mode), log a warning and leave the missing values as None.
o	The final result should be the original list of dictionaries, but with missing categorical values replaced by the calculated mode for their respective fields.
3.	Control Parameters and Debugging:
o	Include a variable at the beginning of the script to define the API endpoint URL, making it easily modifiable: 
	API_ENDPOINT_URL = "https://api.spacexdata.com/v3/cores" #(or /launches)
o	Use Python's standard logging module to provide informative output during execution. Configure logging to display messages at the INFO level.
o	Log key messages such as: starting data retrieval, number of records retrieved, starting mode imputation process, identified categorical fields potentially needing imputation (e.g., ['status', 'block', ...]), calculated mode for field X, number of missing values imputed for field X, any warnings for fields with no calculable mode, mode imputation complete, starting upload to httpbin, upload outcome.
4.	Execution Time Measurement:
o	Code Execution Time: Measure the time taken to perform the main operations (data retrieval + mode imputation). Print this time after the imputation operation is complete.
o	Pipeline Execution Time: Measure the total execution time of the entire script (from the beginning until after the upload to httpbin). Print this total time at the end of the script. Use Python's time module.
5.	Upload Result:
o	Take the resulting imputed list of dictionaries from the mode imputation operation.
o	Serialize it into JSON format.
o	Make an HTTP POST request to the https://httpbin.org/post endpoint, sending the resulting imputed JSON data in the request body.
o	Verify the response from httpbin.org (e.g., check the status code) and log the outcome of the upload operation.
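
The imputation rules in step 2 can be illustrated on a toy dataset. The following is a minimal sketch, not the script's own implementation: the helper names `mode_of` and `impute_field` and the sample records are hypothetical, and ties are broken alphabetically, which is one of the options the task allows.

```python
from collections import Counter
from typing import Any, Dict, List, Optional


def mode_of(values: List[Any]) -> Optional[Any]:
    """Return the most frequent non-None value; ties go to the alphabetically first."""
    counts = Counter(v for v in values if v is not None)
    if not counts:
        return None  # only missing values: no mode can be calculated
    top = max(counts.values())
    # Compare tied candidates by their string form so the choice is deterministic.
    return min((v for v, c in counts.items() if c == top), key=str)


def impute_field(records: List[Dict[str, Any]], field: str) -> int:
    """Replace None in `field` with the mode, in place; return how many were filled."""
    mode = mode_of([r.get(field) for r in records])
    if mode is None:
        return 0
    filled = 0
    for r in records:
        if field in r and r[field] is None:
            r[field] = mode
            filled += 1
    return filled


# Hypothetical sample records, not real API output.
sample = [
    {"status": "active", "note": None},
    {"status": "lost", "note": None},
    {"status": None, "note": None},
    {"status": "lost", "note": None},
    {"status": "active", "note": None},
]
filled_status = impute_field(sample, "status")  # "active" and "lost" tie at 2 each
filled_note = impute_field(sample, "note")      # all values missing, nothing to impute
```

Here the tie between "active" and "lost" resolves to "active", and the all-missing `note` field is left as None, matching the edge case described in step 2.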

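The two timings requested in step 4 can be sketched as follows. This is an illustrative outline under stated assumptions: the `timed` helper is hypothetical, the two lambdas are stand-ins for the real retrieval, imputation, and upload steps, and `time.perf_counter()` is used here in place of `time.time()` because it is a monotonic clock from the same `time` module.

```python
import time
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")


def timed(step: Callable[[], T]) -> Tuple[T, float]:
    """Run `step` and return its result together with the elapsed seconds."""
    start = time.perf_counter()  # monotonic, unaffected by system clock changes
    result = step()
    return result, time.perf_counter() - start


pipeline_start = time.perf_counter()

# Stand-in for "data retrieval + mode imputation".
processed, code_seconds = timed(lambda: [n * 2 for n in range(1000)])
print(f"Code execution time: {code_seconds:.4f} seconds")

# Stand-in for the upload step.
_, upload_seconds = timed(lambda: time.sleep(0.01))

pipeline_seconds = time.perf_counter() - pipeline_start
print(f"Pipeline execution time: {pipeline_seconds:.4f} seconds")
```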

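Beyond checking the status code, the upload in step 5 can also be verified against the body that httpbin echoes back, since its `/post` response includes the parsed request body under a `json` key. The sketch below assumes hypothetical helper names (`build_body`, `echo_matches`) and uses a hand-built dictionary in place of a real `response.json()` result, so it runs without network access.

```python
import json
from typing import Any, Dict, List


def build_body(records: List[Dict[str, Any]]) -> str:
    """Serialize the imputed records to the JSON string sent as the POST body."""
    return json.dumps(records)


def echo_matches(sent: List[Dict[str, Any]], response_body: Dict[str, Any]) -> bool:
    """Check that the `json` field echoed by httpbin equals what was sent."""
    return response_body.get("json") == sent


# Hypothetical records and a hand-built stand-in for response.json().
records = [{"core_serial": "X1", "status": "active"}]
body = build_body(records)
fake_response_body = {
    "data": body,              # raw request body as a string
    "json": json.loads(body),  # parsed request body
    "url": "https://httpbin.org/post",
}
ok = echo_matches(records, fake_response_body)
```

In the real pipeline, `echo_matches(imputed_data, response.json())` would be called after a successful POST, and the result logged as part of the upload outcome.
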
In [0]:
#!/usr/bin/env python
# SpaceX API Data Processing with Mode Imputation
# For Databricks Environment

import requests
import json
import time
import logging
from collections import Counter
from typing import Dict, List, Any, Optional

# Configure logging
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(levelname)s - %(message)s',
    force=True  # Replace any handlers the notebook runtime already installed (Python 3.8+)
)
logger = logging.getLogger(__name__)

# API endpoint configuration
API_ENDPOINT_URL = "https://api.spacexdata.com/v3/cores"  # Alternative: "https://api.spacexdata.com/v3/launches"
HTTPBIN_URL = "https://httpbin.org/post"

def retrieve_spacex_data(url: str) -> List[Dict[str, Any]]:
    """
    Retrieves data from the SpaceX API and handles potential errors.
    
    Args:
        url: The API endpoint URL
        
    Returns:
        List of dictionaries containing the API response data
    """
    logger.info(f"Starting data retrieval from {url}")
    
    try:
        response = requests.get(url, timeout=30)
        response.raise_for_status()  # Raise exception for 4XX/5XX responses
        data = response.json()
        logger.info(f"Successfully retrieved {len(data)} records from the API")
        return data
    except requests.exceptions.Timeout:
        logger.error("Request timed out. Check network connection or API availability.")
        raise
    except requests.exceptions.HTTPError as e:
        logger.error(f"HTTP error occurred: {e}")
        raise
    except json.JSONDecodeError:
        # Checked before RequestException: recent requests versions raise a JSON
        # error that subclasses both, so the broader handler would otherwise win.
        logger.error("Failed to parse API response as JSON")
        raise
    except requests.exceptions.RequestException as e:
        logger.error(f"Error during API request: {e}")
        raise

def identify_categorical_fields(data: List[Dict[str, Any]], sample_size: int = 10) -> List[str]:
    """
    Identifies potentially categorical fields in the dataset.
    
    Args:
        data: List of dictionaries
        sample_size: Number of records to examine
        
    Returns:
        List of field names that appear to be categorical
    """
    if not data:
        return []
    
    # Sample data to examine
    sample = data[:min(sample_size, len(data))]
    
    # Track fields and their value types
    field_types = {}
    categorical_fields = []
    
    # Examine each record in the sample
    for record in sample:
        for field, value in record.items():
            # Skip None values
            if value is None:
                continue
                
            # For nested dictionaries, skip processing
            if isinstance(value, dict):
                continue
                
            # For lists, skip processing
            if isinstance(value, list):
                continue
                
            # Update field_types dictionary
            if field not in field_types:
                field_types[field] = []
            
            field_types[field].append(type(value))
    
    # Identify fields that predominantly contain strings (categorical)
    for field, types in field_types.items():
        # Calculate percentage of string values
        if types:
            string_percentage = types.count(str) / len(types)
            
            # If more than 70% of non-None values are strings, consider it categorical
            if string_percentage > 0.7:
                categorical_fields.append(field)
    
    return categorical_fields

def calculate_modes(data: List[Dict[str, Any]], categorical_fields: List[str]) -> Dict[str, Any]:
    """
    Calculates the mode for each categorical field.
    
    Args:
        data: List of dictionaries
        categorical_fields: List of field names considered categorical
        
    Returns:
        Dictionary mapping field names to their mode values
    """
    field_modes = {}
    
    for field in categorical_fields:
        # Collect all non-None values for this field
        field_values = [record[field] for record in data if field in record and record[field] is not None]
        
        if field_values:
            # Calculate mode using Counter; on a tie, most_common() returns
            # the value that was encountered first
            counter = Counter(field_values)
            mode_value = counter.most_common(1)[0][0]
            field_modes[field] = mode_value
            logger.info(f"Field '{field}': Mode calculated as '{mode_value}'")
        else:
            logger.warning(f"Field '{field}': Contains only missing values, no mode calculated")
            field_modes[field] = None
            
    return field_modes

def perform_mode_imputation(data: List[Dict[str, Any]], field_modes: Dict[str, Any]) -> List[Dict[str, Any]]:
    """
    Imputes missing values with the mode for each categorical field.
    
    Args:
        data: List of dictionaries
        field_modes: Dictionary mapping field names to their mode values
        
    Returns:
        List of dictionaries with missing values imputed
    """
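    # Copy each record so the caller's dictionaries are not modified in place
    # (list.copy() alone would still share the same record objects)
    imputed_data = [dict(record) for record in data]
    imputation_counts = {field: 0 for field in field_modes}
    
    for record in imputed_data: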
        for field, mode_value in field_modes.items():
            # Skip fields with no calculable mode
            if mode_value is None:
                continue
                
            # Check if field exists and is None
            if field in record and record[field] is None:
                record[field] = mode_value
                imputation_counts[field] += 1
    
    # Log imputation counts
    for field, count in imputation_counts.items():
        logger.info(f"Field '{field}': Imputed {count} missing values")
    
    return imputed_data

def upload_to_httpbin(data: List[Dict[str, Any]]) -> requests.Response:
    """
    Uploads the processed data to httpbin.org.
    
    Args:
        data: List of dictionaries to upload
        
    Returns:
        Response object from the HTTP request
    """
    logger.info("Starting upload to httpbin.org")
    
    try:
        json_data = json.dumps(data)
        response = requests.post(HTTPBIN_URL, data=json_data, headers={'Content-Type': 'application/json'}, timeout=30)
        response.raise_for_status()
        logger.info(f"Upload successful: Status code {response.status_code}")
        return response
    except requests.exceptions.RequestException as e:
        logger.error(f"Error during upload: {e}")
        raise

def main():
    """Main function to orchestrate the data processing pipeline."""
    # Start timing for the entire pipeline
    pipeline_start_time = time.time()
    
    try:
        # Start timing for the main operations (data retrieval + mode imputation)
        processing_start_time = time.time()
        
        # Data retrieval
        raw_data = retrieve_spacex_data(API_ENDPOINT_URL)
        
        # Identify categorical fields
        logger.info("Identifying categorical fields for imputation")
        categorical_fields = identify_categorical_fields(raw_data)
        logger.info(f"Identified categorical fields: {categorical_fields}")
        
        # Calculate mode for each categorical field
        logger.info("Starting mode calculation")
        field_modes = calculate_modes(raw_data, categorical_fields)
        
        # Perform mode imputation
        logger.info("Starting mode imputation process")
        imputed_data = perform_mode_imputation(raw_data, field_modes)
        logger.info("Mode imputation complete")
        
        # End timing for the main operations
        processing_end_time = time.time()
        processing_time = processing_end_time - processing_start_time
        logger.info(f"Data retrieval and mode imputation completed in {processing_time:.2f} seconds")
        
        # Upload imputed data to httpbin
        response = upload_to_httpbin(imputed_data)
        
        # End timing for the entire pipeline
        pipeline_end_time = time.time()
        pipeline_time = pipeline_end_time - pipeline_start_time
        logger.info(f"Total pipeline execution completed in {pipeline_time:.2f} seconds")
        
        # Return imputed data for potential further use in the notebook
        return imputed_data
        
    except Exception as e:
        logger.error(f"Pipeline execution failed: {str(e)}")
        raise

# Execute the pipeline when the notebook cell is run
if __name__ == "__main__":
    result = main()
    
    # Display some results in the notebook for verification
    print(f"Processed {len(result)} SpaceX records with mode imputation")
    
    # In a Databricks notebook, you can also display the first few records
    # Display the first 5 records
    print("\nSample of processed data (first 5 records):")
    for i, record in enumerate(result[:5]):
        print(f"Record {i+1}: {record}")