You are an expert Python developer specializing in the Databricks environment. Your task is to create a complete Python script to be executed within a Databricks notebook. The script must perform the following operations:
1.	Data Retrieval from SpaceX API:
o	Interact with the SpaceX v3 REST API (https://api.spacexdata.com/v3).
o	Retrieve data from one endpoint that is likely to contain categorical fields with missing values:
	All Cores: https://api.spacexdata.com/v3/cores (Fields like status, block could be candidates)
	Alternative: All Launches: https://api.spacexdata.com/v3/launches (Fields like launch_site.site_name, rocket.rocket_name)
o	Handle potential errors during the API calls (e.g., timeouts, non-200 status codes).
2.	Missing Value Imputation (Mode):
o	Perform mode imputation on the retrieved data (list of dictionaries).
o	Imputation Logic: 
	Identify Categorical Fields: First, automatically identify the keys/fields within the dictionaries that predominantly contain categorical data (e.g., values of type str), either by inspecting a sample of records or by iterating over all records and checking value types.
	Calculate Mode per Field: For each identified categorical field, determine the mode (the most frequent value) using only the existing, non-missing (not None) values across all records in the dataset. The collections.Counter class is suitable for this.
	Handle Ties: If multiple values share the highest frequency (a tie for the mode), select any one of them as the mode (e.g., the one that appears first alphabetically or the first one encountered during counting).
	Impute Missing Values: Iterate through the dataset again. For each categorical field, replace any missing values (represented as None) with the pre-calculated mode for that specific field.
	Handle Edge Cases: If a categorical field contains only missing values, so that no mode can be calculated, log a warning and leave the missing values as None.
o	The final result should be the original list of dictionaries, but with missing categorical values replaced by the calculated mode for their respective fields.
3.	Control Parameters and Debugging:
o	Include a variable at the beginning of the script to define the API endpoint URL, making it easily modifiable: 
	API_ENDPOINT_URL = "https://api.spacexdata.com/v3/cores" #(or /launches)
o	Use Python's standard logging module to provide informative output during execution. Configure logging to display messages at the INFO level.
o	Log key messages such as: starting data retrieval, number of records retrieved, starting mode imputation process, identified categorical fields potentially needing imputation (e.g., ['status', 'block', ...]), calculated mode for field X, number of missing values imputed for field X, any warnings for fields with no calculable mode, mode imputation complete, starting upload to httpbin, upload outcome.
4.	Execution Time Measurement:
o	Code Execution Time: Measure the time taken to perform the main operations (data retrieval + mode imputation). Print this time after the imputation operation is complete.
o	Pipeline Execution Time: Measure the total execution time of the entire script (from the beginning until after the upload to httpbin). Print this total time at the end of the script. Use Python's time module.
5.	Upload Result:
o	Take the resulting imputed list of dictionaries from the mode imputation operation.
o	Serialize it into JSON format.
o	Make an HTTP POST request to the https://httpbin.org/post endpoint, sending the resulting imputed JSON data in the request body.
o	Verify the response from httpbin.org (e.g., check the status code) and log the outcome of the upload operation.
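
The imputation logic in step 2 can be sketched on a small in-memory dataset before any API call is involved. The following is a minimal illustration, not the required implementation: the function name mode_impute and the sample records are hypothetical, a field is treated as categorical if any record holds a str under that key, and ties are broken alphabetically (one of the two options step 2 allows).

```python
from collections import Counter

def mode_impute(records):
    """Fill None values in string-valued fields with that field's mode."""
    # Assumption: a field is categorical if any record holds a str under that key.
    fields = sorted({k for r in records for k, v in r.items() if isinstance(v, str)})
    result = [dict(r) for r in records]  # shallow copies; the input is left untouched
    for field in fields:
        counts = Counter(r[field] for r in records if r.get(field) is not None)
        # Tie-break: highest count first, then alphabetical order.
        mode = min(counts, key=lambda v: (-counts[v], str(v)))
        for r in result:
            if r.get(field) is None:
                r[field] = mode
    return result

# Hypothetical records for illustration only, not real API output.
sample = [
    {"core_serial": "A", "status": "lost", "block": 1},
    {"core_serial": "B", "status": "active", "block": None},
    {"core_serial": "C", "status": None, "block": 2},
]
imputed = mode_impute(sample)
```

Here "lost" and "active" are tied, so the alphabetical rule picks "active" for the missing status. The numeric block field is not treated as categorical, so its None value is left alone.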


In [0]:
# Databricks notebook source
import requests
import json
import time
import logging
from collections import Counter

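# Configure logging.
# force=True (Python 3.8+) replaces any handlers the notebook runtime may have
# already attached to the root logger; without it, basicConfig can be a silent no-op.
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s', force=True)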

# Control Parameter: Define the API endpoint URL
API_ENDPOINT_URL = "https://api.spacexdata.com/v3/cores"  # Or use "https://api.spacexdata.com/v3/launches"

# --- Data Retrieval ---
def fetch_data(api_url):
    """
    Retrieves data from the specified API endpoint.

    Args:
        api_url (str): The URL of the API endpoint.

    Returns:
        list or None: A list of dictionaries containing the retrieved data,
                     or None if an error occurred.
    """
    logging.info(f"Starting data retrieval from: {api_url}")
    start_time = time.time()
    try:
        response = requests.get(api_url, timeout=10)
        response.raise_for_status()  # Raise HTTPError for bad responses (4xx or 5xx)
        data = response.json()
        end_time = time.time()
        logging.info(f"Successfully retrieved {len(data)} records in {end_time - start_time:.2f} seconds.")
        return data
    except (requests.exceptions.RequestException, ValueError) as e:
        # ValueError covers a non-JSON response body on older versions of requests
        logging.error(f"Error during API request: {e}")
        return None

# --- Missing Value Imputation (Mode) ---
def impute_missing_mode(data):
    """
    Performs mode imputation on missing values in categorical fields.

    Args:
        data (list): A list of dictionaries.

    Returns:
        list: The list of dictionaries with missing categorical values imputed.
    """
    if not data:
        logging.warning("No data provided for imputation.")
        return []

    logging.info("Starting mode imputation process.")
    start_time = time.time()

    # Identify categorical fields: any key that holds a string in at least one
    # record. Scanning every record (not just the first) also catches fields
    # whose value is missing in the first record.
    categorical_fields = set()
    for record in data:
        for key, value in record.items():
            if isinstance(value, str):
                categorical_fields.add(key)

    logging.info(f"Identified potential categorical fields for imputation: {sorted(categorical_fields)}")

    modes = {}
    for field in categorical_fields:
        values = [record.get(field) for record in data if record.get(field) is not None]
        if not values:
            logging.warning(f"No non-missing values found for field '{field}'. Skipping mode calculation.")
            modes[field] = None
        else:
            # values is non-empty here, so most_common(1) always returns one entry.
            # Ties are broken by first-encountered order (Python 3.7+).
            modes[field] = Counter(values).most_common(1)[0][0]
            logging.info(f"Calculated mode for field '{field}': '{modes[field]}'")

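    imputed_data = []
    imputed_counts = Counter()  # number of missing values filled in per field
    for record in data:
        imputed_record = record.copy()
        for field, mode_value in modes.items():
            if mode_value is not None and imputed_record.get(field) is None:
                imputed_record[field] = mode_value
                imputed_counts[field] += 1
        imputed_data.append(imputed_record)

    for field in modes:
        logging.info(f"Imputed {imputed_counts[field]} missing value(s) for field '{field}'.")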

    end_time = time.time()
    logging.info(f"Mode imputation complete in {end_time - start_time:.2f} seconds.")
    return imputed_data

# --- Upload Result ---
def upload_data(data):
    """
    Uploads the given data as JSON to https://httpbin.org/post.

    Args:
        data (list): The list of dictionaries to upload.
    """
    logging.info("Starting upload to https://httpbin.org/post")
    start_time = time.time()
    try:
        json_data = json.dumps(data)
        response = requests.post("https://httpbin.org/post", data=json_data, headers={'Content-Type': 'application/json'}, timeout=10)
        response.raise_for_status()
        end_time = time.time()
        logging.info(f"Upload to httpbin.org successful. Status code: {response.status_code} in {end_time - start_time:.2f} seconds.")
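        # httpbin echoes the raw request body back under the 'data' key
        echoed_body = response.json().get('data') or ''
        logging.info(f"Upload outcome: httpbin echoed back {len(echoed_body)} characters, starting with: {echoed_body[:200]}...")
    except (requests.exceptions.RequestException, ValueError) as e:
        logging.error(f"Error during upload to httpbin.org: {e}")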

# --- Main Execution ---
if __name__ == "__main__":
    total_start_time = time.time()

    # Data Retrieval
    raw_data = fetch_data(API_ENDPOINT_URL)

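    if raw_data:
        # Mode Imputation
        imputed_data = impute_missing_mode(raw_data)

        # Code execution time: data retrieval + mode imputation
        code_execution_time = time.time() - total_start_time
        print(f"Code execution time (retrieval + imputation): {code_execution_time:.2f} seconds.")

        # Upload Result
        upload_data(imputed_data)
    else:
        logging.warning("No data retrieved; skipping imputation and upload.")

    # Pipeline execution time: the whole script, including the upload
    total_execution_time = time.time() - total_start_time
    print(f"Total pipeline execution time: {total_execution_time:.2f} seconds.")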
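
The two timings requested in step 4 (code execution time and pipeline execution time) could also be collected with a small reusable helper. The sketch below is one possible approach under stated assumptions: the timed context manager and the timings dictionary are hypothetical names, time.perf_counter is used in place of time.time because it is a monotonic clock intended for interval measurement, and the sleep calls merely stand in for the real retrieval, imputation, and upload steps.

```python
import time
from contextlib import contextmanager

@contextmanager
def timed(label, results):
    """Store the elapsed wall-clock seconds for the enclosed block under `label`."""
    start = time.perf_counter()  # monotonic clock, suited to measuring intervals
    try:
        yield
    finally:
        # Record the duration even if the enclosed block raises.
        results[label] = time.perf_counter() - start

timings = {}
with timed("pipeline", timings):
    with timed("code", timings):
        time.sleep(0.01)  # stand-in for data retrieval + mode imputation
    time.sleep(0.01)      # stand-in for the upload step

print(f"Code execution time: {timings['code']:.2f} seconds")
print(f"Pipeline execution time: {timings['pipeline']:.2f} seconds")
```

Nesting the inner block inside the outer one keeps the two measurements consistent: the pipeline time always includes the code time plus whatever runs after it.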