You are an expert Python developer specializing in the Databricks environment. Your task is to create a complete Python script to be executed within a Databricks notebook. The script must perform the following operations:
1.	Data Retrieval from SpaceX API:
o	Interact with the SpaceX v3 REST API (https://api.spacexdata.com/v3).
o	Retrieve data from one endpoint whose records contain numerical fields where missing values may occur:
	Primary: All cores: https://api.spacexdata.com/v3/cores (numerical fields such as reuse_count and rtls_landings may be missing)
	Alternative: All launches: https://api.spacexdata.com/v3/launches
o	Handle potential errors during the API calls (e.g., timeouts, non-200 status codes).
2.	Missing Value Imputation (Mean):
o	Perform mean imputation on the retrieved data (list of dictionaries).
o	Imputation Logic: 
	Identify Numerical Fields: First, automatically identify the keys/fields within the dictionaries that predominantly contain numerical values (int or float). Either inspect a sample of the records or iterate through all records checking types, whichever determines these fields reliably.
	Calculate Mean per Field: For each identified numerical field, calculate the mean using only the existing, non-missing (not None) numerical values across all records in the dataset.
	Impute Missing Values: Iterate through the dataset again. For each numerical field, replace any missing values (represented as None) with the pre-calculated mean for that specific field.
	Handle Edge Cases: If a numerical field contains only missing values (no valid numbers from which to calculate a mean), log a warning and leave the missing values as None rather than imputing 0.
o	The final result should be the original list of dictionaries, but with missing numerical values replaced by the calculated mean for their respective fields.
3.	Control Parameters and Debugging:
o	Include a variable at the beginning of the script to define the API endpoint URL, making it easily modifiable: 
	API_ENDPOINT_URL = "https://api.spacexdata.com/v3/cores" #(or /launches)
o	Use Python's standard logging module to provide informative output during execution. Configure logging to display messages at the INFO level.
o	Log key messages such as: starting data retrieval, number of records retrieved, starting imputation process, identified numerical fields potentially needing imputation (e.g., ['reuse_count', 'rtls_attempts', ...]), calculated mean for field X, number of missing values imputed for field X, any warnings for fields with no calculable mean, imputation complete, starting upload to httpbin, upload outcome.
4.	Execution Time Measurement:
o	Code Execution Time: Measure the time taken to perform the main operations (data retrieval + imputation). Print this time after the imputation operation is complete.
o	Pipeline Execution Time: Measure the total execution time of the entire script (from the beginning until after the upload to httpbin). Print this total time at the end of the script. Use Python's time module.
5.	Upload Result:
o	Take the resulting imputed list of dictionaries from the imputation operation.
o	Serialize it into JSON format.
o	Make an HTTP POST request to the https://httpbin.org/post endpoint, sending the resulting imputed JSON data in the request body.
o	Verify the response from httpbin.org (e.g., check the status code) and log the outcome of the upload operation.
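
The imputation rules in step 2 can be illustrated on a small in-memory sample. This is a minimal sketch, not the required script: the helper names is_number and mean_impute and the sample records are hypothetical, and the field names are borrowed from the /cores examples above purely for illustration. It assumes booleans should not count as numerical, since bool is a subclass of int in Python.

```python
def is_number(value):
    """True for int/float values, but not bool (bool is a subclass of int)."""
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def mean_impute(records):
    """Return copies of the records with None replaced by the per-field mean."""
    # Pass 1: collect the valid numbers seen for each field.
    values = {}
    for record in records:
        for key, value in record.items():
            if is_number(value):
                values.setdefault(key, []).append(value)
    means = {key: sum(nums) / len(nums) for key, nums in values.items()}

    # Pass 2: fill None only where a mean exists for that field.
    return [
        {k: means[k] if v is None and k in means else v for k, v in record.items()}
        for record in records
    ]


# Hypothetical sample, not real API output.
sample = [
    {"core_serial": "A", "reuse_count": 2, "rtls_landings": None, "water_landing": False},
    {"core_serial": "B", "reuse_count": None, "rtls_landings": None, "water_landing": True},
    {"core_serial": "C", "reuse_count": 4, "rtls_landings": None, "water_landing": None},
]

result = mean_impute(sample)
print(result[1]["reuse_count"])    # 3.0 (mean of 2 and 4)
print(result[0]["rtls_landings"])  # None (no valid values, so left as None)
```

The field with no valid numbers keeps its None values, which matches the edge-case rule above; the real script would additionally log a warning for such a field.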


In [0]:
# Databricks notebook source
import requests
import json
import logging
import time

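# Configure logging. basicConfig is a no-op when the root logger already has
# handlers, which a notebook environment may have attached; force=True
# (Python 3.8+) replaces them so INFO messages are displayed.
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s', force=True)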

# Control Parameter: API Endpoint URL
API_ENDPOINT_URL = "https://api.spacexdata.com/v3/cores"
# Alternative endpoint: API_ENDPOINT_URL = "https://api.spacexdata.com/v3/launches"

def fetch_data(api_url):
    """
    Retrieves data from the specified API endpoint.

    Args:
        api_url (str): The URL of the API endpoint.

    Returns:
        list: A list of dictionaries containing the retrieved data, or None if an error occurs.
    """
    logging.info(f"Starting data retrieval from: {api_url}")
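    try:
        response = requests.get(api_url, timeout=10)
        response.raise_for_status()  # Raise HTTPError for bad responses (4xx or 5xx)
        data = response.json()
    except requests.exceptions.RequestException as e:
        logging.error(f"Error during API call to {api_url}: {e}")
        return None
    except ValueError as e:
        # Older versions of requests raise a plain ValueError for a non-JSON body
        logging.error(f"Response from {api_url} was not valid JSON: {e}")
        return None

    if not isinstance(data, list):
        logging.error(f"Expected a list of records from {api_url}, got {type(data).__name__}.")
        return None

    logging.info(f"Successfully retrieved {len(data)} records.")
    return data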

def impute_missing_values(data):
    """
    Performs mean imputation on missing numerical values in a list of dictionaries.

    Args:
        data (list): A list of dictionaries.

    Returns:
        list: The list of dictionaries with missing numerical values imputed.
    """
    if not data:
        logging.warning("No data provided for imputation.")
        return []

    logging.info("Starting imputation process.")

    # Identify numerical fields by scanning every record, so a field whose
    # first few values are None is still detected. bool is a subclass of int
    # in Python, so booleans are excluded explicitly.
    numerical_fields = set()
    for record in data:
        for key, value in record.items():
            if isinstance(value, (int, float)) and not isinstance(value, bool):
                numerical_fields.add(key)

    if not numerical_fields:
        logging.info("No numerical fields found in the data.")
        return data

    logging.info(f"Identified potential numerical fields for imputation: {sorted(numerical_fields)}")

    # Calculate mean for each numerical field, using only real numbers
    # (None is skipped, and bool is excluded because it is a subclass of int)
    field_means = {}
    for field in numerical_fields:
        numerical_values = [
            record[field] for record in data
            if isinstance(record.get(field), (int, float)) and not isinstance(record.get(field), bool)
        ]
        if numerical_values:
            field_means[field] = sum(numerical_values) / len(numerical_values)
            logging.info(f"Calculated mean for field '{field}': {field_means[field]:.2f}")
        else:
            field_means[field] = None
            logging.warning(f"No valid numerical values found for field '{field}'. Missing values will remain as None.")

    # Impute missing values, counting replacements per field as we go
    imputed_data = []
    imputed_counts = {field: 0 for field in field_means}
    for record in data:
        new_record = record.copy()  # shallow copy so the input records are not modified
        for field, mean_value in field_means.items():
            if mean_value is not None and field in new_record and new_record[field] is None:
                new_record[field] = mean_value
                imputed_counts[field] += 1
        imputed_data.append(new_record)

    # Log the number of imputed values per field (less verbose than logging each imputation)
    for field, count in imputed_counts.items():
        if count > 0:
            logging.info(f"Imputed {count} missing values for field '{field}'.")

    logging.info("Imputation process complete.")
    return imputed_data

def upload_data(data):
    """
    Uploads the given data as JSON to https://httpbin.org/post.

    Args:
        data (list): The list of dictionaries to upload.
    """
    logging.info("Starting upload to https://httpbin.org/post")
    try:
        response = requests.post("https://httpbin.org/post", json=data, timeout=10)
        response.raise_for_status()
        logging.info(f"Upload to https://httpbin.org/post successful. Status code: {response.status_code}")
        # Optionally log the response content for debugging:
        # logging.debug(f"Response from httpbin.org: {response.json()}")
    except requests.exceptions.RequestException as e:
        logging.error(f"Error during upload to https://httpbin.org/post: {e}")

if __name__ == "__main__":
    total_start_time = time.time()

    # Data Retrieval
    start_time = time.time()
    raw_data = fetch_data(API_ENDPOINT_URL)
    end_time = time.time()
    if raw_data:
        logging.info(f"Data retrieval took {end_time - start_time:.2f} seconds.")

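        # Missing Value Imputation
        imputation_start_time = time.time()
        imputed_data = impute_missing_values(raw_data)
        imputation_end_time = time.time()
        logging.info(f"Missing value imputation took {imputation_end_time - imputation_start_time:.2f} seconds.")

        # Code execution time: data retrieval + imputation combined (task item 4)
        logging.info(f"Code execution time (retrieval + imputation): {imputation_end_time - start_time:.2f} seconds.")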

        # Upload Result
        upload_data(imputed_data)
    else:
        logging.warning("No data retrieved, skipping imputation and upload.")

    total_end_time = time.time()
    logging.info(f"Total script execution time: {total_end_time - total_start_time:.2f} seconds.")
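
The serialization step of the upload can be checked offline before anything is posted. The following is a minimal sketch under stated assumptions: imputed_sample is a hypothetical stand-in for the script's imputed output, and json.dumps is called explicitly here only to make visible the step that requests performs when given the json= argument.

```python
import json
import time

# Hypothetical imputed records standing in for the script's output.
imputed_sample = [
    {"core_serial": "A", "reuse_count": 2},
    {"core_serial": "B", "reuse_count": 3.0},
]

start = time.perf_counter()
payload = json.dumps(imputed_sample)  # serialize the list of dictionaries to a JSON string
round_tripped = json.loads(payload)   # parse it back to confirm nothing was lost
elapsed = time.perf_counter() - start

print(f"Payload size: {len(payload.encode('utf-8'))} bytes")
print(f"Round trip preserved data: {round_tripped == imputed_sample}")
print(f"Serialization took {elapsed:.6f} seconds")
```

A similar comparison could in principle be made against the body that httpbin.org echoes back in its response, though that requires network access and is not shown here.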