
# Data Standardization and Flattening (JSON, XML, and XLSX)

This notebook standardizes and flattens JSON, XML, and XLSX files. It reads raw data from the landing zone, validates it against a schema where one applies, flattens nested structures down to the level set by `depth_level`, and writes the result in Delta format (Parquet data files plus a transaction log).

## Key Features:
- **Multi-format Support:**
  - Reads JSON, XML, and XLSX files from the landing zone.
  - Handles nested structures and various data types and formats.

- **Schema Validation:**
  - Applies predefined schemas for data validation:
    - JSON: Schema files must be available at `landing/schemachecks/[datasetidentifier]/[datasetidentifier]_schema.json`.
    - XML: Validates against XSD (XML Schema Definition) files.
    - XLSX: Schema validation is not applicable.

- **Data Flattening:**
  - Supports flattening of nested structures in JSON and XML using `depth_level` for controlling the hierarchy level to flatten.
  - Processes XLSX files into a structured, normalized format.

- **Efficient Data Storage:**
  - Saves the processed data as Delta Parquet files for efficient storage and querying.

This notebook provides a flexible and robust framework for standardizing and preparing data for downstream analytics across multiple file formats.
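
To make the role of `depth_level` concrete, here is a minimal pure-Python sketch of depth-limited flattening. It is not the library's implementation: the function name `flatten_record`, the underscore separator, and the rule that an unset depth means "flatten fully" are assumptions chosen to match column names such as `flows_location_code` used later in this notebook. Arrays are left untouched here, whereas a Spark-based flattener would typically explode them.

```python
from typing import Any, Dict, Optional


def flatten_record(
    record: Dict[str, Any],
    depth_level: Optional[int] = None,
    separator: str = "_",
) -> Dict[str, Any]:
    """Flatten nested dicts, joining key names with `separator`.

    Assumed semantics: depth_level=None flattens fully; depth_level=N expands
    at most N levels of nesting and leaves deeper structures as they are.
    """
    flat: Dict[str, Any] = {}

    def _walk(value: Any, prefix: str, depth: int) -> None:
        can_descend = depth_level is None or depth < depth_level
        if isinstance(value, dict) and value and can_descend:
            for key, child in value.items():
                _walk(child, f"{prefix}{separator}{key}", depth + 1)
        else:
            flat[prefix] = value

    for key, value in record.items():
        _walk(value, key, 0)
    return flat


# Hypothetical sample record, for illustration only.
sample = {"Guid": "abc", "flows": {"location": {"code": "DK1"}, "direction": "in"}}

print(flatten_record(sample))
print(flatten_record(sample, depth_level=1))
```

With `depth_level=1` only the first level under `flows` is expanded, so `flows_location` stays a nested value while `flows_direction` becomes a plain column.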

---

## ADF Pipeline Integration
- The notebook is designed to work seamlessly with Azure Data Factory (ADF) pipelines.
- Parameters such as `SourceDatasetidentifier`, `SourceStorageAccount`, and other configurations are dynamically passed from the ADF pipeline at runtime.
- External parameters passed by the pipeline automatically override the default or test configurations defined in the notebook.
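
The override behaviour described above can be pictured as a dictionary merge in which pipeline values win over notebook defaults. The sketch below is illustrative only: `resolve_parameters` is a hypothetical helper, not part of the library, and it mirrors the rule used in `initialize_widgets` further down, where external keys that the dataset does not define are ignored.

```python
from typing import Dict, Optional


def resolve_parameters(
    defaults: Dict[str, str],
    external: Optional[Dict[str, str]] = None,
) -> Dict[str, str]:
    """Return the defaults, with values replaced by external ones where keys match."""
    resolved = dict(defaults)  # copy so the defaults stay untouched
    for key, value in (external or {}).items():
        if key in resolved:  # unknown external keys are ignored
            resolved[key] = value
    return resolved


# Default values taken from the test configuration in this notebook.
defaults = {"SourceStorageAccount": "dplandingstoragetest", "SourceContainer": "landing"}

# Hypothetical values a pipeline might pass at runtime.
external = {"SourceStorageAccount": "dplandingstorageprod", "UnknownKey": "ignored"}

print(resolve_parameters(defaults, external))
```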

---

## Function Definitions
- Most of the functions used in this notebook are part of a reusable library hosted on GitHub: [Open-Dataplatform/utils-databricks](https://github.com/Open-Dataplatform/utils-databricks).
- Detailed descriptions of these functions, including their purpose and usage, can be found in the GitHub repository.
- This allows you to reuse and extend the existing functionalities in your own workflows efficiently.

---

## Usage Guide
1. **Clone the Notebook:**
   - Clone this notebook and use it as a template for your data processing workflows.

2. **Customize the Code:**
   - Remove unnecessary code or datasets that are not relevant to your use case.
   - Add custom logic and transformations based on your specific requirements.

3. **Test and Validate:**
   - Use the predefined dataset configurations (`triton__flow_plans`, `cpx_so__nomination`, etc.) as a starting point for testing and validating your notebook logic.

4. **Run in Production:**
   - Integrate this notebook into an ADF pipeline and pass the required parameters dynamically for production workflows.

This notebook serves as a reusable and customizable framework for handling multi-format data standardization and flattening tasks.
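
When customizing the notebook for a new dataset, it can help to check that the new configuration entry carries the same keys as the existing examples for its file type. The following sketch assumes the key sets seen in the example configurations of this notebook; that grouping is inferred from those examples and is not enforced by the library. `missing_keys` and the dataset name `my_source__my_dataset` are hypothetical.

```python
from typing import Dict, Set

# Keys used by the example configurations in this notebook, per file type.
# Inferred from those examples only; the library may accept other combinations.
_SCHEMA_KEYS = {"FileType", "SourceFileName", "KeyColumns", "FeedbackColumn",
                "DepthLevel", "SchemaFolderName"}
EXPECTED_KEYS: Dict[str, Set[str]] = {
    "json": _SCHEMA_KEYS,
    "xml": _SCHEMA_KEYS | {"XmlRootName"},
    "xlsx": {"FileType", "SourceFileName", "KeyColumns", "SheetName"},
}


def missing_keys(entry: Dict[str, str]) -> Set[str]:
    """Return the expected keys that are absent from a dataset configuration entry."""
    file_type = entry.get("FileType", "").lower()
    if file_type not in EXPECTED_KEYS:
        raise ValueError(f"Unsupported FileType: {file_type!r}")
    return EXPECTED_KEYS[file_type] - set(entry)


# Hypothetical entry for a new XML dataset that forgot to set XmlRootName.
new_entry = {
    "FileType": "xml",
    "SourceFileName": "my_source__my_dataset-*",
    "KeyColumns": "mRID",
    "FeedbackColumn": "input_file_name",
    "DepthLevel": "",
    "SchemaFolderName": "schemachecks",
}

print(missing_keys(new_entry))
```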

## Setup

### Package Installation and Management

In [None]:
# ==============================================================
# Setup: Package Installation and Management
# ==============================================================

# Purpose:
# Manage and install essential Python packages for the Databricks project.
# Ensures compatibility by specifying exact package versions where necessary.
# Includes support for utilities, data processing, and XML/XLSX handling.

# Step 1: Optional - Remove an existing version of the custom utility package
# Uncomment the line below if a previous version of the utility needs to be removed.
# %pip uninstall databricks-custom-utils -y

# Step 2: Install required packages
# The command below installs:
# - Custom Databricks utilities (specific version from GitHub repository).
# - Libraries for SQL parsing, Excel file handling, XML processing, and syntax highlighting.
%pip install \
    git+https://github.com/Open-Dataplatform/utils-databricks.git@v0.7.2 \
    sqlparse \
    openpyxl \
    lxml \
    xmlschema \
    pygments

# Package details:
# - `utils-databricks`: Custom utilities for extended functionality in Databricks.
# - `sqlparse`: SQL query parsing and formatting library.
# - `openpyxl`: Library for handling Excel (XLSX) files.
# - `lxml`: Robust library for processing XML and HTML files.
# - `xmlschema`: Tools for XML schema validation and conversion.
# - `pygments`: Syntax highlighting for code snippets in logs or reports.

### Initialize Logger

In [None]:
# ==============================================================
# Initialize Logger
# ==============================================================

# Purpose:
# Set up a custom logger for detailed logging and debugging throughout the notebook.
# The logger offers advanced features, including:
# - Debug-level logging for in-depth insights during execution.
# - Block-style logging for structured, readable logs.
# - Syntax highlighting for SQL queries and Python code in logs.

# Step 1: Import the Logger class from the custom utilities package
from custom_utils.logging.logger import Logger

# Step 2: Initialize the Logger instance
# - `debug=True` enables detailed logs, useful for troubleshooting and analysis.
logger = Logger(debug=True)

# Log the initialization success
logger.log_message("Logger initialized successfully.")

## Widget Initialization and Test Configuration
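Widget values are always strings, so values such as `KeyColumns` (a comma-separated list) and `DepthLevel` (a number or an empty string) need to be parsed before use. The sketch below shows one way to do that. It is an assumption about how such values could be interpreted, not a description of what the `Config` class does internally; both helper names are hypothetical, and treating an empty `DepthLevel` as "no limit" is a guess.

```python
from typing import List, Optional


def parse_key_columns(raw: str) -> List[str]:
    """Split a comma-separated widget value into a clean list of column names."""
    return [part.strip() for part in raw.split(",") if part.strip()]


def parse_depth_level(raw: str) -> Optional[int]:
    """Convert a widget value to an int; an empty value is treated as 'no limit'."""
    raw = raw.strip()
    return int(raw) if raw else None


print(parse_key_columns("TimeSeries_mRID, TimeSeries_Period_timeInterval_start"))
print(parse_depth_level("1"))
print(parse_depth_level(""))
```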

### Clear all existing widgets

In [None]:
# ==============================================================
# Widget Initialization and Test Configuration: Clear All Widgets
# ==============================================================

# Purpose:
# Ensure a clean slate for widget initialization by removing all existing widgets.
# This step prevents duplication or conflicts with previously defined widgets.

# Step 1: Remove all existing widgets
try:
    dbutils.widgets.removeAll()
    logger.log_message("All existing widgets removed successfully.")
except Exception as e:
    logger.log_error(f"Error during widget cleanup: {str(e)}")
    raise RuntimeError(f"Failed to clear widgets: {str(e)}") from e

In [None]:
# ============================================================== 
# Widget Keys for Dynamic Initialization
# ==============================================================

# Global list of all possible widget keys (except "SourceDatasetidentifier")
widget_keys = [
    "FileType",
    "SourceStorageAccount",
    "DestinationStorageAccount",
    "SourceContainer",
    "SourceFileName",
    "KeyColumns",
    "FeedbackColumn",
    "DepthLevel",
    "SchemaFolderName",
    "XmlRootName",
    "SheetName"
]

# ============================================================== 
# Common Widgets
# ==============================================================

def initialize_common_widgets():
    """
    Initializes common widgets that are the same across all datasets.
    """
    try:
        dbutils.widgets.text("SourceStorageAccount", "dplandingstoragetest", "Source Storage Account")
        dbutils.widgets.text("DestinationStorageAccount", "dpuniformstoragetest", "Destination Storage Account")
        dbutils.widgets.text("SourceContainer", "landing", "Source Container")
        logger.log_message("Common widgets initialized successfully.")
    except Exception as e:
        logger.log_error(f"Error initializing common widgets: {str(e)}")
        raise RuntimeError(f"Failed to initialize common widgets: {str(e)}") from e

# ============================================================== 
# Clear Widgets Except Common Ones
# ==============================================================

def clear_widgets_except_common():
    """
    Clears all widgets except the common widgets like SourceDatasetidentifier,
    SourceStorageAccount, DestinationStorageAccount, and SourceContainer.
    """
    common_keys = ["SourceDatasetidentifier", "SourceStorageAccount", "DestinationStorageAccount", "SourceContainer"]
    try:
        for key in widget_keys:
            if key not in common_keys:
                try:
                    dbutils.widgets.remove(key)
                except Exception:
                    # Ignore errors if widget doesn't exist
                    continue
        logger.log_message("Cleared all dataset-specific widgets.")
    except Exception as e:
        logger.log_error(f"Error clearing widgets: {str(e)}")
        raise RuntimeError(f"Failed to clear widgets: {str(e)}") from e

# ============================================================== 
# Initialize Widgets Based on Selected Dataset
# ==============================================================

def initialize_widgets(selected_dataset, external_params=None):
    """
    Dynamically initializes and updates widget values based on the selected dataset.

    Args:
        selected_dataset (str): The selected dataset identifier.
        external_params (dict, optional): External parameters to override widget values.
    """
    try:
        # Step 1: Clear dataset-specific widgets except common ones
        clear_widgets_except_common()

        # Step 2: Define dataset-specific widget configurations
        dataset_config = {
            "triton__flow_plans": {
                "FileType": "json",
                "SourceFileName": "triton__flow_plans-202408*",
                "KeyColumns": "Guid",
                "FeedbackColumn": "EventTimestamp",
                "DepthLevel": "1",
                "SchemaFolderName": "schemachecks"
            },
            "cpx_so__nomination": {
                "FileType": "json",
                "SourceFileName": "cpx_so__nomination-20241127T21*",
                "KeyColumns": "flows_accountInternal_code, flows_accountExternal_code, flows_location_code, flows_direction, flows_periods_validityPeriod_begin, flows_periods_validityPeriod_end",
                "FeedbackColumn": "dateCreated",
                "DepthLevel": "",
                "SchemaFolderName": "schemachecks"
            },
            "ddp_em__dayahead_flows_nemo": {
                "FileType": "xml",
                "SourceFileName": "ddp_em__dayahead_flows_nemo-202405*",
                "KeyColumns": "TimeSeries_mRID, TimeSeries_Period_timeInterval_start, TimeSeries_Period_Point_position",
                "FeedbackColumn": "timeseries_timestamp",
                "DepthLevel": "1",
                "SchemaFolderName": "schemachecks",
                "XmlRootName": "Schedule_MarketDocument"
            },
            "ddp_cm__mfrr_settlement": {
                "FileType": "xml",
                "SourceFileName": "*",
                "KeyColumns": "mRID, TimeSeries_mRID, TimeSeries_Period_timeInterval_start, TimeSeries_Period_Point_position, TimeSeries_Period_resolution",
                "FeedbackColumn": "input_file_name",
                "DepthLevel": "",
                "SchemaFolderName": "schemachecks",
                "XmlRootName": "ReserveAllocationResult_MarketDocument"
            },
            "pluto_pc__units_scadamw": {
                "FileType": "xlsx",
                "SourceFileName": "UnitsSCADAMW.xlsx",
                "KeyColumns": "Unit_GSRN",
                "SheetName": "Sheet"
            }
        }

        # Step 3: Validate selected dataset and get its configuration
        if selected_dataset not in dataset_config:
            raise ValueError(f"Unknown dataset identifier: {selected_dataset}")
        dataset_widgets = dataset_config[selected_dataset]

        # Step 4: Create widgets dynamically based on dataset configuration
        for key, value in dataset_widgets.items():
            dbutils.widgets.text(key, value, key)

        # Step 5: Apply external parameters if provided
        if external_params:
            for key, value in external_params.items():
                if key in dataset_widgets:
                    dbutils.widgets.text(key, value, key)

        logger.log_message(f"Widgets initialized and updated for dataset: {selected_dataset}")

    except Exception as e:
        logger.log_error(f"Error initializing widgets for dataset {selected_dataset}: {str(e)}")
        raise RuntimeError(f"Failed to initialize widgets for dataset {selected_dataset}: {str(e)}") from e


# ============================================================== 
# Main Execution: Initialize SourceDatasetidentifier Dropdown
# ==============================================================

try:
    # Step 1: Initialize common widgets
    initialize_common_widgets()

    # Step 2: Ensure SourceDatasetidentifier exists
    try:
        dbutils.widgets.get("SourceDatasetidentifier")
    except Exception:
        # Initialize SourceDatasetidentifier dropdown if it doesn't exist
        dbutils.widgets.dropdown(
            "SourceDatasetidentifier",
            "triton__flow_plans",  # Default value
            [
                "triton__flow_plans",
                "cpx_so__nomination",
                "ddp_em__dayahead_flows_nemo",
                "pluto_pc__units_scadamw",
                "ddp_cm__mfrr_settlement"
            ],
            "Select Dataset Identifier"
        )
        logger.log_message("SourceDatasetidentifier dropdown initialized successfully.")

    # Step 3: Get the current value of SourceDatasetidentifier
    selected_dataset = dbutils.widgets.get("SourceDatasetidentifier")

    # Step 4: Initialize widgets for the selected dataset
    initialize_widgets(selected_dataset)

except Exception as e:
    logger.log_error(f"Error during widget initialization: {str(e)}")
    raise RuntimeError(f"Widget initialization failed: {str(e)}") from e

### Initialize Notebook and Retrieve Parameters

In [None]:
# ==============================================================
# Initialize Notebook and Retrieve Parameters
# ==============================================================

# Purpose:
# Set up the notebook by initializing its configuration and retrieving essential parameters.
# This ensures centralized management of settings and enables efficient debugging
# through a consistent configuration framework.

# Step 1: Import the Config class from the custom utilities package
from custom_utils.config.config import Config

# Step 2: Initialize the Config object
# - Pass `dbutils` for accessing Databricks workspace resources.
# - Set `debug=False` to disable verbose debug logs for cleaner execution.
config = Config.initialize(dbutils=dbutils, debug=False)

# Step 3: Unpack configuration parameters
# - Extracts configuration values into the notebook's global scope.
# - This simplifies access to parameters by making them available as standard variables.
config.unpack(globals())

### Verify paths and files
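The `SourceFileName` values configured above use `*` wildcards (for example `triton__flow_plans-202408*`). As a rough illustration of what such a pattern selects, the sketch below filters a list of file names with the standard library's `fnmatch` module. How the `Validator` actually matches files is not shown in this notebook, so treat this as an assumption; the file names and the helper `match_source_files` are hypothetical.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List


def match_source_files(file_names: Iterable[str], pattern: str) -> List[str]:
    """Return the file names that match a shell-style wildcard pattern, sorted."""
    return sorted(name for name in file_names if fnmatchcase(name, pattern))


# Hypothetical directory listing, for illustration only.
landing_files = [
    "triton__flow_plans-20240801T000000.json",
    "triton__flow_plans-20240715T000000.json",
    "cpx_so__nomination-20241127T210000.json",
]

print(match_source_files(landing_files, "triton__flow_plans-202408*"))
```

An empty result from a check like this is the kind of condition that should stop the notebook early, before any processing starts.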

In [None]:
# ==============================================================
# Verify Paths and Files
# ==============================================================

# Purpose:
# Validate the required paths and files to ensure all necessary resources 
# are available for processing. This pre-check prevents runtime errors 
# by identifying and addressing issues early in the notebook execution.

# Step 1: Import the Validator class from the custom utilities package
from custom_utils.validation.validation import Validator

# Step 2: Initialize the Validator
# - Pass `config` to access path and file parameters from the configuration.
# - Set `debug=False` for standard validation logging without verbose output.
validator = Validator(config=config, debug=False)

# Step 3: Unpack validation parameters
# - Extracts validation-related parameters into the notebook's global scope.
validator.unpack(globals())

# Step 4: Perform validation and check for an exit flag
# - If critical validation fails, the notebook execution is terminated.
validator.check_and_exit()


### Exit the Notebook if Validation Fails

In [None]:
# ==============================================================
# Exit the Notebook if Validation Fails
# ==============================================================

# Purpose:
# Stop notebook execution gracefully if critical validation checks fail.
# If validation passes, continue processing with a confirmation message.

# Step 1: Check for an exit condition flagged by the Validator
if Validator.exit_notebook:
    # Step 2: Log the exit message using the logger
    # - Provides context on why the notebook execution is being terminated.
    logger.log_error(Validator.exit_notebook_message, level="error")
    
    # Step 3: Exit the notebook with a descriptive message
    # - Uses Databricks utilities to terminate execution cleanly.
    dbutils.notebook.exit(f"Notebook exited: {Validator.exit_notebook_message}")
else:
    # Step 4: Log a success message if validation passed
    # - Confirms the notebook will continue execution.
    logger.log_message("Validation passed. The notebook is proceeding without exiting.", level="info")

## Processing Workflow

### Flattening and Processing

In [None]:
# ==============================================================
# Processing Workflow - Flattening and Processing
# ==============================================================

# Purpose:
# This section executes the core data processing workflow, which includes:
# - Flattening complex hierarchical data for simplified querying and analysis.
# - Applying optional dataset-specific transformations to align with business requirements.

from pyspark.sql.functions import col
from custom_utils.transformations.dataframe import DataFrameTransformer

# Initialize the DataFrameTransformer
# - Uses the current configuration and disables debug mode for standard operation.
transformer = DataFrameTransformer(config=config, debug=False)

try:
    # Step 1: Process and flatten the data
    # - Produces both the initial DataFrame and its flattened version.
    # - depth_level controls the level of flattening for nested structures.
    df_initial, df_flattened = transformer.process_and_flatten_data(depth_level=depth_level)

    # Step 2: Apply dataset-specific transformations (if applicable)
    # The following examples demonstrate how to implement renaming and casting
    # for specific datasets. These transformations are optional and can be removed
    # or customized based on dataset requirements.

    # Example: Triton flow plans
    # - Casts the "Timestamp" field to the timestamp data type for consistency.
    # - Then renames the column "Timestamp" to "EventTimestamp".
    if config.source_datasetidentifier == "triton__flow_plans":
        df_flattened = (
            df_flattened
            .withColumn("Timestamp", col("Timestamp").cast("timestamp"))
            .withColumnRenamed("Timestamp", "EventTimestamp")
        )
        logger.log_message("Applied transformations for 'triton__flow_plans'.", level="info")

    # Example: CPX SO nomination
    # - Casts specific fields to timestamp for consistent temporal representation.
    # - This is necessary for fields like validity periods and timestamps.
    if config.source_datasetidentifier == "cpx_so__nomination":
        df_flattened = (
            df_flattened
            .withColumn("dateCreated", col("dateCreated").cast("timestamp"))
            .withColumn("validityPeriod_begin", col("validityPeriod_begin").cast("timestamp"))
            .withColumn("validityPeriod_end", col("validityPeriod_end").cast("timestamp"))
            .withColumn("flows_periods_validityPeriod_begin", col("flows_periods_validityPeriod_begin").cast("timestamp"))
            .withColumn("flows_periods_validityPeriod_end", col("flows_periods_validityPeriod_end").cast("timestamp"))
        )
        logger.log_message("Applied transformations for 'cpx_so__nomination'.", level="info")

    # Step 3: Display the initial and flattened DataFrames for user verification
    # - Provides a visual check for the raw and processed data.
    logger.log_block("Displaying the initial and flattened DataFrames.", level="info")
    logger.log_message("Initial DataFrame:", level="info")
    display(df_initial)

    logger.log_message("Flattened DataFrame:", level="info")
    display(df_flattened)

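except Exception as e:
    # Step 4: Handle errors
    # - Logs the error details and re-raises so the run is reported as failed.
    # - dbutils.notebook.exit() is avoided here because it ends the run with a
    #   success status, which would hide the failure from the calling pipeline.
    logger.log_error(f"Error during processing: {str(e)}")
    raise RuntimeError(f"Processing failed: {str(e)}") from e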

## Quality Check

### Perform Quality Check and Remove Duplicates
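Duplicate removal by key columns, ordered by a feedback column, usually means keeping one row per key: the one with the latest feedback value. The pure-Python sketch below illustrates that idea on a few hypothetical rows. It is not the `DataQualityManager` implementation, whose exact rules may differ; in Spark the same effect is commonly achieved with `row_number()` over a window partitioned by the key columns.

```python
from typing import Any, Dict, Iterable, List, Sequence, Tuple

Row = Dict[str, Any]


def keep_latest(rows: Iterable[Row], key_columns: Sequence[str], order_column: str) -> List[Row]:
    """Keep, for each key, the row with the greatest value in `order_column`."""
    latest: Dict[Tuple[Any, ...], Row] = {}
    for row in rows:
        key = tuple(row[column] for column in key_columns)
        current = latest.get(key)
        if current is None or row[order_column] > current[order_column]:
            latest[key] = row
    return list(latest.values())


# Hypothetical rows; ISO 8601 strings compare correctly as plain strings.
rows = [
    {"Guid": "a", "EventTimestamp": "2024-08-01T10:00:00"},
    {"Guid": "a", "EventTimestamp": "2024-08-01T12:00:00"},
    {"Guid": "b", "EventTimestamp": "2024-08-02T09:00:00"},
]

print(keep_latest(rows, key_columns=["Guid"], order_column="EventTimestamp"))
```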

In [None]:
# ==============================================================
# Quality Check - Perform Quality Check and Remove Duplicates
# ==============================================================

# Purpose:
# This section performs data quality checks to ensure:
# - The integrity, accuracy, and consistency of the processed data.
# - Duplicate records are identified and optionally removed.
# - Additional quality checks (e.g., null value checks, value range checks) are executed.

from custom_utils.quality.quality import DataQualityManager

# Step 1: Initialize the DataQualityManager
# - This class manages all quality check operations and logs relevant information.
quality_manager = DataQualityManager(logger=logger, debug=True)

# Step 2: Log available quality checks
# - Provides an overview of checks supported by the quality manager for user reference.
quality_manager.describe_available_checks()

# Step 3: Execute data quality checks on the flattened DataFrame
try:
    # Perform quality checks with the following configurations:
    cleaned_data_view = quality_manager.perform_data_quality_checks(
        spark=spark,  # Required: Spark session.
        df=df_flattened,  # Required: DataFrame to perform quality checks on.
        
        # Key columns for partitioning and duplicate checking.
        # - Required parameter to identify unique records in the dataset.
        key_columns=key_columns,
        
        # Optional: Columns for ordering within partitions (e.g., to select the latest record).
        # - Defaults to `key_columns` if not provided.
        order_by=feedback_column,
        
        # Optional: Column to use for duplicate removal ordering.
        # - If not provided, falls back to `key_columns`.
        feedback_column=feedback_column,
        
        # Optional: Column for referential integrity check against a reference DataFrame.
        # - Ensures that foreign key relationships are maintained.
        join_column=key_columns,
        
        # Optional: Exclude specified columns from the final DataFrame.
        # - For example, `input_file_name` is excluded to avoid irrelevant data in output.
        columns_to_exclude=["input_file_name"],
        
        # Optional: Specify whether to use Python or SQL syntax for quality checks.
        # - Default is SQL-based for optimized performance.
        use_python=False
    )

    # Description of Arguments:
    # - `spark`: Spark session used for distributed processing (required).
    # - `df`: The DataFrame on which quality checks are performed (required).
    # - `key_columns`: Columns used for identifying unique records (required).
    # - `order_by`: Columns for ordering within partitions (optional; defaults to `key_columns`).
    # - `feedback_column`: Column used for ordering duplicates (optional; falls back to `key_columns`).
    # - `join_column`: Column for referential integrity validation (optional).
    # - `columns_to_exclude`: List of columns to exclude from the final DataFrame (optional).
    # - `use_python`: Boolean flag to select Python-based or SQL-based operations (optional).

except Exception as e:
    # Handle any errors during the quality check process
    logger.log_error(f"Error during quality check: {str(e)}")
    raise RuntimeError(f"Quality check failed: {str(e)}") from e

## Unified Data Management

### Table Creation and Data Merging
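Merging on key columns typically translates into a Delta Lake `MERGE INTO` statement whose `ON` clause compares each key column between target and source. The sketch below only builds such a statement as a string, to show how `key_columns` could drive the match condition. It is an assumption about the general approach rather than the SQL that `DataStorageManager` generates; the table and view names are hypothetical, and this version covers updates and inserts only.

```python
from typing import Sequence


def build_merge_sql(target_table: str, source_view: str, key_columns: Sequence[str]) -> str:
    """Build a Delta Lake MERGE statement that upserts source rows into the target."""
    if not key_columns:
        raise ValueError("At least one key column is required for a merge.")
    on_clause = " AND ".join(f"t.`{column}` = s.`{column}`" for column in key_columns)
    return (
        f"MERGE INTO {target_table} AS t\n"
        f"USING {source_view} AS s\n"
        f"ON {on_clause}\n"
        "WHEN MATCHED THEN UPDATE SET *\n"
        "WHEN NOT MATCHED THEN INSERT *"
    )


# Hypothetical table and view names, for illustration only.
print(build_merge_sql("dev.my_dataset", "cleaned_data_view", ["Guid", "EventTimestamp"]))
```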

In [None]:
# ==============================================================
# Unified Data Management: Table Creation and Data Merging
# ==============================================================

# Purpose:
# This section handles the creation of destination tables and merges
# processed data into the respective storage location. It ensures:
# - Data is written to a unified storage with consistent formatting.
# - Merging supports updates, inserts, and deletions seamlessly.
# - Storage operations are managed efficiently with robust logging.

from custom_utils.catalog.catalog_utils import DataStorageManager

# Step 1: Initialize the DataStorageManager
# - Manages operations related to data storage and merging.
# - Includes detailed logging and debugging capabilities.
storage_manager = DataStorageManager(logger=logger, debug=True)

# Step 2: Perform the data storage operation
try:
    # Manage the data operation with the configuration below. Each value can be overridden if necessary.
    # - `cleaned_data_view` is returned by `perform_data_quality_checks` in the quality check section above.
    # - The remaining variables (`key_columns`, etc.) are created by the call to
    #   `config.unpack(globals())` in the section "Initialize Notebook and Retrieve Parameters".
    storage_manager.manage_data_operation(
        spark=spark,  # Required: Spark session for executing SQL or DataFrame operations.
        dbutils=dbutils,  # Required: Databricks utilities for interacting with storage.

        # Name of the cleaned data view containing the processed DataFrame.
        # - Required parameter that holds the cleaned and transformed data.
        cleaned_data_view=cleaned_data_view,

        # Key columns used for matching records during the merge operation.
        # - Required parameter to ensure data consistency during updates.
        key_columns=key_columns,

        # Destination folder path for storing data as Delta files.
        # - Optional: If not provided, a default path defined in the configuration is used.
        destination_folder_path=destination_data_folder_path,

        # Target database or environment for storing the data.
        # - Optional: Overrides the default destination environment if provided.
        destination_environment=destination_environment,

        # Target table name or identifier for the source dataset.
        # - Optional: Allows dynamic specification of the target table for merging.
        source_datasetidentifier=source_datasetidentifier,

        # Boolean flag to select SQL-based (default) or Python DataFrame-based operations.
        # - Optional: Default is `False` to prioritize SQL for performance.
        use_python=False
    )

    # Description of Arguments:
    # - `spark`: Active Spark session (required).
    # - `dbutils`: Databricks utilities object for workspace interaction (required).
    # - `cleaned_data_view`: Name of the view containing cleaned data (required).
    # - `key_columns`: Columns used for identifying unique records during merge (required).
    # - `destination_folder_path`: Override for the destination folder (optional).
    # - `destination_environment`: Override for the target database/environment (optional).
    # - `source_datasetidentifier`: Override for the source dataset/table identifier (optional).
    # - `use_python`: Boolean flag for using Python or SQL operations (optional).

    # Log success
    logger.log_message("Data successfully written and merged into the destination table.")

except Exception as e:
    # Step 3: Handle errors during the data storage process
    # - Logs the error details and raises a RuntimeError to terminate execution.
    logger.log_error(f"Error during data storage operation: {str(e)}")
    raise RuntimeError(f"Data storage operation failed: {str(e)}") from e

## Finishing

### Return period (from_datetime, to_datetime) covered by data read
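The period covered by the data is the minimum and maximum of the feedback column. Because `dbutils.notebook.exit` accepts a string, a result like this is usually serialized before it is returned. The sketch below shows the idea in plain Python; the JSON shape with `from_datetime` and `to_datetime` keys follows the heading above but is otherwise an assumption, and `feedback_period` is a hypothetical helper, not the library's `generate_feedback_timestamps`.

```python
import json
from typing import Any, Dict, Iterable


def feedback_period(rows: Iterable[Dict[str, Any]], feedback_column: str) -> str:
    """Return a JSON string with the earliest and latest feedback value."""
    values = [row[feedback_column] for row in rows if row.get(feedback_column) is not None]
    if not values:
        raise ValueError(f"No values found in column {feedback_column!r}.")
    return json.dumps({"from_datetime": min(values), "to_datetime": max(values)})


# Hypothetical rows; ISO 8601 strings order correctly as plain strings.
rows = [
    {"Guid": "a", "EventTimestamp": "2024-08-01T12:00:00"},
    {"Guid": "b", "EventTimestamp": "2024-08-02T09:00:00"},
    {"Guid": "c", "EventTimestamp": None},
]

print(feedback_period(rows, "EventTimestamp"))
```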

In [None]:
# ==============================================================
# Finishing - Return Period Covered by Data Read
# ==============================================================

# Purpose:
# This section generates the feedback timestamps, providing the time 
# period covered by the processed and stored data. It calculates 
# `from_datetime` and `to_datetime` based on the data in the cleaned 
# data view.

# Description of the function:
# - `generate_feedback_timestamps`: A method to calculate and return 
#   feedback timestamps for the data processing period.
# - Uses either the `feedback_column` or the first column in `key_columns` 
#   to determine the timestamp range.
# 
# Parameters:
# - `spark (SparkSession)`: Required. Active Spark session.
# - `view_name (str)`: Required. The name of the cleaned data view.
# - `feedback_column (Optional[str])`: Optional. Column to use for calculating 
#   feedback timestamps. Defaults to None, in which case the first column in 
#   `key_columns` is used.
# - `key_columns (Optional[Union[str, List[str]]])`: Optional. Columns to group 
#   by when calculating the timestamp range.

# Step 1: Generate feedback timestamps
try:
    # Call the `generate_feedback_timestamps` function with the following:
    # - The active Spark session to access the view.
    # - The name of the cleaned data view containing processed data.
    # - The `feedback_column` for timestamp calculations (optional).
    # - The `key_columns` for grouping (optional).
    notebook_output = storage_manager.generate_feedback_timestamps(
        spark=spark,  # Active Spark session
        view_name=cleaned_data_view,  # The view containing cleaned and processed data
        feedback_column=feedback_column,  # Column used for identifying feedback periods
        key_columns=key_columns  # Key columns for grouping and extracting timestamp bounds
    )

    # Log the successful generation of feedback timestamps
    logger.log_message("Feedback timestamps successfully generated.", level="info")

except Exception as e:
    # Handle errors during feedback timestamp generation
    logger.log_error(f"Error generating feedback timestamps: {str(e)}")
    raise RuntimeError(f"Failed to generate feedback timestamps: {str(e)}") from e

## Exit the notebook

In [None]:
# ==============================================================
# Exit the Notebook
# ==============================================================

# Purpose:
# Finalize the notebook execution by exiting and returning the output.
# The output provides a summary of the period covered by the processed data,
# ensuring a clear handoff to any downstream workflows.

# Step 1: Exit the notebook with the generated output
try:
    # Log before exiting: dbutils.notebook.exit() ends execution immediately,
    # so a log statement placed after it would never run.
    logger.log_message(f"Exiting notebook with output: {notebook_output}", level="info")

    # Use dbutils to exit the notebook gracefully
    # - The `notebook_output` contains feedback timestamps or relevant results.
    dbutils.notebook.exit(notebook_output)

except Exception as e:
    # Handle errors during the exit process
    # - Logs the error and raises a RuntimeError to signal failure.
    logger.log_error(f"Error during notebook exit: {str(e)}")
    raise RuntimeError(f"Failed to exit the notebook: {str(e)}") from e