
# Data Standardization and Flattening (JSON, XML, and XLSX)
This notebook standardizes and flattens JSON, XML, and XLSX files. It reads the raw data from the landing zone, validates it against schemas where applicable, flattens nested structures using `depth_level`, runs data quality checks (including duplicate removal), and then writes the result as Delta (Parquet-based) files and merges it into the destination table.

## Key Features:
- **Multi-format Support:**
  - Reads JSON, XML, and XLSX files from the landing zone.
  - Handles nested structures (JSON and XML) and a variety of data types.

- **Schema Validation:**
  - Applies predefined schemas for data validation:
    - JSON: Schema files must be available at `landing/schemachecks/[datasetidentifier]/[datasetidentifier]_schema.json`, where `schemachecks` is the default value of the `SchemaFolderName` widget.
    - XML: Validates against XSD (XML Schema Definition) files.
    - XLSX: Schema validation is not applicable.

- **Data Flattening:**
  - Flattens nested JSON and XML structures, with `depth_level` controlling how deep into the hierarchy the flattening goes.
  - Reads XLSX worksheets (selected with the `SheetName` widget) into a tabular, normalized format.

- **Efficient Data Storage:**
  - Saves the processed data as Delta Parquet files for efficient storage and querying.

This notebook provides a flexible and robust framework for standardizing and preparing data for downstream analytics across multiple file formats.
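
The notebook delegates flattening to `DataFrameTransformer.process_and_flatten_data`, whose internals are not shown here. As a rough mental model only, the pure-Python sketch below flattens a nested JSON record by joining key paths with underscores (the naming pattern seen in key columns such as `flows_periods_validityPeriod_begin`) and stops after `depth_level` levels. How levels are counted, and treating `None` as "flatten everything", are assumptions rather than the library's documented behavior.

```python
import json
from typing import Any, Dict, Optional


def flatten(record: Dict[str, Any], depth_level: Optional[int] = None,
            sep: str = "_", _prefix: str = "", _depth: int = 0) -> Dict[str, Any]:
    """Flatten nested dicts into a single level of columns.

    Key paths are joined with `sep`. Nested dicts are expanded while fewer
    than `depth_level` levels have been flattened; anything deeper is kept
    as a JSON string. `depth_level=None` (an assumption standing in for an
    empty DepthLevel widget) flattens everything. Lists are left as-is.
    """
    flat: Dict[str, Any] = {}
    for key, value in record.items():
        name = f"{_prefix}{sep}{key}" if _prefix else key
        if isinstance(value, dict) and (depth_level is None or _depth < depth_level):
            # Recurse one level deeper, carrying the joined key path as prefix
            flat.update(flatten(value, depth_level, sep, name, _depth + 1))
        elif isinstance(value, dict):
            # Depth limit reached: keep the remaining structure as JSON text
            flat[name] = json.dumps(value)
        else:
            flat[name] = value
    return flat


record = {"Guid": "a1", "meta": {"source": "x", "period": {"begin": "2024", "end": "2025"}}}
print(flatten(record))
# {'Guid': 'a1', 'meta_source': 'x', 'meta_period_begin': '2024', 'meta_period_end': '2025'}
print(flatten(record, depth_level=1))
```

With `depth_level=1` the `period` object stays as a JSON string in a `meta_period` column, which keeps the column count bounded for deeply nested sources.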

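## Widget Initialization and Test Configuration

These test-configuration cells log through `logger`, which is created in the Setup section. Run the package installation and "Initialize Logger" cells first; otherwise the first `logger` call fails with a `NameError`. Each cell below removes all widgets and recreates them with the defaults for one example dataset, so run only the cell for the dataset you want to test.

### Clear All Existing Widgets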

In [None]:
# ==============================================================
# Widget Initialization and Test Configuration: Clear All Widgets
# ==============================================================

# Purpose:
# Ensure a clean slate for widget initialization by removing all existing widgets.
# This step prevents duplication or conflicts with previously defined widgets.

# Step 1: Remove all existing widgets
try:
    dbutils.widgets.removeAll()
    logger.log_message("All existing widgets removed successfully.")
except Exception as e:
    logger.log_error(f"Error during widget cleanup: {str(e)}")
    raise RuntimeError(f"Failed to clear widgets: {str(e)}")

### Triton Flow Plans (JSON Example)

In [None]:
# ==============================================================
# Widget Initialization and Test Configuration: Triton Flow Plans (JSON Example)
# ==============================================================

# Purpose:
# Set up widgets for testing the "triton__flow_plans" dataset using JSON files.
# The widgets enable dynamic configuration of key parameters such as file type,
# storage accounts, containers, dataset identifiers, and processing options.

# Step 1: Clear all existing widgets to ensure no duplication or conflict
try:
    dbutils.widgets.removeAll()
    logger.log_message("All existing widgets cleared successfully.")
except Exception as e:
    logger.log_error(f"Error clearing widgets: {str(e)}")
    raise RuntimeError(f"Failed to initialize widgets: {str(e)}")

# Step 2: Initialize File Type dropdown with supported options
dbutils.widgets.dropdown("FileType", "json", ["json", "xlsx", "xml"], "File Type")

# Step 3: Define Source and Destination Storage Accounts
dbutils.widgets.text("SourceStorageAccount", "dplandingstoragetest", "Source Storage Account")
dbutils.widgets.text("DestinationStorageAccount", "dpuniformstoragetest", "Destination Storage Account")

# Step 4: Configure Source Container and Dataset Identifier
dbutils.widgets.text("SourceContainer", "landing", "Source Container")
dbutils.widgets.text("SourceDatasetidentifier", "triton__flow_plans", "Source Datasetidentifier")

# Step 5: Specify Source File Name and Key Columns
dbutils.widgets.text("SourceFileName", "triton__flow_plans-202408*", "Source File Name")
dbutils.widgets.text("KeyColumns", "Guid", "Key Columns")

# Step 6: Set Feedback Column, Flattening Depth Level, and Schema Folder Name
dbutils.widgets.text("FeedbackColumn", "EventTimestamp", "Feedback Column")
dbutils.widgets.text("DepthLevel", "1", "Depth Level")
dbutils.widgets.text("SchemaFolderName", "schemachecks", "Schema Folder Name")

# Log a summary of widget initialization
logger.log_block("Widget Initialization Summary", [
    "FileType: json (default)",
    "SourceStorageAccount: dplandingstoragetest",
    "DestinationStorageAccount: dpuniformstoragetest",
    "SourceContainer: landing",
    "SourceDatasetidentifier: triton__flow_plans",
    "SourceFileName: triton__flow_plans-202408*",
    "KeyColumns: Guid",
    "FeedbackColumn: EventTimestamp",
    "DepthLevel: 1",
    "SchemaFolderName: schemachecks"
])
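
The `SourceFileName` value `triton__flow_plans-202408*` reads like a shell-style wildcard that selects every file from August 2024. How the Validator actually resolves the pattern is not shown in this notebook; assuming glob semantics, Python's `fnmatch` module illustrates which names it would pick up. The file listing below is invented for illustration.

```python
from fnmatch import fnmatchcase
from typing import List


def match_source_files(file_names: List[str], pattern: str) -> List[str]:
    """Return the names matching a shell-style pattern (case-sensitive, sorted)."""
    return sorted(name for name in file_names if fnmatchcase(name, pattern))


# Hypothetical landing-zone listing, for illustration only
listing = [
    "triton__flow_plans-20240801T0000.json",
    "triton__flow_plans-20240915T0000.json",
    "triton__flow_plans-20240831T2359.json",
]
print(match_source_files(listing, "triton__flow_plans-202408*"))
# ['triton__flow_plans-20240801T0000.json', 'triton__flow_plans-20240831T2359.json']
```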

### CPX SO Nomination (JSON Example)

In [None]:
# ==============================================================
# Widget Initialization and Test Configuration: CPX SO Nomination (JSON Example)
# ==============================================================

# Purpose:
# Set up widgets for testing the "cpx_so__nomination" dataset using JSON files.
# The widgets enable dynamic configuration of critical parameters such as file type,
# storage accounts, containers, dataset identifiers, and processing options.

# Step 1: Clear all existing widgets to ensure no duplication or conflict
try:
    dbutils.widgets.removeAll()
    logger.log_message("All existing widgets cleared successfully.")
except Exception as e:
    logger.log_error(f"Error clearing widgets: {str(e)}")
    raise RuntimeError(f"Failed to initialize widgets: {str(e)}")

# Step 2: Initialize File Type dropdown with supported options
dbutils.widgets.dropdown("FileType", "json", ["json", "xlsx", "xml"], "File Type")

# Step 3: Define Source and Destination Storage Accounts
dbutils.widgets.text("SourceStorageAccount", "dplandingstoragetest", "Source Storage Account")
dbutils.widgets.text("DestinationStorageAccount", "dpuniformstoragetest", "Destination Storage Account")

# Step 4: Configure Source Container and Dataset Identifier
dbutils.widgets.text("SourceContainer", "landing", "Source Container")
dbutils.widgets.text("SourceDatasetidentifier", "cpx_so__nomination", "Source Datasetidentifier")

# Step 5: Specify Source File Name and Key Columns
dbutils.widgets.text("SourceFileName", "cpx_so__nomination-20241127T21*", "Source File Name")
dbutils.widgets.text(
    "KeyColumns", 
    "flows_accountInternal_code, flows_accountExternal_code, flows_location_code, flows_direction, "
    "flows_periods_validityPeriod_begin, flows_periods_validityPeriod_end", 
    "Key Columns"
)

# Step 6: Set Feedback Column, Flattening Depth Level, and Schema Folder Name
dbutils.widgets.text("FeedbackColumn", "dateCreated", "Feedback Column")
dbutils.widgets.text("DepthLevel", "", "Depth Level")
dbutils.widgets.text("SchemaFolderName", "schemachecks", "Schema Folder Name")

# Log a summary of widget initialization
logger.log_block("Widget Initialization Summary", [
    "FileType: json (default)",
    "SourceStorageAccount: dplandingstoragetest",
    "DestinationStorageAccount: dpuniformstoragetest",
    "SourceContainer: landing",
    "SourceDatasetidentifier: cpx_so__nomination",
    "SourceFileName: cpx_so__nomination-20241127T21*",
    "KeyColumns: flows_accountInternal_code, flows_accountExternal_code, flows_location_code, flows_direction, "
    "flows_periods_validityPeriod_begin, flows_periods_validityPeriod_end",
    "FeedbackColumn: dateCreated",
    "DepthLevel: Not Specified",
    "SchemaFolderName: schemachecks"
])
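
`KeyColumns` is entered as one comma-separated string with spaces after the commas, so it must be split and trimmed before it can serve as a list of column names. `Config` presumably does this when it exposes `key_columns`, but that is an assumption; a minimal equivalent (the function name is hypothetical) is:

```python
from typing import List


def parse_key_columns(raw: str) -> List[str]:
    """Split a comma-separated widget value into column names.

    Whitespace around each name is trimmed and empty entries are dropped,
    so "a, b, " becomes ["a", "b"].
    """
    return [name.strip() for name in raw.split(",") if name.strip()]


raw_value = (
    "flows_accountInternal_code, flows_accountExternal_code, flows_location_code, flows_direction, "
    "flows_periods_validityPeriod_begin, flows_periods_validityPeriod_end"
)
key_columns_list = parse_key_columns(raw_value)
print(len(key_columns_list))  # 6
```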

### DDP EM Day-Ahead Flows NEMO (XML Example)

In [None]:
# ==============================================================
# Widget Initialization and Test Configuration: DDP EM Day-Ahead Flows NEMO (XML Example)
# ==============================================================

# Purpose:
# Set up widgets for testing the "ddp_em__dayahead_flows_nemo" dataset using XML files.
# The widgets enable dynamic configuration of critical parameters, including file type,
# storage accounts, containers, dataset identifiers, and XML-specific options.

# Step 1: Clear all existing widgets to ensure no duplication or conflicts
try:
    dbutils.widgets.removeAll()
    logger.log_message("All existing widgets cleared successfully.")
except Exception as e:
    logger.log_error(f"Error clearing widgets: {str(e)}")
    raise RuntimeError(f"Failed to initialize widgets: {str(e)}")

# Step 2: Initialize File Type dropdown with supported options
dbutils.widgets.dropdown("FileType", "xml", ["json", "xlsx", "xml"], "File Type")

# Step 3: Define Source and Destination Storage Accounts
dbutils.widgets.text("SourceStorageAccount", "dplandingstoragetest", "Source Storage Account")
dbutils.widgets.text("DestinationStorageAccount", "dpuniformstoragetest", "Destination Storage Account")

# Step 4: Configure Source Container and Dataset Identifier
dbutils.widgets.text("SourceContainer", "landing", "Source Container")
dbutils.widgets.text("SourceDatasetidentifier", "ddp_em__dayahead_flows_nemo", "Source Datasetidentifier")

# Step 5: Specify Source File Name and Key Columns
dbutils.widgets.text("SourceFileName", "ddp_em__dayahead_flows_nemo-202405*", "Source File Name")
dbutils.widgets.text(
    "KeyColumns", 
    "TimeSeries_mRID, TimeSeries_Period_timeInterval_start, TimeSeries_Period_Point_position", 
    "Key Columns"
)

# Step 6: Set Additional Parameters for Feedback Column, Depth Level, Schema Folder, and XML Root Name
dbutils.widgets.text("FeedbackColumn", "timeseries_timestamp", "Feedback Column")
dbutils.widgets.text("DepthLevel", "1", "Depth Level")
dbutils.widgets.text("SchemaFolderName", "schemachecks", "Schema Folder Name")
dbutils.widgets.text("XmlRootName", "Schedule_MarketDocument", "XML Root Name")

# Log a summary of widget initialization
logger.log_block("Widget Initialization Summary", [
    "FileType: xml (default)",
    "SourceStorageAccount: dplandingstoragetest",
    "DestinationStorageAccount: dpuniformstoragetest",
    "SourceContainer: landing",
    "SourceDatasetidentifier: ddp_em__dayahead_flows_nemo",
    "SourceFileName: ddp_em__dayahead_flows_nemo-202405*",
    "KeyColumns: TimeSeries_mRID, TimeSeries_Period_timeInterval_start, TimeSeries_Period_Point_position",
    "FeedbackColumn: timeseries_timestamp",
    "DepthLevel: 1",
    "SchemaFolderName: schemachecks",
    "XML Root Name: Schedule_MarketDocument"
])
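
For XML, key columns such as `TimeSeries_Period_Point_position` suggest that element paths below the root (`XmlRootName`) are joined with underscores and that repeated elements such as `<Point>` become separate rows. The sketch below reproduces that shape with the standard library's `xml.etree.ElementTree`. It illustrates the idea only and is not the transformer's actual algorithm: it ignores attributes and `depth_level`, and the sample document is invented.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def local_name(tag: str) -> str:
    """Strip an XML namespace such as '{urn:...}TimeSeries' -> 'TimeSeries'."""
    return tag.rsplit("}", 1)[-1]


def flatten_element(elem: ET.Element, prefix: str = "") -> List[Dict[str, str]]:
    """Flatten an element into rows of {tag_path: text}.

    Leaf elements become columns named by their tag path joined with '_'.
    Repeated sibling elements (e.g. several <Point>) yield one row each;
    rows from differently named siblings are combined as a cross product.
    """
    children = list(elem)
    if not children:
        return [{prefix: (elem.text or "").strip()}]

    # Group children by tag so repeated elements are treated as alternatives
    groups: Dict[str, List[ET.Element]] = {}
    for child in children:
        groups.setdefault(local_name(child.tag), []).append(child)

    rows: List[Dict[str, str]] = [{}]
    for tag, same_tag in groups.items():
        name = f"{prefix}_{tag}" if prefix else tag
        child_rows = [row for child in same_tag for row in flatten_element(child, name)]
        rows = [{**left, **right} for left in rows for right in child_rows]
    return rows


SAMPLE = """
<Schedule_MarketDocument>
  <mRID>doc-1</mRID>
  <TimeSeries>
    <mRID>ts-1</mRID>
    <Period>
      <timeInterval><start>2024-05-01T22:00Z</start></timeInterval>
      <Point><position>1</position><quantity>10</quantity></Point>
      <Point><position>2</position><quantity>12</quantity></Point>
    </Period>
  </TimeSeries>
</Schedule_MarketDocument>
"""

for row in flatten_element(ET.fromstring(SAMPLE.strip())):
    print(row)
```

Each output row carries `mRID`, `TimeSeries_mRID`, `TimeSeries_Period_timeInterval_start` and one `<Point>`'s values, which is why the key for this dataset needs the point position in addition to the series identifiers.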

### DDP CM mFRR Settlement (XML Example)

In [None]:
# ==============================================================
# Widget Initialization and Test Configuration: DDP CM mFRR Settlement (XML Example)
# ==============================================================

# Purpose:
# Set up widgets for testing the "ddp_cm__mfrr_settlement" dataset using XML files.
# The widgets enable dynamic configuration of critical parameters, including file type,
# storage accounts, containers, dataset identifiers, and XML-specific options.

# Step 1: Clear all existing widgets to ensure no duplication or conflicts
try:
    dbutils.widgets.removeAll()
    logger.log_message("All existing widgets cleared successfully.")
except Exception as e:
    logger.log_error(f"Error clearing widgets: {str(e)}")
    raise RuntimeError(f"Failed to initialize widgets: {str(e)}")

# Step 2: Initialize File Type dropdown with supported options
dbutils.widgets.dropdown("FileType", "xml", ["json", "xlsx", "xml"], "File Type")

# Step 3: Define Source and Destination Storage Accounts
dbutils.widgets.text("SourceStorageAccount", "dplandingstoragetest", "Source Storage Account")
dbutils.widgets.text("DestinationStorageAccount", "dpuniformstoragetest", "Destination Storage Account")

# Step 4: Configure Source Container and Dataset Identifier
dbutils.widgets.text("SourceContainer", "landing", "Source Container")
dbutils.widgets.text("SourceDatasetidentifier", "ddp_cm__mfrr_settlement", "Source Datasetidentifier")

# Step 5: Specify Source File Name and Key Columns
dbutils.widgets.text("SourceFileName", "*", "Source File Name")
dbutils.widgets.text(
    "KeyColumns", 
    "mRID, TimeSeries_mRID, TimeSeries_Period_timeInterval_start, TimeSeries_Period_Point_position, TimeSeries_Period_resolution", 
    "Key Columns"
)

# Step 6: Set Additional Parameters for Feedback Column, Depth Level, Schema Folder, and XML Root Name
dbutils.widgets.text("FeedbackColumn", "input_file_name", "Feedback Column")
dbutils.widgets.text("DepthLevel", "", "Depth Level")
dbutils.widgets.text("SchemaFolderName", "schemachecks", "Schema Folder Name")
dbutils.widgets.text("XmlRootName", "ReserveAllocationResult_MarketDocument", "XML Root Name")

# Log a summary of widget initialization
logger.log_block("Widget Initialization Summary", [
    "FileType: xml (default)",
    "SourceStorageAccount: dplandingstoragetest",
    "DestinationStorageAccount: dpuniformstoragetest",
    "SourceContainer: landing",
    "SourceDatasetidentifier: ddp_cm__mfrr_settlement",
    "SourceFileName: *",
    "KeyColumns: mRID, TimeSeries_mRID, TimeSeries_Period_timeInterval_start, TimeSeries_Period_Point_position, TimeSeries_Period_resolution",
    "FeedbackColumn: input_file_name",
    "DepthLevel: (none specified)",
    "SchemaFolderName: schemachecks",
    "XML Root Name: ReserveAllocationResult_MarketDocument"
])
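
Widget values always arrive as strings, and two of the configurations above leave `DepthLevel` empty. How `Config` interprets an empty value is not documented in this notebook; assuming it means "no explicit depth limit", a defensive conversion could look like this (the function name is hypothetical):

```python
from typing import Optional


def parse_depth_level(raw: Optional[str]) -> Optional[int]:
    """Convert the DepthLevel widget text to an int.

    Assumption: an empty value means "no explicit depth limit" -> None.
    Raises ValueError for non-numeric or negative input.
    """
    if raw is None or not raw.strip():
        return None
    value = int(raw.strip())
    if value < 0:
        raise ValueError(f"DepthLevel must be non-negative, got {value}")
    return value


print(parse_depth_level("1"), parse_depth_level(""))  # 1 None
```

Failing fast on a malformed value such as `"one"` surfaces a configuration mistake before any data is read.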

### PLUTO PC Units SCADA MW (XLSX Example)

In [None]:
# ==============================================================
# Widget Initialization and Test Configuration: PLUTO PC Units SCADA MW (XLSX Example)
# ==============================================================

# Purpose:
# Configure widgets for testing the "pluto_pc__units_scadamw" dataset using XLSX files.
# These widgets allow dynamic parameterization for file type, storage accounts, sheet name, and key columns.

# Step 1: Clear all existing widgets to ensure no duplication or conflicts
try:
    dbutils.widgets.removeAll()
    logger.log_message("Existing widgets cleared successfully.")
except Exception as e:
    logger.log_error(f"Error clearing widgets: {str(e)}")
    raise RuntimeError(f"Failed to initialize widgets: {str(e)}")

# Step 2: Initialize File Type dropdown with supported options
dbutils.widgets.dropdown("FileType", "xlsx", ["json", "xlsx", "xml"], "File Type")

# Step 3: Define Source and Destination Storage Accounts
dbutils.widgets.text("SourceStorageAccount", "dplandingstoragetest", "Source Storage Account")
dbutils.widgets.text("DestinationStorageAccount", "dpuniformstoragetest", "Destination Storage Account")

# Step 4: Configure Source Container and Dataset Identifier
dbutils.widgets.text("SourceContainer", "landing", "Source Container")
dbutils.widgets.text("SourceDatasetidentifier", "pluto_pc__units_scadamw", "Source Datasetidentifier")

# Step 5: Specify Source File Name and Key Columns
dbutils.widgets.text("SourceFileName", "UnitsSCADAMW.xlsx", "Source File Name")
dbutils.widgets.text("KeyColumns", "Unit_GSRN", "Key Columns")

# Step 6: Specify Additional Parameter for Sheet Name
dbutils.widgets.text("SheetName", "Sheet", "Sheet Name")

# Log a summary of widget initialization
logger.log_block("Widget Initialization Summary", [
    "FileType: xlsx (default)",
    "SourceStorageAccount: dplandingstoragetest",
    "DestinationStorageAccount: dpuniformstoragetest",
    "SourceContainer: landing",
    "SourceDatasetidentifier: pluto_pc__units_scadamw",
    "SourceFileName: UnitsSCADAMW.xlsx",
    "KeyColumns: Unit_GSRN",
    "SheetName: Sheet"
])
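
For XLSX files there is no nesting to flatten; the work is mostly turning the rows of the selected worksheet (`SheetName`) into tabular records keyed by the header row. With `openpyxl`, `worksheet.iter_rows(values_only=True)` yields each row as a tuple of cell values. The sketch below takes such tuples and builds records; treating the first row as the header and skipping blank rows are assumptions about the input, and the sample values are invented.

```python
from typing import Any, Dict, Iterable, List, Sequence


def rows_to_records(rows: Iterable[Sequence[Any]]) -> List[Dict[str, Any]]:
    """Convert worksheet rows into dicts keyed by the first (header) row.

    Blank header cells get a positional name and fully empty rows are skipped.
    """
    iterator = iter(rows)
    header_row = next(iterator, None)
    if header_row is None:
        return []
    header = [
        str(cell).strip() if cell is not None else f"column_{index}"
        for index, cell in enumerate(header_row)
    ]
    records: List[Dict[str, Any]] = []
    for row in iterator:
        if all(cell is None for cell in row):
            continue  # skip blank spreadsheet rows
        records.append(dict(zip(header, row)))
    return records


# Invented sample rows, shaped like worksheet.iter_rows(values_only=True) output
sample_rows = [
    ("Unit_GSRN", "MW"),
    ("570000000000000001", 12.5),
    (None, None),
    ("570000000000000002", 7.0),
]
print(rows_to_records(sample_rows))
```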

## Setup

### Package Installation and Management

In [None]:
# ==============================================================
# Setup: Package Installation and Management
# ==============================================================

# Purpose:
# Manage and install essential Python packages for the Databricks project.
# Ensures compatibility by specifying exact package versions where necessary.
# Includes support for utilities, data processing, and XML/XLSX handling.

# Step 1: Optional - Remove an existing version of the custom utility package
# Uncomment the line below if a previous version of the utility needs to be removed.
# %pip uninstall databricks-custom-utils -y

# Step 2: Install required packages
# The command below installs:
# - Custom Databricks utilities (specific version from GitHub repository).
# - Libraries for SQL parsing, Excel file handling, XML processing, and syntax highlighting.
%pip install \
    git+https://github.com/Open-Dataplatform/utils-databricks.git@v0.7.1 \
    sqlparse \
    openpyxl \
    lxml \
    xmlschema \
    pygments

"""
Package Details:
- `utils-databricks`: Custom utilities for extended functionality in Databricks.
- `sqlparse`: SQL query parsing and formatting library.
- `openpyxl`: Library for handling Excel (XLSX) files.
- `lxml`: Robust library for processing XML and HTML files.
- `xmlschema`: Tools for XML schema validation and conversion.
- `pygments`: Syntax highlighting for code snippets in logs or reports.
"""

### Initialize Logger

In [None]:
# ==============================================================
# Initialize Logger
# ==============================================================

# Purpose:
# Set up a custom logger for detailed logging and debugging throughout the notebook.
# The logger offers advanced features, including:
# - Debug-level logging for in-depth insights during execution.
# - Block-style logging for structured, readable logs.
# - Syntax highlighting for SQL queries and Python code in logs.

# Step 1: Import the Logger class from the custom utilities package
from custom_utils.logging.logger import Logger

# Step 2: Initialize the Logger instance
# - `debug=True` enables detailed logs, useful for troubleshooting and analysis.
logger = Logger(debug=True)

# Log the initialization success
logger.log_message("Logger initialized successfully.")

### Initialize Notebook and Retrieve Parameters

In [None]:
# ==============================================================
# Initialize Notebook and Retrieve Parameters
# ==============================================================

# Purpose:
# Set up the notebook by initializing its configuration and retrieving essential parameters.
# This ensures centralized management of settings and enables efficient debugging
# through a consistent configuration framework.

# Step 1: Import the Config class from the custom utilities package
from custom_utils.config.config import Config

# Step 2: Initialize the Config object
# - Pass `dbutils` for accessing Databricks workspace resources.
# - Set `debug=False` to disable verbose debug logs for cleaner execution.
config = Config.initialize(dbutils=dbutils, debug=False)

# Step 3: Unpack configuration parameters
# - Extracts configuration values into the notebook's global scope.
# - This simplifies access to parameters by making them available as standard variables.
config.unpack(globals())
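
`config.unpack(globals())` is why later cells can use names such as `depth_level`, `key_columns` and `feedback_column` without defining them. The mechanism is presumably an update of the notebook's global namespace; the class below is a hypothetical stand-in (not the `custom_utils` `Config`) that shows the idea against an ordinary dict instead of the real `globals()`.

```python
from typing import Any, Dict, MutableMapping


class DemoConfig:
    """Hypothetical stand-in for a config object (not custom_utils' Config)."""

    def __init__(self, **params: Any) -> None:
        self.params: Dict[str, Any] = dict(params)

    def unpack(self, namespace: MutableMapping[str, Any]) -> None:
        # Copy every parameter into the target namespace; passing globals()
        # would turn each parameter into a module-level variable.
        namespace.update(self.params)


demo = DemoConfig(source_datasetidentifier="triton__flow_plans", depth_level=1)
scope: Dict[str, Any] = {}
demo.unpack(scope)
print(scope["depth_level"])  # 1
```

Writing into `globals()` is convenient in a notebook but hides where names come from; reading attributes explicitly, as the notebook already does with `config.source_datasetidentifier`, is the more traceable alternative.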

### Verify Paths and Files

In [None]:
# ==============================================================
# Verify Paths and Files
# ==============================================================

# Purpose:
# Validate the required paths and files to ensure all necessary resources 
# are available for processing. This pre-check prevents runtime errors 
# by identifying and addressing issues early in the notebook execution.

# Step 1: Import the Validator class from the custom utilities package
from custom_utils.validation.validation import Validator

# Step 2: Initialize the Validator
# - Pass `config` to access path and file parameters from the configuration.
# - Set `debug=False` for standard validation logging without verbose output.
validator = Validator(config=config, debug=False)

# Step 3: Unpack validation parameters
# - Extracts validation-related parameters into the notebook's global scope.
validator.unpack(globals())

# Step 4: Perform validation and check for an exit flag
# - If critical validation fails, the notebook execution is terminated.
validator.check_and_exit()
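
The Validator's checks are internal to `custom_utils`, but one resource this notebook clearly depends on is the JSON schema at `landing/schemachecks/[datasetidentifier]/[datasetidentifier]_schema.json`. Below is a local-filesystem sketch of that kind of pre-check; on Databricks the real check would go through the storage account (for example with `dbutils.fs.ls`) rather than `pathlib`, and both function names are hypothetical.

```python
import tempfile
from pathlib import Path
from typing import Iterable, List


def schema_file_path(landing_root: Path, schema_folder: str, dataset_id: str) -> Path:
    """Build <landing>/<schema_folder>/<dataset_id>/<dataset_id>_schema.json."""
    return landing_root / schema_folder / dataset_id / f"{dataset_id}_schema.json"


def missing_files(paths: Iterable[Path]) -> List[Path]:
    """Return the paths that do not exist (an empty list means all are present)."""
    return [path for path in paths if not path.exists()]


with tempfile.TemporaryDirectory() as tmp:
    landing = Path(tmp) / "landing"
    expected = schema_file_path(landing, "schemachecks", "triton__flow_plans")
    before = missing_files([expected])
    # Create the schema file, then check again
    expected.parent.mkdir(parents=True)
    expected.write_text("{}")
    after = missing_files([expected])

print(len(before), len(after))  # 1 0
```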


### Exit the Notebook if Validation Fails

In [None]:
# ==============================================================
# Exit the Notebook if Validation Fails
# ==============================================================

# Purpose:
# Stop notebook execution gracefully if critical validation checks fail.
# If validation passes, continue processing with a confirmation message.

# Step 1: Check for an exit condition flagged by the Validator
if Validator.exit_notebook:
    # Step 2: Log the exit message using the logger
    # - Provides context on why the notebook execution is being terminated.
    logger.log_error(Validator.exit_notebook_message, level="error")
    
    # Step 3: Exit the notebook with a descriptive message
    # - Uses Databricks utilities to terminate execution cleanly.
    dbutils.notebook.exit(f"Notebook exited: {Validator.exit_notebook_message}")
else:
    # Step 4: Log a success message if validation passed
    # - Confirms the notebook will continue execution.
    logger.log_message("Validation passed. The notebook is proceeding without exiting.", level="info")

## Processing Workflow

### Flattening and Processing

In [None]:
# ==============================================================
# Processing Workflow - Flattening and Processing
# ==============================================================

# Purpose:
# This section executes the core data processing workflow, which includes:
# - Flattening complex hierarchical data for simplified querying and analysis.
# - Applying dataset-specific transformations to align with business requirements.

from pyspark.sql.functions import col
from custom_utils.transformations.dataframe import DataFrameTransformer

# Initialize the DataFrameTransformer
# - Uses the current configuration and disables debug mode for standard operation.
transformer = DataFrameTransformer(config=config, debug=False)

try:
    # Step 1: Process and flatten the data
    # - Produces both the initial DataFrame and its flattened version.
    df_initial, df_flattened = transformer.process_and_flatten_data(depth_level=depth_level)

    # Step 2: Apply dataset-specific transformations (if applicable)
    # Triton flow plans: Rename and cast the "Timestamp" column to "EventTimestamp"
    if config.source_datasetidentifier == "triton__flow_plans":
        df_flattened = (
            df_flattened
            .withColumn("Timestamp", col("Timestamp").cast("timestamp"))
            .withColumnRenamed("Timestamp", "EventTimestamp")
        )
        logger.log_message("Applied transformations for 'triton__flow_plans'.", level="info")

    # CPX SO nomination: Cast relevant fields to timestamp for consistency
    if config.source_datasetidentifier == "cpx_so__nomination":
        df_flattened = (
            df_flattened
            .withColumn("dateCreated", col("dateCreated").cast("timestamp"))
            .withColumn("validityPeriod_begin", col("validityPeriod_begin").cast("timestamp"))
            .withColumn("validityPeriod_end", col("validityPeriod_end").cast("timestamp"))
            .withColumn("flows_periods_validityPeriod_begin", col("flows_periods_validityPeriod_begin").cast("timestamp"))
            .withColumn("flows_periods_validityPeriod_end", col("flows_periods_validityPeriod_end").cast("timestamp"))
        )
        logger.log_message("Applied transformations for 'cpx_so__nomination'.", level="info")

    # Step 3: Display the initial and flattened DataFrames for user verification
    logger.log_block("Displaying the initial and flattened DataFrames.", level="info")
    logger.log_message("Initial DataFrame:", level="info")
    display(df_initial)

    logger.log_message("Flattened DataFrame:", level="info")
    display(df_flattened)

except Exception as e:
    # Step 4: Handle errors gracefully
    # - Logs the error details for debugging and terminates the process.
    logger.log_message(f"Error during processing: {str(e)}", level="error")
    dbutils.notebook.exit(f"Processing failed: {str(e)}")
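
Key columns such as `flows_direction` and `flows_periods_validityPeriod_begin` for `cpx_so__nomination` indicate that arrays like `flows` are exploded into one row per element, with parent fields repeated on each row; this is also why several key columns are needed to identify a row. In PySpark this is typically done with `pyspark.sql.functions.explode`. The pure-Python sketch below only illustrates the row multiplication; whether the transformer uses `explode` or `explode_outer` (which keeps records whose array is empty or null) is not visible here.

```python
from typing import Any, Dict, List


def explode(record: Dict[str, Any], array_field: str) -> List[Dict[str, Any]]:
    """Emit one row per element of record[array_field].

    Parent fields are repeated on every row; dict elements are flattened one
    level with an '<array_field>_' prefix. An empty or missing array yields
    no rows (like Spark's explode, unlike explode_outer).
    """
    parent = {key: value for key, value in record.items() if key != array_field}
    rows: List[Dict[str, Any]] = []
    for item in record.get(array_field) or []:
        row = dict(parent)
        if isinstance(item, dict):
            row.update({f"{array_field}_{key}": value for key, value in item.items()})
        else:
            row[array_field] = item
        rows.append(row)
    return rows


sample = {
    "dateCreated": "2024-11-27T21:00:00",
    "flows": [
        {"direction": "IN", "location_code": "A"},
        {"direction": "OUT", "location_code": "B"},
    ],
}
for row in explode(sample, "flows"):
    print(row)
```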

## Quality Check

### Perform Quality Check and Remove Duplicates

In [None]:
# ==============================================================
# Quality Check - Perform Quality Check and Remove Duplicates
# ==============================================================

# Purpose:
# This section performs data quality checks to ensure:
# - The integrity, accuracy, and consistency of the processed data.
# - Duplicate records are identified and optionally removed.
# - Additional quality checks (e.g., null value checks, value range checks) are executed.

from custom_utils.quality.quality import DataQualityManager

# Step 1: Initialize the DataQualityManager
quality_manager = DataQualityManager(logger=logger, debug=True)

# Step 2: Log available quality checks
# This provides an overview of checks supported by the quality manager.
quality_manager.describe_available_checks()

# Step 3: Execute data quality checks on the flattened DataFrame
try:
    # Perform quality checks with the following configurations:
    # - Partitioning and duplicate checks are based on `key_columns`.
    # - Duplicate removal uses `feedback_column` for ordering within partitions.
    # - Exclude the `input_file_name` column from the final processed DataFrame.
    cleaned_data_view = quality_manager.perform_data_quality_checks(
        spark=spark,
        df=df_flattened,
        key_columns=key_columns,  # Key columns for partitioning and duplicate checking
        order_by=feedback_column,  # Columns for ordering within partitions
        feedback_column=feedback_column,  # Column for duplicate removal ordering
        join_column=key_columns,  # Column for referential integrity check
        columns_to_exclude=["input_file_name"],  # Columns to exclude from final DataFrame
        use_python=False  # Use SQL-based quality checks
    )

except Exception as e:
    # Handle any errors during the quality check process
    logger.log_error(f"Error during quality check: {str(e)}")
    raise RuntimeError(f"Quality check failed: {str(e)}")
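
The comments above say duplicates are partitioned by `key_columns` and ordered by `feedback_column`. Assuming the row with the latest `feedback_column` value is the one kept (the usual reading, but not confirmed by this notebook), the logic matches the SQL filter `ROW_NUMBER() OVER (PARTITION BY <key_columns> ORDER BY <feedback_column> DESC) = 1`, sketched here in plain Python:

```python
from datetime import datetime
from typing import Any, Dict, List, Sequence, Tuple


def drop_duplicates_keep_latest(rows: List[Dict[str, Any]], key_columns: Sequence[str],
                                order_column: str) -> List[Dict[str, Any]]:
    """Keep one row per key: the one with the greatest `order_column` value."""
    latest: Dict[Tuple[Any, ...], Dict[str, Any]] = {}
    for row in rows:
        key = tuple(row[column] for column in key_columns)
        current = latest.get(key)
        # Replace the kept row only when this one is strictly newer,
        # so ties keep the first row encountered.
        if current is None or row[order_column] > current[order_column]:
            latest[key] = row
    return list(latest.values())


rows = [
    {"Guid": "a", "EventTimestamp": datetime(2024, 8, 1, 10), "value": 1},
    {"Guid": "a", "EventTimestamp": datetime(2024, 8, 1, 12), "value": 2},
    {"Guid": "b", "EventTimestamp": datetime(2024, 8, 1, 9), "value": 3},
]
result = drop_duplicates_keep_latest(rows, ["Guid"], "EventTimestamp")
print([r["value"] for r in result])  # [2, 3]
```

The ordering column must hold comparable, non-null values; a null `feedback_column` would need an explicit rule (for example `NULLS LAST` in SQL).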

## Unified Data Management

### Table Creation and Data Merging

In [None]:
# ==============================================================
# Unified Data Management: Table Creation and Data Merging
# ==============================================================

# Purpose:
# This section handles the creation of destination tables and merges
# processed data into the respective storage location. It ensures:
# - Data is written to a unified storage with consistent formatting.
# - Merging supports updates, inserts, and deletions seamlessly.
# - Storage operations are managed efficiently with robust logging.

from custom_utils.catalog.catalog_utils import DataStorageManager

# Step 1: Initialize the DataStorageManager
storage_manager = DataStorageManager(logger=logger, debug=True)

# Step 2: Perform the data storage operation
try:
    # Manage data operation to handle:
    # - Writing data to a destination folder.
    # - Creating a table if it does not exist.
    # - Merging processed data into the existing table.
    storage_manager.manage_data_operation(
        spark=spark,
        dbutils=dbutils,
        cleaned_data_view=cleaned_data_view,  # The view containing cleaned and transformed data
        key_columns=key_columns,  # Columns used to match and merge records
        destination_folder_path=destination_data_folder_path,  # Path for storing Delta files
        destination_environment=destination_environment,  # Database name
        source_datasetidentifier=source_datasetidentifier,  # Target table name
        use_python=False  # False indicates SQL-based operations
    )

    # Log success
    logger.log_message("Data successfully written and merged into the destination table.")

except Exception as e:
    # Handle any errors during the data storage process
    logger.log_error(f"Error during data storage operation: {str(e)}")
    raise RuntimeError(f"Data storage operation failed: {str(e)}")
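
`manage_data_operation` is described as merging the cleaned view into the destination table by `key_columns`. The core upsert semantics, in Delta Lake SQL roughly `MERGE INTO target USING source ON <key match> WHEN MATCHED THEN UPDATE SET * WHEN NOT MATCHED THEN INSERT *`, can be sketched in Python as follows. This models only updates and inserts; how the library handles deletions is not shown in this notebook.

```python
from typing import Any, Dict, List, Sequence, Tuple

Row = Dict[str, Any]


def upsert(target: List[Row], source: List[Row],
           key_columns: Sequence[str]) -> Tuple[List[Row], int, int]:
    """Merge `source` into `target` by key.

    Rows whose key already exists in `target` are replaced (update); the rest
    are appended (insert). Returns (merged_rows, updated_count, inserted_count).
    Deletes are not modelled.
    """
    def key_of(row: Row) -> Tuple[Any, ...]:
        return tuple(row[column] for column in key_columns)

    merged: Dict[Tuple[Any, ...], Row] = {key_of(row): dict(row) for row in target}
    updated = inserted = 0
    for row in source:
        key = key_of(row)
        if key in merged:
            updated += 1
        else:
            inserted += 1
        merged[key] = dict(row)
    return list(merged.values()), updated, inserted


target = [{"Guid": "a", "value": 1}, {"Guid": "b", "value": 2}]
source = [{"Guid": "b", "value": 20}, {"Guid": "c", "value": 3}]
merged, updated, inserted = upsert(target, source, ["Guid"])
print(updated, inserted)  # 1 1
```

This sketch silently keeps the last of several source rows with the same key, whereas a Delta `MERGE` fails when multiple source rows match the same target row. That is one reason the duplicate removal in the quality check runs before the merge.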

## Finishing

### Return the Period (from_datetime, to_datetime) Covered by the Data Read

In [None]:
# ==============================================================
# Finishing - Return Period Covered by Data Read
# ==============================================================

# Purpose:
# This section generates the feedback timestamps, providing the time 
# period covered by the processed and stored data. It extracts the 
# `from_datetime` and `to_datetime` based on the cleaned data view.

# Step 1: Generate feedback timestamps
try:
    notebook_output = storage_manager.generate_feedback_timestamps(
        spark=spark,
        view_name=cleaned_data_view,  # The view containing cleaned and processed data
        feedback_column=feedback_column,  # Column used for identifying feedback periods
        key_columns=key_columns  # Key columns for grouping and extracting timestamp bounds
    )

except Exception as e:
    # Handle errors during feedback timestamp generation
    logger.log_error(f"Error generating feedback timestamps: {str(e)}")
    raise RuntimeError(f"Failed to generate feedback timestamps: {str(e)}")
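
`generate_feedback_timestamps` reports the period covered by the data as `from_datetime` and `to_datetime`. A natural reading, assumed here rather than confirmed, is the minimum and maximum of `feedback_column` over the cleaned data. The sketch returns them as a JSON string, a convenient payload for `dbutils.notebook.exit`, which takes a string; the exact output format of the real helper is not shown in this notebook.

```python
import json
from datetime import datetime
from typing import Any, Dict, List


def _as_text(value: Any) -> Any:
    """Render datetimes as ISO 8601 strings; leave other values unchanged."""
    return value.isoformat() if isinstance(value, datetime) else value


def feedback_period(rows: List[Dict[str, Any]], feedback_column: str) -> str:
    """Return a JSON string with the min and max of `feedback_column`.

    Null values are ignored; an empty input gives nulls for both bounds.
    """
    values = [row[feedback_column] for row in rows if row.get(feedback_column) is not None]
    if not values:
        return json.dumps({"from_datetime": None, "to_datetime": None})
    return json.dumps({
        "from_datetime": _as_text(min(values)),
        "to_datetime": _as_text(max(values)),
    })


rows = [
    {"Guid": "a", "EventTimestamp": datetime(2024, 8, 1, 12)},
    {"Guid": "b", "EventTimestamp": datetime(2024, 8, 1, 9)},
]
print(feedback_period(rows, "EventTimestamp"))
# {"from_datetime": "2024-08-01T09:00:00", "to_datetime": "2024-08-01T12:00:00"}
```

Under this min/max reading, a non-timestamp feedback column such as `input_file_name` (used for `ddp_cm__mfrr_settlement`) would report the first and last file names in sort order rather than a time range.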

## Exit the Notebook

In [None]:
# ==============================================================
# Exit the Notebook
# ==============================================================

# Purpose:
# Conclude the notebook execution and return the output summarizing 
# the period covered by the processed data.

# Step 1: Log the output before exiting. dbutils.notebook.exit() ends the run,
# so any statement placed after it in this cell would never execute.
logger.log_message(f"Exiting notebook with output: {notebook_output}")

# Step 2: Exit the notebook and return the output to the calling job or notebook
dbutils.notebook.exit(notebook_output)