## Creating custom components that incorporate error handling

This section builds a custom Haystack component, `RobustProcessorComponent`, that handles errors robustly. The component attempts to process its input, catches and logs specific exceptions, and retries recoverable failures using exponential backoff.

### Step 1: Import dependencies

First, import Haystack's `component` decorator, along with `logging`, `time` (to sleep between retries), and `random` (to simulate failures).


```python
import logging
import random
import time

from haystack import component

# Initialize logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
```


### Step 2: Define the `RobustProcessorComponent`

This component simulates data processing and includes mechanisms for error handling and retries with exponential backoff.

```python
@component
class RobustProcessorComponent:

    @component.output_types(processed_data=str, status=str)
    def run(self, data: str) -> dict:
        """
        Processes data with error handling and a retry mechanism.

        ConnectionError is treated as transient and retried with exponential
        backoff; any other exception is logged and reported as "error".
        """
        max_retries = 3     # retries allowed after the first attempt
        backoff_factor = 2  # waits 2, 4, 8 seconds before retries 1, 2, 3
        retry_count = 0

        while retry_count <= max_retries:
            try:
                # Simulate data processing that may fail
                if random.random() < 0.5:  # 50% chance of simulated failure
                    raise ConnectionError("Simulated connection error")

                # Simulated successful processing
                processed_data = f"Processed: {data}"
                return {"processed_data": processed_data, "status": "success"}

            except ConnectionError as e:
                retry_count += 1
                if retry_count > max_retries:
                    break  # retry budget spent: stop without a pointless final sleep
                sleep_time = backoff_factor ** retry_count
                logger.warning(f"Retry {retry_count}/{max_retries} after error: {e}. Retrying in {sleep_time} seconds.")
                time.sleep(sleep_time)
            except Exception as e:
                # Unexpected, non-retryable error: log it with its traceback and report failure
                logger.exception(f"Unexpected error processing data '{data}': {e}")
                return {"processed_data": "", "status": "error"}

        logger.error(f"Giving up on data '{data}' after {max_retries} retries")
        return {"processed_data": "", "status": "retry_limit_exceeded"}
```



### Step 3: Best practices in error handling

* Specific Exceptions: The component treats `ConnectionError` as retryable and catches every other `Exception` as a non-retryable, unexpected error. Only known, recoverable errors trigger the retry logic.
* Error Logging: Retryable errors are logged as warnings with context (the retry count and the backoff delay), and unexpected exceptions are logged as errors. Because the component reports failures through its return value instead of raising, the pipeline keeps running.
* Retry Mechanism: Retryable errors are retried with exponential backoff, doubling the wait before each new attempt so that transient problems, such as a brief network outage, have time to clear.
* Error Propagation: When an error is unrecoverable or the retry limit is exceeded, the component returns an empty `processed_data` and a `status` of `"error"` or `"retry_limit_exceeded"`, so the calling code or downstream pipeline components can decide how to handle the failure.
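The retry loop inside `run` can also be factored into a reusable helper so that several components share one retry policy. The sketch below is hypothetical (`retry_with_backoff` is not a Haystack API): it retries only the exception types passed in `retryable`, waits `base_delay * factor ** (n - 1)` seconds before retry `n`, and re-raises the last error once `max_retries` retries have failed. The `sleep` parameter is injectable, an assumption made so the schedule can be checked without actually waiting.

```python
import logging
import time
from typing import Callable, List, Tuple, Type, TypeVar

T = TypeVar("T")
logger = logging.getLogger(__name__)


def retry_with_backoff(
    func: Callable[[], T],
    retryable: Tuple[Type[BaseException], ...] = (ConnectionError,),
    max_retries: int = 3,
    base_delay: float = 1.0,
    factor: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func(), retrying only `retryable` exceptions with exponential backoff.

    Hypothetical helper, not part of Haystack. The wait before retry n is
    base_delay * factor ** (n - 1). Once max_retries retries have failed,
    the last exception is re-raised. Other exceptions propagate immediately.
    """
    if max_retries < 0:
        raise ValueError("max_retries must be >= 0")
    for attempt in range(max_retries + 1):
        try:
            return func()
        except retryable as e:
            if attempt == max_retries:
                raise  # retry budget exhausted: propagate the last error
            delay = base_delay * factor ** attempt
            logger.warning("Attempt %d failed (%s); retrying in %.1f s", attempt + 1, e, delay)
            sleep(delay)
    raise AssertionError("unreachable")  # the loop always returns or raises


# Usage: a function that fails twice with a transient error, then succeeds
calls = {"n": 0}


def flaky() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient failure")
    return "Processed: Example data"


delays: List[float] = []
result = retry_with_backoff(flaky, sleep=delays.append)  # record waits instead of sleeping
print(result, delays)  # Processed: Example data [1.0, 2.0]
```

Because the helper raises instead of returning a status, a component using it would wrap the call in a `try` block and map a final `ConnectionError` to `"retry_limit_exceeded"` and any other exception to `"error"`, keeping the same outputs as `RobustProcessorComponent`.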

### Usage example

```python
if __name__ == "__main__":
    robust_processor = RobustProcessorComponent()

    # Simulate processing data with potential retries for recoverable errors
    result = robust_processor.run(data="Example data")
    logger.info(f"Processing result: {result}")
```


Sample output (this varies between runs because failures are simulated at random):

```
INFO:__main__:Processing result: {'processed_data': 'Processed: Example data', 'status': 'success'}
```
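Because the component reports failures through `status` rather than raising, calling code can branch on that field. A minimal sketch, assuming a hypothetical caller-side `handle_result` function; the action taken for `"retry_limit_exceeded"` (returning `"deferred"`) is a placeholder for whatever your application does with inputs that keep failing transiently.

```python
from typing import Dict


def handle_result(result: Dict[str, str]) -> str:
    """Hypothetical caller-side handler that acts on the component's `status`."""
    status = result.get("status")
    if status == "success":
        return result["processed_data"]
    if status == "retry_limit_exceeded":
        # The transient error persisted: defer the input (placeholder action)
        return "deferred"
    # "error" or an unknown status: fail loudly instead of using empty data
    raise RuntimeError(f"Processing failed with status {status!r}")


print(handle_result({"processed_data": "Processed: Example data", "status": "success"}))
# Processed: Example data
print(handle_result({"processed_data": "", "status": "retry_limit_exceeded"}))
# deferred
```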
