## Creating custom components that incorporate scalability considerations


Step 1: Import the Required Libraries

Import Haystack's `component` decorator, `asyncio` for asynchronous programming, and `logging` for error reporting.



In [1]:
from haystack import component
import asyncio
import logging

# Initialize logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("ScalableTextProcessor")


Step 2: Define the ScalableTextProcessor Component

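This component processes text asynchronously and keeps no state between calls, so many calls can run concurrently. Because `run` is declared `async`, calling it returns a coroutine that the caller must await; a synchronous caller that invokes `run` without awaiting it gets a coroutine object, not the result dictionary. Each call handles a single piece of text, which keeps memory use small.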


In [2]:
@component
class ScalableTextProcessor:

    @component.output_types(processed_text=str, status=str)
    async def run(self, text: str) -> dict:
        """
        Processes text data asynchronously, demonstrating statelessness and concurrency.
        """
        try:
            # Simulate an I/O-bound operation, such as fetching data from a database or an API
            processed_text = await self.async_process_text(text)
            return {"processed_text": processed_text, "status": "success"}
        except Exception as e:
            logger.error("Failed to process text due to: %s", e)
            return {"processed_text": "", "status": "error"}

    async def async_process_text(self, text: str) -> str:
        """
        An example async function that simulates text processing.
        """
        await asyncio.sleep(1)  # Simulate an I/O operation
        return text.upper()  # Example processing


Step 3: Implementing Best Practices

* Stateless Design: The `ScalableTextProcessor` component keeps no internal state between invocations, so any number of instances can run side by side, which suits horizontal scaling.

* Concurrency: With `asyncio`, the component can have many text-processing calls in flight at once, improving throughput for I/O-bound operations. `asyncio` provides concurrency on a single thread rather than parallelism, so CPU-bound processing would need separate processes or threads instead.

* Memory Management: This example keeps memory usage low by avoiding large temporary data structures and processing each piece of text independently.

* Scalability Testing: Although the code does not demonstrate it, test this component under a range of load scenarios. Tools such as Locust for load testing or Python's `cProfile` for performance profiling can help identify bottlenecks.
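
As a rough illustration of the concurrency and load-testing points above, the following sketch times a batch of calls under different concurrency limits. It is a minimal sketch under stated assumptions, not a replacement for a real load test: `process_text` is a hypothetical stand-in for `async_process_text` (with a shorter delay so it runs quickly), and `run_load` and `max_in_flight` are names introduced here for illustration. The snippet uses `asyncio.run()` so it works as a plain script; in a notebook cell, `await run_load(...)` would be used instead.

```python
import asyncio
import time


async def process_text(text: str, delay: float = 0.05) -> str:
    """Hypothetical stand-in for async_process_text, with a shorter delay."""
    await asyncio.sleep(delay)  # simulated I/O latency
    return text.upper()


async def run_load(n_requests: int, max_in_flight: int) -> tuple[list[str], float]:
    """Run n_requests calls with at most max_in_flight active at a time."""
    semaphore = asyncio.Semaphore(max_in_flight)

    async def bounded(text: str) -> str:
        # The semaphore caps how many calls are active at once,
        # which bounds memory use and load on downstream services.
        async with semaphore:
            return await process_text(text)

    start = time.perf_counter()
    # gather() returns results in the same order as the inputs.
    outputs = await asyncio.gather(*(bounded(f"doc {i}") for i in range(n_requests)))
    return outputs, time.perf_counter() - start


for limit in (1, 5, 20):
    outputs, elapsed = asyncio.run(run_load(n_requests=20, max_in_flight=limit))
    print(f"max_in_flight={limit:>2}: {len(outputs)} results in {elapsed:.2f}s")
```

For I/O-bound work like this, elapsed time should drop as the limit rises, since waiting calls overlap. A real component would eventually hit a limit set by the database or API behind it, which is what a load test is meant to find.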

Usage Example

To call the component directly from an async environment, such as a Jupyter notebook cell:

In [4]:
import asyncio

async def main():
    processor = ScalableTextProcessor()
    result = await processor.run(text="Hello, world!")
    print(result)

# Jupyter/IPython cells already run inside an event loop, so await the coroutine directly.
# In a plain Python script, top-level `await` is a SyntaxError; call asyncio.run(main()) there.
await main()



{'processed_text': 'HELLO, WORLD!', 'status': 'success'}
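
Building on the single call above, several inputs can be processed concurrently with `asyncio.gather`. The sketch below is illustrative only: `StandInProcessor` is a hypothetical class that mirrors the `run` signature of `ScalableTextProcessor` without the `@component` decorator, so the snippet runs without Haystack installed, and `process_many` is a helper name introduced here. It uses `asyncio.run()` so it works as a script; in a notebook cell, `await process_many([...])` would be used instead.

```python
import asyncio
import time


class StandInProcessor:
    """Hypothetical stand-in mirroring ScalableTextProcessor.run, without Haystack."""

    async def run(self, text: str) -> dict:
        await asyncio.sleep(0.1)  # simulated I/O latency (shorter than the 1 s above)
        return {"processed_text": text.upper(), "status": "success"}


async def process_many(texts: list[str]) -> list[dict]:
    processor = StandInProcessor()  # one stateless instance shared by all calls
    # gather() schedules every call at once and returns results in input order.
    return await asyncio.gather(*(processor.run(text=t) for t in texts))


start = time.perf_counter()
results = asyncio.run(process_many(["alpha", "beta", "gamma", "delta"]))
elapsed = time.perf_counter() - start

print([r["processed_text"] for r in results])
print(f"elapsed: {elapsed:.2f}s")
```

Because the instance holds no state, sharing it across concurrent calls is safe, and the four simulated waits overlap rather than running back to back.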


This `ScalableTextProcessor` component showcases how to design custom components in Haystack with a focus on robustness and scalability, incorporating best practices such as stateless design, concurrency, and efficient memory usage.

