🔧 **Setup Required**: Before running this notebook, please follow the [setup instructions](../README.md#setup-instructions) to configure your environment and API keys.

## Building custom components with Haystack

Haystack ships with a wide range of pre-built components, but you can also build your own. This notebook demonstrates how to build a custom component for Haystack.

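The custom component we will build is a simple one: a `Prefixer` that takes a list of `Document` objects and a prefix string, and returns new documents with the prefix added to the start of each document's content. The example is deliberately small, but it demonstrates the basic principles of building a custom component: the `@component` decorator, a `run` method whose parameters become the input sockets, and declared output types.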

In [4]:
from typing import List
from haystack import component
from haystack.dataclasses import Document

@component
class Prefixer:
    """
    A custom component that adds a specified prefix to the content of each Document.
    """

    # The __init__ method is standard Python. It's used to initialize the component's state.
    # In this simple case, it's not strictly necessary, but it's good practice to include.
    def __init__(self):
        pass

    # Apply the @component.output_types decorator to the run method
    # This declares that our component has one output socket named 'documents',
    # and it will produce a list of Document objects.
    @component.output_types(documents=List[Document])
    def run(self, documents: List[Document], prefix: str):
        """
        The main logic of the component.
        
        :param documents: A list of Document objects to be processed.
        :param prefix: The string prefix to add to each document's content.
        :return: A dictionary containing the list of modified documents.
        """
        
        # The parameters 'documents' and 'prefix' become the input sockets.
        
        # Build new Document objects rather than modifying the originals in place.
        modified_documents = [
            Document(
                content=f"{prefix}{doc.content}",
                meta=doc.meta
            ) for doc in documents
        ]

        # Return a dictionary where the key matches the declared output socket name.
        return {"documents": modified_documents}


In [7]:
documents = [
    Document(content="This is the first document.", meta={"id": 1}),
    Document(content="This is the second document.", meta={"id": 2}),
]
prefixer_instance = Prefixer()
prefixer_instance.run(documents=documents, prefix=">> ")

{'documents': [Document(id=16d755955d6788aa82081377a1cc7960bf7aacea3e26994724213b85b76c4aae, content: '>> This is the first document.', meta: {'id': 1}),
  Document(id=7e39b6b9ece41dae104b2745cd33e661fbd64a8e7edc629cfd32d2a85a960cf9, content: '>> This is the second document.', meta: {'id': 2})]}