The Anatomy of a Custom Component

To be recognized and used by a Haystack Pipeline, a custom component class must adhere to a simple contract 22:

- The `@component` Decorator: This class decorator is mandatory. It registers the class with the Haystack framework, signaling that it can be instantiated and added to a pipeline.
- A `run()` Method: Every component must have a `run()` method. This is the entry point for the component's logic. The parameters of the `run()` method automatically become the component's input sockets.
- The `@component.output_types(...)` Decorator: This method decorator is applied to the `run()` method to declare the names and data types of the component's output sockets. This information is crucial for the pipeline's validation logic.
- A Dictionary Return Value: The `run()` method must return a Python dictionary. The keys of this dictionary must exactly match the output socket names declared in the `@component.output_types` decorator.
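
The two mechanical parts of this contract (input sockets derived from the `run()` signature, and return keys that must match the declared outputs) can be illustrated without Haystack at all. The following is a minimal plain-Python sketch, not Haystack's actual implementation: `ToyPrefixer`, `declared_outputs`, `input_sockets`, and `check_output` are hypothetical names invented for this illustration, and `declared_outputs` merely stands in for what `@component.output_types(...)` would declare.

```python
import inspect
from typing import Dict, List


class ToyPrefixer:
    """Plain-Python stand-in for a component (hypothetical, not a Haystack class)."""

    # Hypothetical attribute: plays the role of @component.output_types(texts=List[str]).
    declared_outputs = {"texts": List[str]}

    def run(self, texts: List[str], prefix: str) -> Dict[str, List[str]]:
        return {"texts": [f"{prefix}{t}" for t in texts]}


def input_sockets(cls) -> Dict[str, object]:
    """Map each run() parameter (except self) to its type annotation."""
    params = inspect.signature(cls.run).parameters
    return {name: p.annotation for name, p in params.items() if name != "self"}


def check_output(cls, result: dict) -> None:
    """Raise ValueError if the returned keys differ from the declared output names."""
    expected, actual = set(cls.declared_outputs), set(result)
    if expected != actual:
        raise ValueError(f"expected output keys {expected}, got {actual}")


sockets = input_sockets(ToyPrefixer)
print(list(sockets))  # ['texts', 'prefix']

result = ToyPrefixer().run(texts=["a", "b"], prefix="NOTE: ")
check_output(ToyPrefixer, result)  # passes silently: keys match the declaration
print(result)  # {'texts': ['NOTE: a', 'NOTE: b']}
```

A framework can perform this kind of introspection before any data flows, which is why a mismatch between declared and returned keys can surface as a validation error rather than as a silent bug.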


In [2]:
# Import necessary classes from Haystack
from typing import List
from haystack import component
from haystack.dataclasses import Document

# 1. Apply the @component decorator to the class
@component
class Prefixer:
    """
    A custom component that adds a specified prefix to the content of each Document.
    """

    # The __init__ method is standard Python. It's used to initialize the component's state.
    # In this simple case, it's not strictly necessary, but it's good practice to include.
    def __init__(self):
        pass

    # 3. Apply the @component.output_types decorator to the run method
    # This declares that our component has one output socket named 'documents',
    # and it will produce a list of Document objects.
    @component.output_types(documents=List[Document])
    def run(self, documents: List[Document], prefix: str):
        """
        The main logic of the component.
        
        :param documents: A list of Document objects to be processed.
        :param prefix: The string prefix to add to each document's content.
        :return: A dictionary containing the list of modified documents.
        """
        
        # 2. The parameters 'documents' and 'prefix' become the input sockets.
        
        # Build new Document objects rather than modifying the originals in place.
        modified_documents = [
            Document(
                content=f"{prefix}{doc.content}",
                meta=doc.meta
            ) for doc in documents
        ]
            
        # 4. Return a dictionary where the key matches the declared output socket name.
        return {"documents": modified_documents}



In [None]:
# --- Using the Custom Component Stand-Alone ---

# Create an instance of our new component
prefixer_inst = Prefixer()

# Prepare some input data
docs_to_prefix = Document(content="This is the first document.")

prefix_string = "NOTE: "

# Run the component directly.
# run() iterates over `documents`, so the single Document is wrapped in a list;
# passing it bare raises "TypeError: 'Document' object is not iterable".
result_standalone = prefixer_inst.run(documents=[docs_to_prefix], prefix=prefix_string)

print("--- Stand-Alone Usage ---")
print(f"Original content: '{docs_to_prefix.content}'")
print(f"Prefixed content: '{result_standalone['documents'][0].content}'")


In [None]:
# --- Using the Custom Component in a Pipeline ---

from haystack import Pipeline

# Create a new pipeline
prefix_pipeline = Pipeline()

# Add our custom component instance to the pipeline
prefix_pipeline.add_component(name="text_prefixer", instance=prefixer_inst)

# Visualize the pipeline (it will just be a single node)
prefix_pipeline.draw("prefix_pipeline.png")
print("\nPipeline visualization saved to 'prefix_pipeline.png'")

# Run the pipeline. The input is keyed by component name, then by input socket name.
pipeline_run_data = {
    "text_prefixer": {
        "documents": [docs_to_prefix],  # the input socket expects a list of Documents
        "prefix": "PIPELINE SAYS: "
    }
}
result_pipeline = prefix_pipeline.run(pipeline_run_data)

print("\n--- Pipeline Usage ---")
print(f"Pipeline output content: '{result_pipeline['text_prefixer']['documents'][0].content}'")


