# Building and Using Supercomponents in Haystack
This notebook demonstrates how to create, use, and integrate supercomponents in Haystack pipelines. Each step is explained to help you understand the design and application of supercomponents for modular, reusable workflows.

## 1. Defining a Supercomponent
A supercomponent is a class decorated with `@super_component` whose `__init__` builds an internal `Pipeline` and assigns it to `self.pipeline`. The decorator then exposes that pipeline as a single component, so you can bundle complex logic into one reusable unit.

In [1]:
from haystack import Pipeline, super_component
from haystack.components.preprocessors import DocumentCleaner, DocumentSplitter
from haystack.dataclasses import Document
from typing import List

# 1. Apply the @super_component decorator to the class
@super_component
class TextProcessor:
    """
    A Supercomponent that encapsulates a text processing pipeline,
    which cleans and then splits documents.
    """

    def __init__(self, clean: bool = True, split_by: str = "word", split_length: int = 150, split_overlap: int = 20):
        """
        Initializes the TextProcessor and its internal pipeline.
        
        :param clean: Whether to run the DocumentCleaner.
        :param split_by: The unit to split by ('word', 'sentence', etc.).
        :param split_length: The length of each split.
        :param split_overlap: The overlap between splits.
        """
        
        # 2. Create a Pipeline instance and assign it to self.pipeline
        self.pipeline = Pipeline()

        # The splitter is always part of the internal pipeline
        self.pipeline.add_component(
            "splitter",
            DocumentSplitter(split_by=split_by, split_length=split_length, split_overlap=split_overlap),
        )

        # The cleaner is optional; when enabled, its output feeds the splitter
        if clean:
            self.pipeline.add_component(
                "cleaner",
                DocumentCleaner(remove_empty_lines=True, remove_extra_whitespaces=True),
            )
            # Connect the internal components
            self.pipeline.connect("cleaner.documents", "splitter.documents")

    # The run method is implicitly handled by the @super_component decorator.
    # It will automatically expose the input and output sockets of the internal pipeline.
    # Input sockets of the first component(s) become the supercomponent's inputs.
    # Output sockets of the last component(s) become the supercomponent's outputs.
    
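    # The sockets are already derived from the internal pipeline (see above), so the
    # two methods below are optional documentation of the expected types.
    # `input_types` describes what the supercomponent receives.
    def input_types(self):
        return {"documents": List[Document]}

    # `output_types` describes what it produces.
    def output_types(self):
        return {"documents": List[Document]}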
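
To make the `split_length` and `split_overlap` parameters concrete, here is a rough pure-Python sketch of word-based windowing. It is not `DocumentSplitter`'s implementation, and the real component may differ in edge cases. The helper name `window_words` is hypothetical and exists only for this illustration.

```python
from typing import List


def window_words(text: str, split_length: int, split_overlap: int) -> List[str]:
    """Group words into chunks of `split_length`, repeating `split_overlap` words between neighbours."""
    if split_length <= 0:
        raise ValueError("split_length must be positive")
    if not 0 <= split_overlap < split_length:
        raise ValueError("split_overlap must be in [0, split_length)")

    words = text.split()
    step = split_length - split_overlap  # how far each window advances
    chunks: List[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + split_length]))
        if start + split_length >= len(words):
            break  # this window already reached the end of the text
    return chunks


chunks = window_words("one two three four five six seven", split_length=4, split_overlap=1)
print(chunks)  # ['one two three four', 'four five six seven']
```

With an overlap of 1, the last word of each chunk is repeated as the first word of the next one, which is the trade-off the `split_overlap` argument of `TextProcessor` controls.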


## 2. Using the Supercomponent
You can use your supercomponent just like any other Haystack component. Instantiate it, provide input data, and call its `run()` method. The internal pipeline handles the processing steps.

In [3]:
# --- Using the Supercomponent ---

# Create an instance of our new supercomponent
text_processor_inst = TextProcessor(clean=True,
                                    split_by="sentence", 
                                    split_length=2, 
                                    split_overlap=0)

# Prepare some messy input documents
messy_docs = [
    Document(content="This is     the first document!!! It has some messy text..."),
    Document(content="Here's    the second document??? It also needs cleaning!!!"),
]

# Run the supercomponent just like a regular component
# The input socket 'documents' is automatically mapped to the 'cleaner' component's input.
result = text_processor_inst.run(documents=messy_docs)

# The output socket 'documents' is automatically mapped from the 'splitter' component's output.
processed_docs = result["documents"]

print("--- Supercomponent Usage ---")
print(f"Processed {len(processed_docs)} documents (chunks):")
for doc in processed_docs:
    print(f"- '{doc.content}' (meta: {doc.meta})")

--- Supercomponent Usage ---
Processed 3 documents (chunks):
- 'This is the first document!!! It has some messy text...' (meta: {'source_id': '410afbde50d05e8ba0d775d12e2965f3ee700c2f90a434e4032dd6e366435776', 'page_number': 1, 'split_id': 0, 'split_idx_start': 0})
- 'Here's the second document??? It also needs cleaning!!' (meta: {'source_id': '4b46b073d7a51a6bd293fdc10f5bed67fb70bce6004a50ac874460c7fdd6de02', 'page_number': 1, 'split_id': 0, 'split_idx_start': 0})
- '!' (meta: {'source_id': '4b46b073d7a51a6bd293fdc10f5bed67fb70bce6004a50ac874460c7fdd6de02', 'page_number': 1, 'split_id': 1, 'split_idx_start': 54})
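
The output above shows collapsed whitespace but untouched punctuation. As a rough illustration of what the two cleaner options used here do conceptually, the sketch below uses only the standard library. The function names are hypothetical stand-ins, not Haystack APIs, and `DocumentCleaner` itself may behave differently in detail.

```python
import re


def collapse_whitespace(text: str) -> str:
    """Replace every run of whitespace with a single space and trim both ends."""
    return re.sub(r"\s+", " ", text).strip()


def drop_empty_lines(text: str) -> str:
    """Remove lines that are empty or contain only whitespace."""
    return "\n".join(line for line in text.split("\n") if line.strip())


cleaned = collapse_whitespace("This is     the first document!!! It has some messy text...")
print(cleaned)  # This is the first document!!! It has some messy text...
```

Neither helper touches punctuation, which matches the `!!!` and `...` sequences that survive in the processed chunks.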


## 3. Integrating Supercomponents in Larger Pipelines
Supercomponents can be combined with other components (custom or built-in) in larger pipelines. This enables modular, hierarchical pipeline design for complex workflows.

In [4]:
# --- Using the Supercomponent in a Larger Pipeline ---

# Let's imagine a larger pipeline that also uses our custom Prefixer from the previous section
from haystack import component

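@component
class Prefixer:
    """Prepends a fixed prefix to the content of every document."""

    @component.output_types(documents=List[Document])
    def run(self, documents: List[Document], prefix: str):
        # Build new Document objects instead of mutating the inputs
        modified_documents = [
            Document(content=f"{prefix}{doc.content}", meta=doc.meta) for doc in documents
        ]
        return {"documents": modified_documents}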

# Create a larger pipeline
full_pipeline = Pipeline()

# Add our supercomponent and our custom component
full_pipeline.add_component("processor", text_processor_inst)
full_pipeline.add_component("prefixer", Prefixer())

# Connect them together
full_pipeline.connect("processor.documents", "prefixer.documents")

# Visualize the full pipeline (the ./images directory must already exist)
full_pipeline.draw(path="./images/full_pipeline_with_supercomponent.png")
print("\nFull pipeline visualization saved to 'full_pipeline_with_supercomponent.png'")

# Run the full pipeline
full_pipeline_result = full_pipeline.run({
    "processor": {"documents": messy_docs},
    "prefixer": {"prefix": "PROCESSED: "}
})

print("\n--- Full Pipeline with Supercomponent Result ---")
for doc in full_pipeline_result["prefixer"]["documents"]:
    print(f"- '{doc.content}'")



Full pipeline visualization saved to 'full_pipeline_with_supercomponent.png'

--- Full Pipeline with Supercomponent Result ---
- 'PROCESSED: This is the first document!!! It has some messy text...'
- 'PROCESSED: Here's the second document??? It also needs cleaning!!'
- 'PROCESSED: !'


![](./images/full_pipeline_with_supercomponent.png)
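
The `run()` call above passes a nested dictionary: each top-level key names a component, and the connection carries `documents` from `processor` to `prefixer`. The toy below mimics that routing for a strictly linear chain in plain Python. It is a simplified mental model rather than how Haystack schedules components, and `run_linear`, `split_sentences`, and `add_prefix` are hypothetical names.

```python
from typing import Any, Callable, Dict, List, Tuple

Step = Callable[..., Dict[str, Any]]


def run_linear(steps: List[Tuple[str, Step]], data: Dict[str, Dict[str, Any]]) -> Dict[str, Any]:
    """Run named steps in order; each step receives the previous outputs plus its own inputs."""
    carried: Dict[str, Any] = {}
    for name, step in steps:
        kwargs = {**carried, **data.get(name, {})}
        carried = step(**kwargs)
    return carried


def split_sentences(documents: List[str]) -> Dict[str, List[str]]:
    # Stand-in for the "processor" step: naive split on full stops
    parts = [part.strip() for doc in documents for part in doc.split(".")]
    return {"documents": [part for part in parts if part]}


def add_prefix(documents: List[str], prefix: str) -> Dict[str, List[str]]:
    # Stand-in for the "prefixer" step
    return {"documents": [f"{prefix}{doc}" for doc in documents]}


result = run_linear(
    [("processor", split_sentences), ("prefixer", add_prefix)],
    {"processor": {"documents": ["First. Second."]}, "prefixer": {"prefix": "PROCESSED: "}},
)
print(result["documents"])  # ['PROCESSED: First', 'PROCESSED: Second']
```

Unlike this toy, the real pipeline returns results keyed by component name, which is why the notebook reads `full_pipeline_result["prefixer"]["documents"]`.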

---
## Summary and Next Steps
You have now created and used a supercomponent, and integrated it into a larger Haystack pipeline. Try building your own supercomponents to encapsulate reusable logic for your projects!