🔧 **Setup Required**: Before running this notebook, please follow the [setup instructions](../README.md#setup-instructions) to configure your environment and API keys.

# Introduction to Custom Haystack Components

Welcome to your first hands-on experience with building custom components in Haystack! This notebook will guide you through the fundamental concepts and practical implementation of custom components.

## Learning Objectives

By the end of this notebook, you will be able to:

1. **Understand the anatomy** of a Haystack custom component
2. **Apply the `@component` decorator** and implement the methods it requires
3. **Define input and output types** for your components
4. **Create reusable, modular components** that integrate seamlessly with Haystack pipelines
5. **Test and validate** your custom components before pipeline integration

## What Are Custom Components?

Custom components are the building blocks that allow you to extend Haystack's functionality beyond the pre-built components. They enable you to:

- **Implement domain-specific logic** tailored to your use case
- **Create reusable modules** that can be shared across different pipelines
- **Integrate third-party services** or custom algorithms
- **Transform data** in ways not covered by existing components

## Prerequisites

- Basic Python programming knowledge
- Understanding of Haystack's pipeline architecture
- Familiarity with type hints in Python

## The Anatomy of a Custom Component

### Core Concepts

Every Haystack custom component follows a specific pattern:

1. **Class Definition**: A Python class that encapsulates your component's functionality
2. **`@component` Decorator**: Marks the class as a Haystack component
3. **`__init__` Method**: Initializes component state and parameters
4. **`run` Method**: Contains the main processing logic
5. **Input/Output Declaration**: The parameters of `run` define the inputs; `@component.output_types` declares the outputs

### Why Build Custom Components?

While Haystack provides many pre-built components, custom components are essential when you need to:

- **Process data in unique ways** not covered by existing components
- **Integrate with proprietary systems** or APIs
- **Implement business-specific logic** that's reusable across pipelines
- **Create specialized transformations** for your domain

### The Example: Prefixer Component

We'll build a simple but illustrative component called `Prefixer` that:
- **Takes a list of documents** as input
- **Adds a custom prefix** to each document's content
- **Returns the modified documents** while preserving metadata
- **Demonstrates all key component concepts** in a clear, understandable way

## Step-by-Step Component Implementation

Let's break down each part of our custom component implementation to understand the key concepts:

### 1. Essential Imports

```python
from typing import List
from haystack import component
from haystack.dataclasses import Document
```

**Why these imports?**
- `typing.List`: Provides type hints for code clarity and IDE support; Haystack also reads the hints on `run()` to determine each socket's type
- `haystack.component`: The decorator that makes your class a Haystack component
- `haystack.dataclasses.Document`: The standard data structure for text in Haystack

### 2. The @component Decorator

This decorator is **mandatory** for all custom components. It:
- **Registers your class** with Haystack's component system
- **Enables pipeline integration** and connection capabilities
- **Exposes typed input and output sockets** so that connections can be validated when the pipeline is assembled
- **Handles serialization** for pipeline persistence

### 3. Component Architecture Pattern

```
Input Sockets → Processing Logic → Output Sockets
```

- **Input Sockets**: Parameters of the `run()` method
- **Processing Logic**: The transformation you want to apply
- **Output Sockets**: Keys in the returned dictionary

In [None]:
from typing import List
from haystack import component
from haystack.dataclasses import Document

@component
class Prefixer:
    """
    A custom component that adds a specified prefix to the content of each Document.
    
    This component demonstrates the fundamental patterns of Haystack custom components:
    - Input socket definition through method parameters
    - Output socket declaration via @component.output_types
    - Immutable data processing (creating new objects, not modifying existing ones)
    - Metadata preservation during transformation
    """

    def __init__(self):
        """
        Initialize the component.
        
        The __init__ method is standard Python. While not strictly necessary for this
        simple component, it's good practice to include it for:
        - Setting up component configuration
        - Initializing any required state
        - Preparing resources or connections
        """
        pass

    @component.output_types(documents=List[Document])
    def run(self, documents: List[Document], prefix: str) -> dict:
        """
        The main processing logic of the component.
        
        This method defines the component's input and output sockets:
        - Input sockets: 'documents' and 'prefix' (from method parameters)
        - Output socket: 'documents' (declared in @component.output_types)
        
        Args:
            documents: A list of Document objects to be processed
            prefix: The string prefix to add to each document's content
            
        Returns:
            A dictionary with the key 'documents' containing the modified documents
        """
        
        # Process each document, creating new Document objects to avoid side effects
        modified_documents = []
        
        for doc in documents:
            # Create a new Document to avoid modifying the original in place
            # This is a best practice for component design
            new_doc = Document(
                content=f"{prefix}{doc.content}",
                meta=doc.meta.copy() if doc.meta else {}  # Preserve and copy metadata
            )
            modified_documents.append(new_doc)
            
        # Return a dictionary where the key matches the declared output socket name
        # This is how Haystack knows which output to connect to which input in pipelines
        return {"documents": modified_documents}

### Understanding the Code

Let's examine each part of our `Prefixer` component implementation:

#### Key Components Breakdown:

1. **Class Declaration**: `@component` decorator transforms a regular Python class into a Haystack component
2. **Initialization**: The `__init__` method sets up any component state (none needed here, but good practice)
3. **Output Type Declaration**: `@component.output_types(documents=List[Document])` tells Haystack what this component produces
4. **Main Logic**: The `run` method contains the actual processing logic
5. **Return Format**: Must return a dictionary where keys match declared output socket names

#### Critical Design Patterns:

- **Immutable Processing**: We create new `Document` objects rather than modifying originals
- **Metadata Preservation**: The original document metadata is copied to new documents
- **Type Safety**: Input and output types are clearly declared for validation
- **Clean Interface**: Simple, focused functionality that does one thing well
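
One caveat on metadata preservation: `dict.copy()` makes a shallow copy, so nested values such as lists are still shared between the original and the copy. The sketch below uses plain dictionaries as stand-ins for `Document.meta` to show the difference. If your metadata contains nested structures, `copy.deepcopy` may be the safer choice.

```python
import copy

# Plain dictionaries stand in for `Document.meta` here.
original_meta = {"id": 1, "tags": ["draft"]}

shallow = original_meta.copy()
deep = copy.deepcopy(original_meta)

# Mutating a nested value through the shallow copy also changes the original,
# because both dictionaries reference the same list object.
shallow["tags"].append("edited")

print(original_meta["tags"])  # ['draft', 'edited']
print(deep["tags"])           # ['draft']
```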

In [9]:
# Create test documents with content and metadata
documents = [
    Document(content="This is the first document.", meta={"id": 1, "source": "test"}),
    Document(content="This is the second document.", meta={"id": 2, "source": "test"}),
]

# Instantiate our custom component
prefixer_instance = Prefixer()

# Test the component with a sample prefix
result = prefixer_instance.run(documents=documents, prefix=">> ")

print("\nDocument Transformations:")
for i, (original, modified) in enumerate(zip(documents, result['documents'])):
    print(f"\nDocument {i+1}:")
    print(f"  Original: '{original.content}'")
    print(f"  Modified: '{modified.content}'")
    print(f"  Metadata preserved: {original.meta == modified.meta}")
    print(f"  Original unchanged: {original.content == ('This is the first document.' if i == 0 else 'This is the second document.')}")

print("\n✨ Component test completed successfully!")


Document Transformations:

Document 1:
  Original: 'This is the first document.'
  Modified: '>> This is the first document.'
  Metadata preserved: True
  Original unchanged: True

Document 2:
  Original: 'This is the second document.'
  Modified: '>> This is the second document.'
  Metadata preserved: True
  Original unchanged: True

✨ Component test completed successfully!


## Using Components in Pipelines

### Pipeline Integration

Now that we've tested our component standalone, let's see how it integrates into a Haystack pipeline. This is where custom components truly shine!

### Pipeline Architecture

```
Input Documents → Prefixer → Output
```

The beauty of custom components is that they can be seamlessly connected with other Haystack components to create complex processing workflows.

In [10]:
from haystack import Pipeline

# Create a simple pipeline with our custom component
pipeline = Pipeline()

# Add our custom component to the pipeline
pipeline.add_component("prefixer", Prefixer())

print("🔧 Pipeline Setup")
print("=" * 30)
print(f"✅ Components in pipeline: {list(pipeline.graph.nodes.keys())}")

# Run the pipeline
pipeline_result = pipeline.run(
    {
        "prefixer": {
            "documents": documents,
            "prefix": "[PROCESSED] "
        }
    }
)

print(f"\n🚀 Pipeline Execution Results:")
print(f"✅ Pipeline completed successfully")
print(f"✅ Output documents: {len(pipeline_result['prefixer']['documents'])}")

print(f"\n📄 Pipeline Output Preview:")
for i, doc in enumerate(pipeline_result['prefixer']['documents'][:2]):
    print(f"  Document {i+1}: '{doc.content[:50]}...'")

print(f"\n💡 Key Insight: Your custom component works seamlessly in pipelines!")

🔧 Pipeline Setup
==============================
✅ Components in pipeline: ['prefixer']

🚀 Pipeline Execution Results:
✅ Pipeline completed successfully
✅ Output documents: 2

📄 Pipeline Output Preview:
  Document 1: '[PROCESSED] This is the first document....'
  Document 2: '[PROCESSED] This is the second document....'

💡 Key Insight: Your custom component works seamlessly in pipelines!
