## Building custom components with Haystack

Haystack ships with a wide range of pre-built components, but you can also build your own. This notebook demonstrates how to build custom components for Haystack and use them in a pipeline.

We start with two simple components: one that generates an upper-case welcome message for a given name, and one that splits text on whitespace. The example is small, but it demonstrates the basic principles of building a custom component.

In [1]:
!pip install haystack-ai

Collecting haystack-ai
  Downloading haystack_ai-2.3.1-py3-none-any.whl.metadata (13 kB)
Collecting haystack-experimental (from haystack-ai)
  Downloading haystack_experimental-0.1.1-py3-none-any.whl.metadata (6.9 kB)
Collecting lazy-imports (from haystack-ai)
  Downloading lazy_imports-0.3.1-py3-none-any.whl.metadata (10 kB)
Collecting openai>=1.1.0 (from haystack-ai)
  Downloading openai-1.37.1-py3-none-any.whl.metadata (22 kB)
Collecting posthog (from haystack-ai)
  Downloading posthog-3.5.0-py2.py3-none-any.whl.metadata (2.0 kB)
Collecting httpx<1,>=0.23.0 (from openai>=1.1.0->haystack-ai)
  Downloading httpx-0.27.0-py3-none-any.whl.metadata (7.2 kB)
Collecting monotonic>=1.5 (from posthog->haystack-ai)
  Downloading monotonic-1.6-py2.py3-none-any.whl.metadata (1.5 kB)
Collecting backoff>=1.10.0 (from posthog->haystack-ai)
  Downloading backoff-2.2.1-py3-none-any.whl.metadata (14 kB)
Collecting httpcore==1.* (from httpx<1,>=0.23.0->openai>=1.1.0->haystack-ai)
  Downloading httpcore

In [2]:
from typing import List
from haystack import component, Pipeline

@component
class WelcomeTextGenerator:
  """
  A component that generates a personalized welcome message in upper case
  """
  @component.output_types(welcome_text=str, note=str)
  def run(self, name: str):
    return {"welcome_text": f"Hello {name}, welcome to Haystack!".upper(),
            "note": "welcome message is ready"}

@component
class WhitespaceSplitter:
  """
  A component for splitting the text by whitespace
  """
  @component.output_types(splitted_text=List[str])
  def run(self, text: str):
    return {"splitted_text": text.split()}

# Pipeline is already imported at the top of this cell
text_pipeline = Pipeline()
text_pipeline.add_component(name="welcome_text_generator", instance=WelcomeTextGenerator())
text_pipeline.add_component(name="splitter", instance=WhitespaceSplitter())

# Feed the generated welcome text into the splitter's `text` input
text_pipeline.connect(sender="welcome_text_generator.welcome_text", receiver="splitter.text")



<haystack.core.pipeline.pipeline.Pipeline object at 0x7c94f9cdd750>
🚅 Components
  - welcome_text_generator: WelcomeTextGenerator
  - splitter: WhitespaceSplitter
🛤️ Connections
  - welcome_text_generator.welcome_text -> splitter.text (str)

In [5]:
text_pipeline.draw("./text_pipeline.png")

In [None]:
result = text_pipeline.run({"welcome_text_generator":{"name": "John Doe"}})

print(result["splitter"]["splitted_text"])

['HELLO', 'JOHN', 'DOE,', 'WELCOME', 'TO', 'HAYSTACK!']


In [None]:
result['welcome_text_generator']

{'note': 'welcome message is ready'}

## Combining custom components with existing components in a pipeline



### 1 Define custom component

In [11]:
import re
from typing import Any, Dict, List, Optional, Union

from bs4 import BeautifulSoup  # requires the beautifulsoup4 package
from haystack import component, Document, Pipeline
from haystack.dataclasses import ByteStream
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack.document_stores.types import DuplicatePolicy

from haystack.components.preprocessors import DocumentCleaner
from haystack.components.preprocessors import DocumentSplitter
from haystack.components.writers import DocumentWriter

@component
class ParseHTML:
    """
    A component that strips HTML from a list of source dictionaries
    and converts each one into a Document
    """

    @component.output_types(documents=List[Document])
    def run(self, sources: List[Dict[str, Any]]) -> Dict[str, List[Document]]:

        documents = []
        for source in sources:

            # Clean every string field without mutating the caller's dictionary
            cleaned = {
                key: self.clean_text(value) if isinstance(value, str) else value
                for key, value in source.items()
            }

            # Drop content from the dictionary; skip sources with no content
            content = cleaned.pop("content", "")
            if not content:
                continue

            # The remaining fields become the document's metadata
            documents.append(Document(content=content, meta=cleaned))

        return {"documents": documents}

    def clean_text(self, text: str) -> str:
        # Remove HTML tags using BeautifulSoup
        soup = BeautifulSoup(text, "html.parser")
        text = soup.get_text()
        # Collapse runs of whitespace into single spaces
        text = re.sub(r'\s+', ' ', text).strip()
        return text
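
To get a feel for what `clean_text` does to a small input, here is a dependency-free sketch. It swaps BeautifulSoup for the standard library's `html.parser`, so it is an approximation of the method above rather than the same implementation. The `_TextExtractor` class and the sample string are made up for illustration.

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects the text nodes of an HTML fragment (stdlib stand-in for BeautifulSoup)."""

    def __init__(self) -> None:
        super().__init__()
        self.parts = []

    def handle_data(self, data: str) -> None:
        self.parts.append(data)


def clean_text(text: str) -> str:
    extractor = _TextExtractor()
    extractor.feed(text)
    extractor.close()  # flush any text still buffered by the parser
    # Collapse runs of whitespace into single spaces, as ParseHTML.clean_text does
    return re.sub(r"\s+", " ", "".join(extractor.parts)).strip()


print(clean_text("<p>Hello,\n   <b>Haystack</b>!</p>"))  # Hello, Haystack!
```

Tags are removed and the newline plus indentation inside the paragraph collapse to a single space.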

### 2 Initialize components

In [23]:
parse_html = ParseHTML()

document_store = InMemoryDocumentStore()

document_cleaner = DocumentCleaner(
    remove_empty_lines=True,
    remove_extra_whitespaces=True,
    remove_repeated_substrings=False,
)

document_splitter = DocumentSplitter(split_by="passage", split_length=5)

document_writer = DocumentWriter(
    document_store=document_store,
    policy=DuplicatePolicy.OVERWRITE,
)
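
The splitter settings `split_by="passage"` and `split_length=5` group blank-line-separated passages into chunks of five. The sketch below approximates that idea in plain Python. `split_into_passages` is a hypothetical helper written for this illustration, not Haystack's implementation, and it ignores details such as overlap and how delimiters are preserved.

```python
from typing import List


def split_into_passages(text: str, split_length: int = 5) -> List[str]:
    """Group blank-line-separated passages into chunks of `split_length` passages."""
    # Assumption: a "passage" is any non-empty block separated by a blank line
    passages = [p for p in text.split("\n\n") if p.strip()]
    return [
        "\n\n".join(passages[i : i + split_length])
        for i in range(0, len(passages), split_length)
    ]


# Seven passages become one chunk of five and one chunk of two
text = "\n\n".join(f"Passage {n}" for n in range(1, 8))
chunks = split_into_passages(text)
print(len(chunks))  # 2
```

Since `clean_text` in `ParseHTML` replaces every run of whitespace, blank lines included, with a single space, it may be worth checking whether the parsed documents still contain passage boundaries by the time they reach the splitter. Text without blank lines comes back as a single chunk in this sketch.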

### 3 Add components to the pipeline

In [24]:
pipeline = Pipeline()

pipeline.add_component("parse_html", parse_html)
pipeline.add_component("document_cleaner", document_cleaner)
pipeline.add_component("document_splitter", document_splitter)
pipeline.add_component("document_writer", document_writer)

### 4 Connect components to one another

In [25]:
# Connect components to one another

pipeline.connect("parse_html", "document_cleaner")

pipeline.connect("document_cleaner", "document_splitter")

pipeline.connect("document_splitter", "document_writer")

<haystack.core.pipeline.pipeline.Pipeline object at 0x7c941cc92f50>
🚅 Components
  - parse_html: ParseHTML
  - document_cleaner: DocumentCleaner
  - document_splitter: DocumentSplitter
  - document_writer: DocumentWriter
🛤️ Connections
  - parse_html.documents -> document_cleaner.documents (List[Document])
  - document_cleaner.documents -> document_splitter.documents (List[Document])
  - document_splitter.documents -> document_writer.documents (List[Document])

In [26]:
pipeline.draw("./custom_component_pipeline.png")