🔧 **Setup Required**: Before running this notebook, please follow the [setup instructions](../README.md#setup-instructions) to configure your environment and API keys.

# Building Your First Haystack Pipeline
This notebook walks you through building and running a simple Haystack pipeline from two components: a prompt builder and an LLM generator. Each step is explained so you can follow both the pipeline's structure and its execution.

## 1. Environment Setup
We begin by loading environment variables (such as API keys) from a `.env` file. This keeps secrets out of the notebook itself and, as long as `.env` is excluded from version control, out of the codebase.

In [1]:
import os
from dotenv import load_dotenv

# Load environment variables from the .env file
load_dotenv(".env")

# You can now access the API key using os.getenv
# openai_api_key = os.getenv("OPENAI_API_KEY")


True
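
The `True` above is the return value of `load_dotenv`, which does not by itself confirm that the specific key you need is present. A minimal sketch of an explicit check follows. The helper name `require_env` and the variable `DEMO_API_KEY` are hypothetical and used only for illustration; in this notebook you would check `OPENAI_API_KEY` instead.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise if it is missing or empty."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(
            f"{name} is not set. Add it to your .env file or export it in your shell."
        )
    return value


# Illustration only: a dummy variable so the sketch runs without a real key.
os.environ["DEMO_API_KEY"] = "dummy-value"
print(require_env("DEMO_API_KEY"))
```

Failing early with a clear message is usually easier to debug than an authentication error raised later, when the generator first calls the API.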

## 2. Preparing Data and Components
We create a list of `Document` objects, each representing a piece of text with optional metadata. We also define a prompt template and instantiate the prompt builder and LLM generator components.

In [8]:
from haystack import Pipeline
from haystack.components.builders import PromptBuilder
from haystack.components.generators import OpenAIGenerator
from haystack.utils import Secret
from haystack.dataclasses import Document
from dotenv import load_dotenv
import os

load_dotenv()

# --- Prepare Data and Components ---

# We'll use some simple documents for this example
documents_for_pipeline = [
    Document(
        content="Rose Island was a micronation located in the Adriatic Sea. It declared independence in 1968 and had Italian as its official language.",
        meta={"source": "history_docs", "author": "historian1"},
    ),
    Document(
        content="The capital of Rose Island was called 'Isola delle Rose'. It was known for its unique architecture and vibrant culture.",
        meta={"source": "geography_docs", "author": "geographer1"},
    ),
    Document(
        content="Rose Island's economy was primarily based on tourism and the sale of souvenirs to visitors.",
        meta={"source": "economy_docs", "author": "economist1"},
    ),
]

# Define the prompt template
prompt_template_for_pipeline = """
Answer the question based on this context.

Context:
{% for doc in documents %}
- {{ doc.content }}
{% endfor %}

Question: {{ query }}
Answer:
"""

# Instantiate the components we will use
prompt_builder_inst = PromptBuilder(template=prompt_template_for_pipeline,
                                    required_variables="*")
llm_generator_inst = OpenAIGenerator(api_key=Secret.from_env_var("OPENAI_API_KEY"), model="gpt-4o-mini")
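
To see what the template is meant to produce, here is a plain-Python approximation of its expansion: one bullet per document, followed by the question. This is a sketch, not how `PromptBuilder` works internally (it renders the template with Jinja), and the exact whitespace of the real output may differ. `Doc` and `render_prompt` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    """Hypothetical stand-in for a Haystack Document; only `content` is needed here."""
    content: str


def render_prompt(query: str, documents: list[Doc]) -> str:
    """Approximate the template expansion: one bullet per document, then the question."""
    context_lines = "\n".join(f"- {doc.content}" for doc in documents)
    return (
        "Answer the question based on this context.\n\n"
        f"Context:\n{context_lines}\n\n"
        f"Question: {query}\n"
        "Answer:"
    )


docs = [
    Doc("Rose Island was a micronation located in the Adriatic Sea."),
    Doc("The capital of Rose Island was called 'Isola delle Rose'."),
]
print(render_prompt("What was the official language of Rose Island?", docs))
```

The two template variables, `documents` and `query`, are the inputs the prompt builder will expect when the pipeline runs.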



## 3. Building the Pipeline
We instantiate a `Pipeline` object, add our components to it, and connect them. This defines the data flow: the prompt builder creates a prompt, which is then sent to the LLM generator.

In [9]:
# --- Build the Pipeline ---

# 1. Instantiate the Pipeline
basic_pipeline = Pipeline()

# 2. Add Component Instances
# We give each component a unique name: "prompter" and "llm".
basic_pipeline.add_component(name="prompter", instance=prompt_builder_inst)
basic_pipeline.add_component(name="llm", instance=llm_generator_inst)

# 3. Connect the Components
# This is the crucial step that defines the data flow.
# We connect the 'prompt' output of the 'prompter' component
# to the 'prompt' input of the 'llm' component.
basic_pipeline.connect("prompter.prompt", "llm.prompt")




<haystack.core.pipeline.pipeline.Pipeline object at 0x10f9c89b0>
🚅 Components
  - prompter: PromptBuilder
  - llm: OpenAIGenerator
🛤️ Connections
  - prompter.prompt -> llm.prompt (str)
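
For intuition about what a connection such as `prompter.prompt -> llm.prompt` expresses, here is a toy sketch in plain Python. It is not Haystack's implementation, and it makes no model call. `toy_prompter`, `toy_llm`, and `run_toy_pipeline` are hypothetical names, and the rule that only unconsumed outputs are returned is an assumption chosen to mirror the result shape seen when the real pipeline is run.

```python
def toy_prompter(query, documents):
    """Stand-in for the prompt builder: returns a dict with a 'prompt' output."""
    context = "\n".join(f"- {text}" for text in documents)
    return {"prompt": f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"}


def toy_llm(prompt):
    """Stand-in for the generator: no model call, just a canned reply."""
    return {"replies": [f"(stub reply to a {len(prompt)}-character prompt)"]}


# In this toy, dictionary insertion order doubles as the execution order.
components = {"prompter": toy_prompter, "llm": toy_llm}
connections = [("prompter.prompt", "llm.prompt")]


def run_toy_pipeline(run_data):
    # Start each component's inputs from what the caller supplied.
    inputs = {name: dict(run_data.get(name, {})) for name in components}
    results = {}
    for name, component in components.items():
        outputs = component(**inputs[name])
        forwarded = set()
        for sender, receiver in connections:
            sender_name, sender_output = sender.split(".")
            receiver_name, receiver_input = receiver.split(".")
            if sender_name == name:
                # Route the named output to the named input of the receiver.
                inputs[receiver_name][receiver_input] = outputs[sender_output]
                forwarded.add(sender_output)
        # Outputs that no connection consumed become part of the final result.
        unconsumed = {k: v for k, v in outputs.items() if k not in forwarded}
        if unconsumed:
            results[name] = unconsumed
    return results


result = run_toy_pipeline(
    {"prompter": {"query": "Which sea?", "documents": ["Rose Island was in the Adriatic Sea."]}}
)
print(result)
```

The caller supplies inputs only for `prompter`; the `llm` stand-in receives its `prompt` through the connection.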

## 4. Visualizing the Pipeline
Haystack can visualize your pipeline as a graph, making it easier to understand the flow of data and components.

In [7]:
basic_pipeline.draw(path="./images/prompt_pipeline.png")

print("\nPipeline visualization saved to './images/prompt_pipeline.png'")



Pipeline visualization saved to './images/prompt_pipeline.png'


![](./images/prompt_pipeline.png)

## 5. Running the Pipeline
We provide input data (a query and documents) and execute the pipeline. The output is the LLM's answer to the question, based on the provided context.

In [10]:
# --- Run the Pipeline ---

# The .run() method takes a dictionary as input.
# The keys of the dictionary correspond to the names of the components in the pipeline.
# The values are dictionaries of the inputs for that component.
query_text = "What was the official language of Rose Island?"

run_data = {
    "prompter": {
        "query": query_text,
        "documents": documents_for_pipeline
    }
}

# Execute the pipeline
pipeline_result = basic_pipeline.run(run_data)

# The output is a dictionary keyed by component name, holding the final results from the terminal component(s).
print("Pipeline Result:")
print(pipeline_result)


Pipeline Result:
{'llm': {'replies': ['The official language of Rose Island was Italian.'], 'meta': [{'model': 'gpt-4o-mini-2024-07-18', 'index': 0, 'finish_reason': 'stop', 'usage': {'completion_tokens': 9, 'prompt_tokens': 103, 'total_tokens': 112, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}}]}}
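
In practice you usually want the answer text rather than the whole dictionary. The sketch below pulls out the first reply and the token usage. To stay runnable without an API call, it assumes a `pipeline_result` with the shape printed above, copied here as a literal with the nested token-detail fields omitted.

```python
# Literal copy of the result shown above (token-detail sub-dictionaries omitted).
pipeline_result = {
    "llm": {
        "replies": ["The official language of Rose Island was Italian."],
        "meta": [
            {
                "model": "gpt-4o-mini-2024-07-18",
                "index": 0,
                "finish_reason": "stop",
                "usage": {"completion_tokens": 9, "prompt_tokens": 103, "total_tokens": 112},
            }
        ],
    }
}

llm_output = pipeline_result["llm"]        # outputs of the "llm" component
answer = llm_output["replies"][0]          # first (and here only) reply
usage = llm_output["meta"][0]["usage"]     # token counts for that reply

print(answer)
print(
    f"{usage['prompt_tokens']} prompt + {usage['completion_tokens']} completion "
    f"= {usage['total_tokens']} tokens"
)
```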


---
## Summary and Next Steps
You have now built and run a simple Haystack pipeline! Try modifying the documents, prompt template, or components to experiment with different pipeline behaviors.