# OpenStreetMap RAG pipeline

## OpenStreetMap + Haystack: From basic queries to agents

  <img src="https://wiki.openstreetmap.org/w/images/7/79/Public-images-osm_logo.svg" height="170"/>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
<img src="https://haystack.deepset.ai/images/haystack-ogimage.png" width="350" style="display:inline;">

[OpenStreetMap](https://www.openstreetmap.org/) is a free, community-driven map of the world. In this notebook, we use the [osm-integration-haystack](https://github.com/grexrr/osm-integration-haystack) package to turn OpenStreetMap data into `Haystack Document`s and then plug them into LLM workflows.

Together, we'll walk through two progressively more advanced scenarios:

1. **Basic OSM query → LLM summarization**  
   Use `OSMFetcher` to retrieve and preprocess nearby points of interest (POIs) around Cork city centre, then build a prompt that summarizes the locations for a specific user query (e.g. “find coffee shops nearby”).

2. **Agent + tools: itinerary planner**  
   Wrap an OSM-based pipeline as a Haystack `PipelineTool`, expose it to an agent and let the LLM call this tool to plan an afternoon itinerary in Cork.

## Setup

```python
!pip install -q haystack-ai osm-integration-haystack
```


## Part 1: Knowledge base Vectorization

This part is a **preparation step** before using agents and tools.  
We focus on turning raw OpenStreetMap data into a small knowledge base of Haystack `Document`s via `OSMFetcher`; in the next part, we'll ask an LLM to summarize it. In other words, Part 1 covers steps 1 and 2 of the basic pattern:

🗺️ OpenStreetMap (Overpass API)  
  → 1. 📡 OSMFetcher  
  → 2. 📄 Documents (our knowledge base)  
  → 3. 🧩 ChatPromptBuilder + 🧠 OpenAIChatGenerator  
  → 4. 🤖 LLM summarization

This will lay the foundation for more complex, **agentic** behavior introduced in the later sections, where we'll wrap this logic into a reusable tool that an agent can call automatically.

**Authorization**

Before starting, provide your own OpenAI API key:

```python
import os
from getpass import getpass

# Ask for the key only if it is not already set in the environment
if "OPENAI_API_KEY" not in os.environ:
    os.environ["OPENAI_API_KEY"] = getpass("Enter OpenAI API key:")
```



**Extra:** From Name (String) to Coordinates (Tuple)

In this example we use [Nominatim](https://nominatim.org/) to **geocode** the place name  
*Saints Peter and Paul's Catholic Church* into latitude/longitude coordinates.  

This is not the main focus of the notebook. In real-world geocoding workflows you usually have to deal with ambiguity, match quality, and various string-cleaning heuristics, which are out of scope here. In most map-based applications, for accuracy and robustness, backend services expect a concrete `(latitude, longitude)` tuple rather than raw location strings.

Feel free to use any place or landmark you like!

```python
!pip install -q geopy
```

```python
from geopy.geocoders import Nominatim

geolocator = Nominatim(user_agent="haystack-osm-cookbook-demo")

# Geocode a place-name string into (latitude, longitude) coordinates
location_name = "saints peter and paul's catholic church"
location = geolocator.geocode(location_name)
if location is None:
    raise ValueError(f"Nominatim found no match for: {location_name!r}")

print(f"Query: {location_name}")
print(f"Latitude:  {location.latitude}")
print(f"Longitude: {location.longitude}")
print(f"Display name: {location.address}")
```


```text
Query: saints peter and paul's catholic church
Latitude:  51.8989077
Longitude: -8.4743188
Display name: Saints Peter and Paul's Catholic Church, Carey's Lane, The Marsh, Centre B ED, Cork, County Cork, Munster, T12 FH27, Éire / Ireland
```


Here we simply use the coordinate tuple as the more conventional input. In this scenario, we start by fetching all nodes tagged with `amenity` within 1000 meters for later AI processing.

```python
from osm_integration_haystack import OSMFetcher

CENTER = (51.8989077, -8.4743188)  # (lat, lon)
RADIUS_M = 1000
```

```python
osm_fetcher = OSMFetcher(
    preset_center=CENTER,          # Cork, Ireland
    preset_radius_m=RADIUS_M,      # 1000 m radius
    target_osm_types=["node"],     # only search nodes
    target_osm_tags=["amenity"],   # only elements that have an amenity tag
    maximum_query_mb=2,            # cap the Overpass response size (MB)
    overpass_timeout=20,           # Overpass request timeout (s)
)
```

In the context of OpenStreetMap, terms like `"node"` and `"amenity"` refer to well-defined [elements](https://wiki.openstreetmap.org/wiki/Elements) and [map features](https://wiki.openstreetmap.org/wiki/Map_features) that describe how real-world objects are encoded in the map data (for example, a café as a point node with an `amenity=cafe` tag). The exact tagging scheme is not the focus of this tutorial. In the following examples, we'll use a small subset of these categories to keep the queries simple and focused.

The `OSMFetcher` component wraps the Overpass API and exposes a few key parameters:

- `preset_center: Optional[Tuple[float, float]]`  
  Default center point for all queries, as a `(latitude, longitude)` tuple.  

- `preset_radius_m: Optional[int]`  
  Default search radius in **meters** around the center.  

- `target_osm_types: Optional[Union[str, List[str]]]`  
  Which OSM element types to query: `"node"`, `"way"`, and/or `"relation"`.  
  If omitted, the fetcher queries all three: `["node", "way", "relation"]`.

- `target_osm_tags: Optional[Union[str, List[str]]]`  
  A list of top-level OSM tags to filter by, such as `["amenity", "tourism", "leisure"]`.  
  If set, the Overpass query will only return elements that have at least one of these tags.  
  If left as `None`, the fetcher does **not** filter by tag and will return all matching elements for the chosen types.

- `maximum_query_mb: Optional[int]`  
  Rough upper bound on the Overpass response size, in megabytes.  
  This is passed to Overpass as `maxsize` to avoid huge responses and timeouts (default: `5` MB).

- `max_token: int`  
  Intended as a soft budget for how much data should be returned to downstream LLM components.  
  In an LLM/Agent setting, this can be used to limit or compress the total amount of text and metadata so that it fits comfortably within the model's context window (default: `12000`).

- `overpass_timeout: Optional[int]`  
  Timeout for the Overpass API request, in seconds (default: `25`).  
  If the query is too heavy or the server is slow, this helps prevent the call from hanging indefinitely.
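To see how these parameters map onto the actual Overpass request, here is a minimal sketch that rebuilds the Overpass QL that `osm_fetcher.run()` logs later in this notebook. `build_overpass_query` is a hypothetical helper inferred from that log, not part of `osm-integration-haystack`; in particular, converting `maximum_query_mb` to bytes as `mb * 1_000_000` is an assumption based on the logged `maxsize:2000000` for 2 MB.

```python
from typing import List, Optional, Sequence, Tuple, Union


def build_overpass_query(
    center: Tuple[float, float],
    radius_m: int,
    osm_types: Optional[Union[str, Sequence[str]]] = None,
    osm_tags: Optional[Union[str, Sequence[str]]] = None,
    maximum_query_mb: int = 5,
    overpass_timeout: int = 25,
) -> str:
    """Hypothetical helper: assemble an Overpass QL 'around' query like the one OSMFetcher logs."""
    lat, lon = center

    # Accept a single string or a list; default to all three OSM element types.
    if osm_types is None:
        types: List[str] = ["node", "way", "relation"]
    elif isinstance(osm_types, str):
        types = [osm_types]
    else:
        types = list(osm_types)

    # None means "no tag filter": one unfiltered statement per element type.
    if osm_tags is None:
        tag_filters: List[str] = [""]
    elif isinstance(osm_tags, str):
        tag_filters = [f"[{osm_tags}]"]
    else:
        tag_filters = [f"[{tag}]" for tag in osm_tags]

    # One statement per (type, tag) pair; Overpass returns the union of all of them.
    around = f"(around:{radius_m},{lat},{lon})"
    statements = [f"  {t}{f}{around};" for t in types for f in tag_filters]

    # Assumption: MB -> bytes as mb * 1_000_000 (matches maxsize:2000000 for 2 MB in the log).
    maxsize = maximum_query_mb * 1_000_000
    header = f"[out:json][timeout:{overpass_timeout}][maxsize:{maxsize}];"
    return "\n".join([header, "(", *statements, ");", "out geom;"])


print(build_overpass_query((51.8989077, -8.4743188), 1000, ["node"], ["amenity"], 2, 20))
```

With `target_osm_tags=["amenity", "tourism", "leisure"]`, this sketch produces three `node[...]` statements, which is also what the Part 3 log shows.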

In most map-based backends, the typical pattern is to accept concrete `(lat, lon)` coordinates (for example, from the frontend's map widget or the user's GPS location) and then query nearby OSM elements using these parameters.


Running the fetcher sends the query to Overpass and transforms the returned OpenStreetMap data into Haystack `Document`s.

```python
result = osm_fetcher.run()
documents = result["documents"]
```

```text
Current Query:

        [out:json][timeout:20][maxsize:2000000];
        (
            node[amenity](around:1000,51.8989077,-8.4743188);
        );
        out geom;

Status: 200
Response: {
  "version": 0.6,
  "generator": "Overpass API 0.7.62.8 e802775f",
  "osm3s": {
    "timestamp_osm_base": "2025-11-16T00:05:43Z",
    "copyright": "The data included in this document is from www.ope...
[OSM_Doc_Converter] Reading Raw OSM GeoJson...
[OSM_Doc_Converter] Loaded 955 entries.
[OSM_Doc_Converter] Batch-processing data cleaning.
```


### Inspecting a single `Document`

Haystack represents each piece of retrieved data as a `Document` with two main parts:

- `content`: human-readable, unstructured text.  
  This is what we usually embed, retrieve and show to the user. LLMs and retrievers
  mainly "look at" this field.

- `meta`: machine-readable, structured metadata stored as a Python dictionary.  
  This is where we keep all the fields that are useful for filtering, ranking or
  business logic (ids, coordinates, categories, tags, etc.).
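To make the split concrete, the sketch below derives a one-line `content` string from a `meta` dictionary, reproducing the formats printed in this notebook (for example `Cafe: Dukes, Carey's Lane, 4, Cork.`). `render_content`, its choice of address fields, and the default `tag_keys` are hypothetical reconstructions from those outputs, not the package's actual converter.

```python
from typing import Any, Dict, Iterable


def render_content(meta: Dict[str, Any], tag_keys: Iterable[str] = ("opening_hours",)) -> str:
    """Hypothetical: build a short, human-readable description from structured metadata."""
    # "fast_food" -> "Fast_food", mirroring the capitalization seen in the outputs
    category = str(meta.get("category", "place")).capitalize()
    parts = [f"{category}: {meta.get('name', 'Unknown')}"]

    # Append whichever address fields are present, in a fixed order
    address = meta.get("address") or {}
    for key in ("street", "housenumber", "postcode", "city"):
        if address.get(key):
            parts.append(str(address[key]))
    text = ", ".join(parts) + "."

    # Surface only a few selected tags in the text; everything else stays in meta
    tags = meta.get("tags") or {}
    selected = [f"{key}={tags[key]}" for key in tag_keys if key in tags]
    if selected:
        text += " Tags: " + ", ".join(selected)
    return text


dukes_meta = {
    "category": "cafe",
    "name": "Dukes",
    "address": {"city": "Cork", "country": "IE", "housenumber": "4", "street": "Carey's Lane"},
    "tags": {"amenity": "cafe", "internet_access": "wlan", "wheelchair": "yes"},
}
print(render_content(dukes_meta))  # Cafe: Dukes, Carey's Lane, 4, Cork.
```

Whatever the exact rendering, the pattern is the same: a short, readable `content` for the LLM (and for embeddings), with the full structured record, including every tag, kept in `meta`.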

```python
from pprint import pprint

# Inspect the second document (index 1), a cafe right next to the search center
sample_doc = documents[1]
print("📄 type:", type(sample_doc))

print("\n--- content ---")
print(sample_doc.content)

print("\n--- meta keys ---")
print(list(sample_doc.meta.keys()))

print("\n--- full meta ---")
pprint(sample_doc.meta)
```


```text
📄 type: <class 'haystack.dataclasses.document.Document'>

--- content ---
Cafe: Dukes, Carey's Lane, 4, Cork.

--- meta keys ---
['source', 'osm_id', 'osm_type', 'lat', 'lon', 'name', 'category', 'tags', 'tags_norm', 'address', 'distance_m']

--- full meta ---
{'address': {'city': 'Cork',
             'country': 'IE',
             'housenumber': '4',
             'street': "Carey's Lane"},
 'category': 'cafe',
 'distance_m': 28.70318839718862,
 'lat': 51.8991234,
 'lon': -8.474089,
 'name': 'Dukes',
 'osm_id': 1128095411,
 'osm_type': 'node',
 'source': 'openstreetmap',
 'tags': {'amenity': 'cafe',
          'cuisine': 'coffee_shop',
          'entrance': 'main',
          'internet_access': 'wlan',
          'phone': '00353214905877',
          'wheelchair': 'yes'},
 'tags_norm': {'amenity': 'cafe',
               'cuisine': 'coffee_shop',
               'entrance': 'main',
               'internet_access': 'wlan',
               'phone': '00353214905877',
               'wheelchai
```

Here is a preview of the preprocessed documents that will be passed to the downstream pipeline.

```python
def preview_documents(docs, limit=5):
    print(f"Previewing first {min(len(docs), limit)} documents:\n")

    for i, doc in enumerate(docs[:limit], start=1):
        name = doc.meta.get("name", "Unknown")
        category = doc.meta.get("category", "Unknown")
        distance = doc.meta.get("distance_m", 0.0)
        lat = doc.meta.get("lat")
        lon = doc.meta.get("lon")

        print(f"{i}. {name}")
        print(f"   Type: {category}")
        print(f"   Distance: {distance:.1f} m")
        print(f"   Location: ({lat}, {lon})")
        print(f"   Content: {doc.content[:120]}{'...' if len(doc.content) > 120 else ''}")
        print()

preview_documents(documents, limit=5)
```


```text
Previewing first 5 documents:

1. Koto
   Type: restaurant
   Distance: 27.9 m
   Location: (51.8990101, -8.4739482)
   Content: Restaurant: Koto, Carey's Lane, 6-7, T12 FH27. Tags: opening_hours=Mo-Su 12:00-22:00

2. Dukes
   Type: cafe
   Distance: 28.7 m
   Location: (51.8991234, -8.474089)
   Content: Cafe: Dukes, Carey's Lane, 4, Cork.

3. Soba Asian Street Food
   Type: fast_food
   Distance: 30.1 m
   Location: (51.8989516, -8.4738856)
   Content: Fast_food: Soba Asian Street Food.

4. OffBeat Donuts
   Type: fast_food
   Distance: 35.1 m
   Location: (51.8990968, -8.4739097)
   Content: Fast_food: OffBeat Donuts, French Church Street, 17, Cork.

5. Burritos and Blues
   Type: fast_food
   Distance: 43.6 m
   Location: (51.899271, -8.4745565)
   Content: Fast_food: Burritos and Blues, Paul Street, 9, Cork. Tags: opening_hours=Mo-We 12:00-20:00; Th-Sa 12:00-21:00; Su 13:00-...
```



## Part 2: Pipeline to find the nearest coffee shop

I know that a query like “find the nearest coffee shop” is, by itself, a very simple geo-filtering task that you can solve with a couple of distance calculations and a sort. In this example, however, I frame it as an LLM task to show how preprocessing can enable richer logic on top of the same data.
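For comparison, the purely geometric version really is only a few lines. This sketch computes haversine (great-circle) distances from the search center and sorts the matches by distance. The POI records are copied from the document preview above; the Earth radius of 6,371 km is an assumed mean value, so results may differ slightly from the package's `distance_m`.

```python
import math
from typing import Dict, List, Tuple

EARTH_RADIUS_M = 6_371_000  # assumed mean Earth radius in meters


def haversine_m(a: Tuple[float, float], b: Tuple[float, float]) -> float:
    """Great-circle distance in meters between two (lat, lon) points given in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (
        math.sin((lat2 - lat1) / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2
    )
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def nearest(pois: List[Dict], center: Tuple[float, float], category: str, k: int = 3) -> List[Tuple[str, float]]:
    """Keep POIs of one category, then sort them by distance from the center."""
    matches = [
        (p["name"], haversine_m(center, (p["lat"], p["lon"])))
        for p in pois
        if p["category"] == category
    ]
    return sorted(matches, key=lambda m: m[1])[:k]


CENTER = (51.8989077, -8.4743188)
POIS = [  # values copied from the preview output above
    {"name": "Koto", "category": "restaurant", "lat": 51.8990101, "lon": -8.4739482},
    {"name": "Dukes", "category": "cafe", "lat": 51.8991234, "lon": -8.474089},
    {"name": "Soba Asian Street Food", "category": "fast_food", "lat": 51.8989516, "lon": -8.4738856},
    {"name": "OffBeat Donuts", "category": "fast_food", "lat": 51.8990968, "lon": -8.4739097},
    {"name": "Burritos and Blues", "category": "fast_food", "lat": 51.899271, "lon": -8.4745565},
]

for name, dist in nearest(POIS, CENTER, "fast_food"):
    print(f"{name}: {dist:.1f} m")
```

The printed distances should closely track the `distance_m` values in the preview. What this cannot do is decide that a cafe is good for working or that a pub is "nice"; that judgment is what the LLM adds on top of the same records.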

`OSMFetcher` converts each OpenStreetMap point of interest into a Haystack `Document` with two sides (as you have seen in the previous section):

- `content` holds a short, human-readable description of the place (name, category, address, and a few tags).
- `meta` stores all the structured fields, such as `lat`, `lon`, `category`, `address`, and a pre-computed `distance_m` from the search center (the user's location passed into `OSMFetcher`).

In a real pipeline you would typically embed the `content` of each Document so that the embeddings capture the semantic meaning of the place descriptions, for example whether the text mentions “laptop”, “Wi-Fi”, “study”, “quiet”, “busy bar”, “traditional pub”, and so on. At the same time, the numeric `distance_m` in `meta` gives you the classic “map-style” filter: how far this place is from the user.

In this pipeline the LLM never has to implement raw geospatial math. Instead, it reads the semantic description in `content` and combines it with the pre-computed `distance_m` field to decide which places both match the user's intent and are close enough. The low-level geospatial logic is pushed into `OSMFetcher`, and the LLM focuses purely on semantic filtering and ranking.
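As a rough, non-LLM illustration of combining the two signals, the sketch below uses plain keyword overlap between the query and `content` as a crude stand-in for embedding similarity (a deliberate simplification, not what this notebook does), applies `distance_m` as a hard radius filter, and breaks ties by distance. `rank_places` and the sample records are hypothetical.

```python
import re
from typing import Dict, List, Set


def tokens(text: str) -> Set[str]:
    """Lowercase word tokens; underscores and punctuation act as separators."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def rank_places(docs: List[Dict], query: str, max_distance_m: float) -> List[str]:
    """Keyword-overlap score as a stand-in for semantics; distance as filter and tie-breaker."""
    query_tokens = tokens(query)
    scored = []
    for doc in docs:
        if doc["distance_m"] > max_distance_m:
            continue  # the classic "map-style" filter
        overlap = len(query_tokens & tokens(doc["content"]))
        if overlap:
            # More overlapping words first, then the closer place.
            scored.append((-overlap, doc["distance_m"], doc["name"]))
    return [name for _, _, name in sorted(scored)]


docs = [  # hypothetical records shaped like content + meta.distance_m
    {"name": "Quiet Bean", "content": "Cafe: Quiet Bean. Tags: internet_access=wlan", "distance_m": 300.0},
    {"name": "Busy Corner", "content": "Cafe: Busy Corner.", "distance_m": 50.0},
    {"name": "Old Quiet Pub", "content": "Pub: Old Quiet Pub.", "distance_m": 100.0},
    {"name": "Far Quiet Cafe", "content": "Cafe: Far Quiet Cafe. Tags: internet_access=wlan", "distance_m": 2000.0},
]
print(rank_places(docs, "quiet cafe with wlan", max_distance_m=1000))
# ['Quiet Bean', 'Busy Corner', 'Old Quiet Pub']
```

The weakness shows in the result: keyword overlap ranks a pub called "Old Quiet Pub" next to cafes simply because it shares the word "quiet". Embeddings, or an LLM reading the descriptions, handle that kind of intent much better.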


### Step 1. Build the Prompt and initialize a Pipeline
We begin by building the prompt and specifying the LLM to use.

```python
from haystack import Pipeline
from haystack.components.builders import ChatPromptBuilder
from haystack.components.generators.chat import OpenAIChatGenerator
from haystack.dataclasses import ChatMessage
from haystack.utils import Secret
```

```python
prompt_template = [
    ChatMessage.from_system(
        "You are a geographic information assistant. "
        "Based on the provided OpenStreetMap data, help the user find nearby places that match the user's query."
    ),
    ChatMessage.from_user(
        """
        User location: {{ user_location }}
        Search radius: {{ radius }}m
        User query: {{ query }}

        Available location data:
        {% for document in documents %}
        - {{ document.content }}
          Location: ({{ document.meta.lat }}, {{ document.meta.lon }})
          Distance: {{ document.meta.distance_m }}m
          Type: {{ document.meta.category }}
        {% endfor %}

        Please:
        1. Find all locations that are relevant to the user's query
        2. Sort them by distance
        3. Recommend the nearest 3 locations
        4. Provide a short description for each

        Please respond in English.
        """
    ),
]

prompt_builder = ChatPromptBuilder(
    template=prompt_template,
    required_variables=["user_location", "radius", "query", "documents"],  # optional, depends on what your pipeline requires
)
```


```python
llm = OpenAIChatGenerator(
    api_key=Secret.from_env_var("OPENAI_API_KEY"),
    model="gpt-4o-mini",
)
```

Next, we connect `osm_fetcher.documents` to `prompt_builder` and `prompt_builder.prompt` to the selected LLM.

```python
coffee_pipeline = Pipeline()
coffee_pipeline.add_component("osm_fetcher", osm_fetcher)
coffee_pipeline.add_component("prompt_builder", prompt_builder)
coffee_pipeline.add_component("llm", llm)

# Fetched Documents fill the prompt template's `documents` variable
coffee_pipeline.connect("osm_fetcher.documents", "prompt_builder.documents")
# The rendered prompt (list[ChatMessage]) becomes the LLM's input messages
coffee_pipeline.connect("prompt_builder.prompt", "llm.messages")
```


```text
<haystack.core.pipeline.pipeline.Pipeline object at 0x7826361a4260>
🚅 Components
  - osm_fetcher: OSMFetcher
  - prompt_builder: ChatPromptBuilder
  - llm: OpenAIChatGenerator
🛤️ Connections
  - osm_fetcher.documents -> prompt_builder.documents (List[Document])
  - prompt_builder.prompt -> llm.messages (list[ChatMessage])
```

### Step 2. Query with natural language

```python
search_query = "find me the nearest coffee shop for work, needs wifi"
```

```python
user_location = "Cork, Ireland"
radius = 1000

result = coffee_pipeline.run(
    {
        "osm_fetcher": {},
        "prompt_builder": {
            "user_location": user_location,
            "radius": radius,
            "query": search_query,
        },
    }
)

reply = result["llm"]["replies"][0]
print("Role:", reply.role)
print("\nAssistant reply:\n")
print(reply.text)
```


```text
Current Query:

        [out:json][timeout:20][maxsize:2000000];
        (
            node[amenity](around:1000,51.8989077,-8.4743188);
        );
        out geom;

Status: 200
Response: {
  "version": 0.6,
  "generator": "Overpass API 0.7.62.8 e802775f",
  "osm3s": {
    "timestamp_osm_base": "2025-11-16T00:08:54Z",
    "copyright": "The data included in this document is from www.ope...
[OSM_Doc_Converter] Reading Raw OSM GeoJson...
[OSM_Doc_Converter] Loaded 955 entries.
[OSM_Doc_Converter] Batch-processing data cleaning.
Role: ChatRole.ASSISTANT

Assistant reply:

Based on your query for the nearest coffee shops with Wi-Fi in Cork, here are the top three recommendations sorted by distance:

1. **Dukes**
   - **Type:** Cafe
   - **Location:** Carey's Lane, 4, Cork.
   - **Distance:** 28.7m
   - **Description:** A cozy cafe offering a selection of coffee and pastries, perfect for a work session. 

2. **Rebel Coffee Cork**
   - **Type:** Cafe
   - **Location:** French Church 
```

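If you recall the document inspected in Part 1, `Dukes` carries the tag `'internet_access': 'wlan'`, which matches what we are looking for. Note, however, that this tag lives in `meta`, while the prompt template only renders `content`, coordinates, distance, and category. Since the `content` of Dukes does not mention Wi-Fi, the LLM most likely picked it because it is the nearest cafe, and details such as "coffee and pastries" come from the model rather than from the data.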
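To make the Wi-Fi requirement explicit rather than leaving it to the LLM, one option is to filter on the structured tags in `meta` before building the prompt. This is a minimal sketch over plain `meta` dictionaries; `wifi_cafes` is a hypothetical helper, and treating the `internet_access` values `wlan` and `yes` as "has Wi-Fi" is an assumption (the OpenStreetMap wiki page for the `internet_access` key lists the values in use).

```python
from typing import Dict, List

# Assumption: these internet_access values are treated as "Wi-Fi available".
WIFI_VALUES = {"wlan", "yes"}


def has_wifi(meta: Dict) -> bool:
    tags = meta.get("tags") or {}
    return str(tags.get("internet_access", "")).lower() in WIFI_VALUES


def wifi_cafes(metas: List[Dict]) -> List[str]:
    """Names of cafes that advertise Wi-Fi, nearest first."""
    hits = [m for m in metas if m.get("category") == "cafe" and has_wifi(m)]
    hits.sort(key=lambda m: m.get("distance_m", float("inf")))
    return [m["name"] for m in hits]


metas = [
    {"name": "Dukes", "category": "cafe", "distance_m": 28.7,
     "tags": {"amenity": "cafe", "internet_access": "wlan"}},
    {"name": "Koto", "category": "restaurant", "distance_m": 27.9,
     "tags": {"amenity": "restaurant"}},
    {"name": "Hypothetical Cafe", "category": "cafe", "distance_m": 120.0,
     "tags": {"amenity": "cafe"}},
]
print(wifi_cafes(metas))  # ['Dukes']
```

With the Haystack documents from this notebook, the same helper would be called as `wifi_cafes([doc.meta for doc in documents])`; the matching Documents can then be passed to the prompt builder so the LLM only ranks places that are known to have Wi-Fi.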

## Part 3: Planning an afternoon itinerary with an Agent and OSM tools

In real applications, we usually face more open-ended, multi-step reasoning tasks. Rather than answering a single question like “Where's the nearest coffee shop that has Wi-Fi?”, the user now gives a vague but structured request: plan an afternoon itinerary with three stages (a historic site, a quiet cafe to work in, and a nice bar or pub nearby).

To tackle this, we wrap `OSMFetcher` in a small pipeline, expose that pipeline as a tool, and give it to an agent built on `OpenAIChatGenerator`. The agent receives a list of nearby places and is solely responsible for selecting, organizing, and justifying an itinerary, using both semantic and geographic reasoning.

This setup allows the LLM to act more like a local guide: instead of answering one-shot prompts, it explores tool outputs and composes a meaningful plan in response to an open-ended user request.
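For contrast, here is what a purely rule-based planner might look like: a greedy walk that, at each stage, moves to the nearest POI of an allowed category from the current position. It is a hypothetical baseline, not what the agent does. It covers only the geographic half of the problem, while the agent also has to judge which church is worth visiting or which cafe is quiet. The category names and synthetic coordinates are illustrative assumptions.

```python
import math
from typing import Dict, List, Sequence, Tuple

EARTH_RADIUS_M = 6_371_000  # assumed mean Earth radius in meters


def haversine_m(a: Tuple[float, float], b: Tuple[float, float]) -> float:
    """Great-circle distance in meters between two (lat, lon) points given in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (
        math.sin((lat2 - lat1) / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2
    )
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def greedy_itinerary(
    start: Tuple[float, float], pois: List[Dict], stages: Sequence[Sequence[str]]
) -> List[Tuple[str, float]]:
    """For each stage, walk to the nearest POI with an allowed category; return (name, leg length in m)."""
    route: List[Tuple[str, float]] = []
    here = start
    for allowed in stages:
        candidates = [p for p in pois if p["category"] in allowed]
        if not candidates:
            break  # no POI for this stage: stop planning
        best = min(candidates, key=lambda p: haversine_m(here, (p["lat"], p["lon"])))
        route.append((best["name"], haversine_m(here, (best["lat"], best["lon"]))))
        here = (best["lat"], best["lon"])
    return route


pois = [  # synthetic points on the equator, about 111 m per 0.001 degrees of longitude
    {"name": "Church A", "category": "place_of_worship", "lat": 0.0, "lon": 0.001},
    {"name": "Church B", "category": "place_of_worship", "lat": 0.0, "lon": 0.010},
    {"name": "Cafe C", "category": "cafe", "lat": 0.0, "lon": 0.002},
    {"name": "Cafe D", "category": "cafe", "lat": 0.0, "lon": -0.001},
    {"name": "Pub E", "category": "pub", "lat": 0.0, "lon": 0.003},
]
stages = [["place_of_worship"], ["cafe"], ["pub", "bar"]]
for name, leg in greedy_itinerary((0.0, 0.0), pois, stages):
    print(f"{name}: {leg:.0f} m")
```

Greedy choices are easy to compute but can paint the route into a corner (a close first stop may leave only distant pubs), which is one reason to let an LLM weigh the whole plan instead.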

### Step 1. Initial Setup

```python
from osm_integration_haystack import OSMFetcher

CENTER = (51.898403, -8.473978)
RADIUS_M = 1000

itinerary_fetcher = OSMFetcher(
    preset_center=CENTER,
    preset_radius_m=RADIUS_M,
    target_osm_types=["node"],
    target_osm_tags=[
        "amenity",
        "tourism",
        "leisure",
    ],
    maximum_query_mb=4,
    overpass_timeout=30,
)
```


```python
from haystack.components.builders import ChatPromptBuilder
from haystack.dataclasses import ChatMessage

itinerary_prompt_template = [
    ChatMessage.from_user(
        "User request:\n{{ user_request }}\n\n"
        "Here are some nearby locations from OpenStreetMap:\n"
        "{% if documents %}"
        "{% for doc in documents[:40] %}"
        "- {{ doc.meta.get('name', 'Unknown') }} "
        "(type: {{ doc.meta.get('category', 'unknown') }}, "
        "distance: {{ '%.1f'|format(doc.meta.get('distance_m', 0)) }} m)\n"
        "{% endfor %}"
        "{% else %}"
        "No locations available.\n"
        "{% endif %}\n\n"
    ),
]

itinerary_prompt_builder = ChatPromptBuilder(template=itinerary_prompt_template)
```




### Step 2. Build a pipeline for the Agent tool

In the agentic scenario, we **strongly advise** wrapping `OSMFetcher` and `ChatPromptBuilder` into a single pipeline. If you exposed `OSMFetcher` directly as a tool, the agent would receive a large, complex list of Documents, which can easily exceed the context window and make planning harder. By composing this pipeline first and then wrapping it as a `PipelineTool`, we give the agent just enough curated information to reason effectively.



```python
from haystack import Pipeline

agent_itinerary_pipeline = Pipeline()
agent_itinerary_pipeline.add_component("itinerary_osm_fetcher", itinerary_fetcher)
agent_itinerary_pipeline.add_component("itinerary_prompt_builder", itinerary_prompt_builder)

# Pass OSMFetcher's documents into ChatPromptBuilder's `documents` template variable
agent_itinerary_pipeline.connect(
    "itinerary_osm_fetcher.documents",
    "itinerary_prompt_builder.documents",
)
```


```text
<haystack.core.pipeline.pipeline.Pipeline object at 0x7c52eda0ce00>
🚅 Components
  - itinerary_osm_fetcher: OSMFetcher
  - itinerary_prompt_builder: ChatPromptBuilder
🛤️ Connections
  - itinerary_osm_fetcher.documents -> itinerary_prompt_builder.documents (List[Document])
```

Before handing the pipeline to an agent, we test its output with a simple user request.

```python
test_res = agent_itinerary_pipeline.run(
    {
        "itinerary_prompt_builder": {
            "user_request": "I want to spend an afternoon in Cork city centre...",
            "template_variables": {}
        }
    }
)

msgs = test_res["itinerary_prompt_builder"]["prompt"]
for m in msgs:
    print(m.role, ":\n", m.text, "\n")
```


```text
Current Query:

        [out:json][timeout:30][maxsize:4000000];
        (
            node[amenity](around:1000,51.898403,-8.473978);
node[tourism](around:1000,51.898403,-8.473978);
node[leisure](around:1000,51.898403,-8.473978);
        );
        out geom;

Status: 200
Response: {
  "version": 0.6,
  "generator": "Overpass API 0.7.62.8 e802775f",
  "osm3s": {
    "timestamp_osm_base": "2025-11-16T00:45:46Z",
    "copyright": "The data included in this document is from www.ope...
[OSM_Doc_Converter] Reading Raw OSM GeoJson...
[OSM_Doc_Converter] Loaded 1052 entries.
[OSM_Doc_Converter] Batch-processing data cleaning.
ChatRole.USER :
 User request:
I want to spend an afternoon in Cork city centre...

Here are some nearby locations from OpenStreetMap:
- bicycle_parking (type: bicycle_parking, distance: 2.0 m)
- bicycle_parking (type: bicycle_parking, distance: 9.9 m)
- bicycle_parking (type: bicycle_parking, distance: 12.5 m)
- bicycle_parking (type: bicycle_parking, distance: 
```
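This preview also exposes a practical problem: the template keeps only the first 40 documents, which appear to be sorted by distance, and near this center those are mostly `bicycle_parking` nodes, so churches, cafes, and pubs may never reach the agent. A minimal sketch of one fix is to keep the closest few documents per *wanted* category before truncating. `pick_candidates`, the category names, and the small `Doc` stand-in (which mimics only the `content`/`meta` fields of a Haystack `Document`, so the example runs without Haystack) are assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List


@dataclass
class Doc:
    """Stand-in with the same content/meta fields as haystack's Document."""
    content: str
    meta: Dict = field(default_factory=dict)


def pick_candidates(docs: List[Doc], wanted: Iterable[str], per_category: int = 5) -> List[Doc]:
    """Keep the closest `per_category` docs for each wanted category, nearest first overall."""
    wanted = set(wanted)
    buckets: Dict[str, List[Doc]] = {}
    for doc in sorted(docs, key=lambda d: d.meta.get("distance_m", float("inf"))):
        category = doc.meta.get("category")
        if category in wanted:
            bucket = buckets.setdefault(category, [])
            if len(bucket) < per_category:
                bucket.append(doc)
    picked = [doc for bucket in buckets.values() for doc in bucket]
    return sorted(picked, key=lambda d: d.meta.get("distance_m", float("inf")))


# 50 nearby bicycle parkings crowd out everything else in a plain "first N" cut
docs = [Doc("bicycle_parking", {"category": "bicycle_parking", "distance_m": float(i)}) for i in range(50)]
docs += [
    Doc("Church", {"name": "Church", "category": "place_of_worship", "distance_m": 400.0}),
    Doc("Cafe", {"name": "Cafe", "category": "cafe", "distance_m": 200.0}),
    Doc("Pub", {"name": "Pub", "category": "pub", "distance_m": 600.0}),
]
picked = pick_candidates(docs, ["place_of_worship", "cafe", "pub", "bar"], per_category=3)
print([d.meta["name"] for d in picked])  # ['Cafe', 'Church', 'Pub']
```

In a Haystack pipeline, a filter like this could sit between `OSMFetcher` and the prompt builder, for example inside a small custom `@component`.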

### Step 3. Wrap the pipeline with `PipelineTool`

The agent will use the wrapped pipeline as a single callable tool, which also reduces total token usage and keeps the prompt within a reasonable budget (for reference, `OSMFetcher`'s `max_token` defaults to 12,000). The actual token usage depends on your own configuration: in particular, the size of the search area and how much detail `OSMFetcher` includes for each fetched location.
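For a rough feel of how many location lines fit a given budget, a character-based estimate is often enough. The sketch below assumes about 4 characters per token for English text (a common rule of thumb, not an exact count; a tokenizer library such as `tiktoken` gives precise numbers) and keeps prompt lines, already sorted by priority, until the budget runs out. `fit_to_budget` is a hypothetical helper; how `OSMFetcher` enforces its own `max_token` budget is not shown in this notebook.

```python
from typing import List

CHARS_PER_TOKEN = 4  # rough heuristic for English text (assumption)


def estimate_tokens(text: str) -> int:
    """Very rough token estimate; never returns less than 1."""
    return max(1, len(text) // CHARS_PER_TOKEN)


def fit_to_budget(lines: List[str], max_tokens: int) -> List[str]:
    """Keep lines in order until the next one would exceed the token budget."""
    kept: List[str] = []
    used = 0
    for line in lines:
        cost = estimate_tokens(line)
        if used + cost > max_tokens:
            break
        kept.append(line)
        used += cost
    return kept


# Hypothetical prompt lines shaped like the itinerary template's output
lines = [f"- place {i} (type: cafe, distance: {i * 10}.0 m)" for i in range(5000)]
kept = fit_to_budget(lines, max_tokens=12_000)
print(f"{len(kept)} of {len(lines)} lines fit")
```

Because the lines are consumed in order, sorting them by distance (or by the category-aware selection you prefer) beforehand decides which places survive the cut.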

```python
from haystack.tools import PipelineTool

osm_itinerary_tool = PipelineTool(
    pipeline=agent_itinerary_pipeline,
    name="osm_itinerary_tool",
    description=(
        "Fetches nearby points of interest from OpenStreetMap and builds a "
        "chat-style prompt listing them with their category and distance."
    ),
    # Tool argument -> pipeline input socket(s)
    input_mapping={
        "user_request": ["itinerary_prompt_builder.user_request"],
    },
    # Pipeline output socket -> key in the tool's result
    output_mapping={
        "itinerary_prompt_builder.prompt": "prompt",
    },
)
```




### Step 4. Create the Agent

We now create a `Haystack Agent` that knows how to use our `osm_itinerary_tool`.
This agent uses a chat-based LLM (`OpenAIChatGenerator`) and is given both:

* The `PipelineTool` (so it can fetch and summarize nearby POIs)
* A `system_prompt` (so it knows **when** to call the tool and **how** to respond)


```python
from haystack.components.generators.chat import OpenAIChatGenerator
from haystack.components.agents import Agent
from haystack.dataclasses import ChatMessage
from haystack.utils import Secret

itinerary_llm = OpenAIChatGenerator(
    api_key=Secret.from_env_var("OPENAI_API_KEY"),
    model="gpt-4o-mini",
)

itinerary_agent = Agent(
    chat_generator=itinerary_llm,
    tools=[osm_itinerary_tool],
    system_prompt=(
        "You are a helpful local guide in Cork, Ireland.\n\n"
        "When the user asks you to plan an itinerary, first call 'osm_itinerary_tool'. "
        "This tool returns a list of chat messages under the field 'prompt', which already "
        "contains the user's request and a list of nearby locations.\n\n"
        "Read those messages carefully, then respond with 1–2 itineraries "
        "(church -> cafe -> bar/pub), including approximate walking distances."
    ),
)

itinerary_agent.warm_up()
```

Then we give it a user request that is complex enough to require multi-step planning.

```python
user_request = (
    "I want to spend an afternoon in Cork city centre. "
    "Please plan 1–2 possible itineraries where I:\n"
    "1) start by visiting a church or historic religious site,\n"
    "2) then go to a quiet cafe where I can work for a while,\n"
    "3) and finally end the day in a nice bar or pub nearby.\n\n"
    "All places should be within reasonable walking distance. "
    "For each itinerary, please include the place names, approximate distances between stops, "
    "and a short explanation of why you chose them."
)

result = itinerary_agent.run(messages=[ChatMessage.from_user(user_request)])

final_msg = result["messages"][-1]
print("Final role:", final_msg.role)
print("\nAssistant final reply:\n")
print(final_msg.text)
```


```text
Current Query:

        [out:json][timeout:30][maxsize:4000000];
        (
            node[amenity](around:1000,51.898403,-8.473978);
node[tourism](around:1000,51.898403,-8.473978);
node[leisure](around:1000,51.898403,-8.473978);
        );
        out geom;

Status: 200
Response: {
  "version": 0.6,
  "generator": "Overpass API 0.7.62.8 e802775f",
  "osm3s": {
    "timestamp_osm_base": "2025-11-16T01:00:12Z",
    "copyright": "The data included in this document is from www.ope...
[OSM_Doc_Converter] Reading Raw OSM GeoJson...
[OSM_Doc_Converter] Loaded 1052 entries.
[OSM_Doc_Converter] Batch-processing data cleaning.
Final role: ChatRole.ASSISTANT

Assistant final reply:

Here are two possible itineraries for spending an afternoon in Cork city centre, incorporating your requests:

### Itinerary 1
1. **Visit St. Fin Barre's Cathedral**  
   - **Distance from Starting Point:** Approximately 1.0 km (12 minutes walk)
   - **Why:** This stunning Gothic cathedral is one of Cork's m
```