# OpenStreetMap RAG pipeline

## OpenStreetMap + Haystack: From basic queries to agents

  <img src="https://wiki.openstreetmap.org/w/images/7/79/Public-images-osm_logo.svg" height="170"/>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
<img src="https://haystack.deepset.ai/images/haystack-ogimage.png" width="350" style="display:inline;">

[OpenStreetMap](https://www.openstreetmap.org/) is a free, community-driven map of the world. In this notebook, we use the [osm-integration-haystack](https://github.com/grexrr/osm-integration-haystack) package to turn OpenStreetMap data into `Haystack Document`s and then plug them into LLM workflows.

We'll walk through two progressively more advanced scenarios:

1. **Basic OSM query → LLM summarization**  
   Use `OSMFetcher` to retrieve and preprocess nearby points of interest (POIs) around Cork city centre, then build a prompt that summarizes the locations for a specific user query (e.g. “find coffee shops nearby”).

2. **Agent + tools: itinerary planner**  
   Wrap an OSM-based pipeline as a Haystack `PipelineTool`, expose it to an agent and let the LLM call this tool to plan an afternoon itinerary in Cork.

## Setup

In [None]:
!pip install -q haystack-ai osm-integration-haystack

## Part 1: OpenStreetMap + LLM Summarization

This part is a **preparation step** before using Agents and tools.  
We focus on turning raw OpenStreetMap data into a small in-memory knowledge base of `Document`s via `OSMFetcher`; Part 2 then asks an LLM to summarize it. Together, Parts 1 and 2 demonstrate the basic pattern:

🗺️ OpenStreetMap (Overpass API)  
  → 📡 OSMFetcher  
  → 📄 Documents (our knowledge base)  
  → 🧩 ChatPromptBuilder + 🧠 OpenAIChatGenerator  
  → 🤖 LLM summarization

This lays the foundation for more complex, **agentic** behavior in Part 3, where we'll wrap this logic into a reusable tool that an Agent can call automatically.

**Authorization**

Before you start, provide your own OpenAI API key:

In [None]:
import os
from getpass import getpass

# Prompt for the key only if it is not already set in the environment
if "OPENAI_API_KEY" not in os.environ:
    os.environ["OPENAI_API_KEY"] = getpass("Enter OpenAI API key:")

**Extra:** From Name (String) to Coordinates (Tuple)

In this example we use [Nominatim](https://nominatim.org/) to **geocode** the place name  
*Saints Peter and Paul's Catholic Church* into latitude/longitude coordinates.  

Geocoding is not the main focus of this notebook. Real-world geocoding workflows have to deal with ambiguity, match quality, and string-cleaning heuristics, all of which are out of scope here. We include this step because most map-based backend services, for accuracy and robustness, expect a concrete `(latitude, longitude)` tuple rather than a raw location string.

In [None]:
!pip install -q geopy

In [None]:
from geopy.geocoders import Nominatim

geolocator = Nominatim(user_agent="haystack-osm-cookbook-demo")

# Geocode a place name into latitude/longitude coordinates
location_name = "saints peter and paul's catholic church"
location = geolocator.geocode(location_name)

print(f"Query: {location_name}")
print(f"Latitude:  {location.latitude}")
print(f"Longitude: {location.longitude}")
print(f"Display name: {location.address}")


Query: saints peter and paul's catholic church
Latitude:  51.8989077
Longitude: -8.4743188
Display name: Saints Peter and Paul's Catholic Church, Carey's Lane, The Marsh, Centre B ED, Cork, County Cork, Munster, T12 FH27, Éire / Ireland
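
`geolocator.geocode` returns `None` when Nominatim finds no match, so the attribute access in the cell above would raise an `AttributeError` for an unknown place name. The following is a minimal sketch of a guard, not part of the original notebook: `to_latlon` is a hypothetical helper, and a `SimpleNamespace` stands in for the geopy result so the example runs offline.

```python
from types import SimpleNamespace
from typing import Optional, Tuple


def to_latlon(location) -> Optional[Tuple[float, float]]:
    """Return (lat, lon) from a geocoder result, or None if nothing usable came back."""
    if location is None:  # geopy's geocode() returns None when there is no match
        return None
    lat, lon = float(location.latitude), float(location.longitude)
    # Reject values outside the valid latitude/longitude ranges
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        return None
    return (lat, lon)


# Stand-in for a geopy Location, using the values printed above
fake_location = SimpleNamespace(latitude=51.8989077, longitude=-8.4743188)
center = to_latlon(fake_location)
missing = to_latlon(None)
print(center, missing)
```

In the notebook itself you would call `to_latlon(geolocator.geocode(location_name))` and fall back to a hard-coded centre when it returns `None`.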


### Step 1
Here we pass the coordinate tuple directly, which is the more conventional input.

In [None]:
from osm_integration_haystack import OSMFetcher

CENTER = (51.8989077, -8.4743188)  # (lat, lon)
RADIUS_M = 1000

In [None]:
osm_fetcher = OSMFetcher(
    preset_center=CENTER,  # Cork, Ireland
    preset_radius_m=RADIUS_M,  # 1000 m radius
    target_osm_types=["node"],  # Only search nodes
    target_osm_tags=["amenity"],  # Search amenity tags
    maximum_query_mb=2,  # Limit query size (MB)
    overpass_timeout=20,  # Overpass timeout in seconds
)

In [None]:
result = osm_fetcher.run()     # standard Haystack component interface
documents = result["documents"]

Current Query:

        [out:json][timeout:20][maxsize:2000000];
        (
            node[amenity](around:1000,51.8989077,-8.4743188);
        );
        out geom;
        
Status: 200
Response: {
  "version": 0.6,
  "generator": "Overpass API 0.7.62.8 e802775f",
  "osm3s": {
    "timestamp_osm_base": "2025-11-15T15:10:27Z",
    "copyright": "The data included in this document is from www.ope...
[OSM_Doc_Converter] Reading Raw OSM GeoJson...
[OSM_Doc_Converter] Loaded 955 entries.
[OSM_Doc_Converter] Batch-processing data cleaning.
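
The "Current Query" log above shows the Overpass QL that the fetcher sends. To make the mapping from parameters to query text explicit, here is a rough sketch that reproduces a query of the same shape. It is an illustration only: `build_overpass_query` is a hypothetical function, not the package's implementation, and the conversion of megabytes to `maxsize` (times 1,000,000) is inferred from the logged value.

```python
def build_overpass_query(center, radius_m, osm_types, tags, timeout_s=20, max_mb=2) -> str:
    """Build an Overpass QL 'around' query similar to the one in the log above."""
    lat, lon = center
    # Global settings: JSON output, server-side timeout (seconds), memory cap (bytes)
    header = f"[out:json][timeout:{int(timeout_s)}][maxsize:{int(max_mb * 1_000_000)}];"
    # One clause per (element type, tag key) pair, e.g. node[amenity](around:...)
    clauses = [
        f"  {osm_type}[{tag}](around:{radius_m},{lat},{lon});"
        for osm_type in osm_types
        for tag in tags
    ]
    return "\n".join([header, "(", *clauses, ");", "out geom;"])


query = build_overpass_query((51.8989077, -8.4743188), 1000, ["node"], ["amenity"])
print(query)
```

Adding more tag keys (for example `tourism` or `leisure`) simply adds more clauses inside the union block, which also increases the response size.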


In [None]:
from pprint import pprint

first_doc = documents[0]
print("📄 type:", type(first_doc))

print("\n--- content ---")
print(first_doc.content)

print("\n--- meta keys ---")
print(list(first_doc.meta.keys()))

print("\n--- full meta ---")
pprint(first_doc.meta)


📄 type: <class 'haystack.dataclasses.document.Document'>

--- content ---
Restaurant: Koto, Carey's Lane, 6-7, T12 FH27. Tags: opening_hours=Mo-Su 12:00-22:00

--- meta keys ---
['source', 'osm_id', 'osm_type', 'lat', 'lon', 'name', 'category', 'tags', 'tags_norm', 'address', 'distance_m']

--- full meta ---
{'address': {'housenumber': '6-7',
             'postcode': 'T12 FH27',
             'street': "Carey's Lane"},
 'category': 'restaurant',
 'distance_m': 27.86087599824802,
 'lat': 51.8990101,
 'lon': -8.4739482,
 'name': 'Koto',
 'osm_id': 5203928867,
 'osm_type': 'node',
 'source': 'openstreetmap',
 'tags': {'amenity': 'restaurant',
          'contact:facebook': 'https://www.facebook.com/KotoCork/',
          'contact:instagram': 'https://www.instagram.com/kotocork',
          'cuisine': 'asian',
          'email': 'info@koto.ie',
          'opening_hours': 'Mo-Su 12:00-22:00',
          'phone': '+353-21-4274172',
          'smoking': 'no',
          'website': 'https://koto.
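
The `distance_m` field looks like a great-circle distance from the preset centre to the POI. That is an assumption about the package internals, but it is easy to check with a haversine sketch using the coordinates shown above (`haversine_m` is a hypothetical helper):

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(a, b):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


center = (51.8989077, -8.4743188)  # CENTER from Step 1
koto = (51.8990101, -8.4739482)    # lat/lon from the metadata above
distance = haversine_m(center, koto)
print(f"{distance:.1f} m")
```

The result lands close to the `distance_m` value in the metadata (about 27.9 m), which supports the assumption.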

In [None]:
def preview_documents(docs, limit=5):
    print(f"Previewing first {min(len(docs), limit)} documents:\n")

    for i, doc in enumerate(docs[:limit], start=1):
        name = doc.meta.get("name", "Unknown")
        category = doc.meta.get("category", "Unknown")
        distance = doc.meta.get("distance_m", 0.0)
        lat = doc.meta.get("lat")
        lon = doc.meta.get("lon")

        print(f"{i}. {name}")
        print(f"   Type: {category}")
        print(f"   Distance: {distance:.1f} m")
        print(f"   Location: ({lat}, {lon})")
        print(f"   Content: {doc.content[:120]}{'...' if len(doc.content) > 120 else ''}")
        print()

preview_documents(documents, limit=5)


Previewing first 5 documents:

1. Koto
   Type: restaurant
   Distance: 27.9 m
   Location: (51.8990101, -8.4739482)
   Content: Restaurant: Koto, Carey's Lane, 6-7, T12 FH27. Tags: opening_hours=Mo-Su 12:00-22:00

2. Dukes
   Type: cafe
   Distance: 28.7 m
   Location: (51.8991234, -8.474089)
   Content: Cafe: Dukes, Carey's Lane, 4, Cork.

3. Soba Asian Street Food
   Type: fast_food
   Distance: 30.1 m
   Location: (51.8989516, -8.4738856)
   Content: Fast_food: Soba Asian Street Food.

4. OffBeat Donuts
   Type: fast_food
   Distance: 35.1 m
   Location: (51.8990968, -8.4739097)
   Content: Fast_food: OffBeat Donuts, French Church Street, 17, Cork.

5. Burritos and Blues
   Type: fast_food
   Distance: 43.6 m
   Location: (51.899271, -8.4745565)
   Content: Fast_food: Burritos and Blues, Paul Street, 9, Cork. Tags: opening_hours=Mo-We 12:00-20:00; Th-Sa 12:00-21:00; Su 13:00-...



## Part 2: Pipeline to find the nearest coffee shops

In [None]:
from haystack import Pipeline
from haystack.components.builders import ChatPromptBuilder
from haystack.components.generators.chat import OpenAIChatGenerator
from haystack.dataclasses import ChatMessage
from haystack.utils import Secret

In [None]:
prompt_template = [
    ChatMessage.from_system(
        "You are a geographic information assistant. "
        "Based on the provided OpenStreetMap data, help the user find nearby places that match the user's query."
    ),
    ChatMessage.from_user(
        """
        User location: {{ user_location }}
        Search radius: {{ radius }}m
        User query: {{ query }}

        Available location data:
        {% for document in documents %}
        - {{ document.content }}
          Location: ({{ document.meta.lat }}, {{ document.meta.lon }})
          Distance: {{ document.meta.distance_m }}m
          Type: {{ document.meta.category }}
        {% endfor %}

        Please:
        1. Find all locations that are relevant to the user's query
        2. Sort them by distance
        3. Recommend the nearest 3 locations
        4. Provide a short description for each

        Please respond in English.
        """
    ),
]

prompt_builder = ChatPromptBuilder(
    template=prompt_template,
    required_variables=["user_location", "radius", "query", "documents"], # optional, depends on what your pipeline requires
)


In [None]:
llm = OpenAIChatGenerator(
    api_key=Secret.from_env_var("OPENAI_API_KEY"),
    model="gpt-4o-mini",
)

In [None]:
coffee_pipeline = Pipeline()
coffee_pipeline.add_component("osm_fetcher", osm_fetcher)
coffee_pipeline.add_component("prompt_builder", prompt_builder)
coffee_pipeline.add_component("llm", llm)

# documents to prompt_builder
coffee_pipeline.connect("osm_fetcher.documents", "prompt_builder.documents")
# ChatPromptBuilder's `prompt` output (a list of ChatMessage) feeds llm.messages
coffee_pipeline.connect("prompt_builder.prompt", "llm.messages")


<haystack.core.pipeline.pipeline.Pipeline object at 0x7dadaadb0350>
🚅 Components
  - osm_fetcher: OSMFetcher
  - prompt_builder: ChatPromptBuilder
  - llm: OpenAIChatGenerator
🛤️ Connections
  - osm_fetcher.documents -> prompt_builder.documents (List[Document])
  - prompt_builder.prompt -> llm.messages (list[ChatMessage])

In [None]:
search_query = "coffee shop"

In [None]:
user_location = "Cork, Ireland"
radius = 1000

result = coffee_pipeline.run(
    {
        "osm_fetcher": {},
        "prompt_builder": {
            "user_location": user_location,
            "radius": radius,
            "query": search_query,
        },
    }
)

reply = result["llm"]["replies"][0]
print("Role:", reply.role)
print("\nAssistant reply:\n")
print(reply.text)


Current Query:

        [out:json][timeout:20][maxsize:2000000];
        (
            node[amenity](around:1000,51.8989077,-8.4743188);
        );
        out geom;
        
Status: 200
Response: {
  "version": 0.6,
  "generator": "Overpass API 0.7.62.8 e802775f",
  "osm3s": {
    "timestamp_osm_base": "2025-11-15T15:11:30Z",
    "copyright": "The data included in this document is from www.ope...
[OSM_Doc_Converter] Reading Raw OSM GeoJson...
[OSM_Doc_Converter] Loaded 955 entries.
[OSM_Doc_Converter] Batch-processing data cleaning.
Role: ChatRole.ASSISTANT

Assistant reply:

Based on your query for coffee shops in Cork within a 1000m radius from your location, here are the nearest three options:

1. **Dukes, Carey's Lane, 4, Cork**  
   - **Distance:** 28.70m  
   - **Description:** A cozy café located on Carey's Lane, perfect for grabbing a quick coffee or enjoying a light snack in a relaxed atmosphere.

2. **Plus & Minus, Cork**  
   - **Distance:** 45.59m  
   - **Description:** 

## Part 3: Planning an afternoon itinerary with an Agent and OSM tools



In [197]:
from osm_integration_haystack import OSMFetcher

CENTER = (51.898403, -8.473978)
RADIUS_M = 1000

itinerary_fetcher = OSMFetcher(
    preset_center=CENTER,
    preset_radius_m=RADIUS_M,
    target_osm_types=["node"],
    target_osm_tags=[
        "amenity",
        "tourism",
        "leisure",
    ],
    maximum_query_mb=4,
    overpass_timeout=30,
)


In [198]:
from haystack.components.builders import ChatPromptBuilder
from haystack.dataclasses import ChatMessage

itinerary_prompt_template = [
    ChatMessage.from_system(
        "You are a local travel planner in Cork, Ireland. "
        "Always answer in concise English."
    ),
    ChatMessage.from_user(
        "User request:\n{{ user_request }}\n\n"
        "Here are some nearby locations from OpenStreetMap:\n"
        "{% if documents %}"
        "{% for doc in documents[:40] %}"
        "- {{ doc.meta.get('name', 'Unknown') }} "
        "(type: {{ doc.meta.get('category', 'unknown') }}, "
        "distance: {{ '%.1f'|format(doc.meta.get('distance_m', 0)) }} m)\n"
        "{% endfor %}"
        "{% else %}"
        "No locations available.\n"
        "{% endif %}\n\n"
        "Using this information, suggest 1–2 itineraries starting from a church or "
        "historic religious site, then a study-friendly cafe, and ending at a bar/pub."
    ),
]

itinerary_prompt_builder = ChatPromptBuilder(template=itinerary_prompt_template)




In [199]:
from haystack import Pipeline

agent_itinerary_pipeline = Pipeline()
agent_itinerary_pipeline.add_component("itinerary_osm_fetcher", itinerary_fetcher)
agent_itinerary_pipeline.add_component("itinerary_prompt_builder", itinerary_prompt_builder)

# Feed OSMFetcher's documents into the ChatPromptBuilder template variable `documents`
agent_itinerary_pipeline.connect(
    "itinerary_osm_fetcher.documents",
    "itinerary_prompt_builder.documents",
)


<haystack.core.pipeline.pipeline.Pipeline object at 0x7dadaa059460>
🚅 Components
  - itinerary_osm_fetcher: OSMFetcher
  - itinerary_prompt_builder: ChatPromptBuilder
🛤️ Connections
  - itinerary_osm_fetcher.documents -> itinerary_prompt_builder.documents (List[Document])

Test Pipeline output

In [201]:
test_res = agent_itinerary_pipeline.run(
    {
        "itinerary_prompt_builder": {
            "user_request": "I want to spend an afternoon in Cork city centre...",
            "template_variables": {}
        }
    }
)

msgs = test_res["itinerary_prompt_builder"]["prompt"]
for m in msgs:
    print(m.role, ":\n", m.text, "\n")


Current Query:

        [out:json][timeout:30][maxsize:4000000];
        (
            node[amenity](around:1000,51.898403,-8.473978);
node[tourism](around:1000,51.898403,-8.473978);
node[leisure](around:1000,51.898403,-8.473978);
        );
        out geom;
        
Status: 200
Response: {
  "version": 0.6,
  "generator": "Overpass API 0.7.62.8 e802775f",
  "osm3s": {
    "timestamp_osm_base": "2025-11-15T16:52:29Z",
    "copyright": "The data included in this document is from www.ope...
[OSM_Doc_Converter] Reading Raw OSM GeoJson...
[OSM_Doc_Converter] Loaded 1052 entries.
[OSM_Doc_Converter] Batch-processing data cleaning.
ChatRole.SYSTEM :
 You are a local travel planner in Cork, Ireland. Always answer in concise English. 

ChatRole.USER :
 User request:
I want to spend an afternoon in Cork city centre...

Here are some nearby locations from OpenStreetMap:
- bicycle_parking (type: bicycle_parking, distance: 2.0 m)
- bicycle_parking (type: bicycle_parking, distance: 9.9 m)
- bicycl
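
The rendered prompt above starts with `bicycle_parking` entries. Because the template only takes `documents[:40]`, the nearest 40 POIs can crowd out the churches, cafes, and pubs the itinerary needs. One possible remedy is to filter by category before the documents reach the prompt. The sketch below is not part of the original pipeline: `Doc` is a stand-in for the Haystack `Document`, `select_for_prompt` is a hypothetical helper, the category names are assumed to match OSM `amenity` values, and "Example Pub" is made-up sample data.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Minimal stand-in for a Haystack Document (content + meta only)."""
    content: str
    meta: dict = field(default_factory=dict)


# Assumed category values for the itinerary (OSM amenity tags)
RELEVANT = {"place_of_worship", "cafe", "pub", "bar"}


def select_for_prompt(docs, categories=RELEVANT, limit=40):
    """Keep only relevant categories, nearest first, capped at `limit` entries."""
    kept = [d for d in docs if d.meta.get("category") in categories]
    kept.sort(key=lambda d: d.meta.get("distance_m", float("inf")))
    return kept[:limit]


docs = [
    Doc("Bicycle_parking", {"name": "bicycle_parking", "category": "bicycle_parking", "distance_m": 2.0}),
    Doc("Pub: Example Pub", {"name": "Example Pub", "category": "pub", "distance_m": 120.0}),
    Doc("Cafe: Dukes", {"name": "Dukes", "category": "cafe", "distance_m": 28.7}),
]
selected = select_for_prompt(docs)
print([d.meta["name"] for d in selected])
```

In a real pipeline this logic could live in a small custom component placed between the fetcher and the prompt builder, so the `[:40]` slice only ever sees relevant POIs.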

Wrap with PipelineTool

In [202]:
from haystack.tools import PipelineTool

osm_itinerary_tool = PipelineTool(
    pipeline=agent_itinerary_pipeline,
    name="osm_itinerary_tool",
    description=(
        "Fetches nearby POIs from OpenStreetMap and "
        "builds a chat-style prompt summarizing them."
    ),
    # Tool inputs -> pipeline inputs
    input_mapping={
        # the tool's "user_request" -> the pipeline's "itinerary_prompt_builder.user_request"
        "user_request": ["itinerary_prompt_builder.user_request"],
    },
    # Pipeline outputs -> tool output names
    output_mapping={
        "itinerary_prompt_builder.prompt": "prompt",
    },
)
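
To build intuition for `input_mapping`, the sketch below shows conceptually how a flat dictionary of tool arguments could be routed to the nested `{component: {socket: value}}` structure that `Pipeline.run` expects. This is a plain-Python illustration under that assumption, not Haystack's actual implementation, and `route_tool_args` is a hypothetical name.

```python
def route_tool_args(tool_args, input_mapping):
    """Turn flat tool arguments into nested pipeline inputs using 'component.socket' targets."""
    pipeline_inputs = {}
    for arg_name, targets in input_mapping.items():
        if arg_name not in tool_args:
            continue  # the LLM did not supply this argument
        for target in targets:
            component, socket = target.split(".", 1)
            pipeline_inputs.setdefault(component, {})[socket] = tool_args[arg_name]
    return pipeline_inputs


mapping = {"user_request": ["itinerary_prompt_builder.user_request"]}
routed = route_tool_args({"user_request": "Plan my afternoon"}, mapping)
print(routed)
```

Each tool argument maps to a list of targets, so a single argument can feed several components at once.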




In [203]:
from haystack.components.generators.chat import OpenAIChatGenerator
from haystack.components.agents import Agent
from haystack.dataclasses import ChatMessage
from haystack.utils import Secret

itinerary_llm = OpenAIChatGenerator(
    api_key=Secret.from_env_var("OPENAI_API_KEY"),
    model="gpt-4o-mini",
)

itinerary_agent = Agent(
    chat_generator=itinerary_llm,
    tools=[osm_itinerary_tool],
    system_prompt=(
        "You are a helpful local guide in Cork, Ireland.\n\n"
        "When the user asks you to plan an itinerary, first call 'osm_itinerary_tool'. "
        "This tool returns a list of chat messages under the field 'prompt', which already "
        "contains the user's request and a list of nearby locations.\n\n"
        "Read those messages carefully, then respond with 1–2 itineraries "
        "(church -> cafe -> bar/pub), including approximate walking distances."
    ),
)

itinerary_agent.warm_up()

In [204]:
user_request = (
    "I want to spend an afternoon in Cork city centre. "
    "Please plan 1–2 possible itineraries where I:\n"
    "1) start by visiting a church or historic religious site,\n"
    "2) then go to a quiet cafe where I can study or work on my laptop,\n"
    "3) and finally end the day in a nice bar or pub nearby.\n\n"
    "All places should be within reasonable walking distance. "
    "For each itinerary, please include the place names, approximate distances between stops, "
    "and a short explanation of why you chose them."
)

result = itinerary_agent.run(messages=[ChatMessage.from_user(user_request)])

final_msg = result["messages"][-1]
print("Final role:", final_msg.role)
print("\nAssistant final reply:\n")
print(final_msg.text)


Current Query:

        [out:json][timeout:30][maxsize:4000000];
        (
            node[amenity](around:1000,51.898403,-8.473978);
node[tourism](around:1000,51.898403,-8.473978);
node[leisure](around:1000,51.898403,-8.473978);
        );
        out geom;
        
Status: 200
Response: {
  "version": 0.6,
  "generator": "Overpass API 0.7.62.8 e802775f",
  "osm3s": {
    "timestamp_osm_base": "2025-11-15T16:56:42Z",
    "copyright": "The data included in this document is from www.ope...
[OSM_Doc_Converter] Reading Raw OSM GeoJson...
[OSM_Doc_Converter] Loaded 1052 entries.
[OSM_Doc_Converter] Batch-processing data cleaning.
Final role: ChatRole.ASSISTANT

Assistant final reply:

Here are two possible itineraries for an afternoon in Cork city centre:

### Itinerary 1:
1. **Start: St. Anne's Shandon Church**
   - **Description:** This iconic church is famous for its stunning architecture and views from the tower. It's an excellent spot to explore Cork's history.
   - **Distance to nex