# Travel Guide Generator using Web Scraping and LLMs

In this notebook, we'll create a travel guide generator that:
1. Scrapes web content about a destination
2. Uses a local Ollama model, through its OpenAI-compatible API, to process the information
3. Generates a structured travel guide

First, let's import the required libraries and create an OpenAI client pointed at the local Ollama server.

In [81]:
import requests
from bs4 import BeautifulSoup
import json
from IPython.display import Markdown, display
from openai import OpenAI

# Model served by the local Ollama instance (it must already be pulled)
MODEL = "qwen2"

# Ollama exposes an OpenAI-compatible API under /v1. The OpenAI client requires
# an api_key, but Ollama ignores its value, so any placeholder string works.
ollama_via_openai = OpenAI(base_url='http://localhost:11434/v1', api_key='ollama')

In [80]:
!ollama list

NAME               ID              SIZE      MODIFIED   
qwen2:latest       dd314f039b9d    4.4 GB    3 days ago    
llama3.2:latest    a80c4f17acd5    2.0 GB    3 days ago    
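
The `!ollama list` output confirms that `qwen2:latest` is pulled. If you'd rather check from Python before calling the model, here is a minimal sketch. It assumes Ollama's OpenAI-compatible `/v1/models` endpoint is available (reached through the SDK's `models.list()`) and that Ollama resolves an untagged name such as `qwen2` to the `:latest` tag; `is_model_available` is a hypothetical helper, not part of either library.

```python
def is_model_available(model, available_ids):
    """Return True if `model` matches one of the model IDs reported by Ollama.

    Assumes an untagged name such as "qwen2" refers to the ":latest" tag,
    which is how `ollama list` displays it ("qwen2:latest").
    """
    wanted = model if ":" in model else f"{model}:latest"
    return wanted in set(available_ids)


# IDs as shown by `ollama list` above
print(is_model_available("qwen2", ["qwen2:latest", "llama3.2:latest"]))

# Against the live server (requires Ollama to be running):
# ids = [m.id for m in ollama_via_openai.models.list()]
# if not is_model_available(MODEL, ids):
#     print(f"{MODEL!r} not found; run `ollama pull {MODEL}` first")
```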


## Step 1: Web Scraping

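Let's write a function that downloads the destination's Wikitravel page and pulls out the paragraphs from the sections we care about (overview, sights, getting around, food and general background).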

In [82]:
def scrape_destination_info(destination):
    """Scrape travel information from Wikitravel"""
    formatted_destination = destination.replace(' ', '_')
    url = f"https://wikitravel.org/en/{formatted_destination}"
    
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36'
    }
    
    try:
        response = requests.get(url, headers=headers, timeout=10)  # don't hang on a slow server
        response.raise_for_status()
        soup = BeautifulSoup(response.text, 'html.parser')
        
        # Get main content div
        content = soup.find('div', {'id': 'mw-content-text'})
        if not content:
            return None
            
        # Initialize dictionary for different sections
        info = {
            'overview': '',
            'attractions': '',
            'transportation': '',
            'food': '',
            'tips': ''
        }
        
        # Get overview (usually first few paragraphs before any heading)
        intro_paras = []
        current = content.find('p')
        while current and current.name == 'p':
            intro_paras.append(current.get_text().strip())
            current = current.find_next_sibling()
        info['overview'] = ' '.join(intro_paras)
        
        # Find specific sections by their headings
        headings = content.find_all(['h2', 'h3'])
        current_section = None
        for heading in headings:
            heading_text = heading.get_text().lower()
            
            # Map headings to our sections
            matched = None
            if 'see' in heading_text or 'sight' in heading_text:
                matched = 'attractions'
            elif 'get around' in heading_text or 'transport' in heading_text:
                matched = 'transportation'
            elif 'eat' in heading_text or 'food' in heading_text:
                matched = 'food'
            elif 'understand' in heading_text or 'tips' in heading_text:
                matched = 'tips'
            
            # Every h2 starts a new top-level section, so an unmatched one (e.g. "Sleep")
            # resets current_section instead of being appended to the previous match.
            # An unmatched h3 is a subsection and inherits its parent h2's section.
            if matched or heading.name == 'h2':
                current_section = matched
            
            # If we're inside a relevant section, collect its paragraphs up to the next heading
            if current_section:
                section_content = []
                next_elem = heading.find_next_sibling()
                while next_elem and next_elem.name not in ['h2', 'h3']:
                    if next_elem.name == 'p':
                        section_content.append(next_elem.get_text().strip())
                    next_elem = next_elem.find_next_sibling()
                info[current_section] += ' '.join(section_content) + ' '
        
        # # Print preview of what we found
        # print("Scraped content preview:")
        # for section, content in info.items():
        #     print(f"\n{section.upper()}:")
        #     print(content[:200] + "..." if content else "No content found")
            
        return info
        
    except Exception as e:
        print(f"Error scraping data: {e}")
        return None
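
The heading mapping in `scrape_destination_info` relies on substring tests, so `'eat'` also matches headings such as "Theatre" or "Weather" and routes them to the food section. Below is a sketch of a stricter alternative that uses whole-word regular expressions. The keyword lists mirror the ones in the scraper; `classify_heading` and `SECTION_PATTERNS` are hypothetical names you could call in place of the `if`/`elif` chain.

```python
import re

# Keyword patterns per section, mirroring the substrings used in
# scrape_destination_info but anchored on word boundaries (\b).
SECTION_PATTERNS = [
    ("attractions", re.compile(r"\b(see|sights?|sightseeing)\b")),
    ("transportation", re.compile(r"\b(get around|transport(ation)?)\b")),
    ("food", re.compile(r"\b(eat|food)\b")),
    ("tips", re.compile(r"\b(understand|tips)\b")),
]


def classify_heading(heading_text):
    """Map a heading's text to one of our section names, or None."""
    text = heading_text.lower()
    for section, pattern in SECTION_PATTERNS:
        if pattern.search(text):
            return section
    return None


# MediaWiki headings often carry an "[edit]" suffix; \b still matches before "["
for heading in ["See", "Get around", "Eat[edit]", "Theatre", "Weather", "Stay safe"]:
    print(f"{heading!r:14} -> {classify_heading(heading)}")
```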

## Step 2: Define Prompts

We'll create system and user prompts for our LLM to process the scraped content.

In [83]:
def get_system_prompt():
    return """You are a travel guide generator. You MUST output ONLY a JSON object with exactly these 5 keys:
    - overview
    - attractions
    - transportation
    - food_and_dining
    - tips
    
    Each key MUST contain a plain text string value (not a list or object).
    For attractions, transportation, and other structured information, 
    include it as formatted text with bullet points or numbering.
    
    Example format:
    {
        "overview": "Paris is a beautiful city...",
        "attractions": "1. Eiffel Tower - The iconic symbol of Paris...\\n2. The Louvre - World's largest museum...",
        "transportation": "Metro: The Paris Metro is extensive...\\nBus: Bus services run throughout...",
        "food_and_dining": "Paris offers world-class dining...\\n- Cafes: Traditional French cafes...\\n- Restaurants: Michelin-starred...",
        "tips": "1. Learn basic French phrases\\n2. Buy metro tickets in bulk\\n3. Many shops close on Sundays"
    }
    
    The response MUST be a valid JSON object and MUST contain all these keys with text string values.
    Do not use nested objects or arrays."""

def get_user_prompt(destination, info):
    return f"""Create a travel guide for {destination} using this information:

Overview: {info['overview']}

Attractions: {info['attractions']}

Transportation: {info['transportation']}

Food & Dining: {info['food']}

Tips & Understanding: {info['tips']}

Generate a travel guide as a JSON object with exactly these keys:
{{
    "overview": "brief city introduction",
    "attractions": "main tourist sites",
    "transportation": "how to get around",
    "food_and_dining": "food recommendations",
    "tips": "practical advice"
}}
"""
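
Wikitravel articles for large cities are long, and everything scraped goes straight into the user prompt. Ollama can silently truncate a prompt that exceeds the model's configured context length, so part of the scraped text may never reach the model. Here is a minimal sketch that caps each section first. `truncate_info` is a hypothetical helper, and the 1,500-character default is an arbitrary assumption to tune for your model.

```python
def truncate_info(info, max_chars=1500):
    """Return a copy of `info` with each section cut to at most max_chars.

    When possible, cuts at the last sentence end ('. ') inside the limit
    so the model doesn't see a half sentence; otherwise cuts hard.
    """
    trimmed = {}
    for section, text in info.items():
        if len(text) <= max_chars:
            trimmed[section] = text
            continue
        cut = text[:max_chars]
        last_period = cut.rfind(". ")
        trimmed[section] = cut[: last_period + 1] if last_period > 0 else cut
    return trimmed


sample = {"overview": "Paris is large. " * 200, "food": "Try crêpes."}
short = truncate_info(sample, max_chars=100)
print({section: len(text) for section, text in short.items()})
```

Inside `generate_guide` you could then pass `truncate_info(content)` to `get_user_prompt` instead of `content`.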

## Step 3: Generate Travel Guide

Now let's create a function that sends both prompts to the local model through Ollama, parses the JSON reply, and checks that all five keys are present.

In [84]:
def generate_guide(destination, content):
    """Generate travel guide using Ollama"""
    try:
        response = ollama_via_openai.chat.completions.create(
            model=MODEL,
            messages=[
                {"role": "system", "content": get_system_prompt()},
                {"role": "user", "content": get_user_prompt(destination, content)}
            ],
            response_format={"type": "json_object"},
            temperature=0.3  # lower temperature for more consistent output
        )
        
        # Print raw response for debugging
        # print("Raw response content:")
        # print(response.choices[0].message.content)
        
        try:
            result = json.loads(response.choices[0].message.content)
            
            # Verify all required keys are present
            required_keys = ['overview', 'attractions', 'transportation', 'food_and_dining', 'tips']
            missing_keys = [key for key in required_keys if key not in result]
            if missing_keys:
                print(f"Missing required keys: {missing_keys}")
                return None
                
            return result
            
        except json.JSONDecodeError as json_err:
            print(f"JSON parsing error: {json_err}")
            print("Problematic content:")
            print(response.choices[0].message.content)
            return None
            
    except Exception as e:
        print(f"Error generating guide: {e}")
        return None
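
`generate_guide` confirms that the five keys exist, but not that their values are plain strings, which is what the system prompt demands. The Paris output below passed the key check even though every value came back as a nested object. The following sketch shows a stricter check that could replace the inner `try` block. `check_guide` and `REQUIRED_KEYS` are hypothetical names, and what to do with the returned problems (print them, retry the request, or fall back to a more forgiving renderer) is left open.

```python
import json

REQUIRED_KEYS = ["overview", "attractions", "transportation", "food_and_dining", "tips"]


def check_guide(raw_content):
    """Parse the model's reply and list every way it breaks the requested schema.

    Returns (parsed_dict_or_None, problems). An empty problems list means the
    reply has all five keys and each value is a plain string.
    """
    try:
        result = json.loads(raw_content)
    except json.JSONDecodeError as err:
        return None, [f"invalid JSON: {err}"]
    if not isinstance(result, dict):
        return None, ["top-level value is not a JSON object"]

    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in result]
    problems += [
        f"{key} is {type(result[key]).__name__}, expected str"
        for key in REQUIRED_KEYS
        if key in result and not isinstance(result[key], str)
    ]
    return result, problems


parsed, problems = check_guide('{"overview": {"title": "Paris"}, "tips": "Carry cash"}')
print(problems)
```

One design option is to send the request again when `problems` is non-empty, appending the problem list to the user message so the model knows what to fix.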

## Step 4: Format and Display Guide

Let's create a function that renders the guide data as Markdown. Plain strings are inserted as-is; a few list and dict shapes get basic formatting.

In [85]:
def format_guide(guide_data, destination):
    """Format the guide data as markdown"""
    markdown_text = f"# Travel Guide: {destination}\n\n"
    
    # For Overview section
    if 'overview' in guide_data:
        markdown_text += "## Overview\n\n"
        markdown_text += f"{guide_data['overview']}\n\n"
    
    # For Attractions section
    if 'attractions' in guide_data:
        markdown_text += "## Must-See Attractions\n\n"
        attractions = guide_data['attractions']
        if isinstance(attractions, list):
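            for attraction in attractions:
                if isinstance(attraction, dict):
                    markdown_text += f"### {attraction.get('name', 'Attraction')}\n"
                    markdown_text += f"{attraction.get('description', '')}\n\n"
                else:
                    markdown_text += f"- {attraction}\n"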
        else:
            markdown_text += f"{attractions}\n\n"
    
    # For Transportation section
    if 'transportation' in guide_data:
        markdown_text += "## Getting Around\n\n"
        transport = guide_data['transportation']
        if isinstance(transport, dict):
            for method, info in transport.items():
                markdown_text += f"### {method}\n"
                markdown_text += f"{info}\n\n"
        else:
            markdown_text += f"{transport}\n\n"
    
    # For Food & Dining section
    if 'food_and_dining' in guide_data:
        markdown_text += "## Food & Dining\n\n"
        dining = guide_data['food_and_dining']
        if isinstance(dining, list):
            for item in dining:
                if isinstance(item, dict):
                    for title, desc in item.items():
                        markdown_text += f"### {title}\n"
                        markdown_text += f"{desc}\n\n"
                else:
                    markdown_text += f"- {item}\n"
        else:
            markdown_text += f"{dining}\n\n"
    
    # For Tips section
    if 'tips' in guide_data:
        markdown_text += "## Practical Tips\n\n"
        tips = guide_data['tips']
        if isinstance(tips, list):
            for tip in tips:
                markdown_text += f"- {tip}\n"
        else:
            markdown_text += f"{tips}\n\n"
    
    return markdown_text

## Step 5: Put It All Together

Finally, let's create our main function that combines all steps.

In [86]:
def create_travel_guide(destination):
    """Main function to create a travel guide"""
    print(f"Scraping information for {destination}...")
    content = scrape_destination_info(destination)
    if not content:
        print("Failed to scrape content")
        return
    
    print("Generating guide...")
    guide_data = generate_guide(destination, content)
    if not guide_data:
        print("Failed to generate guide")
        return
    
    markdown_guide = format_guide(guide_data, destination)
    display(Markdown(markdown_guide))

## Try It Out!

Now you can generate a travel guide for any destination that has a Wikitravel article:

In [87]:
# Example usage
create_travel_guide("Paris")

Scraping information for Paris...
Generating guide...


# Travel Guide: Paris

## Overview

{'title': 'Paris, the City of Light', 'description': "Paris, often referred to as 'The City of Light', is a world-renowned capital that combines history with modernity. It's famous for its iconic landmarks like the Eiffel Tower and Notre-Dame Cathedral, as well as its rich cultural heritage and artistic contributions."}

## Must-See Attractions

{'top_sights': [{'name': 'Eiffel Tower', 'description': 'A symbol of Paris with stunning views from its top floors.'}, {'name': 'Louvre Museum', 'description': 'Home to the Mona Lisa, this museum is one of the largest in the world.'}, {'name': 'Notre-Dame Cathedral', 'description': 'An iconic Gothic masterpiece that has been a center for worship and culture for centuries.'}, {'name': 'Montmartre', 'description': 'A bohemian neighborhood with the Sacré-Cœur Basilica offering panoramic views of Paris.'}, {'name': 'Champs-Élysées', 'description': 'A famous avenue lined with luxury shops, cafes, and theaters ending at the Arc de Triomphe.'}]}

## Getting Around

### options
[{'mode': 'Public Transportation', 'details': 'Paris has an extensive network of buses, metro lines (RER), and suburban trains that cover most areas of the city.'}, {'mode': 'Biking', 'details': "The Vélib' bike-sharing system allows visitors to explore Paris on two wheels."}, {'mode': 'Walking', 'details': 'Paris is designed for walking with many pedestrian zones, making it easy to navigate and enjoy its architecture.'}, {'mode': 'Taxis and Ride-Sharing', 'details': 'Uber, Lyft alternatives, and traditional taxis are readily available in the city.'}]

### tips
[{'title': 'Plan your route', 'description': 'Use maps or apps like Google Maps to plan routes ahead of time.'}, {'title': 'Avoid rush hour', 'description': 'Traffic can be heavy during peak hours, especially on weekdays from 7-9am and 5-8pm.'}]

## Food & Dining

{'recommendations': [{'name': 'Bistros and Cafés', 'description': 'Parisian bistros offer classic French cuisine in a casual setting.'}, {'name': 'Pâtisseries', 'description': 'Indulge in pastries like croissants, macarons, and eclairs from famous bakeries.'}, {'name': 'Street Food', 'description': 'Discover street food markets for local delicacies and snacks.'}, {'name': 'Wine Bars', 'description': 'Savor French wines paired with cheese or charcuterie at wine bars.'}], 'local_favorites': [{'name': 'Crêperies', 'description': 'Try traditional crêpes filled with sweet or savory ingredients.'}, {'name': 'Bread Shops', 'description': 'Visit artisanal bakeries for freshly baked baguettes and pastries.'}]}

## Practical Tips

{'general_tips': [{'title': 'Get a Paris Pass', 'description': 'The Paris Pass offers free or discounted entry to many attractions and includes unlimited travel on public transport.'}, {'title': 'Visit during off-peak times', 'description': 'Avoid the crowds by visiting popular sites early in the morning or late afternoon.'}, {'title': 'Use public transportation', 'description': "Paris' efficient public transport system is an excellent way to explore the city without a car."}], 'safety_and_security': [{'title': 'Be aware of pickpockets', 'description': 'Keep your belongings secure, especially in crowded areas like tourist sites and on public transportation.'}, {'title': 'Stay informed about local events', 'description': 'Check for any city-wide events that may affect travel or crowds.'}]}
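
The Paris run shows that, despite the system prompt, qwen2 returned nested dictionaries and lists for every key, so `format_guide` fell back to printing raw Python reprs. The sketch below is a recursive renderer that turns whatever structure comes back into Markdown. The label and text keys it looks for (`name`, `title`, `mode`, `description`, `details`) are taken from the output above. They are assumptions about what the model tends to produce, not a fixed schema, and `to_markdown` is a hypothetical helper.

```python
LABEL_KEYS = ("name", "title", "mode")   # keys seen as item labels in the output above
TEXT_KEYS = ("description", "details")   # keys seen as item bodies in the output above


def to_markdown(value):
    """Best-effort rendering of model output (str, list, dict) as Markdown.

    - str: returned unchanged
    - list: one bullet per item (lists nested directly in lists are not indented)
    - dict with a label key and a text key: "**label**: text"
    - any other dict: each key becomes a bold heading followed by its value
    """
    if isinstance(value, str):
        return value
    if isinstance(value, list):
        return "\n".join(f"- {to_markdown(item)}" for item in value)
    if isinstance(value, dict):
        label = next((value[k] for k in LABEL_KEYS if k in value), None)
        text = next((value[k] for k in TEXT_KEYS if k in value), None)
        if label is not None and text is not None:
            return f"**{label}**: {text}"
        blocks = []
        for key, sub in value.items():
            title = str(key).replace("_", " ").capitalize()
            blocks.append(f"**{title}**\n\n{to_markdown(sub)}")
        return "\n\n".join(blocks)
    return str(value)


attractions = {"top_sights": [
    {"name": "Eiffel Tower", "description": "A symbol of Paris with stunning views from its top floors."},
    {"name": "Louvre Museum", "description": "Home to the Mona Lisa, this museum is one of the largest in the world."},
]}
print(to_markdown(attractions))
```

To use it, one option is to replace the fallback branch of each section in `format_guide` with `markdown_text += to_markdown(value) + "\n\n"`.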



## Learning Exercises

1. Try modifying the system prompt to get different types of information
2. Add error handling for cases where certain information is missing
3. Experiment with different formatting styles for the output
4. Add functionality to save guides to files
5. Implement caching to avoid repeated web scraping
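
For exercise 4, here is a minimal sketch of saving a rendered guide to disk. It assumes you already have the Markdown string, for example by having `create_travel_guide` return `markdown_guide` after displaying it (the current version returns `None`). `save_guide`, the `guides` output directory, and the file-naming scheme are all hypothetical choices.

```python
import re
from pathlib import Path


def save_guide(markdown_text, destination, out_dir="guides"):
    """Write a guide to <out_dir>/<slug>.md and return the path.

    The slug keeps word characters (Unicode letters, digits, underscore) and
    hyphens, replacing every other run of characters with '-', so
    "New York City" becomes "new-york-city.md".
    """
    slug = re.sub(r"[^\w-]+", "-", destination.lower()).strip("-") or "guide"
    path = Path(out_dir) / f"{slug}.md"
    path.parent.mkdir(parents=True, exist_ok=True)  # create the folder on first use
    path.write_text(markdown_text, encoding="utf-8")
    return path
```

Usage: `save_guide(markdown_guide, destination)` right after `display(Markdown(markdown_guide))`.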

## Notes

- The quality of the guide depends on the available content on Wikitravel
- You might want to add more sources for comprehensive information
- Consider rate limiting for web scraping to be respectful to the servers
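
The rate-limiting note and exercise 5 (caching) can be handled together. Cache each URL's HTML so repeated runs don't hit Wikitravel again, and enforce a minimum delay between the requests that do go out. Below is a minimal in-memory sketch. `PoliteFetcher` is a hypothetical class, and the two-second default interval is an arbitrary assumption. `fetch` is injected so you can pass a function wrapping `requests.get` (as in the usage comment) or a fake one in tests.

```python
import time


class PoliteFetcher:
    """Fetch pages through an in-memory cache, with a minimum gap between real requests."""

    def __init__(self, fetch, min_interval=2.0):
        self.fetch = fetch                # callable: url -> page text
        self.min_interval = min_interval  # seconds between uncached fetches
        self.cache = {}
        self._last_request = None         # time.monotonic() of the last real fetch

    def get(self, url):
        if url in self.cache:
            return self.cache[url]        # cache hit: no network call, no delay
        if self._last_request is not None:
            wait = self.min_interval - (time.monotonic() - self._last_request)
            if wait > 0:
                time.sleep(wait)
        self._last_request = time.monotonic()
        text = self.fetch(url)
        self.cache[url] = text
        return text


# Hypothetical wiring with requests:
# import requests
# def http_get(url):
#     response = requests.get(url, timeout=10)
#     response.raise_for_status()
#     return response.text
# fetcher = PoliteFetcher(http_get)
# html = fetcher.get("https://wikitravel.org/en/Paris")
```

Because the cache lives in memory, it only lasts as long as the notebook kernel. Persisting it to disk would be the natural next step.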