## Project : Build your own Project

### Travel Planner AI: Develop a chatbot that assists travellers in planning their trips. It recommend top hotels and suggest day to day itineraries & top attractions based on user-specified destinations.

**Stage 1** — Intent Clarity & Intent Confirmation
Goal

Have a natural conversation until we capture all required trip details, and store them as a structured requirements dictionary / JSON.

This is a conversation layer, includes a moderation check, and ends by producing a User Requirements Dictionary (JSON) for downstream processing 

Why each step exists

- Initialize conversation
  Why: the bot needs to guide users into providing the key constraints (destination/dates/budget/etc.), rather than asking everything upfront. 

- Moderation check
  Why: we should include a moderation layer to flag unsafe/sensitive content and discontinue if needed 
 
- Intent confirmation (Yes/No flag)
  Why: the bot needs a deterministic “are we done collecting requirements?” gate

- Output must be JSON/dict
  Why: because we need to programmatically call APIs and score results, not parse free-text; JSON output can be used for parsing convenience and storing requirements as a Python dictionary

**Stage 2** — Product Mapping Layer  & Product Information Extraction Layer


**Stage 3** — Product Recommendation Layer

**Approach:**
- Conversation and Information Gathering: The chatbot follows a natural, guided interaction. User expresses travel intent in free text. The system asks clarifying questions if needed. Once intent is confirmed, recommendations are generated

- Information Extraction: Once the essential information is collected, rule-based functions come into play, extracting top hotels, attractions and customised itinerary that best matches the user's needs.
- Personalized Recommendation: Leveraging this extracted information, the chatbot engages in further dialogue with the user, efficiently addressing their queries and aiding them in finding the perfect Travel itinerary.

### System Design

In [0]:
!pip install -U -q openai tenacity

[43mNote: you may need to restart the kernel using %restart_python or dbutils.library.restartPython() to use updated packages.[0m


In [0]:
import json
from typing import Tuple, Dict, Any
from openai import OpenAI
from pydantic import BaseModel, Field

In [0]:
!pip install kagglehub

Collecting kagglehub
  Downloading kagglehub-0.4.2-py3-none-any.whl.metadata (38 kB)
Collecting kagglesdk<1.0,>=0.1.14 (from kagglehub)
  Downloading kagglesdk-0.1.15-py3-none-any.whl.metadata (13 kB)
Downloading kagglehub-0.4.2-py3-none-any.whl (69 kB)
Downloading kagglesdk-0.1.15-py3-none-any.whl (160 kB)
Installing collected packages: kagglesdk, kagglehub
Successfully installed kagglehub-0.4.2 kagglesdk-0.1.15
[43mNote: you may need to restart the kernel using %restart_python or dbutils.library.restartPython() to use updated packages.[0m


In [0]:
import kagglehub

path = kagglehub.dataset_download(
    "cosmox23/popular-tourist-destinations-and-their-features"
)

print("Path to dataset files:", path)

Downloading to /home/spark-cf30f1ea-fa87-4ad7-a340-37/.cache/kagglehub/datasets/cosmox23/popular-tourist-destinations-and-their-features/1.archive...


  0%|          | 0.00/25.8k [00:00<?, ?B/s]100%|██████████| 25.8k/25.8k [00:00<00:00, 42.2MB/s]

Extracting files...
Path to dataset files: /home/spark-cf30f1ea-fa87-4ad7-a340-37/.cache/kagglehub/datasets/cosmox23/popular-tourist-destinations-and-their-features/versions/1





In [0]:
import os

print(os.listdir(path))

['Tourist_Destinations.csv']


**Downloaded Hotels and Attractions data from Kaggle**

In [0]:
import pandas as pd
destinations_csv = f"{path}/Tourist_Destinations.csv"
attractions_df = pd.read_csv(destinations_csv)
print(attractions_df.head())
print(attractions_df.columns)


  Destination Name       Country  ... Annual Visitors (M) UNESCO Site
0    Serene Temple       Morocco  ...                7.45          No
1    Sacred Valley       Germany  ...                1.98          No
2    Serene Temple  South Africa  ...                0.70         Yes
3     Sacred Plaza     Australia  ...                2.24          No
4     Golden Ruins        Mexico  ...                4.60          No

[5 rows x 9 columns]
Index(['Destination Name', 'Country', 'Continent', 'Type',
       'Avg Cost (USD/day)', 'Best Season', 'Avg Rating',
       'Annual Visitors (M)', 'UNESCO Site'],
      dtype='object')


In [0]:
hotels_df = pd.read_csv("/Workspace/Users/rubalpreet.kaur@oportun.com/hotels.csv")

print(hotels_df.head())
print(hotels_df.columns)

                                    Hotel name continent_name  ... info.6 info.7
0                    Shangri-La Hotel, Beijing           Asia  ...    NaN    NaN
1            InterContinental Beijing Sanlitun           Asia  ...    NaN    NaN
2         Holiday Inn Express Beijing Yizhuang           Asia  ...    NaN    NaN
3  Shangri-La China World Summit Wing, Beijing           Asia  ...    NaN    NaN
4                          Kerry Hotel Beijing           Asia  ...    NaN    NaN

[5 rows x 14 columns]
Index(['Hotel name', 'continent_name', 'city_name', 'country_name', 'Price',
       'Rating', 'reviews count', 'info.1', 'info.2', 'info.3', 'info.4',
       'info.5', 'info.6', 'info.7'],
      dtype='object')


In [0]:
attractions_df.columns =  [c.lower().strip().replace(" ", "_") for c in attractions_df.columns]
hotels_df.columns = [c.lower().strip().replace(" ", "_") for c in hotels_df.columns]
attractions_df.columns = [
    c.lower()
     .strip()
     .replace(" ", "_")
     .replace("(", "")
     .replace(")", "")
     .replace("/", "_")
     for c in attractions_df.columns
]



In [0]:
import json
import openai
from openai import OpenAI
import re

In [0]:

# ========== LLM CONFIG ==========
USE_LLM = False   # Set True if you have working OpenAI quota/key
OPENAI_API_KEY = ""  # or set env var OPENAI_API_KEY

if USE_LLM:
    from openai import OpenAI
    client = OpenAI(api_key="")
else:
    client = None

**initialize_conversation()**

In [0]:

def initialize_conversation():
    conversation = []
    system_message = """
    You are a Travel Planner AI.
    Your task is to understand user travel requirements.
    Ask questions until all requirements are captured.
    Do NOT recommend hotels yet.
    """
    conversation.append({"role": "system", "content": system_message})
    return conversation


**moderation_check()**

In [0]:
def moderation_check(message: str) -> bool:
    # Placeholder: always true
    return True

**get_chat_completions()**

In [0]:
def get_chat_completions(conversation):
    # MOCK MODE (no OpenAI API)
    if not USE_LLM:
        # Return last user message echoed or a mock response
        last_user = conversation[-1]["content"]

        # Simple mock for Stage 1 (intent collection)
        if "dictionary" in last_user.lower() or "return" in last_user.lower():
            return json.dumps({
                "destination": "Beijing",
                "budget_per_night": 200,
                "hotel_preferences": ["free wifi", "pool"],
                "interests": ["museum", "food"]
            })

        # Mock for Stage 3 (recommendation)
        return "Here are your recommended hotels and attractions. Type 'exit' if satisfied."

    # REAL OPENAI CALL (only if USE_LLM=True)
    response = client.chat.completions.create(
        model="gpt-4.1-mini",
        messages=conversation,
        temperature=0.2
    )
    return response.choices[0].message.content


**intent_confirmation_layer()**

In [0]:
def intent_confirmation_layer(user_dict: dict):
    required = ["destination", "budget_per_night", "hotel_preferences", "interests"]
    for k in required:
        if k not in user_dict:
            return "no"
    # destination can’t be empty
    if not user_dict.get("destination"):
        return "no"
    return "yes"

**dictionary_present()**

In [0]:
def dictionary_present(response):
    try:
        data = json.loads(response)
        return True, data
    except:
        return False, None


**RUN STAGE 1**

In [0]:
def run_stage1(user_input: str):
    conversation = initialize_conversation()

    if not moderation_check(user_input):
        return None

    conversation.append({"role": "user", "content": user_input})

    # In ShopAssist, this loops until dictionary + intent confirmed
    for _ in range(3):
        assistant_response = get_chat_completions(conversation)
        conversation.append({"role": "assistant", "content": assistant_response})

        present, user_dict = dictionary_present(assistant_response)
        if present and intent_confirmation_layer(user_dict) == "yes":
            return user_dict

        # Ask again (ShopAssist-style)
        conversation.append({
            "role": "user",
            "content": "Please return ONLY a dictionary with destination, budget_per_night, hotel_preferences, interests."
        })

    return None


**Mapping of two datsets**

In [0]:
def enrich_destination(user_req: dict):
    city_to_country = {
        "beijing": "china",
        "paris": "france",
        "london": "united kingdom",
        "rome": "italy",
        "new york": "united states"
    }

    city = user_req.get("destination", "").lower()
    country = city_to_country.get(city)

    user_req["destination_city"] = city.title() if city else None
    user_req["destination_country"] = country.title() if country else None

    return user_req


In [0]:
user_req = enrich_destination(user_req)

### STAGE 2 — PRODUCT MAPPING & VALIDATION

**product_map_layer()**

In [0]:
def product_map_layer_hotels(hotels_df: pd.DataFrame):
    hotel_features = []

    for _, row in hotels_df.iterrows():
        amenities = []
        for i in range(1, 8):
            col = f"info_{i}"
            if col in hotels_df.columns:
                val = row.get(col)
                if isinstance(val, str) and val.strip():
                    amenities.append(val.lower().strip())

        hotel_features.append({
            "name": row.get("hotel_name"),
            "city": row.get("city_name"),
            "country": row.get("country_name"),
            "price": row.get("price"),
            "rating": row.get("rating"),
            "reviews": row.get("reviews_count"),
            "amenities": amenities
        })

    return hotel_features



In [0]:
def product_map_layer_attractions(attractions_df: pd.DataFrame):
    attraction_features = []

    for _, row in attractions_df.iterrows():
        name_val = str(row.get("destination_name", "")).strip()
        country_val = str(row.get("country", "")).strip()
        continent_val = str(row.get("continent", "")).strip()
        type_val = str(row.get("type", "")).strip()

        best_season = str(row.get("best_season", "")).strip()
        unesco = str(row.get("unesco_site", "")).strip()

        # Combine fields into searchable text
        features_text = " ".join([
            type_val,
            best_season,
            unesco
        ]).lower()

        attraction_features.append({
            "name": name_val,
            "country": country_val,
            "continent": continent_val,
            "category": type_val,
            "avg_cost_per_day": row.get("avg_cost_(usd/day)"),
            "avg_rating": row.get("avg_rating"),
            "annual_visitors_m": row.get("annual_visitors_(m)"),
            "features_text": features_text
        })

    return attraction_features


In [0]:
def score_hotel(user_req: dict, hotel_features: list):
    scored = []

    destination_city = user_req.get("destination_city")
    if not destination_city:
        return []

    destination_city = destination_city.lower()
    budget = user_req.get("budget_per_night")
    prefs = [p.lower() for p in user_req.get("hotel_preferences", [])]

    for h in hotel_features:
        # Match hotels by CITY
        if not h.get("city") or str(h["city"]).lower() != destination_city:
            continue

        score = 0

        # Budget check
        if budget is not None and pd.notna(h.get("price")):
            if float(h["price"]) <= float(budget):
                score += 1

        # Rating check
        if pd.notna(h.get("rating")) and float(h["rating"]) >= 4.0:
            score += 1

        # Popularity check
        if pd.notna(h.get("reviews")) and float(h["reviews"]) > 500:
            score += 1

        # Amenities check
        if prefs:
            amenities = h.get("amenities", [])
            if any(p in amenities for p in prefs):
                score += 1

        h2 = dict(h)
        h2["score"] = score
        scored.append(h2)

    scored = sorted(scored, key=lambda x: x["score"], reverse=True)
    return scored[:20]



In [0]:
def compare_attractions_with_user(user_req: dict, attraction_features: list):
    scored = []

    destination_country = user_req.get("destination_country")
    if not destination_country:
        return []

    destination_country = destination_country.lower()
    interests = [i.lower() for i in user_req.get("interests", [])]

    for a in attraction_features:
        # Match attractions by COUNTRY
        if not a.get("country") or str(a["country"]).lower() != destination_country:
            continue

        score = 0

        # Combine searchable text
        hay = f"""
        {a.get('name','')}
        {a.get('category','')}
        {a.get('features_text','')}
        """.lower()

        # Interest matching
        for it in interests:
            if it and it in hay:
                score += 1

        # Quality signal
        if a.get("avg_rating") and a["avg_rating"] >= 4.0:
            score += 1

        # Popularity signal
        if a.get("annual_visitors_m") and a["annual_visitors_m"] >= 1:
            score += 1

        a2 = dict(a)
        a2["score"] = score
        scored.append(a2)

    scored = sorted(scored, key=lambda x: x["score"], reverse=True)
    return scored[:30]



**product_validation()**

In [0]:
def product_validation(scored_items: list, threshold: int = 2, top_k: int = 3):
    validated = [x for x in scored_items if x.get("score", 0) > threshold]
    validated = sorted(validated, key=lambda x: x["score"], reverse=True)
    return validated[:top_k]


### STAGE 3 — RECOMMENDATION

**initialize_conv_reco()**

In [0]:
def initialize_conv_reco():
    conversation = []
    system_message = """
You are TravelPlannerAI (Stage 3: Recommendation).
You will be given:
- user requirements dictionary
- validated hotels
- validated attractions

Return:
1) Recommended hotels (with short reason)
2) Recommended attractions (with short reason)
3) A simple 3-day itinerary using those attractions
4) Ask the user to type 'exit' if satisfied or ask what to refine
"""
    conversation.append({"role": "system", "content": system_message})
    return conversation


### FINAL RECOMMENDATION FLOW

In [0]:
def stage3_recommendation(user_req: dict, hotels: list, attractions: list):
    conversation = initialize_conv_reco()

    conversation.append({
        "role": "user",
        "content": f"""
USER REQUIREMENTS:
{json.dumps(user_req, indent=2)}

VALIDATED HOTELS:
{json.dumps(hotels, indent=2)}

VALIDATED ATTRACTIONS:
{json.dumps(attractions, indent=2)}
"""
    })

    # =========================
    # MOCK MODE (NO OPENAI API)
    # =========================
    if not USE_LLM:
        out = "=== Recommended Hotels ===\n"
        for h in hotels:
            out += (
                f"- {h['name']} "
                f"(rating={h.get('rating')}, price={h.get('price')}, score={h['score']})\n"
            )

        out += "\n=== Recommended Attractions ===\n"
        for a in attractions:
            out += (
                f"- {a['name']} "
                f"({a.get('category')}, rating={a.get('avg_rating')}, "
                f"visitors={a.get('annual_visitors_m')}M, score={a['score']})\n"
            )

        out += "\n=== Sample 3-Day Itinerary ===\n"

        if attractions:
            out += f"Day 1: Visit {attractions[0]['name']} and explore local culture\n"
        else:
            out += "Day 1: City exploration\n"

        if len(attractions) > 1:
            out += f"Day 2: Explore {attractions[1]['name']} and nearby sites\n"
        else:
            out += "Day 2: Cultural landmarks and food\n"

        out += "Day 3: Leisure activities, shopping, and relaxation\n"

        out += "\nType 'exit' if satisfied, or tell me what you want to refine."
        return out

    # =========================
    # REAL LLM MODE
    # =========================
    return get_chat_completions(conversation)


In [0]:
# 1) Stage 1: collect user requirements
user_input = "I want a trip to Beijing, budget 200 per night, prefer free wifi and pool, I like museums and food."
user_req = run_stage1(user_input)

if user_req is None:
    print("Stage 1 failed to capture requirements.")
else:
    #  REQUIRED: enrich destination (city + country)
    user_req = enrich_destination(user_req)

    # Ensure defaults exist (mock safety)
    user_req.setdefault("hotel_preferences", [])
    user_req.setdefault("interests", [])
    user_req.setdefault("budget_per_night", 200)

    # 2) Stage 2: map products (hotels + attractions)
    hotel_features = product_map_layer_hotels(hotels_df)
    attraction_features = product_map_layer_attractions(attractions_df)

    scored_hotels = score_hotel(user_req, hotel_features)
    scored_attractions = compare_attractions_with_user(user_req, attraction_features)

    # Validation thresholds
    validated_hotels = product_validation(scored_hotels, threshold=2, top_k=3)
    validated_attractions = product_validation(scored_attractions, threshold=0, top_k=6)

    # 3) Stage 3: generate final recommendation
    final_answer = stage3_recommendation(user_req, validated_hotels, validated_attractions)
    print(final_answer)



=== Recommended Hotels ===
- Shangri-La Hotel, Beijing (rating=5.0, price=156.6666667, score=3)
- Holiday Inn Express Beijing Yizhuang (rating=5.0, price=68.0, score=3)

=== Recommended Attractions ===
- Hidden Ruins (Historical, rating=5.0, visitors=NoneM, score=1)
- Crystal Plaza (Adventure, rating=4.4, visitors=NoneM, score=1)
- Ancient Pagoda (Historical, rating=4.1, visitors=NoneM, score=1)
- Sacred Pagoda (Nature, rating=4.9, visitors=NoneM, score=1)
- Serene Park (Adventure, rating=4.4, visitors=NoneM, score=1)
- Lush Plaza (City, rating=4.9, visitors=NoneM, score=1)

=== Sample 3-Day Itinerary ===
Day 1: Visit Hidden Ruins and explore local culture
Day 2: Explore Crystal Plaza and nearby sites
Day 3: Leisure activities, shopping, and relaxation

Type 'exit' if satisfied, or tell me what you want to refine.


Please Note: 
During local execution, the OpenAI client was disabled due to quota restrictions.
A mock LLM layer was used to simulate intent extraction and recommendation generation.
The system design, data flow, and multi-stage architecture remain identical to a production deployment