# Travel Planner Chatbot

The Travel Planner Chatbot is a user-driven assistant that recommends Airbnb listings in New York City based on personal preferences: price range, room type (e.g., entire home/apt), and neighborhood. Built with Python, an OpenAI GPT model, and LangChain's Retrieval-Augmented Generation (RAG) components, it combines structured CSV data with vector search to match users with relevant rentals. To keep recommendations reliable, it includes only listings with at least 5 reviews, a rating of 4.0 or higher, and nonzero availability in the dataset, and it checks each listing URL with a live HTTP request.

The final output is a concise list of suggestions with price, rating, room details, and a direct Airbnb link to each listing. The dataset does not include images, so the links let users view photos and book on Airbnb. The project shows how a language model, embeddings, and structured data can be combined to solve a practical planning problem, and it can serve as a template for other cities or platforms.

## Dataset: New York Airbnb Open Data 2024

Airbnb listings and metrics for New York City, NY, USA, as of 5 January 2024, available on [Kaggle](https://www.kaggle.com/datasets/vrindakallu/new-york-dataset).

## Setup

In [None]:
# Mounting to Google Drive
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
cd "your-path-here"

In [None]:
%%capture
# Install dependencies (the %%capture cell magic must be the first line of the cell)
!pip install openai faiss-cpu pandas langchain langchain-community tiktoken
!pip install --upgrade openai
!pip install -U langchain-openai

In [None]:
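%%capture
# Import libraries (the %%capture cell magic must be the first line of the cell)
import os
import openai
from openai import OpenAI
# FAISS and Document are imported from their current packages;
# the older langchain.vectorstores and langchain.docstore paths are deprecated
from langchain_community.vectorstores import FAISS
from langchain_core.documents import Document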
from langchain.chains import RetrievalQA
from langchain_openai import OpenAIEmbeddings
from langchain_openai import ChatOpenAI
import pandas as pd
import textwrap
import requests


In [None]:
# Set your OpenAI API key
os.environ['OPENAI_API_KEY'] = 'your-openai-api-key-here'

## Load Data

In [None]:
# Load csv file

file_path = "your-path-here/new_york_listings_2024.csv"   # your path to csv file here
df = pd.read_csv(file_path)

df.columns

Index(['id', 'name', 'host_id', 'host_name', 'neighbourhood_group',
       'neighbourhood', 'latitude', 'longitude', 'room_type', 'price',
       'minimum_nights', 'number_of_reviews', 'last_review',
       'reviews_per_month', 'calculated_host_listings_count',
       'availability_365', 'number_of_reviews_ltm', 'license', 'rating',
       'bedrooms', 'beds', 'baths'],
      dtype='object')

In [None]:
# Convert rating column to numeric, forcing errors to NaN
df['rating'] = pd.to_numeric(df['rating'], errors='coerce')

## Set User Preferences

In [None]:
# Ask user preferences
budget_input = input("What's your budget per night in USD? (e.g., 100–300): ").strip()
room_type_input = input("Preferred room type (e.g., entire home/apt, private room): ").strip()
neighbourhood_input = input("Preferred neighborhood (e.g., SoHo, Brooklyn, etc): ").strip()

# Save original for title
budget_display = budget_input.replace(" ", "")  # remove spaces for cleaner display

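# Parse budget range for filtering; accept a hyphen or an en dash as the separator
try:
    budget_min, budget_max = map(float, budget_display.replace("–", "-").split("-"))
    valid_budget = True
except ValueError:
    # Fall back to a wide default range when the input is blank or malformed,
    # and show that range in the heading so it matches the filter actually applied
    budget_min, budget_max = 0, 1000
    valid_budget = False
    budget_display = "0–1000"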

What's your budget per night in USD? (e.g., 100–300): 100-300
Preferred room type (e.g., entire home/apt, private room): entire home/apt
Preferred neighborhood (e.g., SoHo, Brooklyn, etc): Brooklyn
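The prompt suggests an en dash (`100–300`) while the sample input uses a plain hyphen (`100-300`), so the parser has to tolerate both. A minimal sketch of a more forgiving parser follows; `parse_budget` and its default range are hypothetical names chosen for illustration, not part of the original notebook.

```python
import re

def parse_budget(text, default=(0.0, 1000.0)):
    """Return (low, high, ok) parsed from input such as '100-300' or '$100 – $300'."""
    # Strip spaces and dollar signs so only digits and the separator remain
    cleaned = text.replace(" ", "").replace("$", "")
    # Accept a hyphen, en dash (U+2013), or em dash (U+2014) between two numbers
    number = r"(\d+(?:\.\d+)?)"
    match = re.fullmatch(number + r"[-\u2013\u2014]" + number, cleaned)
    if match is None:
        return default[0], default[1], False
    low, high = float(match.group(1)), float(match.group(2))
    # Swap the bounds if the user typed them in reverse order
    if low > high:
        low, high = high, low
    return low, high, True

print(parse_budget("100-300"))
print(parse_budget("abc"))
```

The third element of the result plays the same role as `valid_budget`: it tells the caller whether the default range was used.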


In [None]:
# Filter Listings Based on User Criteria
# Apply filters: price, room type, area, rating, reviews, availability
filtered_df = df[
    (df['price'].between(budget_min, budget_max, inclusive='both')) &
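    # regex=False treats the user's text literally; na=False drops rows with missing values
    (df['room_type'].str.contains(room_type_input, case=False, regex=False, na=False)) &
    (df['neighbourhood'].str.contains(neighbourhood_input, case=False, regex=False, na=False)) &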
    (df['number_of_reviews'] >= 5) &
    (df['rating'] >= 4.0) &
    (df['availability_365'] > 0) &
    (df['name'].notnull()) &
    (df['name'].str.strip() != "")
]

# Sort listings by rating and reviews (descending)
filtered_df = filtered_df.sort_values(by=['rating', 'number_of_reviews'], ascending=[False, False])
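To see what the combined mask keeps and drops, the same conditions can be run on a tiny hand-made table. This is only a sketch: the five rows are invented for illustration, and `filter_listings` is a hypothetical wrapper around the filter logic above.

```python
import pandas as pd

# Invented rows; only the column names come from the dataset
toy = pd.DataFrame({
    "name": ["A", "B", "C", None, "E"],
    "neighbourhood": ["Brooklyn Heights", "SoHo", "Brooklyn Heights",
                      "Brooklyn Heights", "Downtown Brooklyn"],
    "room_type": ["Entire home/apt", "Entire home/apt", "Private room",
                  "Entire home/apt", "Entire home/apt"],
    "price": [150.0, 200.0, 120.0, 180.0, 250.0],
    "number_of_reviews": [35, 80, 50, 10, 12],
    "rating": [4.83, 4.99, 4.70, 4.90, 4.95],
    "availability_365": [120, 30, 200, 90, 10],
})

def filter_listings(df, low, high, room_type, area, min_reviews=5, min_rating=4.0):
    """Apply the price, room type, area, quality, and availability filters."""
    mask = (
        df["price"].between(low, high)
        & df["room_type"].str.contains(room_type, case=False, regex=False, na=False)
        & df["neighbourhood"].str.contains(area, case=False, regex=False, na=False)
        & (df["number_of_reviews"] >= min_reviews)
        & (df["rating"] >= min_rating)
        & (df["availability_365"] > 0)
        & df["name"].fillna("").str.strip().ne("")
    )
    # Best-rated first, with review count as the tie-breaker
    return df[mask].sort_values(["rating", "number_of_reviews"], ascending=False)

result = filter_listings(toy, 100, 300, "entire home/apt", "brooklyn")
print(result["name"].tolist())  # ['E', 'A']
```

Row B is dropped for its neighbourhood, C for its room type, and the fourth row for its missing name. Note that the filter matches the `neighbourhood` column only, so an input such as "Brooklyn" finds neighbourhoods whose names contain that word rather than every listing in the borough.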

In [None]:
# Check whether an Airbnb listing URL still responds (a best-effort availability check)

def is_listing_active(url):
    try:
        response = requests.head(url, allow_redirects=True, timeout=5)
        return response.status_code == 200
    except requests.RequestException:
        # Treat timeouts, connection errors, and invalid URLs as inactive
        return False
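Some servers reject `HEAD` requests, and a site may answer automated requests differently from a browser, so a status of 200 is a hint rather than proof that a listing can be booked. The sketch below adds a single `GET` retry and lets the caller pass a shared `requests.Session`; the `session` parameter and the choice of status codes that trigger the retry are assumptions made for illustration.

```python
import requests

def is_listing_active(url, session=None, timeout=5):
    """Best-effort check that a URL answers with HTTP 200."""
    # Use the shared session when one is given, otherwise the module-level API
    http = session or requests
    try:
        response = http.head(url, allow_redirects=True, timeout=timeout)
        if response.status_code in (403, 405):
            # Assumption: these codes may mean HEAD is refused, so retry once with GET
            response = http.get(url, allow_redirects=True, timeout=timeout)
    except requests.RequestException:
        # Timeouts, connection errors, and malformed URLs count as inactive
        return False
    return response.status_code == 200
```

When checking many listings in a loop, creating one `requests.Session()` and passing it in reuses the underlying connection instead of opening a new one for every URL.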

## Build Documents

In [None]:
# Build Verified Listing Descriptions
# Only use listings with active URLs

documents = []

# Limit to top N listings to avoid delay from too many HTTP checks
top_n = 30
checked = 0

for _, row in filtered_df.head(top_n).iterrows():
    listing_id = row.get('id')
    url = f"https://airbnb.com/rooms/{listing_id}" if pd.notna(listing_id) else None

    # Skip if URL is invalid or unavailable
    if not url or not is_listing_active(url):
        continue

    name = row.get('name', 'Unknown Title')
    neighbourhood = row.get('neighbourhood', 'Unknown')
    room_type = row.get('room_type', 'N/A')
    price = row.get('price', 'N/A')
    rating = row.get('rating', 'N/A')
    reviews = int(row.get('number_of_reviews', 0))
    bedrooms = row.get('bedrooms', 'N/A')
    baths = row.get('baths', 'N/A')

    description = (
        f"Highly rated {room_type.lower()} in {neighbourhood}.\n"
        f"{rating}/5 with {reviews} reviews.\n"
        f"{bedrooms} bedroom(s) · {baths} bath(s) · ${price} per night.\n"
    )

    doc = (
        f"Name: {name}\n"
        f"Neighbourhood: {neighbourhood}\n"
        f"Room Type: {room_type}\n"
        f"Price: ${price} per night\n"
        f"Rating: {rating}/5 ({reviews} reviews)\n"
        f"Bedrooms: {bedrooms} | Baths: {baths}\n"
        f"URL: {url}\n"
        f"Description: {description}"
    )

    documents.append(Document(page_content=doc))
    checked += 1
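One detail in the loop above: `row.get('bedrooms', 'N/A')` returns the default only when the column is missing, not when the cell holds `NaN`, so a missing value would appear as `nan` in the document text. A minimal sketch of a formatter that handles this follows; `clean` and `listing_to_text` are hypothetical helpers, and the example row is invented.

```python
import math

def clean(value, fallback="N/A"):
    """Return fallback for None or NaN, otherwise the value unchanged."""
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return fallback
    return value

def listing_to_text(row):
    """Build the text for one listing from a dict-like row."""
    url = f"https://airbnb.com/rooms/{row['id']}"
    return (
        f"Name: {clean(row.get('name'), 'Unknown Title')}\n"
        f"Neighbourhood: {clean(row.get('neighbourhood'), 'Unknown')}\n"
        f"Room Type: {clean(row.get('room_type'))}\n"
        f"Price: ${clean(row.get('price'))} per night\n"
        f"Rating: {clean(row.get('rating'))}/5\n"
        f"Bedrooms: {clean(row.get('bedrooms'))} | Baths: {clean(row.get('baths'))}\n"
        f"URL: {url}\n"
    )

# Invented row with a missing bedroom count
example = {
    "id": 123, "name": "Example listing", "neighbourhood": "Brooklyn Heights",
    "room_type": "Entire home/apt", "price": 150.0, "rating": 4.8,
    "bedrooms": float("nan"), "baths": 1,
}
print(listing_to_text(example))
```

Because the helper only uses `row.get` and `row['id']`, it accepts a plain dict as well as a pandas row from `iterrows()`.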

## Create Embeddings and Vector Store

In [None]:
embedding_model = OpenAIEmbeddings()
vectorstore = FAISS.from_documents(documents, embedding_model)
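Conceptually, the vector store turns each document into a vector and returns the documents whose vectors lie closest to the query vector. The toy below illustrates that idea with hand-written three-dimensional vectors and cosine similarity in plain Python. It is not how FAISS is implemented, the real index may rank by a different metric such as L2 distance, and real embeddings have far more dimensions; all names and numbers are invented.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query_vec, doc_vecs, k=2):
    """Return the names of the k documents most similar to the query."""
    scored = sorted(
        ((cosine(query_vec, vec), name) for name, vec in doc_vecs.items()),
        reverse=True,
    )
    return [name for _, name in scored[:k]]

# Invented vectors standing in for document embeddings
toy_vectors = {
    "loft in Brooklyn": [0.9, 0.1, 0.0],
    "room in SoHo": [0.1, 0.9, 0.0],
    "studio in Brooklyn": [0.8, 0.2, 0.1],
}
print(top_k([1.0, 0.0, 0.0], toy_vectors))  # ['loft in Brooklyn', 'studio in Brooklyn']
```

Note that `FAISS.from_documents` needs at least one document, so it is worth checking that `documents` is not empty before building the index, for example when no listing passes the filters or the URL check.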

## Setup Retrieval QA Chain

In [None]:
llm = ChatOpenAI(model_name='gpt-3.5-turbo-0125', temperature=0.0)
qa_chain = RetrievalQA.from_chain_type(llm=llm, retriever=vectorstore.as_retriever())

## Recommendations for Traveler

In [None]:
# Build a prompt based on filtered, verified listings
user_query = (
    f"Suggest the best verified Airbnb listings in {neighbourhood_input} "
    f"with a {room_type_input} around ${budget_display} per night. "
    f"Include name, price, rating, bedrooms, baths, and URL."
)

In [None]:
# Run the RAG pipeline
response = qa_chain.invoke(user_query)

# Format heading using user preferences
title_room = room_type_input.title().replace("Apt", "Apartment")
title_neighborhood = neighbourhood_input.title()

# Print clean heading
print(f"**Top-Rated Airbnb Listings for You as {title_room} in "
      f"{title_neighborhood} for the Price Range ${budget_display}:**\n")

# Print only structured result
print(response['result'])

**Top-Rated Airbnb Listings for You as Entire Home/Apartment in Brooklyn for the Price Range $100-300:**

1. Name: Rental unit in Brooklyn · ★4.83
   Neighbourhood: Brooklyn Heights
   Room Type: Entire home/apt
   Price: $100.0 per night
   Rating: 4.83/5 (35 reviews)
   Bedrooms: 1 | Baths: 1
   URL: https://airbnb.com/rooms/4465274

2. Name: Rental unit in Brooklyn · ★4.99
   Neighbourhood: Brooklyn Heights
   Room Type: Entire home/apt
   Price: $130.0 per night
   Rating: 4.99/5 (80 reviews)
   Bedrooms: 1 | Baths: 1
   URL: https://airbnb.com/rooms/40731114
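A language model can occasionally alter or invent details, including URLs, so it can be worth confirming that every link in the answer also appears in the verified documents. The sketch below does this with a regular expression; `extract_urls` and `unknown_urls` are hypothetical helpers, and the pattern assumes the `https://airbnb.com/rooms/<id>` form used when the documents were built.

```python
import re

def extract_urls(text):
    """Find listing URLs of the form https://airbnb.com/rooms/<digits>."""
    return re.findall(r"https://airbnb\.com/rooms/\d+", text)

def unknown_urls(answer, source_texts):
    """Return URLs that appear in the answer but in none of the source texts."""
    known = {url for text in source_texts for url in extract_urls(text)}
    return [url for url in extract_urls(answer) if url not in known]

# Invented answer and sources for illustration
sources = ["Name: X\nURL: https://airbnb.com/rooms/1\n"]
answer = "1. https://airbnb.com/rooms/1\n2. https://airbnb.com/rooms/99"
print(unknown_urls(answer, sources))  # ['https://airbnb.com/rooms/99']
```

In the notebook this could be called as `unknown_urls(response['result'], [d.page_content for d in documents])`; an empty list means every suggested link came from a verified listing.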
