# Milestone 3: Graph-RAG Implementation

**Theme:** Hotel

**Task:** Hybrid (Hotel Recommendation + Visa Assistant)

**Retrieval Approach:** Knowledge Graph + Embeddings (Graph-RAG)

---

## Table of Contents

1. [Part 1: Input Preprocessing](#part1)
   - [1.a Intent Classification](#part1a)
   - [1.b Entity Extraction](#part1b)
   - [1.c Input Embedding](#part1c)
2. [Part 2: Graph Retrieval + Experiments](#part2)
   - [2.a Baseline (Cypher Queries)](#part2a)
   - [2.b Embeddings-Based Retrieval (2 model comparison)](#part2b)
3. [Part 3: LLM Layer + Experiments](#part3)
   - [3.a Context Construction](#part3a)
   - [3.b Prompt Engineering](#part3b)
   - [3.c LLM Comparison (3 models)](#part3c)
4. [Part 4: UI + Full Pipeline Demo](#part4)
   - [4.a Streamlit Interface](#part4a)
   - [4.b End-to-End Demonstration](#part4b)

---
<a id='part1'></a>
# Part 1: Input Preprocessing

This section converts natural-language user queries into structured data for graph retrieval.

**Three components:**

**1.a Intent Classification** - Determines the user's goal by assigning one of five action-based intents (LIST_HOTELS, RECOMMEND_HOTEL, DESCRIBE_HOTEL, COMPARE_HOTELS, CHECK_VISA)

**1.b Entity Extraction** - Extracts the specifics the query mentions (hotel names, cities, countries, traveler demographics, preferences)

**1.c Input Embedding** - Converts the user query into a vector representation for semantic similarity search (Part 2.b)
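
As a rough picture of what Part 1 hands to Part 2, the three outputs can be bundled into a single record. This is only a sketch under assumptions: the `PreprocessedQuery` class and its field names are hypothetical and do not appear elsewhere in this notebook.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class PreprocessedQuery:
    """Hypothetical container for the three Part 1 outputs."""
    text: str                                               # original user query
    intent: Optional[str] = None                            # 1.a: intent name, None if out-of-scope
    entities: Dict[str, str] = field(default_factory=dict)  # 1.b: e.g. {"city": "Paris"}
    embedding: List[float] = field(default_factory=list)    # 1.c: query vector

    @property
    def in_scope(self) -> bool:
        """True when the classifier assigned one of the five intents."""
        return self.intent is not None


query = PreprocessedQuery(
    text="Show me hotels in Paris",
    intent="LIST_HOTELS",
    entities={"city": "Paris"},
)
```

Keeping the intent optional mirrors the classifier's contract, where `None` marks an out-of-scope query.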

<a id='part1a'></a>
## 1.a Intent Classification

### Purpose
Classify user queries into specific intent categories to determine:
- What type of information the user needs
- Which Cypher queries to execute
- How to structure the response

### Design Decision: LLM-Based Classification

We use an **LLM-based approach** (OpenAI GPT-4o-mini) because it:
- Handles natural-language variations effectively
- Understands context and nuance
- Can apply complex tie-breaking rules
- Is flexible enough for conversational queries

### Defined Intents

We define **5 action-based intents** that represent user goals:

| Intent | Purpose | Keywords | Example Queries |
|--------|---------|----------|----------------|
| **`LIST_HOTELS`** | Find multiple hotels matching filters | "show", "find", "list" | - "Show me hotels in Paris"<br>- "Find 5-star hotels in Dubai" |
| **`RECOMMEND_HOTEL`** | Get personalized suggestions or best options | "recommend", "suggest", "best", "top" | - "Recommend a hotel for families in Cairo"<br>- "What's the best cheap hotel in Rome?" |
| **`DESCRIBE_HOTEL`** | Get detailed info about a specific hotel | hotel name mentioned | - "Tell me about Hilton Cairo"<br>- "What is The Azure Tower like?" |
| **`COMPARE_HOTELS`** | Compare two or more hotels | "compare", "vs", "which is better" | - "Compare Hilton and Marriott in Cairo"<br>- "Which is better: Hotel A or Hotel B?" |
| **`CHECK_VISA`** | Visa requirements between countries | "visa", "entry requirement" | - "Do Egyptians need a visa for Turkey?"<br>- "Visa rules from UK to UAE" |

---

### Intent Design Philosophy

- **Action-based naming** - Each intent name describes what the user wants to DO
- **Clear semantic boundaries** - LIST (neutral) vs RECOMMEND (opinion) vs DESCRIBE (specific)
- **Explicit tie-breaking rules** - Clear keyword priorities reduce confusion
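
To make the idea of keyword priorities concrete, here is a minimal rule-based sketch. It is not the classifier used in this notebook (that one is LLM-based). The keywords come from the intent table, while the function name `keyword_intent` and the exact priority order are assumptions for illustration. `DESCRIBE_HOTEL` is left out because it depends on recognizing a hotel name, not on keywords.

```python
import re
from typing import Optional

# Checked top to bottom; the first match wins. The ordering is an assumption.
_RULES = [
    ("CHECK_VISA", r"\b(visa|entry requirements?)\b"),
    ("COMPARE_HOTELS", r"\b(compare|vs|versus|which is better|difference between)\b"),
    ("RECOMMEND_HOTEL", r"\b(recommend|suggest|best|top|which should i)\b"),
    ("LIST_HOTELS", r"\b(show|find|list)\b"),
]


def keyword_intent(query: str) -> Optional[str]:
    """Return the first intent whose keywords appear in the query, else None."""
    text = query.lower()
    for intent, pattern in _RULES:
        if re.search(pattern, text):
            return intent
    return None
```

Because the opinion keywords are checked before the neutral ones, "Find the best hotel in Rome" resolves to `RECOMMEND_HOTEL` rather than `LIST_HOTELS`. A query such as "Hotels in Cairo" contains no keyword at all and falls through to `None`, which is one reason a purely keyword-based classifier is not sufficient here.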

---

### Out-of-Scope Handling

If the query does not match any of the 5 intents (e.g., *"What's the weather?"*, *"Tell me a joke"*), the classifier returns `None`, and the system responds:

> **"I cannot help with this request. I can assist with hotel search, recommendations, visa information, and hotel comparisons."**
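
The fallback can be sketched as a small guard in front of the rest of the pipeline. The names `OUT_OF_SCOPE_MESSAGE` and `respond` are hypothetical, and the in-scope branch is only a placeholder for the retrieval step covered in Part 2.

```python
from typing import Optional

OUT_OF_SCOPE_MESSAGE = (
    "I cannot help with this request. I can assist with hotel search, "
    "recommendations, visa information, and hotel comparisons."
)


def respond(intent: Optional[str]) -> str:
    """Return the fallback message for out-of-scope queries, else a routing placeholder."""
    if intent is None:
        return OUT_OF_SCOPE_MESSAGE
    # Placeholder: a real pipeline would continue to graph retrieval here.
    return f"Routing to handler for {intent}"
```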

In [1]:
%pip install openai python-dotenv


In [2]:
from typing import Dict, List, Any, Optional
import os
from dotenv import load_dotenv
from openai import OpenAI

# Load environment variables
load_dotenv()

print("Libraries imported successfully")

Libraries imported successfully


### Implementation: LLM-Based Intent Classifier with OpenAI

The classifier uses a structured prompt with:
- Clear intent definitions
- Keywords and positive examples for each intent
- Explicit tie-breaking rules to reduce ambiguity

In [3]:
class IntentClassifier:
    
    def __init__(self):
        # Load OpenAI API key
        self.api_key = os.getenv("OPENAI_API_KEY")
        if not self.api_key:
            raise ValueError("OPENAI_API_KEY not found in environment variables")
        
        # Initialize OpenAI client
        self.client = OpenAI(api_key=self.api_key)
        
        # Model configuration
        self.model = "gpt-4o-mini"  # Fast and cost-effective for classification
        
        # Define intents with descriptions
        self.intents = {
            "LIST_HOTELS": "Find multiple hotels matching filters (neutral listing)",
            "RECOMMEND_HOTEL": "Get personalized suggestions or best options (opinion/advice)",
            "DESCRIBE_HOTEL": "Get detailed information about one specific hotel",
            "COMPARE_HOTELS": "Compare two or more specific hotels",
            "CHECK_VISA": "Check visa requirements between countries"
        }
    
    def classify(self, user_query: str) -> Optional[str]:
        """
        Classify user query into one of 5 intents using LLM.
        Returns intent name or None if out-of-scope.
        """
        prompt = self._build_prompt(user_query)
        
        try:
            response = self.client.chat.completions.create(
                model=self.model,
                messages=[
                    {"role": "system", "content": "You are a precise intent classifier for a hotel search system. Return ONLY the intent name, nothing else."},
                    {"role": "user", "content": prompt}
                ],
                temperature=0.0,
                max_tokens=20
            )
            
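            # message.content is typed Optional[str]; fall back to "" so .strip() cannot fail
            raw = response.choices[0].message.content or ""
            intent = raw.strip().upper()
            
            # Validate response: accept only a known intent name or the literal "NONE"
            if intent in self.intents:
                return intent
            if intent == "NONE":
                return None
            print(f"Warning: OpenAI returned unexpected value '{intent}', treating as out-of-scope")
            return None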
                
        except Exception as e:
            print(f"Error calling OpenAI API: {e}")
            return None
    
    def _build_prompt(self, user_query: str) -> str:
        """Build the classification prompt with clear examples and tie-breaking rules."""
        
        return f"""Classify the following user query into ONE of these intents, or return "NONE" if it doesn't match any:

         ═══════════════════════════════════════════════════════════════════════════════

         **1. LIST_HOTELS**
            Purpose: User wants a neutral list of hotels matching filters
            Keywords: "show", "find", "list", "search for", "hotels in"
            
            Examples:
            - "Show me hotels in Paris"
            - "Find hotels in Dubai"
            - "List 5-star hotels in Rome"
            - "Hotels near the city center in Cairo"
            - "Search for hotels in Berlin"

         ═══════════════════════════════════════════════════════════════════════════════

         **2. RECOMMEND_HOTEL**
            Purpose: User wants advice, suggestions, or "best" options
            Keywords: "recommend", "suggest", "best", "top", "which should I", "what do you recommend"
            Can include personal preferences or demographics
            
            Examples:
            - "Recommend a hotel for families in Dubai"
            - "What's the best cheap hotel in Paris?"
            - "I'm 25 traveling solo, suggest a hotel in Cairo"
            - "Top hotels in Rome"
            - "Which hotel should I book in Berlin?"
            - "Suggest a romantic hotel for couples"

         ═══════════════════════════════════════════════════════════════════════════════

         **3. DESCRIBE_HOTEL**
            Purpose: Get detailed information about ONE specific hotel
            Keywords: "tell me about", "what is", "information on", "describe"
            Must mention a specific hotel name
            
            Examples:
            - "Tell me about Hilton Cairo"
            - "What is The Azure Tower like?"
            - "Information about Marriott Dubai"
            - "Describe the Grand Hotel Paris"
            - "What can you tell me about Hotel Adlon?"

         ═══════════════════════════════════════════════════════════════════════════════

         **4. COMPARE_HOTELS**
            Purpose: Compare TWO or MORE specific hotels
            Keywords: "compare", "vs", "versus", "which is better", "difference between"
            Must mention multiple hotels
            
            Examples:
            - "Compare Hilton Cairo and Marriott Cairo"
            - "Which is better: The Azure Tower or Nile Grandeur?"
            - "Hotel A vs Hotel B in Paris"
            - "Difference between Sheraton and Radisson in Dubai"
            - "Compare these three hotels: X, Y, and Z"

         ═══════════════════════════════════════════════════════════════════════════════

         **5. CHECK_VISA**
            Purpose: Check visa requirements between countries
            Keywords: "visa", "entry requirement", "travel document", "do I need"
            
            Examples:
            - "Do Egyptians need a visa for Turkey?"
            - "Visa requirements from UK to UAE"
            - "Do Indians need a visa to visit Germany?"
            - "What are the entry requirements for France?"
            - "Travel document requirements from US to Japan"

         ═══════════════════════════════════════════════════════════════════════════════

         **TIE-BREAKING RULES (VERY IMPORTANT):**

         1. If the query contains ANY of: "recommend", "suggest", "best", "top", "which should I"
            -> ALWAYS choose RECOMMEND_HOTEL (never LIST_HOTELS)

         2. If the query mentions personal info (age, traveler type: "family", "solo", "couple", "business")
            -> Choose RECOMMEND_HOTEL

         3. If the query mentions TWO or MORE hotel names
            -> Choose COMPARE_HOTELS (not DESCRIBE_HOTEL)

         4. If the query mentions ONLY ONE specific hotel name
            -> Choose DESCRIBE_HOTEL (not LIST_HOTELS)

         5. If the query is about "visa" or "entry requirements"
            -> Choose CHECK_VISA

         6. If the query uses only neutral search words ("show", "find", "list") WITHOUT opinion keywords
            -> Choose LIST_HOTELS

         7. If the query is COMPLETELY unrelated to hotels or travel
            -> Return NONE

         ═══════════════════════════════════════════════════════════════════════════════

         **User Query:** "{user_query}"

         **Your Response (return ONLY ONE of these):**
         LIST_HOTELS
         RECOMMEND_HOTEL
         DESCRIBE_HOTEL
         COMPARE_HOTELS
         CHECK_VISA
         NONE
         """

print("IntentClassifier class defined")

IntentClassifier class defined
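
The validation step in `classify` accepts only an exact intent name. A model may occasionally wrap its answer in quotes, backticks, or markdown emphasis, in which case a valid answer would be treated as out-of-scope. A more tolerant normalization could look like the sketch below. `normalize_intent` and `VALID_INTENTS` are hypothetical names and are not part of the class above.

```python
from typing import Optional

VALID_INTENTS = {
    "LIST_HOTELS",
    "RECOMMEND_HOTEL",
    "DESCRIBE_HOTEL",
    "COMPARE_HOTELS",
    "CHECK_VISA",
}


def normalize_intent(raw: Optional[str]) -> Optional[str]:
    """Map a raw model reply to a valid intent name, or None if it is not one."""
    if not raw:
        return None
    # Drop surrounding whitespace, quotes, backticks, asterisks, and periods.
    cleaned = raw.strip().strip("\"'`*.").strip().upper()
    return cleaned if cleaned in VALID_INTENTS else None
```

The function still rejects free-form replies such as "I think LIST_HOTELS", so the strict one-label contract of the prompt is preserved.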


### Testing Intent Classification

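We run 28 labeled queries covering all 5 intents plus out-of-scope cases and report overall accuracy. Several of these queries also appear verbatim as examples in the classification prompt, so the score is best read as an optimistic sanity check rather than a held-out accuracy estimate.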
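
A single overall accuracy figure can hide a weak intent, so a per-intent breakdown is a useful complement. The helper below is a sketch: `per_intent_accuracy` is a hypothetical name, and the pairs fed to it are made up for illustration and are not results from this notebook.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


def per_intent_accuracy(
    pairs: List[Tuple[Optional[str], Optional[str]]]
) -> Dict[str, float]:
    """Compute accuracy per expected intent from (expected, predicted) pairs."""
    totals: Dict[str, int] = defaultdict(int)
    hits: Dict[str, int] = defaultdict(int)
    for expected, predicted in pairs:
        # Out-of-scope cases have expected=None; give them a readable label.
        label = expected if expected is not None else "OUT_OF_SCOPE"
        totals[label] += 1
        hits[label] += int(predicted == expected)
    return {label: hits[label] / totals[label] for label in totals}


# Made-up pairs for illustration only.
toy_pairs = [
    ("LIST_HOTELS", "LIST_HOTELS"),
    ("LIST_HOTELS", "RECOMMEND_HOTEL"),
    ("CHECK_VISA", "CHECK_VISA"),
    (None, None),
]
breakdown = per_intent_accuracy(toy_pairs)
```

With the real test loop, the same function could be called on the collected `(expected, predicted)` pairs once all queries have been classified.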

In [4]:
# Initialize classifier
classifier = IntentClassifier()

# Comprehensive test cases: (query, expected_intent)
test_cases = [
    # ===== LIST_HOTELS - neutral searching =====
    ("Show me hotels in Paris", "LIST_HOTELS"),
    ("Find 5-star hotels in Dubai", "LIST_HOTELS"),
    ("List hotels near the beach in Rome", "LIST_HOTELS"),
    ("Hotels in Cairo", "LIST_HOTELS"),
    ("Search for hotels in Berlin city center", "LIST_HOTELS"),
    
    # ===== RECOMMEND_HOTEL - advice/opinion/personalization =====
    ("Recommend a hotel for families in Dubai", "RECOMMEND_HOTEL"),
    ("What's the best cheap hotel in Paris?", "RECOMMEND_HOTEL"),
    ("I'm 25 traveling solo, suggest a clean hotel in Cairo", "RECOMMEND_HOTEL"),
    ("Top hotels in Rome", "RECOMMEND_HOTEL"),
    ("Which hotel should I book in Berlin?", "RECOMMEND_HOTEL"),
    ("Suggest a romantic hotel for our honeymoon", "RECOMMEND_HOTEL"),
    ("Best hotel for business travelers in London", "RECOMMEND_HOTEL"),
    
    # ===== DESCRIBE_HOTEL - single hotel info =====
    ("Tell me about Hilton Cairo", "DESCRIBE_HOTEL"),
    ("What is The Azure Tower like?", "DESCRIBE_HOTEL"),
    ("Information about Marriott Dubai", "DESCRIBE_HOTEL"),
    ("Describe the Grand Hotel Paris", "DESCRIBE_HOTEL"),
    
    # ===== COMPARE_HOTELS - compare multiple hotels =====
    ("Compare Hilton Cairo and Marriott Cairo", "COMPARE_HOTELS"),
    ("Which is better: The Azure Tower or Nile Grandeur?", "COMPARE_HOTELS"),
    ("Hotel A vs Hotel B in Paris", "COMPARE_HOTELS"),
    ("Difference between Sheraton and Radisson in Dubai", "COMPARE_HOTELS"),
    
    # ===== CHECK_VISA - visa requirements =====
    ("Do Egyptians need a visa for Turkey?", "CHECK_VISA"),
    ("Visa requirements from UK to UAE", "CHECK_VISA"),
    ("Do Indians need a visa to visit Germany?", "CHECK_VISA"),
    ("What are the entry requirements for France from USA?", "CHECK_VISA"),
    
    # ===== Out of scope - should return None =====
    ("What's the weather like today?", None),
    ("Tell me a joke", None),
    ("How do I cook pasta?", None),
    ("What's the capital of France?", None),
]

print("=" * 100)
print("TESTING INTENT CLASSIFICATION")
print("=" * 100)

num_total = len(test_cases)
num_correct = 0
failures = []

for i, (query, expected) in enumerate(test_cases, 1):
    predicted = classifier.classify(query)
    is_correct = (predicted == expected)
    
    if is_correct:
        num_correct += 1
        status = "[PASS]"
    else:
        status = "[FAIL]"
        failures.append((query, expected, predicted))
    
    # Format output
    print(f"\n[{i}/{num_total}] {status}")
    print(f"Query:     {query}")
    print(f"Expected:  {expected}")
    print(f"Predicted: {predicted}")
    
    if predicted is None:
        print("System Response: 'I cannot help with this request. I can assist with hotel search, recommendations, visa information, and hotel comparisons.'")

print("\n" + "=" * 100)
print(f"SUMMARY: {num_correct}/{num_total} correct ({(num_correct / num_total) * 100:.1f}% accuracy)")
print("=" * 100)

if failures:
    print(f"\nFAILED CASES ({len(failures)}):")
    for query, expected, predicted in failures:
        print(f"  - '{query}'")
        print(f"    Expected: {expected}, Got: {predicted}")
else:
    print("\nALL TESTS PASSED!")

TESTING INTENT CLASSIFICATION

[1/28] [PASS]
Query:     Show me hotels in Paris
Expected:  LIST_HOTELS
Predicted: LIST_HOTELS

[2/28] [PASS]
Query:     Find 5-star hotels in Dubai
Expected:  LIST_HOTELS
Predicted: LIST_HOTELS

[3/28] [PASS]
Query:     List hotels near the beach in Rome
Expected:  LIST_HOTELS
Predicted: LIST_HOTELS

[4/28] [PASS]
Query:     Hotels in Cairo
Expected:  LIST_HOTELS
Predicted: LIST_HOTELS

[5/28] [PASS]
Query:     Search for hotels in Berlin city center
Expected:  LIST_HOTELS
Predicted: LIST_HOTELS

[6/28] [PASS]
Query:     Recommend a hotel for families in Dubai
Expected:  RECOMMEND_HOTEL
Predicted: RECOMMEND_HOTEL

[7/28] [PASS]
Query:     What's the best cheap hotel in Paris?
Expected:  RECOMMEND_HOTEL
Predicted: RECOMMEND_HOTEL

[8/28] [PASS]
Query:     I'm 25 traveling solo, suggest a clean hotel in Cairo
Expected:  RECOMMEND_HOTEL
Predicted: RECOMMEND_HOTEL

[9/28] [PASS]
Query:     Top hotels in Rome
Expected:  RECOMMEND_HOTEL
Predicted: RECOMMEND_HO