# Milestone 3: Graph-RAG Implementation

**Theme:** Hotel

**Task:** Hybrid (Hotel Recommendation + Visa Assistant)

**Retrieval Approach:** Knowledge Graph + Embeddings (Graph-RAG)

---

## Table of Contents

1. [Part 1: Input Preprocessing](#part1)
   - [1.a Intent Classification](#part1a)
   - [1.b Entity Extraction](#part1b)
   - [1.c Input Embedding](#part1c)
2. [Part 2: Graph Retrieval + Experiments](#part2)
   - [2.a Baseline (Cypher Queries)](#part2a)
   - [2.b Embeddings-Based Retrieval (2 model comparison)](#part2b)
3. [Part 3: LLM Layer + Experiments](#part3)
   - [3.a Context Construction](#part3a)
   - [3.b Prompt Engineering](#part3b)
   - [3.c LLM Comparison (3 models)](#part3c)
4. [Part 4: UI + Full Pipeline Demo](#part4)
   - [4.a Streamlit Interface](#part4a)
   - [4.b End-to-End Demonstration](#part4b)

---
<a id='part1'></a>
# Part 1: Input Preprocessing

This section preprocesses natural language user queries into structured data for graph retrieval.

**Three components:**

**1.a Intent Classification** - Determines what the user wants (recommendation, visa info, comparison, search)

**1.b Entity Extraction** - Extracts specific information (hotel names, cities, countries, traveler demographics, preferences)

**1.c Input Embedding** - Converts the user query into a vector representation for semantic similarity search (used in Part 2.b)

<a id='part1a'></a>
## 1.a Intent Classification

### Purpose
Classify user queries into specific intent categories to determine:
- What type of information the user needs
- Which Cypher queries to execute
- How to structure the response

### Design Decision: LLM-Based Classification

We use an **LLM-based approach** because it:
- Handles paraphrases and varied wording without hand-written rules
- Uses the context of the whole query rather than isolated keywords
- Copes with longer, conversational queries

### Defined Intents

We define **4 intents** that represent user goals:

| Intent | Purpose | Example Queries |
|--------|---------|----------------|
| `recommendation` | Personalized hotel suggestions | "I'm 20 and travelling alone, recommend a clean hotel in Cairo"<br>"We're a family with kids, suggest hotels in Dubai" |
| `visa_info` | Visa requirements between countries | "Do Egyptians need a visa for Turkey?"<br>"What are visa rules from UK to UAE?" |
| `hotel_comparison` | Compare two specific hotels | "Compare Hilton Cairo and Marriott Cairo"<br>"Which is better for families: Hotel A or Hotel B?" |
| `general_search` | Standard hotel search and filtering | "Hotels in Paris"<br>"Cheap hotels in Rome"<br>"5-star hotels in Dubai" |

**Intent Design:**
- `recommendation` - Personalized queries with user demographics
- `visa_info` - Travel planning (NEEDS_VISA relationships)
- `hotel_comparison` - Decision-making between specific hotels
- `general_search` - Standard filtering (price, rating, location, amenities)

**Out-of-Scope Handling:**
If the query doesn't match any of the 4 intents (e.g., "What's the weather?", "Tell me a joke"), the classifier returns `None`, and the system responds: **"I do not have the answer for your request"**
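The fallback rule above can be sketched as a small routing function. This is a minimal illustration rather than the project's actual pipeline: `route_query`, `FALLBACK_MESSAGE`, `HANDLERS`, and the placeholder handlers are hypothetical names, and the `classify` argument stands in for any callable that returns an intent label or `None`.

```python
from typing import Callable, Dict, Optional

# The fallback text is taken from the section above; all other names are illustrative.
FALLBACK_MESSAGE = "I do not have the answer for your request"

# Hypothetical placeholder handlers, one per intent. A real handler would run
# the retrieval step for that intent instead of returning a tagged string.
HANDLERS: Dict[str, Callable[[str], str]] = {
    "recommendation": lambda q: f"[recommendation] {q}",
    "visa_info": lambda q: f"[visa_info] {q}",
    "hotel_comparison": lambda q: f"[hotel_comparison] {q}",
    "general_search": lambda q: f"[general_search] {q}",
}


def route_query(query: str, classify: Callable[[str], Optional[str]]) -> str:
    """Send the query to the handler for its intent, or return the fallback."""
    intent = classify(query)
    # Both None and any label without a handler end up on the fallback path
    handler = HANDLERS.get(intent) if intent is not None else None
    if handler is None:
        return FALLBACK_MESSAGE
    return handler(query)
```

Passing the classifier in as a callable keeps the routing logic testable without an API key: a stub such as `lambda q: None` exercises the out-of-scope path directly.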

In [8]:
%pip install google-generativeai

Note: you may need to restart the kernel to use updated packages.





In [9]:
from typing import Dict, List, Any, Optional
import os
import json
import google.generativeai as genai

print("Libraries imported successfully")

Libraries imported successfully


### Implementation: Intent Classifier with Google Gemini

In [None]:
class IntentClassifier:
    
    def __init__(self):
        # Read the key from the environment instead of hard-coding it in the notebook
        self.api_key = os.environ.get('GEMINI_API_KEY')
        
        if not self.api_key:
            raise ValueError("GEMINI_API_KEY not found.")
        
        genai.configure(api_key=self.api_key)
        self.model = genai.GenerativeModel('gemini-flash-lite-latest')
        
        self.intents = {
            'recommendation': 'Personalized hotel suggestions based on user profile',
            'visa_info': 'Visa requirements between countries',
            'hotel_comparison': 'Compare two specific hotels',
            'general_search': 'Standard hotel search and filtering'
        }
    
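    def classify(self, user_query: str) -> Optional[str]:
        """Return one of the 4 intent labels, or None for out-of-scope queries and API errors."""
        prompt = self._build_prompt(user_query)
        
        try:
            response = self.model.generate_content(prompt)
            # Normalize whitespace and case before matching against the known labels
            intent = response.text.strip().lower()
            
            if intent in self.intents:
                return intent
            elif intent == 'none':
                # The prompt asks the model to answer 'none' for out-of-scope queries
                return None
            else:
                print(f"Warning: Gemini returned unexpected value '{intent}', treating as out-of-scope")
                return None
                
        except Exception as e:
            # API failures are reported and handled the same way as out-of-scope queries
            print(f"Error calling Gemini API: {e}")
            return None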
    
    def _build_prompt(self, user_query: str) -> str:
        prompt = f"""Classify the following user query into ONE of these intents, or return 'none' if it doesn't match any:

        1. recommendation - User wants personalized hotel suggestions based on their profile/preferences
        Examples: "I'm 20 and travelling alone, recommend a hotel", "We're a family with kids, suggest hotels"

        2. visa_info - User asks about visa requirements between countries
        Examples: "Do Egyptians need a visa for Turkey?", "Visa rules from UK to UAE"

        3. hotel_comparison - User wants to compare two specific hotels
        Examples: "Compare Hilton and Marriott", "Which is better: Hotel A or Hotel B?"

        4. general_search - User wants to search/filter hotels by location, price, rating, amenities
        Examples: "Hotels in Paris", "Cheap hotels in Rome", "5-star hotels in Dubai"

        If the query is unrelated to hotels, travel, or visa information (e.g., weather, jokes, cooking), return 'none'.

        User Query: "{user_query}"

        Return ONLY one of: recommendation, visa_info, hotel_comparison, general_search, or none.
        No explanation, no punctuation, just the label."""
        return prompt

print("IntentClassifier with Gemini defined")

IntentClassifier with Gemini defined
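
`classify` matches the model's reply after only `strip().lower()`, so a reply such as `'visa_info'.` (quoted, with a trailing period) would fall into the warning branch and be treated as out-of-scope. One possible hardening step is sketched below. `normalize_label` and `VALID_INTENTS` are hypothetical names that are not part of the classifier above, and the sketch assumes labels contain only lowercase letters and underscores.

```python
import re
from typing import Optional

# Mirrors the keys of IntentClassifier.intents
VALID_INTENTS = {"recommendation", "visa_info", "hotel_comparison", "general_search"}


def normalize_label(raw: str) -> Optional[str]:
    """Map a raw model reply to a known intent, or None if it is not one."""
    cleaned = raw.strip().lower()
    # Strip leading and trailing characters that are not letters or underscores
    # (quotes, periods, list numbering such as "1. ")
    cleaned = re.sub(r"^[^a-z_]+|[^a-z_]+$", "", cleaned)
    return cleaned if cleaned in VALID_INTENTS else None
```

Replies that contain extra words, for example a full sentence of explanation, still map to `None`, which keeps the out-of-scope behaviour conservative.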


### Testing Intent Classification

In [None]:
classifier = IntentClassifier()

# Test queries for each intent
test_queries = [
    # recommendation
    "I'm 20 and travelling alone, recommend a clean hotel in Cairo",
    "We're a family with kids, suggest hotels in Dubai",
    
    # visa_info
    "Do Egyptians need a visa for Turkey?",
    "Do Indians need a visa to visit Germany?",
    
    # hotel_comparison
    "Compare Hilton Cairo and Marriott Cairo",
    "Which is better for families: The Azure Tower or Nile Grandeur?",
    
    # general_search
    "Hotels in Paris",
    "5-star hotels in Dubai",
    
    # Out of scope (should return None)
    "What's the weather like today?",
    "Tell me a joke"
]

print("Testing Intent Classification:")
print("="*80)
for query in test_queries:
    intent = classifier.classify(query)

    if intent is None:
        response_note = " -> System Response: 'I do not have the answer for your request'"
    else:
        response_note = ""
    
    print(f"\nQuery: {query}")
    print(f"Intent: {intent}{response_note}")

Testing Intent Classification:

Query: I'm 20 and travelling alone, recommend a clean hotel in Cairo
Intent: recommendation

Query: We're a family with kids, suggest hotels in Dubai
Intent: recommendation

Query: Do Egyptians need a visa for Turkey?
Intent: visa_info

Query: Do Indians need a visa to visit Germany?
Intent: visa_info

Query: Compare Hilton Cairo and Marriott Cairo
Intent: hotel_comparison

Query: Which is better for families: The Azure Tower or Nile Grandeur?
Intent: hotel_comparison

Query: Hotels in Paris
Intent: general_search

Query: 5-star hotels in Dubai
Intent: general_search

Query: What's the weather like today?
Intent: None -> System Response: 'I do not have the answer for your request'

Query: Tell me a joke
Intent: None -> System Response: 'I do not have the answer for your request'
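
The loop above prints each prediction for manual inspection. If expected labels are attached to the queries, the same check can be automated. The sketch below is an assumption-laden illustration: `evaluate` and `LABELED_QUERIES` are hypothetical names, the labels are taken from the comments in the test cell, and `classify` can be `classifier.classify` or any stub with the same signature.

```python
from typing import Callable, List, Optional, Tuple

# (query, expected intent) pairs; a subset of the test queries above
LABELED_QUERIES: List[Tuple[str, Optional[str]]] = [
    ("We're a family with kids, suggest hotels in Dubai", "recommendation"),
    ("Do Egyptians need a visa for Turkey?", "visa_info"),
    ("Compare Hilton Cairo and Marriott Cairo", "hotel_comparison"),
    ("Hotels in Paris", "general_search"),
    ("Tell me a joke", None),
]

Miss = Tuple[str, Optional[str], Optional[str]]


def evaluate(
    classify: Callable[[str], Optional[str]],
    labeled: List[Tuple[str, Optional[str]]],
) -> Tuple[float, List[Miss]]:
    """Return (accuracy, misses), where each miss is (query, expected, predicted)."""
    misses: List[Miss] = []
    for query, expected in labeled:
        predicted = classify(query)
        if predicted != expected:
            misses.append((query, expected, predicted))
    correct = len(labeled) - len(misses)
    accuracy = correct / len(labeled) if labeled else 0.0
    return accuracy, misses
```

Calling `evaluate(classifier.classify, LABELED_QUERIES)` would issue one API request per query. Returning the misses alongside the accuracy makes it easier to see which intent boundaries the prompt handles poorly.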


### Summary: Part 1.a

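**Accomplished:**
- Defined 4 intents representing user goals
- Implemented a Gemini-based classifier
- Returns `None` for out-of-scope queries, which triggers the fallback response
- All 10 test queries (2 per intent plus 2 out-of-scope) were classified as expected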

**Next:** Part 1.b Entity Extraction

### Implementation: Entity Extractor