<a href="https://colab.research.google.com/github/KaifAhmad1/code-test/blob/main/CommVersion_Assignment_Mohd_Kaif.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### **RealtyFlow AI: Intelligent Real Estate Chatbot**

RealtyFlow AI: Intelligent Real Estate Chatbot
---------------------------------------------
A complete implementation of a conversational AI agent for real estate businesses
that qualifies leads by collecting necessary information based on whether they
want to buy or sell property.

**RealtyFlow AI** is an intelligent chatbot designed to streamline initial customer interactions for real estate businesses. Built using LangGraph for state management, Google Gemini for natural language understanding and generation, and FAISS for efficient postcode similarity searching, it guides users through a predefined decision tree to understand their intent (buy/sell), budget, and location preferences.

The chatbot automates lead qualification, provides instant responses 24/7, and ensures consistent information gathering, ultimately enhancing agent productivity and improving the customer experience.

### **Core Functionality & Flow**

The chatbot operates based on a structured decision tree:

1.  **Initial Greeting & Intent Capture:**
    *   Bot: "Home. How may I help you? (e.g., 'I want to buy', 'I'm looking to sell')"
    *   User provides input (e.g., "I want to buy a house").
    *   **Gemini (Intent Classifier):** Determines if the user wants to "buy" or "sell".

2.  **Contact Information Gathering:**
    *   Bot: "Great. Can I get your name?" -> User provides name.
    *   Bot: "Can I get your phone number?" -> User provides phone.
    *   Bot: "Can I get your email address?" -> User provides email.

3.  **Path Divergence (Buy vs. Sell):**

    *   **If Intent is "Buy":**
        *   Bot: "Are you looking for a new home or a re-sale home?"
        *   **Gemini (Buy Type Classifier):** Determines "new\_home" or "re\_sale".
        *   Bot: "What is your budget?"
        *   **Budget Processor Tool:** Parses the budget amount.
            *   **Rule (New Home):** If budget < £1,000,000 for a "new\_home", the bot informs the user about the minimum budget and ends the interaction with a referral to call the office.
        *   Bot (if budget is sufficient or re-sale): "Can I know the postcode of your location of interest?"
        *   **Postcode Processor Tool (Exact Match & FAISS):**
            *   Checks if the normalized postcode is in the pre-approved list.
            *   If not an exact match, FAISS suggests similar valid postcodes.
            *   **Rule (New Home, Postcode Not Covered):** Informs the user and ends with a referral.
            *   **Rule (Postcode Covered or Re-sale/Sell with Uncovered Postcode):** Proceeds to reassistance.

    *   **If Intent is "Sell":**
        *   (After contact info) Bot: "What is your postcode?"
        *   **Postcode Processor Tool (Exact Match & FAISS):**
            *   Checks eligibility. FAISS suggests similar if no exact match.
            *   Proceeds to reassistance regardless of coverage (as per flowchart, only a message change).

4.  **Outcome & Reassistance:**

    *   **If Postcode Covered (for any valid path):**
        *   Bot: "Great! That postcode is covered. I can expect someone to get in touch with you within 24 hours via phone or email. Is there anything else I can help you with? (yes/no)"
    *   **If Postcode NOT Covered (for Sell or Re-sale Buy path):**
        *   Bot: "Sorry, we don't cater to the Post code '{postcode}' that you provided. (Did you perhaps mean {suggestion}?) Please call the office on {phone\_number} to get help. Is there anything else I can help you with? (yes/no)"

5.  **Handling Reassistance Choice:**
    *   **Gemini (Yes/No Classifier):** Determines if the user said "yes" or "no".
    *   If "yes": Bot: "Okay, let's start over." (Restarts the flow).
    *   If "no": Bot: "Thank you for chatting with us. Good bye." (Ends conversation).
    *   If unclear: Re-prompts for yes/no.

6.  **Error/Max Attempts Handling:**
    *   If the bot fails to understand an input after 3 attempts, it politely ends the conversation and refers the user to call the office.

### **Technical Stack & Components**

*   **Orchestration:** `LangGraph` (manages the conversational state and flow between nodes).
*   **Language Model (LLM):** `Google Gemini` (via `ChatGoogleGenerativeAI` from `langchain-google-genai`) for:
    *   Intent classification (buy/sell, new\_home/re\_sale, yes/no).
    *   Generating conversational responses.
*   **Embeddings:** `GoogleGenerativeAIEmbeddings` (model: `models/embedding-001`) for converting postcodes into vector representations.
*   **Vector Search:** `FAISS` (Facebook AI Similarity Search) for:
    *   Building an index of eligible postcode embeddings.
    *   Finding the most similar eligible postcodes if a user's input isn't an exact match (useful for typo correction/suggestions).
*   **Data Handling:** `pandas` for loading the initial list of eligible postcodes.
*   **Core Logic:** Python functions and classes acting as "tools" or "agents":
    *   `IntentClassifierAgent`: Wraps Gemini calls for classification tasks.
    *   `BudgetProcessorTool`: Parses budget strings into numerical values.
    *   `PostcodeProcessorTool`: Handles exact postcode validation and FAISS-based similarity search.
*   **State Management:** A `TypedDict` (`ChatbotState`) holds all relevant information during the conversation (history, extracted details, pending actions).

### **Usefulness & Problem Solving**

**RealtyFlow AI** is particularly useful for:

*   **Real Estate Agencies & Property Developers:** To automate initial customer engagement and lead qualification.

**In essence, RealtyFlow AI streamlines the top of the sales funnel, enhances agent productivity, improves customer experience, and helps reduce operational costs by acting as an intelligent, automated front-desk for real estate businesses.**

In [None]:
!pip install -q langchain langgraph langchain-google-genai google-generativeai pandas faiss-cpu tiktoken python-dotenv

[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m43.5/43.5 kB[0m [31m2.3 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m151.1/151.1 kB[0m [31m6.1 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m42.0/42.0 kB[0m [31m2.8 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m31.3/31.3 MB[0m [31m58.3 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m1.2/1.2 MB[0m [31m56.1 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m42.3/42.3 kB[0m [31m1.9 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m47.6/47.6 kB[0m [31m2.7 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m194.8/194.8 kB[0m [31m12.5 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

In [28]:
import os
import re
from typing import TypedDict, List, Optional, Literal, Set, Union, Tuple
import pandas as pd
import numpy as np
from dotenv import load_dotenv
from langchain_google_genai import ChatGoogleGenerativeAI, GoogleGenerativeAIEmbeddings
from langchain_core.messages import HumanMessage, AIMessage, BaseMessage, SystemMessage
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.pydantic_v1 import BaseModel, Field
from langgraph.graph import StateGraph, END
import faiss

from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [39]:
# Load environment variables
load_dotenv()

# Constants
MIN_BUDGET_NEW_HOME = 1_000_000
COMPANY_PHONE_NUMBER = "1800 111 222"
POSTCODE_FILE = "/content/drive/MyDrive/uk_postcodes 1.csv"

# Set Google API Key
os.environ["GOOGLE_API_KEY"] = os.getenv("GOOGLE_API_KEY", "AIzaSyD9ljvMl4t9ucEnQpi3RfAJsoCgViE7O9Q")

In [40]:
# ==================== STATE DEFINITION ====================
class ChatbotState(TypedDict):
    """
    Represents the state of the chatbot conversation.
    """
    messages: List[BaseMessage]
    main_intent: Optional[Literal["buy", "sell"]]
    buy_type: Optional[Literal["new_home", "re_sale"]]
    name: Optional[str]
    phone: Optional[str]
    email: Optional[str]
    budget: Optional[float]
    budget_raw: Optional[str]
    postcode: Optional[str]
    postcode_raw: Optional[str]
    postcode_covered: Optional[bool]
    suggested_postcode: Optional[str]
    pending_action: str
    attempts: int
    max_attempts: int
    conversation_ended: bool
    error_message: Optional[str]

In [42]:
# ==================== UTILITY FUNCTIONS ====================
def normalize_postcode(postcode: str) -> str:
    """
    Normalize postcode format by removing spaces and converting to uppercase.

    Args:
        postcode (str): The input postcode

    Returns:
        str: Normalized postcode
    """
    return str(postcode).upper().replace(" ", "")

def load_eligible_postcodes(file_path: str) -> Tuple[Set[str], List[str]]:
    """
    Load the list of eligible postcodes from a CSV file.

    Args:
        file_path (str): Path to the postcode CSV file

    Returns:
        Tuple[Set[str], List[str]]: A set and list of normalized postcodes
    """
    try:
        # First check if file exists
        if not os.path.exists(file_path):
            print(f"Warning: Postcode file not found at {file_path}. Using sample data.")
            # Create sample data for demonstration
            sample_postcodes = ["SW1A1AA", "SW1A2AA", "W1A1AA", "E1W3SS", "NW10HE"]
            return set(sample_postcodes), sample_postcodes

        # Read the CSV file
        df = pd.read_csv(file_path)

        # Check if 'Postcode' column exists
        postcode_col = next((col for col in df.columns if col.lower() == 'postcode'), None)
        if not postcode_col:
            print(f"Warning: No 'Postcode' column found in {file_path}. Columns: {df.columns.tolist()}")
            print("Using first column as postcode data.")
            postcode_col = df.columns[0]

        # Extract and normalize postcodes
        postcode_list = [normalize_postcode(pc) for pc in df[postcode_col] if pd.notna(pc) and str(pc).strip()]
        postcode_set = set(postcode_list)

        print(f"Successfully loaded {len(postcode_set)} unique postcodes.")
        return postcode_set, postcode_list

    except Exception as e:
        print(f"Error loading postcodes: {e}")
        # Create fallback sample data
        sample_postcodes = ["SW1A1AA", "SW1A2AA", "W1A1AA", "E1W3SS", "NW10HE"]
        return set(sample_postcodes), sample_postcodes

In [44]:
# ==================== LLM SETUP ====================
def setup_llm_backend(provider: str = "google"):
    """
    Initialize the LLM based on the specified provider.

    Args:
        provider (str): The LLM provider ("google" or "openai")

    Returns:
        Tuple: (llm_instance, embedding_model_instance)
    """
    load_dotenv()  # Load environment variables

    if provider.lower() == "google":
        # Set Google API Key from environment variable
        google_api_key = os.getenv("GOOGLE_API_KEY")

        if not google_api_key:
            raise ValueError("GOOGLE_API_KEY not found in environment variables")

        # Initialize Google Gemini LLM
        llm = ChatGoogleGenerativeAI(
            model="gemini-1.5-flash-latest",
            temperature=0.1,
            google_api_key=google_api_key,
            convert_system_message_to_human=True
        )

        # Initialize Google Gemini Embeddings
        embedding_model = GoogleGenerativeAIEmbeddings(
            model="models/embedding-001",
            google_api_key=google_api_key
        )

        print("Google Gemini LLM and Embeddings initialized.")

    return llm, embedding_model

In [45]:
# ==================== TOOLS AND AGENTS ====================
class IntentClassifier:
    """
    Tool for classifying user intents using the LLM.
    """

    def __init__(self, llm):
        """
        Initialize with an LLM instance.

        Args:
            llm: A LangChain LLM instance
        """
        self.llm = llm

    def _create_classifier_chain(self, allowed_options: List[str], task_description: str):
        """
        Create a chain to classify user input into predefined categories.

        Args:
            allowed_options (List[str]): List of allowed classification options
            task_description (str): Description of the classification task

        Returns:
            Chain: A LangChain chain for classification
        """
        prompt_text = f"""Your task is to classify the user's message.
Task Description: {task_description}
Allowed Categories: {', '.join(allowed_options)}.
Based on the user's message, respond with ONLY ONE of the allowed category names.
Do not add any other text, explanation, or punctuation.
If the user's intent is unclear, ambiguous, or doesn't fit any category, respond with "unknown".

User message: {{user_message}}
Classification:"""

        prompt = ChatPromptTemplate.from_template(prompt_text)
        return prompt | self.llm | StrOutputParser()

    def get_main_intent(self, user_message: str) -> Literal["buy", "sell", "unknown"]:
        """
        Classify whether user wants to buy or sell a property.

        Args:
            user_message (str): The user's input message

        Returns:
            str: "buy", "sell", or "unknown"
        """
        chain = self._create_classifier_chain(
            ["buy", "sell"],
            "Determine if the user wants to 'buy' or 'sell' a property."
        )
        return chain.invoke({"user_message": user_message}).strip().lower()

    def get_buy_type(self, user_message: str) -> Literal["new_home", "re_sale", "unknown"]:
        """
        Classify whether user is interested in a new home or resale property.

        Args:
            user_message (str): The user's input message

        Returns:
            str: "new_home", "re_sale", or "unknown"
        """
        chain = self._create_classifier_chain(
            ["new_home", "re_sale"],
            "Determine if the user is interested in a 'new_home' or a 're_sale' home. 'New home' implies a newly built property. 'Re-sale' implies an existing property."
        )
        return chain.invoke({"user_message": user_message}).strip().lower()

    def get_yes_no(self, user_message: str) -> Literal["yes", "no", "unknown"]:
        """
        Classify whether user's response means yes or no.

        Args:
            user_message (str): The user's input message

        Returns:
            str: "yes", "no", or "unknown"
        """
        chain = self._create_classifier_chain(
            ["yes", "no"],
            "Determine if the user's response means 'yes' or 'no'."
        )
        return chain.invoke({"user_message": user_message}).strip().lower()

In [46]:
class BudgetProcessor:
    """
    Tool for parsing budget information from user input.
    """

    def parse_budget(self, text: str) -> Optional[float]:
        """
        Extract and parse budget amount from text.

        Args:
            text (str): User's input containing budget information

        Returns:
            Optional[float]: Parsed budget amount or None if parsing failed
        """
        if not text:
            return None

        text_lower = str(text).lower()

        # Skip words that shouldn't be interpreted as numbers
        context_keywords = ["email", "name", "home", "time", "team"]
        if any(kw in text_lower for kw in context_keywords) and not re.search(r'[\$£€]\s*\d+', text_lower):
            return None

        # Default multiplier
        multiplier = 1.0

        # Check for millions
        if ("million" in text_lower or "mil" in text_lower or
            (re.search(r'\b\d+(\.\d+)?\s*m\b', text_lower)) or
            (re.search(r'\b\d+(\.\d+)?m\b', text_lower))):
            multiplier = 1_000_000.0
            text_lower = text_lower.replace("million", "").replace("mil", "")
            text_lower = re.sub(r'(?<=\d)m\b', '', text_lower)

        # Check for thousands
        elif ("thousand" in text_lower or "grand" in text_lower or "k" in text_lower or
              (re.search(r'\b\d+(\.\d+)?\s*k\b', text_lower)) or
              (re.search(r'\b\d+(\.\d+)?k\b', text_lower))):
            multiplier = 1_000.0
            text_lower = text_lower.replace("thousand", "").replace("grand", "")
            text_lower = re.sub(r'(?<=\d)k\b', '', text_lower)

        # Remove currency symbols and words
        text_lower = re.sub(r"[£$€,]|aud|gbp|usd", "", text_lower)

        # Find all numbers in the text
        numbers = re.findall(r"\b\d+(?:\.\d+)?\b", text_lower)

        if not numbers:
            return None

        try:
            parsed_values = [float(num_str) * multiplier for num_str in numbers]
            return max(parsed_values) if parsed_values else None
        except ValueError:
            return None

In [47]:
class PostcodeProcessor:
    """
    Tool for validating postcodes and finding similar postcodes using FAISS.
    """

    def __init__(self, eligible_postcodes_set: Set[str], eligible_postcodes_list: List[str], embedding_model):
        """
        Initialize the postcode processor.

        Args:
            eligible_postcodes_set (Set[str]): Set of eligible postcodes for exact matching
            eligible_postcodes_list (List[str]): List of eligible postcodes for FAISS indexing
            embedding_model: The embedding model for vectorizing postcodes
        """
        self.eligible_set = eligible_postcodes_set
        self.eligible_list = eligible_postcodes_list
        self.embedding_model = embedding_model
        self.index = None
        self.dimension = 0

        if self.eligible_list:
            self._build_faiss_index()

    def _build_faiss_index(self):
        """
        Build a FAISS index for fast similarity search of postcodes.
        """
        try:
            print("Building FAISS index for postcodes...")

            postcode_texts = [str(pc) for pc in self.eligible_list]
            if not postcode_texts:
                print("No postcodes to index.")
                return

            # Get embeddings for all postcodes
            embeddings = np.array(self.embedding_model.embed_documents(postcode_texts)).astype('float32')

            # Handle potential issues with embeddings
            if embeddings.ndim == 1:
                embeddings = embeddings.reshape(1, -1)

            if embeddings.shape[0] == 0:
                print("No embeddings generated for postcodes.")
                return

            # Set dimension and create index
            self.dimension = embeddings.shape[1]
            self.index = faiss.IndexFlatL2(self.dimension)
            self.index.add(embeddings)

            print(f"FAISS index built with {self.index.ntotal} postcodes, dimension {self.dimension}.")

        except Exception as e:
            print(f"Error building FAISS index: {e}")
            self.index = None

    def is_covered(self, postcode: str) -> bool:
        """
        Check if a postcode is covered (available in the eligible set).

        Args:
            postcode (str): The postcode to check

        Returns:
            bool: True if postcode is covered, False otherwise
        """
        return normalize_postcode(postcode) in self.eligible_set

    def find_similar_postcodes(self, postcode: str, k: int = 1) -> Optional[List[Tuple[str, float]]]:
        """
        Find similar postcodes using FAISS.

        Args:
            postcode (str): The postcode to find similar matches for
            k (int): Number of similar postcodes to return

        Returns:
            Optional[List[Tuple[str, float]]]: List of (postcode, distance) tuples or None if no matches
        """
        if not self.index or not self.eligible_list:
            return None

        try:
            # Get embedding for query postcode
            query_embedding = np.array(self.embedding_model.embed_query(postcode)).astype('float32')

            # Reshape if needed
            if query_embedding.ndim == 1:
                query_embedding = query_embedding.reshape(1, -1)

            # Search for similar postcodes
            distances, indices = self.index.search(query_embedding, k)

            # Format results
            results = []
            for i in range(len(indices[0])):
                idx = indices[0][i]
                if 0 <= idx < len(self.eligible_list):
                    results.append((self.eligible_list[idx], float(distances[0][i])))

            return results if results else None

        except Exception as e:
            print(f"Error searching FAISS index for postcode '{postcode}': {e}")
            return None

In [50]:
# ==================== GRAPH NODES ====================
def start_conversation(state: ChatbotState) -> ChatbotState:
    """
    Initialize the conversation with a greeting.

    Args:
        state (ChatbotState): Current state (ignored)

    Returns:
        ChatbotState: Updated state with initial greeting
    """
    initial_message = "Hello! Welcome to RealtyFlow. How may I help you? (e.g., 'I want to buy', 'I'm looking to sell')"

    # Initialize fresh state
    return ChatbotState(
        messages=[AIMessage(content=initial_message)],
        main_intent=None,
        buy_type=None,
        name=None,
        phone=None,
        email=None,
        budget=None,
        budget_raw=None,
        postcode=None,
        postcode_raw=None,
        postcode_covered=None,
        suggested_postcode=None,
        pending_action="process_initial_intent",
        attempts=0,
        max_attempts=5,
        conversation_ended=False,
        error_message=None
    )

def process_initial_intent(state: ChatbotState) -> ChatbotState:
    """
    Process the user's initial intent (buy or sell).

    Args:
        state (ChatbotState): Current state

    Returns:
        ChatbotState: Updated state based on user's intent
    """
    # Get the last message from the user
    messages = state["messages"]
    if not messages or len(messages) < 2 or not isinstance(messages[-1], HumanMessage):
        return {
            **state,
            "error_message": "No user message found.",
            "attempts": state["attempts"] + 1
        }

    user_input = messages[-1].content

    # Use the intent classifier to determine if user wants to buy or sell
    intent = intent_classifier.get_main_intent(user_input)

    if intent in ["buy", "sell"]:
        # Intent successfully classified
        return {
            **state,
            "main_intent": intent,
            "pending_action": "get_name",
            "messages": state["messages"] + [AIMessage(content="Great. Can I get your name?")],
            "attempts": 0
        }
    else:
        # Intent unclear or not recognized
        if intent != "unknown":
            print(f"Warning: Intent classifier returned unexpected value: '{intent}' for input: '{user_input}'")

        return {
            **state,
            "pending_action": "process_initial_intent",
            "messages": state["messages"] + [AIMessage(content="Sorry, I didn't quite understand. Are you looking to buy or sell a property?")],
            "attempts": state["attempts"] + 1
        }

def get_name(state: ChatbotState) -> ChatbotState:
    """
    Process user's name.

    Args:
        state (ChatbotState): Current state

    Returns:
        ChatbotState: Updated state with user's name
    """
    messages = state["messages"]
    if not messages or len(messages) < 2 or not isinstance(messages[-1], HumanMessage):
        return {
            **state,
            "error_message": "No user message found.",
            "attempts": state["attempts"] + 1
        }

    user_input = messages[-1].content.strip()

    if not user_input:
        return {
            **state,
            "pending_action": "get_name",
            "messages": state["messages"] + [AIMessage(content="I didn't catch your name. Could you please tell me your name?")],
            "attempts": state["attempts"] + 1
        }

    return {
        **state,
        "name": user_input,
        "pending_action": "get_phone",
        "messages": state["messages"] + [AIMessage(content="Can I get your phone number?")],
        "attempts": 0
    }

def get_phone(state: ChatbotState) -> ChatbotState:
    """
    Process user's phone number.

    Args:
        state (ChatbotState): Current state

    Returns:
        ChatbotState: Updated state with user's phone number
    """
    messages = state["messages"]
    if not messages or len(messages) < 2 or not isinstance(messages[-1], HumanMessage):
        return {
            **state,
            "error_message": "No user message found.",
            "attempts": state["attempts"] + 1
        }

    user_input = messages[-1].content.strip()

    # Basic validation - ensure there are enough digits
    # This is a simplified validation, you might want more sophisticated validation in production
    if not user_input or not re.search(r'\d{7,}', user_input):
        return {
            **state,
            "pending_action": "get_phone",
            "messages": state["messages"] + [AIMessage(content="That doesn't look like a valid phone number. Please provide at least 7 digits.")],
            "attempts": state["attempts"] + 1
        }

    return {
        **state,
        "phone": user_input,
        "pending_action": "get_email",
        "messages": state["messages"] + [AIMessage(content="Can I get your email address?")],
        "attempts": 0
    }

def get_email(state: ChatbotState) -> ChatbotState:
    """
    Process user's email address.

    Args:
        state (ChatbotState): Current state

    Returns:
        ChatbotState: Updated state with user's email and next question
    """
    messages = state["messages"]
    if not messages or len(messages) < 2 or not isinstance(messages[-1], HumanMessage):
        return {
            **state,
            "error_message": "No user message found.",
            "attempts": state["attempts"] + 1
        }

    user_input = messages[-1].content.strip()

    # Basic email validation
    if not user_input or "@" not in user_input or "." not in user_input.split('@')[-1]:
        return {
            **state,
            "pending_action": "get_email",
            "messages": state["messages"] + [AIMessage(content="That doesn't look like a valid email address. Please try again.")],
            "attempts": state["attempts"] + 1
        }

    # Determine next question based on intent
    if state["main_intent"] == "buy":
        next_message = "Are you looking for a new home or a re-sale home?"
        next_action = "get_buy_type"
    elif state["main_intent"] == "sell":
        next_message = "What is your postcode?"
        next_action = "get_postcode"
    else:
        return {
            **state,
            "error_message": "Error in flow: main intent not set.",
            "conversation_ended": True
        }

    return {
        **state,
        "email": user_input,
        "pending_action": next_action,
        "messages": state["messages"] + [AIMessage(content=next_message)],
        "attempts": 0
    }

def get_buy_type(state: ChatbotState) -> ChatbotState:
    """
    Process what type of property the user wants to buy.

    Args:
        state (ChatbotState): Current state

    Returns:
        ChatbotState: Updated state with property type
    """
    messages = state["messages"]
    if not messages or len(messages) < 2 or not isinstance(messages[-1], HumanMessage):
        return {
            **state,
            "error_message": "No user message found.",
            "attempts": state["attempts"] + 1
        }

    user_input = messages[-1].content

    # Classify buy type
    buy_type = intent_classifier.get_buy_type(user_input)

    if buy_type in ["new_home", "re_sale"]:
        return {
            **state,
            "buy_type": buy_type,
            "pending_action": "get_budget",
            "messages": state["messages"] + [AIMessage(content="What is your budget?")],
            "attempts": 0
        }
    else:
        if buy_type != "unknown":
            print(f"Warning: Buy type classifier returned unexpected value: '{buy_type}' for input: '{user_input}'")

        return {
            **state,
            "pending_action": "get_buy_type",
            "messages": state["messages"] + [AIMessage(content="Sorry, I didn't catch that. Are you interested in a new home or a re-sale home?")],
            "attempts": state["attempts"] + 1
        }

def get_budget(state: ChatbotState) -> ChatbotState:
    """
    Process user's budget.

    Args:
        state (ChatbotState): Current state

    Returns:
        ChatbotState: Updated state with budget information
    """
    messages = state["messages"]
    if not messages or len(messages) < 2 or not isinstance(messages[-1], HumanMessage):
        return {
            **state,
            "error_message": "No user message found.",
            "attempts": state["attempts"] + 1
        }

    user_input = messages[-1].content

    # Parse budget
    parsed_budget = budget_processor.parse_budget(user_input)

    if parsed_budget is not None:
        # Handle minimum budget requirement for new homes
        if (state["main_intent"] == "buy" and
            state["buy_type"] == "new_home" and
            parsed_budget < MIN_BUDGET_NEW_HOME):

            goodbye_msg = (f"Sorry, we don't cater to new home properties under £{MIN_BUDGET_NEW_HOME:,}. "
                          f"Please call our office on {COMPANY_PHONE_NUMBER} for assistance. "
                          "Thank you for chatting with us. Goodbye.")

            return {
                **state,
                "budget": parsed_budget,
                "budget_raw": user_input,
                "messages": state["messages"] + [AIMessage(content=goodbye_msg)],
                "conversation_ended": True,
                "attempts": 0
            }
        else:
            # Budget is sufficient, ask for postcode
            question = "What is the postcode of the location you're interested in?"

            return {
                **state,
                "budget": parsed_budget,
                "budget_raw": user_input,
                "pending_action": "get_postcode",
                "messages": state["messages"] + [AIMessage(content=question)],
                "attempts": 0
            }
    else:
        # Failed to parse budget
        return {
            **state,
            "budget_raw": user_input,
            "pending_action": "get_budget",
            "messages": state["messages"] + [AIMessage(content="Sorry, I couldn't understand the budget. Please provide it as a number (e.g., 1,200,000, 1.2m, or 500k).")],
            "attempts": state["attempts"] + 1
        }

def get_postcode(state: ChatbotState) -> ChatbotState:
    """
    Process user's postcode and check if it's covered.

    Args:
        state (ChatbotState): Current state

    Returns:
        ChatbotState: Updated state with postcode information
    """
    messages = state["messages"]
    if not messages or len(messages) < 2 or not isinstance(messages[-1], HumanMessage):
        return {
            **state,
            "error_message": "No user message found.",
            "attempts": state["attempts"] + 1
        }

    user_input = messages[-1].content
    normalized_pc = normalize_postcode(user_input)

    # Check if postcode is covered
    is_covered = postcode_processor.is_covered(normalized_pc)

    # Try to find similar postcodes if not covered
    suggestion = None
    faiss_msg_part = ""

    if not is_covered and postcode_processor.index:
        similar = postcode_processor.find_similar_postcodes(normalized_pc)
        if similar:
            suggested_pc, distance = similar[0]
            # Only suggest if reasonably close
            if distance < 0.7:  # Threshold can be adjusted
                suggestion = suggested_pc
                faiss_msg_part = f" (Did you perhaps mean {suggestion}? If so, please re-enter it.)"

    # Update state with postcode info
    updated_state = {
        **state,
        "postcode": normalized_pc,
        "postcode_raw": user_input,
        "postcode_covered": is_covered,
        "suggested_postcode": suggestion,
        "attempts": 0
    }

    # Handle covered postcodes
    if is_covered:
        msg = ("Great! That postcode is covered. Someone will get in touch with you within 24 hours "
              "via phone or email. Is there anything else I can help you with? (yes/no)")

        updated_state.update({
            "pending_action": "check_reassistance",
            "messages": state["messages"] + [AIMessage(content=msg)]
        })
    else:
        # Different handling based on property type
        if state["main_intent"] == "buy" and state["buy_type"] == "new_home":
            # New homes with uncovered postcode - end conversation
            msg = (f"Sorry, we don't cater to that postcode for new homes.{faiss_msg_part} "
                  f"Please call our office on {COMPANY_PHONE_NUMBER} for assistance. Thank you for chatting with us. Goodbye.")

            updated_state.update({
                "messages": state["messages"] + [AIMessage(content=msg)],
                "conversation_ended": True
            })
        else:
            # Resale or sell with uncovered postcode - offer reassistance
            msg = (f"Sorry, we don't currently cover the postcode '{normalized_pc}'.{faiss_msg_part} "
                  f"Please call our office on {COMPANY_PHONE_NUMBER} for assistance. "
                  "Is there anything else I can help you with? (yes/no)")

            updated_state.update({
                "pending_action": "check_reassistance",
                "messages": state["messages"] + [AIMessage(content=msg)]
            })

    return updated_state


def check_reassistance(state: ChatbotState) -> ChatbotState:
    """
    Process whether user wants additional assistance.

    Args:
        state (ChatbotState): Current state

    Returns:
        ChatbotState: Updated state based on user's response
    """
    messages = state["messages"]
    if not messages or len(messages) < 2 or not isinstance(messages[-1], HumanMessage):
        return {
            **state,
            "error_message": "No user message found.",
            "attempts": state["attempts"] + 1
        }

    user_input = messages[-1].content

    # Determine if user wants more assistance
    response = intent_classifier.get_yes_no(user_input)

    if response == "yes":
        # User wants more assistance - restart conversation
        restart_msg = "What else can I help you with today? Are you looking to buy or sell a property?"
        return {
            **state,
            "main_intent": None,
            "buy_type": None,
            "budget": None,
            "budget_raw": None,
            "postcode": None,
            "postcode_raw": None,
            "postcode_covered": None,
            "suggested_postcode": None,
            "pending_action": "process_initial_intent",
            "messages": state["messages"] + [AIMessage(content=restart_msg)],
            "attempts": 0
        }
    elif response == "no":
        # User doesn't want more assistance - end conversation
        farewell_msg = "Thank you for chatting with us. We'll be in touch if needed. Have a great day!"
        return {
            **state,
            "messages": state["messages"] + [AIMessage(content=farewell_msg)],
            "conversation_ended": True,
            "attempts": 0
        }
    else:
        # Unclear response
        if response != "unknown":
            print(f"Warning: Yes/No classifier returned unexpected value: '{response}' for input: '{user_input}'")

        return {
            **state,
            "pending_action": "check_reassistance",
            "messages": state["messages"] + [AIMessage(content="Sorry, I didn't catch that. Would you like help with anything else? (yes/no)")],
            "attempts": state["attempts"] + 1
        }

def handle_max_attempts(state: ChatbotState) -> ChatbotState:
    """
    Handle cases where the maximum number of attempts has been exceeded.

    Args:
        state (ChatbotState): Current state

    Returns:
        ChatbotState: Updated state with fallback message
    """
    fallback_msg = (f"I'm having trouble understanding your responses. "
                  f"For better assistance, please call our office at {COMPANY_PHONE_NUMBER}. "
                  "Thank you for your patience.")

    return {
        **state,
        "messages": state["messages"] + [AIMessage(content=fallback_msg)],
        "conversation_ended": True
    }

def error_handler(state: ChatbotState) -> ChatbotState:
    """
    Handle errors encountered during conversation flow.

    Args:
        state (ChatbotState): Current state with error information

    Returns:
        ChatbotState: Updated state with error message
    """
    error_msg = "I apologize, but I encountered an issue processing your request. "

    if state.get("error_message"):
        print(f"Error in conversation flow: {state['error_message']}")
        if "dev_mode" in state and state["dev_mode"]:
            error_msg += f"Error details: {state['error_message']}. "

    error_msg += f"Please try again or contact our support team at {COMPANY_PHONE_NUMBER}."

    return {
        **state,
        "messages": state["messages"] + [AIMessage(content=error_msg)],
        "conversation_ended": True
    }

def should_end_conversation(state: ChatbotState) -> bool:
    """
    Determine if the conversation should end.

    Args:
        state (ChatbotState): Current state

    Returns:
        bool: True if conversation should end, False otherwise
    """
    return state.get("conversation_ended", False)

def should_handle_max_attempts(state: ChatbotState) -> bool:
    """
    Determine if max attempts have been exceeded.

    Args:
        state (ChatbotState): Current state

    Returns:
        bool: True if max attempts exceeded, False otherwise
    """
    return state.get("attempts", 0) >= state.get("max_attempts", MAX_ATTEMPTS)

def should_handle_error(state: ChatbotState) -> bool:
    """
    Determine if an error occurred that needs handling.

    Args:
        state (ChatbotState): Current state

    Returns:
        bool: True if error occurred, False otherwise
    """
    return state.get("error_message") is not None

In [51]:
# ==================== GRAPH CONSTRUCTION ====================
def build_conversation_graph() -> StateGraph:
    """
    Build the conversation flow graph.

    Returns:
        StateGraph: The constructed graph
    """
    # Create new graph
    workflow = StateGraph(ChatbotState)

    # Add all nodes
    workflow.add_node("start_conversation", start_conversation)
    workflow.add_node("process_initial_intent", process_initial_intent)
    workflow.add_node("get_name", get_name)
    workflow.add_node("get_phone", get_phone)
    workflow.add_node("get_email", get_email)
    workflow.add_node("get_buy_type", get_buy_type)
    workflow.add_node("get_budget", get_budget)
    workflow.add_node("get_postcode", get_postcode)
    workflow.add_node("check_reassistance", check_reassistance)
    workflow.add_node("handle_max_attempts", handle_max_attempts)
    workflow.add_node("error_handler", error_handler)

    # Set the entrypoint
    workflow.set_entry_point("start_conversation")

    # Add conditional edges based on pending_action
    workflow.add_conditional_edges(
        "process_initial_intent",
        lambda x: "handle_max_attempts" if should_handle_max_attempts(x) else x["pending_action"]
    )

    workflow.add_conditional_edges(
        "get_name",
        lambda x: "handle_max_attempts" if should_handle_max_attempts(x) else x["pending_action"]
    )

    workflow.add_conditional_edges(
        "get_phone",
        lambda x: "handle_max_attempts" if should_handle_max_attempts(x) else x["pending_action"]
    )

    workflow.add_conditional_edges(
        "get_email",
        lambda x: "handle_max_attempts" if should_handle_max_attempts(x) else x["pending_action"]
    )

    workflow.add_conditional_edges(
        "get_buy_type",
        lambda x: "handle_max_attempts" if should_handle_max_attempts(x) else x["pending_action"]
    )

    workflow.add_conditional_edges(
        "get_budget",
        lambda x: "handle_max_attempts" if should_handle_max_attempts(x) else x["pending_action"]
    )

    workflow.add_conditional_edges(
        "get_postcode",
        lambda x: "handle_max_attempts" if should_handle_max_attempts(x) else x["pending_action"]
    )

    workflow.add_conditional_edges(
        "check_reassistance",
        lambda x: "handle_max_attempts" if should_handle_max_attempts(x) else x["pending_action"]
    )

    # Add edges for max attempts and errors
    workflow.add_edge("handle_max_attempts", END)
    workflow.add_edge("error_handler", END)

    # Add conditional edges for conversation ending
    for node in ["process_initial_intent", "get_name", "get_phone", "get_email",
                "get_buy_type", "get_budget", "get_postcode", "check_reassistance"]:
        workflow.add_conditional_edges(
            node,
            lambda x, node=node: END if should_end_conversation(x) else
                                "error_handler" if should_handle_error(x) else node
        )

    # Compile the graph
    return workflow.compile()

In [54]:
from typing import Dict, Any
# ==================== MAIN APPLICATION ====================
class RealtyFlowChatbot:
    """
    Main chatbot class that manages the conversation flow.
    """

    def __init__(self, llm_provider: str = "google", postcode_file: str = POSTCODE_FILE):
        """
        Initialize the RealtyFlow chatbot.

        Args:
            llm_provider (str): LLM provider to use ("google" or "openai")
            postcode_file (str): Path to the postcode CSV file
        """
        print(f"Initializing RealtyFlow Chatbot with {llm_provider} provider...")

        # Set up LLM and embedding model
        self.llm, self.embedding_model = setup_llm_backend(llm_provider)

        # Load postcodes
        self.eligible_postcodes_set, self.eligible_postcodes_list = load_eligible_postcodes(postcode_file)

        # Set up tools
        global intent_classifier, budget_processor, postcode_processor
        intent_classifier = IntentClassifier(self.llm)
        budget_processor = BudgetProcessor()
        postcode_processor = PostcodeProcessor(
            self.eligible_postcodes_set,
            self.eligible_postcodes_list,
            self.embedding_model
        )

        # Build conversation graph
        self.graph = build_conversation_graph()

        # Initialize state
        self.state = None
        self.reset()

        print("RealtyFlow Chatbot initialized successfully.")

    def reset(self):
        """Reset the conversation to its initial state."""
        self.state = None

    def process_message(self, user_message: str) -> str:
        """
        Process a user message and return the chatbot's response.

        Args:
            user_message (str): User's input message

        Returns:
            str: Chatbot's response
        """
        try:
            # Initialize state if none exists
            if self.state is None:
                # First run through the graph to get initial greeting
                self.state = self.graph.invoke({})
                initial_greeting = self.state["messages"][0].content

                # Add user message to state
                self.state["messages"].append(HumanMessage(content=user_message))
            else:
                # Add user message to existing state
                self.state["messages"].append(HumanMessage(content=user_message))

                # Process through graph
                self.state = self.graph.invoke(self.state)

            # Get the latest assistant message
            if self.state["messages"] and isinstance(self.state["messages"][-1], AIMessage):
                return self.state["messages"][-1].content
            else:
                return "I'm sorry, something went wrong in our conversation flow."

        except Exception as e:
            print(f"Error processing message: {e}")
            return f"I apologize, but an error occurred. Please try again or contact support at {COMPANY_PHONE_NUMBER}."

    def get_collected_info(self) -> Dict[str, Any]:
        """
        Get information collected during the conversation.

        Returns:
            Dict[str, Any]: Dictionary with collected information
        """
        if not self.state:
            return {}

        return {
            "name": self.state.get("name"),
            "phone": self.state.get("phone"),
            "email": self.state.get("email"),
            "main_intent": self.state.get("main_intent"),
            "buy_type": self.state.get("buy_type"),
            "budget": self.state.get("budget"),
            "postcode": self.state.get("postcode"),
            "postcode_covered": self.state.get("postcode_covered")
        }

    def is_conversation_ended(self) -> bool:
        """
        Check if the conversation has ended.

        Returns:
            bool: True if conversation has ended, False otherwise
        """
        return self.state and self.state.get("conversation_ended", False)

In [55]:
# ==================== EXAMPLE USAGE ====================
def run_example_conversation():
    """
    Run an example conversation with the chatbot.
    """
    chatbot = RealtyFlowChatbot()

    print("\n===== RealtyFlow AI Chatbot Demo =====\n")
    print("[System] Bot initialized. Send a message to begin.")

    # Get initial greeting
    bot_response = chatbot.process_message("start")
    print(f"Bot: {bot_response}")

    # Continue conversation until ended
    while not chatbot.is_conversation_ended():
        user_input = input("You: ")
        if user_input.lower() in ["quit", "exit", "bye"]:
            print("Exiting conversation.")
            break

        bot_response = chatbot.process_message(user_input)
        print(f"Bot: {bot_response}")

    # Print collected information
    print("\n===== Information Collected =====")
    info = chatbot.get_collected_info()
    for key, value in info.items():
        if value is not None:
            print(f"{key.capitalize()}: {value}")

    print("\nConversation ended.")

# Run if script is executed directly
if __name__ == "__main__":
    run_example_conversation()

Initializing RealtyFlow Chatbot with google provider...
Google Gemini LLM and Embeddings initialized.
Successfully loaded 100 unique postcodes.
Building FAISS index for postcodes...
FAISS index built with 100 postcodes, dimension 768.


ValueError: Branch with name `None` already exists for node `process_initial_intent`