<a href="https://colab.research.google.com/github/KaifAhmad1/code-test/blob/main/CommVersion_Assignment_Mohd_Kaif.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### **RealtyFlow: Real Estate Lead Qualification**

#### **1. Project Goal**
RealtyFlow AI is a conversational AI that automates real estate lead qualification. It gathers user intent (buy/sell), contact details, budget, and location preferences through a structured dialogue.

#### **2. Core Approach & Technologies**
*   **Orchestration:** `LangGraph` manages the stateful conversation flow.
*   **LLM:** `Google Gemini` (via `langchain-google-genai`) for:
    *   Natural Language Understanding (intent classification, input validation).
    *   Natural Language Generation (dynamic responses).
*   **Vector Search:** `FAISS` with `GoogleGenerativeAIEmbeddings` suggests similar postcodes (typo correction, nearby options) when an input is not in the approved list; exact coverage is checked with a set lookup.
*   **State:** A `ChatState` TypedDict tracks all conversational variables (user details, history, current stage).
*   **Data:** `pandas` for loading eligible postcodes.
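
As a rough illustration of the postcode check described above, the sketch below normalizes an input, tests exact membership in an approved set, and falls back to a nearest-match suggestion. It swaps in the standard library's `difflib.get_close_matches` for the FAISS embedding search so that it runs offline without an API key. The `APPROVED` list and the `normalize` and `check_postcode` helpers are hypothetical names used for illustration only.

```python
import re
from difflib import get_close_matches
from typing import Optional, Tuple

# Hypothetical approved list; the notebook loads its list from a CSV file.
APPROVED = ["SW1A1AA", "WC2N5DU", "EH11BQ", "M11AE"]
APPROVED_SET = set(APPROVED)


def normalize(postcode: str) -> str:
    """Upper-case the input and drop everything except letters and digits."""
    return re.sub(r"[^A-Z0-9]", "", postcode.upper())


def check_postcode(raw: str) -> Tuple[bool, Optional[str]]:
    """Return (covered, suggestion).

    String similarity from difflib stands in for the embedding-based
    FAISS search; this is an assumption made to keep the sketch offline.
    """
    pc = normalize(raw)
    if pc in APPROVED_SET:
        return True, None
    matches = get_close_matches(pc, APPROVED, n=1, cutoff=0.6)
    return False, (matches[0] if matches else None)


print(check_postcode("sw1a 1aa"))   # exact match after normalization
print(check_postcode("SW1A 1AB"))   # near miss, so a suggestion is offered
print(check_postcode("ZZ99 9ZZ"))   # nothing similar in the approved list
```

Keeping the exact lookup separate from the similarity search means a covered postcode never triggers an embedding call; only misses pay for the suggestion step.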

#### **3. Key Functionality**
1.  **Greeting & Intent:** Determines if the user wants to "buy" or "sell".
2.  **Contact Info:** Collects and validates name, phone, and email.
3.  **Buy Flow:**
    *   Asks for property type ("new home" / "re-sale").
    *   Gathers budget (new homes require a minimum budget of £1,000,000).
    *   Asks for target postcode (checks coverage, stricter for new homes).
4.  **Sell Flow:**
    *   Asks for property postcode (checks coverage).
5.  **Postcode Handling:**
    *   Validates format with a UK postcode regex.
    *   Checks against an internal list of serviceable postcodes.
    *   Uses FAISS to suggest similar valid postcodes if the input is not an exact match or not covered.
6.  **Outcome & Reassistance:**
    *   Tells the user when the postcode is covered and that an agent will follow up.
    *   If the postcode is not covered (especially for new home purchases), or if an unsupported path is taken, refers the user to the office, with FAISS suggestions where available.
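
The flow above can be read as a small state machine. The sketch below is a minimal, LangGraph-free approximation of it, assuming each answer has already been validated. `Stage` and `next_stage` are hypothetical names, the stage list is simplified, and the £1,000,000 threshold mirrors the notebook's `MIN_BUDGET_NEW_HOME` constant.

```python
from enum import Enum
from typing import Optional

MIN_BUDGET_NEW_HOME = 1_000_000  # same threshold as the notebook's constant


class Stage(str, Enum):
    INTENT = "awaiting_intent"
    NAME = "awaiting_name"
    PHONE = "awaiting_phone"
    EMAIL = "awaiting_email"
    BUY_TYPE = "awaiting_buy_type"
    BUDGET = "awaiting_budget"
    POSTCODE = "awaiting_postcode"
    REASSISTANCE = "awaiting_reassistance"


def next_stage(stage: Stage, intent: str, buy_type: Optional[str] = None,
               budget: Optional[float] = None) -> Stage:
    """Hypothetical router: pick the stage that follows a valid answer."""
    if stage is Stage.INTENT:
        return Stage.NAME
    if stage is Stage.NAME:
        return Stage.PHONE
    if stage is Stage.PHONE:
        return Stage.EMAIL
    if stage is Stage.EMAIL:
        # Buyers are asked for a property type; sellers go straight to postcode.
        return Stage.BUY_TYPE if intent == "buy" else Stage.POSTCODE
    if stage is Stage.BUY_TYPE:
        return Stage.BUDGET
    if stage is Stage.BUDGET:
        # New-home buyers under the minimum are referred to the office.
        if buy_type == "new_home" and budget is not None and budget < MIN_BUDGET_NEW_HOME:
            return Stage.REASSISTANCE
        return Stage.POSTCODE
    # After the postcode check, the bot asks whether further help is needed.
    return Stage.REASSISTANCE


print(next_stage(Stage.EMAIL, "sell").value)
print(next_stage(Stage.BUDGET, "buy", "new_home", 750_000).value)
```

In the notebook itself the stage lives in `ChatState["conversation_stage"]` and each handler sets the next stage directly; a single routing function like this one is just a compact way to see the whole path at once.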

In [None]:
!pip install -q langchain langgraph langchain-google-genai google-generativeai pandas faiss-cpu tiktoken python-dotenv


In [None]:
import os
import re
import json
import uuid
import pandas as pd
import numpy as np
import faiss
from langchain_google_genai import ChatGoogleGenerativeAI, GoogleGenerativeAIEmbeddings
from langchain_core.messages import AIMessage, HumanMessage, SystemMessage, BaseMessage
from langchain_core.prompts import ChatPromptTemplate, PromptTemplate
from langchain_core.output_parsers import StrOutputParser, JsonOutputParser
from langgraph.graph import StateGraph, END
from typing import Dict, List, Optional, Set, Tuple, Any, Union, TypedDict
from enum import Enum

from google.colab import drive
drive.mount('/content/drive')

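# Read the API key from the environment instead of hard-coding it in the notebook.
if not os.environ.get("GOOGLE_API_KEY"):
    from getpass import getpass
    os.environ["GOOGLE_API_KEY"] = getpass("Enter your Google API key: ")
GOOGLE_API_KEY = os.environ["GOOGLE_API_KEY"]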

MIN_BUDGET_NEW_HOME = 1_000_000
COMPANY_PHONE_NUMBER = "1800 111 222"
MAX_ATTEMPTS = 3
POSTCODE_FILE = "/content/drive/MyDrive/uk_postcodes 1.csv"

class Intent(str, Enum):
    BUY = "buy"
    SELL = "sell"
    UNKNOWN = "unknown"

class BuyType(str, Enum):
    NEW_HOME = "new_home"
    RE_SALE = "re_sale"
    UNKNOWN = "unknown"

class YesNo(str, Enum):
    YES = "yes"
    NO = "no"
    UNKNOWN = "unknown"

class ConversationStage(str, Enum):
    GREETING = "greeting"
    AWAITING_INTENT = "awaiting_intent"
    AWAITING_NAME = "awaiting_name"
    AWAITING_PHONE = "awaiting_phone"
    AWAITING_EMAIL = "awaiting_email"
    AWAITING_BUY_TYPE = "awaiting_buy_type"
    AWAITING_BUDGET = "awaiting_budget"
    AWAITING_POSTCODE = "awaiting_postcode"
    AWAITING_REASSISTANCE = "awaiting_reassistance"
    AWAITING_FOLLOWUP = "awaiting_followup"
    ENDED = "ended"
    ERROR = "error"

class ChatState(TypedDict):
    messages: List[BaseMessage]
    intent: Optional[Intent]
    buy_type: Optional[BuyType]
    name: Optional[str]
    phone: Optional[str]
    email: Optional[str]
    budget: Optional[float]
    postcode: Optional[str]
    postcode_covered: Optional[bool]
    suggested_postcode: Optional[str]
    attempts: int
    conversation_ended: bool
    conversation_stage: ConversationStage
    last_error: Optional[str]
    session_id: str
    interaction_history: List[Dict[str, Any]]



In [None]:
# Utility Functions
def normalize_postcode(postcode: str) -> str:
    if not postcode: return ""
    return re.sub(r'[^A-Z0-9]', '', postcode.upper())

def load_eligible_postcodes(file_path: str) -> Tuple[Set[str], List[str]]:
    sample_postcodes = ["SW1A1AA", "WC2N5DU", "EH11BQ", "M11AE", "B11HH", "L18JQ", "CF101AU"]
    try:
        if not os.path.exists(file_path):
            print(f"{file_path} not found. Creating a dummy file with sample postcodes.")
            pd.DataFrame({'Postcode': sample_postcodes}).to_csv(file_path, index=False)

        df = pd.read_csv(file_path)
        postcode_col = next((col for col in df.columns if 'postcode' in col.lower()), df.columns[0])
        postcodes = [normalize_postcode(str(pc)) for pc in df[postcode_col] if pd.notna(pc) and str(pc).strip()]
        if not postcodes:
            print(f"No valid postcodes found in {file_path}. Using sample data.")
            return set(sample_postcodes), sample_postcodes
        print(f"Loaded {len(postcodes)} eligible postcodes from {file_path}")
        return set(postcodes), postcodes
    except Exception as e:
        print(f"Error loading postcodes from {file_path}: {e}. Using sample data.")
        return set(sample_postcodes), sample_postcodes

def log_interaction(state: ChatState, action_type: str, user_input: Optional[str] = None, bot_response: Optional[str] = None, details: Optional[Dict] = None) -> ChatState:
    interaction = {
        "timestamp": pd.Timestamp.now().isoformat(),
        "session_id": state["session_id"],
        "stage": state["conversation_stage"].value if isinstance(state["conversation_stage"], Enum) else state["conversation_stage"],
        "action_type": action_type,
        "attempts_at_stage": state["attempts"],
        "user_input": user_input,
        "bot_response": bot_response,
        "current_intent": state.get("intent", Intent.UNKNOWN).value if isinstance(state.get("intent"), Enum) else state.get("intent"),
        "details": details or {}
    }
    state["interaction_history"].append(interaction)
    return state

# LLM and Embedding Models
try:
    llm = ChatGoogleGenerativeAI(model="gemini-1.5-flash-latest", google_api_key=os.environ["GOOGLE_API_KEY"], convert_system_message_to_human=True)
    embedding_model = GoogleGenerativeAIEmbeddings(model="models/embedding-001", google_api_key=os.environ["GOOGLE_API_KEY"])
except Exception as e:
    print(f"Failed to initialize Google Generative AI models: {e}")
    raise

In [None]:
# Agent Implementations
class EnhancedIntentClassifierAgent:
    def __init__(self, llm_model):
        self.llm = llm_model

    def _build_classification_chain(self, entity_name: str, allowed_options: List[str], description: str, examples: List[Dict[str, str]]):
        examples_text = "\n".join([f"User: \"{e['input']}\" -> Assistant: {e['output']}" for e in examples])
        prompt_template_str = f"""Your task is to classify the user's input for '{entity_name}'.
{description}
Allowed classifications: {', '.join(allowed_options)}.
Respond with ONLY ONE of the allowed classifications. Do not add any other text or explanation.

Examples:
{examples_text}

User: "{{user_message}}"
Assistant:"""
        prompt = ChatPromptTemplate.from_template(prompt_template_str)
        return prompt | self.llm | StrOutputParser()

    def classify_intent(self, user_message: str) -> Intent:
        examples = [
            {"input": "I want to buy a house", "output": "buy"}, {"input": "I'm looking to sell my apartment", "output": "sell"},
            {"input": "Purchase property", "output": "buy"}, {"input": "List my home", "output": "sell"},
            {"input": "Can you help me find a place?", "output": "buy"}, {"input": "Need to offload my current place", "output": "sell"}
        ]
        chain = self._build_classification_chain("User Intent", [Intent.BUY.value, Intent.SELL.value],
                                               "Determine if the user primarily wants to buy or sell a property.", examples)
        try:
            result = chain.invoke({"user_message": user_message}).strip().lower()
            print(f"Intent classification for '{user_message}': {result}")
            return Intent(result) if result in [Intent.BUY.value, Intent.SELL.value] else Intent.UNKNOWN
        except Exception as e:
            print(f"Error in intent classification: {e}")
            return Intent.UNKNOWN

    def classify_buy_type(self, user_message: str) -> BuyType:
        examples = [
            {"input": "A new build", "output": "new_home"}, {"input": "Something pre-owned", "output": "re_sale"},
            {"input": "Newly constructed", "output": "new_home"}, {"input": "An existing house", "output": "re_sale"},
        ]
        chain = self._build_classification_chain("Property Type for Buyer", [BuyType.NEW_HOME.value, BuyType.RE_SALE.value],
                                               "Determine if the buyer is looking for a new home or a re-sale property.", examples)
        try:
            result = chain.invoke({"user_message": user_message}).strip().lower()
            print(f"Buy type classification for '{user_message}': {result}")
            return BuyType(result) if result in [BuyType.NEW_HOME.value, BuyType.RE_SALE.value] else BuyType.UNKNOWN
        except Exception as e:
            print(f"Error in buy type classification: {e}")
            return BuyType.UNKNOWN

    def classify_yes_no(self, user_message: str) -> YesNo:
        examples = [
            {"input": "Yes, please", "output": "yes"}, {"input": "Nope", "output": "no"},
            {"input": "Sure", "output": "yes"}, {"input": "Not right now", "output": "no"},
            {"input": "Affirmative", "output": "yes"}, {"input": "I don't think so", "output": "no"}
        ]
        chain = self._build_classification_chain("Yes/No Answer", [YesNo.YES.value, YesNo.NO.value],
                                               "Determine if the user's response is affirmative (yes) or negative (no).", examples)
        try:
            result = chain.invoke({"user_message": user_message}).strip().lower()
            print(f"Yes/No classification for '{user_message}': {result}")
            return YesNo(result) if result in [YesNo.YES.value, YesNo.NO.value] else YesNo.UNKNOWN
        except Exception as e:
            print(f"Error in yes/no classification: {e}")
            return YesNo.UNKNOWN

In [None]:
class EnhancedInfoGathererAgent:
    def __init__(self, llm_model):
        self.llm = llm_model

    def _validate_with_llm(self, field_name: str, value_to_validate: str) -> Tuple[bool, str]:
        template_str = """You are a data validation assistant.
Task: Check if the provided value for the field '{field}' is plausible and correctly formatted.
Input Value: {input_value}

Instructions:
- Analyze the input value.
- For 'name', it should look like a real name (not gibberish, very short, or excessively long).
- For 'phone', it should resemble a phone number (mostly digits, appropriate length).
- For 'email', it must contain '@' and a '.' in the domain part.
- Respond with a JSON object containing two keys:
  - "is_valid": boolean (true if valid, false otherwise)
  - "reason": string (brief explanation if invalid, or "Valid." if valid)

Example for an invalid name: {{"is_valid": false, "reason": "Name appears to be too short or not a typical name."}}
Example for a valid email: {{"is_valid": true, "reason": "Valid email format."}}

JSON Response:"""
        prompt = PromptTemplate.from_template(template_str)
        parser = JsonOutputParser(pydantic_object=None)

        chain = prompt | self.llm | parser
        str_chain_for_fallback = prompt | self.llm | StrOutputParser()

        try:
            result = chain.invoke({"field": field_name, "input_value": value_to_validate})
            return result.get("is_valid", False), result.get("reason", f"Validation inconclusive for {field_name}")
        except Exception as e_json:
            print(f"JsonOutputParser failed for {field_name} validation: {e_json}. Retrying with string parsing from raw output.")
            raw_output = str_chain_for_fallback.invoke({"field": field_name, "input_value": value_to_validate})
            print(f"Raw LLM output for {field_name} validation: {raw_output}")

            try:
                start_index = raw_output.find('{')
                end_index = raw_output.rfind('}')
                if start_index != -1 and end_index != -1 and end_index > start_index:
                    json_str = raw_output[start_index : end_index+1]
                    result = json.loads(json_str)
                    return result.get("is_valid", False), result.get("reason", f"Validation inconclusive for {field_name} (raw parse)")
                else:
                    # Check the negative case first: "invalid." contains "valid." as a substring.
                    lowered = raw_output.lower()
                    if "is_valid\": false" in lowered or "invalid" in lowered:
                        return False, "Invalid (inferred from text)."
                    if "is_valid\": true" in lowered or "valid." in lowered:
                        return True, "Valid (inferred from text)."

            except json.JSONDecodeError:
                print(f"Failed to parse JSON from raw output for {field_name}: {raw_output}")

            print(f"LLM validation for {field_name} yielded non-JSON output and fallback failed: {raw_output}")
            return False, f"Error validating {field_name}: LLM response format unclear. Raw: {raw_output[:100]}"

    def get_name(self, state: ChatState) -> ChatState:
        if not state["messages"] or not isinstance(state["messages"][-1], HumanMessage):
            print("get_name called without HumanMessage. This should not happen with correct routing.")
            return state

        user_msg = state["messages"][-1].content.strip()
        is_valid, reason = self._validate_with_llm("name", user_msg)

        if not is_valid:
            state["attempts"] += 1
            state["messages"].append(AIMessage(content=f"That doesn't seem like a valid name. {reason} Could you please provide your full name?"))
            log_interaction(state, "get_name_invalid", user_msg, state["messages"][-1].content, {"reason": reason})
        else:
            state["name"] = user_msg
            state["messages"].append(AIMessage(content=f"Thanks, {user_msg}! Can I get your phone number?"))
            state["conversation_stage"] = ConversationStage.AWAITING_PHONE
            state["attempts"] = 0
            log_interaction(state, "get_name_valid", user_msg, state["messages"][-1].content)
        return state

    def get_phone(self, state: ChatState) -> ChatState:
        if not state["messages"] or not isinstance(state["messages"][-1], HumanMessage):
            print("get_phone called without HumanMessage.")
            return state

        user_msg = state["messages"][-1].content.strip()
        # Count digits anywhere in the input so spaced numbers such as "1800 111 222" pass.
        if len(re.sub(r'\D', '', user_msg)) < 7:
            is_valid, reason = False, "Phone number must contain at least 7 digits."
        else:
            is_valid, reason = self._validate_with_llm("phone number", user_msg)

        if not is_valid:
            state["attempts"] += 1
            state["messages"].append(AIMessage(content=f"Please enter a valid phone number. {reason}"))
            log_interaction(state, "get_phone_invalid", user_msg, state["messages"][-1].content, {"reason": reason})
        else:
            state["phone"] = user_msg
            state["messages"].append(AIMessage(content="Great. And your email address?"))
            state["conversation_stage"] = ConversationStage.AWAITING_EMAIL
            state["attempts"] = 0
            log_interaction(state, "get_phone_valid", user_msg, state["messages"][-1].content)
        return state

    def get_email(self, state: ChatState) -> ChatState:
        if not state["messages"] or not isinstance(state["messages"][-1], HumanMessage):
            print("get_email called without HumanMessage.")
            return state

        user_msg = state["messages"][-1].content.strip()
        if "@" not in user_msg or "." not in user_msg.split("@")[-1]:
            is_valid, reason = False, "Email address must include '@' and a '.' in the domain."
        else:
            is_valid, reason = self._validate_with_llm("email address", user_msg)

        if not is_valid:
            state["attempts"] += 1
            state["messages"].append(AIMessage(content=f"Please provide a valid email address. {reason}"))
            log_interaction(state, "get_email_invalid", user_msg, state["messages"][-1].content, {"reason": reason})
        else:
            state["email"] = user_msg
            log_interaction(state, "get_email_valid", user_msg)
            if state["intent"] == Intent.BUY:
                state["messages"].append(AIMessage(content="Are you looking for a new home or a re-sale home?"))
                state["conversation_stage"] = ConversationStage.AWAITING_BUY_TYPE
            else:
                state["messages"].append(AIMessage(content="What is the postcode of the property you're selling?"))
                state["conversation_stage"] = ConversationStage.AWAITING_POSTCODE
            state["attempts"] = 0
            log_interaction(state, "get_email_transition_to_next_stage", None, state["messages"][-1].content)
        return state

In [None]:
class EnhancedBudgetProcessorAgent:
    def __init__(self, llm_model):
        self.llm = llm_model

    def _extract_budget_with_llm(self, text: str) -> Optional[float]:
        template_str = """Your task is to extract a numerical budget amount in pounds (£) from the user's text.
User's text: "{text_input}"

Instructions:
- Identify any monetary value mentioned.
- Convert it to a float (e.g., "1 million" -> 1000000.0, "500k" -> 500000.0, "£1,250,000" -> 1250000.0).
- If multiple numbers, prioritize the one most likely to be the budget.
- Respond with a JSON object containing two keys:
  - "budget": float (the extracted budget amount, or null if not found)
  - "confidence": float (your confidence in this extraction, 0.0 to 1.0)

Example for "around 1.5m": {{"budget": 1500000.0, "confidence": 0.9}}
Example for "I don't know yet": {{"budget": null, "confidence": 0.1}}

JSON Response:"""
        prompt = PromptTemplate.from_template(template_str)
        parser = JsonOutputParser()
        chain = prompt | self.llm | parser
        str_chain_for_fallback = prompt | self.llm | StrOutputParser()

        try:
            result = chain.invoke({"text_input": text})
            if result.get("budget") is not None and result.get("confidence", 0.0) >= 0.6:
                return float(result["budget"])
            return None
        except Exception as e_json:
            print(f"JsonOutputParser failed for budget extraction: {e_json}. Retrying with string parsing.")
            raw_output = str_chain_for_fallback.invoke({"text_input": text})
            print(f"Raw LLM output for budget extraction: {raw_output}")
            try:
                start_index = raw_output.find('{')
                end_index = raw_output.rfind('}')
                if start_index != -1 and end_index != -1 and end_index > start_index:
                    json_str = raw_output[start_index : end_index+1]
                    result = json.loads(json_str)
                    if result.get("budget") is not None and result.get("confidence", 0.0) >= 0.6:
                        return float(result["budget"])
            except json.JSONDecodeError:
                 print(f"Failed to parse JSON from raw budget output: {raw_output}")
            return None

    def _parse_budget_rules(self, text: str) -> Optional[float]:
        text_lower = text.lower()
        text_cleaned = re.sub(r'[£$,]', '', text_lower)
        numbers_extracted = []

        million_matches = re.findall(r'(\d+\.?\d*)\s*(?:m\b|million\b)', text_cleaned)
        for val_str in million_matches:
            try: numbers_extracted.append(float(val_str) * 1_000_000)
            except ValueError: pass
        text_cleaned = re.sub(r'\d+\.?\d*\s*(?:m\b|million\b)', '', text_cleaned)

        thousand_matches = re.findall(r'(\d+\.?\d*)\s*(?:k\b|thousand\b)', text_cleaned)
        for val_str in thousand_matches:
            try: numbers_extracted.append(float(val_str) * 1_000)
            except ValueError: pass
        text_cleaned = re.sub(r'\d+\.?\d*\s*(?:k\b|thousand\b)', '', text_cleaned)

        plain_num_matches = re.findall(r'\b\d+\.?\d*\b', text_cleaned)
        for val_str in plain_num_matches:
            try:
                num = float(val_str)
                if num > 100 or not numbers_extracted:
                    numbers_extracted.append(num)
            except ValueError: pass

        # Prefer the largest value when several numbers are mentioned.
        return max(numbers_extracted) if numbers_extracted else None

    def process_budget(self, state: ChatState) -> ChatState:
        if not state["messages"] or not isinstance(state["messages"][-1], HumanMessage):
            print("process_budget called without HumanMessage.")
            return state

        user_msg = state["messages"][-1].content.strip()
        budget_val = self._extract_budget_with_llm(user_msg)
        if budget_val is None:
            print(f"LLM budget extraction failed for '{user_msg}'. Trying rule-based parsing.")
            budget_val = self._parse_budget_rules(user_msg)

        if budget_val is None or budget_val <= 0:
            state["attempts"] += 1
            state["messages"].append(AIMessage(content="I couldn't understand the budget. Please provide a clear amount (e.g., '£500,000', '1.2m', or '750k')."))
            log_interaction(state, "get_budget_invalid", user_msg, state["messages"][-1].content)
        else:
            state["budget"] = budget_val
            print(f"Parsed budget: '{user_msg}' -> {budget_val}")
            log_interaction(state, "get_budget_valid", user_msg, details={"parsed_budget": budget_val})

            if state["buy_type"] == BuyType.NEW_HOME and budget_val < MIN_BUDGET_NEW_HOME:
                msg = (f"For new homes, our current listings start at £{MIN_BUDGET_NEW_HOME:,.0f}. "
                       f"Your budget of £{budget_val:,.0f} is below this. "
                       f"Please call our office at {COMPANY_PHONE_NUMBER} for other options or to discuss further. "
                       "Is there anything else I can help you with today? (yes/no)")
                state["messages"].append(AIMessage(content=msg))
                state["conversation_stage"] = ConversationStage.AWAITING_REASSISTANCE
                state["attempts"] = 0
                log_interaction(state, "budget_too_low_new_home", user_msg, msg)
            else:
                state["messages"].append(AIMessage(content=f"Understood. Budget: £{budget_val:,.0f}. What is the postcode of interest?"))
                state["conversation_stage"] = ConversationStage.AWAITING_POSTCODE
                state["attempts"] = 0
                log_interaction(state, "budget_accepted", user_msg, state["messages"][-1].content)
        return state

In [None]:
class EnhancedPostcodeProcessorAgent:
    def __init__(self, eligible_set: Set[str], eligible_list: List[str], embedding_model_instance, llm_model):
        self.eligible_set = eligible_set
        self.eligible_list = eligible_list
        self.embedding_model = embedding_model_instance
        self.llm = llm_model
        self.index = None
        self.dimension = 0
        if self.eligible_list:
            self._build_index()

    def _build_index(self):
        try:
            if not self.eligible_list:
                print("No eligible postcodes provided to build FAISS index.")
                return
            str_eligible_list = [str(pc) for pc in self.eligible_list]
            embeddings = np.array(self.embedding_model.embed_documents(str_eligible_list)).astype('float32')

            if embeddings.ndim == 1:
                if embeddings.shape[0] == 0:
                    print("FAISS: No embeddings generated for postcodes.")
                    return
                else:
                    embeddings = embeddings.reshape(1, -1)

            if embeddings.shape[0] == 0:
                print("FAISS: Embeddings array is empty after generation.")
                return

            self.dimension = embeddings.shape[1]
            if self.dimension == 0:
                print("FAISS: Embedding dimension is 0. Cannot build index.")
                return

            self.index = faiss.IndexFlatL2(self.dimension)
            self.index.add(embeddings)
            print(f"FAISS index built with {len(self.eligible_list)} postcodes, dimension {self.dimension}.")
        except Exception as e:
            print(f"Error building FAISS index: {e}")
            self.index = None

    def _validate_postcode_format_with_llm(self, postcode_text: str) -> Tuple[bool, str]:
        # Despite the name, this check is regex-only; no LLM call is made here.
        uk_pc_pattern = r"^[A-Z]{1,2}[0-9][A-Z0-9]?\s?[0-9][A-Z]{2}$"
        normalized_for_regex = postcode_text.upper().replace(" ", "")
        if len(normalized_for_regex) in [5, 6, 7] and ' ' not in postcode_text:
            # Unspaced input: insert the space before the 3-character inward code.
            postcode_to_check = normalized_for_regex[:-3] + " " + normalized_for_regex[-3:]
        else:
            postcode_to_check = postcode_text.upper()

        if not re.fullmatch(uk_pc_pattern, postcode_to_check):
            print(f"Postcode '{postcode_text}' (checked as '{postcode_to_check}') failed regex: {uk_pc_pattern}")
            return False, "Postcode does not match the typical UK format (e.g., SW1A 1AA or M1 1AE)."
        return True, "Format seems valid."

    def _find_similar_postcodes(self, postcode: str, k: int = 1) -> Optional[str]:
        if not self.index or self.dimension == 0:
            print("FAISS index not available or not built correctly for similarity search.")
            return None
        try:
            norm_pc_query = normalize_postcode(postcode)
            query_embedding = np.array(self.embedding_model.embed_query(norm_pc_query)).astype('float32').reshape(1, -1)

            if query_embedding.shape[1] != self.dimension:
                print(f"Query embedding dimension ({query_embedding.shape[1]}) does not match index dimension ({self.dimension}). Cannot search.")
                return None

            distances, indices = self.index.search(query_embedding, k)

            if indices.size > 0 and indices[0][0] != -1 and indices[0][0] < len(self.eligible_list):
                return self.eligible_list[indices[0][0]]
            return None
        except Exception as e:
            print(f"Error during FAISS similarity search for '{postcode}': {e}")
            return None

    def process_postcode(self, state: ChatState) -> ChatState:
        if not state["messages"] or not isinstance(state["messages"][-1], HumanMessage):
            print("process_postcode called without HumanMessage.")
            return state

        user_msg_raw = state["messages"][-1].content.strip()
        is_format_valid, reason = self._validate_postcode_format_with_llm(user_msg_raw)

        if not is_format_valid:
            state["attempts"] += 1
            state["messages"].append(AIMessage(content=f"That postcode doesn't look quite right. {reason} Please try again (e.g., 'SW1A 1AA')."))
            log_interaction(state, "get_postcode_invalid_format", user_msg_raw, state["messages"][-1].content, {"reason": reason})
            return state

        norm_pc = normalize_postcode(user_msg_raw)
        state["postcode"] = norm_pc

        if norm_pc in self.eligible_set:
            state["postcode_covered"] = True
            msg = (f"Great! Postcode {user_msg_raw.upper()} is within our service area. "
                   "I expect someone to get in touch with you within 24 hours. "
                   "Is there anything else I can help you with today? (yes/no)")
            log_interaction(state, "get_postcode_covered", user_msg_raw, msg, {"normalized_pc": norm_pc})
        else:
            state["postcode_covered"] = False
            suggestion = self._find_similar_postcodes(user_msg_raw)
            state["suggested_postcode"] = suggestion

            msg_parts = [f"Sorry, we don't currently cover postcode {user_msg_raw.upper()} directly."]
            if suggestion and suggestion != norm_pc:
                msg_parts.append(f"Did you perhaps mean a similar postcode we cover, like {suggestion}?")

            if state["intent"] == Intent.BUY and state["buy_type"] == BuyType.NEW_HOME:
                msg_parts.append(f"For new homes in other areas, please call our office at {COMPANY_PHONE_NUMBER}.")
            else:
                msg_parts.append(f"However, please call {COMPANY_PHONE_NUMBER} as we might still be able to assist or refer you.")

            msg_parts.append("Is there anything else I can help you with today? (yes/no)")
            msg = " ".join(msg_parts)
            log_interaction(state, "get_postcode_not_covered", user_msg_raw, msg, {"normalized_pc": norm_pc, "suggestion": suggestion})

        state["messages"].append(AIMessage(content=msg))
        state["conversation_stage"] = ConversationStage.AWAITING_REASSISTANCE
        state["attempts"] = 0
        return state

# Initialize Agents
intent_classifier_agent = EnhancedIntentClassifierAgent(llm)
info_gatherer_agent = EnhancedInfoGathererAgent(llm)
budget_processor_agent = EnhancedBudgetProcessorAgent(llm)
eligible_postcodes_set, eligible_postcodes_list = load_eligible_postcodes(POSTCODE_FILE)
postcode_processor_agent = EnhancedPostcodeProcessorAgent(eligible_postcodes_set, eligible_postcodes_list, embedding_model, llm)

Loaded 100 eligible postcodes from /content/drive/MyDrive/uk_postcodes 1.csv
FAISS index built with 100 postcodes, dimension 768.
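
For intuition, a `faiss.IndexFlatL2` search is an exhaustive nearest-neighbour scan by squared Euclidean distance. The sketch below reproduces that idea in plain Python on toy data. `l2_search`, the 2-D vectors, and their pairing with postcodes are illustrative assumptions, not real embeddings.

```python
from typing import List, Sequence, Tuple


def l2_search(index: Sequence[Sequence[float]], query: Sequence[float],
              k: int = 1) -> List[Tuple[float, int]]:
    """Return the k nearest rows as (squared_distance, row_index) pairs."""
    scored = [
        (sum((a - b) ** 2 for a, b in zip(row, query)), i)
        for i, row in enumerate(index)
    ]
    return sorted(scored)[:k]


# Toy 2-D vectors standing in for the 768-dimensional postcode embeddings.
toy_vectors = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
toy_postcodes = ["SW1A1AA", "WC2N5DU", "EH11BQ"]

distance, row = l2_search(toy_vectors, [0.9, 1.2], k=1)[0]
print(toy_postcodes[row], round(distance, 2))
```

With `k=1`, as in `_find_similar_postcodes`, only the single closest postcode is returned, so the result is always some suggestion, however distant, unless the caller also applies a distance threshold.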


In [None]:
# LangGraph Node Functions
def create_initial_state() -> ChatState:
    return {
        "messages": [], "intent": None, "buy_type": None, "name": None, "phone": None,
        "email": None, "budget": None, "postcode": None, "postcode_covered": None,
        "suggested_postcode": None, "attempts": 0, "conversation_ended": False,
        "conversation_stage": ConversationStage.GREETING,
        "last_error": None, "session_id": str(uuid.uuid4()), "interaction_history": []
    }

def initial_greeting_node(state: ChatState) -> ChatState:
    if state.get("conversation_stage") == ConversationStage.GREETING:
        print("INITIAL_GREETING_NODE: Performing initial greeting setup.")
        state["messages"] = [
            SystemMessage(content="You are RealtyFlow AI, a friendly and professional assistant for a real estate agency. Your goal is to qualify leads by asking targeted questions. Be concise and clear."),
            AIMessage(content="Hello! I'm the RealtyFlow AI assistant. Are you looking to buy or sell a property today?")
        ]
        state["conversation_stage"] = ConversationStage.AWAITING_INTENT
        state["attempts"] = 0
        log_interaction(state, "greeting_sent_from_initial_node", bot_response=state["messages"][-1].content)
    else:
        print(f"INITIAL_GREETING_NODE: Pass-through, stage is {state.get('conversation_stage')}.")
    return state

def handle_intent(state: ChatState) -> ChatState:
    if not state["messages"] or not isinstance(state["messages"][-1], HumanMessage):
        print("HANDLE_INTENT: Called without a HumanMessage as the last message. This should not happen if routing is correct.")
        return state

    user_msg = state["messages"][-1].content
    intent_val = intent_classifier_agent.classify_intent(user_msg)

    if intent_val == Intent.UNKNOWN:
        state["attempts"] += 1
        state["messages"].append(AIMessage(content="I'm sorry, I didn't quite catch that. Are you looking to buy or sell?"))
        log_interaction(state, "intent_unknown", user_msg, state["messages"][-1].content)
    else:
        state["intent"] = intent_val
        state["messages"].append(AIMessage(content=f"Great, you're looking to {intent_val.value}! To get started, can I please have your full name?"))
        state["conversation_stage"] = ConversationStage.AWAITING_NAME
        state["attempts"] = 0
        log_interaction(state, "intent_classified", user_msg, state["messages"][-1].content, {"intent": intent_val.value})
    return state

def handle_name(state: ChatState) -> ChatState:
    return info_gatherer_agent.get_name(state)

def handle_phone(state: ChatState) -> ChatState:
    return info_gatherer_agent.get_phone(state)

def handle_email(state: ChatState) -> ChatState:
    return info_gatherer_agent.get_email(state)

def handle_buy_type(state: ChatState) -> ChatState:
    if not state["messages"] or not isinstance(state["messages"][-1], HumanMessage):
        print("HANDLE_BUY_TYPE: Called without HumanMessage.")
        return state

    user_msg = state["messages"][-1].content
    buy_type_val = intent_classifier_agent.classify_buy_type(user_msg)

    if buy_type_val == BuyType.UNKNOWN:
        state["attempts"] += 1
        state["messages"].append(AIMessage(content="Sorry, I'm not sure if that's a new home or a re-sale. Could you clarify? (e.g., 'new build', 'existing property')"))
        log_interaction(state, "buy_type_unknown", user_msg, state["messages"][-1].content)
    else:
        state["buy_type"] = buy_type_val
        state["messages"].append(AIMessage(content=f"Got it, a {buy_type_val.value.replace('_', ' ')} property. What's your approximate budget?"))
        state["conversation_stage"] = ConversationStage.AWAITING_BUDGET
        state["attempts"] = 0
        log_interaction(state, "buy_type_classified", user_msg, state["messages"][-1].content, {"buy_type": buy_type_val.value})
    return state

def handle_budget(state: ChatState) -> ChatState:
    return budget_processor_agent.process_budget(state)

def handle_postcode(state: ChatState) -> ChatState:
    return postcode_processor_agent.process_postcode(state)

def handle_reassistance(state: ChatState) -> ChatState:
    if not state["messages"] or not isinstance(state["messages"][-1], HumanMessage):
        print("HANDLE_REASSISTANCE: Called without HumanMessage.")
        return state

    user_msg = state["messages"][-1].content
    choice = intent_classifier_agent.classify_yes_no(user_msg)

    if choice == YesNo.YES:
        print("HANDLE_REASSISTANCE: User chose YES to restart.")
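        # Carry the session identity and interaction log across the reset.
        session_id = state["session_id"]
        history = state["interaction_history"]

        # Rebuild a fresh state; this clears the collected fields and the message list.
        new_state_dict = create_initial_state()
        new_state_dict["session_id"] = session_id
        new_state_dict["interaction_history"] = history

        state.update(new_state_dict)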
        state["messages"].append(AIMessage(content="Okay, let's start over! Are you looking to buy or sell a property today?"))
        state["conversation_stage"] = ConversationStage.AWAITING_INTENT

        log_interaction(state, "reassistance_yes_restart", user_msg, state["messages"][-1].content)

    elif choice == YesNo.NO:
        state["messages"].append(AIMessage(content="Thank you for chatting with RealtyFlow AI. Goodbye!"))
        state["conversation_ended"] = True
        state["conversation_stage"] = ConversationStage.ENDED
        log_interaction(state, "reassistance_no_end", user_msg, state["messages"][-1].content)
    else:
        state["attempts"] += 1
        state["messages"].append(AIMessage(content="I didn't quite get that. Could you please answer with 'yes' or 'no'?"))
        log_interaction(state, "reassistance_unknown", user_msg, state["messages"][-1].content)
    return state

def handle_max_attempts_fallback(state: ChatState) -> ChatState:
    print("MAX_ATTEMPTS_FALLBACK: Max attempts reached.")
    state["messages"].append(AIMessage(content=f"I'm having a bit of trouble understanding. For further assistance, please call our office at {COMPANY_PHONE_NUMBER}. Thank you."))
    state["conversation_ended"] = True
    state["conversation_stage"] = ConversationStage.ENDED
    log_interaction(state, "max_attempts_reached", bot_response=state["messages"][-1].content)
    return state

def route_next_step(state: ChatState) -> str:
    # Stage may be an Enum member or a plain value; normalize it for logging.
    current_stage = state.get("conversation_stage")
    current_stage_val = current_stage.value if isinstance(current_stage, Enum) else current_stage
    last_msg_type = state['messages'][-1].type if state['messages'] else 'None'
    print(f"ROUTE_NEXT_STEP: Stage: {current_stage_val}, LastMsg: {last_msg_type}, Attempts: {state['attempts']}")

    if state["conversation_ended"]:
        print("ROUTE_NEXT_STEP: Conversation ended. Routing to END.")
        return END

    if state["attempts"] >= MAX_ATTEMPTS:
        print(f"ROUTE_NEXT_STEP: Max attempts ({state['attempts']}) reached. Routing to max_attempts_fallback.")
        return "max_attempts_fallback"

    if state["messages"] and isinstance(state["messages"][-1], AIMessage):
        print(f"ROUTE_NEXT_STEP: AI just spoke (last msg: '{state['messages'][-1].content[:50]}...'). Returning END to await user input.")
        return END

    stage = state["conversation_stage"]
    print(f"ROUTE_NEXT_STEP: Last message was Human or pre-greeting. Routing based on stage: {current_stage_val}")

    if stage == ConversationStage.GREETING:
        print("ROUTE_NEXT_STEP: Stage is GREETING, but AI hasn't spoken. Should be handled by initial_greeting_node. Forcing to initial_greeting.")
        return "initial_greeting_node"
    elif stage == ConversationStage.AWAITING_INTENT:
        return "handle_intent"
    elif stage == ConversationStage.AWAITING_NAME:
        return "handle_name"
    elif stage == ConversationStage.AWAITING_PHONE:
        return "handle_phone"
    elif stage == ConversationStage.AWAITING_EMAIL:
        return "handle_email"
    elif stage == ConversationStage.AWAITING_BUY_TYPE:
        return "handle_buy_type"
    elif stage == ConversationStage.AWAITING_BUDGET:
        return "handle_budget"
    elif stage == ConversationStage.AWAITING_POSTCODE:
        return "handle_postcode"
    elif stage == ConversationStage.AWAITING_REASSISTANCE:
        return "handle_reassistance"
    else:
        print(f"ROUTE_NEXT_STEP: Unknown routing condition for stage: {current_stage_val}. Defaulting to END to prevent loop.")
        return END

# Build the Graph
workflow = StateGraph(ChatState)

workflow.add_node("initial_greeting_node", initial_greeting_node)
workflow.add_node("handle_intent", handle_intent)
workflow.add_node("handle_name", handle_name)
workflow.add_node("handle_phone", handle_phone)
workflow.add_node("handle_email", handle_email)
workflow.add_node("handle_buy_type", handle_buy_type)
workflow.add_node("handle_budget", handle_budget)
workflow.add_node("handle_postcode", handle_postcode)
workflow.add_node("handle_reassistance", handle_reassistance)
workflow.add_node("max_attempts_fallback", handle_max_attempts_fallback)

router_path_map = {
    "initial_greeting_node": "initial_greeting_node",
    "handle_intent": "handle_intent",
    "handle_name": "handle_name",
    "handle_phone": "handle_phone",
    "handle_email": "handle_email",
    "handle_buy_type": "handle_buy_type",
    "handle_budget": "handle_budget",
    "handle_postcode": "handle_postcode",
    "handle_reassistance": "handle_reassistance",
    "max_attempts_fallback": "max_attempts_fallback",
    END: END
}

workflow.set_entry_point("initial_greeting_node")

workflow.add_conditional_edges(
    "initial_greeting_node",
    route_next_step,
    router_path_map
)

processing_nodes = ["handle_intent", "handle_name", "handle_phone", "handle_email",
                    "handle_buy_type", "handle_budget", "handle_postcode", "handle_reassistance"]
for node_name in processing_nodes:
    workflow.add_conditional_edges(node_name, route_next_step, router_path_map)

workflow.add_edge("max_attempts_fallback", END)

app = workflow.compile()
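
The stage-to-node branch at the end of `route_next_step` is a one-to-one mapping, so it could also be written as a dictionary lookup. The sketch below is a minimal, self-contained illustration of that alternative, not the notebook's implementation. `Stage` is a hypothetical stand-in for `ConversationStage` (only some of its values are visible in the notebook's log output; the rest are assumed), `END_SENTINEL` is a placeholder for LangGraph's `END`, and `route_by_stage` is an invented name.

```python
from enum import Enum
from typing import Dict


class Stage(Enum):
    """Hypothetical stand-in for the notebook's ConversationStage enum."""
    GREETING = "greeting"
    AWAITING_INTENT = "awaiting_intent"
    AWAITING_NAME = "awaiting_name"
    AWAITING_PHONE = "awaiting_phone"
    AWAITING_EMAIL = "awaiting_email"
    AWAITING_BUY_TYPE = "awaiting_buy_type"
    AWAITING_BUDGET = "awaiting_budget"
    AWAITING_POSTCODE = "awaiting_postcode"
    AWAITING_REASSISTANCE = "awaiting_reassistance"
    ENDED = "ended"


# Placeholder for the graph's terminal marker (assumption for this sketch).
END_SENTINEL = "__end__"

# Each stage that expects user input maps to the node that processes it.
STAGE_TO_NODE: Dict[Stage, str] = {
    Stage.GREETING: "initial_greeting_node",
    Stage.AWAITING_INTENT: "handle_intent",
    Stage.AWAITING_NAME: "handle_name",
    Stage.AWAITING_PHONE: "handle_phone",
    Stage.AWAITING_EMAIL: "handle_email",
    Stage.AWAITING_BUY_TYPE: "handle_buy_type",
    Stage.AWAITING_BUDGET: "handle_budget",
    Stage.AWAITING_POSTCODE: "handle_postcode",
    Stage.AWAITING_REASSISTANCE: "handle_reassistance",
}


def route_by_stage(stage: Stage) -> str:
    """Look up the handler node for a stage; unmapped stages end the run."""
    return STAGE_TO_NODE.get(stage, END_SENTINEL)


print(route_by_stage(Stage.AWAITING_NAME))
print(route_by_stage(Stage.ENDED))
```

With this shape, adding a stage means adding one dictionary entry. The early exits in `route_next_step` (conversation ended, maximum attempts reached, AI spoke last) would still need to run before the lookup.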

In [None]:
# Colab Chatbot Interface
def run_chatbot():
    print("RealtyFlow AI Chatbot Initialized. Type 'quit' to exit, 'restart' to begin again.")

    current_state = create_initial_state()

    try:
        current_state = app.invoke(current_state, {"recursion_limit": 50})

        if current_state["messages"] and current_state["messages"][-1].type == "ai":
            print(f"RealtyFlow AI: {current_state['messages'][-1].content}")
        else:
            # Fallback: the graph returned without an AI greeting, so greet manually.
            fallback_greeting = "Welcome! How can I help you buy or sell a property?"
            print(f"RealtyFlow AI: {fallback_greeting}")
            current_state["messages"].append(AIMessage(content=fallback_greeting))
            current_state["conversation_stage"] = ConversationStage.AWAITING_INTENT
            log_interaction(current_state, "manual_greeting_fallback", bot_response=fallback_greeting)

    except Exception as e:
        print(f"Error during initial graph invocation: {e}")
        print("RealtyFlow AI: Sorry, I'm having trouble starting up. Please try again later.")
        return

    while not current_state.get("conversation_ended", False):
        user_input = input("You: ").strip()

        if not user_input:
            print("RealtyFlow AI: Please provide some input.")
            continue

        if user_input.lower() == 'quit':
            print("RealtyFlow AI: Thank you for chatting. Goodbye!")
            log_interaction(current_state, "user_quit", user_input, "Thank you for chatting. Goodbye!")
            break

        if user_input.lower() == 'restart':
            print("RealtyFlow AI: Okay, restarting the conversation...")
            session_id = current_state["session_id"]
            history = current_state["interaction_history"]
            current_state = create_initial_state()
            current_state["session_id"] = session_id
            current_state["interaction_history"] = history
            log_interaction(current_state, "user_restart_command", user_input)
            try:
                current_state = app.invoke(current_state, {"recursion_limit": 50})
                if current_state["messages"] and current_state["messages"][-1].type == "ai":
                    print(f"RealtyFlow AI: {current_state['messages'][-1].content}")
                else:
                    print("RealtyFlow AI: Hello again! Are you looking to buy or sell?")
                continue
            except Exception as e:
                print(f"Error during restart graph invocation: {e}")
                print("RealtyFlow AI: Sorry, I'm having trouble restarting. Please try again later.")
                break

        current_state["messages"].append(HumanMessage(content=user_input))
        log_interaction(current_state, "user_message_received", user_input)

        try:
            current_state = app.invoke(current_state, {"recursion_limit": 50})

            if current_state["messages"] and current_state["messages"][-1].type == "ai":
                print(f"RealtyFlow AI: {current_state['messages'][-1].content}")

        except Exception as e:
            print(f"Unhandled error during graph execution: {e}")
            current_state["last_error"] = str(e)
            error_msg = f"I'm sorry, an unexpected issue occurred. Please try again or contact support at {COMPANY_PHONE_NUMBER} if the problem persists."
            current_state["messages"].append(AIMessage(content=error_msg))
            print(f"RealtyFlow AI: {error_msg}")
            log_interaction(current_state, "graph_execution_error", details={"error": str(e)})

    # Report the final stage, whether it is stored as an Enum member or a plain value.
    final_stage = current_state.get("conversation_stage", "UNKNOWN")
    final_stage_label = final_stage.value if isinstance(final_stage, Enum) else final_stage
    print(f"Chat session {current_state['session_id']} ended. Stage: {final_stage_label}")
    print("--- End of Session ---")

if __name__ == '__main__':
    run_chatbot()

RealtyFlow AI Chatbot Initialized. Type 'quit' to exit, 'restart' to begin again.
INITIAL_GREETING_NODE: Performing initial greeting setup.
ROUTE_NEXT_STEP: Stage: awaiting_intent, LastMsg: ai, Attempts: 0
ROUTE_NEXT_STEP: AI just spoke (last msg: 'Hello! I'm the RealtyFlow AI assistant. Are you lo...'). Returning END to await user input.
RealtyFlow AI: Hello! I'm the RealtyFlow AI assistant. Are you looking to buy or sell a property today?
You: Yes I am looking for sell a property 
INITIAL_GREETING_NODE: Pass-through, stage is ConversationStage.AWAITING_INTENT.
ROUTE_NEXT_STEP: Stage: awaiting_intent, LastMsg: human, Attempts: 0
ROUTE_NEXT_STEP: Last message was Human or pre-greeting. Routing based on stage: awaiting_intent
Intent classification for 'Yes I am looking for sell a property': sell
ROUTE_NEXT_STEP: Stage: awaiting_name, LastMsg: ai, Attempts: 0
ROUTE_NEXT_STEP: AI just spoke (last msg: 'Great, you're looking to sell! To get started, can...'). Returning END to await user input.
RealtyFlow AI: Great, you're looking to sell! To get started, can I please have your full name?
You: Mohd Kaif 
INITIAL_GREETING_NODE: Pass-through, stage is ConversationStage.AWAITING_NAME.
ROUTE_NEXT_STEP: Stage: awaiting_name, LastMsg: human, Attempts: 0
ROUTE_NEXT_STEP: Last message was Human or pre-greeting. Routing based on stage: awaiting_name
ROUTE_NEXT_STEP: Stage: awaiting_phone, LastMsg: ai, Attempts: 0
ROUTE_NEXT_STEP: AI just spoke (last msg: 'Thanks, Mohd Kaif! Can I get your phone number?...'). Returning END to await user input.
RealtyFlow AI: Thanks, Mohd Kaif! Can I get your phone number?
You: +91 8755714681
INITIAL_GREETING_NODE: Pass-through, stage is ConversationStage.AWAITING_PHONE.
ROUTE_NEXT_STEP: Stage: awaiting_phone, LastMsg: human, Attempts: 0
ROUTE_NEXT_STEP: Last message was Human or pre-greeting. Routing based on stage: awaiting_phone
ROUTE_NEXT_STEP: Stage: awaiting_email, LastMsg: ai, Attempts: 0
ROUTE_NEXT_STEP: AI just spoke (last msg: 'Great. And your email address?...'). Returning END to await user input.
RealtyFlow AI: Great. And your email address?
You: kaifahmad087@gmail.com 
INITIAL_GREETING_NODE: Pass-through, stage is ConversationStage.AWAITING_EMAIL.
ROUTE_NEXT_STEP: Stage: awaiting_email, LastMsg: human, Attempts: 0
ROUTE_NEXT_STEP: Last message was Human or pre-greeting. Routing based on stage: awaiting_email
ROUTE_NEXT_STEP: Stage: awaiting_postcode, LastMsg: ai, Attempts: 0
ROUTE_NEXT_STEP: AI just spoke (last msg: 'What is the postcode of the property you're sellin...'). Returning END to await user input.
RealtyFlow AI: What is the postcode of the property you're selling?
You: 243005 
INITIAL_GREETING_NODE: Pass-through, stage is ConversationStage.AWAITING_POSTCODE.
ROUTE_NEXT_STEP: Stage: awaiting_postcode, LastMsg: human, Attempts: 0
ROUTE_NEXT_STEP: Last message was Human or pre-greeting. Routing based on stage: awaiting_postcode
Postcode '243005' (checked as '243 005') failed regex: ^[A-Z]{1,2}[0-9][A-Z0-9]?\s?[0-9][A-Z]{2}$
ROUTE_NEXT_STEP: Stage: awaiting_postcode, LastMsg: ai, Attempts: 1
ROUTE_NEXT_STEP: AI just spoke (last msg: 'That postcode doesn't look quite right. Postcode d...'). Returning END to await user input.
RealtyFlow AI: That postcode doesn't look quite right. Postcode does not match the typical UK format (e.g., SW1A 1AA or M1 1AE). Please try again (e.g., 'SW1A 1AA