
# 🏠 Tenant Chatbot — Sprint 2 (LLM + RAG + Agent + Memory)    
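- Kept: the original cells for PDF loading, the vector store, the prompt template, and RAG question answering
- Added: the `TenantChatbot` class (a multi-intent entry point: contract retrieval / tool agent / conversation memory)
- Optional: FAISS / Chroma backends and a simple tool (rent calculation)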



## 1. Import libraries


In [1]:

from __future__ import annotations

import os
from typing import List, Any, Dict

# LangChain core
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain.vectorstores import Chroma
from langchain.chains import RetrievalQA, ConversationChain
from langchain.agents import initialize_agent, AgentType
from langchain.memory import ConversationBufferMemory
from langchain.prompts import ChatPromptTemplate
from langchain.tools import Tool
from langchain.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Utilities
import re
import hashlib
import numpy as np

print('✅ Libraries imported.')

✅ Libraries imported.



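## 2. Global Config
- Read `OPENAI_API_KEY` (preferably from an environment variable)
- Switchable embedding/model backend: `OPENAI` or `LOCAL` (local pseudo-embeddings, so the pipeline can be demonstrated end to end without a key)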


In [4]:

# === API Key ===
from dotenv import load_dotenv
load_dotenv()  # Load variables from a .env file
OPENAI_API_KEY = os.getenv('OPENAI_API_KEY', '').strip()

# === Backend Switches ===
EMBEDDINGS_BACKEND = os.getenv('EMBEDDINGS_BACKEND', 'OPENAI').upper()   # 'OPENAI' or 'LOCAL'
VECTORSTORE_BACKEND = os.getenv('VECTORSTORE_BACKEND', 'CHROMA').upper() # 'CHROMA' (default)

PDF_PATH = 'Track_B_Tenancy_Agreement.pdf'  # Put the PDF in the same folder as this notebook

print(f'🔐 OPENAI_API_KEY set: {bool(OPENAI_API_KEY)}')
print(f'🧠 EMBEDDINGS_BACKEND = {EMBEDDINGS_BACKEND}')
print(f'💾 VECTORSTORE_BACKEND = {VECTORSTORE_BACKEND}')
print(f'📄 PDF_PATH = {PDF_PATH}')

🔐 OPENAI_API_KEY set: True
🧠 EMBEDDINGS_BACKEND = OPENAI
💾 VECTORSTORE_BACKEND = CHROMA
📄 PDF_PATH = Track_B_Tenancy_Agreement.pdf
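
The switches above are read from the environment but not validated, so a typo such as `EMBEDDINGS_BACKEND=OPNAI` would pass through silently. A minimal sketch of a validation helper follows. `resolve_backend` and the two `ALLOWED_*` sets are hypothetical names that do not appear in the notebook, and listing `FAISS` as an allowed vector store is an assumption based on the optional backends mentioned in the introduction.

```python
import os
from typing import Iterable

# Hypothetical allow-lists (assumptions, not defined in the original notebook)
ALLOWED_EMBEDDINGS = {"OPENAI", "LOCAL"}
ALLOWED_VECTORSTORES = {"CHROMA", "FAISS"}


def resolve_backend(env_name: str, default: str, allowed: Iterable[str]) -> str:
    """Read a backend switch from the environment and validate it.

    The value is stripped and upper-cased; unsupported values fall back
    to `default` with a warning instead of failing later in the pipeline.
    """
    value = os.getenv(env_name, default).strip().upper()
    if value not in set(allowed):
        print(f"Unsupported {env_name}={value!r}; falling back to {default}")
        return default
    return value


EMBEDDINGS_BACKEND = resolve_backend("EMBEDDINGS_BACKEND", "OPENAI", ALLOWED_EMBEDDINGS)
VECTORSTORE_BACKEND = resolve_backend("VECTORSTORE_BACKEND", "CHROMA", ALLOWED_VECTORSTORES)
```

Falling back to the default rather than raising keeps the demo runnable; a stricter setup might prefer to raise a `ValueError` instead.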



## 3. Load the tenancy agreement PDF
Read the real contract into LangChain document objects for the chunking and embedding steps that follow.


In [5]:

# Original loading logic (kept as-is)
try:
    loader = PyPDFLoader(PDF_PATH)
    docs = loader.load()
    print(f'📄 Loaded {len(docs)} pages.')
except Exception as e:
    print('❗Failed to load PDF. Check that the file exists.')
    print('Error:', e)
    docs = []

📄 Loaded 10 pages.



## 4. Embeddings & Vector Store
- **Kept**: your vector-store approach (Chroma by default)
- **Added**: a local pseudo-embedding class, so the demo can run without a key (not suitable for real retrieval quality)


In [6]:
# Create the text splitter
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,      # At most 500 characters per chunk
    chunk_overlap=50,    # Adjacent chunks overlap by 50 characters to preserve context
)

# Generate the chunks
splits = text_splitter.split_documents(docs)
print(f'✅ Split into {len(splits)} chunks.')
#print("Sample chunk content:\n", splits[5].page_content[:300])

# Number of chunks to preview
num_show = 5

for i, chunk in enumerate(splits[:num_show]):
    print(f"🧩 Chunk {i}")
    print(chunk.page_content[:300].strip())  # Show only the first 300 characters to keep the output short
    print("-" * 80)

✅ Split into 67 chunks.
🧩 Chunk 0
TENANCY AGREEMENT (PRIVATE
CONDO/APARTMENT)
Page 1 of 6 LEG-AG-17.16 28/07/23
Disclaimer: This is a general document which may not be appropriate for use in all cases. When in doubt, please seek
legal advice. In the event of a dispute, the Landlord/Tenant agree not to hold Sterling Properties Pte Lt
--------------------------------------------------------------------------------
🧩 Chunk 1
been done with the consent and agreement of both parties prior to the signing of the agreement.
TENANCY AGREEMENT (PRIVATE CONDO/APARTMENT)
THIS AGREEMENT is made on the 20 _ JAN 2024
BETWEEN
Name: Peter Richardson Williams
NRIC: S8634521G
Address: 125 Marine Parade Road #12-08, Singapore 449735
Ema
--------------------------------------------------------------------------------
🧩 Chunk 2
successors and assigns) of the one part.
AND
Name: Michael Thompson Anderson
Fin: M4782619K
Address: c/o GlobalTech Solutions Pte Ltd
18 Cross Street #15-02
Singapore 048423
E
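
To make the roles of `chunk_size` and `chunk_overlap` concrete, here is a simplified sketch of fixed-size chunking with overlap. It is not how `RecursiveCharacterTextSplitter` works internally (that splitter also prefers to break on separators such as paragraph and line breaks); `sliding_window_chunks` is a hypothetical helper that only illustrates the two parameters.

```python
from typing import List


def sliding_window_chunks(text: str, chunk_size: int = 500, chunk_overlap: int = 50) -> List[str]:
    """Split text into fixed-size windows where consecutive windows share
    `chunk_overlap` characters. Simplified illustration only."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


sample = "".join(str(i % 10) for i in range(1200))
chunks = sliding_window_chunks(sample)
print(len(chunks), [len(c) for c in chunks])
```

With the default settings each window starts 450 characters after the previous one, so the last 50 characters of one chunk are repeated at the start of the next.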

In [7]:
# Use the real OpenAI embedding model
embeddings = OpenAIEmbeddings(
    model="text-embedding-3-small",   # 或 text-embedding-3-large
    openai_api_key=OPENAI_API_KEY
)
print("✅ Embeddings ready: OpenAIEmbeddings (using API)")

✅ Embeddings ready: OpenAIEmbeddings (using API)
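
The section notes mention a local pseudo-embedding class for running without a key, but the cells shown here only build `OpenAIEmbeddings`. One possible sketch of such a class uses feature hashing: each token is hashed to a position in a fixed-size vector. `LocalHashEmbeddings` is a hypothetical name, the 256-dimension default is arbitrary, and the vectors carry no semantic meaning, so this is suitable only for checking that the pipeline runs. It mirrors the `embed_documents` / `embed_query` method names of LangChain's embeddings interface; to pass it to a vector store, subclassing `langchain_core.embeddings.Embeddings` would likely be needed.

```python
import hashlib
import math
import re
from typing import List


class LocalHashEmbeddings:
    """Deterministic pseudo-embeddings via feature hashing (demo only)."""

    def __init__(self, dim: int = 256):
        self.dim = dim

    def _embed(self, text: str) -> List[float]:
        vec = [0.0] * self.dim
        for token in re.findall(r"\w+", text.lower()):
            digest = hashlib.md5(token.encode("utf-8")).digest()
            index = int.from_bytes(digest[:4], "big") % self.dim  # bucket for this token
            sign = 1.0 if digest[4] % 2 == 0 else -1.0            # signed hashing reduces collision bias
            vec[index] += sign
        norm = math.sqrt(sum(v * v for v in vec))
        # L2-normalize so dot products behave like cosine similarity
        return [v / norm for v in vec] if norm else vec

    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        return [self._embed(t) for t in texts]

    def embed_query(self, text: str) -> List[float]:
        return self._embed(text)


local_embeddings = LocalHashEmbeddings()
vector = local_embeddings.embed_query("Who is responsible for aircon maintenance?")
print(len(vector))
```

Because the hash is deterministic, the same text always maps to the same vector, which makes runs reproducible without any network access.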


In [8]:

# Build the vector store (Chroma by default)
if not splits:
    print('⚠️ No chunks available to build the vector store.')
    vectorstore = None
else:
    # Index the chunks from the splitter rather than whole pages, so retrieval returns focused passages.
    # Chroma runs in memory here; set persist_directory for persistence.
    vectorstore = Chroma.from_documents(splits, embedding=embeddings)
    print('✅ Vector store ready: Chroma (memory)')

✅ Vector store ready: Chroma (memory)



## 5. Prompt design
Enforced output structure:
1) Short answer  2) Clause reference  3) Source snippet  


In [9]:

contract_prompt = ChatPromptTemplate.from_messages([
    ("system",
     "You are a professional Singapore tenancy-law assistant. "
     "Use the given contract context to answer clearly and cite the relevant clause."),
    ("human",
     "Context:\n{context}\n\n"
     "Question:\n{user_query}\n\n"
     "Answer format:\n"
     "1. Short answer\n"
     "2. Clause reference\n"
     "3. Source snippet")
])
print("🧾 Template: Contract-based Q&A Assistant Created")

🧾 Template: Contract-based Q&A Assistant Created



## 6. Contract-based Retrieval-Augmented Generation (RAG)
> (Keeps your original approach and adds robustness checks.)


In [10]:

# Initialize the LLM
if OPENAI_API_KEY:
    llm = ChatOpenAI(model="gpt-4o-mini", temperature=0.2, api_key=OPENAI_API_KEY)
else:
    # Still create the object to keep the interface consistent (the SDK may raise if it validates the key strictly; setting a key is recommended)
    llm = ChatOpenAI(model="gpt-4o-mini", temperature=0.2, api_key="")
    print("⚠️ OPENAI_API_KEY is not set; real question answering will not work until it is.")

# Create the QA chain
if vectorstore is not None:
    qa_chain = RetrievalQA.from_chain_type(
        llm=llm,
        chain_type="stuff",
        retriever=vectorstore.as_retriever()
    )
    print('✅ RetrievalQA chain is ready.')
else:
    qa_chain = None
    print('⚠️ QA chain skipped due to missing vectorstore.')

✅ RetrievalQA chain is ready.



## 7. Prompt formatting example


In [11]:

sample_query = "Who is responsible for aircon maintenance?"
sample_context = (
    "Clause 2(j): The tenant shall be responsible for minor repairs not exceeding S$200. "
    "Air-conditioning servicing to be carried out once every three months by the tenant."
)
formatted_prompt = contract_prompt.format_messages(
    context=sample_context,
    user_query=sample_query
)
print('🔧 Formatted messages preview:')
for m in formatted_prompt:
    print(f'[{m.type}] {m.content[:120]}...')

🔧 Formatted messages preview:
[system] You are a professional Singapore tenancy-law assistant. Use the given contract context to answer clearly and cite the re...
[human] Context:
Clause 2(j): The tenant shall be responsible for minor repairs not exceeding S$200. Air-conditioning servicing ...



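## 8. Tools (Example: Rent Calculator)
- Example tool: `calculate_rent_tool` (extracts the **monthly rent** and the **number of months** from a sentence and computes the total rent)
- More tools can be added (e.g., deposit calculation, late fees, repair cost sharing)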


In [None]:

def calculate_rent_tool(query: str) -> str:
    """Extract (monthly_rent, months) from text and compute total rent.
    Example: "Calculate total rent if monthly rent is $2500 for 15 months."
    """
    # Allow thousands separators so "$2,500" is read as 2500 rather than as 2 and 500
    nums = [int(x.replace(",", "")) for x in re.findall(r"\d[\d,]*", query)]
    if len(nums) >= 2:
        # Naive assumption: first number = monthly rent, second number = months
        monthly, months = nums[0], nums[1]
        total = monthly * months
        return f"💰 Estimated total rent for {months} months at ${monthly}/mo: **${total}**."
    return "Please provide both the monthly rent and the number of months (e.g., '$2500 for 15 months')."

calculate_rent = Tool.from_function(
    func=calculate_rent_tool,
    name="calculate_rent",
    description="Calculate total rent given monthly rent and number of months from natural language."
)
print('🧰 Tool ready: calculate_rent')
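
The tool above relies on number order (first number is the rent, second is the month count), which breaks for phrasings such as "for 12 months at $3,200". A sketch of a less order-dependent parser follows. `parse_rent_query` is a hypothetical helper, and it assumes the rent is written with a `$` sign and the duration with the word "month" or "months".

```python
import re
from typing import Optional, Tuple


def parse_rent_query(query: str) -> Optional[Tuple[float, int]]:
    """Return (monthly_rent, months) or None if either value is missing.

    Assumptions: rent is prefixed by '$' (so 'S$' also works) and the
    duration is a whole number followed by 'month' or 'months'.
    """
    rent_match = re.search(r"\$\s*(\d[\d,]*(?:\.\d+)?)", query)
    months_match = re.search(r"(\d+)\s*months?\b", query, flags=re.IGNORECASE)
    if not rent_match or not months_match:
        return None
    monthly = float(rent_match.group(1).replace(",", ""))  # drop thousands separators
    months = int(months_match.group(1))
    return monthly, months


parsed = parse_rent_query("For 12 months at S$3,200.50 per month, what is the total?")
if parsed:
    monthly, months = parsed
    print(f"Total: ${monthly * months:,.2f}")
```

Anchoring on the `$` sign and the word "months" trades generality for predictability; queries that omit either marker return `None` and can fall back to asking the user for the missing value.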


## 9. Memory & Agent
- `ConversationBufferMemory` records the conversation context
- `initialize_agent` builds a tool-using agent (for calculation-type questions and similar)


In [None]:

memory = ConversationBufferMemory()

agent = initialize_agent(
    tools=[calculate_rent],
    llm=llm,
    agent=AgentType.STRUCTURED_CHAT_ZERO_SHOT_REACT_DESCRIPTION,
    verbose=False
)

print('🧠 Memory ready. 🤖 Agent ready.')
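
As a rough mental model of what a buffer memory does, the sketch below stores every turn and renders the full history as text that can be prepended to the next prompt. `SimpleBufferMemory` is a hypothetical stand-in written for illustration; it is not LangChain's implementation and omits features such as configurable prefixes and message objects.

```python
from typing import List, Tuple


class SimpleBufferMemory:
    """Keep the whole conversation and render it as plain text (illustration only)."""

    def __init__(self) -> None:
        self.turns: List[Tuple[str, str]] = []

    def save_turn(self, user_text: str, ai_text: str) -> None:
        # Append one (user, assistant) exchange to the buffer
        self.turns.append((user_text, ai_text))

    def render_history(self) -> str:
        # Produce a transcript that can be placed ahead of the next user message
        lines = []
        for user_text, ai_text in self.turns:
            lines.append(f"Human: {user_text}")
            lines.append(f"AI: {ai_text}")
        return "\n".join(lines)


demo_memory = SimpleBufferMemory()
demo_memory.save_turn("My lease ends in March.", "Noted. Do you plan to renew?")
demo_memory.save_turn("Yes, probably.", "Then check the renewal notice period.")
print(demo_memory.render_history())
```

Since nothing is ever dropped, the rendered history grows with every turn, which is why buffer-style memory suits short demo conversations better than long sessions.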


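## 10. Integration: the TenantChatbot class (combined architecture)
- **Without removing** the cells above, this provides a **single entry point** for actual use:
  - Contract clauses / RAG: `RetrievalQA`
  - Tools / calculation: `Agent + Tools`
  - General conversation: `ConversationChain + Memory`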


In [None]:

class TenantChatbot:
    """Unified multi-intent Tenant Chatbot."""
    def __init__(self, docs, vectorstore, llm, memory, qa_chain, agent):
        self.docs = docs
        self.vectorstore = vectorstore
        self.llm = llm
        self.memory = memory
        self.qa_chain = qa_chain
        self.agent = agent
        self.conversation = ConversationChain(llm=self.llm, memory=self.memory)

        # Intent keywords; extend as needed or move them to configuration
        self.contract_keywords = [
            'clause', 'tenant', 'landlord', 'terminate', 'repair', 'deposit',
            'renewal', 'maintenance', 'aircon', 'breach', 'notice', 'early termination'
        ]
        self.calc_keywords = ['calculate', 'rent', 'payment', 'fee', 'total']

    def process_query(self, query: str) -> str:
        q = query.lower()

        # 1) Contract-clause questions → RAG (vector retrieval + LLM)
        if any(k in q for k in self.contract_keywords):
            if not self.qa_chain:
                return 'RAG is not ready (missing vector store or LLM).'
            try:
                return self.qa_chain.run(query)
            except Exception as e:
                return f'RAG failed: {e}'

        # 2) Calculation/tool questions → Agent and tools
        if any(k in q for k in self.calc_keywords):
            try:
                return self.agent.run(query)
            except Exception as e:
                return f'Agent failed: {e}'

        # 3) General chat or guidance → conversation with memory
        try:
            return self.conversation.invoke({"input": query})["response"]
        except Exception as e:
            return f'Conversation failed: {e}'

print('🏗️ TenantChatbot class ready.')


## 11. Unified Entry Tests
> You can edit `test_queries` below to run your own tests.


In [None]:

chatbot = TenantChatbot(
    docs=docs,
    vectorstore=vectorstore,
    llm=llm,
    memory=memory,
    qa_chain=qa_chain,
    agent=agent
)

test_queries = [
    # Contract clauses (RAG)
    "Who is responsible for aircon maintenance?",
    "Can I terminate the lease early?",
    "What does the clause say about deposit refund?",
    # Calculation (Agent)
    "Calculate total rent if monthly rent is $2500 for 15 months.",
    # Intended as general conversation (Memory); note that it contains 'renewal',
    # so the keyword router in process_query sends it to RAG instead
    "I'm confused about my lease renewal. What should I check first?"
]

for q in test_queries:
    print('\n' + '='*70)
    print('Q:', q)
    try:
        ans = chatbot.process_query(q)
        print('A:', ans)
    except Exception as e:
        print('❗Error running query:', e)


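## 12. Notes & Next Steps
- For a **FAISS** version, replace `Chroma.from_documents(...)` with `FAISS.from_documents(...)` (and import the corresponding module).
- For **persistence**, set `persist_directory` on Chroma and reload on the next start with `Chroma(persist_directory=..., embedding_function=...)`.
- Tools can be extended, e.g., **deposit deduction estimates**, **repair cost sharing**, and **late fee** calculations.
- Prompt engineering: `contract_prompt` can require citing **specific clause numbers** and constrain the output format so that answers are more consistent.
- Deployment: consider FastAPI/Streamlit for an API/front-end demo, coordinated with C's part of the work.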
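
As an example of the tool extensions listed above, here is a sketch of a repair cost sharing calculation. The S$200 figure comes from the sample clause text used in the prompt formatting example; the sharing rule itself (tenant pays up to the cap, landlord pays the excess) is an assumption made for illustration and should be checked against the actual agreement. `split_repair_cost` is a hypothetical function name.

```python
from typing import Dict


def split_repair_cost(cost: float, tenant_cap: float = 200.0) -> Dict[str, float]:
    """Split a repair bill between tenant and landlord.

    ASSUMPTION: the tenant bears the cost up to `tenant_cap` and the
    landlord bears anything above it. Verify this against the contract.
    """
    if cost < 0:
        raise ValueError("cost must be non-negative")
    tenant_share = min(cost, tenant_cap)
    landlord_share = cost - tenant_share
    return {"tenant": tenant_share, "landlord": landlord_share}


print(split_repair_cost(150.0))
print(split_repair_cost(500.0))
```

A function like this could be wrapped with `Tool.from_function` in the same way as `calculate_rent_tool`, with a description that states the sharing rule so the agent does not present it as contract fact.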
