# 🧱 Track B – Contract Retrieval Demo (No API Version)

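This notebook demonstrates the **Group A (LLM & RAG)** retrieval logic **without calling the OpenAI API**.

✅ Extracts the contract PDF and splits it into clause-level chunks.
✅ Generates **random embeddings** instead of calling an embedding API (no cost).
✅ Builds a FAISS index for similarity search.
✅ Lets you ask questions and runs the full retrieval step. Because the embeddings are random, the returned clauses are not semantically related to the question: the demo exercises the pipeline, not retrieval quality.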


## 1️⃣ Import libraries

In [7]:
import re

import faiss
import numpy as np
import pandas as pd
import pdfplumber
from IPython.display import display

print('✅ Libraries imported successfully')

✅ Libraries imported successfully


## 2️⃣ Extract and split the contract

In [8]:
def extract_contract_chunks(pdf_path):
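    """
    Read a PDF and split its text into clause-level chunks.
    A new chunk starts at every line that begins with 'Clause <number>' or a
    lettered item such as '(a)'. Pages are split independently, so a clause
    that runs across a page break may end up in two chunks.
    """
    chunks = []
    with pdfplumber.open(pdf_path) as pdf:
        for page in pdf.pages:
            text = page.extract_text()
            if not text:  # skip pages with no extractable text (e.g. scanned images)
                continue
            # The lookahead consumes only the newline, so each marker stays at the start of its chunk.
            parts = re.split(r'\n(?=Clause\s*\d|\([a-z]\))', text)
            for p in parts:
                clean = p.strip()
                if len(clean) > 120:  # drop short fragments (headers, stray lines, very short clauses)
                    chunks.append(clean)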
    print(f'📄 Extracted {len(chunks)} text chunks from contract.')
    return chunks

clauses = extract_contract_chunks('Track_B_Tenancy_Agreement.pdf')
print('\nExample clause preview:')
print(clauses[0][:300], '...')

📄 Extracted 50 text chunks from contract.

Example clause preview:
TENANCY AGREEMENT (PRIVATE
CONDO/APARTMENT)
Page 1 of 6 LEG-AG-17.16 28/07/23
Disclaimer: This is a general document which may not be appropriate for use in all cases. When in doubt, please seek
legal advice. In the event of a dispute, the Landlord/Tenant agree not to hold Sterling Properties Pte Lt ...
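The split in `extract_contract_chunks` relies on a zero-width lookahead: the pattern consumes only the newline, and the `(?=...)` part checks what follows without consuming it, so each clause marker stays at the start of its chunk. A minimal sketch on a made-up page of text (the sample string is hypothetical, not taken from the contract):

```python
import re

# Same pattern as extract_contract_chunks: split at a newline only when the
# next line begins with "Clause <digit>" or a lettered marker such as "(a)".
CLAUSE_SPLIT = re.compile(r'\n(?=Clause\s*\d|\([a-z]\))')

# Hypothetical page text standing in for pdfplumber's page.extract_text() output.
sample_page = (
    "THE TENANT AGREES WITH THE LANDLORD:\n"
    "(a) RENT\n"
    "To pay the rent monthly in advance.\n"
    "(b) DEPOSIT\n"
    "To pay a security deposit.\n"
    "Clause 2 LANDLORD'S OBLIGATIONS"
)

parts = CLAUSE_SPLIT.split(sample_page)
for p in parts:
    print(repr(p))
```

Newlines that are followed by ordinary text (such as "To pay ...") are not split points, so each heading stays attached to its body. Note that `\([a-z]\)` also matches roman-numeral sub-items such as `(i)` or `(v)`, and the notebook's 120-character filter would discard clauses as short as these samples.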


## 3️⃣ Generate random embeddings (no API)

In [9]:
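embedding_dim = 1536  # Same size as OpenAI text-embedding-3-small / text-embedding-ada-002 (text-embedding-3-large defaults to 3072)
np.random.seed(42)  # fixed seed so the clause vectors are reproducible
embeddings = np.random.rand(len(clauses), embedding_dim).astype('float32')  # uniform on [0, 1), cast to float32 for FAISS
print('✅ Random embeddings generated:', embeddings.shape)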

✅ Random embeddings generated: (50, 1536)
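Because every coordinate is drawn uniformly from [0, 1), the distances FAISS reports later are predictable: for two independent coordinates, E[(x - y)^2] = Var(x) + Var(y) = 1/12 + 1/12 = 1/6, so the expected squared L2 distance between two vectors is d/6 = 256 for d = 1536. That is why every distance in the demo output sits between roughly 237 and 246, and why the ranking carries no information about the question. A quick NumPy check, assuming vectors drawn the same way but from a separate generator (so the exact numbers will not match the notebook's):

```python
import numpy as np

d = 1536
# For x, y ~ Uniform(0, 1): E[(x - y)^2] = 1/12 + 1/12 = 1/6 per coordinate.
expected_sq_dist = d / 6  # 256.0 for d = 1536

# Empirical check with vectors drawn uniformly on [0, 1), like np.random.rand.
rng = np.random.default_rng(0)
clause_vecs = rng.random((50, d), dtype=np.float32)
query_vec = rng.random((1, d), dtype=np.float32)
sq_dists = ((clause_vecs - query_vec) ** 2).sum(axis=1)  # squared L2, as IndexFlatL2 reports

print(f"expected mean: {expected_sq_dist:.1f}")
print(f"observed mean: {sq_dists.mean():.1f}, std: {sq_dists.std():.1f}")
print("three smallest:", np.sort(sq_dists)[:3].round(1))
```

In high dimensions these distances concentrate tightly around d/6, so the "closest" clauses are only a few standard deviations nearer than the rest, purely by chance.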


## 4️⃣ Build FAISS index / 建立相似度索引

In [10]:
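# Exact (brute-force) index; search() returns squared L2 distances.
index = faiss.IndexFlatL2(embedding_dim)
index.add(embeddings)  # rows get IDs 0..n-1, matching positions in `clauses`
print('✅ FAISS index built successfully. Number of clauses indexed:', index.ntotal)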

✅ FAISS index built successfully. Number of clauses indexed: 50
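`IndexFlatL2` performs exact, brute-force search and returns *squared* Euclidean distances, which is why the `Distance` values in the demo results are around 240 rather than around 15 to 16. The sketch below reimplements that search in plain NumPy for illustration only; `flat_l2_search` is a hypothetical helper, not part of FAISS, and it merely mirrors the `(D, I)` shape that `index.search` returns.

```python
import numpy as np

def flat_l2_search(xb: np.ndarray, xq: np.ndarray, k: int):
    """Hypothetical brute-force k-NN by squared L2 distance, returning (D, I) like a flat index."""
    xb = xb.astype(np.float64)
    xq = xq.astype(np.float64)
    # ||q - x||^2 = ||q||^2 - 2 q.x + ||x||^2 for every (query, base vector) pair.
    sq = (xq ** 2).sum(axis=1, keepdims=True) - 2.0 * xq @ xb.T + (xb ** 2).sum(axis=1)
    sq = np.maximum(sq, 0.0)  # clamp tiny negatives caused by floating-point rounding
    I = np.argsort(sq, axis=1)[:, :k]        # indices of the k nearest base vectors
    D = np.take_along_axis(sq, I, axis=1)    # their squared distances, ascending
    return D.astype(np.float32), I.astype(np.int64)

# Usage: query with a copy of base vector 7; it should come back first at distance ~0.
rng = np.random.default_rng(0)
xb = rng.random((50, 1536), dtype=np.float32)
D, I = flat_l2_search(xb, xb[[7]], k=3)
print(I, D)
```

Expanding the squared norm turns the pairwise computation into one matrix product, which is the same reason flat indexes stay fast for a few dozen clauses.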


## 5️⃣ Ask a question and retrieve the closest clauses

In [11]:
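import hashlib

def embed_query_simulation(query):
    """
    Generate a random embedding for a query to simulate vector search.
    The seed comes from a SHA-256 digest of the query, so the same question
    always maps to the same vector, even across kernel restarts
    (built-in hash() on str is salted per process).
    """
    seed = int.from_bytes(hashlib.sha256(query.encode('utf-8')).digest()[:4], 'little')
    rng = np.random.default_rng(seed)  # local generator; leaves the global NumPy seed untouched
    return rng.random((1, embedding_dim), dtype=np.float32)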

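def retrieve_clauses(query, k=3):
    """Display and return the top-k clauses for `query` as a DataFrame."""
    qvec = embed_query_simulation(query)
    k = min(k, index.ntotal)  # FAISS pads results with -1 IDs when k exceeds the index size
    D, I = index.search(qvec, k)  # D holds squared L2 distances for IndexFlatL2
    results = [(int(I[0][i]), float(D[0][i])) for i in range(k)]
    df = pd.DataFrame([
        {'Rank': i + 1, 'Clause_ID': idx, 'Distance': dist, 'Content': clauses[idx][:300] + '...'}
        for i, (idx, dist) in enumerate(results)
    ])
    display(df)
    return df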
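`embed_query_simulation` uses the question only as a random seed, so nothing ties the query vector to the clause vectors. If you want a free, offline baseline whose rankings do depend on wording, one option (a swapped-in technique, not part of the original notebook) is a hashed bag-of-words vector: each lowercase token is hashed into one of `dim` buckets and the counts are L2-normalised. It is purely lexical (it will not match "pays" to "pay"), and `hashed_bow_embedding` and the clause snippets below are hypothetical.

```python
import hashlib
import re

import numpy as np

def hashed_bow_embedding(text: str, dim: int = 1536) -> np.ndarray:
    """Hypothetical offline embedding: an L2-normalised hashed bag of words."""
    vec = np.zeros(dim, dtype=np.float32)
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        # md5 gives a stable bucket per token across runs (built-in hash() does not).
        bucket = int.from_bytes(hashlib.md5(token.encode("utf-8")).digest()[:4], "little") % dim
        vec[bucket] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec

# Hypothetical clause snippets, not quoted from the contract.
docs = [
    "(e) MAINTENANCE OF PREMISES To pay for minor repairs to the Premises",
    "(w) PETS Not to keep or permit to be kept any pets",
    "(c) PAYMENT OF OUTGOINGS To pay all charges for water and electricity",
]
doc_vecs = np.stack([hashed_bow_embedding(d) for d in docs])
query_vec = hashed_bow_embedding("Who pays for repairs?")

# Cosine similarity equals the dot product because every vector has unit length.
scores = doc_vecs @ query_vec
for rank, i in enumerate(np.argsort(-scores), start=1):
    print(rank, round(float(scores[i]), 3), docs[i][:40])
```

To try it in the notebook, one would embed `clauses` with the same function, rebuild the index, and return `hashed_bow_embedding(query)[None, :]` from the query function. On unit-length vectors the squared L2 distance equals 2 - 2 * cosine, so `IndexFlatL2` would rank clauses exactly as cosine similarity does.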

## 6️⃣ Run sample queries

In [12]:
sample_questions = [
    'What is the diplomatic clause?',
    'Who pays for repairs?',
    'When should rent be paid?',
    'Do I need to repaint before moving out?'
]

for q in sample_questions:
    print('\n❓ Question:', q)
    retrieve_clauses(q)
    print('-'*80)



❓ Question: What is the diplomatic clause?


|   | Rank | Clause_ID | Distance | Content |
|---|------|-----------|----------|---------|
| 0 | 1 | 22 | 241.003479 | (x) COMPLIANCE WITH LAW, RULES AND REGULATIONS... |
| 1 | 2 | 2 | 244.444962 | (c) PAYMENT OF OUTGOINGS\nTo pay all charges d... |
| 2 | 3 | 16 | 245.456818 | (q) NOT TO CAUSE NUISANCE\nNot to do or permit... |


--------------------------------------------------------------------------------

❓ Question: Who pays for repairs?


|   | Rank | Clause_ID | Distance | Content |
|---|------|-----------|----------|---------|
| 0 | 1 | 21 | 240.180206 | (w) PETS\nNot to keep or permit to be kept in ... |
| 1 | 2 | 32 | 240.728912 | (b) PAYMENT OF PROPERTY TAX\nTo pay all proper... |
| 2 | 3 | 5 | 241.726105 | (g) REPLACEMENT OF ITEMS\nTo replace any other... |


--------------------------------------------------------------------------------

❓ Question: When should rent be paid?


|   | Rank | Clause_ID | Distance | Content |
|---|------|-----------|----------|---------|
| 0 | 1 | 39 | 237.710175 | (d) REIMBURSEMENT OF PRO-RATA COMMISSION\nIf t... |
| 1 | 2 | 4 | 241.493942 | (e) MAINTENANCE OF DEMISED PREMISES\nAt the Te... |
| 2 | 3 | 24 | 244.554016 | (z) HANGING OF PICTURE FRAME\nNot to hack any ... |


--------------------------------------------------------------------------------

❓ Question: Do I need to repaint before moving out?


|   | Rank | Clause_ID | Distance | Content |
|---|------|-----------|----------|---------|
| 0 | 1 | 17 | 240.985703 | (r) USE OF PREMISES\nTo use the Premises as a ... |
| 1 | 2 | 6 | 242.852615 | (h) TO INDEMNIFY THE LANDLORD\nTo be responsib... |
| 2 | 3 | 2 | 245.202988 | (c) PAYMENT OF OUTGOINGS\nTo pay all charges d... |


--------------------------------------------------------------------------------
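The rankings above change from question to question but show no topical match (for example, "Who pays for repairs?" returns the PETS clause first). That is the expected behaviour: with i.i.d. random clause vectors, every clause is equally likely to land in the top k, so if exactly one clause answers a question, the chance that it appears in the top 3 of 50 is 3/50 = 6%. A sketch of that baseline, which any real embedding model should clearly beat (`random_baseline_hit_rate` is a hypothetical helper):

```python
from math import comb

import numpy as np

def random_baseline_hit_rate(n_clauses: int, k: int, n_relevant: int = 1) -> float:
    """P(at least one relevant clause appears in a uniformly random top-k)."""
    # Miss probability = ways to pick k clauses from the irrelevant ones / all ways to pick k.
    return 1 - comb(n_clauses - n_relevant, k) / comb(n_clauses, k)

exact = random_baseline_hit_rate(n_clauses=50, k=3)

# Monte Carlo check: shuffle clause IDs and treat clause 0 as the single relevant one.
rng = np.random.default_rng(0)
trials = 20_000
hits = sum(0 in rng.permutation(50)[:3] for _ in range(trials))
print(f"exact = {exact:.3f}, simulated = {hits / trials:.3f}")
```

With one relevant clause the formula reduces to k / n_clauses, since C(49, 3) / C(50, 3) = 47/50.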
