Contract Q&A application with Retrieval-Augmented Generation (RAG):
- Upload PDF/DOCX contracts
- Index contract text into Chroma
- Chat with grounded answers and chunk citations
- Run E2E RAG evaluation (G-Eval) for the selected document
- Manage stored documents from the web UI
- Backend: FastAPI
- Frontend: FastAPI-served HTML/CSS/JavaScript (Bootstrap)
- Vector DB: ChromaDB
- LLM provider: OpenAI API
- Chat model: OPENAI_CHAT_MODEL
- Embedding model: OPENAI_EMBEDDING_MODEL
- Rerank model: OPENAI_RERANK_MODEL
app/
api/main.py
core/config.py
core/schemas.py
services/
chat_memory.py
chunking.py
document_parser.py
openai_client.py
rag_pipeline.py
vector_store.py
data/
chroma/
processed/
uploads/
static/
css/chatbot.css
js/chatbot.js
templates/
chatbot.html
run_api.py
requirements.txt
- Create and activate virtual env:
python -m venv .venv
.venv\Scripts\activate
- Install dependencies:
pip install -r requirements.txt
- Configure environment:
copy .env.example .env
Set OPENAI_API_KEY in .env.
- Run API + web UI:
python run_api.py
- Open browser:
http://127.0.0.1:8000/
- Document upload
- UI sends file to POST /api/documents/upload.
- Backend validates file type and reads bytes.
- A stable doc_id is generated from filename + content hash.
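A minimal sketch of how such a filename + content-hash ID can be built (illustrative helper names; the exact scheme lives in the backend):

```python
import hashlib
from pathlib import Path


def make_doc_id(filename: str, content: bytes) -> str:
    """Derive a stable document ID from the filename and a content hash.

    Re-uploading the same file yields the same ID, so the document is not
    indexed twice. Illustrative sketch, not the app's actual helper.
    """
    digest = hashlib.sha256(content).hexdigest()[:12]
    stem = Path(filename).stem.lower().replace(" ", "_")
    return f"{stem}-{digest}"


# make_doc_id("Master Agreement.pdf", file_bytes)
# -> e.g. "master_agreement-<12-hex-chars>"
```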
- Parsing and chunking
- document_parser.py extracts contract text sections.
- chunking.py creates chunks:
  - normal sections: semantic paragraph-first chunking with sentence-aware splitting
  - overlap retention across chunks for context continuity
  - list-like sections (appendix/keyword-heavy): line-based chunking
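A minimal sketch of the paragraph-first, sentence-aware chunking with overlap described above (illustrative; chunking.py may differ in sentence splitting and overlap handling):

```python
import re

MAX_CHUNK_CHARS = 900       # mirrors MAX_CHUNK_CHARS in .env
CHUNK_OVERLAP_CHARS = 180   # mirrors CHUNK_OVERLAP_CHARS in .env


def split_sentences(text: str) -> list[str]:
    # Naive sentence splitter; a real parser would handle abbreviations,
    # clause numbering, etc. more carefully.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def chunk_section(section: str) -> list[str]:
    """Paragraph-first chunking that never cuts mid-sentence and keeps an
    overlapping tail between consecutive chunks for context continuity."""
    chunks: list[str] = []
    current = ""
    for paragraph in section.split("\n\n"):
        for sentence in split_sentences(paragraph):
            if current and len(current) + len(sentence) + 1 > MAX_CHUNK_CHARS:
                chunks.append(current.strip())
                current = current[-CHUNK_OVERLAP_CHARS:]  # carry overlap forward
            current += " " + sentence
    if current.strip():
        chunks.append(current.strip())
    return chunks
```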
- Embedding and indexing
- openai_client.py builds embeddings.
- vector_store.py stores vectors and metadata in Chroma.
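A minimal sketch of the embed-and-index step, assuming the official openai and chromadb clients; the collection name and chunk-ID scheme here are illustrative, not necessarily what openai_client.py and vector_store.py use:

```python
import chromadb
from openai import OpenAI

openai_client = OpenAI()                                   # reads OPENAI_API_KEY
chroma = chromadb.PersistentClient(path="data/chroma")     # CHROMA_DIR
collection = chroma.get_or_create_collection("contracts")  # illustrative name


def index_chunks(doc_id: str, chunks: list[str]) -> None:
    # One batched embeddings call for all chunks of the document.
    resp = openai_client.embeddings.create(
        model="text-embedding-3-large",  # OPENAI_EMBEDDING_MODEL
        input=chunks,
    )
    collection.add(
        ids=[f"{doc_id}-{i}" for i in range(len(chunks))],
        embeddings=[item.embedding for item in resp.data],
        documents=chunks,
        metadatas=[{"doc_id": doc_id, "chunk_index": i} for i in range(len(chunks))],
    )
```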
- Chat pipeline
- UI sends question to POST /api/chat/{session_id}/{doc_id}.
- rag_pipeline.py does:
  - list-intent detection
  - hybrid retrieval (vector + keyword BM25)
  - Reciprocal Rank Fusion (RRF) to combine vector + keyword rankings
  - rerank (LLM rerank by default, embedding rerank fallback)
  - score-threshold filtering before answer generation
  - citation-aware ordering (chunks explicitly cited by the model are prioritized)
  - grounded answer generation with citations
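The fusion step above is standard Reciprocal Rank Fusion; a minimal sketch (rag_pipeline.py may use different constants and tie-breaking):

```python
def reciprocal_rank_fusion(
    rankings: list[list[str]], k: int = 60
) -> list[tuple[str, float]]:
    """Fuse ranked lists of chunk IDs. A chunk scores sum(1 / (k + rank))
    across lists, so chunks ranked highly by both vector and BM25 retrieval
    rise to the top even when their raw scores are not comparable."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Example: fuse the two retrievers' rankings before reranking.
vector_hits = ["c7", "c2", "c9"]  # from Chroma similarity search
bm25_hits = ["c2", "c7", "c4"]    # from keyword/BM25 search
fused = reciprocal_rank_fusion([vector_hits, bm25_hits])
```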
- Guardrails
- Response includes confidence and grounded status.
- Low-confidence answers are flagged in output.
- If no retrieval hits exist, the assistant returns: "I do not have enough evidence from this document to answer."
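A minimal sketch of that guardrail decision, with an illustrative confidence threshold (the real checks live in rag_pipeline.py):

```python
NO_EVIDENCE_ANSWER = "I do not have enough evidence from this document to answer."
LOW_CONFIDENCE_THRESHOLD = 0.5  # illustrative value, not an app setting


def guardrail_status(retrieved_chunks: int, confidence: float) -> dict:
    """Return a verdict shaped like the guardrail_status schema further below."""
    if retrieved_chunks == 0:
        return {"grounded": False, "reason": "no retrieval hits"}
    if confidence < LOW_CONFIDENCE_THRESHOLD:
        return {"grounded": True, "reason": "low-confidence answer, flagged in output"}
    return {"grounded": True, "reason": "answer supported by retrieved chunks"}
```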
- E2E G-Eval pipeline
- UI button calls POST /api/evaluate/{doc_id}.
- Backend runs a fixed prompt set through full RAG (ask(...)).
- LLM-as-judge scores each answer on:
  - groundedness
  - answer relevance
  - citation faithfulness
- Returns aggregate metrics + per-example records.
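A minimal sketch of one LLM-as-judge scoring call, assuming the official openai client; the actual judge prompt, parsing, and aggregation live in the backend:

```python
import json

from openai import OpenAI

client = OpenAI()

JUDGE_PROMPT = """You are grading a contract Q&A answer.
Question: {question}
Retrieved context: {context}
Answer: {answer}

Return a JSON object with scores between 0 and 1 for:
groundedness, answer_relevance, citation_faithfulness."""


def judge_answer(question: str, context: str, answer: str) -> dict:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # OPENAI_CHAT_MODEL
        messages=[{
            "role": "user",
            "content": JUDGE_PROMPT.format(
                question=question, context=context, answer=answer
            ),
        }],
        response_format={"type": "json_object"},
    )
    scores = json.loads(resp.choices[0].message.content)
    # Simple mean as the overall score; the app's aggregation may differ.
    scores["geval_overall"] = sum(scores.values()) / len(scores)
    return scores
```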
Chat request:

```json
{
  "question": "string (min length: 3)",
  "top_k": 20
}
```

Citation:

```json
{
  "doc_id": "string",
  "chunk_id": "string",
  "score": 0.0,
  "page": 1,
  "quote": "string"
}
```

Guardrail status:

```json
{
  "grounded": true,
  "reason": "string"
}
```

Chat response:

```json
{
  "answer": "string",
  "citations": [
    {
      "doc_id": "string",
      "chunk_id": "string",
      "score": 0.0,
      "page": 1,
      "quote": "string"
    }
  ],
  "confidence": 0.0,
  "guardrail_status": {
    "grounded": true,
    "reason": "string"
  },
  "latency_ms": 0,
  "retrieved_chunks": 0
}
```

Upload response:

```json
{
  "doc_id": "string",
  "filename": "string",
  "chunks_indexed": 0,
  "indexed_at": "2026-02-16T12:00:00"
}
```

Document info:

```json
{
  "doc_id": "string",
  "filename": "string",
  "pages_or_sections": 0,
  "chunks_indexed": 0,
  "indexed_at": "2026-02-16T12:00:00"
}
```

Evaluation metric:

```json
{
  "name": "geval_overall",
  "value": 0.81,
  "note": "optional string"
}
```

Evaluation example:

```json
{
  "question": "string",
  "confidence": 0.72,
  "grounded": true,
  "citations": 4,
  "latency_ms": 1100,
  "geval_groundedness": 0.8,
  "geval_answer_relevance": 0.8,
  "geval_citation_faithfulness": 1.0,
  "geval_overall": 0.866,
  "answer_preview": "string"
}
```

Evaluation report:

```json
{
  "doc_id": "string",
  "method": "geval_e2e_rag",
  "metrics": [
    {
      "name": "geval_overall",
      "value": 0.81
    }
  ],
  "examples": [
    {
      "question": "string",
      "confidence": 0.72,
      "grounded": true,
      "citations": 4,
      "latency_ms": 1100,
      "geval_groundedness": 0.8,
      "geval_answer_relevance": 0.8,
      "geval_citation_faithfulness": 1.0,
      "geval_overall": 0.866,
      "answer_preview": "string"
    }
  ]
}
```

Endpoints:

- GET /api/health
- POST /api/documents/upload
- GET /api/documents/{doc_id}
- GET /api/documents
- POST /api/documents/clear
- POST /api/chat/{session_id}/{doc_id}
- POST /api/evaluate/{doc_id}
- POST /api/session/cleanup/{session_id}
- POST /ingest-qa/invoke (LangServe-compatible)
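A quick client-side example against the upload and chat endpoints (the multipart field name "file" is an assumption; check /docs for the exact form field):

```python
import requests

BASE = "http://127.0.0.1:8000"

# Upload a contract and read back its doc_id.
with open("contract.pdf", "rb") as fh:
    upload = requests.post(
        f"{BASE}/api/documents/upload",
        files={"file": ("contract.pdf", fh, "application/pdf")},
    ).json()

doc_id = upload["doc_id"]

# Ask a question against the indexed document.
chat = requests.post(
    f"{BASE}/api/chat/demo-session/{doc_id}",
    json={"question": "What is the termination notice period?", "top_k": 20},
).json()

print(chat["answer"])
for citation in chat["citations"]:
    print(citation["chunk_id"], citation["quote"][:80])
```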
OPENAI_API_KEY=
OPENAI_BASE_URL=https://api.openai.com/v1
OPENAI_CHAT_MODEL=gpt-4o-mini
OPENAI_EMBEDDING_MODEL=text-embedding-3-large
OPENAI_RERANK_MODEL=gpt-4o-mini
CHROMA_DIR=data/chroma
UPLOAD_DIR=data/uploads
PROCESSED_DIR=data/processed
DEFAULT_TOP_K=20
RERANK_TOP_K=8
RETRIEVAL_MIN_SCORE=0.3
LIST_INTENT_TOP_K=50
LIST_INTENT_RERANK_TOP_K=25
LIST_INTENT_MIN_SCORE=0.2
MAX_CHUNK_CHARS=900
CHUNK_OVERLAP_CHARS=180
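One way core/config.py might read these values (illustrative sketch using plain os.getenv; the actual settings loader may differ, e.g. load .env explicitly or use pydantic-settings):

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    # Values are read from the process environment; defaults mirror .env above.
    openai_api_key: str = os.getenv("OPENAI_API_KEY", "")
    chat_model: str = os.getenv("OPENAI_CHAT_MODEL", "gpt-4o-mini")
    embedding_model: str = os.getenv("OPENAI_EMBEDDING_MODEL", "text-embedding-3-large")
    chroma_dir: str = os.getenv("CHROMA_DIR", "data/chroma")
    default_top_k: int = int(os.getenv("DEFAULT_TOP_K", "20"))
    retrieval_min_score: float = float(os.getenv("RETRIEVAL_MIN_SCORE", "0.3"))


settings = Settings()
```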
- If the UI shows old evaluation metrics or "Overall score: N/A" after code changes:
  - Stop the backend process.
  - Start it again with python run_api.py.
  - Hard refresh the browser (Ctrl+F5).
- FastAPI interactive docs are available at /docs.