AmariahAK/docreader

Document Q&A Service

A small, fully local HTTP API that answers questions over a fixed set of reference documents using retrieval-augmented-generation-style logic: retrieve, rank, then answer extractively. There are no external LLM calls, no vector database, and no hosted services. All retrieval, ranking, and confidence math runs in-process in pure Python.

Setup

Requires Python 3.10+. From the project root, create a virtual environment and install the dependencies (a venv is recommended, and required on systems whose Python is "externally managed", e.g. Homebrew):

python3 -m venv .venv
source .venv/bin/activate        # Windows: .venv\Scripts\activate
pip install -r requirements.txt

The commands below assume the venv is activated. If you prefer not to activate it, prefix each command with the venv interpreter instead, e.g. .venv/bin/python -m pytest -q.

How to run

# Start the HTTP API (serves on http://127.0.0.1:8000)
uvicorn app.main:app --reload

# Chat in the terminal (runs the engine in-process, no server needed)
python -m app.tui

# Run the tests
pytest -q

Example requests

# Answerable
curl -s http://127.0.0.1:8000/answer \
  -H 'content-type: application/json' \
  -d '{"question": "How long do refunds take?"}'

# Weak evidence -> fallback
curl -s http://127.0.0.1:8000/answer \
  -H 'content-type: application/json' \
  -d '{"question": "Do you support SSO?"}'

Response shape:

{
  "answer": "string",
  "citations": [{ "doc_id": "string", "title": "string", "snippet": "string" }],
  "confidence": 0.0,
  "fallback": false
}

Interactive docs are available at http://127.0.0.1:8000/docs once the server is running.
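
For client code, the response shape above can be mirrored with a few typed objects. The sketch below is a minimal, stdlib-only illustration: the names Citation, Answer, and parse_answer are hypothetical and are not taken from the project, whose real schemas are the Pydantic models in app/models.py.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Citation:
    doc_id: str
    title: str
    snippet: str


@dataclass(frozen=True)
class Answer:
    answer: str
    citations: list[Citation]
    confidence: float
    fallback: bool


def parse_answer(body: str) -> Answer:
    """Parse a JSON body shaped like the /answer response into typed objects."""
    data = json.loads(body)
    return Answer(
        answer=data["answer"],
        # Each citation object carries exactly doc_id, title and snippet.
        citations=[Citation(**c) for c in data["citations"]],
        confidence=float(data["confidence"]),
        fallback=bool(data["fallback"]),
    )
```

A caller would typically check the fallback flag first and only display citations when it is false.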

Chat (terminal UI)

A small Rich-based terminal chat runs the engine in-process, so no server is needed:

python -m app.tui

Type a question to see the grounded answer, a colored confidence bar, and citations. Commands: /help, /quit (Ctrl+C / Ctrl+D also exit cleanly).

Design choices

  • Sentence-level chunking. Each document is split into sentences. Sentences are the unit of retrieval, which yields precise citations (the exact supporting sentence) and lets answers be assembled extractively.
  • Hand-written BM25. Retrieval ranks sentences with a from-scratch BM25 (k1=1.5, b=0.75). BM25 handles term-frequency saturation and length normalization well on short text, and writing it by hand keeps the relevance math visible instead of hidden behind a library. Tokenization lowercases, drops a small stop list, and applies conservative singularization so that query and document word forms align (refunds -> refund, members -> member). Each document's title is folded into every one of its chunks, because the title is real evidence that applies to all of its sentences. Without this, "What is the API rate limit?" would fall back, because "rate" appears only in the "API Rate Limits" title.
  • Extractive answers (no hallucination). The answer is the top matching sentence plus any close runner-up (capped at 3), returned verbatim. The same sentences are the citation snippets, so every word of the answer is provably present in a source document.
  • Confidence = IDF-weighted query-term coverage. Raw BM25 scores are unbounded and not comparable across questions, so they make a poor confidence signal. Instead, each distinct query content term is weighted by its IDF, and confidence is the share of that weighted mass that the retrieved sentences actually cover. Rare, informative terms dominate; common ones barely move the needle. A query term that does not appear in the corpus at all is assigned the maximum IDF (treated as df = 0), so an unknown key term drags confidence down hard. The score is bounded in [0, 1].
  • Fallback logic. The service returns the fixed message "I could not find enough evidence in the provided documents." with empty citations when any of the following holds: the question has no content terms after stopword removal, nothing scores above zero, or confidence is below the threshold (0.4). Worked example, "Do you support SSO?": support is common and does appear in the corpus (low IDF), but sso is unknown (maximum IDF) and matches nothing, so only the low-IDF share of the query's weight is covered. Confidence lands near 0.35, below the 0.4 threshold, and the service falls back rather than answering from the weak support match.
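
The tokenization, title folding, and BM25 ranking described in the list above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in app/retrieval.py: the function names, the stop list, the singularization rule, the smoothed IDF variant, and the two sample chunks are all invented for the example. Only k1=1.5 and b=0.75 come from the description.

```python
import math
import re
from collections import Counter

K1, B = 1.5, 0.75  # values stated in the README
# Illustrative stop list; the project's actual list is not shown here.
STOPWORDS = {"a", "an", "the", "is", "are", "do", "you", "what", "how", "to", "of", "for"}


def tokenize(text: str) -> list[str]:
    """Lowercase, drop stopwords, and strip one trailing 's' conservatively."""
    tokens = []
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        if word in STOPWORDS:
            continue
        # Assumed rule: leave short words and -ss/-us/-is endings untouched.
        if len(word) > 3 and word.endswith("s") and not word.endswith(("ss", "us", "is")):
            word = word[:-1]
        tokens.append(word)
    return tokens


def fold_title(title: str, sentence: str) -> str:
    """Prepend the document title so its terms count as evidence for the chunk."""
    return f"{title} {sentence}"


def bm25_scores(query: str, chunks: list[str]) -> list[float]:
    """Score every chunk against the query with BM25."""
    docs = [tokenize(c) for c in chunks]
    n = len(docs)
    avg_len = sum(len(d) for d in docs) / n if n else 0.0
    df = Counter(term for d in docs for term in set(d))
    query_terms = set(tokenize(query))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            if tf[term] == 0:
                continue
            # Smoothed, always-positive IDF variant (an assumption).
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + K1 * (1 - B + B * len(d) / avg_len)
            score += idf * tf[term] * (K1 + 1) / norm
        scores.append(score)
    return scores
```

With an invented sentence such as "Each key may send 100 requests per minute.", the query "What is the API rate limit?" scores zero against the bare sentence and above zero once fold_title("API Rate Limits", ...) is applied, which is the effect the title-folding choice is meant to produce.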

All tunables (K1, B, RUNNER_UP_RATIO, CONFIDENCE_THRESHOLD, MAX_CITATIONS) are named constants at the top of app/retrieval.py.
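
The confidence and fallback rules described in this section can be sketched as below. It is a hedged illustration, not the project's implementation: the function names, the smoothed IDF formula, and the document-frequency counts used in the usage note are assumptions. The 0.4 threshold and the fallback message are taken from the description above.

```python
import math

CONFIDENCE_THRESHOLD = 0.4  # value stated in the README
FALLBACK_MESSAGE = "I could not find enough evidence in the provided documents."


def idf(df: int, n_chunks: int) -> float:
    """Smoothed IDF (assumed variant); df = 0 yields the maximum value."""
    return math.log(1 + (n_chunks - df + 0.5) / (df + 0.5))


def confidence(query_terms, covered_terms, doc_freq, n_chunks) -> float:
    """Share of the query's IDF mass that the retrieved sentences cover."""
    terms = set(query_terms)
    total = sum(idf(doc_freq.get(t, 0), n_chunks) for t in terms)
    if total == 0:
        return 0.0
    covered = sum(idf(doc_freq.get(t, 0), n_chunks) for t in terms if t in covered_terms)
    return covered / total


def decide(query_terms, covered_terms, doc_freq, n_chunks, best_score):
    """Return (fallback, confidence) following the three fallback conditions."""
    if not query_terms or best_score <= 0:
        return True, 0.0
    c = confidence(query_terms, covered_terms, doc_freq, n_chunks)
    return c < CONFIDENCE_THRESHOLD, c
```

For example, with made-up counts of 40 chunks and "support" appearing in 6 of them, decide(["support", "sso"], {"support"}, {"support": 6}, 40, best_score=1.2) reports a fallback, because the unknown term "sso" carries most of the weight. The exact confidence depends on the real corpus statistics, so this sketch does not reproduce the 0.35 figure quoted above.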

Project layout

app/
  main.py        FastAPI app, startup index build, POST /answer
  models.py      Pydantic request/response schemas (the API contract)
  retrieval.py   tokenizer, sentence chunking, BM25 index, IDF
  qa.py          orchestration: retrieve -> score -> answer / cite / fallback
data/
  documents.json the reference corpus
tests/
  test_answer.py end-to-end tests (answerable + fallback + contract)

Limitations

  • Lexical only. Matching is on (singularized) surface tokens. Synonyms and paraphrases are not understood: "SSO" will not match "single sign-on", and "kept" will not match "retained". This is the intended trade-off for a no-embeddings, no-LLM design.
  • Extractive, not synthesized. Answers are stitched from existing sentences and cannot combine facts across documents into a new phrasing, nor reformat to directly mirror the question.
  • Small fixed corpus. The index is rebuilt from a local JSON file at startup; there is no incremental update path.
  • Naive sentence splitter. Splitting on the characters ., ! and ? is sufficient for this corpus but would mishandle abbreviations (such as "e.g.") or decimal numbers (such as "2.5") in richer text.
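
To make the splitter limitation above concrete, here is a minimal sketch of a terminator-based splitter. The function name and regular expression are assumptions for illustration; the project's actual splitter may be written differently.

```python
import re


def split_sentences(text: str) -> list[str]:
    """Cut after every '.', '!' or '?' and drop empty pieces."""
    pieces = re.findall(r"[^.!?]+[.!?]?", text)
    return [p.strip() for p in pieces if p.strip()]


# Clean prose splits as expected.
print(split_sentences("Refunds take 5 days. Contact support!"))
# A decimal number is cut in two, which is the failure mode noted above.
print(split_sentences("Plans cost 2.5 USD per month."))
```

The second call returns "Plans cost 2." and "5 USD per month." as separate sentences, so a citation built from either piece would be misleading on text with decimals or abbreviations.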

What I would improve next for production

  • Semantic retrieval. Add embeddings with an approximate-nearest-neighbor index to recover synonym/paraphrase recall, while keeping BM25 as a hybrid lexical signal.
  • Optional constrained synthesis. Layer an LLM that rewrites the answer only from the retrieved snippets, with citation enforcement and a grounding check, so fluency improves without reintroducing hallucination.
  • Evaluation harness. A labelled query set with retrieval and answer-quality metrics (precision@k, answerable/fallback accuracy) to tune thresholds with data instead of by hand.
  • Operational hardening. Authentication, rate limiting, response caching, structured logging and metrics, and request tracing for observability.
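
As a rough idea of what the evaluation harness mentioned above might compute, the sketch below implements the two metrics it names. The function names and the use of sentence identifiers as strings are hypothetical. No labelled query set exists in the project as described, so none is shown.

```python
def precision_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the top-k retrieved sentence ids that are labelled relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for item in retrieved[:k] if item in relevant)
    return hits / k


def fallback_accuracy(predicted: list[bool], expected: list[bool]) -> float:
    """Share of queries whose answerable/fallback decision matches the label."""
    if not expected or len(predicted) != len(expected):
        raise ValueError("inputs must be non-empty and of equal length")
    matches = sum(p == e for p, e in zip(predicted, expected))
    return matches / len(expected)
```

Sweeping the confidence threshold over a labelled set and plotting fallback_accuracy against it would be one way to replace the hand-picked 0.4 with a data-driven value.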
