A production-ready PDF question-answering system with semantic search and LLM-powered answers. Works with any LLM provider (OpenAI, Ollama, etc.) or no LLM at all.
- Semantic Search - Find relevant content by meaning, not keywords
- Model-Agnostic RAG - Works with 6+ LLM providers (OpenAI, Ollama, Claude, etc.)
- Local-First - Run completely offline with local models
- Clean Architecture - Modular, testable, production-ready code
python3 -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activate
pip install -r requirements.txt
python main.py data/sample.pdf
Returns relevant text chunks for your questions.
python main_rag.py data/sample.pdf
Generates natural language answers using an LLM.
Ollama - Easiest local setup:
curl -fsSL https://ollama.ai/install.sh | sh
ollama pull llama3.2
python main_rag.py data/sample.pdf
OpenAI gpt-oss-20b - Open-weight model from OpenAI:
pip install transformers accelerate
python main_rag.py data/sample.pdf
# Select HuggingFace → openai/gpt-oss-20b
export OPENAI_API_KEY="sk-..."
# or
export ANTHROPIC_API_KEY="sk-ant-..."
python main_rag.py data/sample.pdf
pdf-qa-system/
├── main.py              # Semantic search CLI
├── main_rag.py          # RAG with LLM CLI
├── test.py              # Quick test
├── requirements.txt     # Dependencies
│
├── data/                # Your PDF files
│   └── sample.pdf
│
├── src/                 # Core modules
│   ├── extract.py       # PDF → text
│   ├── chunk.py         # Text → chunks
│   ├── embed.py         # Chunks → vectors
│   ├── vector_store.py  # Vector database
│   ├── query.py         # Search interface
│   ├── llm_providers.py # LLM integrations
│   └── rag.py           # RAG pipeline
│
└── docs/                # Documentation
    └── ARCHITECTURE.md  # Technical details
$ python main.py data/sample.pdf
> What are the benefits?
[Shows 3 most relevant text chunks]
$ python main_rag.py data/sample.pdf
> What are the benefits?
💡 Based on the document, the main benefits include:
1. Wellness app with health tracking
2. Coverage up to Rs. 10 Lakhs
3. Accidental death coverage
...
| Provider | Cost | Privacy | Setup |
|---|---|---|---|
| Ollama | Free | 100% Local | ollama pull llama3.2 |
| gpt-oss-20b | Free | 100% Local | Auto-downloads |
| OpenAI | Paid | Cloud | Set OPENAI_API_KEY |
| Anthropic | Paid | Cloud | Set ANTHROPIC_API_KEY |
| HuggingFace | Free | 100% Local | Auto-downloads |
| Local Server | Free | 100% Local | Start vLLM/text-gen-webui |
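All of these providers are picked up automatically by get_available_llm() (used in the Python API example below). A minimal sketch of checking which one was detected; the assumption that it returns None when nothing is configured is mine, not documented:

```python
from src import get_available_llm

# Returns whichever provider from the table above is reachable;
# the exact detection order lives in src/llm_providers.py.
llm = get_available_llm()

if llm is None:  # assumption: None means no provider was found
    print("No LLM found - run main.py for search-only mode, set an API key, or start Ollama")
else:
    print(f"Using provider: {type(llm).__name__}")
```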
from src import PDFParser, TextChunker, EmbeddingModel, VectorStore, RAGInterface
# Process PDF
parser = PDFParser()
text = parser.extract_text("document.pdf")
# Create embeddings
chunker = TextChunker()
chunks = chunker.chunk_text(text)
embedder = EmbeddingModel()
embeddings = embedder.embed_batch(chunks)
# Store in vector DB
store = VectorStore()
store.add_chunks(chunks, embeddings)
# Query
from src import get_available_llm
rag = RAGInterface(embedder, store, llm=get_available_llm())
result = rag.answer_question("What is this about?")
print(result['answer'])
Edit settings in the respective modules (a combined sketch follows this list):
- Chunk size: src/chunk.py → TextChunker(chunk_size=500)
- Number of results: src/query.py → QueryInterface(n_results=3)
- Embedding model: src/embed.py → EmbeddingModel(model_name="...")
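A minimal sketch wiring custom settings into the indexing pipeline; the chunk size and model name below are examples only (check src/chunk.py and src/embed.py for the actual defaults):

```python
from src import PDFParser, TextChunker, EmbeddingModel, VectorStore

# Custom chunking and embedding settings (example values, not the defaults)
chunker = TextChunker(chunk_size=500)
embedder = EmbeddingModel(model_name="sentence-transformers/all-MiniLM-L6-v2")

# Same indexing flow as the Python API example above
parser = PDFParser()
text = parser.extract_text("data/sample.pdf")
chunks = chunker.chunk_text(text)

store = VectorStore()
store.add_chunks(chunks, embedder.embed_batch(chunks))
```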
- Python 3.8+
- 8GB RAM minimum (16GB+ recommended for large models)
- 10GB disk space (for models)
PDF → Extract → Chunk → Embed → Vector Store → Query → Results
PDF → Extract → Chunk → Embed → Vector Store
                                     ↓
Question → Embed → Search → Top Chunks → Prompt → LLM → Answer
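Conceptually, the query path above maps to something like the sketch below. Apart from the classes shown in the Python API example, the method names (search, generate) are assumptions for illustration, not the actual rag.py interface:

```python
def answer(question, embedder, store, llm, n_results=3):
    """Illustrative sketch of the RAG query path, not the real rag.py code."""
    query_vec = embedder.embed_batch([question])[0]   # Question → Embed
    top_chunks = store.search(query_vec, n_results)   # Search → Top Chunks (search() is assumed)
    prompt = (
        "Answer using only this context:\n\n"
        + "\n\n".join(top_chunks)
        + f"\n\nQuestion: {question}"
    )
    return llm.generate(prompt)                       # Prompt → LLM → Answer (generate() is assumed)
```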
- ARCHITECTURE.md - Technical details and design decisions
MIT License
- Docling - PDF parsing
- Sentence Transformers - Embeddings
- Chroma - Vector database
- LangChain - Text splitting
- Ollama - Local LLM runtime