Local-first documentation reasoning for engineers and AI coding agents.
DocsRAG is an open-source, self-hostable platform that turns public technical documentation into a grounded reasoning layer. It crawls docs, extracts structured sections, separates explanations from code examples, and answers implementation-focused questions with citations — using your own local setup and provider configuration.
Why now?
In the era of AI-assisted coding, normal docs search and naive RAG are no longer enough. Engineers and coding agents need answers that are specific, grounded, and actionable — not just related paragraphs retrieved from a large documentation site.
When I’m working on integrations (Stripe, APIs, SDKs, etc.), I constantly find myself going back and forth between documentation pages.
You search for something simple like:
“how to generate an API key”
And you end up:
- opening multiple pages
- reading partially related sections
- piecing together steps manually
- guessing which code example actually applies
Even with AI tools, the answers are often:
- incomplete
- not grounded in the actual docs
- or missing the exact implementation details
DocsRAG was built to fix that.
Instead of just searching documentation, it tries to understand how the docs are structured and to return:
- clear explanations
- actionable steps
- relevant code examples
- grounded answers with citations
Engineers and AI coding tools struggle to find exact implementation guidance inside large documentation sites.
The useful answer is usually:
- buried across multiple pages
- mixed with navigation noise
- surrounded by high-level explanations
- disconnected from the code examples
DocsRAG turns documentation into a structured reasoning layer.
Instead of treating docs as one big text corpus, it:
- separates explanation from examples
- preserves section context
- builds a retrieval flow optimized for real engineering questions
Technical documentation is designed for browsing — not for implementation-time reasoning.
- Docs are large, fragmented, and spread across many pages
- Engineers need exact steps, not broad summaries
- AI tools hallucinate when context is incomplete
- Code examples and explanations are not tightly linked
- Docs chatbots often return related pages, not the right ones
- Naive RAG treats all chunks equally (explanation, reference, examples)
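The last bullet can be made concrete: a retrieval layer that knows whether a chunk is an explanation, a reference table, or a code example can weight matches accordingly. A minimal sketch, with purely illustrative weights (not values DocsRAG actually uses):

```python
# Illustrative chunk-type weights; DocsRAG's actual scoring differs.
TYPE_WEIGHTS = {"explanation": 1.0, "reference": 0.6, "example": 0.4}

def rerank(hits: list[tuple[float, str, str]]) -> list[tuple[float, str, str]]:
    """Re-sort (similarity, chunk_type, text) hits by type-weighted score."""
    return sorted(hits, key=lambda h: -(h[0] * TYPE_WEIGHTS[h[1]]))

hits = [
    (0.82, "reference", "Query parameter table"),
    (0.78, "explanation", "How to declare a query parameter"),
    (0.90, "example", "def read_items(q): ..."),
]
best = rerank(hits)[0]
print(best[1])  # explanation
```

Even though the raw similarity of the example chunk is highest (0.90), the explanation chunk wins after weighting, which matches the explanation-first retrieval DocsRAG aims for.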
DocsRAG is a staged documentation reasoning engine.
It:
- crawls public technical docs
- extracts structured sections
- separates explanation blocks from code examples
- keeps them linked by section/page context
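In the actual stack this extraction is done with Trafilatura and BeautifulSoup (see the stack section below). As a stdlib-only sketch of the idea, the toy parser here splits a docs page into sections, keeping explanation text and code blocks separate but linked under the same heading; the HTML sample and class names are illustrative, not the project's:

```python
from __future__ import annotations

from dataclasses import dataclass, field
from html.parser import HTMLParser

@dataclass
class Section:
    heading: str
    explanation: list[str] = field(default_factory=list)
    code_examples: list[str] = field(default_factory=list)

class SectionSplitter(HTMLParser):
    """Toy parser: <h2> opens a section, <p> is explanation, <pre> is code."""

    def __init__(self):
        super().__init__()
        self.sections: list[Section] = []
        self._mode: str | None = None

    def handle_starttag(self, tag, attrs):
        if tag in ("h2", "p", "pre"):
            self._mode = tag

    def handle_endtag(self, tag):
        if tag in ("h2", "p", "pre"):
            self._mode = None

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if self._mode == "h2":
            self.sections.append(Section(heading=text))
        elif self.sections and self._mode == "pre":
            self.sections[-1].code_examples.append(text)
        elif self.sections and self._mode == "p":
            self.sections[-1].explanation.append(text)

parser = SectionSplitter()
parser.feed("""
<h2>Create an API key</h2>
<p>Generate a key from the dashboard, then send it as a bearer token.</p>
<pre>curl -H "Authorization: Bearer sk_test_123" https://api.example.com</pre>
""")
section = parser.sections[0]
print(section.heading)             # Create an API key
print(len(section.code_examples))  # 1
```

Because the explanation and the curl snippet land in the same `Section`, a later retrieval step can surface the explanation first and attach the example only when it is relevant.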
When a question comes in:
- retrieves explanation-first context
- attaches only relevant code examples
- validates whether the docs actually support the answer
- generates a grounded response with citations
The goal is not to be a generic docs chatbot.
👉 The goal is to help engineers and AI agents answer implementation questions reliably from official docs.
- Implementation-oriented, not summary-oriented
- Explanation-first retrieval
- Code examples are linked to their actual context
- Intent-aware retrieval pipeline
- Support validation before answering
- Designed for humans and AI agents
- Local-first and self-hostable
Ingestion flow:

- User adds a public documentation URL
- Docs are crawled and extracted
- Content is parsed into structured sections
- Explanation blocks and code examples are stored separately but linked
- Explanation chunks are indexed

Question flow:

- User asks a question
- Intent analysis + query planning
- Retrieval + reranking
- Support validation
- Relevant code examples are attached
- Final grounded answer is generated with citations
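The flow above can be sketched end to end. Everything below is a toy stand-in (word-overlap retrieval, a trivial support check) rather than DocsRAG's actual pipeline, but it shows the shape: explanation-first retrieval, support validation, then an answer with linked examples and citations:

```python
from __future__ import annotations

from dataclasses import dataclass

@dataclass
class Chunk:
    text: str
    page: str                        # citation target
    code_example: str | None = None  # linked example, if any

# Toy corpus of explanation chunks with linked code examples.
CORPUS = [
    Chunk("Declare query parameters as function arguments.",
          "https://fastapi.tiangolo.com/tutorial/query-params/",
          "def read_items(q: str | None = None): ..."),
    Chunk("FastAPI is a modern web framework.",
          "https://fastapi.tiangolo.com/"),
]

def retrieve(question: str, corpus: list[Chunk]) -> list[Chunk]:
    """Stand-in for vector retrieval: rank chunks by word overlap."""
    words = set(question.lower().split())
    scored = [(len(words & set(c.text.lower().split())), c) for c in corpus]
    return [c for score, c in sorted(scored, key=lambda s: -s[0]) if score > 0]

def supports(question: str, chunk: Chunk) -> bool:
    """Stand-in for support validation: demand real lexical overlap."""
    words = set(question.lower().split())
    return len(words & set(chunk.text.lower().split())) >= 2

def answer(question: str) -> dict:
    hits = [c for c in retrieve(question, CORPUS) if supports(question, c)]
    if not hits:
        return {"answer": None, "citations": [], "examples": []}
    top = hits[0]
    return {
        "answer": top.text,
        "citations": [top.page],
        "examples": [top.code_example] if top.code_example else [],
    }

result = answer("How do I declare a query parameter?")
print(result["citations"])
```

Note that the loosely related "FastAPI is a modern web framework" chunk is retrieved but rejected by the support check, so it never reaches the answer; that filtering step is what keeps responses grounded rather than merely related.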
DocsRAG uses a staged backend pipeline rather than a single "retrieve and prompt" step.
- Ingestion is responsible for crawling public docs, extracting text, parsing sections, and building linked explanation and code-example records.
- Retrieval focuses on explanation chunks first, because those are usually the most reliable source for procedural answers.
- Query planning and reranking use question intent, entities, and procedural signals to improve retrieval quality.
- Code example selection is handled separately so the answer only includes examples that are relevant to the question.
- Support validation checks whether the retrieved documentation directly supports the requested task before the answer is returned.
- Answer composition produces explanation-first responses with citations and optionally linked implementation steps and examples.
This separation is important. It makes the system easier to inspect, improve, and adapt for both human-facing UI flows and future agent tooling.
The current repository is a monorepo with a local frontend and backend:
- Next.js 15 + React 19 power the web UI for ingestion, questioning, status, citations, and answer display.
- FastAPI provides the backend API for ingestion, asking questions, streaming answers, status, and reset operations.
- Poetry manages the Python backend environment and CLI packaging.
- Typer powers the backend CLI for ingesting docs, asking questions, serving the API, checking status, and reset flows.
- Chroma stores vector indexes for explanation-oriented retrieval.
- SQLite + SQLAlchemy persist local metadata such as docsets, pages, ingestion runs, and provider-related state.
- Trafilatura handles primary content extraction from public documentation pages.
- BeautifulSoup supports HTML parsing and structured section/code extraction.
- OpenAI-compatible LLM and embedding providers supply embeddings and answer generation while letting users bring their own keys and model choices.
- Uvicorn runs the FastAPI application in development and local deployment scenarios.
- Tailwind CSS + Radix UI support the frontend UI layer.
- Docker Compose provides an optional local container-based development path.
- Pytest, Ruff, Black, ESLint, and GitHub Actions CI provide the current testing and quality baseline.
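Because providers are OpenAI-compatible and bring-your-own-key, configuration mostly reduces to resolving a base URL, model, and key from the environment. The sketch below is illustrative only: the provider-to-URL table and the defaults are my assumptions, not the project's actual mapping, though the variable names follow the repo's env template:

```python
# Illustrative provider -> base URL table (NOT the project's actual mapping).
PROVIDER_BASE_URLS = {
    "openai": "https://api.openai.com/v1",
    # Any OpenAI-compatible server works, e.g. a local inference server:
    "local": "http://localhost:11434/v1",
}

def resolve_llm_config(env: dict) -> dict:
    """Resolve an OpenAI-compatible endpoint from DOCSRAG_* variables."""
    provider = env.get("DOCSRAG_LLM_PROVIDER", "openai")
    return {
        "base_url": PROVIDER_BASE_URLS[provider],
        "model": env.get("DOCSRAG_LLM_MODEL", "gpt-4o-mini"),  # placeholder default
        "api_key": env.get("OPENAI_API_KEY", ""),
    }

cfg = resolve_llm_config({"DOCSRAG_LLM_PROVIDER": "local",
                          "DOCSRAG_LLM_MODEL": "llama3"})
print(cfg["base_url"])  # http://localhost:11434/v1
```

In real use you would pass `dict(os.environ)`; the point is that swapping providers or pointing at a local model server is a configuration change, not a code change.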
DocsRAG is designed to run locally.
You can clone the repository, configure your own providers, and run the full stack on your machine without depending on a hosted DocsRAG service. The project is built around bring-your-own API keys and provider configuration, so you control the models, the storage, and the runtime environment.
This matters for engineering workflows. Local-first tools are easier to inspect, easier to extend, and easier to trust when they are being used inside real implementation work.
For engineers, DocsRAG helps turn large documentation sets into something closer to an implementation assistant. It is useful when you need the practical meaning of a docs section, the exact flow for setting something up, or the most relevant example tied to the explanation that justifies it.
For AI coding agents, the value is grounded context. Instead of relying on partial memory or weak retrieval, an agent can use DocsRAG to pull supportable answers from official docs, cite them, and reduce the chance of inventing steps that are not actually documented.
This is especially useful for AI-assisted coding, vibe coding, internal tooling experiments, and any workflow where the gap between "related docs" and "actionable implementation guidance" matters.
- "How do I implement Stripe webhook verification?"
- "How do I configure FastAPI background tasks?"
- "How do I rotate API keys?"
- "What does this docs section mean in practical terms?"
- "Which official example is most relevant to this setup flow?"
- Engineers working from public technical documentation
- Developer tools users who want grounded answers instead of loose summaries
- AI coding workflows that need reliable documentation retrieval
- Coding assistants and agentic tooling experiments
- Open-source contributors building better developer infrastructure
- Teams that want self-hostable, citation-backed documentation answers
DocsRAG/
├── backend/ FastAPI app, CLI, ingestion, retrieval, pipeline, tests
├── frontend/ Next.js app
├── docker-compose.yml Optional local container workflow
├── Makefile Common development commands
└── README.md
DocsRAG currently uses a monorepo with separate frontend and backend apps.
Clone the repository:

```bash
git clone git@github.com:Ando22/rag-docs.git
cd rag-docs
```

Copy the example env file at the repository root:
```bash
cp .env.example .env
```

Then set the values you want to use. At minimum, you will usually want to configure:
- OPENAI_API_KEY
- DOCSRAG_LLM_PROVIDER
- DOCSRAG_LLM_MODEL
- DOCSRAG_EMBEDDING_PROVIDER
- DOCSRAG_EMBEDDING_MODEL
- NEXT_PUBLIC_API_URL
By default, DocsRAG uses local SQLite and local Chroma persistence directories.
Install the backend dependencies:

```bash
cd backend
poetry install
cd ..
```

Install the frontend dependencies:

```bash
cd frontend
npm install
cd ..
```

Start the backend:

```bash
make backend-dev
```

This starts the FastAPI server via the DocsRAG CLI.
In a second terminal:
```bash
make frontend-dev
```

The frontend runs on http://localhost:3000 by default, and the backend runs on http://localhost:8000 by default.
Or run both with one command:

```bash
make dev
```

The backend already includes both a CLI and HTTP API.
Example CLI usage:
```bash
cd backend
poetry run docsrag ingest https://fastapi.tiangolo.com/
poetry run docsrag ask "How do I declare a query parameter?" --docset https://fastapi.tiangolo.com/
poetry run docsrag chat --docset https://fastapi.tiangolo.com/
poetry run docsrag status
poetry run docsrag reset
poetry run docsrag serve --reload
```

Current API surface includes:
- POST /api/ingest
- POST /api/ask
- POST /api/ask/stream
- GET /api/status
- GET /api/health
- POST /api/reset
OpenAPI docs are available at http://localhost:8000/docs.
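As a rough sketch of calling the API from code, the stdlib-only snippet below posts a question to /api/ask. The request body field names here are assumptions; check the OpenAPI schema at /docs for the actual contract:

```python
import json
from urllib import request

API_BASE = "http://localhost:8000"  # backend default address

def build_ask_payload(question: str, docset: str) -> dict:
    # Field names are assumptions; see the OpenAPI docs at /docs
    # for the actual request body of POST /api/ask.
    return {"question": question, "docset": docset}

def ask(question: str, docset: str) -> dict:
    req = request.Request(
        f"{API_BASE}/api/ask",
        data=json.dumps(build_ask_payload(question, docset)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read())

# Example (requires a running backend, e.g. `make backend-dev`):
# ask("How do I declare a query parameter?", "https://fastapi.tiangolo.com/")
```

For streaming answers, POST /api/ask/stream would be the endpoint to use instead; an HTTP client that supports chunked responses (such as httpx) is a better fit there.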
Contributions are welcome.
If you want to help, useful areas include:
- retrieval improvements
- parser and section-structure improvements
- better explanation/code-example linkage
- provider support and configuration profiles
- UI and UX improvements
- MCP integration and agent-facing interfaces
- tests, fixtures, and documentation quality
Issues, discussions, and pull requests are all useful. If you are exploring the project for the first time, small improvements to docs, test coverage, or retrieval debugging are good places to start.
The repository already has a solid foundation, but the project is still early. The checklist below is intentionally conservative.
- Monorepo structure
- Local frontend + backend
- FastAPI API and Typer CLI
- Public documentation ingestion
- Local metadata persistence
- Vector indexing with Chroma
- Basic ask flow with citations
- Structured section parsing
- Separate explanation and code-example handling
- More robust parsing across inconsistent docs layouts
- Stronger multi-page reasoning and synthesis
- Intent analysis
- Query planning
- Retrieval reranking
- Support validation before answering
- Initial code-example filtering and selection
- Better lexical and hybrid reranking
- Stronger code-example relevance scoring
- Retrieval evaluation benchmarks and fixtures
- Better handling for ambiguous or underspecified questions
- Basic web UI
- Ingest and ask workflow
- Answer and citation display
- Settings/status surface
- Better code display and comparison views
- Expanded debug and retrieval inspection panel
- First-run onboarding/setup flow
- Sharper docset management UX
- MCP ask_docs interface
- Structured tool output modes for agents
- Agent-oriented answer mode
- Better streaming/tool-call ergonomics
- CLI runner flows for coding-agent pipelines
- MIT license
- CI for backend and frontend
- Environment template
- Better end-to-end examples
- Architecture docs
- Dedicated contribution guide
- More test coverage and fixtures
Near-term directions for the project include:
- improved structured parsing for inconsistent public docs
- better retrieval and reranking quality
- stronger code-example selection and explanation/example linking
- MCP support for AI coding tools
- a more agent-friendly CLI runner and structured output mode
- multi-doc and cross-doc reasoning
- better provider profiles and local configuration ergonomics
- improved debugging, evaluation, and benchmark workflows
MIT. See LICENSE.