Chat with any documentation using AI. Provide a documentation link, and the system will scrape, process, and convert it into a searchable knowledge base that you can interact with through natural language.
Check out the live website here: DocChat
This project implements a Retrieval-Augmented Generation (RAG) system designed for documentation. It allows users to convert any documentation website into an interactive chat interface powered by large language models.
The system handles crawling, chunking, embedding, indexing, and querying in a structured pipeline.
- Accepts a documentation URL as input
- Recursively crawls internal links
- Applies limits to avoid excessive crawling
- Cleans and extracts meaningful content from HTML
- Splits content into manageable chunks
- Generates vector embeddings for each chunk
- Stores embeddings in Qdrant
- Performs similarity search to retrieve relevant context
- If a documentation URL has already been ingested, the system reuses the existing knowledge base
- Reuse applies whether the URL was first ingested by the same user or by a different user
- Creating a new chat for an already-ingested docs URL is instant, with no re-ingestion wait
- Users can ask questions about the ingested documentation
- Responses are generated using retrieved context
- Each response includes source references
- Tracks token usage per request
- Stores model usage details
- Enables usage monitoring for users
- Users can provide their own API keys
- Supports multiple providers
- Keys are encrypted before storage
- Ingestion runs asynchronously
- Tracks progress with status updates (processing, ready, failed)
- OpenAI
- Anthropic
- Google (Gemini)
- xAI (Grok)
- OpenRouter
- User submits a documentation URL
- System crawls and collects internal pages
- Content is cleaned and split into chunks
- Chunks are embedded using an embedding model
- Embeddings are stored in Qdrant collections (one per unique docs URL and reused across chats)
- User query triggers similarity search
- Retrieved context is passed to the LLM
- LLM generates response with references
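The chunking and retrieval steps in the flow above can be sketched as follows. This is a minimal illustration, not DocChat's actual code: the names `chunkText`, `cosineSimilarity`, and `topK`, the character-based window, and the default sizes are all assumptions. The in-memory ranking only stands in for the similarity search that Qdrant performs in the real system.

```typescript
// Hypothetical sketch: names and sizes are illustrative, not taken from the DocChat source.

interface Chunk {
  text: string;
  sourceUrl: string; // kept so a response can cite where the text came from
}

interface ScoredChunk extends Chunk {
  score: number;
}

/** Split text into fixed-size character windows that overlap by `overlap` characters. */
function chunkText(text: string, sourceUrl: string, size = 800, overlap = 100): Chunk[] {
  if (size <= 0 || overlap < 0 || overlap >= size) {
    throw new Error("require size > 0 and 0 <= overlap < size");
  }
  const chunks: Chunk[] = [];
  const step = size - overlap;
  for (let start = 0; start < text.length; start += step) {
    chunks.push({ text: text.slice(start, start + size), sourceUrl });
    if (start + size >= text.length) break; // this window already reached the end
  }
  return chunks;
}

/** Cosine similarity of two equal-length vectors; returns 0 for a zero vector. */
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("vector length mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

/** Rank indexed chunks against a query vector and keep the k best matches. */
function topK(
  query: number[],
  indexed: { chunk: Chunk; vector: number[] }[],
  k: number
): ScoredChunk[] {
  return indexed
    .map(({ chunk, vector }) => ({ ...chunk, score: cosineSimilarity(query, vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

The overlap keeps a sentence that straddles a chunk boundary retrievable from at least one chunk. Carrying `sourceUrl` on every chunk is what would let a response list its source references.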
Stores user account details.
Represents a documentation session created by a user.
Stores root documentation links associated with a chat.
Stores conversation messages including prompts and responses along with token usage.
Stores the source chunks used to generate each response.
Tracks token usage across different operations (chat, embedding, system).
Stores encrypted API keys provided by users.
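To make the message, source-chunk, and token-usage records more concrete, here is one possible set of shapes. The actual table and column names are not given in this README, so every identifier below (`MessageRecord`, `SourceChunkRecord`, `TokenUsageRecord`, `totalTokensByOperation`) is hypothetical.

```typescript
// Hypothetical shapes: the real schema is not shown in this README.

type UsageOperation = "chat" | "embedding" | "system";

interface MessageRecord {
  id: string;
  chatId: string;
  prompt: string;
  response: string;
  tokensUsed: number;
}

interface SourceChunkRecord {
  messageId: string; // the response this chunk helped generate
  sourceUrl: string;
  text: string;
}

interface TokenUsageRecord {
  userId: string;
  operation: UsageOperation;
  model: string;
  tokens: number;
}

/** Sum token usage per operation type, e.g. for a usage-monitoring view. */
function totalTokensByOperation(
  records: TokenUsageRecord[]
): Record<UsageOperation, number> {
  const totals: Record<UsageOperation, number> = { chat: 0, embedding: 0, system: 0 };
  for (const record of records) {
    totals[record.operation] += record.tokens;
  }
  return totals;
}
```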
- Each unique docs URL has a collection that can be reused by multiple chats
- Collections store:
  - Embedding vectors
  - Payload (text, source URL, metadata)
- Each chat's similarity search is scoped to the collection for its docs URL, which keeps search isolated and efficient
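One way to get a single collection per unique docs URL is to derive the collection name deterministically from a normalized form of the URL. The sketch below assumes that approach. `normalizeDocsUrl`, `collectionNameFor`, the `docs_` prefix, and the choice to drop the query string and fragment are illustrative assumptions, not DocChat's confirmed behavior.

```typescript
import { createHash } from "node:crypto";

// Hypothetical sketch: the normalization rules and naming scheme are assumptions.

/** Reduce a docs URL to a canonical form so trivially different spellings match. */
function normalizeDocsUrl(raw: string): string {
  const url = new URL(raw); // the URL parser lowercases the host for http(s)
  const path = url.pathname.replace(/\/+$/, ""); // drop trailing slashes
  // Fragment and query string are ignored (an assumption that may not suit every site).
  return `${url.protocol}//${url.host}${path}`;
}

/** Map a docs URL to a stable collection name with a fixed-length hex suffix. */
function collectionNameFor(rawUrl: string): string {
  const digest = createHash("sha256").update(normalizeDocsUrl(rawUrl)).digest("hex");
  return `docs_${digest.slice(0, 32)}`;
}
```

With a stable name, a new chat for an already-ingested URL only needs to check whether that collection exists before deciding to skip ingestion.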
git clone https://github.com/avishek0679/DocChat.git
cd DocChat
pnpm install
pnpm run dev
- API keys are encrypted using a server-side encryption key
- Keys are never stored in plaintext
- Decryption happens only when making requests to providers
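As a rough illustration of encrypting keys with a server-side key, the sketch below uses AES-256-GCM from Node's built-in `crypto` module. The README does not state which cipher or storage format DocChat uses, so the algorithm, the `iv.tag.ciphertext` layout, and the function names `encryptApiKey` and `decryptApiKey` are assumptions.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

// Hypothetical sketch: the cipher choice and storage format are assumptions.

/** Encrypt an API key with a 32-byte server-side key; returns "iv.tag.ciphertext" in base64. */
function encryptApiKey(plaintext: string, key: Buffer): string {
  if (key.length !== 32) throw new Error("key must be 32 bytes for AES-256");
  const iv = randomBytes(12); // fresh random nonce for every encryption
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  const tag = cipher.getAuthTag(); // authentication tag detects tampering or a wrong key
  return [iv, tag, ciphertext].map((part) => part.toString("base64")).join(".");
}

/** Reverse encryptApiKey; throws if the data was modified or the key is wrong. */
function decryptApiKey(stored: string, key: Buffer): string {
  const parts = stored.split(".");
  if (parts.length !== 3) throw new Error("malformed encrypted value");
  const [iv, tag, ciphertext] = parts.map((part) => Buffer.from(part, "base64"));
  const decipher = createDecipheriv("aes-256-gcm", key, iv);
  decipher.setAuthTag(tag);
  return Buffer.concat([decipher.update(ciphertext), decipher.final()]).toString("utf8");
}
```

In a setup like this, `decryptApiKey` would be called only at the moment a provider request is made, so the plaintext key exists in memory only briefly and is never written to storage.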
- Works best with static documentation websites
- JavaScript-heavy sites may not be fully supported
- Large documentation sets may take time to process
- Improved code-aware chunking
- Better support for dynamic websites
- Enhanced ranking and reranking strategies
- Advanced usage analytics
Contributions are welcome. Please open an issue or submit a pull request.
MIT License