Transform GitHub documentation repositories into intelligent, queryable knowledge bases using RAG and MCP.
- Semantic Search - Find answers across documentation using natural language
- 🤖 AI-Powered Q&A - Get intelligent responses with source citations
- 📚 Batch Processing - Ingest entire repositories with progress tracking
- 🔄 Incremental Updates - Detect and sync only changed files
- 🗂️ Repository Management - Complete CRUD operations for ingested docs
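The incremental update feature above can be pictured as a comparison between what is already stored and what the repository currently contains. The following is a minimal sketch of that idea, not the project's actual implementation: `SyncPlan` and `plan_sync` are hypothetical names, and it assumes each file is identified by its path and a content hash (for example, the blob SHA that GitHub reports for a file).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SyncPlan:
    """Files to ingest, re-ingest, or delete (hypothetical helper type)."""
    added: list[str]
    changed: list[str]
    removed: list[str]


def plan_sync(stored: dict[str, str], remote: dict[str, str]) -> SyncPlan:
    """Compare two path -> content-hash maps and report the differences.

    `stored` describes what was ingested earlier; `remote` describes the
    repository as it is now. Only files whose hash differs need new embeddings.
    """
    added = sorted(p for p in remote if p not in stored)
    changed = sorted(p for p in remote if p in stored and remote[p] != stored[p])
    removed = sorted(p for p in stored if p not in remote)
    return SyncPlan(added, changed, removed)
```

With this shape, unchanged files are skipped entirely, which is where the time and API-call savings of an incremental sync would come from.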
- Python 3.13+
- MongoDB Atlas with Vector Search enabled
- Nebius API key for embeddings and LLM
- GitHub token (optional, for private repos and higher rate limits)
# Clone and setup
git clone https://github.com/md-abid-hussain/doc-mcp.git
cd doc-mcp
python -m venv .venv
source .venv/bin/activate # Linux/Mac
# .venv\Scripts\activate # Windows
# Install dependencies
pip install -r requirements.txt
# Setup environment
cp .env.example .env
Edit .env with your credentials:
NEBIUS_API_KEY=your_nebius_api_key_here
MONGODB_URI=mongodb+srv://username:password@cluster.mongodb.net/
GITHUB_API_KEY=your_github_token_here # Optional
# Setup database
python scripts/db_setup.py setup
# Start application
python main.py
Visit http://localhost:7860 to access the web interface.
Access MCP at http://127.0.0.1:7860/gradio_api/mcp/sse
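To point an MCP client at this endpoint, many clients accept a JSON configuration that maps a server name to its URL. The sketch below generates such a snippet. The `mcpServers` layout is a convention used by several MCP clients rather than something this project defines, and the server name `doc-mcp` is an arbitrary label chosen for illustration, so check your client's documentation for the exact format it expects.

```python
import json


def mcp_client_config(host: str = "127.0.0.1", port: int = 7860,
                      name: str = "doc-mcp") -> dict:
    """Build a client config entry pointing at the SSE endpoint shown above.

    `name` is a hypothetical label; the path is the one from this README.
    """
    url = f"http://{host}:{port}/gradio_api/mcp/sse"
    return {"mcpServers": {name: {"url": url}}}


if __name__ == "__main__":
    # Print the snippet so it can be pasted into a client's config file.
    print(json.dumps(mcp_client_config(), indent=2))
```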
- Navigate to "📥 Documentation Ingestion" tab
- Enter GitHub repository URL (e.g., owner/repo)
- Select markdown files to process
- Execute two-step pipeline: Load files → Generate embeddings
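The first step above takes a repository reference such as owner/repo. As an illustration of how such input might be normalized before any files are loaded, here is a small sketch. `normalize_repo` is a hypothetical helper, and which input forms the application itself accepts is an assumption.

```python
import re

# Characters GitHub allows in owner and repository names (simplified).
_REPO_RE = re.compile(r"^[A-Za-z0-9_.-]+/[A-Za-z0-9_.-]+$")


def normalize_repo(value: str) -> str:
    """Reduce a GitHub URL or an owner/repo string to plain 'owner/repo'."""
    text = value.strip()
    # Drop an optional scheme and host, e.g. "https://github.com/".
    text = re.sub(r"^(https?://)?(www\.)?github\.com/", "", text, flags=re.IGNORECASE)
    text = text.rstrip("/")
    if text.endswith(".git"):
        text = text[:-4]
    # Keep only the first two path segments (ignores /tree/main/... suffixes).
    candidate = "/".join(text.split("/")[:2])
    if not _REPO_RE.match(candidate):
        raise ValueError(f"not a GitHub repository reference: {value!r}")
    return candidate
```

Rejecting malformed input early keeps a typo from surfacing later as a confusing API error during file loading.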
- Go to "🤖 AI Documentation Assistant" tab
- Select your repository
- Ask natural language questions
- Get AI responses with source citations
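Behind the question-answering steps above, a RAG system retrieves the document chunks whose embeddings are closest to the question's embedding, then passes them to the LLM along with their sources. The sketch below shows that retrieval idea in plain Python with cosine similarity. It is an illustration only: in this project the search is delegated to MongoDB Atlas Vector Search, and the function names and toy vectors here are made up.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: list[float], chunks: list[tuple[str, list[float]]],
          k: int = 5) -> list[tuple[float, str]]:
    """Return the k best (score, source) pairs, highest score first.

    Each chunk is a (source path, embedding) pair; the source path is what
    would be shown to the user as a citation.
    """
    scored = [(cosine_similarity(query, vec), source) for source, vec in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]
```

The default of `k=5` mirrors the SIMILARITY_TOP_K example value in the configuration; a larger value gives the LLM more context at the cost of a longer prompt.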
- Use "🗂️ Repository Management" tab
- View statistics and file counts
- Delete repositories when needed
# Required
NEBIUS_API_KEY=your_nebius_api_key_here
MONGODB_URI=mongodb+srv://username:password@cluster.mongodb.net/
# Optional
GITHUB_API_KEY=your_github_token_here
CHUNK_SIZE=3072
SIMILARITY_TOP_K=5
GITHUB_CONCURRENT_REQUESTS=10
- Create cluster with Vector Search enabled
- Database structure auto-created:
  - doc_rag - documents with embeddings
  - ingested_repos - repository metadata
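For orientation, an Atlas Vector Search index is described by a small definition document that names the field holding the embedding, its dimensionality, and the similarity function. The sketch below builds such a definition. The field path `embedding` and the choice of cosine similarity are assumptions, and the number of dimensions depends on the embedding model you use, so it is left as a parameter. Whether scripts/db_setup.py creates an index like this for you is not stated here.

```python
def vector_index_definition(num_dimensions: int, path: str = "embedding",
                            similarity: str = "cosine") -> dict:
    """Build an Atlas Vector Search index definition (illustrative helper).

    `path` is the document field that stores the embedding (assumed name).
    """
    if num_dimensions <= 0:
        raise ValueError("num_dimensions must be positive")
    # Similarity functions supported by Atlas Vector Search.
    if similarity not in {"cosine", "euclidean", "dotProduct"}:
        raise ValueError(f"unsupported similarity: {similarity!r}")
    return {
        "fields": [
            {
                "type": "vector",
                "path": path,
                "numDimensions": num_dimensions,
                "similarity": similarity,
            }
        ]
    }
```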
Common Issues:
- Rate Limits: Add a GitHub token to raise the limit to 5,000 requests/hour (vs. 60 unauthenticated)
- Memory Issues: Reduce CHUNK_SIZE in .env
- Connection Errors: Verify MongoDB Atlas Vector Search is enabled
- Database Issues: Run python scripts/db_setup.py status
For detailed guides see:
- Advanced configuration options
- Development and contribution guide
- API reference and examples
Md Abid Hussain
- GitHub: @md-abid-hussain
- LinkedIn: md-abid-hussain
MIT License - see LICENSE file for details.
Built with ❤️ using Python, LlamaIndex, Nebius, MongoDB Atlas, and Gradio