A Retrieval-Augmented Generation (RAG) system enhanced with GraphRAG. It combines semantic code search with knowledge-graph relationships to answer questions about your codebase. It supports a local LLM (Ollama) and cloud providers (OpenAI, Google Cloud Vertex AI, AWS SageMaker), and includes a web chat interface.
User Interface
└─ Web Frontend (HTML/JS): chat interface with provider selection,
   GraphRAG toggle, and real-time query display
        │
        ▼
FastAPI Backend Server (port 8000)
├─ API Layer
│   • POST /ask                      Query endpoint with GraphRAG support
│   • GET /health                    System health check
│   • POST /llm/provider/{provider}  Switch LLM providers
│   • POST /index                    Dynamic codebase indexing
│
└─ GraphRAG Orchestrator (app/graph_rag.py)
    1. Vector Search (embeddings)        2. Graph Traversal (relationships)
       • ChromaDB                           • Graph API (port 5001)
       • Top K chunks                       • Node extraction
       └─────────────────┬──────────────────┘
                         ▼
       3. Context Fusion (merge embedding + graph context)
                         │
       ┌─────────────────┴──────────────────┐
       ▼                                    ▼
LLM Provider Factory (app/llm_factory.py)   Vector Store (ChromaDB)
  • OpenAI API                                • Code embeddings
  • Vertex AI (Qwen)                          • Persistent storage
  • SageMaker (Qwen)                          • Collection-based
Web Frontend (frontend/index.html):
- Purpose: Single-page web application for interactive querying
- Features:
  - Real-time query execution display
  - Provider selection (local Ollama, OpenAI, Vertex AI, SageMaker)
  - GraphRAG toggle (combines embeddings + graph relationships)
  - Full result display (answer, sources, graph context)
  - Method and provider indicators
- Design: Responsive UI with terminal-style output boxes
FastAPI Backend (app/main.py):
- Framework: FastAPI (async Python web framework)
- Endpoints:
  - POST /ask: Main query endpoint with GraphRAG support
  - GET /health: Detailed system health checks (vector store, graph, LLM, GraphRAG)
  - POST /llm/provider/{provider}: Dynamic LLM provider switching
  - POST /index: Dynamic codebase indexing with background tasks
  - GET /index/status: Indexing progress tracking
- Features: CORS support, background task processing, real-time status updates
Standard RAG (app/rag_chain.py):
- Semantic search over code embeddings
- LangChain RetrievalQA chain
- Local embeddings model (HuggingFace sentence-transformers/all-MiniLM-L6-v2)
GraphRAG (app/graph_rag.py):
- Vector Search: Retrieves top K semantically similar code chunks
- Graph Traversal: Extracts relationships from the knowledge graph
  - Loads the graph from the Analytics API (http://localhost:5001/graph/nodes)
  - Finds nodes related to the retrieved code chunks
  - Traverses graph relationships (SERVICES, PART_OF, SURFACES_IN, etc.)
- Context Fusion: Combines embedding context + graph relationships
- Intelligent Node Matching: Path normalization, basename matching, directory structure matching
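To illustrate the first two matching strategies, here is a minimal sketch that maps a retrieved chunk's source path to a graph node ID. It assumes each graph node carries a file path. The function names and the `node_paths` dictionary are hypothetical and are not the project's actual API. Directory-structure matching is not shown.

```python
import os
from typing import Dict, Optional


def normalize_path(path: str) -> str:
    """Use forward slashes, collapse '.' and '..' segments, and lowercase."""
    return os.path.normpath(path.replace("\\", "/")).replace("\\", "/").lower()


def match_chunk_to_node(chunk_source: str, node_paths: Dict[str, str]) -> Optional[str]:
    """Return the ID of the graph node whose path best matches a chunk's source file.

    node_paths maps (hypothetical) node IDs to the file paths stored on graph nodes.
    """
    src = normalize_path(chunk_source)
    # 1. Path normalization: exact match, or an absolute chunk path that ends
    #    with the node's repo-relative path.
    for node_id, node_path in node_paths.items():
        node_norm = normalize_path(node_path)
        if src == node_norm or src.endswith("/" + node_norm):
            return node_id
    # 2. Basename matching: same file name in a different directory, accepted
    #    only when exactly one node has that name (otherwise it is ambiguous).
    base = os.path.basename(src)
    candidates = [node_id for node_id, p in node_paths.items()
                  if os.path.basename(normalize_path(p)) == base]
    return candidates[0] if len(candidates) == 1 else None
```

Rejecting ambiguous basename matches trades some recall for fewer wrong links. That seems a safer default when the graph spans several repositories.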
Graph Loader (app/graph_loader.py):
- Loads graph structure from Analytics API
- Provides traversal functions (get neighbors, find related nodes)
- Handles graph caching and error recovery
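The sample `.env` settings GRAPH_DEPTH=2 and GRAPH_MAX_NODES=20 suggest a traversal that is limited by both depth and size. The sketch below shows one way to do this: a breadth-first search from the nodes matched to retrieved chunks. The adjacency-dict format and the function name are assumptions for illustration, not the graph loader's real interface.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

# Hypothetical adjacency format: node ID -> list of (relation, neighbor ID)
Graph = Dict[str, List[Tuple[str, str]]]


def related_nodes(graph: Graph, seeds: List[str], depth: int = 2,
                  max_nodes: int = 20) -> List[Tuple[str, str, str]]:
    """Breadth-first traversal from seed nodes, up to `depth` hops.

    Returns (source, relation, target) edges and stops as soon as
    `max_nodes` distinct nodes have been visited.
    """
    visited: Set[str] = set(seeds)
    edges: List[Tuple[str, str, str]] = []
    queue = deque((seed, 0) for seed in seeds)
    while queue:
        node, dist = queue.popleft()
        if dist == depth:
            continue  # do not expand beyond the depth limit
        for relation, neighbor in graph.get(node, []):
            if neighbor not in visited:
                if len(visited) >= max_nodes:
                    return edges
                visited.add(neighbor)
                queue.append((neighbor, dist + 1))
            edges.append((node, relation, neighbor))
    return edges
```

Breadth-first order means the closest relationships are collected first. When the `max_nodes` cap cuts the search short, the edges that are dropped are the most distant ones.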
LLM Provider Factory (app/llm_factory.py):
- Unified Interface: Single entry point for all LLM providers
- Providers:
  - Local (Ollama): Apple Silicon-ready models such as llama3.1:8b-instruct-q4_1
  - OpenAI API (ChatOpenAI): GPT-3.5/GPT-4 models
  - Google Cloud Vertex AI (app/vertex_llm.py): Qwen models via Vertex AI
  - AWS SageMaker (app/rag_chain.py): Qwen models via Serverless Inference
- Features: Runtime provider switching, automatic fallback, credential management
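The "automatic fallback" feature above could work roughly like the sketch below. The preferred provider is tried first, then each fallback in order. The registry of zero-argument constructors and the function name are hypothetical. In the real factory, these constructors would build ChatOpenAI, the Vertex AI wrapper, and so on.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical registry: provider name -> zero-argument constructor
LLMConstructor = Callable[[], object]


def get_llm_with_fallback(preferred: str, registry: Dict[str, LLMConstructor],
                          fallback_order: List[str]) -> Tuple[str, object]:
    """Return (provider_name, llm) for the first provider that initializes.

    Raises RuntimeError listing every failure if no provider can be built.
    """
    errors: Dict[str, str] = {}
    for name in [preferred] + [p for p in fallback_order if p != preferred]:
        constructor = registry.get(name)
        if constructor is None:
            errors[name] = "unknown provider"
            continue
        try:
            return name, constructor()
        except Exception as exc:  # e.g. missing API key or credentials
            errors[name] = str(exc)
    raise RuntimeError(f"No LLM provider available: {errors}")
```

Returning the provider name together with the model allows the response to report which provider actually answered. This matters when a fallback was used.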
Vector Store (ChromaDB):
- Purpose: Persistent storage for code embeddings
- Features: Local-first design, collection-based isolation, efficient similarity search
- Embedding Model: sentence-transformers/all-MiniLM-L6-v2 (384-dim, local)
Codebase Indexer (scripts/index_codebase.py):
- Purpose: One-time codebase indexing and embedding generation
- Features:
  - Language-aware code chunking (respects functions, classes)
  - Multi-language support (JS/TS, Python, Swift, Kotlin, Java, etc.)
  - Configurable chunk size and overlap
  - Batch processing for large codebases
  - Progress tracking and error handling
Graph Analytics API (my_codebase/mycarhub-fleet-analytics):
- Purpose: Provides the knowledge graph structure (nodes and edges)
- Technology: Fastify (Node.js) REST API
- Endpoints: /graph/nodes, /graph/node/{id}, /graph/edges
- Data: Represents relationships between code components, services, vehicles, and UI elements
Indexing Phase (one-time setup):
Source Code → Language Detector (Pygments) →
Code Splitter (LangChain) →
Embedding Model (HuggingFace) →
Vector Store (ChromaDB)
Query Phase - Standard RAG:
User Question → Embedding → Vector Search →
Top K Code Chunks → LLM + Context → Answer
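The "Vector Search → Top K Code Chunks" step ranks stored chunk embeddings by their similarity to the query embedding. ChromaDB does this internally, using an approximate nearest-neighbor index and a configurable distance metric. The brute-force cosine version below only illustrates the idea. The chunk IDs and 2-D vectors are made up, whereas the real embeddings are 384-dimensional.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: Sequence[float], chunks: List[Tuple[str, Sequence[float]]],
          k: int = 10) -> List[Tuple[str, float]]:
    """Return the k (chunk_id, score) pairs most similar to the query vector."""
    scored = [(chunk_id, cosine(query_vec, vec)) for chunk_id, vec in chunks]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]
```

Here `k` plays the role of RAG_K in the `.env` file. A larger value gives the LLM more context but also uses more of its context window.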
Query Phase - GraphRAG:
User Question → Embedding → Vector Search → Top K Code Chunks
↓
Extract Node IDs from Chunks
↓
Graph Traversal (Depth 2)
↓
Related Nodes + Relationships
↓
Context Fusion (Code + Graph)
↓
LLM + Combined Context → Answer
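The "Context Fusion (Code + Graph)" step merges retrieved code and graph relationships into a single prompt context. The sketch below shows one plausible layout. The section headers, the edge format, and the character budget are assumptions for illustration, not the exact prompt used by app/graph_rag.py.

```python
from typing import List, Tuple


def fuse_context(chunks: List[Tuple[str, str]], edges: List[Tuple[str, str, str]],
                 max_chars: int = 6000) -> str:
    """Merge retrieved code chunks and graph relationships into one context string.

    chunks: (source_path, code_text) pairs; edges: (source, relation, target).
    """
    parts = ["## Code"]
    for source, text in chunks:
        parts.append(f"# File: {source}\n{text}")
    if edges:
        parts.append("## Relationships")
        parts.extend(f"- {src} --{rel}--> {dst}" for src, rel, dst in edges)
    context = "\n\n".join(parts)
    # Truncate (hypothetical budget) so the fused context fits the model's window.
    return context[:max_chars]
```

Putting code before relationships keeps the most directly relevant evidence near the start. With hard truncation, that is the part least likely to be cut.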
| Component | Choice | Rationale |
|---|---|---|
| Web Framework | FastAPI | High performance, async support, automatic API docs, type safety, excellent for production |
| RAG Orchestration | LangChain | Industry standard for RAG pipelines, extensive integrations, well-documented |
| Vector Database | ChromaDB | Lightweight, local-first, easy setup, persistent storage, perfect for POC |
| Server | Uvicorn | ASGI server with excellent performance, production-ready |
| Frontend | Vanilla HTML/JS | No build step, fast iteration, demonstrates backend capabilities |
Embeddings:
- Model: sentence-transformers/all-MiniLM-L6-v2 (Hugging Face)
- Dimensions: 384
- Purpose: Convert code and queries into vector representations
- Rationale:
  - Lightweight yet high-quality semantic embeddings
  - Runs entirely on CPU (no GPU required)
  - Fast indexing and retrieval
  - Widely used general-purpose model that works well enough for code search
  - No API costs (runs locally)
OpenAI API (Default):
- Models: gpt-3.5-turbo, gpt-4, gpt-4-turbo-preview
- Why: Reliable, fast, excellent for code analysis, easy to configure
- Use Case: Primary option for quick testing and development
Google Cloud Vertex AI:
- Model: Qwen models (e.g., qwen-2.5-7b-instruct)
- Why:
  - Managed service with automatic scaling
  - Access to cutting-edge models (Qwen)
  - Enterprise-grade infrastructure
  - Integrated with the Google Cloud ecosystem
- Use Case: Production deployments, code analysis with specialized models
AWS SageMaker Serverless:
- Model: Qwen models via JumpStart
- Why:
  - Serverless auto-scaling (pay per use)
  - Integrated with the AWS ecosystem
  - IAM-based security
  - No infrastructure management
- Use Case: AWS-native deployments, cost-effective scaling
- Pygments: Syntax highlighting and language detection
- LangChain Text Splitters: Language-aware code chunking
  - Respects code structure (functions, classes, methods)
  - Configurable chunk size (default: 1000 chars) and overlap (default: 100 chars)
  - Supports: JS/TS, Python, Swift, Kotlin, Java, HTML/CSS, JSON, YAML, Markdown
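To show how the chunk size and overlap defaults interact, here is a plain character-window splitter. This is a deliberate simplification. The indexer uses LangChain's language-aware splitters, which prefer to break at function and class boundaries and do not cut at fixed offsets like this sketch.

```python
from typing import List


def split_with_overlap(text: str, chunk_size: int = 1000, overlap: int = 100) -> List[str]:
    """Split text into windows of at most chunk_size characters, where
    consecutive windows share `overlap` characters."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks
```

With the defaults, each new chunk starts 900 characters after the previous one. A 2,500-character file therefore yields three chunks (0-1000, 900-1900, 1800-2500), and a function that straddles a boundary still appears whole in at least one chunk if it is shorter than 100 characters.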
The indexer automatically detects and processes:
- JavaScript/TypeScript: .js, .jsx, .ts, .tsx
- Python: .py
- Swift / Objective-C: .swift, .m, .h
- Kotlin/Java: .kt, .java, .gradle
- Web: .html, .css
- Config: .json, .yaml, .toml, .xml, .md
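Extension-based detection can be expressed as a dictionary keyed by language, like the LANGUAGE_EXTENSIONS snippet used later for adding Rust and Go. That dictionary is inverted once to get a fast lookup from extension to language. The group names below mirror the list above. The indexer's real mapping, and its use of Pygments, may differ.

```python
import os
from typing import Dict, List, Optional

# Hypothetical mapping that mirrors the documented extension list
LANGUAGE_EXTENSIONS: Dict[str, List[str]] = {
    "JavaScript/TypeScript": [".js", ".jsx", ".ts", ".tsx"],
    "Python": [".py"],
    "Swift/Objective-C": [".swift", ".m", ".h"],
    "Kotlin/Java": [".kt", ".java", ".gradle"],
    "Web": [".html", ".css"],
    "Config": [".json", ".yaml", ".toml", ".xml", ".md"],
}

# Invert once so each lookup is a single dict access
EXTENSION_TO_LANGUAGE = {ext: lang for lang, exts in LANGUAGE_EXTENSIONS.items() for ext in exts}


def detect_language(path: str) -> Optional[str]:
    """Return the language group for a file, or None if the file should be skipped."""
    _, ext = os.path.splitext(path)
    return EXTENSION_TO_LANGUAGE.get(ext.lower())
```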
- Python 3.9+ (3.9 recommended for compatibility)
- Node.js 14+ (for Graph Analytics API)
- pip package manager
- An LLM provider, available locally or via the cloud (pick what fits your hardware):
  - Ollama (Apple Silicon / local GPUs) for offline mode
  - OpenAI API key (recommended for quick start)
  - Google Cloud project with Vertex AI enabled (optional)
  - AWS account with a SageMaker endpoint (optional)
pip install -r requirements.txt
Create a .env file in the project root:
# LLM Provider (default: local Ollama for Apple Silicon / M-series)
LLM_PROVIDER=ollama
# Local Ollama configuration (install via https://ollama.com/download)
OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_MODEL_NAME=llama3.1:8b-instruct-q4_1
OLLAMA_TEMPERATURE=0.1
OLLAMA_CONTEXT=8192
# OpenAI Configuration (optional backup provider)
OPENAI_API_KEY=sk-proj-...
OPENAI_MODEL_NAME=gpt-3.5-turbo
OPENAI_TEMPERATURE=0.2
# Google Cloud Vertex AI (optional, for hosted Qwen)
GCP_PROJECT_ID=your-project-id
GCP_REGION=us-central1
VERTEX_MODEL_NAME=qwen-2.5-7b-instruct
VERTEX_LOCATION=us-central1
# AWS SageMaker (optional, for hosted Qwen)
AWS_REGION=us-east-1
SAGEMAKER_ENDPOINT_NAME=your-endpoint-name
SAGEMAKER_MAX_NEW_TOKENS=2048
SAGEMAKER_TEMPERATURE=0.2
SAGEMAKER_TOP_P=0.9
# Graph API (for GraphRAG)
GRAPH_API_URL=http://localhost:5001/graph/nodes
# RAG Settings
RAG_K=10
GRAPH_DEPTH=2
GRAPH_MAX_NODES=20
# Vector Store
LOCAL_VECTOR_STORE_PATH=chroma_db
LOCAL_COLLECTION_NAME=code_assistant_local
Local setup with Ollama (Apple Silicon):
- Install Ollama (runs optimized models locally):
brew install ollama
ollama serve  # keep running in a separate terminal
- Pull the recommended model (fits comfortably on an M3 Mac and gives strong code answers):
ollama pull llama3.1:8b-instruct-q4_1
- Verify that the model responds locally:
ollama run llama3.1:8b-instruct-q4_1 "Summarize this repo."
- Start the backend. It defaults to the local model because LLM_PROVIDER=ollama.
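Python does not read a `.env` file by itself. A common approach, assumed here rather than confirmed for this project, is to call python-dotenv's `load_dotenv()` and then read the variables from the environment. The sketch below parses the RAG-related settings from the `.env` file above, with their sample values as defaults. The `RagSettings` class and `load_settings` function are hypothetical names.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class RagSettings:
    # Defaults copied from the sample .env file
    llm_provider: str = "ollama"
    graph_api_url: str = "http://localhost:5001/graph/nodes"
    rag_k: int = 10
    graph_depth: int = 2
    graph_max_nodes: int = 20


def load_settings(env: Mapping[str, str] = os.environ) -> RagSettings:
    """Build settings from environment variables, converting numeric values to int."""
    defaults = RagSettings()
    return RagSettings(
        llm_provider=env.get("LLM_PROVIDER", defaults.llm_provider),
        graph_api_url=env.get("GRAPH_API_URL", defaults.graph_api_url),
        rag_k=int(env.get("RAG_K", defaults.rag_k)),
        graph_depth=int(env.get("GRAPH_DEPTH", defaults.graph_depth)),
        graph_max_nodes=int(env.get("GRAPH_MAX_NODES", defaults.graph_max_nodes)),
    )
```

Passing `env` explicitly makes the function easy to test without touching the real process environment.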
Terminal 1 - Graph Analytics API (required for GraphRAG):
cd my_codebase/mycarhub-fleet-analytics
npm install  # first time only
npm run dev
Terminal 2 - Main Server:
uvicorn app.main:app --host 0.0.0.0 --port 8000
Open the browser at http://localhost:8000/
python3.9 scripts/index_codebase.py /path/to/your/codebase \
--output ./chroma_db \
--collection code_assistant_local
# Index main repository
python3.9 scripts/index_codebase.py ./my_codebase/mycarhub/src \
--collection code_assistant_local
# Index analytics repository
python3.9 scripts/index_codebase.py ./my_codebase/mycarhub-fleet-analytics \
--collection code_assistant_local
# Index service hub repository
python3.9 scripts/index_codebase.py ./my_codebase/mycarhub-service-hub \
--collection code_assistant_local

| Option | Short | Default | Description |
|---|---|---|---|
| codebase_path | - | required | Path to the codebase directory |
| --output | -o | ./chroma_db | Output directory for vector store |
| --collection | -c | code_assistant_local | Collection name |
| --chunk-size | - | 1000 | Size of each code chunk (characters) |
| --chunk-overlap | - | 100 | Overlap between chunks (characters) |
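The options in the table correspond directly to an `argparse` definition. The parser below is a hypothetical mirror of the table, written to show how the flags, short forms, and defaults fit together. The actual script may define its arguments differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented index_codebase.py options."""
    parser = argparse.ArgumentParser(description="Index a codebase into ChromaDB")
    parser.add_argument("codebase_path", help="Path to the codebase directory")
    parser.add_argument("-o", "--output", default="./chroma_db",
                        help="Output directory for vector store")
    parser.add_argument("-c", "--collection", default="code_assistant_local",
                        help="Collection name")
    parser.add_argument("--chunk-size", type=int, default=1000,
                        help="Size of each code chunk (characters)")
    parser.add_argument("--chunk-overlap", type=int, default=100,
                        help="Overlap between chunks (characters)")
    return parser
```

`argparse` stores `--chunk-size` as `args.chunk_size`, converting hyphens to underscores.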
1. Open http://localhost:8000/
2. Select an LLM provider from the dropdown
3. Enable or disable the GraphRAG checkbox
4. Enter a question and click "Ask Question"
5. Review the results: answer, sources, and graph context
# Health check
curl http://localhost:8000/health
# Ask question (Standard RAG)
curl -X POST http://localhost:8000/ask \
-H "Content-Type: application/json" \
-d '{"question": "What components are in this React app?", "use_graph_rag": false}'
# Ask question (GraphRAG)
curl -X POST http://localhost:8000/ask \
-H "Content-Type: application/json" \
-d '{"question": "How do service packages relate to vehicles?", "use_graph_rag": true}'
# Switch LLM provider
curl -X POST http://localhost:8000/llm/provider/openai
# Check current provider
curl http://localhost:8000/llm/provider
# Health check
python3.9 test_graphrag.py --health
# Test query
python3.9 test_graphrag.py "What components are in this React app?"
Standard RAG (code-only):
- "What components are in this React app?"
- "How does user authentication work?"
- "Show me the CarCard component implementation"
GraphRAG (code + relationships):
- "How do service packages relate to vehicles?"
- "What UI components display vehicle information?"
- "Show me dependencies between fleet analytics and service hub"
- "What files are related to the Toyota Camry?"
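The curl examples for POST /ask can also be issued from Python using only the standard library. The helper names below are hypothetical. The request body matches the documented curl payload (`question`, `use_graph_rag`), and the response fields match the sample JSON response shown further down.

```python
import json
import urllib.request


def build_ask_request(question: str, use_graph_rag: bool = True,
                      base_url: str = "http://localhost:8000") -> urllib.request.Request:
    """Build a POST /ask request with the same JSON body as the curl examples."""
    body = json.dumps({"question": question, "use_graph_rag": use_graph_rag}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/ask", data=body, method="POST",
        headers={"Content-Type": "application/json"},
    )


def ask(question: str, use_graph_rag: bool = True, timeout: float = 120.0) -> dict:
    """Send the question to a running server and return the decoded JSON response."""
    with urllib.request.urlopen(build_ask_request(question, use_graph_rag),
                                timeout=timeout) as resp:
        return json.load(resp)
```

With the server running, `ask("How do service packages relate to vehicles?")["answer"]` returns the generated answer. `nodes_found` and `graph_context` report what GraphRAG contributed.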
The system is designed with clear separation of concerns:
- app/main.py: API layer only (endpoints, request handling)
- app/rag_chain.py: Standard RAG implementation
- app/graph_rag.py: GraphRAG implementation (extends RAG with graph)
- app/graph_loader.py: Graph data loading and traversal
- app/llm_factory.py: LLM provider abstraction
- app/vertex_llm.py: Vertex AI-specific LLM wrapper
- scripts/index_codebase.py: Standalone indexing utility
- frontend/index.html: Self-contained UI
- UI ↔ API: Clear REST API contract
- API ↔ Business Logic: RAG/GraphRAG modules are independent
- Retrieval ↔ Generation: Retrieval modules don't depend on LLM choice
- Storage ↔ Processing: Vector store is abstracted via LangChain
- LLM Providers: Easy to add new providers via llm_factory.py
- Embedding Models: Swappable via the LangChain interface
- Graph Sources: graph_loader.py can load from different APIs
- UI Features: Frontend can be enhanced without backend changes
- Graceful degradation (GraphRAG falls back to RAG)
- Detailed error messages in health checks
- Retry logic for external API calls
- Clear user feedback in UI
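Graceful degradation from GraphRAG to standard RAG can be expressed as a small wrapper. In the sketch below, the two answer functions are passed in as callables so the fallback logic stands on its own. The function name and callable signatures are assumptions and do not come from app/graph_rag.py.

```python
import logging
from typing import Callable, Dict

logger = logging.getLogger(__name__)

# Hypothetical signature: question -> response dict (answer, sources, ...)
Answerer = Callable[[str], Dict]


def answer_with_fallback(question: str, graph_rag: Answerer, standard_rag: Answerer,
                         use_graph_rag: bool = True) -> Dict:
    """Try GraphRAG first; if it fails (e.g. the Graph API is down), use standard RAG."""
    if use_graph_rag:
        try:
            return graph_rag(question)
        except Exception:
            # Log the cause so the health check or logs can explain the downgrade
            logger.warning("GraphRAG unavailable; falling back to standard RAG", exc_info=True)
    return standard_rag(question)
```

Because the response includes a `method` field, the UI can show which path produced the answer when a fallback happens.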
The system provides contextually relevant, accurate answers by:
- Semantic Understanding: Embeddings capture code semantics, not just keywords
- Context Retrieval: Top K most relevant code chunks are retrieved
- Relationship Discovery: GraphRAG discovers cross-file, cross-repo relationships
- Context Fusion: Code snippets + graph relationships provide comprehensive context
- Natural Language Generation: LLM synthesizes clear, coherent answers
- Relevance: Retrieval finds semantically similar code to the question
- Accuracy: Answers are grounded in actual code (sources provided)
- Completeness: GraphRAG finds related components the user might not know about
- Clarity: LLM generates well-structured, readable answers
{
"answer": "Based on the provided code and graph relationships...",
"sources": [
{"source": "CarCard.js", "content": "..."},
{"source": "App.js", "content": "..."}
],
"graph_context": "- Vehicle: Toyota Camry\n Relations: SERVICES: Performance Max...",
"nodes_found": 12,
"method": "graph_rag",
"provider": "openai"
}
This implementation demonstrates advanced RAG techniques:
- Combines traditional vector search with knowledge graph relationships
- Discovers implicit connections between code components
- Enhances retrieval with relationship-aware context
- Unified Interface: Single API for multiple LLM providers
- Runtime Switching: Change providers without restart
- Fallback Logic: Automatic provider fallback on errors
- Cost Optimization: Choose provider based on use case
- Path Normalization: Handles absolute vs. relative paths
- Basename Matching: Matches files by name across directories
- Directory Structure Matching: Finds related files in similar structures
- Heuristic Matching: Uses content patterns to find related nodes
- Adaptive Context: Combines varying amounts of code + graph context
- Configurable Depth: Adjustable graph traversal depth
- Smart Filtering: Limits context to most relevant relationships
- Query Execution Display: Shows method, provider, node count in real-time
- Source Attribution: Provides exact code snippets used
- Graph Visualization: Shows relationships discovered
- Cross-Repository Understanding: GraphRAG understands dependencies across repos
- Relationship Discovery: Finds connections user might not know exist
- Contextual Answers: Answers consider both code and relationships
- Provider Flexibility: Can test same query across different LLMs
- Clear Query Section: Shows what question was asked
- Method Indicators: Visual badges showing RAG method and provider
- Real-time Status: Terminal-style execution display
- Source Citations: Clickable/reviewable code snippets
- Graph Context: Visual display of relationships found
- Error Handling: Clear error messages with troubleshooting hints
- Clean, readable typography
- High contrast for terminal outputs
- Responsive layout (works on different screen sizes)
- Keyboard-friendly (can tab through controls)
- Dropdown for easy provider switching
- Per-provider visual indicators (🔵 Vertex AI, ⚪ OpenAI, and an icon for SageMaker)
- Status display showing current provider
- No page reload needed (seamless switching)
- Clear checkbox to enable/disable GraphRAG
- Helpful tooltip/description
- Visual feedback when graph context is found
- Index Once: Codebase indexed once, reused for all queries
- Fast Queries: Retrieval + generation typically < 5 seconds
- No Context Switching: Everything in one interface
- Query History: Can see previous query details (in terminal output)
- Graceful Degradation: GraphRAG falls back to RAG if graph unavailable
- Clear Error Messages: Tells user what went wrong and how to fix
- Health Checks: the /health endpoint shows system status
- Retry Logic: Automatically retries failed initializations
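"Retry logic" is commonly implemented as exponential backoff. Here is a minimal, dependency-free sketch, assuming initialization is wrapped in a zero-argument callable. The `retry` helper is hypothetical. The `sleep` parameter is injectable so the backoff can be tested without actually waiting.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry(fn: Callable[[], T], attempts: int = 3, base_delay: float = 0.5,
          exceptions: Tuple[Type[BaseException], ...] = (Exception,),
          sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn, retrying on the given exceptions with exponential backoff
    (base_delay, 2 * base_delay, 4 * base_delay, ...). Re-raises the last error."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return fn()
        except exceptions:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * (2 ** attempt))
    raise AssertionError("unreachable: the loop always returns or raises")
```

For example, the graph loader could be initialized with `retry(lambda: load_graph(url), exceptions=(ConnectionError,))`, where `load_graph` is a placeholder name. Restricting `exceptions` avoids retrying errors that will never succeed, such as a malformed URL.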
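Step 1: Create a provider wrapper in app/ (e.g., app/ollama_llm.py). LangChain's LLM base class requires _call and _llm_type:
from typing import Any, List, Optional
from langchain_core.language_models.llms import LLM

class OllamaLLM(LLM):
    @property
    def _llm_type(self) -> str:
        return "ollama_custom"

    def _call(self, prompt: str, stop: Optional[List[str]] = None, **kwargs: Any) -> str:
        # Call your Ollama implementation here and return the generated text
        raise NotImplementedError
Step 2: Add it to app/llm_factory.py:
def _get_ollama_llm():
    return OllamaLLM()  # pass your configuration here

def get_llm(provider: str):
    if provider == "ollama":
        return _get_ollama_llm()
    # ... existing providers
Step 3: Update the frontend dropdown (optional).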
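Option 1: Via environment variable
LOCAL_EMBEDDING_MODEL=all-mpnet-base-v2
Option 2: Programmatically in app/rag_chain.py
import os
from langchain_huggingface import HuggingFaceEmbeddings

LOCAL_EMBEDDING_MODEL = os.getenv("LOCAL_EMBEDDING_MODEL", "all-mpnet-base-v2")
embeddings = HuggingFaceEmbeddings(model_name=LOCAL_EMBEDDING_MODEL)
After you change the embedding model, re-index the codebase. all-mpnet-base-v2 produces 768-dimensional vectors, which cannot be compared with the 384-dimensional all-MiniLM-L6-v2 vectors already stored.
To support more languages, edit scripts/index_codebase.py: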
LANGUAGE_EXTENSIONS = {
    "Rust": [".rs"],
    "Go": [".go"],
    # ... add your language
}
To use a different graph source, modify app/graph_loader.py to load from:
- Neo4j database
- GraphQL API
- Custom REST endpoint
- Local JSON file
Add new traversal strategies in app/graph_loader.py:
def traverse_by_importance(self, node_ids, max_nodes=20):
    # Custom traversal based on node importance scores
    pass
Multiple codebases are already supported. Index each one into its own collection:
python3.9 scripts/index_codebase.py ./repo1 --collection repo1_code
python3.9 scripts/index_codebase.py ./repo2 --collection repo2_code
For incremental indexing, add to scripts/index_codebase.py:
def update_index(codebase_path, collection, changed_files):
    # Only re-index changed files
    pass
For authentication, add to app/main.py:
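from fastapi import Depends, HTTPException
from fastapi.security import HTTPAuthorizationCredentials, HTTPBearer

security = HTTPBearer()

@app.post("/ask")
async def ask_question(q: Question, credentials: HTTPAuthorizationCredentials = Depends(security)):
    # Verify credentials.credentials (the bearer token); raise HTTPException(status_code=401) if invalid
    ...
Add Redis/Memcached caching: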
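import hashlib
import json
import redis

cache = redis.Redis()

@app.post("/ask")
async def ask_question(q: Question):
    # Stable key: Python's built-in hash() of a str differs between processes.
    # Consider also including use_graph_rag and the active provider in the key.
    cache_key = "query:" + hashlib.sha256(q.question.encode("utf-8")).hexdigest()
    cached = cache.get(cache_key)
    if cached:
        return json.loads(cached)
    # ... normal processing produces `result`
    cache.setex(cache_key, 3600, json.dumps(result))
    return result
Add logging/metrics: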
import logging
from prometheus_client import Counter

query_counter = Counter("queries_total", "Total queries")

@app.post("/ask")
async def ask_question(q: Question):
    query_counter.inc()
    logging.info("Query: %s", q.question)
    # ... normal processing
For containerized deployment, a minimal Dockerfile:
FROM python:3.9-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]
Scaling:
- Vector store: Use ChromaDB server mode or migrate to Pinecone/Weaviate
- API: Stateless FastAPI instances behind load balancer
- Graph API: Replicate or use database-backed graph
- Vector Store: ChromaDB → Pinecone (managed) or Weaviate (self-hosted)
- Graph: JSON API → Neo4j or ArangoDB
- API key authentication
- Rate limiting
- Input validation and sanitization
- CORS restrictions for production
- Secrets management (AWS Secrets Manager, GCP Secret Manager)
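Of the security items above, rate limiting is the simplest to prototype. A token bucket allows short bursts while still enforcing an average rate. The class below is a hypothetical in-memory sketch for a single process. With several stateless API instances behind a load balancer, the counters would need shared storage such as Redis.

```python
import time
from typing import Callable, Dict, Tuple


class TokenBucket:
    """Allow `rate` requests per second per client, with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: int,
                 clock: Callable[[], float] = time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self._state: Dict[str, Tuple[float, float]] = {}  # client -> (tokens, last time)

    def allow(self, client_id: str) -> bool:
        """Consume one token for client_id if available; return whether it was allowed."""
        now = self.clock()
        tokens, last = self._state.get(client_id, (float(self.capacity), now))
        # Refill in proportion to elapsed time, capped at the bucket capacity
        tokens = min(float(self.capacity), tokens + (now - last) * self.rate)
        allowed = tokens >= 1.0
        if allowed:
            tokens -= 1.0
        self._state[client_id] = (tokens, now)
        return allowed
```

In FastAPI, one option is to call `bucket.allow(request.client.host)` inside a dependency and raise `HTTPException(status_code=429)` when it returns False.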
- Modular Components: Clear separation (UI, API, retrieval, generation)
- Clean Interfaces: REST API, LangChain abstractions
- Scalable Design: Stateless API, swappable components
- Error Handling: Graceful degradation, clear error messages
- Meaningful Responses: Grounded in actual code with sources
- Context Understanding: Semantic search + graph relationships
- Multi-Modal Retrieval: Vector search + graph traversal
- Provider Flexibility: Works with multiple LLM backends
- GraphRAG Innovation: Combines embeddings + knowledge graph
- Multi-Cloud Support: Unified interface for different providers
- Intelligent Matching: Advanced path/node matching heuristics
- Dynamic Context: Adaptive context fusion based on graph depth
- Intuitive UI: Clear query interface with real-time feedback
- Comprehensive Results: Answer + sources + graph context
- Error Recovery: Graceful fallbacks and clear messages
- Fast Iteration: No build step, instant updates
- Easy Provider Addition: Simple factory pattern
- Language Support: Extensible language detection
- Graph Sources: Pluggable graph loader
- Deployment Ready: Clear path to production (Docker, scaling, security)
"GraphRAG system not initialized"
- Ensure the Analytics API is running: curl http://localhost:5001/graph/nodes
- Check GRAPH_API_URL in .env
- Verify that the codebase is indexed: curl http://localhost:8000/debug/docs
"No graph relationships found"
- Index multiple repositories for richer graphs
- Check that the graph API is returning data
- Try different queries (some queries may not have graph connections)
LLM Provider Errors
- OpenAI: Verify that OPENAI_API_KEY is set
- Vertex AI: Run gcloud auth application-default login
- SageMaker: Verify that the endpoint exists and that your IAM permissions allow invoking it
"Address already in use"
- Kill the existing server: pkill -f uvicorn
- Use a different port: uvicorn app.main:app --port 8001
NumPy Errors
- Downgrade NumPy: pip install "numpy<2" --force-reinstall
code-assistant-poc/
├── app/
│   ├── main.py                      # FastAPI server and endpoints
│   ├── rag_chain.py                 # Standard RAG implementation
│   ├── graph_rag.py                 # GraphRAG implementation
│   ├── graph_loader.py              # Graph loading and traversal
│   ├── llm_factory.py               # LLM provider factory
│   └── vertex_llm.py                # Vertex AI LLM wrapper
├── scripts/
│   └── index_codebase.py            # Codebase indexing script
├── frontend/
│   └── index.html                   # Web UI
├── my_codebase/                     # Sample codebases
│   ├── mycarhub/                    # React app
│   ├── mycarhub-fleet-analytics/    # Graph API (Fastify)
│   └── mycarhub-service-hub/        # Service hub
├── chroma_db/                       # Vector store (created after indexing)
├── requirements.txt                 # Python dependencies
├── .env                             # Environment variables
├── test_graphrag.py                 # Test script
└── README.md                        # This file
- Local Embeddings: Embeddings run locally (no API costs)
- LLM Costs: Only LLM inference charges apply (OpenAI, Vertex AI, or SageMaker)
- Graph API: Must be running separately for GraphRAG (see Quick Start)
- Indexing: Re-run indexing when codebase changes significantly
- Collection Names: Use consistent collection names between indexing and querying
This is a proof-of-concept project for demonstration purposes.
- Add authentication and authorization
- Implement rate limiting
- Add comprehensive logging and monitoring
- Set up a CI/CD pipeline
- Add unit and integration tests
- Migrate to a managed vector database (Pinecone, Weaviate)
- Add a caching layer (Redis)
- Implement query history and conversation context
- Add deployment documentation (Docker, Kubernetes)
- Set up alerting and error tracking