
Code Assistant POC - GraphRAG with Multi-Cloud LLM Support

A Retrieval-Augmented Generation (RAG) system enhanced with GraphRAG that combines semantic code search with knowledge graph relationships to answer questions about your codebase. Features local and multi-cloud LLM support (Ollama, OpenAI, Google Cloud Vertex AI, AWS SageMaker) and an intuitive web interface.


πŸ—οΈ High-Level Architecture

System Overview

┌─────────────────────────────────────────────────────────────────────┐
│                           User Interface                            │
│  ┌───────────────────────────────────────────────────────────────┐  │
│  │  Web Frontend (HTML/JS) - Chat Interface with Provider        │  │
│  │  Selection, GraphRAG Toggle, Real-time Query Display          │  │
│  └───────────────────────────────┬───────────────────────────────┘  │
└──────────────────────────────────┼──────────────────────────────────┘
                                   │
                                   ▼
┌─────────────────────────────────────────────────────────────────────┐
│                 FastAPI Backend Server (Port 8000)                  │
│  ┌───────────────────────────────────────────────────────────────┐  │
│  │  API Layer                                                    │  │
│  │  • POST /ask - Query endpoint with GraphRAG support           │  │
│  │  • GET /health - System health check                          │  │
│  │  • POST /llm/provider/{provider} - Switch LLM providers       │  │
│  │  • POST /index - Dynamic codebase indexing                    │  │
│  └───────────────────────────────┬───────────────────────────────┘  │
│                                  │                                  │
│  ┌───────────────────────────────▼───────────────────────────────┐  │
│  │  GraphRAG Orchestrator (app/graph_rag.py)                     │  │
│  │  ┌─────────────────────┐    ┌───────────────────────────┐     │  │
│  │  │ 1. Vector Search    │    │ 2. Graph Traversal        │     │  │
│  │  │    (Embeddings)     │    │    (Relationships)        │     │  │
│  │  │    • ChromaDB       │    │    • Graph API (Port 5001)│     │  │
│  │  │    • Top K chunks   │    │    • Node extraction      │     │  │
│  │  └──────────┬──────────┘    └──────────┬────────────────┘     │  │
│  │             │                          │                      │  │
│  │             └──────────┬───────────────┘                      │  │
│  │                        ▼                                      │  │
│  │              ┌────────────────────────┐                       │  │
│  │              │ 3. Context Fusion      │                       │  │
│  │              │    (Merge embeddings   │                       │  │
│  │              │     + graph context)   │                       │  │
│  │              └──────────┬─────────────┘                       │  │
│  └─────────────────────────┼─────────────────────────────────────┘  │
└────────────────────────────┼────────────────────────────────────────┘
                             │
                  ┌──────────┴──────────┐
                  │                     │
                  ▼                     ▼
    ┌─────────────────────────┐  ┌─────────────────────────────┐
    │  LLM Provider Factory   │  │  Vector Store (ChromaDB)    │
    │  (app/llm_factory.py)   │  │  • Code embeddings          │
    │                         │  │  • Persistent storage       │
    │  ┌───────────────────┐  │  │  • Collection-based         │
    │  │ Ollama (local)    │  │  └─────────────────────────────┘
    │  │ OpenAI API        │  │
    │  │ Vertex AI (Qwen)  │  │
    │  │ SageMaker (Qwen)  │  │
    │  └───────────────────┘  │
    └─────────────────────────┘

Architecture Components

1. User Interface Layer (frontend/index.html)

  • Purpose: Single-page web application for interactive querying
  • Features:
    • Real-time query execution display
    • Provider selection (Local Ollama, OpenAI, Vertex AI, SageMaker)
    • GraphRAG toggle (combines embeddings + graph relationships)
    • Comprehensive result display (answer, sources, graph context)
    • Method and provider indicators
  • Design: Modern, responsive UI with terminal-style output boxes

2. API Layer (app/main.py)

  • Framework: FastAPI (high-performance async web framework)
  • Endpoints:
    • POST /ask - Main query endpoint with GraphRAG support
    • GET /health - Detailed system health checks (vector store, graph, LLM, GraphRAG)
    • POST /llm/provider/{provider} - Dynamic LLM provider switching
    • POST /index - Dynamic codebase indexing with background tasks
    • GET /index/status - Indexing progress tracking
  • Features: CORS support, background task processing, real-time status updates
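
For reference, the request body that POST /ask expects can be sketched as a Pydantic model. This mirrors the JSON used by the curl examples later in this README; the exact model in app/main.py may differ in details:

from pydantic import BaseModel

class Question(BaseModel):
    question: str                # natural-language question about the codebase
    use_graph_rag: bool = False  # when true, retrieval is augmented with graph context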

3. Retrieval Layer

Standard RAG (app/rag_chain.py):

  • Semantic search over code embeddings
  • LangChain RetrievalQA chain
  • Local embeddings model (HuggingFace sentence-transformers/all-MiniLM-L6-v2)

GraphRAG (app/graph_rag.py):

  • Vector Search: Retrieves top K semantically similar code chunks
  • Graph Traversal: Extracts relationships from knowledge graph
    • Loads graph from Analytics API (http://localhost:5001/graph/nodes)
    • Finds nodes related to retrieved code chunks
    • Traverses graph relationships (SERVICES, PART_OF, SURFACES_IN, etc.)
  • Context Fusion: Combines embedding context + graph relationships
  • Intelligent Node Matching: Path normalization, basename matching, directory structure matching

Graph Loader (app/graph_loader.py):

  • Loads graph structure from Analytics API
  • Provides traversal functions (get neighbors, find related nodes)
  • Handles graph caching and error recovery
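
To make the traversal concrete, here is a minimal, self-contained sketch of a depth-limited breadth-first walk over (source, relation, target) edges. Function and variable names are illustrative, not the actual graph_loader.py API:

from collections import deque

def find_related_nodes(edges, start_ids, depth=2, max_nodes=20):
    # Build an adjacency map once, then walk outward up to `depth` hops
    adjacency = {}
    for src, rel, dst in edges:
        adjacency.setdefault(src, []).append((rel, dst))
    seen = set(start_ids)
    queue = deque((nid, 0) for nid in start_ids)
    related = []
    while queue and len(related) < max_nodes:
        node, dist = queue.popleft()
        if dist == depth:
            continue
        for rel, neighbor in adjacency.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                related.append((neighbor, rel))
                queue.append((neighbor, dist + 1))
    return related

The depth and node limits correspond to the GRAPH_DEPTH and GRAPH_MAX_NODES settings described in the Quick Start below.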

4. Generation Layer (app/llm_factory.py)

  • Unified Interface: Single entry point for all LLM providers
  • Providers:
    • Local (Ollama): Apple Silicon-ready models such as llama3.1:8b-instruct-q4_1
    • OpenAI API (ChatOpenAI): GPT-3.5/GPT-4 models
    • Google Cloud Vertex AI (app/vertex_llm.py): Qwen models via Vertex AI
    • AWS SageMaker (app/rag_chain.py): Qwen models via Serverless Inference
  • Features: Runtime provider switching, automatic fallback, credential management

5. Vector Store (ChromaDB)

  • Purpose: Persistent storage for code embeddings
  • Features: Local-first design, collection-based isolation, efficient similarity search
  • Embedding Model: sentence-transformers/all-MiniLM-L6-v2 (384-dim, local)
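
As a sketch, reopening the persisted store with LangChain looks roughly like this (assumes the langchain-chroma and langchain-huggingface packages; paths and names match the defaults used in this README):

from langchain_huggingface import HuggingFaceEmbeddings
from langchain_chroma import Chroma

embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
store = Chroma(
    persist_directory="chroma_db",            # LOCAL_VECTOR_STORE_PATH
    collection_name="code_assistant_local",   # LOCAL_COLLECTION_NAME
    embedding_function=embeddings,
)
docs = store.similarity_search("How does user authentication work?", k=10)  # k = RAG_K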

6. Indexing Pipeline (scripts/index_codebase.py)

  • Purpose: One-time codebase indexing and embedding generation
  • Features:
    • Language-aware code chunking (respects functions, classes)
    • Multi-language support (JS/TS, Python, Swift, Kotlin, Java, etc.)
    • Configurable chunk size and overlap
    • Batch processing for large codebases
    • Progress tracking and error handling

7. Graph Analytics API (my_codebase/mycarhub-fleet-analytics)

  • Purpose: Provides knowledge graph structure (nodes and edges)
  • Technology: Fastify (Node.js) REST API
  • Endpoints: /graph/nodes, /graph/node/{id}, /graph/edges
  • Data: Represents relationships between code components, services, vehicles, UI elements

Data Flow

Indexing Phase (one-time setup):

Source Code → Language Detector (Pygments) →
Code Splitter (LangChain) →
Embedding Model (HuggingFace) →
Vector Store (ChromaDB)

Query Phase - Standard RAG:

User Question → Embedding → Vector Search →
Top K Code Chunks → LLM + Context → Answer

Query Phase - GraphRAG:

User Question → Embedding → Vector Search → Top K Code Chunks
                                      ↓
                            Extract Node IDs from Chunks
                                      ↓
                            Graph Traversal (Depth 2)
                                      ↓
                            Related Nodes + Relationships
                                      ↓
                            Context Fusion (Code + Graph)
                                      ↓
                            LLM + Combined Context → Answer
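
In code, this flow reduces to roughly the following. All names here are illustrative; the real orchestration lives in app/graph_rag.py:

def graph_rag_answer(question, store, graph, llm, k=10, depth=2):
    # 1. Vector search: top-K semantically similar code chunks
    chunks = store.similarity_search(question, k=k)
    # 2. Map chunk sources to graph nodes and traverse relationships
    node_ids = [graph.match_node(c.metadata.get("source", "")) for c in chunks]
    related = graph.find_related([n for n in node_ids if n], depth=depth)
    # 3. Context fusion: code snippets + relationship summary in one prompt
    code_context = "\n\n".join(c.page_content for c in chunks)
    graph_context = "\n".join(f"- {node}: {rels}" for node, rels in related)
    prompt = (f"Code context:\n{code_context}\n\n"
              f"Graph relationships:\n{graph_context}\n\nQuestion: {question}")
    return llm.invoke(prompt)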

🤖 Model and Framework Choices

Core Frameworks

Component           Choice            Rationale
Web Framework       FastAPI           High performance, async support, automatic API docs, type safety, excellent for production
RAG Orchestration   LangChain         Industry standard for RAG pipelines, extensive integrations, well-documented
Vector Database     ChromaDB          Lightweight, local-first, easy setup, persistent storage, perfect for POC
Server              Uvicorn           ASGI server with excellent performance, production-ready
Frontend            Vanilla HTML/JS   No build step, fast iteration, demonstrates backend capabilities

AI Models

Embedding Model

  • Model: sentence-transformers/all-MiniLM-L6-v2 (Hugging Face)
  • Dimensions: 384
  • Purpose: Convert code and queries into vector representations
  • Rationale:
    • ✅ Lightweight yet high-quality semantic embeddings
    • ✅ Runs entirely on CPU (no GPU required)
    • ✅ Fast indexing and retrieval
    • ✅ Proven performance on code understanding tasks
    • ✅ No API costs (runs locally)
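
A quick way to see the embedding model in action (requires the langchain-huggingface and sentence-transformers packages):

from langchain_huggingface import HuggingFaceEmbeddings

embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
vector = embeddings.embed_query("component that renders a vehicle card")
print(len(vector))  # 384 dimensions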

Language Models (LLMs)

OpenAI API (Default):

  • Models: gpt-3.5-turbo, gpt-4, gpt-4-turbo-preview
  • Why: Reliable, fast, excellent for code analysis, easy to configure
  • Use Case: Primary option for quick testing and development

Google Cloud Vertex AI:

  • Model: Qwen models (e.g., qwen-2.5-7b-instruct)
  • Why:
    • Managed service with automatic scaling
    • Access to cutting-edge models (Qwen)
    • Enterprise-grade infrastructure
    • Integrated with Google Cloud ecosystem
  • Use Case: Production deployments, code analysis with specialized models

AWS SageMaker Serverless:

  • Model: Qwen models via JumpStart
  • Why:
    • Serverless auto-scaling (pay per use)
    • Integrated with AWS ecosystem
    • IAM-based security
    • No infrastructure management
  • Use Case: AWS-native deployments, cost-effective scaling

Code Processing

  • Pygments: Syntax highlighting and language detection
  • LangChain Text Splitters: Language-aware code chunking
    • Respects code structure (functions, classes, methods)
    • Configurable chunk size (default: 1000 chars) and overlap (default: 100 chars)
    • Supports: JS/TS, Python, Swift, Kotlin, Java, HTML/CSS, JSON, YAML, Markdown
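
For example, a language-aware splitter for Python source can be built like this with LangChain's text splitters; the indexing script may wire this up differently:

from langchain_text_splitters import Language, RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter.from_language(
    language=Language.PYTHON,  # pick the splitter rules for the detected language
    chunk_size=1000,           # default chunk size (characters)
    chunk_overlap=100,         # default overlap (characters)
)
chunks = splitter.split_text(open("app/main.py").read())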

Supported Languages

The indexer automatically detects and processes:

  • JavaScript/TypeScript: .js, .jsx, .ts, .tsx
  • Python: .py
  • Swift: .swift, .m, .h
  • Kotlin/Java: .kt, .java, .gradle
  • Web: .html, .css
  • Config: .json, .yaml, .toml, .xml, .md

🚀 How to Run or Test the Tool

Prerequisites

  • Python 3.9+ (3.9 recommended for compatibility)
  • Node.js 14+ (for Graph Analytics API)
  • pip package manager
  • LLM provider available locally or via cloud (pick what fits your hardware):
    • Ollama (Apple Silicon / local GPUs) for offline mode
    • OpenAI API key (recommended for quick start)
    • Google Cloud project with Vertex AI enabled (optional)
    • AWS account with SageMaker endpoint (optional)

Quick Start (3 Steps)

1. Install Dependencies

pip install -r requirements.txt

2. Configure Environment

Create a .env file in the project root:

# LLM Provider (default: local Ollama for Apple Silicon / M-series)
LLM_PROVIDER=ollama

# Local Ollama configuration (install via https://ollama.com/download)
OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_MODEL_NAME=llama3.1:8b-instruct-q4_1
OLLAMA_TEMPERATURE=0.1
OLLAMA_CONTEXT=8192

# OpenAI Configuration (optional backup provider)
OPENAI_API_KEY=sk-proj-...
OPENAI_MODEL_NAME=gpt-3.5-turbo
OPENAI_TEMPERATURE=0.2

# Google Cloud Vertex AI (optional, for hosted Qwen)
GCP_PROJECT_ID=your-project-id
GCP_REGION=us-central1
VERTEX_MODEL_NAME=qwen-2.5-7b-instruct
VERTEX_LOCATION=us-central1

# AWS SageMaker (optional, for hosted Qwen)
AWS_REGION=us-east-1
SAGEMAKER_ENDPOINT_NAME=your-endpoint-name
SAGEMAKER_MAX_NEW_TOKENS=2048
SAGEMAKER_TEMPERATURE=0.2
SAGEMAKER_TOP_P=0.9

# Graph API (for GraphRAG)
GRAPH_API_URL=http://localhost:5001/graph/nodes

# RAG Settings
RAG_K=10
GRAPH_DEPTH=2
GRAPH_MAX_NODES=20

# Vector Store
LOCAL_VECTOR_STORE_PATH=chroma_db
LOCAL_COLLECTION_NAME=code_assistant_local

Apple Silicon (M3/M2) Local LLM Setup

  1. Install Ollama (runs optimized models locally):
    brew install ollama
    ollama serve  # keep running in a separate terminal
  2. Pull the recommended model (fits comfortably on M3 and delivers strong code answers):
    ollama pull llama3.1:8b-instruct-q4_1
  3. Verify the model responds locally:
    ollama run llama3.1:8b-instruct-q4_1 "Summarize this repo."
  4. Start the backend (it will now default to the local model because LLM_PROVIDER=ollama).

3. Start Services

Terminal 1 - Graph Analytics API (required for GraphRAG):

cd my_codebase/mycarhub-fleet-analytics
npm install  # First time only
npm run dev

Terminal 2 - Main Server:

uvicorn app.main:app --host 0.0.0.0 --port 8000

Open Browser: http://localhost:8000/

Indexing Your Codebase

Basic Indexing

python3.9 scripts/index_codebase.py /path/to/your/codebase \
  --output ./chroma_db \
  --collection code_assistant_local

Index Multiple Repositories (for GraphRAG)

# Index main repository
python3.9 scripts/index_codebase.py ./my_codebase/mycarhub/src \
  --collection code_assistant_local

# Index analytics repository
python3.9 scripts/index_codebase.py ./my_codebase/mycarhub-fleet-analytics \
  --collection code_assistant_local

# Index service hub repository
python3.9 scripts/index_codebase.py ./my_codebase/mycarhub-service-hub \
  --collection code_assistant_local

Indexing Options

Option           Short  Default                Description
codebase_path    -      required               Path to the codebase directory
--output         -o     ./chroma_db            Output directory for vector store
--collection     -c     code_assistant_local   Collection name
--chunk-size     -      1000                   Size of each code chunk (characters)
--chunk-overlap  -      100                    Overlap between chunks (characters)

Testing the Tool

Method 1: Web Interface

  1. Open http://localhost:8000/
  2. Select LLM provider from dropdown
  3. Enable/disable GraphRAG checkbox
  4. Enter question and click "Ask Question"
  5. View results: answer, sources, graph context

Method 2: API Testing

# Health check
curl http://localhost:8000/health

# Ask question (Standard RAG)
curl -X POST http://localhost:8000/ask \
  -H "Content-Type: application/json" \
  -d '{"question": "What components are in this React app?", "use_graph_rag": false}'

# Ask question (GraphRAG)
curl -X POST http://localhost:8000/ask \
  -H "Content-Type: application/json" \
  -d '{"question": "How do service packages relate to vehicles?", "use_graph_rag": true}'

# Switch LLM provider
curl -X POST http://localhost:8000/llm/provider/openai

# Check current provider
curl http://localhost:8000/llm/provider

Method 3: Command-Line Test Script

# Health check
python3.9 test_graphrag.py --health

# Test query
python3.9 test_graphrag.py "What components are in this React app?"

Example Queries

Standard RAG (code-only):

  • "What components are in this React app?"
  • "How does user authentication work?"
  • "Show me the CarCard component implementation"

GraphRAG (code + relationships):

  • "How do service packages relate to vehicles?"
  • "What UI components display vehicle information?"
  • "Show me dependencies between fleet analytics and service hub"
  • "What files are related to the Toyota Camry?"

πŸ›οΈ Architecture Design

Component Structure

The system is designed with clear separation of concerns:

Modular Design

  • app/main.py: API layer only (endpoints, request handling)
  • app/rag_chain.py: Standard RAG implementation
  • app/graph_rag.py: GraphRAG implementation (extends RAG with graph)
  • app/graph_loader.py: Graph data loading and traversal
  • app/llm_factory.py: LLM provider abstraction
  • app/vertex_llm.py: Vertex AI-specific LLM wrapper
  • scripts/index_codebase.py: Standalone indexing utility
  • frontend/index.html: Self-contained UI

Separation of Concerns

  • UI ↔ API: Clear REST API contract
  • API ↔ Business Logic: RAG/GraphRAG modules are independent
  • Retrieval ↔ Generation: Retrieval modules don't depend on LLM choice
  • Storage ↔ Processing: Vector store is abstracted via LangChain

Extensibility Points

  • LLM Providers: Easy to add new providers via llm_factory.py
  • Embedding Models: Swappable via LangChain interface
  • Graph Sources: graph_loader.py can load from different APIs
  • UI Features: Frontend can be enhanced without backend changes

Error Handling

  • Graceful degradation (GraphRAG falls back to RAG)
  • Detailed error messages in health checks
  • Retry logic for external API calls
  • Clear user feedback in UI
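
The fallback behavior can be summarized in a few lines (illustrative; the .ask method names are assumptions, not the actual interface):

import logging

def answer_with_fallback(question, graph_rag, standard_rag):
    # Prefer GraphRAG, degrade gracefully to plain RAG if the graph is unavailable
    try:
        return graph_rag.ask(question)
    except Exception as exc:  # e.g. the Graph API on port 5001 is down
        logging.warning("GraphRAG unavailable, falling back to RAG: %s", exc)
        return standard_rag.ask(question)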

✅ Functionality

Meaningful Responses to Code Questions

The system provides contextually relevant, accurate answers by:

  1. Semantic Understanding: Embeddings capture code semantics, not just keywords
  2. Context Retrieval: Top K most relevant code chunks are retrieved
  3. Relationship Discovery: GraphRAG discovers cross-file, cross-repo relationships
  4. Context Fusion: Code snippets + graph relationships provide comprehensive context
  5. Natural Language Generation: LLM synthesizes clear, coherent answers

Response Quality

  • Relevance: Retrieval finds semantically similar code to the question
  • Accuracy: Answers are grounded in actual code (sources provided)
  • Completeness: GraphRAG finds related components the user might not know about
  • Clarity: LLM generates well-structured, readable answers

Example Response Structure

{
  "answer": "Based on the provided code and graph relationships...",
  "sources": [
    {"source": "CarCard.js", "content": "..."},
    {"source": "App.js", "content": "..."}
  ],
  "graph_context": "- Vehicle: Toyota Camry\n  Relations: SERVICES: Performance Max...",
  "nodes_found": 12,
  "method": "graph_rag",
  "provider": "openai"
}

🎨 Creativity & Exploration

RAG Integration Exploration

This implementation demonstrates advanced RAG techniques:

1. GraphRAG Innovation

  • Combines traditional vector search with knowledge graph relationships
  • Discovers implicit connections between code components
  • Enhances retrieval with relationship-aware context

2. Multi-Cloud LLM Support

  • Unified Interface: Single API for multiple LLM providers
  • Runtime Switching: Change providers without restart
  • Fallback Logic: Automatic provider fallback on errors
  • Cost Optimization: Choose provider based on use case

3. Intelligent Node Matching

  • Path Normalization: Handles absolute vs. relative paths
  • Basename Matching: Matches files by name across directories
  • Directory Structure Matching: Finds related files in similar structures
  • Heuristic Matching: Uses content patterns to find related nodes
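
A condensed sketch of the first two heuristics; the real matcher in app/graph_rag.py is more involved:

import os

def match_node(node_ids, chunk_source):
    # 1. Path normalization: absolute vs. relative paths compare equal after normpath
    norm = os.path.normpath(chunk_source)
    for node in node_ids:
        node_path = os.path.normpath(node)
        if node_path.endswith(norm) or norm.endswith(node_path):
            return node
    # 2. Basename matching: same file name found under a different directory layout
    for node in node_ids:
        if os.path.basename(node) == os.path.basename(norm):
            return node
    return None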

4. Dynamic Context Fusion

  • Adaptive Context: Combines varying amounts of code + graph context
  • Configurable Depth: Adjustable graph traversal depth
  • Smart Filtering: Limits context to most relevant relationships

5. Real-time Feedback

  • Query Execution Display: Shows method, provider, node count in real-time
  • Source Attribution: Provides exact code snippets used
  • Graph Visualization: Shows relationships discovered

Exploration Highlights

  • Cross-Repository Understanding: GraphRAG understands dependencies across repos
  • Relationship Discovery: Finds connections user might not know exist
  • Contextual Answers: Answers consider both code and relationships
  • Provider Flexibility: Can test same query across different LLMs

🎯 Practical Usability

Chat Interface Design

User-Friendly Features

  • ✅ Clear Query Section: Shows what question was asked
  • ✅ Method Indicators: Visual badges showing RAG method and provider
  • ✅ Real-time Status: Terminal-style execution display
  • ✅ Source Citations: Clickable/reviewable code snippets
  • ✅ Graph Context: Visual display of relationships found
  • ✅ Error Handling: Clear error messages with troubleshooting hints

Accessibility

  • Clean, readable typography
  • High contrast for terminal outputs
  • Responsive layout (works on different screen sizes)
  • Keyboard-friendly (can tab through controls)

Provider Selection

  • Dropdown for easy provider switching
  • Visual indicators (🔵 Vertex AI, ⚪ OpenAI, 🟠 SageMaker)
  • Status display showing current provider
  • No page reload needed (seamless switching)

GraphRAG Toggle

  • Clear checkbox to enable/disable GraphRAG
  • Helpful tooltip/description
  • Visual feedback when graph context is found

Workflow Efficiency

  1. Index Once: Codebase indexed once, reused for all queries
  2. Fast Queries: Retrieval + generation typically < 5 seconds
  3. No Context Switching: Everything in one interface
  4. Query History: Can see previous query details (in terminal output)

Error Recovery

  • Graceful Degradation: GraphRAG falls back to RAG if graph unavailable
  • Clear Error Messages: Tells user what went wrong and how to fix
  • Health Checks: /health endpoint shows system status
  • Retry Logic: Automatically retries failed initializations

🔧 Extensibility

How to Scale or Adapt

1. Adding New LLM Providers

Step 1: Create provider wrapper in app/ (e.g., app/ollama_llm.py)

import requests
from langchain_core.language_models.llms import LLM

class OllamaLLM(LLM):
    base_url: str = "http://localhost:11434"
    model: str = "llama3.1:8b-instruct-q4_1"
    @property
    def _llm_type(self) -> str:
        return "ollama"
    def _call(self, prompt: str, stop=None, **kwargs) -> str:
        # Minimal call to Ollama's local /api/generate endpoint
        payload = {"model": self.model, "prompt": prompt, "stream": False}
        return requests.post(f"{self.base_url}/api/generate", json=payload).json()["response"]

Step 2: Add to app/llm_factory.py

def _get_ollama_llm():
    return OllamaLLM(...)

def get_llm(provider: str):
    if provider == "ollama":
        return _get_ollama_llm()
    # ... existing providers

Step 3: Update frontend dropdown (optional)

2. Adding New Embedding Models

Option 1: Via environment variable

LOCAL_EMBEDDING_MODEL=all-mpnet-base-v2

Option 2: Programmatically in app/rag_chain.py

import os

from langchain_huggingface import HuggingFaceEmbeddings

LOCAL_EMBEDDING_MODEL = os.getenv("LOCAL_EMBEDDING_MODEL", "all-mpnet-base-v2")
embeddings = HuggingFaceEmbeddings(model_name=LOCAL_EMBEDDING_MODEL)

3. Supporting More Languages

Edit scripts/index_codebase.py:

LANGUAGE_EXTENSIONS = {
    "Rust": [".rs"],
    "Go": [".go"],
    # ... add your language
}

4. Custom Graph Sources

Modify app/graph_loader.py to load from:

  • Neo4j database
  • GraphQL API
  • Custom REST endpoint
  • Local JSON file
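
For instance, a drop-in local JSON source could look like this, assuming the file uses the same {"nodes": [...], "edges": [...]} shape the Analytics API returns:

import json

def load_graph_from_file(path="graph.json"):
    # Read nodes and edges from disk instead of the HTTP endpoint
    with open(path) as f:
        data = json.load(f)
    return data["nodes"], data["edges"]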

5. Enhanced Graph Traversal

Add new traversal strategies in app/graph_loader.py:

def traverse_by_importance(self, node_ids, max_nodes=20):
    # Example: rank candidate neighbors by degree as a simple importance proxy
    # (assumes a get_neighbors helper like the one graph_loader.py exposes)
    candidates = {n for nid in node_ids for n in self.get_neighbors(nid)}
    ranked = sorted(candidates, key=lambda n: len(self.get_neighbors(n)), reverse=True)
    return ranked[:max_nodes]

6. Multiple Vector Stores

Already supported! Use different collections:

python3.9 scripts/index_codebase.py ./repo1 --collection repo1_code
python3.9 scripts/index_codebase.py ./repo2 --collection repo2_code

7. Incremental Indexing

Add to scripts/index_codebase.py:

def update_index(codebase_path, collection, changed_files):
    # Only re-index changed files
    pass

8. Authentication & Authorization

Add to app/main.py:

from fastapi import Depends, HTTPException
from fastapi.security import HTTPBearer, HTTPAuthorizationCredentials

security = HTTPBearer()

@app.post("/ask")
async def ask_question(q: Question, creds: HTTPAuthorizationCredentials = Depends(security)):
    # HTTPBearer yields HTTPAuthorizationCredentials; the raw token is creds.credentials
    if not verify_token(creds.credentials):  # verify_token is your own check (hypothetical)
        raise HTTPException(status_code=401, detail="Invalid token")
    # ... normal processing

9. Caching

Add Redis/Memcached caching:

import hashlib
import json

import redis

cache = redis.Redis()

@app.post("/ask")
async def ask_question(q: Question):
    # hashlib gives a stable key across processes (built-in hash() does not)
    cache_key = "query:" + hashlib.sha256(q.question.encode()).hexdigest()
    cached = cache.get(cache_key)
    if cached:
        return json.loads(cached)
    result = ...  # normal RAG/GraphRAG processing
    cache.setex(cache_key, 3600, json.dumps(result))
    return result

10. Monitoring & Observability

Add logging/metrics:

import logging
from prometheus_client import Counter

query_counter = Counter('queries_total', 'Total queries')

@app.post("/ask")
async def ask_question(q: Question):
    query_counter.inc()
    logging.info(f"Query: {q.question}")
    # ... normal processing

Production Deployment Considerations

Docker Containerization

FROM python:3.9-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]

Horizontal Scaling

  • Vector store: Use ChromaDB server mode or migrate to Pinecone/Weaviate
  • API: Stateless FastAPI instances behind load balancer
  • Graph API: Replicate or use database-backed graph
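
As a sketch, pointing the app at a standalone ChromaDB server instead of the embedded on-disk store is a small change (the hostname here is a placeholder):

import chromadb

# Connect to a ChromaDB server rather than the local persistent directory
client = chromadb.HttpClient(host="chroma-host", port=8000)
collection = client.get_or_create_collection("code_assistant_local")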

Database Migration

  • Vector Store: ChromaDB β†’ Pinecone (managed) or Weaviate (self-hosted)
  • Graph: JSON API β†’ Neo4j or ArangoDB

Security Enhancements

  • API key authentication
  • Rate limiting
  • Input validation and sanitization
  • CORS restrictions for production
  • Secrets management (AWS Secrets Manager, GCP Secret Manager)

📊 Evaluation Criteria Alignment

Architecture Design ✅

  • Modular Components: Clear separation (UI, API, retrieval, generation)
  • Clean Interfaces: REST API, LangChain abstractions
  • Scalable Design: Stateless API, swappable components
  • Error Handling: Graceful degradation, clear error messages

Functionality ✅

  • Meaningful Responses: Grounded in actual code with sources
  • Context Understanding: Semantic search + graph relationships
  • Multi-Modal Retrieval: Vector search + graph traversal
  • Provider Flexibility: Works with multiple LLM backends

Creativity & Exploration ✅

  • GraphRAG Innovation: Combines embeddings + knowledge graph
  • Multi-Cloud Support: Unified interface for different providers
  • Intelligent Matching: Advanced path/node matching heuristics
  • Dynamic Context: Adaptive context fusion based on graph depth

Practical Usability ✅

  • Intuitive UI: Clear query interface with real-time feedback
  • Comprehensive Results: Answer + sources + graph context
  • Error Recovery: Graceful fallbacks and clear messages
  • Fast Iteration: No build step, instant updates

Extensibility ✅

  • Easy Provider Addition: Simple factory pattern
  • Language Support: Extensible language detection
  • Graph Sources: Pluggable graph loader
  • Deployment Ready: Clear path to production (Docker, scaling, security)

πŸ› Troubleshooting

Common Issues

"GraphRAG system not initialized"

  • ✅ Ensure Analytics API is running: curl http://localhost:5001/graph/nodes
  • ✅ Check GRAPH_API_URL in .env
  • ✅ Verify codebase is indexed: curl http://localhost:8000/debug/docs

"No graph relationships found"

  • ✅ Index multiple repositories for richer graphs
  • ✅ Check the graph API is returning data
  • ✅ Try different queries (some queries may not have graph connections)

LLM Provider Errors

  • ✅ OpenAI: Verify OPENAI_API_KEY is set
  • ✅ Vertex AI: Run gcloud auth application-default login
  • ✅ SageMaker: Verify the endpoint exists and IAM permissions are in place

"Address already in use"

  • ✅ Kill the existing server: pkill -f uvicorn
  • ✅ Use a different port: uvicorn app.main:app --port 8001

NumPy Errors

  • ✅ Downgrade NumPy: pip install "numpy<2" --force-reinstall

πŸ“ Project Structure

code-assistant-poc/
├── app/
│   ├── main.py              # FastAPI server and endpoints
│   ├── rag_chain.py         # Standard RAG implementation
│   ├── graph_rag.py         # GraphRAG implementation
│   ├── graph_loader.py      # Graph loading and traversal
│   ├── llm_factory.py       # LLM provider factory
│   └── vertex_llm.py        # Vertex AI LLM wrapper
├── scripts/
│   └── index_codebase.py    # Codebase indexing script
├── frontend/
│   └── index.html           # Web UI
├── my_codebase/             # Sample codebases
│   ├── mycarhub/            # React app
│   ├── mycarhub-fleet-analytics/  # Graph API (Fastify)
│   └── mycarhub-service-hub/      # Service hub
├── chroma_db/               # Vector store (created after indexing)
├── requirements.txt         # Python dependencies
├── .env                     # Environment variables
├── test_graphrag.py         # Test script
└── README.md                # This file

πŸ“ Notes

  • Local Embeddings: Embeddings run locally (no API costs)
  • LLM Costs: Only LLM inference charges apply (OpenAI, Vertex AI, or SageMaker)
  • Graph API: Must be running separately for GraphRAG (see Quick Start)
  • Indexing: Re-run indexing when codebase changes significantly
  • Collection Names: Use consistent collection names between indexing and querying

📄 License

This is a proof-of-concept project for demonstration purposes.


🤝 Next Steps for Production

  1. ✅ Add authentication and authorization
  2. ✅ Implement rate limiting
  3. ✅ Add comprehensive logging and monitoring
  4. ✅ Set up CI/CD pipeline
  5. ✅ Add unit and integration tests
  6. ✅ Migrate to managed vector database (Pinecone, Weaviate)
  7. ✅ Add caching layer (Redis)
  8. ✅ Implement query history and conversation context
  9. ✅ Add deployment documentation (Docker, Kubernetes)
  10. ✅ Set up alerting and error tracking
