Code Assistant POC - GraphRAG with Multi-Cloud LLM Support

A sophisticated Retrieval-Augmented Generation (RAG) system enhanced with GraphRAG that combines semantic code search with knowledge graph relationships to answer questions about your codebase. Features multi-cloud LLM support (OpenAI, Google Cloud Vertex AI, AWS SageMaker) and an intuitive web interface.

🏗️ High-Level Architecture

System Overview

┌─────────────────────────────────────────────────────────────────────┐
│                         User Interface                              │
│  ┌──────────────────────────────────────────────────────────────┐  │
│  │  Web Frontend (HTML/JS) - Chat Interface with Provider       │  │
│  │  Selection, GraphRAG Toggle, Real-time Query Display         │  │
│  └───────────────────────┬──────────────────────────────────────┘  │
└──────────────────────────┼──────────────────────────────────────────┘
                           │
                           ▼
┌─────────────────────────────────────────────────────────────────────┐
│                    FastAPI Backend Server (Port 8000)              │
│  ┌──────────────────────────────────────────────────────────────┐  │
│  │  API Layer                                                    │  │
│  │  • POST /ask - Query endpoint with GraphRAG support          │  │
│  │  • GET /health - System health check                         │  │
│  │  • POST /llm/provider/{provider} - Switch LLM providers      │  │
│  │  • POST /index - Dynamic codebase indexing                   │  │
│  └───────────────────────┬──────────────────────────────────────┘  │
│                          │                                          │
│  ┌───────────────────────▼──────────────────────────────────────┐  │
│  │  GraphRAG Orchestrator (app/graph_rag.py)                    │  │
│  │  ┌─────────────────────┐    ┌───────────────────────────┐   │  │
│  │  │ 1. Vector Search    │    │ 2. Graph Traversal        │   │  │
│  │  │    (Embeddings)     │    │    (Relationships)        │   │  │
│  │  │    • ChromaDB       │    │    • Graph API (Port 5001)│   │  │
│  │  │    • Top K chunks   │    │    • Node extraction      │   │  │
│  │  └──────────┬──────────┘    └──────────┬────────────────┘   │  │
│  │             │                          │                     │  │
│  │             └──────────┬───────────────┘                     │  │
│  │                        ▼                                     │  │
│  │              ┌───────────────────────┐                      │  │
│  │              │ 3. Context Fusion      │                      │  │
│  │              │    (Merge embeddings   │                      │  │
│  │              │     + graph context)   │                      │  │
│  │              └──────────┬────────────┘                      │  │
│  └─────────────────────────┼────────────────────────────────────┘  │
└────────────────────────────┼───────────────────────────────────────┘
                             │
                  ┌──────────┴──────────┐
                  │                     │
                  ▼                     ▼
    ┌─────────────────────────┐  ┌─────────────────────────────┐
    │  LLM Provider Factory   │  │  Vector Store (ChromaDB)    │
    │  (app/llm_factory.py)   │  │  • Code embeddings          │
    │                         │  │  • Persistent storage       │
    │  ┌───────────────────┐  │  │  • Collection-based        │
    │  │ OpenAI API        │  │  └─────────────────────────────┘
    │  │ Vertex AI (Qwen)  │  │
    │  │ SageMaker (Qwen)  │  │
    │  └───────────────────┘  │
    └─────────────────────────┘

Architecture Components

1. User Interface Layer (`frontend/index.html`)

Purpose: Single-page web application for interactive querying
Features:
- Real-time query execution display
- Provider selection (Local Ollama, OpenAI, Vertex AI, SageMaker)
- GraphRAG toggle (combines embeddings + graph relationships)
- Comprehensive result display (answer, sources, graph context)
- Method and provider indicators
Design: Modern, responsive UI with terminal-style output boxes

2. API Layer (`app/main.py`)

Framework: FastAPI (high-performance async web framework)
Endpoints:
- POST /ask - Main query endpoint with GraphRAG support
- GET /health - Detailed system health checks (vector store, graph, LLM, GraphRAG)
- POST /llm/provider/{provider} - Dynamic LLM provider switching
- POST /index - Dynamic codebase indexing with background tasks
- GET /index/status - Indexing progress tracking
Features: CORS support, background task processing, real-time status updates

3. Retrieval Layer

Standard RAG (app/rag_chain.py):

Semantic search over code embeddings
LangChain RetrievalQA chain
Local embeddings model (HuggingFace sentence-transformers/all-MiniLM-L6-v2)

GraphRAG (app/graph_rag.py):

Vector Search: Retrieves top K semantically similar code chunks
Graph Traversal: Extracts relationships from knowledge graph
- Loads graph from Analytics API (http://localhost:5001/graph/nodes)
- Finds nodes related to retrieved code chunks
- Traverses graph relationships (SERVICES, PART_OF, SURFACES_IN, etc.)
Context Fusion: Combines embedding context + graph relationships
Intelligent Node Matching: Path normalization, basename matching, directory structure matching

Graph Loader (app/graph_loader.py):

Loads graph structure from Analytics API
Provides traversal functions (get neighbors, find related nodes)
Handles graph caching and error recovery

4. Generation Layer (`app/llm_factory.py`)

Unified Interface: Single entry point for all LLM providers
Providers:
- Local (Ollama): Apple Silicon-ready models such as llama3.1:8b-instruct-q4_1
- OpenAI API (ChatOpenAI): GPT-3.5/GPT-4 models
- Google Cloud Vertex AI (app/vertex_llm.py): Qwen models via Vertex AI
- AWS SageMaker (app/rag_chain.py): Qwen models via Serverless Inference
Features: Runtime provider switching, automatic fallback, credential management

5. Vector Store (ChromaDB)

Purpose: Persistent storage for code embeddings
Features: Local-first design, collection-based isolation, efficient similarity search
Embedding Model: sentence-transformers/all-MiniLM-L6-v2 (384-dim, local)

6. Indexing Pipeline (`scripts/index_codebase.py`)

Purpose: One-time codebase indexing and embedding generation
Features:
- Language-aware code chunking (respects functions, classes)
- Multi-language support (JS/TS, Python, Swift, Kotlin, Java, etc.)
- Configurable chunk size and overlap
- Batch processing for large codebases
- Progress tracking and error handling

7. Graph Analytics API (`my_codebase/mycarhub-fleet-analytics`)

Purpose: Provides knowledge graph structure (nodes and edges)
Technology: Fastify (Node.js) REST API
Endpoints: /graph/nodes, /graph/node/{id}, /graph/edges
Data: Represents relationships between code components, services, vehicles, UI elements

Data Flow

Indexing Phase (one-time setup):

Source Code → Language Detector (Pygments) → 
Code Splitter (LangChain) → 
Embedding Model (HuggingFace) → 
Vector Store (ChromaDB)

Query Phase - Standard RAG:

User Question → Embedding → Vector Search → 
Top K Code Chunks → LLM + Context → Answer

Query Phase - GraphRAG:

User Question → Embedding → Vector Search → Top K Code Chunks
                                      ↓
                            Extract Node IDs from Chunks
                                      ↓
                            Graph Traversal (Depth 2)
                                      ↓
                            Related Nodes + Relationships
                                      ↓
                            Context Fusion (Code + Graph)
                                      ↓
                            LLM + Combined Context → Answer

🤖 Model and Framework Choices

Core Frameworks

Component	Choice	Rationale
Web Framework	FastAPI	High performance, async support, automatic API docs, type safety, excellent for production
RAG Orchestration	LangChain	Industry standard for RAG pipelines, extensive integrations, well-documented
Vector Database	ChromaDB	Lightweight, local-first, easy setup, persistent storage, perfect for POC
Server	Uvicorn	ASGI server with excellent performance, production-ready
Frontend	Vanilla HTML/JS	No build step, fast iteration, demonstrates backend capabilities

AI Models

Embedding Model

Model: sentence-transformers/all-MiniLM-L6-v2 (Hugging Face)
Dimensions: 384
Purpose: Convert code and queries into vector representations
Rationale:
- ✅ Lightweight yet high-quality semantic embeddings
- ✅ Runs entirely on CPU (no GPU required)
- ✅ Fast indexing and retrieval
- ✅ Proven performance on code understanding tasks
- ✅ No API costs (runs locally)

Language Models (LLMs)

OpenAI API (Default):

Models: gpt-3.5-turbo, gpt-4, gpt-4-turbo-preview
Why: Reliable, fast, excellent for code analysis, easy to configure
Use Case: Primary option for quick testing and development

Google Cloud Vertex AI:

Model: Qwen models (e.g., qwen-2.5-7b-instruct)
Why:
- Managed service with automatic scaling
- Access to cutting-edge models (Qwen)
- Enterprise-grade infrastructure
- Integrated with Google Cloud ecosystem
Use Case: Production deployments, code analysis with specialized models

AWS SageMaker Serverless:

Model: Qwen models via JumpStart
Why:
- Serverless auto-scaling (pay per use)
- Integrated with AWS ecosystem
- IAM-based security
- No infrastructure management
Use Case: AWS-native deployments, cost-effective scaling

Code Processing

Pygments: Syntax highlighting and language detection
LangChain Text Splitters: Language-aware code chunking
- Respects code structure (functions, classes, methods)
- Configurable chunk size (default: 1000 chars) and overlap (default: 100 chars)
- Supports: JS/TS, Python, Swift, Kotlin, Java, HTML/CSS, JSON, YAML, Markdown

Supported Languages

The indexer automatically detects and processes:

JavaScript/TypeScript: .js, .jsx, .ts, .tsx
Python: .py
Swift: .swift, .m, .h
Kotlin/Java: .kt, .java, .gradle
Web: .html, .css
Config: .json, .yaml, .toml, .xml, .md

🚀 How to Run or Test the Tool

Prerequisites

Python 3.9+ (3.9 recommended for compatibility)
Node.js 14+ (for Graph Analytics API)
pip package manager
LLM provider available locally or via cloud (pick what fits your hardware):
- Ollama (Apple Silicon / local GPUs) for offline mode
- OpenAI API key (recommended for quick start)
- Google Cloud project with Vertex AI enabled (optional)
- AWS account with SageMaker endpoint (optional)

Quick Start (3 Steps)

1. Install Dependencies

pip install -r requirements.txt

2. Configure Environment

Create .env file in project root:

# LLM Provider (default: local Ollama for Apple Silicon / M-series)
LLM_PROVIDER=ollama

# Local Ollama configuration (install via https://ollama.com/download)
OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_MODEL_NAME=llama3.1:8b-instruct-q4_1
OLLAMA_TEMPERATURE=0.1
OLLAMA_CONTEXT=8192

# OpenAI Configuration (optional backup provider)
OPENAI_API_KEY=sk-proj-...
OPENAI_MODEL_NAME=gpt-3.5-turbo
OPENAI_TEMPERATURE=0.2

# Google Cloud Vertex AI (optional, for hosted Qwen)
GCP_PROJECT_ID=your-project-id
GCP_REGION=us-central1
VERTEX_MODEL_NAME=qwen-2.5-7b-instruct
VERTEX_LOCATION=us-central1

# AWS SageMaker (optional, for hosted Qwen)
AWS_REGION=us-east-1
SAGEMAKER_ENDPOINT_NAME=your-endpoint-name
SAGEMAKER_MAX_NEW_TOKENS=2048
SAGEMAKER_TEMPERATURE=0.2
SAGEMAKER_TOP_P=0.9

# Graph API (for GraphRAG)
GRAPH_API_URL=http://localhost:5001/graph/nodes

# RAG Settings
RAG_K=10
GRAPH_DEPTH=2
GRAPH_MAX_NODES=20

# Vector Store
LOCAL_VECTOR_STORE_PATH=chroma_db
LOCAL_COLLECTION_NAME=code_assistant_local

Apple Silicon (M3/M2) Local LLM Setup

Install Ollama (runs optimized models locally):

brew install ollama
ollama serve  # keep running in a separate terminal

Pull the recommended model (fits comfortably on M3 and delivers strong code answers):
```
ollama pull llama3.1:8b-instruct-q4_1
```

Verify the model responds locally:

ollama run llama3.1:8b-instruct-q4_1 "Summarize this repo."

Start the backend (it will now default to the local model because LLM_PROVIDER=ollama).

3. Start Services

Terminal 1 - Graph Analytics API (required for GraphRAG):

cd my_codebase/mycarhub-fleet-analytics
npm install  # First time only
npm run dev

Terminal 2 - Main Server:

uvicorn app.main:app --host 0.0.0.0 --port 8000

Open Browser: http://localhost:8000/

Indexing Your Codebase

Basic Indexing

python3.9 scripts/index_codebase.py /path/to/your/codebase \
  --output ./chroma_db \
  --collection code_assistant_local

Index Multiple Repositories (for GraphRAG)

# Index main repository
python3.9 scripts/index_codebase.py ./my_codebase/mycarhub/src \
  --collection code_assistant_local

# Index analytics repository
python3.9 scripts/index_codebase.py ./my_codebase/mycarhub-fleet-analytics \
  --collection code_assistant_local

# Index service hub repository
python3.9 scripts/index_codebase.py ./my_codebase/mycarhub-service-hub \
  --collection code_assistant_local

Indexing Options

Option	Short	Default	Description
`codebase_path`	-	required	Path to the codebase directory
`--output`	`-o`	`./chroma_db`	Output directory for vector store
`--collection`	`-c`	`code_assistant_local`	Collection name
`--chunk-size`	-	`1000`	Size of each code chunk (characters)
`--chunk-overlap`	-	`100`	Overlap between chunks (characters)

Testing the Tool

Method 1: Web Interface

Open http://localhost:8000/
Select LLM provider from dropdown
Enable/disable GraphRAG checkbox
Enter question and click "Ask Question"
View results: answer, sources, graph context

Method 2: API Testing

# Health check
curl http://localhost:8000/health

# Ask question (Standard RAG)
curl -X POST http://localhost:8000/ask \
  -H "Content-Type: application/json" \
  -d '{"question": "What components are in this React app?", "use_graph_rag": false}'

# Ask question (GraphRAG)
curl -X POST http://localhost:8000/ask \
  -H "Content-Type: application/json" \
  -d '{"question": "How do service packages relate to vehicles?", "use_graph_rag": true}'

# Switch LLM provider
curl -X POST http://localhost:8000/llm/provider/openai

# Check current provider
curl http://localhost:8000/llm/provider

Method 3: Command-Line Test Script

# Health check
python3.9 test_graphrag.py --health

# Test query
python3.9 test_graphrag.py "What components are in this React app?"

Example Queries

Standard RAG (code-only):

"What components are in this React app?"
"How does user authentication work?"
"Show me the CarCard component implementation"

GraphRAG (code + relationships):

"How do service packages relate to vehicles?"
"What UI components display vehicle information?"
"Show me dependencies between fleet analytics and service hub"
"What files are related to the Toyota Camry?"

🏛️ Architecture Design

Component Structure

The system is designed with clear separation of concerns:

Modular Design

app/main.py: API layer only (endpoints, request handling)
app/rag_chain.py: Standard RAG implementation
app/graph_rag.py: GraphRAG implementation (extends RAG with graph)
app/graph_loader.py: Graph data loading and traversal
app/llm_factory.py: LLM provider abstraction
app/vertex_llm.py: Vertex AI-specific LLM wrapper
scripts/index_codebase.py: Standalone indexing utility
frontend/index.html: Self-contained UI

Separation of Concerns

UI ↔ API: Clear REST API contract
API ↔ Business Logic: RAG/GraphRAG modules are independent
Retrieval ↔ Generation: Retrieval modules don't depend on LLM choice
Storage ↔ Processing: Vector store is abstracted via LangChain

Extensibility Points

LLM Providers: Easy to add new providers via llm_factory.py
Embedding Models: Swappable via LangChain interface
Graph Sources: graph_loader.py can load from different APIs
UI Features: Frontend can be enhanced without backend changes

Error Handling

Graceful degradation (GraphRAG falls back to RAG)
Detailed error messages in health checks
Retry logic for external API calls
Clear user feedback in UI

✅ Functionality

Meaningful Responses to Code Questions

The system provides contextually relevant, accurate answers by:

Semantic Understanding: Embeddings capture code semantics, not just keywords
Context Retrieval: Top K most relevant code chunks are retrieved
Relationship Discovery: GraphRAG discovers cross-file, cross-repo relationships
Context Fusion: Code snippets + graph relationships provide comprehensive context
Natural Language Generation: LLM synthesizes clear, coherent answers

Response Quality

Relevance: Retrieval finds semantically similar code to the question
Accuracy: Answers are grounded in actual code (sources provided)
Completeness: GraphRAG finds related components the user might not know about
Clarity: LLM generates well-structured, readable answers

Example Response Structure

{
  "answer": "Based on the provided code and graph relationships...",
  "sources": [
    {"source": "CarCard.js", "content": "..."},
    {"source": "App.js", "content": "..."}
  ],
  "graph_context": "- Vehicle: Toyota Camry\n  Relations: SERVICES: Performance Max...",
  "nodes_found": 12,
  "method": "graph_rag",
  "provider": "openai"
}

🎨 Creativity & Exploration

RAG Integration Exploration

This implementation demonstrates advanced RAG techniques:

1. GraphRAG Innovation

Combines traditional vector search with knowledge graph relationships
Discovers implicit connections between code components
Enhances retrieval with relationship-aware context

2. Multi-Cloud LLM Support

Unified Interface: Single API for multiple LLM providers
Runtime Switching: Change providers without restart
Fallback Logic: Automatic provider fallback on errors
Cost Optimization: Choose provider based on use case

3. Intelligent Node Matching

Path Normalization: Handles absolute vs. relative paths
Basename Matching: Matches files by name across directories
Directory Structure Matching: Finds related files in similar structures
Heuristic Matching: Uses content patterns to find related nodes

4. Dynamic Context Fusion

Adaptive Context: Combines varying amounts of code + graph context
Configurable Depth: Adjustable graph traversal depth
Smart Filtering: Limits context to most relevant relationships

5. Real-time Feedback

Query Execution Display: Shows method, provider, node count in real-time
Source Attribution: Provides exact code snippets used
Graph Visualization: Shows relationships discovered

Exploration Highlights

Cross-Repository Understanding: GraphRAG understands dependencies across repos
Relationship Discovery: Finds connections user might not know exist
Contextual Answers: Answers consider both code and relationships
Provider Flexibility: Can test same query across different LLMs

🎯 Practical Usability

Chat Interface Design

User-Friendly Features

✅ Clear Query Section: Shows what question was asked
✅ Method Indicators: Visual badges showing RAG method and provider
✅ Real-time Status: Terminal-style execution display
✅ Source Citations: Clickable/reviewable code snippets
✅ Graph Context: Visual display of relationships found
✅ Error Handling: Clear error messages with troubleshooting hints

Accessibility

Clean, readable typography
High contrast for terminal outputs
Responsive layout (works on different screen sizes)
Keyboard-friendly (can tab through controls)

Provider Selection

Dropdown for easy provider switching
Visual indicators (🔵 Vertex AI, ⚪ OpenAI, 🟠 SageMaker)
Status display showing current provider
No page reload needed (seamless switching)

GraphRAG Toggle

Clear checkbox to enable/disable GraphRAG
Helpful tooltip/description
Visual feedback when graph context is found

Workflow Efficiency

Index Once: Codebase indexed once, reused for all queries
Fast Queries: Retrieval + generation typically < 5 seconds
No Context Switching: Everything in one interface
Query History: Can see previous query details (in terminal output)

Error Recovery

Graceful Degradation: GraphRAG falls back to RAG if graph unavailable
Clear Error Messages: Tells user what went wrong and how to fix
Health Checks: /health endpoint shows system status
Retry Logic: Automatically retries failed initializations

🔧 Extensibility

How to Scale or Adapt

1. Adding New LLM Providers

Step 1: Create provider wrapper in app/ (e.g., app/ollama_llm.py)

from langchain_core.language_models.base import BaseLanguageModel

class OllamaLLM(BaseLanguageModel):
    def invoke(self, prompt: str):
        # Your Ollama implementation
        pass

Step 2: Add to app/llm_factory.py

def _get_ollama_llm():
    return OllamaLLM(...)

def get_llm(provider: str):
    if provider == "ollama":
        return _get_ollama_llm()
    # ... existing providers

Step 3: Update frontend dropdown (optional)

2. Adding New Embedding Models

Option 1: Via environment variable

LOCAL_EMBEDDING_MODEL=all-mpnet-base-v2

Option 2: Programmatically in app/rag_chain.py

from langchain_huggingface import HuggingFaceEmbeddings

LOCAL_EMBEDDING_MODEL = os.getenv("LOCAL_EMBEDDING_MODEL", "all-mpnet-base-v2")

3. Supporting More Languages

Edit scripts/index_codebase.py:

LANGUAGE_EXTENSIONS = {
    "Rust": [".rs"],
    "Go": [".go"],
    # ... add your language
}

4. Custom Graph Sources

Modify app/graph_loader.py to load from:

Neo4j database
GraphQL API
Custom REST endpoint
Local JSON file

5. Enhanced Graph Traversal

Add new traversal strategies in app/graph_loader.py:

def traverse_by_importance(self, node_ids, max_nodes=20):
    # Custom traversal based on node importance scores
    pass

6. Multiple Vector Stores

Already supported! Use different collections:

python3.9 scripts/index_codebase.py ./repo1 --collection repo1_code
python3.9 scripts/index_codebase.py ./repo2 --collection repo2_code

7. Incremental Indexing

Add to scripts/index_codebase.py:

def update_index(codebase_path, collection, changed_files):
    # Only re-index changed files
    pass

8. Authentication & Authorization

Add to app/main.py:

from fastapi import Depends, HTTPException
from fastapi.security import HTTPBearer

security = HTTPBearer()

@app.post("/ask")
async def ask_question(q: Question, token: str = Depends(security)):
    # Verify token
    pass

9. Caching

Add Redis/Memcached caching:

import redis

cache = redis.Redis()

@app.post("/ask")
async def ask_question(q: Question):
    cache_key = f"query:{hash(q.question)}"
    cached = cache.get(cache_key)
    if cached:
        return json.loads(cached)
    # ... normal processing
    cache.setex(cache_key, 3600, json.dumps(result))
    return result

10. Monitoring & Observability

Add logging/metrics:

import logging
from prometheus_client import Counter

query_counter = Counter('queries_total', 'Total queries')

@app.post("/ask")
async def ask_question(q: Question):
    query_counter.inc()
    logging.info(f"Query: {q.question}")
    # ... normal processing

Production Deployment Considerations

Docker Containerization

FROM python:3.9-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]

Horizontal Scaling

Vector store: Use ChromaDB server mode or migrate to Pinecone/Weaviate
API: Stateless FastAPI instances behind load balancer
Graph API: Replicate or use database-backed graph

Database Migration

Vector Store: ChromaDB → Pinecone (managed) or Weaviate (self-hosted)
Graph: JSON API → Neo4j or ArangoDB

Security Enhancements

API key authentication
Rate limiting
Input validation and sanitization
CORS restrictions for production
Secrets management (AWS Secrets Manager, GCP Secret Manager)

📊 Evaluation Criteria Alignment

Architecture Design ✅

Modular Components: Clear separation (UI, API, retrieval, generation)
Clean Interfaces: REST API, LangChain abstractions
Scalable Design: Stateless API, swappable components
Error Handling: Graceful degradation, clear error messages

Functionality ✅

Meaningful Responses: Grounded in actual code with sources
Context Understanding: Semantic search + graph relationships
Multi-Modal Retrieval: Vector search + graph traversal
Provider Flexibility: Works with multiple LLM backends

Creativity & Exploration ✅

GraphRAG Innovation: Combines embeddings + knowledge graph
Multi-Cloud Support: Unified interface for different providers
Intelligent Matching: Advanced path/node matching heuristics
Dynamic Context: Adaptive context fusion based on graph depth

Practical Usability ✅

Intuitive UI: Clear query interface with real-time feedback
Comprehensive Results: Answer + sources + graph context
Error Recovery: Graceful fallbacks and clear messages
Fast Iteration: No build step, instant updates

Extensibility ✅

Easy Provider Addition: Simple factory pattern
Language Support: Extensible language detection
Graph Sources: Pluggable graph loader
Deployment Ready: Clear path to production (Docker, scaling, security)

🐛 Troubleshooting

Common Issues

"GraphRAG system not initialized"

✅ Ensure Analytics API is running: curl http://localhost:5001/graph/nodes
✅ Check GRAPH_API_URL in .env
✅ Verify codebase is indexed: curl http://localhost:8000/debug/docs

"No graph relationships found"

✅ Index multiple repositories for richer graphs
✅ Check graph API is returning data
✅ Try different queries (some queries may not have graph connections)

LLM Provider Errors

✅ OpenAI: Verify OPENAI_API_KEY is set
✅ Vertex AI: Run gcloud auth application-default login
✅ SageMaker: Verify endpoint exists and IAM permissions

"Address already in use"

✅ Kill existing server: pkill -f uvicorn
✅ Use different port: uvicorn app.main:app --port 8001

NumPy Errors

✅ Downgrade NumPy: pip install "numpy<2" --force-reinstall

📁 Project Structure

code-assistant-poc/
├── app/
│   ├── main.py              # FastAPI server and endpoints
│   ├── rag_chain.py         # Standard RAG implementation
│   ├── graph_rag.py         # GraphRAG implementation
│   ├── graph_loader.py      # Graph loading and traversal
│   ├── llm_factory.py       # LLM provider factory
│   └── vertex_llm.py        # Vertex AI LLM wrapper
├── scripts/
│   └── index_codebase.py    # Codebase indexing script
├── frontend/
│   └── index.html           # Web UI
├── my_codebase/             # Sample codebases
│   ├── mycarhub/            # React app
│   ├── mycarhub-fleet-analytics/  # Graph API (Fastify)
│   └── mycarhub-service-hub/      # Service hub
├── chroma_db/               # Vector store (created after indexing)
├── requirements.txt         # Python dependencies
├── .env                     # Environment variables
├── test_graphrag.py         # Test script
└── README.md                # This file

📝 Notes

Local Embeddings: Embeddings run locally (no API costs)
LLM Costs: Only LLM inference charges apply (OpenAI, Vertex AI, or SageMaker)
Graph API: Must be running separately for GraphRAG (see Quick Start)
Indexing: Re-run indexing when codebase changes significantly
Collection Names: Use consistent collection names between indexing and querying

📄 License

This is a proof-of-concept project for demonstration purposes.

🤝 Next Steps for Production

✅ Add authentication and authorization
✅ Implement rate limiting
✅ Add comprehensive logging and monitoring
✅ Set up CI/CD pipeline
✅ Add unit and integration tests
✅ Migrate to managed vector database (Pinecone, Weaviate)
✅ Add caching layer (Redis)
✅ Implement query history and conversation context
✅ Add deployment documentation (Docker, Kubernetes)
✅ Set up alerting and error tracking

Name		Name	Last commit message	Last commit date
Latest commit History 1 Commit
app		app
frontend		frontend
my_codebase		my_codebase
scripts		scripts
.gitignore		.gitignore
README.md		README.md
requirements.txt		requirements.txt
test_graphrag.py		test_graphrag.py

SingaporeSDK/code-assistant-poc-localLLM

Folders and files

Latest commit

History

Repository files navigation