CodeSage is a local codebase intelligence system that combines Retrieval-Augmented Generation (RAG) with static analysis to facilitate code exploration, refactoring, and quality assessment. By integrating Abstract Syntax Tree (AST) parsing, vector embeddings, and dependency graph traversal, the system provides contextual responses to architectural queries and calculates complexity metrics autonomously.
Users can query the codebase using natural language. The system implements a Retrieval-Augmented Generation pipeline:
- Vector Search: Converts queries into dense vector embeddings and retrieves relevant code segments from a local FAISS index.
- Graph Augmentation: Extracts relational context (imports, class hierarchies, function calls) from an in-memory NetworkX graph.
- LLM Reasoning: Passes the aggregated context to a Large Language Model (via Groq) to synthesize technically accurate responses based strictly on the provided codebase context.
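The retrieval and prompt-assembly steps above can be sketched as follows, with NumPy cosine similarity standing in for the FAISS index; the snippet corpus, vector dimension, and function names are illustrative, not CodeSage's actual API:

```python
import numpy as np

# Toy corpus standing in for indexed code chunks; the 384-dim vectors mimic
# all-MiniLM-L6-v2 output but are random here (no model download needed).
rng = np.random.default_rng(0)
snippets = [
    "def load_config(path): ...",
    "class UserRepository: ...",
    "def build_ast_graph(src): ...",
]
embeddings = rng.normal(size=(len(snippets), 384))
embeddings /= np.linalg.norm(embeddings, axis=1, keepdims=True)

def retrieve(query_vec, k=2):
    """Top-k snippets by cosine similarity (stand-in for a FAISS search)."""
    q = query_vec / np.linalg.norm(query_vec)
    scores = embeddings @ q
    return [snippets[i] for i in np.argsort(-scores)[:k]]

def build_prompt(query, context_snippets):
    """Aggregate retrieved code into the context block sent to the LLM."""
    context = "\n\n".join(context_snippets)
    return (
        "Answer strictly from the codebase context below.\n\n"
        f"{context}\n\nQuestion: {query}"
    )

hits = retrieve(rng.normal(size=384))
prompt = build_prompt("Where is the config loaded?", hits)
```

In the real pipeline the query embedding would come from the same sentence-transformers model that built the index, and the graph context would be appended alongside the retrieved snippets.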
The system constructs a relational dependency graph mapping files, classes, and methods. Users can supply a target module to compute its fan-in, revealing which modules transitively depend on it and assigning an overall modification risk tier (often referred to as a "blast radius" analysis).
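A minimal sketch of the blast-radius computation, using a plain adjacency dict in place of the NetworkX graph; the module names and the `blast_radius` helper are hypothetical:

```python
from collections import deque

# Hypothetical edge list: "A -> B" means module A imports module B.
deps = {
    "api.py": ["services.py"],
    "cli.py": ["services.py"],
    "services.py": ["models.py", "utils.py"],
    "models.py": [],
    "utils.py": [],
}

# Invert the edges so we can ask "who depends on X?" (fan-in).
dependents = {mod: [] for mod in deps}
for mod, imports in deps.items():
    for target in imports:
        dependents[target].append(mod)

def blast_radius(target):
    """Every module that transitively depends on `target` (reverse BFS)."""
    seen, queue = set(), deque([target])
    while queue:
        for dep in dependents[queue.popleft()]:
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return seen

# blast_radius("models.py") → {"services.py", "api.py", "cli.py"}
```

A risk tier could then be derived from the size of the returned set: the more dependents a module has, the riskier it is to modify.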
A static analysis engine utilizes Python's native ast module to evaluate code quality without executing it. It profiles the source code against explicit programmatic heuristics:
- Cyclomatic complexity tracking (branching density and nested control structures).
- Function length anomalies.
- Parameter count thresholds.
- Multiple-exit-point detection.
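The heuristics above can be approximated with the stdlib `ast` module. This sketch uses a rough McCabe-style count, not necessarily CodeSage's exact scoring; the `analyze` helper and its thresholds are illustrative:

```python
import ast

# Constructs that add a branch point in a rough McCabe-style count.
BRANCH_NODES = (ast.If, ast.For, ast.While, ast.Try, ast.BoolOp)

def cyclomatic_complexity(func_node):
    """Rough McCabe score: 1 plus one point per branching construct."""
    return 1 + sum(isinstance(n, BRANCH_NODES) for n in ast.walk(func_node))

def analyze(source):
    """Per-function heuristics: complexity, length, params, exit points."""
    report = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.FunctionDef):
            report[node.name] = {
                "complexity": cyclomatic_complexity(node),
                "length": node.end_lineno - node.lineno + 1,
                "params": len(node.args.args),
                "exit_points": sum(
                    isinstance(n, ast.Return) for n in ast.walk(node)
                ),
            }
    return report

sample = """
def grade(score, bonus):
    if score > 90:
        return "A"
    if score > 80:
        return "B"
    return "C"
"""
report = analyze(sample)
# report["grade"] → complexity 3, params 2, exit_points 3
```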
CodeSage integrates specialized LLM-driven agents and background services:
- Bug Hunter: Ingests unformatted error tracebacks, traverses the vector index to identify the failing source file, and proposes programmatic fixes.
- Auto-Refactor: Analyzes complex or unstructured code blocks and streams optimized, statically typed, and PEP-8 compliant suggestions back to the client interface.
- Event-Driven Synchronization: A `watchdog` daemon monitors the active directory tree for filesystem modifications, automatically computing incremental semantic vectors and updating the AST graph to prevent index staleness.
- Query Logging: Integrates with `pymongo` to persistently log historical interactions, metadata, and LLM responses into a MongoDB database without blocking the visualization thread.
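The Bug Hunter's first step, pulling the failing frame out of a raw traceback before querying the vector index, might look like this stdlib sketch; the traceback text and the `locate_failure` helper are illustrative:

```python
import re

TRACEBACK = '''Traceback (most recent call last):
  File "services.py", line 42, in get_user
    return cache[user_id]
KeyError: 7
'''

def locate_failure(tb_text):
    """Extract (file, line, function) from an unformatted traceback.

    The innermost frame is the failure site; in the full pipeline its
    file path would seed the vector-index lookup for the fix proposal.
    """
    frames = re.findall(r'File "([^"]+)", line (\d+), in (\S+)', tb_text)
    if not frames:
        return None
    path, lineno, func = frames[-1]
    return {"file": path, "line": int(lineno), "function": func}

hit = locate_failure(TRACEBACK)
# → {'file': 'services.py', 'line': 42, 'function': 'get_user'}
```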
- Extraction & Mapping: Native Python `ast`, `networkx`.
- Vector Engine: `sentence-transformers` (all-MiniLM-L6-v2) generating embeddings stored in `faiss-cpu`.
- Reasoning API: Groq (`llama-3.1-8b-instant`).
- Storage Configuration: MongoDB (`mongodb://127.0.0.1:27018`).
- Interface Layer: Streamlit runtime.
1. Environment Initialization

Ensure Python 3.10+ is installed on the host system.

```shell
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```

2. Environment Variables Configuration
Rename the `.env.example` template to `.env` and configure the required API values:
```
GROQ_API_KEY=gsk_your_key_here
LLM_MODEL=llama-3.1-8b-instant
EMBEDDING_MODEL=all-MiniLM-L6-v2
```

3. MongoDB Provisioning
To enable the UI Query History feature, initialize an isolated MongoDB daemon on port 27018:
```shell
mkdir -p ~/mongodb-codesage
mongod --port 27018 --dbpath ~/mongodb-codesage
```

The system must build its cached embeddings and graph nodes before the interface can answer queries.
1. Calculate Base Index
Recursively parse the target directory to generate .index, .json, and .pkl artifacts.
```shell
python3 main.py index
```

2. Launch Visualization Client

Start the main application dashboard locally.
```shell
python3 main.py ui
```

The Streamlit client will bind to localhost:8501 by default.
3. Run Watchdog Daemon (Concurrent Task)

Optionally spin up a concurrent terminal to maintain index integrity as source code is modified in real time.
```shell
python3 main.py watch
```