Transform any GitHub repository into a structured knowledge graph.
RepoMind is a multi-language code intelligence engine that analyzes source code using compiler techniques, builds semantic relationships between code entities, and provides AI-powered insights through GraphRAG.
Unlike traditional AI code assistants that rely primarily on large language models, RepoMind extracts structured information directly from source code using Abstract Syntax Trees (ASTs), static analysis, and graph-based representations. The LLM is used only as a reasoning layer on top of verified program knowledge.
Modern repositories are often too large to understand by reading files one at a time.
RepoMind aims to make any codebase explorable by converting source code into structured knowledge.
Given a repository, RepoMind will:
- Understand project architecture
- Build dependency and call graphs
- Detect dead code and technical debt
- Generate onboarding documentation
- Explain execution flow
- Build a semantic knowledge graph
- Enable natural language querying through GraphRAG
The long-term goal is to provide compiler-level understanding combined with AI-assisted reasoning.
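As a rough illustration of what "build call graphs" and "detect dead code" can mean in practice, here is a minimal sketch. It uses Python's standard-library `ast` module as a stand-in for Tree-sitter, handles only top-level functions called by bare name, and treats `main` as the only entry point. The function names `build_call_graph` and `find_dead_functions` are hypothetical and not part of RepoMind's API.

```python
import ast

# Sample source to analyze; in a real run this would come from a repository file.
SOURCE = '''
def load(path):
    return parse(path)

def parse(path):
    return path.upper()

def unused_helper():
    return 42

def main():
    return load("repo")
'''


def build_call_graph(source: str) -> dict:
    """Map each top-level function name to the bare names it calls directly."""
    tree = ast.parse(source)
    graph = {}
    for node in tree.body:
        if isinstance(node, ast.FunctionDef):
            callees = set()
            for child in ast.walk(node):
                # Only plain calls like foo(); method calls such as obj.foo() are skipped.
                if isinstance(child, ast.Call) and isinstance(child.func, ast.Name):
                    callees.add(child.func.id)
            graph[node.name] = callees
    return graph


def find_dead_functions(graph: dict, entry_points: set) -> set:
    """Return functions that cannot be reached from any entry point."""
    reachable = set()
    stack = [name for name in entry_points if name in graph]
    while stack:
        current = stack.pop()
        if current in reachable:
            continue
        reachable.add(current)
        # Follow only calls to functions defined in this source.
        stack.extend(callee for callee in graph[current] if callee in graph)
    return set(graph) - reachable


call_graph = build_call_graph(SOURCE)
dead = find_dead_functions(call_graph, entry_points={"main"})
print(sorted(dead))
```

With the sample above, `unused_helper` is the only function that `main` never reaches, so it is the only one reported. A production analyzer would also need to account for methods, dynamic dispatch, and cross-file references.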
- Structure before intelligence — Every feature produces structured data before involving an LLM.
- Language-agnostic architecture — Support multiple programming languages through a unified parsing pipeline.
- Knowledge-first design — Source code is transformed into entities, relationships, and graphs rather than treated as plain text.
- Extensible by design — New languages, analyzers, and graph backends can be added without changing the core architecture.
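To make the "knowledge-first" principle concrete, the sketch below shows one possible in-memory shape for entities and relationships. The class names (`Entity`, `Relationship`, `KnowledgeGraph`), the ID format, and the relationship kinds such as `CALLS` are assumptions made for illustration; the actual RepoMind models may look different.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Entity:
    id: str    # hypothetical ID format: "<file path>::<symbol name>"
    kind: str  # e.g. "function", "class", "module"
    name: str


@dataclass(frozen=True)
class Relationship:
    source: str  # ID of the entity the edge starts from
    target: str  # ID of the entity the edge points to
    kind: str    # e.g. "CALLS", "IMPORTS", "DEFINES"


@dataclass
class KnowledgeGraph:
    entities: dict = field(default_factory=dict)
    relationships: list = field(default_factory=list)

    def add_entity(self, entity: Entity) -> None:
        self.entities[entity.id] = entity

    def relate(self, source: str, target: str, kind: str) -> None:
        # Reject edges that point at entities the graph does not know about.
        for entity_id in (source, target):
            if entity_id not in self.entities:
                raise KeyError(f"unknown entity: {entity_id}")
        self.relationships.append(Relationship(source, target, kind))

    def neighbors(self, entity_id: str, kind: str) -> list:
        """Return entities reachable from entity_id over edges of the given kind."""
        return [
            self.entities[rel.target]
            for rel in self.relationships
            if rel.source == entity_id and rel.kind == kind
        ]


graph = KnowledgeGraph()
graph.add_entity(Entity("app/main.py::main", "function", "main"))
graph.add_entity(Entity("app/io.py::load", "function", "load"))
graph.relate("app/main.py::main", "app/io.py::load", "CALLS")

callees = graph.neighbors("app/main.py::main", "CALLS")
print([entity.name for entity in callees])
```

Keeping the model this small means the same entities and edges could later be written to a graph backend such as Neo4j without changing the code that produces them.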
Repository
│
▼
Repository Scanner
│
▼
Language Detection
│
▼
Parser Factory
│
▼
Tree-sitter Parser
│
▼
AST Generation
│
▼
Entity Extraction
│
▼
Semantic Analysis
│
▼
Knowledge Graph
│
▼
Embeddings
│
▼
GraphRAG
│
▼
LLM Reasoning
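The first stages of the pipeline above (scanner, language detection, parser factory) could be sketched roughly as follows. This is an illustration under assumptions: the extension table, the `ParserFactory` class, and the `StubParser` placeholder are hypothetical, and a real factory would return a configured Tree-sitter parser instead of a stub.

```python
from pathlib import Path
from typing import Callable, Dict, List, Optional

# Hypothetical extension table; a real detector might also inspect shebangs or file content.
EXTENSION_TO_LANGUAGE = {
    ".py": "python",
    ".ts": "typescript",
    ".tsx": "typescript",
    ".js": "javascript",
    ".go": "go",
}


def detect_language(path: str) -> Optional[str]:
    """Guess the language from the file extension, or return None if unsupported."""
    return EXTENSION_TO_LANGUAGE.get(Path(path).suffix.lower())


class StubParser:
    """Placeholder for a language-specific Tree-sitter parser."""

    def __init__(self, language: str) -> None:
        self.language = language


class ParserFactory:
    """Registry that maps a language name to a function that builds its parser."""

    def __init__(self) -> None:
        self._builders: Dict[str, Callable[[], StubParser]] = {}

    def register(self, language: str, builder: Callable[[], StubParser]) -> None:
        self._builders[language] = builder

    def create(self, language: str) -> StubParser:
        if language not in self._builders:
            raise ValueError(f"no parser registered for {language!r}")
        return self._builders[language]()


def group_by_language(paths: List[str]) -> Dict[str, List[str]]:
    """Group file paths by detected language, skipping unsupported files."""
    grouped: Dict[str, List[str]] = {}
    for path in paths:
        language = detect_language(path)
        if language is not None:
            grouped.setdefault(language, []).append(path)
    return grouped


factory = ParserFactory()
factory.register("python", lambda: StubParser("python"))
factory.register("typescript", lambda: StubParser("typescript"))

files = ["backend/app/main.py", "frontend/src/App.tsx", "README.md"]
grouped = group_by_language(files)
parsers = {language: factory.create(language) for language in grouped}
print(sorted(parsers))
```

Registering builders rather than parser instances keeps parser construction lazy, and adding a language only requires a new table entry and one `register` call.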
- Repository cloning
- Recursive file indexing
- Multi-language support
- Incremental repository analysis
- AST generation using Tree-sitter
- Function extraction
- Class extraction
- Import analysis
- Variable extraction
- Call graph generation
- Type relationships
- Symbol references
- Dead code detection
- Circular dependency detection
- Unused imports
- Unused variables
- Complexity analysis
- Technical debt estimation
- Maintainability metrics
- Entity extraction
- Relationship extraction
- Neo4j integration
- Dependency graph
- Call graph
- Architecture graph
- GraphRAG
- Repository Q&A
- Architecture explanation
- Documentation generation
- Onboarding guides
- Bug explanation
- Code summarization
- Next.js
- React
- TypeScript
- Tailwind CSS
- FastAPI
- Python
- Tree-sitter
- Tree-sitter Queries
- PostgreSQL
- Neo4j
- Qdrant (planned)
- GraphRAG
- OpenAI / Local LLMs
- Embedding Models
- Docker
- Kubernetes
- GitHub Actions
repomind/
├── frontend/
├── backend/
│   ├── app/
│   │   ├── ast/
│   │   ├── builders/
│   │   ├── entities/
│   │   ├── models/
│   │   ├── parsers/
│   │   ├── routers/
│   │   ├── services/
│   │   └── utils/
│   └── repos/
├── docs/
├── docker/
└── scripts/
RepoMind follows a layered architecture.
Each layer has a single responsibility.
- Parsers understand syntax.
- Builders convert syntax into entities.
- Analyzers discover relationships.
- Graph engines organize knowledge.
- LLMs reason over structured information.
This separation makes the platform easier to extend, test, and maintain while keeping AI grounded in verified program structure.
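As one example of what the analyzer layer might do once parsers and builders have produced an import graph, the sketch below detects a circular dependency with a depth-first search. The function name `find_cycle` and the module names in the sample graph are hypothetical; the graph is a plain dictionary mapping each module to the modules it imports.

```python
from typing import Dict, List, Optional


def find_cycle(graph: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one import cycle as a list of modules, or None if the graph is acyclic."""
    UNVISITED, IN_PROGRESS, DONE = 0, 1, 2
    state = {node: UNVISITED for node in graph}
    path: List[str] = []  # modules on the current DFS path, in visit order

    def visit(node: str) -> Optional[List[str]]:
        state[node] = IN_PROGRESS
        path.append(node)
        for dependency in graph.get(node, []):
            dependency_state = state.get(dependency, UNVISITED)
            if dependency_state == IN_PROGRESS:
                # Back edge: the dependency is already on the current path.
                return path[path.index(dependency):] + [dependency]
            if dependency_state == UNVISITED:
                cycle = visit(dependency)
                if cycle is not None:
                    return cycle
        path.pop()
        state[node] = DONE
        return None

    for node in graph:
        if state[node] == UNVISITED:
            cycle = visit(node)
            if cycle is not None:
                return cycle
    return None


# Hypothetical import graph: module -> modules it imports.
imports = {
    "app.routers": ["app.services"],
    "app.services": ["app.models", "app.utils"],
    "app.models": ["app.services"],
    "app.utils": [],
}

cycle = find_cycle(imports)
print(" -> ".join(cycle) if cycle else "no cycles")
```

For the sample graph this reports the loop between `app.services` and `app.models`. The recursive search is fine for a sketch, but a very deep import chain could hit Python's recursion limit, so an iterative version may be preferable at scale.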
- Repository cloning
- Recursive file tree generation
- Language detection
- Tree-sitter integration
- Multi-language parser framework
- Entity extraction
- Semantic analysis
- Knowledge graph generation
- Static analysis engine
- Embedding pipeline
- GraphRAG
- AI-powered repository assistant
RepoMind is currently under active development.
Contributions, discussions, and suggestions are welcome as the architecture evolves.
This project is licensed under the MIT License.