CodeBook — The Code Understanding Layer

CodeBook is a universal translation layer for software code. It is an MCP server that bridges non-technical stakeholders (PMs, domain experts, managers) and developers: it interprets the intent behind code through natural language and gives each reader clear, role-specific insight into complex systems.

What is CodeBook?

CodeBook enables anyone to understand, diagnose, and propose changes to software systems without writing code. It transforms code repositories into structured, queryable knowledge that adapts its explanations based on your role.

Core capabilities:

Blueprint Scanning — Analyze entire codebases and create visual dependency maps
Module Understanding — Deep dive into specific components with contextual summaries
Problem Diagnosis — Trace code paths to pinpoint bugs or understand functionality
Interactive Q&A — Ask domain-specific questions and get structured answers
Code Generation — Propose changes with unified diffs and impact analysis

Target Users:

Developers — Understand new projects and unfamiliar modules faster
Project Managers — See architectural dependencies and change impact in business terms
Domain Experts — Verify implementations match domain requirements
QA/DevOps — Track system health and change coverage

Installation

Prerequisites

Python 3.10 or higher
pip3 (macOS/Linux) or uv

macOS users: macOS does not provide the `python` and `pip` commands by default. Use `python3` and `pip3` instead, or install uv for a zero-config experience.

Quick Start

```bash
# Clone the repository
git clone https://github.com/JAAAACY/Codebook.git
cd Codebook

# Install the MCP server
cd mcp-server
pip3 install -e ".[dev]"

# Verify installation
python3 -m pytest tests/ -q
```

Upgrading from a Previous Version

If you already have CodeBook installed:

```bash
cd Codebook
git pull origin main

cd mcp-server
pip3 install -e ".[dev]" --upgrade

# Verify tree-sitter grammars are available (new in v0.2)
python3 -c "import tree_sitter_language_pack; print('OK')"
```

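What changed in v0.2: tree-sitter-language-pack is now a core dependency (it was previously optional), and the upgrade command installs it automatically. This gives you native AST parsing for 10+ languages, including Bash, JavaScript, TypeScript, and Python. The regex parser is no longer the default path; it is used only as a fallback when a grammar cannot be loaded (see Graceful Degradation below).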

Integration with Claude Desktop

CodeBook runs as an MCP Server, providing instant access within Claude Desktop or other MCP-compatible applications.

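Add a codebook entry to Claude Desktop's MCP configuration file, claude_desktop_config.json. On macOS it is located at ~/Library/Application Support/Claude/claude_desktop_config.json; on Windows it is at %APPDATA%\Claude\claude_desktop_config.json. Other MCP hosts use their own equivalent file.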

```json
{
  "mcpServers": {
    "codebook": {
      "command": "python3",
      "args": ["-m", "src.server"],
      "cwd": "/path/to/Codebook/mcp-server"
    }
  }
}
```
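You can also script this step. The following minimal sketch merges the codebook entry into an existing config without overwriting other servers. The helper names `add_codebook_server` and `update_config_file` are hypothetical and are not part of CodeBook. Pass the config path for your platform and your own checkout path.

```python
import json
from pathlib import Path


def add_codebook_server(config: dict, server_dir: str) -> dict:
    """Return a copy of `config` with a "codebook" entry under "mcpServers".

    Hypothetical helper: other configured servers are left untouched,
    and the input dict is not modified.
    """
    updated = dict(config)
    servers = dict(updated.get("mcpServers", {}))
    servers["codebook"] = {
        "command": "python3",
        "args": ["-m", "src.server"],
        "cwd": server_dir,
    }
    updated["mcpServers"] = servers
    return updated


def update_config_file(config_path: Path, server_dir: str) -> None:
    """Read the JSON config (if it exists), add the entry, and write it back."""
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(add_codebook_server(config, server_dir), indent=2))
```

For example, on macOS you might call `update_config_file(Path.home() / "Library/Application Support/Claude/claude_desktop_config.json", "/path/to/Codebook/mcp-server")` and then restart Claude Desktop.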

Then restart Claude Desktop to activate the tools.

7 Core Tools

1. scan_repo

Analyzes a Git repository to create a blueprint overview with module grouping and dependency visualization.

Inputs:

repo_url (string) — HTTPS Git URL
role (string) — "dev" | "pm" | "domain_expert" (controls output language style)
depth (string) — "overview" (lightweight blueprint) | "detailed" (all module cards)

Outputs:

Module list with file counts and public interfaces
NetworkX-based dependency graph
Mermaid diagram for visualization
Repository statistics (functions, classes, imports, calls)

Example:

```json
{
  "repo_url": "https://github.com/fastapi/fastapi.git",
  "role": "pm",
  "depth": "overview"
}
```
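In the scan_repo outputs, the dependency graph and the Mermaid diagram are two views of the same data: each module-to-module import becomes an edge. The sketch below illustrates that conversion. It assumes edges arrive as (importer, imported) module-name pairs and uses plain Python instead of NetworkX. It is not CodeBook's actual renderer.

```python
import re


def _node(name: str) -> str:
    """Mermaid node: a safe id (non-word chars -> '_') plus the original name as label."""
    node_id = re.sub(r"\W", "_", name)
    return f'{node_id}["{name}"]'


def to_mermaid(edges: list[tuple[str, str]]) -> str:
    """Render (importer, imported) module pairs as a Mermaid flowchart."""
    lines = ["graph TD"]
    seen: set[tuple[str, str]] = set()
    for src, dst in edges:
        if (src, dst) in seen or src == dst:
            continue  # skip duplicate edges and self-imports
        seen.add((src, dst))
        lines.append(f"    {_node(src)} --> {_node(dst)}")
    return "\n".join(lines)
```

For example, `to_mermaid([("api", "auth"), ("auth", "database")])` yields a `graph TD` diagram with two arrows. Module names such as `api.v1` are turned into the id `api_v1` so Mermaid accepts them.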

2. read_chapter

Provides a deep dive into a specific module: function signatures, class definitions, call relationships, and contextual summaries.

Inputs:

module_name (string) — Name of the logical module (e.g., "authentication", "database")
role (string) — "dev" | "pm" | "domain_expert"

Outputs:

Module summary (translated to role perspective)
Module cards (per-file functions/classes/calls)
Dependency graph for this module

Example:

```json
{
  "module_name": "authentication",
  "role": "domain_expert"
}
```
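To make "module cards" concrete, here is a sketch of the kind of per-file summary a card holds. It is built with Python's standard `ast` module and handles Python sources only, whereas CodeBook itself uses tree-sitter. The `ModuleCard` class and its fields are assumptions for illustration, not CodeBook's real schema; INTERFACES.md defines the actual contracts.

```python
import ast
from dataclasses import dataclass, field


@dataclass
class ModuleCard:
    """Hypothetical per-file card: functions, classes, and outgoing calls."""
    path: str
    functions: list[str] = field(default_factory=list)
    classes: list[str] = field(default_factory=list)
    calls: list[str] = field(default_factory=list)


def build_card(path: str, source: str) -> ModuleCard:
    """Walk the syntax tree of one Python file and collect its definitions and calls."""
    card = ModuleCard(path)
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            args = ", ".join(a.arg for a in node.args.args)
            card.functions.append(f"{node.name}({args})")
        elif isinstance(node, ast.ClassDef):
            card.classes.append(node.name)
        elif isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            # Only direct calls like foo(); method calls such as obj.foo() are skipped.
            card.calls.append(node.func.id)
    return card
```

`ast.walk` visits nodes breadth-first, so the order of entries within each list follows tree depth rather than line order.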

3. diagnose

Traces code paths from a natural-language problem description to exact file locations and call chains.

Inputs:

query (string) — Natural language description (e.g., "Where does the login timeout error get triggered?")
module_name (string) — Optional scope restriction
role (string) — "dev" | "pm" | "domain_expert"

Outputs:

Matched nodes (functions/classes that fit the query)
Call chain (Mermaid sequence diagram)
Exact file:line locations

Example:

```json
{
  "query": "How is user authentication verified during login?",
  "module_name": "auth",
  "role": "domain_expert"
}
```
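diagnose pairs a Mermaid sequence diagram with file:line locations. The sketch below shows how such a diagram could be produced. It assumes a call chain is an ordered list of (caller, callee, location) hops. The `Hop` type is a hypothetical shape, not CodeBook's documented output schema.

```python
from typing import NamedTuple


class Hop(NamedTuple):
    """One step in a call chain (hypothetical shape)."""
    caller: str
    callee: str
    location: str  # "path/to/file.py:42", where the call happens


def to_sequence_diagram(chain: list[Hop]) -> str:
    """Render an ordered call chain as a Mermaid sequence diagram."""
    lines = ["sequenceDiagram"]
    for hop in chain:
        # Each arrow is labeled with the file:line of the call site.
        lines.append(f"    {hop.caller}->>{hop.callee}: {hop.location}")
    return "\n".join(lines)
```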

4. ask_about

Supports a multi-turn conversation about a module, combining code context with the host LLM's reasoning to answer complex questions.

Inputs:

module_name (string) — Target module
question (string) — Natural language question
conversation_history (array, optional) — Prior Q&A turns
role (string) — "ceo" | "pm" | "dev" | "qa"

Outputs:

Structured context (code snippets, dependencies)
Guidance for the host LLM
Modules referenced in the answer

Example:

```json
{
  "module_name": "payment_processor",
  "question": "What happens when a payment fails?",
  "role": "pm",
  "conversation_history": []
}
```

5. codegen

Proposes code changes based on natural language instructions, with validation and blast radius analysis.

Inputs:

instruction (string) — What to change (e.g., "Rename getUserById to getUser across all files")
repo_path (string) — Local path to cloned repository
locate_result (object, optional) — Diagnostic output from prior tools
role (string) — "dev" | "pm"

Outputs:

Change summary (human-readable)
Unified diff (apply with patch -p1)
Blast radius (affected modules)
Verification steps
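The unified diff listed above is meant for `patch -p1`, which strips the first path component. Its file headers therefore need the `a/` and `b/` prefixes that git also uses. The sketch below uses Python's standard `difflib` to show that format. It only illustrates the output shape; CodeBook does not necessarily generate its diffs this way.

```python
import difflib


def unified_diff(path: str, before: str, after: str) -> str:
    """Return a unified diff for one file, with a/ and b/ prefixes for `patch -p1`."""
    return "".join(
        difflib.unified_diff(
            before.splitlines(keepends=True),
            after.splitlines(keepends=True),
            fromfile=f"a/{path}",
            tofile=f"b/{path}",
        )
    )


old = "def getUserById(user_id):\n    return db.get(user_id)\n"
new = "def getUser(user_id):\n    return db.get(user_id)\n"
print(unified_diff("src/users.py", old, new))
```

Save the output to a file such as `change.diff` and apply it from the repository root with `patch -p1 < change.diff`. `git apply change.diff` also works, since it assumes one leading path component by default.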

Example:

```json
{
  "instruction": "Add error handling for network timeouts in the API client",
  "repo_path": "/path/to/repo",
  "role": "dev"
}
```
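Blast radius can be read as the set of modules that depend, directly or transitively, on the changed ones. The sketch below computes it with a breadth-first walk over reversed import edges. It assumes the dependency graph is a plain dict mapping each module to the modules it imports. CodeBook uses NetworkX, and this version sticks to the standard library.

```python
from collections import deque


def blast_radius(imports: dict[str, set[str]], changed: set[str]) -> set[str]:
    """Return modules that directly or transitively import any changed module."""
    # Reverse the edges: for each module, which modules import it?
    importers: dict[str, set[str]] = {}
    for module, deps in imports.items():
        for dep in deps:
            importers.setdefault(dep, set()).add(module)

    affected: set[str] = set()
    queue = deque(changed)
    while queue:
        current = queue.popleft()
        for parent in importers.get(current, set()):
            # The visited check also protects against import cycles.
            if parent not in affected and parent not in changed:
                affected.add(parent)
                queue.append(parent)
    return affected
```

With NetworkX, `nx.ancestors(G, module)` gives the same set for a single changed module, provided the graph's edges point from importer to imported module.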

6. term_correct (Optional)

Normalizes domain terminology across different naming conventions (internal vocabulary builder).
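The same concept often appears under different naming conventions, such as getUserById, get_user_by_id, and GET-USER-BY-ID. One way to normalize them is to split each identifier into lowercase words and look the word tuple up in a glossary. This sketch is an assumption about the general approach, not term_correct's actual implementation, and the `GLOSSARY` contents are hypothetical.

```python
import re

# Hypothetical glossary: canonical word tuple -> preferred domain term.
GLOSSARY: dict[tuple[str, ...], str] = {("get", "user", "by", "id"): "Look up customer"}


def split_identifier(name: str) -> tuple[str, ...]:
    """Split camelCase, PascalCase, snake_case, and kebab-case into lowercase words."""
    # Acronym runs (HTTP in HTTPServer), capitalized or lowercase words, and digit runs.
    words = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", name)
    return tuple(w.lower() for w in words)


def normalize(name: str) -> str:
    """Map an identifier to its glossary term, or to plain space-separated words."""
    words = split_identifier(name)
    return GLOSSARY.get(words, " ".join(words))
```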

7. memory_feedback (Optional)

Logs user annotations to improve future explanations (data flywheel for semantic learning).

Role System: Adapting Output to Your Perspective

CodeBook outputs change based on your role:

Role	Best For	Output Style
dev	Developers	Technical, mentions function signatures, call chains, implementation details
pm	Project Managers	Business impact, module boundaries, change risks, team communication
domain_expert	Subject Matter Experts	Domain terminology, business rules validation, regulatory/compliance concerns
ceo	Leadership	Executive summary, strategic implications, resource impact
qa	QA/Testers	Test coverage, edge cases, integration points

Each tool translates its output to match your role's needs without changing the underlying analysis.
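The principle of one analysis with several renderings can be sketched as a single result object passed through per-role formatters. The `Finding` fields, the formatter wording, and the fallback to `dev` for unknown roles are invented for illustration, and only three roles are shown. CodeBook's real role templates live in mcp-server/src/config/.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """One analysis result, shared unchanged by every role (hypothetical shape)."""
    function: str
    location: str
    dependents: int


# Each formatter only changes the wording; the underlying Finding is the same.
FORMATTERS = {
    "dev": lambda f: f"{f.function} at {f.location}; {f.dependents} callers depend on it.",
    "pm": lambda f: f"Changing this affects {f.dependents} other parts of the system.",
    "qa": lambda f: f"Re-test the {f.dependents} integration points that call {f.function}.",
}


def render(finding: Finding, role: str) -> str:
    """Format the same finding for a role; unknown roles fall back to 'dev' (an assumption)."""
    return FORMATTERS.get(role, FORMATTERS["dev"])(finding)
```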

Configuration

Environment Variables

Variable	Default	Purpose
`CODEBOOK_LOG_LEVEL`	`INFO`	Logging verbosity (DEBUG, INFO, WARNING, ERROR)
`CODEBOOK_MAX_REPO_SIZE_MB`	`100`	Maximum repository size to analyze
`CODEBOOK_CACHE_DIR`	`~/.codebook/`	Local cache for parsed repositories
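For reference, the sketch below reads these variables with their documented defaults. The `Settings` class, the `load_settings` function, and the validation rules are assumptions for illustration, not CodeBook's actual configuration loader.

```python
from __future__ import annotations

import os
from dataclasses import dataclass
from pathlib import Path

LOG_LEVELS = {"DEBUG", "INFO", "WARNING", "ERROR"}


@dataclass(frozen=True)
class Settings:
    log_level: str
    max_repo_size_mb: int
    cache_dir: Path


def load_settings(env: dict[str, str] | None = None) -> Settings:
    """Read CODEBOOK_* variables (from os.environ by default), applying the table's defaults."""
    env = dict(os.environ) if env is None else env
    level = env.get("CODEBOOK_LOG_LEVEL", "INFO").upper()
    if level not in LOG_LEVELS:
        raise ValueError(f"CODEBOOK_LOG_LEVEL must be one of {sorted(LOG_LEVELS)}, got {level!r}")
    return Settings(
        log_level=level,
        max_repo_size_mb=int(env.get("CODEBOOK_MAX_REPO_SIZE_MB", "100")),
        cache_dir=Path(env.get("CODEBOOK_CACHE_DIR", "~/.codebook/")).expanduser(),
    )
```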

Prompt Configuration

CodeBook uses structured prompt templates in mcp-server/src/config/ to generate role-specific explanations. Edit these to customize output style for your organization.

Architecture Overview

```text
User Input (natural language query)
    ↓
[MCP Server Layer] ── Routes requests to appropriate tool
    ↓
[Code Analysis Layer] ── Tree-sitter AST parsing + regex fallback
    ↓                     (graceful degradation when tree-sitter is unavailable)
[Role Adapter Layer] ── Translates technical details to user role perspective
    ↓
[Output Formatter] ── Mermaid diagrams, JSON, unified diffs
```

CodeBook does not train custom models. It pairs precise code analysis with the reasoning of the LLM provided by the MCP host.

Graceful Degradation

CodeBook uses a two-tier parsing strategy to ensure reliability:

Full mode (tree-sitter): High-fidelity AST parsing with complete function signatures, class hierarchies, call chains, and scope tracking. Requires tree-sitter-language-pack.
Partial mode (regex fallback): When tree-sitter is unavailable or fails for a specific language, CodeBook automatically falls back to regex-based extraction. This captures top-level functions, classes, imports, and basic call patterns.

Each parsed file includes a parse_method field (full / partial / basic / failed) so downstream tools and users know the precision level. When more than 50% of files use simplified parsing, scan results include a warning.
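The warning rule can be expressed in a few lines. This sketch assumes "simplified parsing" means any parse_method other than full. The function name and message text are made up; the sketch only illustrates the more-than-50% threshold.

```python
from __future__ import annotations

from collections import Counter


def simplified_parse_warning(parse_methods: list[str], threshold: float = 0.5) -> str | None:
    """Return a warning if more than `threshold` of files were not parsed in full mode."""
    if not parse_methods:
        return None
    simplified = len(parse_methods) - Counter(parse_methods)["full"]
    ratio = simplified / len(parse_methods)
    if ratio > threshold:  # strictly more than half, so exactly 50% does not warn
        return (
            f"{simplified}/{len(parse_methods)} files used simplified parsing "
            f"({ratio:.0%}); results may be incomplete."
        )
    return None
```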

tree-sitter-language-pack is a core dependency and will be installed automatically with pip install -e . or pip install codebook-mcp. After installation, run codebook doctor to verify all language parsers are available.

If tree-sitter is missing or a specific language grammar fails to load at runtime, the system automatically falls back to regex extraction — it never crashes, and always produces usable results at the best available precision level.
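The fallback can be sketched as a try/except around the tree-sitter path. This illustrative reconstruction handles Python sources only and collects nothing but class and function names, whereas CodeBook's parsers extract far more. It uses `get_parser` from tree-sitter-language-pack when that package is available, drops to a regex otherwise, and records which method it used. The function names are hypothetical.

```python
import re


def _full_parse(source: str) -> list[str]:
    """Collect class/function names from the tree-sitter syntax tree."""
    from tree_sitter_language_pack import get_parser  # ImportError if not installed

    tree = get_parser("python").parse(source.encode("utf-8"))
    names: list[str] = []
    stack = [tree.root_node]
    while stack:  # pre-order walk; children pushed in reverse to keep source order
        node = stack.pop()
        if node.type in ("function_definition", "class_definition"):
            name = node.child_by_field_name("name")
            if name is not None:
                names.append(name.text.decode("utf-8"))
        stack.extend(reversed(node.children))
    return names


_DEF_RE = re.compile(r"^\s*(?:async\s+)?(?:def|class)\s+(\w+)", re.MULTILINE)


def parse_definitions(source: str) -> tuple[list[str], str]:
    """Return (names, parse_method); parser problems never propagate."""
    try:
        return _full_parse(source), "full"
    except Exception:  # missing package, missing grammar, or parser error
        return _DEF_RE.findall(source), "partial"
```

The regex path misses nested or unusual constructs that the syntax tree would catch, which is why each result carries its parse_method.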

Testing

All 396 tests pass (100% coverage of core features):

```bash
cd mcp-server

# Run all tests
python3 -m pytest tests/ -v

# Run specific test module
python3 -m pytest tests/test_scan_repo.py -v

# Run with coverage
python3 -m pytest tests/ --cov=src --cov-report=html
```

Tests include:

Acceptance tests — Full end-to-end workflows with real codebases
Unit tests — Parser, graph, summarizer, and tool logic
Integration tests — Tool interaction with caching and role adaptation

Development Workflow

Project Structure

```text
codebook/
├── .github/
│   └── workflows/
│       └── test.yml              # GitHub Actions CI pipeline
├── mcp-server/                   # Main MCP server package
│   ├── src/
│   │   ├── server.py             # MCP entry point
│   │   ├── tools/                # 7 tool implementations
│   │   ├── parsers/              # Code analysis (AST, modules, dependencies)
│   │   ├── summarizer/           # Module card generation
│   │   ├── memory/               # Project memory & data flywheel
│   │   └── config/               # Prompt templates
│   ├── tests/                    # 396 tests
│   ├── pyproject.toml            # Dependencies and build config
│   └── README.md                 # Server-specific documentation
├── files/                        # Design documents
│   ├── CLAUDE.md                 # Immutable project rules
│   ├── CONTEXT.md                # Dynamic development status
│   └── INTERFACES.md             # Data structure contracts
└── docs/                         # User guides and API reference
```

Code Standards

Language: Python 3.10+
Testing: pytest with 99%+ pass rate
Logging: structlog (no print statements)
Type Safety: Full type hints on public APIs
Dependencies: Tree-sitter (parsing), NetworkX (graphs), FastMCP (server)

Making Changes

Create a feature branch
Edit code in mcp-server/src/
Add tests in mcp-server/tests/
Run pytest tests/ -q to verify
Create a pull request with description

Contributing

We welcome contributions! Please:

Fork the repository
Create a feature branch: git checkout -b feature/your-feature
Write tests for new functionality (CodeBook aims for 99%+ test coverage)
Follow code style — PEP 8, type hints, structlog for logging
Document changes in code comments and commit messages
Push and open a Pull Request with a clear description

Contributing Guidelines

Keep commits atomic and well-described
Update INTERFACES.md if you modify tool contracts
Run full test suite before submitting: pytest tests/ -q
Avoid external API calls in tests; use fixtures and mocks
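As an example of the no-external-calls rule, a test can inject a mock in place of the network-facing piece. The `clone_repo` and `count_python_files` helpers below are hypothetical stand-ins, not CodeBook functions. Passing the dependency as a parameter keeps the test independent of module import paths.

```python
import subprocess
from collections.abc import Callable
from pathlib import Path
from unittest import mock


def clone_repo(url: str, dest: Path) -> None:
    """Hypothetical helper that would hit the network via git."""
    subprocess.run(["git", "clone", "--depth", "1", url, str(dest)], check=True)


def count_python_files(url: str, dest: Path, clone: Callable[[str, Path], None] = clone_repo) -> int:
    """Clone a repository into `dest` and count its .py files."""
    clone(url, dest)
    return sum(1 for _ in dest.rglob("*.py"))


def test_count_python_files_without_network(tmp_path: Path) -> None:
    def fake_clone(url: str, dest: Path) -> None:
        # Build a tiny fake checkout instead of running git.
        (dest / "pkg").mkdir(parents=True)
        (dest / "pkg" / "a.py").write_text("x = 1\n")
        (dest / "README.md").write_text("docs\n")

    clone = mock.Mock(side_effect=fake_clone)
    assert count_python_files("https://example.com/repo.git", tmp_path, clone=clone) == 1
    clone.assert_called_once_with("https://example.com/repo.git", tmp_path)
```

Under pytest, `tmp_path` is supplied automatically as a fresh temporary directory.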

License

CodeBook is released under the MIT License. See the LICENSE file for full terms.

Roadmap

Q2 2026: Web UI for non-developer stakeholders
Q3 2026: Integration with CI/CD pipelines (GitHub Actions, GitLab CI)
Q4 2026: Custom domain terminology learning (data flywheel v2)

Support

Issues: GitHub Issues
Discussions: GitHub Discussions

CodeBook — Making code transparent to everyone.
