Merged
6 changes: 3 additions & 3 deletions AGENTS.md
@@ -14,10 +14,10 @@ gitignored).
 | Directory | Audience | Purpose |
 |-----------|----------|---------|
 | **`.agents/skills/`** (`.claude/skills/`, `.cursor/skills/`) | Agents **developing** this repo | propose, plan-prompts, pr-open, pr-review |
-| **`skills/tier-1/`** + **`skills/tier-2/`** (project root) | Agents **using** this tool on their own codebase | /callers, /routes, /explain-feature, /impact-of, etc. |
+| **`skills/explore-codebase/`** (project root) | Agents **using** this tool on their own codebase | /explore-codebase — complete MCP operating manual |

 `.agents/` skills are loaded by the agent working *on* java-codebase-rag source
-code. `skills/` are shipped to consumers — they instruct an agent to call the
+code. `skills/` is shipped to consumers — it instructs an agent to call the
 MCP tools (`search`, `find`, `describe`, `neighbors`, `resolve`) against an
 indexed Java codebase. Do not mix the two: never import consumer skills into
 `.agents/skills/` or vice versa.
@@ -55,7 +55,7 @@ when needed.
 - `docs/CODEBASE_REQUIREMENTS.md` — Java-repo assumptions and per-file map of
 what to edit when a target tree doesn't match defaults.
 - `tests/README.md` — testing philosophy.
-- **`skills/tier-1/`** + **`skills/tier-2/`** — user-facing skills shipped to java-codebase-rag consumers. Tier 1 are single-intent listings (`/callers`, `/routes`, `/clients`, …); Tier 2 are multi-step workflows (`/explain-feature`, `/impact-of`, `/trace-request-flow`, `/mini-map`). Users opt in per tier. Developer workflow skills live in **`.agents/skills/`**, not here.
+- **`skills/explore-codebase/`** — user-facing skill shipped to java-codebase-rag consumers. Single self-contained operating manual for the 5-tool MCP. Developer workflow skills live in **`.agents/skills/`**, not here.
 - **`propose/`** — design proposes. **In-flight** proposes live in
 **`propose/active/`**. **`propose/completed/`** — landed work and rationale.
 **List or search this tree** for current filenames; do not rely on enumerated
8 changes: 4 additions & 4 deletions README.md
@@ -117,9 +117,9 @@ See [`mcp.json.example`](./mcp.json.example) for the same shape in `.mcp.json` (

 Pick **one** of two options (not both — they cover the same navigation intents):

-1. **[`docs/AGENT-GUIDE.md`](./docs/AGENT-GUIDE.md)** (recommended for most) — standalone MCP operating manual. Copy-paste the `BEGIN`/`END` block into your project's `QWEN.md`, `CLAUDE.md`, or `AGENTS.md`. Contains: five-tool reference, `NodeFilter` / edge taxonomy, ontology glossary, recovery playbook, and inline slash-style aliases (`/callers`, `/callees`, `/routes`, etc.) as prompt templates. Self-contained — no external file dependencies.
+1. **[`docs/AGENT-GUIDE.md`](./docs/AGENT-GUIDE.md)** (recommended for most) — standalone MCP operating manual. Copy-paste the `BEGIN`/`END` block into your project's `QWEN.md`, `CLAUDE.md`, or `AGENTS.md`. Contains: five-tool reference, `NodeFilter` / edge taxonomy, ontology glossary, recovery playbook, and navigation patterns. Self-contained — no external file dependencies.

-2. **[`skills/`](./skills/)** (for hosts with skill discovery) — 15 shipped `SKILL.md` files. If your MCP host supports skill discovery (Claude Code, Qwen Code, Cursor), the same navigation intents are available as discoverable `/` commands. Tier 1 = deterministic MCP chains (`/callers`, `/callees`, `/routes`, `/controllers`, `/clients`, `/producers`, `/handlers`, `/who-hits-route`, `/implements`, `/injects`, `/nl`). Tier 2 = bounded workflows (`/explain-feature`, `/impact-of`, `/trace-request-flow`, `/mini-map`). See [`skills/README.md`](./skills/README.md) for the full index.
+2. **[`/explore-codebase`](./skills/explore-codebase/SKILL.md)** (for hosts with skill discovery) — single self-contained skill with the complete operating manual. If your MCP host supports skill discovery (Claude Code, Qwen Code, Cursor), load `/explore-codebase` to get the full tool reference, edge taxonomy, decision tree, and recovery playbook in one shot.

 Also: **[`docs/MANUAL-VERIFICATION-CHECKLIST.md`](./docs/MANUAL-VERIFICATION-CHECKLIST.md)** — 7-phase agent-driven verification you run after indexing your real project.

@@ -139,7 +139,7 @@ Full schemas, `NodeFilter` / `EdgeFilter` semantics, and the hints contract live

 ### Three-layer architecture

-Layer 1 (storage) → Layer 2 (5 MCP tools) → Layer 3 (skills). Navigation skills in [`skills/`](./skills/) wrap the MCP tools into deterministic chains (Tier 1) and bounded workflows (Tier 2). See the [architecture diagram in `skills/README.md`](./skills/README.md#three-layer-architecture).
+Layer 1 (storage) → Layer 2 (5 MCP tools) → Layer 3 (skill). The [`/explore-codebase`](./skills/explore-codebase/SKILL.md) skill provides the full operating manual for Layer 2. See the [architecture diagram in `skills/README.md`](./skills/README.md#three-layer-architecture).

 ---

@@ -182,7 +182,7 @@ Run `java-codebase-rag --help` to list grouped subcommands. Operator playbook wi
 | [`docs/CONFIGURATION.md`](./docs/CONFIGURATION.md) | Environment variables, project YAML, graph ontology, brownfield overrides, ignore patterns. |
 | [`docs/JAVA-CODEBASE-RAG-CLI.md`](./docs/JAVA-CODEBASE-RAG-CLI.md) | CLI operator playbook: workflows, exit codes, env alignment. |
 | [`docs/EDGE-NAVIGATION.md`](./docs/EDGE-NAVIGATION.md) | MCP-traversable edges, directions, dot-key composition. |
-| [`skills/`](./skills/) | 15 navigation and workflow skills for hosts with skill discovery (alternative to copy-pasting AGENT-GUIDE). See [`skills/README.md`](./skills/README.md). |
+| [`skills/`](./skills/) | Single `/explore-codebase` skill — complete MCP operating manual for hosts with skill discovery (alternative to copy-pasting AGENT-GUIDE). See [`skills/README.md`](./skills/README.md). |
 | [`docs/MANUAL-VERIFICATION-CHECKLIST.md`](./docs/MANUAL-VERIFICATION-CHECKLIST.md) | 7-phase agent-driven verification after indexing your project. |
 | [`docs/CODEBASE_REQUIREMENTS.md`](./docs/CODEBASE_REQUIREMENTS.md) | Assumptions about your Java repo + per-file edit map for non-conforming codebases. |
 | [`automation/cursor_propose_only/README.md`](./automation/cursor_propose_only/README.md) | Optional proposal orchestration workflow (single-command autopilot, planning bundles, automated execution/review loops). |
6 changes: 3 additions & 3 deletions docs/paper/paper.tex
@@ -107,7 +107,7 @@ \section{Inspirations}

 \paragraph{Model Context Protocol.} The MCP standard \cite{anthropic2024mcp} fixed the impedance mismatch between agents and tools: a single transport, a single way to declare schemas, and a single way to bind tools to hosts (Claude Code, Cursor, Qwen Code, and others). Without MCP this report would describe a Claude-Code-only system. With MCP, a single Python server reaches every host the user already prefers. The standard does one thing well and stops.

-\paragraph{The agent-skills layer.} Anthropic's agent skills \cite{anthropic2025skills} provided the missing piece between raw tool calls and agent reasoning: a skill is a slash-invokable, declaratively-described chain of tool calls that encodes a recurring intent ("trace this request flow", "show me callers of this method"). Skills are how a small fixed MCP surface grows into hundreds of usable agent intents without growing the tool count. We describe the planned skills layer briefly in \S\ref{sec:future} and defer its specification to a separate document; empirical testing showed that a comprehensive prose guide (mirrored as \texttt{docs/AGENT-GUIDE.md}) is sufficient for current weak-model performance, so the skills layer is not yet on the critical path.
+\paragraph{The agent-skills layer.} Anthropic's agent skills \cite{anthropic2025skills} provided the missing piece between raw tool calls and agent reasoning: a skill is a slash-invokable, declaratively-described chain of tool calls that encodes a recurring intent ("trace this request flow", "show me callers of this method"). Skills are how a small fixed MCP surface grows into hundreds of usable agent intents without growing the tool count. Empirical testing showed that a single comprehensive skill (\texttt{/explore-codebase}) loaded at query time outperforms a large set of narrow per-intent skills, because the agent retains the full decision tree and recovery context rather than operating from a sliced subset.

 \paragraph{What we are not.} We do not claim novelty over GraphRAG, LightRAG, or LSP-backed tooling. We claim that a particular synthesis --- minimal MCP surface, typed property graph, three-primitive navigation model, agent-shaped affordances --- is the right shape for code intelligence at the agentic-development layer. The synthesis is the contribution.

@@ -194,7 +194,7 @@ \subsection{Layer 3: reason (the agent)}

 Layer 3 is whatever MCP-compatible host the developer prefers --- Claude Code, Qwen Code, Cursor, or another runtime. The host loads the java-codebase-rag MCP server, sees the five tools, and the agent reasons over them. There is no logic in this layer that is specific to java-codebase-rag; the entire affordance is the five tools and a prose agent guide (\texttt{docs/AGENT-GUIDE.md}) that documents the canonical workflows --- forced reasoning preamble, decision tree, edge taxonomy, worked examples.

-A planned addition (deferred) is a thin skills layer that turns recurring intents (\texttt{/callers}, \texttt{/routes}, \texttt{/explain-feature}) into one-line slash invocations that compile to MCP-call chains. Empirical testing on the target codebase showed that the prose guide alone is sufficient for current weak-model accuracy, so the skills layer is not yet implemented.
+A single skill (\texttt{/explore-codebase}) wraps the full operating manual --- edge taxonomy, \texttt{NodeFilter} reference, decision tree, recovery playbook --- into one loadable prompt. Empirical testing on the target codebase showed that a comprehensive prose guide loaded as one skill outperforms a large set of narrow per-intent skills; the agent retains the full decision tree and recovery context rather than operating from a sliced subset.

 % =============================================================================
 \section{Agent workflow}
@@ -248,7 +248,7 @@ \section{Future work}
 \label{sec:future}

 {\sloppy
-Three threads are open and prioritised. \textbf{(1) Real-codebase evaluation.} Testing on a large legacy Java microservice estate is in progress; once stable, we expect to publish accuracy numbers (intent $\to$ correct-tool-chain rate, end-to-end answer correctness against human labels) for the five-tool surface against weak (Qwen) and strong (Claude Sonnet 4.5) hosts. \textbf{(2) Skills layer.} A 13-skill set --- 10 single-call navigation skills (\texttt{/callers}, \texttt{/callees}, \texttt{/routes}, \texttt{/controllers}, \ldots) and 3 multi-step workflow skills (\texttt{/explain-feature}, \texttt{/impact-of}, \texttt{/trace-request-flow}) --- is designed and on hold until the prose-guide approach shows insufficient. \textbf{(3) Tier-2 incremental rebuilds.} Today the index rebuilds the affected modules; we want commit-level incremental rebuilds for sub-second index updates on large monorepos.
+Three threads are open and prioritised. \textbf{(1) Real-codebase evaluation.} Testing on a large legacy Java microservice estate is in progress; once stable, we expect to publish accuracy numbers (intent $\to$ correct-tool-chain rate, end-to-end answer correctness against human labels) for the five-tool surface against weak (Qwen) and strong (Claude Sonnet 4.5) hosts. \textbf{(2) Lightweight skills.} A lightweight \texttt{/search-codebase} skill (search + find + shallow neighbors + host glob) is planned as a low-context-cost alternative to the comprehensive \texttt{/explore-codebase} skill for quick lookups. \textbf{(3) Tier-2 incremental rebuilds.} Today the index rebuilds the affected modules; we want commit-level incremental rebuilds for sub-second index updates on large monorepos.
 \par}

 We deliberately list \emph{no} item that would grow the MCP tool count without proof that the existing five tools cannot accommodate a real intent.