81 changes: 81 additions & 0 deletions .cursor/rules/agent-workflow.mdc
@@ -0,0 +1,81 @@
---
description: How agents should investigate, edit, validate, and ship in this repo.
alwaysApply: true
---

# Agent workflow

## Investigate before editing

For any non-trivial change, read the relevant doc first instead of
inferring from code:

- Behaviour / public surface → `README.md`.
- Brownfield assumptions, role/capability tuning → `CODEBASE_REQUIREMENTS.md`.
- Why current design exists → `propose/completed/` and `plans/completed/`.
- Testing philosophy → `tests/README.md`.

## Propose-then-implement culture

The repo has a strong "propose then implement" culture
(`propose/`, `plans/`). For non-trivial features:

1. Drop a short markdown propose under `propose/` describing scope,
schema impact, reindex requirement, and tests touched.
2. Reference it from the PR description.
3. Move it into `propose/completed/` (or `plans/completed/`) once
landed.

Skip this for clearly-bounded fixes (one-file bugs, doc edits, test
loosening). Use judgement.

## Editing rules

- Respect `.cursor/rules/breaking-changes.mdc`: no compatibility
shims, no deprecation cycles.
- The single source of truth for roles and capabilities is
`java_ontology.py`. Don't scatter role / capability string
literals across other modules.
- Schema changes that affect the Lance index or Kuzu graph need a
matching update to the README "Re-index required" callout. Bump
`ontology_version` when enrichment semantics change.
- `server.py` is a stdio MCP server: anything reachable from a tool
handler must not write to **stdout** (that's the JSON-RPC
transport). Diagnostics go to stderr.
- Tool `description=` strings and `_INSTRUCTIONS` in `server.py` are
read by LLM clients to choose tools — treat them as part of the
contract, not freeform docs.
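
A minimal sketch of the stdout rule above, assuming diagnostics go through the standard `logging` module. The logger name `lancedb_mcp` and the `handle_tool` function are hypothetical and are not taken from `server.py`.

```python
import logging
import sys

# Hypothetical logger name; server.py may configure logging differently.
log = logging.getLogger("lancedb_mcp")
_handler = logging.StreamHandler(sys.stderr)  # stderr keeps stdout free for JSON-RPC
_handler.setFormatter(logging.Formatter("%(levelname)s %(name)s: %(message)s"))
log.addHandler(_handler)
log.setLevel(logging.INFO)
log.propagate = False  # avoid duplicate output through root handlers


def handle_tool(query: str) -> dict:
    """Hypothetical tool handler: logs a diagnostic, returns a result."""
    log.info("searching for %r", query)  # written to stderr
    # A print(...) call here would write to stdout and corrupt the transport.
    return {"query": query, "hits": []}
```

Binding the handler to `sys.stderr` explicitly makes the intent visible at the call site instead of relying on logging defaults.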

## Validate

- `ruff check .` — fix or justify warnings.
- `pytest tests -v` — must pass without `LANCEDB_MCP_RUN_HEAVY`.
- For schema or ranking work, also run with
`LANCEDB_MCP_RUN_HEAVY=1` locally (slow; downloads models).
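
The checks above could be wrapped in a small runner. This is a sketch only: the function names are hypothetical, and it assumes `ruff` and `pytest` are on `PATH`.

```python
import os
import subprocess


def validation_steps(heavy: bool = False) -> list[tuple[list[str], dict[str, str]]]:
    """Return (command, environment) pairs for the checks listed above."""
    env = dict(os.environ)
    env.pop("LANCEDB_MCP_RUN_HEAVY", None)  # the default run must pass without it
    steps = [
        (["ruff", "check", "."], env),
        (["pytest", "tests", "-v"], env),
    ]
    if heavy:
        # Extra pass for schema or ranking work (slow; downloads models).
        heavy_env = dict(env, LANCEDB_MCP_RUN_HEAVY="1")
        steps.append((["pytest", "tests", "-v"], heavy_env))
    return steps


def run_validation(heavy: bool = False) -> int:
    """Run each step in order; return the first non-zero exit code, else 0."""
    for cmd, env in validation_steps(heavy):
        code = subprocess.run(cmd, env=env).returncode
        if code != 0:
            return code
    return 0
```

Calling `run_validation()` covers the default checks, and `run_validation(heavy=True)` adds the heavy pass. Stripping the variable from the default environment guards against a stale shell export hiding a failure.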

## Commit and PR

- Commit messages: present tense, imperative, lowercase first word.
Existing commits (e.g. `fixed call graph review D6`,
`applied fixes for call graph layer`) match the lowercase style but
are past tense; write new subjects in the imperative.
- One logical change per commit when feasible.
- Branch names: `cursor/<topic>` for cursor-agent work, `plan/<name>`
for in-progress proposes (matching existing `plan/tier1-completion`).
- PR body should reference any propose it implements, list
user-visible behaviour changes, and call out reindex / env-var
requirements explicitly.
- Never push directly to `master`.
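
The naming rules above could be checked mechanically before pushing. The sketch below is illustrative: the allowed character set for `<topic>` / `<name>` is an assumption (the rules only fix the prefixes), and the helper names are hypothetical.

```python
import re

# Assumed character set for <topic>/<name>; the rules only specify the prefixes.
BRANCH_RE = re.compile(r"(cursor|plan)/[a-z0-9][a-z0-9._-]*")


def branch_ok(name: str) -> bool:
    """True for cursor/<topic> or plan/<name>; master is never a work branch."""
    return name != "master" and BRANCH_RE.fullmatch(name) is not None


def subject_starts_lowercase(subject: str) -> bool:
    """Check only the 'lowercase first word' rule; tense is not checked."""
    words = subject.split(maxsplit=1)
    return bool(words) and words[0] == words[0].lower()
```

Tense and mood are left to review, since they cannot be verified reliably with a pattern match.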

## Don't

- Don't run `gh auth status` or otherwise inspect credentials.
- Don't widen the public surface "just in case" — every new tool,
env var, or schema column adds a re-index burden on users.
- Don't special-case the `tests/bank-chat-system/` fixture in
production code. If a test only passes with such a special case, the
test is wrong (see `tests/README.md`).
- Don't tighten loose test assertions (`>= 1`, `len(...) >= N`,
`key in result`) into exact counts to chase a number — they are
intentionally loose.
- Don't add a hard dependency on `cocoindex` outside
`java_index_flow_lancedb.py` / the `refresh_code_index` tool.
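
To illustrate the loose-assertion rule above, here is a sketch of a test that asserts on invariants. `fake_search` and its result shape are stand-ins invented for this example; the descending-score check is an assumed invariant, not one documented by the repo.

```python
def fake_search(query: str) -> dict:
    """Stand-in for a search call; the real result shape is not shown here."""
    return {
        "query": query,
        "hits": [
            {"path": "A.java", "score": 0.91},
            {"path": "B.java", "score": 0.47},
        ],
    }


def test_search_returns_ranked_hits() -> None:
    result = fake_search("transfer funds")
    # Invariants: key present, at least one hit, scores sorted descending.
    assert "hits" in result
    assert len(result["hits"]) >= 1
    scores = [hit["score"] for hit in result["hits"]]
    assert scores == sorted(scores, reverse=True)
    # Avoid: assert len(result["hits"]) == 2
    # An exact count breaks whenever the corpus or ranking shifts.
```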
55 changes: 55 additions & 0 deletions .cursor/rules/project-overview.mdc
@@ -0,0 +1,55 @@
---
description: Project map and where to look for what. Always-on so agents start with the right mental model.
alwaysApply: true
---

# Project overview

This repo is a **self-contained stdio MCP server** that serves
semantic + structural search over a Java codebase. It is a Python
project (the indexer and server). It is **not** a Java project —
the `tests/bank-chat-system/` tree is fixture data, not code to
modify.

Treat README and the markdown docs as the source of truth for
behaviour, schemas, env vars, ranking, edges, tool defaults, and
ontology. **Do not copy that content into rules** — read it directly
when needed.

## Where to look

- `README.md` — feature surface, env vars, ranking, capabilities,
tool list, "Re-index required" callouts.
- `CODEBASE_REQUIREMENTS.md` — Java-repo assumptions and per-file
map of what to edit when a target tree doesn't match defaults.
- `tests/README.md` — testing philosophy.
- `propose/` and `propose/completed/` — design proposes; the active
ones describe in-flight scope and the completed ones explain *why*
current code looks the way it does.
- `plans/` and `plans/completed/` — longer-form plans (e.g.
capabilities model, tier completions).
- `.cursor/rules/breaking-changes.mdc` — the no-back-compat policy.

## File map (top of repo)

| File | Role |
|------|------|
| `server.py` | MCP stdio server. Every `@mcp.tool` lives here. |
| `search_lancedb.py` | Vector / hybrid / graph-expanded search; ranking. |
| `build_ast_graph.py` | Tree-sitter -> Kuzu graph builder (full rebuild). |
| `kuzu_queries.py` | Read-only Cypher helpers used by the server. |
| `ast_java.py` | Tree-sitter Java parsing, role/capability inference. |
| `graph_enrich.py` | `module` / `microservice` resolution, brownfield overrides, meta-annotation walk. |
| `java_ontology.py` | Source of truth for `VALID_ROLES` / `VALID_CAPABILITIES`. |
| `chunk_heuristics.py` | Query-time chunk hints (no AST / no re-index). |
| `index_common.py` | Embedding config (no CocoIndex dep). |
| `java_index_flow_lancedb.py` | CocoIndex flow (only used by `refresh_code_index`). |
| `java_index_v1_common.py` | Shared file walker / exclude patterns. |
| `mcp.json.example` | Template for `.mcp.json`. |
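
The table keeps CocoIndex confined to `java_index_flow_lancedb.py` and the `refresh_code_index` tool. One way to keep such a dependency optional is a deferred import, sketched below; `load_cocoindex` is a hypothetical helper, not a function from this repo.

```python
import importlib
from types import ModuleType


def load_cocoindex() -> ModuleType:
    """Import cocoindex only when the refresh path actually runs."""
    try:
        return importlib.import_module("cocoindex")
    except ImportError as exc:
        # Fail with a clear message instead of breaking server start-up.
        raise RuntimeError(
            "refresh_code_index needs the optional 'cocoindex' package"
        ) from exc
```

Because the import happens inside the function, modules that never call it can be imported without CocoIndex installed.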

## Test layout

- `tests/conftest.py` — session-scoped Kuzu graph fixture.
- `tests/bank-chat-system/` — deterministic Java corpus (fixture, not production model).
- `tests/fixtures/call_graph_smoke/` — mini Maven tree calibrated against the call-graph resolver.
- Heavy e2e tests gated behind `LANCEDB_MCP_RUN_HEAVY=1`.
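
A sketch of how an environment-variable gate like this can work, using only the standard library (pytest also collects `unittest.TestCase` classes and honours their skip decorators). The repo's actual gating code is not shown here. Treating only the exact value `1` as enabled is an assumption, and the class name is hypothetical.

```python
import os
import unittest

HEAVY_ENV = "LANCEDB_MCP_RUN_HEAVY"


def heavy_enabled(environ=os.environ) -> bool:
    """True only when the variable is set to exactly "1" (assumed semantics)."""
    return environ.get(HEAVY_ENV) == "1"


@unittest.skipUnless(heavy_enabled(), f"set {HEAVY_ENV}=1 to run heavy e2e tests")
class HeavyEndToEndTest(unittest.TestCase):
    def test_placeholder(self) -> None:
        # Placeholder body; a real heavy test would build an index and query it.
        self.assertTrue(heavy_enabled())
```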
42 changes: 42 additions & 0 deletions .cursorignore
@@ -0,0 +1,42 @@
# Local runtime / index artifacts (huge, binary, regenerated)
lancedb_data/
cocoindex_java_lance.db/
*.kuzu
*.kuzu.wal

# Python caches
__pycache__/
*.py[cod]
*$py.class
.pytest_cache/
.mypy_cache/
.ruff_cache/
.coverage
.coverage.*
htmlcov/

# Virtual envs
.venv/
venv/
env/
ENV/

# Build artifacts
build/
dist/
*.egg-info/
.eggs/

# IDE / OS
.idea/
.vscode/
.DS_Store

# Local env
.env
.env.*
tmp/

# Test fixture compiled output (the .java sources stay searchable)
tests/**/target/
tests/**/build/
45 changes: 45 additions & 0 deletions AGENTS.md
@@ -0,0 +1,45 @@
# AGENTS.md

Entry point for Cursor CLI agents (and other agentic tools) working
on this repo. Detailed guidance lives in `.cursor/rules/*.mdc` —
those files are auto-loaded by Cursor. This file is a flat summary
for tools that don't read `.cursor/rules/`.

## Where to look

- `README.md` — feature surface, env vars, ranking, capabilities,
tool list, "Re-index required" callouts.
- `CODEBASE_REQUIREMENTS.md` — Java-repo assumptions and tuning map.
- `propose/` and `plans/` (plus their `completed/` subdirs) —
in-flight scope and the rationale behind current design.
- `tests/README.md` — testing philosophy.

Read these directly. Don't rely on rule files to mirror them.

## Hard rules

1. **No backward-compatibility obligation** —
`.cursor/rules/breaking-changes.mdc`. Prefer removals and schema
updates over shims.
2. **Propose-then-implement** for non-trivial features. Drop a short
markdown propose under `propose/`, reference it from the PR, move
it to `propose/completed/` once landed.
3. **Don't overfit to the `tests/bank-chat-system/` fixture.** It is
a deterministic corpus, not a model of production. Assert on
invariants, not exact counts. Don't special-case the fixture in
production code.
4. **`server.py` is stdio MCP.** Nothing reachable from a tool
handler may write to stdout. Diagnostics go to stderr.
5. **Single source of truth** for roles and capabilities is
`java_ontology.py`. No string literals sprinkled elsewhere.
6. **Schema changes require a reindex** — update the README
"Re-index required" callout and bump `ontology_version` when
enrichment semantics change.
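
Rule 5 can be sketched as follows. The `VALID_ROLES` set defined here is a stand-in so the example runs on its own: the role names are placeholders, the `frozenset` type is an assumption, and `require_role` is a hypothetical helper.

```python
# Stand-in for java_ontology.py; the real VALID_ROLES values are not shown here.
VALID_ROLES = frozenset({"controller", "service", "repository"})


def require_role(role: str) -> str:
    """Validate a role against the shared set instead of local literals."""
    if role not in VALID_ROLES:
        raise ValueError(
            f"unknown role {role!r}; expected one of {sorted(VALID_ROLES)}"
        )
    return role
```

In another module the set would be imported (`from java_ontology import VALID_ROLES`) rather than redefined, so adding or renaming a role touches one file.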

## Workflow

- Branch from `master`. Branch names: `cursor/<topic>` (CLI work),
`plan/<name>` (in-progress propose).
- Commit messages: present tense, imperative, lowercase first word.
- Always open a PR; never push to `master`.
- Run `ruff check .` and `pytest tests -v` before pushing.