
docs: add CodeGraph OpenClaw example tutorial #87

Merged
BingqingLyu merged 3 commits into alibaba:main from BingqingLyu:codegraph_example on Mar 19, 2026


@BingqingLyu (Collaborator) commented Mar 19, 2026

Adds a CodeGraph use case tutorial to the documentation.

Changes

  • Added doc/source/tutorials/codegraph-openclaw-example.md: end-to-end walkthrough of CodeGraph with the OpenClaw codebase, covering indexing, CLI usage, Python API, hotspot/bridge/dead-code analysis, and semantic search
  • Registered the new page in doc/source/index.rst under the Tutorials section

Greptile Summary

This PR adds an end-to-end tutorial (codegraph-openclaw-example.md) for the CodeGraph skill — a code analysis tool built on NeuG and a vector database — and registers it in the documentation's Tutorials toctree. The tutorial walks through installation, CLI usage, Python API queries, and built-in analysis methods (hotspots, bridge functions, dead code, semantic search) using the OpenClaw codebase as a concrete example.

Key issues found:

  • Factual inconsistency in hotspot ranking (line 229): The section states hotspots are "ranked by fan-in × fan-out," but the example output places functions with fan_out=0 (e.g., push with fan_in=1747, fan_out=0 → product=0) at the very top, which contradicts the stated formula. The ranking criterion should be corrected or the formula description should be updated.
  • Semantic search output is opaque (line 323): The example only prints truncated vector IDs and scores. Readers have no way to identify which functions matched from the output alone — function names and file paths should be shown.
  • Dead code filter note lacks a code example (line 317): The note recommends filtering by is_external = 0 to exclude vendored/virtual-env files, but provides no snippet demonstrating how to apply that filter in practice.

Confidence Score: 3/5

  • Safe to merge after addressing the hotspot ranking factual inconsistency; the other two issues are polish-level improvements.
  • The index.rst change is trivial and correct. The tutorial itself is well-structured and covers the feature thoroughly, but the hotspot description contains a demonstrable factual error (items with fan_out=0 cannot rank first under a fan-in × fan-out formula), which would mislead readers about how the scoring actually works. The semantic search and dead-code issues are usability gaps rather than blockers.
  • doc/source/tutorials/codegraph-openclaw-example.md — specifically the hotspot ranking description and the semantic search output example.

Important Files Changed

| Filename | Overview |
| --- | --- |
| doc/source/tutorials/codegraph-openclaw-example.md | New 353-line tutorial for CodeGraph with the OpenClaw codebase. Contains a factual inconsistency in the hotspot ranking description (claims fan-in × fan-out but the example output contradicts this), and the semantic search example output shows only opaque IDs without resolving them to function names. |
| doc/source/index.rst | One-line addition registering the new tutorial in the Tutorials toctree. Change is correct and consistent with the existing entry format. |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Source Repository] -->|codegraph init| B[Indexing Pipeline]
    B --> C[(NeuG Graph DB\nFile, Function, Class,\nModule, Commit nodes)]
    B --> D[(zvec Vector DB\nFunction Embeddings)]

    C & D --> E[CodeGraph API / CLI]

    E --> F[CLI Commands]
    F --> F1[codegraph status]
    F --> F2[codegraph query NL]
    F --> F3[codegraph analyze]

    E --> G[Python API — CodeScope]
    G --> G1[cs.conn.execute\nCypher Queries]
    G --> G2[cs.hotspots\nfan-in × fan-out ranking]
    G --> G3[cs.bridge_functions\ncross-module callers]
    G --> G4[cs.dead_code\nzero-caller functions]
    G --> G5[cs.vector_only_search\nsemantic similarity]

    G1 --> H[Call Chain / Impact Analysis]
    G2 --> I[Architecture Hotspots]
    G3 --> J[Bridge Functions Report]
    G4 --> K[Dead Code Report]
    G5 --> L[Semantic Search Results]

Last reviewed commit: "add a codegraph exam..."

Greptile also left 3 inline comments on this PR.

@qodo-code-review

Review Summary by Qodo

Add CodeGraph tutorial with OpenClaw codebase example

📝 Documentation


Walkthroughs

Description
• Adds comprehensive CodeGraph tutorial with OpenClaw codebase example
• Demonstrates CLI usage, Python API, and built-in analysis methods
• Includes practical examples of hotspot, bridge, and dead code analysis
• Registers tutorial in documentation index under Tutorials section
Diagram
flowchart LR
  A["Documentation Index"] -->|registers| B["CodeGraph Tutorial"]
  B -->|covers| C["CLI Usage"]
  B -->|covers| D["Python API"]
  B -->|covers| E["Analysis Methods"]
  E -->|includes| F["Hotspots"]
  E -->|includes| G["Bridge Functions"]
  E -->|includes| H["Dead Code Detection"]
  E -->|includes| I["Semantic Search"]

File Changes

1. doc/source/tutorials/codegraph-openclaw-example.md 📝 Documentation +353/-0

CodeGraph tutorial with OpenClaw analysis examples

• New comprehensive tutorial covering CodeGraph capabilities and architecture
• Includes environment setup, indexing instructions, and CLI usage examples
• Demonstrates Python API with real OpenClaw codebase examples
• Documents built-in analysis methods: hotspots, bridge functions, dead code, semantic search
• Provides Cypher query templates for common code analysis patterns



2. doc/source/index.rst 📝 Documentation +1/-0

Register CodeGraph tutorial in documentation index

• Registers new CodeGraph tutorial in documentation index
• Added under Tutorials section alongside existing tinysnb_tutorial




qodo-code-review bot commented Mar 19, 2026

Code Review by Qodo

🐞 Bugs (2) 📘 Rule violations (0) 📎 Requirement gaps (0) 📐 Spec deviations (0)



Action required

1. No MODIFIES backfill enabled 🐞 Bug ✓ Correctness
Description
The tutorial’s codegraph init command omits --backfill-limit, so function-level MODIFIES edges
are never computed and evolution features relying on them won’t work (your own sample output shows
MODIFIES: 0). This contradicts the documented requirement that MODIFIES edges require backfill.
Code

doc/source/tutorials/codegraph-openclaw-example.md[R52-58]

+```bash
+# Create index (first time)
+codegraph init --repo /path/to/your/project --lang auto --commits 100
+
+# Check index status
+codegraph status --db $CODESCOPE_DB_DIR
+```
Evidence
The tutorial’s indexing command doesn’t include backfill and the shown status output confirms there
are zero MODIFIES edges. Repo docs explicitly state MODIFIES edges require backfill /
--backfill-limit.

doc/source/tutorials/codegraph-openclaw-example.md[52-58]
doc/source/tutorials/codegraph-openclaw-example.md[74-78]
skills/codegraph/SKILL.md[53-62]
skills/codegraph/schema.md[24-31]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
The tutorial instructs running `codegraph init` without `--backfill-limit`, which prevents generating function-level `MODIFIES` edges and breaks evolution workflows that depend on them.

### Issue Context
Repository CodeGraph docs state `MODIFIES` edges require backfill and the CLI supports this via `--backfill-limit`.

### Fix Focus Areas
- doc/source/tutorials/codegraph-openclaw-example.md[52-58]
- doc/source/tutorials/codegraph-openclaw-example.md[74-78]

### What to change
- Update the `codegraph init` example to include an explicit `--backfill-limit` value (or add an adjacent note explaining that without backfill `MODIFIES` remains 0 and evolution queries will be limited).
- Ensure the sample `codegraph status` output is consistent with the updated command (either show non-zero MODIFIES, or explicitly explain why it may be 0).
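The requested change can be sketched as a small Python helper that builds the `codegraph init` argument list with an explicit `--backfill-limit`, plus a check on the `codegraph status` text. This is a hedged sketch, not part of CodeGraph: `build_init_argv` and `modifies_edge_count` are hypothetical names, the default backfill value of 100 is an arbitrary placeholder, and the assumption that status prints a line such as `MODIFIES: 0` comes only from the sample output mentioned in this issue.

```python
import re
import shlex


def build_init_argv(repo, commits=100, backfill_limit=100):
    """Build a `codegraph init` command line with MODIFIES backfill enabled.

    The flags come from this review; backfill_limit=100 is a placeholder
    assumption, not a documented default.
    """
    return [
        "codegraph", "init",
        "--repo", repo,
        "--lang", "auto",
        "--commits", str(commits),
        "--backfill-limit", str(backfill_limit),
    ]


# Assumed status format: a line like "MODIFIES: 0" (taken from the sample output).
_MODIFIES_RE = re.compile(r"^\s*MODIFIES\s*:\s*(\d+)\s*$", re.MULTILINE)


def modifies_edge_count(status_text):
    """Return the MODIFIES edge count parsed from status text, or None if absent."""
    match = _MODIFIES_RE.search(status_text)
    return int(match.group(1)) if match else None


if __name__ == "__main__":
    # shlex.join (Python 3.8+) renders the argv as a copy-pasteable shell command.
    print(shlex.join(build_init_argv("/path/to/your/project")))
```

The argv list can be run with `subprocess.run(argv, check=True)`; passing the text printed by `codegraph status --db $CODESCOPE_DB_DIR` to `modifies_edge_count` then makes a result of 0 (evolution queries will find no `MODIFIES` edges) easy to flag.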




Remediation recommended

2. CodeScope never closed 🐞 Bug ⛯ Reliability
Description
The Python API section creates a CodeScope instance but never shows calling cs.close(),
contradicting existing CodeGraph docs that say to always close it. This omission encourages
copy/paste usage that can leave resources open and contribute to lock-related errors.
Code

doc/source/tutorials/codegraph-openclaw-example.md[R137-143]

+```python
+import os
+os.environ['HF_HUB_OFFLINE'] = '1'
+
+from codegraph.core import CodeScope
+cs = CodeScope(os.environ['CODESCOPE_DB_DIR'])
+```
Evidence
The new tutorial instantiates CodeScope and proceeds with many examples without ever demonstrating
closing it. Existing CodeGraph documentation includes `cs.close()  # always close when done` and
lists lock-related errors in troubleshooting, reinforcing that proper cleanup matters.

doc/source/tutorials/codegraph-openclaw-example.md[137-143]
doc/source/tutorials/codegraph-openclaw-example.md[225-337]
skills/codegraph/SKILL.md[74-90]
skills/codegraph/bug-analysis.md[15-31]
skills/codegraph/SKILL.md[374-381]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
The tutorial demonstrates creating a `CodeScope` object but never demonstrates closing it. Existing repository docs explicitly instruct calling `cs.close()`.

### Issue Context
The Python section is structured as a shared setup plus multiple subsequent snippets, so the most natural fix is to add a closing snippet at the end of the Python API section (or show a `try/finally` pattern).

### Fix Focus Areas
- doc/source/tutorials/codegraph-openclaw-example.md[137-143]
- doc/source/tutorials/codegraph-openclaw-example.md[225-337]

### What to change
- Add a short snippet near the end of the Python API section:
 - `cs.close()` (and/or `try: ... finally: cs.close()`), matching the style in `skills/codegraph/SKILL.md`.
- Optionally add a one-line note that the connection should be closed when finished.
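A minimal sketch of the requested cleanup, using the standard library's `contextlib.closing`, which calls `close()` when the `with` block exits, including when the body raises. `with_codescope` is a hypothetical helper, not a CodeGraph API; the only CodeScope behavior it relies on is the `cs.close()` method cited from `skills/codegraph/SKILL.md`.

```python
from contextlib import closing


def with_codescope(factory, db_dir, work):
    """Open a CodeScope-like object, run work(cs), and always close it.

    `factory` would be codegraph.core.CodeScope in the tutorial; any object
    with a close() method works, since contextlib.closing only calls close().
    """
    with closing(factory(db_dir)) as cs:
        return work(cs)


# Assumed usage with the tutorial's setup (not executed here):
#   import os
#   from codegraph.core import CodeScope
#   top = with_codescope(CodeScope, os.environ["CODESCOPE_DB_DIR"],
#                        lambda cs: list(cs.hotspots(topk=10)))
```

An explicit `try: ... finally: cs.close()` is equivalent; `closing` just keeps the cleanup next to the construction.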


Comment on lines +229 to +260
High-risk functions ranked by fan-in × fan-out:

```python
for h in cs.hotspots(topk=10):
    print(f"{h.name} @ {h.file_path}")
    print(f" fan_in={h.fan_in}, fan_out={h.fan_out}")
```

**Actual output:**

```
push @ ui/src/ui/chat/input-history.ts
fan_in=1747, fan_out=0
createConfigIO @ src/config/io.ts
fan_in=18, fan_out=57
fn @ extensions/diffs/assets/viewer-runtime.js
fan_in=533, fan_out=1
runEmbeddedPiAgent @ src/agents/pi-embedded-runner/run.ts
fan_in=14, fan_out=65
startGatewayServer @ src/gateway/server.impl.ts
fan_in=10, fan_out=88
now @ src/auto-reply/reply/export-html/template.security.test.ts
fan_in=857, fan_out=0
loadOpenClawPlugins @ src/plugins/loader.ts
fan_in=21, fan_out=36
runCronIsolatedAgentTurn @ src/cron/isolated-agent/run.ts
fan_in=11, fan_out=56
loadSessionStore @ src/config/sessions/store.ts
fan_in=60, fan_out=8
getReplyFromConfig @ src/auto-reply/reply/get-reply.ts
fan_in=20, fan_out=24
```
Contributor

P1 Hotspot ranking formula contradicts the actual output

The section header states hotspots are "ranked by fan-in × fan-out," but several of the top-ranked results have a fan_out of 0 or 1, so their products are far lower than their positions suggest, and the two with fan_out=0 should rank last rather than first and sixth:

  • push: fan_in=1747, fan_out=0 → product = 0
  • fn: fan_in=533, fan_out=1 → product = 533
  • now: fan_in=857, fan_out=0 → product = 0

Under the stated formula, createConfigIO (18 × 57 = 1,026), runEmbeddedPiAgent (14 × 65 = 910), and startGatewayServer (10 × 88 = 880) would rank highest, while push and now would rank last. Either the formula description is incorrect or the output was produced with different logic. Simple alternatives do not fit either: under fan_in + fan_out (75) or max(fan_in, fan_out) (57), createConfigIO would not rank second, since startGatewayServer (98 or 88) would already outrank it. This inconsistency will confuse readers trying to understand the hotspot scoring model.
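The inconsistency can be checked mechanically. The sketch below uses only the ten (fan_in, fan_out) pairs from the tutorial's output and tests whether the listed order is non-increasing under a few candidate scores. It does not call CodeGraph and says nothing about how `cs.hotspots` actually scores functions; it only shows that none of these three simple formulas reproduces the listed order.

```python
# (name, fan_in, fan_out) rows copied from the tutorial's hotspot output, in listed order.
LISTED = [
    ("push", 1747, 0),
    ("createConfigIO", 18, 57),
    ("fn", 533, 1),
    ("runEmbeddedPiAgent", 14, 65),
    ("startGatewayServer", 10, 88),
    ("now", 857, 0),
    ("loadOpenClawPlugins", 21, 36),
    ("runCronIsolatedAgentTurn", 11, 56),
    ("loadSessionStore", 60, 8),
    ("getReplyFromConfig", 20, 24),
]

# Candidate scoring formulas to test against the listed order.
SCORES = {
    "fan_in * fan_out": lambda fi, fo: fi * fo,
    "fan_in + fan_out": lambda fi, fo: fi + fo,
    "max(fan_in, fan_out)": lambda fi, fo: max(fi, fo),
}


def is_ranked_by(rows, score):
    """True if rows are in non-increasing order of score(fan_in, fan_out)."""
    values = [score(fi, fo) for _, fi, fo in rows]
    return all(a >= b for a, b in zip(values, values[1:]))


def rerank(rows, score):
    """Names sorted by descending score; sorted() is stable, so ties keep listed order."""
    ordered = sorted(rows, key=lambda r: score(r[1], r[2]), reverse=True)
    return [name for name, _, _ in ordered]


for label, score in SCORES.items():
    print(f"{label}: consistent={is_ranked_by(LISTED, score)}, "
          f"top3={rerank(LISTED, score)[:3]}")
```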

Comment on lines +323 to +337
```python
results = cs.vector_only_search('heartbeat periodic wake agent schedule', topk=5)
for r in results:
    print(f"id={r['id'][:20]}... score={r['score']:.3f}")
```

**Actual output:**

```
id=59744ec14e23575012c1... score=0.514
id=0b27570192377b7077cd... score=0.481
id=11fad68a6ba0d7fa0228... score=0.478
id=b33f6f3241c0a61d7118... score=0.477
id=8221fa3eb46b7e06e561... score=0.473
```
Contributor

P2 Semantic search output shows only opaque IDs — function names are missing

The example prints truncated vector IDs and scores, but gives readers no way to identify which functions were matched:

id=59744ec14e23575012c1... score=0.514

As tutorial output, this is not actionable: a user cannot act on an ID alone without knowing the corresponding function name and file path. If the vector_only_search result dict contains those fields (e.g., name, file_path), the example should include them. For instance:

results = cs.vector_only_search('heartbeat periodic wake agent schedule', topk=5)
for r in results:
    print(f"{r.get('name', r['id'][:20])} @ {r.get('file_path', '?')} — score={r['score']:.3f}")

Even if the API only returns IDs, the tutorial should explain what a reader should do next (e.g., query the graph to resolve the ID to a function node).
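A hedged sketch of the fallback this comment describes: print the function name and path when a hit carries them, otherwise ask a resolver, otherwise fall back to the truncated ID. Only the `id` and `score` keys are confirmed by the tutorial's output; `name` and `file_path` keys, the `describe_hit` helper, and the `resolve` callable are all assumptions for illustration.

```python
def describe_hit(hit, resolve=None):
    """Render one vector_only_search hit as 'name @ file_path (score=...)'.

    Assumptions: each hit is a dict with 'id' and 'score' (as in the tutorial);
    'name' and 'file_path' may or may not be present. `resolve` is a
    hypothetical callable mapping an id to a (name, file_path) tuple or None,
    consulted only when the hit lacks those fields.
    """
    name, path = hit.get("name"), hit.get("file_path")
    if (name is None or path is None) and resolve is not None:
        resolved = resolve(hit["id"])
        if resolved is not None:
            name, path = resolved
    if name is None:
        # Fall back to the same truncated ID the tutorial prints today.
        name = hit["id"][:20] + "..."
    return f"{name} @ {path or '?'} (score={hit['score']:.3f})"
```

A resolver could, for example, look the ID up in the graph through `cs.conn.execute`; the node label and property names such a Cypher query would need should be checked against the CodeGraph schema rather than taken from this sketch.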

...
```

> **Note**: Dead code detection may include external dependencies. Filter by `is_external = 0` for project-specific results.
Contributor

P2 Dead-code filter note lacks a code example

The note advises filtering by is_external = 0 to exclude virtual-environment files, but the tutorial is code-driven and this is exactly the kind of practical pitfall a reader will hit first. Providing the actual query would make this note actionable:

# Filter out external/vendored dependencies
for d in cs.dead_code()[:10]:
    if not getattr(d, 'is_external', 0):
        print(f"{d.name} @ {d.file_path}")

Or, if dead_code() accepts a parameter:

for d in cs.dead_code(is_external=False)[:10]:
    print(f"{d.name} @ {d.file_path}")

Without a working snippet, the reader must guess the attribute name and the filtering mechanism.
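One detail worth noting in the first suggestion: `cs.dead_code()[:10]` truncates before filtering, so if the first entries are vendored or virtual-env files the loop prints fewer than ten project results. A minimal sketch that filters first and then takes up to `limit` items; `project_dead_code` is a hypothetical helper, the `is_external` attribute name comes from the tutorial's note, and treating a missing attribute as 0 (project code) is an assumption.

```python
from itertools import islice


def project_dead_code(items, limit=10):
    """Return up to `limit` dead-code entries whose is_external flag is falsy.

    Filtering happens lazily before truncation, so external entries at the
    front of the list do not reduce the number of project results returned.
    """
    internal = (d for d in items if not getattr(d, "is_external", 0))
    return list(islice(internal, limit))
```

Assumed usage: `for d in project_dead_code(cs.dead_code()): print(f"{d.name} @ {d.file_path}")`.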

@BingqingLyu BingqingLyu requested a review from longbinlai March 19, 2026 03:37
Comment on lines +52 to +58
```bash
# Create index (first time)
codegraph init --repo /path/to/your/project --lang auto --commits 100

# Check index status
codegraph status --db $CODESCOPE_DB_DIR
```


Action required

1. No MODIFIES backfill enabled 🐞 Bug ✓ Correctness

The tutorial’s codegraph init command omits --backfill-limit, so function-level MODIFIES edges
are never computed and evolution features relying on them won’t work (your own sample output shows
MODIFIES: 0). This contradicts the documented requirement that MODIFIES edges require backfill.
Agent Prompt
### Issue description
The tutorial instructs running `codegraph init` without `--backfill-limit`, which prevents generating function-level `MODIFIES` edges and breaks evolution workflows that depend on them.

### Issue Context
Repository CodeGraph docs state `MODIFIES` edges require backfill and the CLI supports this via `--backfill-limit`.

### Fix Focus Areas
- doc/source/tutorials/codegraph-openclaw-example.md[52-58]
- doc/source/tutorials/codegraph-openclaw-example.md[74-78]

### What to change
- Update the `codegraph init` example to include an explicit `--backfill-limit` value (or add an adjacent note explaining that without backfill `MODIFIES` remains 0 and evolution queries will be limited).
- Ensure the sample `codegraph status` output is consistent with the updated command (either show non-zero MODIFIES, or explicitly explain why it may be 0).


@longbinlai (Collaborator) left a comment

LGTM

@BingqingLyu merged commit 06677fe into alibaba:main on Mar 19, 2026
5 checks passed
@BingqingLyu deleted the codegraph_example branch March 19, 2026 05:52
liulx20 pushed a commit to liulx20/neug that referenced this pull request Mar 20, 2026
Co-authored-by: Longbin Lai <longbin.lai@gmail.com>
longbinlai added a commit that referenced this pull request Mar 20, 2026
* add java sdk

* add test cases

* Update tools/java_driver/USAGE.md

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* Update tools/java_driver/USAGE.md

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* fix some issues

* add ClientTest

* update doc

* fix doc

* Update tools/java_driver/pom.xml

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update tools/java_driver/src/test/java/org/alibaba/neug/driver/InternalResultSetTest.java

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* format

* rename org to com

* fix doc

* add result metadata

* fix

* Potential fix for pull request finding

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

* add tests

* add doc

* add maven

* Update tools/java_driver/src/main/java/com/alibaba/neug/driver/utils/Client.java

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* Update tools/java_driver/src/main/java/com/alibaba/neug/driver/internal/InternalResultSet.java

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* add e2e ci

* add param test

* format

* Update tools/java_driver/src/main/java/com/alibaba/neug/driver/internal/InternalSession.java

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* Update InternalSession.java

* remove pb generated

* fix doc

* fix doc

* fix doc

* fix workflows

* fix version

* fix generator

* fix maven action

* fix: catch OSError in neug-cli readline history loading on macOS (#75)

* fix: catch OSError in neug-cli readline history loading on macOS

On macOS, Python's readline module is backed by libedit instead of GNU
readline. When ~/.neug_history was written by a GNU readline session
(e.g. from Docker/Linux), libedit raises OSError (errno 22 EINVAL)
instead of silently handling the incompatible format.

The original code only caught FileNotFoundError, causing neug-cli to
crash on startup. Broaden the exception handler to also catch OSError so
the history file is simply skipped, matching the intended behavior.

Fixes #74

* fix: scope OSError catch to errno.EINVAL for libedit incompatibility

Per greptile review: catching the full OSError base class could silently
swallow unrelated errors such as PermissionError or IsADirectoryError.
Narrow the catch to only suppress errno.EINVAL (22), which is the specific
error raised by macOS libedit when it encounters a GNU readline history
file. All other OSError variants are re-raised so users see genuine
problems.

Also add 'import errno' to top-level imports.

* Update tools/java_driver/src/main/java/com/alibaba/neug/driver/internal/InternalDriver.java

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* fix getBigDecimal

* Update tools/java_driver/src/main/java/com/alibaba/neug/driver/internal/InternalResultSet.java

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* Update tools/java_driver/src/main/java/com/alibaba/neug/driver/internal/InternalResultSet.java

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* fix getObject

* feat: Support Export Query Results to JSON/JSONL file (#60)

* support export arrow table to csv format

Committed-by: Xiaoli Zhou from Dev container

* export query response PB to csv format

Committed-by: Xiaoli Zhou from Dev container

* minor fix according to review

Committed-by: Xiaoli Zhou from Dev container

* fix according to review

Committed-by: Xiaoli Zhou from Dev container

* minor fix

Committed-by: Xiaoli Zhou from Dev container

* support export query results to json format

Committed-by: Xiaoli Zhou from Dev container

* minor fix

Committed-by: Xiaoli Zhou from Dev container

* remove 'newline_delimited' settings and detect jsonl format from path

Committed-by: Xiaoli Zhou from Dev container

* minor fix

Committed-by: Xiaoli Zhou from Dev container

* add export to json tests in CI

Committed-by: Xiaoli Zhou from Dev container

* Update extension/json/src/json_export_function.cc

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* Update extension/json/src/json_export_function.cc

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* Update extension/json/src/json_export_function.cc

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* minor fix

Committed-by: Xiaoli Zhou from Dev container

* minor fix

Committed-by: Xiaoli Zhou from Dev container

* refine extension tests annotation

Committed-by: Xiaoli Zhou from Dev container

* minor fix

Committed-by: Xiaoli Zhou from Dev container

* rename INSTALL_EXTENSIONS to CI_INSTALL_EXTENSIONS to avoid conflict

Committed-by: Xiaoli Zhou from Dev container

* refine json extension tests ci

Committed-by: Xiaoli Zhou from Dev container

* minor fix

Committed-by: Xiaoli Zhou from Dev container

---------

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* remove bytearray

* add codegraph-qa skill (#78)

* fix: Fix default value support for all type of properties (#63)

Refactor the default value support for storage, avoid exposing default_value on column and mmap_array


---------

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* fix: Fix incorrect edge table state when transforming between bundled and unbundled (#28)

Fix incorrect edge table state when transforming between bundled and unbundled, include special case for string properties

* fix: make the dedup operator cover all column types (#80)

* make dedup operator cover all column types

* format

* fix

* Correct the is_optional interface behavior for certain columns (#90)

* add a codegraph example (#87)

Co-authored-by: Longbin Lai <longbin.lai@gmail.com>

* add checkRowIndex

* add update_was_null

* update doc

* fix

* update doc

* fix

* Implement the iteration method for QueryResult

* update query_result.md

* update

* update doc

* format example

* format

---------

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Longbin Lai <longbin.lai@gmail.com>
Co-authored-by: Xiaoli Zhou <yihe.zxl@alibaba-inc.com>
Co-authored-by: BingqingLyu <bingqing.lbq@alibaba-inc.com>
Co-authored-by: Zhang Lei <xiaolei.zl@alibaba-inc.com>
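The #75 commit message above describes narrowing a broad `OSError` catch to `errno.EINVAL` when loading readline history. A minimal sketch of that pattern, not the actual neug-cli code: `load_history` is a hypothetical helper, and the reader function is passed in so the logic can be exercised without a real history file. `FileNotFoundError` is caught first because it is itself a subclass of `OSError`.

```python
import errno


def load_history(path, reader):
    """Load a readline history file, skipping only the expected failures.

    - FileNotFoundError: no history yet, nothing to load.
    - OSError with errno EINVAL: e.g. macOS libedit rejecting a GNU readline file.
    Any other OSError (PermissionError, IsADirectoryError, ...) is re-raised.
    Returns True if the history was loaded.
    """
    try:
        reader(path)
    except FileNotFoundError:
        return False
    except OSError as exc:
        if exc.errno == errno.EINVAL:
            return False
        raise
    return True
```

Assumed usage: `import os, readline; load_history(os.path.expanduser("~/.neug_history"), readline.read_history_file)`.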