Skip to content

feat: add file relationship visualizer (codesearch graph)#137

Merged
ArtemisMucaj merged 16 commits into
mainfrom
claude/file-relationship-visualizer-65WfD
Apr 3, 2026
Merged

feat: add file relationship visualizer (codesearch graph)#137
ArtemisMucaj merged 16 commits into
mainfrom
claude/file-relationship-visualizer-65WfD

Conversation

@ArtemisMucaj
Copy link
Copy Markdown
Owner

@ArtemisMucaj ArtemisMucaj commented Mar 30, 2026

Introduces a new graph subcommand that builds a file-level dependency
graph from the indexed call graph and renders it in three formats:

  • DOT (Graphviz) — default; pipe through dot -Tsvg to produce
    an SVG with repository cluster boundaries.
  • Mermaid — paste into any Mermaid renderer or GitHub markdown.
  • JSON — structured output for programmatic consumption.

Each repository is drawn as a labelled cluster boundary. Edges are
weighted by the number of distinct symbol references from the source
file to the target file.

New pieces:

  • FileEdge / FileGraph / FileGraphRepo domain models
    (src/domain/models/file_graph.rs)
  • VectorRepository::get_symbol_to_file_map port method (default
    no-op) with concrete implementations in DuckdbVectorRepository
    and InMemoryVectorRepository
  • FileRelationshipUseCase — resolves callee symbols to their
    defining files via a symbol-name → file lookup and aggregates
    symbol references into weighted file edges
  • FileGraphController — DOT / Mermaid / JSON renderers
  • Graph CLI subcommand with --repository, --format,
    --min-weight, and --cross-repo flags
  • Graph is opened in read-only DB mode (no write lock needed)

https://claude.ai/code/session_016eVRs9KeLDtRw4Gc9XKW9a

Summary by CodeRabbit

  • New Features
    • Graph command: visualize file-level dependencies (HTML interactive, DOT, Mermaid, JSON). Supports file/directory granularity, clustering modes (none/directory/consumer), repository filtering, minimum edge-weight, optional cross-repo inclusion, deterministic sorting, and a friendly message for empty graphs; interactive HTML includes repo sidebar, search, tooltips and a weight slider.
    • Split command: analyzes repo split candidates and returns JSON or interactive HTML with groups, consumers, stats, and interactive filtering/visualization.

claude added 3 commits March 30, 2026 05:31
Introduces a new `graph` subcommand that builds a file-level dependency
graph from the indexed call graph and renders it in three formats:

- **DOT** (Graphviz) — default; pipe through `dot -Tsvg` to produce
  an SVG with repository cluster boundaries.
- **Mermaid** — paste into any Mermaid renderer or GitHub markdown.
- **JSON** — structured output for programmatic consumption.

Each repository is drawn as a labelled cluster boundary.  Edges are
weighted by the number of distinct symbol references from the source
file to the target file.

New pieces:
- `FileEdge` / `FileGraph` / `FileGraphRepo` domain models
  (`src/domain/models/file_graph.rs`)
- `VectorRepository::get_symbol_to_file_map` port method (default
  no-op) with concrete implementations in `DuckdbVectorRepository`
  and `InMemoryVectorRepository`
- `FileRelationshipUseCase` — resolves callee symbols to their
  defining files via a symbol-name → file lookup and aggregates
  symbol references into weighted file edges
- `FileGraphController` — DOT / Mermaid / JSON renderers
- `Graph` CLI subcommand with `--repository`, `--format`,
  `--min-weight`, and `--cross-repo` flags
- `Graph` is opened in read-only DB mode (no write lock needed)

https://claude.ai/code/session_016eVRs9KeLDtRw4Gc9XKW9a
…o flag

Cross-repository edges are now unconditionally included in the graph
output. The `--repository` flag already scopes which repos are shown,
so a separate toggle for cross-repo edges added unnecessary complexity.

https://claude.ai/code/session_016eVRs9KeLDtRw4Gc9XKW9a
Introduces a new --cluster flag with three modes:

  none (default)
    Files are listed flat inside each repository cluster — same as
    before, no change to existing behaviour.

  directory
    Files are grouped into nested sub-clusters by their parent
    directory.  A shared library with src/auth/, src/core/, and
    src/cache/ will show three labelled sub-boxes inside its
    repository boundary.

  consumer
    Files in each repository are grouped by which *other* repositories
    depend on them.  Sub-clusters are labelled "← Repo A" or
    "← Repo A, Repo B", making it easy to see exactly which slice of a
    shared library each upstream consumer actually pulls in.  Files
    with no cross-repo incoming edges are grouped as "internal".

Both DOT and Mermaid renderers support all three modes via nested
subgraph blocks.

https://claude.ai/code/session_016eVRs9KeLDtRw4Gc9XKW9a
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Mar 30, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Adds file-level dependency graph and split-analysis: new domain models (FileGraph/FileEdge/SplitAnalysis), a FileRelationshipUseCase that maps symbols to files and aggregates call-graph references into weighted file edges, a SplitAnalysisUseCase to derive extraction groups, VectorRepository symbol→file lookup, CLI/controller wiring, renderers (HTML/DOT/Mermaid/JSON), and container/router integration.

Changes

Cohort / File(s) Summary
Domain models
src/domain/models/file_graph.rs, src/domain/models/split_analysis.rs, src/domain/models/mod.rs
Add FileGraph/FileEdge/FileGraphRepo and SplitAnalysis types (serde derives, helpers); re-export module.
Use cases
src/application/use_cases/file_relationship.rs, src/application/use_cases/split_analysis.rs, src/application/use_cases/mod.rs
Add FileRelationshipUseCase::build_graph(...) to aggregate symbol→file maps and call-graph refs into weighted file edges; add SplitAnalysisUseCase to compute split groups; register new submodules.
VectorRepository trait & adapters
src/application/interfaces/vector_repository.rs, src/connector/adapter/duckdb_vector_repository.rs, src/connector/adapter/in_memory_vector_repository.rs
Add async trait method get_symbol_to_file_map(repository_id) with default no-op; implement in DuckDB (SQL query) and in-memory adapter (locked filter).
API controllers & routing
src/connector/api/controller/file_graph_controller.rs, src/connector/api/controller/split_controller.rs, src/connector/api/controller/mod.rs, src/connector/api/router.rs, src/connector/api/container.rs
Add FileGraphController and SplitController with formatters and HTML templates; container exposes FileRelationshipUseCase; router and container wiring updated to route Graph/Split commands.
CLI & wiring
src/cli/mod.rs, src/main.rs
Add enums NodeGranularity, ClusterMode, GraphFormat; add Commands::Graph (and Graph options) and treat graph command as read-only in main.
Library surface
src/lib.rs
Expand public re-exports to include new use cases, domain models, and CLI enums.
UI / Templates / Helpers
src/connector/api/... (controller formatters and templates)
Large additions of HTML/JS templates and format-specific renderers (Sigma.js/graphology integration), ID/label helpers, and clustering/aggregation logic.

Sequence Diagram

sequenceDiagram
    participant User as User/CLI
    participant Router as Router
    participant Controller as FileGraphController
    participant UseCase as FileRelationshipUseCase
    participant VectorRepo as VectorRepository
    participant MetadataRepo as MetadataRepository
    participant CallGraph as CallGraphUseCase

    User->>Router: Commands::Graph { repository, format, granularity, min_weight, cluster }
    Router->>Controller: graph(repository, format, granularity, min_weight, cluster)
    Controller->>UseCase: build_graph(repository_ids, min_weight, include_cross_repo)
    UseCase->>MetadataRepo: list()
    UseCase->>VectorRepo: get_symbol_to_file_map(repository_id) [for each repo]
    VectorRepo-->>UseCase: Vec<(symbol, file_path)>
    UseCase->>CallGraph: find_by_repository(repository_id) [for each repo]
    CallGraph-->>UseCase: Vec<SymbolReference>
    UseCase->>UseCase: Aggregate symbol→file, build weighted edges, filter/sort
    UseCase-->>Controller: FileGraph { repositories, files, edges }
    Controller->>Controller: Optionally aggregate to directories / cluster
    Controller-->>Router: Formatted graph string (HTML/DOT/Mermaid/JSON)
    Router-->>User: Output
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Possibly related PRs

Poem

🐰
I hopped through symbols, files in tow,
Edges stitched where references go,
Repos cluster, directories sing,
DOT and Mermaid—what joy they bring!
A tiny rabbit cheers the graphing show.

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'feat: add file relationship visualizer (codesearch graph)' accurately describes the main change: introducing a new file-level dependency graph visualization feature with a graph subcommand.
Docstring Coverage ✅ Passed Docstring coverage is 80.85% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch claude/file-relationship-visualizer-65WfD

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (3)
src/application/interfaces/vector_repository.rs (1)

88-101: Doc comment is misleading about in-memory implementation.

The comment at line 93 states "e.g. the in-memory mock" doesn't support this operation, but InMemoryVectorRepository actually overrides this method with a proper implementation (see src/connector/adapter/in_memory_vector_repository.rs:159-173). Consider updating the doc to reflect which adapters truly rely on the default no-op.

📝 Suggested doc fix
-    /// Used by the file-relationship graph to map callee symbol names back to the
-    /// source files that define them.  The default no-op returns an empty list so
-    /// adapters that do not support this operation (e.g. the in-memory mock) still
-    /// compile without changes.
+    /// Used by the file-relationship graph to map callee symbol names back to the
+    /// source files that define them.  The default no-op returns an empty list so
+    /// new adapters can compile without immediately implementing this method.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/application/interfaces/vector_repository.rs` around lines 88 - 101,
Update the doc comment on get_symbol_to_file_map to remove or correct the
misleading example claiming the in-memory mock doesn't support this operation;
reference the method name get_symbol_to_file_map and the
InMemoryVectorRepository override (in_memory_vector_repository.rs) and either
remove the "in-memory mock" example or replace it with accurate adapters that
actually rely on the default no-op, so the docs reflect that
InMemoryVectorRepository provides a real implementation.
src/domain/models/file_graph.rs (1)

45-47: Stabilize repositories and files before serializing.

edges are explicitly sorted, but HashMap/HashSet have no stable iteration order. The JSON formatter can therefore reshuffle repositories and files between identical runs, which makes diffs and snapshot tests noisy. Prefer BTreeMap/BTreeSet here, or sort these collections at the serialization boundary.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/domain/models/file_graph.rs` around lines 45 - 47, repositories (HashMap)
and files (HashSet) iterate in non-deterministic order causing unstable JSON
output; change their types to BTreeMap<String, FileGraphRepo> and
BTreeSet<String> in the FileGraph struct (and update all constructors, methods,
and uses that build or mutate repositories/files) so iteration/serialization is
stable, and adjust imports from std::collections accordingly; alternatively, if
you prefer not to change the model, sort these collections at the serialization
boundary (e.g., convert to sorted Vecs or BTree collections just before JSON
serialization) to ensure deterministic output.
src/connector/api/controller/file_graph_controller.rs (1)

45-428: Split the renderers out of FileGraphController.

This file now mixes request handling, grouping helpers, and two full serializers with very similar clustering code. Pulling DOT and Mermaid rendering into dedicated modules will make future format changes easier to test and review.

As per coding guidelines, "Keep one logical concept per file. If a file grows beyond ~300 lines, split it".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/connector/api/controller/file_graph_controller.rs` around lines 45 - 428,
Move the DOT and Mermaid rendering logic out of FileGraphController into two new
modules (e.g. file_graph_dot and file_graph_mermaid): extract format_dot,
dot_emit_dir_subclusters, dot_emit_consumer_subclusters, dot_emit_node,
dot_node_id, dot_escape, short_path, file_dir and any DOT-specific helpers into
the DOT module; extract format_mermaid, mermaid_emit_node, mermaid_node_id,
mermaid_escape and Mermaid-specific helpers into the Mermaid module; keep shared
grouping helpers (file_to_repo, files_by_repo, consumer_map) either in a new
shared helper module or make them pub(crate) so both renderers can call them;
update FileGraphController to call file_graph_dot::format_dot(...) and
file_graph_mermaid::format_mermaid(...) and adjust visibility/signatures to
accept &FileGraph and ClusterMode; ensure no logic changes, preserve function
names used by controller (format_dot/format_mermaid) and adjust imports/uses
accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/application/use_cases/file_relationship.rs`:
- Around line 72-83: The current use of symbol_map.entry(...).or_insert(...)
makes duplicate-symbol resolution depend on iteration order; change symbol_map
from HashMap<String,(String,String)> to HashMap<String, Vec<(String,String)>>
and, inside the loop that calls
self.vector_repo.get_symbol_to_file_map(repo.id()), push all (file,
repo.id().to_string()) candidates for each symbol instead of using or_insert.
Then update the lookup site(s) that read symbol_map (the call sites that map a
symbol name to a callee file) to apply a deterministic tie-breaker when multiple
candidates exist—e.g., sort candidates by repo id then by file path (or pick the
lexicographically smallest pair) and pick that one—so resolution no longer
depends on fetch order. Ensure references to symbol_map, get_symbol_to_file_map,
and the lookup logic are updated accordingly.

In `@src/connector/api/controller/file_graph_controller.rs`:
- Around line 404-423: The dot_node_id and mermaid_node_id helpers currently
collapse non-alphanumerics causing collisions (e.g., "src/a-b.rs" vs
"src/a/b.rs"); replace their implementations so IDs are a lossless encoding of
the full path/key (e.g., prefix like "n_" and then URL-safe base64 or hex
percent-encoding of the original string) rather than stripping characters,
ensuring the result is a valid DOT/Mermaid identifier; update dot_node_id and
mermaid_node_id to produce those encoded IDs (and keep dot_escape unchanged) so
every distinct input yields a unique renderer ID.
- Around line 30-42: Remove the early return that checks graph.is_empty() so the
selected GraphFormat renderer handles empty graphs; specifically delete the if
graph.is_empty() { return Ok(...); } block and let the existing match on format
call Self::format_dot(&graph, cluster), Self::format_mermaid(&graph, cluster),
or serde_json::to_string_pretty(&graph) for the empty case, or alternatively
update those renderer functions (format_dot, format_mermaid) to return an
appropriate empty representation—ensure graph.is_empty() is not short-circuiting
format selection.

---

Nitpick comments:
In `@src/application/interfaces/vector_repository.rs`:
- Around line 88-101: Update the doc comment on get_symbol_to_file_map to remove
or correct the misleading example claiming the in-memory mock doesn't support
this operation; reference the method name get_symbol_to_file_map and the
InMemoryVectorRepository override (in_memory_vector_repository.rs) and either
remove the "in-memory mock" example or replace it with accurate adapters that
actually rely on the default no-op, so the docs reflect that
InMemoryVectorRepository provides a real implementation.

In `@src/connector/api/controller/file_graph_controller.rs`:
- Around line 45-428: Move the DOT and Mermaid rendering logic out of
FileGraphController into two new modules (e.g. file_graph_dot and
file_graph_mermaid): extract format_dot, dot_emit_dir_subclusters,
dot_emit_consumer_subclusters, dot_emit_node, dot_node_id, dot_escape,
short_path, file_dir and any DOT-specific helpers into the DOT module; extract
format_mermaid, mermaid_emit_node, mermaid_node_id, mermaid_escape and
Mermaid-specific helpers into the Mermaid module; keep shared grouping helpers
(file_to_repo, files_by_repo, consumer_map) either in a new shared helper module
or make them pub(crate) so both renderers can call them; update
FileGraphController to call file_graph_dot::format_dot(...) and
file_graph_mermaid::format_mermaid(...) and adjust visibility/signatures to
accept &FileGraph and ClusterMode; ensure no logic changes, preserve function
names used by controller (format_dot/format_mermaid) and adjust imports/uses
accordingly.

In `@src/domain/models/file_graph.rs`:
- Around line 45-47: repositories (HashMap) and files (HashSet) iterate in
non-deterministic order causing unstable JSON output; change their types to
BTreeMap<String, FileGraphRepo> and BTreeSet<String> in the FileGraph struct
(and update all constructors, methods, and uses that build or mutate
repositories/files) so iteration/serialization is stable, and adjust imports
from std::collections accordingly; alternatively, if you prefer not to change
the model, sort these collections at the serialization boundary (e.g., convert
to sorted Vecs or BTree collections just before JSON serialization) to ensure
deterministic output.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 6eff61b9-5bf1-4e6f-88b0-36286bd6f6a3

📥 Commits

Reviewing files that changed from the base of the PR and between 5ce4f74 and b4ca6be.

📒 Files selected for processing (14)
  • src/application/interfaces/vector_repository.rs
  • src/application/use_cases/file_relationship.rs
  • src/application/use_cases/mod.rs
  • src/cli/mod.rs
  • src/connector/adapter/duckdb_vector_repository.rs
  • src/connector/adapter/in_memory_vector_repository.rs
  • src/connector/api/container.rs
  • src/connector/api/controller/file_graph_controller.rs
  • src/connector/api/controller/mod.rs
  • src/connector/api/router.rs
  • src/domain/models/file_graph.rs
  • src/domain/models/mod.rs
  • src/lib.rs
  • src/main.rs

Comment on lines +72 to +83
// ── 2. Build symbol_name → (file_path, repo_id) lookup ────────────
// Prefer the first entry when the same symbol name appears in multiple
// files (e.g. a trait defined in a module and re-exported).
let mut symbol_map: HashMap<String, (String, String)> = HashMap::new();
for repo in &target_repos {
let entries = self
.vector_repo
.get_symbol_to_file_map(repo.id())
.await?;
for (sym, file) in entries {
symbol_map.entry(sym).or_insert((file, repo.id().to_string()));
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Don't make duplicate-symbol resolution depend on fetch order.

or_insert turns "first one returned wins" into the resolution rule for duplicate symbols. Since duplicate names are explicitly expected here, the graph can attach the same callee to different target files depending on repository/vector-store iteration order. Keep all candidates and apply a deterministic tie-breaker at lookup time instead.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/application/use_cases/file_relationship.rs` around lines 72 - 83, The
current use of symbol_map.entry(...).or_insert(...) makes duplicate-symbol
resolution depend on iteration order; change symbol_map from
HashMap<String,(String,String)> to HashMap<String, Vec<(String,String)>> and,
inside the loop that calls self.vector_repo.get_symbol_to_file_map(repo.id()),
push all (file, repo.id().to_string()) candidates for each symbol instead of
using or_insert. Then update the lookup site(s) that read symbol_map (the call
sites that map a symbol name to a callee file) to apply a deterministic
tie-breaker when multiple candidates exist—e.g., sort candidates by repo id then
by file path (or pick the lexicographically smallest pair) and pick that one—so
resolution no longer depends on fetch order. Ensure references to symbol_map,
get_symbol_to_file_map, and the lookup logic are updated accordingly.

Comment on lines +30 to +42
if graph.is_empty() {
return Ok(
"No file dependency edges found. Make sure at least one repository is indexed \
(codesearch index <path>) and has resolvable symbol references."
.to_string(),
);
}

Ok(match format {
GraphFormat::Dot => Self::format_dot(&graph, cluster),
GraphFormat::Mermaid => Self::format_mermaid(&graph, cluster),
GraphFormat::Json => serde_json::to_string_pretty(&graph)?,
})
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Preserve the requested format when the graph is empty.

This returns plain text before format is checked, so codesearch graph --format json stops being machine-readable exactly when there are no edges. Let the selected renderer handle the empty case instead.

💡 Possible fix
-        if graph.is_empty() {
-            return Ok(
-                "No file dependency edges found. Make sure at least one repository is indexed \
-                 (codesearch index <path>) and has resolvable symbol references."
-                    .to_string(),
-            );
-        }
-
         Ok(match format {
             GraphFormat::Dot => Self::format_dot(&graph, cluster),
             GraphFormat::Mermaid => Self::format_mermaid(&graph, cluster),
             GraphFormat::Json => serde_json::to_string_pretty(&graph)?,
         })
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if graph.is_empty() {
return Ok(
"No file dependency edges found. Make sure at least one repository is indexed \
(codesearch index <path>) and has resolvable symbol references."
.to_string(),
);
}
Ok(match format {
GraphFormat::Dot => Self::format_dot(&graph, cluster),
GraphFormat::Mermaid => Self::format_mermaid(&graph, cluster),
GraphFormat::Json => serde_json::to_string_pretty(&graph)?,
})
Ok(match format {
GraphFormat::Dot => Self::format_dot(&graph, cluster),
GraphFormat::Mermaid => Self::format_mermaid(&graph, cluster),
GraphFormat::Json => serde_json::to_string_pretty(&graph)?,
})
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/connector/api/controller/file_graph_controller.rs` around lines 30 - 42,
Remove the early return that checks graph.is_empty() so the selected GraphFormat
renderer handles empty graphs; specifically delete the if graph.is_empty() {
return Ok(...); } block and let the existing match on format call
Self::format_dot(&graph, cluster), Self::format_mermaid(&graph, cluster), or
serde_json::to_string_pretty(&graph) for the empty case, or alternatively update
those renderer functions (format_dot, format_mermaid) to return an appropriate
empty representation—ensure graph.is_empty() is not short-circuiting format
selection.

Comment on lines +404 to +423
/// Convert a file path into a valid DOT node identifier.
fn dot_node_id(path: &str) -> String {
let s: String = path
.chars()
.map(|c| if c.is_alphanumeric() { c } else { '_' })
.collect();
format!("n_{s}")
}

/// Escape a string for DOT double-quoted labels.
fn dot_escape(s: &str) -> String {
s.replace('\\', "\\\\").replace('"', "\\\"")
}

/// Convert a string into a valid Mermaid identifier.
fn mermaid_node_id(s: &str) -> String {
s.chars()
.map(|c| if c.is_alphanumeric() { c } else { '_' })
.collect()
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Use collision-resistant renderer IDs.

Both helpers erase every non-alphanumeric character, so distinct paths like src/a-b.rs and src/a/b.rs generate the same node ID. That silently merges nodes and edges in DOT and Mermaid output. Build IDs from a lossless encoding of the full node key instead of normalizing characters away.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/connector/api/controller/file_graph_controller.rs` around lines 404 -
423, The dot_node_id and mermaid_node_id helpers currently collapse
non-alphanumerics causing collisions (e.g., "src/a-b.rs" vs "src/a/b.rs");
replace their implementations so IDs are a lossless encoding of the full
path/key (e.g., prefix like "n_" and then URL-safe base64 or hex
percent-encoding of the original string) rather than stripping characters,
ensuring the result is a valid DOT/Mermaid identifier; update dot_node_id and
mermaid_node_id to produce those encoded IDs (and keep dot_escape unchanged) so
every distinct input yields a unique renderer ID.

The DOT/SVG output was unreadable for large codebases. Two improvements:

HTML format (new default)
  Generates a self-contained interactive page powered by Cytoscape.js:
  - Force-directed layout with repo cluster boundaries
  - Zoom/pan, scroll sensitivity tuned for comfort
  - Sidebar with per-repo colour legend; click a repo to isolate its
    subgraph
  - Live search box filters nodes by file path and highlights their
    neighbourhood
  - Min-weight slider to fade out low-signal edges without reloading
  - Click any file node to highlight its direct neighbours
  - Hover tooltip shows full path on files and weight+kind on edges
  - Edge thickness scales with reference count (mapData weight)
  - Works with all three --cluster modes (none / directory / consumer)
    via Cytoscape compound nodes

Directory granularity (--granularity directory)
  Collapses all file paths to their parent directory before rendering.
  A repo with 300 files in 20 directories becomes 20 nodes, making
  the graph instantly readable. Use this first on any non-trivial
  codebase.

Usage examples:
  # Default: interactive HTML, file-level nodes
  codesearch graph > graph.html && open graph.html

  # Readable in any codebase: directory-level nodes
  codesearch graph --granularity directory > graph.html && open graph.html

  # Consumer clustering on directory-aggregated nodes
  codesearch graph --granularity directory --cluster consumer > graph.html

  # Still need DOT? it's still there
  codesearch graph -F dot | dot -Tsvg -o graph.svg

https://claude.ai/code/session_016eVRs9KeLDtRw4Gc9XKW9a
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (4)
src/cli/mod.rs (1)

232-268: Consider exposing the --cross-repo flag.

The PR objectives mention a --cross-repo flag, but the CLI doesn't define it. The underlying build_graph method accepts an include_cross_repo: bool parameter (currently hardcoded to true in the controller). If filtering out cross-repository edges is a useful feature, consider adding:

/// Include edges between different repositories (default: true).
#[arg(long, default_value_t = true, action = clap::ArgAction::SetFalse)]
cross_repo: bool,

If this was intentionally omitted (always include cross-repo edges), then no action needed.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/cli/mod.rs` around lines 232 - 268, The Graph CLI struct currently has no
option to toggle cross-repository edges while the controller's build_graph takes
include_cross_repo: bool (and is hardcoded true); add a new field to the Graph
variant named cross_repo: bool with a Clap flag (e.g., long = "cross-repo",
default true, action SetFalse) so the user can disable cross-repo edges, then
wire that Graph.cross_repo value through the command handling into the call to
build_graph (replace the hardcoded true with the passed cross_repo boolean) to
ensure the flag controls include_cross_repo at runtime.
src/connector/api/controller/file_graph_controller.rs (3)

611-667: Consider adding Subresource Integrity (SRI) hash for CDN script.

The Cytoscape.js script is loaded from a CDN without an integrity hash. For security-conscious users, adding SRI prevents execution if the CDN-hosted file is tampered with:

<script src="https://cdn.jsdelivr.net/npm/cytoscape@3.29.2/dist/cytoscape.min.js"
        integrity="sha384-..." crossorigin="anonymous"></script>

This is optional since the graph viewer is typically used locally with trusted data.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/connector/api/controller/file_graph_controller.rs` around lines 611 -
667, The HTML_TEMPLATE loads Cytoscape via CDN without Subresource Integrity;
update the script tag inside the HTML_TEMPLATE string in
file_graph_controller.rs (search for HTML_TEMPLATE and the Cytoscape script
include) to include an integrity="sha384-..." attribute with the correct SRI
hash for the exact cytoscape@3.29.2 file and add crossorigin="anonymous";
regenerate or verify the SHA384 hash for the minified bundle and insert it into
the integrity attribute so the browser can verify the script before execution.

1-53: Consider splitting this file to comply with size guidelines.

At ~914 lines, this file exceeds the ~300-line guideline. Consider extracting:

  • The HTML_TEMPLATE constant to a separate file (e.g., file_graph_template.rs)
  • Individual renderers to separate modules if they grow further

However, the current structure is logically cohesive, so this is optional.

As per coding guidelines: "Keep one logical concept per file. If a file grows beyond ~300 lines, split it."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/connector/api/controller/file_graph_controller.rs` around lines 1 - 53,
This file exceeds the repo size guideline; split large constants and renderers
into separate modules: extract the large HTML_TEMPLATE constant into a new file
(e.g., file_graph_template.rs) and move each renderer method (format_html,
format_dot, format_mermaid) into their own module files (e.g., render/html.rs,
render/dot.rs, render/mermaid.rs) while keeping FileGraphController as a thin
orchestrator that calls into those modules; update imports and visibility
(pub(crate) or pub) so FileGraphController::graph can call the new helpers and
re-export or pub use the template/renderer functions as needed.

227-229: Consider propagating serialization errors instead of silently defaulting.

While serde_json::to_string on Vec<serde_json::Value> is unlikely to fail, silently falling back to empty strings could mask issues and produce broken HTML. Since format_html could return Result<String>, propagating errors would be more robust.

-        let nodes_str = serde_json::to_string(&nodes).unwrap_or_default();
-        let edges_str = serde_json::to_string(&edges_json).unwrap_or_default();
-        let repos_str = serde_json::to_string(&graph.repositories).unwrap_or_default();
+        let nodes_str = serde_json::to_string(&nodes)?;
+        let edges_str = serde_json::to_string(&edges_json)?;
+        let repos_str = serde_json::to_string(&graph.repositories)?;

This would require changing format_html to return Result<String>.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/connector/api/controller/file_graph_controller.rs` around lines 227 -
229, The current code silently swallows serde_json::to_string errors by using
unwrap_or_default for nodes_str, edges_str, and repos_str; change to propagate
serialization errors instead: have calls to serde_json::to_string(&nodes),
serde_json::to_string(&edges_json), and
serde_json::to_string(&graph.repositories) return Results and propagate errors
with ? (or map_err) rather than unwrap_or_default, change format_html to return
Result<String> and update its signature and all callers to handle the Result,
and ensure the controller function returns a compatible Result type so
serialization failures surface instead of producing empty strings.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/connector/api/controller/file_graph_controller.rs`:
- Around line 829-830: Remove the unused local variable `keep` declared as const
keep = cy.elements().filter(n => n.data('repoId') === id || n.data('source') &&
true); in the repo isolation block where `activeRepo = id;` is set; simply
delete that `keep` declaration (or replace with a comment if you want to
document the intent), leaving the existing `neighbourhood`-based logic intact so
nothing else in file_graph_controller.rs (e.g. the repo isolation code that
references `neighbourhood`) is affected.

---

Nitpick comments:
In `@src/cli/mod.rs`:
- Around line 232-268: The Graph CLI struct currently has no option to toggle
cross-repository edges while the controller's build_graph takes
include_cross_repo: bool (and is hardcoded true); add a new field to the Graph
variant named cross_repo: bool with a Clap flag (e.g., long = "cross-repo",
default true, action SetFalse) so the user can disable cross-repo edges, then
wire that Graph.cross_repo value through the command handling into the call to
build_graph (replace the hardcoded true with the passed cross_repo boolean) to
ensure the flag controls include_cross_repo at runtime.

In `@src/connector/api/controller/file_graph_controller.rs`:
- Around line 611-667: The HTML_TEMPLATE loads Cytoscape via CDN without
Subresource Integrity; update the script tag inside the HTML_TEMPLATE string in
file_graph_controller.rs (search for HTML_TEMPLATE and the Cytoscape script
include) to include an integrity="sha384-..." attribute with the correct SRI
hash for the exact cytoscape@3.29.2 file and add crossorigin="anonymous";
regenerate or verify the SHA384 hash for the minified bundle and insert it into
the integrity attribute so the browser can verify the script before execution.
- Around line 1-53: This file exceeds the repo size guideline; split large
constants and renderers into separate modules: extract the large HTML_TEMPLATE
constant into a new file (e.g., file_graph_template.rs) and move each renderer
method (format_html, format_dot, format_mermaid) into their own module files
(e.g., render/html.rs, render/dot.rs, render/mermaid.rs) while keeping
FileGraphController as a thin orchestrator that calls into those modules; update
imports and visibility (pub(crate) or pub) so FileGraphController::graph can
call the new helpers and re-export or pub use the template/renderer functions as
needed.
- Around line 227-229: The current code silently swallows serde_json::to_string
errors by using unwrap_or_default for nodes_str, edges_str, and repos_str;
change to propagate serialization errors instead: have calls to
serde_json::to_string(&nodes), serde_json::to_string(&edges_json), and
serde_json::to_string(&graph.repositories) return Results and propagate errors
with ? (or map_err) rather than unwrap_or_default, change format_html to return
Result<String> and update its signature and all callers to handle the Result,
and ensure the controller function returns a compatible Result type so
serialization failures surface instead of producing empty strings.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 68307e2e-d8db-4ffe-aae5-674e8409a5e6

📥 Commits

Reviewing files that changed from the base of the PR and between b4ca6be and bfbd577.

📒 Files selected for processing (4)
  • src/cli/mod.rs
  • src/connector/api/controller/file_graph_controller.rs
  • src/connector/api/router.rs
  • src/lib.rs
🚧 Files skipped from review as they are similar to previous changes (2)
  • src/connector/api/router.rs
  • src/lib.rs

Comment on lines +829 to +830
activeRepo = id;
const keep = cy.elements().filter(n => n.data('repoId') === id || n.data('source') && true);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Remove unused variable.

The keep variable is declared but never used. The repo isolation logic correctly uses neighbourhood below. This appears to be dead code.

     activeRepo = id;
-    const keep = cy.elements().filter(n => n.data('repoId') === id || n.data('source') && true);
     const repoNode = cy.$(`node[repoId="${id}"][type="repo"]`);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/connector/api/controller/file_graph_controller.rs` around lines 829 -
830, Remove the unused local variable `keep` declared as const keep =
cy.elements().filter(n => n.data('repoId') === id || n.data('source') && true);
in the repo isolation block where `activeRepo = id;` is set; simply delete that
`keep` declaration (or replace with a comment if you want to document the
intent), leaving the existing `neighbourhood`-based logic intact so nothing else
in file_graph_controller.rs (e.g. the repo isolation code that references
`neighbourhood`) is affected.

claude added 5 commits March 30, 2026 15:11
New `codesearch split <repo-id>` command that analyses a monolith
repository and identifies natural extraction boundaries by grouping
its files according to which external consumers reference them.

Algorithm:
- Build the full cross-repo file dependency graph
- Filter inbound edges (external repo → target) to find the public
  interface of the monolith
- Group target files by their "consumer fingerprint" — the set of
  external repos that depend on each file
- Files with the same fingerprint are extraction candidates
- Identify internal support files that would travel with each group

Output is an interactive HTML page rendered with Sigma.js (WebGL):
- Monolith shown as a large dashed boundary circle
- Extraction candidate groups shown as coloured sub-clusters inside
- Individual files shown as coloured dots within each group
- External consumer repos shown as nodes in an outer ring
- Edges connect each group centroid to its consumer repos
- Sidebar shows stats, group list, consumer list
- Click groups/consumers to highlight connections; search by filename

New types: SplitAnalysis, SplitGroup, ConsumerRepo (domain layer)
New use case: SplitAnalysisUseCase (application layer)
New controller: SplitController (connector layer)

https://claude.ai/code/session_016eVRs9KeLDtRw4Gc9XKW9a
Resolves the input string against indexed repositories by trying an
exact ID match first, then a case-insensitive name match.  Produces a
clear error listing available repositories when no match is found.

https://claude.ai/code/session_016eVRs9KeLDtRw4Gc9XKW9a
Three bugs fixed:
- Switch CDN from sigma@3.0.0 (not yet a stable release, 404) to
  sigma@2.4.0 which has a known-working jsDelivr UMD build
- Move activeGroupId/activeConsumerId declarations (let → var at top
  of script) so sidebar click handlers registered before the sigma
  initialisation block can reference them without a TDZ error
- Fix SVG halo radius calculation: use graphToViewport() on two points
  and take their pixel distance instead of dividing by camera.ratio,
  which produced incorrect sizes; also remove non-existent sigma v2
  events (beforeDrawingNodes, moveBody)

https://claude.ai/code/session_016eVRs9KeLDtRw4Gc9XKW9a
…raph

split command:
- Remove inner group halo circles — colour alone now distinguishes
  groups, eliminating the overlapping-circles visual noise
- File nodes no longer show labels on the canvas; path is shown only
  in the hover tooltip, reducing clutter in dense graphs
- Group centroid nodes (larger, labelled with consumer names) remain
  as visual anchors for each extraction candidate

graph command:
- Replace Cytoscape.js with Sigma.js v2 (WebGL, same CDN as split)
- Layout: repos placed in a ring, their files in adaptive sub-circles
  sized by sqrt(file_count) so large repos spread out naturally
- SVG halos drawn for each repo boundary (dashed coloured ring)
- File labels hidden by default (labelRenderedSizeThreshold=999);
  full path shown in tooltip on hover
- Edge tooltip shows weight and reference kinds
- Click a node to highlight its neighbours; click a repo in the
  sidebar to isolate it; weight slider filters thin edges

https://claude.ai/code/session_016eVRs9KeLDtRw4Gc9XKW9a
…ands

- graph: remove SVG halos; distinguish repos by color only; hover dims
  all other nodes and accents direct edges in the hovered node's repo color
- split: replace click-to-select with hover highlight; same-group nodes
  and direct neighbors light up, all others dim; leaveNode restores
- split: remove SVG inner/outer halos entirely (color alone for groups)
- split: sidebar group/consumer clicks still do persistent isolation
- fix: guard duplicate edges with graph.hasEdge() before addEdgeWithKey
- fix: declare interaction state vars with var to avoid TDZ crashes

https://claude.ai/code/session_016eVRs9KeLDtRw4Gc9XKW9a
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

♻️ Duplicate comments (2)
src/connector/api/controller/file_graph_controller.rs (2)

469-493: ⚠️ Potential issue | 🟠 Major

Renderer node IDs are still lossy.

dot_node_id and mermaid_id still collapse every non-alphanumeric character to _, so distinct paths like src/a-b.rs and src/a/b.rs merge into the same rendered node.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/connector/api/controller/file_graph_controller.rs` around lines 469 -
493, dot_node_id and mermaid_id are currently lossy because they replace every
non-alphanumeric with '_' causing distinct paths (e.g., "src/a-b.rs" vs
"src/a/b.rs") to collide; update both functions to produce stable, unique IDs by
combining a safe sanitized prefix (keep alphanumerics and map other chars to
safe substitutes) with a short deterministic hash of the full path (e.g., hex or
base64url of SHA256/SHA1) so the readable portion remains but collisions are
avoided; ensure dot_emit_node and any other callers use the new dot_node_id
signature and that the chosen characters are valid for DOT and Mermaid
identifiers.

32-39: ⚠️ Potential issue | 🟠 Major

Preserve the requested renderer when the graph is empty.

This still returns plain text before format is checked, so --format json, --format dot, and --format mermaid stop being machine-readable exactly when the graph is empty.

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/application/use_cases/split_analysis.rs`:
- Around line 109-125: The support-file selection only considers a single hop;
change it to compute the transitive closure: initialize support_set empty, then
repeatedly scan internal_referenced_by for candidates not in public_files_all
and not already in support_set where all referencers are in pub_set ∪
support_set, add them to support_set, and repeat until no new files are added;
finally sort and return support_set as the support_files Vec<String> (replace
the current support_files block that uses a single filter/map/collect).

In `@src/connector/api/controller/file_graph_controller.rs`:
- Around line 130-135: The injected values for __FILES__, __EDGES__, and
__REPOS__ are raw and can include "</script>" or other characters that break out
of the script tag; update the code that builds files_json, edges_json and
repos_str (the area that calls HTML_TEMPLATE.replace("__FILES__",
...).replace("__EDGES__", ...).replace("__REPOS__", ...)) to produce safe
JavaScript literals using proper JSON serialization (e.g., serde_json::to_string
on the actual Rust structures or the composed JSON array strings) and then
sanitize any raw occurrences of "</script>" in those serialized strings (e.g.,
replace "</script>" with "<\\/script>") before inserting into HTML_TEMPLATE,
leaving max_weight.to_string() unchanged. Ensure you reference HTML_TEMPLATE,
files_json/edges_json/repos_str and perform serialization/sanitization prior to
the replace calls.

In `@src/connector/api/controller/split_controller.rs`:
- Around line 71-74: The code in format_html currently injects raw data_json
into HTML_TEMPLATE, allowing values like "</script>" to break out of the script
block; update format_html to safely escape the serialized JSON before embedding
(e.g., produce a JSON string via build_data_json / serde_json::to_string then
replace unsafe sequences such as "</script>" with "<\\/script>" and optionally
escape "<!--" / "-->" occurrences) or move the data into a safe container (e.g.,
a <script type="application/json"> or base64-encoded payload) so that the DATA
assignment in the template cannot be terminated by user content; apply this
change where format_html, build_data_json, and HTML_TEMPLATE are used.
- Around line 33-37: The case-insensitive fallback currently returns the first
repo with a matching lowercased name and thus can silently pick the wrong repo
when multiple repos share the same name; change the logic around the
repos.iter().find(...) fallback to collect all repositories where
r.name().to_lowercase() == lower (e.g., using filter and collect), then: if
exactly one match return that r.id().to_string(), if zero continue to the
existing error path, and if more than one return an explicit Err indicating the
repository name is ambiguous (include both the provided name and suggest using
the exact id) so callers are not silently misdirected.

---

Duplicate comments:
In `@src/connector/api/controller/file_graph_controller.rs`:
- Around line 469-493: dot_node_id and mermaid_id are currently lossy because
they replace every non-alphanumeric with '_' causing distinct paths (e.g.,
"src/a-b.rs" vs "src/a/b.rs") to collide; update both functions to produce
stable, unique IDs by combining a safe sanitized prefix (keep alphanumerics and
map other chars to safe substitutes) with a short deterministic hash of the full
path (e.g., hex or base64url of SHA256/SHA1) so the readable portion remains but
collisions are avoided; ensure dot_emit_node and any other callers use the new
dot_node_id signature and that the chosen characters are valid for DOT and
Mermaid identifiers.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: b95ceb48-1c9a-4285-a584-ee6a83797749

📥 Commits

Reviewing files that changed from the base of the PR and between bfbd577 and 31db2a1.

📒 Files selected for processing (12)
  • src/application/use_cases/mod.rs
  • src/application/use_cases/split_analysis.rs
  • src/cli/mod.rs
  • src/connector/api/container.rs
  • src/connector/api/controller/file_graph_controller.rs
  • src/connector/api/controller/mod.rs
  • src/connector/api/controller/split_controller.rs
  • src/connector/api/router.rs
  • src/domain/models/mod.rs
  • src/domain/models/split_analysis.rs
  • src/lib.rs
  • src/main.rs
✅ Files skipped from review due to trivial changes (2)
  • src/main.rs
  • src/connector/api/controller/mod.rs
🚧 Files skipped from review as they are similar to previous changes (4)
  • src/application/use_cases/mod.rs
  • src/domain/models/mod.rs
  • src/connector/api/container.rs
  • src/cli/mod.rs

Comment on lines +109 to +125
// Support files: internal files referenced ONLY by files in this
// group's public interface (not by external repos or other groups).
let support_files: Vec<String> = {
let mut sv: Vec<String> = internal_referenced_by
.iter()
.filter(|(candidate, referencers)| {
// candidate must not itself be in the external public interface
!public_files_all.contains(*candidate)
// all referencers of this candidate must be in this group's
// public interface
&& referencers.iter().all(|r| pub_set.contains(r.as_str()))
})
.map(|(f, _)| f.clone())
.collect();
sv.sort();
sv
};
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Support-file discovery stops after one hop.

A chain like public.rs -> helper.rs -> util.rs only adds helper.rs; util.rs never qualifies because its referencer is no longer in the initial public-file set. That makes extraction groups incomplete for nested internal dependencies.

💡 Suggested fix
-                let pub_set: HashSet<&str> = pub_files.iter().map(|s| s.as_str()).collect();
+                let mut closure: HashSet<String> = pub_files.iter().cloned().collect();

                 // Support files: internal files referenced ONLY by files in this
                 // group's public interface (not by external repos or other groups).
                 let support_files: Vec<String> = {
-                    let mut sv: Vec<String> = internal_referenced_by
-                        .iter()
-                        .filter(|(candidate, referencers)| {
-                            // candidate must not itself be in the external public interface
-                            !public_files_all.contains(*candidate)
-                                // all referencers of this candidate must be in this group's
-                                // public interface
-                                && referencers.iter().all(|r| pub_set.contains(r.as_str()))
-                        })
-                        .map(|(f, _)| f.clone())
-                        .collect();
+                    loop {
+                        let mut changed = false;
+                        for (candidate, referencers) in &internal_referenced_by {
+                            if public_files_all.contains(candidate) || closure.contains(candidate) {
+                                continue;
+                            }
+                            if referencers.iter().all(|r| closure.contains(r)) {
+                                changed |= closure.insert(candidate.clone());
+                            }
+                        }
+                        if !changed {
+                            break;
+                        }
+                    }
+
+                    let mut sv: Vec<String> = closure
+                        .into_iter()
+                        .filter(|f| !pub_files.contains(f))
+                        .collect();
                     sv.sort();
                     sv
                 };
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/application/use_cases/split_analysis.rs` around lines 109 - 125, The
support-file selection only considers a single hop; change it to compute the
transitive closure: initialize support_set empty, then repeatedly scan
internal_referenced_by for candidates not in public_files_all and not already in
support_set where all referencers are in pub_set ∪ support_set, add them to
support_set, and repeat until no new files are added; finally sort and return
support_set as the support_files Vec<String> (replace the current support_files
block that uses a single filter/map/collect).

Comment on lines +130 to +135
HTML_TEMPLATE
.replace("__FILES__", &format!("[{}]", files_json.join(",")))
.replace("__EDGES__", &format!("[{}]", edges_json.join(",")))
.replace("__REPOS__", &repos_str)
.replace("__MAX_WEIGHT__", &max_weight.to_string())
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Escape embedded graph JSON before injecting it into the script tag.

__FILES__, __EDGES__, and __REPOS__ are inserted raw into <script>. A repository or file name containing </script> breaks out of the script block and executes attacker-controlled HTML/JS when the report is opened.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/connector/api/controller/file_graph_controller.rs` around lines 130 -
135, The injected values for __FILES__, __EDGES__, and __REPOS__ are raw and can
include "</script>" or other characters that break out of the script tag; update
the code that builds files_json, edges_json and repos_str (the area that calls
HTML_TEMPLATE.replace("__FILES__", ...).replace("__EDGES__",
...).replace("__REPOS__", ...)) to produce safe JavaScript literals using proper
JSON serialization (e.g., serde_json::to_string on the actual Rust structures or
the composed JSON array strings) and then sanitize any raw occurrences of
"</script>" in those serialized strings (e.g., replace "</script>" with
"<\\/script>") before inserting into HTML_TEMPLATE, leaving
max_weight.to_string() unchanged. Ensure you reference HTML_TEMPLATE,
files_json/edges_json/repos_str and perform serialization/sanitization prior to
the replace calls.

Comment on lines +33 to +37
// Case-insensitive name match
let lower = name_or_id.to_lowercase();
if let Some(r) = repos.iter().find(|r| r.name().to_lowercase() == lower) {
return Ok(r.id().to_string());
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Reject ambiguous repository-name matches.

The case-insensitive fallback currently picks the first match. If two indexed repos share the same name, split <name> can analyse the wrong repository without warning.

💡 Suggested fix
+        if let Some(r) = repos.iter().find(|r| r.name() == name_or_id) {
+            return Ok(r.id().to_string());
+        }
+
         // Case-insensitive name match
         let lower = name_or_id.to_lowercase();
-        if let Some(r) = repos.iter().find(|r| r.name().to_lowercase() == lower) {
-            return Ok(r.id().to_string());
+        let matches: Vec<_> = repos
+            .iter()
+            .filter(|r| r.name().to_lowercase() == lower)
+            .collect();
+        match matches.as_slice() {
+            [r] => return Ok(r.id().to_string()),
+            [] => {}
+            _ => {
+                let choices = matches
+                    .iter()
+                    .map(|r| format!("  {} (id: {})", r.name(), r.id()))
+                    .collect::<Vec<_>>()
+                    .join("\n");
+                return Err(anyhow::anyhow!(
+                    "Repository name '{}' is ambiguous.\nMatches:\n{}",
+                    name_or_id,
+                    choices
+                ));
+            }
         }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
// Case-insensitive name match
let lower = name_or_id.to_lowercase();
if let Some(r) = repos.iter().find(|r| r.name().to_lowercase() == lower) {
return Ok(r.id().to_string());
}
if let Some(r) = repos.iter().find(|r| r.name() == name_or_id) {
return Ok(r.id().to_string());
}
// Case-insensitive name match
let lower = name_or_id.to_lowercase();
let matches: Vec<_> = repos
.iter()
.filter(|r| r.name().to_lowercase() == lower)
.collect();
match matches.as_slice() {
[r] => return Ok(r.id().to_string()),
[] => {}
_ => {
let choices = matches
.iter()
.map(|r| format!(" {} (id: {})", r.name(), r.id()))
.collect::<Vec<_>>()
.join("\n");
return Err(anyhow::anyhow!(
"Repository name '{}' is ambiguous.\nMatches:\n{}",
name_or_id,
choices
));
}
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/connector/api/controller/split_controller.rs` around lines 33 - 37, The
case-insensitive fallback currently returns the first repo with a matching
lowercased name and thus can silently pick the wrong repo when multiple repos
share the same name; change the logic around the repos.iter().find(...) fallback
to collect all repositories where r.name().to_lowercase() == lower (e.g., using
filter and collect), then: if exactly one match return that r.id().to_string(),
if zero continue to the existing error path, and if more than one return an
explicit Err indicating the repository name is ambiguous (include both the
provided name and suggest using the exact id) so callers are not silently
misdirected.

Comment on lines +71 to +74
fn format_html(analysis: &SplitAnalysis) -> String {
let data_json = build_data_json(analysis);
HTML_TEMPLATE.replace("__SPLIT_DATA__", &data_json)
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Escape the JSON payload before embedding it in the <script> block.

data_json contains repo names, repo paths, and file paths. A value containing </script> breaks out of const DATA = ... and runs attacker-controlled HTML/JS when the report is opened.

💡 Suggested fix
 fn format_html(analysis: &SplitAnalysis) -> String {
-    let data_json = build_data_json(analysis);
+    let data_json = escape_json_for_script(&build_data_json(analysis));
     HTML_TEMPLATE.replace("__SPLIT_DATA__", &data_json)
 }
+
+fn escape_json_for_script(json: &str) -> String {
+    json.replace('<', "\\u003c")
+        .replace('>', "\\u003e")
+        .replace('&', "\\u0026")
+        .replace('\u{2028}', "\\u2028")
+        .replace('\u{2029}', "\\u2029")
+}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/connector/api/controller/split_controller.rs` around lines 71 - 74, The
code in format_html currently injects raw data_json into HTML_TEMPLATE, allowing
values like "</script>" to break out of the script block; update format_html to
safely escape the serialized JSON before embedding (e.g., produce a JSON string
via build_data_json / serde_json::to_string then replace unsafe sequences such
as "</script>" with "<\\/script>" and optionally escape "<!--" / "-->"
occurrences) or move the data into a safe container (e.g., a <script
type="application/json"> or base64-encoded payload) so that the DATA assignment
in the template cannot be terminated by user content; apply this change where
format_html, build_data_json, and HTML_TEMPLATE are used.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

♻️ Duplicate comments (3)
src/connector/api/controller/file_graph_controller.rs (3)

96-135: ⚠️ Potential issue | 🔴 Critical

Escape embedded graph data before inserting it into <script>.

These JSON fragments are injected raw into the script block. serde_json::to_string does not neutralize </script>, so a repo or file name containing that sequence can break out of the script tag and execute arbitrary HTML/JS when the report is opened. Serialize the full arrays/objects and replace raw </script> with <\\/script> before HTML_TEMPLATE.replace(...).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/connector/api/controller/file_graph_controller.rs` around lines 96 - 135,
The code currently injects raw JSON fragments (built into files_json,
edges_json, repos_str) directly into HTML_TEMPLATE which allows a file/repo name
containing "</script>" to break out of the script tag; fix by serializing the
full arrays/objects with serde_json::to_string (e.g. produce a single JSON
string for the files array and edges array instead of manual string fragments)
or, if keeping the string assembly, run a replace on the serialized JSON to
escape script end tags (replace "</script>" with "<\\/script>") before calling
HTML_TEMPLATE.replace; update the spots that build files_json/edges_json and
repos_str (and the final .replace calls) so the injected values are the fully
serialized-and-escaped JSON strings.

32-39: ⚠️ Potential issue | 🟠 Major

Let the selected renderer handle empty graphs.

This plain-text return makes --format json, dot, mermaid, and html stop honoring their output contract exactly when the graph is empty. Return the renderer's empty representation instead of short-circuiting here.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/connector/api/controller/file_graph_controller.rs` around lines 32 - 39,
The code currently short-circuits on graph.is_empty() and returns a hard-coded
plaintext message; instead remove this special-case return and let the chosen
renderer produce the empty output. Replace the early return so that when
graph.is_empty() you call the renderer's output path (e.g., invoke the same
render/export function you use for non-empty graphs) or pass the empty graph
into the existing rendering pipeline so formats like JSON, DOT, Mermaid, and
HTML can emit their canonical empty representations; update the branch that
references graph.is_empty() and the surrounding return logic so it delegates to
the renderer rather than returning the plaintext tip.

469-475: ⚠️ Potential issue | 🟠 Major

Use lossless renderer IDs.

Replacing every non-alphanumeric with _ still merges distinct inputs like src/a-b.rs and src/a/b.rs in both DOT and Mermaid. Encode the full key losslessly instead of normalizing characters away.

Also applies to: 489-493

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/connector/api/controller/file_graph_controller.rs` around lines 469 -
475, dot_node_id currently normalizes path by replacing non-alphanumerics with
'_' which collapses distinct paths; change it (and the equivalent ID generator
used for the Mermaid renderer at the other occurrence) to produce lossless
renderer IDs by encoding the full path instead of stripping characters (for
example using a deterministic encoding like base64-url or hex) and prefixing
with "n_". Ensure the encoder is deterministic, URL/identifier-safe (no padding
or use URL-safe variant), and used consistently by both DOT and Mermaid ID
generators so distinct inputs like "src/a-b.rs" and "src/a/b.rs" remain unique.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/connector/api/controller/file_graph_controller.rs`:
- Around line 386-392: aggregate_to_directories currently represents repository
root as the empty string (""), which later becomes a blank node label; change
the code to normalize that sentinel to a visible root label (e.g., "/" or ".")
wherever directory path strings are produced or used: in the loop inside
aggregate_to_directories use file_dir(&edge.from_file) and
file_dir(&edge.to_file) but replace "" with "/" (or ".") before constructing
keys for edge_map and before any node emission; apply the same normalization in
the other affected blocks referenced (the similar loops/emitters around the
other two ranges) so that edge keys and node labels never use an empty-string
root.
- Around line 92-135: The HTML UI always initializes the weight controls to 1
while edges below the requested min_weight were already filtered server-side;
update the template plumbing so the actual min_weight is passed into the HTML.
Modify format_html in file_graph_controller.rs to accept a min_weight parameter
(or retrieve it from the FileGraph/caller) and add a replacement for a new
template placeholder (e.g., "__MIN_WEIGHT__") when building the HTML_TEMPLATE
string so the generated page sets currentMinWeight, the input value, and the
label to the real threshold; apply the same change to the other
template-producing functions referenced (the other format/HTML builders around
the same file) so the UI reflects the server-side min_weight.
- Around line 19-30: The controller method graph currently hardcodes
include_cross_repo=true when calling FileRelationshipUseCase::build_graph; add a
cross_repo boolean parameter to the graph method signature (e.g., cross_repo:
bool), thread that value through to the call produced by
container.file_graph_use_case() by replacing the hardcoded true with cross_repo
(use_case.build_graph(repository.as_deref(), min_weight, cross_repo).await?),
and update any callers (CLI/route bindings) to supply the desired cross-repo
flag so the controller no longer forces cross-repo behavior.
- Around line 10-818: The FileGraphController file is too large and mixes
controller logic with rendering/export templates; extract the HTML/JS template
and the three renderer functions into their own modules/files and keep the
controller focused on orchestration. Specifically, move format_html (including
HTML_TEMPLATE), format_dot, format_mermaid, and related helpers (dot_node_id,
dot_escape, dot_emit_node, mermaid_id, mermaid_escape, mermaid_emit_node, and
any JS/CSS string constants) into a new renderer module (e.g.,
file_graph_renderer) and export functions like render_html, render_dot,
render_mermaid; keep aggregate_to_directories, file_dir, short_path,
file_to_repo, files_by_repo, consumer_map in the controller or utility module as
appropriate; update FileGraphController::graph to call the new renderer
functions and import the renderer module, and ensure tests and visibility
(pub(crate)/pub) are adjusted for the moved symbols.

---

Duplicate comments:
In `@src/connector/api/controller/file_graph_controller.rs`:
- Around line 96-135: The code currently injects raw JSON fragments (built into
files_json, edges_json, repos_str) directly into HTML_TEMPLATE which allows a
file/repo name containing "</script>" to break out of the script tag; fix by
serializing the full arrays/objects with serde_json::to_string (e.g. produce a
single JSON string for the files array and edges array instead of manual string
fragments) or, if keeping the string assembly, run a replace on the serialized
JSON to escape script end tags (replace "</script>" with "<\\/script>") before
calling HTML_TEMPLATE.replace; update the spots that build files_json/edges_json
and repos_str (and the final .replace calls) so the injected values are the
fully serialized-and-escaped JSON strings.
- Around line 32-39: The code currently short-circuits on graph.is_empty() and
returns a hard-coded plaintext message; instead remove this special-case return
and let the chosen renderer produce the empty output. Replace the early return
so that when graph.is_empty() you call the renderer's output path (e.g., invoke
the same render/export function you use for non-empty graphs) or pass the empty
graph into the existing rendering pipeline so formats like JSON, DOT, Mermaid,
and HTML can emit their canonical empty representations; update the branch that
references graph.is_empty() and the surrounding return logic so it delegates to
the renderer rather than returning the plaintext tip.
- Around line 469-475: dot_node_id currently normalizes path by replacing
non-alphanumerics with '_' which collapses distinct paths; change it (and the
equivalent ID generator used for the Mermaid renderer at the other occurrence)
to produce lossless renderer IDs by encoding the full path instead of stripping
characters (for example using a deterministic encoding like base64-url or hex)
and prefixing with "n_". Ensure the encoder is deterministic,
URL/identifier-safe (no padding or use URL-safe variant), and used consistently
by both DOT and Mermaid ID generators so distinct inputs like "src/a-b.rs" and
"src/a/b.rs" remain unique.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 9949003f-3ae9-45d2-8785-3807c8f0aec2

📥 Commits

Reviewing files that changed from the base of the PR and between 31db2a1 and 8bf0181.

📒 Files selected for processing (1)
  • src/connector/api/controller/file_graph_controller.rs

Comment on lines +10 to +818
pub struct FileGraphController<'a> {
container: &'a Container,
}

impl<'a> FileGraphController<'a> {
pub fn new(container: &'a Container) -> Self {
Self { container }
}

pub async fn graph(
&self,
repository: Option<Vec<String>>,
format: GraphFormat,
granularity: NodeGranularity,
min_weight: usize,
cluster: ClusterMode,
) -> Result<String> {
let use_case = self.container.file_graph_use_case();
let graph = use_case
.build_graph(repository.as_deref(), min_weight, true)
.await?;

if graph.is_empty() {
return Ok(
"No file dependency edges found. Make sure at least one repository is \
indexed (codesearch index <path>) and has resolvable symbol references.\n\
Tip: try --granularity directory to aggregate files into directories first."
.to_string(),
);
}

// Optionally collapse file-level nodes to directory-level.
let graph = match granularity {
NodeGranularity::File => graph,
NodeGranularity::Directory => aggregate_to_directories(graph),
};

Ok(match format {
GraphFormat::Html => Self::format_html(&graph, cluster),
GraphFormat::Dot => Self::format_dot(&graph, cluster),
GraphFormat::Mermaid => Self::format_mermaid(&graph, cluster),
GraphFormat::Json => serde_json::to_string_pretty(&graph)?,
})
}

// ── Shared helpers ────────────────────────────────────────────────────

fn file_to_repo<'g>(graph: &'g FileGraph) -> HashMap<&'g str, &'g str> {
let mut map: HashMap<&str, &str> = HashMap::new();
for edge in &graph.edges {
map.entry(edge.from_file.as_str())
.or_insert(edge.from_repo_id.as_str());
map.entry(edge.to_file.as_str())
.or_insert(edge.to_repo_id.as_str());
}
map
}

fn files_by_repo<'g>(graph: &'g FileGraph) -> BTreeMap<&'g str, BTreeSet<&'g str>> {
let file_repo = Self::file_to_repo(graph);
let mut map: BTreeMap<&str, BTreeSet<&str>> = BTreeMap::new();
for (file, repo) in &file_repo {
map.entry(repo).or_default().insert(file);
}
map
}

/// For each file, return the set of *other* repo_ids whose edges point to it.
fn consumer_map<'g>(graph: &'g FileGraph) -> HashMap<&'g str, BTreeSet<&'g str>> {
let mut map: HashMap<&str, BTreeSet<&str>> = HashMap::new();
for edge in &graph.edges {
if edge.from_repo_id != edge.to_repo_id {
map.entry(edge.to_file.as_str())
.or_default()
.insert(edge.from_repo_id.as_str());
}
}
map
}

// ── Interactive HTML renderer (Sigma.js) ─────────────────────────────

fn format_html(graph: &FileGraph, _cluster: ClusterMode) -> String {
let file_repo = Self::file_to_repo(graph);

// Flat file list — layout computed client-side
let files_json: Vec<String> = {
let mut sorted: Vec<&str> = graph.files.iter().map(String::as_str).collect();
sorted.sort();
sorted
.into_iter()
.map(|f| {
let repo_id = file_repo.get(f).copied().unwrap_or("");
format!(
r#"{{"id":{},"label":{},"repoId":{}}}"#,
serde_json::to_string(f).unwrap_or_default(),
serde_json::to_string(&short_path(f)).unwrap_or_default(),
serde_json::to_string(repo_id).unwrap_or_default(),
)
})
.collect()
};

let edges_json: Vec<String> = graph
.edges
.iter()
.map(|e| {
format!(
r#"{{"source":{},"target":{},"weight":{},"kinds":{}}}"#,
serde_json::to_string(&e.from_file).unwrap_or_default(),
serde_json::to_string(&e.to_file).unwrap_or_default(),
e.weight,
serde_json::to_string(&e.reference_kinds.join(", ")).unwrap_or_default(),
)
})
.collect();

let repos_str = serde_json::to_string(&graph.repositories).unwrap_or_default();
let max_weight = graph.edges.iter().map(|e| e.weight).max().unwrap_or(1);

HTML_TEMPLATE
.replace("__FILES__", &format!("[{}]", files_json.join(",")))
.replace("__EDGES__", &format!("[{}]", edges_json.join(",")))
.replace("__REPOS__", &repos_str)
.replace("__MAX_WEIGHT__", &max_weight.to_string())
}

// ── DOT renderer ──────────────────────────────────────────────────────

fn format_dot(graph: &FileGraph, cluster: ClusterMode) -> String {
let mut out = String::new();
out.push_str(
"digraph file_dependencies {\n\
\trankdir=LR;\n\
\tnode [shape=box fontname=\"Helvetica\" fontsize=10];\n\
\tedge [fontsize=9];\n\n",
);

let repo_files = Self::files_by_repo(graph);

for (repo_idx, (repo_id, files)) in repo_files.iter().enumerate() {
let repo_name = graph
.repositories
.get(*repo_id)
.map(|r| r.name.as_str())
.unwrap_or(repo_id);

out.push_str(&format!(
"\tsubgraph cluster_r{idx} {{\n\
\t\tlabel=\"{label}\";\n\
\t\tstyle=filled;\n\
\t\tcolor=lightblue;\n\
\t\tfontname=\"Helvetica Bold\";\n\
\t\tfontsize=12;\n\n",
idx = repo_idx,
label = dot_escape(repo_name),
));

match cluster {
ClusterMode::None => {
for file in files {
dot_emit_node(&mut out, file, "\t\t");
}
}
ClusterMode::Directory => {
Self::dot_emit_dir_subclusters(&mut out, repo_idx, files);
}
ClusterMode::Consumer => {
let consumers = Self::consumer_map(graph);
Self::dot_emit_consumer_subclusters(
&mut out, repo_idx, repo_id, files, &consumers, graph,
);
}
}

out.push_str("\t}\n\n");
}

for edge in &graph.edges {
let attrs = if edge.weight > 1 {
format!(" [label=\"{}\"]", edge.weight)
} else {
String::new()
};
out.push_str(&format!(
"\t{} -> {}{};\n",
dot_node_id(&edge.from_file),
dot_node_id(&edge.to_file),
attrs,
));
}

out.push_str("}\n");
out
}

fn dot_emit_dir_subclusters(out: &mut String, repo_idx: usize, files: &BTreeSet<&str>) {
let mut by_dir: BTreeMap<&str, Vec<&str>> = BTreeMap::new();
for file in files {
by_dir.entry(file_dir(file)).or_default().push(file);
}
for (dir_idx, (dir, dir_files)) in by_dir.iter().enumerate() {
let label = if dir.is_empty() { "/" } else { dir };
out.push_str(&format!(
"\t\tsubgraph cluster_r{ri}_d{di} {{\n\
\t\t\tlabel=\"{label}\";\n\
\t\t\tstyle=filled;\n\
\t\t\tcolor=lightyellow;\n\
\t\t\tfontsize=10;\n\n",
ri = repo_idx,
di = dir_idx,
label = dot_escape(label),
));
for file in dir_files {
dot_emit_node(out, file, "\t\t\t");
}
out.push_str("\t\t}\n");
}
}

fn dot_emit_consumer_subclusters(
out: &mut String,
repo_idx: usize,
_repo_id: &str,
files: &BTreeSet<&str>,
consumers: &HashMap<&str, BTreeSet<&str>>,
graph: &FileGraph,
) {
let mut by_consumers: BTreeMap<Vec<&str>, Vec<&str>> = BTreeMap::new();
for file in files {
let key: Vec<&str> = consumers
.get(file)
.map(|s| s.iter().copied().collect())
.unwrap_or_default();
by_consumers.entry(key).or_default().push(file);
}
for (grp_idx, (consumer_ids, grp_files)) in by_consumers.iter().enumerate() {
let label = if consumer_ids.is_empty() {
"internal".to_string()
} else {
let names: Vec<&str> = consumer_ids
.iter()
.map(|id| {
graph
.repositories
.get(*id)
.map(|r| r.name.as_str())
.unwrap_or(id)
})
.collect();
format!("← {}", names.join(", "))
};
out.push_str(&format!(
"\t\tsubgraph cluster_r{ri}_c{ci} {{\n\
\t\t\tlabel=\"{label}\";\n\
\t\t\tstyle=filled;\n\
\t\t\tcolor={color};\n\
\t\t\tfontsize=10;\n\n",
ri = repo_idx,
ci = grp_idx,
label = dot_escape(&label),
color = if consumer_ids.is_empty() { "white" } else { "lightgreen" },
));
for file in grp_files {
dot_emit_node(out, file, "\t\t\t");
}
out.push_str("\t\t}\n");
}
}

// ── Mermaid renderer ──────────────────────────────────────────────────

fn format_mermaid(graph: &FileGraph, cluster: ClusterMode) -> String {
let mut out = String::from("graph LR\n");
let repo_files = Self::files_by_repo(graph);
let consumers = Self::consumer_map(graph);

for (repo_id, files) in &repo_files {
let repo_name = graph
.repositories
.get(*repo_id)
.map(|r| r.name.as_str())
.unwrap_or(repo_id);
out.push_str(&format!(
" subgraph {id}[\"{label}\"]\n",
id = mermaid_id(repo_id),
label = mermaid_escape(repo_name),
));
match cluster {
ClusterMode::None => {
for file in files {
mermaid_emit_node(&mut out, file, " ");
}
}
ClusterMode::Directory => {
let mut by_dir: BTreeMap<&str, Vec<&str>> = BTreeMap::new();
for file in files {
by_dir.entry(file_dir(file)).or_default().push(file);
}
for (dir, dir_files) in &by_dir {
let label = if dir.is_empty() { "/" } else { dir };
let id = format!("{}_{}", mermaid_id(repo_id), mermaid_id(label));
out.push_str(&format!(
" subgraph {id}[\"{label}\"]\n",
id = id,
label = mermaid_escape(label),
));
for file in dir_files {
mermaid_emit_node(&mut out, file, " ");
}
out.push_str(" end\n");
}
}
ClusterMode::Consumer => {
let mut by_consumers: BTreeMap<Vec<&str>, Vec<&str>> = BTreeMap::new();
for file in files {
let key: Vec<&str> = consumers
.get(file)
.map(|s| s.iter().copied().collect())
.unwrap_or_default();
by_consumers.entry(key).or_default().push(file);
}
for (grp_idx, (consumer_ids, grp_files)) in by_consumers.iter().enumerate() {
let label = if consumer_ids.is_empty() {
"internal".to_string()
} else {
let names: Vec<&str> = consumer_ids
.iter()
.map(|id| {
graph
.repositories
.get(*id)
.map(|r| r.name.as_str())
.unwrap_or(id)
})
.collect();
format!("used by {}", names.join(", "))
};
let id = format!("{}_c{}", mermaid_id(repo_id), grp_idx);
out.push_str(&format!(
" subgraph {id}[\"{label}\"]\n",
id = id,
label = mermaid_escape(&label),
));
for file in grp_files {
mermaid_emit_node(&mut out, file, " ");
}
out.push_str(" end\n");
}
}
}
out.push_str(" end\n");
}

out.push('\n');
for edge in &graph.edges {
let arrow = if edge.weight > 1 {
format!(" -- {} --> ", edge.weight)
} else {
" --> ".to_string()
};
out.push_str(&format!(
" {}{}{}\n",
mermaid_id(&edge.from_file),
arrow,
mermaid_id(&edge.to_file),
));
}
out
}
}

// ── Directory aggregation ─────────────────────────────────────────────────────

/// Collapse file-level `FileGraph` to directory-level by replacing each
/// `file_path` with its parent directory and re-summing edge weights.
fn aggregate_to_directories(graph: FileGraph) -> FileGraph {
let mut edge_map: HashMap<(String, String, String, String), (usize, HashSet<String>)> =
HashMap::new();

for edge in &graph.edges {
let from = file_dir(&edge.from_file).to_string();
let to = file_dir(&edge.to_file).to_string();
// Skip self-loops that appear after collapsing (files in same directory).
if from == to && edge.from_repo_id == edge.to_repo_id {
continue;
}
let key = (
from,
edge.from_repo_id.clone(),
to,
edge.to_repo_id.clone(),
);
let entry = edge_map.entry(key).or_insert((0, HashSet::new()));
entry.0 += edge.weight;
entry.1.extend(edge.reference_kinds.iter().cloned());
}

let mut edges: Vec<FileEdge> = edge_map
.into_iter()
.map(
|((from_file, from_repo_id, to_file, to_repo_id), (weight, kinds))| FileEdge {
from_file,
from_repo_id,
to_file,
to_repo_id,
weight,
reference_kinds: {
let mut v: Vec<String> = kinds.into_iter().collect();
v.sort();
v
},
},
)
.collect();
edges.sort_by(|a, b| {
b.weight
.cmp(&a.weight)
.then(a.from_file.cmp(&b.from_file))
.then(a.to_file.cmp(&b.to_file))
});

let files: HashSet<String> = edges
.iter()
.flat_map(|e| [e.from_file.clone(), e.to_file.clone()])
.collect();

FileGraph {
repositories: graph.repositories,
files,
edges,
}
}

// ── Rendering helpers ─────────────────────────────────────────────────────────

fn file_dir(path: &str) -> &str {
match path.rfind('/') {
Some(pos) => &path[..pos],
None => "",
}
}

fn short_path(path: &str) -> &str {
let bytes = path.as_bytes();
let mut slash_count = 0;
let mut last_pos = None;
for (i, &b) in bytes.iter().enumerate().rev() {
if b == b'/' {
slash_count += 1;
if slash_count == 2 {
last_pos = Some(i + 1);
break;
}
}
}
last_pos.map(|p| &path[p..]).unwrap_or(path)
}

fn dot_node_id(path: &str) -> String {
let s: String = path
.chars()
.map(|c| if c.is_alphanumeric() { c } else { '_' })
.collect();
format!("n_{s}")
}

fn dot_escape(s: &str) -> String {
s.replace('\\', "\\\\").replace('"', "\\\"")
}

fn dot_emit_node(out: &mut String, file: &str, indent: &str) {
out.push_str(&format!(
"{indent}{id} [label=\"{label}\"];\n",
id = dot_node_id(file),
label = dot_escape(short_path(file)),
));
}

fn mermaid_id(s: &str) -> String {
s.chars()
.map(|c| if c.is_alphanumeric() { c } else { '_' })
.collect()
}

fn mermaid_escape(s: &str) -> String {
s.replace('"', "'")
}

fn mermaid_emit_node(out: &mut String, file: &str, indent: &str) {
out.push_str(&format!(
"{indent}{id}[\"{label}\"]\n",
id = mermaid_id(file),
label = mermaid_escape(short_path(file)),
));
}

// ── HTML template (Sigma.js v2) ───────────────────────────────────────────────

const HTML_TEMPLATE: &str = r#"<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>File Dependency Graph — codesearch</title>
<style>
* { box-sizing: border-box; margin: 0; padding: 0; }
body { display: flex; height: 100vh; overflow: hidden; background: #0d1117; color: #c9d1d9; font-family: -apple-system, BlinkMacSystemFont, "Segoe UI", sans-serif; font-size: 13px; }
#sidebar { width: 260px; min-width: 200px; background: #161b22; border-right: 1px solid #30363d; display: flex; flex-direction: column; overflow: hidden; }
#sidebar-inner { flex: 1; overflow-y: auto; padding: 14px; display: flex; flex-direction: column; gap: 14px; }
h1 { font-size: 13px; font-weight: 600; color: #f0f6fc; }
.section-label { font-size: 10px; font-weight: 600; color: #8b949e; text-transform: uppercase; letter-spacing: .6px; margin-bottom: 4px; }
.repo-item { display: flex; align-items: center; gap: 8px; padding: 5px 8px; border-radius: 6px; cursor: pointer; transition: background .15s; }
.repo-item:hover { background: #21262d; }
.repo-item.active { background: #21262d; }
.repo-dot { width: 9px; height: 9px; border-radius: 50%; flex-shrink: 0; }
.repo-name { color: #c9d1d9; overflow: hidden; text-overflow: ellipsis; white-space: nowrap; font-size: 12px; }
input[type="text"] { width: 100%; padding: 6px 10px; border-radius: 6px; background: #0d1117; border: 1px solid #30363d; color: #c9d1d9; font-size: 12px; outline: none; }
input[type="text"]:focus { border-color: #58a6ff; }
.stat-row { display: flex; justify-content: space-between; font-size: 12px; }
.stat-val { font-weight: 600; color: #58a6ff; }
input[type="range"] { width: 100%; accent-color: #58a6ff; }
.range-label { font-size: 11px; color: #8b949e; }
#hint { padding: 10px 14px; font-size: 10px; color: #6e7681; border-top: 1px solid #30363d; line-height: 1.6; }
#graph-area { flex: 1; position: relative; }
#sigma-container { width: 100%; height: 100%; }
#tooltip { position: absolute; pointer-events: none; display: none; z-index: 100; background: #161b22; border: 1px solid #30363d; border-radius: 6px; padding: 8px 12px; font-size: 11px; color: #c9d1d9; max-width: 320px; word-break: break-all; line-height: 1.5; box-shadow: 0 4px 12px rgba(0,0,0,0.4); }
</style>
</head>
<body>
<div id="sidebar">
<div id="sidebar-inner">
<h1>File Dependencies</h1>
<div>
<div class="section-label">Repositories</div>
<div id="repo-list"></div>
</div>
<div>
<div class="section-label">Search</div>
<input type="text" id="search-input" placeholder="Filter by file path…">
</div>
<div>
<div class="section-label">Stats</div>
<div id="stats" style="display:flex;flex-direction:column;gap:3px;"></div>
</div>
<div>
<div class="section-label">Min edge weight</div>
<input type="range" id="weight-slider" min="1" max="__MAX_WEIGHT__" value="1">
<div class="range-label" id="weight-label">≥ 1 reference</div>
</div>
</div>
<div id="hint">Hover a node — highlights its repo group + direct calls<br>Scroll to zoom · drag to pan · click background to reset</div>
</div>
<div id="graph-area">
<div id="sigma-container"></div>
<div id="tooltip"></div>
</div>

<script src="https://cdn.jsdelivr.net/npm/graphology@0.25.4/dist/graphology.umd.min.js"></script>
<script src="https://cdn.jsdelivr.net/npm/sigma@2.4.0/build/sigma.min.js"></script>
<script>
// ── Utilities ──────────────────────────────────────────────────────────────────
function escHtml(s) { return String(s).replace(/&/g,'&amp;').replace(/</g,'&lt;').replace(/>/g,'&gt;'); }
function hexRgb(h) { return [parseInt(h.slice(1,3),16),parseInt(h.slice(3,5),16),parseInt(h.slice(5,7),16)]; }
function withAlpha(h,a) { var c=hexRgb(h); return 'rgba('+c[0]+','+c[1]+','+c[2]+','+a+')'; }
function dimColor(c) { if(!c)return '#2d333b'; try{var r=hexRgb(c);return 'rgba('+r[0]+','+r[1]+','+r[2]+',0.15)';}catch(e){return '#2d333b';} }

// ── State ──────────────────────────────────────────────────────────────────────
var activeRepoId = null;
var currentMinWeight = 1;

// ── Data ───────────────────────────────────────────────────────────────────────
const FILES = __FILES__;
const EDGES_DATA = __EDGES__;
const REPOS = __REPOS__;
const MAX_WEIGHT = __MAX_WEIGHT__;

const PALETTE = ['#388bfd','#3fb950','#d29922','#f78166','#a371f7','#ffa657','#79c0ff','#ff7b72','#56d364','#db6d28'];
const repoIds = Object.keys(REPOS);
const repoColor = {};
repoIds.forEach(function(id,i){ repoColor[id] = PALETTE[i % PALETTE.length]; });

// ── Sidebar ────────────────────────────────────────────────────────────────────
const repoList = document.getElementById('repo-list');
repoIds.forEach(function(id) {
const repo = REPOS[id];
const el = document.createElement('div');
el.className = 'repo-item';
el.dataset.repoId = id;
el.innerHTML = '<div class="repo-dot" style="background:'+repoColor[id]+'"></div><span class="repo-name">'+escHtml(repo ? repo.name : id)+'</span>';
el.addEventListener('click', function() { toggleRepo(id, el); });
repoList.appendChild(el);
});

function toggleRepo(id, el) {
if (activeRepoId === id) {
activeRepoId = null;
document.querySelectorAll('.repo-item').forEach(function(r){r.classList.remove('active');});
restoreColors();
return;
}
document.querySelectorAll('.repo-item').forEach(function(r){r.classList.remove('active');});
el.classList.add('active');
activeRepoId = id;
isolateRepo(id);
}

// ── Build graphology graph ─────────────────────────────────────────────────────
const graph = new graphology.Graph({ multi: false, type: 'directed' });

// Layout: repos in a ring, files in sub-circles
const REPO_R = repoIds.length === 1 ? 0 : 650;
const repoCenters = {};
repoIds.forEach(function(id, i) {
var angle = (2 * Math.PI * i) / Math.max(repoIds.length, 1) - Math.PI / 2;
repoCenters[id] = { x: REPO_R * Math.cos(angle), y: REPO_R * Math.sin(angle) };
});

// Group files by repo, assign positions
var filesByRepo = {};
FILES.forEach(function(f) {
if (!filesByRepo[f.repoId]) filesByRepo[f.repoId] = [];
filesByRepo[f.repoId].push(f);
});

FILES.forEach(function(f) {
var center = repoCenters[f.repoId] || { x: 0, y: 0 };
var repoFiles = filesByRepo[f.repoId] || [];
var idx = repoFiles.indexOf(f);
var n = repoFiles.length;
var r = n <= 1 ? 0 : Math.max(80, Math.sqrt(n) * 22);
var angle = (2 * Math.PI * idx) / Math.max(n, 1);
graph.addNode(f.id, {
x: center.x + r * Math.cos(angle),
y: center.y + r * Math.sin(angle),
size: 5,
color: repoColor[f.repoId] || '#58a6ff',
label: '', // no label by default; shown only on hover
fullPath: f.id,
shortLabel: f.label,
repoId: f.repoId,
origColor: repoColor[f.repoId] || '#58a6ff',
});
});

const EDGE_DEFAULT = '#3d444d';
const EDGE_DIM = '#1c2128';

function addEdges() {
EDGES_DATA.forEach(function(e, i) {
if (e.weight < currentMinWeight) return;
if (!graph.hasNode(e.source) || !graph.hasNode(e.target)) return;
// Skip if an edge between this source→target already exists
// (can happen when the same file pair appears in multiple repo contexts)
if (graph.hasEdge(e.source, e.target)) return;
graph.addEdgeWithKey('e'+i, e.source, e.target, {
size: Math.min(3, 0.6 + (e.weight / MAX_WEIGHT) * 2.5),
color: EDGE_DEFAULT,
weight: e.weight,
kinds: e.kinds,
});
});
}
addEdges();

// ── Sigma ──────────────────────────────────────────────────────────────────────
const sigmaContainer = document.getElementById('sigma-container');
const renderer = new Sigma(graph, sigmaContainer, {
renderEdgeLabels: false,
defaultEdgeColor: '#3d444d',
defaultNodeColor: '#58a6ff',
labelColor: { color: '#c9d1d9' },
labelSize: 11,
labelRenderedSizeThreshold: 999,
minCameraRatio: 0.04,
maxCameraRatio: 14,
});

// ── Color helpers ──────────────────────────────────────────────────────────────
function restoreColors() {
graph.forEachNode(function(n,a){ graph.setNodeAttribute(n,'color',a.origColor); });
graph.forEachEdge(function(e){ graph.setEdgeAttribute(e,'color',EDGE_DEFAULT); });
}

// Highlight: same repo group + direct neighbours of hovered node.
// Edges are shown only when they are a direct connection to/from hovered.
function highlightNode(nodeKey) {
var attrs = graph.getNodeAttributes(nodeKey);
var hovRepoId = attrs.repoId;

// Build lit set: same-repo nodes
var lit = new Set();
graph.forEachNode(function(n,a){
if (a.repoId === hovRepoId) lit.add(n);
});
// Add direct neighbours (can be in other repos)
graph.forEachNeighbor(nodeKey, function(n){ lit.add(n); });

// Apply node dimming
graph.forEachNode(function(n,a){
graph.setNodeAttribute(n,'color', lit.has(n) ? a.origColor : dimColor(a.origColor));
});
// Highlight only edges directly connected to the hovered node
var accentColor = repoColor[hovRepoId] || '#58a6ff';
graph.forEachEdge(function(ek,a,src,tgt){
var direct = (src === nodeKey || tgt === nodeKey);
graph.setEdgeAttribute(ek,'color', direct ? accentColor : EDGE_DIM);
});
}

// Isolate a repo (sidebar click): highlight that repo's nodes + outbound edges
function isolateRepo(id) {
graph.forEachNode(function(n,a){
graph.setNodeAttribute(n,'color', a.repoId===id ? a.origColor : dimColor(a.origColor));
});
graph.forEachEdge(function(e,a,src,tgt){
var sa = graph.getNodeAttributes(src), ta = graph.getNodeAttributes(tgt);
graph.setEdgeAttribute(e,'color', (sa.repoId===id || ta.repoId===id) ? repoColor[id] : EDGE_DIM);
});
}

// ── Stats ──────────────────────────────────────────────────────────────────────
function updateStats() {
var edgeCount = 0;
graph.forEachEdge(function(){ edgeCount++; });
document.getElementById('stats').innerHTML =
'<div class="stat-row"><span>Files/dirs</span><span class="stat-val">'+graph.order+'</span></div>'+
'<div class="stat-row"><span>Edges</span><span class="stat-val">'+edgeCount+'</span></div>'+
'<div class="stat-row"><span>Repositories</span><span class="stat-val">'+repoIds.length+'</span></div>';
}
updateStats();

// ── Tooltip ────────────────────────────────────────────────────────────────────
const tooltip = document.getElementById('tooltip');

renderer.on('enterNode', function(e) {
var attrs = graph.getNodeAttributes(e.node);
var repoName = REPOS[attrs.repoId] ? REPOS[attrs.repoId].name : attrs.repoId;
tooltip.innerHTML =
'<strong>'+escHtml(attrs.fullPath || e.node)+'</strong>'+
'<br><span style="color:'+repoColor[attrs.repoId]+'">'+escHtml(repoName)+'</span>';
tooltip.style.left = (e.event.x + 14) + 'px';
tooltip.style.top = (e.event.y + 14) + 'px';
tooltip.style.display = 'block';
// Only highlight on hover when no sidebar repo is active
if (!activeRepoId) highlightNode(e.node);
});

renderer.on('leaveNode', function() {
tooltip.style.display = 'none';
if (!activeRepoId) restoreColors();
});

renderer.on('enterEdge', function(e) {
var a = graph.getEdgeAttributes(e.edge);
tooltip.innerHTML = '<strong>'+a.weight+'</strong> ref'+(a.weight!==1?'s':'')+
(a.kinds ? '<br><span style="color:#8b949e">'+escHtml(a.kinds)+'</span>' : '');
tooltip.style.display = 'block';
});
renderer.on('leaveEdge', function() { tooltip.style.display = 'none'; });

// ── Sidebar repo isolation ─────────────────────────────────────────────────────
function toggleRepo(id, el) {
if (activeRepoId === id) {
activeRepoId = null;
document.querySelectorAll('.repo-item').forEach(function(r){r.classList.remove('active');});
restoreColors();
return;
}
document.querySelectorAll('.repo-item').forEach(function(r){r.classList.remove('active');});
el.classList.add('active');
activeRepoId = id;
isolateRepo(id);
}

renderer.on('clickStage', function() {
activeRepoId = null;
document.querySelectorAll('.repo-item').forEach(function(r){r.classList.remove('active');});
restoreColors();
});

// ── Search ─────────────────────────────────────────────────────────────────────
document.getElementById('search-input').addEventListener('input', function(e) {
var q = e.target.value.trim().toLowerCase();
if (!q) { restoreColors(); return; }
activeRepoId = null;
document.querySelectorAll('.repo-item').forEach(function(r){r.classList.remove('active');});
var matched = new Set();
graph.forEachNode(function(n,a){ if((a.fullPath||'').toLowerCase().includes(q)) matched.add(n); });
graph.forEachNode(function(n,a){
graph.setNodeAttribute(n,'color', matched.has(n) ? a.origColor : dimColor(a.origColor));
});
graph.forEachEdge(function(e,a,src,tgt){
graph.setEdgeAttribute(e,'color', matched.has(src)&&matched.has(tgt) ? EDGE_DEFAULT : EDGE_DIM);
});
});

// ── Weight slider ──────────────────────────────────────────────────────────────
document.getElementById('weight-slider').addEventListener('input', function() {
currentMinWeight = parseInt(this.value);
document.getElementById('weight-label').textContent = '\u2265 '+currentMinWeight+' reference'+(currentMinWeight!==1?'s':'');
graph.forEachEdge(function(e){ graph.dropEdge(e); });
addEdges();
updateStats();
});
</script>
</body>
</html>
"#;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major

Split the renderer/template code out of this controller.

This file now mixes controller flow, directory aggregation, three exporters, and an embedded HTML app in one ~800-line module. Moving the renderers and template into dedicated modules will make this much easier to test and maintain.

As per coding guidelines, "Keep one logical concept per file. If a file grows beyond ~300 lines, split it".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/connector/api/controller/file_graph_controller.rs` around lines 10 - 818,
The FileGraphController file is too large and mixes controller logic with
rendering/export templates; extract the HTML/JS template and the three renderer
functions into their own modules/files and keep the controller focused on
orchestration. Specifically, move format_html (including HTML_TEMPLATE),
format_dot, format_mermaid, and related helpers (dot_node_id, dot_escape,
dot_emit_node, mermaid_id, mermaid_escape, mermaid_emit_node, and any JS/CSS
string constants) into a new renderer module (e.g., file_graph_renderer) and
export functions like render_html, render_dot, render_mermaid; keep
aggregate_to_directories, file_dir, short_path, file_to_repo, files_by_repo,
consumer_map in the controller or utility module as appropriate; update
FileGraphController::graph to call the new renderer functions and import the
renderer module, and ensure tests and visibility (pub(crate)/pub) are adjusted
for the moved symbols.

Comment on lines +19 to +30
pub async fn graph(
&self,
repository: Option<Vec<String>>,
format: GraphFormat,
granularity: NodeGranularity,
min_weight: usize,
cluster: ClusterMode,
) -> Result<String> {
let use_case = self.container.file_graph_use_case();
let graph = use_case
.build_graph(repository.as_deref(), min_weight, true)
.await?;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Thread the cross-repo switch through the controller.

FileRelationshipUseCase::build_graph already accepts include_cross_repo in src/application/use_cases/file_relationship.rs:38-48, but this controller hardcodes true and has no cross_repo parameter. The CLI cannot vary that behavior per invocation.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/connector/api/controller/file_graph_controller.rs` around lines 19 - 30,
The controller method graph currently hardcodes include_cross_repo=true when
calling FileRelationshipUseCase::build_graph; add a cross_repo boolean parameter
to the graph method signature (e.g., cross_repo: bool), thread that value
through to the call produced by container.file_graph_use_case() by replacing the
hardcoded true with cross_repo (use_case.build_graph(repository.as_deref(),
min_weight, cross_repo).await?), and update any callers (CLI/route bindings) to
supply the desired cross-repo flag so the controller no longer forces cross-repo
behavior.

Comment thread src/connector/api/controller/file_graph_controller.rs Outdated
Comment on lines +386 to +392
fn aggregate_to_directories(graph: FileGraph) -> FileGraph {
let mut edge_map: HashMap<(String, String, String, String), (usize, HashSet<String>)> =
HashMap::new();

for edge in &graph.edges {
let from = file_dir(&edge.from_file).to_string();
let to = file_dir(&edge.to_file).to_string();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Label the repository root explicitly in directory mode.

aggregate_to_directories represents root-level files as "", and the node emitters then feed that through short_path, so the root directory renders as a blank node label in DOT, Mermaid, and HTML. Normalize that sentinel to / or . before emitting nodes.

Also applies to: 446-466, 481-503

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/connector/api/controller/file_graph_controller.rs` around lines 386 -
392, aggregate_to_directories currently represents repository root as the empty
string (""), which later becomes a blank node label; change the code to
normalize that sentinel to a visible root label (e.g., "/" or ".") wherever
directory path strings are produced or used: in the loop inside
aggregate_to_directories use file_dir(&edge.from_file) and
file_dir(&edge.to_file) but replace "" with "/" (or ".") before constructing
keys for edge_map and before any node emission; apply the same normalization in
the other affected blocks referenced (the similar loops/emitters around the
other two ranges) so that edge keys and node labels never use an empty-string
root.

- Replace circumference-only file layout with phyllotaxis (golden-angle
  spiral) so dots fill the entire cluster disc evenly
- Add a large labeled hub node per repo at the cluster centre; size
  scales with sqrt(file count)
- Hover on hub node highlights the full repo cluster + all cross-repo
  connections (files with no relation are dimmed)
- Hover on file node keeps previous behaviour (same-repo + direct calls)
- labelRenderedSizeThreshold: 12 — hub labels always visible, file dots
  (label='') remain silent
- Remove split command: delete split_controller, split_analysis use case
  and domain model, all wiring in router/container/cli/lib/main

https://claude.ai/code/session_016eVRs9KeLDtRw4Gc9XKW9a
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

♻️ Duplicate comments (4)
src/connector/api/controller/file_graph_controller.rs (4)

469-493: ⚠️ Potential issue | 🟠 Major

Make DOT/Mermaid IDs lossless.

Normalizing every non-alphanumeric to _ merges distinct keys like src/a-b.rs and src/a/b.rs. After you switch to repo-scoped node keys, encode them losslessly here too instead of stripping information away.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/connector/api/controller/file_graph_controller.rs` around lines 469 -
493, dot_node_id and mermaid_id currently collapse non-alphanumeric characters
to '_' which loses distinctions (e.g., src/a-b.rs vs src/a/b.rs); change both to
a lossless, reversible encoding (for example percent-encoding or base64
URL-safe) applied to the full path string so different paths map to unique IDs,
and keep dot_node_id still prefixing with "n_" to ensure it’s a valid DOT
identifier; update dot_emit_node (and any consumers like short_path usage) to
use the new encoding functions and ensure dot_escape still escapes label text
but not the identifier encoding.

32-39: ⚠️ Potential issue | 🟠 Major

Return the requested format even when the graph is empty.

This short-circuits to prose before format is checked, so --format json|dot|mermaid|html becomes inconsistent exactly when callers most need a predictable payload.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/connector/api/controller/file_graph_controller.rs` around lines 32 - 39,
The early return for empty graphs (the graph.is_empty() branch that currently
does return Ok("No file dependency edges found...".to_string())) bypasses the
format handling; change it to route through the existing formatting logic
instead of returning plain prose: remove the hardcoded string return and call
the same formatter/serialization used for non-empty graphs (respecting the
format variable such as "json", "dot", "mermaid", "html") so callers always
receive the requested format even when the graph is empty; keep the human tip
(if desired) included as a field or comment inside the formatted output rather
than a raw string response.

130-135: ⚠️ Potential issue | 🔴 Critical

Escape graph JSON before injecting it into <script>.

__FILES__, __EDGES__, and __REPOS__ are interpolated verbatim into executable script content. A repository or file name containing </script> will terminate the block and run arbitrary HTML/JS when the report is opened.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/connector/api/controller/file_graph_controller.rs` around lines 130 -
135, The HTML_TEMPLATE is currently embedding __FILES__, __EDGES__, and
__REPOS__ verbatim into a <script>, allowing a malicious name like "</script>"
to break out; fix by serializing these Rust structures with
serde_json::to_string (or to_string_pretty) to produce valid JSON text and then
escape any occurrences of "</script>" in the resulting JSON strings (e.g.,
replace "</script>" with "<\\/script>") before calling .replace on
HTML_TEMPLATE; update the code paths where HTML_TEMPLATE.replace("__FILES__",
...).replace("__EDGES__", ...).replace("__REPOS__", ...) are used to pass the
escaped JSON strings (referencing HTML_TEMPLATE, __FILES__, __EDGES__,
__REPOS__, and the variables producing
files_json/edges_json/repos_str/max_weight).

19-30: ⚠️ Potential issue | 🟠 Major

Honor the caller's cross-repo setting.

FileRelationshipUseCase::build_graph already accepts include_cross_repo, but this call hardcodes true. That forces cross-repo edges on every invocation and leaves the route/controller path unable to honor the advertised switch.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/connector/api/controller/file_graph_controller.rs` around lines 19 - 30,
The graph handler currently forces cross-repo edges by passing the literal true
into FileRelationshipUseCase::build_graph; change that call to pass the caller's
cross-repo flag instead (i.e., replace the hardcoded true with the incoming
cross-repo parameter or field provided to graph), so
build_graph(repository.as_deref(), min_weight, include_cross_repo). Locate the
graph method in file_graph_controller.rs and ensure the include_cross_repo
variable comes from the controller's input (add a parameter if needed) and is
forwarded to the use_case.build_graph call.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/connector/api/controller/file_graph_controller.rs`:
- Around line 57-88: file_to_repo, files_by_repo and consumer_map treat file
paths as globally unique which causes collisions when different repos contain
the same file name; change these helpers to use a repo-scoped node key (e.g.
combine repo_id and file path into a single key like "{repo_id}:{file}" or a
(repo_id, file) tuple) instead of raw file path. Update file_to_repo to build
and return a map keyed by the repo-scoped node key (and keep repo_id as part of
the value if needed), update files_by_repo to group by repo using the repo
portion of the scoped key or construct keys that include repo_id, and update
consumer_map to insert and lookup consumers using the same repo-scoped node key;
apply the same change to the equivalent helper referenced around lines 103-122
so all renderers/use-sites use consistent repo-scoped node IDs (functions:
file_to_repo, files_by_repo, consumer_map and the duplicated helper at 103-122).
- Line 48: The HTML renderer currently ignores the cluster argument: update the
format_html function to accept and use the cluster parameter (remove the unused
`_cluster`), implement the clustering-dependent behavior in the HTML output
(e.g., filter/group nodes or add cluster-specific attributes/classes based on
the Cluster enum/string), and change all call sites that do
Self::format_html(&graph, cluster) (including the other occurrence noted) to
pass the cluster through. Ensure the function signature and any helper calls are
updated to propagate the cluster value instead of discarding it.
- Around line 767-773: The sidebar file count uses graph.order which includes
synthetic repo hub nodes, so change updateStats to exclude those hubs when
computing "Files/dirs": compute a fileCount by either subtracting repoIds.length
from graph.order (fileCount = graph.order - repoIds.length) or by iterating
graph.forEachNode and counting only nodes whose id does not start with "repo::"
(or that lack the repo-hub flag), then replace graph.order with that fileCount
in the DOM update; ensure you still use repoIds.length for the "Repositories"
stat.

---

Duplicate comments:
In `@src/connector/api/controller/file_graph_controller.rs`:
- Around line 469-493: dot_node_id and mermaid_id currently collapse
non-alphanumeric characters to '_' which loses distinctions (e.g., src/a-b.rs vs
src/a/b.rs); change both to a lossless, reversible encoding (for example
percent-encoding or base64 URL-safe) applied to the full path string so
different paths map to unique IDs, and keep dot_node_id still prefixing with
"n_" to ensure it’s a valid DOT identifier; update dot_emit_node (and any
consumers like short_path usage) to use the new encoding functions and ensure
dot_escape still escapes label text but not the identifier encoding.
- Around line 32-39: The early return for empty graphs (the graph.is_empty()
branch that currently does return Ok("No file dependency edges
found...".to_string())) bypasses the format handling; change it to route through
the existing formatting logic instead of returning plain prose: remove the
hardcoded string return and call the same formatter/serialization used for
non-empty graphs (respecting the format variable such as "json", "dot",
"mermaid", "html") so callers always receive the requested format even when the
graph is empty; keep the human tip (if desired) included as a field or comment
inside the formatted output rather than a raw string response.
- Around line 130-135: The HTML_TEMPLATE is currently embedding __FILES__,
__EDGES__, and __REPOS__ verbatim into a <script>, allowing a malicious name
like "</script>" to break out; fix by serializing these Rust structures with
serde_json::to_string (or to_string_pretty) to produce valid JSON text and then
escape any occurrences of "</script>" in the resulting JSON strings (e.g.,
replace "</script>" with "<\\/script>") before calling .replace on
HTML_TEMPLATE; update the code paths where HTML_TEMPLATE.replace("__FILES__",
...).replace("__EDGES__", ...).replace("__REPOS__", ...) are used to pass the
escaped JSON strings (referencing HTML_TEMPLATE, __FILES__, __EDGES__,
__REPOS__, and the variables producing
files_json/edges_json/repos_str/max_weight).
- Around line 19-30: The graph handler currently forces cross-repo edges by
passing the literal true into FileRelationshipUseCase::build_graph; change that
call to pass the caller's cross-repo flag instead (i.e., replace the hardcoded
true with the incoming cross-repo parameter or field provided to graph), so
build_graph(repository.as_deref(), min_weight, include_cross_repo). Locate the
graph method in file_graph_controller.rs and ensure the include_cross_repo
variable comes from the controller's input (add a parameter if needed) and is
forwarded to the use_case.build_graph call.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: d7f9b79c-28cf-49f6-8132-d7a28febce5a

📥 Commits

Reviewing files that changed from the base of the PR and between 8bf0181 and 3059b80.

📒 Files selected for processing (9)
  • src/application/use_cases/mod.rs
  • src/cli/mod.rs
  • src/connector/api/container.rs
  • src/connector/api/controller/file_graph_controller.rs
  • src/connector/api/controller/mod.rs
  • src/connector/api/router.rs
  • src/domain/models/mod.rs
  • src/lib.rs
  • src/main.rs
✅ Files skipped from review due to trivial changes (4)
  • src/application/use_cases/mod.rs
  • src/main.rs
  • src/domain/models/mod.rs
  • src/connector/api/controller/mod.rs
🚧 Files skipped from review as they are similar to previous changes (3)
  • src/connector/api/container.rs
  • src/cli/mod.rs
  • src/lib.rs

};

Ok(match format {
GraphFormat::Html => Self::format_html(&graph, cluster),
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

--cluster currently does nothing for HTML output.

graph() forwards cluster, but format_html discards it via _cluster. Since HTML is the default format, codesearch graph --cluster directory|consumer silently renders the same view as --cluster none.

Also applies to: 92-92

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/connector/api/controller/file_graph_controller.rs` at line 48, The HTML
renderer currently ignores the cluster argument: update the format_html function
to accept and use the cluster parameter (remove the unused `_cluster`),
implement the clustering-dependent behavior in the HTML output (e.g.,
filter/group nodes or add cluster-specific attributes/classes based on the
Cluster enum/string), and change all call sites that do
Self::format_html(&graph, cluster) (including the other occurrence noted) to
pass the cluster through. Ensure the function signature and any helper calls are
updated to propagate the cluster value instead of discarding it.

Comment on lines +57 to +88
fn file_to_repo<'g>(graph: &'g FileGraph) -> HashMap<&'g str, &'g str> {
let mut map: HashMap<&str, &str> = HashMap::new();
for edge in &graph.edges {
map.entry(edge.from_file.as_str())
.or_insert(edge.from_repo_id.as_str());
map.entry(edge.to_file.as_str())
.or_insert(edge.to_repo_id.as_str());
}
map
}

fn files_by_repo<'g>(graph: &'g FileGraph) -> BTreeMap<&'g str, BTreeSet<&'g str>> {
let file_repo = Self::file_to_repo(graph);
let mut map: BTreeMap<&str, BTreeSet<&str>> = BTreeMap::new();
for (file, repo) in &file_repo {
map.entry(repo).or_default().insert(file);
}
map
}

/// For each file, return the set of *other* repo_ids whose edges point to it.
fn consumer_map<'g>(graph: &'g FileGraph) -> HashMap<&'g str, BTreeSet<&'g str>> {
let mut map: HashMap<&str, BTreeSet<&str>> = HashMap::new();
for edge in &graph.edges {
if edge.from_repo_id != edge.to_repo_id {
map.entry(edge.to_file.as_str())
.or_default()
.insert(edge.from_repo_id.as_str());
}
}
map
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Use a repo-scoped node key instead of the raw file path.

These helpers treat from_file/to_file as globally unique, even though the model carries from_repo_id/to_repo_id separately. If two repositories both contain README.md or src/lib.rs, file_to_repo keeps the first repo via or_insert, consumer_map merges consumers, and the renderers reuse the same node ID for distinct files.

Also applies to: 103-122

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/connector/api/controller/file_graph_controller.rs` around lines 57 - 88,
file_to_repo, files_by_repo and consumer_map treat file paths as globally unique
which causes collisions when different repos contain the same file name; change
these helpers to use a repo-scoped node key (e.g. combine repo_id and file path
into a single key like "{repo_id}:{file}" or a (repo_id, file) tuple) instead of
raw file path. Update file_to_repo to build and return a map keyed by the
repo-scoped node key (and keep repo_id as part of the value if needed), update
files_by_repo to group by repo using the repo portion of the scoped key or
construct keys that include repo_id, and update consumer_map to insert and
lookup consumers using the same repo-scoped node key; apply the same change to
the equivalent helper referenced around lines 103-122 so all renderers/use-sites
use consistent repo-scoped node IDs (functions: file_to_repo, files_by_repo,
consumer_map and the duplicated helper at 103-122).

Comment thread src/connector/api/controller/file_graph_controller.rs Outdated
claude added 5 commits March 31, 2026 10:48
… slider

- Edges are invisible at rest (background color); appear only on hover
  as thin (size 1) lines in the hovered node/repo accent color
- Clicking any node (repo hub or file dot) opens a slide-in detail
  panel on the right:
  - Repo hub: "Used by" and "Uses" sections listing cross-repo repos
    and the specific files involved
  - File node: "Referenced by" and "References" file lists with repo
    color indicators
- Detail panel close (×) or clicking the canvas resets everything
- Sidebar repo clicks close the panel before toggling
- Removed min-edge-weight slider and MAX_WEIGHT placeholder

https://claude.ai/code/session_016eVRs9KeLDtRw4Gc9XKW9a
- Near-black void background (#06060a) with subtle indigo radial gradient
- Vibrant jewel-tone palette (indigo/emerald/pink/amber) that pops on dark bg
- Curved SVG edge overlay: edges drawn as quadratic bezier curves on hover
  instead of straight sigma lines; completely invisible at rest
- Aggressive dimming on non-highlighted nodes (0.07 alpha vs 0.12 before)
- Hover glow ring via sigma nodeReducer: hovered node gets highlighted:true
  + 1.25x size so sigma draws its built-in border ring
- hideEdgesOnMove:true for clean pan/zoom experience
- Sidebar and detail panel restyled: darker surfaces, softer borders,
  custom scrollbars, monospace font for file paths
- repo-dot gets CSS glow shadow matching its color
- Inter font loaded for cleaner UI typography
- Tooltip restyled: frosted-glass effect with backdrop-filter blur

https://claude.ai/code/session_016eVRs9KeLDtRw4Gc9XKW9a
  codesearch uses <from-repo> <to-repo>

Lists every file in <from> that references symbols defined in <to>,
grouped by the target file. Accepts repository names or IDs.

Example output:
  Files in 'api-service' that use files from 'core-lib':

    src/auth/token.rs
      ← api/handlers/login.rs
      ← api/handlers/refresh.rs
    src/db/connection.rs
      ← api/middleware/db.rs

  3 file(s) in 'api-service' depend on 2 file(s) in 'core-lib'.

https://claude.ai/code/session_016eVRs9KeLDtRw4Gc9XKW9a
Dead code removed:
- src/application/use_cases/split_analysis.rs (orphaned, not wired)
- src/domain/models/split_analysis.rs (orphaned, not wired)
- src/connector/api/controller/split_controller.rs (orphaned, not wired)

file_relationship.rs:
- repo_id filter now uses HashSet<&str> for O(1) lookup instead of
  O(N) slice contains() with per-element String allocation
- reference_kind stored as &'static str in the edge accumulator set
  instead of an allocated String; converted to String only at
  materialisation time

uses_controller.rs:
- unique_sources / unique_targets counted in the single output loop
  instead of two separate post-hoc HashSet passes over edges
- removed explicit &&Repository type annotations (inferred)
- name comparison uses eq_ignore_ascii_case instead of double to_lowercase

https://claude.ai/code/session_016eVRs9KeLDtRw4Gc9XKW9a
The `graph` command and all its rendering code is no longer needed.
The `uses` command (plain text CLI output) covers the use case.

Removed:
- src/connector/api/controller/file_graph_controller.rs (~1000 lines)
- Commands::Graph, GraphFormat, NodeGranularity, ClusterMode from CLI
- FileGraphController from router/container/mod
- FileEdge, FileGraph, FileGraphRepo re-exports from lib.rs
  (types are still defined in domain for use by the uses command)

https://claude.ai/code/session_016eVRs9KeLDtRw4Gc9XKW9a
@ArtemisMucaj ArtemisMucaj merged commit 67ff693 into main Apr 3, 2026
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants