
baml grep and baml describe - agent-oriented semantic search tools#3347

Merged
imalsogreg merged 8 commits into canary from greg/describe
Apr 10, 2026

Conversation

@imalsogreg
Contributor

@imalsogreg imalsogreg commented Apr 9, 2026

Summary by CodeRabbit

  • New Features
    • Added describe command: rich symbol descriptions (definition, resolved type, docstring, deps, references), JSON output, budgeted/truncated rendering, history tracking, hint suppression, and custom project-root support.
    • Added grep command: semantic + text search with pattern matching, symbol-kind filtering, --symbols listing, --def/--refs modes, semantic→text fallback, grouped file results, and optional JSON output.
  • Tests
    • Added snapshot-based tests and a multi-file test harness covering describe/grep behaviors.

@coderabbitai
Contributor

coderabbitai Bot commented Apr 9, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough


Adds CLI commands describe and grep to baml_cli; implements semantic/text search and structured symbol description APIs in baml_lsp2_actions (new describe/grep modules, types, and serde serialization); makes type_info_for_definition public; and updates Cargo manifests to add serde/JSON and related deps.

Changes

Cohort / File(s) Summary
Manifests / Dependencies
baml_language/crates/baml_cli/Cargo.toml, baml_language/crates/baml_lsp2_actions/Cargo.toml
baml_cli dependencies updated to include workspace baml_lsp2_actions, text-size, and serde_json; baml_lsp2_actions enables serde (with derive) and adds insta as a dev-dependency.
CLI surface & modules
baml_language/crates/baml_cli/src/lib.rs, baml_language/crates/baml_cli/src/commands.rs
Registered new modules and added Describe and Grep variants to the CLI enum; updated RuntimeCli::run to dispatch describe and grep.
CLI: describe implementation
baml_language/crates/baml_cli/src/describe_command.rs
New DescribeArgs command with run(); builds ProjectDatabase from files, renders symbol descriptions (JSON or formatted), supports budgeted truncation, history/hints, symbol listing, and exposes rendering helpers (render_description, shape_with_elision, truncate_body).
CLI: grep implementation
baml_language/crates/baml_cli/src/grep_command.rs
New GrepArgs command with run(); builds DB, supports kind filtering, semantic-first grep (uses describe), fallback text search, options for defs/refs, JSON serializers, and integration with describe renderer.
LSP actions: public API & modules
baml_language/crates/baml_lsp2_actions/src/lib.rs, .../type_info.rs
Added describe and grep modules and re-exports; changed type_info_for_definition visibility to pub fn.
LSP actions: describe logic
baml_language/crates/baml_lsp2_actions/src/describe.rs
New serde-serializable models (SymbolDescription, DepRef, RefSite) and describe(db, files, name) implementation: symbol resolution (top-level, member, locals), CST slicing, docstring extraction, resolved-type and dependency gathering, reference collection, and serde helpers.
LSP actions: grep logic
baml_language/crates/baml_lsp2_actions/src/grep.rs
New semantic/text grep with grep() and list_symbols(): returns GrepResult (semantic or text mode), computes TextMatch with optional MatchAnnotation, and builds outline/index for annotations.
Tests & testing harness
baml_language/crates/baml_lsp2_actions/src/{describe_tests.rs,grep_tests.rs,testing.rs}
Added multi-file test harness (ProjectTest/builder), snapshot tests for describe/grep, and test-formatting helpers (format_description, format_text_match).

Sequence Diagram

sequenceDiagram
    participant User as User
    participant CLI as baml_cli
    participant FS as File System
    participant DB as ProjectDatabase
    participant LSP as baml_lsp2_actions

    User->>CLI: invoke "describe" or "grep"
    CLI->>FS: discover .baml files (from --from)
    FS-->>CLI: file list
    CLI->>DB: initialize database & register files
    loop per file
        DB->>FS: read file contents
        FS-->>DB: file text
        DB->>DB: parse & index file
    end
    CLI->>LSP: call describe()/grep(files, opts)
    LSP->>DB: query symbols/types/usages
    DB-->>LSP: symbol/type/reference data
    LSP-->>CLI: descriptions or grep results (semantic or text)
    CLI->>CLI: format output (JSON/text, budget, hints)
    CLI->>User: print results

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Possibly related PRs

Poem

🐰
I hop through files and scent each name,
Describe brings shapes; grep plays the game.
I nibble types, gather clues and threads,
Leave JSON crumbs and helpful treads—
A rabbit’s joy in code and spreads.

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately and specifically describes the main change: adding two new CLI subcommands (baml grep and baml describe) for semantic search, with clear focus on their agent-oriented use case. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |


@vercel

vercel Bot commented Apr 9, 2026

Deployment failed with the following error:

You must set up Two-Factor Authentication before accessing this team.

View Documentation: https://vercel.com/docs/two-factor-authentication

Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 7

🧹 Nitpick comments (6)
baml_language/crates/baml_lsp2_actions/src/type_info.rs (1)

192-192: Narrow this helper back to crate visibility.

Making type_info_for_definition public leaks Definition<'_> into the external API of baml_lsp2_actions. The new caller shown in this PR is intra-crate, so pub(crate) keeps the boundary tighter without blocking describe.rs.

Suggested change
-pub fn type_info_for_definition(db: &dyn Db, def: Definition<'_>) -> TypeInfo {
+pub(crate) fn type_info_for_definition(db: &dyn Db, def: Definition<'_>) -> TypeInfo {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@baml_language/crates/baml_lsp2_actions/src/type_info.rs` at line 192, The
function type_info_for_definition is unnecessarily exported; restrict its
visibility to the crate by changing its declaration from public to crate-private
(use pub(crate) fn type_info_for_definition(...)) so Definition<'_> is not
leaked from baml_lsp2_actions; update the signature in type_info.rs (function
name type_info_for_definition, parameters Db and Definition<'_>, return
TypeInfo) and ensure the intra-crate caller in describe.rs still calls it
successfully.
baml_language/crates/baml_cli/src/describe_command.rs (3)

51-52: Unused return value from set_project_root.

The return value is captured but unused (_project). Either use it or remove the binding if it's not needed.

-        let _project = db.set_project_root(&from);
+        db.set_project_root(&from);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@baml_language/crates/baml_cli/src/describe_command.rs` around lines 51 - 52,
The return value from db.set_project_root(&from) is being bound to _project but
never used; either remove the unused binding and call db.set_project_root(&from)
for its side effect, or keep and use the returned value (e.g., assign to project
and use it where needed). Update the call in the block that constructs
ProjectDatabase::new() and invokes db.set_project_root(&from) to either drop the
assignment or replace _project with a meaningful variable used later.

454-559: Consider adding unit tests for helper functions.

The pure helper functions (shape_with_elision, truncate_body, find_line_in_body, render_body_with_context, line_number_at_offset) are excellent candidates for unit tests. As per the coding guidelines: "Prefer writing Rust unit tests over integration tests where possible".

Example test cases for truncate_body:

  • Body fits within budget (no truncation)
  • Body exceeds budget with no annotated blocks
  • Body with //# annotated blocks that should be preserved
  • Edge cases with very small budgets
💚 Example unit test structure
#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_truncate_body_fits_within_budget() {
        let body = vec!["fn foo() {", "  bar()", "}"];
        let result = truncate_body(&body, 10);
        assert_eq!(result, body.iter().map(|s| s.to_string()).collect::<Vec<_>>());
    }

    #[test]
    fn test_truncate_body_preserves_annotated_blocks() {
        let body = vec![
            "fn foo() {",
            "  line1()",
            "  //# important",
            "  critical_call()",
            "  line2()",
            "}",
        ];
        let result = truncate_body(&body, 5);
        assert!(result.iter().any(|l| l.contains("critical_call")));
    }

    #[test]
    fn test_shape_with_elision_adds_block() {
        assert_eq!(
            shape_with_elision("fn foo()", "fn foo() { body }"),
            "fn foo() { ... }"
        );
    }

    #[test]
    fn test_shape_with_elision_no_change() {
        assert_eq!(
            shape_with_elision("fn foo() {}", "fn foo() {}"),
            "fn foo() {}"
        );
    }
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@baml_language/crates/baml_cli/src/describe_command.rs` around lines 454 -
559, Add focused Rust unit tests for the pure helper functions to cover the
behaviors described: create tests for shape_with_elision (when the shape lacks a
block but full_body has one, and when no change is required), for truncate_body
(body fits within budget, body exceeds budget with no annotated blocks, body
with `//#` annotated blocks preserved, and edge cases with very small
available_lines), and for the other pure helpers find_line_in_body,
render_body_with_context, and line_number_at_offset (include normal and boundary
cases); put these tests in a #[cfg(test)] mod and use the exact function names
shape_with_elision, truncate_body, find_line_in_body, render_body_with_context,
and line_number_at_offset so they exercise expected outputs and edge conditions
from the implementation.
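The helper bodies are not quoted in the review, so the following is only a sketch of the boundary cases worth pinning down for line_number_at_offset. It uses a hypothetical stand-in that assumes a (text, byte offset) signature and a 1-based result; the real function in describe_command.rs may differ, and the tests would need to be adapted to it.

```rust
/// Hypothetical stand-in for `line_number_at_offset`: returns the 1-based
/// line containing byte `offset`, counting `\n` bytes before it.
/// Offsets past the end of `text` are clamped to the last line.
fn line_number_at_offset(text: &str, offset: usize) -> usize {
    let end = offset.min(text.len());
    text.as_bytes()[..end].iter().filter(|&&b| b == b'\n').count() + 1
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn first_byte_is_line_one() {
        assert_eq!(line_number_at_offset("abc\ndef", 0), 1);
    }

    #[test]
    fn byte_after_newline_is_next_line() {
        // Byte 4 is 'd', the first byte of line 2.
        assert_eq!(line_number_at_offset("abc\ndef", 4), 2);
    }

    #[test]
    fn newline_byte_belongs_to_its_own_line() {
        // Byte 3 is the '\n' that terminates line 1.
        assert_eq!(line_number_at_offset("abc\ndef", 3), 1);
    }

    #[test]
    fn offset_past_end_is_clamped() {
        assert_eq!(line_number_at_offset("abc\ndef", 100), 2);
    }
}
```

The newline-byte and past-the-end cases are where off-by-one behavior usually hides, so they are worth asserting against whatever convention the real helper uses.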

136-333: Large function could benefit from extraction.

render_description at ~200 lines handles many concerns: header, type, docstring, body, dependencies, references, and hints. While the sections are clearly marked with comments, consider extracting each section into helper functions (e.g., render_header, render_body, render_dependencies, etc.) for improved testability and readability.

This is optional given the clear section markers and the function's readability.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@baml_language/crates/baml_cli/src/describe_command.rs` around lines 136 -
333, render_description is too large and should be split into focused helpers;
extract the header, type/docstring, body rendering, dependency rendering,
references rendering, and hints rendering into separate functions (e.g.,
render_header, render_type_and_doc, render_body, render_dependencies,
render_references, render_hints) and replace the corresponding large code blocks
with calls to these helpers. Each helper should accept the minimal data it needs
(for example, render_header(db, desc, project_root) returning lines_used or
updating a mutable lines_used; render_body(desc, budget, history, project_root)
returning lines consumed; render_dependencies(db, desc, project_root, history,
remaining_budget) etc.), preserve the current printing behavior and budgeting
logic, and keep the existing public signature of render_description unchanged so
callers are unaffected. Ensure you move related helper logic such as is_local
checks, shape/truncation decisions, and history filtering into the appropriate
helper, keep existing comment markers, and add small unit tests for each
extracted helper where feasible to confirm behavior parity.
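One way to split render_description without losing its budgeting behavior is to thread a single budget object through the extracted helpers, so each section can only consume what is left. This is a minimal sketch: LineBudget and render_section are hypothetical names, not code from the PR, and the real helpers would also need db, history, and project_root.

```rust
/// Hypothetical line budget shared by the extracted render_* helpers.
struct LineBudget {
    remaining: usize,
}

impl LineBudget {
    fn new(total: usize) -> Self {
        Self { remaining: total }
    }

    /// Grants up to `wanted` lines and deducts them from the budget.
    fn take(&mut self, wanted: usize) -> usize {
        let granted = wanted.min(self.remaining);
        self.remaining -= granted;
        granted
    }
}

/// Hypothetical section helper: a title line plus as many items as fit.
/// If not all items fit, the last available line becomes an elision marker,
/// so the section never uses more lines than the budget had left.
fn render_section(out: &mut Vec<String>, title: &str, items: &[&str], budget: &mut LineBudget) {
    // A section needs room for its title and at least one more line.
    if items.is_empty() || budget.remaining < 2 {
        return;
    }
    budget.take(1);
    out.push(title.to_string());
    if items.len() <= budget.remaining {
        budget.take(items.len());
        out.extend(items.iter().map(|item| format!("  {item}")));
    } else {
        // Reserve one line for the "... N more" marker.
        let room = budget.remaining - 1;
        let shown = budget.take(room);
        out.extend(items[..shown].iter().map(|item| format!("  {item}")));
        budget.take(1);
        out.push(format!("  ... {} more", items.len() - shown));
    }
}
```

Because every helper draws from the same LineBudget, the top-level render_description can call them in order and keep its public signature unchanged.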
baml_language/crates/baml_lsp2_actions/src/grep.rs (2)

138-143: Consider memory efficiency for large projects.

Inserting one entry per byte offset in the name span can consume significant memory for projects with many symbols. For a symbol name of length N, this creates N HashMap entries.

A more memory-efficient approach would be to store (file, start, end) ranges and use range-based lookup, though this would trade query speed for memory.

For a CLI tool this may be acceptable, but worth noting if performance issues arise with large codebases.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@baml_language/crates/baml_lsp2_actions/src/grep.rs` around lines 138 - 143,
The current approach in grep.rs creates one HashMap entry per byte offset
between item.name_span.start() and end(), which is memory-inefficient for large
projects; change def_spans to store range entries instead of per-offset entries
(e.g., store tuples like (file, start, end) -> (item.name.clone(), item.kind) or
push a span struct into a Vec/HashMap keyed by file) and update any lookup logic
that previously expected (file, offset) keys to perform a range check against
item.name_span (or iterate the file's span list) when resolving a position;
update references to def_spans, item.name_span, and any code that
inserts/queries def_spans so lookups use start/end range comparison and keep
symbol_kinds usage unchanged.
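A minimal sketch of the range-based alternative, assuming name spans within one file never overlap. SpanIndex, the u32 file id, and the tuple layout are hypothetical stand-ins for the real (SourceFile, offset) keys. Lookup is a binary search over spans sorted by start, so memory grows with the number of symbols rather than with total name length, at the cost of an O(log n) query.

```rust
use std::collections::HashMap;

/// Hypothetical per-file span index: one entry per definition name span,
/// instead of one HashMap entry per byte offset.
#[derive(Default)]
struct SpanIndex {
    // file id -> (start, end, name), sorted by start after `finish`.
    by_file: HashMap<u32, Vec<(u32, u32, String)>>,
}

impl SpanIndex {
    fn insert(&mut self, file: u32, start: u32, end: u32, name: &str) {
        self.by_file
            .entry(file)
            .or_default()
            .push((start, end, name.to_string()));
    }

    /// Sorts each file's spans; call once after all inserts, before lookups.
    fn finish(&mut self) {
        for spans in self.by_file.values_mut() {
            spans.sort_by_key(|&(start, _, _)| start);
        }
    }

    /// Returns the name whose half-open span [start, end) contains `offset`.
    fn lookup(&self, file: u32, offset: u32) -> Option<&str> {
        let spans = self.by_file.get(&file)?;
        // First span starting after `offset`; the only candidate is the one before it.
        let idx = spans.partition_point(|&(start, _, _)| start <= offset);
        let (_, end, name) = spans.get(idx.checked_sub(1)?)?;
        (offset < *end).then_some(name.as_str())
    }
}
```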

186-201: Unnecessary string allocations when ignore_case is false.

When ignore_case is false, text.clone() (line 189) and line.to_string() (line 200) create unnecessary allocations since the original strings could be used directly.

♻️ Proposed fix using Cow to avoid allocations
+use std::borrow::Cow;
+
 fn text_search(db: &dyn Db, files: &[SourceFile], opts: &GrepOptions<'_>) -> Vec<TextMatch> {
-    let pattern = if opts.ignore_case {
-        opts.pattern.to_lowercase()
-    } else {
-        opts.pattern.to_string()
-    };
+    let pattern: Cow<'_, str> = if opts.ignore_case {
+        Cow::Owned(opts.pattern.to_lowercase())
+    } else {
+        Cow::Borrowed(opts.pattern)
+    };
 
     // Build an outline index for CST-based semantic annotation.
     let index = OutlineIndex::build(db, files);
 
     let mut matches = Vec::new();
 
     for &file in files {
         let text = file.text(db);
 
         // Pre-filter: skip files that don't contain the pattern.
-        let text_to_check = if opts.ignore_case {
-            text.to_lowercase()
+        let text_to_check: Cow<'_, str> = if opts.ignore_case {
+            Cow::Owned(text.to_lowercase())
         } else {
-            text.clone()
+            Cow::Borrowed(text.as_str())
         };
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@baml_language/crates/baml_lsp2_actions/src/grep.rs` around lines 186 - 201,
The code currently always allocates when building text_to_check and
line_to_check; change both to use std::borrow::Cow<str> so you borrow when
opts.ignore_case is false and only allocate when true: for the block creating
text_to_check replace text.clone() with Cow::Borrowed(text.as_str()) (and when
ignore_case true use Cow::Owned(text.to_lowercase())), and similarly in the line
loop replace line.to_string() with Cow::Borrowed(line) (or
Cow::Owned(line.to_lowercase()) when ignore_case), keeping the contains check
against pattern as before; update variable types from String to Cow<str> for
text_to_check and line_to_check so unnecessary allocations are avoided.
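The same pattern in a self-contained form: normalize borrows when ignore_case is false and allocates only when lowercasing is needed. matching_lines is a hypothetical reduction of text_search, not the PR's function. One caveat: to_lowercase can change the byte length of some non-ASCII text, so byte offsets found in a lowercased copy do not always map back onto the original line.

```rust
use std::borrow::Cow;

/// Lowercases `s` only when `ignore_case` is set; otherwise borrows it.
fn normalize(s: &str, ignore_case: bool) -> Cow<'_, str> {
    if ignore_case {
        Cow::Owned(s.to_lowercase())
    } else {
        Cow::Borrowed(s)
    }
}

/// Hypothetical reduction of text_search: 1-based numbers of lines containing `pattern`.
fn matching_lines(text: &str, pattern: &str, ignore_case: bool) -> Vec<usize> {
    let pattern = normalize(pattern, ignore_case);
    // Whole-file pre-filter: skip files with no match at all.
    if !normalize(text, ignore_case).contains(&*pattern) {
        return Vec::new();
    }
    text.lines()
        .enumerate()
        .filter(|(_, line)| normalize(line, ignore_case).contains(&*pattern))
        .map(|(i, _)| i + 1)
        .collect()
}
```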
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@baml_language/crates/baml_cli/src/grep_command.rs`:
- Around line 320-342: parse_kind_filter currently silently ignores unknown kind
strings by printing "Unknown kind: ..." and continuing; change it to fail fast
by returning an error (or exiting) when any unknown kind is encountered. Update
parse_kind_filter to signal failure (e.g., change signature to return
Result<Vec<DefinitionKind>, String> or have it call std::process::exit(1) in CLI
contexts), detect unknown values in the match arm and return Err with a clear
message like "Unknown kind: {other}" instead of printing and dropping the
filter, and then propagate that error to the caller so the CLI returns a
non-zero exit code; keep references to parse_kind_filter and DefinitionKind when
making the change.
- Around line 94-115: The JSON flag handling is missing in the symbols/refs
branches and result.descriptions is serialized before checking result.mode,
causing wrong or dropped json output; update the grep_command logic (referencing
list_symbols, relative_path, line_number_at_offset, and the branches that print
symbols/refs) to check self.json and emit proper JSON for symbols and refs
instead of plain println!, and move/adjust the serialization of
result.descriptions so it happens only when result.mode requires it (use
result.mode to decide formatting), ensuring the default text-search branch and
the refs branch produce the correct JSON shape when self.json is true.
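For the parse_kind_filter item, a minimal sketch of the fail-fast shape, assuming --kind arrives as a comma-separated string. The DefinitionKind variants and accepted spellings below are hypothetical placeholders for the real enum.

```rust
/// Hypothetical stand-in for DefinitionKind; the real variants may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum DefinitionKind {
    Class,
    Enum,
    Function,
}

/// Parses a comma-separated --kind value and fails on the first unknown
/// kind instead of printing a message and silently dropping it.
fn parse_kind_filter(raw: &str) -> Result<Vec<DefinitionKind>, String> {
    raw.split(',')
        .map(str::trim)
        .filter(|s| !s.is_empty())
        .map(|s| match s.to_ascii_lowercase().as_str() {
            "class" => Ok(DefinitionKind::Class),
            "enum" => Ok(DefinitionKind::Enum),
            "function" | "fn" => Ok(DefinitionKind::Function),
            // Echo the user's original spelling in the error.
            _ => Err(format!("Unknown kind: {s}")),
        })
        // Collecting into Result stops at the first Err.
        .collect()
}
```

The caller can then turn the Err into a non-zero exit, for example by printing it with eprintln! and calling std::process::exit(1), instead of continuing with a partially applied filter.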

In `@baml_language/crates/baml_lsp2_actions/src/describe.rs`:
- Around line 595-607: extract_docstring currently picks the first ancestor that
satisfies is_item_node, which makes member docstrings (fields/variants) resolve
to the enclosing item; update extract_docstring so it first searches the token's
parent_ancestors for member-level nodes (e.g., field, variant, or whatever AST
kinds represent members) and uses that node if found, and only if no member node
is found fall back to finding an enclosing item node (the existing is_item_node
check). Reference the function extract_docstring and ensure describe_member's
passed item_range will now match the member ancestor before the container item.
- Around line 321-333: The current describe logic only reads bindings from
index.scope_bindings[func_scope_idx], so locals declared in nested block or
lambda scopes are ignored; update the code that computes bindings for a function
(after computing func_scope_idx) to collect bindings from all scopes that belong
to that function by walking or filtering index.scopes (e.g., find every scope
whose ancestor chain leads to func_scope_idx or whose range is contained in
func.span) and then merge their corresponding entries from index.scope_bindings
into bindings before proceeding; use the existing symbols func_scope_idx,
index.scopes, index.scope_bindings and func.span in the traversal so describe
reports nested let bindings too.
- Around line 750-756: The code handling TypeExpr::Path only inspects
segments.last() and uses that bare tail (name_str) for dependency lookup, which
loses qualification and misresolves qualified names; update the logic in the
TypeExpr::Path branch (where segments, seen, resolve_dep, and deps.push(dep) are
used) to construct the full path string (e.g., join all segments with the proper
separator or preserve their original qualified form) and use that full qualified
name when calling resolve_dep and when inserting into seen, falling back to the
tail-only lookup only if qualified lookup fails to preserve correct dependency
resolution.
- Around line 548-559: resolve_type_for_item currently only handles top-level
Definition variants and returns None for Field and Variant, causing
describe_member's resolved_type to be empty; update resolve_type_for_item to
also match Definition::Field and Definition::Variant (from the
baml_compiler2_tir::resolve::ResolvedName::Item/Builtin result) and derive their
concrete type info by either delegating to the existing type_info_for_definition
path or by calling the appropriate TypeInfo helper for fields/variants so the
function returns Some(String) for member symbols; ensure you use the same
symbol/definition identifiers (resolve_type_for_item, type_info_for_definition,
TypeInfo, SymbolInfo, describe_member) to locate and implement the missing
branches.
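For the nested-scope item, one of the two options the comment mentions is to walk each scope's parent chain and keep every scope that reaches the function's scope. The sketch assumes a simplified scope table where each entry stores an optional parent index and scope_bindings is indexed the same way. Scope and function_bindings are hypothetical names, and the real index.scopes layout may differ.

```rust
/// Hypothetical, simplified scope entry: only the parent link is modeled.
struct Scope {
    parent: Option<usize>,
}

/// True if `scope` is `root` or nested at any depth inside it.
fn is_within(scopes: &[Scope], mut scope: usize, root: usize) -> bool {
    loop {
        if scope == root {
            return true;
        }
        match scopes[scope].parent {
            Some(p) => scope = p,
            None => return false,
        }
    }
}

/// Collects bindings from `func_scope` and every scope nested inside it,
/// so `let` bindings in inner blocks and lambdas are reported too.
fn function_bindings<'a>(
    scopes: &[Scope],
    scope_bindings: &'a [Vec<String>],
    func_scope: usize,
) -> Vec<&'a str> {
    let mut out = Vec::new();
    for (i, bindings) in scope_bindings.iter().enumerate() {
        if is_within(scopes, i, func_scope) {
            out.extend(bindings.iter().map(String::as_str));
        }
    }
    out
}
```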

In `@baml_language/crates/baml_lsp2_actions/src/grep.rs`:
- Line 221: The current loop uses str::lines() and then does line_start_offset
+= line.len() + 1, which miscounts CRLF line endings; change the iteration to
use split_inclusive('\n') (or an equivalent that preserves the actual newline
bytes) so each yielded slice includes its line terminator (\n or \r\n) and you can advance by
line.len() without adding a fixed +1; update the code around line_start_offset
and the loop that computes offsets (referencing the loop that updates
line_start_offset in grep.rs and the use-site annotate_match) to remove the
manual +1 and rely on the inclusive-split slice lengths so CRLF vs LF are
handled correctly. Because each slice now ends with its terminator, strip it
before matching or printing the line.
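A sketch of the CRLF issue and the split_inclusive fix. line_starts and line_starts_buggy are hypothetical helpers, not code from grep.rs. The inclusive slices still carry their terminator, so the sketch strips it for the line text while using the full slice length for offset math.

```rust
/// Returns (start byte offset, line text without its terminator) per line.
/// `split_inclusive('\n')` keeps "\n" or "\r\n" attached to each slice,
/// so advancing by `raw.len()` is correct for both LF and CRLF files.
fn line_starts(text: &str) -> Vec<(usize, &str)> {
    let mut offset = 0;
    let mut out = Vec::new();
    for raw in text.split_inclusive('\n') {
        // Strip the terminator for matching/printing only; a '\r' is
        // removed only when it directly precedes the '\n'.
        let line = match raw.strip_suffix('\n') {
            Some(l) => l.strip_suffix('\r').unwrap_or(l),
            None => raw,
        };
        out.push((offset, line));
        offset += raw.len();
    }
    out
}

/// The `lines()` + `len() + 1` pattern the review flags, for comparison.
fn line_starts_buggy(text: &str) -> Vec<usize> {
    let mut offset = 0;
    text.lines()
        .map(|line| {
            let start = offset;
            offset += line.len() + 1; // assumes a 1-byte "\n" terminator
            start
        })
        .collect()
}
```

On "ab\r\ncd\r\nef" the buggy version drifts by one byte per line (0, 3, 6 instead of 0, 4, 8), which is exactly the miscount the comment describes.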

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: f09e756e-6b88-4d0b-952a-285a8449405e

📥 Commits

Reviewing files that changed from the base of the PR and between e817f17 and d69104f.

⛔ Files ignored due to path filters (1)
  • baml_language/Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (10)
  • baml_language/crates/baml_cli/Cargo.toml
  • baml_language/crates/baml_cli/src/commands.rs
  • baml_language/crates/baml_cli/src/describe_command.rs
  • baml_language/crates/baml_cli/src/grep_command.rs
  • baml_language/crates/baml_cli/src/lib.rs
  • baml_language/crates/baml_lsp2_actions/Cargo.toml
  • baml_language/crates/baml_lsp2_actions/src/describe.rs
  • baml_language/crates/baml_lsp2_actions/src/grep.rs
  • baml_language/crates/baml_lsp2_actions/src/lib.rs
  • baml_language/crates/baml_lsp2_actions/src/type_info.rs

Comment thread baml_language/crates/baml_cli/src/grep_command.rs
Comment thread baml_language/crates/baml_cli/src/grep_command.rs Outdated
Comment thread baml_language/crates/baml_lsp2_actions/src/describe.rs Outdated
Comment thread baml_language/crates/baml_lsp2_actions/src/describe.rs
Comment thread baml_language/crates/baml_lsp2_actions/src/describe.rs Outdated
Comment thread baml_language/crates/baml_lsp2_actions/src/describe.rs
Comment thread baml_language/crates/baml_lsp2_actions/src/grep.rs Outdated
@codspeed-hq

codspeed-hq Bot commented Apr 9, 2026

Merging this PR will not alter performance

⚠️ Unknown Walltime execution environment detected

Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data.

For the most accurate results, we recommend using CodSpeed Macro Runners: bare-metal machines fine-tuned for performance measurement consistency.

✅ 15 untouched benchmarks
⏩ 105 skipped benchmarks (see footnote 1)


Comparing greg/describe (f85adce) with canary (d124833)


Footnotes

  1. 105 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, archive them in CodSpeed to remove them from the performance reports.

@github-actions

github-actions Bot commented Apr 9, 2026

Binary size checks passed

7 passed

| Artifact | Platform | Gzip | Baseline | Delta | Status |
|---|---|---|---|---|---|
| bridge_cffi | Linux | 5.8 MB | 5.7 MB | +121.6 KB (+2.1%) | OK |
| bridge_cffi-stripped | Linux | 5.8 MB | 5.7 MB | +99.1 KB (+1.7%) | OK |
| bridge_cffi | macOS | 4.8 MB | 4.6 MB | +143.6 KB (+3.1%) | OK |
| bridge_cffi-stripped | macOS | 4.8 MB | 4.7 MB | +86.3 KB (+1.8%) | OK |
| bridge_cffi | Windows | 4.8 MB | 4.6 MB | +137.0 KB (+3.0%) | OK |
| bridge_cffi-stripped | Windows | 4.8 MB | 4.7 MB | +89.5 KB (+1.9%) | OK |
| bridge_wasm | WASM | 3.1 MB | 3.0 MB | +117.2 KB (+4.0%) | OK |

Generated by cargo size-gate · workflow run

Contributor

@coderabbitai coderabbitai Bot left a comment


🧹 Nitpick comments (4)
baml_language/crates/baml_lsp2_actions/src/describe.rs (1)

976-985: Minor typo in comment.

Line 982: "poluting" should be "polluting".

📝 Proposed fix
-    // This stub implementation exists so that we can serialize `SymbolDescription`,
-    // which contains a `SourceFile`, to json, without actually poluting it with the
-    // whole source file.
+    // This stub implementation exists so that we can serialize `SymbolDescription`,
+    // which contains a `SourceFile`, to json, without actually polluting it with the
+    // whole source file.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@baml_language/crates/baml_lsp2_actions/src/describe.rs` around lines 976 -
985, Update the comment inside the serialize_file function so the misspelled
word "poluting" is corrected to "polluting"; this comment sits above fn
serialize_file and explains why SourceFile (used in SymbolDescription) is
serialized as none, so just edit the comment text to read "polluting" instead.
baml_language/crates/baml_cli/src/grep_command.rs (1)

224-291: Consider documenting --kind behavior with text search fallback.

When the pattern doesn't match a known symbol and grep() falls back to text search (line 231), the --kind filter is silently ignored because text matches don't have symbol-kind metadata. Users specifying --kind might expect all results to be filtered.

Consider either:

  1. Documenting this in the --kind help text
  2. Warning when --kind is provided but text search is used

This is a minor UX concern, not a bug.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@baml_language/crates/baml_cli/src/grep_command.rs` around lines 224 - 291,
Grep falls back to text search and silently ignores the --kind filter
(kind_filter) when grep() returns GrepMode::TextSearch; detect when a kind
filter was provided (e.g., check self.kind or kind_filter) and, when result.mode
== GrepMode::TextSearch, emit a short warning to stderr (e.g., via eprintln!)
that "--kind is ignored when pattern falls back to text search" (include pattern
for context) before rendering text matches; this keeps existing behavior but
makes the UX explicit.
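A minimal sketch of option 2, kept as a pure function so it is easy to unit test. GrepMode::Semantic and the &[&str] filter type are hypothetical stand-ins; the review only names GrepMode::TextSearch, and the real filter is a list of DefinitionKind values.

```rust
/// Hypothetical mirror of the grep modes; only TextSearch is named in the review.
#[derive(Debug, PartialEq, Eq)]
enum GrepMode {
    Semantic,
    TextSearch,
}

/// Returns a warning for stderr when a --kind filter cannot be honored
/// because the pattern fell back to plain text search.
fn kind_filter_warning(mode: &GrepMode, kind_filter: &[&str], pattern: &str) -> Option<String> {
    if *mode == GrepMode::TextSearch && !kind_filter.is_empty() {
        Some(format!(
            "warning: --kind is ignored because '{pattern}' fell back to text search"
        ))
    } else {
        None
    }
}
```

At the call site this could be used as `if let Some(w) = kind_filter_warning(...) { eprintln!("{w}"); }` just before rendering text matches, leaving the existing output unchanged.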
baml_language/crates/baml_cli/src/describe_command.rs (2)

469-558: Consider adding unit tests for truncation logic.

The truncate_body function has non-trivial logic for prioritizing header, annotated blocks, head/tail content, and skip markers. Unit tests would help verify edge cases like:

  • Body shorter than budget
  • No annotated blocks
  • Many annotated blocks exceeding budget
  • Single-line body

As per coding guidelines: prefer writing Rust unit tests over integration tests.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@baml_language/crates/baml_cli/src/describe_command.rs` around lines 469 -
558, Add unit tests for the truncate_body function to exercise its non-trivial
truncation rules: create a #[cfg(test)] mod with tests that call truncate_body
directly and assert on the returned Vec<String>; include cases for body shorter
than available_lines (should return full body), no annotated blocks (verify
balanced head/tail and skip marker placement), many annotated blocks exceeding
budget (annotated_ranges get fully included and other content is truncated),
single-line body (only header included), and skip-marker indentation/line-count
correctness (verify "... skipped N lines ..." appears with the same indentation
as the next included line). Use slices of &str to build inputs, vary
available_lines to hit edge conditions, and assert exact string sequences so the
behavior of truncate_body, annotated_ranges, head/tail allocation, and skip
markers is covered.
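The real signature of `truncate_body` is not shown here, so the following is a simplified, hypothetical `truncate_lines`. It illustrates the shape of the suggested edge-case tests. It models only head/tail allocation and the indented skip marker, not the `//#` annotated-block handling.

```rust
/// Hypothetical, simplified stand-in for `truncate_body`: keeps the first and last
/// lines of `body` within `budget` lines and replaces the middle with a skip marker.
fn truncate_lines(body: &[&str], budget: usize) -> Vec<String> {
    // Short bodies are returned unchanged.
    if body.len() <= budget {
        return body.iter().map(|l| l.to_string()).collect();
    }
    // Too small for head + marker + tail: keep only the leading lines.
    if budget < 3 {
        return body.iter().take(budget).map(|l| l.to_string()).collect();
    }
    let content = budget - 1; // one line is reserved for the skip marker
    let head = (content + 1) / 2;
    let tail = content - head;
    let skipped = body.len() - head - tail;
    // Indent the marker like the first line that follows it.
    let next = body[body.len() - tail];
    let indent: String = next.chars().take_while(|c| c.is_whitespace()).collect();
    let mut out: Vec<String> = body[..head].iter().map(|l| l.to_string()).collect();
    out.push(format!("{indent}... skipped {skipped} lines ..."));
    out.extend(body[body.len() - tail..].iter().map(|l| l.to_string()));
    out
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn body_shorter_than_budget_is_unchanged() {
        assert_eq!(truncate_lines(&["a", "b"], 5), vec!["a", "b"]);
    }

    #[test]
    fn single_line_body_fits_budget_of_one() {
        assert_eq!(truncate_lines(&["class Foo {}"], 1), vec!["class Foo {}"]);
    }

    #[test]
    fn skip_marker_uses_indentation_of_next_line() {
        let body = ["class Foo {", "  a int", "  b int", "  c int", "  d int",
                    "  e int", "  f int", "  g int", "  h int", "}"];
        assert_eq!(
            truncate_lines(&body, 5),
            vec!["class Foo {", "  a int", "  ... skipped 6 lines ...", "  h int", "}"]
        );
    }
}
```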

496-512: Edge case: annotated blocks may exceed budget.

If annotated_line_count is large (many //# blocks), remaining_for_content becomes 0 (lines 505-508), but annotated ranges are still included unconditionally (lines 530-535). This could produce output that exceeds the budget.

Consider either:

  1. Capping annotated blocks to fit within budget
  2. Documenting that //# blocks are always preserved regardless of budget

This is a minor edge case and the current behavior (preserving annotated content) may be intentional.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@baml_language/crates/baml_cli/src/describe_command.rs` around lines 496 -
512, annotated_line_count can exceed the available budget and annotated_ranges
are included unconditionally later, so detect when annotated_line_count >
available_lines - header_reserve - skip_line_reserve and trim or cap
annotated_ranges to fit the budget before computing
head_lines_count/tail_lines_count; specifically, compute budget_for_annotated =
available_lines.saturating_sub(header_reserve + skip_line_reserve) and if
annotated_line_count > budget_for_annotated, produce a capped_annotated_ranges
(e.g., keep earliest and latest annotated blocks or drop least-important ones)
and use capped_annotated_ranges in place of annotated_ranges in the later logic
(the variables annotated_line_count, annotated_ranges, remaining_for_content,
head_lines_count, tail_lines_count are the key symbols to update) so the final
output cannot exceed the available_lines.
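A minimal sketch of the capping step, under stated assumptions. Annotated blocks are represented here as half-open `Range<usize>` line ranges. The greedy "keep whole blocks in source order while they fit" policy is one of the options the comment lists, not necessarily the one the PR adopted. `annotated_budget` and `cap_annotated_ranges` are hypothetical names.

```rust
use std::ops::Range;

/// Lines left for `//#` annotated blocks once the header and skip marker are reserved.
fn annotated_budget(available_lines: usize, header_reserve: usize, skip_line_reserve: usize) -> usize {
    available_lines.saturating_sub(header_reserve + skip_line_reserve)
}

/// Greedily keeps annotated ranges in source order, dropping any whole block that
/// would push the total past `budget` (blocks are never split).
fn cap_annotated_ranges(ranges: &[Range<usize>], budget: usize) -> Vec<Range<usize>> {
    let mut used = 0;
    let mut kept = Vec::new();
    for r in ranges {
        let len = r.end.saturating_sub(r.start);
        if used + len <= budget {
            used += len;
            kept.push(r.clone());
        }
    }
    kept
}
```

The capped list would then replace annotated_ranges when head_lines_count and tail_lines_count are computed. This guarantees the annotated portion alone never exceeds the budget.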
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@baml_language/crates/baml_cli/src/describe_command.rs`:
- Around line 469-558: Add unit tests for the truncate_body function to exercise
its non-trivial truncation rules: create a #[cfg(test)] mod with tests that call
truncate_body directly and assert on the returned Vec<String>; include cases for
body shorter than available_lines (should return full body), no annotated blocks
(verify balanced head/tail and skip marker placement), many annotated blocks
exceeding budget (annotated_ranges get fully included and other content is
truncated), single-line body (only header included), and skip-marker
indentation/line-count correctness (verify "... skipped N lines ..." appears
with the same indentation as the next included line). Use slices of &str to
build inputs, vary available_lines to hit edge conditions, and assert exact
string sequences so the behavior of truncate_body, annotated_ranges, head/tail
allocation, and skip markers is covered.
- Around line 496-512: annotated_line_count can exceed the available budget and
annotated_ranges are included unconditionally later, so detect when
annotated_line_count > available_lines - header_reserve - skip_line_reserve and
trim or cap annotated_ranges to fit the budget before computing
head_lines_count/tail_lines_count; specifically, compute budget_for_annotated =
available_lines.saturating_sub(header_reserve + skip_line_reserve) and if
annotated_line_count > budget_for_annotated, produce a capped_annotated_ranges
(e.g., keep earliest and latest annotated blocks or drop least-important ones)
and use capped_annotated_ranges in place of annotated_ranges in the later logic
(the variables annotated_line_count, annotated_ranges, remaining_for_content,
head_lines_count, tail_lines_count are the key symbols to update) so the final
output cannot exceed the available_lines.

In `@baml_language/crates/baml_cli/src/grep_command.rs`:
- Around line 224-291: Grep falls back to text search and silently ignores the
--kind filter (kind_filter) when grep() returns GrepMode::TextSearch; detect
when a kind filter was provided (e.g., check self.kind or kind_filter) and, when
result.mode == GrepMode::TextSearch, emit a short warning to stderr (e.g., via
eprintln!) that "--kind is ignored when pattern falls back to text search"
(include pattern for context) before rendering text matches; this keeps existing
behavior but makes the UX explicit.

In `@baml_language/crates/baml_lsp2_actions/src/describe.rs`:
- Around line 976-985: Update the comment inside the serialize_file function so
the misspelled word "poluting" is corrected to "polluting"; this comment sits
above fn serialize_file and explains why SourceFile (used in SymbolDescription)
is serialized as none, so just edit the comment text to read "polluting"
instead.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: d61b8c15-a611-4bd9-b688-a48ec9ad4ec2

📥 Commits

Reviewing files that changed from the base of the PR and between d69104f and 40e71f2.

📒 Files selected for processing (3)
  • baml_language/crates/baml_cli/src/describe_command.rs
  • baml_language/crates/baml_cli/src/grep_command.rs
  • baml_language/crates/baml_lsp2_actions/src/describe.rs

Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
baml_language/crates/baml_lsp2_actions/src/describe.rs (1)

1-10: Consider adding unit tests for this module.

This module contains substantial logic for symbol description, docstring extraction, dependency collection, and text slicing — all of which would benefit from unit tests. Functions like line_at_offset, slice_text, and the serde helpers are particularly good candidates for isolated unit tests.

As per coding guidelines: "Prefer writing Rust unit tests over integration tests where possible" and "Always run cargo test --lib if you changed any Rust code."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@baml_language/crates/baml_lsp2_actions/src/describe.rs` around lines 1 - 10,
Add unit tests for the describe module to cover core logic: write tests
targeting the describe() function for basic symbol descriptions and isolated
unit tests for line_at_offset, slice_text, and the serde helper functions (e.g.,
any serialize/deserialize helpers in this file) to validate edge cases
(start/end offsets, multi-line docstrings, empty inputs). Use small synthetic
source strings and minimal mocked contexts or construct the required inputs
directly to assert expected outputs (shape, full body, docstring extraction,
dependency collection, and reference slicing). Place tests in the module’s
#[cfg(test)] mod tests and run them with cargo test --lib to ensure they pass
and catch regressions. Ensure each test names the function under test
(line_at_offset, slice_text, describe) so failures point to the correct code.
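The real signatures of `line_at_offset` and `slice_text` are not shown in this thread. The versions below are hypothetical implementations over byte offsets, written so that the suggested edge cases (clamped offsets, multi-line input, UTF-8 boundaries) can be exercised in a `#[cfg(test)] mod tests` with tests named after the function under test.

```rust
/// Hypothetical: 0-based line number containing byte `offset` (clamped to the text length).
fn line_at_offset(text: &str, offset: usize) -> usize {
    let end = offset.min(text.len());
    // Count newlines strictly before the offset.
    text.as_bytes()[..end].iter().filter(|&&b| b == b'\n').count()
}

/// Hypothetical: text between byte offsets `start..end`, clamped to the text;
/// returns `None` when an offset falls inside a multi-byte UTF-8 character.
fn slice_text(text: &str, start: usize, end: usize) -> Option<&str> {
    let end = end.min(text.len());
    let start = start.min(end);
    text.get(start..end)
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn line_at_offset_counts_preceding_newlines() {
        assert_eq!(line_at_offset("a\nb\nc", 0), 0);
        assert_eq!(line_at_offset("a\nb\nc", 2), 1);
        assert_eq!(line_at_offset("a\nb\nc", 100), 2);
    }

    #[test]
    fn slice_text_clamps_and_respects_char_boundaries() {
        assert_eq!(slice_text("hello world", 6, 11), Some("world"));
        assert_eq!(slice_text("abc", 2, 99), Some("c"));
        assert_eq!(slice_text("é", 1, 2), None);
    }
}
```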
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@baml_language/crates/baml_lsp2_actions/src/describe.rs`:
- Around line 991-994: Fix the typo in the comment above the stub serialization
(the block describing why we serialize `SymbolDescription` without including the
full `SourceFile`) by changing "poluting" to "polluting"; update the comment
near `s.serialize_none()` that references `SymbolDescription` and `SourceFile`
so it reads "...without actually polluting it with the whole source file."

---

Nitpick comments:
In `@baml_language/crates/baml_lsp2_actions/src/describe.rs`:
- Around line 1-10: Add unit tests for the describe module to cover core logic:
write tests targeting the describe() function for basic symbol descriptions and
isolated unit tests for line_at_offset, slice_text, and the serde helper
functions (e.g., any serialize/deserialize helpers in this file) to validate
edge cases (start/end offsets, multi-line docstrings, empty inputs). Use small
synthetic source strings and minimal mocked contexts or construct the required
inputs directly to assert expected outputs (shape, full body, docstring
extraction, dependency collection, and reference slicing). Place tests in the
module’s #[cfg(test)] mod tests and run them with cargo test --lib to ensure
they pass and catch regressions. Ensure each test names the function under test
(line_at_offset, slice_text, describe) so failures point to the correct code.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 9c39fa30-706b-4088-8126-f93d3dbee71c

📥 Commits

Reviewing files that changed from the base of the PR and between 40e71f2 and 3dce573.

📒 Files selected for processing (1)
  • baml_language/crates/baml_lsp2_actions/src/describe.rs

Comment thread baml_language/crates/baml_lsp2_actions/src/describe.rs Outdated
Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

♻️ Duplicate comments (2)
baml_language/crates/baml_lsp2_actions/src/describe.rs (2)

797-803: ⚠️ Potential issue | 🟠 Major

Preserve qualified type paths during dependency lookup.

Using only segments.last() collapses qualified names to their tail segment, so foo.Point and bar.Point become indistinguishable and can resolve to the wrong dependency or none at all. Try the fully qualified path first, then fall back to the last segment only if that lookup fails.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@baml_language/crates/baml_lsp2_actions/src/describe.rs` around lines 797 -
803, The code in TypeExpr::Path currently uses segments.last() (and
seen.insert(name_str) / resolve_dep) which loses qualification; change the
lookup to first join the full segments path (e.g., build the fully qualified
string from segments) and call resolve_dep(db, file, &full_path) and only if
that returns None fall back to the existing last-segment flow (using
segments.last() -> name_str and seen.insert(name_str.clone()) -> resolve_dep).
Ensure seen tracking uses the same string you pass to resolve_dep so you don't
deduplicate qualified vs unqualified names inconsistently.
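A minimal sketch of the suggested lookup order, assuming a plain `HashMap` from type path to a dependency location as a stand-in for `resolve_dep(db, file, name)`. `lookup_path` is a hypothetical name. The fully qualified path is tried first, and the tail segment is used only as a fallback. `seen` is keyed by whichever string actually resolved.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical sketch: `index` stands in for `resolve_dep(db, file, name)`.
/// Tries the fully qualified path (e.g. "foo.Point") first and falls back to the
/// last segment ("Point") only if that fails.
fn lookup_path<'a>(
    index: &'a HashMap<String, String>,
    segments: &[&str],
    seen: &mut HashSet<String>,
) -> Option<&'a String> {
    let full = segments.join(".");
    let fallback = segments.last().map(|s| s.to_string());
    for name in std::iter::once(full).chain(fallback) {
        if let Some(dep) = index.get(&name) {
            // Record the exact key that resolved; report the dependency only once.
            return seen.insert(name).then_some(dep);
        }
    }
    None
}
```

With this order, `foo.Point` and `bar.Point` resolve independently whenever both qualified entries exist.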

652-654: ⚠️ Potential issue | 🟠 Major

Member docstrings still resolve to the enclosing item.

describe_member() passes a field/variant range, but this helper immediately climbs to the first item-level ancestor. A documented field or enum variant will therefore inherit the class/enum docstring instead of its own comment.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@baml_language/crates/baml_lsp2_actions/src/describe.rs` around lines 652 -
654, The code in describe_member() currently climbs to the first item-level
ancestor using token.parent_ancestors().find(|n| is_item_node(n.kind())) which
causes field/variant docstrings to resolve to the enclosing item; change the
ancestor search to first look for a member-level node (e.g., field or variant)
instead of immediately selecting an item. Update the lookup to find an ancestor
where the node kind matches member kinds (create or use a helper like
is_member_node that checks for field/variant) or whose text range matches the
token's field/variant range, and only fall back to is_item_node when no member
ancestor is found; adjust the code replacing the current
token.parent_ancestors().find(...) usage and the item_node binding accordingly.
🧹 Nitpick comments (1)
baml_language/crates/baml_lsp2_actions/src/describe_tests.rs (1)

52-105: Add regression coverage for member and local descriptions.

This suite only exercises top-level symbols. describe_member() and describe_locals() are still untested here, so regressions in field/variant docstrings and parameter/let-binding handling can slip through without any failing snapshot.

As per coding guidelines, "Prefer writing Rust unit tests over integration tests where possible".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@baml_language/crates/baml_lsp2_actions/src/describe_tests.rs` around lines 52
- 105, Tests only cover top-level symbols; add unit tests that exercise
describe_member() and describe_locals() to prevent regressions in field/variant
docstrings and parameter/let-binding handling. Create tests similar to existing
ones: call make_project(), invoke project.describe_member(...) for a class/enum
member and project.describe_locals(...) for a function with parameters and local
bindings, assert expected lengths (e.g., assert_eq!/assert! as in other tests)
and use insta::assert_snapshot!(project.format_description(&...)) to record the
rendered descriptions; place them alongside the existing describe_* tests so
they run as unit tests.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@baml_language/crates/baml_lsp2_actions/src/describe.rs`:
- Around line 987-994: The custom serializer serialize_file currently returns
null which destroys SymbolDescription.file, DepRef.file, and RefSite.file for
JSON consumers; instead add a concrete serializable path/URI field (e.g.,
file_path or file_uri: String/Url) to each struct and mark the internal
SourceFile field with #[serde(skip)] so the in-memory SourceFile is not
serialized, then populate that path/URI when constructing the structs and update
or remove serialize_file to serialize the path/URI value (or have it serialize
SourceFile.path if you keep it) so JSON contains a usable location for
definitions/references; refer to serialize_file, SymbolDescription.file,
DepRef.file, RefSite.file and SourceFile when making these changes.
- Around line 358-420: The loop is emitting parameter definitions twice because
DefinitionSite::Parameter is treated as a binding; before computing type_str and
pushing the SymbolDescription into results, detect parameter sites and skip them
by adding an early continue when def_site matches
baml_compiler2_hir::semantic_index::DefinitionSite::Parameter(_). In other
words, check def_site (the variable used in the match), and if it's a Parameter,
do not construct the SymbolDescription or push to results (skip the push to
results), leaving parameters to the earlier sig.params pass.
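A minimal sketch of the suggested early `continue`, assuming a simplified, hypothetical `DefinitionSite` enum that only mirrors the `Parameter` variant named in the comment (the `Let` variant and `describe_locals_sketch` are illustrative). Parameters are emitted once by the signature pass, and the binding loop skips `Parameter` sites before building anything.

```rust
/// Hypothetical stand-in for `baml_compiler2_hir::semantic_index::DefinitionSite`.
#[derive(Debug, Clone, PartialEq)]
enum DefinitionSite {
    Parameter(String),
    Let(String),
}

/// Builds the list of described locals without duplicating parameters.
fn describe_locals_sketch(sig_params: &[&str], sites: &[DefinitionSite]) -> Vec<String> {
    // First pass: parameters from the function signature.
    let mut results: Vec<String> = sig_params.iter().map(|p| p.to_string()).collect();
    // Second pass: bindings from the semantic index.
    for def_site in sites {
        let name = match def_site {
            // Early `continue`: already covered by the signature pass.
            DefinitionSite::Parameter(_) => continue,
            DefinitionSite::Let(name) => name,
        };
        results.push(name.clone());
    }
    results
}
```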

---

Duplicate comments:
In `@baml_language/crates/baml_lsp2_actions/src/describe.rs`:
- Around line 797-803: The code in TypeExpr::Path currently uses segments.last()
(and seen.insert(name_str) / resolve_dep) which loses qualification; change the
lookup to first join the full segments path (e.g., build the fully qualified
string from segments) and call resolve_dep(db, file, &full_path) and only if
that returns None fall back to the existing last-segment flow (using
segments.last() -> name_str and seen.insert(name_str.clone()) -> resolve_dep).
Ensure seen tracking uses the same string you pass to resolve_dep so you don't
deduplicate qualified vs unqualified names inconsistently.
- Around line 652-654: The code in describe_member() currently climbs to the
first item-level ancestor using token.parent_ancestors().find(|n|
is_item_node(n.kind())) which causes field/variant docstrings to resolve to the
enclosing item; change the ancestor search to first look for a member-level node
(e.g., field or variant) instead of immediately selecting an item. Update the
lookup to find an ancestor where the node kind matches member kinds (create or
use a helper like is_member_node that checks for field/variant) or whose text
range matches the token's field/variant range, and only fall back to
is_item_node when no member ancestor is found; adjust the code replacing the
current token.parent_ancestors().find(...) usage and the item_node binding
accordingly.

---

Nitpick comments:
In `@baml_language/crates/baml_lsp2_actions/src/describe_tests.rs`:
- Around line 52-105: Tests only cover top-level symbols; add unit tests that
exercise describe_member() and describe_locals() to prevent regressions in
field/variant docstrings and parameter/let-binding handling. Create tests
similar to existing ones: call make_project(), invoke
project.describe_member(...) for a class/enum member and
project.describe_locals(...) for a function with parameters and local bindings,
assert expected lengths (e.g., assert_eq!/assert! as in other tests) and use
insta::assert_snapshot!(project.format_description(&...)) to record the rendered
descriptions; place them alongside the existing describe_* tests so they run as
unit tests.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 0858e627-e093-4b02-bcf8-427ccee7253b

📥 Commits

Reviewing files that changed from the base of the PR and between 3dce573 and 76d82d5.

⛔ Files ignored due to path filters (11)
  • baml_language/Cargo.lock is excluded by !**/*.lock
  • baml_language/crates/baml_lsp2_actions/src/snapshots/baml_lsp2_actions__describe_tests__describe_class.snap is excluded by !**/*.snap
  • baml_language/crates/baml_lsp2_actions/src/snapshots/baml_lsp2_actions__describe_tests__describe_class_with_refs.snap is excluded by !**/*.snap
  • baml_language/crates/baml_lsp2_actions/src/snapshots/baml_lsp2_actions__describe_tests__describe_enum.snap is excluded by !**/*.snap
  • baml_language/crates/baml_lsp2_actions/src/snapshots/baml_lsp2_actions__describe_tests__describe_function.snap is excluded by !**/*.snap
  • baml_language/crates/baml_lsp2_actions/src/snapshots/baml_lsp2_actions__describe_tests__describe_function_with_enum_param.snap is excluded by !**/*.snap
  • baml_language/crates/baml_lsp2_actions/src/snapshots/baml_lsp2_actions__grep_tests__grep_case_insensitive_text_search.snap is excluded by !**/*.snap
  • baml_language/crates/baml_lsp2_actions/src/snapshots/baml_lsp2_actions__grep_tests__grep_enum_symbol.snap is excluded by !**/*.snap
  • baml_language/crates/baml_lsp2_actions/src/snapshots/baml_lsp2_actions__grep_tests__grep_semantic_result_snapshot.snap is excluded by !**/*.snap
  • baml_language/crates/baml_lsp2_actions/src/snapshots/baml_lsp2_actions__grep_tests__grep_text_search_with_matches.snap is excluded by !**/*.snap
  • baml_language/crates/baml_lsp2_actions/src/snapshots/baml_lsp2_actions__grep_tests__list_symbols_snapshot.snap is excluded by !**/*.snap
📒 Files selected for processing (6)
  • baml_language/crates/baml_lsp2_actions/Cargo.toml
  • baml_language/crates/baml_lsp2_actions/src/describe.rs
  • baml_language/crates/baml_lsp2_actions/src/describe_tests.rs
  • baml_language/crates/baml_lsp2_actions/src/grep_tests.rs
  • baml_language/crates/baml_lsp2_actions/src/lib.rs
  • baml_language/crates/baml_lsp2_actions/src/testing.rs
🚧 Files skipped from review as they are similar to previous changes (1)
  • baml_language/crates/baml_lsp2_actions/Cargo.toml

Comment thread baml_language/crates/baml_lsp2_actions/src/describe.rs
Comment thread baml_language/crates/baml_lsp2_actions/src/describe.rs Outdated
@imalsogreg force-pushed the greg/describe branch 3 times, most recently from 2df4036 to 1c4cce1, April 9, 2026 21:35
@imalsogreg enabled auto-merge April 9, 2026 21:42
@imalsogreg force-pushed the greg/describe branch 3 times, most recently from 55849bd to dddb11e, April 10, 2026 17:57
@imalsogreg added this pull request to the merge queue Apr 10, 2026
Merged via the queue into canary with commit 661c525 Apr 10, 2026
39 of 41 checks passed
@imalsogreg deleted the greg/describe branch April 10, 2026 20:31