baml grep and baml describe - agent-oriented semantic search tools#3347
baml grep and baml describe - agent-oriented semantic search tools#3347imalsogreg merged 8 commits intocanaryfrom
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds CLI commands Changes
Sequence DiagramsequenceDiagram
participant User as User
participant CLI as baml_cli
participant FS as File System
participant DB as ProjectDatabase
participant LSP as baml_lsp2_actions
User->>CLI: invoke "describe" or "grep"
CLI->>FS: discover .baml files (from --from)
FS-->>CLI: file list
CLI->>DB: initialize database & register files
loop per file
DB->>FS: read file contents
FS-->>DB: file text
DB->>DB: parse & index file
end
CLI->>LSP: call describe()/grep(files, opts)
LSP->>DB: query symbols/types/usages
DB-->>LSP: symbol/type/reference data
LSP-->>CLI: descriptions or grep results (semantic or text)
CLI->>CLI: format output (JSON/text, budget, hints)
CLI->>User: print results
Estimated code review effort🎯 5 (Critical) | ⏱️ ~120 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
Deployment failed with the following error: View Documentation: https://vercel.com/docs/two-factor-authentication |
There was a problem hiding this comment.
Actionable comments posted: 7
🧹 Nitpick comments (6)
baml_language/crates/baml_lsp2_actions/src/type_info.rs (1)
192-192: Narrow this helper back to crate visibility.Making
type_info_for_definitionpublic leaksDefinition<'_>into the external API ofbaml_lsp2_actions. The new caller shown in this PR is intra-crate, sopub(crate)keeps the boundary tighter without blockingdescribe.rs.Suggested change
-pub fn type_info_for_definition(db: &dyn Db, def: Definition<'_>) -> TypeInfo { +pub(crate) fn type_info_for_definition(db: &dyn Db, def: Definition<'_>) -> TypeInfo {🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@baml_language/crates/baml_lsp2_actions/src/type_info.rs` at line 192, The function type_info_for_definition is unnecessarily exported; restrict its visibility to the crate by changing its declaration from public to crate-private (use pub(crate) fn type_info_for_definition(...)) so Definition<'_> is not leaked from baml_lsp2_actions; update the signature in type_info.rs (function name type_info_for_definition, parameters Db and Definition<'_>, return TypeInfo) and ensure the intra-crate caller in describe.rs still calls it successfully.baml_language/crates/baml_cli/src/describe_command.rs (3)
51-52: Unused return value fromset_project_root.The return value is captured but unused (
_project). Either use it or remove the binding if it's not needed.- let _project = db.set_project_root(&from); + db.set_project_root(&from);🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@baml_language/crates/baml_cli/src/describe_command.rs` around lines 51 - 52, The return value from db.set_project_root(&from) is being bound to _project but never used; either remove the unused binding and call db.set_project_root(&from) for its side effect, or keep and use the returned value (e.g., assign to project and use it where needed). Update the call in the block that constructs ProjectDatabase::new() and invokes db.set_project_root(&from) to either drop the assignment or replace _project with a meaningful variable used later.
454-559: Consider adding unit tests for helper functions.The pure helper functions (
shape_with_elision,truncate_body,find_line_in_body,render_body_with_context,line_number_at_offset) are excellent candidates for unit tests. As per the coding guidelines: "Prefer writing Rust unit tests over integration tests where possible".Example test cases for
truncate_body:
- Body fits within budget (no truncation)
- Body exceeds budget with no annotated blocks
- Body with
//#annotated blocks that should be preserved- Edge cases with very small budgets
💚 Example unit test structure
#[cfg(test)] mod tests { use super::*; #[test] fn test_truncate_body_fits_within_budget() { let body = vec!["fn foo() {", " bar()", "}"]; let result = truncate_body(&body, 10); assert_eq!(result, body.iter().map(|s| s.to_string()).collect::<Vec<_>>()); } #[test] fn test_truncate_body_preserves_annotated_blocks() { let body = vec![ "fn foo() {", " line1()", " //# important", " critical_call()", " line2()", "}", ]; let result = truncate_body(&body, 5); assert!(result.iter().any(|l| l.contains("critical_call"))); } #[test] fn test_shape_with_elision_adds_block() { assert_eq!( shape_with_elision("fn foo()", "fn foo() { body }"), "fn foo() { ... }" ); } #[test] fn test_shape_with_elision_no_change() { assert_eq!( shape_with_elision("fn foo() {}", "fn foo() {}"), "fn foo() {}" ); } }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@baml_language/crates/baml_cli/src/describe_command.rs` around lines 454 - 559, Add focused Rust unit tests for the pure helper functions to cover the behaviors described: create tests for shape_with_elision (when the shape lacks a block but full_body has one, and when no change is required), for truncate_body (body fits within budget, body exceeds budget with no annotated blocks, body with `//#` annotated blocks preserved, and edge cases with very small available_lines), and for the other pure helpers find_line_in_body, render_body_with_context, and line_number_at_offset (include normal and boundary cases); put these tests in a #[cfg(test)] mod and use the exact function names shape_with_elision, truncate_body, find_line_in_body, render_body_with_context, and line_number_at_offset so they exercise expected outputs and edge conditions from the implementation.
136-333: Large function could benefit from extraction.
render_descriptionat ~200 lines handles many concerns: header, type, docstring, body, dependencies, references, and hints. While the sections are clearly marked with comments, consider extracting each section into helper functions (e.g.,render_header,render_body,render_dependencies, etc.) for improved testability and readability.This is optional given the clear section markers and the function's readability.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@baml_language/crates/baml_cli/src/describe_command.rs` around lines 136 - 333, render_description is too large and should be split into focused helpers; extract the header, type/docstring, body rendering, dependency rendering, references rendering, and hints rendering into separate functions (e.g., render_header, render_type_and_doc, render_body, render_dependencies, render_references, render_hints) and replace the corresponding large code blocks with calls to these helpers. Each helper should accept the minimal data it needs (for example, render_header(db, desc, project_root) returning lines_used or updating a mutable lines_used; render_body(desc, budget, history, project_root) returning lines consumed; render_dependencies(db, desc, project_root, history, remaining_budget) etc.), preserve the current printing behavior and budgeting logic, and keep the existing public signature of render_description unchanged so callers are unaffected. Ensure you move related helper logic such as is_local checks, shape/truncation decisions, and history filtering into the appropriate helper, keep existing comment markers, and add small unit tests for each extracted helper where feasible to confirm behavior parity.baml_language/crates/baml_lsp2_actions/src/grep.rs (2)
138-143: Consider memory efficiency for large projects.Inserting one entry per byte offset in the name span can consume significant memory for projects with many symbols. For a symbol name of length N, this creates N HashMap entries.
A more memory-efficient approach would be to store
(file, start, end)ranges and use range-based lookup, though this would trade query speed for memory.For a CLI tool this may be acceptable, but worth noting if performance issues arise with large codebases.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@baml_language/crates/baml_lsp2_actions/src/grep.rs` around lines 138 - 143, The current approach in grep.rs creates one HashMap entry per byte offset between item.name_span.start() and end(), which is memory-inefficient for large projects; change def_spans to store range entries instead of per-offset entries (e.g., store tuples like (file, start, end) -> (item.name.clone(), item.kind) or push a span struct into a Vec/HashMap keyed by file) and update any lookup logic that previously expected (file, offset) keys to perform a range check against item.name_span (or iterate the file's span list) when resolving a position; update references to def_spans, item.name_span, and any code that inserts/queries def_spans so lookups use start/end range comparison and keep symbol_kinds usage unchanged.
186-201: Unnecessary string allocations whenignore_caseis false.When
ignore_caseis false,text.clone()(line 189) andline.to_string()(line 200) create unnecessary allocations since the original strings could be used directly.♻️ Proposed fix using Cow to avoid allocations
+use std::borrow::Cow; + fn text_search(db: &dyn Db, files: &[SourceFile], opts: &GrepOptions<'_>) -> Vec<TextMatch> { - let pattern = if opts.ignore_case { - opts.pattern.to_lowercase() - } else { - opts.pattern.to_string() - }; + let pattern: Cow<'_, str> = if opts.ignore_case { + Cow::Owned(opts.pattern.to_lowercase()) + } else { + Cow::Borrowed(opts.pattern) + }; // Build an outline index for CST-based semantic annotation. let index = OutlineIndex::build(db, files); let mut matches = Vec::new(); for &file in files { let text = file.text(db); // Pre-filter: skip files that don't contain the pattern. - let text_to_check = if opts.ignore_case { - text.to_lowercase() + let text_to_check: Cow<'_, str> = if opts.ignore_case { + Cow::Owned(text.to_lowercase()) } else { - text.clone() + Cow::Borrowed(text) };🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@baml_language/crates/baml_lsp2_actions/src/grep.rs` around lines 186 - 201, The code currently always allocates when building text_to_check and line_to_check; change both to use std::borrow::Cow<str> so you borrow when opts.ignore_case is false and only allocate when true: for the block creating text_to_check replace text.clone() with Cow::Borrowed(text.as_str()) (and when ignore_case true use Cow::Owned(text.to_lowercase())), and similarly in the line loop replace line.to_string() with Cow::Borrowed(line) (or Cow::Owned(line.to_lowercase()) when ignore_case), keeping the contains check against pattern as before; update variable types from String to Cow<str> for text_to_check and line_to_check so unnecessary allocations are avoided.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@baml_language/crates/baml_cli/src/grep_command.rs`:
- Around line 320-342: parse_kind_filter currently silently ignores unknown kind
strings by printing "Unknown kind: ..." and continuing; change it to fail fast
by returning an error (or exiting) when any unknown kind is encountered. Update
parse_kind_filter to signal failure (e.g., change signature to return
Result<Vec<DefinitionKind>, String> or have it call std::process::exit(1) in CLI
contexts), detect unknown values in the match arm and return Err with a clear
message like "Unknown kind: {other}" instead of printing and dropping the
filter, and then propagate that error to the caller so the CLI returns a
non-zero exit code; keep references to parse_kind_filter and DefinitionKind when
making the change.
- Around line 94-115: The JSON flag handling is missing in the symbols/refs
branches and result.descriptions is serialized before checking result.mode,
causing wrong or dropped json output; update the grep_command logic (referencing
list_symbols, relative_path, line_number_at_offset, and the branches that print
symbols/refs) to check self.json and emit proper JSON for symbols and refs
instead of plain println!, and move/adjust the serialization of
result.descriptions so it happens only when result.mode requires it (use
result.mode to decide formatting), ensuring the default text-search branch and
the refs branch produce the correct JSON shape when self.json is true.
In `@baml_language/crates/baml_lsp2_actions/src/describe.rs`:
- Around line 595-607: extract_docstring currently picks the first ancestor that
satisfies is_item_node, which makes member docstrings (fields/variants) resolve
to the enclosing item; update extract_docstring so it first searches the token's
parent_ancestors for member-level nodes (e.g., field, variant, or whatever AST
kinds represent members) and uses that node if found, and only if no member node
is found fall back to finding an enclosing item node (the existing is_item_node
check). Reference the function extract_docstring and ensure describe_member's
passed item_range will now match the member ancestor before the container item.
- Around line 321-333: The current describe logic only reads bindings from
index.scope_bindings[func_scope_idx], so locals declared in nested block or
lambda scopes are ignored; update the code that computes bindings for a function
(after computing func_scope_idx) to collect bindings from all scopes that belong
to that function by walking/ filtering index.scopes (e.g., find every scope
whose ancestor chain leads to func_scope_idx or whose range is contained in
func.span) and then merge their corresponding entries from index.scope_bindings
into bindings before proceeding; use the existing symbols func_scope_idx,
index.scopes, index.scope_bindings and func.span in the traversal so describe
reports nested let bindings too.
- Around line 750-756: The code handling TypeExpr::Path only inspects
segments.last() and uses that bare tail (name_str) for dependency lookup, which
loses qualification and misresolves qualified names; update the logic in the
TypeExpr::Path branch (where segments, seen, resolve_dep, and deps.push(dep) are
used) to construct the full path string (e.g., join all segments with the proper
separator or preserve their original qualified form) and use that full qualified
name when calling resolve_dep and when inserting into seen, falling back to the
tail-only lookup only if qualified lookup fails to preserve correct dependency
resolution.
- Around line 548-559: resolve_type_for_item currently only handles top-level
Definition variants and returns None for Field and Variant, causing
describe_member's resolved_type to be empty; update resolve_type_for_item to
also match Definition::Field and Definition::Variant (from the
baml_compiler2_tir::resolve::ResolvedName::Item/Builtin result) and derive their
concrete type info by either delegating to the existing type_info_for_definition
path or by calling the appropriate TypeInfo helper for fields/variants so the
function returns Some(String) for member symbols; ensure you use the same
symbol/definition identifiers (resolve_type_for_item, type_info_for_definition,
TypeInfo, SymbolInfo, describe_member) to locate and implement the missing
branches.
In `@baml_language/crates/baml_lsp2_actions/src/grep.rs`:
- Line 221: The current loop uses str::lines() and then does line_start_offset
+= line.len() + 1, which miscounts CRLF line endings; change the iteration to
use split_inclusive('\n') (or an equivalent that preserves the actual newline
bytes) so each yielded slice includes its newline(s) and you can advance by
line.len() without adding a fixed +1; update the code around line_start_offset
and the loop that computes offsets (referencing the loop that updates
line_start_offset in grep.rs and the use-site annotate_match) to remove the
manual +1 and rely on the inclusive-split slice lengths so CRLF vs LF are
handled correctly.
---
Nitpick comments:
In `@baml_language/crates/baml_cli/src/describe_command.rs`:
- Around line 51-52: The return value from db.set_project_root(&from) is being
bound to _project but never used; either remove the unused binding and call
db.set_project_root(&from) for its side effect, or keep and use the returned
value (e.g., assign to project and use it where needed). Update the call in the
block that constructs ProjectDatabase::new() and invokes
db.set_project_root(&from) to either drop the assignment or replace _project
with a meaningful variable used later.
- Around line 454-559: Add focused Rust unit tests for the pure helper functions
to cover the behaviors described: create tests for shape_with_elision (when the
shape lacks a block but full_body has one, and when no change is required), for
truncate_body (body fits within budget, body exceeds budget with no annotated
blocks, body with `//#` annotated blocks preserved, and edge cases with very
small available_lines), and for the other pure helpers find_line_in_body,
render_body_with_context, and line_number_at_offset (include normal and boundary
cases); put these tests in a #[cfg(test)] mod and use the exact function names
shape_with_elision, truncate_body, find_line_in_body, render_body_with_context,
and line_number_at_offset so they exercise expected outputs and edge conditions
from the implementation.
- Around line 136-333: render_description is too large and should be split into
focused helpers; extract the header, type/docstring, body rendering, dependency
rendering, references rendering, and hints rendering into separate functions
(e.g., render_header, render_type_and_doc, render_body, render_dependencies,
render_references, render_hints) and replace the corresponding large code blocks
with calls to these helpers. Each helper should accept the minimal data it needs
(for example, render_header(db, desc, project_root) returning lines_used or
updating a mutable lines_used; render_body(desc, budget, history, project_root)
returning lines consumed; render_dependencies(db, desc, project_root, history,
remaining_budget) etc.), preserve the current printing behavior and budgeting
logic, and keep the existing public signature of render_description unchanged so
callers are unaffected. Ensure you move related helper logic such as is_local
checks, shape/truncation decisions, and history filtering into the appropriate
helper, keep existing comment markers, and add small unit tests for each
extracted helper where feasible to confirm behavior parity.
In `@baml_language/crates/baml_lsp2_actions/src/grep.rs`:
- Around line 138-143: The current approach in grep.rs creates one HashMap entry
per byte offset between item.name_span.start() and end(), which is
memory-inefficient for large projects; change def_spans to store range entries
instead of per-offset entries (e.g., store tuples like (file, start, end) ->
(item.name.clone(), item.kind) or push a span struct into a Vec/HashMap keyed by
file) and update any lookup logic that previously expected (file, offset) keys
to perform a range check against item.name_span (or iterate the file's span
list) when resolving a position; update references to def_spans, item.name_span,
and any code that inserts/queries def_spans so lookups use start/end range
comparison and keep symbol_kinds usage unchanged.
- Around line 186-201: The code currently always allocates when building
text_to_check and line_to_check; change both to use std::borrow::Cow<str> so you
borrow when opts.ignore_case is false and only allocate when true: for the block
creating text_to_check replace text.clone() with Cow::Borrowed(text.as_str())
(and when ignore_case true use Cow::Owned(text.to_lowercase())), and similarly
in the line loop replace line.to_string() with Cow::Borrowed(line) (or
Cow::Owned(line.to_lowercase()) when ignore_case), keeping the contains check
against pattern as before; update variable types from String to Cow<str> for
text_to_check and line_to_check so unnecessary allocations are avoided.
In `@baml_language/crates/baml_lsp2_actions/src/type_info.rs`:
- Line 192: The function type_info_for_definition is unnecessarily exported;
restrict its visibility to the crate by changing its declaration from public to
crate-private (use pub(crate) fn type_info_for_definition(...)) so
Definition<'_> is not leaked from baml_lsp2_actions; update the signature in
type_info.rs (function name type_info_for_definition, parameters Db and
Definition<'_>, return TypeInfo) and ensure the intra-crate caller in
describe.rs still calls it successfully.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: f09e756e-6b88-4d0b-952a-285a8449405e
⛔ Files ignored due to path filters (1)
baml_language/Cargo.lockis excluded by!**/*.lock
📒 Files selected for processing (10)
baml_language/crates/baml_cli/Cargo.tomlbaml_language/crates/baml_cli/src/commands.rsbaml_language/crates/baml_cli/src/describe_command.rsbaml_language/crates/baml_cli/src/grep_command.rsbaml_language/crates/baml_cli/src/lib.rsbaml_language/crates/baml_lsp2_actions/Cargo.tomlbaml_language/crates/baml_lsp2_actions/src/describe.rsbaml_language/crates/baml_lsp2_actions/src/grep.rsbaml_language/crates/baml_lsp2_actions/src/lib.rsbaml_language/crates/baml_lsp2_actions/src/type_info.rs
Merging this PR will not alter performance
|
Binary size checks passed✅ 7 passed
Generated by |
There was a problem hiding this comment.
🧹 Nitpick comments (4)
baml_language/crates/baml_lsp2_actions/src/describe.rs (1)
976-985: Minor typo in comment.Line 982: "poluting" should be "polluting".
📝 Proposed fix
- // This stub implementation exists so that we can serialize `SymbolDescription`, - // which contains a `SourceFile`, to json, without actually poluting it with the - // whole source file. + // This stub implementation exists so that we can serialize `SymbolDescription`, + // which contains a `SourceFile`, to json, without actually polluting it with the + // whole source file.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@baml_language/crates/baml_lsp2_actions/src/describe.rs` around lines 976 - 985, Update the comment inside the serialize_file function so the misspelled word "poluting" is corrected to "polluting"; this comment sits above fn serialize_file and explains why SourceFile (used in SymbolDescription) is serialized as none, so just edit the comment text to read "polluting" instead.baml_language/crates/baml_cli/src/grep_command.rs (1)
224-291: Consider documenting--kindbehavior with text search fallback.When the pattern doesn't match a known symbol and
grep()falls back to text search (line 231), the--kindfilter is silently ignored because text matches don't have symbol-kind metadata. Users specifying--kindmight expect all results to be filtered.Consider either:
- Documenting this in the
--kindhelp text- Warning when
--kindis provided but text search is usedThis is a minor UX concern, not a bug.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@baml_language/crates/baml_cli/src/grep_command.rs` around lines 224 - 291, Grep falls back to text search and silently ignores the --kind filter (kind_filter) when grep() returns GrepMode::TextSearch; detect when a kind filter was provided (e.g., check self.kind or kind_filter) and, when result.mode == GrepMode::TextSearch, emit a short warning to stderr (e.g., via eprintln!) that "--kind is ignored when pattern falls back to text search" (include pattern for context) before rendering text matches; this keeps existing behavior but makes the UX explicit.baml_language/crates/baml_cli/src/describe_command.rs (2)
469-558: Consider adding unit tests for truncation logic.The
truncate_bodyfunction has non-trivial logic for prioritizing header, annotated blocks, head/tail content, and skip markers. Unit tests would help verify edge cases like:
- Body shorter than budget
- No annotated blocks
- Many annotated blocks exceeding budget
- Single-line body
As per coding guidelines: prefer writing Rust unit tests over integration tests.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@baml_language/crates/baml_cli/src/describe_command.rs` around lines 469 - 558, Add unit tests for the truncate_body function to exercise its non-trivial truncation rules: create a #[cfg(test)] mod with tests that call truncate_body directly and assert on the returned Vec<String>; include cases for body shorter than available_lines (should return full body), no annotated blocks (verify balanced head/tail and skip marker placement), many annotated blocks exceeding budget (annotated_ranges get fully included and other content is truncated), single-line body (only header included), and skip-marker indentation/line-count correctness (verify "... skipped N lines ..." appears with the same indentation as the next included line). Use slices of &str to build inputs, vary available_lines to hit edge conditions, and assert exact string sequences so the behavior of truncate_body, annotated_ranges, head/tail allocation, and skip markers is covered.
496-512: Edge case: annotated blocks may exceed budget.If
annotated_line_countis large (many//#blocks),remaining_for_contentbecomes 0 (line 505-508), but annotated ranges are still included unconditionally (lines 530-535). This could produce output exceeding the budget.Consider either:
- Capping annotated blocks to fit within budget
- Documenting that
//#blocks are always preserved regardless of budgetThis is a minor edge case and the current behavior (preserving annotated content) may be intentional.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@baml_language/crates/baml_cli/src/describe_command.rs` around lines 496 - 512, annotated_line_count can exceed the available budget and annotated_ranges are included unconditionally later, so detect when annotated_line_count > available_lines - header_reserve - skip_line_reserve and trim or cap annotated_ranges to fit the budget before computing head_lines_count/tail_lines_count; specifically, compute budget_for_annotated = available_lines.saturating_sub(header_reserve + skip_line_reserve) and if annotated_line_count > budget_for_annotated, produce a capped_annotated_ranges (e.g., keep earliest and latest annotated blocks or drop least-important ones) and use capped_annotated_ranges in place of annotated_ranges in the later logic (the variables annotated_line_count, annotated_ranges, remaining_for_content, head_lines_count, tail_lines_count are the key symbols to update) so the final output cannot exceed the available_lines.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@baml_language/crates/baml_cli/src/describe_command.rs`:
- Around line 469-558: Add unit tests for the truncate_body function to exercise
its non-trivial truncation rules: create a #[cfg(test)] mod with tests that call
truncate_body directly and assert on the returned Vec<String>; include cases for
body shorter than available_lines (should return full body), no annotated blocks
(verify balanced head/tail and skip marker placement), many annotated blocks
exceeding budget (annotated_ranges get fully included and other content is
truncated), single-line body (only header included), and skip-marker
indentation/line-count correctness (verify "... skipped N lines ..." appears
with the same indentation as the next included line). Use slices of &str to
build inputs, vary available_lines to hit edge conditions, and assert exact
string sequences so the behavior of truncate_body, annotated_ranges, head/tail
allocation, and skip markers is covered.
- Around line 496-512: annotated_line_count can exceed the available budget and
annotated_ranges are included unconditionally later, so detect when
annotated_line_count > available_lines - header_reserve - skip_line_reserve and
trim or cap annotated_ranges to fit the budget before computing
head_lines_count/tail_lines_count; specifically, compute budget_for_annotated =
available_lines.saturating_sub(header_reserve + skip_line_reserve) and if
annotated_line_count > budget_for_annotated, produce a capped_annotated_ranges
(e.g., keep earliest and latest annotated blocks or drop least-important ones)
and use capped_annotated_ranges in place of annotated_ranges in the later logic
(the variables annotated_line_count, annotated_ranges, remaining_for_content,
head_lines_count, tail_lines_count are the key symbols to update) so the final
output cannot exceed the available_lines.
In `@baml_language/crates/baml_cli/src/grep_command.rs`:
- Around line 224-291: Grep falls back to text search and silently ignores the
--kind filter (kind_filter) when grep() returns GrepMode::TextSearch; detect
when a kind filter was provided (e.g., check self.kind or kind_filter) and, when
result.mode == GrepMode::TextSearch, emit a short warning to stderr (e.g., via
eprintln!) that "--kind is ignored when pattern falls back to text search"
(include pattern for context) before rendering text matches; this keeps existing
behavior but makes the UX explicit.
In `@baml_language/crates/baml_lsp2_actions/src/describe.rs`:
- Around line 976-985: Update the comment inside the serialize_file function so
the misspelled word "poluting" is corrected to "polluting"; this comment sits
above fn serialize_file and explains why SourceFile (used in SymbolDescription)
is serialized as none, so just edit the comment text to read "polluting"
instead.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: d61b8c15-a611-4bd9-b688-a48ec9ad4ec2
📒 Files selected for processing (3)
baml_language/crates/baml_cli/src/describe_command.rsbaml_language/crates/baml_cli/src/grep_command.rsbaml_language/crates/baml_lsp2_actions/src/describe.rs
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
baml_language/crates/baml_lsp2_actions/src/describe.rs (1)
1-10: Consider adding unit tests for this module.This module contains substantial logic for symbol description, docstring extraction, dependency collection, and text slicing — all of which would benefit from unit tests. Functions like
line_at_offset,slice_text, and the serde helpers are particularly good candidates for isolated unit tests.As per coding guidelines: "Prefer writing Rust unit tests over integration tests where possible" and "Always run
cargo test --libif you changed any Rust code."🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@baml_language/crates/baml_lsp2_actions/src/describe.rs` around lines 1 - 10, Add unit tests for the describe module to cover core logic: write tests targeting the describe() function for basic symbol descriptions and isolated unit tests for line_at_offset, slice_text, and the serde helper functions (e.g., any serialize/deserialize helpers in this file) to validate edge cases (start/end offsets, multi-line docstrings, empty inputs). Use small synthetic source strings and minimal mocked contexts or construct the required inputs directly to assert expected outputs (shape, full body, docstring extraction, dependency collection, and reference slicing). Place tests in the module’s #[cfg(test)] mod tests and run them with cargo test --lib to ensure they pass and catch regressions. Ensure each test names the function under test (line_at_offset, slice_text, describe) so failures point to the correct code.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@baml_language/crates/baml_lsp2_actions/src/describe.rs`:
- Around line 991-994: Fix the typo in the comment above the stub serialization
(the block describing why we serialize `SymbolDescription` without including the
full `SourceFile`) by changing "poluting" to "polluting"; update the comment
near `s.serialize_none()` that references `SymbolDescription` and `SourceFile`
so it reads "...without actually polluting it with the whole source file."
---
Nitpick comments:
In `@baml_language/crates/baml_lsp2_actions/src/describe.rs`:
- Around line 1-10: Add unit tests for the describe module to cover core logic:
write tests targeting the describe() function for basic symbol descriptions and
isolated unit tests for line_at_offset, slice_text, and the serde helper
functions (e.g., any serialize/deserialize helpers in this file) to validate
edge cases (start/end offsets, multi-line docstrings, empty inputs). Use small
synthetic source strings and minimal mocked contexts or construct the required
inputs directly to assert expected outputs (shape, full body, docstring
extraction, dependency collection, and reference slicing). Place tests in the
module’s #[cfg(test)] mod tests and run them with cargo test --lib to ensure
they pass and catch regressions. Ensure each test names the function under test
(line_at_offset, slice_text, describe) so failures point to the correct code.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 9c39fa30-706b-4088-8126-f93d3dbee71c
📒 Files selected for processing (1)
baml_language/crates/baml_lsp2_actions/src/describe.rs
3dce573 to
76d82d5
Compare
There was a problem hiding this comment.
Actionable comments posted: 2
♻️ Duplicate comments (2)
baml_language/crates/baml_lsp2_actions/src/describe.rs (2)
797-803:⚠️ Potential issue | 🟠 MajorPreserve qualified type paths during dependency lookup.
Using only
segments.last()collapses qualified names to their tail segment, sofoo.Pointandbar.Pointbecome indistinguishable and can resolve to the wrong dependency or none at all. Try the fully qualified path first, then fall back to the last segment only if that lookup fails.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@baml_language/crates/baml_lsp2_actions/src/describe.rs` around lines 797 - 803, The code in TypeExpr::Path currently uses segments.last() (and seen.insert(name_str) / resolve_dep) which loses qualification; change the lookup to first join the full segments path (e.g., build the fully qualified string from segments) and call resolve_dep(db, file, &full_path) and only if that returns None fall back to the existing last-segment flow (using segments.last() -> name_str and seen.insert(name_str.clone()) -> resolve_dep). Ensure seen tracking uses the same string you pass to resolve_dep so you don't deduplicate qualified vs unqualified names inconsistently.
652-654:⚠️ Potential issue | 🟠 MajorMember docstrings still resolve to the enclosing item.
describe_member()passes a field/variant range, but this helper immediately climbs to the first item-level ancestor. A documented field or enum variant will therefore inherit the class/enum docstring instead of its own comment.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@baml_language/crates/baml_lsp2_actions/src/describe.rs` around lines 652 - 654, The code in describe_member() currently climbs to the first item-level ancestor using token.parent_ancestors().find(|n| is_item_node(n.kind())) which causes field/variant docstrings to resolve to the enclosing item; change the ancestor search to first look for a member-level node (e.g., field or variant) instead of immediately selecting an item. Update the lookup to find an ancestor where the node kind matches member kinds (create or use a helper like is_member_node that checks for field/variant) or whose text range matches the token's field/variant range, and only fall back to is_item_node when no member ancestor is found; adjust the code replacing the current token.parent_ancestors().find(...) usage and the item_node binding accordingly.
🧹 Nitpick comments (1)
baml_language/crates/baml_lsp2_actions/src/describe_tests.rs (1)
52-105: Add regression coverage for member and local descriptions.This suite only exercises top-level symbols.
describe_member()anddescribe_locals()are still untested here, so regressions in field/variant docstrings and parameter/let-binding handling can slip through without any failing snapshot.As per coding guidelines, "Prefer writing Rust unit tests over integration tests where possible".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@baml_language/crates/baml_lsp2_actions/src/describe_tests.rs` around lines 52 - 105, Tests only cover top-level symbols; add unit tests that exercise describe_member() and describe_locals() to prevent regressions in field/variant docstrings and parameter/let-binding handling. Create tests similar to existing ones: call make_project(), invoke project.describe_member(...) for a class/enum member and project.describe_locals(...) for a function with parameters and local bindings, assert expected lengths (e.g., assert_eq!/assert! as in other tests) and use insta::assert_snapshot!(project.format_description(&...)) to record the rendered descriptions; place them alongside the existing describe_* tests so they run as unit tests.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@baml_language/crates/baml_lsp2_actions/src/describe.rs`:
- Around line 987-994: The custom serializer serialize_file currently returns
null which destroys SymbolDescription.file, DepRef.file, and RefSite.file for
JSON consumers; instead add a concrete serializable path/URI field (e.g.,
file_path or file_uri: String/Url) to each struct and mark the internal
SourceFile field with #[serde(skip)] so the in-memory SourceFile is not
serialized, then populate that path/URI when constructing the structs and update
or remove serialize_file to serialize the path/URI value (or have it serialize
SourceFile.path if you keep it) so JSON contains a usable location for
definitions/references; refer to serialize_file, SymbolDescription.file,
DepRef.file, RefSite.file and SourceFile when making these changes.
- Around line 358-420: The loop is emitting parameter definitions twice because
DefinitionSite::Parameter is treated as a binding; before computing type_str and
pushing the SymbolDescription into results, detect parameter sites and skip them
by adding an early continue when def_site matches
baml_compiler2_hir::semantic_index::DefinitionSite::Parameter(_). In other
words, check def_site (the variable used in the match), and if it's a Parameter,
do not construct the SymbolDescription or push to results (skip the push to
results), leaving parameters to the earlier sig.params pass.
---
Duplicate comments:
In `@baml_language/crates/baml_lsp2_actions/src/describe.rs`:
- Around line 797-803: The code in TypeExpr::Path currently uses segments.last()
(and seen.insert(name_str) / resolve_dep) which loses qualification; change the
lookup to first join the full segments path (e.g., build the fully qualified
string from segments) and call resolve_dep(db, file, &full_path) and only if
that returns None fall back to the existing last-segment flow (using
segments.last() -> name_str and seen.insert(name_str.clone()) -> resolve_dep).
Ensure seen tracking uses the same string you pass to resolve_dep so you don't
deduplicate qualified vs unqualified names inconsistently.
- Around line 652-654: The code in describe_member() currently climbs to the
first item-level ancestor using token.parent_ancestors().find(|n|
is_item_node(n.kind())) which causes field/variant docstrings to resolve to the
enclosing item; change the ancestor search to first look for a member-level node
(e.g., field or variant) instead of immediately selecting an item. Update the
lookup to find an ancestor where the node kind matches member kinds (create or
use a helper like is_member_node that checks for field/variant) or whose text
range matches the token's field/variant range, and only fall back to
is_item_node when no member ancestor is found; adjust the code replacing the
current token.parent_ancestors().find(...) usage and the item_node binding
accordingly.
---
Nitpick comments:
In `@baml_language/crates/baml_lsp2_actions/src/describe_tests.rs`:
- Around line 52-105: Tests only cover top-level symbols; add unit tests that
exercise describe_member() and describe_locals() to prevent regressions in
field/variant docstrings and parameter/let-binding handling. Create tests
similar to existing ones: call make_project(), invoke
project.describe_member(...) for a class/enum member and
project.describe_locals(...) for a function with parameters and local bindings,
assert expected lengths (e.g., assert_eq!/assert! as in other tests) and use
insta::assert_snapshot!(project.format_description(&...)) to record the rendered
descriptions; place them alongside the existing describe_* tests so they run as
unit tests.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 0858e627-e093-4b02-bcf8-427ccee7253b
⛔ Files ignored due to path filters (11)
baml_language/Cargo.lockis excluded by!**/*.lockbaml_language/crates/baml_lsp2_actions/src/snapshots/baml_lsp2_actions__describe_tests__describe_class.snapis excluded by!**/*.snapbaml_language/crates/baml_lsp2_actions/src/snapshots/baml_lsp2_actions__describe_tests__describe_class_with_refs.snapis excluded by!**/*.snapbaml_language/crates/baml_lsp2_actions/src/snapshots/baml_lsp2_actions__describe_tests__describe_enum.snapis excluded by!**/*.snapbaml_language/crates/baml_lsp2_actions/src/snapshots/baml_lsp2_actions__describe_tests__describe_function.snapis excluded by!**/*.snapbaml_language/crates/baml_lsp2_actions/src/snapshots/baml_lsp2_actions__describe_tests__describe_function_with_enum_param.snapis excluded by!**/*.snapbaml_language/crates/baml_lsp2_actions/src/snapshots/baml_lsp2_actions__grep_tests__grep_case_insensitive_text_search.snapis excluded by!**/*.snapbaml_language/crates/baml_lsp2_actions/src/snapshots/baml_lsp2_actions__grep_tests__grep_enum_symbol.snapis excluded by!**/*.snapbaml_language/crates/baml_lsp2_actions/src/snapshots/baml_lsp2_actions__grep_tests__grep_semantic_result_snapshot.snapis excluded by!**/*.snapbaml_language/crates/baml_lsp2_actions/src/snapshots/baml_lsp2_actions__grep_tests__grep_text_search_with_matches.snapis excluded by!**/*.snapbaml_language/crates/baml_lsp2_actions/src/snapshots/baml_lsp2_actions__grep_tests__list_symbols_snapshot.snapis excluded by!**/*.snap
📒 Files selected for processing (6)
baml_language/crates/baml_lsp2_actions/Cargo.tomlbaml_language/crates/baml_lsp2_actions/src/describe.rsbaml_language/crates/baml_lsp2_actions/src/describe_tests.rsbaml_language/crates/baml_lsp2_actions/src/grep_tests.rsbaml_language/crates/baml_lsp2_actions/src/lib.rsbaml_language/crates/baml_lsp2_actions/src/testing.rs
🚧 Files skipped from review as they are similar to previous changes (1)
- baml_language/crates/baml_lsp2_actions/Cargo.toml
2df4036 to
1c4cce1
Compare
55849bd to
dddb11e
Compare
115b6fa to
07448e8
Compare
07448e8 to
f85adce
Compare
Summary by CodeRabbit
describecommand: rich symbol descriptions (definition, resolved type, docstring, deps, references), JSON output, budgeted/truncated rendering, history tracking, hint suppression, and custom project-root support.grepcommand: semantic + text search with pattern matching, symbol-kind filtering,--symbolslisting,--def/--refsmodes, semantic→text fallback, grouped file results, and optional JSON output.