Optimize external directory path resolution and add benchmarks#95
Optimize external directory path resolution and add benchmarks#95
Conversation
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #95 +/- ##
==========================================
- Coverage 81.56% 81.17% -0.40%
==========================================
Files 106 106
Lines 4389 4472 +83
==========================================
+ Hits 3580 3630 +50
- Misses 809 842 +33
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (3)
✅ Files skipped from review due to trivial changes (1)
📜 Recent review details⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
🧰 Additional context used🧠 Learnings (1)📚 Learning: 2026-03-28T02:14:04.465ZApplied to files:
🔇 Additional comments (9)
WalkthroughRefactors path resolution by splitting absolute vs relative handling into Possibly related PRs
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (2)
src/llm-coding-tools-core/src/path/mod.rs (1)
152-161: Redundant fallback in Unix implementation.On Unix,
OsStrbytes are the same underlying data thatpath.to_str()validates. Iffrom_utf8(os_str.as_bytes())fails due to invalid UTF-8,path.to_str()will also returnNone(since it performs the same UTF-8 check on the same bytes). The fallback on line 159 will therefore always resolve to"".This can be simplified:
♻️ Simplify by removing redundant fallback
#[cfg(unix)] #[inline] pub(crate) fn path_as_str(path: &Path) -> &str { use std::os::unix::ffi::OsStrExt; let os_str = path.as_os_str(); - match std::str::from_utf8(os_str.as_bytes()) { - Ok(s) => s, - Err(_) => path.to_str().unwrap_or(""), - } + std::str::from_utf8(os_str.as_bytes()).unwrap_or("") }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/llm-coding-tools-core/src/path/mod.rs` around lines 152 - 161, The Unix implementation of path_as_str has a redundant fallback to path.to_str().unwrap_or("") because from_utf8(os_str.as_bytes()) and path.to_str() perform the same UTF-8 validation; remove the Err branch and simply return the UTF-8 conversion result (e.g., use std::str::from_utf8(os_str.as_bytes()).unwrap_or("")) in the path_as_str function to eliminate the pointless fallback while keeping behavior consistent.src/llm-coding-tools-core/src/path/allowed_glob/mod.rs (1)
287-351: Consider extracting repeated policy-check logic.The policy verification block (lines 292-300, 314-322, 336-344) is duplicated three times. While this is understandable for a hot path where closure overhead might matter, it increases maintenance burden.
If you're confident the compiler inlines effectively, a helper closure could reduce duplication:
♻️ Optional: Extract policy check to closure
fn resolve_absolute( &self, path: &str, input_path: &Path, policy: Option<&GlobPolicy>, has_dots: bool, ) -> ToolResult<PathBuf> { let fast_policy_input = if !has_dots { Some(path) } else { None }; let check_bases = |resolved: &Path| -> bool { self.base_directories.iter().any(|base_dir| { if !resolved.starts_with(base_dir) { return false; } if let Some(policy) = policy { let relative_path = resolved.strip_prefix(base_dir).unwrap_or(Path::new("")); let normalized_relative = normalize::normalize_path(relative_path); if fast_policy_input != Some(normalized_relative.as_ref()) && !policy.is_allowed(&normalized_relative) { return false; } } true }) }; if let Ok(resolved) = input_path.canonicalize() { if check_bases(&resolved) { return Ok(resolved); } return self.try_external(path, resolved); } // ... similar for other branches }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/llm-coding-tools-core/src/path/allowed_glob/mod.rs` around lines 287 - 351, The policy verification logic is duplicated three times in resolve_absolute; extract it into a single helper (either a local closure like check_bases or a private method) that accepts &Path and returns bool, reuse it for the canonicalize, resolve_new_file_fast, and soft_canonicalize branches, and replace each duplicated iterator/policy block with a call to that helper and then call try_external when the helper returns false. Ensure the helper references fast_policy_input and policy the same way as the inlined code so behavior remains identical.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@src/llm-coding-tools-core/src/path/allowed_glob/mod.rs`:
- Around line 287-351: The policy verification logic is duplicated three times
in resolve_absolute; extract it into a single helper (either a local closure
like check_bases or a private method) that accepts &Path and returns bool, reuse
it for the canonicalize, resolve_new_file_fast, and soft_canonicalize branches,
and replace each duplicated iterator/policy block with a call to that helper and
then call try_external when the helper returns false. Ensure the helper
references fast_policy_input and policy the same way as the inlined code so
behavior remains identical.
In `@src/llm-coding-tools-core/src/path/mod.rs`:
- Around line 152-161: The Unix implementation of path_as_str has a redundant
fallback to path.to_str().unwrap_or("") because from_utf8(os_str.as_bytes()) and
path.to_str() perform the same UTF-8 validation; remove the Err branch and
simply return the UTF-8 conversion result (e.g., use
std::str::from_utf8(os_str.as_bytes()).unwrap_or("")) in the path_as_str
function to eliminate the pointless fallback while keeping behavior consistent.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 343b219c-e555-49f9-b998-eb7bb61b28e5
📒 Files selected for processing (4)
src/llm-coding-tools-core/benches/path_resolvers.rssrc/llm-coding-tools-core/src/path/allowed.rssrc/llm-coding-tools-core/src/path/allowed_glob/mod.rssrc/llm-coding-tools-core/src/path/mod.rs
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
- GitHub Check: Semver Checks (Serdesai Full)
- GitHub Check: Semver Checks (Serdesai Full+Linux)
- GitHub Check: Async Windows
- GitHub Check: Blocking Windows
- GitHub Check: Blocking macOS
- GitHub Check: Async macOS
- GitHub Check: Async Linux
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: Sewer56
Repo: Sewer56/llm-coding-tools PR: 69
File: src/llm-coding-tools-bubblewrap/src/profile/validation.rs:57-67
Timestamp: 2026-03-28T02:14:04.465Z
Learning: In `src/llm-coding-tools-bubblewrap/src/profile/` (Rust, llm-coding-tools-bubblewrap crate), the `Builder` API paths (workspace, synthetic_home, cache_root, mount lists, overlays, etc.) are always set by trusted application/operator code — the library consumer is the trusted party. Path normalization and `..`-component hardening in validators like `validate_absolute_path` is therefore NOT required to defend against traversal attacks. Untrusted input (LLM-generated shell commands) only enters through `wrap_command`/`execute_command_with_mode`, not through the `Builder`.
📚 Learning: 2026-03-28T02:14:04.465Z
Learnt from: Sewer56
Repo: Sewer56/llm-coding-tools PR: 69
File: src/llm-coding-tools-bubblewrap/src/profile/validation.rs:57-67
Timestamp: 2026-03-28T02:14:04.465Z
Learning: In `src/llm-coding-tools-bubblewrap/src/profile/` (Rust, llm-coding-tools-bubblewrap crate), the `Builder` API paths (workspace, synthetic_home, cache_root, mount lists, overlays, etc.) are always set by trusted application/operator code — the library consumer is the trusted party. Path normalization and `..`-component hardening in validators like `validate_absolute_path` is therefore NOT required to defend against traversal attacks. Untrusted input (LLM-generated shell commands) only enters through `wrap_command`/`execute_command_with_mode`, not through the `Builder`.
Applied to files:
src/llm-coding-tools-core/src/path/allowed_glob/mod.rssrc/llm-coding-tools-core/benches/path_resolvers.rssrc/llm-coding-tools-core/src/path/allowed.rs
🔇 Additional comments (5)
src/llm-coding-tools-core/src/path/allowed_glob/mod.rs (1)
356-389: Clean helper implementation.The
try_externalhelper centralizes external permission handling well. Theimpl Into<Option<PathBuf>>parameter is ergonomic, allowing callers to pass either aPathBufdirectly orNone.src/llm-coding-tools-core/src/path/allowed.rs (2)
182-220: LGTM - clean refactoring of absolute path resolution.The optimization to canonicalize once and check all bases with
.any()is correct and efficient. The flow through the three resolution strategies (canonicalize → resolve_new_file_fast → soft_canonicalize) maintains the same semantics as before while avoiding redundant filesystem calls.
222-255: Consistent implementation with glob resolver.The
try_externalhelper mirrors the implementation inAllowedGlobResolver, maintaining consistency across both resolvers.src/llm-coding-tools-core/benches/path_resolvers.rs (2)
309-412: Good benchmark coverage for external directory scenarios.The benchmark function is well-structured with clear documentation and covers the key scenarios:
- Fast path (existing file with permission)
- Slow path (new file requiring soft_canonicalize)
- Rejection paths (denied and no-ruleset)
- Edge case (relative path with external permission)
The test case matrix with resolver selection logic is clean and maintainable.
332-335: Glob pattern concern is unfounded—*does match nested paths in this implementation.The
wildcard_matchfunction does not use shell glob semantics. The pattern*is treated as a universal wildcard that matches any sequence of bytes, including path separators (/). The fast path at line 347–349 returnstrueforpattern == "*"regardless of input, and the backtracking implementation allows*to consume any characters without restriction.With pattern
/tmp/xyz/*and input/tmp/xyz/subdir/new_file.txt, the*will successfully matchsubdir/new_file.txt(including the/). The benchmark correctly tests the intended "soft_canonicalize + permission allow" path.
For absolute paths, base.join(input) == input regardless of base, so canonicalize once and check all bases instead of per-base FS calls. Extract resolve_absolute and try_external helpers in both resolvers. Add external directory and multiple-bases benchmarks.
7a5e026 to
9b1fcb0
Compare
There was a problem hiding this comment.
🧹 Nitpick comments (1)
src/llm-coding-tools-core/src/path/allowed_glob/mod.rs (1)
355-377:fast_policy_inputoptimization is ineffective for absolute paths.The
fast_policy_inputis set to the original absolute path (e.g.,/home/user/project/src/lib.rs), but it's compared againstnormalized_relativewhich is a relative path after stripping the base prefix (e.g.,src/lib.rs). These will never match, so the condition at line 367 (fast_policy_input != Some(normalized_relative.as_ref())) is always true for absolute paths, meaningpolicy.is_allowed()is always called.This doesn't affect correctness—the policy check still happens—but the optimization is dead code. Consider either:
- Removing
fast_policy_inputfrom this function since it can't trigger- Computing a relative form to enable the optimization
♻️ Option 1: Simplify by removing the ineffective optimization
fn resolve_absolute( base_directories: &[Arc<Path>], external_permission: Option<&Ruleset>, policy: Option<&GlobPolicy>, path: &str, input_path: &Path, - has_dots: bool, + _has_dots: bool, ) -> ToolResult<PathBuf> { - let fast_policy_input = if !has_dots { Some(path) } else { None }; - // Step 1: canonicalize for existing files. if let Ok(resolved) = input_path.canonicalize() { // Check if any base claims this path and policy approves. let accepted = base_directories.iter().any(|base_dir| { if !resolved.starts_with(base_dir) { return false; } if let Some(policy) = policy { let relative_path = resolved.strip_prefix(base_dir).unwrap_or(Path::new("")); let normalized_relative = normalize::normalize_path(relative_path); - if fast_policy_input != Some(normalized_relative.as_ref()) - && !policy.is_allowed(&normalized_relative) - { + if !policy.is_allowed(&normalized_relative) { return false; } } true });🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/llm-coding-tools-core/src/path/allowed_glob/mod.rs` around lines 355 - 377, The fast_policy_input optimization is ineffective because fast_policy_input holds the absolute input_path while the code compares it to normalized_relative (a path stripped of base_dir), so the shortcut never triggers; remove the fast_policy_input logic (its declaration and the special-case comparison in the base_directories.any closure) and always compute normalized_relative and call policy.is_allowed as currently done, or alternatively compute a relative form of input_path (using input_path.strip_prefix(base_dir) when canonicalization succeeds) and compare that to normalized_relative before calling policy.is_allowed; update references around fast_policy_input, input_path.canonicalize, normalize::normalize_path, and policy.is_allowed accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@src/llm-coding-tools-core/src/path/allowed_glob/mod.rs`:
- Around line 355-377: The fast_policy_input optimization is ineffective because
fast_policy_input holds the absolute input_path while the code compares it to
normalized_relative (a path stripped of base_dir), so the shortcut never
triggers; remove the fast_policy_input logic (its declaration and the
special-case comparison in the base_directories.any closure) and always compute
normalized_relative and call policy.is_allowed as currently done, or
alternatively compute a relative form of input_path (using
input_path.strip_prefix(base_dir) when canonicalization succeeds) and compare
that to normalized_relative before calling policy.is_allowed; update references
around fast_policy_input, input_path.canonicalize, normalize::normalize_path,
and policy.is_allowed accordingly.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: fb6fad74-02a0-48c7-84b6-8bc8d7dad431
📒 Files selected for processing (4)
src/llm-coding-tools-core/benches/path_resolvers.rssrc/llm-coding-tools-core/src/path/allowed.rssrc/llm-coding-tools-core/src/path/allowed_glob/mod.rssrc/llm-coding-tools-core/src/path/mod.rs
🚧 Files skipped from review as they are similar to previous changes (3)
- src/llm-coding-tools-core/src/path/mod.rs
- src/llm-coding-tools-core/benches/path_resolvers.rs
- src/llm-coding-tools-core/src/path/allowed.rs
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Async Windows
- GitHub Check: Blocking macOS
- GitHub Check: Blocking Windows
- GitHub Check: Async macOS
- GitHub Check: Async Linux
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2026-03-28T02:14:04.465Z
Learnt from: Sewer56
Repo: Sewer56/llm-coding-tools PR: 69
File: src/llm-coding-tools-bubblewrap/src/profile/validation.rs:57-67
Timestamp: 2026-03-28T02:14:04.465Z
Learning: In `src/llm-coding-tools-bubblewrap/src/profile/` (Rust, llm-coding-tools-bubblewrap crate), the `Builder` API paths (workspace, synthetic_home, cache_root, mount lists, overlays, etc.) are always set by trusted application/operator code — the library consumer is the trusted party. Path normalization and `..`-component hardening in validators like `validate_absolute_path` is therefore NOT required to defend against traversal attacks. Untrusted input (LLM-generated shell commands) only enters through `wrap_command`/`execute_command_with_mode`, not through the `Builder`.
Applied to files:
src/llm-coding-tools-core/src/path/allowed_glob/mod.rs
🔇 Additional comments (5)
src/llm-coding-tools-core/src/path/allowed_glob/mod.rs (5)
5-40: Clear and comprehensive documentation of the resolution algorithm.The documentation accurately describes the dispatch logic and resolution tiers. Good addition for maintainability.
217-239: Clean dispatch pattern for absolute vs relative paths.The separation into dedicated helpers improves readability and allows for optimization strategies specific to each path type.
254-336: Well-structured relative path resolution with proper policy re-checking.The tiered approach (canonicalize → new-file-fast → soft_canonicalize) correctly handles existing files, new files in existing directories, and paths with missing parents. The policy re-check after normalization/symlink resolution is essential for security.
439-472: Clean consolidation of external permission evaluation.The
try_externalhelper nicely encapsulates the external-directory fallback logic. Theimpl Into<Option<PathBuf>>signature is ergonomic, allowing callers to passresolveddirectly orNone. The early empty-ruleset check at line 446 avoids unnecessary canonicalization work.
840-852: Test comment clarifies the intent.Good clarification that external permission only applies to absolute paths.
The fast_policy_input shortcut compared an absolute input path against a normalized relative path, so it could never match and the code always fell through to policy.is_allowed. Remove the dead branch, the unused has_dots parameter, and simplify all three resolution tiers.
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/llm-coding-tools-core/src/path/allowed_glob/mod.rs`:
- Around line 352-370: The current canonicalization block uses a single
`accepted` check that conflates "inside a base directory" and "allowed by the
base's glob policy", then calls `try_external(external_permission, path,
resolved)` whenever `accepted` is false; change this so you separately track
whether `resolved` is contained in any `base_directories` (e.g.,
`inside_any_base`) and whether any base's
`policy.is_allowed(&normalize::normalize_path(...))` returned true (e.g.,
`policy_allowed`), and only call `try_external` when `inside_any_base` is false
(i.e., the path is outside all bases). Update the logic around
`input_path.canonicalize()`, the `base_directories.iter().any(...)` closure, and
the eventual decision that returns `Ok(resolved)` or falls back to
`try_external(external_permission, path, resolved)` so that an in-base denial by
`policy.is_allowed` is not bypassed by `external_permission`.
- Around line 227-237: The current fast-path using fast_policy_input (set when
analysis.has_dots is false) treats a policy denial on the raw input as final,
which can incorrectly reject valid paths that normalize or follow symlinks;
update the logic around fast_policy_input and resolve_relative so that a policy
denial on the fast probe is not a hard reject: use the fast probe only as a
cached allow (if the policy allows, return allow immediately), but if the fast
probe denies, continue and run the canonicalized/normalized check (via
resolve_relative using self.base_directories, policy, path, input_path) to make
the final decision; ensure functions like resolve_relative or its caller handle
a "probe denied" outcome by falling back to full resolution rather than
returning deny.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 8daf20d6-c19b-492b-b475-f1b004cb9395
📒 Files selected for processing (1)
src/llm-coding-tools-core/src/path/allowed_glob/mod.rs
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Async Windows
- GitHub Check: Async Linux
- GitHub Check: Blocking macOS
- GitHub Check: Async macOS
- GitHub Check: Blocking Windows
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2026-03-28T02:14:04.465Z
Learnt from: Sewer56
Repo: Sewer56/llm-coding-tools PR: 69
File: src/llm-coding-tools-bubblewrap/src/profile/validation.rs:57-67
Timestamp: 2026-03-28T02:14:04.465Z
Learning: In `src/llm-coding-tools-bubblewrap/src/profile/` (Rust, llm-coding-tools-bubblewrap crate), the `Builder` API paths (workspace, synthetic_home, cache_root, mount lists, overlays, etc.) are always set by trusted application/operator code — the library consumer is the trusted party. Path normalization and `..`-component hardening in validators like `validate_absolute_path` is therefore NOT required to defend against traversal attacks. Untrusted input (LLM-generated shell commands) only enters through `wrap_command`/`execute_command_with_mode`, not through the `Builder`.
Applied to files:
src/llm-coding-tools-core/src/path/allowed_glob/mod.rs
🔇 Additional comments (1)
src/llm-coding-tools-core/src/path/allowed_glob/mod.rs (1)
417-459: Nice extraction of the external fallback.Centralizing canonicalization reuse, permission evaluation, and the shared rejection path in
try_externalmakes the absolute-path tiers much easier to keep aligned.
Paths inside a base directory that were denied by glob policy could incorrectly fall through to try_external() and be approved via external_permission. Now we track inside_any_base separately from accepted so that in-base denials are rejected immediately. - Add not_allowed_error() helper to unify error messages - Add test rejects_in_base_path_denied_by_policy_even_with_external_permission
…mlinked paths in resolve_relative The fast-path checked glob policy on the raw input string before canonicalization, causing it to skip bases where the raw path was denied but the symlink-resolved path would have been allowed. This also removes the now-unused has_dots field from PathAnalysis.
Summary
Optimize the absolute-path resolution path in
AllowedPathResolverandAllowedGlobResolver, add external-directory benchmark coverage, and refresh all reference numbers.Changes
base.join(absolute_path) == absolute_path, so canonicalize once then check all bases instead of per-base join + canonicalizePathBuf::cloneon the matching-base fast path by using.any()Cowallocation in permission evaluation via directOsStr→strconversionexternal_directorybenchmark group with 5 test cases covering the external permission fallbackcomplex_policyandmultiple_basesreference numbersBenchmark Results
Controlled A/B comparison on same hardware (pre-optimization vs this branch):
Pre-existing benchmarks: no regressions
All within ±2% (noise). Some slight improvements:
External directory: significant improvements