Add file policy engine for file tool harness#1974
Conversation
There was a problem hiding this comment.
Code Review
This pull request implements a file access policy engine that enforces path-level restrictions for file-related tools using glob-style patterns defined in execpolicy.toml. The changes include the core evaluation logic with regex caching, integration into both serial and parallel tool execution paths, a new FilePolicy feature flag, and comprehensive documentation. Reviewer feedback identifies a bug in handling filenames with spaces in diff headers and potential false positives when parsing patch contents. Further suggestions include improving error reporting for invalid configurations, reordering policy evaluation to better support whitelisting with exceptions, and extending coverage to tools like list_dir and grep_files.
| } | ||
|
|
||
| fn normalize_diff_path(raw: &str) -> Option<String> { | ||
| let path = raw.split_whitespace().next()?.trim_matches('"'); |
There was a problem hiding this comment.
Using split_whitespace() to extract the path from a diff header will fail for filenames containing spaces. Unified diff headers typically use a tab character (\t) to separate the filename from the timestamp. Using split_whitespace().next() will truncate the path at the first space, leading to incorrect policy enforcement or potential bypasses for files with spaces in their names.
| let path = raw.split_whitespace().next()?.trim_matches('"'); | |
| let path = raw.split('\t').next()?.trim_matches('"'); |
| let file_policy = if config.features.enabled(Feature::FilePolicy) { | ||
| crate::execpolicy::load_default_policy() | ||
| .ok() | ||
| .flatten() | ||
| .map(Arc::new) | ||
| } else { | ||
| None | ||
| }; |
There was a problem hiding this comment.
The policy engine fails silently if the configuration file exists but is invalid (e.g., due to a syntax error). Using .ok() converts the error into None, effectively disabling the safety layer without notifying the user. For a security-related feature, it is better to log a high-visibility error or surface the failure to the user so they are aware that the expected safety guardrails are not active.
| pub fn evaluate_file(&self, category: &str, path: &str) -> ExecPolicyDecision { | ||
| let mut rule_sets = Vec::new(); | ||
| if let Some(rules) = self.file_rules.get(category) { | ||
| rule_sets.push(rules); | ||
| } | ||
| if category != "default" | ||
| && let Some(rules) = self.file_rules.get("default") | ||
| { | ||
| rule_sets.push(rules); | ||
| } | ||
|
|
||
| if rule_sets.is_empty() { | ||
| return ExecPolicyDecision::Allow; | ||
| } | ||
|
|
||
| for rules in &rule_sets { | ||
| for pattern in &rules.deny { | ||
| if pattern_matches_path(pattern, path) { | ||
| return ExecPolicyDecision::Deny(format!( | ||
| "file policy denied by {category}: {pattern}" | ||
| )); | ||
| } | ||
| } | ||
| } | ||
|
|
||
| for rules in &rule_sets { | ||
| for pattern in &rules.allow { | ||
| if pattern_matches_path(pattern, path) { | ||
| return ExecPolicyDecision::Allow; | ||
| } | ||
| } | ||
| } | ||
|
|
||
| ExecPolicyDecision::AskUser(format!( | ||
| "file policy: no matching allow rule for {category}" | ||
| )) | ||
| } |
There was a problem hiding this comment.
The current implementation checks all deny rules (category then default) before any allow rules. This makes it impossible to create a "whitelist with exceptions" policy. For example, if default has deny = ["*"], no specific allow rule in a category (like read_file) can ever permit a file because the global deny will always trigger first. Reordering the checks to evaluate the specific category (deny then allow) before falling back to the default category would be more flexible and intuitive.
pub fn evaluate_file(&self, category: &str, path: &str) -> ExecPolicyDecision {
if let Some(rules) = self.file_rules.get(category) {
for pattern in &rules.deny {
if pattern_matches_path(pattern, path) {
return ExecPolicyDecision::Deny(format!(
"file policy denied by {category}: {pattern}"
));
}
}
for pattern in &rules.allow {
if pattern_matches_path(pattern, path) {
return ExecPolicyDecision::Allow;
}
}
}
if category != "default" && let Some(rules) = self.file_rules.get("default") {
for pattern in &rules.deny {
if pattern_matches_path(pattern, path) {
return ExecPolicyDecision::Deny(format!(
"file policy denied by default: {pattern}"
));
}
}
for pattern in &rules.allow {
if pattern_matches_path(pattern, path) {
return ExecPolicyDecision::Allow;
}
}
}
if self.file_rules.is_empty() {
ExecPolicyDecision::Allow
} else {
ExecPolicyDecision::AskUser(format!(
"file policy: no matching allow rule for {category}"
))
}
}| pub fn is_file_tool(tool_name: &str) -> bool { | ||
| matches!( | ||
| tool_name, | ||
| "read_file" | "write_file" | "edit_file" | "apply_patch" | ||
| ) | ||
| } | ||
|
|
||
| pub fn extract_file_path<'a>( | ||
| tool_input: &'a serde_json::Value, | ||
| tool_name: &str, | ||
| ) -> Option<&'a str> { | ||
| match tool_name { | ||
| "read_file" | "write_file" | "edit_file" | "apply_patch" => { | ||
| tool_input.get("path")?.as_str() | ||
| } | ||
| _ => None, | ||
| } | ||
| } | ||
|
|
||
| pub fn extract_file_paths(tool_input: &serde_json::Value, tool_name: &str) -> Vec<String> { | ||
| match tool_name { | ||
| "read_file" | "write_file" | "edit_file" => extract_file_path(tool_input, tool_name) | ||
| .map(|path| vec![path.to_string()]) | ||
| .unwrap_or_default(), | ||
| "apply_patch" => extract_apply_patch_paths(tool_input), | ||
| _ => Vec::new(), | ||
| } | ||
| } |
There was a problem hiding this comment.
The policy engine currently omits list_dir and grep_files, which can be used to access sensitive information or discover workspace structure. These tools should be included in the file access policy to provide a comprehensive safety layer.
pub fn is_file_tool(tool_name: &str) -> bool {
matches!(
tool_name,
"read_file" | "write_file" | "edit_file" | "apply_patch" | "list_dir" | "grep_files"
)
}
pub fn extract_file_path<'a>(
tool_input: &'a serde_json::Value,
tool_name: &str,
) -> Option<&'a str> {
match tool_name {
"read_file" | "write_file" | "edit_file" | "apply_patch" | "list_dir" | "grep_files" => {
tool_input.get("path")?.as_str()
}
_ => None,
}
}
pub fn extract_file_paths(tool_input: &serde_json::Value, tool_name: &str) -> Vec<String> {
match tool_name {
"read_file" | "write_file" | "edit_file" | "list_dir" | "grep_files" => {
extract_file_path(tool_input, tool_name)
.map(|path| vec![path.to_string()])
.unwrap_or_default()
}
"apply_patch" => extract_apply_patch_paths(tool_input),
_ => Vec::new(),
}
}| for line in patch.lines() { | ||
| let candidate = line | ||
| .strip_prefix("+++ ") | ||
| .or_else(|| line.strip_prefix("--- ")); | ||
| let Some(candidate) = candidate else { | ||
| continue; | ||
| }; | ||
| if let Some(path) = normalize_diff_path(candidate) { | ||
| push_unique_path(&mut paths, path); | ||
| } | ||
| } |
There was a problem hiding this comment.
Scanning the entire patch body for lines starting with --- or +++ can lead to false positives if the file content being patched contains these sequences (e.g., when patching a diff file or documentation about diffs). This could cause the policy engine to block a patch based on text that is actually part of the file content rather than a target path. Consider a more robust approach that only parses headers outside of hunks.
Summary
Reopens the stale pre-v0.8.41 file policy engine work from #1585 on current
main.This adds a harness-level file access policy that can allow or deny file-touching tool calls by category before mutation/read execution. It is intended as a reusable safety layer for workspace policy, diagnostics, and audit behavior.
Changes
refs/execpolicy.example.toml.execute_tool_with_lockdefensive execution.read_file,write_file,edit_file, andapply_patchincluding path override and unified diff headers.tool.file_policy_deniedaudit metadata on denied execution.docs/file-policy-design.md.Validation
cargo fmt --checkcargo test -p codewhale-tui file_policy(15 passed)cargo test -p codewhale-tui engine::(117 passed)cargo test -p codewhale-tui composer_arrows_scroll(10 passed)cargo check --workspaceSupersedes #1585, which became stale after the v0.8.41 rebrand.