Native support for grep #4368
Conversation
```python
pattern = glob_pattern or "*"
files: list[Path] = []
for p in search_path.rglob(pattern):
```
I don't think `pathlib.Path.rglob` supports brace expansion; any use of `{*.py,*.txt}` will yield empty results.
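Since `rglob` has no brace support, patterns like `{*.py,*.txt}` would need to be expanded into separate globs before searching. A minimal sketch of such a pre-expansion step (the helper name `expand_braces` is hypothetical, not from the PR):

```python
import re


def expand_braces(pattern: str) -> list[str]:
    """Expand one {a,b,c} group into separate glob patterns, recursively.

    Hypothetical helper: pathlib.Path.rglob has no brace expansion,
    so "{*.py,*.txt}" must be split before globbing.
    """
    match = re.search(r"\{([^{}]*)\}", pattern)
    if not match:
        return [pattern]
    head, tail = pattern[: match.start()], pattern[match.end():]
    expanded: list[str] = []
    for option in match.group(1).split(","):
        # Recurse so multiple/nested groups are handled too.
        expanded.extend(expand_braces(head + option + tail))
    return expanded


expand_braces("{*.py,*.txt}")  # → ["*.py", "*.txt"]
```

Each expanded pattern would then be passed to `rglob` separately and the results merged.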
```python
# Compile regex
flags = re.IGNORECASE if case_insensitive else 0
try:
    compiled = re.compile(pattern, flags)
```
This is a ReDoS risk: malicious or careless patterns like `(a+)+$`, `(.*a){20}`, or `(\w+\s?)*$` may cause catastrophic backtracking when matched against certain lines.
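One cheap partial mitigation, which the follow-up commit names as `MAX_REGEX_LENGTH`, is capping pattern length before compiling. A sketch under that assumption (the limit value and function name are illustrative):

```python
import re

MAX_REGEX_LENGTH = 1000  # assumed limit; the PR names the constant, not the value


def compile_pattern(pattern: str, case_insensitive: bool = False) -> re.Pattern[str]:
    """Reject over-long patterns and surface compile errors early."""
    if len(pattern) > MAX_REGEX_LENGTH:
        raise ValueError(f"Regex pattern exceeds {MAX_REGEX_LENGTH} characters")
    flags = re.IGNORECASE if case_insensitive else 0
    try:
        return re.compile(pattern, flags)
    except re.error as exc:
        raise ValueError(f"Invalid regex pattern: {exc}") from exc
```

Note that a length cap alone is not sufficient: `(a+)+$` is short, so bounding the length of the *input lines* matters as much as bounding the pattern.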
```python
        Formatted search results as a string.
    """
    # Resolve search path
    search_path = Path(path) if path else Path(os.getcwd())
```
This allows traversal to any absolute path on the system.
Along with the comments below, there is a risk of sensitive content leakage: any `.env`, `.netrc`, `.npmrc`, etc. can appear in a result.
```python
pattern = glob_pattern or "*"
files: list[Path] = []
for p in search_path.rglob(pattern):
```
There is also a risk of symlink traversal here.
```python
try:
    with open(file_path, encoding="utf-8", errors="replace") as f:
        lines = f.readlines()
```
We need a size check here as a fast follow; GB+ files will be read fully into memory.
```python
    default=False,
    description="Whether to perform case-insensitive matching",
)
context_lines: int = Field(
```
Should set an upper bound here
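Pydantic's `Field` supports `ge`/`le` bounds directly; a sketch assuming the `MAX_CONTEXT_LINES` constant the follow-up commit introduces (the value 10 and the schema name are illustrative):

```python
from pydantic import BaseModel, Field

MAX_CONTEXT_LINES = 10  # assumed cap; the PR names the constant, not the value


class GrepToolSchema(BaseModel):
    """Illustrative input schema fragment with a bounded context_lines field."""

    context_lines: int = Field(
        default=0,
        ge=0,
        le=MAX_CONTEXT_LINES,
        description="Number of context lines to show around each match",
    )
```

With `le` set, out-of-range values are rejected at validation time rather than silently producing huge output.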
…d brace expansion support

- Added MAX_REGEX_LENGTH to limit regex pattern length and prevent ReDoS.
- Introduced allow_unrestricted_paths option to enable searching outside the current working directory.
- Implemented brace expansion for glob patterns to support multiple file types.
- Enhanced error handling for path traversal and regex compilation.
- Updated tests to cover new features and ensure robustness.
- Added MAX_CONTEXT_LINES to define the upper limit for context lines shown in search results.
- Introduced MAX_FILE_SIZE_BYTES to skip files larger than 10 MB during searches.
- Implemented logic to exclude sensitive files (e.g., .env, .netrc) from search results to prevent accidental leakage of credentials.
- Updated tests to validate sensitive file exclusion and file size limits, ensuring robustness in handling sensitive content.
…into lorenze/feat/grep-tool
…into lorenze/feat/grep-tool
Cursor Bugbot has reviewed your changes and found 3 potential issues.
Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
```python
        return None
    finally:
        signal.alarm(0)
        signal.signal(signal.SIGALRM, old_handler)
```
SIGALRM crashes when tool runs in worker threads
High Severity
`_safe_search` calls `signal.signal(signal.SIGALRM, ...)`, which raises `ValueError: signal only works in main thread` when invoked from a non-main thread. CrewAI's `crew_agent_executor.py` runs tool calls inside a `ThreadPoolExecutor`, so this will crash in normal usage. The `except TimeoutError` block doesn't catch `ValueError`, so the exception propagates up unhandled and kills the tool invocation entirely.
```python
            continue
        files.append(p)
        if len(files) >= MAX_FILES:
            break
```
Symlink traversal bypasses path restriction boundary
Medium Severity
`_collect_files` uses `rglob`, which follows symlinks, and never verifies that discovered files' real paths remain within the search boundary. A symlink inside the search directory pointing to an arbitrary location (e.g., `/etc/shadow`) would be followed, read, and its contents returned. Neither `is_symlink()` nor `resolve()` is called on individual file paths before reading them.
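A per-file check along these lines would close the gap: resolve each discovered path and verify it stays under the resolved search root before reading it (the helper name is hypothetical):

```python
from pathlib import Path


def is_within_boundary(file_path: Path, search_root: Path) -> bool:
    """Reject files whose real path escapes the search root (symlink traversal)."""
    try:
        # strict=True also rejects broken symlinks and vanished files.
        resolved = file_path.resolve(strict=True)
    except OSError:
        return False
    return resolved.is_relative_to(search_root.resolve())
```

This would be applied to every path yielded by `rglob` before it is appended to `files`; checking the resolved path (rather than only `is_symlink()`) also catches symlinked parent directories.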
```python
if self._safe_search(compiled_pattern, line):
    match_line_nums.append(i)
    if len(match_line_nums) >= MAX_MATCHES_PER_FILE:
        break
```
ReDoS mitigation absent — regex runs on untruncated lines
Medium Severity
The `_safe_search` docstring claims Windows is "bounded by MAX_LINE_LENGTH truncation applied earlier in the pipeline," but `_search_file` calls `_safe_search` on full, untruncated lines. Truncation only occurs later during output formatting. On Windows (no SIGALRM), and in non-main threads on Unix (where SIGALRM crashes), there is no effective ReDoS protection; a malicious pattern against a long line can hang indefinitely.
This PR is stale because it has been open for 45 days with no activity.


Note
Medium Risk
Introduces a new filesystem-search capability that touches path handling and regex execution; while guarded (path restriction, sensitive-file filtering, size/binary limits), bugs could still leak data or cause performance issues if misconfigured.
Overview
Adds a new `GrepTool` that performs recursive regex search over local files with glob filtering, context lines, and multiple output modes, plus safety controls (cwd-only path restriction by default, sensitive-file exclusions, binary/size skipping, output truncation, and basic ReDoS mitigations).

Exposes `GrepTool` via the package/tool `__init__` exports, adds comprehensive unit tests, and regenerates `tool.specs.json` to include the new tool while also removing some previously listed init/required fields from other tool schemas (e.g. Couchbase cluster and Oxylabs/Tavily client params).

Written by Cursor Bugbot for commit 2c78e60. This will update automatically on new commits.