
native support for grep #4368

Open
lorenzejay wants to merge 19 commits into main from lorenze/feat/grep-tool

Conversation

@lorenzejay (Collaborator) commented Feb 4, 2026

Note

Medium Risk
Introduces a new filesystem-search capability that touches path handling and regex execution; while guarded (path restriction, sensitive-file filtering, size/binary limits), bugs could still leak data or cause performance issues if misconfigured.

Overview
Adds a new GrepTool that performs recursive regex search over local files with glob filtering, context lines, and multiple output modes, plus safety controls (cwd-only path restriction by default, sensitive-file exclusions, binary/size skipping, output truncation, and basic ReDoS mitigations).

Exposes GrepTool via the package/tool __init__ exports, adds comprehensive unit tests, and regenerates tool.specs.json to include the new tool while also removing some previously listed init/required fields from other tool schemas (e.g. Couchbase cluster and Oxylabs/Tavily client params).

Written by Cursor Bugbot for commit 2c78e60.


pattern = glob_pattern or "*"
files: list[Path] = []
for p in search_path.rglob(pattern):
I don't think pathlib.rglob supports brace expansion. Any use of {*.py,*.txt} will yield empty results.
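The comment is correct: pathlib's glob/rglob implements `*`, `?`, `**`, and character classes, but not shell-style brace expansion, so `{*.py,*.txt}` is matched literally. A minimal sketch of a pre-expansion helper (the name `expand_braces` is hypothetical, not from the PR) that turns one pattern into several plain glob patterns:

```python
import re

def expand_braces(pattern: str) -> list[str]:
    # Expand the first {a,b,c} group into separate patterns, then recurse
    # so that multiple or nested groups are handled as well.
    m = re.search(r"\{([^{}]*)\}", pattern)
    if not m:
        return [pattern]
    head, tail = pattern[:m.start()], pattern[m.end():]
    results: list[str] = []
    for alt in m.group(1).split(","):
        results.extend(expand_braces(head + alt + tail))
    return results
```

Each returned pattern can then be passed to `rglob` individually and the results merged, which is roughly what the later "brace expansion support" commit describes.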

# Compile regex
flags = re.IGNORECASE if case_insensitive else 0
try:
compiled = re.compile(pattern, flags)

This is a ReDoS risk: a malicious or careless pattern like (a+)+$, (.*a){20}, or (\w+\s?)*$ may cause catastrophic backtracking when matched against certain lines.

Formatted search results as a string.
"""
# Resolve search path
search_path = Path(path) if path else Path(os.getcwd())

This allows traversal to any absolute path on the system.


Along with the comments below, there is a risk of sensitive content leakage: any .env, .netrc, .npmrc, etc. can appear in a result.
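The later commit adds exactly this kind of exclusion. A sketch of a filename-based filter (this blocklist is illustrative; the PR's actual list may differ):

```python
from pathlib import Path

# Assumed blocklist of credential-bearing files.
SENSITIVE_NAMES = {".env", ".netrc", ".npmrc", ".pypirc", "id_rsa", "id_ed25519"}
SENSITIVE_SUFFIXES = {".pem", ".key"}

def is_sensitive(path: Path) -> bool:
    # Match exact well-known filenames, .env variants, and key/cert extensions.
    name = path.name
    return (
        name in SENSITIVE_NAMES
        or name.startswith(".env.")          # .env.local, .env.production, ...
        or path.suffix in SENSITIVE_SUFFIXES
    )
```

Name-based filtering is best-effort only; secrets in arbitrarily named files (e.g. `config.yaml`) still require the path-restriction guard above.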


pattern = glob_pattern or "*"
files: list[Path] = []
for p in search_path.rglob(pattern):

There is also a risk of symlink traversal here.


try:
with open(file_path, encoding="utf-8", errors="replace") as f:
lines = f.readlines()

We need a size check here as a fast follow; a multi-GB file will be read fully into memory.

default=False,
description="Whether to perform case-insensitive matching",
)
context_lines: int = Field(

We should set an upper bound here.
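Since the field is a Pydantic `Field`, the declarative fix would be a `ge=0, le=MAX_CONTEXT_LINES` constraint; the equivalent logic as a plain clamp (the cap value of 10 is an assumption, the PR only names the MAX_CONTEXT_LINES constant):

```python
MAX_CONTEXT_LINES = 10  # assumed cap; the PR's actual value may differ

def clamp_context_lines(requested: int) -> int:
    # Keep the value in [0, MAX_CONTEXT_LINES]: an unbounded context_lines
    # multiplies output size per match and can blow past truncation limits.
    return max(0, min(requested, MAX_CONTEXT_LINES))
```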

lorenzejay and others added 3 commits February 11, 2026 20:44
…d brace expansion support

- Added MAX_REGEX_LENGTH to limit regex pattern length and prevent ReDoS.
- Introduced allow_unrestricted_paths option to enable searching outside the current working directory.
- Implemented brace expansion for glob patterns to support multiple file types.
- Enhanced error handling for path traversal and regex compilation.
- Updated tests to cover new features and ensure robustness.
lorenzejay and others added 10 commits February 12, 2026 09:24
- Added MAX_CONTEXT_LINES to define the upper limit for context lines shown in search results.
- Introduced MAX_FILE_SIZE_BYTES to skip files larger than 10 MB during searches.
- Implemented logic to exclude sensitive files (e.g., .env, .netrc) from search results to prevent accidental leakage of credentials.
- Updated tests to validate sensitive file exclusion and file size limits, ensuring robustness in handling sensitive content.
@cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 3 potential issues.


return None
finally:
signal.alarm(0)
signal.signal(signal.SIGALRM, old_handler)

SIGALRM crashes when tool runs in worker threads

High Severity

_safe_search calls signal.signal(signal.SIGALRM, ...) which raises ValueError: signal only works in main thread when invoked from a non-main thread. CrewAI's crew_agent_executor.py runs tool calls inside a ThreadPoolExecutor, so this will crash in normal usage. The except TimeoutError block doesn't catch ValueError, so the exception propagates up unhandled and kills the tool invocation entirely.
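The standard workaround is to only arm SIGALRM when it is actually usable, i.e. in the main thread of a Unix process, and degrade gracefully otherwise. A sketch under that assumption (mirroring the `_safe_search` shape shown in the snippet above, but not the PR's actual code):

```python
import re
import signal
import threading

def safe_search(compiled: re.Pattern, line: str, timeout_s: int = 1):
    # signal.signal raises ValueError outside the main thread, and SIGALRM
    # does not exist on Windows; in both cases fall back to an unguarded
    # search instead of crashing the tool invocation.
    if (threading.current_thread() is not threading.main_thread()
            or not hasattr(signal, "SIGALRM")):
        return compiled.search(line)

    def _on_alarm(signum, frame):
        raise TimeoutError

    old_handler = signal.signal(signal.SIGALRM, _on_alarm)
    signal.alarm(timeout_s)
    try:
        return compiled.search(line)
    except TimeoutError:
        return None
    finally:
        signal.alarm(0)
        signal.signal(signal.SIGALRM, old_handler)
```

Note the fallback path has no timeout at all, which is exactly the gap the third Bugbot issue below points out; a linear-time regex engine would close it properly.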


continue
files.append(p)
if len(files) >= MAX_FILES:
break

Symlink traversal bypasses path restriction boundary

Medium Severity

_collect_files uses rglob which follows symlinks, and never verifies that discovered files' real paths remain within the search boundary. A symlink inside the search directory pointing to an arbitrary location (e.g., /etc/shadow) would be followed, read, and its contents returned. Neither is_symlink() nor resolve() is called on individual file paths before reading them.
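A containment check on each candidate's resolved path closes this hole. A hypothetical sketch (the helper name is illustrative):

```python
from pathlib import Path

def within_boundary(candidate: Path, root: Path) -> bool:
    # Resolve symlinks on the individual file before reading it: a link
    # inside the tree can point anywhere (e.g. /etc/shadow), so checking
    # the unresolved path is not enough.
    try:
        return candidate.resolve().is_relative_to(root.resolve())
    except OSError:
        return False
```

Calling this on every path yielded by `rglob`, before the file is opened, rejects symlinks whose targets escape the search root while still allowing links that stay inside it.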


if self._safe_search(compiled_pattern, line):
match_line_nums.append(i)
if len(match_line_nums) >= MAX_MATCHES_PER_FILE:
break

ReDoS mitigation absent — regex runs on untruncated lines

Medium Severity

The _safe_search docstring claims Windows is "bounded by MAX_LINE_LENGTH truncation applied earlier in the pipeline," but _search_file calls _safe_search on full, untruncated lines. Truncation only occurs later during output formatting. On Windows (no SIGALRM), and in non-main threads on Unix (where SIGALRM crashes), there is no effective ReDoS protection — a malicious pattern against a long line can hang indefinitely.
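The fix is to make the docstring's claim true by truncating before matching rather than only at output-formatting time. A sketch (the constant value and helper name are assumptions):

```python
import re

MAX_LINE_LENGTH = 2000  # assumed value of the PR's truncation constant

def search_line(compiled: re.Pattern, line: str):
    # Truncate before matching, not just before output formatting, so the
    # input-length bound actually limits regex work on platforms without
    # SIGALRM (Windows) and in worker threads.
    return compiled.search(line[:MAX_LINE_LENGTH])
```

This changes semantics slightly: a match that begins beyond MAX_LINE_LENGTH is no longer reported, which is usually an acceptable trade for bounded worst-case matching cost.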

Additional Locations (1)

@github-actions (Contributor)

This PR is stale because it has been open for 45 days with no activity.

