fix: add path and URL validation to RAG tools#5310
Merged
Conversation
ace35fb to
fdc6439
Compare
4a07018 to
1f37345
Compare
4214a8d to
7468dca
Compare
b0da5a7 to
f41d86f
Compare
fbe56bc to
4c8f289
Compare
02c8b42 to
774a3b8
Compare
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit 892ee94. Configure here.
9d040fd to
4ca16d5
Compare
Add validation utilities to prevent unauthorized file reads and SSRF when RAG tools accept LLM-controlled paths/URLs at runtime. Changes: - New crewai_tools.utilities.safe_path module with validate_file_path(), validate_directory_path(), and validate_url() - File paths validated against base directory (defaults to cwd). Resolves symlinks and ../ traversal. Rejects escape attempts. - URLs validated: file:// blocked entirely. HTTP/HTTPS resolves DNS and blocks private/reserved IPs (10.x, 172.16-31.x, 192.168.x, 127.x, 169.254.x, 0.0.0.0, ::1, fc00::/7). - Validation applied in RagTool.add() — catches all RAG search tools (JSON, CSV, PDF, TXT, DOCX, MDX, Directory, etc.) - Removed file:// scheme support from DataTypes.from_content() - CREWAI_TOOLS_ALLOW_UNSAFE_PATHS=true env var for backward compat - 27 tests covering traversal, symlinks, private IPs, cloud metadata, IPv6, escape hatch, and valid paths/URLs
The original patch validated positional *args but left all keyword arguments (path=, file_path=, directory_path=, url=, website=, github_url=, youtube_url=) unvalidated, providing a trivial bypass for both path-traversal and SSRF checks. Applies validate_file_path() to path/file_path/directory_path kwargs and validate_url() to url/website/github_url/youtube_url kwargs before they reach the adapter. Adds a regression-test file covering all eight kwarg vectors plus the two existing positional-arg checks. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Replace insecure tempfile.mktemp() with inline symlink target in test - Remove unused 'target' variable and unused tempfile import - Narrow broad except Exception: pass to only catch urlparse errors; validate_url ValueError now propagates instead of being silently swallowed - Fix ruff B904 (raise-without-from-inside-except) in safe_path.py - Fix ruff B007 (unused loop variable 'family') in safe_path.py - Use validate_directory_path in DirectorySearchTool.add() so the public utility is exercised in production code Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Cast sockaddr[0] to str() to satisfy mypy (socket.getaddrinfo returns sockaddr where [0] is str but typed as str | int) - Remove now-unnecessary `type: ignore[assignment]` and `type: ignore[literal-required]` comments in rag_tool.py Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…NSAFE_PATHS TemporaryDirectory creates files under /tmp/ which is outside CWD and is correctly blocked by the new path validation. These tests exercise data-type handling, not security, so add an autouse fixture that sets CREWAI_TOOLS_ALLOW_UNSAFE_PATHS=true for the whole file. Path/URL security is covered by test_rag_tool_path_validation.py. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…OLS_ALLOW_UNSAFE_PATHS test_search_tools.py has tests for TXTSearchTool, CSVSearchTool, MDXSearchTool, JSONSearchTool, and DirectorySearchTool that create files under /tmp/ via tempfile, which is outside CWD and correctly blocked by the new path validation. rag_tool_test.py has one test that calls tool.add() with a TemporaryDirectory path. Add the same autouse allow_tmp_paths fixture used in test_rag_tool_add_data_type.py. Security is covered separately by test_rag_tool_path_validation.py. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- safe_path._is_private_or_reserved: after unwrapping IPv4-mapped IPv6
to IPv4, only check against IPv4 networks to avoid TypeError when
comparing an IPv4Address against IPv6Network objects.
- safe_path.validate_file_path: handle filesystem-root base_dir ('/')
by not appending os.sep when the base already ends with a separator,
preventing the '//'-prefix bug.
- rag_tool.add: path-detection heuristic now checks for both '/' and
os.sep so forward-slash paths are caught on Windows as well as Unix.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
4ca16d5 to
4e23226
Compare
joaomdmoura
approved these changes
Apr 7, 2026
5 tasks
volkanozyildirim
pushed a commit
to volkanozyildirim/crew-ai
that referenced
this pull request
Apr 15, 2026
* fix: add path and URL validation to RAG tools
Add validation utilities to prevent unauthorized file reads and SSRF
when RAG tools accept LLM-controlled paths/URLs at runtime.
Changes:
- New crewai_tools.utilities.safe_path module with validate_file_path(),
validate_directory_path(), and validate_url()
- File paths validated against base directory (defaults to cwd).
Resolves symlinks and ../ traversal. Rejects escape attempts.
- URLs validated: file:// blocked entirely. HTTP/HTTPS resolves DNS
and blocks private/reserved IPs (10.x, 172.16-31.x, 192.168.x,
127.x, 169.254.x, 0.0.0.0, ::1, fc00::/7).
- Validation applied in RagTool.add() — catches all RAG search tools
(JSON, CSV, PDF, TXT, DOCX, MDX, Directory, etc.)
- Removed file:// scheme support from DataTypes.from_content()
- CREWAI_TOOLS_ALLOW_UNSAFE_PATHS=true env var for backward compat
- 27 tests covering traversal, symlinks, private IPs, cloud metadata,
IPv6, escape hatch, and valid paths/URLs
* fix: validate path/URL keyword args in RagTool.add()
The original patch validated positional *args but left all keyword
arguments (path=, file_path=, directory_path=, url=, website=,
github_url=, youtube_url=) unvalidated, providing a trivial bypass
for both path-traversal and SSRF checks.
Applies validate_file_path() to path/file_path/directory_path kwargs
and validate_url() to url/website/github_url/youtube_url kwargs before
they reach the adapter. Adds a regression-test file covering all eight
kwarg vectors plus the two existing positional-arg checks.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix: address CodeQL and review comments on RAG path/URL validation
- Replace insecure tempfile.mktemp() with inline symlink target in test
- Remove unused 'target' variable and unused tempfile import
- Narrow broad except Exception: pass to only catch urlparse errors;
validate_url ValueError now propagates instead of being silently swallowed
- Fix ruff B904 (raise-without-from-inside-except) in safe_path.py
- Fix ruff B007 (unused loop variable 'family') in safe_path.py
- Use validate_directory_path in DirectorySearchTool.add() so the
public utility is exercised in production code
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* style: fix ruff format + remaining lint issues
* fix: resolve mypy type errors in RAG path/URL validation
- Cast sockaddr[0] to str() to satisfy mypy (socket.getaddrinfo returns
sockaddr where [0] is str but typed as str | int)
- Remove now-unnecessary `type: ignore[assignment]` and
`type: ignore[literal-required]` comments in rag_tool.py
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix: unroll dynamic TypedDict key loops to satisfy mypy literal-required
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* test: allow tmp paths in RAG data-type tests via CREWAI_TOOLS_ALLOW_UNSAFE_PATHS
TemporaryDirectory creates files under /tmp/ which is outside CWD and is
correctly blocked by the new path validation. These tests exercise
data-type handling, not security, so add an autouse fixture that sets
CREWAI_TOOLS_ALLOW_UNSAFE_PATHS=true for the whole file. Path/URL
security is covered by test_rag_tool_path_validation.py.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* test: allow tmp paths in search-tool and rag_tool tests via CREWAI_TOOLS_ALLOW_UNSAFE_PATHS
test_search_tools.py has tests for TXTSearchTool, CSVSearchTool,
MDXSearchTool, JSONSearchTool, and DirectorySearchTool that create
files under /tmp/ via tempfile, which is outside CWD and correctly
blocked by the new path validation. rag_tool_test.py has one test
that calls tool.add() with a TemporaryDirectory path.
Add the same autouse allow_tmp_paths fixture used in
test_rag_tool_add_data_type.py. Security is covered separately by
test_rag_tool_path_validation.py.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* chore: update tool specifications
* docs: document CodeInterpreterTool removal and RAG path/URL validation
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix: address three review comments on path/URL validation
- safe_path._is_private_or_reserved: after unwrapping IPv4-mapped IPv6
to IPv4, only check against IPv4 networks to avoid TypeError when
comparing an IPv4Address against IPv6Network objects.
- safe_path.validate_file_path: handle filesystem-root base_dir ('/')
by not appending os.sep when the base already ends with a separator,
preventing the '//'-prefix bug.
- rag_tool.add: path-detection heuristic now checks for both '/' and
os.sep so forward-slash paths are caught on Windows as well as Unix.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix: remove unused _BLOCKED_NETWORKS variable after IPv4/IPv6 split
* chore: update tool specifications
---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.

Problem
RAG search tools (JSONSearchTool, CSVSearchTool, PDFSearchTool, etc.) accept file paths and URLs at runtime from the LLM with no validation. This allows:
/etc/passwdor../../secrets.env→ tool reads ithttp://169.254.169.254/→ tool fetches cloud metadatafile:///etc/shadowwas explicitly supported in data_types.pyFix
New utility module:
crewai_tools.utilities.safe_pathvalidate_file_path(path, base_dir)../traversal viaos.path.realpath()base_dir(defaults to cwd)validate_url(url)file://scheme entirelyApplied in:
RagTool.add()— validates all paths/URLs before passing to adapter (catches all RAG tools)DataTypes.from_content()— removedfile://scheme supportBackward compatibility:
Set
CREWAI_TOOLS_ALLOW_UNSAFE_PATHS=trueto bypass all validation. Logs a warning when active.Tests
27 tests covering: path traversal, symlinks, absolute paths, file:// URLs, private IPs, cloud metadata, IPv6 localhost, unresolvable hosts, valid paths/URLs, and the escape hatch env var.
Breaking Changes
file://URLs no longer work in RAG tools (use file paths instead)All can be re-enabled with
CREWAI_TOOLS_ALLOW_UNSAFE_PATHS=true.Note
Medium Risk
Introduces stricter input validation for RAG tools (blocking
file://, private-network URLs, and paths outside the working directory by default), which can break existing workflows that relied on those inputs. Core runtime behavior changes are guarded by an escape-hatch env var and covered by new tests.Overview
Hardens RAG search tools against unsafe runtime inputs.
RagTool.add()now validates positional and keyword inputs as either safe file paths (must resolve under the current working directory) or safehttp/httpsURLs (blocksfile://and private/reserved IP ranges to mitigate SSRF), with an opt-out viaCREWAI_TOOLS_ALLOW_UNSAFE_PATHS=true.Adds a new
utilities/safe_path.pymodule plus focused test coverage, updatesDirectorySearchToolto validate directories before indexing, and tightensDataTypes.from_content()to stop treatingfile://as a URL type. Docs across locales are updated to describe the new security behavior and to markCodeInterpreterTool/allow_code_execution/code_execution_modeas deprecated/removed.Reviewed by Cursor Bugbot for commit a3d3ef0. Bugbot is set up for automated code reviews on this repo. Configure here.