
fix: add path and URL validation to RAG tools#5310

Merged: joaomdmoura merged 13 commits into main from fix/rag-tools-path-url-validation on Apr 7, 2026

alex-clawd (Contributor) commented Apr 7, 2026

Problem

RAG search tools (JSONSearchTool, CSVSearchTool, PDFSearchTool, etc.) accept file paths and URLs supplied by the LLM at runtime without any validation. This allows:

  • Arbitrary file reads: the LLM passes /etc/passwd or ../../secrets.env → the tool reads it
  • SSRF: the LLM passes http://169.254.169.254/ → the tool fetches cloud metadata
  • file:// URLs: file:///etc/shadow worked because data_types.py explicitly supported the file:// scheme

Fix

New utility module: crewai_tools.utilities.safe_path

validate_file_path(path, base_dir)

  • Resolves symlinks and ../ traversal via os.path.realpath()
  • Ensures resolved path is within base_dir (defaults to cwd)
  • Rejects escape attempts
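A minimal Python sketch of the containment check described above. It assumes the function returns the resolved path and raises ValueError when the path escapes; both choices are assumptions, and the code merged in crewai_tools may differ. It also folds in the filesystem-root fix from the commit history, so base_dir="/" works.

```python
from __future__ import annotations

import os


def validate_file_path(path: str, base_dir: str | None = None) -> str:
    """Illustrative sketch, not the crewai_tools implementation.

    Returns the fully resolved path if it stays inside base_dir,
    otherwise raises ValueError.
    """
    # Default the allowed root to the current working directory.
    base = os.path.realpath(base_dir if base_dir is not None else os.getcwd())
    # Relative inputs are joined onto base; an absolute input replaces it.
    # realpath() then collapses "../" segments and follows symlinks.
    resolved = os.path.realpath(os.path.join(base, path))
    # A root base such as "/" already ends with a separator, so only append
    # os.sep when it is missing (otherwise the prefix would be "//").
    prefix = base if base.endswith(os.sep) else base + os.sep
    if resolved != base and not resolved.startswith(prefix):
        raise ValueError(f"{path!r} resolves outside {base!r}")
    return resolved
```

Appending the separator before the prefix test matters: without it, a base of /srv/app would wrongly accept /srv/app-secrets.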

validate_url(url)

  • Blocks file:// scheme entirely
  • For http/https: resolves DNS, blocks private/reserved IPs:
    • 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16, 127.0.0.0/8
    • 169.254.0.0/16 (cloud metadata), 0.0.0.0, ::1, fc00::/7

Applied in:

  • RagTool.add(): validates all paths/URLs before passing them to the adapter (covers all RAG search tools)
  • DataTypes.from_content(): removed file:// scheme support
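RagTool.add() takes both positional and keyword inputs, and the commit history notes that the keyword arguments (path=, file_path=, directory_path=, url=, website=, github_url=, youtube_url=) initially bypassed validation. The sketch below shows one way to route every input through a validator before it reaches the adapter. Only the keyword names and the "/"-or-os.sep path heuristic come from this PR; the function name validate_add_inputs, the URL-prefix test, and the os.path.exists() fallback are hypothetical.

```python
from __future__ import annotations

import os
from typing import Any, Callable

# Keyword names listed in the commit message for RagTool.add().
PATH_KWARGS = ("path", "file_path", "directory_path")
URL_KWARGS = ("url", "website", "github_url", "youtube_url")


def validate_add_inputs(
    args: tuple[Any, ...],
    kwargs: dict[str, Any],
    check_path: Callable[[str], Any],
    check_url: Callable[[str], Any],
) -> None:
    """Hypothetical helper: validate every path-like or URL-like input.

    The validators are expected to raise (e.g. ValueError) on unsafe input.
    """
    for arg in args:
        if not isinstance(arg, str):
            continue  # non-string content passes through unchecked here
        if arg.startswith(("http://", "https://", "file://")):
            # file:// goes to the URL validator so it can be rejected there.
            check_url(arg)
        elif "/" in arg or os.sep in arg or os.path.exists(arg):
            # Check both separators so "data/x.json" is caught on Windows too;
            # exists() also catches bare names such as "..".
            check_path(arg)
    for key in PATH_KWARGS:
        if isinstance(kwargs.get(key), str):
            check_path(kwargs[key])
    for key in URL_KWARGS:
        if isinstance(kwargs.get(key), str):
            check_url(kwargs[key])
```

Passing the validators in as callables keeps the dispatch logic separate from the checks themselves, which makes the bypass regression (keyword arguments skipping validation) easy to test in isolation.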

Backward compatibility:

Set CREWAI_TOOLS_ALLOW_UNSAFE_PATHS=true to bypass all validation. Logs a warning when active.
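A minimal sketch of how such an escape hatch could be read. The variable name and the "true" value come from this PR; accepting it case-insensitively, rejecting other values such as "1", and the exact warning text are assumptions.

```python
from __future__ import annotations

import logging
import os

logger = logging.getLogger(__name__)

ENV_VAR = "CREWAI_TOOLS_ALLOW_UNSAFE_PATHS"


def unsafe_paths_allowed() -> bool:
    """Sketch: True only when the env var is "true" (any case); logs a warning if so."""
    allowed = os.environ.get(ENV_VAR, "").strip().lower() == "true"
    if allowed:
        logger.warning("%s=true: RAG path/URL validation is disabled", ENV_VAR)
    return allowed
```

For a one-off run, the variable can be set inline, e.g. `CREWAI_TOOLS_ALLOW_UNSAFE_PATHS=true python your_crew.py` (script name hypothetical).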

Tests

27 tests cover path traversal, symlinks, absolute paths, file:// URLs, private IPs, cloud metadata, IPv6 localhost, unresolvable hosts, valid paths/URLs, and the escape-hatch env var.

Breaking Changes

  • file:// URLs no longer work in RAG tools (use file paths instead)
  • File paths outside the working directory are blocked by default
  • URLs to private/internal networks are blocked by default

All can be re-enabled with CREWAI_TOOLS_ALLOW_UNSAFE_PATHS=true.
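Since file:// URLs are now rejected, configurations that used them have to pass a plain path instead, ideally one under the working directory. A hypothetical migration helper (not part of crewai_tools) that does the conversion with the standard library:

```python
from __future__ import annotations

import os
from urllib.parse import urlparse
from urllib.request import url2pathname


def file_url_to_relative_path(url: str, base_dir: str | None = None) -> str:
    """Hypothetical helper: turn file:///abs/path into a path relative to base_dir."""
    parsed = urlparse(url)
    if parsed.scheme != "file":
        raise ValueError(f"not a file:// URL: {url!r}")
    # url2pathname decodes percent-escapes (and drive letters on Windows);
    # any host component (file://server/...) is ignored in this sketch.
    local = url2pathname(parsed.path)
    # relpath is purely lexical; symlinks are not resolved here.
    return os.path.relpath(local, base_dir or os.getcwd())
```

The result still has to pass the new validation: a file outside the working directory comes out as a ../ path and will be blocked unless the escape hatch is set.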


Note

Medium Risk
Introduces stricter input validation for RAG tools (blocking file://, private-network URLs, and paths outside the working directory by default), which can break existing workflows that relied on those inputs. Core runtime behavior changes are guarded by an escape-hatch env var and covered by new tests.

Overview
Hardens RAG search tools against unsafe runtime inputs. RagTool.add() now validates positional and keyword inputs as either safe file paths (must resolve under the current working directory) or safe http/https URLs (blocks file:// and private/reserved IP ranges to mitigate SSRF), with an opt-out via CREWAI_TOOLS_ALLOW_UNSAFE_PATHS=true.

Adds a new utilities/safe_path.py module plus focused test coverage, updates DirectorySearchTool to validate directories before indexing, and tightens DataTypes.from_content() to stop treating file:// as a URL type. Docs across locales are updated to describe the new security behavior and to mark CodeInterpreterTool / allow_code_execution / code_execution_mode as deprecated/removed.
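The overview notes that DirectorySearchTool now validates directories before indexing, and the commit history names a validate_directory_path() utility for this. Its signature is not shown in the PR, so this sketch assumes it mirrors validate_file_path(path, base_dir) and adds an is-a-directory check. It uses os.path.commonpath for containment, an alternative to a string-prefix test that also handles a root base_dir.

```python
from __future__ import annotations

import os


def validate_directory_path(path: str, base_dir: str | None = None) -> str:
    """Illustrative sketch: require an existing directory under base_dir."""
    base = os.path.realpath(base_dir if base_dir is not None else os.getcwd())
    resolved = os.path.realpath(os.path.join(base, path))
    # commonpath compares whole path components, so /srv/app-secrets is not
    # treated as being inside /srv/app.
    if os.path.commonpath([base, resolved]) != base:
        raise ValueError(f"{path!r} resolves outside {base!r}")
    if not os.path.isdir(resolved):
        raise ValueError(f"{path!r} is not a directory")
    return resolved
```

Only the top-level directory is checked here; a walker that follows symlinks inside it could still reach files elsewhere, so per-file checks with validate_file_path() may still be needed.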

Reviewed by Cursor Bugbot for commit a3d3ef0.

github-actions (bot) added the size/L label Apr 7, 2026
Comment thread lib/crewai-tools/tests/utilities/test_safe_path.py Fixed
Comment thread lib/crewai-tools/tests/utilities/test_safe_path.py Fixed
Comment thread lib/crewai-tools/src/crewai_tools/tools/rag/rag_tool.py Fixed
alex-clawd force-pushed the fix/rag-tools-path-url-validation branch from ace35fb to fdc6439 April 7, 2026 05:59
Comment thread lib/crewai-tools/src/crewai_tools/tools/rag/rag_tool.py Fixed
Comment thread lib/crewai-tools/src/crewai_tools/tools/rag/rag_tool.py Outdated
Comment thread lib/crewai-tools/src/crewai_tools/utilities/safe_path.py
alex-clawd force-pushed the fix/rag-tools-path-url-validation branch from 4a07018 to 1f37345 April 7, 2026 06:12
alex-clawd force-pushed the fix/rag-tools-path-url-validation branch 2 times, most recently from 4214a8d to 7468dca April 7, 2026 06:34
Comment thread lib/crewai-tools/src/crewai_tools/utilities/safe_path.py
alex-clawd force-pushed the fix/rag-tools-path-url-validation branch 2 times, most recently from b0da5a7 to f41d86f April 7, 2026 06:51
Comment thread lib/crewai-tools/src/crewai_tools/tools/rag/rag_tool.py
alex-clawd force-pushed the fix/rag-tools-path-url-validation branch from fbe56bc to 4c8f289 April 7, 2026 07:07
Comment thread lib/crewai-tools/src/crewai_tools/utilities/safe_path.py Outdated
alex-clawd force-pushed the fix/rag-tools-path-url-validation branch from 02c8b42 to 774a3b8 April 7, 2026 07:49
Comment thread lib/crewai-tools/src/crewai_tools/utilities/safe_path.py Fixed

cursor (bot) left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Reviewed by Cursor Bugbot for commit 892ee94.

Comment thread lib/crewai-tools/src/crewai_tools/utilities/safe_path.py Outdated
Comment thread lib/crewai-tools/src/crewai_tools/utilities/safe_path.py Fixed
alex-clawd force-pushed the fix/rag-tools-path-url-validation branch from 9d040fd to 4ca16d5 April 7, 2026 15:38
alex-clawd and others added 9 commits April 7, 2026 09:16
alex-clawd and others added 3 commits April 7, 2026 09:16
alex-clawd force-pushed the fix/rag-tools-path-url-validation branch from 4ca16d5 to 4e23226 April 7, 2026 16:16
joaomdmoura merged commit 9325e2f into main Apr 7, 2026
52 checks passed
joaomdmoura deleted the fix/rag-tools-path-url-validation branch April 7, 2026 16:29
volkanozyildirim pushed a commit to volkanozyildirim/crew-ai that referenced this pull request Apr 15, 2026
* fix: add path and URL validation to RAG tools

Add validation utilities to prevent unauthorized file reads and SSRF
when RAG tools accept LLM-controlled paths/URLs at runtime.

Changes:
- New crewai_tools.utilities.safe_path module with validate_file_path(),
  validate_directory_path(), and validate_url()
- File paths validated against base directory (defaults to cwd).
  Resolves symlinks and ../ traversal. Rejects escape attempts.
- URLs validated: file:// blocked entirely. HTTP/HTTPS resolves DNS
  and blocks private/reserved IPs (10.x, 172.16-31.x, 192.168.x,
  127.x, 169.254.x, 0.0.0.0, ::1, fc00::/7).
- Validation applied in RagTool.add() — catches all RAG search tools
  (JSON, CSV, PDF, TXT, DOCX, MDX, Directory, etc.)
- Removed file:// scheme support from DataTypes.from_content()
- CREWAI_TOOLS_ALLOW_UNSAFE_PATHS=true env var for backward compat
- 27 tests covering traversal, symlinks, private IPs, cloud metadata,
  IPv6, escape hatch, and valid paths/URLs

* fix: validate path/URL keyword args in RagTool.add()

The original patch validated positional *args but left all keyword
arguments (path=, file_path=, directory_path=, url=, website=,
github_url=, youtube_url=) unvalidated, providing a trivial bypass
for both path-traversal and SSRF checks.

Applies validate_file_path() to path/file_path/directory_path kwargs
and validate_url() to url/website/github_url/youtube_url kwargs before
they reach the adapter. Adds a regression-test file covering all eight
kwarg vectors plus the two existing positional-arg checks.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: address CodeQL and review comments on RAG path/URL validation

- Replace insecure tempfile.mktemp() with inline symlink target in test
- Remove unused 'target' variable and unused tempfile import
- Narrow broad except Exception: pass to only catch urlparse errors;
  validate_url ValueError now propagates instead of being silently swallowed
- Fix ruff B904 (raise-without-from-inside-except) in safe_path.py
- Fix ruff B007 (unused loop variable 'family') in safe_path.py
- Use validate_directory_path in DirectorySearchTool.add() so the
  public utility is exercised in production code

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* style: fix ruff format + remaining lint issues

* fix: resolve mypy type errors in RAG path/URL validation

- Cast sockaddr[0] to str() to satisfy mypy (socket.getaddrinfo returns
  sockaddr where [0] is str but typed as str | int)
- Remove now-unnecessary `type: ignore[assignment]` and
  `type: ignore[literal-required]` comments in rag_tool.py

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: unroll dynamic TypedDict key loops to satisfy mypy literal-required

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* test: allow tmp paths in RAG data-type tests via CREWAI_TOOLS_ALLOW_UNSAFE_PATHS

TemporaryDirectory creates files under /tmp/ which is outside CWD and is
correctly blocked by the new path validation.  These tests exercise
data-type handling, not security, so add an autouse fixture that sets
CREWAI_TOOLS_ALLOW_UNSAFE_PATHS=true for the whole file.  Path/URL
security is covered by test_rag_tool_path_validation.py.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* test: allow tmp paths in search-tool and rag_tool tests via CREWAI_TOOLS_ALLOW_UNSAFE_PATHS

test_search_tools.py has tests for TXTSearchTool, CSVSearchTool,
MDXSearchTool, JSONSearchTool, and DirectorySearchTool that create
files under /tmp/ via tempfile, which is outside CWD and correctly
blocked by the new path validation.  rag_tool_test.py has one test
that calls tool.add() with a TemporaryDirectory path.

Add the same autouse allow_tmp_paths fixture used in
test_rag_tool_add_data_type.py.  Security is covered separately by
test_rag_tool_path_validation.py.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* chore: update tool specifications

* docs: document CodeInterpreterTool removal and RAG path/URL validation

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: address three review comments on path/URL validation

- safe_path._is_private_or_reserved: after unwrapping IPv4-mapped IPv6
  to IPv4, only check against IPv4 networks to avoid TypeError when
  comparing an IPv4Address against IPv6Network objects.
- safe_path.validate_file_path: handle filesystem-root base_dir ('/')
  by not appending os.sep when the base already ends with a separator,
  preventing the '//'-prefix bug.
- rag_tool.add: path-detection heuristic now checks for both '/' and
  os.sep so forward-slash paths are caught on Windows as well as Unix.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: remove unused _BLOCKED_NETWORKS variable after IPv4/IPv6 split

* chore: update tool specifications

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>