Don't reject all .. symlinks#1127

Merged
ben-edna merged 2 commits into main from spoorcc/issue1122
Apr 13, 2026
Conversation

Contributor

@spoorcc spoorcc commented Apr 11, 2026

fixes #1122

Summary by CodeRabbit

  • Bug Fixes

    • Prevented archive extraction from escaping the project root: relative ".." symlinks that resolve inside the project are preserved; symlinks that would point outside are blocked.
  • Tests

    • Added and updated scenarios and assertions to verify symlink handling and extraction containment.
  • Documentation

    • Added a changelog entry documenting the symlink extraction fix.
  • Reliability

    • Improved command-line error output handling to avoid decode failures when reporting subprocess errors.

Contributor

coderabbitai Bot commented Apr 11, 2026

Walkthrough

Moved symlink safety validation from archive-member inspection to a post-extraction filesystem check: relative .. symlink targets are allowed when they resolve inside the extracted project, while symlinks that resolve outside the manifest root raise a RuntimeError. Changelog entry added for the fix.
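
A minimal sketch of the post-extraction check described in the walkthrough, using only the standard library. The helper name `assert_symlinks_contained` and its `boundary` parameter are hypothetical and chosen for illustration; this is not dfetch's implementation, which delegates to `check_no_path_traversal`.

```python
import os


def assert_symlinks_contained(dest_dir: str, boundary: str) -> None:
    """Raise RuntimeError if a symlink under dest_dir resolves outside boundary.

    Hypothetical helper for illustration only.
    """
    real_boundary = os.path.realpath(boundary)
    # followlinks=False: symlinked directories are listed but never descended into.
    for root, dirs, files in os.walk(dest_dir, followlinks=False):
        for name in dirs + files:
            path = os.path.join(root, name)
            if not os.path.islink(path):
                continue
            # realpath follows the link, so ".." components are resolved first.
            resolved = os.path.realpath(path)
            if os.path.commonpath([real_boundary, resolved]) != real_boundary:
                raise RuntimeError(
                    f"symlink {path!r} resolves outside {boundary!r}"
                )
```

Because the comparison happens on resolved paths, a link such as `sub/dir/link.mk -> ../../other/target.mk` passes as long as `other/target.mk` is still under the boundary. Note that `os.path.commonpath` raises `ValueError` for paths on different Windows drives, which a production check would need to handle.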

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Core symlink validation logic: dfetch/vcs/archive.py | Relaxed _is_unsafe_symlink_target() to focus on absolute/anchored targets; added _check_symlinks_in_dest(dest_dir) to walk the extracted tree (without following symlinks) and raise when a symlink resolves outside the manifest root; invoke this check immediately after extraction and before ignore-based pruning. |
| Feature tests & helpers: features/steps/archive_steps.py, features/fetch-archive.feature | Added add_symlink_to_tar_gz() to inject symlink members into .tar.gz archives; new Behave step and scenario that create an archive with an internal .. symlink and assert extraction preserves the symlink and does not escape the project root. |
| Generic test steps: features/steps/generic_steps.py | Added a @then step to assert a path is a symlink and its link target equals the expected value. |
| Unit tests: tests/test_archive.py | Adjusted expectations so member-level .. targets in ZIP/TAR members no longer raise during archive parsing; added tests for _check_symlinks_in_dest() covering internal relative symlink (no raise), escaping symlink (raises RuntimeError), and sibling-path symlink (no raise). |
| Changelog: CHANGELOG.rst | Added unreleased Release 0.14.0 entry noting fix for the "unsafe symlink target" error with relative .. symlinks (issue #1122). |
| Command-line utilities: dfetch/util/cmdline.py | Improved robustness of run_on_cmdline error handling by decoding stdout/stderr with errors="replace" to avoid UnicodeDecodeError when constructing error messages. |
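
The `errors="replace"` change in the last row can be illustrated with a short sketch. The function name `describe_failure` is hypothetical and the message format is an assumption; only the decoding technique is taken from the summary above.

```python
import subprocess


def describe_failure(exc: subprocess.CalledProcessError) -> str:
    """Build an error message from a failed command without raising UnicodeDecodeError.

    Hypothetical helper; dfetch's run_on_cmdline may format its message differently.
    """
    # stdout/stderr are bytes when the process ran without text=True, and may be None.
    stdout = (exc.stdout or b"").decode("utf-8", errors="replace")
    stderr = (exc.stderr or b"").decode("utf-8", errors="replace")
    return (
        f"command failed with exit code {exc.returncode}\n"
        f"stdout: {stdout}\n"
        f"stderr: {stderr}"
    )
```

With `errors="replace"`, any byte sequence that is not valid UTF-8 becomes the replacement character U+FFFD instead of aborting the error report.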

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'Don't reject all .. symlinks' directly summarizes the main change: allowing symlinks with .. components when their resolution stays within the project boundary. |
| Linked Issues check | ✅ Passed | The PR successfully addresses issue #1122 by changing symlink validation to distinguish unsafe external targets from safe internal relative symlinks, allowing symlinks whose resolved target remains inside the archive/project. |
| Out of Scope Changes check | ✅ Passed | All changes are scoped to the symlink validation fix [#1122]: archive extraction logic, symlink boundary checking, test coverage, feature scenarios, and subprocess error handling remain in scope. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 81.82% which is sufficient. The required threshold is 80.00%. |

Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@dfetch/vcs/archive.py`:
- Around line 403-421: The check in _check_symlinks_in_dest uses os.getcwd() as
manifest_root which allows symlinks to point anywhere inside the overall
checkout; change it to validate against the extracted project root instead by
computing the canonical root from dest_dir (e.g. realpath/abspath of dest_dir)
and pass that as the manifest_root to check_no_path_traversal; update
_check_symlinks_in_dest (references: _check_symlinks_in_dest, dest_dir,
manifest_root, check_no_path_traversal) so every symlink found is validated
against the extracted project's real path rather than os.getcwd().
- Around line 389-400: The _is_unsafe_symlink_target function currently only
rejects absolute symlink targets but lets relative targets with '..' through,
which allows malicious TARs to escape during _extract_raw() via
TarFile.extractall(); update symlink validation to treat any target containing
'..' path components as unsafe for TAR extraction (i.e., have
_is_unsafe_symlink_target return True when pathlib.PurePosixPath(target).parts
includes '..' or the Windows PureWindowsPath has '..') so that _extract_raw()
rejects such members before calling TarFile.extractall(), and ensure
_check_symlinks_in_dest remains as a secondary check.

In `@features/fetch-archive.feature`:
- Around line 244-255: The scenario currently only checks that sub/dir/link.mk
exists but not that it is a symlink or that its target is preserved; update the
Gherkin assertions to explicitly assert that sub/dir/link.mk is a symlink and
that its target equals ../../other/target.mk (i.e., add a step that verifies the
file type is a symbolic link and that the link target is exactly
../../other/target.mk) so the test fails if the link was converted to a regular
file.
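
The assertion requested for the feature scenario (the path is a symbolic link and its stored target is unchanged) can be sketched in plain Python. The helper name `assert_symlink_target` is hypothetical, and the Behave step wiring is omitted.

```python
import os


def assert_symlink_target(path: str, expected_target: str) -> None:
    """Fail unless path is a symlink whose stored target equals expected_target."""
    # islink does not follow the link, so a regular file or copied content fails here.
    assert os.path.islink(path), f"{path} is not a symbolic link"
    # readlink returns the target as stored in the link, not the resolved path.
    actual = os.readlink(path)
    assert actual == expected_target, (
        f"{path} points to {actual!r}, expected {expected_target!r}"
    )
```

Comparing against `os.readlink` rather than `os.path.realpath` is what distinguishes a preserved relative link such as `../../other/target.mk` from one that was rewritten to an absolute path during extraction.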
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 7277c8a1-a596-45ec-8b7b-24a409d6bb2a

📥 Commits

Reviewing files that changed from the base of the PR and between 3ed97cc and cf12e4d.

📒 Files selected for processing (5)
  • CHANGELOG.rst
  • dfetch/vcs/archive.py
  • features/fetch-archive.feature
  • features/steps/archive_steps.py
  • tests/test_archive.py

Comment thread dfetch/vcs/archive.py
Comment thread dfetch/vcs/archive.py
Comment on lines +403 to +421
    def _check_symlinks_in_dest(dest_dir: str) -> None:
        """Raise *RuntimeError* if any symlink in *dest_dir* escapes the manifest root.

        Walks *dest_dir* without following symlinks and calls
        :func:`~dfetch.util.util.check_no_path_traversal` for every symlink
        found. Because :func:`os.path.realpath` follows the link to its
        target, this catches any symlink—including deeply nested ones—whose
        resolved destination lies outside the current working directory
        (the manifest root, as set by :func:`~dfetch.util.util.in_directory`).

        Raises:
            RuntimeError: When a symlink resolves to a path outside the manifest root.
        """
        manifest_root = os.getcwd()
        for root, dirs, files in os.walk(dest_dir, followlinks=False):
            for name in dirs + files:
                path = os.path.join(root, name)
                if os.path.islink(path):
                    check_no_path_traversal(path, manifest_root)
⚠️ Potential issue | 🟠 Major

Validate against the extracted project root, not os.getcwd().

Using the manifest root here widens the allowed boundary from “inside the fetched project” to “anywhere in the checkout”. A fetched archive can now ship links like ../dfetch.yaml or ../OtherProject/file and still pass this check, which is broader than the #1122 fix target.

Suggested fix
-        manifest_root = os.getcwd()
+        project_root = os.path.realpath(dest_dir)
         for root, dirs, files in os.walk(dest_dir, followlinks=False):
             for name in dirs + files:
                 path = os.path.join(root, name)
                 if os.path.islink(path):
-                    check_no_path_traversal(path, manifest_root)
+                    check_no_path_traversal(path, project_root)
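
To illustrate the difference between the two boundaries, the following sketch builds a throwaway checkout containing a project with a `../dfetch.yaml` link. The directory name `MyProject` and the helper `is_inside` are hypothetical; `is_inside` is only a stand-in for `check_no_path_traversal`, whose exact behavior is not shown in this thread.

```python
import os
import tempfile


def is_inside(path: str, root: str) -> bool:
    """Return True when path, after resolving symlinks, lies inside root."""
    real_root = os.path.realpath(root)
    return os.path.commonpath([real_root, os.path.realpath(path)]) == real_root


with tempfile.TemporaryDirectory() as checkout:
    # checkout/ plays the manifest root; checkout/MyProject is the fetched project.
    project = os.path.join(checkout, "MyProject")
    os.makedirs(project)
    with open(os.path.join(checkout, "dfetch.yaml"), "w"):
        pass
    link = os.path.join(project, "link")
    os.symlink("../dfetch.yaml", link)

    allowed_by_manifest_root = is_inside(link, checkout)
    allowed_by_project_root = is_inside(link, project)

print(allowed_by_manifest_root, allowed_by_project_root)  # prints: True False
```

The same link is accepted when the boundary is the whole checkout and rejected when the boundary is the extracted project directory, which is the distinction the review comment draws.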
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@dfetch/vcs/archive.py` around lines 403 - 421, The check in
_check_symlinks_in_dest uses os.getcwd() as manifest_root which allows symlinks
to point anywhere inside the overall checkout; change it to validate against the
extracted project root instead by computing the canonical root from dest_dir
(e.g. realpath/abspath of dest_dir) and pass that as the manifest_root to
check_no_path_traversal; update _check_symlinks_in_dest (references:
_check_symlinks_in_dest, dest_dir, manifest_root, check_no_path_traversal) so
every symlink found is validated against the extracted project's real path
rather than os.getcwd().

Comment thread features/fetch-archive.feature
Contributor Author

spoorcc commented Apr 12, 2026

@coderabbitai resume

Contributor

coderabbitai Bot commented Apr 12, 2026

✅ Actions performed

Reviews resumed.

Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (1)
dfetch/vcs/archive.py (1)

410-428: ⚠️ Potential issue | 🟠 Major

Validate resolved symlinks against the extracted project root.

Using os.getcwd() widens the allowed boundary to the whole checkout, so a fetched archive can still ship links like ../dfetch.yaml and pass this check. The boundary here needs to be the canonical dest_dir.

Suggested fix
-        manifest_root = os.getcwd()
+        project_root = os.path.realpath(dest_dir)
         for root, dirs, files in os.walk(dest_dir, followlinks=False):
             for name in dirs + files:
                 path = os.path.join(root, name)
                 if os.path.islink(path):
-                    check_no_path_traversal(path, manifest_root)
+                    check_no_path_traversal(path, project_root)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@dfetch/vcs/archive.py` around lines 410 - 428, The code in
_check_symlinks_in_dest uses os.getcwd() as the manifest_root which is too
broad; change it to the canonical extracted project root by resolving dest_dir
(e.g. set manifest_root = os.path.realpath(dest_dir) or
os.path.abspath+realpath) before the os.walk so check_no_path_traversal is
validating resolved symlink targets against the actual extracted dest_dir root
rather than the current working directory; ensure dest_dir is canonicalized once
at the top of _check_symlinks_in_dest and keep the existing os.path.islink and
check_no_path_traversal calls.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@dfetch/vcs/archive.py`:
- Around line 389-407: The helper _is_unsafe_symlink_target currently rejects
any target containing "..", causing _check_tar_member_type and
_check_zip_member_type to bail out early and preventing the later
safe-internal-symlink logic in _check_symlinks_in_dest from running; update
_is_unsafe_symlink_target (used by those functions) to only consider truly
absolute/anchored paths unsafe (keep posix.is_absolute() and bool(win.anchor))
and remove the ".." checks so relative targets with ".." are allowed through for
the post-extraction validation in _check_symlinks_in_dest.

---

Duplicate comments:
In `@dfetch/vcs/archive.py`:
- Around line 410-428: The code in _check_symlinks_in_dest uses os.getcwd() as
the manifest_root which is too broad; change it to the canonical extracted
project root by resolving dest_dir (e.g. set manifest_root =
os.path.realpath(dest_dir) or os.path.abspath+realpath) before the os.walk so
check_no_path_traversal is validating resolved symlink targets against the
actual extracted dest_dir root rather than the current working directory; ensure
dest_dir is canonicalized once at the top of _check_symlinks_in_dest and keep
the existing os.path.islink and check_no_path_traversal calls.
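
The absolute/anchored test named in the inline comment above (`posix.is_absolute()` and `bool(win.anchor)`) can be sketched with `pathlib`. The function name `is_absolute_or_anchored` is hypothetical; the real body of `_is_unsafe_symlink_target` is not shown in this thread, so treat this as an assumption about its shape.

```python
from pathlib import PurePosixPath, PureWindowsPath


def is_absolute_or_anchored(target: str) -> bool:
    """Return True for link targets that are absolute on POSIX or anchored on Windows.

    Relative targets, including those with ".." components, return False and are
    left for a later filesystem-level containment check.
    """
    posix = PurePosixPath(target)
    win = PureWindowsPath(target)
    # win.anchor is the drive plus root, e.g. "C:\\", "C:" or "\\"; empty when relative.
    return posix.is_absolute() or bool(win.anchor)
```

Pure paths never touch the filesystem, so this check can run on archive member metadata before anything is extracted.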
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 55db1706-df42-4e2e-87f3-3e3c01a65d8a

📥 Commits

Reviewing files that changed from the base of the PR and between 942ebf5 and a785364.

📒 Files selected for processing (3)
  • dfetch/vcs/archive.py
  • features/fetch-archive.feature
  • features/steps/generic_steps.py

Comment thread dfetch/vcs/archive.py Outdated
Contributor Author

spoorcc commented Apr 12, 2026

@coderabbitai review

Contributor

coderabbitai Bot commented Apr 12, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Contributor

@coderabbitai coderabbitai Bot left a comment

♻️ Duplicate comments (1)
dfetch/vcs/archive.py (1)

415-415: ⚠️ Potential issue | 🟠 Major

Validate against the extracted project root, not os.getcwd().

Using os.getcwd() (manifest root) widens the allowed boundary from "inside the fetched project" to "anywhere in the checkout". A fetched archive can ship symlinks like ../sibling_project/file that resolve within the manifest root but escape the extracted project's directory. Per issue #1122, the objective is to allow symlinks "whose resolved target remains inside the archive/project", not the entire manifest checkout.

Proposed fix
     @staticmethod
     def _check_symlinks_in_dest(dest_dir: str) -> None:
-        """Raise *RuntimeError* if any symlink in *dest_dir* escapes the manifest root.
+        """Raise *RuntimeError* if any symlink in *dest_dir* escapes the project root.

         Walks *dest_dir* without following symlinks and calls
         :func:`~dfetch.util.util.check_no_path_traversal` for every symlink
         found.  Because :func:`os.path.realpath` follows the link to its
         target, this catches any symlink—including deeply nested ones—whose
-        resolved destination lies outside the current working directory
-        (the manifest root, as set by :func:`~dfetch.util.util.in_directory`).
+        resolved destination lies outside the extracted project directory.

         Raises:
-            RuntimeError: When a symlink resolves to a path outside the manifest root.
+            RuntimeError: When a symlink resolves to a path outside the project root.
         """
-        manifest_root = os.getcwd()
+        project_root = os.path.realpath(dest_dir)
         for root, dirs, files in os.walk(dest_dir, followlinks=False):
             for name in dirs + files:
                 path = os.path.join(root, name)
                 if os.path.islink(path):
-                    check_no_path_traversal(path, manifest_root)
+                    check_no_path_traversal(path, project_root)
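
For testing cases like the `../sibling_project/file` link mentioned above, a symlink member can be written into a `.tar.gz` with the standard `tarfile` module. The helper name `build_tar_gz_with_symlink` is hypothetical and is not the PR's `add_symlink_to_tar_gz()`, whose signature is not shown here.

```python
import io
import tarfile


def build_tar_gz_with_symlink(link_name: str, target: str) -> bytes:
    """Return the bytes of a .tar.gz holding one symlink member.

    Hypothetical test helper; nothing is written to disk.
    """
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
        member = tarfile.TarInfo(name=link_name)
        member.type = tarfile.SYMTYPE  # mark the member as a symbolic link
        member.linkname = target       # stored verbatim, ".." included
        archive.addfile(member)        # symlinks carry no file data
    return buffer.getvalue()
```

The archive stores the target string as given, so whether the link escapes the project can only be decided relative to where the member ends up on disk.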
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@dfetch/vcs/archive.py` at line 415, The code currently sets manifest_root =
os.getcwd(), which permits symlink resolution to escape the extracted archive;
change manifest_root to the actual extracted project root used when unpacking
the archive (the variable that points to the archive's extraction directory,
e.g., extracted_root or project_dir used earlier in archive.py), and use that
real extracted_root (resolved with os.path.realpath) when validating symlink
targets so comparisons ensure resolved targets stay inside the extracted project
rather than the entire checkout.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@dfetch/vcs/archive.py`:
- Line 415: The code currently sets manifest_root = os.getcwd(), which permits
symlink resolution to escape the extracted archive; change manifest_root to the
actual extracted project root used when unpacking the archive (the variable that
points to the archive's extraction directory, e.g., extracted_root or
project_dir used earlier in archive.py), and use that real extracted_root
(resolved with os.path.realpath) when validating symlink targets so comparisons
ensure resolved targets stay inside the extracted project rather than the entire
checkout.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: d717795c-53f9-46c4-a4e1-59d417b4864d

📥 Commits

Reviewing files that changed from the base of the PR and between a785364 and 29bf939.

📒 Files selected for processing (4)
  • dfetch/util/cmdline.py
  • dfetch/vcs/archive.py
  • features/fetch-archive.feature
  • features/steps/generic_steps.py

@ben-edna
Contributor

Original archive now works:

dfetch update TF-PSA-Crypto
Dfetch (0.13.0)
TF-PSA-Crypto:
Fetched sha256:a0b011b7f2c427cc8ee70116bb2d859543014534ae4d7020a69613aec10dc1b4

@ben-edna ben-edna merged commit 861b5e4 into main Apr 13, 2026
40 of 41 checks passed
@ben-edna ben-edna deleted the spoorcc/issue1122 branch April 13, 2026 06:26
@coderabbitai coderabbitai Bot mentioned this pull request Apr 20, 2026
Development

Successfully merging this pull request may close these issues.

Any symlink is now rejected

2 participants