Conversation
WalkthroughMoved symlink safety validation from archive-member inspection to a post-extraction filesystem check: relative Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@dfetch/vcs/archive.py`:
- Around line 403-421: The check in _check_symlinks_in_dest uses os.getcwd() as
manifest_root which allows symlinks to point anywhere inside the overall
checkout; change it to validate against the extracted project root instead by
computing the canonical root from dest_dir (e.g. realpath/abspath of dest_dir)
and pass that as the manifest_root to check_no_path_traversal; update
_check_symlinks_in_dest (references: _check_symlinks_in_dest, dest_dir,
manifest_root, check_no_path_traversal) so every symlink found is validated
against the extracted project's real path rather than os.getcwd().
- Around line 389-400: The _is_unsafe_symlink_target function currently only
rejects absolute symlink targets but lets relative targets with '..' through,
which allows malicious TARs to escape during _extract_raw() via
TarFile.extractall(); update symlink validation to treat any target containing
'..' path components as unsafe for TAR extraction (i.e., have
_is_unsafe_symlink_target return True when pathlib.PurePosixPath(target).parts
includes '..' or the Windows PureWindowsPath has '..') so that _extract_raw()
rejects such members before calling TarFile.extractall(), and ensure
_check_symlinks_in_dest remains as a secondary check.
In `@features/fetch-archive.feature`:
- Around line 244-255: The scenario currently only checks that sub/dir/link.mk
exists but not that it is a symlink or that its target is preserved; update the
Gherkin assertions to explicitly assert that sub/dir/link.mk is a symlink and
that its target equals ../../other/target.mk (i.e., add a step that verifies the
file type is a symbolic link and that the link target is exactly
../../other/target.mk) so the test fails if the link was converted to a regular
file.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 7277c8a1-a596-45ec-8b7b-24a409d6bb2a
📒 Files selected for processing (5)
CHANGELOG.rstdfetch/vcs/archive.pyfeatures/fetch-archive.featurefeatures/steps/archive_steps.pytests/test_archive.py
| def _check_symlinks_in_dest(dest_dir: str) -> None: | ||
| """Raise *RuntimeError* if any symlink in *dest_dir* escapes the manifest root. | ||
|
|
||
| Walks *dest_dir* without following symlinks and calls | ||
| :func:`~dfetch.util.util.check_no_path_traversal` for every symlink | ||
| found. Because :func:`os.path.realpath` follows the link to its | ||
| target, this catches any symlink—including deeply nested ones—whose | ||
| resolved destination lies outside the current working directory | ||
| (the manifest root, as set by :func:`~dfetch.util.util.in_directory`). | ||
|
|
||
| Raises: | ||
| RuntimeError: When a symlink resolves to a path outside the manifest root. | ||
| """ | ||
| manifest_root = os.getcwd() | ||
| for root, dirs, files in os.walk(dest_dir, followlinks=False): | ||
| for name in dirs + files: | ||
| path = os.path.join(root, name) | ||
| if os.path.islink(path): | ||
| check_no_path_traversal(path, manifest_root) |
There was a problem hiding this comment.
Validate against the extracted project root, not os.getcwd().
Using the manifest root here widens the allowed boundary from “inside the fetched project” to “anywhere in the checkout”. A fetched archive can now ship links like ../dfetch.yaml or ../OtherProject/file and still pass this check, which is broader than the #1122 fix target.
Suggested fix
- manifest_root = os.getcwd()
+ project_root = os.path.realpath(dest_dir)
for root, dirs, files in os.walk(dest_dir, followlinks=False):
for name in dirs + files:
path = os.path.join(root, name)
if os.path.islink(path):
- check_no_path_traversal(path, manifest_root)
+ check_no_path_traversal(path, project_root)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@dfetch/vcs/archive.py` around lines 403 - 421, The check in
_check_symlinks_in_dest uses os.getcwd() as manifest_root which allows symlinks
to point anywhere inside the overall checkout; change it to validate against the
extracted project root instead by computing the canonical root from dest_dir
(e.g. realpath/abspath of dest_dir) and pass that as the manifest_root to
check_no_path_traversal; update _check_symlinks_in_dest (references:
_check_symlinks_in_dest, dest_dir, manifest_root, check_no_path_traversal) so
every symlink found is validated against the extracted project's real path
rather than os.getcwd().
cf12e4d to
942ebf5
Compare
|
@coderabbitai resume |
✅ Actions performedReviews resumed. |
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (1)
dfetch/vcs/archive.py (1)
410-428:⚠️ Potential issue | 🟠 MajorValidate resolved symlinks against the extracted project root.
Using
os.getcwd()widens the allowed boundary to the whole checkout, so a fetched archive can still ship links like../dfetch.yamland pass this check. The boundary here needs to be the canonicaldest_dir.Suggested fix
- manifest_root = os.getcwd() + project_root = os.path.realpath(dest_dir) for root, dirs, files in os.walk(dest_dir, followlinks=False): for name in dirs + files: path = os.path.join(root, name) if os.path.islink(path): - check_no_path_traversal(path, manifest_root) + check_no_path_traversal(path, project_root)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@dfetch/vcs/archive.py` around lines 410 - 428, The code in _check_symlinks_in_dest uses os.getcwd() as the manifest_root which is too broad; change it to the canonical extracted project root by resolving dest_dir (e.g. set manifest_root = os.path.realpath(dest_dir) or os.path.abspath+realpath) before the os.walk so check_no_path_traversal is validating resolved symlink targets against the actual extracted dest_dir root rather than the current working directory; ensure dest_dir is canonicalized once at the top of _check_symlinks_in_dest and keep the existing os.path.islink and check_no_path_traversal calls.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@dfetch/vcs/archive.py`:
- Around line 389-407: The helper _is_unsafe_symlink_target currently rejects
any target containing "..", causing _check_tar_member_type and
_check_zip_member_type to bail out early and preventing the later
safe-internal-symlink logic in _check_symlinks_in_dest from running; update
_is_unsafe_symlink_target (used by those functions) to only consider truly
absolute/anchored paths unsafe (keep posix.is_absolute() and bool(win.anchor))
and remove the ".." checks so relative targets with ".." are allowed through for
the post-extraction validation in _check_symlinks_in_dest.
---
Duplicate comments:
In `@dfetch/vcs/archive.py`:
- Around line 410-428: The code in _check_symlinks_in_dest uses os.getcwd() as
the manifest_root which is too broad; change it to the canonical extracted
project root by resolving dest_dir (e.g. set manifest_root =
os.path.realpath(dest_dir) or os.path.abspath+realpath) before the os.walk so
check_no_path_traversal is validating resolved symlink targets against the
actual extracted dest_dir root rather than the current working directory; ensure
dest_dir is canonicalized once at the top of _check_symlinks_in_dest and keep
the existing os.path.islink and check_no_path_traversal calls.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 55db1706-df42-4e2e-87f3-3e3c01a65d8a
📒 Files selected for processing (3)
dfetch/vcs/archive.pyfeatures/fetch-archive.featurefeatures/steps/generic_steps.py
a785364 to
29bf939
Compare
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
There was a problem hiding this comment.
♻️ Duplicate comments (1)
dfetch/vcs/archive.py (1)
415-415:⚠️ Potential issue | 🟠 MajorValidate against the extracted project root, not
os.getcwd().Using
os.getcwd()(manifest root) widens the allowed boundary from "inside the fetched project" to "anywhere in the checkout". A fetched archive can ship symlinks like../sibling_project/filethat resolve within the manifest root but escape the extracted project's directory. Per issue#1122, the objective is to allow symlinks "whose resolved target remains inside the archive/project", not the entire manifest checkout.Proposed fix
`@staticmethod` def _check_symlinks_in_dest(dest_dir: str) -> None: - """Raise *RuntimeError* if any symlink in *dest_dir* escapes the manifest root. + """Raise *RuntimeError* if any symlink in *dest_dir* escapes the project root. Walks *dest_dir* without following symlinks and calls :func:`~dfetch.util.util.check_no_path_traversal` for every symlink found. Because :func:`os.path.realpath` follows the link to its target, this catches any symlink—including deeply nested ones—whose - resolved destination lies outside the current working directory - (the manifest root, as set by :func:`~dfetch.util.util.in_directory`). + resolved destination lies outside the extracted project directory. Raises: - RuntimeError: When a symlink resolves to a path outside the manifest root. + RuntimeError: When a symlink resolves to a path outside the project root. """ - manifest_root = os.getcwd() + project_root = os.path.realpath(dest_dir) for root, dirs, files in os.walk(dest_dir, followlinks=False): for name in dirs + files: path = os.path.join(root, name) if os.path.islink(path): - check_no_path_traversal(path, manifest_root) + check_no_path_traversal(path, project_root)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@dfetch/vcs/archive.py` at line 415, The code currently sets manifest_root = os.getcwd(), which permits symlink resolution to escape the extracted archive; change manifest_root to the actual extracted project root used when unpacking the archive (the variable that points to the archive's extraction directory, e.g., extracted_root or project_dir used earlier in archive.py), and use that real extracted_root (resolved with os.path.realpath) when validating symlink targets so comparisons ensure resolved targets stay inside the extracted project rather than the entire checkout.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@dfetch/vcs/archive.py`:
- Line 415: The code currently sets manifest_root = os.getcwd(), which permits
symlink resolution to escape the extracted archive; change manifest_root to the
actual extracted project root used when unpacking the archive (the variable that
points to the archive's extraction directory, e.g., extracted_root or
project_dir used earlier in archive.py), and use that real extracted_root
(resolved with os.path.realpath) when validating symlink targets so comparisons
ensure resolved targets stay inside the extracted project rather than the entire
checkout.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: d717795c-53f9-46c4-a4e1-59d417b4864d
📒 Files selected for processing (4)
dfetch/util/cmdline.pydfetch/vcs/archive.pyfeatures/fetch-archive.featurefeatures/steps/generic_steps.py
|
Original archive now works:
|
fixes #1122
Summary by CodeRabbit
Bug Fixes
Tests
Documentation
Reliability