google: reject GCS blob names that escape the target directory on download #67509
Open
potiuk wants to merge 1 commit into
`GCSHook.sync_to_local_dir` and `GCSTimeSpanFileTransformOperator._download` joined GCS blob names into local paths without verifying that the resolved path stayed within the intended directory. GCS allows object names containing `..` segments, so a hostile blob name could cause files to be written outside `local_dir` (or the operator's temp dir): a classic CWE-22 path-traversal sink.

The trust model matters: a DAG author's own bucket is trusted, but these operators are routinely pointed at buckets shared with external partners or other tenants, where the write side may not be fully trusted.
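To make the sink concrete, here is a minimal sketch assuming the pre-fix code built the destination with something like `os.path.join` (the actual pre-fix lines are not reproduced in this description). `unchecked_destination` is a hypothetical helper, not an Airflow function.

```python
import os
import tempfile
from pathlib import Path


def unchecked_destination(local_dir: str, blob_name: str) -> Path:
    # Hypothetical stand-in for the pre-fix behaviour: the blob name is joined
    # onto the target directory with no check on where the result points.
    return Path(os.path.join(local_dir, blob_name)).resolve()


with tempfile.TemporaryDirectory() as local_dir:
    root = Path(local_dir).resolve()
    dest = unchecked_destination(local_dir, "../escape.py")
    # The ".." segment climbs out of local_dir into its parent directory.
    print(dest == root.parent / "escape.py")  # True
    print(dest.is_relative_to(root))          # False
```

A single `..` segment is enough to land the file one level above `local_dir`; longer `../` chains climb further.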
Reported as F-005 + F-006 in the `apache/tooling-agents` L3 providers/google sweep `b1aec75`.

Change
At both sites, resolve the destination path and assert that it `is_relative_to` the target root before any download. On violation, raise `ValueError` with a clear message instead of silently writing outside the target.

Sites touched:
- `hooks/gcs.py`, `sync_to_local_dir`: check before `_sync_to_local_dir_if_changed`.
- `operators/gcs.py`, `GCSTimeSpanFileTransformOperator._download`: check inside the per-blob download worker.

Test plan
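The operator test listed below asserts that `download_to_filename` is never called for a traversing name, which only holds if the check runs before the download. A minimal, self-contained sketch of that ordering, assuming a per-blob worker that receives the download callable: `checked_destination` and `download_blob` are hypothetical names, not the helpers in `hooks/gcs.py` or `operators/gcs.py`, and the `MagicMock` merely stands in for `Blob.download_to_filename` from `google-cloud-storage`.

```python
from __future__ import annotations

import tempfile
from pathlib import Path
from typing import Callable
from unittest import mock


def checked_destination(root: str | Path, blob_name: str) -> Path:
    """Hypothetical guard: resolve blob_name under root, refuse anything outside it."""
    root_resolved = Path(root).resolve()
    destination = (root_resolved / blob_name).resolve()
    if not destination.is_relative_to(root_resolved):
        raise ValueError(
            f"Blob name {blob_name!r} resolves to {destination}, "
            f"which is outside the target directory {root_resolved}"
        )
    return destination


def download_blob(blob_name: str, root: str, download: Callable[[str], None]) -> None:
    """Hypothetical per-blob worker: validate first, download only if safe."""
    destination = checked_destination(root, blob_name)
    download(str(destination))


# Mirrors the operator test: the mock stands in for blob.download_to_filename.
with tempfile.TemporaryDirectory() as tmp:
    download_to_filename = mock.MagicMock()
    try:
        download_blob("../escape.py", tmp, download_to_filename)
    except ValueError as err:
        print("rejected:", err)
    download_to_filename.assert_not_called()

    # A safe name passes the guard and reaches the download call exactly once.
    download_blob("data/ok.csv", tmp, download_to_filename)
    download_to_filename.assert_called_once()
```

Resolving before comparing is the important design choice: a plain string-prefix check on the unresolved path would accept `<local_dir>/../escape.py`, since that string does start with `local_dir`. The same guard also rejects a blob name beginning with `/`, because pathlib's `/` operator discards the left-hand side when the right-hand side is absolute.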
- `test_sync_to_local_dir_rejects_path_traversal` (hook): a `../escape.py` blob raises `ValueError` and no file is created outside `local_dir`.
- `test_execute_rejects_path_traversal_in_blob_name` (operator): a `../escape.py` blob raises `ValueError` and `download_to_filename` is never called.
- `prek run ruff` is clean on touched files.
- `test_sync_to_local_dir_behaviour` still passes (no behaviour change on safe blob names).

Was generative AI tooling used to co-author this PR?
Generated-by: Claude Code (Opus 4.7) following the guidelines