
google: reject GCS blob names that escape the target directory on download #67509

Open
potiuk wants to merge 1 commit into apache:main from potiuk:google-gcs-blob-path-traversal-containment

potiuk commented May 25, 2026

GCSHook.sync_to_local_dir and GCSTimeSpanFileTransformOperator._download joined GCS blob names into local paths without verifying the resolved path stayed within the intended directory. GCS allows object names containing .. segments, so a hostile blob name could cause files to be written outside local_dir / the operator's temp dir — a classic CWE-22 path-traversal sink.
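A minimal sketch of the sink described above, assuming the blob name is appended to the target directory with pathlib. The helper naive_local_path is hypothetical and only mirrors the unsafe pattern; it is not the hook's or operator's actual code.

```python
import os
from pathlib import Path


def naive_local_path(local_dir: str, blob_name: str) -> Path:
    # Hypothetical helper mirroring the unsafe pattern: the blob name is
    # joined onto the target directory with no containment check.
    return Path(local_dir) / blob_name


if __name__ == "__main__":
    local_dir = "/tmp/sync-target"
    for blob_name in ["reports/2026/data.csv", "../escape.py", "/etc/cron.d/evil"]:
        # normpath collapses ".." lexically, showing where a write would land.
        landing = os.path.normpath(naive_local_path(local_dir, blob_name))
        print(f"{blob_name!r} -> {landing}")
```

On a POSIX system the second name lands in /tmp/escape.py, and the third (if a name with a leading slash ever reached the join) discards local_dir altogether, because pathlib treats an absolute right-hand operand as a new root.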

The trust model matters: a DAG author's own bucket is trusted, but these operators are routinely pointed at buckets shared with external partners or other tenants, where the write side may not be fully trusted.

Reported as F-005 + F-006 in the apache/tooling-agents L3 providers/google sweep b1aec75.

Change

At both sites, resolve the destination path and verify with is_relative_to that it stays within the target root before any download. On violation, raise ValueError with a clear message instead of silently writing outside the target.
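A sketch of the containment check, assuming a hypothetical helper named resolve_within; the real hook and operator inline their own version and may word the error message differently.

```python
from __future__ import annotations

from pathlib import Path


def resolve_within(root: str | Path, blob_name: str) -> Path:
    """Return the local destination for ``blob_name`` under ``root``.

    Hypothetical helper: resolve the joined path (collapsing ".." and
    following existing symlinks) and require it to stay under the
    resolved root before anything is downloaded.
    """
    root_path = Path(root).resolve()
    dest = (root_path / blob_name).resolve()
    if not dest.is_relative_to(root_path):
        raise ValueError(
            f"Refusing to download blob {blob_name!r}: resolved path {dest} "
            f"is outside the target directory {root_path}"
        )
    return dest


if __name__ == "__main__":
    print(resolve_within("/tmp/sync-target", "reports/data.csv"))
    try:
        resolve_within("/tmp/sync-target", "../escape.py")
    except ValueError as err:
        print(err)
```

Resolving both sides before comparing matters: is_relative_to is a purely lexical comparison, so an unresolved path such as /tmp/sync-target/../escape.py would wrongly pass. resolve() additionally follows any symlinks that already exist on disk, so a link inside the target directory that points elsewhere is caught too.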

Sites touched:

  • GCSHook.sync_to_local_dir (hook)
  • GCSTimeSpanFileTransformOperator._download (operator)

Test plan

  • test_sync_to_local_dir_rejects_path_traversal (hook) — a ../escape.py blob raises ValueError and no file is created outside local_dir.
  • test_execute_rejects_path_traversal_in_blob_name (operator) — a ../escape.py blob raises ValueError and download_to_filename is never called.
  • prek run ruff is clean on touched files.
  • Existing test_sync_to_local_dir_behaviour still passes (no behaviour change on safe blob names).
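The two regression tests in the plan can be sketched against a hypothetical download_blobs loop, with unittest.mock objects standing in for google-cloud-storage Blob objects (which do expose name and download_to_filename). All function names here are illustrative; the real tests exercise GCSHook and GCSTimeSpanFileTransformOperator with their own fixtures.

```python
from __future__ import annotations

import tempfile
from collections.abc import Iterable
from pathlib import Path
from typing import Any
from unittest import mock


def download_blobs(blobs: Iterable[Any], local_dir: Path) -> list[Path]:
    """Hypothetical download loop: check every name, then download."""
    root = local_dir.resolve()
    planned = []
    for blob in blobs:
        dest = (root / blob.name).resolve()
        if not dest.is_relative_to(root):
            raise ValueError(f"Blob {blob.name!r} resolves outside {root}")
        planned.append((blob, dest))
    # Downloads start only after every blob name has passed the check.
    for blob, dest in planned:
        dest.parent.mkdir(parents=True, exist_ok=True)
        blob.download_to_filename(str(dest))
    return [dest for _, dest in planned]


def fake_blob(name: str) -> mock.MagicMock:
    blob = mock.MagicMock()
    blob.name = name  # set after construction so it is a plain attribute
    return blob


def test_rejects_path_traversal_in_blob_name() -> None:
    with tempfile.TemporaryDirectory() as tmp:
        local_dir = Path(tmp) / "local_dir"
        local_dir.mkdir()
        safe, hostile = fake_blob("ok.csv"), fake_blob("../escape.py")
        try:
            download_blobs([safe, hostile], local_dir)
        except ValueError:
            pass
        else:
            raise AssertionError("expected ValueError for '../escape.py'")
        # No download was attempted, and nothing exists outside local_dir.
        safe.download_to_filename.assert_not_called()
        hostile.download_to_filename.assert_not_called()
        assert not (Path(tmp) / "escape.py").exists()


def test_safe_blob_names_unchanged() -> None:
    with tempfile.TemporaryDirectory() as tmp:
        blob = fake_blob("reports/2026/data.csv")
        (dest,) = download_blobs([blob], Path(tmp))
        assert dest == Path(tmp).resolve() / "reports" / "2026" / "data.csv"
        blob.download_to_filename.assert_called_once_with(str(dest))
```

This sketch validates every name before the first download, so a hostile name late in the listing cannot leave earlier files half-synced; whether the real hook checks up front or per blob is not stated here.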

Was generative AI tooling used to co-author this PR?
  • Yes — Claude Code (Opus 4.7)

Generated-by: Claude Code (Opus 4.7) following the guidelines

Labels

area:providers, provider:google
