Add optional object-level retention support to GcsToGcsOperator#66204

Merged
shahar1 merged 5 commits into apache:main from
chrisqiqiu:add-gcs-retention-support
May 5, 2026

Conversation

@chrisqiqiu
Contributor

closes: #60853

Reopens #64755 (the original PR was auto-closed because I accidentally deleted my fork — restored the branch and resubmitting with the same commits).

Add optional retain_until_time and retention_mode parameters to
GCSHook.rewrite() and GCSToGCSOperator, allowing object-level
retention to be applied to destination objects immediately after
copy/move operations.

This eliminates the need for a two-step "copy → set retention" workflow,
which is error-prone in compliance-driven pipelines: if the copy succeeds
but the retention step fails, the destination objects are left unprotected.

Changes:

  • GCSHook.rewrite() — accepts retain_until_time and retention_mode,
    applies retention via blob.patch() after the rewrite completes
  • GCSToGCSOperator — accepts the same parameters, passes them through
    _copy_single_object() to the hook. Retention kwargs are only passed
    when retain_until_time is set (fully backward compatible)
  • Unit tests for both hook and operator covering retention, default mode,
    and no-retention cases
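
The flow described in the list above can be sketched roughly as follows. This is a minimal illustration, not the code merged in this PR: the helper names `build_retention_kwargs` and `rewrite_with_retention` are hypothetical, the fallback to `"Unlocked"` when no mode is given is an assumption, and mocks stand in for real `google.cloud.storage.Blob` objects so the sketch runs without GCS access.

```python
from datetime import datetime, timezone
from unittest.mock import MagicMock


def build_retention_kwargs(retain_until_time=None, retention_mode=None):
    """Hypothetical operator-side helper: add kwargs only when retention is requested."""
    if retain_until_time is None:
        return {}  # existing callers keep the exact same hook call as before
    return {"retain_until_time": retain_until_time, "retention_mode": retention_mode}


def rewrite_with_retention(source_blob, destination_blob,
                           retain_until_time=None, retention_mode=None):
    """Hypothetical hook-side helper: rewrite first, then patch retention onto the copy."""
    # Blob.rewrite() returns (token, bytes_rewritten, total_bytes);
    # a non-None token means the rewrite has to be continued.
    token, _, _ = destination_blob.rewrite(source=source_blob)
    while token is not None:
        token, _, _ = destination_blob.rewrite(source=source_blob, token=token)

    if retain_until_time is not None:
        # Assumption: default to "Unlocked" when the caller gives no mode.
        destination_blob.retention.mode = retention_mode or "Unlocked"
        destination_blob.retention.retain_until_time = retain_until_time
        destination_blob.patch()  # sends the retention change to GCS
    return destination_blob


if __name__ == "__main__":
    # Mocks stand in for real blobs; the rewrite finishes on the second call.
    destination = MagicMock()
    destination.rewrite.side_effect = [("token-1", 5, 10), (None, 10, 10)]
    until = datetime(2030, 1, 1, tzinfo=timezone.utc)
    rewrite_with_retention(MagicMock(), destination, **build_retention_kwargs(until))
    print(destination.retention.mode, destination.patch.call_count)
```

Because the retention patch is a separate request that follows the rewrite, a sketch like this still has a short window between the copy and the patch; the benefit is that both steps live in one call, so a caller cannot forget the second one.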

Testing:

  • All existing tests passed for providers/google/tests/unit/google/cloud/hooks/test_gcs.py and providers/google/tests/unit/google/cloud/transfers/test_gcs_to_gcs.py
  • Validated against real GCS with a retention-enabled bucket — retention
    mode and retain_until_time correctly applied and verified
Was generative AI tooling used to co-author this PR?
  • Yes (please specify the tool below)


@chrisqiqiu chrisqiqiu requested a review from shahar1 as a code owner May 1, 2026 14:24
@boring-cyborg boring-cyborg Bot added area:providers provider:google Google (including GCP) related issues labels May 1, 2026
@chrisqiqiu chrisqiqiu force-pushed the add-gcs-retention-support branch from da55336 to 3cd4ce7 Compare May 1, 2026 14:30
Contributor

@SameerMesiah97 SameerMesiah97 left a comment

Good PR. I only have 2 minor comments.

Comment thread providers/google/src/airflow/providers/google/cloud/hooks/gcs.py Outdated
Comment thread providers/google/tests/unit/google/cloud/hooks/test_gcs.py Outdated
Contributor

@shahar1 shahar1 left a comment

LGTM, please address Sameer's comments

@chrisqiqiu chrisqiqiu force-pushed the add-gcs-retention-support branch from 3cd4ce7 to f9f461f Compare May 2, 2026 07:09
@SameerMesiah97
Contributor

> LGTM, please address Sameer's comments

It looks good to me now but can you trigger CI?

@chrisqiqiu
Contributor Author

Thanks! I don't have permission to trigger CI. Could a maintainer approve the workflow run?

@SameerMesiah97
Contributor

> Thanks! I don't have permission to trigger CI. Could a maintainer approve the workflow run?

I was asking @shahar1.

@shahar1
Contributor

shahar1 commented May 2, 2026

> > Thanks! I don't have permission to trigger CI. Could a maintainer approve the workflow run?
>
> I was asking @shahar1.

Done :)

@SameerMesiah97
Contributor

I see a failing test, but it looks unrelated. Can CI be triggered again to check whether it was a flake?

@chrisqiqiu
Contributor Author

The failing test is TestBaseDatabricksHook::test_a_get_federated_token_with_projected_volume. It's a flaky timestamp comparison in the Databricks provider that is off by one second (1752325201 vs 1752325200). It looks like a race between when the token expiry is computed and when it's asserted. It's unrelated to my change; I only touched the Google provider's GCS code. The Google provider tests I changed in this PR show OK for Test: Providers[google]: https://github.com/apache/airflow/actions/runs/25255184172/job/74055520507?pr=66204#step:12:4669 . @shahar1
Could you please re-run the failed job?
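
For readers unfamiliar with this kind of flake, here is a minimal sketch of how an off-by-one-second mismatch like the one above can arise, and how pinning the clock avoids it. It is an illustration only, not the Databricks test itself: `TOKEN_LIFETIME_SECONDS`, `compute_expiry`, `flaky_check`, and `stable_check` are hypothetical names, and the one-hour lifetime is an assumption.

```python
import time
from unittest.mock import patch

TOKEN_LIFETIME_SECONDS = 3600  # hypothetical lifetime, chosen for illustration


def compute_expiry():
    """Expiry as a whole-second Unix timestamp, based on the current clock."""
    return int(time.time()) + TOKEN_LIFETIME_SECONDS


def flaky_check():
    """Reads the clock twice; fails if a second boundary falls between the reads."""
    expiry = compute_expiry()
    return expiry == int(time.time()) + TOKEN_LIFETIME_SECONDS


def stable_check():
    """Pins time.time() so both reads see the same instant."""
    with patch("time.time", return_value=1752321600.0):
        expiry = compute_expiry()
        return expiry == int(time.time()) + TOKEN_LIFETIME_SECONDS
```

With a pinned clock the comparison is deterministic; with a live clock it passes almost always and fails only when the two reads straddle a second boundary, which matches the intermittent behaviour described above.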

@potiuk potiuk added the ready for maintainer review Set after triaging when all criteria pass. label May 5, 2026
@shahar1 shahar1 merged commit eaeec36 into apache:main May 5, 2026
177 of 178 checks passed
@boring-cyborg

boring-cyborg Bot commented May 5, 2026

Awesome work, congrats on your first merged pull request! You are invited to check our Issue Tracker for additional contributions.

Labels

area:providers provider:google Google (including GCP) related issues ready for maintainer review Set after triaging when all criteria pass.

Development

Successfully merging this pull request may close these issues.

Add optional object-level retention support to GcsToGcsOperator (Google provider)

4 participants