Add optional object-level retention support to GcsToGcsOperator #66204
shahar1 merged 5 commits into apache:main
Force-pushed from da55336 to 3cd4ce7
SameerMesiah97 left a comment:
Good PR. I only have 2 minor comments.
shahar1 left a comment:
LGTM, please address Sameer's comments
Force-pushed from 3cd4ce7 to f9f461f
It looks good to me now but can you trigger CI?
Thanks! I don't have permission to trigger CI. Could a maintainer approve the workflow run?
I was asking @shahar1.
Done :)
I see a failing test but it looks unrelated. Can CI be triggered again to check whether it was a flake?
The failing test is TestBaseDatabricksHook::test_a_get_federated_token_with_projected_volume. It's a flaky timestamp comparison in the Databricks provider that is off by one second (1752325201 vs 1752325200). It looks like a race between when the token expiry is computed and when it's asserted. It's unrelated to my change; I only touched the Google provider's GCS code. The Google provider tests I changed in this PR show OK for Test: Providers[google]: https://github.com/apache/airflow/actions/runs/25255184172/job/74055520507?pr=66204#step:12:4669 . @shahar1
Awesome work, congrats on your first merged pull request! You are invited to check our Issue Tracker for additional contributions.
closes: #60853
Reopens #64755 (the original PR was auto-closed because I accidentally deleted my fork; I have restored the branch and am resubmitting with the same commits).
Add optional `retain_until_time` and `retention_mode` parameters to `GCSHook.rewrite()` and `GCSToGCSOperator`, allowing object-level retention to be applied to destination objects immediately after copy/move operations.
This eliminates the need for a two-step "copy → set retention" workflow,
which is error-prone in compliance-driven pipelines where a partial success
can leave objects unprotected.
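As a rough illustration of the one-step call, the sketch below assembles keyword arguments that one might pass to `GCSToGCSOperator`. Only `retain_until_time` and `retention_mode` come from this PR. The helper `build_copy_kwargs`, the bucket and object names, and the value types (a timezone-aware `datetime` and the GCS retention mode strings `"Locked"` / `"Unlocked"`) are assumptions made for the example, not a statement of how the merged code types its parameters.

```python
from datetime import datetime, timedelta, timezone

# GCS object retention supports these two modes.
VALID_RETENTION_MODES = {"Locked", "Unlocked"}


def build_copy_kwargs(retain_for_days: int, retention_mode: str = "Unlocked") -> dict:
    """Assemble operator kwargs for a copy with retention (hypothetical helper)."""
    if retention_mode not in VALID_RETENTION_MODES:
        raise ValueError(f"unsupported retention mode: {retention_mode!r}")
    if retain_for_days <= 0:
        raise ValueError("retain_for_days must be positive")
    return {
        "task_id": "copy_with_retention",
        "source_bucket": "example-source-bucket",        # placeholder
        "source_object": "reports/2025/summary.csv",     # placeholder
        "destination_bucket": "example-archive-bucket",  # placeholder
        "destination_object": "reports/2025/summary.csv",
        # Parameters added by this PR (value types are assumed here).
        "retain_until_time": datetime.now(timezone.utc) + timedelta(days=retain_for_days),
        "retention_mode": retention_mode,
    }


kwargs = build_copy_kwargs(retain_for_days=30)
# In a DAG file these would be passed as GCSToGCSOperator(**kwargs).
```

Leaving both retention keys out of the dictionary would correspond to the existing copy behaviour without retention.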
Changes:
- `GCSHook.rewrite()`: accepts `retain_until_time` and `retention_mode`, applies retention via `blob.patch()` after the rewrite completes
- `GCSToGCSOperator`: accepts the same parameters, passes them through `_copy_single_object()` to the hook. Retention kwargs are only passed when `retain_until_time` is set (fully backward compatible)
and no-retention cases
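The flow described in the list above can be sketched roughly as follows. This is not the merged implementation: `FakeBlob` and `FakeRetention` are stand-ins so the example runs without GCP credentials, the attribute names (`blob.retention.mode`, `blob.retention.retain_until_time`) are assumed to mirror the google-cloud-storage client, and `rewrite_with_retention` and `copy_single_object` are hypothetical names for the hook side and the operator side.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class FakeRetention:
    """Stand-in for the retention settings on a blob (assumed shape)."""
    mode: str | None = None
    retain_until_time: datetime | None = None


@dataclass
class FakeBlob:
    """Stand-in for a destination blob; counts patch() calls."""
    name: str
    retention: FakeRetention = field(default_factory=FakeRetention)
    patch_calls: int = 0

    def patch(self) -> None:
        # A real client would send the changed metadata to GCS here.
        self.patch_calls += 1


def rewrite_with_retention(destination, retain_until_time=None, retention_mode=None):
    """Hook side: (pretend) rewrite, then apply retention if requested."""
    # ... the actual rewrite call would run to completion here ...
    if retain_until_time is not None:
        destination.retention.mode = retention_mode
        destination.retention.retain_until_time = retain_until_time
        destination.patch()
    return destination


def copy_single_object(destination, retain_until_time=None, retention_mode=None):
    """Operator side: forward retention kwargs only when retain_until_time is set."""
    extra = {}
    if retain_until_time is not None:
        extra = {"retain_until_time": retain_until_time, "retention_mode": retention_mode}
    return rewrite_with_retention(destination, **extra)


plain = copy_single_object(FakeBlob("a.csv"))
protected = copy_single_object(
    FakeBlob("b.csv"),
    retain_until_time=datetime(2030, 1, 1, tzinfo=timezone.utc),
    retention_mode="Locked",
)
```

In this sketch the call without retention arguments never touches `patch()`, which is the backward-compatible path; the second call sets both fields and patches once.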
Testing:
mode and retain_until_time correctly applied and verified
Was generative AI tooling used to co-author this PR?