Bugfix GCSToGCSOperator when copy files to folder without wildcard #32486

Merged on Jul 11, 2023 (1 commit)

Conversation

moiseenkov
Contributor

This PR fixes the following case.

The goal is to copy every object whose name starts with the prefix source/foo.txt into the folder dest/ within a single GCS bucket, without using a wildcard.

  1. Create a GCS bucket and upload three files to the source/ directory:
gs://my-bucket/source/foo.txt
gs://my-bucket/source/foo.txt.abc
gs://my-bucket/source/foo.txt/subfolder/file.txt
  2. Upload the following DAG to a Cloud Composer environment:
from datetime import datetime

from airflow import DAG
from airflow.providers.google.cloud.transfers.gcs_to_gcs import GCSToGCSOperator

with DAG(
    dag_id="gcs_to_gcs_fail_example",
    schedule_interval=None,
    catchup=False,
    start_date=datetime(2021, 1, 1),
) as dag:
    # destination_bucket is omitted, so the copy stays within source_bucket.
    copy_file = GCSToGCSOperator(
        task_id="copy_file",
        source_bucket="my-bucket",
        source_object="source/foo.txt",  # no wildcard: treated as a prefix
        destination_object="dest/",
    )
  3. Run the DAG.

Expected bucket state:

gs://my-bucket/source/foo.txt
gs://my-bucket/source/foo.txt.abc
gs://my-bucket/source/foo.txt/subfolder/file.txt
gs://my-bucket/dest/foo.txt
gs://my-bucket/dest/foo.txt.abc
gs://my-bucket/dest/foo.txt/subfolder/file.txt

Actual (incorrect) bucket state:

gs://my-bucket/source/foo.txt
gs://my-bucket/source/foo.txt.abc
gs://my-bucket/source/foo.txt/subfolder/file.txt
gs://my-bucket/dest/source/foo.txt
gs://my-bucket/dest/source/foo.txt.abc
gs://my-bucket/dest/source/foo.txt/subfolder/file.txt
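
The difference between the two bucket states comes down to how each matched object name is mapped to a destination name. The snippet below is only a rough sketch of that mapping in plain Python, not the operator's actual implementation: the helper names (matches_prefix, expected_destination, buggy_destination) are hypothetical, and it assumes the parent folder of source_object is what should be stripped before prepending destination_object.

```python
import posixpath


def matches_prefix(object_name: str, source_object: str) -> bool:
    """Without a wildcard, source_object acts as a plain name prefix."""
    return object_name.startswith(source_object)


def expected_destination(object_name: str, source_object: str, destination_object: str) -> str:
    """Assumed mapping: drop the parent folder of source_object, then prepend the destination."""
    parent = posixpath.dirname(source_object)  # "source" for "source/foo.txt"
    prefix = f"{parent}/" if parent else ""
    return destination_object + object_name[len(prefix):]


def buggy_destination(object_name: str, destination_object: str) -> str:
    """Mapping that reproduces the incorrect state: the full source name is kept."""
    return destination_object + object_name


objects = [
    "source/foo.txt",
    "source/foo.txt.abc",
    "source/foo.txt/subfolder/file.txt",
]

selected = [name for name in objects if matches_prefix(name, "source/foo.txt")]
expected = [expected_destination(name, "source/foo.txt", "dest/") for name in selected]
actual = [buggy_destination(name, "dest/") for name in selected]

for name, good, bad in zip(selected, expected, actual):
    print(f"{name}: expected {good}, got {bad}")
```

Under these assumptions, the expected list matches the dest/ entries in the expected bucket state and the actual list matches the dest/source/ entries in the incorrect one.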

@moiseenkov force-pushed the gcs_to_gcs_bugfix2 branch 5 times, most recently from b54f9f9 to ace0da5 (July 11, 2023 07:16)
@VladaZakharova
Contributor

Hi @potiuk!
Could we please review these changes?

@potiuk merged commit 2ad91a7 into apache:main on Jul 11, 2023
42 checks passed
3 participants