GCSToGCSOperator cannot copy a single file/folder without copying other files/folders with that prefix #22675
Comments
Thanks for opening your first issue here! Be sure to follow the issue template!
See the documentation (docstring), which includes examples. As unintuitive as it is, source_object is treated as a wildcard specification by default. If you want to copy a single object, you need to specify it like this:
See examples here: https://airflow.apache.org/docs/apache-airflow-providers-google/stable/_api/airflow/providers/google/cloud/transfers/gcs_to_gcs/index.html
Unfortunately, using source_objects instead of source_object doesn't help.
I see. I looked at the code and indeed, that is the case. I marked it as a good first issue; maybe someone would like to work on it. Note that the fastest and surest way to get it implemented is to make a PR yourself and lead it to completion. Would you like to contribute such a change? Happy to review the code. If not, it will have to wait for someone to pick it up.
Before anybody tries to fix it, can we clarify the expected behavior? Let's limit the discussion to objects without a wildcard character. Which option is the one we should take?
I think that can be discussed when a PR is opened. I have no opinion. Maybe you can pick something as a proposal, and the reviewer of the PR might decide which one is OK.
Ideally, though, it should certainly be backwards compatible.
I think a flag "exact_match_when_no_wildcard" (default False) might be a good solution.
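To make the proposal concrete, here is a minimal pure-Python sketch of how such a flag could select object names. It is not the operator's code: the function select_objects, its arguments, and the single-wildcard handling are assumptions for illustration only. With the default (False), a source_object without a wildcard keeps the prefix behavior reported in this issue, so existing DAGs would not change.

```python
def select_objects(object_names, source_object, exact_match_when_no_wildcard=False):
    """Hypothetical helper: pick the names that `source_object` refers to.

    Only a single "*" is handled here; that simplification is an assumption.
    """
    if "*" in source_object:
        # Wildcard given: match on the text before and after the "*".
        prefix, _, suffix = source_object.partition("*")
        return [
            name
            for name in object_names
            if len(name) >= len(prefix) + len(suffix)
            and name.startswith(prefix)
            and name.endswith(suffix)
        ]
    if exact_match_when_no_wildcard:
        # Proposed opt-in behavior: only the object with exactly this name.
        return [name for name in object_names if name == source_object]
    # Backwards-compatible default: treat the value as a prefix.
    return [name for name in object_names if name.startswith(source_object)]


# Names modelled on the report; the file inside the folder is made up.
names = ["hourse.jpeg", "hourse.jpeg.copy", "hourse.jpeg.folder/a.txt"]

default_result = select_objects(names, "hourse.jpeg")
exact_result = select_objects(names, "hourse.jpeg", exact_match_when_no_wildcard=True)
wildcard_result = select_objects(names, "hourse.jpeg*", exact_match_when_no_wildcard=True)
```

In this sketch an explicit wildcard still wins over the flag, so a user who wants the prefix behavior can keep it by writing "hourse.jpeg*".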
@potiuk I'll be picking this up
Assigned you!
Apache Airflow Provider(s)
google
Versions of Apache Airflow Providers
No response
Apache Airflow version
2.2.4 (latest released)
Operating System
MacOS 12.2.1
Deployment
Composer
Deployment details
No response
What happened
I have the files "hourse.jpeg" and "hourse.jpeg.copy" and a folder "hourse.jpeg.folder" in the source bucket.
I use the following code to try to copy only "hourse.jpeg" to another bucket.
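from airflow.providers.google.cloud.transfers.gcs_to_gcs import GCSToGCSOperator

# my_source_bucket and my_destination_bucket are bucket names defined elsewhere.
gcs_to_gcs_op = GCSToGCSOperator(
    task_id="gcs_to_gcs",
    source_bucket=my_source_bucket,
    source_object="hourse.jpeg",
    destination_bucket=my_destination_bucket,
)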
The result is that the two files and the one folder mentioned above are all copied.
From the source code, it seems there is no way to do what I want.
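The effect can be reproduced without a bucket. The sketch below only simulates listing by prefix over a hard-coded list of names; BUCKET_OBJECTS, list_by_prefix, and the file inside the folder are assumptions made for illustration, not the operator's actual code.

```python
# Hypothetical stand-in for the source bucket's contents.
BUCKET_OBJECTS = [
    "hourse.jpeg",
    "hourse.jpeg.copy",
    "hourse.jpeg.folder/inner.txt",  # assumed file inside the folder
    "other.png",
]


def list_by_prefix(object_names, prefix):
    """Return every object name that starts with `prefix`."""
    return [name for name in object_names if name.startswith(prefix)]


# Using the intended file name as a prefix also picks up the copy and the folder.
matched = list_by_prefix(BUCKET_OBJECTS, "hourse.jpeg")
print(matched)
```

Every name that begins with "hourse.jpeg" is selected, which matches the two files and the folder being copied together.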
What you think should happen instead
Only the specified file should be copied; that is, source_object should be treated as an exact match instead of a prefix.
To get the current prefix behavior, the user can/should use a wildcard character:
source_object="hourse.jpeg*"
How to reproduce
No response
Anything else
No response
Are you willing to submit PR?
Code of Conduct