Optionally raise an error if source file does not exist in GCSToGCSOperator #21388

Closed
2 tasks done
davidpr91 opened this issue Feb 7, 2022 · 3 comments · Fixed by #21391

@davidpr91
Contributor

davidpr91 commented Feb 7, 2022

Description

Currently, when GCSToGCSOperator is used to copy a file from one bucket to another and the source file does not exist, nothing happens and the task is marked successful. This is fine for some use cases, for example when you want to copy all the files in a directory or all the files that match a specific pattern.
In other cases, such as copying one specific blob, it would be useful to raise an exception when the source file can't be found. Otherwise, the task fails silently.
My proposal is to add a new flag to GCSToGCSOperator to enable this feature. By default, for backward compatibility, the current behavior would be kept. With the flag set, the source file would be required and the task would be marked as failed if it doesn't exist.
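
A minimal sketch of the proposed behavior, modeled with plain dictionaries instead of real buckets so it runs without Airflow or GCS. The function name `copy_objects`, the flag name `source_object_required`, and the prefix-matching rule are assumptions made for illustration, not the operator's actual implementation:

```python
from typing import Dict, List


def copy_objects(
    source_bucket: Dict[str, bytes],
    destination_bucket: Dict[str, bytes],
    source_object: str,
    source_object_required: bool = False,  # hypothetical flag name
) -> List[str]:
    """Copy every object whose name starts with source_object.

    Returns the names that were copied. When nothing matches, the call
    is a silent no-op unless source_object_required is True.
    """
    matches = [name for name in source_bucket if name.startswith(source_object)]
    if not matches and source_object_required:
        # Opt-in behavior: fail loudly instead of succeeding with no copy.
        raise FileNotFoundError(
            f"{source_object} does not exist in the source bucket"
        )
    for name in matches:
        destination_bucket[name] = source_bucket[name]
    return matches
```

With the default `source_object_required=False`, a missing source returns an empty list and the caller sees success, which is the current behavior. Passing `True` turns the same situation into an exception, which in an Airflow task would mark the task as failed.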

Use case/motivation

With the flag enabled, the task would fail if the source file to copy does not exist. Without it, the behavior stays as it is today.

Related issues

If you want to be sure that the source file exists and is copied on every execution, the operator currently gives you no way to make the task fail. If the status is successful but nothing is written to the destination, the task has failed silently.

Are you willing to submit a PR?

  • Yes I am willing to submit a PR!

Code of Conduct

@davidpr91 davidpr91 added the kind:feature Feature Requests label Feb 7, 2022
@boring-cyborg

boring-cyborg bot commented Feb 7, 2022

Thanks for opening your first issue here! Be sure to follow the issue template!

@potiuk
Member

potiuk commented Feb 7, 2022

Assigned you!

@kazanzhy
Contributor

kazanzhy commented Feb 7, 2022

Sounds interesting. It's pretty similar to the parameter that some pandas functions take: errors={'ignore', 'raise', 'coerce'}. But there are a lot of different operators and sensors in Airflow that help you build complex pipelines from very simple blocks.

I also implemented similar logic with optional errors in the first iteration of the RDS operators, but afterwards I decided to keep the operators as straightforward and simple as possible, without any logic under the hood.
In this case, GCSObjectExistenceSensor might be useful. If something goes wrong, you see it in the UI.
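
To illustrate the sensor-first alternative, here is a rough, Airflow-free sketch of what gating a copy behind an existence check amounts to: poll until the object is there, and fail visibly if it never shows up. The helper `wait_for_object` is hypothetical; its `timeout` and `poke_interval` arguments only borrow the names of Airflow's sensor settings and do not call any Airflow code:

```python
import time
from typing import Callable


def wait_for_object(
    exists: Callable[[], bool],
    timeout: float = 1.0,
    poke_interval: float = 0.1,
) -> None:
    """Poll exists() until it returns True.

    Raises TimeoutError if the object has not appeared once `timeout`
    seconds have elapsed, so a missing source surfaces as a failure
    before any copy step runs.
    """
    deadline = time.monotonic() + timeout
    while not exists():
        if time.monotonic() >= deadline:
            raise TimeoutError("object did not appear before the timeout")
        time.sleep(poke_interval)


# Usage: the "bucket" is a plain dict standing in for real storage.
bucket = {"data/a.csv": b"1"}
wait_for_object(lambda: "data/a.csv" in bucket)  # returns immediately
```

In a real DAG the same shape would be a sensor task placed upstream of the copy task, so the copy operator itself stays free of error-handling options.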
