AIRFLOW-4809 | s3_delete_objects_operator should not fail on empty list of keys #5428
The current implementation of s3_delete_objects_operator fails on an empty list of keys. When the list of keys to remove comes from a dynamic task (such as s3_list_operator via XCom), an empty list should also be considered valid. Currently the operator throws an exception from boto3 (malformed XML). This patch changes the behaviour so that the operator exits immediately if there are no keys to remove.
c33b841 to ef3b278
One might argue that this is the expected behavior. Why shouldn't this be raised as a PR in boto itself? BTW, one might also ask: if a list was given but there was nothing to delete, should it fail or succeed? This should be defined by the library (in this case boto), not by Airflow. What do you think?
Codecov Report
@@            Coverage Diff            @@
##           master   #5428    +/-    ##
==========================================
+ Coverage   79.09%   79.09%   +<.01%
==========================================
  Files         485      485
  Lines       30380    30382      +2
==========================================
+ Hits        24030    24032      +2
  Misses       6350     6350
What we did in #4475 for a similar case was to add a parameter called
@OmerJog I like your point of view! However, I checked the boto3 documentation and it looks like the list of objects is marked as REQUIRED (see: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html#S3.Client.delete_objects), so I suspect the correct library usage is to "not call the
Regarding your second question: in a Unix shell the result depends on the "force" flag, see:
I just checked that the operator already works as in "force" mode, because deleting a non-existing key results in task success. So currently it works in a mixed-force mode: for deleting a non-existing key it works exactly as
@ashb thank you for your suggestion, it makes 100% sense to add a flag here. Should it be named
Force isn't quite right either - although that does stop
In short: names are hard, I don't know. It doesn't have to match the name of the flag from the imap hook, no
I see. OK, let me try a PR to boto3 to get their opinion. If they consider handling an empty array an improvement, then the only change needed is to bump the boto3 version in the Airflow codebase. Otherwise, I'd suggest going with the current patch (without parameters) so as not to complicate the simple case. This is used anyway when the operator input is fed dynamically. What do you think? @ashb @jomar83
def execute(self, context):
    s3_hook = S3Hook(aws_conn_id=self.aws_conn_id, verify=self.verify)

    if isinstance(self.keys, list) and len(self.keys) == 0:
This could probably be simplified to:
- if isinstance(self.keys, list) and len(self.keys) == 0:
+ if not self.keys:
Additionally move it above the s3_hook = S3Hook() creation -- that connects to the AWS servers, and there's no need to do that if we're going to short-circuit.
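For illustration, the two suggestions above (the `if not self.keys` check, placed before the hook is created) could look roughly like the sketch below. It is not the code from this PR: `FakeS3Hook` and `DeleteObjectsSketch` are hypothetical stand-ins so the example runs without Airflow or AWS, and only the ordering of the guard and the hook creation is the point.

```python
class FakeS3Hook:
    """Hypothetical stand-in for S3Hook; counts constructions instead of touching AWS."""

    instances = 0

    def __init__(self, aws_conn_id=None, verify=None):
        FakeS3Hook.instances += 1

    def delete_objects(self, bucket, keys):
        # A real hook would forward the request to boto3; here we only echo the keys.
        key_list = [keys] if isinstance(keys, str) else list(keys)
        return {"Deleted": [{"Key": k} for k in key_list]}


class DeleteObjectsSketch:
    """Illustrative operator-like class (an assumption, not the Airflow operator)."""

    def __init__(self, bucket, keys, aws_conn_id="aws_default", verify=None):
        self.bucket = bucket
        self.keys = keys
        self.aws_conn_id = aws_conn_id
        self.verify = verify

    def execute(self, context):
        # Short-circuit first: an empty list, empty string or None means nothing to delete.
        if not self.keys:
            return None

        # The hook is only created once we know there is work to do.
        s3_hook = FakeS3Hook(aws_conn_id=self.aws_conn_id, verify=self.verify)
        return s3_hook.delete_objects(bucket=self.bucket, keys=self.keys)
```

Note that `if not self.keys` is slightly broader than the original `isinstance(...) and len(...) == 0` check, since it also treats `None` and an empty string as "nothing to delete".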
After a few experiments I implemented this using ShortCircuitOperator, skipping the delete operator if the list of files is empty. It's a more straightforward solution and doesn't require adding any extra params to the current operators. Therefore I'm closing this PR. Thanks for your review!
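The DAG that was actually used is not shown in this thread, so the following is only a sketch of how such a ShortCircuitOperator guard might look. The task id `list_old_files` and the function name `has_keys_to_delete` are made up for the example, and the wiring in the comments assumes the Airflow 1.10-style import path and `provide_context=True`.

```python
def has_keys_to_delete(**context):
    """Return True only when the upstream listing task pushed at least one key."""
    # "list_old_files" is a hypothetical task_id for the upstream S3 list task.
    keys = context["ti"].xcom_pull(task_ids="list_old_files")
    # None (nothing pushed) and [] (nothing found) both short-circuit the DAG branch.
    return bool(keys)


# Wiring sketch (requires Airflow; task and DAG names are illustrative):
#
# from airflow.operators.python_operator import ShortCircuitOperator
#
# check_keys = ShortCircuitOperator(
#     task_id="check_keys",
#     python_callable=has_keys_to_delete,
#     provide_context=True,
#     dag=dag,
# )
# list_old_files >> check_keys >> delete_old_files
```

When the callable returns a falsy value, ShortCircuitOperator skips the downstream tasks, so the delete operator is never called with an empty list.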
Description
When s3_delete_objects_operator is used dynamically (for example, when the list of keys comes from s3_list_operator via XCom), the list of keys may be empty. In my case this happens when chained operators remove old files from S3 and there are no old files yet (because this is the very first run of the DAG).
In case of empty keys, the hook raises an exception (via boto3):
The provided patch modifies the operator behavior: if there is nothing to delete from S3, it just returns.